# HG changeset patch # User Steve Losh # Date 1615950935 14400 # Node ID a9d8b7a86226e19060d70dc5a35dd8206a798078 # Parent c635f15b37c1095d619e9b1c303e0284513ec7df Finish post diff -r c635f15b37c1 -r a9d8b7a86226 content/blog/2021/01/small-common-lisp-cli-programs.markdown --- a/content/blog/2021/01/small-common-lisp-cli-programs.markdown Sun Dec 27 22:08:26 2020 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,854 +0,0 @@ -(:title "Writing Small CLI Programs in Common Lisp" - :snip "Somewhere between tiny shell scripts and full projects." - :date "2021-01-02T15:50:00Z" - :draft t) - -I've found Common Lisp to be a good language for this. But it can be a little -intimidating to get started, especially for beginners, because Common Lisp is -a very flexible language and doesn't lock you into one way of working. In this -post I'll describe how I write small, stand-alone command line programs in -Common Lisp. - -
- -## Requirements - -When you're writing programs in Common Lisp, you've got a lot of options. For -this use case, laying out the requirements I have helped me decide on an -approach. - -First, each new program should be one single file. A few ancillary files for -all scripts together (e.g. a `Makefile`) are okay, but adding a new program -should mean adding one single file. For larger programs a full project -directory and ASDF system are great, but for small programs having one file per -program reduces the overhead quite a bit. - -The programs need to be able to be developed in the typical Common Lisp -interactive style (in my case: with Swank and VLIME). The interactive -development is one of the best parts of working in Common Lisp, and I won't give -it up. In particular, this means that a shell-script style approach, with -`#!/path/to/sbcl --script` and the top and directly running code at the top -level in the file does not work for two main reasons: - -* Loading that file will fail due to the shebang unless you have some ugly - reader macros in your startup file. -* The program will need to do things like parsing command-line arguments and - exiting with an error code, and running `exit` would kill the Swank process. - -The programs need to be able to use libraries, so Quicklisp will need to be -involved. Common Lisp has a lot of nice things built-in, but there's some -libraries that are just too useful to pass up. - -The programs will need to have proper user interfaces. Command line arguments -must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abc foo -d` -should work as expected), malformed or unknown options must be caught instead of -dropping them on the floor, error messages should be meaningful, and the -`--help` should be thoroughly and thoughtfully written. - -Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp` -and defines a package `foo` with a function called `toplevel`) is okay if it -makes my life easier. These programs are just for me, so I don't have to worry -about people wanting to create executables with spaces in the name or something. - -## Solution Skeleton - -After trying a number of different approaches, I've settled on a solution that -I'm pretty happy with. First I'll describe the general approach, then we'll -look at one actual example program in its entirety. - -### Directory Structure - -I keep all my small single-file Common Lisp programs in a `lisp` directory -inside my dotfiles repository. Its contents look like this: - -``` -…/lisp/ - bin/ - foo - bar - man/ - man1/ - foo.1 - bar.1 - build-binary.sh - build-manual.sh - Makefile - foo.lisp - bar.lisp -``` - -The `bin` directory is where the executable files end up. I've added it to my -`$PATH` so I don't have to symlink or copy the binaries anywhere. - -`man` contains the generated `man` pages. Because it's adjacent to `bin` (which -is on my path) the `man` program automatically finds the `man` pages as -expected. - -`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make -building programs easier. - -The `.lisp` files are, of course, the programs. Each new program I want to add -only requires adding the `.lisp` file in this directory and running -`make`. - -### Lisp Files - -All my small Common Lisp programs follow a few conventions, which makes building -them easier. Let's look at the skeleton of a `foo.lisp` file as an example. -I'll show the entire file here, and then step through it piece by piece. - -```lisp -(eval-when (:compile-toplevel :load-toplevel :execute) - (ql:quickload '(… :with-user-abort) :silent t)) - -(defpackage :foo - (:use :cl) - (:export :toplevel …)) - -(in-package :foo) - -;;;; Configuration ----------------------------------------------- -(defparameter *version* "1.0.0") -(defparameter *some-option* nil) - -;;;; Errors ------------------------------------------------------ -(define-condition user-error (error) ()) - -(define-condition missing-foo (user-error) () - (:report "A foo is required, but none was supplied.")) - -;;;; Functionality ----------------------------------------------- -(defun foo (string) - …) - -;;;; Run --------------------------------------------------------- -(defun run (arguments) - (map nil #'foo arguments)) - -;;;; User Interface ---------------------------------------------- -(defmacro exit-on-ctrl-c (&body body) - `(handler-case (with-user-abort:with-user-abort (progn ,@body)) - (with-user-abort:user-abort () (sb-ext:exit :code 130)))) - -(defparameter *ui* - (adopt:make-interface - :name "foo" - …)) - -(defun toplevel () - (sb-ext:disable-debugger) - (exit-on-ctrl-c - (multiple-value-bind (arguments options) - (adopt:parse-options-or-exit *ui*) - … ; Handle options. - (handler-case (run arguments) - (user-error (e) - (format *error-output* "error: ~A~%" e) - (adopt:exit 1)))))) -``` - -Let's go through each chunk of this. - -```lisp -(eval-when (:compile-toplevel :load-toplevel :execute) - (ql:quickload '(:with-user-abort …) :silent t)) -``` - -First we `quickload` any necessary libraries. We always want to do this, even -when compiling the file, because we need the appropriate packages to be loaded -when we try to use their symbols later in the file. - -[`with-user-abort`](https://github.com/compufox/with-user-abort) is a library -for portably handling `control-c`, which all of these small programs use. - -```lisp -(defpackage :foo - (:use :cl) - (:export :toplevel *ui*)) - -(in-package :foo) -``` - -Next we define a package `foo` and switch to it. The package is always named -the same as the resulting binary and the basename of the file. The package -always exports the symbols `toplevel` and `*ui*`. - -```lisp -;;;; Configuration ----------------------------------------------- -(defparameter *version* "1.0.0") -(defparameter *some-option* nil) -``` - -Next we define any configuration variables. These will be set later after -parsing the command line arguments (when we run the command line program) or -at the REPL (when developing interactively). - -```lisp -;;;; Errors ------------------------------------------------------ -(define-condition user-error (error) ()) - -(define-condition missing-foo (user-error) () - (:report "A foo is required, but none was supplied.")) -``` - -We define a `user-error` condition, and any errors the user might make will -inherit from it. This will make it easy to treat user errors (e.g. passing -a mangled regular expression like `(foo+` as an argument) differently from -programming errors (i.e. bugs). This makes it easier to treat those errors -differently: - -* Bugs should print a backtrace or enter the debugger. -* Expected user errors should print a helpful error message with no backtrace or debugger. - -```lisp -;;;; Functionality ----------------------------------------------- -(defun foo (string) - …) -``` - -Next we have the actual functionality of the program. - -```lisp -;;;; Run --------------------------------------------------------- -(defun run (arguments) - (map nil #'foo arguments)) -``` - -We now define a function `run` that takes some arguments (as strings) and -performs the main work of the program. - -Importantly, `run` does **not** handle command line argument parsing, and it does -**not** exit the program with an error code, which means we can safely call it -to "run the program" when we're developing interactively without worrying about -it killing our Lisp process. - -Finally, we need to define the command line interface. - -```lisp -;;;; User Interface ---------------------------------------------- -(defmacro exit-on-ctrl-c (&body body) - `(handler-case (with-user-abort:with-user-abort (progn ,@body)) - (with-user-abort:user-abort () (sb-ext:exit :code 130)))) -``` - -We'll make a little macro around `with-user-abort` to make it less wordy. We'll -[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the -user presses `ctrl-c`. - -```lisp -(defparameter *ui* - (adopt:make-interface - :name "foo" - …)) -``` - -Here we define the `*ui*` variable whose symbol we exported above. [Adopt][] is -a command line argument parsing library I wrote. If you want to use a different -library, feel free. - -[Adopt]: https://docs.stevelosh.com/adopt - -```lisp -(defun toplevel () - (sb-ext:disable-debugger) - (exit-on-ctrl-c - (multiple-value-bind (arguments options) - (adopt:parse-options-or-exit *ui*) - … ; Handle options. - (handler-case (run arguments) - (user-error (e) - (format *error-output* "error: ~A~%" e) - (adopt:exit 1)))))) -``` - -And finally we define the `toplevel` function. This will only ever be called -when the program is run as a standalone program, never interactively. It -handles all the work beyond the main guts of the program that are handled by the -`run` function: - -* Disabling or enabling the debugger. -* Exiting the process with the appropriate status code on errors. -* Parsing command line arguments. - -That's it for the structure of the `.lisp` files. - -### Building Binaries - -`build-binary.sh` is a small script to build the executable binaries from the -`.lisp` files. `./build-binary.sh foo.lisp` will build `foo`: - -```bash -#!/usr/bin/env bash - -set -euo pipefail - -LISP=$1 -NAME=$(basename "$1" .lisp) -shift - -sbcl --load "$LISP" \ - --eval "(sb-ext:save-lisp-and-die \"$NAME\" - :executable t - :save-runtime-options t - :toplevel '$NAME:toplevel)" -``` - -Here we see where the naming conventions have become important — we know that -the package is named the same as the binary and that it will have the symbol -`toplevel` exported, which names the entry point for the binary. - -### Building Man Pages - -`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s -built-in `man` page generation. If you don't care about building `man` pages -for your personal programs (I admit, it's a little bit silly) you can ignore -this. - -```bash -#!/usr/bin/env bash - -set -euo pipefail - -LISP=$1 -NAME=$(basename "$LISP" .lisp) -OUT="$NAME.1" -shift - -sbcl --load "$LISP" \ - --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede) - (adopt:print-manual $NAME:*ui* :stream f))" \ - --quit -``` - -This is why we always name the Adopt interface variable `*ui*` and export it -from the package. - -### Makefile - -Finally we have a simple `Makefile` so we can run `make` to regenerate any -out of date binaries and `man` pages: - -```make -files := $(wildcard *.lisp) -names := $(files:.lisp=) - -.PHONY: all clean $(names) - -all: $(names) - -$(names): %: bin/% man/man1/%.1 - -bin/%: %.lisp build-binary.sh Makefile - mkdir -p bin - ./build-binary.sh $< - mv $(@F) bin/ - -man/man1/%.1: %.lisp build-manual.sh Makefile - mkdir -p man/man1 - ./build-manual.sh $< - mv $(@F) man/man1/ - -clean: - rm -rf bin man -``` - -We use a `wildcard` to automatically find the `.lisp` files so we don't have to -do anything other than adding a new file when we want to make a new program. - -The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses -a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern) -to automatically define the phony rules for building each program. If -`$(names)` is `foo bar` this line effectively defines two phony rules: - -``` -foo: bin/foo man/man1/foo.1 -bar: bin/bar man/man1/bar.1 -``` - -This lets us run `make foo` to make the binary and `man` page for `foo.lisp`. - -## Case Study: A Batch Coloring Utility - -Now that we've seen the skeleton, let's look at one of my actual programs that -I use all the time. It's called `batchcolor` and it's used to highlight regular -expression matches in text (usually log files in my case) with a twist: each -unique match is highlighted in a separate color, which makes it easier to -visually parse the result. - -For example, suppose we have some log files with lines of the form ` -[] ` where request ID is a UUID, and messages might -contain other UUIDs for various things. Such a log file might look something -like this: - -``` -2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/ -2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving... -2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request. -2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404. -2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/ -2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving... -2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/ -2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200. -2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused. -2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500. -``` - -If I try to just read this directly, it's easy for my eyes to glaze over unless -I laboriously read line-by-line. I can use `grep` to highlight the UUIDs, but -that honestly doesn't help too much: - - grep -P '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}|$' foo.log - -`batchcolor` also highlights matches, but highlights each unique match in its -own color: - -This is *much* easier for me to visually parse. The interleaving of separate -request logs is now obvious from the colors of the IDs, and it's easy to match -up various user IDs and thing IDs at a glance. Did you even notice that the two -thing IDs were different before? - -`batchcolor` has a few other simple quality of life features, like picking -explicit colors for specific strings (e.g. red for `ERR`): - -I wrap up this `batchcolor` invocation in an alias and use it to `tail` log -files when developing locally almost every day, and it makes reading the log -output *much* easier. - -Let's step through its code piece by piece. - -### Libraries - -```lisp -(eval-when (:compile-toplevel :load-toplevel :execute) - (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t)) -``` - -First we `quickload` libraries. We'll use [Adopt][] for command line argument -processing, [cl-ppcre][] for regular expressions, and the previously-mentioned -[with-user-abort][] to handle `control-c`. - -### Package - -```lisp -(defpackage :batchcolor - (:use :cl) - (:export :toplevel :*ui*)) - -(in-package :batchcolor) -``` - -We define and switch to the appropriately-named package. Nothing special here. - -### Configuration - -```lisp -;;;; Configuration ------------------------------------------------------------ -(defparameter *version* "1.0.0") -(defparameter *start* 0) -(defparameter *dark* t) -``` - -Next we `defparameter` some variables to hold useful values (like the version) -and settings. `*start*` will be used later when randomizing colors, don't worry -about it for now. - -### Errors - -```lisp -;;;; Errors ------------------------------------------------------------------- -(define-condition user-error (error) ()) - -(define-condition missing-regex (user-error) () - (:report "A regular expression is required.")) - -(define-condition malformed-regex (user-error) - ((underlying-error :initarg :underlying-error)) - (:report (lambda (c s) - (format s "Invalid regex: ~A" (slot-value c 'underlying-error))))) - -(define-condition overlapping-groups (user-error) () - (:report "Invalid regex: seems to contain overlapping capturing groups.")) - -(define-condition malformed-explicit (user-error) - ((spec :initarg :spec)) - (:report (lambda (c s) - (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5." - (slot-value c 'spec))))) -``` - -Here we define the user errors. Some of these are self-explanatory, while -others will make more sense later once we see them in action. The specific -details aren't as important as the overall idea: for user errors we know might -happen, display a helpful error message instead of just spewing a backtrace at -the user. - -### Colorization - -Next we have the actual meat of the program. Obviously this is going to be -completely different for every program, so feel free to skip this if you don't -care about this specific problem. - -```lisp -;;;; Functionality ------------------------------------------------------------ -(defun rgb-code (r g b) - ;; The 256 color mode color values are essentially r/g/b in base 6, but - ;; shifted 16 higher to account for the intiial 8+8 colors. - (+ (* r 36) - (* g 6) - (* b 1) - 16)) -``` - -We're going to highlight different matches with different colors. We'll need -a reasonable amount of colors to make this useful, so using the basic 8/16 ANSI -colors isn't enough. Full 24-bit truecolor is overkill, but the 8-bit ANSI -colors will work nicely. If we ignore the base colors, we essentially have -6 x 6 x 6 = 216 colors to work with. `rgb-code` will take the red, green, and -blue values from `0` to `5` and return the color code. See [Wikipedia][8bit] -for more information. - -[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit - -```lisp -(defun make-colors (excludep) - (let ((result (make-array 256 :fill-pointer 0))) - (dotimes (r 6) - (dotimes (g 6) - (dotimes (b 6) - (unless (funcall excludep (+ r g b)) - (vector-push-extend (rgb-code r g b) result))))) - result)) - -(defparameter *dark-colors* (make-colors (lambda (v) (< v 3)))) -(defparameter *light-colors* (make-colors (lambda (v) (> v 11)))) -``` - -Now we can build some arrays of colors. We *could* use any of the 216 available -colors, but in practice we probably don't want to, because the darkest colors -will be too dark to read on a dark terminal, and vice versa for light terminals. -In a concession to practicality we'll generate two separate arrays of colors, -one that excludes colors whose total value is too dark and one excluding those -that are too light. - -(You might notice that `*dark-colors*` is "the array of colors for dark -terminals" and not "the array of colors which are not light". Naming things is -hard.) - -Note that these arrays will be generated when the `batchcolor.lisp` file is -`load`ed, which is *when we build the binary*. They *won't* be recomputed every -time you run the resulting binary. In this case it doesn't really matter (the -arrays are small) but it's worth remembering in case you ever have some data you -want (or don't want) to compute at build time instead of run time. - -```lisp -(defparameter *explicits* (make-hash-table :test #'equal)) -``` - -Here we make a hash table to store the strings and colors for strings we want to -explicitly color (e.g. `ERR` should be red, `INFO` cyan). The keys will be the -strings and values the RGB codes. - -```lisp -(defun djb2 (string) - ;; http://www.cse.yorku.ca/~oz/hash.html - (reduce (lambda (hash c) - (mod (+ (* 33 hash) c) (expt 2 64))) - string - :initial-value 5381 - :key #'char-code)) - -(defun find-color (string) - (gethash string *explicits* - (let ((colors (if *dark* *dark-colors* *light-colors*))) - (aref colors - (mod (+ (djb2 string) *start*) - (length colors)))))) -``` - -For strings that we want to explicitly color, we just look up the appropriate -code in `*explicits*` and return it. - -Otherwise, we want to highlight unique matches in different colors. There are -a number of different ways we could do this. For example, we could randomly -pick a color the first time we see a string and store it in a hash table for -subsequent encounters. But this would mean we'd grow over time, and one of the -things I often use this utility for is `tail -f`ing long-running processes when -developing locally, so the memory usage would grow and grow over time until the -`batchcolor` process was restarted, which isn't ideal. - -Instead, we'll hash each string with a simple [DJB hash][djb] and use it to -index into the appropriate array of colors. This ensures that identical matches -get identical colors, and avoids having to store every match we've ever seen. - -We'll talk about `*start*` later, ignore it for now (it's `0` by default). - -[djb]: http://www.cse.yorku.ca/~oz/hash.html - -```lisp -(defun ansi-color-start (color) - (format nil "~C[38;5;~Dm" #\Escape color)) - -(defun ansi-color-end () - (format nil "~C[0m" #\Escape)) - -(defun print-colorized (string) - (format *standard-output* "~A~A~A" - (ansi-color-start (find-color string)) - string - (ansi-color-end))) -``` - -Next we have some functions to output the appropriate ANSI escapes to highlight -our matches. We could use a library for this but it's only 2 lines. [It's not -worth it](http://xn--rpa.cc/irl/term.html). - -And now we have the beating heart of the program: - -```lisp -(defun colorize-line (scanner line &aux (start 0)) - (ppcre:do-scans (ms me rs re scanner line) - ;; If we don't have any register groups, colorize the entire match. - ;; Otherwise, colorize each matched capturing group. - (let* ((regs? (plusp (length rs))) - (starts (if regs? (remove nil rs) (list ms))) - (ends (if regs? (remove nil re) (list me)))) - (map nil (lambda (word-start word-end) - (unless (<= start word-start) - (error 'overlapping-groups)) - (write-string line *standard-output* :start start :end word-start) - (print-colorized (subseq line word-start word-end)) - (setf start word-end)) - starts ends))) - (write-line line *standard-output* :start start)) -``` - -`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with -any of the desired matches colorized appropriately. There are a few things to -note here. - -First: if the regular expression contains any capturing groups, we will only -colorize those parts of the match. For example, if you run `batchcolor -'^<(\\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves -will be highlighted, not the surrounding angle brackets. Otherwise, if there -are no capturing groups in the regular expression, we'll highlight the entire -match (as if there were one big capturing group around the whole thing). - -Second: overlapping capturing groups are explicitly disallowed and -a `user-error` signaled if we notice any. It's not clear what do to in this -case — if we match `((f)oo|(b)oo)` against `foo`, what should the output be? -Highlight `f` and `oo` in the same color? In different colors? Should the `oo` -be a different color than the `oo` in `boo`? There's too many options with no -clear winner, so we'll just tell the user to be more clear. - -To do the actual work, we iterate over each match and print the non-highlighted -text before the match, then print the highlighted match. Finally we print any -remaining text after the last match. - -### Not-Quite-Top-Level Interface - -```lisp -;;;; Run ---------------------------------------------------------------------- -(defun run% (scanner stream) - (loop :for line = (read-line stream nil) - :while line - :do (colorize-line scanner line))) - -(defun run (pattern paths) - (let ((scanner (handler-case (ppcre:create-scanner pattern) - (ppcre:ppcre-syntax-error (c) - (error 'malformed-regex :underlying-error c)))) - (paths (or paths '("-")))) - (dolist (path paths) - (if (string= "-" path) - (run% scanner *standard-input*) - (with-open-file (stream path :direction :input) - (run% scanner stream)))))) -``` - -Here we have the not-quite-top-level interface to the program. `run` takes -a pattern string and a list of paths and runs the colorization on each path. -This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>" -"foo.txt")`, so we can test without worrying about killing the Lisp process. - -### User Interface - -In the last chunk of the file we have the user interface. There are a couple of -things to note here. - -I'm using a command line argument parsing library I wrote myself: [Adopt][]. -But if you prefer another library (and there are quite a few around) feel free -to use it — it should be pretty easy to adapt this setup to a different library. -The only things you'd need to change would be the `toplevel` function and the -`build-manual.sh` script (if you even care about building `man` pages at all). - -You might also notice that the user interface for the program is almost as much -code as the entire rest of the program. This may seem disconcerting at first, -but I think it makes a certain kind of sense. When you're writing code to -interface with an external system, a messier and more complicated external -system will usually require more code than a cleaner and simpler external -system. A human brain is probably the messiest and most complicated external -system you'll ever have to deal with, so it's worth taking the extra time and -code to be especially careful when writing an interface to it. - -```lisp -(defparameter *option-help* - (adopt:make-option 'help - :help "Display help and exit." - :long "help" - :short #\h - :reduce (constantly t))) - -(defparameter *option-version* - (adopt:make-option 'version - :help "Display version information and exit." - :long "version" - :reduce (constantly t))) -``` - -```lisp -(adopt:defparameters (*option-debug* *option-no-debug*) - (adopt:make-boolean-options 'debug - :long "debug" - :short #\d - :help "Enable the Lisp debugger." - :help-no "Disable the Lisp debugger (the default).")) -``` - -```lisp -(adopt:defparameters (*option-randomize* *option-no-randomize*) - (adopt:make-boolean-options 'randomize - :help "Randomize the choice of color each run." - :help-no "Do not randomize the choice of color each run (the default)." - :long "randomize" - :short #\r)) - -(adopt:defparameters (*option-dark* *option-light*) - (adopt:make-boolean-options 'dark - :name-no 'light - :long "dark" - :long-no "light" - :help "Optimize for dark terminals (the default)." - :help-no "Optimize for light terminals." - :initial-value t)) -``` - -```lisp -;;;; User Interface ----------------------------------------------------------- -(defun parse-explicit (spec) - (ppcre:register-groups-bind - ((#'parse-integer r g b) string) - ("^([0-5]),([0-5]),([0-5]):(.+)$" spec) - (return-from parse-explicit (cons string (rgb-code r g b)))) - (error 'malformed-explicit :spec spec)) - -(defparameter *option-explicit* - (adopt:make-option 'explicit - :parameter "R,G,B:STRING" - :help "Highlight STRING in an explicit color. May be given multiple times." - :manual (format nil "~ - Highlight STRING in an explicit color instead of randomly choosing one. ~ - R, G, and B must be 0-5. STRING is treated as literal string, not a regex. ~ - Note that this doesn't automatically add STRING to the overall regex, you ~ - must do that yourself! This is a known bug that may be fixed in the future.") - :long "explicit" - :short #\e - :key #'parse-explicit - :reduce #'adopt:collect)) -``` - -```lisp -(adopt:define-string *help-text* - "batchcolor takes a regular expression and matches it against standard ~ - input one line at a time. Each unique match is highlighted in its own color.~@ - ~@ - If the regular expression contains any capturing groups, only those parts of ~ - the matches will be highlighted. Otherwise the entire match will be ~ - highlighted. Overlapping capturing groups are not supported.") - -(adopt:define-string *extra-manual-text* - "If no FILEs are given, standard input will be used. A file of - stands for ~ - standard input as well.~@ - ~@ - Overlapping capturing groups are not supported because it's not clear what ~ - the result should be. For example: what should ((f)oo|(b)oo) highlight when ~ - matched against 'foo'? Should it highlight 'foo' in one color? The 'f' in ~ - one color and 'oo' in another color? Should that 'oo' be the same color as ~ - the 'oo' in 'boo' even though the overall match was different? There are too ~ - many possible behaviors and no clear winner, so batchcolor disallows ~ - overlapping capturing groups entirely.") - -(defparameter *examples* - '(("Colorize IRC nicknames in a chat log:" - . "cat channel.log | batchcolor '<(\\\\w+)>'") - ("Colorize UUIDs in a request log:" - . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'") - ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):" - . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log") - ("Colorize earmuffed symbols in a Lisp file:" - . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp"))) -``` - -```lisp -(defparameter *ui* - (adopt:make-interface - :name "batchcolor" - :usage "[OPTIONS] REGEX [FILE...]" - :summary "colorize regex matches in batches" - :help *help-text* - :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*) - :examples *examples* - :contents (list - *option-help* - *option-version* - *option-debug* - *option-no-debug* - (adopt:make-group 'color-options - :title "Color Options" - :options (list *option-randomize* - *option-no-randomize* - *option-dark* - *option-light* - *option-explicit*))))) -``` - -### Top-Level Interface - -```lisp -(defmacro exit-on-ctrl-c (&body body) - `(handler-case (with-user-abort:with-user-abort (progn ,@body)) - (with-user-abort:user-abort () (adopt:exit 130)))) - -(defun configure (options) - (loop :for (string . rgb) :in (gethash 'explicit options) - :do (setf (gethash string *explicits*) rgb)) - (setf *start* (if (gethash 'randomize options) - (random 256 (make-random-state t)) - 0) - *dark* (gethash 'dark options))) - -(defun toplevel () - (sb-ext:disable-debugger) - (exit-on-ctrl-c - (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*) - (when (gethash 'debug options) - (sb-ext:enable-debugger)) - (handler-case - (cond - ((gethash 'help options) (adopt:print-help-and-exit *ui*)) - ((gethash 'version options) (write-line *version*) (adopt:exit)) - ((null arguments) (error 'missing-regex)) - (t (destructuring-bind (pattern . files) arguments - (configure options) - (run pattern files)))) - (user-error (e) (adopt:print-error-and-exit e)))))) -``` - - - -## More Information - -* ieure link -* dotfiles repo link - - - diff -r c635f15b37c1 -r a9d8b7a86226 content/blog/2021/03/small-common-lisp-cli-programs.markdown --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/content/blog/2021/03/small-common-lisp-cli-programs.markdown Tue Mar 16 23:15:35 2021 -0400 @@ -0,0 +1,988 @@ +(:title "Writing Small CLI Programs in Common Lisp" + :snip "Somewhere between tiny shell scripts and full projects." + :date "2021-03-17T16:00:00Z" + :draft t) + +I write a lot of command-line programs. For tiny programs I usually go with the +typical UNIX approach: throw together a half-assed shell script and move on. +For large programs I make a full Common Lisp project, with an ASDF system +definition and such. But there's a middle ground of small*ish* programs that +don't warrant a full repository on their own, but for which I still want a real +interface with proper `--help` and error handling. + +I've found Common Lisp to be a good language for writing these small command +line programs. But it can be a little intimidating to get started (especially +for beginners) because Common Lisp is a very flexible language and doesn't lock +you into one way of working. + +In this post I'll describe how I write small, stand-alone command line programs +in Common Lisp. It might work for you, or you might want to modify things to +fit your own needs. + +
+ +## Requirements + +When you're writing programs in Common Lisp, you've got a lot of options. +Laying out the requirements I have helped me decide on an approach. + +First: each new program should be one single file. A few other files for the +collection as a whole (e.g. a `Makefile`) are okay, but once everything is set +up creating a new program should mean adding one single file. For larger +programs a full project directory and ASDF system are great, but for small +programs having one file per program reduces the mental overhead quite a bit. + +The programs need to be able to be developed in the typical Common Lisp +interactive style (in my case: with Swank and VLIME). Interactive development +is one of the best parts of working in Common Lisp, and I'm not willing to give +it up. In particular this means that a shell-script style approach, with +`#!/path/to/sbcl --script` and the top and directly running code at the top +level in the file, doesn't work for two main reasons: + +* `load`ing that file will fail due to the shebang unless you have some ugly + reader macros in your startup file. +* The program will need to do things like parsing command-line arguments and + exiting with an error code, and calling `exit` would kill the Swank process. + +The programs need to be able to use libraries, so Quicklisp will need to be +involved. Common Lisp has a lot of nice things built-in, but there are some +libraries that are just too useful to pass up. + +The programs will need to have proper user interfaces. Command line arguments +must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abcfoo -d` +should work as expected), malformed or unknown options must be caught instead of +dropping them on the floor, error messages should be meaningful, and the +`--help` should be thoroughly and thoughtfully written so I can remember how to +use the program months later. A `man` page is a nice bonus, but not required. + +Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp` +and defines a package `foo` with a function called `toplevel`) is okay if it +makes my life easier. These programs are just for me, so I don't have to worry +about people wanting to create executables with spaces in the name or something. + +Portability between Common Lisp implementations is nice to have, but not +required. If using a bit of SBCL-specific grease will let me avoid a bunch of +extra dependencies, that's fine for these small personal programs. + +## Solution Skeleton + +After trying a number of different approaches I've settled on a solution that +I'm pretty happy with. First I'll describe the general approach, then we'll +look at one actual example program in its entirety. + +### Directory Structure + +I keep all my small single-file Common Lisp programs in a `lisp` directory +inside my dotfiles repository. Its contents look like this: + +``` +…/dotfiles/lisp/ + bin/ + foo + bar + man/ + man1/ + foo.1 + bar.1 + build-binary.sh + build-manual.sh + Makefile + foo.lisp + bar.lisp +``` + +The `bin` directory is where the executable files end up. I've added it to my +`$PATH` so I don't have to symlink or copy the binaries anywhere. + +`man` contains the generated `man` pages. Because it's adjacent to `bin` (which +is on my path) the `man` program automatically finds the `man` pages as +expected. + +`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make +building programs easier. + +The `.lisp` files are the programs. Each new program I want to add only +requires adding the `.lisp` file in this directory and running +`make`. + +### Lisp Files + +My small Common Lisp programs follow a few conventions that make building them +easier. Let's look at the skeleton of a `foo.lisp` file as an example. I'll +show the entire file here, and then step through it piece by piece. + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(:with-user-abort …) :silent t)) + +(defpackage :foo + (:use :cl) + (:export :toplevel *ui*)) + +(in-package :foo) + +;;;; Configuration ----------------------------------------------- +(defparameter *whatever* 123) + +;;;; Errors ------------------------------------------------------ +(define-condition user-error (error) ()) + +(define-condition missing-foo (user-error) () + (:report "A foo is required, but none was supplied.")) + +;;;; Functionality ----------------------------------------------- +(defun foo (string) + …) + +;;;; Run --------------------------------------------------------- +(defun run (arguments) + (map nil #'foo arguments)) + +;;;; User Interface ---------------------------------------------- +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (sb-ext:exit :code 130)))) + +(defparameter *ui* + (adopt:make-interface + :name "foo" + …)) + +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*) + … ; Handle options. + (handler-case (run arguments) + (user-error (e) (adopt:print-error-and-exit e)))))) +``` + +Let's go through each chunk of this. + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(:with-user-abort …) :silent t)) +``` + +First we `quickload` any necessary libraries. We always want to do this, even +when compiling the file, because we need the appropriate packages to exist when +we try to use their symbols later in the file. + +[with-user-abort][] is a library for easily handling `control-c`, which all of +these small programs will use. + +[with-user-abort]: https://github.com/compufox/with-user-abort + +```lisp +(defpackage :foo + (:use :cl) + (:export :toplevel *ui*)) + +(in-package :foo) +``` + +Next we define a package `foo` and switch to it. The package is always named +the same as the resulting binary and the basename of the file, and always +exports the symbols `toplevel` and `*ui*`. These conventions make it easy to +build everything automatically with `make` later. + +```lisp +;;;; Configuration ----------------------------------------------- +(defparameter *whatever* 123) +``` + +Next we define any configuration variables. These will be set later after +parsing the command line arguments (when we run the command line program) or +at the REPL (when developing interactively). + +```lisp +;;;; Errors ------------------------------------------------------ +(define-condition user-error (error) ()) + +(define-condition missing-foo (user-error) () + (:report "A foo is required, but none was supplied.")) +``` + +We define a `user-error` condition, and any errors the user might make will +inherit from it. This will make it easy to treat user errors (e.g. passing +a mangled regular expression like `(foo+` as an argument) differently from +programming errors (i.e. bugs). This makes it easier to treat those errors +differently: + +* Bugs should print a backtrace or enter the debugger. +* Expected user errors should print a helpful error message with no backtrace or debugger. + +```lisp +;;;; Functionality ----------------------------------------------- +(defun foo (string) + …) +``` + +Next we have the actual functionality of the program. + +```lisp +;;;; Run --------------------------------------------------------- +(defun run (arguments) + (map nil #'foo arguments)) +``` + +We define a function `run` that takes some arguments (as strings) and performs +the main work of the program. + +Importantly, `run` does *not* handle command line argument parsing, and it does +*not* exit the program with an error code, which means we can safely call it to +say "run the whole program" when we're developing interactively without worrying +about it killing our Lisp process. + +Now we need to define the command line interface. + +```lisp +;;;; User Interface ---------------------------------------------- +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (adopt:exit 130)))) +``` + +We'll make a little macro around `with-user-abort` to make it less wordy. We'll +[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the +user presses `ctrl-c`. Maybe some day I'll pull this into Adopt so I don't have +to copy these three lines everywhere. + +```lisp +(defparameter *ui* + (adopt:make-interface + :name "foo" + …)) +``` + +Here we define the `*ui*` variable whose symbol we exported above. [Adopt][] is +a command line argument parsing library I wrote. If you want to use a different +library, feel free. + +[Adopt]: https://docs.stevelosh.com/adopt + +```lisp +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*) + … ; Handle options. + (handler-case (run arguments) + (user-error (e) (adopt:print-error-and-exit e)))))) +``` + +And finally we define the `toplevel` function. This will only ever be called +when the program is run as a standalone program, never interactively. It +handles all the work beyond the main guts of the program (which are handled by +the `run` function), including: + +* Disabling or enabling the debugger. +* Exiting the process with an appropriate status code on errors. +* Parsing command line arguments. +* Setting the values of the configuration parameters. +* Calling `run`. + +That's it for the structure of the `.lisp` files. + +### Building Binaries + +`build-binary.sh` is a small script to build the executable binaries from the +`.lisp` files. `./build-binary.sh foo.lisp` will build `foo`: + +```bash +#!/usr/bin/env bash + +set -euo pipefail + +LISP=$1 +NAME=$(basename "$1" .lisp) +shift + +sbcl --load "$LISP" \ + --eval "(sb-ext:save-lisp-and-die \"$NAME\" + :executable t + :save-runtime-options t + :toplevel '$NAME:toplevel)" +``` + +Here we see where the naming conventions have become important — we know that +the package is named the same as the binary and that it will have the symbol +`toplevel` exported, which always names the entry point for the binary. + +### Building Man Pages + +`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s +built-in `man` page generation. If you don't care about building `man` pages +for your personal programs you can ignore this. I admit that generating `man` +pages for these programs is a little bit silly because they're only for my own +personal use, but I get it for free with Adopt, so why not? + +```bash +#!/usr/bin/env bash + +set -euo pipefail + +LISP=$1 +NAME=$(basename "$LISP" .lisp) +OUT="$NAME.1" +shift + +sbcl --load "$LISP" \ + --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede) + (adopt:print-manual $NAME:*ui* :stream f))" \ + --quit +``` + +This is why we always name the Adopt interface variable `*ui*` and export it +from the package. + +### Makefile + +Finally we have a simple `Makefile` so we can run `make` to regenerate any +out of date binaries and `man` pages: + +```make +files := $(wildcard *.lisp) +names := $(files:.lisp=) + +.PHONY: all clean $(names) + +all: $(names) + +$(names): %: bin/% man/man1/%.1 + +bin/%: %.lisp build-binary.sh Makefile + mkdir -p bin + ./build-binary.sh $< + mv $(@F) bin/ + +man/man1/%.1: %.lisp build-manual.sh Makefile + mkdir -p man/man1 + ./build-manual.sh $< + mv $(@F) man/man1/ + +clean: + rm -rf bin man +``` + +We use a `wildcard` to automatically find the `.lisp` files so we don't have to +do anything extra after adding a new file when we want to make a new program. + +The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses +a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern) +to automatically define the phony rules for building each program. If +`$(names)` is `foo bar` this line effectively defines two phony rules: + +``` +foo: bin/foo man/man1/foo.1 +bar: bin/bar man/man1/bar.1 +``` + +This lets us run `make foo` to make both the binary and `man` page for +`foo.lisp`. + +## Case Study: A Batch Coloring Utility + +Now that we've seen the skeleton, let's look at one of my actual programs that +I use all the time. It's called `batchcolor` and it's used to highlight regular +expression matches in text (usually log files) with a twist: each unique match +is highlighted in a separate color, which makes it easier to visually parse the +result. + +For example: suppose we have some log files with lines of the form ` +[] ` where request ID is a UUID, and messages might +contain other UUIDs for various things. Such a log file might look something +like this: + +``` +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/ +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving... +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request. +2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404. +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/ +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving... +2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/ +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200. +2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused. +2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500. +``` + +If I try to just read this directly, it's easy for my eyes to glaze over unless +I laboriously walk line-by-line. + +[![Screenshot of uncolored log output](/static/images/blog/2021/03/uncolored.png)](/static/images/blog/2021/03/uncolored.png) + +I could use `grep` to highlight the UUIDs: + + grep -P \ + '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \ + example.log + +Unfortunately that doesn't really help too much because all the UUIDs are +highlighted the same color: + +[![Screenshot of grep-colored log output](/static/images/blog/2021/03/grepcolored.png)](/static/images/blog/2021/03/grepcolored.png) + +To get a more readable version of the log, I use `batchcolor`: + + batchcolor \ + '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \ + example.log + +`batchcolor` also highlights matches, but it highlights each unique match in its +own color: + +[![Screenshot of batchcolored log output](/static/images/blog/2021/03/batchcolored.png)](/static/images/blog/2021/03/batchcolored.png) + +This is *much* easier for me to visually parse. The interleaving of separate +request logs is now obvious from the colors of the IDs, and it's easy to match +up various user IDs and thing IDs at a glance. Did you even notice that the two +thing IDs were different before? + +`batchcolor` has a few other quality of life features, like picking explicit +colors for specific strings (e.g. red for `ERR`): + +[![Screenshot of fully batchcolored log output](/static/images/blog/2021/03/batchcoloredfull.png)](/static/images/blog/2021/03/batchcoloredfull.png) + +I use this particular `batchcolor` invocation so often I've put it in its own +tiny shell script. I use it to `tail` log files when developing locally almost +every day, and it makes visually scanning the log output *much* easier. It can +come in handy for other kinds of text too, like highlighting nicknames in an IRC +log. + +Let's step through its code piece by piece. + +### Libraries + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t)) +``` + +First we `quickload` libraries. We'll use [Adopt][] for command line argument +processing, [cl-ppcre][] for regular expressions, and the previously-mentioned +[with-user-abort][] to handle `control-c`. + +[cl-ppcre]: http://edicl.github.io/cl-ppcre/ + +### Package + +```lisp +(defpackage :batchcolor + (:use :cl) + (:export :toplevel :*ui*)) + +(in-package :batchcolor) +``` + +We define and switch to the appropriately-named package. Nothing special here. + +### Configuration + +```lisp +;;;; Configuration ------------------------------------------------------------ +(defparameter *start* 0) +(defparameter *dark* t) +``` + +Next we `defparameter` some variables to hold some settings. `*start*` will be +used later when randomizing colors, don't worry about it for now. + +### Errors + +```lisp +;;;; Errors ------------------------------------------------------------------- +(define-condition user-error (error) ()) + +(define-condition missing-regex (user-error) () + (:report "A regular expression is required.")) + +(define-condition malformed-regex (user-error) + ((underlying-error :initarg :underlying-error)) + (:report (lambda (c s) + (format s "Invalid regex: ~A" (slot-value c 'underlying-error))))) + +(define-condition overlapping-groups (user-error) () + (:report "Invalid regex: seems to contain overlapping capturing groups.")) + +(define-condition malformed-explicit (user-error) + ((spec :initarg :spec)) + (:report + (lambda (c s) + (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5." + (slot-value c 'spec))))) +``` + +Here we define the user errors. Some of these are self-explanatory, while +others will make more sense later once we see them in action. The specific +details aren't as important as the overall idea: for user errors we know might +happen, display a helpful error message instead of just spewing a backtrace at +the user. + +### Colorization + +Next we have the actual meat of the program. Obviously this is going to be +completely different for every program, so feel free to skip this if you don't +care about this specific problem. + +```lisp +;;;; Functionality ------------------------------------------------------------ +(defun rgb-code (r g b) + ;; The 256 color mode color values are essentially r/g/b in base 6, but + ;; shifted 16 higher to account for the intiial 8+8 colors. + (+ (* r 36) + (* g 6) + (* b 1) + 16)) +``` + +We're going to highlight different matches with different colors. We'll need +a reasonable amount of colors to make this useful, so using the basic 8/16 ANSI +colors isn't enough. Full 24-bit truecolor is overkill, but the 8-bit ANSI +colors will work nicely. If we ignore the base colors, we essentially have +6 x 6 x 6 = 216 colors to work with. `rgb-code` will take the red, green, and +blue values from `0` to `5` and return the color code. See [Wikipedia][8bit] +for more information. + +[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit + +```lisp +(defun make-colors (excludep) + (let ((result (make-array 256 :fill-pointer 0))) + (dotimes (r 6) + (dotimes (g 6) + (dotimes (b 6) + (unless (funcall excludep (+ r g b)) + (vector-push-extend (rgb-code r g b) result))))) + result)) + +(defparameter *dark-colors* (make-colors (lambda (v) (< v 3)))) +(defparameter *light-colors* (make-colors (lambda (v) (> v 11)))) +``` + +Now we can build some arrays of colors. We *could* use any of the 216 available +colors, but in practice we probably don't want to, because the darkest colors +will be too dark to read on a dark terminal, and vice versa for light terminals. +In a concession to practicality we'll generate two separate arrays of colors, +one that excludes colors whose total value is too dark and one excluding those +that are too light. + +(Notice that `*dark-colors*` is "the array of colors which are suitable for use +on dark terminals" and not "the array of colors which are *themselves* dark". +Naming things is hard.) + +Note that these arrays will be generated when the `batchcolor.lisp` file is +`load`ed, which is *when we build the binary*. They *won't* be recomputed every +time you run the resulting binary. In this case it doesn't really matter (the +arrays are small) but it's worth remembering in case you ever have some data you +want (or don't want) to compute at build time instead of run time. + +```lisp +(defparameter *explicits* (make-hash-table :test #'equal)) +``` + +Here we make a hash table to store the strings and colors for strings we want to +explicitly color (e.g. `ERR` should be red, `INFO` cyan). The keys will be the +strings and values the RGB codes. + +```lisp +(defun djb2 (string) + ;; http://www.cse.yorku.ca/~oz/hash.html + (reduce (lambda (hash c) + (mod (+ (* 33 hash) c) (expt 2 64))) + string + :initial-value 5381 + :key #'char-code)) + +(defun find-color (string) + (gethash string *explicits* + (let ((colors (if *dark* *dark-colors* *light-colors*))) + (aref colors + (mod (+ (djb2 string) *start*) + (length colors)))))) +``` + +For strings that we want to explicitly color, we just look up the appropriate +code in `*explicits*` and return it. + +Otherwise, we want to highlight unique matches in different colors. There are +a number of different ways we could do this, for example: we could randomly pick +a color the first time we see a string and store it in a hash table for +subsequent encounters. But this would mean we'd grow that hash table over time, +and one of the things I often use this utility for is `tail -f`ing long-running +processes when developing locally, so the memory usage would grow and grow until +the `batchcolor` process was restarted, which isn't ideal. + +Instead, we'll hash each string with a simple [DJB hash][djb] and use it to +index into the appropriate array of colors. This ensures that identical matches +get identical colors, and avoids having to store every match we've ever seen. + +There will be some collisions, but there's not much we can do about that with +only ~200 colors to work with. We could have used 16-bit colors like +I mentioned before, but then we'd have to worry about picking colors different +enough for humans to easily tell apart, and for this simple utility I didn't +want to bother. + +We'll talk about `*start*` later, ignore it for now (it's `0` by default). + +[djb]: http://www.cse.yorku.ca/~oz/hash.html + +```lisp +(defun ansi-color-start (color) + (format nil "~C[38;5;~Dm" #\Escape color)) + +(defun ansi-color-end () + (format nil "~C[0m" #\Escape)) + +(defun print-colorized (string) + (format *standard-output* "~A~A~A" + (ansi-color-start (find-color string)) + string + (ansi-color-end))) +``` + +Next we have some functions to output the appropriate ANSI escapes to highlight +our matches. We could use a library for this but it's only two lines. [It's +not worth it](http://xn--rpa.cc/irl/term.html). + +And now we have the beating heart of the program: + +```lisp +(defun colorize-line (scanner line &aux (start 0)) + (ppcre:do-scans (ms me rs re scanner line) + ;; If we don't have any register groups, colorize the entire match. + ;; Otherwise, colorize each matched capturing group. + (let* ((regs? (plusp (length rs))) + (starts (if regs? (remove nil rs) (list ms))) + (ends (if regs? (remove nil re) (list me)))) + (map nil (lambda (word-start word-end) + (unless (<= start word-start) + (error 'overlapping-groups)) + (write-string line *standard-output* :start start :end word-start) + (print-colorized (subseq line word-start word-end)) + (setf start word-end)) + starts ends))) + (write-line line *standard-output* :start start)) +``` + +`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with +any of the desired matches colorized appropriately. There are a few things to +note here. + +First: if the regular expression contains any capturing groups, we'll only +colorize those parts of the match. For example: if you run `batchcolor +'^<(\\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves +will be highlighted, not the surrounding angle brackets. Otherwise, if there +are no capturing groups in the regular expression, we'll highlight the entire +match (as if there were one big capturing group around the whole thing). + +Second: overlapping capturing groups are explicitly disallowed and +a `user-error` signaled if we notice any. It's not clear what do to in this +case — if we match `((f)oo|(b)oo)` against `foo`, what should the output be? +Highlight `f` and `oo` in the same color? In different colors? Should the `oo` +be a different color than the `oo` in `boo`? There's too many options with no +clear winner, so we'll just tell the user to be more clear. + +To do the actual work we iterate over each match and print the non-highlighted +text before the match, then print the highlighted match. Finally we print any +remaining text after the last match. + +### Not-Quite-Top-Level Interface + +```lisp +;;;; Run ---------------------------------------------------------------------- +(defun run% (scanner stream) + (loop :for line = (read-line stream nil) + :while line + :do (colorize-line scanner line))) + +(defun run (pattern paths) + (let ((scanner (handler-case (ppcre:create-scanner pattern) + (ppcre:ppcre-syntax-error (c) + (error 'malformed-regex :underlying-error c)))) + (paths (or paths '("-")))) + (dolist (path paths) + (if (string= "-" path) + (run% scanner *standard-input*) + (with-open-file (stream path :direction :input) + (run% scanner stream)))))) +``` + +Here we have the not-quite-top-level interface to the program. `run` takes +a pattern string and a list of paths and runs the colorization on each path. +This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>" +"foo.txt")`, so we can test without worrying about killing the Lisp process. + +### User Interface + +In the last chunk of the file we have the user interface. There are a couple of +things to note here. + +I'm using a command line argument parsing library I wrote myself: [Adopt][]. +I won't go over exactly what all the various Adopt functions do. Most of them +should be fairly easy to understand, but [check out the Adopt +documentation][adopt-usage] for the full story if you're curious. + +[adopt-usage]: https://docs.stevelosh.com/adopt/usage/ + +If you prefer another library (and there are quite a few around) feel free +to use it — it should be pretty easy to adapt this setup to a different library. +The only things you'd need to change would be the `toplevel` function and the +`build-manual.sh` script (if you even care about building `man` pages at all). + +You might also notice that the user interface for the program is almost as much +code as the entire rest of the program. This may seem strange, but I think it +makes a certain kind of sense. When you're writing code to interface with an +external system, a messier and more complicated external system will usually +require more code than a cleaner and simpler external system. A human brain is +probably the messiest and most complicated external system you'll ever have to +deal with, so it's worth taking the extra time and code to be especially careful +when writing an interface to it. + +First we'll define a typical `-h`/`--help` option: + +```lisp +(defparameter *option-help* + (adopt:make-option 'help + :help "Display help and exit." + :long "help" + :short #\h + :reduce (constantly t))) +``` + +Next we'll define a pair of options for enabling/disabling the Lisp debugger: + +```lisp +(adopt:defparameters (*option-debug* *option-no-debug*) + (adopt:make-boolean-options 'debug + :long "debug" + :short #\d + :help "Enable the Lisp debugger." + :help-no "Disable the Lisp debugger (the default).")) +``` + +By default the debugger will be off, so any unexpected error will print +a backtrace to standard error and exit with a nonzero exit code. This is the +default because if I add a `batchcolor` somewhere in a shell script, I probably +don't want to suddenly hang the entire script if something breaks. But we still +want to be *able* to get into the debugger manually if something goes wrong. +This is Common Lisp — we don't have to settle for a stack trace or core dump, we +can have a real interactive debugger in the final binary. + +Note how Adopt's `make-boolean-options` function creates *two* options here: + +* `-d`/`--debug` will enable the debugger. +* `-D`/`--no-debug` will disable the debugger. + +Even though *disabled* is the default, it's still important to have both +switches for boolean options like this. If someone wants the debugger to be +*enabled* by default instead (along with some other configuration options), they +might have a shell alias like this: + + alias bcolor='batchcolor --debug --foo --bar' + +But sometimes they might want to temporarily *disable* the debugger for a single +run. Without a `--no-debug` option, they would have to run the vanilla +`batchcolor` and retype all the *other* options. But having the `--no-debug` +option allows them to just say: + + bcolor --no-debug + +This would expand to: + + batchcolor --debug --foo --bar --no-debug + +The later option wins, and the user gets the behavior they expect. + +Next we'll define some color-related options. First an option to randomize the +colors each run, instead of always picking the same color for a particular +string, and then a toggle for choosing colors that work for dark or light +terminals: + +```lisp +(adopt:defparameters (*option-randomize* *option-no-randomize*) + (adopt:make-boolean-options 'randomize + :help "Randomize the choice of color each run." + :help-no "Do not randomize the choice of color each run (the default)." + :long "randomize" + :short #\r)) + +(adopt:defparameters (*option-dark* *option-light*) + (adopt:make-boolean-options 'dark + :name-no 'light + :long "dark" + :long-no "light" + :help "Optimize for dark terminals (the default)." + :help-no "Optimize for light terminals." + :initial-value t)) +``` + +The last option we'll define is `-e`/`--explicit`, to allow the user to select +an explicit color for a particular string: + +```lisp +(defun parse-explicit (spec) + (ppcre:register-groups-bind + ((#'parse-integer r g b) string) + ("^([0-5]),([0-5]),([0-5]):(.+)$" spec) + (return-from parse-explicit (cons string (rgb-code r g b)))) + (error 'malformed-explicit :spec spec)) + +(defparameter *option-explicit* + (adopt:make-option 'explicit + :parameter "R,G,B:STRING" + :help "Highlight STRING in an explicit color. May be given multiple times." + :manual (format nil "~ + Highlight STRING in an explicit color instead of randomly choosing one. ~ + R, G, and B must be 0-5. STRING is treated as literal string, not a regex. ~ + Note that this doesn't automatically add STRING to the overall regex, you ~ + must do that yourself! This is a known bug that may be fixed in the future.") + :long "explicit" + :short #\e + :key #'parse-explicit + :reduce #'adopt:collect)) +``` + +Notice how we signal a `malformed-explicit` condition if the user gives us +mangled text. This is a subtype of `user-error`, so the program will print the +error and exit even if the debugger is enabled. We also include a slightly more +verbose description in the `man` page than the terse one in the `--help` text. + +Next we write the main help and manual text, as well as some real-world +examples: + +```lisp +(adopt:define-string *help-text* + "batchcolor takes a regular expression and matches it against standard ~ + input one line at a time. Each unique match is highlighted in its own color.~@ + ~@ + If the regular expression contains any capturing groups, only those parts of ~ + the matches will be highlighted. Otherwise the entire match will be ~ + highlighted. Overlapping capturing groups are not supported.") + +(adopt:define-string *extra-manual-text* + "If no FILEs are given, standard input will be used. A file of - stands for ~ + standard input as well.~@ + ~@ + Overlapping capturing groups are not supported because it's not clear what ~ + the result should be. For example: what should ((f)oo|(b)oo) highlight when ~ + matched against 'foo'? Should it highlight 'foo' in one color? The 'f' in ~ + one color and 'oo' in another color? Should that 'oo' be the same color as ~ + the 'oo' in 'boo' even though the overall match was different? There are too ~ + many possible behaviors and no clear winner, so batchcolor disallows ~ + overlapping capturing groups entirely.") + +(defparameter *examples* + '(("Colorize IRC nicknames in a chat log:" + . "cat channel.log | batchcolor '<(\\\\w+)>'") + ("Colorize UUIDs in a request log:" + . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'") + ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):" + . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log") + ("Colorize earmuffed symbols in a Lisp file:" + . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp"))) +``` + +Finally we can wire everything together in the main Adopt interface: + +```lisp +(defparameter *ui* + (adopt:make-interface + :name "batchcolor" + :usage "[OPTIONS] REGEX [FILE...]" + :summary "colorize regex matches in batches" + :help *help-text* + :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*) + :examples *examples* + :contents (list + *option-help* + *option-debug* + *option-no-debug* + (adopt:make-group 'color-options + :title "Color Options" + :options (list *option-randomize* + *option-no-randomize* + *option-dark* + *option-light* + *option-explicit*))))) +``` + +All that's left to do is the top-level function that will be called when the +binary is executed. + +### Top-Level Interface + +Before we write `toplevel` we've got a couple of helpers: + +```lisp +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (adopt:exit 130)))) + +(defun configure (options) + (loop :for (string . rgb) :in (gethash 'explicit options) + :do (setf (gethash string *explicits*) rgb)) + (setf *start* (if (gethash 'randomize options) + (random 256 (make-random-state t)) + 0) + *dark* (gethash 'dark options))) +``` + +Our `toplevel` function looks much like the one in the skeleton, but fleshed out +a bit more: + +```lisp +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*) + (when (gethash 'debug options) + (sb-ext:enable-debugger)) + (handler-case + (cond + ((gethash 'help options) (adopt:print-help-and-exit *ui*)) + ((null arguments) (error 'missing-regex)) + (t (destructuring-bind (pattern . files) arguments + (configure options) + (run pattern files)))) + (user-error (e) (adopt:print-error-and-exit e)))))) +``` + +This `toplevel` has a few extra bits beyond the skeletal example. + +First, we disable the debugger immediately, and then re-enable it later if the +user asks us to. We want to keep it disabled until *after* argument parsing +because we can't know whether the user wants it or not until we parse the +arguments. + +Instead of just blindly running `run`, we check for `--help` and print it if +desired. We also validate that the user passes the correct amount of arguments, +signaling a subtype of `user-error` if they don't. Assuming everything looks +good we handle the configuration, call `run`, and that's it! + +Running `make` generates `bin/batchcolor` and `man/man1/batchcolor.1`, and we +can view our log files in beautiful color. + +## More Information + +I hope this overview was helpful. This has worked for me, but Common Lisp is +a flexible language, so if you want to use this layout as a starting point and +modify it for your own needs, go for it! + +If you want to see some more examples you can find them in [my dotfiles +repository](https://hg.stevelosh.com/dotfiles/file/tip/lisp). Some of the more +fun ones include: + +* `weather` for displaying the weather over the next few hours so I can tell if + I need a jacket or umbrella before I go out for a walk. +* `retry` to retry shell commands if they fail, with options for how many times + to retry, strategies for waiting/backing off on failure, etc. +* `pick` to interactively filter the output of one command into another + (inspired by the `pick` program in "The UNIX Programming Environment" but with + more options). + +The approach I laid out in this post works well for small, single-file programs. +If you're creating a larger program you'll probably want to move to a full ASDF +system in its own directory/repository. My friend Ian [wrote a post about +that](http://atomized.org/blog/2020/07/06/common-lisp-in-practice/) which you +might find interesting. diff -r c635f15b37c1 -r a9d8b7a86226 static/images/blog/2021/03/batchcolored.png Binary file static/images/blog/2021/03/batchcolored.png has changed diff -r c635f15b37c1 -r a9d8b7a86226 static/images/blog/2021/03/batchcoloredfull.png Binary file static/images/blog/2021/03/batchcoloredfull.png has changed diff -r c635f15b37c1 -r a9d8b7a86226 static/images/blog/2021/03/grepcolored.png Binary file static/images/blog/2021/03/grepcolored.png has changed diff -r c635f15b37c1 -r a9d8b7a86226 static/images/blog/2021/03/uncolored.png Binary file static/images/blog/2021/03/uncolored.png has changed