--- a/content/blog/2021/01/small-common-lisp-cli-programs.markdown Sun Dec 27 22:08:26 2020 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,854 +0,0 @@
-(:title "Writing Small CLI Programs in Common Lisp"
- :snip "Somewhere between tiny shell scripts and full projects."
- :date "2021-01-02T15:50:00Z"
- :draft t)
-
-I've found Common Lisp to be a good language for writing small command line
-programs. But it can be a little intimidating to get started, especially for
-beginners, because Common Lisp is a very flexible language and doesn't lock you
-into one way of working. In this post I'll describe how I write small,
-stand-alone command line programs in Common Lisp.
-
-<div id="toc"></div>
-
-## Requirements
-
-When you're writing programs in Common Lisp, you've got a lot of options. For
-this use case, laying out the requirements I have helped me decide on an
-approach.
-
-First, each new program should be one single file. A few ancillary files for
-all scripts together (e.g. a `Makefile`) are okay, but adding a new program
-should mean adding one single file. For larger programs a full project
-directory and ASDF system are great, but for small programs having one file per
-program reduces the overhead quite a bit.
-
-The programs need to be able to be developed in the typical Common Lisp
-interactive style (in my case: with Swank and VLIME). The interactive
-development is one of the best parts of working in Common Lisp, and I won't give
-it up. In particular, this means that a shell-script style approach, with
-`#!/path/to/sbcl --script` at the top and directly running code at the top
-level in the file does not work for two main reasons:
-
-* Loading that file will fail due to the shebang unless you have some ugly
- reader macros in your startup file.
-* The program will need to do things like parsing command-line arguments and
- exiting with an error code, and running `exit` would kill the Swank process.
-
-The programs need to be able to use libraries, so Quicklisp will need to be
-involved. Common Lisp has a lot of nice things built-in, but there are some
-libraries that are just too useful to pass up.
-
-The programs will need to have proper user interfaces. Command line arguments
-must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abc foo -d`
-should work as expected), malformed or unknown options must be caught instead of
-dropping them on the floor, error messages should be meaningful, and the
-`--help` should be thoroughly and thoughtfully written.
-
-Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp`
-and defines a package `foo` with a function called `toplevel`) is okay if it
-makes my life easier. These programs are just for me, so I don't have to worry
-about people wanting to create executables with spaces in the name or something.
-
-## Solution Skeleton
-
-After trying a number of different approaches, I've settled on a solution that
-I'm pretty happy with. First I'll describe the general approach, then we'll
-look at one actual example program in its entirety.
-
-### Directory Structure
-
-I keep all my small single-file Common Lisp programs in a `lisp` directory
-inside my dotfiles repository. Its contents look like this:
-
-```
-…/lisp/
- bin/
- foo
- bar
- man/
- man1/
- foo.1
- bar.1
- build-binary.sh
- build-manual.sh
- Makefile
- foo.lisp
- bar.lisp
-```
-
-The `bin` directory is where the executable files end up. I've added it to my
-`$PATH` so I don't have to symlink or copy the binaries anywhere.
-
-`man` contains the generated `man` pages. Because it's adjacent to `bin` (which
-is on my path) the `man` program automatically finds the `man` pages as
-expected.
-
-`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make
-building programs easier.
-
-The `.lisp` files are, of course, the programs. Each new program I want to add
-only requires adding the `<programname>.lisp` file in this directory and running
-`make`.
-
-### Lisp Files
-
-All my small Common Lisp programs follow a few conventions, which makes building
-them easier. Let's look at the skeleton of a `foo.lisp` file as an example.
-I'll show the entire file here, and then step through it piece by piece.
-
-```lisp
-(eval-when (:compile-toplevel :load-toplevel :execute)
- (ql:quickload '(… :with-user-abort) :silent t))
-
-(defpackage :foo
- (:use :cl)
- (:export :toplevel …))
-
-(in-package :foo)
-
-;;;; Configuration -----------------------------------------------
-(defparameter *version* "1.0.0")
-(defparameter *some-option* nil)
-
-;;;; Errors ------------------------------------------------------
-(define-condition user-error (error) ())
-
-(define-condition missing-foo (user-error) ()
- (:report "A foo is required, but none was supplied."))
-
-;;;; Functionality -----------------------------------------------
-(defun foo (string)
- …)
-
-;;;; Run ---------------------------------------------------------
-(defun run (arguments)
- (map nil #'foo arguments))
-
-;;;; User Interface ----------------------------------------------
-(defmacro exit-on-ctrl-c (&body body)
- `(handler-case (with-user-abort:with-user-abort (progn ,@body))
- (with-user-abort:user-abort () (sb-ext:exit :code 130))))
-
-(defparameter *ui*
- (adopt:make-interface
- :name "foo"
- …))
-
-(defun toplevel ()
- (sb-ext:disable-debugger)
- (exit-on-ctrl-c
- (multiple-value-bind (arguments options)
- (adopt:parse-options-or-exit *ui*)
- … ; Handle options.
- (handler-case (run arguments)
- (user-error (e)
- (format *error-output* "error: ~A~%" e)
- (adopt:exit 1))))))
-```
-
-Let's go through each chunk of this.
-
-```lisp
-(eval-when (:compile-toplevel :load-toplevel :execute)
- (ql:quickload '(:with-user-abort …) :silent t))
-```
-
-First we `quickload` any necessary libraries. We always want to do this, even
-when compiling the file, because we need the appropriate packages to be loaded
-when we try to use their symbols later in the file.
-
-[`with-user-abort`](https://github.com/compufox/with-user-abort) is a library
-for portably handling `control-c`, which all of these small programs use.
-
-```lisp
-(defpackage :foo
- (:use :cl)
- (:export :toplevel :*ui*))
-
-(in-package :foo)
-```
-
-Next we define a package `foo` and switch to it. The package is always named
-the same as the resulting binary and the basename of the file. The package
-always exports the symbols `toplevel` and `*ui*`.
-
-```lisp
-;;;; Configuration -----------------------------------------------
-(defparameter *version* "1.0.0")
-(defparameter *some-option* nil)
-```
-
-Next we define any configuration variables. These will be set later after
-parsing the command line arguments (when we run the command line program) or
-at the REPL (when developing interactively).
-
-```lisp
-;;;; Errors ------------------------------------------------------
-(define-condition user-error (error) ())
-
-(define-condition missing-foo (user-error) ()
- (:report "A foo is required, but none was supplied."))
-```
-
-We define a `user-error` condition, and any errors the user might make will
-inherit from it. This makes it easy to distinguish user errors (e.g.
-passing a mangled regular expression like `(foo+` as an argument) from
-programming errors (i.e. bugs), so that the two kinds can be handled
-differently:
-
-* Bugs should print a backtrace or enter the debugger.
-* Expected user errors should print a helpful error message with no backtrace or debugger.
-
-```lisp
-;;;; Functionality -----------------------------------------------
-(defun foo (string)
- …)
-```
-
-Next we have the actual functionality of the program.
-
-```lisp
-;;;; Run ---------------------------------------------------------
-(defun run (arguments)
- (map nil #'foo arguments))
-```
-
-We now define a function `run` that takes some arguments (as strings) and
-performs the main work of the program.
-
-Importantly, `run` does **not** handle command line argument parsing, and it does
-**not** exit the program with an error code, which means we can safely call it
-to "run the program" when we're developing interactively without worrying about
-it killing our Lisp process.
-
-Finally, we need to define the command line interface.
-
-```lisp
-;;;; User Interface ----------------------------------------------
-(defmacro exit-on-ctrl-c (&body body)
- `(handler-case (with-user-abort:with-user-abort (progn ,@body))
- (with-user-abort:user-abort () (sb-ext:exit :code 130))))
-```
-
-We'll make a little macro around `with-user-abort` to make it less wordy. We'll
-[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the
-user presses `ctrl-c`.
-
-```lisp
-(defparameter *ui*
- (adopt:make-interface
- :name "foo"
- …))
-```
-
-Here we define the `*ui*` variable whose symbol we exported above. [Adopt][] is
-a command line argument parsing library I wrote. If you want to use a different
-library, feel free.
-
-[Adopt]: https://docs.stevelosh.com/adopt
-
-```lisp
-(defun toplevel ()
- (sb-ext:disable-debugger)
- (exit-on-ctrl-c
- (multiple-value-bind (arguments options)
- (adopt:parse-options-or-exit *ui*)
- … ; Handle options.
- (handler-case (run arguments)
- (user-error (e)
- (format *error-output* "error: ~A~%" e)
- (adopt:exit 1))))))
-```
-
-And finally we define the `toplevel` function. This will only ever be called
-when the program is run as a standalone program, never interactively. It
-handles everything outside the main guts of the program (which live in the
-`run` function):
-
-* Disabling or enabling the debugger.
-* Exiting the process with the appropriate status code on errors.
-* Parsing command line arguments.
-
-That's it for the structure of the `.lisp` files.
-
-### Building Binaries
-
-`build-binary.sh` is a small script to build the executable binaries from the
-`.lisp` files. `./build-binary.sh foo.lisp` will build `foo`:
-
-```bash
-#!/usr/bin/env bash
-
-set -euo pipefail
-
-LISP=$1
-NAME=$(basename "$1" .lisp)
-shift
-
-sbcl --load "$LISP" \
- --eval "(sb-ext:save-lisp-and-die \"$NAME\"
- :executable t
- :save-runtime-options t
- :toplevel '$NAME:toplevel)"
-```
-
-Here we see where the naming conventions have become important — we know that
-the package is named the same as the binary and that it will have the symbol
-`toplevel` exported, which names the entry point for the binary.
-
-### Building Man Pages
-
-`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s
-built-in `man` page generation. If you don't care about building `man` pages
-for your personal programs (I admit, it's a little bit silly) you can ignore
-this.
-
-```bash
-#!/usr/bin/env bash
-
-set -euo pipefail
-
-LISP=$1
-NAME=$(basename "$LISP" .lisp)
-OUT="$NAME.1"
-shift
-
-sbcl --load "$LISP" \
- --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede)
- (adopt:print-manual $NAME:*ui* :stream f))" \
- --quit
-```
-
-This is why we always name the Adopt interface variable `*ui*` and export it
-from the package.
-
-### Makefile
-
-Finally we have a simple `Makefile` so we can run `make` to regenerate any
-out of date binaries and `man` pages:
-
-```make
-files := $(wildcard *.lisp)
-names := $(files:.lisp=)
-
-.PHONY: all clean $(names)
-
-all: $(names)
-
-$(names): %: bin/% man/man1/%.1
-
-bin/%: %.lisp build-binary.sh Makefile
- mkdir -p bin
- ./build-binary.sh $<
- mv $(@F) bin/
-
-man/man1/%.1: %.lisp build-manual.sh Makefile
- mkdir -p man/man1
- ./build-manual.sh $<
- mv $(@F) man/man1/
-
-clean:
- rm -rf bin man
-```
-
-We use a `wildcard` to automatically find the `.lisp` files so we don't have to
-do anything other than adding a new file when we want to make a new program.
-
-The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses
-a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern)
-to automatically define the phony rules for building each program. If
-`$(names)` is `foo bar` this line effectively defines two phony rules:
-
-```
-foo: bin/foo man/man1/foo.1
-bar: bin/bar man/man1/bar.1
-```
-
-This lets us run `make foo` to make the binary and `man` page for `foo.lisp`.
-
-## Case Study: A Batch Coloring Utility
-
-Now that we've seen the skeleton, let's look at one of my actual programs that
-I use all the time. It's called `batchcolor` and it's used to highlight regular
-expression matches in text (usually log files in my case) with a twist: each
-unique match is highlighted in a separate color, which makes it easier to
-visually parse the result.
-
-For example, suppose we have some log files with lines of the form `<timestamp>
-[<request ID>] <level> <message>` where request ID is a UUID, and messages might
-contain other UUIDs for various things. Such a log file might look something
-like this:
-
-```
-2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/
-2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving...
-2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request.
-2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404.
-2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/
-2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving...
-2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/
-2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200.
-2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused.
-2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500.
-```
-
-If I try to just read this directly, it's easy for my eyes to glaze over unless
-I laboriously read line-by-line. I can use `grep` to highlight the UUIDs, but
-that honestly doesn't help too much:
-
- grep -P '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}|$' foo.log
-
-`batchcolor` also highlights matches, but highlights each unique match in its
-own color:
-
-This is *much* easier for me to visually parse. The interleaving of separate
-request logs is now obvious from the colors of the IDs, and it's easy to match
-up various user IDs and thing IDs at a glance. Did you even notice that the two
-thing IDs were different before?
-
-`batchcolor` has a few other simple quality of life features, like picking
-explicit colors for specific strings (e.g. red for `ERR`):
-
-I wrap up this `batchcolor` invocation in an alias and use it to `tail` log
-files when developing locally almost every day, and it makes reading the log
-output *much* easier.
-
-Let's step through its code piece by piece.
-
-### Libraries
-
-```lisp
-(eval-when (:compile-toplevel :load-toplevel :execute)
- (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t))
-```
-
-First we `quickload` libraries. We'll use [Adopt][] for command line argument
-processing, [cl-ppcre][] for regular expressions, and the previously-mentioned
-[with-user-abort][] to handle `control-c`.
-
-### Package
-
-```lisp
-(defpackage :batchcolor
- (:use :cl)
- (:export :toplevel :*ui*))
-
-(in-package :batchcolor)
-```
-
-We define and switch to the appropriately-named package. Nothing special here.
-
-### Configuration
-
-```lisp
-;;;; Configuration ------------------------------------------------------------
-(defparameter *version* "1.0.0")
-(defparameter *start* 0)
-(defparameter *dark* t)
-```
-
-Next we `defparameter` some variables to hold useful values (like the version)
-and settings. `*start*` will be used later when randomizing colors; don't worry
-about it for now.
-
-### Errors
-
-```lisp
-;;;; Errors -------------------------------------------------------------------
-(define-condition user-error (error) ())
-
-(define-condition missing-regex (user-error) ()
- (:report "A regular expression is required."))
-
-(define-condition malformed-regex (user-error)
- ((underlying-error :initarg :underlying-error))
- (:report (lambda (c s)
- (format s "Invalid regex: ~A" (slot-value c 'underlying-error)))))
-
-(define-condition overlapping-groups (user-error) ()
- (:report "Invalid regex: seems to contain overlapping capturing groups."))
-
-(define-condition malformed-explicit (user-error)
- ((spec :initarg :spec))
- (:report (lambda (c s)
- (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5."
- (slot-value c 'spec)))))
-```
-
-Here we define the user errors. Some of these are self-explanatory, while
-others will make more sense later once we see them in action. The specific
-details aren't as important as the overall idea: for user errors we know might
-happen, display a helpful error message instead of just spewing a backtrace at
-the user.
-
-### Colorization
-
-Next we have the actual meat of the program. Obviously this is going to be
-completely different for every program, so feel free to skip this if you don't
-care about this specific problem.
-
-```lisp
-;;;; Functionality ------------------------------------------------------------
-(defun rgb-code (r g b)
- ;; The 256 color mode color values are essentially r/g/b in base 6, but
- ;; shifted 16 higher to account for the initial 8+8 colors.
- (+ (* r 36)
- (* g 6)
- (* b 1)
- 16))
-```
-
-We're going to highlight different matches with different colors. We'll need
-a reasonable amount of colors to make this useful, so using the basic 8/16 ANSI
-colors isn't enough. Full 24-bit truecolor is overkill, but the 8-bit ANSI
-colors will work nicely. If we ignore the base colors, we essentially have
-6 x 6 x 6 = 216 colors to work with. `rgb-code` will take the red, green, and
-blue values from `0` to `5` and return the color code. See [Wikipedia][8bit]
-for more information.
-
-[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit
-
-```lisp
-(defun make-colors (excludep)
- (let ((result (make-array 256 :fill-pointer 0)))
- (dotimes (r 6)
- (dotimes (g 6)
- (dotimes (b 6)
- (unless (funcall excludep (+ r g b))
- (vector-push-extend (rgb-code r g b) result)))))
- result))
-
-(defparameter *dark-colors* (make-colors (lambda (v) (< v 3))))
-(defparameter *light-colors* (make-colors (lambda (v) (> v 11))))
-```
-
-Now we can build some arrays of colors. We *could* use any of the 216 available
-colors, but in practice we probably don't want to, because the darkest colors
-will be too dark to read on a dark terminal, and vice versa for light terminals.
-In a concession to practicality we'll generate two separate arrays of colors,
-one that excludes colors whose total value is too dark and one excluding those
-that are too light.
-
-(You might notice that `*dark-colors*` is "the array of colors for dark
-terminals" and not "the array of colors which are not light". Naming things is
-hard.)
-
-Note that these arrays will be generated when the `batchcolor.lisp` file is
-`load`ed, which is *when we build the binary*. They *won't* be recomputed every
-time you run the resulting binary. In this case it doesn't really matter (the
-arrays are small) but it's worth remembering in case you ever have some data you
-want (or don't want) to compute at build time instead of run time.
-
-```lisp
-(defparameter *explicits* (make-hash-table :test #'equal))
-```
-
-Here we make a hash table to store the strings and colors for strings we want to
-explicitly color (e.g. `ERR` should be red, `INFO` cyan). The keys will be the
-strings and values the RGB codes.
-
-```lisp
-(defun djb2 (string)
- ;; http://www.cse.yorku.ca/~oz/hash.html
- (reduce (lambda (hash c)
- (mod (+ (* 33 hash) c) (expt 2 64)))
- string
- :initial-value 5381
- :key #'char-code))
-
-(defun find-color (string)
- (gethash string *explicits*
- (let ((colors (if *dark* *dark-colors* *light-colors*)))
- (aref colors
- (mod (+ (djb2 string) *start*)
- (length colors))))))
-```
-
-For strings that we want to explicitly color, we just look up the appropriate
-code in `*explicits*` and return it.
-
-Otherwise, we want to highlight unique matches in different colors. There are
-a number of different ways we could do this. For example, we could randomly
-pick a color the first time we see a string and store it in a hash table for
-subsequent encounters. But that table would gain an entry for every new match,
-and one of the things I often use this utility for is `tail -f`ing long-running
-processes when developing locally, so the memory usage would keep growing until
-the `batchcolor` process was restarted, which isn't ideal.
-
-Instead, we'll hash each string with a simple [DJB hash][djb] and use it to
-index into the appropriate array of colors. This ensures that identical matches
-get identical colors, and avoids having to store every match we've ever seen.
-
-We'll talk about `*start*` later; ignore it for now (it's `0` by default).
-
-[djb]: http://www.cse.yorku.ca/~oz/hash.html
-
-```lisp
-(defun ansi-color-start (color)
- (format nil "~C[38;5;~Dm" #\Escape color))
-
-(defun ansi-color-end ()
- (format nil "~C[0m" #\Escape))
-
-(defun print-colorized (string)
- (format *standard-output* "~A~A~A"
- (ansi-color-start (find-color string))
- string
- (ansi-color-end)))
-```
-
-Next we have some functions to output the appropriate ANSI escapes to highlight
-our matches. We could use a library for this but it's only 2 lines. [It's not
-worth it](http://xn--rpa.cc/irl/term.html).
-
-And now we have the beating heart of the program:
-
-```lisp
-(defun colorize-line (scanner line &aux (start 0))
- (ppcre:do-scans (ms me rs re scanner line)
- ;; If we don't have any register groups, colorize the entire match.
- ;; Otherwise, colorize each matched capturing group.
- (let* ((regs? (plusp (length rs)))
- (starts (if regs? (remove nil rs) (list ms)))
- (ends (if regs? (remove nil re) (list me))))
- (map nil (lambda (word-start word-end)
- (unless (<= start word-start)
- (error 'overlapping-groups))
- (write-string line *standard-output* :start start :end word-start)
- (print-colorized (subseq line word-start word-end))
- (setf start word-end))
- starts ends)))
- (write-line line *standard-output* :start start))
-```
-
-`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with
-any of the desired matches colorized appropriately. There are a few things to
-note here.
-
-First: if the regular expression contains any capturing groups, we will only
-colorize those parts of the match. For example, if you run `batchcolor
-'^<(\\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves
-will be highlighted, not the surrounding angle brackets. Otherwise, if there
-are no capturing groups in the regular expression, we'll highlight the entire
-match (as if there were one big capturing group around the whole thing).
-
-Second: overlapping capturing groups are explicitly disallowed and
-a `user-error` signaled if we notice any. It's not clear what to do in this
-case — if we match `((f)oo|(b)oo)` against `foo`, what should the output be?
-Highlight `f` and `oo` in the same color? In different colors? Should the `oo`
-be a different color than the `oo` in `boo`? There are too many options with no
-clear winner, so we'll just tell the user to be more clear.
-
-To do the actual work, we iterate over each match and print the non-highlighted
-text before the match, then print the highlighted match. Finally we print any
-remaining text after the last match.
-
-### Not-Quite-Top-Level Interface
-
-```lisp
-;;;; Run ----------------------------------------------------------------------
-(defun run% (scanner stream)
- (loop :for line = (read-line stream nil)
- :while line
- :do (colorize-line scanner line)))
-
-(defun run (pattern paths)
- (let ((scanner (handler-case (ppcre:create-scanner pattern)
- (ppcre:ppcre-syntax-error (c)
- (error 'malformed-regex :underlying-error c))))
- (paths (or paths '("-"))))
- (dolist (path paths)
- (if (string= "-" path)
- (run% scanner *standard-input*)
- (with-open-file (stream path :direction :input)
- (run% scanner stream))))))
-```
-
-Here we have the not-quite-top-level interface to the program. `run` takes
-a pattern string and a list of paths and runs the colorization on each path.
-This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>"
-'("foo.txt"))`, so we can test without worrying about killing the Lisp process.
-
-### User Interface
-
-In the last chunk of the file we have the user interface. There are a couple of
-things to note here.
-
-I'm using a command line argument parsing library I wrote myself: [Adopt][].
-But if you prefer another library (and there are quite a few around) feel free
-to use it — it should be pretty easy to adapt this setup to a different library.
-The only things you'd need to change would be the `toplevel` function and the
-`build-manual.sh` script (if you even care about building `man` pages at all).
-
-You might also notice that the user interface for the program is almost as much
-code as the entire rest of the program. This may seem disconcerting at first,
-but I think it makes a certain kind of sense. When you're writing code to
-interface with an external system, a messier and more complicated external
-system will usually require more code than a cleaner and simpler external
-system. A human brain is probably the messiest and most complicated external
-system you'll ever have to deal with, so it's worth taking the extra time and
-code to be especially careful when writing an interface to it.
-
-```lisp
-(defparameter *option-help*
- (adopt:make-option 'help
- :help "Display help and exit."
- :long "help"
- :short #\h
- :reduce (constantly t)))
-
-(defparameter *option-version*
- (adopt:make-option 'version
- :help "Display version information and exit."
- :long "version"
- :reduce (constantly t)))
-```
-
-```lisp
-(adopt:defparameters (*option-debug* *option-no-debug*)
- (adopt:make-boolean-options 'debug
- :long "debug"
- :short #\d
- :help "Enable the Lisp debugger."
- :help-no "Disable the Lisp debugger (the default)."))
-```
-
-```lisp
-(adopt:defparameters (*option-randomize* *option-no-randomize*)
- (adopt:make-boolean-options 'randomize
- :help "Randomize the choice of color each run."
- :help-no "Do not randomize the choice of color each run (the default)."
- :long "randomize"
- :short #\r))
-
-(adopt:defparameters (*option-dark* *option-light*)
- (adopt:make-boolean-options 'dark
- :name-no 'light
- :long "dark"
- :long-no "light"
- :help "Optimize for dark terminals (the default)."
- :help-no "Optimize for light terminals."
- :initial-value t))
-```
-
-```lisp
-;;;; User Interface -----------------------------------------------------------
-(defun parse-explicit (spec)
- (ppcre:register-groups-bind
- ((#'parse-integer r g b) string)
- ("^([0-5]),([0-5]),([0-5]):(.+)$" spec)
- (return-from parse-explicit (cons string (rgb-code r g b))))
- (error 'malformed-explicit :spec spec))
-
-(defparameter *option-explicit*
- (adopt:make-option 'explicit
- :parameter "R,G,B:STRING"
- :help "Highlight STRING in an explicit color. May be given multiple times."
- :manual (format nil "~
- Highlight STRING in an explicit color instead of randomly choosing one. ~
- R, G, and B must be 0-5. STRING is treated as a literal string, not a regex. ~
- Note that this doesn't automatically add STRING to the overall regex, you ~
- must do that yourself! This is a known bug that may be fixed in the future.")
- :long "explicit"
- :short #\e
- :key #'parse-explicit
- :reduce #'adopt:collect))
-```
-
-```lisp
-(adopt:define-string *help-text*
- "batchcolor takes a regular expression and matches it against standard ~
- input one line at a time. Each unique match is highlighted in its own color.~@
- ~@
- If the regular expression contains any capturing groups, only those parts of ~
- the matches will be highlighted. Otherwise the entire match will be ~
- highlighted. Overlapping capturing groups are not supported.")
-
-(adopt:define-string *extra-manual-text*
- "If no FILEs are given, standard input will be used. A file of - stands for ~
- standard input as well.~@
- ~@
- Overlapping capturing groups are not supported because it's not clear what ~
- the result should be. For example: what should ((f)oo|(b)oo) highlight when ~
- matched against 'foo'? Should it highlight 'foo' in one color? The 'f' in ~
- one color and 'oo' in another color? Should that 'oo' be the same color as ~
- the 'oo' in 'boo' even though the overall match was different? There are too ~
- many possible behaviors and no clear winner, so batchcolor disallows ~
- overlapping capturing groups entirely.")
-
-(defparameter *examples*
- '(("Colorize IRC nicknames in a chat log:"
- . "cat channel.log | batchcolor '<(\\\\w+)>'")
- ("Colorize UUIDs in a request log:"
- . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'")
- ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):"
- . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log")
- ("Colorize earmuffed symbols in a Lisp file:"
- . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp")))
-```
-
-```lisp
-(defparameter *ui*
- (adopt:make-interface
- :name "batchcolor"
- :usage "[OPTIONS] REGEX [FILE...]"
- :summary "colorize regex matches in batches"
- :help *help-text*
- :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*)
- :examples *examples*
- :contents (list
- *option-help*
- *option-version*
- *option-debug*
- *option-no-debug*
- (adopt:make-group 'color-options
- :title "Color Options"
- :options (list *option-randomize*
- *option-no-randomize*
- *option-dark*
- *option-light*
- *option-explicit*)))))
-```
-
-### Top-Level Interface
-
-```lisp
-(defmacro exit-on-ctrl-c (&body body)
- `(handler-case (with-user-abort:with-user-abort (progn ,@body))
- (with-user-abort:user-abort () (adopt:exit 130))))
-
-(defun configure (options)
- (loop :for (string . rgb) :in (gethash 'explicit options)
- :do (setf (gethash string *explicits*) rgb))
- (setf *start* (if (gethash 'randomize options)
- (random 256 (make-random-state t))
- 0)
- *dark* (gethash 'dark options)))
-
-(defun toplevel ()
- (sb-ext:disable-debugger)
- (exit-on-ctrl-c
- (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
- (when (gethash 'debug options)
- (sb-ext:enable-debugger))
- (handler-case
- (cond
- ((gethash 'help options) (adopt:print-help-and-exit *ui*))
- ((gethash 'version options) (write-line *version*) (adopt:exit))
- ((null arguments) (error 'missing-regex))
- (t (destructuring-bind (pattern . files) arguments
- (configure options)
- (run pattern files))))
- (user-error (e) (adopt:print-error-and-exit e))))))
-```
-
-
-
-## More Information
-
-* ieure link
-* dotfiles repo link
-
-
-
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/content/blog/2021/03/small-common-lisp-cli-programs.markdown Tue Mar 16 23:15:35 2021 -0400
@@ -0,0 +1,988 @@
+(:title "Writing Small CLI Programs in Common Lisp"
+ :snip "Somewhere between tiny shell scripts and full projects."
+ :date "2021-03-17T16:00:00Z"
+ :draft t)
+
+I write a lot of command-line programs. For tiny programs I usually go with the
+typical UNIX approach: throw together a half-assed shell script and move on.
+For large programs I make a full Common Lisp project, with an ASDF system
+definition and such. But there's a middle ground of small*ish* programs that
+don't warrant a full repository on their own, but for which I still want a real
+interface with proper `--help` and error handling.
+
+I've found Common Lisp to be a good language for writing these small command
+line programs. But it can be a little intimidating to get started (especially
+for beginners) because Common Lisp is a very flexible language and doesn't lock
+you into one way of working.
+
+In this post I'll describe how I write small, stand-alone command line programs
+in Common Lisp. It might work for you, or you might want to modify things to
+fit your own needs.
+
+<div id="toc"></div>
+
+## Requirements
+
+When you're writing programs in Common Lisp, you've got a lot of options.
+Laying out my requirements helped me decide on an approach.
+
+First: each new program should be one single file. A few other files for the
+collection as a whole (e.g. a `Makefile`) are okay, but once everything is set
+up creating a new program should mean adding one single file. For larger
+programs a full project directory and ASDF system are great, but for small
+programs having one file per program reduces the mental overhead quite a bit.
+
+The programs need to be able to be developed in the typical Common Lisp
+interactive style (in my case: with Swank and VLIME). Interactive development
+is one of the best parts of working in Common Lisp, and I'm not willing to give
+it up. In particular this means that a shell-script style approach, with
+`#!/path/to/sbcl --script` at the top and directly running code at the top
+level in the file, doesn't work for two main reasons:
+
+* `load`ing that file will fail due to the shebang unless you have some ugly
+ reader macros in your startup file.
+* The program will need to do things like parsing command-line arguments and
+ exiting with an error code, and calling `exit` would kill the Swank process.
+
+The programs need to be able to use libraries, so Quicklisp will need to be
+involved. Common Lisp has a lot of nice things built-in, but there are some
+libraries that are just too useful to pass up.
+
+The programs will need to have proper user interfaces. Command line arguments
+must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abcfoo -d`
+should work as expected), malformed or unknown options must be caught instead of
+dropping them on the floor, error messages should be meaningful, and the
+`--help` should be thoroughly and thoughtfully written so I can remember how to
+use the program months later. A `man` page is a nice bonus, but not required.
+
+Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp`
+and defines a package `foo` with a function called `toplevel`) is okay if it
+makes my life easier. These programs are just for me, so I don't have to worry
+about people wanting to create executables with spaces in the name or something.
+
+Portability between Common Lisp implementations is nice to have, but not
+required. If using a bit of SBCL-specific grease will let me avoid a bunch of
+extra dependencies, that's fine for these small personal programs.
+
+## Solution Skeleton
+
+After trying a number of different approaches I've settled on a solution that
+I'm pretty happy with. First I'll describe the general approach, then we'll
+look at one actual example program in its entirety.
+
+### Directory Structure
+
+I keep all my small single-file Common Lisp programs in a `lisp` directory
+inside my dotfiles repository. Its contents look like this:
+
+```
+…/dotfiles/lisp/
+ bin/
+ foo
+ bar
+ man/
+ man1/
+ foo.1
+ bar.1
+ build-binary.sh
+ build-manual.sh
+ Makefile
+ foo.lisp
+ bar.lisp
+```
+
+The `bin` directory is where the executable files end up. I've added it to my
+`$PATH` so I don't have to symlink or copy the binaries anywhere.
+
+`man` contains the generated `man` pages. Because it's adjacent to `bin` (which
+is on my path) the `man` program automatically finds the `man` pages as
+expected.
+
+`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make
+building programs easier.
+
+The `.lisp` files are the programs. Each new program I want to add only
+requires adding the `<programname>.lisp` file in this directory and running
+`make`.
+
+### Lisp Files
+
+My small Common Lisp programs follow a few conventions that make building them
+easier. Let's look at the skeleton of a `foo.lisp` file as an example. I'll
+show the entire file here, and then step through it piece by piece.
+
+```lisp
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (ql:quickload '(:with-user-abort …) :silent t))
+
+(defpackage :foo
+ (:use :cl)
+ (:export :toplevel :*ui*))
+
+(in-package :foo)
+
+;;;; Configuration -----------------------------------------------
+(defparameter *whatever* 123)
+
+;;;; Errors ------------------------------------------------------
+(define-condition user-error (error) ())
+
+(define-condition missing-foo (user-error) ()
+ (:report "A foo is required, but none was supplied."))
+
+;;;; Functionality -----------------------------------------------
+(defun foo (string)
+ …)
+
+;;;; Run ---------------------------------------------------------
+(defun run (arguments)
+ (map nil #'foo arguments))
+
+;;;; User Interface ----------------------------------------------
+(defmacro exit-on-ctrl-c (&body body)
+ `(handler-case (with-user-abort:with-user-abort (progn ,@body))
+ (with-user-abort:user-abort () (sb-ext:exit :code 130))))
+
+(defparameter *ui*
+ (adopt:make-interface
+ :name "foo"
+ …))
+
+(defun toplevel ()
+ (sb-ext:disable-debugger)
+ (exit-on-ctrl-c
+ (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
+ … ; Handle options.
+ (handler-case (run arguments)
+ (user-error (e) (adopt:print-error-and-exit e))))))
+```
+
+Let's go through each chunk of this.
+
+```lisp
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (ql:quickload '(:with-user-abort …) :silent t))
+```
+
+First we `quickload` any necessary libraries. We always want to do this, even
+when compiling the file, because we need the appropriate packages to exist when
+we try to use their symbols later in the file.
+
+[with-user-abort][] is a library for easily handling `control-c`, which all of
+these small programs will use.
+
+[with-user-abort]: https://github.com/compufox/with-user-abort
+
+```lisp
+(defpackage :foo
+ (:use :cl)
+ (:export :toplevel :*ui*))
+
+(in-package :foo)
+```
+
+Next we define a package `foo` and switch to it. The package is always named
+the same as the resulting binary and the basename of the file, and always
+exports the symbols `toplevel` and `*ui*`. These conventions make it easy to
+build everything automatically with `make` later.
+
+```lisp
+;;;; Configuration -----------------------------------------------
+(defparameter *whatever* 123)
+```
+
+Next we define any configuration variables. These will be set later after
+parsing the command line arguments (when we run the command line program) or
+at the REPL (when developing interactively).
+
+```lisp
+;;;; Errors ------------------------------------------------------
+(define-condition user-error (error) ())
+
+(define-condition missing-foo (user-error) ()
+ (:report "A foo is required, but none was supplied."))
+```
+
+We define a `user-error` condition, and any errors the user might make will
+inherit from it. This makes it easy to treat user errors (e.g. passing
+a mangled regular expression like `(foo+` as an argument) differently from
+programming errors (i.e. bugs):
+
+* Bugs should print a backtrace or enter the debugger.
+* Expected user errors should print a helpful error message with no backtrace or debugger.
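+
+As a rough illustration of that split, here's a minimal sketch in plain Common
+Lisp. The `demo-*` names are made up for this example and aren't part of the
+skeleton:
+
+```lisp
+;; Hypothetical names, for illustration only.
+(define-condition demo-user-error (error) ())
+
+(define-condition demo-missing-foo (demo-user-error) ()
+  (:report "A foo is required, but none was supplied."))
+
+(defun demo-handle (thunk)
+  ;; Only DEMO-USER-ERROR and its subtypes are caught here.  Any other
+  ;; error (i.e. a bug) propagates to the debugger or backtrace printer.
+  (handler-case (funcall thunk)
+    (demo-user-error (e)
+      (format nil "error: ~A" e))))
+
+(demo-handle (lambda () (error 'demo-missing-foo)))
+;; => "error: A foo is required, but none was supplied."
+
+;; A plain ERROR is not a DEMO-USER-ERROR, so this would NOT be caught:
+;; (demo-handle (lambda () (error "Oops, a bug.")))
+```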
+
+```lisp
+;;;; Functionality -----------------------------------------------
+(defun foo (string)
+ …)
+```
+
+Next we have the actual functionality of the program.
+
+```lisp
+;;;; Run ---------------------------------------------------------
+(defun run (arguments)
+ (map nil #'foo arguments))
+```
+
+We define a function `run` that takes some arguments (as strings) and performs
+the main work of the program.
+
+Importantly, `run` does *not* handle command line argument parsing, and it does
+*not* exit the program with an error code, which means we can safely call it to
+say "run the whole program" when we're developing interactively without worrying
+about it killing our Lisp process.
+
+Now we need to define the command line interface.
+
+```lisp
+;;;; User Interface ----------------------------------------------
+(defmacro exit-on-ctrl-c (&body body)
+ `(handler-case (with-user-abort:with-user-abort (progn ,@body))
+ (with-user-abort:user-abort () (sb-ext:exit :code 130))))
+```
+
+We'll make a little macro around `with-user-abort` to make it less wordy. We'll
+[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the
+user presses `ctrl-c`. Maybe some day I'll pull this into Adopt so I don't have
+to copy these three lines everywhere.
+
+```lisp
+(defparameter *ui*
+ (adopt:make-interface
+ :name "foo"
+ …))
+```
+
+Here we define the `*ui*` variable whose symbol we exported above. [Adopt][] is
+a command line argument parsing library I wrote. If you want to use a different
+library, feel free.
+
+[Adopt]: https://docs.stevelosh.com/adopt
+
+```lisp
+(defun toplevel ()
+ (sb-ext:disable-debugger)
+ (exit-on-ctrl-c
+ (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
+ … ; Handle options.
+ (handler-case (run arguments)
+ (user-error (e) (adopt:print-error-and-exit e))))))
+```
+
+And finally we define the `toplevel` function. This will only ever be called
+when the program is run as a standalone program, never interactively. It
+handles all the work beyond the main guts of the program (which are handled by
+the `run` function), including:
+
+* Disabling or enabling the debugger.
+* Exiting the process with an appropriate status code on errors.
+* Parsing command line arguments.
+* Setting the values of the configuration parameters.
+* Calling `run`.
+
+That's it for the structure of the `.lisp` files.
+
+### Building Binaries
+
+`build-binary.sh` is a small script to build the executable binaries from the
+`.lisp` files. `./build-binary.sh foo.lisp` will build `foo`:
+
+```bash
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+LISP=$1
+NAME=$(basename "$1" .lisp)
+shift
+
+sbcl --load "$LISP" \
+ --eval "(sb-ext:save-lisp-and-die \"$NAME\"
+ :executable t
+ :save-runtime-options t
+ :toplevel '$NAME:toplevel)"
+```
+
+Here we see where the naming conventions have become important — we know that
+the package is named the same as the binary and that it will have the symbol
+`toplevel` exported, which always names the entry point for the binary.
+
+### Building Man Pages
+
+`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s
+built-in `man` page generation. If you don't care about building `man` pages
+for your personal programs you can ignore this. I admit that generating `man`
+pages for these programs is a little bit silly because they're only for my own
+personal use, but I get it for free with Adopt, so why not?
+
+```bash
+#!/usr/bin/env bash
+
+set -euo pipefail
+
+LISP=$1
+NAME=$(basename "$LISP" .lisp)
+OUT="$NAME.1"
+shift
+
+sbcl --load "$LISP" \
+ --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede)
+ (adopt:print-manual $NAME:*ui* :stream f))" \
+ --quit
+```
+
+This is why we always name the Adopt interface variable `*ui*` and export it
+from the package.
+
+### Makefile
+
+Finally we have a simple `Makefile` so we can run `make` to regenerate any
+out of date binaries and `man` pages:
+
+```make
+files := $(wildcard *.lisp)
+names := $(files:.lisp=)
+
+.PHONY: all clean $(names)
+
+all: $(names)
+
+$(names): %: bin/% man/man1/%.1
+
+bin/%: %.lisp build-binary.sh Makefile
+ mkdir -p bin
+ ./build-binary.sh $<
+ mv $(@F) bin/
+
+man/man1/%.1: %.lisp build-manual.sh Makefile
+ mkdir -p man/man1
+ ./build-manual.sh $<
+ mv $(@F) man/man1/
+
+clean:
+ rm -rf bin man
+```
+
+We use a `wildcard` to automatically find the `.lisp` files so we don't have to
+do anything extra after adding a new file when we want to make a new program.
+
+The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses
+a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern)
+to automatically define the phony rules for building each program. If
+`$(names)` is `foo bar` this line effectively defines two phony rules:
+
+```
+foo: bin/foo man/man1/foo.1
+bar: bin/bar man/man1/bar.1
+```
+
+This lets us run `make foo` to make both the binary and `man` page for
+`foo.lisp`.
+
+## Case Study: A Batch Coloring Utility
+
+Now that we've seen the skeleton, let's look at one of my actual programs that
+I use all the time. It's called `batchcolor` and it's used to highlight regular
+expression matches in text (usually log files) with a twist: each unique match
+is highlighted in a separate color, which makes it easier to visually parse the
+result.
+
+For example: suppose we have some log files with lines of the form `<timestamp>
+[<request ID>] <level> <message>` where request ID is a UUID, and messages might
+contain other UUIDs for various things. Such a log file might look something
+like this:
+
+```
+2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/
+2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving...
+2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request.
+2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404.
+2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/
+2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving...
+2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/
+2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200.
+2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused.
+2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500.
+```
+
+If I try to just read this directly, it's easy for my eyes to glaze over unless
+I laboriously walk line-by-line.
+
+[![Screenshot of uncolored log output](/static/images/blog/2021/03/uncolored.png)](/static/images/blog/2021/03/uncolored.png)
+
+I could use `grep` to highlight the UUIDs:
+
+ grep -P \
+ '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \
+ example.log
+
+Unfortunately that doesn't really help too much because all the UUIDs are
+highlighted the same color:
+
+[![Screenshot of grep-colored log output](/static/images/blog/2021/03/grepcolored.png)](/static/images/blog/2021/03/grepcolored.png)
+
+To get a more readable version of the log, I use `batchcolor`:
+
+ batchcolor \
+ '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \
+ example.log
+
+`batchcolor` also highlights matches, but it highlights each unique match in its
+own color:
+
+[![Screenshot of batchcolored log output](/static/images/blog/2021/03/batchcolored.png)](/static/images/blog/2021/03/batchcolored.png)
+
+This is *much* easier for me to visually parse. The interleaving of separate
+request logs is now obvious from the colors of the IDs, and it's easy to match
+up various user IDs and thing IDs at a glance. Did you even notice that the two
+thing IDs were different before?
+
+`batchcolor` has a few other quality of life features, like picking explicit
+colors for specific strings (e.g. red for `ERR`):
+
+[![Screenshot of fully batchcolored log output](/static/images/blog/2021/03/batchcoloredfull.png)](/static/images/blog/2021/03/batchcoloredfull.png)
+
+I use this particular `batchcolor` invocation so often I've put it in its own
+tiny shell script. I use it to `tail` log files when developing locally almost
+every day, and it makes visually scanning the log output *much* easier. It can
+come in handy for other kinds of text too, like highlighting nicknames in an IRC
+log.
+
+Let's step through its code piece by piece.
+
+### Libraries
+
+```lisp
+(eval-when (:compile-toplevel :load-toplevel :execute)
+ (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t))
+```
+
+First we `quickload` libraries. We'll use [Adopt][] for command line argument
+processing, [cl-ppcre][] for regular expressions, and the previously-mentioned
+[with-user-abort][] to handle `control-c`.
+
+[cl-ppcre]: http://edicl.github.io/cl-ppcre/
+
+### Package
+
+```lisp
+(defpackage :batchcolor
+ (:use :cl)
+ (:export :toplevel :*ui*))
+
+(in-package :batchcolor)
+```
+
+We define and switch to the appropriately-named package. Nothing special here.
+
+### Configuration
+
+```lisp
+;;;; Configuration ------------------------------------------------------------
+(defparameter *start* 0)
+(defparameter *dark* t)
+```
+
+Next we `defparameter` some variables to hold some settings. `*start*` will be
+used later when randomizing colors; don't worry about it for now.
+
+### Errors
+
+```lisp
+;;;; Errors -------------------------------------------------------------------
+(define-condition user-error (error) ())
+
+(define-condition missing-regex (user-error) ()
+ (:report "A regular expression is required."))
+
+(define-condition malformed-regex (user-error)
+ ((underlying-error :initarg :underlying-error))
+ (:report (lambda (c s)
+ (format s "Invalid regex: ~A" (slot-value c 'underlying-error)))))
+
+(define-condition overlapping-groups (user-error) ()
+ (:report "Invalid regex: seems to contain overlapping capturing groups."))
+
+(define-condition malformed-explicit (user-error)
+ ((spec :initarg :spec))
+ (:report
+ (lambda (c s)
+ (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5."
+ (slot-value c 'spec)))))
+```
+
+Here we define the user errors. Some of these are self-explanatory, while
+others will make more sense later once we see them in action. The specific
+details aren't as important as the overall idea: for user errors we know might
+happen, display a helpful error message instead of just spewing a backtrace at
+the user.
+
+### Colorization
+
+Next we have the actual meat of the program. Obviously this is going to be
+completely different for every program, so feel free to skip this if you don't
+care about this specific problem.
+
+```lisp
+;;;; Functionality ------------------------------------------------------------
+(defun rgb-code (r g b)
+ ;; The 256 color mode color values are essentially r/g/b in base 6, but
+ ;; shifted 16 higher to account for the initial 8+8 colors.
+ (+ (* r 36)
+ (* g 6)
+ (* b 1)
+ 16))
+```
+
+We're going to highlight different matches with different colors. We'll need
+a reasonable number of colors to make this useful, so using the basic 8/16 ANSI
+colors isn't enough. Full 24-bit truecolor is overkill, but the 8-bit ANSI
+colors will work nicely. If we ignore the base colors, we essentially have
+6 x 6 x 6 = 216 colors to work with. `rgb-code` will take the red, green, and
+blue values from `0` to `5` and return the color code. See [Wikipedia][8bit]
+for more information.
+
+[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit
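+
+To make the arithmetic concrete, here are a few values worked by hand
+(`rgb-code` is repeated so the snippet stands alone):
+
+```lisp
+(defun rgb-code (r g b)
+  (+ (* r 36) (* g 6) (* b 1) 16))
+
+(rgb-code 0 0 0) ; => 16   (first color of the 6x6x6 cube, black)
+(rgb-code 5 0 0) ; => 196  (5*36 + 16, brightest pure red)
+(rgb-code 0 0 5) ; => 21   (5*1 + 16, brightest pure blue)
+(rgb-code 5 5 5) ; => 231  (180 + 30 + 5 + 16, last color of the cube)
+```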
+
+```lisp
+(defun make-colors (excludep)
+ (let ((result (make-array 256 :fill-pointer 0)))
+ (dotimes (r 6)
+ (dotimes (g 6)
+ (dotimes (b 6)
+ (unless (funcall excludep (+ r g b))
+ (vector-push-extend (rgb-code r g b) result)))))
+ result))
+
+(defparameter *dark-colors* (make-colors (lambda (v) (< v 3))))
+(defparameter *light-colors* (make-colors (lambda (v) (> v 11))))
+```
+
+Now we can build some arrays of colors. We *could* use any of the 216 available
+colors, but in practice we probably don't want to, because the darkest colors
+will be too dark to read on a dark terminal, and vice versa for light terminals.
+In a concession to practicality we'll generate two separate arrays of colors,
+one that excludes colors whose total value is too dark and one excluding those
+that are too light.
+
+(Notice that `*dark-colors*` is "the array of colors which are suitable for use
+on dark terminals" and not "the array of colors which are *themselves* dark".
+Naming things is hard.)
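+
+If you're curious how many colors survive the filtering, you can count them at
+the REPL. This snippet repeats the definitions above so it can be run on its
+own:
+
+```lisp
+(defun rgb-code (r g b)
+  (+ (* r 36) (* g 6) (* b 1) 16))
+
+(defun make-colors (excludep)
+  (let ((result (make-array 256 :fill-pointer 0)))
+    (dotimes (r 6)
+      (dotimes (g 6)
+        (dotimes (b 6)
+          (unless (funcall excludep (+ r g b))
+            (vector-push-extend (rgb-code r g b) result)))))
+    result))
+
+;; 10 of the 216 colors have r+g+b < 3.
+(length (make-colors (lambda (v) (< v 3))))  ; => 206
+;; 20 of the 216 colors have r+g+b > 11.
+(length (make-colors (lambda (v) (> v 11)))) ; => 196
+```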
+
+Note that these arrays will be generated when the `batchcolor.lisp` file is
+`load`ed, which is *when we build the binary*. They *won't* be recomputed every
+time you run the resulting binary. In this case it doesn't really matter (the
+arrays are small) but it's worth remembering in case you ever have some data you
+want (or don't want) to compute at build time instead of run time.
+
+```lisp
+(defparameter *explicits* (make-hash-table :test #'equal))
+```
+
+Here we make a hash table to store the strings and colors for strings we want to
+explicitly color (e.g. `ERR` should be red, `INFO` cyan). The keys will be the
+strings and values the RGB codes.
+
+```lisp
+(defun djb2 (string)
+ ;; http://www.cse.yorku.ca/~oz/hash.html
+ (reduce (lambda (hash c)
+ (mod (+ (* 33 hash) c) (expt 2 64)))
+ string
+ :initial-value 5381
+ :key #'char-code))
+
+(defun find-color (string)
+ (gethash string *explicits*
+ (let ((colors (if *dark* *dark-colors* *light-colors*)))
+ (aref colors
+ (mod (+ (djb2 string) *start*)
+ (length colors))))))
+```
+
+For strings that we want to explicitly color, we just look up the appropriate
+code in `*explicits*` and return it.
+
+Otherwise, we want to highlight unique matches in different colors. There are
+a number of different ways we could do this. For example, we could randomly pick
+a color the first time we see a string and store it in a hash table for
+subsequent encounters. But this would mean we'd grow that hash table over time,
+and one of the things I often use this utility for is `tail -f`ing long-running
+processes when developing locally, so the memory usage would grow and grow until
+the `batchcolor` process was restarted, which isn't ideal.
+
+Instead, we'll hash each string with a simple [DJB hash][djb] and use it to
+index into the appropriate array of colors. This ensures that identical matches
+get identical colors, and avoids having to store every match we've ever seen.
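+
+A couple of values worked by hand show the idea (`djb2` is repeated so the
+snippet runs on its own):
+
+```lisp
+(defun djb2 (string)
+  (reduce (lambda (hash c)
+            (mod (+ (* 33 hash) c) (expt 2 64)))
+          string
+          :initial-value 5381
+          :key #'char-code))
+
+(djb2 "")  ; => 5381    (just the initial value)
+(djb2 "a") ; => 177670  (33 * 5381 + 97)
+
+;; The same string always hashes to the same number, so it always lands
+;; on the same index into a color array of a given length.
+(= (djb2 "ERR") (djb2 "ERR")) ; => T
+```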
+
+There will be some collisions, but there's not much we can do about that with
+only ~200 colors to work with. We could have used 24-bit colors like
+I mentioned before, but then we'd have to worry about picking colors different
+enough for humans to easily tell apart, and for this simple utility I didn't
+want to bother.
+
+We'll talk about `*start*` later; ignore it for now (it's `0` by default).
+
+[djb]: http://www.cse.yorku.ca/~oz/hash.html
+
+```lisp
+(defun ansi-color-start (color)
+ (format nil "~C[38;5;~Dm" #\Escape color))
+
+(defun ansi-color-end ()
+ (format nil "~C[0m" #\Escape))
+
+(defun print-colorized (string)
+ (format *standard-output* "~A~A~A"
+ (ansi-color-start (find-color string))
+ string
+ (ansi-color-end)))
+```
+
+Next we have some functions to output the appropriate ANSI escapes to highlight
+our matches. We could use a library for this but it's only two lines. [It's
+not worth it](http://xn--rpa.cc/irl/term.html).
+
+And now we have the beating heart of the program:
+
+```lisp
+(defun colorize-line (scanner line &aux (start 0))
+ (ppcre:do-scans (ms me rs re scanner line)
+ ;; If we don't have any register groups, colorize the entire match.
+ ;; Otherwise, colorize each matched capturing group.
+ (let* ((regs? (plusp (length rs)))
+ (starts (if regs? (remove nil rs) (list ms)))
+ (ends (if regs? (remove nil re) (list me))))
+ (map nil (lambda (word-start word-end)
+ (unless (<= start word-start)
+ (error 'overlapping-groups))
+ (write-string line *standard-output* :start start :end word-start)
+ (print-colorized (subseq line word-start word-end))
+ (setf start word-end))
+ starts ends)))
+ (write-line line *standard-output* :start start))
+```
+
+`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with
+any of the desired matches colorized appropriately. There are a few things to
+note here.
+
+First: if the regular expression contains any capturing groups, we'll only
+colorize those parts of the match. For example: if you run `batchcolor
+'^<(\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves
+will be highlighted, not the surrounding angle brackets. Otherwise, if there
+are no capturing groups in the regular expression, we'll highlight the entire
+match (as if there were one big capturing group around the whole thing).
+
+Second: overlapping capturing groups are explicitly disallowed, and
+a `user-error` is signaled if we notice any. It's not clear what to do in this
+case. If we match `((f)oo|(b)oo)` against `foo`, what should the output be?
+Highlight `f` and `oo` in the same color? In different colors? Should the `oo`
+be a different color than the `oo` in `boo`? There are too many options with no
+clear winner, so we'll just tell the user to be more clear.
+
+To do the actual work we iterate over each match and print the non-highlighted
+text before the match, then print the highlighted match. Finally we print any
+remaining text after the last match.
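+
+To see what the start and end positions look like, here's a small sketch using
+`ppcre:scan`, which returns the same kind of four values for a single match. It
+assumes Quicklisp is available, and the sample line is made up:
+
+```lisp
+(eval-when (:compile-toplevel :load-toplevel :execute)
+  (ql:quickload :cl-ppcre :silent t))
+
+;; With a capturing group: the match is [0, 7), the group is [1, 6).
+(ppcre:scan "<(\\w+)>" "<alice> hello")
+;; => 0, 7, #(1), #(6)
+
+;; Without capturing groups the register vectors are empty, so
+;; COLORIZE-LINE falls back to the whole match.
+(ppcre:scan "<\\w+>" "<alice> hello")
+;; => 0, 7, #(), #()
+```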
+
+### Not-Quite-Top-Level Interface
+
+```lisp
+;;;; Run ----------------------------------------------------------------------
+(defun run% (scanner stream)
+ (loop :for line = (read-line stream nil)
+ :while line
+ :do (colorize-line scanner line)))
+
+(defun run (pattern paths)
+ (let ((scanner (handler-case (ppcre:create-scanner pattern)
+ (ppcre:ppcre-syntax-error (c)
+ (error 'malformed-regex :underlying-error c))))
+ (paths (or paths '("-"))))
+ (dolist (path paths)
+ (if (string= "-" path)
+ (run% scanner *standard-input*)
+ (with-open-file (stream path :direction :input)
+ (run% scanner stream))))))
+```
+
+Here we have the not-quite-top-level interface to the program. `run` takes
+a pattern string and a list of paths and runs the colorization on each path.
+This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>"
+'("foo.txt"))`, so we can test without worrying about killing the Lisp process.
+
+### User Interface
+
+In the last chunk of the file we have the user interface. There are a couple of
+things to note here.
+
+I'm using a command line argument parsing library I wrote myself: [Adopt][].
+I won't go over exactly what all the various Adopt functions do. Most of them
+should be fairly easy to understand, but [check out the Adopt
+documentation][adopt-usage] for the full story if you're curious.
+
+[adopt-usage]: https://docs.stevelosh.com/adopt/usage/
+
+If you prefer another library (and there are quite a few around) feel free
+to use it — it should be pretty easy to adapt this setup to a different library.
+The only things you'd need to change would be the `toplevel` function and the
+`build-manual.sh` script (if you even care about building `man` pages at all).
+
+You might also notice that the user interface for the program is almost as much
+code as the entire rest of the program. This may seem strange, but I think it
+makes a certain kind of sense. When you're writing code to interface with an
+external system, a messier and more complicated external system will usually
+require more code than a cleaner and simpler external system. A human brain is
+probably the messiest and most complicated external system you'll ever have to
+deal with, so it's worth taking the extra time and code to be especially careful
+when writing an interface to it.
+
+First we'll define a typical `-h`/`--help` option:
+
+```lisp
+(defparameter *option-help*
+ (adopt:make-option 'help
+ :help "Display help and exit."
+ :long "help"
+ :short #\h
+ :reduce (constantly t)))
+```
+
+Next we'll define a pair of options for enabling/disabling the Lisp debugger:
+
+```lisp
+(adopt:defparameters (*option-debug* *option-no-debug*)
+ (adopt:make-boolean-options 'debug
+ :long "debug"
+ :short #\d
+ :help "Enable the Lisp debugger."
+ :help-no "Disable the Lisp debugger (the default)."))
+```
+
+By default the debugger will be off, so any unexpected error will print
+a backtrace to standard error and exit with a nonzero exit code. This is the
+default because if I add a `batchcolor` somewhere in a shell script, I probably
+don't want to suddenly hang the entire script if something breaks. But we still
+want to be *able* to get into the debugger manually if something goes wrong.
+This is Common Lisp — we don't have to settle for a stack trace or core dump, we
+can have a real interactive debugger in the final binary.
+
+Note how Adopt's `make-boolean-options` function creates *two* options here:
+
+* `-d`/`--debug` will enable the debugger.
+* `-D`/`--no-debug` will disable the debugger.
+
+Even though *disabled* is the default, it's still important to have both
+switches for boolean options like this. If someone wants the debugger to be
+*enabled* by default instead (along with some other configuration options), they
+might have a shell alias like this:
+
+ alias bcolor='batchcolor --debug --foo --bar'
+
+But sometimes they might want to temporarily *disable* the debugger for a single
+run. Without a `--no-debug` option, they would have to run the vanilla
+`batchcolor` and retype all the *other* options. But having the `--no-debug`
+option allows them to just say:
+
+ bcolor --no-debug
+
+This would expand to:
+
+ batchcolor --debug --foo --bar --no-debug
+
+The later option wins, and the user gets the behavior they expect.
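+
+The "later option wins" behavior is just a left-to-right fold over the flags.
+Here's a minimal sketch of the idea in plain Common Lisp. This is *not* how
+Adopt is implemented, and `demo-debug-p` is a made-up name for illustration:
+
+```lisp
+;; Hypothetical helper: walk the flags left to right, letting each
+;; --debug/--no-debug overwrite whatever value came before it.
+(defun demo-debug-p (flags)
+  (reduce (lambda (value flag)
+            (cond ((string= flag "--debug") t)
+                  ((string= flag "--no-debug") nil)
+                  (t value)))
+          flags
+          :initial-value nil))
+
+(demo-debug-p '("--debug" "--foo" "--bar"))              ; => T
+(demo-debug-p '("--debug" "--foo" "--bar" "--no-debug")) ; => NIL
+```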
+
+Next we'll define some color-related options. First an option to randomize the
+colors each run, instead of always picking the same color for a particular
+string, and then a toggle for choosing colors that work for dark or light
+terminals:
+
+```lisp
+(adopt:defparameters (*option-randomize* *option-no-randomize*)
+ (adopt:make-boolean-options 'randomize
+ :help "Randomize the choice of color each run."
+ :help-no "Do not randomize the choice of color each run (the default)."
+ :long "randomize"
+ :short #\r))
+
+(adopt:defparameters (*option-dark* *option-light*)
+ (adopt:make-boolean-options 'dark
+ :name-no 'light
+ :long "dark"
+ :long-no "light"
+ :help "Optimize for dark terminals (the default)."
+ :help-no "Optimize for light terminals."
+ :initial-value t))
+```
+
+The last option we'll define is `-e`/`--explicit`, to allow the user to select
+an explicit color for a particular string:
+
+```lisp
+(defun parse-explicit (spec)
+ (ppcre:register-groups-bind
+ ((#'parse-integer r g b) string)
+ ("^([0-5]),([0-5]),([0-5]):(.+)$" spec)
+ (return-from parse-explicit (cons string (rgb-code r g b))))
+ (error 'malformed-explicit :spec spec))
+
+(defparameter *option-explicit*
+ (adopt:make-option 'explicit
+ :parameter "R,G,B:STRING"
+ :help "Highlight STRING in an explicit color. May be given multiple times."
+ :manual (format nil "~
+ Highlight STRING in an explicit color instead of randomly choosing one. ~
+ R, G, and B must be 0-5. STRING is treated as a literal string, not a regex. ~
+ Note that this doesn't automatically add STRING to the overall regex, you ~
+ must do that yourself! This is a known bug that may be fixed in the future.")
+ :long "explicit"
+ :short #\e
+ :key #'parse-explicit
+ :reduce #'adopt:collect))
+```
+
+Notice how we signal a `malformed-explicit` condition if the user gives us
+mangled text. This is a subtype of `user-error`, so the program will print the
+error and exit even if the debugger is enabled. We also include a slightly more
+verbose description in the `man` page than the terse one in the `--help` text.
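+
+For example, here's roughly what `parse-explicit` returns for good and bad
+input. The definitions are repeated (with a simplified error report that
+inherits directly from `error`) so the snippet stands alone, and it assumes
+Quicklisp is available:
+
+```lisp
+(eval-when (:compile-toplevel :load-toplevel :execute)
+  (ql:quickload :cl-ppcre :silent t))
+
+;; Simplified stand-in for the real condition defined earlier.
+(define-condition malformed-explicit (error)
+  ((spec :initarg :spec))
+  (:report (lambda (c s)
+             (format s "Invalid explicit spec ~S." (slot-value c 'spec)))))
+
+(defun rgb-code (r g b)
+  (+ (* r 36) (* g 6) (* b 1) 16))
+
+(defun parse-explicit (spec)
+  (ppcre:register-groups-bind
+      ((#'parse-integer r g b) string)
+      ("^([0-5]),([0-5]),([0-5]):(.+)$" spec)
+    (return-from parse-explicit (cons string (rgb-code r g b))))
+  (error 'malformed-explicit :spec spec))
+
+(parse-explicit "5,0,0:ERR") ; => ("ERR" . 196)
+
+;; An out-of-range component (9) means the regex doesn't match:
+;; (parse-explicit "9,0,0:ERR") signals MALFORMED-EXPLICIT.
+```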
+
+Next we write the main help and manual text, as well as some real-world
+examples:
+
+```lisp
+(adopt:define-string *help-text*
+  "batchcolor takes a regular expression and matches it against standard ~
+  input one line at a time. Each unique match is highlighted in its own color.~@
+  ~@
+  If the regular expression contains any capturing groups, only those parts of ~
+  the matches will be highlighted. Otherwise the entire match will be ~
+  highlighted. Overlapping capturing groups are not supported.")
+
+(adopt:define-string *extra-manual-text*
+  "If no FILEs are given, standard input will be used. A file of - stands for ~
+  standard input as well.~@
+  ~@
+  Overlapping capturing groups are not supported because it's not clear what ~
+  the result should be. For example: what should ((f)oo|(b)oo) highlight when ~
+  matched against 'foo'? Should it highlight 'foo' in one color? The 'f' in ~
+  one color and 'oo' in another color? Should that 'oo' be the same color as ~
+  the 'oo' in 'boo' even though the overall match was different? There are too ~
+  many possible behaviors and no clear winner, so batchcolor disallows ~
+  overlapping capturing groups entirely.")
+
+(defparameter *examples*
+  '(("Colorize IRC nicknames in a chat log:"
+     . "cat channel.log | batchcolor '<(\\\\w+)>'")
+    ("Colorize UUIDs in a request log:"
+     . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'")
+    ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):"
+     . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log")
+    ("Colorize earmuffed symbols in a Lisp file:"
+     . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp")))
+```
+
+Finally we can wire everything together in the main Adopt interface:
+
+```lisp
+(defparameter *ui*
+  (adopt:make-interface
+    :name "batchcolor"
+    :usage "[OPTIONS] REGEX [FILE...]"
+    :summary "colorize regex matches in batches"
+    :help *help-text*
+    :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*)
+    :examples *examples*
+    :contents (list
+                *option-help*
+                *option-debug*
+                *option-no-debug*
+                (adopt:make-group 'color-options
+                  :title "Color Options"
+                  :options (list *option-randomize*
+                                 *option-no-randomize*
+                                 *option-dark*
+                                 *option-light*
+                                 *option-explicit*)))))
+```
+
+All that's left to write is the top-level function that will be called when the
+binary is executed.
+
+### Top-Level Interface
+
+Before we write `toplevel` we've got a couple of helpers:
+
+```lisp
+(defmacro exit-on-ctrl-c (&body body)
+  `(handler-case (with-user-abort:with-user-abort (progn ,@body))
+     (with-user-abort:user-abort () (adopt:exit 130))))
+
+(defun configure (options)
+  (loop :for (string . rgb) :in (gethash 'explicit options)
+        :do (setf (gethash string *explicits*) rgb))
+  (setf *start* (if (gethash 'randomize options)
+                  (random 256 (make-random-state t))
+                  0)
+        *dark* (gethash 'dark options)))
+```
+
+Our `toplevel` function looks much like the one in the skeleton, but fleshed out
+a bit more:
+
+```lisp
+(defun toplevel ()
+  (sb-ext:disable-debugger)
+  (exit-on-ctrl-c
+    (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
+      (when (gethash 'debug options)
+        (sb-ext:enable-debugger))
+      (handler-case
+          (cond
+            ((gethash 'help options) (adopt:print-help-and-exit *ui*))
+            ((null arguments) (error 'missing-regex))
+            (t (destructuring-bind (pattern . files) arguments
+                 (configure options)
+                 (run pattern files))))
+        (user-error (e) (adopt:print-error-and-exit e))))))
+```
+
+This `toplevel` has a few extra bits beyond the skeletal example.
+
+First, we disable the debugger immediately, and then re-enable it later if the
+user asks us to. We want to keep it disabled until *after* argument parsing
+because we can't know whether the user wants it or not until we parse the
+arguments.
+
+Instead of just blindly running `run`, we check for `--help` and print it if
+desired. We also validate that the user passed at least the required `REGEX`
+argument, signaling a subtype of `user-error` if they didn't. Assuming
+everything looks good we handle the configuration, call `run`, and that's it!
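+
+If the way `handler-case` catches subtypes is unfamiliar, here's a minimal
+sketch of the same dispatch shape that you can evaluate safely in a running
+image. Everything prefixed with `sketch-` is a hypothetical stand-in (including
+the report message), and it returns strings instead of printing and exiting:
+
+```lisp
+;; Hypothetical condition hierarchy mirroring the shape used above.
+(define-condition sketch-user-error (error) ())
+
+(define-condition sketch-missing-regex (sketch-user-error) ()
+  (:report "A regular expression is required."))
+
+;; Returns a string rather than exiting, so it won't kill a Swank session.
+(defun sketch-dispatch (arguments)
+  (handler-case
+      (cond
+        ((null arguments) (error 'sketch-missing-regex))
+        (t (destructuring-bind (pattern . files) arguments
+             (format nil "run ~S on ~D file~:P" pattern (length files)))))
+    ;; One clause on the parent type handles every subtype.
+    (sketch-user-error (e) (format nil "error: ~A" e))))
+
+(sketch-dispatch '())                 ; => "error: A regular expression is required."
+(sketch-dispatch '("ERR" "foo.log"))  ; => "run \"ERR\" on 1 file"
+```
+
+Because the handler clause names the parent condition, adding a new kind of
+user-facing error later only requires defining another subtype.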
+
+Running `make` generates `bin/batchcolor` and `man/man1/batchcolor.1`, and we
+can view our log files in beautiful color.
+
+## More Information
+
+I hope this overview was helpful. This has worked for me, but Common Lisp is
+a flexible language, so if you want to use this layout as a starting point and
+modify it for your own needs, go for it!
+
+If you want to see some more examples you can find them in [my dotfiles
+repository](https://hg.stevelosh.com/dotfiles/file/tip/lisp). Some of the more
+fun ones include:
+
+* `weather` for displaying the weather over the next few hours so I can tell if
+ I need a jacket or umbrella before I go out for a walk.
+* `retry` to retry shell commands if they fail, with options for how many times
+ to retry, strategies for waiting/backing off on failure, etc.
+* `pick` to interactively filter the output of one command into another
+ (inspired by the `pick` program in "The UNIX Programming Environment" but with
+ more options).
+
+The approach I laid out in this post works well for small, single-file programs.
+If you're creating a larger program you'll probably want to move to a full ASDF
+system in its own directory/repository. My friend Ian [wrote a post about
+that](http://atomized.org/blog/2020/07/06/common-lisp-in-practice/) which you
+might find interesting.