content/blog/2021/03/small-common-lisp-cli-programs.markdown @ 2d0b4dae45a0

Draft
author Steve Losh <steve@stevelosh.com>
date Fri, 01 Oct 2021 13:40:51 -0400
parents fc26c1c090dd
children (none)
(:title "Writing Small CLI Programs in Common Lisp"
 :snip "Somewhere between tiny shell scripts and full projects."
 :date "2021-03-17T16:10:00Z"
 :draft nil)

I write a lot of command-line programs.  For tiny programs I usually go with the
typical UNIX approach: throw together a half-assed shell script and move on.
For large programs I make a full Common Lisp project, with an ASDF system
definition and such.  But there's a middle ground of small*ish* programs that
don't warrant a full repository on their own, but for which I still want a real
interface with proper `--help` and error handling.

I've found Common Lisp to be a good language for writing these small command
line programs.  But it can be a little intimidating to get started (especially
for beginners) because Common Lisp is a very flexible language and doesn't lock
you into one way of working.

In this post I'll describe how I write small, stand-alone command line programs
in Common Lisp.  It might work for you, or you might want to modify things to
fit your own needs.

<div id="toc"></div>

## Requirements

When you're writing programs in Common Lisp, you've got a lot of options.
Laying out the requirements I have helped me decide on an approach.

First: each new program should be one single file.  A few other files for the
collection as a whole (e.g. a `Makefile`) are okay, but once everything is set
up creating a new program should mean adding one single file.  For larger
programs a full project directory and ASDF system are great, but for small
programs having one file per program reduces the mental overhead quite a bit.

The programs need to be able to be developed in the typical Common Lisp
interactive style (in my case: with Swank and VLIME).  Interactive development
is one of the best parts of working in Common Lisp, and I'm not willing to give
it up.  In particular this means that a shell-script style approach, with
`#!/path/to/sbcl --script` and the top and directly running code at the top
level in the file, doesn't work for two main reasons:

* `load`ing that file will fail due to the shebang unless you have some ugly
  reader macros in your startup file.
* The program will need to do things like parsing command-line arguments and
  exiting with an error code, and calling `exit` would kill the Swank process.

The programs need to be able to use libraries, so Quicklisp will need to be
involved.  Common Lisp has a lot of nice things built-in, but there are some
libraries that are just too useful to pass up.

The programs will need to have proper user interfaces.  Command line arguments
must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abcfoo -d`
should work as expected), malformed or unknown options must be caught instead of
dropping them on the floor, error messages should be meaningful, and the
`--help` should be thoroughly and thoughtfully written so I can remember how to
use the program months later.  A `man` page is a nice bonus, but not required.

Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp`
and defines a package `foo` with a function called `toplevel`) is okay if it
makes my life easier.  These programs are just for me, so I don't have to worry
about people wanting to create executables with spaces in the name or something.

Portability between Common Lisp implementations is nice to have, but not
required.  If using a bit of SBCL-specific grease will let me avoid a bunch of
extra dependencies, that's fine for these small personal programs.

## Solution Skeleton

After trying a number of different approaches I've settled on a solution that
I'm pretty happy with.  First I'll describe the general approach, then we'll
look at one actual example program in its entirety.

### Directory Structure

I keep all my small single-file Common Lisp programs in a `lisp` directory
inside my dotfiles repository.  Its contents look like this:

```
…/dotfiles/lisp/
    bin/
        foo
        bar
    man/
        man1/
            foo.1
            bar.1
    build-binary.sh
    build-manual.sh
    Makefile
    foo.lisp
    bar.lisp
```

The `bin` directory is where the executable files end up.  I've added it to my
`$PATH` so I don't have to symlink or copy the binaries anywhere.

`man` contains the generated `man` pages.  Because it's adjacent to `bin` (which
is on my path) the `man` program automatically finds the `man` pages as
expected.

`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make
building programs easier.

The `.lisp` files are the programs.  Each new program I want to add only
requires adding the `<programname>.lisp` file in this directory and running
`make`.

### Lisp Files

My small Common Lisp programs follow a few conventions that make building them
easier.  Let's look at the skeleton of a `foo.lisp` file as an example.  I'll
show the entire file here, and then step through it piece by piece.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:with-user-abort …) :silent t))

(defpackage :foo
  (:use :cl)
  (:export :toplevel *ui*))

(in-package :foo)

;;;; Configuration -----------------------------------------------
(defparameter *whatever* 123)

;;;; Errors ------------------------------------------------------
(define-condition user-error (error) ())

(define-condition missing-foo (user-error) ()
  (:report "A foo is required, but none was supplied."))

;;;; Functionality -----------------------------------------------
(defun foo (string)
  …)

;;;; Run ---------------------------------------------------------
(defun run (arguments)
  (map nil #'foo arguments))

;;;; User Interface ----------------------------------------------
(defmacro exit-on-ctrl-c (&body body)
  `(handler-case (with-user-abort:with-user-abort (progn ,@body))
     (with-user-abort:user-abort () (sb-ext:exit :code 130))))

(defparameter *ui*
  (adopt:make-interface
    :name "foo"
    …))

(defun toplevel ()
  (sb-ext:disable-debugger)
  (exit-on-ctrl-c
    (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
      … ; Handle options.
      (handler-case (run arguments)
        (user-error (e) (adopt:print-error-and-exit e))))))
```

Let's go through each chunk of this.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:with-user-abort …) :silent t))
```

First we `quickload` any necessary libraries.  We always want to do this, even
when compiling the file, because we need the appropriate packages to exist when
we try to use their symbols later in the file.

[with-user-abort][] is a library for easily handling `control-c`, which all of
these small programs will use.

[with-user-abort]: https://github.com/compufox/with-user-abort

```lisp
(defpackage :foo
  (:use :cl)
  (:export :toplevel *ui*))

(in-package :foo)
```

Next we define a package `foo` and switch to it.  The package is always named
the same as the resulting binary and the basename of the file, and always
exports the symbols `toplevel` and `*ui*`.  These conventions make it easy to
build everything automatically with `make` later.

```lisp
;;;; Configuration -----------------------------------------------
(defparameter *whatever* 123)
```

Next we define any configuration variables.  These will be set later after
parsing the command line arguments (when we run the command line program) or
at the REPL (when developing interactively).

```lisp
;;;; Errors ------------------------------------------------------
(define-condition user-error (error) ())

(define-condition missing-foo (user-error) ()
  (:report "A foo is required, but none was supplied."))
```

We define a `user-error` condition, and any errors the user might make will
inherit from it.  This will make it easy to treat user errors (e.g. passing
a mangled regular expression like `(foo+` as an argument) differently from
programming errors (i.e. bugs).  This makes it easier to treat those errors
differently:

* Bugs should print a backtrace or enter the debugger.
* Expected user errors should print a helpful error message with no backtrace or debugger.

```lisp
;;;; Functionality -----------------------------------------------
(defun foo (string)
  …)
```

Next we have the actual functionality of the program.

```lisp
;;;; Run ---------------------------------------------------------
(defun run (arguments)
  (map nil #'foo arguments))
```

We define a function `run` that takes some arguments (as strings) and performs
the main work of the program.

Importantly, `run` does *not* handle command line argument parsing, and it does
*not* exit the program with an error code, which means we can safely call it to
say "run the whole program" when we're developing interactively without worrying
about it killing our Lisp process.

Now we need to define the command line interface.

```lisp
;;;; User Interface ----------------------------------------------
(defmacro exit-on-ctrl-c (&body body)
  `(handler-case (with-user-abort:with-user-abort (progn ,@body))
     (with-user-abort:user-abort () (adopt:exit 130))))
```

We'll make a little macro around `with-user-abort` to make it less wordy.  We'll
[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the
user presses `ctrl-c`.  Maybe some day I'll pull this into Adopt so I don't have
to copy these three lines everywhere.

```lisp
(defparameter *ui*
  (adopt:make-interface
    :name "foo"
    …))
```

Here we define the `*ui*` variable whose symbol we exported above.  [Adopt][] is
a command line argument parsing library I wrote.  If you want to use a different
library, feel free.

[Adopt]: https://docs.stevelosh.com/adopt

```lisp
(defun toplevel ()
  (sb-ext:disable-debugger)
  (exit-on-ctrl-c
    (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
      … ; Handle options.
      (handler-case (run arguments)
        (user-error (e) (adopt:print-error-and-exit e))))))
```

And finally we define the `toplevel` function.  This will only ever be called
when the program is run as a standalone program, never interactively.  It
handles all the work beyond the main guts of the program (which are handled by
the `run` function), including:

* Disabling or enabling the debugger.
* Exiting the process with an appropriate status code on errors.
* Parsing command line arguments.
* Setting the values of the configuration parameters.
* Calling `run`.

That's it for the structure of the `.lisp` files.

### Building Binaries

`build-binary.sh` is a small script to build the executable binaries from the
`.lisp` files.  `./build-binary.sh foo.lisp` will build `foo`:

```bash
#!/usr/bin/env bash

set -euo pipefail

LISP=$1
NAME=$(basename "$1" .lisp)
shift

sbcl --load "$LISP" \
     --eval "(sb-ext:save-lisp-and-die \"$NAME\"
               :executable t
               :save-runtime-options t
               :toplevel '$NAME:toplevel)"
```

Here we see where the naming conventions have become important — we know that
the package is named the same as the binary and that it will have the symbol
`toplevel` exported, which always names the entry point for the binary.

### Building Man Pages

`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s
built-in `man` page generation.  If you don't care about building `man` pages
for your personal programs you can ignore this.  I admit that generating `man`
pages for these programs is a little bit silly because they're only for my own
personal use, but I get it for free with Adopt, so why not?

```bash
#!/usr/bin/env bash

set -euo pipefail

LISP=$1
NAME=$(basename "$LISP" .lisp)
OUT="$NAME.1"
shift

sbcl --load "$LISP" \
     --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede)
               (adopt:print-manual $NAME:*ui* :stream f))" \
     --quit
```

This is why we always name the Adopt interface variable `*ui*` and export it
from the package.

### Makefile

Finally we have a simple `Makefile` so we can run `make` to regenerate any
out of date binaries and `man` pages:

```make
files := $(wildcard *.lisp)
names := $(files:.lisp=)

.PHONY: all clean $(names)

all: $(names)

$(names): %: bin/% man/man1/%.1

bin/%: %.lisp build-binary.sh Makefile
	mkdir -p bin
	./build-binary.sh $<
	mv $(@F) bin/

man/man1/%.1: %.lisp build-manual.sh Makefile
	mkdir -p man/man1
	./build-manual.sh $<
	mv $(@F) man/man1/

clean:
	rm -rf bin man
```

We use a `wildcard` to automatically find the `.lisp` files so we don't have to
do anything extra after adding a new file when we want to make a new program.

The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses
a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern)
to automatically define the phony rules for building each program.  If
`$(names)` is `foo bar` this line effectively defines two phony rules:

```
foo: bin/foo man/man1/foo.1
bar: bin/bar man/man1/bar.1
```

This lets us run `make foo` to make both the binary and `man` page for
`foo.lisp`.

## Case Study: A Batch Coloring Utility

Now that we've seen the skeleton, let's look at one of my actual programs that
I use all the time.  It's called `batchcolor` and it's used to highlight regular
expression matches in text (usually log files) with a twist: each unique match
is highlighted in a separate color, which makes it easier to visually parse the
result.

For example: suppose we have some log files with lines of the form `<timestamp>
[<request ID>] <level> <message>` where request ID is a UUID, and messages might
contain other UUIDs for various things.  Such a log file might look something
like this:

```
2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/
2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving...
2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request.
2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404.
2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/
2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving...
2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/
2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200.
2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused.
2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500.
```

If I try to just read this directly, it's easy for my eyes to glaze over unless
I laboriously walk line-by-line.

[![Screenshot of uncolored log output](/static/images/blog/2021/03/uncolored.png)](/static/images/blog/2021/03/uncolored.png)

I could use `grep` to highlight the UUIDs:

    grep -P \
        '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \
        example.log

Unfortunately that doesn't really help too much because all the UUIDs are
highlighted the same color:

[![Screenshot of grep-colored log output](/static/images/blog/2021/03/grepcolored.png)](/static/images/blog/2021/03/grepcolored.png)

To get a more readable version of the log, I use `batchcolor`:

    batchcolor \
        '\b[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}\b' \
        example.log

`batchcolor` also highlights matches, but it highlights each unique match in its
own color:

[![Screenshot of batchcolored log output](/static/images/blog/2021/03/batchcolored.png)](/static/images/blog/2021/03/batchcolored.png)

This is *much* easier for me to visually parse.  The interleaving of separate
request logs is now obvious from the colors of the IDs, and it's easy to match
up various user IDs and thing IDs at a glance.  Did you even notice that the two
thing IDs were different before?

`batchcolor` has a few other quality of life features, like picking explicit
colors for specific strings (e.g. red for `ERR`):

[![Screenshot of fully batchcolored log output](/static/images/blog/2021/03/batchcoloredfull.png)](/static/images/blog/2021/03/batchcoloredfull.png)

I use this particular `batchcolor` invocation so often I've put it in its own
tiny shell script.  I use it to `tail` log files when developing locally almost
every day, and it makes visually scanning the log output *much* easier.  It can
come in handy for other kinds of text too, like highlighting nicknames in an IRC
log.

Let's step through its code piece by piece.

### Libraries

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t))
```

First we `quickload` libraries.  We'll use [Adopt][] for command line argument
processing, [cl-ppcre][] for regular expressions, and the previously-mentioned
[with-user-abort][] to handle `control-c`.

[cl-ppcre]: http://edicl.github.io/cl-ppcre/

### Package

```lisp
(defpackage :batchcolor
  (:use :cl)
  (:export :toplevel :*ui*))

(in-package :batchcolor)
```

We define and switch to the appropriately-named package.  Nothing special here.

### Configuration

```lisp
;;;; Configuration ------------------------------------------------------------
(defparameter *start* 0)
(defparameter *dark* t)
```

Next we `defparameter` some variables to hold some settings.  `*start*` will be
used later when randomizing colors, don't worry about it for now.

### Errors

```lisp
;;;; Errors -------------------------------------------------------------------
(define-condition user-error (error) ())

(define-condition missing-regex (user-error) ()
  (:report "A regular expression is required."))

(define-condition malformed-regex (user-error)
  ((underlying-error :initarg :underlying-error))
  (:report (lambda (c s)
             (format s "Invalid regex: ~A" (slot-value c 'underlying-error)))))

(define-condition overlapping-groups (user-error) ()
  (:report "Invalid regex: seems to contain overlapping capturing groups."))

(define-condition malformed-explicit (user-error)
  ((spec :initarg :spec))
  (:report
    (lambda (c s)
      (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5."
              (slot-value c 'spec)))))
```

Here we define the user errors.  Some of these are self-explanatory, while
others will make more sense later once we see them in action.  The specific
details aren't as important as the overall idea: for user errors we know might
happen, display a helpful error message instead of just spewing a backtrace at
the user.

### Colorization

Next we have the actual meat of the program.  Obviously this is going to be
completely different for every program, so feel free to skip this if you don't
care about this specific problem.

```lisp
;;;; Functionality ------------------------------------------------------------
(defun rgb-code (r g b)
  ;; The 256 color mode color values are essentially r/g/b in base 6, but
  ;; shifted 16 higher to account for the intiial 8+8 colors.
  (+ (* r 36)
     (* g 6)
     (* b 1)
     16))
```

We're going to highlight different matches with different colors.  We'll need
a reasonable amount of colors to make this useful, so using the basic 8/16 ANSI
colors isn't enough.  Full 24-bit truecolor is overkill, but the 8-bit ANSI
colors will work nicely.  If we ignore the base colors, we essentially have
6 x 6 x 6 = 216 colors to work with.  `rgb-code` will take the red, green, and
blue values from `0` to `5` and return the color code.  See [Wikipedia][8bit]
for more information.

[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit

```lisp
(defun make-colors (excludep)
  (let ((result (make-array 256 :fill-pointer 0)))
    (dotimes (r 6)
      (dotimes (g 6)
        (dotimes (b 6)
          (unless (funcall excludep (+ r g b))
            (vector-push-extend (rgb-code r g b) result)))))
    result))

(defparameter *dark-colors*  (make-colors (lambda (v) (< v 3))))
(defparameter *light-colors* (make-colors (lambda (v) (> v 11))))
```

Now we can build some arrays of colors.  We *could* use any of the 216 available
colors, but in practice we probably don't want to, because the darkest colors
will be too dark to read on a dark terminal, and vice versa for light terminals.
In a concession to practicality we'll generate two separate arrays of colors,
one that excludes colors whose total value is too dark and one excluding those
that are too light.

(Notice that `*dark-colors*` is "the array of colors which are suitable for use
on dark terminals" and not "the array of colors which are *themselves* dark".
Naming things is hard.)

Note that these arrays will be generated when the `batchcolor.lisp` file is
`load`ed, which is *when we build the binary*.  They *won't* be recomputed every
time you run the resulting binary.  In this case it doesn't really matter (the
arrays are small) but it's worth remembering in case you ever have some data you
want (or don't want) to compute at build time instead of run time.

```lisp
(defparameter *explicits* (make-hash-table :test #'equal))
```

Here we make a hash table to store the strings and colors for strings we want to
explicitly color (e.g. `ERR` should be red, `INFO` cyan).  The keys will be the
strings and values the RGB codes.

```lisp
(defun djb2 (string)
  ;; http://www.cse.yorku.ca/~oz/hash.html
  (reduce (lambda (hash c)
            (mod (+ (* 33 hash) c) (expt 2 64)))
          string
          :initial-value 5381
          :key #'char-code))

(defun find-color (string)
  (gethash string *explicits*
           (let ((colors (if *dark* *dark-colors* *light-colors*)))
             (aref colors
                   (mod (+ (djb2 string) *start*)
                        (length colors))))))
```

For strings that we want to explicitly color, we just look up the appropriate
code in `*explicits*` and return it.

Otherwise, we want to highlight unique matches in different colors.  There are
a number of different ways we could do this, for example: we could randomly pick
a color the first time we see a string and store it in a hash table for
subsequent encounters.  But this would mean we'd grow that hash table over time,
and one of the things I often use this utility for is `tail -f`ing long-running
processes when developing locally, so the memory usage would grow and grow until
the `batchcolor` process was restarted, which isn't ideal.

Instead, we'll hash each string with a simple [DJB hash][djb] and use it to
index into the appropriate array of colors.  This ensures that identical matches
get identical colors, and avoids having to store every match we've ever seen.

There will be some collisions, but there's not much we can do about that with
only ~200 colors to work with.  We could have used 16-bit colors like
I mentioned before, but then we'd have to worry about picking colors different
enough for humans to easily tell apart, and for this simple utility I didn't
want to bother.

We'll talk about `*start*` later, ignore it for now (it's `0` by default).

[djb]: http://www.cse.yorku.ca/~oz/hash.html

```lisp
(defun ansi-color-start (color)
  (format nil "~C[38;5;~Dm" #\Escape color))

(defun ansi-color-end ()
  (format nil "~C[0m" #\Escape))

(defun print-colorized (string)
  (format *standard-output* "~A~A~A"
          (ansi-color-start (find-color string))
          string
          (ansi-color-end)))
```

Next we have some functions to output the appropriate ANSI escapes to highlight
our matches.  We could use a library for this but it's only two lines.  [It's
not worth it](http://xn--rpa.cc/irl/term.html).

And now we have the beating heart of the program:

```lisp
(defun colorize-line (scanner line &aux (start 0))
  (ppcre:do-scans (ms me rs re scanner line)
    ;; If we don't have any register groups, colorize the entire match.
    ;; Otherwise, colorize each matched capturing group.
    (let* ((regs? (plusp (length rs)))
           (starts (if regs? (remove nil rs) (list ms)))
           (ends   (if regs? (remove nil re) (list me))))
      (map nil (lambda (word-start word-end)
                 (unless (<= start word-start)
                   (error 'overlapping-groups))
                 (write-string line *standard-output* :start start :end word-start)
                 (print-colorized (subseq line word-start word-end))
                 (setf start word-end))
           starts ends)))
  (write-line line *standard-output* :start start))
```

`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with
any of the desired matches colorized appropriately.  There are a few things to
note here.

First: if the regular expression contains any capturing groups, we'll only
colorize those parts of the match.  For example: if you run `batchcolor
'^<(\\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves
will be highlighted, not the surrounding angle brackets.  Otherwise, if there
are no capturing groups in the regular expression, we'll highlight the entire
match (as if there were one big capturing group around the whole thing).

Second: overlapping capturing groups are explicitly disallowed and
a `user-error` signaled if we notice any.  It's not clear what do to in this
case — if we match `((f)oo|(b)oo)` against `foo`, what should the output be?
Highlight `f` and `oo` in the same color?  In different colors?  Should the `oo`
be a different color than the `oo` in `boo`?  There's too many options with no
clear winner, so we'll just tell the user to be more clear.

To do the actual work we iterate over each match and print the non-highlighted
text before the match, then print the highlighted match.  Finally we print any
remaining text after the last match.

### Not-Quite-Top-Level Interface

```lisp
;;;; Run ----------------------------------------------------------------------
(defun run% (scanner stream)
  (loop :for line = (read-line stream nil)
        :while line
        :do (colorize-line scanner line)))

(defun run (pattern paths)
  (let ((scanner (handler-case (ppcre:create-scanner pattern)
                   (ppcre:ppcre-syntax-error (c)
                     (error 'malformed-regex :underlying-error c))))
        (paths (or paths '("-"))))
    (dolist (path paths)
      (if (string= "-" path)
        (run% scanner *standard-input*)
        (with-open-file (stream path :direction :input)
          (run% scanner stream))))))
```

Here we have the not-quite-top-level interface to the program.  `run` takes
a pattern string and a list of paths and runs the colorization on each path.
This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>"
"foo.txt")`, so we can test without worrying about killing the Lisp process.

### User Interface

In the last chunk of the file we have the user interface.  There are a couple of
things to note here.

I'm using a command line argument parsing library I wrote myself: [Adopt][].
I won't go over exactly what all the various Adopt functions do.  Most of them
should be fairly easy to understand, but [check out the Adopt
documentation][adopt-usage] for the full story if you're curious.

[adopt-usage]: https://docs.stevelosh.com/adopt/usage/

If you prefer another library (and there are quite a few around) feel free
to use it — it should be pretty easy to adapt this setup to a different library.
The only things you'd need to change would be the `toplevel` function and the
`build-manual.sh` script (if you even care about building `man` pages at all).

You might also notice that the user interface for the program is almost as much
code as the entire rest of the program.  This may seem strange, but I think it
makes a certain kind of sense.  When you're writing code to interface with an
external system, a messier and more complicated external system will usually
require more code than a cleaner and simpler external system.  A human brain is
probably the messiest and most complicated external system you'll ever have to
deal with, so it's worth taking the extra time and code to be especially careful
when writing an interface to it.

First we'll define a typical `-h`/`--help` option:

```lisp
(defparameter *option-help*
  (adopt:make-option 'help
    :help "Display help and exit."
    :long "help"
    :short #\h
    :reduce (constantly t)))
```

Next we'll define a pair of options for enabling/disabling the Lisp debugger:

```lisp
(adopt:defparameters (*option-debug* *option-no-debug*)
  (adopt:make-boolean-options 'debug
    :long "debug"
    :short #\d
    :help "Enable the Lisp debugger."
    :help-no "Disable the Lisp debugger (the default)."))
```

By default the debugger will be off, so any unexpected error will print
a backtrace to standard error and exit with a nonzero exit code.  This is the
default because if I add a `batchcolor` somewhere in a shell script, I probably
don't want to suddenly hang the entire script if something breaks.  But we still
want to be *able* to get into the debugger manually if something goes wrong.
This is Common Lisp — we don't have to settle for a stack trace or core dump, we
can have a real interactive debugger in the final binary.

Note how Adopt's `make-boolean-options` function creates *two* options here:

* `-d`/`--debug` will enable the debugger.
* `-D`/`--no-debug` will disable the debugger.

Even though *disabled* is the default, it's still important to have both
switches for boolean options like this.  If someone wants the debugger to be
*enabled* by default instead (along with some other configuration options), they
might have a shell alias like this:

    alias bcolor='batchcolor --debug --foo --bar'

But sometimes they might want to temporarily *disable* the debugger for a single
run.  Without a `--no-debug` option, they would have to run the vanilla
`batchcolor` and retype all the *other* options.  But having the `--no-debug`
option allows them to just say:

    bcolor --no-debug

This would expand to:

    batchcolor --debug --foo --bar --no-debug

The later option wins, and the user gets the behavior they expect.

Next we'll define some color-related options.  First an option to randomize the
colors each run, instead of always picking the same color for a particular
string, and then a toggle for choosing colors that work for dark or light
terminals:

```lisp
(adopt:defparameters (*option-randomize* *option-no-randomize*)
  (adopt:make-boolean-options 'randomize
    :help "Randomize the choice of color each run."
    :help-no "Do not randomize the choice of color each run (the default)."
    :long "randomize"
    :short #\r))

(adopt:defparameters (*option-dark* *option-light*)
  (adopt:make-boolean-options 'dark
    :name-no 'light
    :long "dark"
    :long-no "light"
    :help "Optimize for dark terminals (the default)."
    :help-no "Optimize for light terminals."
    :initial-value t))
```

The last option we'll define is `-e`/`--explicit`, to allow the user to select
an explicit color for a particular string:

```lisp
(defun parse-explicit (spec)
  (ppcre:register-groups-bind
      ((#'parse-integer r g b) string)
      ("^([0-5]),([0-5]),([0-5]):(.+)$" spec)
    (return-from parse-explicit (cons string (rgb-code r g b))))
  (error 'malformed-explicit :spec spec))

(defparameter *option-explicit*
  (adopt:make-option 'explicit
    :parameter "R,G,B:STRING"
    :help "Highlight STRING in an explicit color.  May be given multiple times."
    :manual (format nil "~
      Highlight STRING in an explicit color instead of randomly choosing one.  ~
      R, G, and B must be 0-5.  STRING is treated as literal string, not a regex.  ~
      Note that this doesn't automatically add STRING to the overall regex, you ~
      must do that yourself!  This is a known bug that may be fixed in the future.")
    :long "explicit"
    :short #\e
    :key #'parse-explicit
    :reduce #'adopt:collect))
```

Notice how we signal a `malformed-explicit` condition if the user gives us
mangled text.  This is a subtype of `user-error`, so the program will print the
error and exit even if the debugger is enabled.  We also include a slightly more
verbose description in the `man` page than the terse one in the `--help` text.

Next we write the main help and manual text, as well as some real-world
examples:

```lisp
(adopt:define-string *help-text*
  "batchcolor takes a regular expression and matches it against standard ~
   input one line at a time.  Each unique match is highlighted in its own color.~@
   ~@
   If the regular expression contains any capturing groups, only those parts of ~
   the matches will be highlighted.  Otherwise the entire match will be ~
   highlighted.  Overlapping capturing groups are not supported.")

(adopt:define-string *extra-manual-text*
  "If no FILEs are given, standard input will be used.  A file of - stands for ~
   standard input as well.~@
   ~@
   Overlapping capturing groups are not supported because it's not clear what ~
   the result should be.  For example: what should ((f)oo|(b)oo) highlight when ~
   matched against 'foo'?  Should it highlight 'foo' in one color?  The 'f' in ~
   one color and 'oo' in another color?  Should that 'oo' be the same color as ~
   the 'oo' in 'boo' even though the overall match was different?  There are too ~
   many possible behaviors and no clear winner, so batchcolor disallows ~
   overlapping capturing groups entirely.")

(defparameter *examples*
  '(("Colorize IRC nicknames in a chat log:"
     . "cat channel.log | batchcolor '<(\\\\w+)>'")
    ("Colorize UUIDs in a request log:"
     . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'")
    ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):"
     . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log")
    ("Colorize earmuffed symbols in a Lisp file:"
     . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp")))
```

Finally we can wire everything together in the main Adopt interface:

```lisp
(defparameter *ui*
  (adopt:make-interface
    :name "batchcolor"
    :usage "[OPTIONS] REGEX [FILE...]"
    :summary "colorize regex matches in batches"
    :help *help-text*
    :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*)
    :examples *examples*
    :contents (list
                *option-help*
                *option-debug*
                *option-no-debug*
                (adopt:make-group 'color-options
                                  :title "Color Options"
                                  :options (list *option-randomize*
                                                 *option-no-randomize*
                                                 *option-dark*
                                                 *option-light*
                                                 *option-explicit*)))))
```

All that's left to do is the top-level function that will be called when the
binary is executed.

### Top-Level Interface

Before we write `toplevel` we've got a couple of helpers:

```lisp
(defmacro exit-on-ctrl-c (&body body)
  `(handler-case (with-user-abort:with-user-abort (progn ,@body))
     (with-user-abort:user-abort () (adopt:exit 130))))

(defun configure (options)
  (loop :for (string . rgb) :in (gethash 'explicit options)
        :do (setf (gethash string *explicits*) rgb))
  (setf *start* (if (gethash 'randomize options)
                  (random 256 (make-random-state t))
                  0)
        *dark* (gethash 'dark options)))
```

Our `toplevel` function looks much like the one in the skeleton, but fleshed out
a bit more:

```lisp
(defun toplevel ()
  (sb-ext:disable-debugger)
  (exit-on-ctrl-c
    (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*)
      (when (gethash 'debug options)
        (sb-ext:enable-debugger))
      (handler-case
          (cond
            ((gethash 'help options) (adopt:print-help-and-exit *ui*))
            ((null arguments) (error 'missing-regex))
            (t (destructuring-bind (pattern . files) arguments
                 (configure options)
                 (run pattern files))))
        (user-error (e) (adopt:print-error-and-exit e))))))
```

This `toplevel` has a few extra bits beyond the skeletal example.

First, we disable the debugger immediately, and then re-enable it later if the
user asks us to.  We want to keep it disabled until *after* argument parsing
because we can't know whether the user wants it or not until we parse the
arguments.

Instead of just blindly running `run`, we check for `--help` and print it if
desired.  We also validate that the user passes the correct amount of arguments,
signaling a subtype of `user-error` if they don't.  Assuming everything looks
good we handle the configuration, call `run`, and that's it!

Running `make` generates `bin/batchcolor` and `man/man1/batchcolor.1`, and we
can view our log files in beautiful color.

## More Information

I hope this overview was helpful.  This has worked for me, but Common Lisp is
a flexible language, so if you want to use this layout as a starting point and
modify it for your own needs, go for it!

If you want to see some more examples you can find them in [my dotfiles
repository](https://hg.stevelosh.com/dotfiles/file/tip/lisp).  Some of the more
fun ones include:

* `weather` for displaying the weather over the next few hours so I can tell if
  I need a jacket or umbrella before I go out for a walk.
* `retry` to retry shell commands if they fail, with options for how many times
  to retry, strategies for waiting/backing off on failure, etc.
* `pick` to interactively filter the output of one command into another
  (inspired by the `pick` program in "The UNIX Programming Environment" but with
  more options).

The approach I laid out in this post works well for small, single-file programs.
If you're creating a larger program you'll probably want to move to a full ASDF
system in its own directory/repository.  My friend Ian [wrote a post about
that](http://atomized.org/blog/2020/07/06/common-lisp-in-practice/) which you
might find interesting.