# HG changeset patch # User Steve Losh # Date 1609124906 18000 # Node ID c635f15b37c1095d619e9b1c303e0284513ec7df # Parent deaaa26b726614874aee4c8c735f4243e9ab6711 Start draft diff -r deaaa26b7266 -r c635f15b37c1 content/blog/2018/08/types-and-classes-in-common-lisp.markdown --- a/content/blog/2018/08/types-and-classes-in-common-lisp.markdown Sat Apr 18 13:10:31 2020 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,166 +0,0 @@ -+++ -title = "Types and Classes in Common Lisp" -snip = "They're not the same thing!" -date = 2018-08-30T16:00:00Z -draft = true - -+++ - -One thing that often confuses people new to Common Lisp is the differences -(and interactions) between types and classes in the language. Type and classes -are two completely separate things in Common Lisp, but if you're coming from -modern languages it's easy to get the two blurred and confused. Hopefully this -post will make the distinction more clear. - -
- -## Objects - -Before we dive into defining types and classes we should define what an "object" -is, because the term will come up immediately. The following definition will be -good enough for our purposes here: - -**An object is a hunk of bits somewhere in memory.** - -Objects are the things the garbage collector manages. They're the things you -pass to functions, and the things you return from them. Objects have identity, -and that identity can be compared with `eq`. - -I realize that this is a little handwavey, but I think it's good enough to work -with for now. - -TODO values - -(There are a few corner cases (e.g. fixnums), but you can safely ignore them -while trying to wrap your head around this post.) - -## Types - -Types in Common Lisp can be summed up in one single line: - -**A type is a set of objects.** - -We already saw what objects are. "Set" in this definition is a set in the -mathematical sense: an unordered collection of elements (possibly *infinitely -many* elements) with no duplicates. - -That's it. That's all there is to it. This probably seems like a weird -definition if you've never thought much about it before, but let's look at some -examples to see what falls out of it. - -### Type Designators - -First we need a way to specify types. Common Lisp has a concept called [type -designators][TODO] for this purpose. A type designator is something that -represents the given type. - -Let's look at a common type: the set of all integers. Obviously it would be -impractical to talk about this set by listing out all of its members. -A mathematician would denote this type as Z TODO. A Common Lisp programmer -would use the type designator `integer`. - -### Being Of a Type - -What are some things we might want to do with a type? One thing might be to ask -whether a particular object "is of that type", done with `typep` in Common Lisp -(and `instanceOf` in Java, TODO in Python, etc). But what does this actually -*mean*? - -When you're thinking of types as sets of objects, asking whether an object is of -a particular type essentially asking if the object is a member of that set! - -Let's look at couple of examples: - - (typep 42 'integer) - -Here we're asking "Is `42` a member of the set of all integers?" - - (typep x 'symbol) - -Now we're asking "Is the object that `x` evaluates to a member of the set of all -symbols?". - - (typecase foo - (symbol ...a...) - (integer ...b...) - (number ...c...)) - -And now we're saying "Evaluate `foo`. If the result is a member of the set of -all symbols, do `a`. Otherwise if it's a member of the set of all integers, do -`b`. Otherwise if it's a member of the set of all numbers, do `c`. Otherwise -return `nil`.". - -### Subtypes and Supertypes - -So checking if an object is of a particular type is simply the set membership -operation. It turns out that other set operations also have useful definitions -when you think this way: - -* If `foo` is a supertype of `bar`, that means `foo` is a superset of `bar`. -* If `bar` is a subtype of `foo`, that means `bar` is a subtype of `foo`. - -For example: - -* The set of all integers is a subset of the set of all real numbers, which - makes `integer` a subtype of `real`. -* The set of all symbols is a superset of the set of all keyword symbols, so - `symbol` is a supertype of `keyword`. -* The set of all floating point numbers is neither a subset nor a superset of - the set of all symbols, so neither is a subtype of the other. - -Common Lisp's numeric tower consists of nice sequences of types that get more -and more specific as you take further subsets: - -`number` ⊆ `real` ⊆ `rational` ⊆ `integer` ⊆ `fixnum` - -### Explicit Designation - -There are other ways to designate sets too. You can use the `member` type -designator to just list out the members of a set by hand if you want: - - (member 1 2 3) ; => designates the set {1, 2, 3} - -### Everything and Nothing - -The type `t` is the set of all objects. The type `nil` is the empty set. - -This last one can sometimes cause confusion. The symbol `nil` is *not* of type -`nil` because the symbol `nil` is not a member of the empty set (because it's -empty!). If you want to talk about the set containing the symbol `nil`, that's -called `null`: - -```lisp -(typep nil nil) ; => NIL, nil is NOT a member of the empty set -(typep nil null) ; => T, nil IS a member of { nil } -``` - -### Type Combinations - -Let's look at some more set operations. Set complement is probably the -simplest, and Common Lisp supports this with the `not` compound type specifier -TODO: - -```lisp -(typep 1/2 '(not integer)) ; => T -(typep "hello" '(not integer)) ; => T -(typep 42 '(not integer)) ; => NIL -``` - -Set union is covered with `and`: - -```lisp -(typep 1 '(or integer string)) ; => T -(typep "hi" '(or integer string)) ; => T -(typep :what '(or integer string)) ; => NIL -``` - -And of course set intersection is done with `and`: - -```lisp -(typep 1 ) -``` - - -## Classes - -## Blurring the Line diff -r deaaa26b7266 -r c635f15b37c1 content/blog/2021/01/small-common-lisp-cli-programs.markdown --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/content/blog/2021/01/small-common-lisp-cli-programs.markdown Sun Dec 27 22:08:26 2020 -0500 @@ -0,0 +1,854 @@ +(:title "Writing Small CLI Programs in Common Lisp" + :snip "Somewhere between tiny shell scripts and full projects." + :date "2021-01-02T15:50:00Z" + :draft t) + +I've found Common Lisp to be a good language for this. But it can be a little +intimidating to get started, especially for beginners, because Common Lisp is +a very flexible language and doesn't lock you into one way of working. In this +post I'll describe how I write small, stand-alone command line programs in +Common Lisp. + +
+ +## Requirements + +When you're writing programs in Common Lisp, you've got a lot of options. For +this use case, laying out the requirements I have helped me decide on an +approach. + +First, each new program should be one single file. A few ancillary files for +all scripts together (e.g. a `Makefile`) are okay, but adding a new program +should mean adding one single file. For larger programs a full project +directory and ASDF system are great, but for small programs having one file per +program reduces the overhead quite a bit. + +The programs need to be able to be developed in the typical Common Lisp +interactive style (in my case: with Swank and VLIME). The interactive +development is one of the best parts of working in Common Lisp, and I won't give +it up. In particular, this means that a shell-script style approach, with +`#!/path/to/sbcl --script` and the top and directly running code at the top +level in the file does not work for two main reasons: + +* Loading that file will fail due to the shebang unless you have some ugly + reader macros in your startup file. +* The program will need to do things like parsing command-line arguments and + exiting with an error code, and running `exit` would kill the Swank process. + +The programs need to be able to use libraries, so Quicklisp will need to be +involved. Common Lisp has a lot of nice things built-in, but there's some +libraries that are just too useful to pass up. + +The programs will need to have proper user interfaces. Command line arguments +must be robustly parsed (e.g. collapsing `-a -b -c foo -d` into `-abc foo -d` +should work as expected), malformed or unknown options must be caught instead of +dropping them on the floor, error messages should be meaningful, and the +`--help` should be thoroughly and thoughtfully written. + +Relying on some basic conventions (e.g. a command `foo` is always in `foo.lisp` +and defines a package `foo` with a function called `toplevel`) is okay if it +makes my life easier. These programs are just for me, so I don't have to worry +about people wanting to create executables with spaces in the name or something. + +## Solution Skeleton + +After trying a number of different approaches, I've settled on a solution that +I'm pretty happy with. First I'll describe the general approach, then we'll +look at one actual example program in its entirety. + +### Directory Structure + +I keep all my small single-file Common Lisp programs in a `lisp` directory +inside my dotfiles repository. Its contents look like this: + +``` +…/lisp/ + bin/ + foo + bar + man/ + man1/ + foo.1 + bar.1 + build-binary.sh + build-manual.sh + Makefile + foo.lisp + bar.lisp +``` + +The `bin` directory is where the executable files end up. I've added it to my +`$PATH` so I don't have to symlink or copy the binaries anywhere. + +`man` contains the generated `man` pages. Because it's adjacent to `bin` (which +is on my path) the `man` program automatically finds the `man` pages as +expected. + +`build-binary.sh`, `build-manual.sh`, and `Makefile` are some glue to make +building programs easier. + +The `.lisp` files are, of course, the programs. Each new program I want to add +only requires adding the `.lisp` file in this directory and running +`make`. + +### Lisp Files + +All my small Common Lisp programs follow a few conventions, which makes building +them easier. Let's look at the skeleton of a `foo.lisp` file as an example. +I'll show the entire file here, and then step through it piece by piece. + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(… :with-user-abort) :silent t)) + +(defpackage :foo + (:use :cl) + (:export :toplevel …)) + +(in-package :foo) + +;;;; Configuration ----------------------------------------------- +(defparameter *version* "1.0.0") +(defparameter *some-option* nil) + +;;;; Errors ------------------------------------------------------ +(define-condition user-error (error) ()) + +(define-condition missing-foo (user-error) () + (:report "A foo is required, but none was supplied.")) + +;;;; Functionality ----------------------------------------------- +(defun foo (string) + …) + +;;;; Run --------------------------------------------------------- +(defun run (arguments) + (map nil #'foo arguments)) + +;;;; User Interface ---------------------------------------------- +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (sb-ext:exit :code 130)))) + +(defparameter *ui* + (adopt:make-interface + :name "foo" + …)) + +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) + (adopt:parse-options-or-exit *ui*) + … ; Handle options. + (handler-case (run arguments) + (user-error (e) + (format *error-output* "error: ~A~%" e) + (adopt:exit 1)))))) +``` + +Let's go through each chunk of this. + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(:with-user-abort …) :silent t)) +``` + +First we `quickload` any necessary libraries. We always want to do this, even +when compiling the file, because we need the appropriate packages to be loaded +when we try to use their symbols later in the file. + +[`with-user-abort`](https://github.com/compufox/with-user-abort) is a library +for portably handling `control-c`, which all of these small programs use. + +```lisp +(defpackage :foo + (:use :cl) + (:export :toplevel *ui*)) + +(in-package :foo) +``` + +Next we define a package `foo` and switch to it. The package is always named +the same as the resulting binary and the basename of the file. The package +always exports the symbols `toplevel` and `*ui*`. + +```lisp +;;;; Configuration ----------------------------------------------- +(defparameter *version* "1.0.0") +(defparameter *some-option* nil) +``` + +Next we define any configuration variables. These will be set later after +parsing the command line arguments (when we run the command line program) or +at the REPL (when developing interactively). + +```lisp +;;;; Errors ------------------------------------------------------ +(define-condition user-error (error) ()) + +(define-condition missing-foo (user-error) () + (:report "A foo is required, but none was supplied.")) +``` + +We define a `user-error` condition, and any errors the user might make will +inherit from it. This will make it easy to treat user errors (e.g. passing +a mangled regular expression like `(foo+` as an argument) differently from +programming errors (i.e. bugs). This makes it easier to treat those errors +differently: + +* Bugs should print a backtrace or enter the debugger. +* Expected user errors should print a helpful error message with no backtrace or debugger. + +```lisp +;;;; Functionality ----------------------------------------------- +(defun foo (string) + …) +``` + +Next we have the actual functionality of the program. + +```lisp +;;;; Run --------------------------------------------------------- +(defun run (arguments) + (map nil #'foo arguments)) +``` + +We now define a function `run` that takes some arguments (as strings) and +performs the main work of the program. + +Importantly, `run` does **not** handle command line argument parsing, and it does +**not** exit the program with an error code, which means we can safely call it +to "run the program" when we're developing interactively without worrying about +it killing our Lisp process. + +Finally, we need to define the command line interface. + +```lisp +;;;; User Interface ---------------------------------------------- +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (sb-ext:exit :code 130)))) +``` + +We'll make a little macro around `with-user-abort` to make it less wordy. We'll +[exit with a status of 130](https://tldp.org/LDP/abs/html/exitcodes.html) if the +user presses `ctrl-c`. + +```lisp +(defparameter *ui* + (adopt:make-interface + :name "foo" + …)) +``` + +Here we define the `*ui*` variable whose symbol we exported above. [Adopt][] is +a command line argument parsing library I wrote. If you want to use a different +library, feel free. + +[Adopt]: https://docs.stevelosh.com/adopt + +```lisp +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) + (adopt:parse-options-or-exit *ui*) + … ; Handle options. + (handler-case (run arguments) + (user-error (e) + (format *error-output* "error: ~A~%" e) + (adopt:exit 1)))))) +``` + +And finally we define the `toplevel` function. This will only ever be called +when the program is run as a standalone program, never interactively. It +handles all the work beyond the main guts of the program that are handled by the +`run` function: + +* Disabling or enabling the debugger. +* Exiting the process with the appropriate status code on errors. +* Parsing command line arguments. + +That's it for the structure of the `.lisp` files. + +### Building Binaries + +`build-binary.sh` is a small script to build the executable binaries from the +`.lisp` files. `./build-binary.sh foo.lisp` will build `foo`: + +```bash +#!/usr/bin/env bash + +set -euo pipefail + +LISP=$1 +NAME=$(basename "$1" .lisp) +shift + +sbcl --load "$LISP" \ + --eval "(sb-ext:save-lisp-and-die \"$NAME\" + :executable t + :save-runtime-options t + :toplevel '$NAME:toplevel)" +``` + +Here we see where the naming conventions have become important — we know that +the package is named the same as the binary and that it will have the symbol +`toplevel` exported, which names the entry point for the binary. + +### Building Man Pages + +`build-manual.sh` is similar and builds the `man` pages using [Adopt][]'s +built-in `man` page generation. If you don't care about building `man` pages +for your personal programs (I admit, it's a little bit silly) you can ignore +this. + +```bash +#!/usr/bin/env bash + +set -euo pipefail + +LISP=$1 +NAME=$(basename "$LISP" .lisp) +OUT="$NAME.1" +shift + +sbcl --load "$LISP" \ + --eval "(with-open-file (f \"$OUT\" :direction :output :if-exists :supersede) + (adopt:print-manual $NAME:*ui* :stream f))" \ + --quit +``` + +This is why we always name the Adopt interface variable `*ui*` and export it +from the package. + +### Makefile + +Finally we have a simple `Makefile` so we can run `make` to regenerate any +out of date binaries and `man` pages: + +```make +files := $(wildcard *.lisp) +names := $(files:.lisp=) + +.PHONY: all clean $(names) + +all: $(names) + +$(names): %: bin/% man/man1/%.1 + +bin/%: %.lisp build-binary.sh Makefile + mkdir -p bin + ./build-binary.sh $< + mv $(@F) bin/ + +man/man1/%.1: %.lisp build-manual.sh Makefile + mkdir -p man/man1 + ./build-manual.sh $< + mv $(@F) man/man1/ + +clean: + rm -rf bin man +``` + +We use a `wildcard` to automatically find the `.lisp` files so we don't have to +do anything other than adding a new file when we want to make a new program. + +The most notable line here is `$(names): %: bin/% man/man1/%.1` which uses +a [static pattern rule](https://www.gnu.org/software/make/manual/html_node/Static-Pattern.html#Static-Pattern) +to automatically define the phony rules for building each program. If +`$(names)` is `foo bar` this line effectively defines two phony rules: + +``` +foo: bin/foo man/man1/foo.1 +bar: bin/bar man/man1/bar.1 +``` + +This lets us run `make foo` to make the binary and `man` page for `foo.lisp`. + +## Case Study: A Batch Coloring Utility + +Now that we've seen the skeleton, let's look at one of my actual programs that +I use all the time. It's called `batchcolor` and it's used to highlight regular +expression matches in text (usually log files in my case) with a twist: each +unique match is highlighted in a separate color, which makes it easier to +visually parse the result. + +For example, suppose we have some log files with lines of the form ` +[] ` where request ID is a UUID, and messages might +contain other UUIDs for various things. Such a log file might look something +like this: + +``` +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8b7543/ +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543 is not cached, retrieving... +2021-01-02 14:01:45 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] WARN User 28b2d548-eff1-471c-b807-cc2bcee76b7d does not have access to thing 7ca6d8d2-5038-42bd-a559-b3ee0c8b7543, denying request. +2021-01-02 14:01:46 [f788a624-8dcd-4c5e-b1e8-681d0a68a8d3] INFO Returning HTTP 404. +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Incoming request GET /users/28b2d548-eff1-471c-b807-cc2bcee76b7d/things/7ca6d8d2-5038-42bd-a559-b3ee0c8d7543/ +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Thing 7ca6d8d2-5038-42bd-a559-b3ee0c8d7543 is not cached, retrieving... +2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] INFO Incoming request POST /users/sign-up/ +2021-01-02 14:01:46 [bea6ae06-bd06-4d2a-ae35-3e83fea2edc7] INFO Returning HTTP 200. +2021-01-02 14:01:46 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Error running SQL query: connection refused. +2021-01-02 14:01:47 [b04ced1d-1cfa-4315-aaa9-0e245ff9a8e1] ERR Returning HTTP 500. +``` + +If I try to just read this directly, it's easy for my eyes to glaze over unless +I laboriously read line-by-line. I can use `grep` to highlight the UUIDs, but +that honestly doesn't help too much: + + grep -P '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}|$' foo.log + +`batchcolor` also highlights matches, but highlights each unique match in its +own color: + +This is *much* easier for me to visually parse. The interleaving of separate +request logs is now obvious from the colors of the IDs, and it's easy to match +up various user IDs and thing IDs at a glance. Did you even notice that the two +thing IDs were different before? + +`batchcolor` has a few other simple quality of life features, like picking +explicit colors for specific strings (e.g. red for `ERR`): + +I wrap up this `batchcolor` invocation in an alias and use it to `tail` log +files when developing locally almost every day, and it makes reading the log +output *much* easier. + +Let's step through its code piece by piece. + +### Libraries + +```lisp +(eval-when (:compile-toplevel :load-toplevel :execute) + (ql:quickload '(:adopt :cl-ppcre :with-user-abort) :silent t)) +``` + +First we `quickload` libraries. We'll use [Adopt][] for command line argument +processing, [cl-ppcre][] for regular expressions, and the previously-mentioned +[with-user-abort][] to handle `control-c`. + +### Package + +```lisp +(defpackage :batchcolor + (:use :cl) + (:export :toplevel :*ui*)) + +(in-package :batchcolor) +``` + +We define and switch to the appropriately-named package. Nothing special here. + +### Configuration + +```lisp +;;;; Configuration ------------------------------------------------------------ +(defparameter *version* "1.0.0") +(defparameter *start* 0) +(defparameter *dark* t) +``` + +Next we `defparameter` some variables to hold useful values (like the version) +and settings. `*start*` will be used later when randomizing colors, don't worry +about it for now. + +### Errors + +```lisp +;;;; Errors ------------------------------------------------------------------- +(define-condition user-error (error) ()) + +(define-condition missing-regex (user-error) () + (:report "A regular expression is required.")) + +(define-condition malformed-regex (user-error) + ((underlying-error :initarg :underlying-error)) + (:report (lambda (c s) + (format s "Invalid regex: ~A" (slot-value c 'underlying-error))))) + +(define-condition overlapping-groups (user-error) () + (:report "Invalid regex: seems to contain overlapping capturing groups.")) + +(define-condition malformed-explicit (user-error) + ((spec :initarg :spec)) + (:report (lambda (c s) + (format s "Invalid explicit spec ~S, must be of the form \"R,G,B:string\" with colors being 0-5." + (slot-value c 'spec))))) +``` + +Here we define the user errors. Some of these are self-explanatory, while +others will make more sense later once we see them in action. The specific +details aren't as important as the overall idea: for user errors we know might +happen, display a helpful error message instead of just spewing a backtrace at +the user. + +### Colorization + +Next we have the actual meat of the program. Obviously this is going to be +completely different for every program, so feel free to skip this if you don't +care about this specific problem. + +```lisp +;;;; Functionality ------------------------------------------------------------ +(defun rgb-code (r g b) + ;; The 256 color mode color values are essentially r/g/b in base 6, but + ;; shifted 16 higher to account for the intiial 8+8 colors. + (+ (* r 36) + (* g 6) + (* b 1) + 16)) +``` + +We're going to highlight different matches with different colors. We'll need +a reasonable amount of colors to make this useful, so using the basic 8/16 ANSI +colors isn't enough. Full 24-bit truecolor is overkill, but the 8-bit ANSI +colors will work nicely. If we ignore the base colors, we essentially have +6 x 6 x 6 = 216 colors to work with. `rgb-code` will take the red, green, and +blue values from `0` to `5` and return the color code. See [Wikipedia][8bit] +for more information. + +[8bit]: https://en.wikipedia.org/wiki/ANSI_escape_code#8-bit + +```lisp +(defun make-colors (excludep) + (let ((result (make-array 256 :fill-pointer 0))) + (dotimes (r 6) + (dotimes (g 6) + (dotimes (b 6) + (unless (funcall excludep (+ r g b)) + (vector-push-extend (rgb-code r g b) result))))) + result)) + +(defparameter *dark-colors* (make-colors (lambda (v) (< v 3)))) +(defparameter *light-colors* (make-colors (lambda (v) (> v 11)))) +``` + +Now we can build some arrays of colors. We *could* use any of the 216 available +colors, but in practice we probably don't want to, because the darkest colors +will be too dark to read on a dark terminal, and vice versa for light terminals. +In a concession to practicality we'll generate two separate arrays of colors, +one that excludes colors whose total value is too dark and one excluding those +that are too light. + +(You might notice that `*dark-colors*` is "the array of colors for dark +terminals" and not "the array of colors which are not light". Naming things is +hard.) + +Note that these arrays will be generated when the `batchcolor.lisp` file is +`load`ed, which is *when we build the binary*. They *won't* be recomputed every +time you run the resulting binary. In this case it doesn't really matter (the +arrays are small) but it's worth remembering in case you ever have some data you +want (or don't want) to compute at build time instead of run time. + +```lisp +(defparameter *explicits* (make-hash-table :test #'equal)) +``` + +Here we make a hash table to store the strings and colors for strings we want to +explicitly color (e.g. `ERR` should be red, `INFO` cyan). The keys will be the +strings and values the RGB codes. + +```lisp +(defun djb2 (string) + ;; http://www.cse.yorku.ca/~oz/hash.html + (reduce (lambda (hash c) + (mod (+ (* 33 hash) c) (expt 2 64))) + string + :initial-value 5381 + :key #'char-code)) + +(defun find-color (string) + (gethash string *explicits* + (let ((colors (if *dark* *dark-colors* *light-colors*))) + (aref colors + (mod (+ (djb2 string) *start*) + (length colors)))))) +``` + +For strings that we want to explicitly color, we just look up the appropriate +code in `*explicits*` and return it. + +Otherwise, we want to highlight unique matches in different colors. There are +a number of different ways we could do this. For example, we could randomly +pick a color the first time we see a string and store it in a hash table for +subsequent encounters. But this would mean we'd grow over time, and one of the +things I often use this utility for is `tail -f`ing long-running processes when +developing locally, so the memory usage would grow and grow over time until the +`batchcolor` process was restarted, which isn't ideal. + +Instead, we'll hash each string with a simple [DJB hash][djb] and use it to +index into the appropriate array of colors. This ensures that identical matches +get identical colors, and avoids having to store every match we've ever seen. + +We'll talk about `*start*` later, ignore it for now (it's `0` by default). + +[djb]: http://www.cse.yorku.ca/~oz/hash.html + +```lisp +(defun ansi-color-start (color) + (format nil "~C[38;5;~Dm" #\Escape color)) + +(defun ansi-color-end () + (format nil "~C[0m" #\Escape)) + +(defun print-colorized (string) + (format *standard-output* "~A~A~A" + (ansi-color-start (find-color string)) + string + (ansi-color-end))) +``` + +Next we have some functions to output the appropriate ANSI escapes to highlight +our matches. We could use a library for this but it's only 2 lines. [It's not +worth it](http://xn--rpa.cc/irl/term.html). + +And now we have the beating heart of the program: + +```lisp +(defun colorize-line (scanner line &aux (start 0)) + (ppcre:do-scans (ms me rs re scanner line) + ;; If we don't have any register groups, colorize the entire match. + ;; Otherwise, colorize each matched capturing group. + (let* ((regs? (plusp (length rs))) + (starts (if regs? (remove nil rs) (list ms))) + (ends (if regs? (remove nil re) (list me)))) + (map nil (lambda (word-start word-end) + (unless (<= start word-start) + (error 'overlapping-groups)) + (write-string line *standard-output* :start start :end word-start) + (print-colorized (subseq line word-start word-end)) + (setf start word-end)) + starts ends))) + (write-line line *standard-output* :start start)) +``` + +`colorize-line` takes a CL-PPCRE scanner and a line, and outputs the line with +any of the desired matches colorized appropriately. There are a few things to +note here. + +First: if the regular expression contains any capturing groups, we will only +colorize those parts of the match. For example, if you run `batchcolor +'^<(\\w+)> '` to colorize the nicks in an IRC log, only the nicknames themselves +will be highlighted, not the surrounding angle brackets. Otherwise, if there +are no capturing groups in the regular expression, we'll highlight the entire +match (as if there were one big capturing group around the whole thing). + +Second: overlapping capturing groups are explicitly disallowed and +a `user-error` signaled if we notice any. It's not clear what do to in this +case — if we match `((f)oo|(b)oo)` against `foo`, what should the output be? +Highlight `f` and `oo` in the same color? In different colors? Should the `oo` +be a different color than the `oo` in `boo`? There's too many options with no +clear winner, so we'll just tell the user to be more clear. + +To do the actual work, we iterate over each match and print the non-highlighted +text before the match, then print the highlighted match. Finally we print any +remaining text after the last match. + +### Not-Quite-Top-Level Interface + +```lisp +;;;; Run ---------------------------------------------------------------------- +(defun run% (scanner stream) + (loop :for line = (read-line stream nil) + :while line + :do (colorize-line scanner line))) + +(defun run (pattern paths) + (let ((scanner (handler-case (ppcre:create-scanner pattern) + (ppcre:ppcre-syntax-error (c) + (error 'malformed-regex :underlying-error c)))) + (paths (or paths '("-")))) + (dolist (path paths) + (if (string= "-" path) + (run% scanner *standard-input*) + (with-open-file (stream path :direction :input) + (run% scanner stream)))))) +``` + +Here we have the not-quite-top-level interface to the program. `run` takes +a pattern string and a list of paths and runs the colorization on each path. +This is safe to call interactively from the REPL, e.g. `(run "<(\\w+)>" +"foo.txt")`, so we can test without worrying about killing the Lisp process. + +### User Interface + +In the last chunk of the file we have the user interface. There are a couple of +things to note here. + +I'm using a command line argument parsing library I wrote myself: [Adopt][]. +But if you prefer another library (and there are quite a few around) feel free +to use it — it should be pretty easy to adapt this setup to a different library. +The only things you'd need to change would be the `toplevel` function and the +`build-manual.sh` script (if you even care about building `man` pages at all). + +You might also notice that the user interface for the program is almost as much +code as the entire rest of the program. This may seem disconcerting at first, +but I think it makes a certain kind of sense. When you're writing code to +interface with an external system, a messier and more complicated external +system will usually require more code than a cleaner and simpler external +system. A human brain is probably the messiest and most complicated external +system you'll ever have to deal with, so it's worth taking the extra time and +code to be especially careful when writing an interface to it. + +```lisp +(defparameter *option-help* + (adopt:make-option 'help + :help "Display help and exit." + :long "help" + :short #\h + :reduce (constantly t))) + +(defparameter *option-version* + (adopt:make-option 'version + :help "Display version information and exit." + :long "version" + :reduce (constantly t))) +``` + +```lisp +(adopt:defparameters (*option-debug* *option-no-debug*) + (adopt:make-boolean-options 'debug + :long "debug" + :short #\d + :help "Enable the Lisp debugger." + :help-no "Disable the Lisp debugger (the default).")) +``` + +```lisp +(adopt:defparameters (*option-randomize* *option-no-randomize*) + (adopt:make-boolean-options 'randomize + :help "Randomize the choice of color each run." + :help-no "Do not randomize the choice of color each run (the default)." + :long "randomize" + :short #\r)) + +(adopt:defparameters (*option-dark* *option-light*) + (adopt:make-boolean-options 'dark + :name-no 'light + :long "dark" + :long-no "light" + :help "Optimize for dark terminals (the default)." + :help-no "Optimize for light terminals." + :initial-value t)) +``` + +```lisp +;;;; User Interface ----------------------------------------------------------- +(defun parse-explicit (spec) + (ppcre:register-groups-bind + ((#'parse-integer r g b) string) + ("^([0-5]),([0-5]),([0-5]):(.+)$" spec) + (return-from parse-explicit (cons string (rgb-code r g b)))) + (error 'malformed-explicit :spec spec)) + +(defparameter *option-explicit* + (adopt:make-option 'explicit + :parameter "R,G,B:STRING" + :help "Highlight STRING in an explicit color. May be given multiple times." + :manual (format nil "~ + Highlight STRING in an explicit color instead of randomly choosing one. ~ + R, G, and B must be 0-5. STRING is treated as literal string, not a regex. ~ + Note that this doesn't automatically add STRING to the overall regex, you ~ + must do that yourself! This is a known bug that may be fixed in the future.") + :long "explicit" + :short #\e + :key #'parse-explicit + :reduce #'adopt:collect)) +``` + +```lisp +(adopt:define-string *help-text* + "batchcolor takes a regular expression and matches it against standard ~ + input one line at a time. Each unique match is highlighted in its own color.~@ + ~@ + If the regular expression contains any capturing groups, only those parts of ~ + the matches will be highlighted. Otherwise the entire match will be ~ + highlighted. Overlapping capturing groups are not supported.") + +(adopt:define-string *extra-manual-text* + "If no FILEs are given, standard input will be used. A file of - stands for ~ + standard input as well.~@ + ~@ + Overlapping capturing groups are not supported because it's not clear what ~ + the result should be. For example: what should ((f)oo|(b)oo) highlight when ~ + matched against 'foo'? Should it highlight 'foo' in one color? The 'f' in ~ + one color and 'oo' in another color? Should that 'oo' be the same color as ~ + the 'oo' in 'boo' even though the overall match was different? There are too ~ + many possible behaviors and no clear winner, so batchcolor disallows ~ + overlapping capturing groups entirely.") + +(defparameter *examples* + '(("Colorize IRC nicknames in a chat log:" + . "cat channel.log | batchcolor '<(\\\\w+)>'") + ("Colorize UUIDs in a request log:" + . "tail -f /var/log/foo | batchcolor '[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}'") + ("Colorize some keywords explicitly and IPv4 addresses randomly (note that the keywords have to be in the main regex too, not just in the -e options):" + . "batchcolor 'WARN|INFO|ERR|(?:[0-9]{1,3}\\\\.){3}[0-9]{1,3}' -e '5,0,0:ERR' -e '5,4,0:WARN' -e '2,2,5:INFO' foo.log") + ("Colorize earmuffed symbols in a Lisp file:" + . "batchcolor '(?:^|[^*])([*][-a-zA-Z0-9]+[*])(?:$|[^*])' tests/test.lisp"))) +``` + +```lisp +(defparameter *ui* + (adopt:make-interface + :name "batchcolor" + :usage "[OPTIONS] REGEX [FILE...]" + :summary "colorize regex matches in batches" + :help *help-text* + :manual (format nil "~A~2%~A" *help-text* *extra-manual-text*) + :examples *examples* + :contents (list + *option-help* + *option-version* + *option-debug* + *option-no-debug* + (adopt:make-group 'color-options + :title "Color Options" + :options (list *option-randomize* + *option-no-randomize* + *option-dark* + *option-light* + *option-explicit*))))) +``` + +### Top-Level Interface + +```lisp +(defmacro exit-on-ctrl-c (&body body) + `(handler-case (with-user-abort:with-user-abort (progn ,@body)) + (with-user-abort:user-abort () (adopt:exit 130)))) + +(defun configure (options) + (loop :for (string . rgb) :in (gethash 'explicit options) + :do (setf (gethash string *explicits*) rgb)) + (setf *start* (if (gethash 'randomize options) + (random 256 (make-random-state t)) + 0) + *dark* (gethash 'dark options))) + +(defun toplevel () + (sb-ext:disable-debugger) + (exit-on-ctrl-c + (multiple-value-bind (arguments options) (adopt:parse-options-or-exit *ui*) + (when (gethash 'debug options) + (sb-ext:enable-debugger)) + (handler-case + (cond + ((gethash 'help options) (adopt:print-help-and-exit *ui*)) + ((gethash 'version options) (write-line *version*) (adopt:exit)) + ((null arguments) (error 'missing-regex)) + (t (destructuring-bind (pattern . files) arguments + (configure options) + (run pattern files)))) + (user-error (e) (adopt:print-error-and-exit e)))))) +``` + + + +## More Information + +* ieure link +* dotfiles repo link + + +