content/blog/2022/04/fun-with-macros-do-file.markdown @ a55e0be16f56

Publish do-file
author Steve Losh <steve@stevelosh.com>
date Tue, 19 Apr 2022 09:46:21 -0400
parents 5510909795e6
children 351848b6eab4
(:title "Fun with Macros: Do-File"
 :snip "Part 3 in a series of short posts about fun Common Lisp Macros."
 :date "2022-04-19T13:45:00Z"
 :draft nil)

It's been a while, but it's time to take a look at another fun little Common
Lisp macro with some interesting things inside it: `do-file`.

<div id="toc"></div>

## Usage

The macro we'll be taking a look at today is called `do-file`.  It's used to
open a file and iterate over the contents using a reader function, saving you
some tedious boilerplate.

First let's look at some examples of how you could use it.  Processing each
line of a file is the default:

```lisp
(do-file (line "foo.txt")
  (unless (string= "" line)
    (write-line (string-upcase line))))
```

Using a different reader function and [another
macro](/blog/2018/05/fun-with-macros-gathering/) to gather data from inside the
iteration:

```lisp
(gathering
  (do-file (n "numbers.txt" :reader #'read-integer)
    (when (primep n)
      (gather n))))
```

Passing along options to the underlying `open`, and returning early:

```lisp
(do-file (form "foo.lisp" :reader #'read :external-format :EBCDIC-US)
  (when (eq form :stop)
    (return :stopped-early))
  (print form))
```

All of these could of course be done in other ways.  You could use a separate
function to read the file into a sequence and then pass that to `mapcar` or
something else, but it can be wasteful to cons up the entire list if you're only
going to process items and don't need to retain them (or if you're going to stop
early).

You could also write a `mapc-file` that takes a function instead of making this
a macro, but sometimes it's nice to not have to wrap the body in a `lambda`.
It's probably worth having that function as an additional tool in the toolbox
though!
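
For comparison, here's a minimal sketch of what such a `mapc-file` function might look like. It assumes the same calling convention as `do-file` (a `:reader` keyword, with every other keyword passed along to `open`) and uses only portable Common Lisp, without Alexandria. `mapc-file` is a hypothetical name here, not an existing library function.

```lisp
(defun mapc-file (fn path &rest open-options
                  &key (reader #'read-line) &allow-other-keys)
  "Call FN on each value read from PATH by READER, then return NIL.
  Hypothetical functional counterpart to DO-FILE."
  (let ((eof (list :eof))   ; a fresh cons, so no value read from the file is EQ to it
        ;; Drop :READER so the remaining keywords can be handed to OPEN.
        (open-options (loop :for (key value) :on open-options :by #'cddr
                            :unless (eq key :reader)
                              :append (list key value))))
    (let ((stream (apply #'open path :direction :input open-options)))
      (when stream  ; OPEN returns NIL when e.g. :IF-DOES-NOT-EXIST NIL applies
        (unwind-protect
             (loop :for value = (funcall reader stream nil eof)
                   :until (eq value eof)
                   :do (funcall fn value))
          (close stream))))
    nil))
```

Calling it means wrapping the body in a `lambda`, e.g. `(mapc-file (lambda (line) (print line)) "foo.txt")`, which is exactly the wrapping the macro lets you skip.  Since this is a function, its `:reader` default is a plain `#'read-line`; the extra quoting the macro's default needs doesn't apply.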

## Implementation

Here's the full implementation of the macro:

```lisp
(let ((eof (gensym "EOF")))
  (defmacro do-file ((symbol path
                      &rest open-options
                      &key (reader '#'read-line) &allow-other-keys)
                     &body body)
    "Iterate over the contents of `path` using `reader`.

    During iteration, `symbol` will be set to successive values read from the
    file by `reader`.

    `reader` can be any function that conforms to the usual reading interface,
    i.e. anything that can handle `(read-foo stream eof-error-p eof-value)`.

    Any keyword arguments other than `:reader` will be passed along to `open`.

    If `nil` is used for one of the `:if-…` options to `open` and this results
    in `open` returning `nil`, no iteration will take place.

    An implicit block named `nil` surrounds the iteration, so `return` can be
    used to terminate early.

    Returns `nil` (unless `return` is used to exit early).

    Examples:

      (do-file (line \"foo.txt\")
        (print line))

      (do-file (form \"foo.lisp\" :reader #'read :external-format :EBCDIC-US)
        (when (eq form :stop)
          (return :stopped-early))
        (print form))

      (do-file (line \"does-not-exist.txt\" :if-does-not-exist nil)
        (this-will-not-be-executed))

    "
    (let ((open-options (alexandria:remove-from-plist open-options :reader)))
      (alexandria:with-gensyms (stream)
        (alexandria:once-only (path reader)
          `(when-let ((,stream (open ,path :direction :input ,@open-options)))
             (unwind-protect
                 (do ((,symbol
                       (funcall ,reader ,stream nil ',eof)
                       (funcall ,reader ,stream nil ',eof)))
                     ((eq ,symbol ',eof))
                   ,@body)
               (close ,stream))))))))
```

There are a few interesting things to talk about here.

### Let Over Defmacro

The very first line is unusual: instead of the `defmacro` being the top-level
form, we wrap it in a `let` to generate a single unique EOF sentinel object:

```lisp
(let ((eof (gensym "EOF")))
  (defmacro do-file (…)
    …))
```

We could put the `let` inside the macro, but then we'd be generating a separate
EOF object for every use of the macro, which is wasteful.
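
The sentinel has to be an object that no reader can ever return.  Here's a small standalone illustration (separate from `do-file`, using a string stream rather than a file) of what goes wrong with `read` and a non-unique EOF value like `nil`.  Both function names are hypothetical:

```lisp
;; Using NIL as the EOF value: a literal NIL in the input looks like EOF.
(defun read-all/nil-eof (text)
  (with-input-from-string (stream text)
    (loop :for form = (read stream nil nil)
          :while form
          :collect form)))

;; Using a fresh uninterned symbol: nothing in the input can be EQ to it.
(defun read-all/gensym-eof (text)
  (let ((eof (gensym "EOF")))
    (with-input-from-string (stream text)
      (loop :for form = (read stream nil eof)
            :until (eq form eof)
            :collect form))))

(read-all/nil-eof "1 nil 2")    ; => (1)
(read-all/gensym-eof "1 nil 2") ; => (1 NIL 2)
```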

### &rest and &key

Note how the macro's argument list takes both `&rest` and `&key` arguments, and
uses `&allow-other-keys` to let the macro accept arbitrary keyword arguments:

```lisp
(defmacro do-file ((symbol path
                    &rest open-options
                    &key (reader '#'read-line) &allow-other-keys)
                   &body body)
  (let ((open-options (alexandria:remove-from-plist open-options :reader)))
    …
    `(when-let ((,stream (open ,path :direction :input ,@open-options)))
       …)))
```

We pass along any keyword arguments we get (aside from the special `:reader`
argument for this macro) to `open`.  Using `&allow-other-keys` means we don't
need to hardcode all the possible options to `open`, and also allows for
additional implementation-specific options to be passed to `open` if the user
wants.

We could have omitted the keyword arguments entirely, taken the arguments as
a raw `&rest`, and pulled out `:reader` ourselves with `getf`.  But doing it
this way means we don't have to fiddle around with that, and we can also
provide slightly nicer documentation in an editor when it shows the macro's
argument list in the status bar.  We'll also get a nicer error if we
accidentally pass an odd number of keyword arguments.
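
Note that `&rest` collects *all* the remaining arguments, including `:reader`, which is why the macro strips it with `remove-from-plist` before splicing the rest into `open`.  Because `destructuring-bind` accepts the same kind of lambda list, we can check this without defining a macro.  `do-file-open-options` is a hypothetical helper, for illustration only:

```lisp
(defun do-file-open-options (spec)
  "Return the &REST list that DO-FILE's lambda list would bind for SPEC.
  Hypothetical helper, for illustration only."
  (destructuring-bind (symbol path &rest open-options
                       &key (reader '#'read-line) &allow-other-keys)
      spec
    (declare (ignore symbol path reader))
    open-options))

;; Returns (:READER (FUNCTION READ) :EXTERNAL-FORMAT :UTF-8), so :READER is still there.
(do-file-open-options '(form "foo.lisp" :reader #'read :external-format :utf-8))

;; Would signal an error, because the keyword arguments come in an odd number:
;; (do-file-open-options '(line "foo.txt" :reader))
```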

One more thing before we move on: note the extra level of quoting for the
`(reader '#'read-line)` default value.  It's important to remember that this is
a *macro*, so when someone writes `(do-file (… :reader #'foo) …)` the macro
isn't getting the *function* `foo`, because the argument hasn't been evaluated
yet; it's getting the *list* `(function foo)`.  But the default value *is*
evaluated when the argument is missing, so we need the extra layer of quoting to
make sure the result is also a list, `(function read-line)`, matching what we'd
get if the argument were passed explicitly.
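
A tiny throwaway macro shows exactly what a macro receives in each case.  `show-reader` is hypothetical and exists only to return its `:reader` argument as quoted data:

```lisp
(defmacro show-reader ((&key (reader '#'read-line)))
  "Return, as data, the READER form this macro was given."
  `',reader)

(show-reader (:reader #'read))  ; the list (FUNCTION READ), not a function object
(show-reader ())                ; the list (FUNCTION READ-LINE), via the default
```

If the default were a plain `#'read-line`, the second call would hand the macro an actual function object rather than the form `(function read-line)`, which is not what it gets when `:reader` is supplied explicitly.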

### Macros Using Macros

We use `with-gensyms` and `once-only` from Alexandria to maintain good hygiene
in the macro.  We also use [`when-let`](/blog/2018/07/fun-with-macros-if-let/)
to avoid some more boilerplate:

```lisp
(defmacro do-file (…)
  (alexandria:with-gensyms (stream)
    (alexandria:once-only (path reader)
      `(when-let ((,stream (open ,path :direction :input ,@open-options)))
         (unwind-protect
             (do …)
           (close ,stream))))))
```
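
`once-only` earns its keep here because `reader` shows up twice in the expansion (in both the init and step forms of the `do`), so without it a `:reader` argument with side effects would be re-evaluated on every iteration.  Here's a small, Alexandria-free sketch of the difference using two hypothetical macros:

```lisp
;; Splices FORM in twice, so FORM is evaluated twice.
(defmacro both-naive (form)
  `(list ,form ,form))

;; Evaluates FORM once into a fresh, uncapturable variable,
;; then refers to that variable as many times as needed.
(defmacro both-once (form)
  (let ((value (gensym "VALUE")))
    `(let ((,value ,form))
       (list ,value ,value))))

(let ((n 0)) (both-naive (incf n)))  ; => (1 2)
(let ((n 0)) (both-once (incf n)))   ; => (1 1)
```

`once-only` automates the pattern in `both-once`, and `with-gensyms` is shorthand for binding variables to fresh `gensym`s.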

### Don't Loop

Finally we get to the meat of the macro:

```lisp
(do ((,symbol
     (funcall ,reader ,stream nil ',eof)
     (funcall ,reader ,stream nil ',eof)))
    ((eq ,symbol ',eof))
  ,@body)
```

Unfortunately we need to use the tedious `do` instead of `loop` here to avoid an
annoying bug: if we expanded into a `loop` call, and the user is calling this
from their *own* loop, and they use `(loop-finish)` in the body code, then it
would finish *our* loop instead of *their* loop, which would be very confusing.

Imagine the user wrote this very contrived example:

```lisp
(defun find-the-cat (&rest paths)
  (loop
    :with result = nil
    :for (path . remaining) :on paths
    :for i :from 1
    :do (do-file (line path)
          (when (string= line "meow")
            (setf result path)
            (loop-finish))) ;; This should obviously go to the finally below.
    :finally
    (when result
      (format t "Found cat after searching ~D files (did not search ~D other~:P)."
              i (length remaining))
      (return result))))
```

If `do-file` expanded into a `loop` form, then the `(loop-finish)` would only
terminate *that* loop.
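
To see the capture concretely without any files, here's a sketch with two hypothetical list-iterating macros standing in for `do-file`: `each/loop` expands into `loop`, while `each/do` expands into `do`, as `do-file` does.

```lisp
(defmacro each/loop ((var items) &body body)
  `(loop :for ,var :in ,items :do (progn ,@body)))

(defmacro each/do ((var items) &body body)
  (let ((tail (gensym "TAIL")))
    `(do ((,tail ,items (cdr ,tail)))
         ((endp ,tail))
       (let ((,var (car ,tail)))
         ,@body))))

(defun groups-visited/loop ()
  (loop :for group :in '((1 2) (3 4) (5 6))
        :collect group :into visited
        :do (each/loop (n group)
              (when (= n 3) (loop-finish)))   ; finishes the INNER loop only
        :finally (return visited)))

(defun groups-visited/do ()
  (loop :for group :in '((1 2) (3 4) (5 6))
        :collect group :into visited
        :do (each/do (n group)
              (when (= n 3) (loop-finish)))   ; finishes the caller's OUTER loop
        :finally (return visited)))

(groups-visited/loop) ; => ((1 2) (3 4) (5 6))
(groups-visited/do)   ; => ((1 2) (3 4))
```

Only the `do`-based version lets the caller's `(loop-finish)` reach the caller's own loop.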

The same issue kind of applies to the implicit block named `nil` around `do`.
But this is much less surprising for a macro named `do-…`, and we've documented
it in the docstring, so that's probably okay.
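
If a caller really does want to exit an *enclosing* loop from inside the body, the usual escape hatch is a named block plus `return-from`, which the inner `nil` block can't intercept.  Here's a sketch with a plain `do` standing in for a `do-file` expansion (`first-negative` is a hypothetical example function):

```lisp
(defun first-negative (lists)
  "Return the first negative number found in LISTS, or NIL."
  (loop :named outer
        :for numbers :in lists
        :do (do ((tail numbers (cdr tail)))   ; stands in for a DO-FILE expansion
                ((endp tail))
              (when (minusp (car tail))
                ;; A bare (RETURN ...) would only exit the inner DO's NIL block.
                (return-from outer (car tail))))))

(first-negative '((1 2) (3 -4 5) (-6)))  ; => -4
```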

### Repetition Allergies

Using `do` here is a little annoying because the init form and the step form are
exactly the same.  If you're allergic to repeating yourself you could use `#n=`
and `#n#` reader macros to get around it:

```lisp
(do ((,symbol #1=(funcall ,reader ,stream nil ',eof) #1#))
    ((eq ,symbol ',eof))
  ,@body)
```

I find this more confusing than helpful, but to each their own.
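
Another option, if neither the repetition nor the reader labels appeal, is to build the read form once at macroexpansion time and splice that same value in twice.  Here's a sketch of that in a hypothetical, Alexandria-free `do-stream` variant that iterates over an already-open stream (so, unlike `do-file`, it doesn't open or close anything).  It's wrapped in `eval-when` because a `defmacro` nested inside a `let` isn't a top-level form, so without it `compile-file` wouldn't make the macro available to later forms in the same file.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (let ((eof (gensym "EOF")))
    (defmacro do-stream ((symbol stream &key (reader '#'read-line)) &body body)
      "Hypothetical DO-FILE variant: iterate over values read from STREAM."
      (let* ((stream-var (gensym "STREAM"))
             (reader-var (gensym "READER"))
             ;; Build the read form once, then use it as both init and step form.
             (next `(funcall ,reader-var ,stream-var nil ',eof)))
        `(let ((,stream-var ,stream)
               (,reader-var ,reader))
           (do ((,symbol ,next ,next))
               ((eq ,symbol ',eof))
             ,@body))))))
```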

## Result

We've got a nice little macro for easily iterating over files piece by piece.
It can take any reader function that conforms to the usual `(read-foo stream
eof-error-p eof-value)` interface, which means we can write our own reader
functions that will compose nicely with the macro.
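
For instance, here's one possible definition of the `read-integer` reader used in the usage examples earlier.  The post doesn't define it, so this is a hypothetical sketch that assumes one integer per line:

```lisp
(defun read-integer (&optional (stream *standard-input*) (eof-error-p t) eof-value)
  "Read one line from STREAM and parse it as an integer.
  Follows the (read-foo stream eof-error-p eof-value) convention, so it can be
  passed as a :READER.  Hypothetical helper; assumes one integer per line."
  ;; READ-LINE only returns NIL here at end of file (with EOF-ERROR-P false),
  ;; since every real line comes back as a (possibly empty) string.
  (let ((line (read-line stream eof-error-p nil)))
    (if (null line)
        eof-value
        (parse-integer line))))

(with-input-from-string (s (format nil "12~%-3~%"))
  (list (read-integer s nil :done)
        (read-integer s nil :done)
        (read-integer s nil :done)))
;; => (12 -3 :DONE)
```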