5510909795e6

Add do-file post
author Steve Losh <steve@stevelosh.com>
date Mon, 18 Apr 2022 21:22:22 -0400
parents 08283802d226
children edeb31bc40cc
branches/tags (none)
files content/blog/2022/04/fun-with-macros-do-file.markdown

Changes

--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/content/blog/2022/04/fun-with-macros-do-file.markdown	Mon Apr 18 21:22:22 2022 -0400
@@ -0,0 +1,243 @@
+(:title "Fun with Macros: Do-File"
+ :snip "Part 3 in a series of short posts about fun Common Lisp Macros."
+ :date "2022-04-19T13:15:00Z"
+ :draft t)
+
+It's been a while, but it's time to take a look at another fun little Common
+Lisp macro: `do-file`.
+
+<div id="toc"></div>
+
+## Usage
+
+The macro we'll be taking a look at today is called `do-file`.  It's used to
+open a file and iterate over the contents using a reader function, saving you
+some tedious boilerplate.
+
+First let's look at some examples of how you could use it.  Processing each
+line of a file is the default:
+
+```lisp
+(do-file (line "foo.txt")
+  (unless (string= "" line)
+    (write-line (string-upcase line))))
+```
+
+Using a different reader function and [another
+macro](/blog/2018/05/fun-with-macros-gathering/) to gather data from inside the
+iteration:
+
+```lisp
+(gathering
+  (do-file (n "numbers.txt" :reader #'read-integer)
+    (when (primep n)
+      (gather n))))
+```
+
+Passing along options to the underlying `open`, and returning early:
+
+```lisp
+(do-file (form "foo.lisp" :reader #'read :external-format :EBCDIC-US)
+  (when (eq form :stop)
+    (return :stopped-early))
+  (print form))
+```
+
+All of these could of course be done in other ways.  You could have a separate
+function that reads the file into a sequence and then pass that to `mapcar` or
+something else, but it can be wasteful to cons up the entire list if you're only
+going to process items and don't need to retain them (or if you're going to stop
+early).
+
+You could also write a `mapc-file` that takes a function instead of making this
+a macro, but sometimes it's nice to not have to wrap things in a thunk.  It's
+probably worth having that function as an additional tool in the toolbox though!
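+
+As a rough sketch of what such a function might look like (this `mapc-file` is
+hypothetical and not part of the macro presented below; it only assumes the
+usual `(read-foo stream eof-error-p eof-value)` reader interface):
+
+```lisp
+;; Hypothetical helper, not from the original post.
+(defun mapc-file (function path &key (reader #'read-line))
+  "Call FUNCTION on each value READER reads from the file at PATH.  Returns NIL."
+  (let ((eof (gensym "EOF")))   ; fresh sentinel that READER can never produce
+    (with-open-file (stream path :direction :input)
+      (loop :for item = (funcall reader stream nil eof)
+            :until (eq item eof)
+            :do (funcall function item)))))
+```
+
+Using `loop` is harmless in this version because the caller's code arrives as
+a function object rather than being spliced into the loop body.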
+
+## Implementation
+
+Here's the full implementation of the macro:
+
+```lisp
+(let ((eof (gensym "EOF")))
+  (defmacro do-file ((symbol path
+                      &rest open-options
+                      &key (reader '#'read-line) &allow-other-keys)
+                     &body body)
+    "Iterate over the contents of `path` using `reader`.
+
+    During iteration, `symbol` will be set to successive values read from the
+    file by `reader`.
+
+    `reader` can be any function that conforms to the usual reading interface,
+    i.e. anything that can handle `(read-foo stream eof-error-p eof-value)`.
+
+    Any keyword arguments other than `:reader` will be passed along to `open`.
+
+    If `nil` is used for one of the `:if-…` options to `open` and this results
+    in `open` returning `nil`, no iteration will take place.
+
+    An implicit block named `nil` surrounds the iteration, so `return` can be
+    used to terminate early.
+
+    Returns `nil`.
+
+    Examples:
+
+      (do-file (line \"foo.txt\")
+        (print line))
+
+      (do-file (form \"foo.lisp\" :reader #'read :external-format :EBCDIC-US)
+        (when (eq form :stop)
+          (return :stopped-early))
+        (print form))
+
+      (do-file (line \"does-not-exist.txt\" :if-does-not-exist nil)
+        (this-will-not-be-executed))
+
+    "
+    (let ((open-options (alexandria:remove-from-plist open-options :reader)))
+      (alexandria:with-gensyms (stream)
+        (alexandria:once-only (path reader)
+          `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+             (unwind-protect
+                 (do ((,symbol
+                       (funcall ,reader ,stream nil ',eof)
+                       (funcall ,reader ,stream nil ',eof)))
+                     ((eq ,symbol ',eof))
+                   ,@body)
+               (close ,stream))))))))
+```
+
+There are a few interesting things to talk about here.
+
+### Let Over Defmacro
+
+The very first line is unusual: instead of the `defmacro` being the top level
+form, we wrap it in a `let` to generate one single unique EOF sentinel object:
+
+```lisp
+(let ((eof (gensym "EOF")))
+  (defmacro do-file (…)
+    …))
+```
+
+We could put the `let` inside the macro, but then we'd be generating a separate
+EOF object for every use of the macro, which is wasteful.
+
+### &rest and &key
+
+Note how the argument list of the macro takes both `&rest` and `&key`
+arguments, and uses `&allow-other-keys` to let the macro take arbitrary keyword
+arguments:
+
+```lisp
+(defmacro do-file ((symbol path
+                    &rest open-options
+                    &key (reader '#'read-line) &allow-other-keys)
+                   &body body)
+  (let ((open-options (alexandria:remove-from-plist open-options :reader)))
+    …
+    `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+       …)))
+```
+
+We pass along any keyword arguments we get (aside from the special `:reader`
+argument for this macro) to `open`.  Using `&allow-other-keys` means we don't
+need to hardcode all the possible options to `open`, and also allows for
+additional implementation-specific options to be passed to `open` if the user
+wants.
+
+We could have omitted the keyword arguments entirely, taken the arguments as
+a raw `&rest`, and pulled out `:reader` ourselves with `getf`.  But doing it
+this way means we don't have to fiddle around doing that, and can also
+provide slightly nicer documentation in an editor when it shows the macro's
+argument list in the status bar.  We'll also get a nicer error if we
+accidentally pass an odd number of keyword arguments.
+
+One more thing before we move on: note the extra level of quoting for the
+`(reader '#'read-line)` default value.  It's important to remember that this is
+a *macro*, and so when someone writes `(do-file (… :reader #'foo) …)` the macro
+isn't getting the *function* `foo` because it's not evaluated yet, it's getting
+the *list* `(function foo)`.  But the default value is *evaluated* when the
+argument is missing, so we need the extra layer of quoting to make sure the
+result makes sense and matches what we'd be getting normally.
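+
+To see the difference, here's a tiny throwaway macro (the name `show-reader` is
+made up for this illustration) that uses the same `&key` default and simply
+expands into a quoted copy of whatever form it received:
+
+```lisp
+;; Hypothetical demo macro, not part of do-file.
+(defmacro show-reader ((&key (reader '#'read-line)))
+  "Expand into the quoted form that the macro received for READER."
+  `',reader)
+
+(show-reader ())                ; default was evaluated  => (FUNCTION READ-LINE)
+(show-reader (:reader #'read))  ; passed in unevaluated  => (FUNCTION READ)
+```
+
+Both cases hand the macro body a list of the same shape, which is what lets
+the rest of the macro splice `reader` into the expansion without caring
+whether it was supplied or defaulted.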
+
+### Macros Using Macros
+
+We use `with-gensyms` and `once-only` from Alexandria to maintain good hygiene
+in the macro.  We also use [`when-let`](/blog/2018/07/fun-with-macros-if-let/)
+to avoid some more boilerplate:
+
+```lisp
+(defmacro do-file (…)
+  (alexandria:with-gensyms (stream)
+    (alexandria:once-only (path reader)
+      `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+         (unwind-protect
+             (do …)
+           (close ,stream))))))
+```
+
+### Don't Loop
+
+Finally we get to the meat of the macro:
+
+```lisp
+(do ((,symbol
+      (funcall ,reader ,stream nil ',eof)
+      (funcall ,reader ,stream nil ',eof)))
+    ((eq ,symbol ',eof))
+  ,@body)
+```
+
+Unfortunately we need to use the tedious `do` instead of `loop` here to avoid an
+annoying bug: if we expanded into a `loop` call, and the user is calling this
+from their *own* loop, and they use `(loop-finish)` in the body code, then it
+would finish *our* loop instead of *their* loop, which would be very confusing.
+
+Imagine the user wrote this very contrived example:
+
+```lisp
+(defun find-the-cat (&rest paths)
+  (loop
+    :with result = nil
+    :for (path . remaining) :on paths
+    :for i :from 1
+    :do (do-file (line path)
+          (when (string= line "meow")
+            (setf result path)
+            (loop-finish))) ;; This should obviously go to the finally below.
+    :finally
+    (when result
+      (format t "Found cat after searching ~D files (did not search ~D other~:P)."
+              i (length remaining))
+      (return result))))
+```
+
+If `do-file` expanded into a `loop` form, then the `(loop-finish)` would only
+terminate *that* loop.
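+
+A standalone illustration of that scoping, using nothing but two nested
+`loop`s (the function name is made up for this sketch and has nothing to do
+with `do-file` itself):
+
+```lisp
+;; Hypothetical demo of how LOOP-FINISH binds to the innermost LOOP.
+(defun nested-loop-finish-demo ()
+  (loop :for x :in '(1 2 3)
+        :do (loop :repeat 10
+                  :do (loop-finish))   ; only finishes the inner loop
+        :collect x))
+
+(nested-loop-finish-demo)
+;; => (1 2 3), because the outer loop still visits every element
+```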
+
+The same issue kind of applies with the implicit block named `nil` around `do`.
+But this is much less surprising for a macro named `do-…`, and we've documented
+it in the docstring, so that's probably okay.
+
+### Repetition Allergies
+
+Using `do` here is a little annoying because the init form and the step form are
+exactly the same.  If you're allergic to repeating yourself you could use `#n=`
+and `#n#` reader macros to get around it:
+
+```lisp
+(do ((,symbol #1=(funcall ,reader ,stream nil ',eof) #1#))
+    ((eq ,symbol ',eof))
+  ,@body)
+```
+
+I find this more confusing than helpful, but to each their own.
+
+## Result
+
+We've got a nice little macro for easily iterating over files piece by piece.
+It can take any reader function that conforms to the usual `(read-foo stream
+eof-error-p eof-value)` interface, which means we can write our own reader
+functions that will compose nicely with the macro.
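+
+For example, the `read-integer` used in the usage section isn't shown in this
+post.  A minimal sketch of one way it might be written, assuming the file
+holds exactly one integer per line:
+
+```lisp
+;; Hypothetical reader function; assumes one integer per line.
+(defun read-integer (stream &optional (eof-error-p t) eof-value)
+  "Read one line from STREAM and parse it as an integer."
+  (let ((line (read-line stream eof-error-p eof-value)))
+    (if (eq line eof-value)
+        eof-value                 ; pass the caller's sentinel straight through
+        (parse-integer line))))
+```
+
+Because it accepts the same `eof-error-p` and `eof-value` arguments as
+`read-line`, it can be passed as `:reader #'read-integer` with no other
+changes.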