# HG changeset patch
# User Steve Losh
# Date 1650331342 14400
# Node ID 5510909795e69c4568a4a764e681bbefb75dbdeb
# Parent 08283802d226f13fd376b9650858f514f8aff8fc
Add do-file post

diff -r 08283802d226 -r 5510909795e6 content/blog/2022/04/fun-with-macros-do-file.markdown
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/content/blog/2022/04/fun-with-macros-do-file.markdown Mon Apr 18 21:22:22 2022 -0400
@@ -0,0 +1,243 @@
+(:title "Fun with Macros: Do-File"
+ :snip "Part 3 in a series of short posts about fun Common Lisp Macros."
+ :date "2022-04-19T13:15:00Z"
+ :draft t)
+
+It's been a while, but it's time to take a look at another fun little Common
+Lisp macro: `do-file`.
+
+
+## Usage
+
+The macro we'll be taking a look at today is called `do-file`. It's used to
+open a file and iterate over the contents using a reader function, saving you
+some tedious boilerplate.
+
+First let's look at some examples of how you could use it. Processing each
+line of a file is the default:
+
+```lisp
+(do-file (line "foo.txt")
+  (unless (string= "" line)
+    (write-line (string-upcase line))))
+```
+
+Using a different reader function and [another
+macro](/blog/2018/05/fun-with-macros-gathering/) to gather data from inside the
+iteration:
+
+```lisp
+(gathering
+  ;; The path is required; "numbers.txt" is just a placeholder.
+  (do-file (n "numbers.txt" :reader #'read-integer)
+    (when (primep n)
+      (gather n))))
+```
+
+Passing along options to the underlying `open`, and returning early:
+
+```lisp
+(do-file (form "foo.lisp" :reader #'read :external-format :EBCDIC-US)
+  (when (eq form :stop)
+    (return :stopped-early))
+  (print form))
+```
+
+All of these could of course be done in other ways. You could have a separate
+function that reads the file into a sequence and then pass that to `mapcar` or
+something else, but it can be wasteful to cons up the entire list if you're only
+going to process the items one at a time and don't need to retain them (or if
+you're going to stop early).
+
+You could also write a `mapc-file` that takes a function instead of making this
+a macro, but sometimes it's nice to not have to wrap things in a thunk. It's
+probably worth having that function as an additional tool in the toolbox though!
+
+## Implementation
+
+Here's the full implementation of the macro:
+
+```lisp
+(let ((eof (gensym "EOF")))
+  (defmacro do-file ((symbol path
+                      &rest open-options
+                      &key (reader '#'read-line) &allow-other-keys)
+                     &body body)
+    "Iterate over the contents of the file at `path` using `reader`.
+
+    During iteration, `symbol` will be set to successive values read from the
+    file by `reader`.
+
+    `reader` can be any function that conforms to the usual reading interface,
+    i.e. anything that can handle `(read-foo stream eof-error-p eof-value)`.
+
+    Any keyword arguments other than `:reader` will be passed along to `open`.
+
+    If `nil` is used for one of the `:if-…` options to `open` and this results
+    in `open` returning `nil`, no iteration will take place.
+
+    An implicit block named `nil` surrounds the iteration, so `return` can be
+    used to terminate early.
+
+    Returns `nil` unless `return` is used to return something else.
+
+    Examples:
+
+      (do-file (line \"foo.txt\")
+        (print line))
+
+      (do-file (form \"foo.lisp\" :reader #'read :external-format :EBCDIC-US)
+        (when (eq form :stop)
+          (return :stopped-early))
+        (print form))
+
+      (do-file (line \"does-not-exist.txt\" :if-does-not-exist nil)
+        (this-will-not-be-executed))
+
+    "
+    (let ((open-options (alexandria:remove-from-plist open-options :reader)))
+      (alexandria:with-gensyms (stream)
+        (alexandria:once-only (path reader)
+          `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+             (unwind-protect
+                 (do ((,symbol
+                       (funcall ,reader ,stream nil ',eof)
+                       (funcall ,reader ,stream nil ',eof)))
+                     ((eq ,symbol ',eof))
+                   ,@body)
+               (close ,stream))))))))
+```
+
+There are a few interesting things to talk about here.
+
+### Let Over Defmacro
+
+The very first line is unusual: instead of the `defmacro` being the top-level
+form, we wrap it in a `let` to generate a single unique EOF sentinel object:
+
+```lisp
+(let ((eof (gensym "EOF")))
+  (defmacro do-file (…)
+    …))
+```
+
+We could put the `let` inside the macro, but then we'd be generating a separate
+EOF object for every use of the macro, which is wasteful.
+
+### &rest and &key
+
+Note how the argument list of the macro takes both `&rest` and `&key`
+arguments, and uses `&allow-other-keys` to let the macro take arbitrary
+keyword arguments:
+
+```lisp
+(defmacro do-file ((symbol path
+                    &rest open-options
+                    &key (reader '#'read-line) &allow-other-keys)
+                   &body body)
+  (let ((open-options (alexandria:remove-from-plist open-options :reader)))
+    …
+    `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+       …)))
+```
+
+We pass along any keyword arguments we get (aside from the special `:reader`
+argument for this macro) to `open`. Using `&allow-other-keys` means we don't
+need to hardcode all the possible options to `open`, and also allows for
+additional implementation-specific options to be passed to `open` if the user
+wants.
+
+We could have omitted the keyword arguments entirely, taken the arguments as
+a raw `&rest`, and pulled out `:reader` ourselves with `getf`. But doing it
+this way means we don't have to fiddle around doing that, and it can also
+provide slightly nicer documentation in an editor when it shows the macro's
+argument list in the status bar. We'll also get a nicer error if we
+accidentally pass an odd number of keyword arguments.
+
+One more thing before we move on: note the extra level of quoting for the
+`(reader '#'read-line)` default value. It's important to remember that this is
+a *macro*, so when someone writes `(do-file (… :reader #'foo) …)` the macro
+isn't getting the *function* `foo`, because nothing has been evaluated yet; it's
+getting the *list* `(function foo)`. But the default value is *evaluated* when
+the argument is missing, so we need the extra layer of quoting to make sure the
+result makes sense and matches what we'd be getting normally.
+
+### Macros Using Macros
+
+We use `with-gensyms` and `once-only` from Alexandria to maintain good hygiene
+in the macro.
+We also use [`when-let`](/blog/2018/07/fun-with-macros-if-let/)
+to avoid some more boilerplate:
+
+```lisp
+(defmacro do-file (…)
+  (alexandria:with-gensyms (stream)
+    (alexandria:once-only (path reader)
+      `(when-let ((,stream (open ,path :direction :input ,@open-options)))
+         (unwind-protect
+             (do …)
+           (close ,stream))))))
+```
+
+### Don't Loop
+
+Finally we get to the meat of the macro:
+
+```lisp
+(do ((,symbol
+      (funcall ,reader ,stream nil ',eof)
+      (funcall ,reader ,stream nil ',eof)))
+    ((eq ,symbol ',eof))
+  ,@body)
+```
+
+Unfortunately we need to use the tedious `do` instead of `loop` here to avoid an
+annoying bug: if we expanded into a `loop` form, and the user is calling this
+from their *own* loop, and they use `(loop-finish)` in the body code, then it
+would finish *our* loop instead of *their* loop, which would be very confusing.
+
+Imagine the user wrote this very contrived example:
+
+```lisp
+(defun find-the-cat (&rest paths)
+  (loop
+    :with result = nil
+    :for (path . remaining) :on paths
+    :for i :from 1
+    :do (do-file (line path)
+          (when (string= line "meow")
+            (setf result path)
+            (loop-finish))) ;; This should obviously go to the finally below.
+    :finally
+    (when result
+      (format t "Found cat after searching ~D files (did not search ~D other~:P)."
+              i (length remaining))
+      (return result))))
+```
+
+If `do-file` expanded into a `loop` form, then the `(loop-finish)` would only
+terminate *that* inner loop, and the user's outer loop would carry on to the
+next path instead of jumping to its `:finally` clause.
+
+A similar issue applies to the implicit block named `nil` around `do`: a
+`return` in the body exits `do-file`, not the user's enclosing loop. But this is
+much less surprising for a macro named `do-…`, and we've documented it in the
+docstring, so that's probably okay.
+
+### Repetition Allergies
+
+Using `do` here is a little annoying because the init form and the step form are
+exactly the same.
+If you're allergic to repeating yourself you could use `#n=`
+and `#n#` reader macros to get around it:
+
+```lisp
+(do ((,symbol #1=(funcall ,reader ,stream nil ',eof) #1#))
+    ((eq ,symbol ',eof))
+  ,@body)
+```
+
+I find this more confusing than helpful, but to each their own.
+
+## Result
+
+We've got a nice little macro for easily iterating over files piece by piece.
+It can take any reader function that conforms to the usual `(read-foo stream
+eof-error-p eof-value)` interface, which means we can write our own reader
+functions that will compose nicely with the macro.
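
To make that last point concrete, here is a minimal sketch of a custom reader with the `(read-foo stream eof-error-p eof-value)` shape. The post uses a `read-integer` in one of its examples without showing a definition, so the version below is only an assumption about what such a function could look like (one integer per line). `read-all-integers` is likewise a hypothetical helper that calls the reader the same way the `do` form in `do-file` does.

```lisp
;; Hypothetical reader: one integer per line.  The name matches the usage
;; example in the post, but this definition is an assumption.
(defun read-integer (stream &optional (eof-error-p t) eof-value)
  "Read one line from STREAM and parse it as an integer.
At end of file, signal an error or return EOF-VALUE, like READ-LINE does."
  (let ((line (read-line stream eof-error-p eof-value)))
    (if (eq line eof-value)
        eof-value                 ; pass the caller's sentinel through untouched
        (parse-integer line))))

;; Exercise it on an in-memory stream, using the same calling convention as
;; the expansion of `do-file`: (funcall reader stream nil eof).
(defun read-all-integers (string)
  (with-input-from-string (in string)
    (loop :with eof = (gensym "EOF")
          :for n = (read-integer in nil eof)
          :until (eq n eof)
          :collect n)))

(read-all-integers (format nil "1~%2~%3~%")) ; => (1 2 3)
```

Because the sentinel comes back with its `eq` identity intact, a reader like this should slot into `do-file` as `:reader #'read-integer`. Blank or malformed lines would signal an error from `parse-integer`; whether to skip them or report them is a separate design choice.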