content/blog/2018/05/fun-with-macros-gathering.markdown @ ff107268c4e3

Links, etc
author Steve Losh <steve@stevelosh.com>
date Wed, 21 Aug 2024 09:10:34 -0400
parents f5556130bda1
children (none)
(
:title "Fun with Macros: Gathering"
:snip "Part 1 in a series of short posts about fun Common Lisp Macros."
:date "2018-05-21T16:05:00Z"
:draft nil

)

I haven't written anything in a while.  But after seeing the [metaprogramming
video on Computerphile](https://www.youtube.com/watch?v=dw-y3vNDRWk) the other
day I felt the urge to write about Lisp again, so I decided to do a small series
of posts on fun/useful little Common Lisp macros I've made over the past couple
of years.

<div id="toc"></div>

## The Macro

The first macro we'll look at is `gathering`:

```lisp
(defmacro gathering (&body body)
  "Run `body` to gather some things and return a fresh list of them.

  `body` will be executed with the symbol `gather` bound to a
  function of one argument.  Once `body` has finished, a list of
  everything `gather` was called on will be returned.

  It's handy for pulling results out of code that executes
  procedurally and doesn't return anything, like `maphash` or
  Alexandria's `map-permutations`.

  The `gather` function can be passed to other functions, but should
  not be retained once the `gathering` form has returned (it would
  be useless to do so anyway).

  Examples:

    (gathering
      (dotimes (i 5)
        (gather i))
    =>
    (0 1 2 3 4)

    (gathering
      (mapc #'gather '(1 2 3))
      (mapc #'gather '(a b)))
    =>
    (1 2 3 a b)

  "
  (alexandria:with-gensyms (result)
    `(let ((,result nil))
       (flet ((gather (item)
                (push item ,result)
                item))
         ,@body)
       (nreverse ,result))))
```

## Notes

As the docstring mentions, sometimes you'll encounter procedural code that
iterates over things but doesn't return any results (CL's `maphash` and
Alexandria's `map-permutations` are two examples).  The `gathering` macro
provides an easy way to plug into the guts of the iteration and get a list back.
The docstring describes how to use the macro, but there's a couple of extra
things to note before we move on.

### Default LET Bindings

I'm aware that the `(let ((,result nil)) ...)` in the macro could be simplified
to `(let (,result) ...)` because bindings without an initial value default to
`nil`.  I just personally dislike that style and prefer to be explicit about the
initial values, even when they're `nil`.

### Hygiene

The `flet`ed function that actually does the gathering is named as the symbol
`gather`, not a gensym.  Some folks will dislike that because it captures the
function name.  I don't mind it.  I consider that behavior part of the
intended/documented API of the macro, and because the symbol `gather` is coming
from the macro's package you'll need to import it (or package qualify it) to
access it.  

If you want to provide a bit more safety here you could potentially define
a vanilla function for `gather` like this:

```lisp
(defun gather (value)
  "Gather `value` into the result.  Must be called from inside `gathering`."
  (error "GATHER be called from within a GATHERING macro."))
```

Doing this would mean calling `gather` outside a `gathering` block would signal
an error, and someone defining their *own* `gather` function (which could
accidentally get shadowed by the macro's `flet` later) would get a warning about
redefining the function, which would hopefully make them realize the potential
conflict earlier.

Of course you could also write your own version of `gathering` that takes in
a symbol to use for the function, if you prefer.  That would avoid all of these
issues, at the cost of making every `(gathering ...)` slightly more verbose.

### Data Structures

My actual implementation of this macro uses a simple queue data structure
instead of a list to avoid having to reverse the result at the end, but I didn't
want to include an extra dependency just for this blog post.

### Dynamic Extent

Feel free to skip this section.  It gets a little bit into the weeds.

One potential optimization we could make is to declare the `gather` function to
have [dynamic extent][].  This means that some implementations (e.g. SBCL) [can
stack allocate the closure][sbcl dynamic extent] and save a heap allocation.

```lisp
(defmacro gathering-dynamic-extent (&body body)
  (alexandria:with-gensyms (result)
    `(let ((,result nil))
       (flet ((gather (item)
                (push item ,result)
                item))
          (declare (dynamic-extent #'gather)) ; NEW
         ,@body)
       (nreverse ,result))))
```

Whether this matters much depends on the usage patterns.  If you use `gathering`
with some code that typically gathers no elements (and only occasionally gathers
a few), you won't have to heap allocate *anything* for the common cases, and
things will be a little more efficient.  If you typically gather a bunch of
things then you're heap allocating all the cons cells anyway, so allocating the
closure wouldn't be a big deal.

If you want to actually see the stack allocation in action, it might be a little
trickier than you think.  SBCL is pretty smart about inlining things, so it
might not allocate the closure at runtime *at all*, even *without* the
`dynamic-extent` declaration:

```lisp
(disassemble (lambda ()
               (gathering
                 (mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 149 bytes. Origin: #x10040B8340
; 40:       498B4C2460       MOV RCX, [R12+96]      ; no-arg-parsing entry point
                                                    ; thread.binding-stack-pointer
; 45:       48894DF8         MOV [RBP-8], RCX
; 49:       BB17001020       MOV EBX, #x20100017    ; NIL
; 4E:       488B0D9BFFFFFF   MOV RCX, [RIP-101]     ; '(1 2 3)
; 55:       EB4D             JMP L3
; 57:       660F1F840000000000 NOP
; 60: L0:   4881FA17001020   CMP RDX, #x20100017    ; NIL
; 67:       744A             JEQ L4
; 69:       488B71F9         MOV RSI, [RCX-7]
; 6D:       49896C2440       MOV [R12+64], RBP      ; thread.pseudo-atomic-bits
; 72:       4D8B5C2418       MOV R11, [R12+24]      ; thread.alloc-region
; 77:       498D5310         LEA RDX, [R11+16]
; 7B:       493B542420       CMP RDX, [R12+32]
; 80:       7746             JNBE L5
; 82:       4989542418       MOV [R12+24], RDX      ; thread.alloc-region
; 87: L1:   498D5307         LEA RDX, [R11+7]
; 8B:       49316C2440       XOR [R12+64], RBP      ; thread.pseudo-atomic-bits
; 90:       7403             JEQ L2
; 92:       0F0B09           BREAK 9                ; pending interrupt trap
; 95: L2:   488972F9         MOV [RDX-7], RSI
; 99:       48895A01         MOV [RDX+1], RBX
; 9D:       488BDA           MOV RBX, RDX
; A0:       488B4901         MOV RCX, [RCX+1]
; A4: L3:   488BD1           MOV RDX, RCX
; A7:       8D41F9           LEA EAX, [RCX-7]
; AA:       A80F             TEST AL, 15
; AC:       74B2             JEQ L0
; AE:       0F0B0A           BREAK 10               ; error trap
; B1:       2F               BYTE #X2F              ; OBJECT-NOT-LIST-ERROR
; B2:       10               BYTE #X10              ; RDX
; B3: L4:   488BD3           MOV RDX, RBX
; B6:       B902000000       MOV ECX, 2
; BB:       FF7508           PUSH QWORD PTR [RBP+8]
; BE:       B8B8733120       MOV EAX, #x203173B8    ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; C3:       FFE0             JMP RAX
; C5:       0F0B10           BREAK 16               ; Invalid argument count trap
; C8: L5:   6A10             PUSH 16
; CA:       41BB3000B021     MOV R11D, #x21B00030   ; ALLOC-TO-R11
; D0:       41FFD3           CALL R11
; D3:       EBB2             JMP L1
NIL
```

Here SBCL [knows what `mapc` is][package lock] and expands it inline (notice how
there's no function call to `mapc`).  It then realizes it never needs to
allocate a closure for `gather` at all, it can just inline that too!  All the
allocation stuff in that disassembly is for the `push`ing of new cons cells (to
convince yourself of this, run `(disassemble (lambda (&aux result) (push
1 result) result))` and compare the assembly).

But if we make our own version of `mapc`:

```lisp
(defun my-mapc (function list)
  (mapc function list))
```

Now we can see a difference, because SBCL won't necessarily know it can
stack allocate the closure unless we tell it.  Here's what the disassembly
looks like *without* the `dynamic-extent` inside `gathering`:

```lisp
(disassemble (lambda ()
               (gathering
                 (my-mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 212 bytes. Origin: #x1004331A20
; 20:       498B4C2460       MOV RCX, [R12+96]         ; no-arg-parsing entry point
                                                       ; thread.binding-stack-pointer
; 25:       48894DF8         MOV [RBP-8], RCX
; 29:       B817001020       MOV EAX, #x20100017       ; NIL
; 2E:       49896C2440       MOV [R12+64], RBP         ; thread.pseudo-atomic-bits
; 33:       4D8B5C2418       MOV R11, [R12+24]         ; thread.alloc-region
; 38:       498D5B10         LEA RBX, [R11+16]
; 3C:       493B5C2420       CMP RBX, [R12+32]
; 41:       0F874D010000     JNBE #x1004331B94
; 47:       49895C2418       MOV [R12+24], RBX         ; thread.alloc-region
; 4C:       498D5B0F         LEA RBX, [R11+15]
; 50:       66C743F14101     MOV WORD PTR [RBX-15], 321
; 56:       488943F9         MOV [RBX-7], RAX
; 5A:       49316C2440       XOR [R12+64], RBP         ; thread.pseudo-atomic-bits
; 5F:       7403             JEQ L0
; 61:       0F0B09           BREAK 9                   ; pending interrupt trap
; 64: L0:   48895DE8         MOV [RBP-24], RBX
; 68:       49896C2440       MOV [R12+64], RBP         ; thread.pseudo-atomic-bits
; 6D:       4D8B5C2418       MOV R11, [R12+24]         ; thread.alloc-region
; 72:       498D4B20         LEA RCX, [R11+32]
; 76:       493B4C2420       CMP RCX, [R12+32]
; 7B:       0F8723010000     JNBE #x1004331BA4
; 81:       49894C2418       MOV [R12+24], RCX         ; thread.alloc-region
; 86:       498D4B0B         LEA RCX, [R11+11]
; 8A:       B835020000       MOV EAX, 565
; 8F:       480B042508011020 OR RAX, [#x20100108]      ; SB-VM:FUNCTION-LAYOUT
; 97:       488941F5         MOV [RCX-11], RAX
; 9B:       488D058E000000   LEA RAX, [RIP+142]        ; = #x1004331B30
; A2:       488941FD         MOV [RCX-3], RAX
; A6:       49316C2440       XOR [R12+64], RBP         ; thread.pseudo-atomic-bits
; AB:       7403             JEQ L1
; AD:       0F0B09           BREAK 9                   ; pending interrupt trap
; B0: L1:   48895905         MOV [RCX+5], RBX
; B4:       488BD1           MOV RDX, RCX
; B7:       488D4424F0       LEA RAX, [RSP-16]
; BC:       4883EC10         SUB RSP, 16
; C0:       488B3DE9FEFFFF   MOV RDI, [RIP-279]        ; '(1 2 3)
; C7:       B904000000       MOV ECX, 4
; CC:       488928           MOV [RAX], RBP
; CF:       488BE8           MOV RBP, RAX
; D2:       B838734F20       MOV EAX, #x204F7338       ; #<FDEFN MY-MAPC>
; D7:       FFD0             CALL RAX
; D9:       480F42E3         CMOVB RSP, RBX
; DD:       488B5DE8         MOV RBX, [RBP-24]
; E1:       488B53F9         MOV RDX, [RBX-7]
; E5:       B902000000       MOV ECX, 2
; EA:       FF7508           PUSH QWORD PTR [RBP+8]
; ED:       B8B8733120       MOV EAX, #x203173B8       ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; F2:       FFE0             JMP RAX
NIL
```

Now we can see it's allocating the closure on the heap (note the
`SB-VM:FUNCTION-LAYOUT`).

If we add the `dynamic-extent` back into `gathering`:

```lisp
(disassemble (lambda ()
               (gathering-dynamic-extent
                 (my-mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 136 bytes. Origin: #x10043F46E0
; 6E0:       498B4C2460       MOV RCX, [R12+96]      ; no-arg-parsing entry point
                                                     ; thread.binding-stack-pointer
; 6E5:       48894DF8         MOV [RBP-8], RCX
; 6E9:       48C745E817001020 MOV QWORD PTR [RBP-24], #x20100017  ; NIL
; 6F1:       488BDC           MOV RBX, RSP
; 6F4:       48895DE0         MOV [RBP-32], RBX
; 6F8:       4883EC20         SUB RSP, 32
; 6FC:       4883E4F0         AND RSP, -16
; 700:       488D4C240B       LEA RCX, [RSP+11]
; 705:       B835020000       MOV EAX, 565
; 70A:       480B042508011020 OR RAX, [#x20100108]   ; SB-VM:FUNCTION-LAYOUT
; 712:       488941F5         MOV [RCX-11], RAX
; 716:       488D0583000000   LEA RAX, [RIP+131]     ; = #x10043F47A0
; 71D:       488941FD         MOV [RCX-3], RAX
; 721:       48896905         MOV [RCX+5], RBP
; 725:       488BD1           MOV RDX, RCX
; 728:       488D4424F0       LEA RAX, [RSP-16]
; 72D:       4883EC10         SUB RSP, 16
; 731:       488B3D38FFFFFF   MOV RDI, [RIP-200]     ; '(1 2 3)
; 738:       B904000000       MOV ECX, 4
; 73D:       488928           MOV [RAX], RBP
; 740:       488BE8           MOV RBP, RAX
; 743:       B838734F20       MOV EAX, #x204F7338    ; #<FDEFN MY-MAPC>
; 748:       FFD0             CALL RAX
; 74A:       480F42E3         CMOVB RSP, RBX
; 74E:       488B5DE0         MOV RBX, [RBP-32]
; 752:       488BE3           MOV RSP, RBX
; 755:       488B55E8         MOV RDX, [RBP-24]
; 759:       B902000000       MOV ECX, 2
; 75E:       FF7508           PUSH QWORD PTR [RBP+8]
; 761:       B8B8733120       MOV EAX, #x203173B8    ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; 766:       FFE0             JMP RAX
NIL
```

Much nicer.  However, this optimization comes with a price: safety.

The `gather` closure should never be called outside of the `gathering` block
that defines it.  As the docstring says: it would be useless to do so anyway,
because the result has already been returned.  But what would happen if the user
accidentally calls the closure?  Let's try it out, first on the original
version without the `dynamic-extent` declaration:

```lisp
(defparameter *f* nil)

(defun leak (function)
  (setf *f* function))

(gathering (leak #'gather))

(funcall *f* 1)

; => 1
```

Here the closure is heap-allocated, so calling it later is fine (if useless).
But what happens if we try our optimized version?

```lisp
(gathering-dynamic-extent (leak #'gather))

(funcall *f*)

; CORRUPTION WARNING in SBCL pid 19575(tid 0x7fffb6814380):
; Memory fault at 0xffffffff849ff8e7 (pc=0x19ff8df, sp=0x19ff890)
; The integrity of this image is possibly compromised.
; Continuing with fingers crossed.
; 
; debugger invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread
; #<THREAD "main thread" RUNNING {1001936BF3}>:
;   Unhandled memory fault at #xFFFFFFFF849FF8E7.
; 
; restarts (invokable by number or by possibly-abbreviated name):
;   0: [ABORT] Exit debugger, returning to top level.
; 
; (SB-SYS:MEMORY-FAULT-ERROR)
; 0]
```

Things get even worse if you're brave (foolish) enough to be running with
`(declaim (safety 0))`:

```lisp
(gathering-dynamic-extent (leak #'gather))

(funcall *f*)

; debugger invoked on a SB-KERNEL:FLOATING-POINT-EXCEPTION in thread
; #<THREAD "main thread" RUNNING {1001936BF3}>:
;   An arithmetic error SB-KERNEL:FLOATING-POINT-EXCEPTION was signalled.
; No traps are enabled? How can this be?
; 
; 
; restarts (invokable by number or by possibly-abbreviated name):
;   0: [ABORT] Exit debugger, returning to top level.
; 
; ("bogus stack frame")
; 0]
```

The moral of this story is that although we can optimize for a little bit of
speed, it comes at a price that might not be worth paying.

Here's an exercise for you: make the original heap-allocated version signal an
error (with a nice error message) when called outside of its `gathering` block,
instead of silently doing something useless.

[dynamic extent]: http://clhs.lisp.se/Body/d_dynami.htm
[sbcl dynamic extent]: http://www.sbcl.org/manual/#Dynamic_002dextent-allocation
[package lock]: http://www.lispworks.com/documentation/lw50/CLHS/Body/11_abab.htm

## Examples

Let's finish up by looking at some places where I've found this macro to be
handy.  I won't go into too much depth about the individual pieces of code, but
feel free to ask if you have questions.

### Pandigitals

First up, an example from my Project Euler `utils.lisp` file:

```lisp
(defun pandigitals (&optional (start 1) (end 9))
  "Return a list of all `start` to `end` (inclusive) pandigital numbers."
  (gathering
    (map-permutations
      (lambda (digits)
          ;; 0-to-n pandigitals are annoying because we don't want
          ;; to include those with a 0 first.
          (unless (zerop (first digits))
            (gather (digits-to-number digits))))
      (irange start end)
      :copy nil)))
```

This is a prime example of where Alexandria's `map-permutations` not returning
anything is annoying, but also shows how `gathering` provides you a little more
flexibility.  Even if `map-permutations` returned a list of results, we'd still
need to filter out the ones that start with zero.  With `gathering` we can avoid
collecting the unneeded results at all by simply not calling `gather` on them.

### Hash Table Contents

Next is an easy way to convert a hash table to lists of `(key value)`:

```lisp
(defun hash-table-contents (hash-table)
  "Return a fresh list of `(key value)` elements of `hash-table`."
  (gathering (maphash (compose #'gather #'list) hash-table)))
```

Because `gather` is a `flet`ed function we can use it like any other function.
Here we `compose` it with `list` and pass it off to `maphash` (which, for some
reason, doesn't return its results).

### Triangle Generation

We'll end with a triangle-generation function from [my procedural art
bot](https://twitter.com/bit_loom/):

```lisp
(defun generate-universe-balancing (depth)
  (gathering
    (labels ((should-stop-p (iteration)
               (or (= depth iteration)
                   (and (> iteration 6)
                        (randomp (map-range 0 depth
                                            0.0 0.05
                                            iteration)
                                 #'rand))))
             (recur (triangle &optional (iteration 0))
               (if (should-stop-p iteration)
                 (gather triangle)
                 (map nil (rcurry #'recur (1+ iteration))
                      (split-triangle-self-balancing triangle)))))
      (map nil #'recur (initial-triangles)))))
```

This is used to generate the triangles for images like this:

[![Recursive triangles](/static/images/blog/2018/05/triangles.jpeg)](/static/images/blog/2018/05/triangles.jpeg)

Using `gathering` lets me write the recursive generation algorithm in a natural
way, and just plug in a simple `(gather triangle)` when we finally bottom out at
the base case.