content/blog/2018/05/fun-with-macros-gathering.markdown @ 42973f4a80ce

Add `gathering` entry
author Steve Losh <steve@stevelosh.com>
date Sun, 20 May 2018 18:51:20 -0400
parents (none)
children e32eb216565e
+++
title = "Fun with Macros: Gathering"
snip = "Part 1 in a series of small posts about fun Common Lisp Macros"
date = 2018-05-21T16:40:00Z
draft = true

+++

I haven't written anything in a while.  But after seeing the [metaprogramming
video on Computerphile](https://www.youtube.com/watch?v=dw-y3vNDRWk) the other
day I felt the urge to write about Lisp again, so I decided to do a small series
of posts on fun/useful little Common Lisp macros I've made over the past couple
of years.

<div id="toc"></div>

## The Macro

The first macro we'll look at is `gathering`:

```lisp
(defmacro gathering (&body body)
  "Run `body` to gather some things and return a fresh list of them.

  `body` will be executed with the symbol `gather` bound to a
  function of one argument.  Once `body` has finished, a list of
  everything `gather` was called on will be returned.

  It's handy for pulling results out of code that executes
  procedurally and doesn't return anything, like `maphash` or
  Alexandria's `map-permutations`.

  The `gather` function can be passed to other functions, but should
  not be retained once the `gathering` form has returned (it would
  be useless to do so anyway).

  Examples:

    (gathering
      (dotimes (i 5)
        (gather i))
    =>
    (0 1 2 3 4)

    (gathering
      (mapc #'gather '(1 2 3))
      (mapc #'gather '(a b)))
    =>
    (1 2 3 a b)

  "
  (alexandria:with-gensyms (result)
    `(let ((,result nil))
       (flet ((gather (item)
                (push item ,result)))
         (declare (dynamic-extent #'gather))
         ,@body)
       (nreverse ,result))))
```

## Notes

As the docstring mentions, sometimes you'll encounter procedural code that
iterates over things, but doesn't gather up and return any results (CL's
`maphash` and Alexandria's `map-permutations` are two examples).  The
`gathering` macro provides an easy way to plug into the guts of the iteration
and get a list back.

The docstring describes how to use the macro, but there's a couple of extra
things to note before we move on.

### Default LET Bindings

I'm aware that the `(let ((,result nil)) ...)` in the macro could be simplified
to `(let (,result) ...)` because bindings without an initial value default to
`nil`.  I just personally dislike that style and prefer to be explicit about the
initial values, even when they're `nil`.

### Hygiene

The `flet`ed function that actually does the gathering is named as the symbol
`gather`, not a gensym.  Some folks will dislike that because it captures the
function name.

I don't mind it.  I consider that behavior part of the intended/documented API
of the macro, and because the symbol `gather` is coming from the macro's package
you'll need to import it (or package qualify it) to access it.

If you want to provide a bit more safety here you could potentially define
a vanilla function for `gather` like this:

```lisp
(defun gather (value)
  "Gather `value` into the resulting list.  Must be called from inside `gathering`."
  (error "GATHER be called from within a GATHERING macro."))
```

Doing this would mean calling `gather` outside a `gathering` block would signal
an error, and someone defining their *own* `gather` function (which could
accidentally get shadowed by the macro's `flet` later) would get a warning about
redefining the function, which would hopefully make them realize the potential
conflict earlier.

Of course you could write your own version of `gathering` that takes in a symbol
to use for the function, if you prefer.

### Data Structures

My actual implementation of this macro uses a simple queue data structure
instead of a list to avoid having to reverse the result at the end, but I didn't
want to include an extra dependency just for this blog post.

### Dynamic Extent

Feel free to skip this section.  It gets a little bit into the weeds.

You might have noticed that the `gather` function is declared to have [dynamic
extent][].  This means that some implementations (e.g. SBCL) [can stack allocate
the closure][sbcl dynamic extent] and save a heap allocation.

Whether this matters much depends on the usage patterns.  If you use `gathering`
with some code that typically gathers no elements (and only occasionally gathers
a few), you won't have to heap allocate *anything* for the common cases, and
things will be a little more efficient.  If you typically gather a bunch of
things then you're heap allocating all the cons cells anyway, so allocating the
closure wouldn't be a big deal.

I decided to go ahead and enable the efficiency optimization.  It doesn't make
any sense to be calling the closure once `gathering` has returned, so why not?

If you want to actually see the stack allocation in action, it might be a little
trickier than you think.  SBCL is pretty smart about inlining things, so it
might not allocate the closure at runtime *at all*, even *without* the
`dynamic-extent` declaration:

```lisp
(disassemble (lambda ()
               (gathering
                 (mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 149 bytes. Origin: #x10040B8340
; 40:       498B4C2460       MOV RCX, [R12+96]      ; no-arg-parsing entry point
                                                    ; thread.binding-stack-pointer
; 45:       48894DF8         MOV [RBP-8], RCX
; 49:       BB17001020       MOV EBX, #x20100017    ; NIL
; 4E:       488B0D9BFFFFFF   MOV RCX, [RIP-101]     ; '(1 2 3)
; 55:       EB4D             JMP L3
; 57:       660F1F840000000000 NOP
; 60: L0:   4881FA17001020   CMP RDX, #x20100017    ; NIL
; 67:       744A             JEQ L4
; 69:       488B71F9         MOV RSI, [RCX-7]
; 6D:       49896C2440       MOV [R12+64], RBP      ; thread.pseudo-atomic-bits
; 72:       4D8B5C2418       MOV R11, [R12+24]      ; thread.alloc-region
; 77:       498D5310         LEA RDX, [R11+16]
; 7B:       493B542420       CMP RDX, [R12+32]
; 80:       7746             JNBE L5
; 82:       4989542418       MOV [R12+24], RDX      ; thread.alloc-region
; 87: L1:   498D5307         LEA RDX, [R11+7]
; 8B:       49316C2440       XOR [R12+64], RBP      ; thread.pseudo-atomic-bits
; 90:       7403             JEQ L2
; 92:       0F0B09           BREAK 9                ; pending interrupt trap
; 95: L2:   488972F9         MOV [RDX-7], RSI
; 99:       48895A01         MOV [RDX+1], RBX
; 9D:       488BDA           MOV RBX, RDX
; A0:       488B4901         MOV RCX, [RCX+1]
; A4: L3:   488BD1           MOV RDX, RCX
; A7:       8D41F9           LEA EAX, [RCX-7]
; AA:       A80F             TEST AL, 15
; AC:       74B2             JEQ L0
; AE:       0F0B0A           BREAK 10               ; error trap
; B1:       2F               BYTE #X2F              ; OBJECT-NOT-LIST-ERROR
; B2:       10               BYTE #X10              ; RDX
; B3: L4:   488BD3           MOV RDX, RBX
; B6:       B902000000       MOV ECX, 2
; BB:       FF7508           PUSH QWORD PTR [RBP+8]
; BE:       B8B8733120       MOV EAX, #x203173B8    ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; C3:       FFE0             JMP RAX
; C5:       0F0B10           BREAK 16               ; Invalid argument count trap
; C8: L5:   6A10             PUSH 16
; CA:       41BB3000B021     MOV R11D, #x21B00030   ; ALLOC-TO-R11
; D0:       41FFD3           CALL R11
; D3:       EBB2             JMP L1
NIL
```

Here SBCL [knows what `mapc` is][package lock] and expands it inline (notice how
there's no function call to `mapc`).  It then realizes it never needs to
allocate a closure for `gather` at all, it can just inline that too!  All the
allocation stuff in that disassembly is for the `push`ing of new cons cells (to
convince yourself of this, run `(disassemble (lambda (&aux result) (push
1 result) result))` and compare the assembly).

But if we make our own version of `mapc`:

```lisp
(defun my-mapc (function list)
  (mapc function list))
```

Now we can see a difference, because SBCL won't necessarily know it can
stack allocate the closure unless we tell it.  Here's what the disassembly
looks like *without* the `dynamic-extent` inside `gathering`:

```lisp
(disassemble (lambda ()
               (gathering
                 (my-mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 212 bytes. Origin: #x1004331A20
; 20:       498B4C2460       MOV RCX, [R12+96]         ; no-arg-parsing entry point
                                                       ; thread.binding-stack-pointer
; 25:       48894DF8         MOV [RBP-8], RCX
; 29:       B817001020       MOV EAX, #x20100017       ; NIL
; 2E:       49896C2440       MOV [R12+64], RBP         ; thread.pseudo-atomic-bits
; 33:       4D8B5C2418       MOV R11, [R12+24]         ; thread.alloc-region
; 38:       498D5B10         LEA RBX, [R11+16]
; 3C:       493B5C2420       CMP RBX, [R12+32]
; 41:       0F874D010000     JNBE #x1004331B94
; 47:       49895C2418       MOV [R12+24], RBX         ; thread.alloc-region
; 4C:       498D5B0F         LEA RBX, [R11+15]
; 50:       66C743F14101     MOV WORD PTR [RBX-15], 321
; 56:       488943F9         MOV [RBX-7], RAX
; 5A:       49316C2440       XOR [R12+64], RBP         ; thread.pseudo-atomic-bits
; 5F:       7403             JEQ L0
; 61:       0F0B09           BREAK 9                   ; pending interrupt trap
; 64: L0:   48895DE8         MOV [RBP-24], RBX
; 68:       49896C2440       MOV [R12+64], RBP         ; thread.pseudo-atomic-bits
; 6D:       4D8B5C2418       MOV R11, [R12+24]         ; thread.alloc-region
; 72:       498D4B20         LEA RCX, [R11+32]
; 76:       493B4C2420       CMP RCX, [R12+32]
; 7B:       0F8723010000     JNBE #x1004331BA4
; 81:       49894C2418       MOV [R12+24], RCX         ; thread.alloc-region
; 86:       498D4B0B         LEA RCX, [R11+11]
; 8A:       B835020000       MOV EAX, 565
; 8F:       480B042508011020 OR RAX, [#x20100108]      ; SB-VM:FUNCTION-LAYOUT
; 97:       488941F5         MOV [RCX-11], RAX
; 9B:       488D058E000000   LEA RAX, [RIP+142]        ; = #x1004331B30
; A2:       488941FD         MOV [RCX-3], RAX
; A6:       49316C2440       XOR [R12+64], RBP         ; thread.pseudo-atomic-bits
; AB:       7403             JEQ L1
; AD:       0F0B09           BREAK 9                   ; pending interrupt trap
; B0: L1:   48895905         MOV [RCX+5], RBX
; B4:       488BD1           MOV RDX, RCX
; B7:       488D4424F0       LEA RAX, [RSP-16]
; BC:       4883EC10         SUB RSP, 16
; C0:       488B3DE9FEFFFF   MOV RDI, [RIP-279]        ; '(1 2 3)
; C7:       B904000000       MOV ECX, 4
; CC:       488928           MOV [RAX], RBP
; CF:       488BE8           MOV RBP, RAX
; D2:       B838734F20       MOV EAX, #x204F7338       ; #<FDEFN MY-MAPC>
; D7:       FFD0             CALL RAX
; D9:       480F42E3         CMOVB RSP, RBX
; DD:       488B5DE8         MOV RBX, [RBP-24]
; E1:       488B53F9         MOV RDX, [RBX-7]
; E5:       B902000000       MOV ECX, 2
; EA:       FF7508           PUSH QWORD PTR [RBP+8]
; ED:       B8B8733120       MOV EAX, #x203173B8       ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; F2:       FFE0             JMP RAX
NIL
```

Now we can see it's allocating the closure on the heap (note the
`SB-VM:FUNCTION-LAYOUT`).

If we add the `dynamic-extent` back into `gathering`:

```lisp
(disassemble (lambda ()
               (gathering
                 (my-mapc #'gather '(1 2 3)))))

; disassembly for (LAMBDA ())
; Size: 136 bytes. Origin: #x10043F46E0
; 6E0:       498B4C2460       MOV RCX, [R12+96]      ; no-arg-parsing entry point
                                                     ; thread.binding-stack-pointer
; 6E5:       48894DF8         MOV [RBP-8], RCX
; 6E9:       48C745E817001020 MOV QWORD PTR [RBP-24], #x20100017  ; NIL
; 6F1:       488BDC           MOV RBX, RSP
; 6F4:       48895DE0         MOV [RBP-32], RBX
; 6F8:       4883EC20         SUB RSP, 32
; 6FC:       4883E4F0         AND RSP, -16
; 700:       488D4C240B       LEA RCX, [RSP+11]
; 705:       B835020000       MOV EAX, 565
; 70A:       480B042508011020 OR RAX, [#x20100108]   ; SB-VM:FUNCTION-LAYOUT
; 712:       488941F5         MOV [RCX-11], RAX
; 716:       488D0583000000   LEA RAX, [RIP+131]     ; = #x10043F47A0
; 71D:       488941FD         MOV [RCX-3], RAX
; 721:       48896905         MOV [RCX+5], RBP
; 725:       488BD1           MOV RDX, RCX
; 728:       488D4424F0       LEA RAX, [RSP-16]
; 72D:       4883EC10         SUB RSP, 16
; 731:       488B3D38FFFFFF   MOV RDI, [RIP-200]     ; '(1 2 3)
; 738:       B904000000       MOV ECX, 4
; 73D:       488928           MOV [RAX], RBP
; 740:       488BE8           MOV RBP, RAX
; 743:       B838734F20       MOV EAX, #x204F7338    ; #<FDEFN MY-MAPC>
; 748:       FFD0             CALL RAX
; 74A:       480F42E3         CMOVB RSP, RBX
; 74E:       488B5DE0         MOV RBX, [RBP-32]
; 752:       488BE3           MOV RSP, RBX
; 755:       488B55E8         MOV RDX, [RBP-24]
; 759:       B902000000       MOV ECX, 2
; 75E:       FF7508           PUSH QWORD PTR [RBP+8]
; 761:       B8B8733120       MOV EAX, #x203173B8    ; #<FDEFN SB-IMPL::LIST-NREVERSE>
; 766:       FFE0             JMP RAX
NIL
```

Much nicer.

[dynamic extent]: http://clhs.lisp.se/Body/d_dynami.htm
[sbcl dynamic extent]: http://www.sbcl.org/manual/#Dynamic_002dextent-allocation
[package lock]: http://www.lispworks.com/documentation/lw50/CLHS/Body/11_abab.htm

## Examples

Let's finish up by looking at some places where I've found this macro to be
handy.  I won't go into too much depth about the individual pieces of code, but
feel free to ask if you have questions.

### Pandigitals

First up, an example from my Project Euler `utils.lisp` file:

```lisp
(defun pandigitals (&optional (start 1) (end 9))
  "Return a list of all `start` to `end` (inclusive) pandigital numbers."
  (gathering
    (map-permutations
      (lambda (digits)
          ;; 0-to-n pandigitals are annoying because we don't want
          ;; to include those with a 0 first.
          (unless (zerop (first digits))
            (gather (digits-to-number digits))))
      (irange start end)
      :copy nil)))
```

This is a prime example of where Alexandria's `map-permutations` not returning
anything is annoying, but also shows how `gathering` provides you a little more
flexibility.  Even if `map-permutations` returned a list of results, we'd still
need to filter out the ones that start with zero.  With `gathering` we can avoid
collecting the unneeded results at all by simply not calling `gather` on them.

### Hash Table Contents

Next is an easy way to convert a hash table to lists of `(key value)`:

```lisp
(defun hash-table-contents (hash-table)
  "Return a fresh list of `(key value)` elements of `hash-table`."
  (gathering (maphash (compose #'gather #'list) hash-table)))
```

Because `gather` is a `flet`ed function we can use it like any other function.
Here we `compose` it with `list` and pass it off to `maphash` (which, for some
reason, doesn't return its results).

### Triangle Generation

We'll end with a triangle-generation function from [my procedural art
bot](https://twitter.com/bit_loom/):

```lisp
(defun generate-universe-balancing (depth)
  (gathering
    (labels ((should-stop-p (iteration)
               (or (= depth iteration)
                   (and (> iteration 6)
                        (randomp (map-range 0 depth
                                            0.0 0.05
                                            iteration)
                                 #'rand))))
             (recur (triangle &optional (iteration 0))
               (if (should-stop-p iteration)
                 (gather triangle)
                 (map nil (rcurry #'recur (1+ iteration))
                      (split-triangle-self-balancing triangle)))))
      (map nil #'recur (initial-triangles)))))
```

This is used to generate the triangles for images like this:

[![Recursive triangles](/media/images/blog/2018/05/triangles.jpeg)](/media/images/blog/2018/05/triangles.jpeg)

Using `gathering` lets me write the recursive generation algorithm in a natural
way, and just plug in a simple `(gather triangle)` when we finally bottom out at
the base case.