content/blog/2016/08/playing-with-syntax.html @ c467a9d32cdd

Add playing with syntax entry
author Steve Losh <steve@stevelosh.com>
date Mon, 15 Aug 2016 20:55:05 +0000
parents (none)
children db148e22d042
    {% extends "_post.html" %}

    {% hyde
        title: "Playing With Syntax"
        snip: "Lisp lets you evolve your language."
        created: 2016-08-16 13:45:00
    %}

    {% block article %}

One of the things I love about Lisp is that it gives you the ability to change
and mold the syntax of the language to what you need.  In this post I want to
look at the evolution of a little macro I've been playing around with for
a while now.

Mutation is a fundamental concept in most programming languages.  Functional
programming may be beautiful, but mutation is still useful (and fast).  In most
languages assignment is done with an `=` or `:=` operator:

    :::lisp
    x = 10

In Common Lisp the operator is named `setf` instead of `=` for historical
reasons, and of course it's used in prefix notation:

    :::lisp
    (setf x 10)

Aside from the prefix ordering, Common Lisp's syntax is already a bit more
elegant because you can set arbitrarily many variables without repeating the
assignment operator over and over again:

    :::lisp
    ; Have to type out `=` three times
    x = 10
    y = 20
    z = 30

    ; But only one `setf` required
    (setf x 10
          y 20
          z 30)

`setf` assigns values one-by-one, in-order.  Common Lisp also has `psetf` which
is "parallel `setf`".  This evaluates all the values first, *then* sets all the
variables at once:

    :::lisp
    (let ((a 10) (b 20))
      (setf a 50
            b (+ 1 a))   ; => (50 51)

      (psetf a 90
             b (+ 1 a))) ; => (90 52)

Assignment is nice, but often you want to change the value of a variable by
taking its *current* value and computing something with it.  If you're making
a ball that travels across the screen you might have something like this:

    :::text
    def move_ball(ball, distance):
        ball.x = ball.x + distance

    (defun move-ball (ball distance)
      (setf (slot-value ball 'x)
            (+ (slot-value ball 'x)
                distance)))

If you find `slot-value` too wordy to type, you could make a macro to get rid of
it (and save yourself a quote too).  `.` is already used for something else so
you'd need to pick a different character:

    :::lisp
    (defmacro $ (instance slot-symbol)
      `(slot-value ,instance ',slot-symbol))

    (defun move-ball (ball distance)
      (setf ($ ball x)
            (+ ($ ball x)
                distance)))

In practice, though, you rarely use `slot-value`.  Instead you usually use
accessor functions:

    :::lisp
    (defun move-ball (ball distance)
      (setf (ball-x ball)
            (+ (ball-x ball)
                distance)))

Anyway, back to assignment.  What we wrote above *works*, but it's a bit
annoying to have to type the "place" that we're working with twice.  To cut down
on this repetition many languages offer operators like `+=` and `++`:

    :::text
    x = x + 10

    x += 10

    y = y + 1

    y++

Common Lisp's version of these is called `incf`.

    :::lisp
    (incf x 10)
    (incf y)

Notice how the s-expression syntax means we can use the same operator for both
of the forms, instead of needing separate `+=` and `++` operators.

Other languages often have similar "in-place" operators for other numeric
operations like `-=`, `--`, `*=`, and so on.  Common Lisp has `decf` for
subtraction, but doesn't have versions for multiplication or division built in.
But we can add them with a few macros:

    :::lisp
    (defmacro mulf (place factor)
      `(setf ,place (* ,place ,factor)))

    (defmacro divf (place divisor)
      `(setf ,place (/ ,place ,divisor)))

Unfortunately these are not quite right -- we've committed the cardinal
macro-writing sin of splicing in the `place` expression multiple times.  If
`place` has side effects they'll happen more times than expected.

Luckily, Common Lisp has a macro that wraps up the boring process of handling
this for us.  `define-modify-macro` lets us fix the problem:

    :::lisp
    (define-modify-macro mulf (factor) *)
    (define-modify-macro divf (divisor) /)

Now we've got versions of `*=` and `/=` in Common Lisp.  While we're here we
should do things *right* and allow `divf` to be used with just a place, which
will mean "set `place` to `1/place`":

    :::lisp
    (define-modify-macro divf (&optional divisor)
      (lambda (value divisor)
        (if (null divisor)
          (/ 1 value)
          (/ value divisor))))

    (let ((x 10))
      (divf x 2) ; => 5
      (divf x)   ; => 1/5
      (divf x))  ; => 5

Note that we don't really need the `1` in `(/ 1 value)`, because Common Lisp's
`/` works the same way when you call it with a single argument.  I'm not sure
what behavior would make sense for a unary `(mulf place)`, so let's ignore that
one and move on.

So far we've been talking about the idea of setting a variable to the result of
computing some function on its current value.  We used `define-modify-macro` to
define some support for extra functions, but defining a mutator macro for every
function we might possibly want to use could get tedious.  What if we generalize
this a bit more?

    :::lisp
    (define-modify-macro callf (function)
      (lambda (value function)
        (funcall function value)))

    (let ((x -1))
      (callf x #'abs)   ; => 1
      (callf x #'1+)    ; => 2
      (callf x #'sin)   ; => 0.9092974
      (callf x #'round) ; => 1
      (callf x #'-))    ; => -1

Now we've got something a bit higher-level.  With `callf` we can mutate
a variable with a unary function.  But what about functions that need more
arguments?  Again, Common Lisp does the right thing and handles `&rest`
parameters correctly in `define-modify-macro`:

    :::lisp
    (define-modify-macro callf (function &rest args)
      (lambda (value function &rest args)
        (apply function value args)))

    (let ((x -1))
      (callf x #'abs)    ; => 1
      (callf x #'+ 10)   ; => 11
      (callf x #'* 2 4)) ; => 88

    (let ((foo (list 0 1 2 3)))
      (callf foo #'reverse)
      ; => (3 2 1 0)

      (callf foo #'append (list -1 -2 -3)))
      ; => (3 2 1 0 -1 -2 -3)

That's better.  Now we can mutate a variable by applying a function to its
current value and some other arguments.  This might come in handy when we don't
want to bother defining a full-blown modification macro just to use it once or
twice.

At this point we've got something similar to Michael Malis' [zap][] macro.
A minor difference is for `zap` the order of the function and the place are
reversed.  I believe his reason was that it makes the resulting form look more
like the function call that is conceptually being made:

    :::lisp
    (zap     #'+  x  5)
    ;          ↓  ↓  ↓
    (setf x (  +  x  5  ))

Unfortunately, because the place is no longer the first argument we can't use
`define-modify-macro` to handle the single-evaluation boilerplate for us any
more.  Instead we need to use `get-setf-expander` and work with its results.
Malis' excellent [Getting Places][] article explains how that works, and if you
want to fully understand the rest of the code in this post you should read that
one first.

The difference between `callf` and Malis' `zap` are just cosmetic.  But they're
still not completely flexible.  What if we want to call a function, but we need
the value to be an argument other than the first?  Of course we could use
a `lambda`:

    :::lisp
    (let ((x (list :coffee :soda :tea)))
      (callf x (lambda (l) (remove :soda l))))


But that is *really* ugly.  It would be nice if we had a way to specify which
argument to the function should be the current value.  This is Lisp, so we can
pick whatever syntax we want.  What should we choose?  Often a good first step
is to look at how other languages do things.  Clojure's anonymous function
syntax is one possibility:

    :::lisp
    (map #(+ 10 %) [1 2 3])
    ; => (11 12 13)

Clojure uses the magic symbol `%` as a placeholder for the argument.  We can
modify Malis' `zap` macro to do the same.  We'll call it `zapf`, and we'll swap
the place back into the first argument like all the other modify macros, since
the aesthetic reason no longer holds.  The changes are in bold:

<div class="codehilite"><pre>
; Original version
(defmacro zap (fn place &rest args)
  (multiple-value-bind
        (temps exprs stores store-expr access-expr)
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores)
              (funcall ,fn ,access-expr ,@args)))
       ,store-expr)))

; New version
(defmacro <b>zapf</b> (<b>place fn</b> &rest args)
  (multiple-value-bind
        (temps exprs stores store-expr access-expr)
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores)
             (funcall ,fn <b>,@(substitute access-expr '% args)</b>)))
      ,store-expr)))
</pre></div>

And now we can use `%` to say where the place's value should go:

    :::lisp
    (let ((x (list :coffee :soda :tea)))
      (zapf x #'remove :soda %)
      ; => (:coffee :tea)

      (zapf x #'cons :water %)
      ; => (:water :coffee :tea)

      (zapf x #'append % (list :cocoa :chai)))
      ; => (:water :coffee :tea :cocoa :chai)

Note that `substitute` will replace *all* occurrences of `%` with the access
expression, so we can do something like `(zapf x #'append % %)` if we want.

Some people will find this new version unpleasant, because it effectively
captures the `%` variable.  It's not hygienic, by design.  I personally don't
mind it, and I like the brevity it allows.

We're almost done, but I want to push things just a *bit* further.  Let's
revisit the original example of the ball moving across the screen:

    :::text
    def move_ball(ball, distance):
        ball.x = ball.x + distance

    (defun move-ball (ball distance)
      (setf (ball-x ball)
            (+ (ball-x ball) distance)))

Of course we can simply use `+=` or `incf` here:

    :::text
    def move_ball(ball, distance):
        ball.x += distance

    (defun move-ball (ball distance)
      (incf (ball-x ball) distance))

Suppose we also wanted to make the ball wrap around the screen when it flies off
one side.  We can do this using modulus:

    :::text
    def move_ball(ball, distance, screen_width):
        ball.x += distance
        ball.x %= screen_width

    (defun move-ball (ball distance screen-width)
      (incf (ball-x ball) distance)
      (callf (ball-x ball) #'mod % screen-width))

We could define `modf` if we wanted to make that second call a bit nicer.  But
let's take a moment to reflect.  Notice how we're back to having to type out
`(ball-x ball)` twice again.  It's better than typing it four times, but can we
do better?

`zapf` currently takes a place, a function, and a list of arguments.  What if we
made it more like `setf`, and just have it take a place and an expression?  We
could still use `%` as a placeholder, but could let it appear anywhere in the
(possibly nested) expression form.

One way to do this would be to simply replace `substitute` with `subst` in the
`zapf` code.  (The difference between these extremely-confusingly-named
functions is that `substitute` replaces element of a sequence, and `subst`
replaces elements of a tree.)

This, however, is the wrong strategy.  Blindly replacing `%` everywhere in the
expression will work for many cases, but will break in edge cases like (god help
you) nested `zapf` forms.  We could try to walk the code of the expression we
get, but down this path lies madness and implementation-specificity.

The solution is much, much simpler.  We can just use a normal `let` to capture
`%` in the expression:

<div class="codehilite"><pre>
; Final version
(defmacro <b>zapf</b> (<b>place expr</b>)
  (multiple-value-bind
        (temps exprs stores store-expr access-expr)
      (get-setf-expansion place)
    `(let* (,@(mapcar #'list temps exprs)
            (,(car stores)
             <b>(let ((% ,access-expr))
               ,expr)</b>))
      ,store-expr)))
</pre></div>

And now `move-ball` only requires the bare minimum of typing:

    :::lisp
    (defun move-ball (ball distance screen-width)
      (zapf (ball-x ball)
            (mod (+ distance %) screen-width)))

For completeness it would be nice to allow `zapf` to take any number of
place/value pairs, just like `setf`.  I'll leave this as an exercise for you to
try if you're interested.

It took me a while to arrive at this form of the macro.  I personally find it
lovely and readable.  If you disagree, that's fine too!  Lisp is a wonderful
substrate for building a language that fits how you think and talk about
problems.  Play around with your own syntax and find out what feels most natural
for you.

Before we leave, let's just ponder one more thing.  Performance is often ignored
these days, but what if we care about making the computer go fast?  The final
`zapf` macro is handy and expressive, but what cost have we paid for this
abstraction?

First let's tell SBCL that we want to go fast and return to our original `setf`
strategy:

    :::lisp
    (deftype nonnegative-fixnum ()
      `(integer 0 ,most-positive-fixnum))

    (deftype positive-fixnum ()
      `(integer 1 ,most-positive-fixnum))

    (defstruct ball
      (x 0 :type nonnegative-fixnum)
      (y 0 :type nonnegative-fixnum))

    (declaim (ftype (function (ball fixnum positive-fixnum))
                    move-ball))

    (defun move-ball (ball distance screen-width)
      (declare (optimize (speed 3) (safety 0) (debug 0)))
      (setf (ball-x ball)
            (mod (+ distance (ball-x ball))
                 screen-width)))

How does that turn out?

    :::text
    ; disassembly for MOVE-BALL
    ; Size: 82 bytes. Origin: #x1005E5E12E
    ; 2E:       488B410D         MOV RAX, [RCX+13]       ; no-arg-parsing entry point
    ; 32:       48D1F8           SAR RAX, 1
    ; 35:       498D3C00         LEA RDI, [R8+RAX]
    ; 39:       488BDE           MOV RBX, RSI
    ; 3C:       48D1FB           SAR RBX, 1
    ; 3F:       4885DB           TEST RBX, RBX
    ; 42:       7431             JEQ L3
    ; 44:       488BC7           MOV RAX, RDI
    ; 47:       4899             CQO
    ; 49:       48F7FB           IDIV RAX, RBX
    ; 4C:       4C8BC0           MOV R8, RAX
    ; 4F:       488BC2           MOV RAX, RDX
    ; 52:       48D1E0           SHL RAX, 1
    ; 55:       4885C0           TEST RAX, RAX
    ; 58:       7510             JNE L2
    ; 5A: L0:   488BD8           MOV RBX, RAX
    ; 5D: L1:   4889590D         MOV [RCX+13], RBX
    ; 61:       488BD3           MOV RDX, RBX
    ; 64:       488BE5           MOV RSP, RBP
    ; 67:       F8               CLC
    ; 68:       5D               POP RBP
    ; 69:       C3               RET
    ; 6A: L2:   4885FF           TEST RDI, RDI
    ; 6D:       7DEB             JNL L0
    ; 6F:       488D1C30         LEA RBX, [RAX+RSI]
    ; 73:       EBE8             JMP L1
    ; 75: L3:   0F0B0A           BREAK 10                ; error trap
    ; 78:       07               BYTE #X07
    ; 79:       08               BYTE #X08               ; DIVISION-BY-ZERO-ERROR
    ; 7A:       FE9E03           BYTE #XFE, #X9E, #X03   ; RDI
    ; 7D:       FE9E01           BYTE #XFE, #X9E, #X01   ; RBX

Well that's not *too* bad.  I'm not sure why SBCL still performs a check for
division by zero for the `mod` even though we said that `screen-width` can't
possibly be zero.  But hey, we're at a manageable amount of assembly, which is
pretty nice for a high-level language!

Now for the `zapf` version:

    :::lisp
    (defun move-ball (ball distance screen-width)
      (declare (optimize (speed 3) (safety 0) (debug 0)))
      (zapf (ball-x ball)
            (mod (+ distance %) screen-width)))

Okay, let's bite the bullet.  How bad are we talking?

    :::text
    ; disassembly for MOVE-BALL
    ; Size: 70 bytes. Origin: #x1005F47C7B
    ; 7B:       488B410D         MOV RAX, [RCX+13]       ; no-arg-parsing entry point
    ; 7F:       48D1F8           SAR RAX, 1
    ; 82:       488D3C03         LEA RDI, [RBX+RAX]
    ; 86:       4885F6           TEST RSI, RSI
    ; 89:       742B             JEQ L2
    ; 8B:       488BC7           MOV RAX, RDI
    ; 8E:       4899             CQO
    ; 90:       48F7FE           IDIV RAX, RSI
    ; 93:       488BD8           MOV RBX, RAX
    ; 96:       488D0412         LEA RAX, [RDX+RDX]
    ; 9A:       4885C0           TEST RAX, RAX
    ; 9D:       750D             JNE L1
    ; 9F: L0:   48D1E2           SHL RDX, 1
    ; A2:       4889510D         MOV [RCX+13], RDX
    ; A6:       488BE5           MOV RSP, RBP
    ; A9:       F8               CLC
    ; AA:       5D               POP RBP
    ; AB:       C3               RET
    ; AC: L1:   4885FF           TEST RDI, RDI
    ; AF:       7DEE             JNL L0
    ; B1:       4801F2           ADD RDX, RSI
    ; B4:       EBE9             JMP L0
    ; B6: L2:   0F0B0A           BREAK 10                ; error trap
    ; B9:       07               BYTE #X07
    ; BA:       08               BYTE #X08               ; DIVISION-BY-ZERO-ERROR
    ; BB:       FE9E03           BYTE #XFE, #X9E, #X03   ; RDI
    ; BE:       FE1E03           BYTE #XFE, #X1E, #X03   ; RSI

Wait, it's actually *shorter*?  What about all those extra variables the `zapf`
macro expands into?  It turns out that SBCL is really quite good at optimizing
`let` statements and local variables.

This is why I love Common Lisp.  You can be rewriting the syntax of the language
and working on a tower of abstraction one moment, and looking at X86 assembly to
see what the hell the computer is actually *doing* the next.  It's wonderful.

[Getting Places]: http://malisper.me/2015/09/22/getting-places/
[Zap]: http://malisper.me/2015/09/29/zap/

    {% endblock article %}