content/blog/2016/12/chip8-cpu.markdown @ 88f0e0d8b891

Typo
author Steve Losh <steve@stevelosh.com>
date Mon, 19 Dec 2016 13:21:27 -0500
parents 83558c9af4a9
children ce548c40237a
+++
title = "CHIP-8 in Common Lisp: The CPU"
snip = "Let's write an emulator."
date = 2016-12-19T17:45:00Z
draft = false

+++

A while back I decided to try to write a Game Boy emulator in Common Lisp based
on [this series of articles][imran].  I made some good progress but eventually
got bogged down because I was trying to learn a bunch of complex new things at
once:

* How to write an emulator
* How to use Qt with Common Lisp
* How the Game Boy works internally

Instead of dragging on, I decided to take a break and try something simpler:
a [CHIP-8][] emulator/interpreter.  The CHIP-8 is much simpler than the Game
Boy, which made it easier to experiment with the rest of the infrastructure.

In this post and a couple of future ones I'll walk through all of my CHIP-8
emulator implementation.  To give you a rough idea of the size of the project,
`cloc` reports:

* Basic emulator: 415 lines
* Debugging/disassembling infrastructure: 142 lines
* Screen GUI: 153 lines
* Graphical debugger: 295 lines

This first post will deal with emulating the CHIP-8's CPU.

The full emulator source is on [BitBucket][] and [GitHub][].

[imran]: http://imrannazar.com/GameBoy-Emulation-in-JavaScript
[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8
[BitBucket]: https://bitbucket.org/sjl/cl-chip8
[GitHub]: https://github.com/sjl/cl-chip8

<div id="toc"></div>

## Libraries

The emulator uses a few Common Lisp libraries to make things easier:

* [bordeaux-threads][] to handle threading.
* [cl-arrows][] for an implementation of Clojure's `-<>` threading macro.
* [cl-losh][] is my own personal utility library.
* [cl-portaudio][] for audio.
* [iterate][] for a much nicer version of `loop` called `iterate`.  My utility
  library contains several `iterate` drivers (some of which I've written about
  before).
* [qtools][] to handle creating a GUI with Qt.
* [quickutil][] for some miscellaneous utility functions from Alexandria and
  elsewhere.

I'll try to remember to mention whenever I use a function that's not built-in to
Common Lisp, but I might forget, in which case it's probably in one of these.

[bordeaux-threads]: https://common-lisp.net/project/bordeaux-threads/
[cl-arrows]: https://github.com/nightfly19/cl-arrows
[cl-losh]: https://github.com/sjl/cl-losh
[cl-portaudio]: https://filonenko-mikhail.github.io/cl-portaudio/
[iterate]: https://common-lisp.net/project/iterate/
[qtools]: https://shinmera.github.io/qtools/
[quickutil]: https://github.com/tarballs-are-good/quickutil

## CHIP-8 References

There's a good amount of information available about the CHIP-8 online.  The
references I used most often were:

* [Cowgod's "CHIP-8 Technical Reference"](http://devernay.free.fr/hacks/chip8/C8TECH10.HTM)
* [Laurence Muller's "How to write an emulator (CHIP-8 interpreter)"](http://www.multigesture.net/articles/how-to-write-an-emulator-CHIP-8-interpreter/)
* [Matthew Mikolay's "Mastering CHIP-8"](http://mattmik.com/files/chip8/mastering/chip8.html)
* [(Super)CHIP 8 Secrets](https://github.com/AfBu/haxe-CHIP-8-emulator/wiki/\(Super\)CHIP-8-Secrets)

## The Main Data Structure

Let's dive into the code.  We'll start with the main data structure that will
hold an instance of a CHIP-8 for emulation:


```lisp
(defstruct chip
  (running t :type boolean)
  ; ...more to come later
  )
```

We're using a Lisp struct instead of a CLOS class because we're going to be
accessing the fields of this thing *a lot* and CLOS accessors can be
comparatively slow.  This is one of the very few concessions we'll make to
performance.

The first field is `running`, which is just a boolean that represents whether
the emulator is currently running.

We'll eventually have multiple threads poking at this struct, and they'll
generally be doing something like `(loop :while (chip-running chip) :do ...)`.
That way when we want to quit the emulator we can just set `running` to `nil`
and everything will (eventually) stop.
### Registers

The CHIP-8 has sixteen main registers, as well as a few other special ones:


```lisp
(defstruct chip
  ; ...
  (registers (make-array 16 :element-type 'int8)
             :type (simple-array int8 (16))
             :read-only t)
  (index 0 :type int16)
  (program-counter #x200 :type int12)
  ; ...
  )
```

We'll keep the main registers in an array, and the others will just be separate
slots.  I've added type declarations in the struct for two reasons:

* With a high `safety` declaration many implementations (including SBCL, the one
  I'm using) will type check at runtime to make sure we're setting things
  appropriately.
* With low `safety` and high `speed` declarations, SBCL can generate much faster
  code if it knows the types of the struct slots.

To make the integer types a bit less wordy I've defined a few simple synonyms:

```lisp
(deftype int4 () '(unsigned-byte 4))
(deftype int8 () '(unsigned-byte 8))
(deftype int12 () '(unsigned-byte 12))
(deftype int16 () '(unsigned-byte 16))
```

So `registers` is a 16-element simple array of `(unsigned-byte 8)`s.  It's
`read-only` because we'll be changing the *elements* of the array, but we should
never be swapping out the entire array itself.

The index register is 16 bits, and the program counter can store up to 12 bits
(we'll never need more than that because of the size of the CHIP-8's memory).

Also note that the program counter starts at address `#x200`, because that's
where the ROM data eventually gets loaded into the CHIP-8 memory.

### The Stack

The CHIP-8 also has an internal stack to store return addresses when calling
procedures.  We'll just use a Lisp vector with a fill pointer for this:

```lisp
(defstruct chip
  ; ...
  (stack (make-array 16 :element-type 'int12 :fill-pointer 0)
         :type (vector int12 16)
         :read-only t)
  ; ...
  )
```

### Memory

The CHIP-8 has 4 kilobytes of main memory:

```lisp
(defconstant +memory-size+ (* 1024 4))

(defstruct chip
  ; ...
  (memory (make-array +memory-size+ :element-type 'int8)
          :type (simple-array int8 (#.+memory-size+))
          :read-only t)
  ; ...
  )
```

Pretty simple.  Note the `#.` reader macro trick/hack to be able to use
a variable where we normally need a raw type specifier.

### Currently-Loaded ROM

Finally we'll add a slot for keeping track of the path to the currently-loaded
ROM, for easy resetting later:

```lisp
(defstruct chip
  ; ...
  (loaded-rom nil :type (or null string))
  ; ...
  )
```

That's it for now.  We'll need a few more slots once we get to things like
graphics and sound, but I'll introduce them when we need them.

### The Flag Register

Register number 15, or `#xF` in hex, is special.  It's nicknamed the "flag"
register and gets set specially by certain instructions.  We could just access
it like the rest of the registers in our code:

```lisp
(setf (aref (chip-registers chip) 15) 1) ; set flag to 1
(print (aref (chip-registers chip) #xF)) ; print the flag
```

But even with the `#xF` hex index to get that mnemonic "F" this is a bit too
hard to read.  Let's define some extra reader and writer functions to clean
things up:

```lisp
(defun-inline chip-flag (chip)
  (aref (chip-registers chip) #xF))

(defun-inline (setf chip-flag) (new-value chip)
  (setf (aref (chip-registers chip) #xF) new-value))
```

[`defun-inline`][defun-inline] is from my utility library — it just `defun`s the
function and `declaim`s it inline all in one step.  Now things are much nicer:

```lisp
(setf (chip-flag chip) 1) ; set flag to 1
(print (chip-flag chip)) ; print the flag
```

[defun-inline]: https://github.com/sjl/cl-losh/blob/master/DOCUMENTATION.markdown#defun-inline-macro

## Removing Tedium

What we've got so far *works*, but I want to add one more piece of syntactic
sugar before moving on.

We're going to be accessing the slots of the `chip` struct a *lot*, and it's
going to get tedious to write `(chip-SLOT chip)` over and over again.  For
example, when we're resetting the emulator we'll have to do something like this:

```lisp
(defun reset (chip)
  (fill (chip-memory chip) 0)
  (fill (chip-registers chip) 0)
  (setf (chip-running chip) t
        (chip-program-counter chip) #x200
        (fill-pointer (chip-stack chip)) 0))
```

This is annoying.  Languages like Javascript and Python use `.` for slot
access, so it ends up being a bit more concise: `chip.memory()` instead of
`(chip-memory chip)`.

We could use Lisp's `with-accessors` to clean up the actual usage a bit:

```lisp
(defun reset (chip)
  (with-accessors ((memory chip-memory)
                   (registers chip-registers)
                   (running chip-running)
                   (program-counter chip-program-counter)
                   (stack chip-stack))
      chip
    (fill memory 0)
    (fill registers 0)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0)))
```

The *usage* looks much nicer now (`(fill memory 0)` is wonderfully readable) but
we haven't actually fixed anything.  We've just shifted all the typing a few
lines up.  But this is Lisp, we can do better!

Ideally what we'd like is to be able to do say something like `(with-chip (chip)
...)` to mean the giant `with-accessors` form above:

```lisp
(defun reset (chip)
  (with-chip (chip)
    (fill memory 0)
    (fill registers 0)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0)))
```

This is nice and readable.  Some folks will dislike the fact that it introduces
new variable bindings that are "hidden" in the macro definition, but I like the
concision you get from it, especially for a small project like this.

We could write `with-chip` ourselves, if we wanted to.  It would look something
like this:

```lisp
(defmacro with-chip ((chip) &body body)
  `(with-accessors ((memory chip-memory)
                    (registers chip-registers)
                    ; ...
                    )
       ,chip
     ,@body)`
)
```

It's not hard to write, just tedious.  But we're using Lisp, so anything tedious
calls out for abstraction.  Another macro from my utility library is
[`define-with-macro`][define-with-macro].  This is a macro-defining macro that
we can use to define `with-chip` for us:

```lisp
(define-with-macro chip
  running memory stack registers index program-counter flag)
```

If the hairiness of a macro-defining macro scares you, don't worry about it.  We
could have just written it by hand as shown above.  I've just needed `with-FOO`
macros like these often enough that it's been worth it to abstract away the
tedium of writing them.

So one way or another we've got a nice `with-chip` macro that will let us access
the fields of our struct with bare names.

[define-with-macro]: https://github.com/sjl/cl-losh/blob/master/DOCUMENTATION.markdown#define-with-macro-macro

## Infrastructure

We'll need to define a bit of infrastructure for running things before we jump
into implementing the CHIP-8 CPU instructions.

### Resetting

Before we can emulate a ROM, we need to read it into our memory array.  I've
chosen to do this in the `reset` function so we can just `(reset my-chip)` to
reload everything at once — it wouldn't make much sense to load a new ROM
*without* resetting the rest of the emulation state (though the results could
be... "interesting").

`reset` looks like this:

```lisp
(defun reset (chip)
  (with-chip (chip)
    (fill memory 0)
    (fill registers 0)
    (replace memory (read-file-into-byte-vector loaded-rom)
             :start1 #x200)
    (setf running t
          program-counter #x200
          (fill-pointer stack) 0))
  (values))
```

This is pretty self explanatory, except for the actual ROM-loading bit:

```lisp
(replace memory (read-file-into-byte-vector loaded-rom)
         :start1 #x200)
```

`replace` is just the standard Common Lisp `replace` function that copies the
contents of one sequence into another.  `read-file-into-byte-vector` is from
Alexandria, and will just read the file at `loaded-rom` into a byte vector.
Then all we need to do is say that we want to start the copying at index `#x200`
in the destination, because that's where the CHIP-8 ROM data is supposed start
(there's other internal data (like font sprites) before it, which we'll see
later).

One last thing to note is that `reset` just returns `(values)`, i.e. it returns
nothing at all.  This is something I sometimes do for functions I call at the
REPL for side effects, to avoid returning a meaningless result.  I think of it
as an application of [The Rule of Silence][] for Lisp.

[The Rule of Silence]: http://www.linfo.org/rule_of_silence.html

### Loading ROMs

Now that we've got a reset function, it's trivial to define a function to load
a new ROM into our emulator:

```lisp
(defun load-rom (chip filename)
  (setf (chip-loaded-rom chip) filename)
  (reset chip))
```

I didn't bother with the `with-chip` macro here because we're only accessing
a single field.

### The Main Loop(s)

Now let's wrap things up into a nice interface.  We'll define a top-level `run`
function that will be what we call to fire up the emulator:


```lisp
(defparameter *c* nil)

(defun run (rom-filename)
  (let ((chip (make-chip)))
    (setf *c* chip)
    (load-rom chip rom-filename)
    (run-cpu chip)))
```

This will get more complicated in the future, but for now it's pretty simple.
Make a new `chip` object, load the specified ROM, and start emulating.

I've added a `*c*` global variable that's bound to the currently-running
emulator so we can poke at it in NREPL or SLIME as it's running.

Now to write `run-cpu`:

```lisp
(defconstant +cycles-per-second+ 500)
(defconstant +cycles-before-sleep+ 10)

(defun run-cpu (chip)
  (iterate
    (while (chip-running chip))
    (emulate-cycle chip)
    (for tick :every-nth +cycles-before-sleep+ :do
         (sleep (/ +cycles-before-sleep+ +cycles-per-second+)))))
```

CHIP-8 was designed to run on much weaker hardware than we have these days.
Even though we're emulating it, if we just run as fast as possible we'll be
running *far* too fast to be playable.

`+cycles-per-second+` is a constant describing our ideal emulation speed.
Anything from 300 to 800 or so is playable, depending on how fast you want games
to run.  We'll just use 500 for now as a happy medium.

It would be wasteful to call `(sleep)` after every single cycle, so instead
we'll batch them together: run 10 instructions, sleep for a bit, repeat.  This
is the job of the `(for ... every-nth N)` iterate driver, which is also in my
utility library.  Every 10 iterations through the loop it will sleep for the
appropriate amount of time.

The size of the cycle batches is arbitrary — larger batches will result in fewer
`(sleep)` calls, but if you go too large you'll start noticing the emulator
getting "jumpy".  10-cycle batches at 500 cycles per second means that the
emulator will sleep for about 1/50 of a second each time, which isn't too
noticeable.

Now we've got the main loop all set up and just need to emulate each individual
cycle.

### Individual Cycles

CHIP-8 instructions are each two bytes long, and are stored
[big-endian](https://en.wikipedia.org/wiki/Endianness) in memory.  So to emulate
a single cycle we:

* Read the two bytes starting at the program counter and concatenate them to get the instruction.
* Advance the program counter (to avoid having to do it inside every single instruction).
* Dispatch to the appropriate instruction code.

We'll define `emulate-cycle` and a couple of helper functions for this:

```lisp
(defun-inline chop (size integer)
  (ldb (byte size 0) integer))

(defun-inline cat-bytes (high-order low-order)
  (dpb high-order (byte 8 8) low-order))

(defun emulate-cycle (chip)
  (with-chip (chip)
   (let ((instruction (cat-bytes (aref memory program-counter)
                                 (aref memory (1+ program-counter)))))
     (zapf program-counter (chop 12 (+ % 2)))
     (dispatch-instruction chip instruction))))
```

`chop` truncates an integer to the given number of bits.  `cat-bytes`
concatenates two bytes.  These are pretty simple, but I prefer the more
descriptive names over writing `dpb` (deposit byte) and `ldb` (load byte)
everywhere.

`zapf` is from my utility library.  I've written [an entire post][zapf] about
it.

What happens when the program counter is at the final index into memory?  None
of the CHIP-8 references I found specify what should happen.  We could signal an
error, or just wrap around to 0.  I've chosen the latter by chopping `(+
program-counter 2)` to 12 bits, but signaling an error would be easy too.

[zapf]: http://stevelosh.com/blog/2016/08/playing-with-syntax/

### Instruction Dispatch

The CHIP-8's instruction scheme is a bit different than most others that I've
seen.

For systems like the Game Boy instructions have a one- or two-byte opcode, with
arguments following.  For systems like this you can dispatch on opcode by doing
a giant `case` statement:

```lisp
(case opcode
  (#x00 (op-foo ...))
  (#x01 (op-bar ...))
  ; ...
  )
```

But a more efficient way to do it is often to shove all the `op-` functions into
an array, and then just use the opcode itself as the index into the array to
find the function:

```lisp
(funcall (aref opcodes opcode) ...)
```

But the CHIP-8 differentiates instructions a bit strangely.  All instructions
are two bytes long, *including* their arguments.  Instead of just using the
first N bits as the opcode, some instructions use a combination of high- and
low-order bits.  For example, the logical `AND`, `OR`, and `XOR` are specified
by starting the instruction with an `8` nibble and ending it with `1`, `2`, or
`3`, with the "arguments" being the two nibbles in the middle:

    8xy1 - OR Vx, Vy
    8xy2 - AND Vx, Vy
    8xy3 - XOR Vx, Vy

This makes it rather annoying to use some kind of function table approach.
There are only a few dozen opcodes, so instead of trying to hack something
together we'll just use a big old `case` instead:

```lisp
(defun dispatch-instruction (chip instruction)
  (macrolet ((call (name) `(,name chip instruction)))
    (ecase (logand #xF000 instruction)
      (#x0000 (ecase instruction
                (#x00E0 (call op-cls))
                (#x00EE (call op-ret))))
      (#x1000 (call op-jp-imm))
      (#x2000 (call op-call))
      (#x3000 (call op-se-reg-imm))
      (#x4000 (call op-sne-reg-imm))
      (#x5000 (ecase (logand #x000F instruction)
                (#x0 (call op-se-reg-reg))))
      (#x6000 (call op-ld-reg<imm))
      (#x7000 (call op-add-reg<imm))
      (#x8000 (ecase (logand #x000F instruction)
                (#x0 (call op-ld-reg<reg))
                (#x1 (call op-or))
                (#x2 (call op-and))
                (#x3 (call op-xor))
                (#x4 (call op-add-reg<reg))
                (#x5 (call op-sub-reg<reg))
                (#x6 (call op-shr))
                (#x7 (call op-subn-reg<reg))
                (#xE (call op-shl))))
      (#x9000 (ecase (logand #x000F instruction)
                (#x0 (call op-sne-reg-reg))))
      (#xA000 (call op-ld-i<imm))
      (#xB000 (call op-jp-imm+reg))
      (#xC000 (call op-rand))
      (#xD000 (call op-draw))
      (#xE000 (ecase (logand #x00FF instruction)
                (#x9E (call op-skp))
                (#xA1 (call op-sknp))))
      (#xF000 (ecase (logand #x00FF instruction)
                (#x07 (call op-ld-reg<dt))
                (#x0A (call op-ld-reg<key))
                (#x15 (call op-ld-dt<reg))
                (#x18 (call op-ld-st<reg))
                (#x1E (call op-add-index<reg))
                (#x29 (call op-ld-font<vx))
                (#x33 (call op-ld-bcd<vx))
                (#x55 (call op-ld-mem<regs))
                (#x65 (call op-ld-regs<mem)))))))
```

The `macrolet` at the beginning saves us a bit of typing, but otherwise this is
just a big boring table to figure out which instruction to call.

That's all the boring infrastructure work we need — on to implementing the
actual instructions.

## Instructions

The CHIP-8 supports thirty-six instructions, all of which we'll need to
implement.  We'll start with the simpler ones, and we'll be leaving some of the
others (the graphics/sound related ones) for later articles.

(In contrast, the Game Boy has roughly five hundred instructions, so now you
might see why I thought CHIP-8 would be simpler!)

### Random Numbers

We'll start with the `RND` instruction, which generates random numbers.  This
may seem like an odd place to start, but I want to describe some extra syntactic
sugar and `RND` is a nice standalone instruction to use as an example.

Cowgod's reference describes `RND` like so:

    Cxkk - RND Vx, byte
    Set Vx = random byte AND kk.

    The interpreter generates a random number from 0 to 255,
    which is then ANDed with the value kk. The results are
    stored in Vx.

A first stab at the implementation in Lisp might look like this:


```lisp
(defun op-rand (chip instruction)
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (with-chip (chip)
      (setf (aref registers reg)
            (logand (random 256) mask)))))
```

This would work, but it's an awful lot of code for something that just takes
a sentence or two to describe.  We're going to be defining a lot of
instructions, so let's take a few minutes and make our lives easier.

### Define-Instruction

We'll create a `define-instruction` macro that will abstract away some of the
boring bits.  The goal is to end up with instruction definitions that read as
close to the documentation as possible.

We'll start by removing the need to use the `with-chip` macro.  Every
instruction needs to deal with the `chip` struct, so let's not repeat ourselves
thirty-six times.  Every instruction will also take `chip` and `instruction`
arguments, so we can remove those too:

```lisp
(defmacro define-instruction (name &body body)
  `(defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        ,@body)
      nil))

(define-instruction op-rand
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (setf (aref registers reg)
          (logand (random 256) mask))))
```

That's a little better.  We're going to be calling these instructions a *lot*,
so it wouldn't hurt to add a type declaration for each function.  This is easy
because they all take and return the same types:

```lisp
(defmacro define-instruction (name &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name)) ; NEW
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        ,@body)
      nil)))
```

Now all of our instruction functions will have type hints, and SBCL can check
them (if `safety` is high) or use them to generate faster code (if `speed` is
high).

We'll be accessing the values of registers quite often (`(aref registers ...)`)
so let's add a little `macrolet` to make that read nicer:

```lisp
(defmacro define-instruction (name &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name))
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        (macrolet ((register (index)             ; NEW
                     `(aref registers ,index)))  ; NEW
          ,@body))
      nil)))

(define-instruction op-rand
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (setf (register reg)                         ; NEW
          (logand (random 256) mask))))
```

Now instead of `(aref registers ...)` we can just say `(register ...)`.  Cool.

The one thing that still bothers me is having to manually pull the instruction
arguments out of the instruction with bitmasking.  It would be much nicer if we
could just declare what the arguments look like and have the computer generate
the appropriate masking code to deal with them.  The goal is to be able to write
something like this:

```lisp
(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))
```

The `(_ r (mask 2))` is the "argument list" of the instruction.  It says:

* A single-nibble value which we don't care about: `_`
* A single-nibble value should be bound to `r`: `r`
* A two-nibble value should be bound to `mask`: `(mask 2)`

Compare the Lisp code and Cowgod's documentation:

```
Cxkk - RND Vx, byte
Set Vx = random byte AND kk.

(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))
```

Cowgod uses `V` as a shorthand for `register`, and calls the argument `x`
instead of `r`, but otherwise I think this is pretty damn close to the
documentation for actual running code.

So let's implement the argument-parsing macro.  Feel free to skip this if you're
not really comfortable with macros.  But it's a good demonstration of how you
can use them to get a nice clean language for a simple project.

First we'll wrap the `body` in a `let`:

```lisp
                                   ; NEW
(defmacro define-instruction (name argument-list &body body)
  `(progn
    (declaim (ftype (function (chip int16) null) ,name))
    (defun ,name (chip instruction)
      (declare (ignorable instruction))
      (with-chip (chip)
        (macrolet ((register (index)
                     `(aref registers ,index)))
          (let ,(parse-instruction-argument-bindings argument-list) ; NEW
            ,@body))
        nil))))
```

We'll use a helper function to parse the argument list into a list of `let`
bindings, instead of trying to cram it all into this one macro.  It might look
scary, but it's not actually too bad:

```lisp
(defun parse-instruction-argument-bindings (argument-list)
  (flet ((normalize-arg (arg)
           (destructuring-bind (symbol &optional (nibbles 1))
               (ensure-list arg)
             (list symbol nibbles))))
    (iterate
      (for (symbol nibbles) :in (mapcar #'normalize-arg argument-list))
      (for position :first 3 :then (- position nibbles))
      (when (not (eql symbol '_))
        (collect `(,symbol (ldb (byte ,(* nibbles 4)
                                      ,(* position 4))
                                instruction)))))))
```

`normalize-arg` takes each argument from the list and turns it into `(symbol
nibbles)`, so we don't have to write the lengths for single-nibble arguments
like this: `((_ 1) (r 1) (mask 2))`.

Then we loop through each `(symbol nibbles)` pair in the argument list.  We
start at position `3` in the byte (the highest-order nibble) because we want to
write the argument list from left to right, high-order to low-order, like the
documentation.

For each pair, if the symbol is anything other than `_` we collect a `let`
binding with the appropriate `ldb` call for it.  So for our example of `(_
r (mask 2))` we end up with two `let` bindings:

* `(r    (ldb (byte 4 8) instruction))`
* `(mask (ldb (byte 8 0) instruction))`

Notice that we only bother parsing out the bindings we *need* from the
instruction.  For instructions that don't need any arguments we'll end up with
an empty `(let () ...)` which the compiler will optimize away.

This might seem a bit hairy.  The nice thing about using a helper function like
this is that we can call it on its own with a variety of argument lists to see
what gets generated:

```
(map nil #'print
  (parse-instruction-argument-bindings '(_ r (mask 2))))

(R (LDB (BYTE 4 8) INSTRUCTION))
(MASK (LDB (BYTE 8 0) INSTRUCTION))

(map nil #'print
  (parse-instruction-argument-bindings '(_ x y _)))

(X (LDB (BYTE 4 8) INSTRUCTION))
(Y (LDB (BYTE 4 4) INSTRUCTION))

(map nil #'print
  (parse-instruction-argument-bindings '(_ (foo 3))))

(FOO (LDB (BYTE 12 0) INSTRUCTION))
```

And that wraps up `define-instruction`.  So instead of writing `RND` as
a vanilla function:

```lisp
(defun op-rand (chip instruction)
  (let ((reg  (logand #x0F00 instruction))
        (mask (logand #x00FF instruction)))
    (with-chip (chip)
      (setf (aref registers reg)
            (logand (random 256) mask)))))
```

We can do something much cleaner:

```lisp
(define-instruction op-rand (_ r (mask 2))              ;; RND
  (setf (register r)
        (logand (random 256) mask)))
```

This is going to pay off nicely as we implement the other thirty-five
instructions.

### Jumps and Calls

The CHIP-8 has two instructions for jumping directly to an address.  The first
jumps to an immediate, literal address.  The second adds an immediate value to
whatever is in register `#x0` and jumps to the result:

```lisp
(define-instruction op-jp-imm (_ (target 3))            ;; JP addr
  (setf program-counter target))

(define-instruction op-jp-imm+reg (_ (target 3))        ;; JP V0 + addr
  (setf program-counter (chop 12 (+ target (register 0)))))
```

Our `define-instruction` macro is already paying off.  The immediate values of
these arguments are three nibbles each, but our macro doesn't care.

There are also two instructions for implementing traditional function calls and
returns using the CHIP-8's stack:

```lisp
(define-instruction op-call (_ (target 3))              ;; CALL addr
  (vector-push program-counter stack)
  (setf program-counter target))

(define-instruction op-ret ()                           ;; RET
  (setf program-counter (vector-pop stack)))
```

We just use the built in `vector-push` and `vector-pop` functions to manage the
stack's fill pointer trivially.

Note that we're NOT using `vector-push-extend` because the CHIP-8 stack is
defined by the spec to only hold at most sixteen addresses — any ROM that tries
to push more is broken.

### Binary-Coded Decimal

Let's take another self-contained instruction next: `BCD`, which stands for
[binary-coded decimal][BCD] (some references call this instruction `LD B, Vx`
but I think it's different enough to deserve its own name).

The problem this instruction is designed to solve goes something like this.
Let's say you've got a game where the player has a score that changes over time.
You can store the player's score in a register or memory somewhere, and
add/subtract to/from it with the normal arithmetic opcodes no problem.

But now you want to *display* the score to the player, and you probably want to
do that in base 10 because humans are generally bad at reading binary or hex.
So how do you take a byte like `#xA5` and determine the hundreds, tens, and ones
sprites to draw on the screen to read "165 points"?

This is what the `BCD` instruction does.  It takes a single argument (the
register) and stores the hundreds, tens, and ones digits of that register's
value into three separate bytes of memory, starting at wherever the `index`
register is currently pointing.

<pre class="lineart">
    Register V3   Index Register
    ┌───┐         ┌───┐
    │135│         │...│
    └───┘         └─┬─┘
      ┌─────────────┘


    ┌───┬───┬───┬───┬───┐
    │   │   │   │   │   │   Memory
    └───┴───┴───┴───┴───┘

    Run BCD V3 instruction =================

    Register V3   Index Register
    ┌───┐         ┌───┐
    │135│         │...│
    └───┘         └─┬─┘
      ┌─────────────┘


    ┌───┬───┬───┬───┬───┐
    │ 1 │ 3 │ 5 │   │   │   Memory
    └───┴───┴───┴───┴───┘
</pre>

We'll split this into two parts to make it easier to read.  First we'll make
a `digit` function to retrieve a digit of a number:

```lisp
(defun-inline digit (position integer &optional (base 10))
  (-<> integer
    (floor <> (expt base position))
    (mod <> base)))
```

We could have hard-coded the base 10, but part of the Common Lisp tradition is
making flexible functions that can be reused in the future when it's not much
harder to do so.

Let's make sure it works on its own:

```lisp
(digit 0 135)
5

(digit 1 135)
3

(digit 2 135)
1

(digit 0 #xD6 16)
6

(digit 1 #xD6 16)
13     ; 13 in base 10 == D in base 16
```

And then we can define the actual operation:

```lisp
(define-instruction op-ld-bcd<vx (_ r _ _)              ;; LD B, Vx
  (let ((number (register r)))
    (setf (aref memory (+ index 0)) (digit 2 number)
          (aref memory (+ index 1)) (digit 1 number)
          (aref memory (+ index 2)) (digit 0 number))))
```

[BCD]: https://en.wikipedia.org/wiki/Binary-coded_decimal

### Arithmetic

Next up are the arithmetic instructions.  The CHIP-8 only supports addition and
subtraction — there are no multiplication or division instructions.

The first two instructions are `ADD` and `SUB`:

    ADD Vx, Vy → Vx = Vx + Vy
    SUB Vx, Vy → Vx = Vx - Vy

Once again we'll start by creating two helper functions to perform the 8-bit
addition/subtraction with overflow/underflow.  These functions will return
a second value that represents the carry/borrow bit.

```lisp
(defun-inline +_8 (x y)
  (let ((result (+ x y)))
    (values (chop 8 result)
            (if (> result 255) 1 0)))) ; carry

(defun-inline -_8 (x y)
  (let ((result (- x y)))
    (values (chop 8 result)
            (if (> x y) 1 0)))) ; borrow
```

And now the instructions themselves are trivial:

```lisp
(define-instruction op-add-reg<reg (_ rx ry)      ;; ADD Vx, Vy (8-bit)
    (setf (values (register rx) flag)
          (+_8 (register rx) (register ry))))

(define-instruction op-sub-reg<reg (_ rx ry)      ;; SUB Vx, Vy (8-bit)
  (setf (values (register rx) flag)
        (-_8 (register rx) (register ry))))
```

This takes advantage of the fact that you can use `(setf (values ...) ...)` to
assign the multiple values returned by a function, without binding them to local
variables.

Notice how we just assign to `flag`.  Under the hood that `flag` has been bound
with our `with-chip` macro to mean `(chip-flag chip)`, which we defined way back
in the beginning to mean `(aref (chip-registers chip) #xF)`.  But isn't it much
nicer to just say `(setf flag ...)`?

There's also a `SUBN` instruction for subtracting the operands in reverse order
(but still storing the result in the first):

```lisp
(define-instruction op-subn-reg<reg (_ rx ry)     ;; SUBN Vx, Vy (8-bit)
  (setf (values (register rx) flag)
        ;; subtraction order is swapped for SUBN
        (-_8 (register ry) (register rx))))
```

Next is an `ADD` instruction that takes an immediate value.  Unlike the other
instructions this one does *not* set the flag for some reason (that was a fun
bug to track down):

```lisp
(define-instruction op-add-reg<imm (_ r (immediate 2))  ;; ADD Vx, Imm
  ;; For some weird reason the ADD immediate op doesn't set the flag
  (zapf (register r) (+_8 % immediate)))
```

Because Common Lisp will just ignore extra return values if we don't use them,
we can just use our `+_8` helper function here too and ignore the carry result.

There's also a single 16-bit `ADD` instruction that adds the value in
a particular register to the index register.  It too ignores the `flag`:

```lisp
(define-instruction op-add-index<reg (_ r)        ;; ADD I, Vx (16-bit)
  (zapf index (chop 16 (+ % (register r)))))
```

### Shifting

The CHIP-8 has two bit-shifting instructions: `SHR` and `SHL` which shift
a register's contents right or left by a single bit.  Both of these instructions
also set the flag to the bit that got shifted "off the end" of the register.

We'll define two more helpers to do the actual 8-bit shifting and keep track of
the bit that falls off the end:

```lisp
(defun-inline get-bit (position integer)
  (ldb (byte 1 position) integer))

(defun-inline >>_8 (v)
  (values (ash v -1)
          (get-bit 0 v)))

(defun-inline <<_8 (v)
  (values (chop 8 (ash v 1))
          (get-bit 7 v)))
```

The instructions themselves are trivial:

```lisp
(define-instruction op-shr (_ r)                    ;; SHR
  (setf (values (register r) flag)
        (>>_8 (register r))))

(define-instruction op-shl (_ r)                    ;; SHL
  (setf (values (register r) flag)
        (<<_8 (register r))))
```

### Logical Operations

Next up are the logical `AND`/`OR`/`XOR` instructions.  We could define these
like so:

```lisp
(define-instruction op-and (_ destination source _)
  (zapf (register destination) (logand % (register source))))

(define-instruction op-or (_ destination source _)
  (zapf (register destination) (logior % (register source))))

(define-instruction op-xor (_ destination source _)
  (zapf (register destination) (logxor % (register source))))
```

This works, but aside from the name and the operation they're all completely
identical.  The next few groups of instructions are also going to be mostly
similar, so let's step back for a moment and see if we can abstract away the
tedium.

### Macro-Map

Instead of typing the same thing over and over, we'd like to just say what we
want once.  The traditional way to do this is with `macrolet`:

```lisp
(macrolet
    ((define-logical-instruction (name function)
       `(define-instruction ,name (_ destination source _)
          (zapf (register destination)
                (,function % (register source))))))
  (define-logical-instruction op-and logand)
  (define-logical-instruction op-or logior)
  (define-logical-instruction op-xor logxor))
```

Now we're only writing out the actual functionality once instead of three times.
That's better, but I'm still not satisfied.  Using `macrolet` means I need to
think of a name for the macro that I'm just going to use within this block and
throw away, and naming things is hard.

What I *really* want to do here is just "map" a macro over a bunch of arguments
and be done with it.  I've played around a bit and ended up with a macro called
`macro-map` that does just that:

```lisp
(defmacro macro-map (lambda-list items &rest body)
  (with-gensyms (macro)
    `(macrolet ((,macro ,(ensure-list lambda-list) ,@body))
      ,@(iterate (for item :in items)
                 (collect `(,macro ,@(ensure-list item)))))))
```

`macro-map` takes a lambda list, list of arguments, and a body for the macro and
builds the `macrolet` we wrote by hand earlier using a `gensym` for the name so
we don't have to waste brain cells thinking of one.  Now we can define our
logical operations all at once:

```lisp
(macro-map                                     ;; AND/OR/XOR
    (NAME    OP)
    ((op-and logand)
     (op-or  logior)
     (op-xor logxor))
  `(define-instruction ,name (_ destination source _)
    (zapf (register destination) (,op % (register source)))))
```

This actually ends up being one line of code longer than the copy/pasted version
because I like to linebreak liberally, but the important thing is that we only
say each thing we need to say once.

Notice how this almost starts to look like a table of data rather than code.
I've taken advantage of the fact that the Lisp reader reads everything as
uppercase by default and uppercased the "header" row (the lambda list) to make
it stand out a bit more.

If you squint a little bit you might imagine how defining a set of related
instructions could almost look like [a page from a CPU
manual](https://i.imgur.com/8jbXmVj.png).

### Branching

Now that we've got a way to define batches of similar instructions without going
crazy we can tackle the branching instructions.

Most CPUs have some form of "jump to some location if zero, otherwise continue"
instruction to implement branching.  The CHIP-8 needs to fit every instruction
into two bytes, so it does something a bit simpler.  Instead of an arbitrary
jump on a condition, it has a series of *skip* instructions.

For example, `SE Vx, 0` means "skip the next instruction if register X is zero".
Skipping the next instruction is a simple as incrementing the program counter by
an extra two bytes.

There are four variants of this instruction:

    SE  Vx, Immediate → Skip when Vx equals Immediate
    SNE Vx, Immediate → Skip when Vx does not equal Immediate
    SE  Vx, Vy        → Skip when Vx equals Vy
    SNE Vx, Vy        → Skip when Vx does not equal Vy

Let's use `macro-map` to write this out as a table of code:

```lisp
(macro-map                                              ;; SE/SNE
    ((NAME            TEST X-ARG  X-FORM        Y-ARG         Y-FORM)
     ((op-se-reg-imm  =    (r 1)  (register r)  (immediate 2) immediate)
      (op-sne-reg-imm not= (r 1)  (register r)  (immediate 2) immediate)
      (op-se-reg-reg  =    (rx 1) (register rx) (ry 1)        (register ry))
      (op-sne-reg-reg not= (rx 1) (register rx) (ry 1)        (register ry))))
  `(define-instruction ,name (_ ,x-arg ,y-arg)
     (when (,test ,x-form ,y-form)
       (incf program-counter 2))))
```

After the macroexpansion we end up with things like:

```lisp
(define-instruction op-sne-reg-imm (_ (r 1) (immediate 2))
  (when (not= (register r) immediate)
    (incf program-counter 2)))
```

We'll also need `not=` itself to avoid having to do messy things inside the
macro body:

```lisp
(defun-inline not= (x y)
  (not (= x y)))
```

Just one more group of instructions left (for this post).

### Loads

The final group of instructions we'll look at for now is the `LD` family of
loads.  Normally we'd implement these first, but I wanted to introduce
`macro-map` as gently as possible.

Most of the `LD` instructions simply take a value from a source and stick it
into a destination, and we can implement them as a single `setf` form:

```lisp
(macro-map                                              ;; LD
     (NAME           ARGLIST         DESTINATION   SOURCE)
     ((op-ld-i<imm   (_ (value 3))   index         value)
      (op-ld-reg<imm (_ r (value 2)) (register r)  value)
      (op-ld-reg<reg (_ rx ry _)     (register rx) (register ry))
      (op-ld-reg<dt  (_ r _ _)       (register r)  delay-timer)
      (op-ld-dt<reg  (_ r _ _)       delay-timer   (register r))
      (op-ld-st<reg  (_ r _ _)       sound-timer   (register r)))
  `(define-instruction ,name ,arglist
     (setf ,destination ,source)))
```

We haven't talked about the timers yet, so don't worry about them.  I'm leaving
them in so you can see how nice the `macro-map` is when you need to define lots
of very similar operations at once.

There are two other more interesting `LD` instructions which move data between
multiple registers and memory.

`LD [I], n` loads consecutive bytes of memory with the contents of registers
`V0` through `Vn`, starting at wherever the index register is pointing.  For
example, `LD [I], 2` would be:

<pre class="lineart">
     V0   V1   V2   V3   V4  ...
    ┌──┐ ┌──┐ ┌──┐ ┌──┐ ┌──┐
    │  │ │  │ │  │ │  │ │  │ ...
    └─┬┘ └─┬┘ └─┬┘ └──┘ └──┘
      └───┐└──┐ └─┐
          │   │   │
          ▼   ▼   ▼
    ┌───┬───┬───┬───┬───┐
    │   │   │   │   │   │   Memory
    └───┴───┴───┴───┴───┘
          ▲  ┌───┐
          └──│...│ Index Register
             └───┘
</pre>

`LD n, [I]` does the opposite: it loads the contents of memory into the
registers `V0` through `Vn`.

Because we've used Lisp arrays for both the registers and memory, these
instructions are really lovely to implement — they end up being just a single
call to `replace`:

```lisp
(define-instruction op-ld-mem<regs (_ n _ _)            ;; LD [I] < Vn
  (replace memory registers :start1 index :end2 (1+ n)))

(define-instruction op-ld-regs<mem (_ n _ _)            ;; LD Vn < [I]
  (replace registers memory :end1 (1+ n) :start2 index))
```

That's it for the `LD` instructions (for now) and for the instructions in
general!

## Future

With these instructions implemented (and with some stubs for the rest) we can
load and run a ROM and it will push bytes around in memory and mostly work.  In
the next few posts we'll look at the next steps to getting a fully-functional
emulator up and running, including:

* Graphics and input
* Sound
* Debugging

*Thanks to [James Cash](https://twitter.com/jamesnvc) for reading a draft of
this post.*