--- a/content/blog/2016/12/chip8-debugging-infrastructure.markdown Sun Dec 25 11:34:06 2016 -0500
+++ /dev/null Thu Jan 01 00:00:00 1970 +0000
@@ -1,198 +0,0 @@
-+++
-title = "CHIP-8 in Common Lisp: Debugging Infrastructure"
-snip = "Let's figure out what the hell is going on."
-date = 2016-12-31T14:50:00Z
-draft = true
-
-+++
-
-In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common
-Lisp. After adding a screen, input, and sound the core of the emulator is
-essentially complete.
-
-I've been guiding you through the code step by step and it might look pretty
-simple, but that's only because I went down all the dead ends myself first. In
-practice, when you're writing an emulator for a system you'll need a way to
-debug the execution of code, so let's look at how to add some debugging
-capabilities to our simple CHIP-8 emulator.
-
-The full series of posts so far:
-
-1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/)
-2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/)
-3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/)
-
-The full emulator source is on [BitBucket][] and [GitHub][].
-
-[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8
-[BitBucket]: https://bitbucket.org/sjl/cl-chip8
-[GitHub]: https://github.com/sjl/cl-chip8
-
-<div id="toc"></div>
-
-## Disassembling
-
-The first thing we'll need is a way to take an instruction like `#x8055` and
-turn it into something we can read. The easiest way to do this seemed to be to
-copy the dispatch loop from the CPU emulator and turn it into a disassembly
-function:
-
-```lisp
-(defun disassemble-instruction (instruction)
- (flet ((v (n) (symb 'v (format nil "~X" n))))
- (let ((_x__ (ldb (byte 4 8) instruction))
- (__x_ (ldb (byte 4 4) instruction))
- (___x (ldb (byte 4 0) instruction))
- (__xx (ldb (byte 8 0) instruction))
- (_xxx (ldb (byte 12 0) instruction)))
- (case (logand #xF000 instruction)
- (#x0000 (case instruction
- (#x00E0 '(cls))
- (#x00EE '(ret))))
- (#x1000 `(jp ,_xxx))
- (#x2000 `(call ,_xxx))
- (#x3000 `(se ,(v _x__) ,__xx))
- (#x4000 `(sne ,(v _x__) ,__xx))
- (#x5000 (case (logand #x000F instruction)
- (#x0 `(se ,(v _x__) ,(v __x_)))))
- (#x6000 `(ld ,(v _x__) ,__xx))
- (#x7000 `(add ,(v _x__) ,__xx))
- (#x8000 (case (logand #x000F instruction)
- (#x0 `(ld ,(v _x__) ,(v __x_)))
- (#x1 `(or ,(v _x__) ,(v __x_)))
- (#x2 `(and ,(v _x__) ,(v __x_)))
- (#x3 `(xor ,(v _x__) ,(v __x_)))
- (#x4 `(add ,(v _x__) ,(v __x_)))
- (#x5 `(sub ,(v _x__) ,(v __x_)))
- (#x6 `(shr ,(v _x__) ,(v __x_)))
- (#x7 `(subn ,(v _x__) ,(v __x_)))
- (#xE `(shl ,(v _x__) ,(v __x_)))))
- (#x9000 (case (logand #x000F instruction)
- (#x0 `(sne ,(v _x__) ,(v __x_)))))
- (#xA000 `(ld i ,_xxx))
- (#xB000 `(jp ,(v 0) ,_xxx))
- (#xC000 `(rnd ,(v _x__) ,__xx))
- (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
- (#xE000 (case (logand #x00FF instruction)
- (#x9E `(skp ,(v _x__)))
- (#xA1 `(sknp ,(v _x__)))))
- (#xF000 (case (logand #x00FF instruction)
- (#x07 `(ld ,(v _x__) dt))
- (#x0A `(ld ,(v _x__) k))
- (#x15 `(ld dt ,(v _x__)))
- (#x18 `(ld st ,(v _x__)))
- (#x1E `(add i ,(v _x__)))
- (#x29 `(ld f ,(v _x__)))
- (#x33 `(ld b ,(v _x__)))
- (#x55 `(ld (mem i) ,_x__))
- (#x65 `(ld ,_x__ (mem i)))))))))
-```
-
-There are a lot of other ways we could have done this, like making a proper
-parser or adding functionality to `define-opcode`, but since there's not that
-many instructions I think this is pretty reasonable. Now we can pass in a raw,
-two-byte instruction and get out something readable:
-
-```
-[SBCL] CHIP8> (disassemble-instruction #x8055)
-(SUB V0 V5)
-
-[SBCL] CHIP8> (disassemble-instruction #x4077)
-(SNE V0 119)
-```
-
-Disassembling a single instruction will be useful, but it would also be nice to
-disassemble an entire ROM at once to see what its code looks like. Let's make
-a little helper function to handle that:
-
-```lisp
-(defun dump-disassembly (array &optional (start 0) (end (length array)))
- (iterate
- (for i :from start :below end :by 2)
- (print-disassembled-instruction array i)
- (sleep 0.001)))
-```
-
-The `sleep` is there because Neovim's terminal seems to shit the bed if you dump
-too much text at it at once. Computers are garbage.
-
-Other that than, `dump-disassembly` is pretty straightforward: just iterate
-through the array of instructions two bytes at a time and print the information.
-Let's look at the printing function now:
-
-```lisp
-(defun print-disassembled-instruction (array index)
- (destructuring-bind (address instruction disassembly)
- (instruction-information array index)
- (let ((*print-base* 16))
- (format t "~3,'0X: ~4,'0X ~24A~%"
- address
- instruction
- (or disassembly "")))))
-```
-
-Once again we'll delegate to a helper function.
-`print-disassembled-instruction` just handles the string formatting to dump an
-instruction to the screen. Running it for a single instruction would print
-something like:
-
-```
-Address Disassembly
- | |
- v v
-200: 8055 (SUB V0 V5)
- ^
- |
- Raw instruction
-```
-
-The helper function `instruction-information` is simple, but we'll be using it
-in the future for something else, so it's nice to have:
-
-```lisp
-(defun instruction-information (array index)
- (let ((instruction (retrieve-instruction array index)))
- (list index
- instruction
- (disassemble-instruction instruction))))
-```
-
-`retrieve-instruction` is simple (for now):
-
-```lisp
-(defun retrieve-instruction (array index)
- (cat-bytes (aref array index)
- (aref array (1+ index))))
-```
-
-These functions *could* be combined into a single, bigger function, but I'm
-a strong believer in having each function do exactly one thing only. And as
-we'll see, each of these "simple" tasks is going to get more complicated in the
-real world.
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/content/blog/2016/12/chip8-disassembly.markdown Mon Dec 26 12:28:40 2016 -0500
@@ -0,0 +1,404 @@
++++
+title = "CHIP-8 in Common Lisp: Disassembly"
+snip = "What's in a ROM?"
+date = 2016-12-31T14:50:00Z
+draft = true
+
++++
+
+In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common
+Lisp. After adding a screen, input, and sound the core of the emulator is
+essentially complete.
+
+I've been guiding you through the code step by step and it might look pretty
+simple, but that's only because I went down all the dead ends myself first. In
+practice, when you're writing an emulator for a system you'll need a way to
+debug the execution of code, and the first step is to be able to *read* the
+code, so let's look at how to add a disassembler to our simple CHIP-8 emulator.
+
+The full series of posts so far:
+
+1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/)
+2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/)
+3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/)
+
+The full emulator source is on [BitBucket][] and [GitHub][].
+
+[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8
+[BitBucket]: https://bitbucket.org/sjl/cl-chip8
+[GitHub]: https://github.com/sjl/cl-chip8
+
+<div id="toc"></div>
+
+## Disassembling Single Instructions
+
+The first thing we'll need is a way to take a single instruction like `#x8055`
+and turn it into something we can read. The easiest way to do this seemed to be
+to copy the dispatch loop from the CPU emulator and turn it into a disassembly
+function:
+
+```lisp
+(defun disassemble-instruction (instruction)
+ (flet ((v (n) (symb 'v (format nil "~X" n))))
+ (let ((_x__ (ldb (byte 4 8) instruction))
+ (__x_ (ldb (byte 4 4) instruction))
+ (___x (ldb (byte 4 0) instruction))
+ (__xx (ldb (byte 8 0) instruction))
+ (_xxx (ldb (byte 12 0) instruction)))
+ (case (logand #xF000 instruction)
+ (#x0000 (case instruction
+ (#x00E0 '(cls))
+ (#x00EE '(ret))))
+ (#x1000 `(jp ,_xxx))
+ (#x2000 `(call ,_xxx))
+ (#x3000 `(se ,(v _x__) ,__xx))
+ (#x4000 `(sne ,(v _x__) ,__xx))
+ (#x5000 (case (logand #x000F instruction)
+ (#x0 `(se ,(v _x__) ,(v __x_)))))
+ (#x6000 `(ld ,(v _x__) ,__xx))
+ (#x7000 `(add ,(v _x__) ,__xx))
+ (#x8000 (case (logand #x000F instruction)
+ (#x0 `(ld ,(v _x__) ,(v __x_)))
+ (#x1 `(or ,(v _x__) ,(v __x_)))
+ (#x2 `(and ,(v _x__) ,(v __x_)))
+ (#x3 `(xor ,(v _x__) ,(v __x_)))
+ (#x4 `(add ,(v _x__) ,(v __x_)))
+ (#x5 `(sub ,(v _x__) ,(v __x_)))
+ (#x6 `(shr ,(v _x__) ,(v __x_)))
+ (#x7 `(subn ,(v _x__) ,(v __x_)))
+ (#xE `(shl ,(v _x__) ,(v __x_)))))
+ (#x9000 (case (logand #x000F instruction)
+ (#x0 `(sne ,(v _x__) ,(v __x_)))))
+ (#xA000 `(ld i ,_xxx))
+ (#xB000 `(jp ,(v 0) ,_xxx))
+ (#xC000 `(rnd ,(v _x__) ,__xx))
+ (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
+ (#xE000 (case (logand #x00FF instruction)
+ (#x9E `(skp ,(v _x__)))
+ (#xA1 `(sknp ,(v _x__)))))
+ (#xF000 (case (logand #x00FF instruction)
+ (#x07 `(ld ,(v _x__) dt))
+ (#x0A `(ld ,(v _x__) k))
+ (#x15 `(ld dt ,(v _x__)))
+ (#x18 `(ld st ,(v _x__)))
+ (#x1E `(add i ,(v _x__)))
+ (#x29 `(ld f ,(v _x__)))
+ (#x33 `(ld b ,(v _x__)))
+ (#x55 `(ld (mem i) ,_x__))
+ (#x65 `(ld ,_x__ (mem i)))))))))
+```
+
+There are a lot of other ways we could have done this, like making a proper
+parser or adding functionality to `define-opcode`, but since there's not that
+many instructions I think this is pretty reasonable. Now we can pass in a raw,
+two-byte instruction and get out something readable:
+
+```
+[SBCL] CHIP8> (disassemble-instruction #x8055)
+(SUB V0 V5)
+
+[SBCL] CHIP8> (disassemble-instruction #x4077)
+(SNE V0 119)
+```
+
+## Disassembling Entire ROMs
+
+Disassembling a single instruction will be useful, but it would also be nice to
+disassemble an entire ROM at once to see what its code looks like. Let's make
+a little helper function to handle that:
+
+```lisp
+(defun dump-disassembly (array &optional (start 0) (end (length array)))
+ (iterate
+ (for i :from start :below end :by 2)
+ (print-disassembled-instruction array i)
+ (sleep 0.001)))
+```
+
+The `sleep` is there because Neovim's terminal seems to shit the bed if you dump
+too much text at it at once. Computers are garbage.
+
+Other that than, `dump-disassembly` is pretty straightforward: just iterate
+through the array of instructions two bytes at a time and print the information.
+Let's look at the printing function now:
+
+```lisp
+(defun print-disassembled-instruction (array index)
+ (destructuring-bind (address instruction disassembly)
+ (instruction-information array index)
+ (let ((*print-base* 16))
+ (format t "~3,'0X: ~4,'0X ~24A~%"
+ address
+ instruction
+ (or disassembly "")))))
+```
+
+Once again we'll delegate to a helper function.
+`print-disassembled-instruction` just handles the string formatting to dump an
+instruction to the screen. Running it for a single instruction would print
+something like:
+
+```
+Address Disassembly
+ | |
+ v v
+200: 8055 (SUB V0 V5)
+ ^
+ |
+ Raw instruction
+```
+
+The helper function `instruction-information` is simple, but we'll be using it
+in the future for something else, so it's nice to have:
+
+```lisp
+(defun instruction-information (array index)
+ (let ((instruction (retrieve-instruction array index)))
+ (list index
+ instruction
+ (disassemble-instruction instruction))))
+```
+
+It just takes an address and memory array and returns a list of:
+
+* The address
+* The raw instruction at the address
+* The disassembly for that instruction
+
+`retrieve-instruction` is simple (for now):
+
+```lisp
+(defun retrieve-instruction (array index)
+ (cat-bytes (aref array index)
+ (aref array (1+ index))))
+```
+
+These functions *could* be combined into a single, bigger function, but I'm
+a strong believer in having each function do exactly one thing only. And as
+we'll see, each of these "simple" tasks is going to get more complicated in the
+real world.
+
+Now we can dump the disassembly for a ROM to see how it works:
+
+```
+(run "roms/ufo.rom") ; stores the current chip struct in *c*
+
+(dump-disassembly (chip-memory *c*) #x200 #x220)
+200: A2CD (LD I 2CD)
+202: 6938 (LD V9 38)
+204: 6A08 (LD VA 8)
+206: D9A3 (DRW V9 VA 3)
+208: A2D0 (LD I 2D0)
+20A: 6B00 (LD VB 0)
+20C: 6C03 (LD VC 3)
+20E: DBC3 (DRW VB VC 3)
+210: A2D6 (LD I 2D6)
+212: 641D (LD V4 1D)
+214: 651F (LD V5 1F)
+216: D451 (DRW V4 V5 1)
+218: 6700 (LD V7 0)
+21A: 680F (LD V8 F)
+21C: 22A2 (CALL 2A2)
+21E: 22AC (CALL 2AC)
+```
+
+## Sprites
+
+Take a look at `print-disassembled-instruction` again:
+
+```lisp
+(defun print-disassembled-instruction (array index)
+ (destructuring-bind (address instruction disassembly)
+ (instruction-information array index)
+ (let ((*print-base* 16))
+ (format t "~3,'0X: ~4,'0X ~24A~%"
+ address
+ instruction
+ (or disassembly "")))))
+```
+
+Notice how we used `(or disassembly "")` instead of just passing in the
+disassembly. If you look back at `disassemble-instruction` you'll see it uses
+normal `case` statements, not `ecase`, so if the instruction doesn't match any
+valid opcodes it will return `nil`.
+
+The CHIP-8 (like [most computers][neumann]) uses the same memory to hold both
+program code (instructions) and data. This includes things like player health,
+score, location, and most importantly: the sprites that will be drawn on the
+screen.
+
+[neumann]: https://en.wikipedia.org/wiki/Von_Neumann_architecture
+
+Unfortunately there's no way to know for sure whether a given memory location
+contains an instruction (and thus needs to be disassembled) or is intended to be
+a piece of data. Indeed, someone like [Mel][] could conceivably figure out
+a way to use a particular sequence of instructions as a sprite! So our
+disassembler will just always show the disassembly for anything that *might* be
+an instruction.
+
+[Mel]: http://www.catb.org/jargon/html/story-of-mel.html
+
+But with that said, we can probably make some educated guesses. If we had some
+way to visualize what a hunk of memory would look like *if* it were rendered as
+a sprite, we could probably figure out where most of the program's sprites are
+kept. It's unlikely that any given sequence of instructions would just *happen*
+to look like a ghost from Pac Man or something.
+
+We could add a separate function to draw out the sprite data, but the CHIP-8's
+sprites are so simple that we can just tack it on to the disassembly output.
+
+[Remember][cowgod-disp] that each byte of memory defines one eight-pixel-wide
+row of a sprite, and that `DRW X, Y, Size` will draw `Size` rows of a sprite
+using contiguous bytes in memory. So if memory contains something like this (at
+the location specified by the index register):
+
+[cowgod-disp]: http://devernay.free.fr/hacks/chip8/C8TECH10.HTM#2.4
+
+```
+Address Data
+#x300 #b11110000
+#x301 #b00010000
+#x302 #b11110000
+#x303 #b00010000
+#x304 #b11110000
+```
+
+A `DRW X, Y, 5` instruction would draw a `3` sprite to the screen:
+
+<pre class="lineart">
+ ████
+ █
+ ████
+ █
+ ████
+</pre>
+
+It would be trivial to simply render the bits of any given instruction as spaces
+and some other ASCII character and tack it onto the end of the disassembly, but
+there's a snag: instructions are *two* bytes each, but each row in a sprite is
+*one* byte long. Our sprites would get pretty mangled if we printed two of
+their rows per line of disassembly — for example, `4` would look like this:
+
+```
+ byte 1 byte 2
+ 1111111122222222
+064: 9090 (SNE V0 V9) █ █ █ █
+066: F010 ████ █
+068: 10F0 (JP F0) █
+```
+
+Not ideal. One option would be to make every instruction of disassembly two
+lines long, but that's painful to read when trying to look at the code portions
+of the ROM.
+
+We can get around this with a delightful little hack: using characters from
+[Unicode Block Elements][block] to cram two rows of sprite data into a single
+line of output. Let's start by defining a `bit-diagram` function that will take
+a two-byte-wide integer and return an ASCII diagram of its bits:
+
+[block]: https://en.wikipedia.org/wiki/Block_Elements
+
+```lisp
+(defun bit-diagram (integer)
+ (iterate (for high-bit :from 15 :downto 8)
+ (for low-bit :from 7 :downto 0)
+ (for hi = (logbitp high-bit integer))
+ (for lo = (logbitp low-bit integer))
+ (collect (cond
+ ((and hi lo) #\full_block)
+ (hi #\upper_half_block)
+ (lo #\lower_half_block)
+ (t #\space))
+ :result-type 'string)))
+```
+
+```lisp
+; Example rows of sprite data:
+; 11110000
+; 11001100
+
+(bit-diagram #b1111000011001100)
+"██▀▀▄▄ "
+```
+
+Now that we've got this we can easily add it into our disassembly functions:
+
+```lisp
+(defun instruction-information (array index)
+ (let ((instruction (retrieve-instruction array index)))
+ (list index
+ instruction
+ (disassemble-instruction instruction)
+ (bit-diagram instruction)))) ; NEW
+
+(defun print-disassembled-instruction (array index)
+ (destructuring-bind (address instruction disassembly bits)
+ (instruction-information array index)
+ (let ((*print-base* 16))
+ ; NEW
+ (format t "~3,'0X: ~4,'0X ~24A ~8A~%"
+ address
+ instruction
+ (or disassembly "")
+ bits)))) ; NEW
+```
+
+Now when we dump the disassembly of a ROM we'll also see what each instruction
+would look like if drawn as a sprite. For program code this will tend to look
+like garbage (unless some crazy person has managed to write code that also works
+as sprites):
+
+<pre class="lineart" style="line-height: 1.0 !important;">
+ 200: A2CD (LD I 2CD) █▄▀ ▄▄▀▄
+ 202: 6938 (LD V9 38) ▀█▄█ ▀
+ 204: 6A08 (LD VA 8) ▀▀ █ ▀
+ 206: D9A3 (DRW V9 VA 3) █▀▄▀▀ ▄█
+ 208: A2D0 (LD I 2D0) █▄▀▄ ▀
+ 20A: 6B00 (LD VB 0) ▀▀ ▀ ▀▀
+ 20C: 6C03 (LD VC 3) ▀▀ ▀▀▄▄
+ 20E: DBC3 (DRW VB VC 3) ██ ▀▀ ██
+</pre>
+
+But when we look at areas of the ROM that *do* contain sprites, they look pretty
+recognizable:
+
+<pre class="lineart" style="line-height: 1.0 !important;">
+ 050: F090 █▀▀█
+ 052: 9090 (SNE V0 V9) █ █
+ 054: F020 ▀▀█▀
+ 056: 6020 (LD V0 20) ▀█
+ 058: 2070 (CALL 70) ▄█▄
+ 05A: F010 ▀▀▀█
+ 05C: F080 █▀▀▀
+ 05E: F0F0 ████
+ 060: 10F0 (JP F0) ▄▄▄█
+ 062: 10F0 (JP F0) ▄▄▄█
+ 064: 9090 (SNE V0 V9) █ █
+ 066: F010 ▀▀▀█
+ 068: 10F0 (JP F0) ▄▄▄█
+ 06A: 80F0 (LD V0 VF) █▄▄▄
+ 06C: 10F0 (JP F0) ▄▄▄█
+ 06E: F080 █▀▀▀
+</pre>
+
+Human eyes are pretty good at picking out patterns, so when you're scrolling
+through a disassembled ROM it's pretty easy to tell which sections are sprites
+and which are data, even if it's not perfectly rendered.
+
+## Result
+
+We've now got a way to dump the disassembly of a ROM to see what its code and
+data look like.
+
+We can also inspect the rest of our emulator's state at runtime with NREPL or
+SLIME by running things like `(chip-program-counter *c*)`.
+
+## Future
+
+Manually querying for information and dumping the disassembly isn't very
+ergonomic, so in the future we'll look at adding:
+
+* Debugging infrastructure like pausing and breakpoints
+* A graphical debugger/disassembly viewer
+
+As well as a few other niceties like menus for loading ROMs, etc.