6b4d51f1c30d

Finish up disassembly entry
[view raw] [browse files]
author Steve Losh <steve@stevelosh.com>
date Mon, 26 Dec 2016 12:28:40 -0500
parents ae7bfb3acac3
children 2e6d03f14eb7
branches/tags (none)
files content/blog/2016/12/chip8-debugging-infrastructure.markdown content/blog/2016/12/chip8-disassembly.markdown

Changes

--- a/content/blog/2016/12/chip8-debugging-infrastructure.markdown	Sun Dec 25 11:34:06 2016 -0500
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,198 +0,0 @@
-+++
-title = "CHIP-8 in Common Lisp: Debugging Infrastructure"
-snip = "Let's figure out what the hell is going on."
-date = 2016-12-31T14:50:00Z
-draft = true
-
-+++
-
-In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common
-Lisp.  After adding a screen, input, and sound the core of the emulator is
-essentially complete.
-
-I've been guiding you through the code step by step and it might look pretty
-simple, but that's only because I went down all the dead ends myself first.  In
-practice, when you're writing an emulator for a system you'll need a way to
-debug the execution of code, so let's look at how to add some debugging
-capabilities to our simple CHIP-8 emulator.
-
-The full series of posts so far:
-
-1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/)
-2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/)
-3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/)
-
-The full emulator source is on [BitBucket][] and [GitHub][].
-
-[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8
-[BitBucket]: https://bitbucket.org/sjl/cl-chip8
-[GitHub]: https://github.com/sjl/cl-chip8
-
-<div id="toc"></div>
-
-## Disassembling
-
-The first thing we'll need is a way to take an instruction like `#x8055` and
-turn it into something we can read.  The easiest way to do this seemed to be to
-copy the dispatch loop from the CPU emulator and turn it into a disassembly
-function:
-
-```lisp
-(defun disassemble-instruction (instruction)
-  (flet ((v (n) (symb 'v (format nil "~X" n))))
-    (let ((_x__ (ldb (byte 4 8) instruction))
-          (__x_ (ldb (byte 4 4) instruction))
-          (___x (ldb (byte 4 0) instruction))
-          (__xx (ldb (byte 8 0) instruction))
-          (_xxx (ldb (byte 12 0) instruction)))
-      (case (logand #xF000 instruction)
-        (#x0000 (case instruction
-                  (#x00E0 '(cls))
-                  (#x00EE '(ret))))
-        (#x1000 `(jp ,_xxx))
-        (#x2000 `(call ,_xxx))
-        (#x3000 `(se ,(v _x__) ,__xx))
-        (#x4000 `(sne ,(v _x__) ,__xx))
-        (#x5000 (case (logand #x000F instruction)
-                  (#x0 `(se ,(v _x__) ,(v __x_)))))
-        (#x6000 `(ld ,(v _x__) ,__xx))
-        (#x7000 `(add ,(v _x__) ,__xx))
-        (#x8000 (case (logand #x000F instruction)
-                  (#x0 `(ld ,(v _x__) ,(v __x_)))
-                  (#x1 `(or ,(v _x__) ,(v __x_)))
-                  (#x2 `(and ,(v _x__) ,(v __x_)))
-                  (#x3 `(xor ,(v _x__) ,(v __x_)))
-                  (#x4 `(add ,(v _x__) ,(v __x_)))
-                  (#x5 `(sub ,(v _x__) ,(v __x_)))
-                  (#x6 `(shr ,(v _x__) ,(v __x_)))
-                  (#x7 `(subn ,(v _x__) ,(v __x_)))
-                  (#xE `(shl ,(v _x__) ,(v __x_)))))
-        (#x9000 (case (logand #x000F instruction)
-                  (#x0 `(sne ,(v _x__) ,(v __x_)))))
-        (#xA000 `(ld i ,_xxx))
-        (#xB000 `(jp ,(v 0) ,_xxx))
-        (#xC000 `(rnd ,(v _x__) ,__xx))
-        (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
-        (#xE000 (case (logand #x00FF instruction)
-                  (#x9E `(skp ,(v _x__)))
-                  (#xA1 `(sknp ,(v _x__)))))
-        (#xF000 (case (logand #x00FF instruction)
-                  (#x07 `(ld ,(v _x__) dt))
-                  (#x0A `(ld ,(v _x__) k))
-                  (#x15 `(ld dt ,(v _x__)))
-                  (#x18 `(ld st ,(v _x__)))
-                  (#x1E `(add i ,(v _x__)))
-                  (#x29 `(ld f ,(v _x__)))
-                  (#x33 `(ld b ,(v _x__)))
-                  (#x55 `(ld (mem i) ,_x__))
-                  (#x65 `(ld ,_x__ (mem i)))))))))
-```
-
-There are a lot of other ways we could have done this, like making a proper
-parser or adding functionality to `define-opcode`, but since there's not that
-many instructions I think this is pretty reasonable.  Now we can pass in a raw,
-two-byte instruction and get out something readable:
-
-```
-[SBCL] CHIP8> (disassemble-instruction #x8055)
-(SUB V0 V5)
-
-[SBCL] CHIP8> (disassemble-instruction #x4077)
-(SNE V0 119)
-```
-
-Disassembling a single instruction will be useful, but it would also be nice to
-disassemble an entire ROM at once to see what its code looks like.  Let's make
-a little helper function to handle that:
-
-```lisp
-(defun dump-disassembly (array &optional (start 0) (end (length array)))
-  (iterate
-    (for i :from start :below end :by 2)
-    (print-disassembled-instruction array i)
-    (sleep 0.001)))
-```
-
-The `sleep` is there because Neovim's terminal seems to shit the bed if you dump
-too much text at it at once.  Computers are garbage.
-
-Other that than, `dump-disassembly` is pretty straightforward: just iterate
-through the array of instructions two bytes at a time and print the information.
-Let's look at the printing function now:
-
-```lisp
-(defun print-disassembled-instruction (array index)
-  (destructuring-bind (address instruction disassembly)
-      (instruction-information array index)
-    (let ((*print-base* 16))
-      (format t "~3,'0X: ~4,'0X ~24A~%"
-              address
-              instruction
-              (or disassembly "")))))
-```
-
-Once again we'll delegate to a helper function.
-`print-disassembled-instruction` just handles the string formatting to dump an
-instruction to the screen.  Running it for a single instruction would print
-something like:
-
-```
-Address    Disassembly
- |          |
- v          v
-200: 8055 (SUB V0 V5)
-      ^
-      |
-      Raw instruction
-```
-
-The helper function `instruction-information` is simple, but we'll be using it
-in the future for something else, so it's nice to have:
-
-```lisp
-(defun instruction-information (array index)
-  (let ((instruction (retrieve-instruction array index)))
-    (list index
-          instruction
-          (disassemble-instruction instruction))))
-```
-
-`retrieve-instruction` is simple (for now):
-
-```lisp
-(defun retrieve-instruction (array index)
-  (cat-bytes (aref array index)
-             (aref array (1+ index))))
-```
-
-These functions *could* be combined into a single, bigger function, but I'm
-a strong believer in having each function do exactly one thing only.  And as
-we'll see, each of these "simple" tasks is going to get more complicated in the
-real world.
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
-
-```lisp
-```
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/content/blog/2016/12/chip8-disassembly.markdown	Mon Dec 26 12:28:40 2016 -0500
@@ -0,0 +1,404 @@
++++
+title = "CHIP-8 in Common Lisp: Disassembly"
+snip = "What's in a ROM?"
+date = 2016-12-31T14:50:00Z
+draft = true
+
++++
+
+In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common
+Lisp.  After adding a screen, input, and sound the core of the emulator is
+essentially complete.
+
+I've been guiding you through the code step by step and it might look pretty
+simple, but that's only because I went down all the dead ends myself first.  In
+practice, when you're writing an emulator for a system you'll need a way to
+debug the execution of code, and the first step is to be able to *read* the
+code, so let's look at how to add a disassembler to our simple CHIP-8 emulator.
+
+The full series of posts so far:
+
+1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/)
+2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/)
+3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/)
+
+The full emulator source is on [BitBucket][] and [GitHub][].
+
+[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8
+[BitBucket]: https://bitbucket.org/sjl/cl-chip8
+[GitHub]: https://github.com/sjl/cl-chip8
+
+<div id="toc"></div>
+
+## Disassembling Single Instructions
+
+The first thing we'll need is a way to take a single instruction like `#x8055`
+and turn it into something we can read.  The easiest way to do this seemed to be
+to copy the dispatch loop from the CPU emulator and turn it into a disassembly
+function:
+
+```lisp
+(defun disassemble-instruction (instruction)
+  (flet ((v (n) (symb 'v (format nil "~X" n))))
+    (let ((_x__ (ldb (byte 4 8) instruction))
+          (__x_ (ldb (byte 4 4) instruction))
+          (___x (ldb (byte 4 0) instruction))
+          (__xx (ldb (byte 8 0) instruction))
+          (_xxx (ldb (byte 12 0) instruction)))
+      (case (logand #xF000 instruction)
+        (#x0000 (case instruction
+                  (#x00E0 '(cls))
+                  (#x00EE '(ret))))
+        (#x1000 `(jp ,_xxx))
+        (#x2000 `(call ,_xxx))
+        (#x3000 `(se ,(v _x__) ,__xx))
+        (#x4000 `(sne ,(v _x__) ,__xx))
+        (#x5000 (case (logand #x000F instruction)
+                  (#x0 `(se ,(v _x__) ,(v __x_)))))
+        (#x6000 `(ld ,(v _x__) ,__xx))
+        (#x7000 `(add ,(v _x__) ,__xx))
+        (#x8000 (case (logand #x000F instruction)
+                  (#x0 `(ld ,(v _x__) ,(v __x_)))
+                  (#x1 `(or ,(v _x__) ,(v __x_)))
+                  (#x2 `(and ,(v _x__) ,(v __x_)))
+                  (#x3 `(xor ,(v _x__) ,(v __x_)))
+                  (#x4 `(add ,(v _x__) ,(v __x_)))
+                  (#x5 `(sub ,(v _x__) ,(v __x_)))
+                  (#x6 `(shr ,(v _x__) ,(v __x_)))
+                  (#x7 `(subn ,(v _x__) ,(v __x_)))
+                  (#xE `(shl ,(v _x__) ,(v __x_)))))
+        (#x9000 (case (logand #x000F instruction)
+                  (#x0 `(sne ,(v _x__) ,(v __x_)))))
+        (#xA000 `(ld i ,_xxx))
+        (#xB000 `(jp ,(v 0) ,_xxx))
+        (#xC000 `(rnd ,(v _x__) ,__xx))
+        (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x))
+        (#xE000 (case (logand #x00FF instruction)
+                  (#x9E `(skp ,(v _x__)))
+                  (#xA1 `(sknp ,(v _x__)))))
+        (#xF000 (case (logand #x00FF instruction)
+                  (#x07 `(ld ,(v _x__) dt))
+                  (#x0A `(ld ,(v _x__) k))
+                  (#x15 `(ld dt ,(v _x__)))
+                  (#x18 `(ld st ,(v _x__)))
+                  (#x1E `(add i ,(v _x__)))
+                  (#x29 `(ld f ,(v _x__)))
+                  (#x33 `(ld b ,(v _x__)))
+                  (#x55 `(ld (mem i) ,_x__))
+                  (#x65 `(ld ,_x__ (mem i)))))))))
+```
+
+There are a lot of other ways we could have done this, like making a proper
+parser or adding functionality to `define-opcode`, but since there's not that
+many instructions I think this is pretty reasonable.  Now we can pass in a raw,
+two-byte instruction and get out something readable:
+
+```
+[SBCL] CHIP8> (disassemble-instruction #x8055)
+(SUB V0 V5)
+
+[SBCL] CHIP8> (disassemble-instruction #x4077)
+(SNE V0 119)
+```
+
+## Disassembling Entire ROMs
+
+Disassembling a single instruction will be useful, but it would also be nice to
+disassemble an entire ROM at once to see what its code looks like.  Let's make
+a little helper function to handle that:
+
+```lisp
+(defun dump-disassembly (array &optional (start 0) (end (length array)))
+  (iterate
+    (for i :from start :below end :by 2)
+    (print-disassembled-instruction array i)
+    (sleep 0.001)))
+```
+
+The `sleep` is there because Neovim's terminal seems to shit the bed if you dump
+too much text at it at once.  Computers are garbage.
+
+Other that than, `dump-disassembly` is pretty straightforward: just iterate
+through the array of instructions two bytes at a time and print the information.
+Let's look at the printing function now:
+
+```lisp
+(defun print-disassembled-instruction (array index)
+  (destructuring-bind (address instruction disassembly)
+      (instruction-information array index)
+    (let ((*print-base* 16))
+      (format t "~3,'0X: ~4,'0X ~24A~%"
+              address
+              instruction
+              (or disassembly "")))))
+```
+
+Once again we'll delegate to a helper function.
+`print-disassembled-instruction` just handles the string formatting to dump an
+instruction to the screen.  Running it for a single instruction would print
+something like:
+
+```
+Address    Disassembly
+ |          |
+ v          v
+200: 8055 (SUB V0 V5)
+      ^
+      |
+      Raw instruction
+```
+
+The helper function `instruction-information` is simple, but we'll be using it
+in the future for something else, so it's nice to have:
+
+```lisp
+(defun instruction-information (array index)
+  (let ((instruction (retrieve-instruction array index)))
+    (list index
+          instruction
+          (disassemble-instruction instruction))))
+```
+
+It just takes an address and memory array and returns a list of:
+
+* The address
+* The raw instruction at the address
+* The disassembly for that instruction
+
+`retrieve-instruction` is simple (for now):
+
+```lisp
+(defun retrieve-instruction (array index)
+  (cat-bytes (aref array index)
+             (aref array (1+ index))))
+```
+
+These functions *could* be combined into a single, bigger function, but I'm
+a strong believer in having each function do exactly one thing only.  And as
+we'll see, each of these "simple" tasks is going to get more complicated in the
+real world.
+
+Now we can dump the disassembly for a ROM to see how it works:
+
+```
+(run "roms/ufo.rom") ; stores the current chip struct in *c*
+
+(dump-disassembly (chip-memory *c*) #x200 #x220)
+200: A2CD (LD I 2CD)
+202: 6938 (LD V9 38)
+204: 6A08 (LD VA 8)
+206: D9A3 (DRW V9 VA 3)
+208: A2D0 (LD I 2D0)
+20A: 6B00 (LD VB 0)
+20C: 6C03 (LD VC 3)
+20E: DBC3 (DRW VB VC 3)
+210: A2D6 (LD I 2D6)
+212: 641D (LD V4 1D)
+214: 651F (LD V5 1F)
+216: D451 (DRW V4 V5 1)
+218: 6700 (LD V7 0)
+21A: 680F (LD V8 F)
+21C: 22A2 (CALL 2A2)
+21E: 22AC (CALL 2AC)
+```
+
+## Sprites
+
+Take a look at `print-disassembled-instruction` again:
+
+```lisp
+(defun print-disassembled-instruction (array index)
+  (destructuring-bind (address instruction disassembly)
+      (instruction-information array index)
+    (let ((*print-base* 16))
+      (format t "~3,'0X: ~4,'0X ~24A~%"
+              address
+              instruction
+              (or disassembly "")))))
+```
+
+Notice how we used `(or disassembly "")` instead of just passing in the
+disassembly.  If you look back at `disassemble-instruction` you'll see it uses
+normal `case` statements, not `ecase`, so if the instruction doesn't match any
+valid opcodes it will return `nil`.
+
+The CHIP-8 (like [most computers][neumann]) uses the same memory to hold both
+program code (instructions) and data.  This includes things like player health,
+score, location, and most importantly: the sprites that will be drawn on the
+screen.
+
+[neumann]: https://en.wikipedia.org/wiki/Von_Neumann_architecture
+
+Unfortunately there's no way to know for sure whether a given memory location
+contains an instruction (and thus needs to be disassembled) or is intended to be
+a piece of data.  Indeed, someone like [Mel][] could conceivably figure out
+a way to use a particular sequence of instructions as a sprite!  So our
+disassembler will just always show the disassembly for anything that *might* be
+an instruction.
+
+[Mel]: http://www.catb.org/jargon/html/story-of-mel.html
+
+But with that said, we can probably make some educated guesses.  If we had some
+way to visualize what a hunk of memory would look like *if* it were rendered as
+a sprite, we could probably figure out where most of the program's sprites are
+kept.  It's unlikely that any given sequence of instructions would just *happen*
+to look like a ghost from Pac Man or something.
+
+We could add a separate function to draw out the sprite data, but the CHIP-8's
+sprites are so simple that we can just tack it on to the disassembly output.
+
+[Remember][cowgod-disp] that each byte of memory defines one eight-pixel-wide
+row of a sprite, and that `DRW X, Y, Size` will draw `Size` rows of a sprite
+using contiguous bytes in memory.  So if memory contains something like this (at
+the location specified by the index register):
+
+[cowgod-disp]: http://devernay.free.fr/hacks/chip8/C8TECH10.HTM#2.4
+
+```
+Address Data
+#x300   #b11110000
+#x301   #b00010000
+#x302   #b11110000
+#x303   #b00010000
+#x304   #b11110000
+```
+
+A `DRW X, Y, 5` instruction would draw a `3` sprite to the screen:
+
+<pre class="lineart">
+    ████
+       █
+    ████
+       █
+    ████
+</pre>
+
+It would be trivial to simply render the bits of any given instruction as spaces
+and some other ASCII character and tack it onto the end of the disassembly, but
+there's a snag: instructions are *two* bytes each, but each row in a sprite is
+*one* byte long.  Our sprites would get pretty mangled if we printed two of
+their rows per line of disassembly — for example, `4` would look like this:
+
+```
+                                   byte 1  byte 2
+                                   1111111122222222
+064: 9090 (SNE V0 V9)              █  █    █  █
+066: F010                          ████    █
+068: 10F0 (JP F0)                     █
+```
+
+Not ideal.  One option would be to make every instruction of disassembly two
+lines long, but that's painful to read when trying to look at the code portions
+of the ROM.
+
+We can get around this with a delightful little hack: using characters from
+[Unicode Block Elements][block] to cram two rows of sprite data into a single
+line of output.  Let's start by defining a `bit-diagram` function that will take
+a two-byte-wide integer and return an ASCII diagram of its bits:
+
+[block]: https://en.wikipedia.org/wiki/Block_Elements
+
+```lisp
+(defun bit-diagram (integer)
+  (iterate (for high-bit :from 15 :downto 8)
+           (for low-bit :from 7 :downto 0)
+           (for hi = (logbitp high-bit integer))
+           (for lo = (logbitp low-bit integer))
+           (collect (cond
+                      ((and hi lo) #\full_block)
+                      (hi #\upper_half_block)
+                      (lo #\lower_half_block)
+                      (t #\space))
+                    :result-type 'string)))
+```
+
+```lisp
+; Example rows of sprite data:
+; 11110000
+; 11001100
+
+(bit-diagram #b1111000011001100)
+"██▀▀▄▄  "
+```
+
+Now that we've got this we can easily add it into our disassembly functions:
+
+```lisp
+(defun instruction-information (array index)
+  (let ((instruction (retrieve-instruction array index)))
+    (list index
+          instruction
+          (disassemble-instruction instruction)
+          (bit-diagram instruction))))           ; NEW
+
+(defun print-disassembled-instruction (array index)
+  (destructuring-bind (address instruction disassembly bits)
+      (instruction-information array index)
+    (let ((*print-base* 16))
+                                     ; NEW
+      (format t "~3,'0X: ~4,'0X ~24A ~8A~%"
+              address
+              instruction
+              (or disassembly "")
+              bits))))              ; NEW
+```
+
+Now when we dump the disassembly of a ROM we'll also see what each instruction
+would look like if drawn as a sprite.  For program code this will tend to look
+like garbage (unless some crazy person has managed to write code that also works
+as sprites):
+
+<pre class="lineart" style="line-height: 1.0 !important;">
+    200: A2CD (LD I 2CD)               █▄▀ ▄▄▀▄
+    202: 6938 (LD V9 38)                ▀█▄█  ▀
+    204: 6A08 (LD VA 8)                 ▀▀ █ ▀
+    206: D9A3 (DRW V9 VA 3)            █▀▄▀▀ ▄█
+    208: A2D0 (LD I 2D0)               █▄▀▄  ▀
+    20A: 6B00 (LD VB 0)                 ▀▀ ▀ ▀▀
+    20C: 6C03 (LD VC 3)                 ▀▀ ▀▀▄▄
+    20E: DBC3 (DRW VB VC 3)            ██ ▀▀ ██
+</pre>
+
+But when we look at areas of the ROM that *do* contain sprites, they look pretty
+recognizable:
+
+<pre class="lineart" style="line-height: 1.0 !important;">
+    050: F090                          █▀▀█
+    052: 9090 (SNE V0 V9)              █  █
+    054: F020                          ▀▀█▀
+    056: 6020 (LD V0 20)                ▀█
+    058: 2070 (CALL 70)                 ▄█▄
+    05A: F010                          ▀▀▀█
+    05C: F080                          █▀▀▀
+    05E: F0F0                          ████
+    060: 10F0 (JP F0)                  ▄▄▄█
+    062: 10F0 (JP F0)                  ▄▄▄█
+    064: 9090 (SNE V0 V9)              █  █
+    066: F010                          ▀▀▀█
+    068: 10F0 (JP F0)                  ▄▄▄█
+    06A: 80F0 (LD V0 VF)               █▄▄▄
+    06C: 10F0 (JP F0)                  ▄▄▄█
+    06E: F080                          █▀▀▀
+</pre>
+
+Human eyes are pretty good at picking out patterns, so when you're scrolling
+through a disassembled ROM it's pretty easy to tell which sections are sprites
+and which are data, even if it's not perfectly rendered.
+
+## Result
+
+We've now got a way to dump the disassembly of a ROM to see what its code and
+data look like.
+
+We can also inspect the rest of our emulator's state at runtime with NREPL or
+SLIME by running things like `(chip-program-counter *c*)`.
+
+## Future
+
+Manually querying for information and dumping the disassembly isn't very
+ergonomic, so in the future we'll look at adding:
+
+* Debugging infrastructure like pausing and breakpoints
+* A graphical debugger/disassembly viewer
+
+As well as a few other niceties like menus for loading ROMs, etc.