# HG changeset patch # User Steve Losh # Date 1482773320 18000 # Node ID 6b4d51f1c30dcf8b885c69506e15e9ceb53949a9 # Parent ae7bfb3acac3718b27f9b52198bad98dc642d69b Finish up disassembly entry diff -r ae7bfb3acac3 -r 6b4d51f1c30d content/blog/2016/12/chip8-debugging-infrastructure.markdown --- a/content/blog/2016/12/chip8-debugging-infrastructure.markdown Sun Dec 25 11:34:06 2016 -0500 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,198 +0,0 @@ -+++ -title = "CHIP-8 in Common Lisp: Debugging Infrastructure" -snip = "Let's figure out what the hell is going on." -date = 2016-12-31T14:50:00Z -draft = true - -+++ - -In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common -Lisp. After adding a screen, input, and sound the core of the emulator is -essentially complete. - -I've been guiding you through the code step by step and it might look pretty -simple, but that's only because I went down all the dead ends myself first. In -practice, when you're writing an emulator for a system you'll need a way to -debug the execution of code, so let's look at how to add some debugging -capabilities to our simple CHIP-8 emulator. - -The full series of posts so far: - -1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/) -2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/) -3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/) - -The full emulator source is on [BitBucket][] and [GitHub][]. - -[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8 -[BitBucket]: https://bitbucket.org/sjl/cl-chip8 -[GitHub]: https://github.com/sjl/cl-chip8 - -
- -## Disassembling - -The first thing we'll need is a way to take an instruction like `#x8055` and -turn it into something we can read. The easiest way to do this seemed to be to -copy the dispatch loop from the CPU emulator and turn it into a disassembly -function: - -```lisp -(defun disassemble-instruction (instruction) - (flet ((v (n) (symb 'v (format nil "~X" n)))) - (let ((_x__ (ldb (byte 4 8) instruction)) - (__x_ (ldb (byte 4 4) instruction)) - (___x (ldb (byte 4 0) instruction)) - (__xx (ldb (byte 8 0) instruction)) - (_xxx (ldb (byte 12 0) instruction))) - (case (logand #xF000 instruction) - (#x0000 (case instruction - (#x00E0 '(cls)) - (#x00EE '(ret)))) - (#x1000 `(jp ,_xxx)) - (#x2000 `(call ,_xxx)) - (#x3000 `(se ,(v _x__) ,__xx)) - (#x4000 `(sne ,(v _x__) ,__xx)) - (#x5000 (case (logand #x000F instruction) - (#x0 `(se ,(v _x__) ,(v __x_))))) - (#x6000 `(ld ,(v _x__) ,__xx)) - (#x7000 `(add ,(v _x__) ,__xx)) - (#x8000 (case (logand #x000F instruction) - (#x0 `(ld ,(v _x__) ,(v __x_))) - (#x1 `(or ,(v _x__) ,(v __x_))) - (#x2 `(and ,(v _x__) ,(v __x_))) - (#x3 `(xor ,(v _x__) ,(v __x_))) - (#x4 `(add ,(v _x__) ,(v __x_))) - (#x5 `(sub ,(v _x__) ,(v __x_))) - (#x6 `(shr ,(v _x__) ,(v __x_))) - (#x7 `(subn ,(v _x__) ,(v __x_))) - (#xE `(shl ,(v _x__) ,(v __x_))))) - (#x9000 (case (logand #x000F instruction) - (#x0 `(sne ,(v _x__) ,(v __x_))))) - (#xA000 `(ld i ,_xxx)) - (#xB000 `(jp ,(v 0) ,_xxx)) - (#xC000 `(rnd ,(v _x__) ,__xx)) - (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x)) - (#xE000 (case (logand #x00FF instruction) - (#x9E `(skp ,(v _x__))) - (#xA1 `(sknp ,(v _x__))))) - (#xF000 (case (logand #x00FF instruction) - (#x07 `(ld ,(v _x__) dt)) - (#x0A `(ld ,(v _x__) k)) - (#x15 `(ld dt ,(v _x__))) - (#x18 `(ld st ,(v _x__))) - (#x1E `(add i ,(v _x__))) - (#x29 `(ld f ,(v _x__))) - (#x33 `(ld b ,(v _x__))) - (#x55 `(ld (mem i) ,_x__)) - (#x65 `(ld ,_x__ (mem i))))))))) -``` - -There are a lot of other ways we could have done this, like making a proper -parser or adding functionality to `define-opcode`, but since there's not that -many instructions I think this is pretty reasonable. Now we can pass in a raw, -two-byte instruction and get out something readable: - -``` -[SBCL] CHIP8> (disassemble-instruction #x8055) -(SUB V0 V5) - -[SBCL] CHIP8> (disassemble-instruction #x4077) -(SNE V0 119) -``` - -Disassembling a single instruction will be useful, but it would also be nice to -disassemble an entire ROM at once to see what its code looks like. Let's make -a little helper function to handle that: - -```lisp -(defun dump-disassembly (array &optional (start 0) (end (length array))) - (iterate - (for i :from start :below end :by 2) - (print-disassembled-instruction array i) - (sleep 0.001))) -``` - -The `sleep` is there because Neovim's terminal seems to shit the bed if you dump -too much text at it at once. Computers are garbage. - -Other that than, `dump-disassembly` is pretty straightforward: just iterate -through the array of instructions two bytes at a time and print the information. -Let's look at the printing function now: - -```lisp -(defun print-disassembled-instruction (array index) - (destructuring-bind (address instruction disassembly) - (instruction-information array index) - (let ((*print-base* 16)) - (format t "~3,'0X: ~4,'0X ~24A~%" - address - instruction - (or disassembly ""))))) -``` - -Once again we'll delegate to a helper function. -`print-disassembled-instruction` just handles the string formatting to dump an -instruction to the screen. Running it for a single instruction would print -something like: - -``` -Address Disassembly - | | - v v -200: 8055 (SUB V0 V5) - ^ - | - Raw instruction -``` - -The helper function `instruction-information` is simple, but we'll be using it -in the future for something else, so it's nice to have: - -```lisp -(defun instruction-information (array index) - (let ((instruction (retrieve-instruction array index))) - (list index - instruction - (disassemble-instruction instruction)))) -``` - -`retrieve-instruction` is simple (for now): - -```lisp -(defun retrieve-instruction (array index) - (cat-bytes (aref array index) - (aref array (1+ index)))) -``` - -These functions *could* be combined into a single, bigger function, but I'm -a strong believer in having each function do exactly one thing only. And as -we'll see, each of these "simple" tasks is going to get more complicated in the -real world. - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` - -```lisp -``` diff -r ae7bfb3acac3 -r 6b4d51f1c30d content/blog/2016/12/chip8-disassembly.markdown --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/content/blog/2016/12/chip8-disassembly.markdown Mon Dec 26 12:28:40 2016 -0500 @@ -0,0 +1,404 @@ ++++ +title = "CHIP-8 in Common Lisp: Disassembly" +snip = "What's in a ROM?" +date = 2016-12-31T14:50:00Z +draft = true + ++++ + +In the previous posts we looked at how to emulate a [CHIP-8][] CPU with Common +Lisp. After adding a screen, input, and sound the core of the emulator is +essentially complete. + +I've been guiding you through the code step by step and it might look pretty +simple, but that's only because I went down all the dead ends myself first. In +practice, when you're writing an emulator for a system you'll need a way to +debug the execution of code, and the first step is to be able to *read* the +code, so let's look at how to add a disassembler to our simple CHIP-8 emulator. + +The full series of posts so far: + +1. [CHIP-8 in Common Lisp: The CPU](http://stevelosh.com/blog/2016/12/chip8-cpu/) +2. [CHIP-8 in Common Lisp: Graphics](http://stevelosh.com/blog/2016/12/chip8-graphics/) +3. [CHIP-8 in Common Lisp: Input](http://stevelosh.com/blog/2016/12/chip8-input/) + +The full emulator source is on [BitBucket][] and [GitHub][]. + +[CHIP-8]: https://en.wikipedia.org/wiki/CHIP-8 +[BitBucket]: https://bitbucket.org/sjl/cl-chip8 +[GitHub]: https://github.com/sjl/cl-chip8 + +
+ +## Disassembling Single Instructions + +The first thing we'll need is a way to take a single instruction like `#x8055` +and turn it into something we can read. The easiest way to do this seemed to be +to copy the dispatch loop from the CPU emulator and turn it into a disassembly +function: + +```lisp +(defun disassemble-instruction (instruction) + (flet ((v (n) (symb 'v (format nil "~X" n)))) + (let ((_x__ (ldb (byte 4 8) instruction)) + (__x_ (ldb (byte 4 4) instruction)) + (___x (ldb (byte 4 0) instruction)) + (__xx (ldb (byte 8 0) instruction)) + (_xxx (ldb (byte 12 0) instruction))) + (case (logand #xF000 instruction) + (#x0000 (case instruction + (#x00E0 '(cls)) + (#x00EE '(ret)))) + (#x1000 `(jp ,_xxx)) + (#x2000 `(call ,_xxx)) + (#x3000 `(se ,(v _x__) ,__xx)) + (#x4000 `(sne ,(v _x__) ,__xx)) + (#x5000 (case (logand #x000F instruction) + (#x0 `(se ,(v _x__) ,(v __x_))))) + (#x6000 `(ld ,(v _x__) ,__xx)) + (#x7000 `(add ,(v _x__) ,__xx)) + (#x8000 (case (logand #x000F instruction) + (#x0 `(ld ,(v _x__) ,(v __x_))) + (#x1 `(or ,(v _x__) ,(v __x_))) + (#x2 `(and ,(v _x__) ,(v __x_))) + (#x3 `(xor ,(v _x__) ,(v __x_))) + (#x4 `(add ,(v _x__) ,(v __x_))) + (#x5 `(sub ,(v _x__) ,(v __x_))) + (#x6 `(shr ,(v _x__) ,(v __x_))) + (#x7 `(subn ,(v _x__) ,(v __x_))) + (#xE `(shl ,(v _x__) ,(v __x_))))) + (#x9000 (case (logand #x000F instruction) + (#x0 `(sne ,(v _x__) ,(v __x_))))) + (#xA000 `(ld i ,_xxx)) + (#xB000 `(jp ,(v 0) ,_xxx)) + (#xC000 `(rnd ,(v _x__) ,__xx)) + (#xD000 `(drw ,(v _x__) ,(v __x_) ,___x)) + (#xE000 (case (logand #x00FF instruction) + (#x9E `(skp ,(v _x__))) + (#xA1 `(sknp ,(v _x__))))) + (#xF000 (case (logand #x00FF instruction) + (#x07 `(ld ,(v _x__) dt)) + (#x0A `(ld ,(v _x__) k)) + (#x15 `(ld dt ,(v _x__))) + (#x18 `(ld st ,(v _x__))) + (#x1E `(add i ,(v _x__))) + (#x29 `(ld f ,(v _x__))) + (#x33 `(ld b ,(v _x__))) + (#x55 `(ld (mem i) ,_x__)) + (#x65 `(ld ,_x__ (mem i))))))))) +``` + +There are a lot of other ways we could have done this, like making a proper +parser or adding functionality to `define-opcode`, but since there's not that +many instructions I think this is pretty reasonable. Now we can pass in a raw, +two-byte instruction and get out something readable: + +``` +[SBCL] CHIP8> (disassemble-instruction #x8055) +(SUB V0 V5) + +[SBCL] CHIP8> (disassemble-instruction #x4077) +(SNE V0 119) +``` + +## Disassembling Entire ROMs + +Disassembling a single instruction will be useful, but it would also be nice to +disassemble an entire ROM at once to see what its code looks like. Let's make +a little helper function to handle that: + +```lisp +(defun dump-disassembly (array &optional (start 0) (end (length array))) + (iterate + (for i :from start :below end :by 2) + (print-disassembled-instruction array i) + (sleep 0.001))) +``` + +The `sleep` is there because Neovim's terminal seems to shit the bed if you dump +too much text at it at once. Computers are garbage. + +Other that than, `dump-disassembly` is pretty straightforward: just iterate +through the array of instructions two bytes at a time and print the information. +Let's look at the printing function now: + +```lisp +(defun print-disassembled-instruction (array index) + (destructuring-bind (address instruction disassembly) + (instruction-information array index) + (let ((*print-base* 16)) + (format t "~3,'0X: ~4,'0X ~24A~%" + address + instruction + (or disassembly ""))))) +``` + +Once again we'll delegate to a helper function. +`print-disassembled-instruction` just handles the string formatting to dump an +instruction to the screen. Running it for a single instruction would print +something like: + +``` +Address Disassembly + | | + v v +200: 8055 (SUB V0 V5) + ^ + | + Raw instruction +``` + +The helper function `instruction-information` is simple, but we'll be using it +in the future for something else, so it's nice to have: + +```lisp +(defun instruction-information (array index) + (let ((instruction (retrieve-instruction array index))) + (list index + instruction + (disassemble-instruction instruction)))) +``` + +It just takes an address and memory array and returns a list of: + +* The address +* The raw instruction at the address +* The disassembly for that instruction + +`retrieve-instruction` is simple (for now): + +```lisp +(defun retrieve-instruction (array index) + (cat-bytes (aref array index) + (aref array (1+ index)))) +``` + +These functions *could* be combined into a single, bigger function, but I'm +a strong believer in having each function do exactly one thing only. And as +we'll see, each of these "simple" tasks is going to get more complicated in the +real world. + +Now we can dump the disassembly for a ROM to see how it works: + +``` +(run "roms/ufo.rom") ; stores the current chip struct in *c* + +(dump-disassembly (chip-memory *c*) #x200 #x220) +200: A2CD (LD I 2CD) +202: 6938 (LD V9 38) +204: 6A08 (LD VA 8) +206: D9A3 (DRW V9 VA 3) +208: A2D0 (LD I 2D0) +20A: 6B00 (LD VB 0) +20C: 6C03 (LD VC 3) +20E: DBC3 (DRW VB VC 3) +210: A2D6 (LD I 2D6) +212: 641D (LD V4 1D) +214: 651F (LD V5 1F) +216: D451 (DRW V4 V5 1) +218: 6700 (LD V7 0) +21A: 680F (LD V8 F) +21C: 22A2 (CALL 2A2) +21E: 22AC (CALL 2AC) +``` + +## Sprites + +Take a look at `print-disassembled-instruction` again: + +```lisp +(defun print-disassembled-instruction (array index) + (destructuring-bind (address instruction disassembly) + (instruction-information array index) + (let ((*print-base* 16)) + (format t "~3,'0X: ~4,'0X ~24A~%" + address + instruction + (or disassembly ""))))) +``` + +Notice how we used `(or disassembly "")` instead of just passing in the +disassembly. If you look back at `disassemble-instruction` you'll see it uses +normal `case` statements, not `ecase`, so if the instruction doesn't match any +valid opcodes it will return `nil`. + +The CHIP-8 (like [most computers][neumann]) uses the same memory to hold both +program code (instructions) and data. This includes things like player health, +score, location, and most importantly: the sprites that will be drawn on the +screen. + +[neumann]: https://en.wikipedia.org/wiki/Von_Neumann_architecture + +Unfortunately there's no way to know for sure whether a given memory location +contains an instruction (and thus needs to be disassembled) or is intended to be +a piece of data. Indeed, someone like [Mel][] could conceivably figure out +a way to use a particular sequence of instructions as a sprite! So our +disassembler will just always show the disassembly for anything that *might* be +an instruction. + +[Mel]: http://www.catb.org/jargon/html/story-of-mel.html + +But with that said, we can probably make some educated guesses. If we had some +way to visualize what a hunk of memory would look like *if* it were rendered as +a sprite, we could probably figure out where most of the program's sprites are +kept. It's unlikely that any given sequence of instructions would just *happen* +to look like a ghost from Pac Man or something. + +We could add a separate function to draw out the sprite data, but the CHIP-8's +sprites are so simple that we can just tack it on to the disassembly output. + +[Remember][cowgod-disp] that each byte of memory defines one eight-pixel-wide +row of a sprite, and that `DRW X, Y, Size` will draw `Size` rows of a sprite +using contiguous bytes in memory. So if memory contains something like this (at +the location specified by the index register): + +[cowgod-disp]: http://devernay.free.fr/hacks/chip8/C8TECH10.HTM#2.4 + +``` +Address Data +#x300 #b11110000 +#x301 #b00010000 +#x302 #b11110000 +#x303 #b00010000 +#x304 #b11110000 +``` + +A `DRW X, Y, 5` instruction would draw a `3` sprite to the screen: + +
+    ████
+       █
+    ████
+       █
+    ████
+
+ +It would be trivial to simply render the bits of any given instruction as spaces +and some other ASCII character and tack it onto the end of the disassembly, but +there's a snag: instructions are *two* bytes each, but each row in a sprite is +*one* byte long. Our sprites would get pretty mangled if we printed two of +their rows per line of disassembly — for example, `4` would look like this: + +``` + byte 1 byte 2 + 1111111122222222 +064: 9090 (SNE V0 V9) █ █ █ █ +066: F010 ████ █ +068: 10F0 (JP F0) █ +``` + +Not ideal. One option would be to make every instruction of disassembly two +lines long, but that's painful to read when trying to look at the code portions +of the ROM. + +We can get around this with a delightful little hack: using characters from +[Unicode Block Elements][block] to cram two rows of sprite data into a single +line of output. Let's start by defining a `bit-diagram` function that will take +a two-byte-wide integer and return an ASCII diagram of its bits: + +[block]: https://en.wikipedia.org/wiki/Block_Elements + +```lisp +(defun bit-diagram (integer) + (iterate (for high-bit :from 15 :downto 8) + (for low-bit :from 7 :downto 0) + (for hi = (logbitp high-bit integer)) + (for lo = (logbitp low-bit integer)) + (collect (cond + ((and hi lo) #\full_block) + (hi #\upper_half_block) + (lo #\lower_half_block) + (t #\space)) + :result-type 'string))) +``` + +```lisp +; Example rows of sprite data: +; 11110000 +; 11001100 + +(bit-diagram #b1111000011001100) +"██▀▀▄▄ " +``` + +Now that we've got this we can easily add it into our disassembly functions: + +```lisp +(defun instruction-information (array index) + (let ((instruction (retrieve-instruction array index))) + (list index + instruction + (disassemble-instruction instruction) + (bit-diagram instruction)))) ; NEW + +(defun print-disassembled-instruction (array index) + (destructuring-bind (address instruction disassembly bits) + (instruction-information array index) + (let ((*print-base* 16)) + ; NEW + (format t "~3,'0X: ~4,'0X ~24A ~8A~%" + address + instruction + (or disassembly "") + bits)))) ; NEW +``` + +Now when we dump the disassembly of a ROM we'll also see what each instruction +would look like if drawn as a sprite. For program code this will tend to look +like garbage (unless some crazy person has managed to write code that also works +as sprites): + +
+    200: A2CD (LD I 2CD)               █▄▀ ▄▄▀▄
+    202: 6938 (LD V9 38)                ▀█▄█  ▀
+    204: 6A08 (LD VA 8)                 ▀▀ █ ▀
+    206: D9A3 (DRW V9 VA 3)            █▀▄▀▀ ▄█
+    208: A2D0 (LD I 2D0)               █▄▀▄  ▀
+    20A: 6B00 (LD VB 0)                 ▀▀ ▀ ▀▀
+    20C: 6C03 (LD VC 3)                 ▀▀ ▀▀▄▄
+    20E: DBC3 (DRW VB VC 3)            ██ ▀▀ ██
+
+ +But when we look at areas of the ROM that *do* contain sprites, they look pretty +recognizable: + +
+    050: F090                          █▀▀█
+    052: 9090 (SNE V0 V9)              █  █
+    054: F020                          ▀▀█▀
+    056: 6020 (LD V0 20)                ▀█
+    058: 2070 (CALL 70)                 ▄█▄
+    05A: F010                          ▀▀▀█
+    05C: F080                          █▀▀▀
+    05E: F0F0                          ████
+    060: 10F0 (JP F0)                  ▄▄▄█
+    062: 10F0 (JP F0)                  ▄▄▄█
+    064: 9090 (SNE V0 V9)              █  █
+    066: F010                          ▀▀▀█
+    068: 10F0 (JP F0)                  ▄▄▄█
+    06A: 80F0 (LD V0 VF)               █▄▄▄
+    06C: 10F0 (JP F0)                  ▄▄▄█
+    06E: F080                          █▀▀▀
+
+ +Human eyes are pretty good at picking out patterns, so when you're scrolling +through a disassembled ROM it's pretty easy to tell which sections are sprites +and which are data, even if it's not perfectly rendered. + +## Result + +We've now got a way to dump the disassembly of a ROM to see what its code and +data look like. + +We can also inspect the rest of our emulator's state at runtime with NREPL or +SLIME by running things like `(chip-program-counter *c*)`. + +## Future + +Manually querying for information and dumping the disassembly isn't very +ergonomic, so in the future we'll look at adding: + +* Debugging infrastructure like pausing and breakpoints +* A graphical debugger/disassembly viewer + +As well as a few other niceties like menus for loading ROMs, etc.