--- a/README.markdown	Sat Jul 09 21:57:55 2016 +0000
+++ b/README.markdown	Sun Jul 10 12:41:03 2016 +0000
@@ -3,6 +3,8 @@
 
 I'm going to try keeping a `.plan`.  Let's see how this goes.
 
+[TOC]
+
 [Bones]: http://bitbucket.org/sjl/bones
 [AMOP]: http://www.amazon.com/dp/0262610744/?tag=stelos-20
 [Masters of Doom]: http://www.amazon.com/dp/0812972155/?tag=stelos-20
@@ -546,3 +548,145 @@
 * More [Coding Math][].
 * Poked around at some basic GGP games, which revealed some bugs and memory
   problems in [Bones][], so I fixed them.
+
+### 2016-07-10
+
+* Watched the "Learning Through Failure - Portico" GDC talk.  Short but good.
+  I think the best idea in is was "players only learn through failure when they
+  *understand* what went wrong".
+* Brain dumped some stuff about [Bones][] below.
+
+#### Splitting the Bones Store
+
+Up until now I've been writing [Bones][] as mostly a transliteration of what's
+in the [WAM book][].  This means, for example, that the heap/stack are just
+arrays of bytes in memory.  This made it easy to follow along with the book, but
+I'm at the point now where I feel like I've wrapped my head around it and am
+ready to move on.  My idea for the next big step in Bones is to split the store
+into two separate arrays.
+
+The WAM store consists of "cells", each of which has a type and a value.  For
+example, reference cells have a type of `REF` and their value is an
+index/pointer to whatever they're bound to.  Because I wanted to pack everything
+into a single Lisp array, I've implemented this by storing a cell as an
+`(unsigned-byte 16)` with a few of the low-order bits used for a type tag.  So
+to check the type of a cell, you mask off everything but the low-order bits.  To
+get the value, you shift it right.
+
+This works, and is pretty memory-efficient, but has a couple of problems.
+First: you need to do masks and shifts all the time which are extra
+instructions.  But that's probably not a really huge problem in practice -- CPUs
+are fast.  The bigger issue is that because the value of a cell has to be packed
+into part of an `unsigned-byte`, you can basically only store integers as cell
+values.  For `REF` (and `STR`) cells this is fine (they're just indexes into the
+store array), but for functor cells it's not ideal.  Bones currently keeps what
+is essentially a symbol table to map functor/arity pairs to integers, which is
+really ugly.
+
+This is an annoying layer of indirection, but it also limits us further.  I'd
+*really* like to let Bones work with arbitrary Lisp objects.  For example, you
+could say something like `(fact (username sally "sal"))` and the string there
+would Just Work.  You'd also get support for numerics basically for free (though
+I might still special-case those a bit).
+
+So the solution I've been rolling around in my head for a while is to split the
+store into two separate arrays: an array of types and an array of values.  The
+types array would just store small unique integer tags, so it could be packed
+pretty tightly (maybe four-bit bytes for each tag).  The values array would just
+be a normal Lisp `(vector t)`.  This means each cell would be 64 bits, and you
+could store pointers to normal Lisp objects.
+
+This would let us ditch the symbol table entirely.  Now a functor cell would
+just have a tag of `FUN` and the value would be a pointer directly to the Lisp
+symbol.  We'd still need to handle the arity, but we can just follow every `FUN`
+cell with a second cell containing the Lisp fixnum for the arity.
+
+Crucially, SBCL (and CCL, possibly others) will still store fixnums immediately
+in the array, so for `REF` and `STR` cells we don't need to suffer an extra
+memory deref -- we've still got the index right there in the array's contiguous
+hunk of memory.  And we actually don't need to deref even for functors in this
+new scheme; the functor values are pointers, but since we just need to compare
+them with `eq` and it just compares the pointers themselves.
+
+This also has another possible benefit: we don't need the WAM at (query) compile
+time.  Currently when you compile a query you need to have the WAM object
+present, so Bones can shove any functors in the query inside its symbol table.
+But now we just store the functor symbols directly, so there's no need to have
+the WAM available at compile time.
+
+This means that for queries which are completely known at compile time, we can
+do the expensive compilation of query-to-WAM-bytecode at compile time instead of
+runtime with a compiler macro.  For example: if we have something like `(defun
+legal-moves () (query-all (legal ?role ?move)))` we can, at Lisp compile-time,
+process that `(legal ?role ?move)` query into a hunk of WAM bytecode and turn it
+into: `(defun legal-moves () (query-all #<array of WAM bytecode>))`.  Then at
+runtime we just load that into the query area of the code store and go -- no
+expensive bytecode compilation needed.
+
+For dynamic queries where we need to build the query at runtime (e.g. ``(defun
+legal-moves-for (role) (invoke-query-all `(legal ,role ?move)))``) we can't do
+this, unfortunately.
+
+So there are a bunch of benefits to splitting up the store, but of course there
+are also drawbacks.  Obviously it will use more memory, but I don't care much
+about that.  Popping things off the stack/heap will also become more expensive
+because now we need to null out references to Lisp objects when we pop them, so
+they can be GC'ed properly.
+
+It also makes another idea I've had slightly more complicated.
+
+#### Prolog on the GPU
+
+A few months back when I was thinking about topics for my upcoming master's
+thesis one of the things on my list was seeing if I could get my WAM queries
+running on the GPU using OpenGL.  This could potentially speed things up when
+you want to know *all* the results of a query and not just the first.
+
+The basic idea would be something like this.
+
+Ahead of time:
+
+* Write an implementation of the WAM bytecode interpreter (not the compiler,
+  just the VM part) in GLSL as a vertex (or possible compute?) shader.
+* For each predicate, note how many different choices it has (essentially the
+  number of `try/retry/trust` instructions).
+
+The idea would be to divide up the and/or tree of Prolog choices into `N`
+subtrees, where `N` is the number of GPU cores you've got.  Then each core
+searches a portion of the subtree and you merge the results at the end.
+
+At query time:
+
+* Push the WAM code store to the graphics card as a texture.
+* Walk the and/or tree of predicates breadth-first to generate `M <= N`
+  "traces", each of which is a list of which choice you picked at a given node.
+* Push all those different traces to the vertex shader as vertices.
+* Each time the vertex shader runs, it gets one trace which essentially tells it
+  which direction to take at each choice point it encounters, so it can skip the
+  backtracking and just dive head-first down this one path.
+* Once it reaches the end of the trace, it just continues normally and starts
+  backtracking when necessary.
+* Results can get written to a texture framebuffer or something and pulled out
+  at the end.
+* Merge the results from all the vertices.
+
+This parallel mode would be heavily restricted.  Things like `cut` would cut off
+(no pun intended) the trace at that point, forcing full processing by the shader
+from that point on.  And of course anything with side effects would just error
+out.  But when you're just grinding through unification of basic Prolog terms it
+might potentially speed things up (though the latency to/from the graphics card
+memory could be a problem).
+
+All this is just an idea I've had for the far future, but I've been thinking
+about it because if I split up the WAM store and start treating functors as Lisp
+object pointers, that complicates things.  We can easily push integers up to the
+graphics card, but if functors are pointers (which can potentially get moved
+around by GC) what can we do?
+
+For a while I thought it was unsolvable, but now I think there might be a way.
+If we just keep a global "object index" that maps Lisp objects to integers, we
+can use those integers as stand-ins for the actual objects when we're on the
+GPU.  Of course that would interfere with GC, but there's a solution.  Most
+modern implementations support hash tables with weak values, so if we use that
+for the index I think it would fix the problem.  We'd be able to map objects to
+fixnums without interfering with GC and leaking memory.