Saturday, September 27, 2008

Revisiting SECD and the power of Lisp

While visiting the ICFP Conference 2008, I was discussing my SECD microprocessor reimplementation project with my co-worker Kilian. The SECD is a classic virtual machine architecture supporting functional programming languages invented by Peter J. Landin in the early 1960ies. It is a relatively unusual architecture in that it is commonly described by transformations of the machine state represented as four registers containing cons cells. In fact, the virtual machine's memory conceptionally consists of linked cons cells, not of a vector of words that is addressed by integer addresses. Jia-Huai You of University of Alberta has a good description of how the SECD virtual machine works.

Quoting SECD: DESIGN ISSUES, here is the specification of the SECD virtual machine. There is one line for each machine instruction, describing the state of the four machine registers before and after the particular instruction has been executed:

While we were discussing my hardware project, Kilian suggested that we should implement a SECD in Common Lisp in order to bridge some time that we had until the next break. Given that Lisp supports lists nicely, we chose to use Lisp lists to represent the SECD memory. We first fooled around with imperative code that mimicked the required state transformations, but soon realised that it would be much easier to directly transform the specification for each machine instruction into executable Lisp. Following common practice, we started writing down how we wanted to type in the specification for each of the SECD instructions. As an example, the LD instruction would be defined by this:

(define-instr 1 LD
  (s e (LD (m . n) . c) d)
  `((,(locate m n e) . ,s) ,e ,c ,d))
The literal "1" is the op code, the "LD" is the symbolic name of the instruction, the first list is the state before the instruction is executed (with the four list elements representing the state of the four registers), and the rest specifying the code to execute for the instruction. The executed code again returns a four element list representing the new state of the machine registers.

We then wrote a macro to transform this specification syntax into Lisp code. Our DEFINE-INSTR macro uses the first list as an argument to DESTRUCTURING-BIND, binding all the names that we then use in the transformed state:

(defmacro define-instr (code name pre post)
  (let ((name* (intern (format nil "~A-INSTR" name))))
       (setf (gethash ,code *instr-hash*) ',name*)
       (defun ,name* ()
         (multiple-value-setq (s e c d)
           (destructuring-bind ,pre (list s e c d)
             (assert (= ,name ,code))
             (values-list ,post)))))))

We were quite happy with our virtual machine, and we were able to run several of the precompiled SECD "binaries" from the ancient LispKit software collection. Getting the macro right and implementing the 21 instructions took little time, and it was easy to, say, instrument the macro for tracing or augmenting it with checks to make sure that we got the original syntax right. It would certainly be possible to polish the macro more - for example, having to specify the mnemonic of the operation twice is not so pretty. We just meant to create something that we could use to explore the behavior of the SECD architecture, so we did not care polishing.

The next day, I attended an introductory Erlang workshop and as I had some prior knowledge of Erlang (Programming Erlang by Joe Armstrong is a good read), I decided to try implementing a SECD VM in Erlang instead of doing the exercises suggested by the lecturer. Erlang has built-in pattern matching, so I used a transliteration scheme of the VM specification very similar to what I used in Lisp. Also, Erlang has lists, so representing the SECD memory as Erlang lists should be a one to one match, too.

So, here is the CAR instruction implemented in Erlang:

secd(S = [ [ CAR | _ ] | _ ],
     [ ?CAR | C ],
     D) ->
    secd([ CAR | S ],
In Erlang, function dispatch is based on pattern matching, and there is total flexibility to match structures and bind variables to what has matched. For example, in the implementation of the car operation above, we bind the variable S to the previous stack contents, and at the same time bind the variable CAR to the car of the cons cell that is sitting on top of the stack.

The Erlang version of the VM grew well, and the source code looked as nice as the Lisp version. Yet, as I tried to implement the the RAP instruction, I faced a major stumbling block.

The RAP instruction is used to support recursive function calls and works by replacing a previously created dummy environment on the stack, called OMEGA, by one which contains all the functions that are visible in the recursive scope. The specification uses RPLACA to denote that replacement operation, and that is what we used in our Lisp implementation of SECD, too:

(define-instr 7 RAP
  (((c* . e*) v . s) (om . e) (rap . c) d)
    (assert (equal om 'omega))
    `(nil ,(rplaca e* v) ,c* (,s ,e ,c . ,d))))

When trying to implement RAP in Erlang, I got stuck because there are no destructive operations on lists. Not in the standard API, and seemingly not in the system API either. So, the Erlang SECD looks nice, only it does not run. Frankly, our Lisp based SECD has bugs, too, and I am not totally sure whether the problem is not in the RAP instruction.

Curiously, getting the RAP instruction right is harder than one wants to expect: In the SECD hardware implementation that I worked on, I used the original microcode of the SECD chip developed at the University of Calgary as part of a verification project. I found two major bugs in this microcode which, ironically, was part of the file set that compromises the proof that the SECD hardware implementation was correct according to its abstract specification. One bug was the garbage collector which did not work at all, and the other was in the RAP instruction which did not work either. I have trouble not concluding something regarding academia in general from these findings.

Anyway.... this was another time when I realized how much I like Common Lisp and why it will propably not be replaced by another language anytime soon. Common Lisp is teh tool of choice for my exploratory programming, as it does not have a terrible lot of restrictions, yet supports some advanced programming abstractions. There are domains where Common Lisp does not have much to offer, though. Concurrent programming currently is the one area where the lack of support in Common Lisp hurts me most. Erlang may be my cure, and I hope to look into Lisp Flavoured Erlang in order to spare myself the arcane syntax that Erlang provides by default.