Friday, February 27, 2009

Hardware/Software Co-Design - What it means with rekonstrukt

One of my primary goals with the rekonstrukt project is to create a hardware/software co-design environment that cuts down on turnaround times and gives me, the designer, flexibility to test and try out things at various levels. Here is what this means in practice:

rekonstrukt consists of the MC6809 microprocessor running Maisforth. The microprocessor can be implemented in three ways:

  • In the usim software emulator, which is written in C++ and runs on my workstation. In this mode, the emulator needs to be extended in C++ if I want to emulate more of the hardware that the real implementation provides. This mode has fast turnaround times (the simulated 6809 is way faster than the real hardware) and is useful for trying out algorithms, learning Forth and the like. It is not timing-accurate.
  • In the Active-HDL simulator, again running on my workstation. In this mode, the actual VHDL implementation of rekonstrukt is simulated. The simulation is still fast enough to run small pieces of actual Forth code. It is a good mode to pinpoint hardware and timing issues, and Active-HDL has great flexibility with respect to setting breakpoints and inspecting the simulated state, which helps a great deal with diagnosing issues.
  • On the FPGA. The interactive nature of Forth means that this way is often the most productive. By actually trying things out, I can quickly gain confidence in my design or explore ideas, getting immediate feedback. In this mode, the logic analyzer is the crucial tool for verifying waveforms and timings, but it is not well suited to pinpointing problems inside the FPGA.

In all three modes, it is possible to recompile the Forth kernel and put it into operation in a few seconds. The usim simulator works directly with the ROM image that the cross compiler generates, the VHDL simulator can load a pre-initialized ROM image in VHDL format, and the new ROM image can also be merged into a completely synthesized FPGA bitstream. Thus, synthesizing the hardware is required only when hardware changes need to be tried out on the real hardware. Trying out firmware changes only requires running the Forth cross compiler and restarting the simulator or downloading the bitstream to the target hardware.

Synthesizing rekonstrukt on my workstation takes about 10 minutes, but as the design is split into partitions, changes to individual parts of the hardware do not require a full synthesis run. Incremental runs take only about 3 minutes. Sure, I'd love to see that be faster, but it is bearable.

I love this shit :)

Sunday, February 22, 2009

ACIA fixed, S3E SK sucks

In the last two weeks, I made good progress with my attempt to fully support the Spartan-3E Starter Kit in Maisforth. I had to improve my SPI controller slightly so that it works better for programming the serial Flash, worked on implementing the (ancient) standard Block I/O vocabulary so that the serial Flash can be used like a floppy disk in the old days, and almost got the beautiful vibe full-screen block editor to work.

One thing that began to bother me a lot was the development cycle for Forth programs: I use Emacs to edit stuff, then copy and paste it to the real hardware. As my programs grow, this begins to take considerable time, but for some reason I could not send the data through the serial port at full speed: Maisforth reproducibly lost characters, and I had to insert a small delay after each character sent, which made uploads slow. Too slow.

Thus, I went to pinpoint the reason for this character loss. At first, I thought that what I saw was a problem related to the lack of flow control. The serial port on the Spartan-3E Starter Kit is only equipped with RX and TX signals, so no hardware flow control is possible and software flow control is not implemented. After some investigation and experiments it became clear to me, though, that the problem was not flow control related. Both hardware (RTS/CTS) and software (XON/XOFF) flow control work under the assumption that data is buffered in larger chunks, but my character loss happened even when I sent short bursts at full speed to the FPGA.

The problem really was related to the lack of double buffering in the transmitter of the System09 UART. This meant that the transmitter signalled the "transmit buffer full" condition for the whole duration of the actual send operation, even though the data had already been copied from the transmit buffer to the shift register. As the serial port echo in Maisforth is generated by Forth itself, processing of every character and sending it back out through the serial port introduced a pause between every character echoed, which quickly caused problems when characters to be echoed came in at full serial port speed.

The fix for the problem was to introduce double buffering. As soon as the character to be transmitted has been copied over to the transmitter shift register, the UART module now signals the host that the transmit buffer is available for the next character to be sent. This made the problem disappear as Forth echo processing can now take place while the previous character echo is transmitted. Given the slow serial port and the high clock rate of System09, there now is plenty of time for all processing needed.
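
From the CPU's point of view, the change only shows up in how long the "transmit register empty" flag stays clear. Here is a minimal sketch of the polling side, assuming the MC6850 status layout (bit 1 is "transmit data register empty"). The two registers are modelled as plain memory so that the snippet runs on a desktop Forth, and the names acia-status, acia-data, tx-ready? and tx! are made up for illustration; they are not taken from Maisforth or System09:

```forth
\ Stand-in for the two memory-mapped ACIA registers (hypothetical layout).
create acia 2 allot
acia    constant acia-status    \ read: status register
acia 1+ constant acia-data      \ write: transmit data register
2 constant tdre                 \ MC6850 status bit 1: transmit register empty

: tx-ready? ( -- flag )  acia-status c@ tdre and 0<> ;

\ Busy-wait until the UART accepts a character, then hand it over.
\ Without double buffering, the flag stays clear for a whole character time;
\ with it, the flag is set again as soon as the shift register is loaded.
: tx! ( c -- )  begin tx-ready? until  acia-data c! ;

tdre acia-status c!             \ pretend the hardware reports "ready"
char A tx!
```

The shorter the busy-wait in tx! is, the more time is left for Forth to process the next incoming character before it is overwritten.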

The Spartan-3E Starter Kit sucks

Well, not really. It is a nice cheap board with a good load of on-board peripherals, useful documentation and nice example designs, but it also has severe limitations: The on-board FPGA is an XC3S500E, which has 40 kByte of block RAM. In my current Maisforth setup, I use 16k for the Forth kernel ROM, 16k for RAM, and 6k for the VGA controller. This leaves only 2k free, which makes playing with table-based waveform synthesis really hard. What I hoped for was that I could put the Forth kernel and the board support Forth library into the parallel NOR flash of the S3ESK and map the Flash into the 6809 address space.

The show stopper for this plan is the fact that data bit 0 of the NOR flash is shared with the SPI bus data signal. Thus, one can either access the flash or the SPI, but not both at the same time. Seemingly, the idea is to copy the contents of the NOR flash into SDRAM and then run from there, but I'm still reluctant to try getting the SDRAM to work. The easy path is blocked, though, and maybe using SDRAM is the best option.

Sunday, February 15, 2009

Rekonstrukt - A "New" Forth Machine

I have been a fan of Forth since the 1980s, and when I picked up my FPGA stuff a few weeks ago, I decided to put some more effort into getting a complete Forth-based FPGA system to run before turning back to the SECD reimplementation project. This new Forth machine is called "Rekonstrukt", and I have created a Google Code project to publish the source code.

My last FPGA productivity rush ended with Maisforth running on the Spartan-3E Starter Kit. Maisforth is an ANS-like 8 bit Forth originally written by Albert Nijhof for the somewhat obscure MC6809-based computer called "Maiskastje", built by a Dutch computer club a few years ago. Porting Maisforth to System09 was rather easy; all that was needed was a little tweaking of the serial I/O routines so that the System09 MC6850-lookalike serial port was properly handled.

My next goal is to support most of the hardware that the Spartan-3E Starter Kit has to offer from Forth. This will require some VHDL hacking in order to implement the low-level interfaces, as well as implementing Forth libraries.

Implementing a SPI controller

I started by implementing an SPI interface. SPI is a serial bus protocol for moderately high speed connections between devices inside one system. On the Spartan-3E board, the analog capture unit, analog preamplifier, digital to analog converter, serial flash and platform flash chips are all connected to one shared SPI bus. It would be possible to operate this bus by bit-banging in software, but this would be rather slow. In hardware, it is easily possible to operate the SPI bus at a bit rate of 10 MHz.
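
To get a feeling for why bit-banging is slow, here is a rough sketch of a single SPI byte transfer done entirely in software (mode 0, most significant bit first). This is not what rekonstrukt does. The pins are modelled as variables and MISO is simply looped back to MOSI so that the snippet runs without any hardware, and all names (mosi, sck, miso@, spi-bit, spi-xfer) are hypothetical:

```forth
variable mosi   variable sck         \ stand-ins for the output pins
: miso@ ( -- bit )  mosi @ ;         \ loopback: the "slave" echoes MOSI

\ Shift one bit out of `out` (MSB first) and one bit into `in`.
: spi-bit ( out in -- out' in' )
  over 128 and 0<> 1 and mosi !      \ put the top bit of `out` on MOSI
  1 sck !                            \ rising clock edge: data is sampled
  2* miso@ or                        \ shift the sampled bit into `in`
  0 sck !                            \ falling clock edge
  swap 2* 255 and swap ;             \ advance `out` to its next bit

: spi-xfer ( byte -- byte' )  0  8 0 do spi-bit loop  nip ;
```

Every bit costs a handful of Forth words, each of which takes many CPU cycles, whereas a hardware shift register moves one bit per SPI clock.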

While other open source VHDL SPI controllers exist, none of them easily interfaced with the "proprietary" bus protocol of System09. I took this as an opportunity to freshen up my VHDL and FPGA design skills. Again, I learned that it does not make sense to synthesize to the real hardware before simulation has shown that the design basically works, but that successful simulation does not automatically mean that the design works in the real hardware. With the help of some regulars in the #fpga IRC channel on freenode.net, I got the SPI controller to run.

Implementing vectorized I/O in Maisforth

System09 comes with a VGA controller named vdu8 that provides an 80x25 text console. vdu8 is register-based, i.e. to write a character onto the screen, the host CPU needs to write the screen position where the character should go into the cursor address registers, then write the character to be shown into the data registers. In order to make vdu8 usable as an I/O device for Forth, a small terminal emulator is needed that interprets standard control characters like Linefeed, Carriage Return and Backspace properly so that it can be used to handle EMIT calls.
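
The core of such a terminal emulator can be sketched in a few lines. This is only an illustration under assumptions: the screen is modelled as an 80x25 byte buffer in RAM instead of the real vdu8 registers, a Linefeed keeps the current column, and all names (screen, col, row, vdu-emit and so on) are invented for this example:

```forth
80 constant cols   25 constant rows
create screen  cols rows * allot       \ stand-in for the vdu8 text buffer
screen cols rows * bl fill
variable col  0 col !
variable row  0 row !

: cell-addr ( -- addr )  row @ cols * col @ + screen + ;

\ Move lines 1..24 up by one line and blank the last line.
: scroll ( -- )
  screen cols +  screen  cols rows 1- *  move
  screen cols rows 1- * +  cols bl fill ;

: newline ( -- )  row @ 1+ dup rows = if drop scroll else row ! then ;

: vdu-emit ( c -- )
  case
    13 of  0 col !                 endof   \ Carriage Return
    10 of  newline                 endof   \ Linefeed
     8 of  col @ 1- 0 max col !    endof   \ Backspace
    dup cell-addr c!                       \ printable: store at the cursor
    col @ 1+ dup cols = if drop 0 col ! newline else col ! then
  endcase ;
```

On the real hardware, cell-addr and the c! would be replaced by writes to the cursor address and data registers, and scroll would have to use whatever mechanism the controller offers.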

The VGA console should be usable as an alternative to the serial console, and switching between the two should be possible at run time. The common technique to achieve that is to make the words that need to switch between alternative implementations vectored, or "deferred", words. What this means is that the word called by the system holds a pointer to the word that actually implements it, and this pointer can be modified at run time in order to switch to the desired implementation.

Normally, the Forth compiler performs a name lookup when compiling a word, not when executing it. Thus, once a word has been compiled, the addresses of the words that are being called are fixed, and the implementations of these words cannot be changed in retrospect.

For normal, RAM based operation, implementing deferred words is quite simple:

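: alias ( xt "name" -- ) create , does> @ execute ;  \ new word that runs the stored xt
: noop ( -- ) ;                                      \ does nothing
: defer ( "name" -- ) ['] noop alias ;               \ new vectored word, initially noop
: is ( xt "name" -- ) ' >body ! ;                    \ store xt in an existing vectored word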
alias is used to set up an alias for a word. defer is used to set up an alias for the do-nothing word noop. The implementation of such a word can later be changed using the is word.
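
A short usage example may help to see the effect. The four definitions are repeated so that the snippet runs on its own; the words answer, ask, forty-two and seven are invented for this illustration:

```forth
\ The four definitions from above, repeated so the snippet runs on its own.
: alias ( xt "name" -- )  create ,  does> @ execute ;
: noop  ( -- ) ;
: defer ( "name" -- )     ['] noop alias ;
: is    ( xt "name" -- )  ' >body ! ;

defer answer                       \ vectored word, does nothing yet
: ask ( -- n )  answer ;           \ compiled once, never recompiled

: forty-two ( -- n )  42 ;
: seven     ( -- n )  7 ;

' forty-two is answer   ask .      \ ask now runs forty-two
' seven     is answer   ask .      \ same ask, now runs seven
```

ask is compiled only once, yet its behaviour follows whatever answer currently points to. Note that this simple is parses the name when it runs, so it is meant to be used interactively rather than inside a colon definition.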

This works well for RAM based Forth in which the , word can be used to compile a number into the dictionary and change that later. For a ROM based Forth like Maisforth, the default dictionary is read only and as we are interested in vectorizing words that are used by the system, the vectors need to be put somewhere else. Maisforth provides for a "user space" for this purpose. The "user space" is located in RAM at the beginning of the address space, and the IVEC word can be used to allocate a cell from that space in the cross compiler. When adding new user vectors, the USERBYTES constant needs to be increased so that the cross compiler knows the changed memory layout. I did not find out how I could get the vectors initialized automatically by the cross compiler, so I added a new 'EMIT vector initialized to 0 which would be called in the (EMIT word, and changed the COLD word to initialize the 'EMIT vector when rekonstrukt starts.
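
The idea can be sketched on a desktop Forth as follows. This is not the Maisforth source: an ordinary variable stands in for the cell allocated in user space, dropping characters while the vector is still 0 is my assumption for the sketch, and the names 'emit-vec, vec-emit, serial-emit, vga-emit and init-vectors are made up:

```forth
variable 'emit-vec                 \ stand-in for a cell allocated in user space
0 'emit-vec !                      \ the cross compiler leaves it at 0

\ What the ROM-resident output primitive might do: go through the vector,
\ and drop the character while the vector is still unset (an assumption).
: vec-emit ( c -- )  'emit-vec @ ?dup if execute else drop then ;

variable #sent  0 #sent !
: serial-emit ( c -- )  drop 1 #sent +! ;    \ dummy driver, just counts
: vga-emit    ( c -- )  drop ;               \ dummy second driver

\ What the cold-start word has to add: point the vector at a default driver.
: init-vectors ( -- )  ['] serial-emit 'emit-vec ! ;

char x vec-emit                    \ vector is 0: character is dropped
init-vectors
char y vec-emit                    \ now reaches serial-emit
```

Switching to the other console at run time is then a single store into the RAM cell, for example ' vga-emit 'emit-vec ! in this sketch.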

More information about vectorized functions can be found in Leo Brodie's excellent book Starting Forth, in the chapter "Under the Hood".