Thursday, October 2, 2008

October 02, 2008

Trying Clojure...

I am becoming increasingly frustrated by Common Lisp's age. On the one hand, history makes it what it is: Mature, well-documented, thoroughly understood and practical. On the other, it fails to keep up with current system designs, lacking convenient native support for rich data structures, infrastructure access and parallel programming. No programming language choice is without tradeoffs and in that respect, and I'll still chose Common Lisp in many situations. Realistically, though, Common Lisp cannot be the only language in my tool chest. For browser work, Javascript is much more practical, and for parallel programming, I'm on the lookout.

On my last visit in Cambridge, I attended the Boston Lisp Meeting. The presentation scheduled was about the new Lisp dialect the Clojure by Rich Hickey, who is the creator of this language.

Parallel programming is one of the areas that Clojure wants to make easier. It does so by making all data structure immutable and by language-level abstractions for concurrent data access. In that respect, it is similar to Erlang as it requires a functional programming style everywhere. Unlike Erlang, which puts almost all operations with side effects behind a common message passing interface, Clojure exposes the full Java API to programs. Thus, side effects can be produced everywhere, except in algorithms implemented in Clojure.

Clojure wants to be a Lisp, but it explicitly does not try to be backwards compatible. This opened a rather large design space to Rich Hickey, and some of the choices he made really do make sense. He specifies a reader, yet his reader does not intern symbols. That is a big win, as it allows the reader to actually work with arbitary Clojure source files. In Common Lisp, one needs to re-implement a full reader which does not intern symbols if one wants to read Common Lisp source files. This is kind of ironic, as the "Code is Data" mantra that we keep repeating does not really reflect what is possible in practice.

Rich managed to make me enthusiastic about Clojure, and I decided to give it a spin with a real project that I wanted to conduct anyway: I am a Twitter user, and I would like to be notified of new posts to Planet Lisp on Twitter. The program would read the Planet's RSS feed using a HTTP request, determine if new items have been posted and update the Twitter status of the Planet Lisp Twitter account that I have set up for this purpose.

Getting the development environment up

My laptop runs Windows, but I do most of my Lisp development in a VMware running FreeBSD. As Clojure is hosted on the Java virtual machine, I decided to avoid the indirection and use Windows as my native platform. Clojure support for Slime is available, so I can stay in my familiar environment. A very recent CVS checkout of Slime is required, which took me a little while to figure out.

Processing XML

It took me a another while to discover what the proper way to process XML in Clojure is. An XML parser is included with the base distribution, but there is no information in the documentation how one would actually work with the data structure that is generated by the parser. For my application, I wanted to iterate over certain elements in the XML and extract a few subelements in a loop. An evaluator for XPath expressions would have suited the job, but obviously that is not the Clojure way to do it.

In Clojure, XML does not receive any special treatment. The parser reads it into a tree, and processing is performed using functions that process trees. As simple as it may sound, I had a hard time finding a practical example of how this was supposed to work. Chris Houser finally got me on the right track when he pointed me to his zip_filter package. zip_filter is a tool for filtering data out of trees, and it can work with trees produced by Clojure's XML parser.

Once I had figured this out, things were very easy. The Planet Lisp RSS feed can be read into an XML tree with

(clojure.zip/xml-zip (clojure.xml/parse "http://planet.lisp.org/rss20.xml"))

and one can extract data out of the parsed tree using path expressions:

(clojure.contrib.zip-filter.xml/xml-> xml :channel :item)

This certainly is concise and elegant, and I like the fact that XML is kept out of the program's way. Note that I've included the namespace prefixes in the examples above so that they can be copied and pasted. Normally, one would import these namespaces so that shorter or no namespace needs to be specified to access the symbols.

Making HTTP requests

In order to update the Twitter status, a HTTP POST request to Twitter must be made. A HTTP client which is based on the java.net.HttpsURLConnection class is available from the Files section of the Clojure Google Group. It is not a polished product and required some minor tweaking, but after that, fetching something using HTTP is as easy as

(http-client/url-do "http://planet.lisp.org/" "GET" {})

Reading and writing files

The Twitter gateway needs to persistently store the list of articles that it has found on Planet Lisp in order to decide which of the current articles are new. Clojure is a Lisp and thus can read and write its own data structures easily, so all it takes is calling a few functions. As in the XML case, the hard part was figuring out what the proper functions are, as the documentation is sparse. I found Stuart Halloway's Blog in which he publishes a number of articles describing how the examples from Peter Seibel's book Practical Common Lisp can be implemented in Clojure. The Simple Database example pretty much does the same thing that I required, so I looked at that. Unfortunately, the "spit" function that is the complement to "slurp" and writes a string to a named file did not appear to exist in the clojure namespace. It took a little grepping to find it in clojure.contrib.duck-streams, and referenced from there, it works as expected.

Functional glue

Having all the required components in place, it was time to come up with the real code. As data structures are immutable in Clojure, the challenge was to express the required loop so that both the new persistent state and the list of new postings would be maintained. I came up with a recursive function with two accumulators:

(defn poll
  "Poll planet lisp, check for new postings, update Twitter status when new postings have appeared"
  []
  (save-data
   (let [old-data (load-data)
         process
         (fn [items new-data new-items]
           (if items
             (let [item (first items)
                   guid (first (xml-> item :guid text)) ]
               (recur (rest items)
                      (conj new-data guid)
                      (if (old-data guid)
                        new-items
                        (conj new-items (first (xml-> item :title text))))))
             (do
               (maybe-post-twit new-items)
               new-data)))]
     (process (xml-> (feed-to-zip "http://planet.lisp.org/rss20.xml")
                     :channel :item)
              #{} []))))

To Scheme programmers, this style should be familiar. I find it not terrible myself, although one could certainly push things around to suit taste.

The Verdict

I was sceptical when I first read about Clojure. I am sceptical of new languages in general, and new Lisp family members always make me think "why?" first. Rich Hickey's presentation got me interested because he gave very good reasoning for his design choices. He also is a very good presenter and could easily withstand an audience of die-hard Lispers that included people who have played an active role in the creation of Common Lisp. But that is the singer, not the song.

The good

The fixed reader, vastly improved data structure support, access to a host of libraries and concurrency support all make a good case for Clojure. It is Lisp in many respects, and in the uncompromised macro facility opens up the extensibility that I am used to from Common Lisp.

The bad

Clojure is not multi-paradigm in the sense that Common Lisp is: A functional programming style is required, and there is no way around that. Making the comma be white space is somewhat of an arbitary decision, that does not match the other design choices that seem to be grounded better.

The ugly

The error messages that the compiler produces are mostly useless. Debugging is hard, as no tracing facility and no breakpoints are available - Or maybe they are, but I could not find them in the documentation. The overall immaturity of the language shows frequently, and I have spent hours looking for code examples and finding my way through something that is very much in flux.

My Conclusion

I like Clojure, as it seems to fill my need for a language that supports concurrency and makes it possible to write modern desktop applications, while still being a Lisp. I will use it for another project I have been planning to do in Erlang. For general exploratory development, Clojure is not yet a good choice as the development environment is too immature.

I am not buying Rich Hickey's claim that classic object oriented programming is bad because it does not support concurrency. I begin thinking of object oriented systems more as active databases that allow modeling of complex, interconnected and persistent data structures. Such structures are managable with a pure sequential execution model, and one should refrain from mixing that with tasks that are inherently concurrent.

Finally: My Twitter gateway for Planet Lisp exists, but I have not yet been able to deploy it. Getting Java to run on FreeBSD/amd64 has proven to be kind of a challenge. I will have this sorted out soon, so feel free to follow planet_lisp.

Saturday, September 27, 2008

September 27, 2008

Revisiting SECD and the power of Lisp

While visiting the ICFP Conference 2008, I was discussing my SECD microprocessor reimplementation project with my co-worker Kilian. The SECD is a classic virtual machine architecture supporting functional programming languages invented by Peter J. Landin in the early 1960ies. It is a relatively unusual architecture in that it is commonly described by transformations of the machine state represented as four registers containing cons cells. In fact, the virtual machine's memory conceptionally consists of linked cons cells, not of a vector of words that is addressed by integer addresses. Jia-Huai You of University of Alberta has a good description of how the SECD virtual machine works.

Quoting SECD: DESIGN ISSUES, here is the specification of the SECD virtual machine. There is one line for each machine instruction, describing the state of the four machine registers before and after the particular instruction has been executed:

While we were discussing my hardware project, Kilian suggested that we should implement a SECD in Common Lisp in order to bridge some time that we had until the next break. Given that Lisp supports lists nicely, we chose to use Lisp lists to represent the SECD memory. We first fooled around with imperative code that mimicked the required state transformations, but soon realised that it would be much easier to directly transform the specification for each machine instruction into executable Lisp. Following common practice, we started writing down how we wanted to type in the specification for each of the SECD instructions. As an example, the LD instruction would be defined by this:

(define-instr 1 LD
  (s e (LD (m . n) . c) d)
  `((,(locate m n e) . ,s) ,e ,c ,d))

The literal "1" is the op code, the "LD" is the symbolic name of the instruction, the first list is the state before the instruction is executed (with the four list elements representing the state of the four registers), and the rest specifying the code to execute for the instruction. The executed code again returns a four element list representing the new state of the machine registers.

We then wrote a macro to transform this specification syntax into Lisp code. Our DEFINE-INSTR macro uses the first list as an argument to DESTRUCTURING-BIND, binding all the names that we then use in the transformed state:

(defmacro define-instr (code name pre post)
  (let ((name* (intern (format nil "~A-INSTR" name))))
    `(progn
       (setf (gethash ,code *instr-hash*) ',name*)
       (defun ,name* ()
         (multiple-value-setq (s e c d)
           (destructuring-bind ,pre (list s e c d)
             (assert (= ,name ,code))
             (values-list ,post)))))))

We were quite happy with our virtual machine, and we were able to run several of the precompiled SECD "binaries" from the ancient LispKit software collection. Getting the macro right and implementing the 21 instructions took little time, and it was easy to, say, instrument the macro for tracing or augmenting it with checks to make sure that we got the original syntax right. It would certainly be possible to polish the macro more - for example, having to specify the mnemonic of the operation twice is not so pretty. We just meant to create something that we could use to explore the behavior of the SECD architecture, so we did not care polishing.

The next day, I attended an introductory Erlang workshop and as I had some prior knowledge of Erlang (Programming Erlang by Joe Armstrong is a good read), I decided to try implementing a SECD VM in Erlang instead of doing the exercises suggested by the lecturer. Erlang has built-in pattern matching, so I used a transliteration scheme of the VM specification very similar to what I used in Lisp. Also, Erlang has lists, so representing the SECD memory as Erlang lists should be a one to one match, too.

So, here is the CAR instruction implemented in Erlang:

secd(S = [ [ CAR | _ ] | _ ],
     E,
     [ ?CAR | C ],
     D) ->
    secd([ CAR | S ],
         E,
         C,
         D);

In Erlang, function dispatch is based on pattern matching, and there is total flexibility to match structures and bind variables to what has matched. For example, in the implementation of the car operation above, we bind the variable S to the previous stack contents, and at the same time bind the variable CAR to the car of the cons cell that is sitting on top of the stack.

The Erlang version of the VM grew well, and the source code looked as nice as the Lisp version. Yet, as I tried to implement the the RAP instruction, I faced a major stumbling block.

The RAP instruction is used to support recursive function calls and works by replacing a previously created dummy environment on the stack, called OMEGA, by one which contains all the functions that are visible in the recursive scope. The specification uses RPLACA to denote that replacement operation, and that is what we used in our Lisp implementation of SECD, too:

(define-instr 7 RAP
  (((c* . e*) v . s) (om . e) (rap . c) d)
  (progn
    (assert (equal om 'omega))
    `(nil ,(rplaca e* v) ,c* (,s ,e ,c . ,d))))

When trying to implement RAP in Erlang, I got stuck because there are no destructive operations on lists. Not in the standard API, and seemingly not in the system API either. So, the Erlang SECD looks nice, only it does not run. Frankly, our Lisp based SECD has bugs, too, and I am not totally sure whether the problem is not in the RAP instruction.

Curiously, getting the RAP instruction right is harder than one wants to expect: In the SECD hardware implementation that I worked on, I used the original microcode of the SECD chip developed at the University of Calgary as part of a verification project. I found two major bugs in this microcode which, ironically, was part of the file set that compromises the proof that the SECD hardware implementation was correct according to its abstract specification. One bug was the garbage collector which did not work at all, and the other was in the RAP instruction which did not work either. I have trouble not concluding something regarding academia in general from these findings.

Anyway.... this was another time when I realized how much I like Common Lisp and why it will propably not be replaced by another language anytime soon. Common Lisp is teh tool of choice for my exploratory programming, as it does not have a terrible lot of restrictions, yet supports some advanced programming abstractions. There are domains where Common Lisp does not have much to offer, though. Concurrent programming currently is the one area where the lack of support in Common Lisp hurts me most. Erlang may be my cure, and I hope to look into Lisp Flavoured Erlang in order to spare myself the arcane syntax that Erlang provides by default.

Friday, August 1, 2008

August 01, 2008

Schema evolution made easy

Applications age, and with age often comes the desire to change things. One of the more painful aspects of changing a deployed application is transforming existing data. This post shows some of the schema evolution techniques that we have used to update applications using the BKNR datastore.

With traditional database-based persistence solutions, converting existing databases is often done by first exporting the data using database tools or specialized exporter software and then importing it into the new database schema, possibly converting data while reading. Using explicit export and import steps can be tricky and time consuming. The export side often comes for free, but the importer will need to reinstantiate the correct objects in the new schema knowing both the export format and the new schema. Making such importers work reliably is an additional task that has no additional value, and after having converted the existing databases, the code written can typically be thrown away.

A common technique for making schema evolution less painful is to make only additive schema changes. Old information is preserved, and the new code does the conversion either on the fly or in a separate conversion run. This scheme leads to both cruft in the database and in the source code, and requires additional data cleanup steps to finalize the schema upgrade once the application has been migrated to the new code base. Substantial structural changes require more thought in this scheme, in particular if database side integrity constraints need to be considered.

All in all, schema evolution is a messy business.

Recently, we have been working on extending and updating two of the applications that we have developed using the BKNR datastore, and the schema evolution problem occured for us to be solved, too. When designing the store, we felt that schema evolution should be easy. After all, we have all our data in main memory, so we can always load the old data using the old application code, reload the application, and implement methods for the relevant CLOS functions that are called by the object system when classes are changed.

In practice, we found the need to load the old code, the data and then the new code in succession to be inconvenient. Keeping the old code around in the right places would be rather burdensome, and we also had some problems with (our usage of) CLOS in the face of large metadata changes.

As we keep snapshots of our data in a file, we decided that our schema evolution process would work by converting the data into the new schema while reading the snapshot file. Our snapshot file format contains serialized versions of all persistent CLOS objects that constitutes a data store. For each class used in the snapshot, a layout record is present in the snapshot file that contains the name of the class and the names of the slots of the class.

When reading a snapshot file, the object system compares the class layout in the snapshot against the class definitions of the running code. Inconsistencies detected are signalled, and restarts are used to select schema evolution actions.

Currently, we support the following schema evolution steps:

Class removal: When a layout specifies a class that does not exist in the running application, instances of this class can be ignored.
Class renames: When a class has been renamed, the new class name can be specified and the reading can be restarted.
Slot removal: If a slot named in the snapshot file does not exist in the running code, the slot values of the instances can be ignored.

These three options are solely controlled through restarts established while reading the snapshot file. No code needs to be written to do these conversions.

Additionally, we have a slot conversion mechanism: If a slot has been renamed or its semantics have been changed, slot values of previous software versions can be converted. In order to do this, a method on the CONVERT-SLOT-VALUE-WHILE-RESTORING generic function must be defined. It receives the object being restored, the name of the slot as specified in the snapshot layout, and the value to set.

The default method of CONVERT-SLOT-VALUE-WHILE-RESTORING simply assigns the value provided to the slot with the given name. A specialized method could convert the value or assign it to a different slot of the same object. When reading a class layout from the snapshot file that contains an unknown slot name, a restart is offered to call CONVERT-SLOT-VALUE-WHILE-RESTORING for values of this slot.

The beauty of this form of schema evolution is that we can easily load snapshots from our production systems using new code, without any additional required conversion steps. We can regularily verify that our new code works with real data, and can incrementally refine our schema evolution process without having to resort to database tools or external formats that require special handling.

One could certainly improve on this, for example by making some of the evolution steps automatic so that one does not remember what restarts need to be chosen when loading older snapshots. For the moment, these tools serve us well, and it did not take a lot of time (and was fun) to implement them.

Thursday, July 3, 2008

July 03, 2008

Building Load Resilient Web Servers

One of the bigger challenges with deploying dynamic web servers is being able to cope with load peaks that happen when the URL is communicated to larger communities in a short time. I have seen several sites being slashdotted and collapse under the load, and it is common to blame the underlying server technology for crashes; being Lisp lovers, we don't want to see that happen for our sites.

Using a caching frontend to reduce backend load

Following industry practice, we have been using a caching reverse proxy in front of our Lisp based dynamic web servers from the beginning. Our choice was Squid, as I had prior experience configuring it and found it to work reliably once the configuration was in place. I was never quite happy with that choice, though, because Squid's main operation mode is forward proxying. It has gazillions of configuration options that are not needed for a reverse proxy, which made us feel that Squid would not be a perfect match for our demands.

This is not to say that Squid has hurt us in any way. It has served us well during a load peak on the create-rainforest web site, but as reverse proxying has become very common in the last few years, we found it about time to go shopping for a solution that might be a better match to our needs.

We hoped to find a frontend proxy having the following features:

Caching - We want the frontend to cache all content that is not session dependent in order to reduce the load on our Lisp backend.
Scalable - The frontend must be able to handle loads much larger than what we see normally in order to accomodate for peaks.
Request queueing - Ideally, we'd like the frontend to handle all concurrency issues and send requests to the backend one after the other using a single persistent http connection, maybe with pipelining.

This set of requirements reduced the available options dramatically, and we ended up giving varnish serious consideration.

Evaluating varnish

varnish seems to have most of the features that we require: It supports caching, has been tested under very high loads and supports FreeBSD which is our deployment platform. Varnish has been written by Poul-Henning Kamp of FreeBSD fame, so we also found it to be culturally compatible.

Our evaluation revealed that varnish is under active development and support, and we found the developers be very responsive to our bug reports and requests. Its architecture looks well thought out, and the configuration language (even if underdocumented) makes the request handling process very transparent.

There are a number of downsides with varnish, though:

Eager cache cleanup policy: varnish removes objects from its cache as soon as they expire. This means that expired objects are always fetched from the backend, even if the previous cached copy was still up to date. It is possible to work around this by either hacking varnish to revalidate cached objects before expiring them or by not expiring them automatically, but rather by explicit purge requests sent from the backend to varnish. Both options, while doable, require substantial work.

No high load safeguards: varnish offers no control over the number of backend connections established and does not support any queuing or backend resource control mechanisms. This means that it will unconditionally try to establish a connection to the backend if an object can't be served from the cache. Once the backend reaches saturation, varnish will aggravate the situation by sending even more requests to the backend, increasing the time to recover from the backend saturation.

Threading: varnish uses threading to schedule requests. While this in principle should not be something negative, threaded programs are much harder to debug in practice, and many concurrency bugs that threaded code is susceptible to only show up after a long time or under certain load patterns which may be hard to reproduce. Admittedly, I am a threads hater, but I am writing this here because we found a serious threading race condition on the second day of our evaluation which prevented varnish from even starting up. My trust in the code was seriously affected by this.

When talking to the varnish developers, the problems were acknowledged, help to fixable problems was very quick and they told us that more advanced features will be develop after the upcoming 2.0 release of varnish.

We would have liked to switch to varnish because of the very good support and because it is meant to be a web frontend, nothing else. Yet, at the current point in time, it does not seem to be mature enough to serve our needs. After having evaluated it, we turned back to squid as it seemed to be the only other option. We found that squid meets our requirements for cache cleaning and revalidation of cached objects very well.

Making objects cacheable

Chosing a front end software is only part of what needs to be done to make a web system fast and robust enough to withstand high loads. The most important factor is to make the frontend serve a large percentage of the incoming requests from its cache and only consult the backend server for content that is really dynamic. The HTTP/1.1 protocol provides for request and response headers that control how content can be cached, and there is little need for explicit configuration if these headers are used correctly.

Using If-modified-since

One way to limit the traffic to the backend is implement the If-modified-since mechanism. It is supported by Hunchentoot for static files by default, and can also be used for dynamic handlers that can check whether a resource has changed since it had previously been requested. Varnish will set the If-modified-since header when it requests resources that it already has in the cache, so every cacheable resource will normally be transfered from the backend to Varnish only once.

Controlling cache refreshing

Often, resources are dynamic, yet it is not crucial that every client sees the absolutely latest version of the resource. For example, in our square meter sales application, visitors should always see the current number of square meters sold, but it is not vital that this information is accurate to the second. Thus, we want the cache to refresh such pages only at a certain interval. The HTTP/1.1 provides for the cache-control directive and in particular the max-age parameter. It specifies how long a cache may consider a cached resource be valid without revalidating with the originating server. By setting this parameter in responses sent by the backend, we can effectively limit the maximum refresh rate for dynamic resources that do not need completely up to date every time.

Testing realistically

In order to test the performance of a Web system, it needs better tools than the often-used ApacheBench tool, which only tests response times and throughput of a single URL. For meaningful results, one should simulate a user load that reflects the load that real users create. I have been using SIEGE for informal testing often, as it is easy to use, but I have also found it a little flakey and prune to random crashes, which made us look for more reliable solutions.

In the FreeBSD ports collection, we found tsung. Tsung is an open-source multi-protocol distributed load testing tool written in Erlang, and it has good support for HTTP. In addition to running load tests against web servers and generating statistics reports, it supports a session recorder that can be used to create a log of the URLs that a user visits when browsing a web server which can then be directly used to simulate many simultaneous users. As an added bonus, it is possible to capture dynamically generated information from web server responses into variables and use them in subsequents requests in a session. This feature can be used to generate sale transactions or user registrations in a load simulation.

Using tsung and squid, we were able to tune our Lisp backend so that all non-dynamic content is served from the cache, pinpoint a serious performance problem and simulate realistic loads. We are now confident that our setup can withstand the next rush of users without crashing.

Sunday, May 18, 2008

May 18, 2008

Symbolics Keyboard on PS/2 port

I have been busy with hacking Hunchentoot and flexi-streams at daytime during the last few weeks, and wrote a thread-safe version of a simple deterministic profiler for Clozure CL so that I could isolate performance problems. I'm going to blog about that stuff in a few days, but as that is daytime work, I had to find something else to relax on the weekend. Thus, I turned my attention to the kbdbabel hardware that I received last week. A few months ago, I emailed Alexander Kurz, who is the creator of the kbdbabel project, whether he could help me getting a PS/2 adapter for the Symbolics keyboard to work. At the time, he had created PS/2 adapters for various vintage keyboards and he expressed his interest. We could not make much progress, though, as he did not have a keyboard and I lacked the knowledge to adopt the 8051 based assembler code to the needs of the Symbolics protocol. I'm mostly an AVR guy when it comes to microcontrollers. Alexander was intimidated enough by the pictures of the various Symbolics keyboards that he decided to get one and make a kbdbabel for it. I was more than happy when he contacted me out of the blue a few weeks ago and told me that he now has a solution. The hardware he sent did not work right away with my Thinkpad, but that was because the Thinkpad's PS/2 port do not quite work like most other PS/2 ports do. Thus, I had to get a PS/2 to USB adapter for 10 Euros. With that, the adapter worked as advertised.

I had to complete the keyboard mapping so that all keys in the default Open Genera map are mapped to a key on the Symbolics keyboard. This means that there are no mappings for several keys (the square, circle and star keys among them). Supporting those keys will require changing the Open Genera X11 keyboard mapping code, which is something I'll put onto my "weekend projects" list now. I have not yet tested the adapter with the "slim" keyboard - I only have a 3600 one right now and the two "slim" keyboards I bought from David K. Schmidt are still in Cambridge and won't be with me until next month. Contrasted to the 3600 keyboard, the "slim" keyboard is microcontroller based and will most likely require different timing than the TTL based old-style keyboard. Other than that, the keyboards should be compatible and the adapter should work with both variants. I'm going to build a few adapters for myself and some friends of mine. If you want one as well and don't like to do the soldering yourself, contact me.

Wednesday, April 23, 2008

April 23, 2008

BKNR at the Boston Lisp Meeting

After a weekend full of parentheses in Amsterdam, I attended the Boston Lisp Meeting hosted at MIT. Peter Dillinger gave a presentation on the ACL2s, which is an Eclipse based user interface for the ACL2 theorem prover. I have no background in mathematics and as such, I had to guess my way through the terminology, but the talk was entertaining and even though it somehow does not feel right to see an advanced Common Lisp program be controlled by a lowly Java application, I see the practical aspects of the approach. Bottom line: Common Lisp loses when it comes to programming modern user interfaces in general, and I'm not sure that we can do much about that. The second speech for the evening was my presentation of the BKNR datastore. I found myself being well received, and the questions from the audience mostly were those that I have been asking myself during the last few years. I have put the slides online, and I also updated the BKNR website to make the documentation easier to access. If you have Google Earth installed, you can play with the (alpha quality) GE interface to the BKNR based Samboja Lestari web site. If you want, you can have a look at the source code, too. Our use of publish/subscribe for persistent objects may be of interest, as well as the use of Screamer for brute-force determination of the largest rectangle inside an arbitary polygon. Kilian, who does most of the programming in the BOS project presently, has a background in music composition. He found it easier to formulate the solution to the problem using constraints rather than using explicit looping.

The socializing part of the meeting was rather short for me as I am mostly staying on Central European Time while being in Boston, so it felt like past midnight to me. I met several interesting people anyway, and I hope that I can synchronize my next visits in Cambridge with the meetings.

Monday, April 14, 2008

April 14, 2008

Chaosradio Express on Lisp

Last week, I have been a guest at Tim Pritlove's Chaosradio Express and he interviewed me about Lisp. The podcast is in german, we talked about how I came to Lisp and what I think it is about.

Monday, March 31, 2008

March 31, 2008

Feature development and getting carried away on Lisp

Customer Demands

Recently, I have been spending some weekend and late night hours on extending the QuickHoney web system. The artists want to see the system modernized with RSS feeds and more interaction, and we'll also be adding a shop where both digital and real warez can be bought. Peter in particular wants to get more feedback, so he asked me to add a "quick feedback" mechanism that visitors can use to send a short textual comment on any of the pictures. In this post, I'll describe how I added this feature using Javascript, BKNR, CL-SMTP and CL-MIME. First off, here is a little overview of the QuickHoney web application. It is a early AJAX style application with one HTML page consisting of a number of layers which are controlled by a Javascript application. Communication between the Javascript application and the backend server is done through Javascript code snippets that are generated on the server and evaluated inside the client application using IFRAMES. The user navigates through categories and subcategories to individual pictures. The backend for the QuickHoney application is written in Common Lisp using the BKNR framework. It uses the datastore extensively as well as the cl-gd image processing library written by Edi Weitz.

Frontend Functionality

The feedback functionality will work on a per-picture basis. When a picture is displayed, a small "provide feedback" icon will be displayed. When the user clicks it, a form will be displayed in a layered window. The form will consist of "From" and "Text" fields and a "Send" button. The onclick action of the button is connected to a function that packs the contents of the "From" and "Text" fields into a urlencoded form data string and send it to the server using a POST request:

function digg_send()
{
    var d = doXHR("/digg-image/" + current_image.id,
                  { method: 'POST',
                    headers: {"Content-Type":"application/x-www-form-urlencoded"},
                    sendContent: queryString({ from: $('digg_from').value,
                                               text: $('digg_text').value }) })
        .addCallback(function () { alert('sent'); });
    return false;
}

doXHR and $ are functions from the MochiKit Javascript library which I like to use for non-GUI-related things for its conciseness and conceptional soundness. As can be seen in the Javascript snippet above, there is a /digg-image/ handler on the web server that is used to send the feedback. The URL that is used also consists of the object ID of the image currently displayed. Every persistent object in the BKNR datastore has a unique object ID, and the FIND-STORE-OBJECT function can be used to find an object with a certain object ID in the store.

Implementing a Backend Handler

The BKNR web framework makes it easy for applications to declare new handlers which relate to a certain store object. It provides for a set of handler classes that application handlers can inherit from. These handler base classes implement request URL parsing and automatically call handler methods with with relevant information from the URL and the request body parsed into Lisp objects. For the feedback feature, we need to subclass the OBJECT-HANDLER base class which extracts the object ID out of the URL, uses FIND-STORE-OBJECT to find the relevant object in the store and then calls the HANDLE-OBJECT method of the handler to actually handle the request. The declaration for this handler class looks like this:

(defclass digg-image-handler (object-handler)
  ()
  (:default-initargs :object-class 'quickhoney-image))

The :OBJECT-CLASS initarg can optionally be specified when creating an OBJECT-HANDLER object to make sure that HANDLE-OBJECT is only called for objects of that class. If the object ID in the URL references an object from a different class, an error page is displayed to the user. The HANDLE-OBJECT method for the DIGG-IMAGE-HANDLER class is specialized on both the DIGG-IMAGE-HANDLER handler class and on the QUICKHONEY-IMAGE class. It extracts the form parameters from the request and opens a connection to the SMTP server to send a mail to the owner or owners of the QUICKHONEY-IMAGE. The mail itself is a simple HTML mail with a table containing the name of the image, hyperlinked to the online page with the image and the feedback information entered by the user. Also, a thumbnail of the image is included with the email so that the artist immediately sees what picture the user is raving about. Modern mailers (like Apple Mail or Google Mail) display images attached in a multipart/mixed MIME mail inline, so there is no need to come up with a fancy HTML mail that references elements included in the same mail body. This is the source code of the handler:

(defmethod handle-object ((handler digg-image-handler) (image quickhoney-image))
  (with-query-params (from text)
    (cl-smtp:with-smtp-mail (smtp "localhost"
                                  "webserver@quickhoney.com"
                                  (remove-duplicates (mapcar #'user-email
                                                             (or (owned-object-owners image)
                                                                 (list (find-user "n")
                                                                       (find-user "p"))))))
      (cl-mime:print-mime
       smtp
       (make-instance
        'cl-mime:multipart-mime
        :subtype "mixed"
        :content (list
                  (make-instance
                   'cl-mime:mime
                   :type "text" :subtype "html"
                   :content (with-output-to-string (s)
                              (html-stream s
                                           (:html
                                            (:head
                                             (:title "Picture comment"))
                                            (:body
                                             (:table
                                              (:tbody
                                               (:tr
                                                ((:td :colspan "2")
                                                 "Comment on picture "
                                                 ((:a :href (make-image-link image))
                                                  (:princ-safe (store-image-name image)))))
                                               (:tr
                                                (:td (:b "From"))
                                                (:td (:princ-safe from))))
                                               (:tr
                                                ((:td :valign "top") (:b "Text"))
                                                (:td (:princ-safe text)))))))))
                  (make-instance
                   'cl-mime:mime
                   :type "image"
                   :subtype (string-downcase (symbol-name (blob-type image)))
                   :encoding :base64
                   :content (flexi-streams:with-output-to-sequence (s)
                              (blob-to-stream image s)))))
       t t))))

For completeness, let me also show you how the handler is entered into the list of handlers of the Quickhoney backend server:

(defun publish-quickhoney ()
  (unpublish)
  (make-instance 'website
		 :name "Quickhoney CMS"
		 :handler-definitions `(("/random-image" random-image-handler)
                                 ("/animation" animation-handler)
                                 ("/image-query-js" image-query-js-handler)
                                 ("/login-js" login-js-handler)
                                 ("/clients-js" clients-js-handler)
                                 ("/buttons-js" buttons-js-handler)
                                 ("/edit-image-js" edit-image-js-handler)
                                 ("/upload-image" upload-image-handler)
                                 ("/upload-animation" upload-animation-handler)
                                 ("/upload-button" upload-button-handler)
                                 ("/rss" rss-handler)
                                 ("/admin" admin-handler)
                                 ("/upload-news" upload-news-handler)
                                 ("/digg-image" digg-image-handler)
                                 ("/" template-handler
                                      :default-template "frontpage"
                                      :destination ,(namestring (merge-pathnames "templates/" *website-directory*))
                                      :command-packages (("http://quickhoney.com/" . :quickhoney.tags)
                                                         ("http://bknr.net/" . :bknr.web)))
                                 user
                                 images
                                 ("/static" directory-handler
                                            :destination ,(merge-pathnames #p"static/" *website-directory*))
                                 ("/MochiKit" directory-handler
                                              :destination ,(merge-pathnames #p"static/MochiKit/" *website-directory*))
                                 ("/favicon.ico" file-handler
                                                 :destination ,(merge-pathnames #p"static/favicon.ico" *website-directory*)
                                 :content-type "application/x-icon"))
		 :admin-navigation '(("user" . "/user/")
				     ("images" . "/edit-images")
				     ("import" . "/import")
				     ("logout" . "/logout"))
		 :authorizer (make-instance 'bknr-authorizer)
		 :site-logo-url "/image/quickhoney/color,000000,33ff00"
		 :login-logo-url "/image/quickhoney/color,000000,33ff00/double,3"
		 :style-sheet-urls '("/static/styles.css")
		 :javascript-urls '("/static/javascript.js")))

Getting Carried Away on Common Lisp

As you can see, I had to write little code to implement this functionality. In fact, the whole thing should have taken no longer than one or two hours, but here is how I found myself getting carried away: For one, I spent substantial time on refactoring CL-SMTP, as you can read in my last blog entry. That took a few hours. For another, I really don't like how mime emails are constructed with MAKE-INSTANCE in the HANDLE-OBJECT method above. I thought that I'd be nice to have a DEFINE-CONSTRUCTOR macro that created a function to create objects of arbitary classes, but that allows for one or more positional arguments in addition to any of the keyword arguments accepted by MAKE-INSTANCE for a particular class. Thus, instead of just coping with the slight ugliness, I tried myself on a macro. Simple as it looked, it surely became more complicated as I had to deal with defaulted arguments, and I was stopped from investing any more time into this when I was reminded by cmm that INITIALIZE-INSTANCE and SHARED-INITIALIZE can define additional keyword arguments that are accepted by MAKE-INSTANCE. Sure, it would be possible to find all applicable methods and extract more acceptable arguments from their lambda lists. Yet, I felt that I had enough fun for this last weekend and wrapped up the feedback functionality for Quickhoney.

Monday, March 24, 2008

March 24, 2008

Refactoring CL-SMTP

In my recent refactoring of BKNR, I decided that we do no longer want to use any of Franz' open source libraries if we can avoid it. Even though they work fine in general, hacking them is a pain because they adhere to John Foderaros Common Lisp Coding Standards which basically say that one should use Franz' own non-standard IF* macro for all conditionals. I do think that one should be careful with nesting conditionals deeply, but I do not agree with using a macro and spacing as a cure. If I have code in which conditional nesting exceeds two levels, I refactor to extract conditional code into functions. That way, functions are kept shorter and the use of names instead of literal code usually makes it easier to understand what's going on. So my easter goal was to replace Franz' NET.POST-OFFICE SMTP client by CL-SMTP. Both clients do not support proper quoting of non-ASCII characters in mail headers, thus the need to hack arose and IF* is nothing I want to get myself used to. A SMTP client really is not that complicated to begin with, and apart from the basic functionality, CL-SMTP already supported SSL and authentication, which nowadays are two basic requirements. What it was missing was the possibility to send pre-formatted messages, which is something that I require because I usually make up my own messages, including headers and body, using format strings or CL-MIME if I want to send attachments or otherwise need more control over the mail body. CL-SMTP prove to be a little hackish. Seemingly, only simple mail sending had originally been planned for, and the API had then been extended multiple times with growing user needs. There was no layering between the SMTP protocol aspects and the message formatting functionality, both being freely interleaved. While being simple to use for those use cases that had been planned for, the API was not helpful for my intended use. In a first round, I simplified the code, collapsed a few common patterns into functions and made the code generally easier to hack on. I then split the SMTP protocol handling into an exported macro, WITH-SMTP-MAIL, that is used to establish an SMTP connection and create the mail envelope. The body of the macro invocation is then invoked with a variable bound to the stream connected to the SMTP server. The existing mail sending functions of CL-SMTP have been converted to use that API, too. I then incorporated a patch that I found in the CL-SMTP mailing list archive so that raw TLS is supported. The existing TLS functionality worked by connecting to the standard SMTP port, then switching the cleartext connection to encrypted mode by issuing the STARTTLS command. In contrast, raw TLS works by having an SMTP server listen on a separate port for encrypted connections. No initial cleartext handshake is required in this operation mode. Finally, I implemented automatic quoting of non-ASCII characters in mail headers using a specialized stream class. The Gray Streams Examples in the SBCL manual are my favoured cheat sheet when implementing special purpose stream classes. The result of my work is available in the BKNR repository at svn://svn.bknr.net/svn/trunk/thirdparty/cl-smtp/ in case you want to give it a try right now. I developed with Clozure CL on my Powerbook and after I committed, our buildbot went red for SBCL. I had named an accessor for my specialized stream class "STREAM" which triggered a package lock violation error on SBCL. The name was bad, so I changed it to "ENCAPSULATED-STREAM" and saw the buildbot going green for SBCL too. I love that! I am now waiting for feedback on the refactorings and extensions. There are still bugs which need fixing and support for compilers other than CCL and SBCL needs to be verified. Also, the documentation for CL-SMTP needs to be better, and I think I'll just steal Edis HTML template and write something up myself next week. Also, an automated test for CL-SMTP would be great, but as this requires a SMTP peer to talk to, I have not really had a good idea how to implement that. Maybe later on. I certainly hope that Jan Idzikowski, who is the author and maintainer of CL-SMTP, will accept my patches and make them part of the main distribution.

Thursday, March 20, 2008

March 20, 2008

BKNR is alive

Recently, there has been quite some activity in the BKNR repository. This is because my company finally seems to become reality, with a real office, customers and all that. I even have an employee now, Kilian, who is working full time on BKNR based projects. I guess we are the only Lisp company in Berlin now! As I am working for ITA Software full time, the fact that I have some more workforce allowed me to attack the long-outstanding major updates for QuickHoney and Create Rainforest. Quickhoney will be modernized with a Blog and RSS feed, and we will also implement a shopping system so that one can buy digital and real QuickHoney products. Create Rainforest will be extended into Google Earth. Instead of hacking our funny, but close to unusable satellite application, we will use Google Earth to display and visualize our data. The old system will stay, but new features will not be added to that. These commercial projects have lead to several updates to BKNR in the past weeks: We have finally converted from portable AllegroServe to Hunchentoot as HTTP server substrate. Even though AllegroServe performs fine, it written in a very peculiar style which makes hacking it a rather unpleasant experience. Also, Hunchentoot is better maintained. In the process, we got away with passing request arguments throughout the web environment. The current request is now (only) carried in a special variable, as applications seldomly require access to it. SBCL and CCL are our current primary development platforms. I have still not given up on CMUCL completely, but we are waiting for the 19E release before we pick it up again. I am fed up with working around missing Unicode support, which we require in all our current projects. We moved away from common-lisp.net with our repository and web sites. The primary reason for this move was that we now have a Buildbot running for BKNR, and I could not quickly get the neccessary infrastructure to run on common-lisp.net - Buildbot is Python based and requires substantial library support to run. BKNR bugs have been fixed. Sure enough, there have been quite some of them. We hope to be better in the future with our continous integration testing and with more unit tests. Sure enough, I will be moving this Blog to our BKNR based blog system soon. Until that happened, this is my place to practice.