Thursday, October 2, 2008

Trying Clojure...

I am becoming increasingly frustrated by Common Lisp's age. On the one hand, history makes it what it is: Mature, well-documented, thoroughly understood and practical. On the other, it fails to keep up with current system designs, lacking convenient native support for rich data structures, infrastructure access and parallel programming. No programming language choice is without tradeoffs and in that respect, and I'll still chose Common Lisp in many situations. Realistically, though, Common Lisp cannot be the only language in my tool chest. For browser work, Javascript is much more practical, and for parallel programming, I'm on the lookout.

On my last visit in Cambridge, I attended the Boston Lisp Meeting. The presentation scheduled was about the new Lisp dialect the Clojure by Rich Hickey, who is the creator of this language.

Parallel programming is one of the areas that Clojure wants to make easier. It does so by making all data structure immutable and by language-level abstractions for concurrent data access. In that respect, it is similar to Erlang as it requires a functional programming style everywhere. Unlike Erlang, which puts almost all operations with side effects behind a common message passing interface, Clojure exposes the full Java API to programs. Thus, side effects can be produced everywhere, except in algorithms implemented in Clojure.

Clojure wants to be a Lisp, but it explicitly does not try to be backwards compatible. This opened a rather large design space to Rich Hickey, and some of the choices he made really do make sense. He specifies a reader, yet his reader does not intern symbols. That is a big win, as it allows the reader to actually work with arbitary Clojure source files. In Common Lisp, one needs to re-implement a full reader which does not intern symbols if one wants to read Common Lisp source files. This is kind of ironic, as the "Code is Data" mantra that we keep repeating does not really reflect what is possible in practice.

Rich managed to make me enthusiastic about Clojure, and I decided to give it a spin with a real project that I wanted to conduct anyway: I am a Twitter user, and I would like to be notified of new posts to Planet Lisp on Twitter. The program would read the Planet's RSS feed using a HTTP request, determine if new items have been posted and update the Twitter status of the Planet Lisp Twitter account that I have set up for this purpose.

Getting the development environment up

My laptop runs Windows, but I do most of my Lisp development in a VMware running FreeBSD. As Clojure is hosted on the Java virtual machine, I decided to avoid the indirection and use Windows as my native platform. Clojure support for Slime is available, so I can stay in my familiar environment. A very recent CVS checkout of Slime is required, which took me a little while to figure out.

Processing XML

It took me a another while to discover what the proper way to process XML in Clojure is. An XML parser is included with the base distribution, but there is no information in the documentation how one would actually work with the data structure that is generated by the parser. For my application, I wanted to iterate over certain elements in the XML and extract a few subelements in a loop. An evaluator for XPath expressions would have suited the job, but obviously that is not the Clojure way to do it.

In Clojure, XML does not receive any special treatment. The parser reads it into a tree, and processing is performed using functions that process trees. As simple as it may sound, I had a hard time finding a practical example of how this was supposed to work. Chris Houser finally got me on the right track when he pointed me to his zip_filter package. zip_filter is a tool for filtering data out of trees, and it can work with trees produced by Clojure's XML parser.

Once I had figured this out, things were very easy. The Planet Lisp RSS feed can be read into an XML tree with

( (clojure.xml/parse ""))
and one can extract data out of the parsed tree using path expressions:
(> xml :channel :item)
This certainly is concise and elegant, and I like the fact that XML is kept out of the program's way. Note that I've included the namespace prefixes in the examples above so that they can be copied and pasted. Normally, one would import these namespaces so that shorter or no namespace needs to be specified to access the symbols.

Making HTTP requests

In order to update the Twitter status, a HTTP POST request to Twitter must be made. A HTTP client which is based on the class is available from the Files section of the Clojure Google Group. It is not a polished product and required some minor tweaking, but after that, fetching something using HTTP is as easy as

(http-client/url-do "" "GET" {})

Reading and writing files

The Twitter gateway needs to persistently store the list of articles that it has found on Planet Lisp in order to decide which of the current articles are new. Clojure is a Lisp and thus can read and write its own data structures easily, so all it takes is calling a few functions. As in the XML case, the hard part was figuring out what the proper functions are, as the documentation is sparse. I found Stuart Halloway's Blog in which he publishes a number of articles describing how the examples from Peter Seibel's book Practical Common Lisp can be implemented in Clojure. The Simple Database example pretty much does the same thing that I required, so I looked at that. Unfortunately, the "spit" function that is the complement to "slurp" and writes a string to a named file did not appear to exist in the clojure namespace. It took a little grepping to find it in, and referenced from there, it works as expected.

Functional glue

Having all the required components in place, it was time to come up with the real code. As data structures are immutable in Clojure, the challenge was to express the required loop so that both the new persistent state and the list of new postings would be maintained. I came up with a recursive function with two accumulators:

(defn poll
  "Poll planet lisp, check for new postings, update Twitter status when new postings have appeared"
   (let [old-data (load-data)
         (fn [items new-data new-items]
           (if items
             (let [item (first items)
                   guid (first (xml-> item :guid text)) ]
               (recur (rest items)
                      (conj new-data guid)
                      (if (old-data guid)
                        (conj new-items (first (xml-> item :title text))))))
               (maybe-post-twit new-items)
     (process (xml-> (feed-to-zip "")
                     :channel :item)
              #{} []))))
To Scheme programmers, this style should be familiar. I find it not terrible myself, although one could certainly push things around to suit taste.

The Verdict

I was sceptical when I first read about Clojure. I am sceptical of new languages in general, and new Lisp family members always make me think "why?" first. Rich Hickey's presentation got me interested because he gave very good reasoning for his design choices. He also is a very good presenter and could easily withstand an audience of die-hard Lispers that included people who have played an active role in the creation of Common Lisp. But that is the singer, not the song.

The good

The fixed reader, vastly improved data structure support, access to a host of libraries and concurrency support all make a good case for Clojure. It is Lisp in many respects, and in the uncompromised macro facility opens up the extensibility that I am used to from Common Lisp.

The bad

Clojure is not multi-paradigm in the sense that Common Lisp is: A functional programming style is required, and there is no way around that. Making the comma be white space is somewhat of an arbitary decision, that does not match the other design choices that seem to be grounded better.

The ugly

The error messages that the compiler produces are mostly useless. Debugging is hard, as no tracing facility and no breakpoints are available - Or maybe they are, but I could not find them in the documentation. The overall immaturity of the language shows frequently, and I have spent hours looking for code examples and finding my way through something that is very much in flux.

My Conclusion

I like Clojure, as it seems to fill my need for a language that supports concurrency and makes it possible to write modern desktop applications, while still being a Lisp. I will use it for another project I have been planning to do in Erlang. For general exploratory development, Clojure is not yet a good choice as the development environment is too immature.

I am not buying Rich Hickey's claim that classic object oriented programming is bad because it does not support concurrency. I begin thinking of object oriented systems more as active databases that allow modeling of complex, interconnected and persistent data structures. Such structures are managable with a pure sequential execution model, and one should refrain from mixing that with tasks that are inherently concurrent.

Finally: My Twitter gateway for Planet Lisp exists, but I have not yet been able to deploy it. Getting Java to run on FreeBSD/amd64 has proven to be kind of a challenge. I will have this sorted out soon, so feel free to follow planet_lisp.