Thursday, October 2, 2008

Trying Clojure...

I am becoming increasingly frustrated by Common Lisp's age. On the one hand, history makes it what it is: mature, well-documented, thoroughly understood and practical. On the other, it fails to keep up with current system designs, lacking convenient native support for rich data structures, infrastructure access and parallel programming. No programming language choice is without tradeoffs, and in that respect I'll still choose Common Lisp in many situations. Realistically, though, Common Lisp cannot be the only language in my tool chest. For browser work, JavaScript is much more practical, and for parallel programming, I'm on the lookout.

On my last visit to Cambridge, I attended the Boston Lisp Meeting. The scheduled presentation was about Clojure, a new Lisp dialect, and was given by its creator, Rich Hickey.

Parallel programming is one of the areas that Clojure wants to make easier. It does so by making all data structures immutable and by providing language-level abstractions for concurrent data access. In that respect, it is similar to Erlang as it requires a functional programming style everywhere. Unlike Erlang, which puts almost all operations with side effects behind a common message passing interface, Clojure exposes the full Java API to programs. Thus, side effects can be produced everywhere, except in algorithms implemented in Clojure.
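
A minimal sketch of what this looks like in practice, assuming a stock Clojure REPL. The names v1, v2, seen and remember! are made up for illustration; ref, dosync and alter are the language-level constructs for coordinated changes to shared state.

```clojure
;; Collections are immutable: conj returns a new vector and leaves v1 alone.
(def v1 [1 2 3])
(def v2 (conj v1 4))

;; A ref holds an immutable value; it can only be changed in a transaction.
(def seen (ref #{}))

(defn remember!
  "Adds guid to the set held by the ref `seen`, inside a transaction."
  [guid]
  (dosync (alter seen conj guid)))

(remember! "a")
(remember! "b")
```

After this, v1 is still [1 2 3], and dereferencing the ref with @seen yields a set containing both strings.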

Clojure wants to be a Lisp, but it explicitly does not try to be backwards compatible. This opened a rather large design space to Rich Hickey, and some of the choices he made really do make sense. He specifies a reader, yet his reader does not intern symbols. That is a big win, as it allows the reader to actually work with arbitrary Clojure source files. In Common Lisp, one needs to re-implement a full reader which does not intern symbols if one wants to read Common Lisp source files. This is kind of ironic, as the "Code is Data" mantra that we keep repeating does not really reflect what is possible in practice.
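
As a small illustration (a sketch, not something shown in the talk), the reader turns source text into plain data even when the namespaces it mentions do not exist. The namespace some.unknown.ns and the function names greet and shout are hypothetical.

```clojure
;; Read one form from a string. Nothing is evaluated and no namespace is
;; created; the result is an ordinary list of symbols and a vector.
(def form
  (read-string "(defn greet [name] (some.unknown.ns/shout name))"))

(first form)                      ; the symbol defn
(namespace (first (nth form 3)))  ; the namespace part of the qualified symbol
(find-ns 'some.unknown.ns)        ; the namespace still does not exist
```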

Rich managed to make me enthusiastic about Clojure, and I decided to give it a spin with a real project that I wanted to conduct anyway: I am a Twitter user, and I would like to be notified of new posts to Planet Lisp on Twitter. The program would read the Planet's RSS feed using an HTTP request, determine if new items have been posted and update the Twitter status of the Planet Lisp Twitter account that I have set up for this purpose.

Getting the development environment up

My laptop runs Windows, but I do most of my Lisp development in a VMware virtual machine running FreeBSD. As Clojure is hosted on the Java virtual machine, I decided to avoid the indirection and use Windows as my native platform. Clojure support for Slime is available, so I can stay in my familiar environment. A very recent CVS checkout of Slime is required, which took me a little while to figure out.

Processing XML

It took me another while to discover what the proper way to process XML in Clojure is. An XML parser is included with the base distribution, but there is no information in the documentation on how one would actually work with the data structure that is generated by the parser. For my application, I wanted to iterate over certain elements in the XML and extract a few subelements in a loop. An evaluator for XPath expressions would have suited the job, but obviously that is not the Clojure way to do it.

In Clojure, XML does not receive any special treatment. The parser reads it into a tree, and processing is performed using functions that process trees. As simple as it may sound, I had a hard time finding a practical example of how this was supposed to work. Chris Houser finally got me on the right track when he pointed me to his zip_filter package. zip_filter is a tool for filtering data out of trees, and it can work with trees produced by Clojure's XML parser.

Once I had figured this out, things were very easy. The Planet Lisp RSS feed can be read into an XML tree with

(clojure.zip/xml-zip (clojure.xml/parse "http://planet.lisp.org/rss20.xml"))

and, with the result bound to xml, one can extract data from the parsed tree using path expressions:

(clojure.contrib.zip-filter.xml/xml-> xml :channel :item)

This certainly is concise and elegant, and I like the fact that XML is kept out of the program's way. Note that I've included the namespace prefixes in the examples above so that they can be copied and pasted. Normally, one would import these namespaces so that only a shorter prefix, or none at all, needs to be specified to access the symbols.
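
To make the "XML is just a tree" point concrete without any extra library, here is a sketch that uses only clojure.xml and the core function xml-seq instead of zip_filter. The inline sample-rss document and the helper names parse-str and child-text are invented for illustration; the require form also shows how a namespace is given a short alias.

```clojure
(require '[clojure.xml :as xml])

;; A tiny stand-in for the real feed, so no network access is needed.
(def sample-rss
  (str "<rss><channel>"
       "<item><guid>g1</guid><title>First</title></item>"
       "<item><guid>g2</guid><title>Second</title></item>"
       "</channel></rss>"))

(defn parse-str
  "Parses an XML string into nested maps with :tag, :attrs and :content."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn child-text
  "Returns the text of the first child of element that has the given tag."
  [element tag]
  (->> (:content element)
       (filter #(= tag (:tag %)))
       first
       :content
       (apply str)))

;; xml-seq walks the whole tree; keep only the :item elements.
(def items
  (filter #(= :item (:tag %)) (xml-seq (parse-str sample-rss))))

(map #(child-text % :title) items)
```

The last expression yields the two titles in document order. The zip_filter path expressions above express the same selection more compactly.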

Making HTTP requests

In order to update the Twitter status, an HTTP POST request to Twitter must be made. An HTTP client which is based on the java.net.HttpsURLConnection class is available from the Files section of the Clojure Google Group. It is not a polished product and required some minor tweaking, but after that, fetching something using HTTP is as easy as

(http-client/url-do "http://planet.lisp.org/" "GET" {})
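
The http-client file itself is not reproduced here. As a rough sketch of what such a client can build on, plain Java interop is enough to prepare a request. prepare-request is a hypothetical helper, not the url-do function above, and it assumes java.net.HttpURLConnection as the underlying class.

```clojure
(import '(java.net URL HttpURLConnection))

(defn prepare-request
  "Returns an unconnected HttpURLConnection for url with the given method
  and headers. Nothing is sent until the caller reads the response."
  [url method headers]
  (let [^HttpURLConnection conn (.openConnection (URL. url))]
    (.setRequestMethod conn method)
    (doseq [[k v] headers]
      (.setRequestProperty conn k v))
    conn))

(def conn
  (prepare-request "http://planet.lisp.org/" "GET" {"Accept" "text/html"}))

;; Inspect the prepared request; this does not touch the network.
(.getRequestMethod ^HttpURLConnection conn)
```

Reading the response, for example with (slurp (.getInputStream conn)), is the step that actually performs the request.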

Reading and writing files

The Twitter gateway needs to persistently store the list of articles that it has found on Planet Lisp in order to decide which of the current articles are new. Clojure is a Lisp and thus can read and write its own data structures easily, so all it takes is calling a few functions. As in the XML case, the hard part was figuring out what the proper functions are, as the documentation is sparse. I found Stuart Halloway's blog, in which he publishes a number of articles describing how the examples from Peter Seibel's book Practical Common Lisp can be implemented in Clojure. The Simple Database example does pretty much the same thing that I required, so I looked at that. Unfortunately, the "spit" function, which is the complement to "slurp" and writes a string to a named file, did not appear to exist in the clojure namespace. It took a little grepping to find it in clojure.contrib.duck-streams, and referenced from there, it works as expected.
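
A sketch of such a round trip, assuming a current Clojure in which spit lives in clojure.core (in the setup described here it came from clojure.contrib.duck-streams). These save-data and load-data take a file name argument and are illustrative variants, not the functions from my program; state-file is a temporary file created just for the example.

```clojure
(defn save-data
  "Writes data to file in a form the Clojure reader can read back."
  [file data]
  (spit file (pr-str data)))

(defn load-data
  "Reads the data written by save-data, or returns an empty set when the
  file does not exist yet. Only meant for files this program wrote itself."
  [^String file]
  (if (.exists (java.io.File. file))
    (read-string (slurp file))
    #{}))

;; Temporary file so the example does not clutter the working directory.
(def state-file
  (.getPath (java.io.File/createTempFile "planet-lisp" ".clj")))

(save-data state-file #{"guid-1" "guid-2"})
(load-data state-file)
```

The set comes back equal to the one that was written, because pr-str prints it in the same syntax the reader accepts.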

Functional glue

Having all the required components in place, it was time to come up with the real code. As data structures are immutable in Clojure, the challenge was to express the required loop so that both the new persistent state and the list of new postings would be maintained. I came up with a recursive function with two accumulators:

(defn poll
  "Poll planet lisp, check for new postings, update Twitter status when new postings have appeared"
  []
  (save-data
   (let [old-data (load-data)
         process
         (fn [items new-data new-items]
           (if items
             (let [item (first items)
                   guid (first (xml-> item :guid text)) ]
               (recur (rest items)
                      (conj new-data guid)
                      (if (old-data guid)
                        new-items
                        (conj new-items (first (xml-> item :title text))))))
             (do
               (maybe-post-twit new-items)
               new-data)))]
     (process (xml-> (feed-to-zip "http://planet.lisp.org/rss20.xml")
                     :channel :item)
              #{} []))))

To Scheme programmers, this style should be familiar. I find it not terrible myself, although one could certainly push things around to suit taste.
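
One way of pushing things around is to let reduce carry the two accumulators. This is only a sketch of the same idea, not the code I deployed: split-new is a hypothetical name, and items are plain maps with :guid and :title keys instead of XML zipper locations, so that the example runs on its own.

```clojure
(defn split-new
  "Returns [all-guids new-titles] for items, where old-guids is the set of
  guids seen on the previous poll."
  [old-guids items]
  (reduce (fn [[guids titles] {:keys [guid title]}]
            [(conj guids guid)
             (if (old-guids guid)
               titles
               (conj titles title))])
          [#{} []]
          items))

(split-new #{"g1"}
           [{:guid "g1" :title "Old"}
            {:guid "g2" :title "New"}])
;; => [#{"g1" "g2"} ["New"]]
```

The first element of the result is the state to save for the next poll, the second the list of titles to announce.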

The Verdict

I was sceptical when I first read about Clojure. I am sceptical of new languages in general, and new Lisp family members always make me think "why?" first. Rich Hickey's presentation got me interested because he gave very good reasoning for his design choices. He also is a very good presenter and could easily withstand an audience of die-hard Lispers that included people who have played an active role in the creation of Common Lisp. But that is the singer, not the song.

The good

The fixed reader, vastly improved data structure support, access to a host of libraries and concurrency support all make a good case for Clojure. It is a Lisp in many respects, and the uncompromised macro facility opens up the extensibility that I am used to from Common Lisp.

The bad

Clojure is not multi-paradigm in the sense that Common Lisp is: A functional programming style is required, and there is no way around that. Making the comma be white space is a somewhat arbitrary decision that does not match the other design choices, which seem to be better grounded.

The ugly

The error messages that the compiler produces are mostly useless. Debugging is hard, as no tracing facility and no breakpoints are available. Or maybe they are, but I could not find them in the documentation. The overall immaturity of the language shows frequently, and I have spent hours looking for code examples and finding my way through something that is very much in flux.

My Conclusion

I like Clojure, as it seems to fill my need for a language that supports concurrency and makes it possible to write modern desktop applications, while still being a Lisp. I will use it for another project I have been planning to do in Erlang. For general exploratory development, Clojure is not yet a good choice as the development environment is too immature.

I am not buying Rich Hickey's claim that classic object oriented programming is bad because it does not support concurrency. I am beginning to think of object oriented systems more as active databases that allow modeling of complex, interconnected and persistent data structures. Such structures are manageable with a purely sequential execution model, and one should refrain from mixing that with tasks that are inherently concurrent.

Finally: My Twitter gateway for Planet Lisp exists, but I have not yet been able to deploy it. Getting Java to run on FreeBSD/amd64 has proven to be kind of a challenge. I will have this sorted out soon, so feel free to follow planet_lisp.


13 comments:

  1. Hans,

    Interesting example. Any chance you could post the full code somewhere?

  2. The code is here. You'll need to load the http_client.clj file manually. The authorization string must be Base64 encoded, which requires external libraries or code that I don't have, so I just used CL to encode the username/password, colon separated.

  3. The question I have is about compiling in Clojure.
    AFAIK Clojure does not save compiled files. Does that mean that it will start compiling all your source code every time you run your program? Looks like a showstopper for big programs.

  4. At the moment, Clojure does not save compiled files, that is true. The compiler is rather simple, though, and compiling Clojure itself takes only 6 seconds on my machine. I would recommend that you try out whether this is a real problem before dismissing Clojure just for this reason. I suspect that there will be a .jar deployment mechanism before you have had the chance to write enough code to make compilation times a significant burden :)

  5. Please clarify what you mean by "rich data structures." I'm not trying to be critical, just trying to learn.

    -Kiyu

  6. Common Lisp has very good library functions for handling lists. All other data structures (arrays, hash tables, sets) are restricted and handled in a non-uniform manner. Contrasted to that, Clojure defines several useful basic data structures that share a common interface and require no additional code. I am aware of the fact that Common Lisp is extensible and that some new things can be retrofitted nicely. Having a rich set of first class data structures in the base language definition is good because it, if done right, guarantees a uniform API and interoperability of the data structures and the algorithms provided.

  7. In response to Vagrif Verdi: there is no analog to a FASL file in Common LISP: Clojure code is compiled into byte-code when it is read. You can package Clojure sources in a JAR file, and load (transparently) from the JAR. I've found this to be pretty fast.

  8. Hans, I have updated my article to include a pointer to clojure-contrib for the spit function. Sorry that set you back!

  9. Using gen-and-save-class can get you a faster-loading compiled file in some situations. It's not as well-integrated and convenient as fasls in CL though.

  10. As an alternative, if you are interested in using Lisp, I would suggest Lisp Flavoured Erlang (LFE), which is a concurrent Lisp based on the features and limitations of the Erlang VM. It is a proper Lisp with macros, sexprs, code-as-data together with the power of Erlang pattern matching and binaries. Best of all it seamlessly integrates with vanilla Erlang/OTP.

    There is no homepage for it yet, but you can load it from the user contributions at trapexit.org or from GitHub at http://github.com/rvirding/lfe/tree.

  11. In response to Zak:
    AFAIU, gen-and-save-class does not compile any Clojure source itself. It generates byte-code for a Java class, which looks up the actual implementation at runtime in a Clojure namespace. This has to be provided as a usual (although specially named) source file. This file is not subject to any compilation until it is loaded like any other source file. So gen-and-save-class does not give any improvement on load times.

  12. As an update: I have deployed the Twitter gateway for Planet Lisp on FreeBSD. Getting the JDK required the usual procedure to get a native FreeBSD to run. With that in place, I could just run the code I had developed on Windows. Cross platform test: passed. Follow planet_lisp to be updated.

  13. About debugging and profiling:
    http://blip.tv/file/1313503

    In this video Rich explains (around minute 55) that there is fantastic support for debugging and profiling.
    You have the full set of all Java tools available to do that, you can set breakpoints, step, etc.

    For us Common Lispers it is unfamiliar, as we expect these tools to come with the implementation.
