Sunday, December 30, 2012

December 30, 2012

DRAKMA documentation updates

The DRAKMA documentation has received a big overhaul to bring the docstrings and the HTML documentation in sync again. I also restructured the large example section in the beginning so that the individual examples are indexed. I also added changed two examples so that they deal with JSON data as JSON is so popular nowadays.

DRAKMA and JSON

A user suggested that DRAKMA should handle JSON data like text data, i.e. that the content type "application/json" should, by default, be on the list of text content types. The issue with that suggestion is that JSON data is not following the encoding rules for text content type. By default, JSON data is encoded in UTF-8. It is possible to use UTF-16 and UTF-32 by the way of a byte order mark transmitted in the beginning of the file. For textual content types, the character set is determined by the charset parameter in the Content-Type header, which is not used with "application/json".

I have thought about adding a DRAKMA:HTTP-JSON-REQUEST function that would deal with JSON encoding properly, but I think such a function would rather belong into a JSON library than into DRAKMA, so it is left out for now.

How useful are docstrings, really?

Getting the docstrings and the documentation into sync again was pretty laborious, and I fear that I have been missing quite a few things that were in one version of the documentation, but not in the other. So far, Edi and I have used a forward-annotation strategy where the initial version of the documentation was created off the docstrings in the source code, and then manually edited to contain links and other niceties that real documentation deserves. This, of course, is a pretty bad strategy, as edits must be repeated or the docstrings need to contain some form of markup so that no edits are needed.

Going forward, I will probably start to generate docstrings from the XML documentation source. That way, the documentation source can contain links and other, more advanced markup that is not useful in docstrings, and the two representations will not go out of sync. The docstrings will no longer be inline with the source code, but go into a separate file that is generated as part of the release process.

Any thoughts? Use comments or send email.

Saturday, August 11, 2012

August 11, 2012

lparallel and how I was thinking too complicated

Yesterday, I posted a short article that showed how I used lparallel to speed up a lengthy computation by reading input data files in parallel. A few readers responded, and the response from James M. Lawrence, who is the author of lparallel, is reproduced here because it contains source code that can't be properly formatted in blogspot comments (any suggestions regarding something better?).

[citing James M. Lawrence]
Here is a map-reduce-style version:
(defun slurp-alist (hash-table alist)
  (loop
    :for (key . data) :in alist
    :do (setf (gethash key hash-table) data))
  hash-table)

(defun load-files-in-parallel (pathnames)
  (flet ((parse (pathname)
           (let ((result nil))
             (read-and-parse-file
              pathname
              (lambda (key data)
                (push (cons key data) result)))
             result)))
    (reduce #'slurp-alist
            (lparallel:pmap 'vector #'parse pathnames)
            :initial-value (make-hash-table))))
The benefit here is that worker threads do not have to contend for a queue lock. A possible drawback is that the key-data pairs survive until the final reduce, whereas the pairs have a shorter lifetime when passed through a queue. Of course performance will depend upon the scale and shape of the data.

Benchmarks with early versions of lparallel showed better performance with a vector-based queue, however since then I found a simple cons-based queue to be generally faster, which is currently the default. There is still a compile-time option to get the vector-based queue, and unless that is enabled the number passed to make-queue is ignored. In other words, the 10000 is a lie :)

Motivated by James' reply, I spent some time to find out how the process could be further optimized. The idea to have the workers cons up the data and then collect all data into the hash table at the end of the operation reduced the run time for my application by some 30%, which is not bad.

A comment from Max Mikhanosha on G+ suggested that I could avoid the queue and directly use a thread safe hash table. It is a little embarrassing that I assumed that I could not write hash tables from multiple threads. Both CCL's and SBCL's hash tables provide for thread safe modes, so there is no point in tunneling the data through a queue.

As I am using CCL, I tried different values for the :SHARED keyword argument to MAKE-HASH-TABLE. The default is :LOCK-FREE, which makes hash tables thread safe without requiring the grabbing of a lock for each operation. Other permitted values are T, which locks the table for each access, or NIL for thread-unsafe hash tables.

I found that when writing to the hash table directly from the file readers, :SHARED T would give me the best performance. With that option, my code ran another 10% faster than with consing up the file results and then processing all the lists at the end.

lparallel is a great example how Common Lisp is an extensible language. The operators provided by it seamlessly integrate parallel programming into the language, and with the help of my readers, the code that I originally posted now became very small:

(defun load-files-in-parallel (pathnames)
  (let ((hash-table (make-hash-table :shared t)))
    (lparallel:pmap nil
                    (lambda (pathname)
                      (read-and-parse-file pathname
                                           (lambda (key data)
                                             (setf (gethash key hash-table)
                                                   data))))
                    pathnames)
    hash-table))

Look, Ma! All I needed was pmap!

Friday, August 10, 2012

August 10, 2012

Using lparallel to utilize multiple cores [updated]

A while ago, I discovered the lparallel library as a nice set of abstractions to make parallel programming in Common Lisp easier and more pleasant. The use case that I presented in my previous blog post was more about dealing with asynchronicity rather than parallelism, though. As I have seen people ask for lparallel examples, I will present how I used it to actually speed up a long running computation by having multiple cores work on the problem at the same time.

The problem that I have been working on involved reading and parsing a few large data files to extract some data from them. The extracted data needed to be stored in one hash table for later processing. The reading and parsing process is CPU bound, so reading the data files in parallel will speed up the overall process. "Speed up" here means "reduce the run time because I'm waiting for completion", not "make it as fast as possible" or "have it utilize all cores". I'm trying to reduce my own wait times, and I cannot spend time debugging locks. Thus, the high abstraction level of lparallel fits me just right

For the task at hand, I created a set of workers which read the input files and send the extracted data items to a collector process by the way of a queue. The collector process puts them into a hash table for later processing. Once all files are read, the hash table is returned.

(defun load-files-in-parallel (pathnames)
  (let ((queue (lparallel.queue:make-queue 10000)))
    (lparallel:future
      (lparallel:pmap nil
                      (lambda (pathname)
                        (read-and-parse-file
                         pathname
                         (lambda (key data)
                           (lparallel.queue:push-queue (list key data)
                                                       queue)))
                        (lparallel.queue:push-queue :done queue))
                      pathnames))
    (let ((hash-table (make-hash-table))
          (remaining-worker-count (length pathnames)))
      (loop while (plusp remaining-worker-count)
            for item = (lparallel.queue:pop-queue queue)
            if (eq item :done)
              do (decf remaining-worker-count)
            else do (destructuring-bind (key data) item
                       (setf (gethash key hash-table) data)))
      hash-table)))

Some points are worth noting:

Instead of using a queue to send the data to a separate worker for hash table insertion, the hash table could be locked or a lock-free hash table could be used. This may or may not be faster.
The scheduling of the workers is done by lparallel and may or may not be efficient. I've looked at CPU utilization and found the process to use 3-6 cores, so it seems that the scheduling is not optimal. I'm guessing that the queue processing and hash table insertion eat more time than I'd hope for, but I have not made a real assessment.
I've picked the size of the queue randomly. It may be worthwhile to make the queue larger so that file readers are not blocked while hash table rehashing occurs.

Implementing this parallel file loader took me just a few minutes and gave my program considerable speedup. Using memory is already easy, and with lparallel, using multiple cores also becomes much easier than it has been for me.

[Update] James M. Lawrence, the author of lparallel, kindly pointed out that the original version of my example code contained an unneeded forced future to read the data. I have updated the code according to his remark.

Saturday, June 30, 2012

June 30, 2012

Orphaned projects on common-lisp.net

We have lost contact to the maintainers of the following projects on common-lisp.net:

fret
ganelon
innen
erlisp

If you are the original maintainer of any of these, please tell us your current email address or let us know if we can delete the project.

Saturday, June 2, 2012

June 02, 2012

lparallel - A Parallel Programming Library for Common Lisp

Parallel programming is hard, and as CPUs are getting ever-faster, I usually tend to find optimizing a single thread of control to be less risky than dealing with threads, locks and synchronized data structures. Recently, though, I had to deal with a reporting function which I started from a Hunchentoot handler and that was running long enough to time out the client. I needed to somehow put running the function into the background, and as I heard good things about the relatively new lparallel library, I thought I could give it a try.

lparallel implements an abstraction layer for parallel programming in Common Lisp. In addition to relatively low-level concepts like tasks and channels, it implements mid-level promises and futures as well as high-level parallel mapping, binding and reducing functionality. Futures looked like a suitable mechanism to solve my problem at hand.

The non-parallel version of my HTTP request handler looked something like this:

(hunchentoot:define-easy-handler (something-that-takes-long :uri "/sttl")
    (parameter)
  (if (eql (hunchentoot:request-method*) :post)
      (compute-and-display-response-page parameter)
      (show-job-parameter-form)))

The compute-and-display-response-page function does what the name suggests. As usual with Hunchentoot request handlers, it returns the page to display to the user. Likewise, the show-job-parameter-form returns a HTML form that collects the parameter. Now, compute-and-display-response-page potentially takes long, so instead of just delaying the response to the HTTP request until the function returns, I want to have it execute in the background and respond to the request with a "job has been started" message. The client is then supposed to poll in regular intervals. Once the background job completes, the page that the function has generated is returned to the client.

Using a future, I came up with something like this:

(hunchentoot:define-easy-handler (something-that-takes-long :uri "/sttl")
    (parameter)

  ;; Get or start Hunchentoot session context
  (hunchentoot:start-session)

  (let ((job-running (hunchentoot:session-value :job)))
    (cond

      ;; Previously started job has finished
      ((and job-running
            (lparallel:fulfilledp job-running))
       (hunchentoot:delete-session-value :job)
       (lparallel:force job-running))

      ;; Previously started job still running
      (job-running
       "Previous job still running")

      ;; Start new job
      ((eq (hunchentoot:request-method*) :post)
       (setf (hunchentoot:session-value :job)
             (lparallel:future
               (compute-and-display-response-page parameter)))
       "The job has been started")

      ;; Display job parameter form
      (t
       (show-job-parameter-form)))))

Hunchentoot's session mechanism is used to make the future accessible to subsequent requests. The future is placed in a session value; the completion of the background calculation is determined by calling lparallel:fulfilledp. Once the computation is finished, the return value of the compute-and-display-response-page is determined using lparallel:force.

This is mostly it, and I find parallel programming in this case easy to understand and reason about. Some additional things are worth mentioning:

To use lparallel, one has to create a kernel, which constitutes the worker thread pool in which futures and other lparallel tasks are executed. Multiple kernels can coexist, the kernel to use is determined by the lparallel:*kernel* special variable. In my application, I am creating a kernel with one worker thread. This provides me with automatic request serialization so that only one long-running function invocation exists at any time, without the need to do any locking or explicit resource management. I like that a lot.
By default, errors that occur in a worker thread cause the debugger to be entered. This can be changed by setting lparallel:*debug-tasks-p* to nil. Then, errors will be caught and signalled to the caller of lparallel:force, which in my case was the right thing to do (i.e. if the background process triggers an error, I want the backtrace to be sent to the client.
One thing I find slightly bothersome is the reverse order of the cond clauses in the request handler - During the lifetime of a typical session, the last clause of the cond is true first, then the second to last and so on. This is a pattern that I'm seeing quite often when writing http request handlers. Maybe a cond-reverse macro would be suitable in this case.

lparallel has more to offer of course, and I have only used a very small part of it. So far, I have found it to be very well thought out, documented appropriately and well maintained. If you need parallel programming or even just easy background execution in your Common Lisp programs, I recommend looking at it.

Friday, March 23, 2012

March 23, 2012

BVG - Es ist Zeit, dass Ihr die kaputten U-Bahn-Fahrkartenautomaten ENDLICH repariert

Ich zeige Euch mal, wie Fahrkarten-Automaten in DDR-Straßenbahnen funktioniert haben, vor 25 Jahren:

Durch Ziehen des Hebels wurde das eingeworfene Geld - hinter einer Glasscheibe sichtbar - nach rechts gedreht und ein Fahrschein ausgegeben. Durch die Scheibe konnte man also sehen, was die beiden vorherigen Fahrgäste bezahlt hatten.

Man konnte den Hebel bedienen, ohne Geld eingeworfen zu haben, und erhielt trotzdem ein Ticket. Gedacht war es so, dass die Bezahlung "sozial" kontrolliert wurde, d.h. jeder im Wagen konnte sehen, ob gerade jemand bezahlt hatte und ob der richtige Betrag eingeworfen wurde. Zeitkarten-Inhaber hatten ihre Zeitkarte in die Höhe zu halten, um so zu erklären, dass sie nicht zur Zahlbox gehen mussten.

So weit, so gut. Man mag jetzt darüber philosophieren, ob das nur in einem Spitzelstaat funktionieren kann oder einfach locker und normal war, mich hat das als westberliner Kind auf Besuch jedenfalls beeindruckt. Simpel, einfach, unkompliziert.

Warum ich das schreibe?

deshalb.

Erstmal: Wenn ich eine Fahrkarte kaufen möchte, dann, weil ich fahren will. Ich bin nicht an "anderen Angeboten" interessiert, und ich fände es auch total unhöflich gegenüber folgenden Fahrgästen, wenn ich mir noch die interessanten "anderen Angebote" ansehen würde. Die könnten es ja auch eilig haben. Das scheint bei der BVG aber niemanden zu interessieren, und deswegen wird auf dem Fahrkartenautomaten erst mal eine Werbe-Seite angezeigt mit irgendwelchen irrelevanten Informationen, die noch nie jemand gelesen hat, ausser, wenn er eine Fahrkarte kaufen wollte und herausfinden wollte, wie das geht. Irgendwo in diesem Werbemüll steht "Berühren sie den Bildschirm", und wenn man das macht, dann wird auch tatsächlich die Auswahlseite für die Fahrscheine angezeigt.

Diese Werbeseite ist komplett sinnlos und muss weg. Alle, die sich diesem Automaten nähern, wollen Fahrkarten kaufen. Wozu zur Hölle müssen diejenigen, die es noch nicht wissen, diesen Bildschirm angucken und herausfinden, dass sie ihn berühren müssen? Es gibt da auch nichts zu "saven" oder so. Es kostet einfach nur zusätzliche Zeit, diesen Bildschirm zu verstehen. Er fügt der Dienstleistung "öffentlicher Nahverkehr" keinerlei Mehrwert hinzu. Wenn es noch zusätzliche tolle Sachen gibt, dann sollten die bitteschön auf Werbeflächen beworben werden, nicht auf einem Fahrkartenautomaten, den die Fahrgäste dazu benutzen, sich eine Fahrkarte zu kaufen.

Okay fein, ich habe also den Bildschirm berührt und der Automat zeigt mir an, welche Fahrkarten er mir verkaufen kann. Ich muss zugeben, dass sich hier in den vergangenen Jahren, seit die BVG die grandiose Idee hatte, die Automaten, welche für jeden Fahrkartentyp eine eigene Taste haben und die in der Straßenbahn auch nach wie vor im Dienst sind, durch diese Touchscreen-Windows-Gurken zu ersetzen, einiges getan hat. Früher hatte diese Schrottsoftware auch noch eine Latenz im Sekundenbereich, heutzutage werden immerhin die wichtigsten Fahrscheine direkt auf der Einstiegsseite zum Kauf angeboten. Wenn man Glück hatte und gerade vorher jemand anderes einen Fahrschein gekauft hat, muss man also nur den Fahrschein auswählen und darf dann gleich bezahlen.

Aber:

Das Bezahlen dauert immer noch so lange, dass ich mich dem Automaten inzwischen stets mit dem ausdrücklichen Bewusstsein nähere, den gleich kommenden Zug nicht erreichen zu können. Nur so kann ich Frustration vermeiden und gelegentlich komme ich sogar in die Situation, einen Fahrschein erworben zu haben, bevor ein Zug ein- und ausgefahren ist, was mir dann kleine Glücksgefühle beschert.

Das Prüfen meines Geldes benötigt in diesem Wunderwerk der Technik nämlich tatsächlich Sekunden. Wenn ich mehrere Fahrscheine kaufe, dann wird zwischen dem Ausdruck der Fahrscheine jeweils mehrere Sekunden gewartet, und das Auszahlen des Restgeldes muss der Automat sich auch sekundenlang überlegen. Ich weiss, dass das nicht sein muss. Ich weiss, das das schneller gehen kann, wenn man die Zeit des Fahrgastes wertschätzt und nichts unversucht lässt, den eigentlich unnötigen Vorgang des Fahrscheinerwerbs so schnell wie möglich durchzuführen.

Ganz bizarr wird es, wenn man mit seiner EC-Karte bezahlt. Das man bei uns immer eine PIN eingeben muss, und nicht, wie in vielen anderen Ländern der Welt, einfach seine Karte durchzieht und damit in einer oder zwei Sekunden bezahlt hat, das prangere ich in diesem Zusammenhang noch nicht mal an. Dass ich aber, wenn ich einen Kurzstrecken-Fahrschein für 1,30 Euro gekauft habe, noch warten muss, bis mir ein komplett sinnloser und mit total überflüssigen Fakten bedruckter "EC-Beleg" produziert wurde, das kann ich einfach überhaupt nicht begreifen. Auf dem Zettel steht, ich soll ihn aufbewahren, aber daran denke ich doch gar nicht. Wie kommen die eigentlich dazu, mir dieses Papierstück aufzudrängen? Nur aus Höflichkeit gegenüber meinen Mitreisenden lasse ich den Müll nicht direkt im Automaten liegen, sondern schmeisse ihn in den nächsten Mülleimer, aber das ist einfach der Gipfel des Schwachsinns. Auf dem Fahrschein ist genug Platz, um das Zahlungsmittel zu dokumentieren, wenn das tatsächlich von der Buchhaltung so verlangt wird.

Um bei dieser Kurzstrecke zu bleiben: Den Fahrschein hab ich nun also, der Zug steht bereit und ich will ihn nehmen, aber jetzt muss ich das Ding auch noch entwerten? Auf dem Bahnsteig? Wie bescheuert ist das denn eigentlich? Der Vorverkauf ist ja wohl der Sonderfall, und der normale ist, dass man hier seinen Fahrschein für den sofortigen Fahrtantritt kauft. Wenn dann, wie am U-Bahnhof Rosenthaler Platz, nur ein Entwerter für zwei Automaten zur Verfügung steht, dann ist der Zug eben doch wieder weg, bevor man den Entwerter gefunden und das kostbare Dokument validiert hat. Ich hab's ja so geplant und deswegen ist es auch keine grosse Enttäuschung, aber

Wer auch immer bei der BVG für diesen Kack verantwortlich ist: Sie sind ein Amateur. Sie gehören nicht auf diese Stelle. Machen Sie Platz für jemanden, der seine Aufgabe ernst nimmt und für den das Wichtigste nicht das sorgfältige Zählen des Fahrgeldes, sondern die bequeme und zügige Beförderung der Fahrgäste ist.

Wednesday, February 8, 2012

February 08, 2012

TRACEs of awesomeness

Debugging is hard, and I admit it freely: I am using print statements regularly to debug code that I'm writing, in particular when not writing Lisp code. With Lisp and its interactive development environment, though, my debugging activity often involves calling functions from the repl and tracing individual functions.

The standard TRACE facility that Common Lisp offers often replaces adding debug print statements for me. In order for tracing to help, the code needs to be structured so that a function is called at the place that is interesting in the particular debug situation. In many situations, one is not really interested in all invocations of a function, but only under some conditions. When debugging functions that are called often or that have large argument lists, simple traces can easily grow huge and analyzing trace logs can become a pain. Also, the trace output can slow down processing enough to make the simple approach fail.

Where TRACE's full power lives

The CLHS entry on TRACE suggests that there may be implementation dependent arguments, and in fact most implementations provide additional options that make the TRACE facility much more powerful. In this post, I'm going to show you a few examples of advanced TRACE options in three implementations to show what one may expect to be available. The examples and implementations chosen are pretty arbitrary and are not indicative of what the particular tracing facilities offer. Please look up the details in the respective documentation.

Breaking with CCL

Clozure CL's TRACE supports :BREAK-BEFORE to enter the debugger before a function is called:

Welcome to Clozure Common Lisp Version 1.7-r15160M  (DarwinX8664)!
? (defun foo ())
FOO
? (trace (foo :break-before t))
NIL
? (foo)
0> Calling (FOO) 
> Break: FOO trace entry: NIL
> While executing: (CCL::TRACED FOO), in process listener(1).
> Type :GO to continue, :POP to abort, :R for a list of available restarts.
> If continued: Return from BREAK.
> Type :? for other options.
1 > :go
hello
<0 FOO returned NIL
NIL

Notice how the debugger is entered before FOO is called.

Statically selective tracing with Allegro CL

Sometimes, one only wants to trace a function when it is called from a certain other function. Allegro Common Lisp's tracer offers the :INSIDE option:

International Allegro CL Enterprise Edition
8.2 [64-bit Mac OS X (Intel)] (Dec 21, 2011 13:59)
Copyright (C) 1985-2010, Franz Inc., Oakland, CA, USA.  All Rights Reserved.
[...]
cl-user(1): (defun foo (arg) (format t "foo called, arg ~A~%" arg))
foo
cl-user(2): (defun bar () (foo "from bar"))
bar
cl-user(3): (defun qux () (foo "from qux"))
qux
cl-user(4): (defun main () (bar) (qux))
main
cl-user(5): (trace (foo :inside qux))
(foo)
cl-user(6): (main)
foo called, arg from bar
 0[2]: (foo "from qux")
foo called, arg from qux
 0[2]: returned nil
nil

Note how the function FOO is called with no tracing when called from BAR, but with tracing from QUX.

Dynamically selective tracing with SBCL

Sometimes, one wants to trace a function based on some dynamic condition. SBCL's tracing facility offers this, as well as programmatic access to the argument list of the function being called:

This is SBCL 1.0.55.0-abb03f9, an implementation of ANSI Common Lisp.
[...]
* (defun foo (arg) (format t "foo called, arg ~A~%" arg))
FOO
* (defun bar () (foo 1) (foo 2))
BAR
* (trace foo :condition (= (sb-debug:arg 0) 2))
(FOO)
* (bar)
foo called, arg 1
  0: (FOO 2)
foo called, arg 2
  0: FOO returned NIL
NIL

Note how tracing is only enabled when FOO is called with a specific argument value.

Wrapping up

The purpose of this post was to show you some of the non-standard tracing options that Common Lisp implementations offer. As you can see, they are powerful debugging tools that can be used with no source code changes required. As these options are not standardized, make sure that you check out the documentation of your implementation. Happy debugging!

netzhansa