Saturday, August 11, 2012

lparallel and how I was thinking too complicated

Yesterday, I posted a short article that showed how I used lparallel to speed up a lengthy computation by reading input data files in parallel. A few readers responded, and the response from James M. Lawrence, who is the author of lparallel, is reproduced here because it contains source code that can't be properly formatted in blogspot comments (any suggestions regarding something better?).

[citing James M. Lawrence]
Here is a map-reduce-style version:
(defun slurp-alist (hash-table alist)
    :for (key . data) :in alist
    :do (setf (gethash key hash-table) data))

(defun load-files-in-parallel (pathnames)
  (flet ((parse (pathname)
           (let ((result nil))
              (lambda (key data)
                (push (cons key data) result)))
    (reduce #'slurp-alist
            (lparallel:pmap 'vector #'parse pathnames)
            :initial-value (make-hash-table))))

The benefit here is that worker threads do not have to contend for a queue lock. A possible drawback is that the key-data pairs survive until the final reduce, whereas the pairs have a shorter lifetime when passed through a queue. Of course performance will depend upon the scale and shape of the data.

Benchmarks with early versions of lparallel showed better performance with a vector-based queue, however since then I found a simple cons-based queue to be generally faster, which is currently the default. There is still a compile-time option to get the vector-based queue, and unless that is enabled the number passed to make-queue is ignored. In other words, the 10000 is a lie :)

Motivated by James' reply, I spent some time to find out how the process could be further optimized. The idea to have the workers cons up the data and then collect all data into the hash table at the end of the operation reduced the run time for my application by some 30%, which is not bad.

A comment from Max Mikhanosha on G+ suggested that I could avoid the queue and directly use a thread safe hash table. It is a little embarrassing that I assumed that I could not write hash tables from multiple threads. Both CCL's and SBCL's hash tables provide for thread safe modes, so there is no point in tunneling the data through a queue.

As I am using CCL, I tried different values for the :SHARED keyword argument to MAKE-HASH-TABLE. The default is :LOCK-FREE, which makes hash tables thread safe without requiring the grabbing of a lock for each operation. Other permitted values are T, which locks the table for each access, or NIL for thread-unsafe hash tables.

I found that when writing to the hash table directly from the file readers, :SHARED T would give me the best performance. With that option, my code ran another 10% faster than with consing up the file results and then processing all the lists at the end.

lparallel is a great example how Common Lisp is an extensible language. The operators provided by it seamlessly integrate parallel programming into the language, and with the help of my readers, the code that I originally posted now became very small:

(defun load-files-in-parallel (pathnames)
  (let ((hash-table (make-hash-table :shared t)))
    (lparallel:pmap nil
                    (lambda (pathname)
                      (read-and-parse-file pathname
                                           (lambda (key data)
                                             (setf (gethash key hash-table)

Look, Ma! All I needed was pmap!


1 comment:

  1. The new lparallel-1.7.0 removes the complexity mentioned above for (future (pmap ...)). It's now optimized without need of user intervention.