Friday, August 10, 2012

Using lparallel to utilize multiple cores [updated]

A while ago, I discovered the lparallel library as a nice set of abstractions to make parallel programming in Common Lisp easier and more pleasant. The use case that I presented in my previous blog post was more about dealing with asynchronicity rather than parallelism, though. As I have seen people ask for lparallel examples, I will present how I used it to actually speed up a long running computation by having multiple cores work on the problem at the same time.

The problem that I have been working on involved reading and parsing a few large data files to extract some data from them. The extracted data needed to be stored in one hash table for later processing. The reading and parsing process is CPU bound, so reading the data files in parallel will speed up the overall process. "Speed up" here means "reduce the run time because I'm waiting for completion", not "make it as fast as possible" or "have it utilize all cores". I'm trying to reduce my own wait times, and I cannot spend time debugging locks. Thus, the high abstraction level of lparallel fits me just right

For the task at hand, I created a set of workers which read the input files and send the extracted data items to a collector process by the way of a queue. The collector process puts them into a hash table for later processing. Once all files are read, the hash table is returned.

(defun load-files-in-parallel (pathnames)
  (let ((queue (lparallel.queue:make-queue 10000)))
      (lparallel:pmap nil
                      (lambda (pathname)
                         (lambda (key data)
                           (lparallel.queue:push-queue (list key data)
                        (lparallel.queue:push-queue :done queue))
    (let ((hash-table (make-hash-table))
          (remaining-worker-count (length pathnames)))
      (loop while (plusp remaining-worker-count)
            for item = (lparallel.queue:pop-queue queue)
            if (eq item :done)
              do (decf remaining-worker-count)
            else do (destructuring-bind (key data) item
                       (setf (gethash key hash-table) data)))

Some points are worth noting:

  • Instead of using a queue to send the data to a separate worker for hash table insertion, the hash table could be locked or a lock-free hash table could be used. This may or may not be faster.
  • The scheduling of the workers is done by lparallel and may or may not be efficient. I've looked at CPU utilization and found the process to use 3-6 cores, so it seems that the scheduling is not optimal. I'm guessing that the queue processing and hash table insertion eat more time than I'd hope for, but I have not made a real assessment.
  • I've picked the size of the queue randomly. It may be worthwhile to make the queue larger so that file readers are not blocked while hash table rehashing occurs.

Implementing this parallel file loader took me just a few minutes and gave my program considerable speedup. Using memory is already easy, and with lparallel, using multiple cores also becomes much easier than it has been for me.

[Update] James M. Lawrence, the author of lparallel, kindly pointed out that the original version of my example code contained an unneeded forced future to read the data. I have updated the code according to his remark.


1 comment:

  1. There's so many concurrency libraries for CL nowadays! Have you used any of the others mentioned on Cliki? If so, how do they compare?