Paraprogramming Dispatches

CL-JPEG has been updated

Eugene Zaikonnikov

A long overdue update to the Common Lisp JPEG Library has now been merged into sharplispers/cl-jpeg.

The summary of changes:

  • The various global state tables and counters were moved into special variables of the decoder/encoder functions. This should address the thread safety concerns.
  • The monolithic source file was broken up into several files, in line with the modern way of structuring projects.
  • Metadata for :author and :description was added to the project’s .asd.
  • The contributors are now listed. Do let me know if I forgot to add someone.

Revisiting my own 16-year-old code was an interesting experience. Thanks to everyone who kept the project alive over the years.

Wait channels as debugging aid for threading bugs

Eugene Zaikonnikov

In every systems programmer’s career there inevitably comes a moment when they encounter a non-obvious, asymptomatic, poorly reproducible bug in critical software running at a remote site without remote debug instrumentation.

Well, there was a symptom. The device would put up a diagnostic message amounting to “oops” and stop responding to stimuli. The log file would stop growing while the generating thread was still around. The process would not react to SIGQUIT (denying me a core dump). In other words, a very good hint of a deadlock.

To top it off, the issue would appear only on a few devices out of a hundred, and only after running for at least a few weeks.

In cases like this, however, there’s still a great tool at our disposal: the wchan (wait channel) entry in procfs. Per the manpage:

 (35) wchan  %lu
      This is the "channel" in which the process is waiting.  It is the address
      of a location in the kernel where the process is sleeping.  The corresponding
      symbolic name can be found in /proc/[pid]/wchan.

Each process also has associated tasks, in this case threads, found in /proc/[pid]/task/, which is again a list of PIDs, each with its own wchan entry. I start by skimming through the application threads on an affected system, looking for anything out of the ordinary.
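That skim can be scripted. The following is a minimal Python sketch, not the tooling used on the device: the function name thread_wchans and the proc_root parameter are made up for illustration, and it assumes a Linux procfs where each /proc/[pid]/task/[tid]/wchan holds a symbol name (or 0 when the thread is not sleeping in the kernel).

```python
import os


def thread_wchans(pid, proc_root="/proc"):
    """Map each thread ID of `pid` to the contents of its wchan file.

    `proc_root` is a hypothetical parameter that exists only so the
    function can be pointed at a fake directory tree for testing.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    result = {}
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "wchan")) as f:
                # The kernel writes the symbol without a trailing newline;
                # strip() is only there to be safe.
                result[int(tid)] = f.read().strip()
        except OSError:
            # The thread may have exited between listdir() and open().
            continue
    return result
```

Printing the result of thread_wchans(306) for the affected process and for the same application on a healthy device, side by side, makes the odd threads stand out.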

The application is structured around a central SYSV message queue, where a state machine dispatches the messages and notifies the subscribed functional modules. Some of the modules are separate threads, but a number of simpler tasks are executed directly in the dispatcher thread.

Looking at one of the threads reveals it is blocked in do_msgsnd(), the kernel call for sending a SysV IPC message:

 # cat /proc/306/task/311/wchan 

Now, sending a message can block in one case only: the queue is full. Again, we can check that via procfs.

 # cat /proc/sysvipc/msg
       key      msqid perms      cbytes       qnum lspid lrpid 
    151273          0     0       16384        128   306   306

Yep, qnum is at 128 (with cbytes at 16384), the limit on the system, so the queue is full. The only place that consumes the queue is the dispatcher thread, and the most likely way for that part to lock up is for the notification loop to block on some call.
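The same check can be automated when many devices have to be inspected. This is a rough Python sketch under stated assumptions: the function names are invented for illustration, the column names are taken from the header line of /proc/sysvipc/msg as shown above, and the byte limit must be supplied by the caller, since the per-queue limit is not part of that file (16384 is the usual Linux default for kernel.msgmnb and matches the cbytes value above).

```python
def parse_sysvipc_msg(text):
    """Parse the contents of /proc/sysvipc/msg into a list of dicts.

    Column names come from the file's own header line; only the
    columns used here are converted to integers.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, records = rows[0], rows[1:]
    queues = []
    for fields in records:
        queue = dict(zip(header, fields))
        for name in ("msqid", "cbytes", "qnum", "lspid", "lrpid"):
            if name in queue:
                queue[name] = int(queue[name])
        queues.append(queue)
    return queues


def full_queues(queues, qbytes_limit):
    """Return queues whose byte count has reached `qbytes_limit`.

    `qbytes_limit` is an assumption supplied by the caller, for
    instance the value read from /proc/sys/kernel/msgmnb.
    """
    return [q for q in queues if q["cbytes"] >= qbytes_limit]
```

Feeding it the output shown above with a limit of 16384 flags queue 0, last written to and last read by PID 306.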

After checking the wchan value of the dispatcher thread against the one on a healthy system, the likely offender is found:

 # cat /proc/306/task/317/wchan 

This is the kernel call handling the opening of a Unix socket. The only place where communication over AF_UNIX happens is the code forwarding system status over to snmpd. Upon examination it turns out (cough) the socket is being opened and closed for every message session. Normally this shouldn’t be a problem, but the system runs a pretty ancient kernel, affected by certain bugs like CVE-2009-3621, and this is just not good practice in general.

The conclusion is, there is a way out of the most desperate situations. If you think you are stuck, remember it’s nothing compared to what the Apollo 13 crew had to go through.

The simplest bitmap format ever

Eugene Zaikonnikov

So here’s one fine, terse, human-readable (!) image format you can roll quickly in a pinch. It is also understood by nearly every web browser and image viewer out there. It’s Netpbm, the underdog of the image processing world.

There are three subformats, for bilevel (black and white), grayscale and RGB images. Each comes in a human-readable ASCII version and a binary version. Here’s a sample for ASCII grayscale borrowed from the pgm manpage:

       P2
       # feep.pgm
       24 7
       15
       0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
       0  3  3  3  3  0  0  7  7  7  7  0  0 11 11 11 11  0  0 15 15 15 15  0
       0  3  0  0  0  0  0  7  0  0  0  0  0 11  0  0  0  0  0 15  0  0 15  0
       0  3  3  3  0  0  0  7  7  7  0  0  0 11 11 11  0  0  0 15 15 15 15  0
       0  3  0  0  0  0  0  7  0  0  0  0  0 11  0  0  0  0  0 15  0  0  0  0
       0  3  0  0  0  0  0  7  7  7  7  0  0 11 11 11 11  0  0 15  0  0  0  0
       0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

The header consists of the format magic (P2 is ASCII grayscale), arbitrary comment lines starting with ‘#’, the image dimensions (width, then height) and the maximum grayscale value. The latter is a rather nice property as it allows for flexible grayscales. The header is followed by the ASCII image value payload, formatted according to the header specs.
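Reading the format back is about as simple as the description suggests. Below is a minimal Python sketch of a P2 reader, written for illustration only: the function name read_pgm_ascii is hypothetical, and it assumes the whole file is available as a string and handles nothing but the ASCII grayscale variant.

```python
def read_pgm_ascii(text):
    """Parse an ASCII (P2) PGM image held in a string.

    Returns (width, height, maxval, rows), where rows is a list of
    `height` lists holding `width` integers each.
    """
    tokens = []
    for line in text.splitlines():
        # Everything from '#' to the end of the line is a comment.
        tokens.extend(line.split("#", 1)[0].split())
    if not tokens or tokens[0] != "P2":
        raise ValueError("not an ASCII PGM (P2) file")
    width, height, maxval = (int(t) for t in tokens[1:4])
    values = [int(t) for t in tokens[4:4 + width * height]]
    if len(values) != width * height:
        raise ValueError("pixel data is shorter than the header promises")
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows
```

Since the payload is just whitespace-separated numbers, the reader does not care where the line breaks fall; only the dimensions in the header decide how the values are cut into rows.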

Here’s a short Common Lisp code example.

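(defun image-to-pgm (image filename)
  "Write the 2D array IMAGE (rows x columns, values 0..255) to FILENAME as ASCII PGM."
  (with-open-file (f filename :direction :output
                              ;; :supersede avoids stale bytes left over from a longer old file
                              :if-exists :supersede :if-does-not-exist :create)
    (let ((y (first (array-dimensions image)))    ; height: number of rows
          (x (second (array-dimensions image))))  ; width: number of columns
      ;; Header: magic, width, height, maximum gray value
      (format f "P2~%~D ~D~%255" x y)
      (loop for i from 0 below y do
        (terpri f)
        (loop for j from 0 below x do
          ;; Pixels are written inverted: 0 in the array comes out as 255 (white)
          (format f "~D " (- 255 (aref image i j))))))))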

Now the impracticality of it in everyday use is clear. It is wasteful with bytes (although it should compress just as well as its binary counterpart). The spec also limits the ASCII version to 70 characters per line, although some parsers will accept longer ones. However, it is a great little format when you need to visualize smaller things but don’t feel like using an image format library.
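Staying within the 70-character limit takes only a few extra lines. The following Python sketch is one way to do it, not part of any library: the name pgm_ascii and its parameters are invented for illustration, and it assumes the image is a list of equally long rows of integers no larger than maxval.

```python
def pgm_ascii(rows, maxval=255, max_line=70):
    """Render a list of pixel rows as an ASCII (P2) PGM string.

    Payload lines are wrapped so that none exceeds `max_line`
    characters. Line breaks do not have to coincide with image rows,
    because readers split the payload on any whitespace.
    """
    height = len(rows)
    width = len(rows[0]) if rows else 0
    out = ["P2", f"{width} {height}", str(maxval)]
    line = ""
    for row in rows:
        for value in row:
            token = str(value)
            if line and len(line) + 1 + len(token) > max_line:
                # Adding this token would overflow: start a new line.
                out.append(line)
                line = token
            else:
                line = f"{line} {token}" if line else token
    if line:
        out.append(line)
    return "\n".join(out) + "\n"
```

Writing the returned string to a file with a .pgm extension should be enough for most viewers to open it.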
