Saturday, September 14, 2013

Waiting for multiple queues in Python

It is impossible to correctly wait for multiple queues in Python and only consume the first element of any of them.

This came as a big surprise to me. Both because it seems like an operation that should be supported, but also because this never had been a problem in practice.

While reading up on the programming language Go, I enjoyed the primitives they provide for concurrency. The three basic concepts are the go operator which calls a function in a new thread, the typed channel which can be used to transfer values between threads, and an extension of the switch statement which can be used to wait for values on channels.

To play around a bit with the concepts, I wanted to port these to Python. Implementing go using Python’s threading.Thread and its target argument is trivial. A channel is nothing but good old queue.Queue. But the extended switch statement stumped me.

What I’d need would be a function get_first(*channels) which returns a tuple (channel, obj) representing the channel which had the first contents, and the object read from it. Furthermore, no other channels should have an object removed from them.

Short of busy-polling each queue in turn, there is no way of implementing this.

I found some discussion on how to do this for multiprocessing.Queue which utilizes the fact that it is implemented using file descriptors, where we can use select, but queue.Queue uses thread.Lock as the underlying mechanism to ensure thread safety, and there is no way of waiting for more than one lock in a given thread.

A workaround would be to start one polling thread per queue to wait on, and have each of them write the (queue, obj) tuple to a common queue, and the main thread then simply waits on that queue. This would solve the first part of the problem, but it would also mean that the original queues can get objects read from them even when another queue already produced an object.

The funny thing is, as I mentioned, that this never was a problem in practice for me. It seems that when designing message-passing systems, the idea that a thread has one and exactly one “inbox” for messages is quite natural and works well. I’d love to see some concrete examples of Go applications using this feature, and what design decisions encourage this.