Dec 2, 2012, 1:41:18 AM12/2/12
As promised, I have a different sort of puzzle for you this month:a
small caricature of the sort of application which could benefit from
parallelization and which can be easily parallelized using celluloid on
ruby 1.9...with a catch.
By "small" I mean "small in comparison to a real application," but it is
unfortunately significantly larger than my typical hangman puzzle (and
lacks a clever-twist "one right answer"), so I'm sending it out a few
days ahead of time to give people a chance to tackle it.
Imagine a simple shipping scenario involving three main classes: Boxes
of some finite size containing Items going to Locations. There are
typically many Items being sent to each Location, and there's a good
business reason for minimizing the number of Boxes sent by putting
multiple Items in each Box, so our Location objects hold onto Boxes
until either they are too full or the end of the run, at which point we
send them to their destination (a Location).
Thus at any given time we will have some number (typically 0..3 or so)
Boxes queued up for each Location.
When an Item destined for some Location comes in, it is put in the first
of these Box that has room for it, or a new Box, if there aren't any.
If the Box becomes full when the Item is added to it, it is shipped.
To make it make sense to parallelize, the basic steps (queuing a Box for
a given Location, packing an Item in a Box, and shipping a Box to a
Location, etc.) are assumed to take some non-trivial amount of time,
simulated with a sleep. In a real application, this would be an
externally mandated delay, such as waiting for a robot arm to move
things around, or for a web site to respond to a request, etc.
This is pretty straight forward to code as a single threaded process,
and a small amount of wrapping can make "the same code" run either
vanilla ruby, celluloid with no async calls, and celluloid with async.
For test data I'm using a thousand words from /usr/share/dict/words to
seed my Items. The size of the Item is based on the length of the string
an and the Location it is to be shipped to is determined by the second
character (thus "Hippopotamus" would be sent to location "I"). The Box
capacity is set to 140 characters (Tweet tweet).
To "deliver" the boxes I just print them out to STDOUT like so:
ageing, agglomerating, agglutinating, aggrandizing, aggravating,
aggregating, aggrieving, aging, agitating, agog, agonizing,
Thus we can monitor that all our words are getting delivered by simply
piping STDOUT through wc.
Running vanilla with a 1 millisecond delay works fine:
> time ruby ship.rb X 0.001 | wc
lag = 0.001s
91 1000 10843
It took just over 2 seconds of wall clock time to deliver 1000 Items in
91 boxes (10843, the character count, doesn't have a cute
Running with celluloid also works, but it's a lot slower:
> time ruby ship.rb C 0.001 | wc
using celluloid, lag = 0.001s
I, [2012-11-30T11:33:35.709578 #14502] INFO -- : Terminating
I, [2012-11-30T11:33:35.846040 #14502] INFO -- : Shutdown
91 1000 10843
Going async gains us back a little time, but we only get 998 of our
thousand words out (a bug which hints at the crux of this puzzle).
> time ruby ship.rb A 0.001 | wc
using celluloid, asychronous, lag = 0.001s
I, [2012-11-30T11:35:16.836952 #15647] INFO -- : Terminating
I, [2012-11-30T11:35:16.971569 #15647] INFO -- : Shutdown
90 998 10817
Looking at just the speed difference for now, a reasonable hypothesis is
that making async calls only makes sense if the operation we are putting
off to the other thread takes a significant amount of time compared to
the overhead of thread management, etc. So let's try making our slow
operations take 20 milliseconds on average, instead of just one:
> time ruby ship.rb X 0.02 | wc
lag = 0.02s
91 1000 10843
The vanilla ruby version is about 6x slower than the first time, meaning
the lag was taking up about 25% (5/19) of the time at 1 millisecond, and
87% (20/23) at 20 milliseconds, so we should expect to see an actual
throughput gain going async. And we do:
> time ruby ship.rb A 0.02 | wc
using celluloid, asychronous, lag = 0.02s
I, [2012-11-30T11:37:42.270422 #16804] INFO -- : Terminating
I, [2012-11-30T11:37:42.366634 #16804] INFO -- : Shutdown
118 2236 24016
The async version is only slightly slower than it had been, and
significantly faster than the single threaded version, as expected. A
clear win for asynchronicity.
What we also see is that we're no longer dropping items. In fact, a
full 2236 of our 1000 words are being delivered.
There are a few small changes that could be made that would drastically
improve the throughput advantage of celluloid, and it is instructive to
try to find them.
But the real challenge here is to fix the code so that it works as
reliably async/threaded as it does vanilla, whilst keeping a good faith
preservation of the problem definition (e.g., no introducing global
locks to make it effectively single threaded, etc.)
Each Item should be sent exactly once--none should be dropped, and none
should be sent repeatedly.
I look forward to hearing your solutions at the meeting!