Having separate queues for each stage makes it easy to scale resources for whichever stage needs them. If a stage's pending queue length is consistently nonzero, that stage needs more workers.
For the asynchronous client notification, a callback can work nicely if you have low enough concurrency. Otherwise, waiters can periodically poll a completion queue.
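A minimal in-process sketch of the completion-queue idea (Python stdlib only; the stage names and handlers here are made up, and a real deployment would spread the queues across machines with something like gearmand):

    import queue
    import threading

    # Hypothetical per-stage pending queues plus one shared completion queue.
    pending = {"resize": queue.Queue(), "recognize": queue.Queue()}
    completed = queue.Queue()

    def worker(stage, handler):
        # Each stage scales independently: start more of these threads
        # whenever the stage's pending queue stays nonzero.
        while True:
            job_id, payload = pending[stage].get()
            completed.put((job_id, stage, handler(payload)))

    threading.Thread(target=worker, args=("resize", lambda p: p),
                     daemon=True).start()

    def poll_once(timeout=0.5):
        # Waiters poll this instead of holding a connection open per job.
        try:
            return completed.get(timeout=timeout)
        except queue.Empty:
            return None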
I'd make a single master job which does two things: run the sizer job
synchronously, then the face recognition job synchronously. Your web
server task only ever waits on the master job.
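Roughly like this, assuming the python-gearman 2.x API with a gearmand on localhost:4730; the task names 'sizer', 'face_recognition', and 'master' are hypothetical:

    import gearman

    GEARMAND = ['localhost:4730']  # assumed server address
    client = gearman.GearmanClient(GEARMAND)

    def master(worker, job):
        # Run each sub-job synchronously, in order, feeding the
        # sizer's output to face recognition.
        sized = client.submit_job('sizer', job.data).result
        return client.submit_job('face_recognition', sized).result

    gm_worker = gearman.GearmanWorker(GEARMAND)
    gm_worker.register_task('master', master)
    gm_worker.work()  # blocks, serving 'master' jobs

    # The web server then waits on just the one job:
    #   gearman.GearmanClient(GEARMAND).submit_job('master', image_bytes).result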
What I'm working on is having the web server page auto-reload every
few seconds to probe the job, updating some eye candy to make the
user think work is really happening. Once it detects the job is done
(and all sub-jobs implicitly), it can move to the next step of the UI.
Minimal load on the web server this way (i.e., you don't tie up a
whole connection waiting for work to be done).
> What I'm working on is having the web server page auto-reload every
> few seconds to probe the job, updating some eye candy to make the
> user think work is really happening. Once it detects the job is done
> (and all sub-jobs implicitly), it can move to the next step of the UI.
> Minimal load on the web server this way (i.e., you don't tie up a
> whole connection waiting for work to be done).
jQuery/AJAX is your friend here - it's trivially easy to do the polling in the background and avoid reloading the whole page.
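For the server side of that poll, a rough sketch (again assuming python-gearman and the hypothetical 'master' task; the browser would just hit job_status() on a timer with something like jQuery's $.getJSON):

    import json
    import gearman

    client = gearman.GearmanClient(['localhost:4730'])
    jobs = {}  # token -> GearmanJobRequest, kept in the submitting process

    def start_job(image_bytes):
        # Submit without waiting, and hand the browser a token to poll with.
        request = client.submit_job('master', image_bytes,
                                    wait_until_complete=False)
        token = request.gearman_job.unique
        jobs[token] = request
        return token

    def job_status(token):
        # Returns a tiny JSON blob instead of tying up a connection.
        request = client.get_job_status(jobs[token])
        return json.dumps({'complete': request.complete})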
We do something similar on http://Curate.Us - when you first ask us to clip a web page, that job is async and assigned to a pool of clipping servers. When you resize or reposition an existing clip, we use gearman to assign it to a pool of render servers and run it synchronously. For example, all the non-browser image manipulation on this clip of the gearman home page is done using the sync render servers: http://s.tt/11sMG+
We also use gearman for logging - all the log events are sent asynchronously to a memcache-backed gearman server, with workers doing a bunch of de-normalization on the result, so we don't keep the web servers hanging around to write complex logs and update stats. This also lets us treat the log workers as long-running batch jobs which can maintain local state for things like persistent DB connections, in-memory copies of the browscap db, etc.
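The pattern looks roughly like this (a sketch assuming python-gearman; the 'log_event' task name and the sqlite schema are stand-ins for their memcache-backed setup):

    import gearman
    import sqlite3

    # Web tier: fire-and-forget, so the request handler never waits.
    client = gearman.GearmanClient(['localhost:4730'])
    client.submit_job('log_event', '{"event": "page_view"}', background=True)

    # Worker tier: a long-running process that keeps state across jobs,
    # here a persistent DB connection.
    db = sqlite3.connect('stats.db')
    db.execute('CREATE TABLE IF NOT EXISTS events (payload TEXT)')

    def log_event(worker, job):
        # De-normalize / update stats using the long-lived connection.
        db.execute('INSERT INTO events (payload) VALUES (?)', (job.data,))
        db.commit()
        return ''

    gm_worker = gearman.GearmanWorker(['localhost:4730'])
    gm_worker.register_task('log_event', log_event)
    gm_worker.work()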