General approach for multiple worker types

Jason Judge

Dec 5, 2012, 7:01:11 AM
to beansta...@googlegroups.com
Taking as a starting point a PHP framework, with multiple types of workers (doing different things) all within the framework, what is the general approach to structuring the architecture?

I'm assuming a separate pipe for each type of task will be useful (e.g. one for emails, one for image processing, one for handling file uploads). What about the workers? Does it make sense to have one worker entry point that listens on all the pipes and then calls up appropriate methods within the framework to handle data appropriate to the pipe it comes from? This is how cron is often handled on frameworks: one job gets called every minute, and that job decides through the framework what it is subsequently going to run. Multiple workers would be running, but they are all identical (although some pipes may have more workers listening than other pipes, just to keep the data flow up where parallel jobs make sense).
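As a rough sketch of this first approach, written in Python for illustration and assuming a hypothetical in-process setup rather than any real beanstalkd client library: a single worker entry point maps the tube a job was reserved from to a framework method. The tube names (`emails`, `images`) and the handler functions are made up.

```python
import json
from typing import Callable, Dict

# Hypothetical handlers standing in for framework methods; names are illustrative.
def send_email(payload: dict) -> str:
    return f"email sent to {payload['to']}"

def resize_image(payload: dict) -> str:
    return f"resized {payload['path']}"

# One table maps each tube (the "pipe" a job came from) to its handler.
HANDLERS_BY_TUBE: Dict[str, Callable[[dict], str]] = {
    "emails": send_email,
    "images": resize_image,
}

def run_job(tube: str, body: str) -> str:
    """Single worker entry point: choose the handler from the tube name."""
    handler = HANDLERS_BY_TUBE.get(tube)
    if handler is None:
        raise ValueError(f"no handler registered for tube {tube!r}")
    return handler(json.loads(body))

print(run_job("emails", '{"to": "a@example.com"}'))  # email sent to a@example.com
```

Note the coupling this creates: the job body alone does not say what to do, so the worker has to know which tube the job came from.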

Or:

Are jobs created on a more ad-hoc basis, each with their own entry point, each with their own process name in supervisord so they can be managed separately?

I can see pros and cons in both methods: the first is easier to manage, but has a higher development overhead; the latter makes it easy to throw new workers into the mix, but could easily get out of hand if they grow in number and all end up with their own unique ways of logging, error handling, etc.

How does it work in general? I realise both approaches are probably "correct", but just wondering what people [like to] do and what generally works?

-- Jason

Keith Rarick

Dec 5, 2012, 7:59:30 PM
to beansta...@googlegroups.com
On Wed, Dec 5, 2012 at 4:01 AM, Jason Judge <jason....@gmail.com> wrote:
> I'm assuming a separate pipe for each type of task will be useful

All else being equal, it's better to put all jobs through a single tube.
Each job should be a complete description of the work to be done,
so that a worker can pick it up and run it without, for example,
knowing which tube it came from.

If you have two workers with different capabilities (say, they're
written in different programming languages with different codebases
and process disjoint subsets of jobs), that would be a good reason
to have a tube for each type of worker.

Jason Judge

Dec 5, 2012, 8:58:57 PM
to beansta...@googlegroups.com
That surprises me. Maybe this is an edge case, but I could imagine that some jobs need to be processed in order, by a single worker that handles one job at a time. Other jobs can be handled in parallel with a big bunch of workers waiting to process jobs as quickly as they arrive. This seems like two separate tubes to me - one with a single worker and one with a load of workers standing by.

That doesn't contradict what you are suggesting: that the data alone can tell a worker enough about what it needs to do with that data. The tubes can then just be set up to define which jobs are handled in parallel and which are handled serially.

Back to the original question: if everything goes through one pipe, then your approach would be to have just one type of worker for all jobs, and that worker would inspect the data and pass it on to whatever framework method it needs to process it. This is the way I am leaning at the moment: keeping the OS configuration simple and building any complexity into the application.

A standard wrapper or envelope for jobs pushed onto a tube would need to be defined for the application, so the worker has an "address label" for passing the job data on to the correct method or process.
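A minimal sketch of such an envelope, assuming JSON job bodies and a hypothetical handler registry (the field names `version`, `type` and `payload`, and the job type strings, are made up for illustration). The worker routes on the body alone, so it does not matter which tube the job was reserved from.

```python
import json
from typing import Any, Callable, Dict

# Registry of handlers keyed by the envelope's "type" field (the "address label").
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}

def handles(job_type: str):
    """Decorator that registers a function as the handler for one job type."""
    def register(fn):
        HANDLERS[job_type] = fn
        return fn
    return register

def make_job(job_type: str, payload: Dict[str, Any]) -> str:
    """Wrap a payload in the standard envelope before putting it on a tube."""
    return json.dumps({"version": 1, "type": job_type, "payload": payload})

def dispatch(body: str) -> Any:
    """Route a job using only its body; the tube it came from is irrelevant."""
    envelope = json.loads(body)
    if envelope.get("version") != 1:
        raise ValueError(f"unsupported envelope version: {envelope.get('version')!r}")
    handler = HANDLERS.get(envelope["type"])
    if handler is None:
        raise ValueError(f"unknown job type: {envelope['type']!r}")
    return handler(envelope["payload"])

# Hypothetical handlers standing in for framework methods.
@handles("email.send")
def send_email(payload):
    return f"email sent to {payload['to']}"

@handles("image.resize")
def resize_image(payload):
    return f"resized {payload['path']} to {payload['width']}px"

job = make_job("image.resize", {"path": "cat.jpg", "width": 200})
print(dispatch(job))  # resized cat.jpg to 200px
```

The version field is there so the envelope format can change later without older workers silently misreading newer jobs.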

-- Jason

Keith Rarick

Dec 5, 2012, 9:05:25 PM
to beansta...@googlegroups.com
On Wed, Dec 5, 2012 at 5:58 PM, Jason Judge <jason....@gmail.com> wrote:
> That surprises me. Maybe this is an edge case, but I could imagine that some
> jobs need to be processed in order, by a single worker that handles one job
> at a time. Other jobs can be handled in parallel with a big bunch of workers
> waiting to process jobs as quickly as they arrive. This seems like two
> separate tubes to me - one with a single worker and one with a load of
> workers standing by.

Yes, that would be an example of having different types of workers
with fundamentally different capabilities.

However, I'd advise you not to rely on having a single, serial worker
process for jobs that require mutual exclusion. It's more reliable to
coordinate things like that separately, through a database or locking
service, and to reify dependencies so they can be checked
explicitly. This also gives you the freedom to run however many
workers makes sense and possibly get more parallelism.

Jason Judge

Dec 6, 2012, 5:35:06 AM
to beansta...@googlegroups.com

Thanks, I'll consider that. I've not found a good way to create locks in PHP yet, but that's another adventure to explore.

The bottom line, then, is to get stuff through the pipes as quickly as possible, pulling them from the pipes with a single worker (if possible) and handle the back end processing and parallelism and locking through that worker.

-- Jason

Chad Kouse

Dec 6, 2012, 2:52:52 PM
to beansta...@googlegroups.com
Just curious, why is it better to put all jobs through a single tube?

-- 
Chad Kouse

--
You received this message because you are subscribed to the Google Groups "beanstalk-talk" group.
To post to this group, send email to beansta...@googlegroups.com.
To unsubscribe from this group, send email to beanstalk-tal...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beanstalk-talk?hl=en.

Keith Rarick

Dec 6, 2012, 6:18:53 PM
to beansta...@googlegroups.com
On Thu, Dec 6, 2012 at 11:52 AM, Chad Kouse <chad....@gmail.com> wrote:
> Just curious, why is it better to put all jobs through a single tube?

In queueing theory terms, a single queue with N servers is more
efficient than N queues with one server each. In beanstalkd terms
a "queue" is a tube and a "server" is a worker. Now, since a
beanstalkd worker can listen on multiple tubes, having two tubes
is just as good as one tube if all workers are listening on both tubes.
The thing to avoid (unless of course there's a good reason for it) is
some workers on just one tube and other workers on a different tube.

I have yet to find a good explanation online to convey intuition for
why this is so. It would make a good blog post.
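A worked example of this point, using textbook queueing formulas rather than anything beanstalkd-specific, and assuming Poisson arrivals and exponentially distributed job durations (the M/M/1 and M/M/c models). With 1.6 jobs per second arriving in total and two workers that each finish one job per second on average, both setups keep the workers 80% busy, yet the pooled setup cuts the average wait by more than half, because a worker never sits idle while the other tube has a backlog.

```python
from math import factorial

def mm1_wait(lam: float, mu: float) -> float:
    """Mean time a job waits before service in an M/M/1 queue."""
    rho = lam / mu
    assert rho < 1, "queue is unstable"
    return rho / (mu - lam)

def mmc_wait(lam: float, mu: float, c: int) -> float:
    """Mean time a job waits before service in an M/M/c queue (Erlang C)."""
    a = lam / mu  # offered load
    assert a < c, "queue is unstable"
    top = a**c / factorial(c) * c / (c - a)
    bottom = sum(a**k / factorial(k) for k in range(c)) + top
    p_wait = top / bottom  # probability an arriving job has to wait at all
    return p_wait / (c * mu - lam)

lam, mu = 1.6, 1.0  # total arrival rate; service rate of one worker

# Two tubes, one worker on each: each tube sees half the traffic.
split = mm1_wait(lam / 2, mu)
# One tube with two workers (or two tubes that both workers watch).
pooled = mmc_wait(lam, mu, 2)

print(f"split: {split:.2f}s  pooled: {pooled:.2f}s")  # split: 4.00s  pooled: 1.78s
```

In beanstalkd, the pooled case is also what you get with several tubes as long as every worker watches all of them.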

Jason Judge

Dec 6, 2012, 7:00:48 PM
to beansta...@googlegroups.com

One thing I love about beanstalkd is its flexibility. For every approach that offers an efficiency advantage, there will be real-life exceptions where you want to organise the tubes and workers in a different way, and beanstalkd allows for so many different ways of configuring the workers and tubes to achieve different results.

This is what led to my original question: what do people *actually* do, and what advantages and disadvantages have they found in those approaches. There are always little giants whose shoulders I aim to stand on ;-)