named jobs

Ilya Grigorik

Jun 21, 2008, 6:55:15 PM
to beanstalk-talk
Idea / feature request: named jobs.

Instead of simply 'pushing' a job onto the queue, it would be nice to
be able to assign an external id to a job, by which the job could be
retrieved (md5?), rescheduled, or canceled. Specifically, I have a use
case where the job is scheduled to fire several days in the future,
but often during that 'quiet' period some other event occurs which
requires either rescheduling the original job or canceling it
altogether.

How far-fetched do you find this scenario, or how hard would it be to
extend Beanstalk to support something like this? This would
essentially turn the 'job queue' into a hash, where jobs could be
canceled, rescheduled, and so on.
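To make the request concrete, here is a minimal in-memory sketch of a delayed queue whose jobs can be looked up by a caller-chosen name, then rescheduled or canceled. `NamedDelayQueue` and `job_name` are hypothetical names for illustration, not part of Beanstalk or any client library; the MD5 naming follows the "(md5?)" suggestion above, and `run_at` is an abstract timestamp.

```python
import hashlib
import heapq
import itertools


def job_name(key: str) -> str:
    """Derive a stable external name from a caller-chosen key (MD5 hex digest)."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


class NamedDelayQueue:
    """Hypothetical delayed queue whose jobs can be found by name (not a Beanstalk API)."""

    def __init__(self):
        self._heap = []                # entries: (run_at, seq, name)
        self._jobs = {}                # name -> (run_at, seq, body) for live jobs
        self._seq = itertools.count()  # tie-breaker; also marks which heap entry is current

    def put(self, name, body, run_at):
        seq = next(self._seq)
        self._jobs[name] = (run_at, seq, body)
        heapq.heappush(self._heap, (run_at, seq, name))

    def reschedule(self, name, run_at):
        # Raises KeyError if no live job has this name.
        _, _, body = self._jobs[name]
        self.put(name, body, run_at)   # the older heap entry becomes stale

    def cancel(self, name):
        self._jobs.pop(name, None)     # stale heap entries are skipped in pop_ready

    def pop_ready(self, now):
        """Remove and return (name, body) for every live job due at or before `now`."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, seq, name = heapq.heappop(self._heap)
            job = self._jobs.get(name)
            if job is not None and job[1] == seq:  # ignore canceled/rescheduled entries
                del self._jobs[name]
                ready.append((name, job[2]))
        return ready
```

Rescheduling and canceling use lazy deletion: the old heap entry is left in place and simply ignored when it surfaces, which keeps both operations cheap without searching the heap.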

ig

Dustin

Jun 21, 2008, 8:48:01 PM
to beanstalk-talk

On Jun 21, 6:55 pm, Ilya Grigorik <i...@aiderss.com> wrote:

> How far-fetched do you find this scenario, or how hard would it be to
> extend Beanstalk to support something like this? This would
> essentially turn the 'job queue' into a hash, where jobs could be
> canceled, rescheduled, and so on.

It already has the job id. Why don't you just store the job id
somewhere else?

Ilya Grigorik

Jun 22, 2008, 5:58:36 AM
to beanstalk-talk
>
>   It already has the job id.  Why don't you just store the job id
> somewhere else?

Theoretically, I could do that. But in a distributed system with many
workers (which both consume and create jobs), this would mean yet another
database to synchronize on.

Is it possible to query beanstalk by job id and change parameters on an
existing job? (Perhaps I'm using an out-of-date Ruby client?)

ig

Keith Rarick

Jun 24, 2008, 12:32:41 AM
to beansta...@googlegroups.com
On Sun, Jun 22, 2008 at 2:58 AM, Ilya Grigorik <il...@aiderss.com> wrote:
>> It already has the job id. Why don't you just store the job id
>> somewhere else?
>
> Theoretically, I could do that. But in a distributed system with many
> workers (which both consume and create jobs), this would mean yet another
> database to synchronize on.

It sounds like you are asking for a name that can be derived from the
job in some way, so it doesn't need to be stored. That's an
interesting property, but I don't know how widely useful it would be.

You could try to design the worker to recognize situations where the
job doesn't need to run. Then you wouldn't need to delete the job. You
could just schedule a second job if necessary.
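One way to read the suggestion above: give each job a generation number, and to "reschedule", enqueue a fresh job with a higher generation instead of touching the old one. When the old job eventually fires, the worker sees that it has been superseded and finishes without doing anything. This is a sketch under assumptions: the JSON body format and the `current_generation` mapping are hypothetical, and that mapping must live wherever the application already keeps per-task state; it is not a Beanstalk feature.

```python
import json
from typing import Callable, Dict


def make_job(task: str, generation: int) -> str:
    """Encode a job body that carries its own generation number (hypothetical format)."""
    return json.dumps({"task": task, "generation": generation})


def handle(body: str, current_generation: Dict[str, int], run: Callable[[str], None]) -> bool:
    """Run the job only if it is the newest one scheduled for its task.

    Returns True if the job ran, False if it was recognized as obsolete
    (the caller can then delete it as if it had completed).
    """
    job = json.loads(body)
    if job["generation"] < current_generation.get(job["task"], 0):
        return False  # superseded by a later job for the same task
    run(job["task"])
    return True
```

The trade-off is that canceled jobs still occupy the queue until they fire, but no job ever has to be located and deleted.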

Other possible beanstalk features that would help here are a search
function and content-addressability. But again, I don't know if they
would be a good idea in the broader context of beanstalk use.

Do others have opinions here?

> Is it possible to query beanstalk by job id and change parameters on an
> existing job? (Perhaps I'm using an out-of-date Ruby client?)

You can do queue.peek_job(id) and then manipulate the resulting job,
but the available operations are very limited. I'm interested in
removing some of those limitations; what operations are you interested
in? From your description it sounds like you want at least to delete
or change the delay time.
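Without a dedicated command, "change the delay time" can be approximated with the existing protocol: peek the job to read its body, delete it, then put a copy with the new delay. The sketch below only builds the raw command bytes, assuming the text-protocol forms `delete <id>\r\n` and `put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n`; the helper names are hypothetical, and the body is assumed to have come from an earlier `peek <id>`. Which job states `delete` accepts has varied across beanstalkd versions, so check the protocol document for the server in use.

```python
def delete_command(job_id: int) -> bytes:
    """Raw 'delete <id>' command line."""
    return b"delete %d\r\n" % job_id


def put_command(body: bytes, pri: int, delay: int, ttr: int) -> bytes:
    """Raw 'put <pri> <delay> <ttr> <bytes>' command followed by the job body."""
    return b"put %d %d %d %d\r\n" % (pri, delay, ttr, len(body)) + body + b"\r\n"


def should_reput(delete_response: bytes) -> bool:
    """Only re-put the job if the server confirmed the delete."""
    return delete_response == b"DELETED\r\n"
```

Sending the delete first and waiting for `DELETED` matters because the sequence is not atomic: if another worker reserves the job in between, the delete fails and re-putting would duplicate the work. Note also that the re-put job receives a new id.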

kr

Ilya Grigorik

Jun 29, 2008, 6:55:44 PM
to beanstalk-talk
> It sounds like you are asking for a name that can be derived from the
> job in some way, so it doesn't need to be stored. That's an
> interesting property, but I don't know how widely useful it would be.
>
> You could try to design the worker to recognize situations where the
> job doesn't need to run. Then you wouldn't need to delete the job. You
> could just schedule a second job if necessary.

Let me give you a scenario that should illustrate the benefit of
having 'named' jobs:

There is a pool of workers (processes / machines / threads, it doesn't
matter), and each one talks to a central beanstalk server to get
the next job. Once a job is reserved and processed by a worker, the
worker itself creates at least one new job: 'repeat this task in x
minutes', or 'do these x tasks in y minutes'. Hence, there is no
central scheduler. An outside process may create new jobs on the
queue, but for all intents and purposes, once a job is on the queue,
it pretty much stays there, because each time it is consumed, the
worker reschedules it.

More concretely: a scheduler process creates a job to spider a
website. The worker does the job and, at the end of the task,
reschedules it to be repeated at some point in the future.
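The self-rescheduling loop described above can be sketched with a local heap standing in for the beanstalk tube: each worker takes the earliest due job, does the work, and puts the same job back with a delay, so no central scheduler is involved. Everything here is a hypothetical simulation (`crawl`, `RECRAWL_INTERVAL`, and integer "minutes" as time); a real worker would use its client's reserve/delete/put calls instead.

```python
import heapq
from typing import List, Tuple

RECRAWL_INTERVAL = 60  # hypothetical delay before a site is crawled again, in minutes


def crawl(url: str) -> None:
    """Placeholder for the actual spidering work (hypothetical)."""


def work_once(queue: List[Tuple[int, str]], now: int) -> str:
    """Take the earliest due job, process it, and reschedule it.

    Returns the URL that was processed, or "" if nothing was due.
    """
    if not queue:
        return ""
    run_at, url = heapq.heappop(queue)
    if run_at > now:  # nothing is due yet: put the job back untouched
        heapq.heappush(queue, (run_at, url))
        return ""
    crawl(url)
    heapq.heappush(queue, (now + RECRAWL_INTERVAL, url))  # "repeat this task in x minutes"
    return url
```

Because every job is put straight back, the set of jobs is effectively permanent; this is why producers need some way to find an existing job rather than add a duplicate.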

So why named jobs? Because there are many 'job producers' (spiders in
this case), and it would be nice to have a way to create and/or query
for the same job, e.g. if the website is already scheduled, do
nothing, or delete that job, etc. Likewise, an outside producer
should be able to reach into the work queue and reschedule
or kill the job (due to some external circumstance).

Hope that makes sense. :)

> Other possible beanstalk features that would help here are a search
> function and content-addressability. But again, I don't know if they
> would be a good idea in the broader context of beanstalk use.

What do you have in mind for 'content-addressability'?

>
> You can do queue.peek_job(id) and then manipulate the resulting job,
> but the available operations are very limited. I'm interested in
> removing some of those limitations; what operations are you interested
> in? From your description it sounds like you want at least to delete
> or change the delay time.

Perfect, that's exactly what I'm looking for, except that I'd have to
know the beanstalk-generated ID for each job. If the ID were a
user-specified MD5 string, that would be ideal. ;)

ig

Keith Rarick

Jun 29, 2008, 9:27:39 PM
to beansta...@googlegroups.com
On Sun, Jun 29, 2008 at 3:55 PM, Ilya Grigorik <il...@aiderss.com> wrote:
> Perfect, that's exactly what I'm looking for, except that I'd have to
> know the beanstalk-generated ID for each job. If the ID were a
> user-specified MD5 string, that would be ideal. ;)

That's almost what I meant by content-addressability, except that the
hash would be server-generated. Then the client wouldn't have to worry
about picking a unique name.

Upon further reflection, though, I think it's better if the client
picks the name. Those clients that don't care about the name can just
ignore this feature; those that do care will probably have little
difficulty selecting a naming scheme. Also, a server-generated hash
for every job would make life difficult if the clients want to submit
several jobs with the same body.
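The last point is easy to see concretely: with a server-generated content hash, two jobs with identical bodies end up with the same name, so the second could not be submitted, while client-chosen names keep them distinct. MD5 is used only because it was mentioned earlier in the thread, and the client naming scheme below is a hypothetical example.

```python
import hashlib


def content_name(body: bytes) -> str:
    """Server-generated, content-addressed name: identical bodies get identical names."""
    return hashlib.md5(body).hexdigest()


bodies = [b"crawl http://example.com/", b"crawl http://example.com/"]
content_names = {content_name(b) for b in bodies}                # collapses to one name
client_names = {"example.com/daily", "example.com/retry-1"}      # hypothetical client scheme
```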

I'll put this on my list, but it'll be a while before I can get to it.

If anyone else is interested in implementing it, I'd be happy to
accept a patch. My current protocol design plan is to add a single
command:

put-named <name> <pri> <delay> <ttr> <bytes>\r\n
<data>\r\n

which is the same as the usual put command, but also gives the job a
name. If the name is taken, this would respond:

NAME_TAKEN_BY <id>\r\n

where <id> is the already-existing job's id.

If a job has been given a name, the name can be used in place of its
id to refer to the job anywhere in the protocol.

Also, the job name (if any) should be added to the job stats response.
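The proposed semantics can be modeled in a few lines to see what the server would have to track: a name-to-id index checked before insertion (so concurrent producers cannot create duplicates), the `NAME_TAKEN_BY <id>` reply, and name-or-id resolution for other commands. `put-named` was only proposed here; `NamedJobRegistry` is a hypothetical in-memory model, not beanstalkd code, and it omits priority, delay, and TTR.

```python
import itertools
from typing import Dict, Optional, Tuple


class NamedJobRegistry:
    """Hypothetical model of the bookkeeping behind the proposed put-named command."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._by_name: Dict[str, int] = {}
        self._jobs: Dict[int, Tuple[str, bytes]] = {}  # id -> (name, body)

    def put_named(self, name: str, body: bytes) -> str:
        existing = self._by_name.get(name)
        if existing is not None:
            return "NAME_TAKEN_BY %d\r\n" % existing
        job_id = next(self._ids)
        self._by_name[name] = job_id
        self._jobs[job_id] = (name, body)
        return "INSERTED %d\r\n" % job_id

    def resolve(self, ref: str) -> Optional[int]:
        """Accept either a numeric job id or a job name."""
        if ref.isdigit():
            job_id = int(ref)
            return job_id if job_id in self._jobs else None
        return self._by_name.get(ref)

    def delete(self, ref: str) -> str:
        job_id = self.resolve(ref)
        if job_id is None:
            return "NOT_FOUND\r\n"
        name, _ = self._jobs.pop(job_id)
        self._by_name.pop(name, None)  # the name becomes free for reuse
        return "DELETED\r\n"
```

One design question this surfaces: if names may be used "in place of its id anywhere in the protocol", an all-digit name is ambiguous with an id, so a real implementation would need a rule such as forbidding numeric names.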

kr

Ilya Grigorik

Jun 30, 2008, 9:33:37 AM
to beanstalk-talk

> I'll put this on my list, but it'll be a while before I can get to it.
>
> If anyone else is interested in implementing it, I'd be happy to
> accept a patch. My current protocol design plan is to add a single
> command:
>
> put-named <name> <pri> <delay> <ttr> <bytes>\r\n
> <data>\r\n
>
> which is the same as the usual put command, but also gives the job a
> name. If the name is taken, this would respond:
>
> NAME_TAKEN_BY <id>\r\n
>
> where <id> is the already-existing job's id.
>
> If a job has been given a name, the name can be used in place of its
> id to refer to the job anywhere in the protocol.
>
> Also, the job name (if any) should be added to the job stats response.

That sounds great, looking forward to it!

ig