Total job timeout


Dustin

May 13, 2008, 2:12:45 AM
to beanstalk-talk

Has any thought been given to allowing a total job timeout? Something
like ttr, but a ttl? I'd like to add jobs to a queue that would
automatically get cleaned up if nobody's paying attention to them.

I suppose it's possible to build a client that wanders around the
queue and cleans stuff up, but it'd be great to just have junk drop
out after a while.

Even a global job ttl might be good. Something I could set to a day
or so and just have jobs drop out due to age.
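The "client that wanders around the queue and cleans stuff up" can be sketched against the existing protocol. The decision logic might look like this in Python; the `age` and `state` fields match beanstalkd's real stats-job output, but the helper itself and the sample data are hypothetical:

```python
# Sketch of the cleanup-client idea: decide whether a job has outlived
# a ttl based on its stats-job data. Field names ("age", "state") match
# beanstalkd's stats-job response; the helper is illustrative only.

def should_expire(stats, ttl_seconds):
    """Return True if a job is old enough to be deleted.

    Only jobs still waiting are candidates; a reserved job is
    being worked on and should be left alone.
    """
    if stats["state"] == "reserved":
        return False
    return stats["age"] >= ttl_seconds

# Example stats as stats-job would report them (YAML parsed to a dict):
old_job = {"id": 42, "state": "ready", "age": 90000}   # ~25 hours old
new_job = {"id": 43, "state": "ready", "age": 120}
busy_job = {"id": 44, "state": "reserved", "age": 90000}

one_day = 24 * 60 * 60
print(should_expire(old_job, one_day))   # True
print(should_expire(new_job, one_day))   # False
print(should_expire(busy_job, one_day))  # False
```

A real cleaner would walk jobs via peek-ready/peek-delayed/peek-buried, run this check, and delete the expired ones.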

Keith Rarick

May 13, 2008, 6:33:24 PM
to beansta...@googlegroups.com
On Mon, May 12, 2008 at 11:12 PM, Dustin <dsal...@gmail.com> wrote:
> Is there any thought into allowing a total job timeout?

How many people want this?

I'm reluctant to add a feature for a task that can be accomplished
with existing features.

kr

Erich

May 13, 2008, 7:22:19 PM
to beanstalk-talk
I like the idea of a reserve timeout. I like better, though, the idea
of a "cancel-reserve" command. The idea is that while a connection is
mid-reserve, it sends the appropriate string, and that cancels the
reserve for you. This would allow timeouts to be built into clients
without too much hassle. It would also allow for better connection
usage. I'm thinking of a case where a client program is both producer
and consumer. If it is trying to reserve a job, and also comes up with
a job to put on the queue, this would become possible.

The cancel-reserve concept would require a bit of work on the internal
handling of connections in beanstalkd. However, the same mechanism
needed for it could make the following behavior better defined (as of
now it sort of works, which I discovered by accident the other day,
but it should be made either to work in a well-defined way, or not to
work at all):

The worker does the following (the beanstalkd queue is empty):

    reserve
    use bar
    put foo ....

Some time later, a producer puts a job on the queue.

The worker then gets the job. After that, the worker associates to
tube bar, and puts a job.

I think that beanstalkd should guard against such a backlog of
commands. Reserve should be truly blocking, and any command/data sent
on a connection mid-reserve should be a well-defined error state.

Anyway, I think the above makes sense, but I'm kind of out of it
today, so if it needs clarifying, I'll be glad to do so tomorrow after
enough sleep :)

Regards,
Erich


Dustin

May 13, 2008, 7:38:26 PM
to beanstalk-talk

Perhaps a recipe would be good. I'm imagining a couple of processes
that intend to meet up in a tube somewhere, but one process stands the
other up and the job gets wedged.

I can't really tell the difference between ``old job that needs to
get deleted'' and ``new job that would get processed if you didn't
block it.'' I can reserve/release, but it doesn't seem hard to
contrive a situation where a job could get overlooked.


Erich

May 13, 2008, 8:15:56 PM
to beanstalk-talk
Hrm, looks like I totally misread this thread. The stuff I said
previously is still on my mind, but for this discussion:

+1 for the total job timeout. It's not really important to me, but I
kinda like it.

Keith Rarick

May 14, 2008, 2:27:37 AM
to beansta...@googlegroups.com
On Tue, May 13, 2008 at 4:22 PM, Erich <soph...@gmail.com> wrote:
> I like the idea of a reserve timeout. I like better tho, the idea of
> a "cancel-reserve" command.

That is a really neat idea. I could implement it without too much trouble.

I'm wary, though, because I think it would be hard to explain. There
would have to be a lot of conditions and caveats and it would require
an undue amount of verbiage in protocol.txt. (See also
http://www.python.org/dev/peps/pep-0020/ number 17.)

It would also be a special case in the command pipelining behavior,
which is otherwise consistent. (See also number 8 in that list
mentioned above.)

> ...


> (as of now it sort of works, which i discovered by accident
> the other day, but it should be made to either work in a
> well defined way, or not work at all):
>

> [snip example]
> ...

It's well-defined now, although the protocol doc could be more
explicit on this point. This is command pipelining -- it just looks
weird in this case because the reserve command takes so long to run.
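The pipelined byte stream in Erich's example can be sketched: the client simply writes several complete commands into the socket, and the server runs them in order once the reserve completes. The framing below follows protocol.txt (commands are CRLF-terminated; put carries a data body followed by its own CRLF); the helper function names are hypothetical:

```python
# Frame beanstalkd commands per protocol.txt. The wire format is real;
# the helper names are illustrative.

def frame(command):
    """Frame a simple command (reserve, use <tube>, etc.)."""
    return command.encode() + b"\r\n"

def frame_put(body, pri=1024, delay=0, ttr=60):
    """Frame a put command: header line, then data body, then CRLF."""
    data = body.encode()
    head = f"put {pri} {delay} {ttr} {len(data)}\r\n".encode()
    return head + data + b"\r\n"

# Erich's sequence as one pipelined write: everything after "reserve"
# sits in the server's input buffer until the reserve completes, then
# executes in order.
stream = frame("reserve") + frame("use bar") + frame_put("foo")
print(stream)
```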

kr

Keith Rarick

May 14, 2008, 2:46:02 AM
to beansta...@googlegroups.com
On Tue, May 13, 2008 at 4:38 PM, Dustin <dsal...@gmail.com> wrote:
> ...

> I can't really tell the difference between ``old job that needs to
> get deleted'' and ``new job that would get processed if you didn't
> block it.'' I can reserve/release, but it doesn't seem hard to
> contrive a situation where a job could get overlooked.

This still seems like a somewhat special-purpose scenario. For
instance, at causes.com we hardly ever want to delete old jobs; even
though they are old we want to make sure they run.

I like that beanstalkd is pretty dumb. The fancy, smart features
belong in the clients.

That having been said I am not opposed to a ttl feature. It's simple
enough to design and explain. I doubt I'd be able to get to it for a
while. I would happily accept a patch for this feature if you need it
sooner. :)

kr

Erich

May 14, 2008, 11:10:27 AM
to beanstalk-talk
Keith,

I'm cool with this, although I'm tempted to rebut with the next item
on the list:
"Although practicality beats purity."

In either case, I think the behaviour needs to be explicitly defined,
as I've been ignoring it because I was unsure whether it is expected
and correct, or just an implementation artifact.

Regards,
Erich


Andrew Betts

Dec 24, 2011, 6:02:15 AM
to beansta...@googlegroups.com
Can I suggest a specific use case where this would be helpful?  I'm designing a tool right now where users submit a search, this spawns a number of concurrent backend processes which assemble output, and then as each piece of output is ready, it is sent to the client to update the results page.  So my architecture looks something like:

1. Request received from client, given a unique ID, and a copy for each required output is put on the 'queued requests' tube
2. Workers take those requests, run the necessary process, and put the output on the 'search_12345' tube, where 12345 is the unique ID of the search
3. The client has meanwhile made a long-poll request to the search which sits waiting for stuff to appear on the search_12345 tube, and when it appears, takes it and outputs it to the client

This plan works well because although the various parts of the output may be ready in an unpredictable order, the client only needs one long-poll connection (at a time) to get the output from any of them.  The problem arises when a client sends the initial search result, but doesn't follow up with the long-poll request to get the deferred outputs.  In that case, the outputs are still produced, but they sit in the search-specific tube forever.

If they were able to time out, the system would be self-cleaning.  But as it stands, I don't think there's an easy solution, because I don't think it's possible to watch a tube without knowing its name.  My current solution is a cron job that does list-tubes, makes a list of all those that match the pattern 'search_XXXXX', and watches all of them, so it can inspect all the jobs in those tubes, deleting any that are more than 10 mins old and releasing any that are still within their timeout.
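The tube-discovery step of that cron job can be sketched. list-tubes really does return a YAML sequence; the sample response, the deliberately naive line-based parse, and the helper name are all illustrative:

```python
import re

# list-tubes returns a YAML sequence like the sample below. A naive
# parse is enough to pick out the per-search tubes the cleanup job
# should inspect. Sample data and helper name are hypothetical.

SAMPLE_LIST_TUBES = """---
- default
- queued_requests
- search_12345
- search_67890
"""

def search_tubes(yaml_text, pattern=r"^search_\d+$"):
    """Extract tube names from a list-tubes response, keeping only
    those that match the per-search naming pattern."""
    tubes = [line[2:].strip() for line in yaml_text.splitlines()
             if line.startswith("- ")]
    return [t for t in tubes if re.match(pattern, t)]

print(search_tubes(SAMPLE_LIST_TUBES))  # ['search_12345', 'search_67890']
```

The cron job would then watch each matching tube, peek at its jobs, and delete or release them by age.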

A TTL on the job would make this use case far easier, I would think.

Andrew

Keith Rarick

Jan 2, 2012, 10:25:32 PM
to beansta...@googlegroups.com
On Sat, Dec 24, 2011 at 3:02 AM, Andrew Betts <andrew...@gmail.com> wrote:
> The problem
> arises when a client sends the initial search result, but doesn't follow up
> with the long-poll request to get the deferred outputs.  In that case, the
> outputs are still produced, but they sit in the search-specific tube
> forever.

I agree this is a problem. It will be solved by mailbox tubes:
https://github.com/kr/beanstalkd/issues/3

This solution lets us avoid introducing another timeout.
If nothing is listening on the mailbox tube, the job will
be deleted immediately.

kr

Andrew Betts

Jan 25, 2012, 4:16:10 AM
to beansta...@googlegroups.com
Hi Keith,

I think mailbox tubes are a good idea (not sure about the name though - it seems that in a mailbox, messages exist until they are collected, which is exactly the opposite of what a mailbox tube is in your definition, but maybe this is a common term I'm just not aware of).  

However, I think one of the risks with mailbox tubes in my use case is that mailbox tubes assume things happen in a particular order - the consumer must be connected to the tube before the publisher puts anything on it (did I understand that correctly?).  The whole point of message queues is to decouple things, and so typically this kind of order of operation is not guaranteed.  Indeed, in my use case, the webpage submits its search, receives the ID in response, and then initiates the long-poll for results.  There is a window there when the first result might be available but the long poll has not yet connected to the tube to receive it.

There's also a second problem if there is more than one message to be sent to the mailbox tube - as there is in my use case.  The consumer sees the first one and handles it, delivering the response back to the browser, and then a second long poll is made to gather more results.  In the interim, the result queue is briefly unwatched.

Finally, there are other use cases that mailbox tubes don't solve, like dev servers - we have a dev server that I'd just discovered has 1.5 million jobs queued on it, because no-one is running the workers.  A job timeout would solve this use case as well as the search results one.

Hope this is of some help.

Keith Rarick

Jan 26, 2012, 9:43:41 PM
to beansta...@googlegroups.com
This is one of those things that's probably better implemented
by a client, as you've already done. Significant new features
in beanstalkd should generally be added only if they enable
things that simply aren't possible otherwise (and even then I
try to set a high bar on the benefit/cost ratio).

kr
