Deadline soon explained


Erich

May 11, 2008, 12:14:00 PM
to beanstalk-talk
Hi all,

It seems to me that a pretty big source of confusion in beanstalk is
the DEADLINE_SOON error returned by the beanstalkd server. This
posting is an attempt to explain it better, with examples.

First of all, a quick set of terminology definitions, for common
communication:

job - the basic "unit of work" for beanstalk. Conceptually, a job
consists of two parts: data and metadata. The data (job payload) is
what clients operate on. This can take various application-specific
forms, ranging from a set of commands to be batched (like web page
updates) to data to be processed (e.g. images). The metadata portion
consists of various parameters that are relevant to beanstalk itself.
These include job id, delay, time to run, etc. For the purposes of
this discussion, the two important parts of the metadata are the job
id, for identifying unique jobs, and the time to run (see below).

ttr - time-to-run, the maximum time a job may take to finish. The
timer starts when a job is sent to a remote worker. The timer is
cancelled when a release or delete command is received by the server.
Note the timer lives in beanstalkd, and therefore can only be
cancelled by the server successfully receiving a command. Further, the
command must come over the connection that reserved the job. (A short
sketch at the end of these definitions makes this concrete.)

beanstalk - for the purposes of this post, I will use the term
beanstalk when discussing the protocol itself and the overall system
using the protocol. The system includes various actors, defined below.

beanstalkd, alternatively the server - this is the program that serves
jobs from various queues. It's what you download from Keith's page.

client - a protocol implementation; this is what converses with the
server. For example, there exist a Ruby client and a Python client.

worker - this is a program which uses a client to get jobs and run
them (the consumer pattern).
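
To make ttr concrete before moving on, here is the sketch promised
above: a minimal example using the Ruby client. It assumes a beanstalkd
on the default port; the payload, priority, and 10-second ttr are just
illustrative values.

require 'rubygems'
require 'beanstalk-client'

beanstalk = Beanstalk::Pool.new('localhost:11300')

# put(body, priority, delay, ttr) -- ttr is part of the job's metadata,
# fixed when the job is put on the queue.
beanstalk.put('some unit of work', 65536, 0, 10)

job = beanstalk.reserve   # the 10-second ttr timer starts now, inside beanstalkd
job.delete                # the server receiving this cancels the timer;
                          # a release would cancel it too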

A note about beanstalkd's workflow: Internally, the server keeps track
of jobs, including jobs that it has given to workers. Once a worker
has been given a job, that job is considered "in the possession" of
the worker. There are 3 things that can happen to a job at this point:
1. The worker finishes the job and issues a delete command. This
means that the job (the 'unit of work') is complete, and beanstalkd
removes it from all of its internal data structures.
2. The worker cannot complete the job and issues a release command.
The release command causes the server to requeue the job, and it will
be given to another worker.
3. The job's ttr is exceeded. Internal to the server, this is very
similar to a release command. However, it will cause errors to be
raised should the worker decide to issue commands related to the job.
All of the above events are tied to a connection inside beanstalkd. A
connection is a TCP/IP connection. If a single worker has 2
connections, beanstalkd will see them as 2 separate and unrelated
connections. Once a job is reserved, it is tied to a connection.
(There is no concept of a worker or a client internally.)
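
As a quick sketch of outcomes 1 and 2, using the same Ruby client pool
as in the sketch above (process_job and its failure mode are
hypothetical placeholders):

job = beanstalk.reserve    # the job is now "in the possession" of this connection
begin
  process_job(job.body)    # hypothetical application work
  job.delete               # outcome 1: the unit of work is complete
rescue StandardError
  job.release              # outcome 2: requeue the job for another worker
end
# Outcome 3 needs no code: if neither command arrives over this
# connection before the ttr expires, beanstalkd requeues the job itself.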

Ok, now that we have some common terminology, let's talk about
DEADLINE_SOON (abbreviated DS for the rest of this post). Prior to
DS, there was only one way for a client to return from a reserve
command: receiving a job from beanstalkd. Essentially, this command
was blocking for the process or thread that was running the
connection.

This caused problems. If a worker could handle multiple jobs
simultaneously, the potential for concurrency errors was high unless
there was one connection per concurrent job. This is because of the
blocking nature of reserve. If the number of jobs on the queue dropped
to 0, reserve would happily block forever. If there was a job
associated with a connection that was blocking on reserve, it would
just time out (exceed its ttr). Since this was the case, multiple
connections were necessary, wasting resources for both the clients and
the server.

So DS was implemented. Its purpose is to allow multiple jobs to be
associated with one connection, without concern for the blocking
problem mentioned above. If a worker has a job, and the ttr of that
job is going to be exceeded soon, the reserve command becomes
"non-blocking". That is to say, reserve will return without a job. Now
the connection is freed up, so that jobs can be released or deleted as
necessary. Without the blocking problem, the major concurrency error
has been removed from the connection. This of course frees up many
resources for clients, and more notably, the server.

To explain the above a bit better, here is a (sort of) pseudo-code
worker that would have experienced blocking problems without DS, but
not with DS:

// Assume everything in this is done in a threadsafe manner; I'm
// trying to keep it to the minimum necessary to demonstrate the point.
function do_work(data, statusptr):
    (the point of this program)
    statusptr->finished = True

function mainloop:
    statustracker = new StatusTracker()
    while true:
        // Without DS this would block here and ttr might be exceeded.
        JobOrError = beanstalkconnection.reserve()
        // isJob will return false if reserve() returned a DEADLINE_SOON.
        if isJob(JobOrError):
            send_to_thread(JobOrError, statustracker.newJob(JobOrError.jobid))
        endif

        // Delete jobs that the worker threads have marked finished.
        foreach jobid in statustracker.finishedJobs():
            beanstalkconnection.delete(jobid)
        endfor


Note that in the above, ttr may be exceeded, but it will not happen
due to resource starvation from the server.
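
For reference, in the Ruby client a DEADLINE_SOON response surfaces as
a raised exception rather than a special return value, so the same loop
looks roughly like this (a sketch only; send_to_thread and
statustracker are the same hypothetical helpers as in the pseudo-code,
and I'm assuming the exception message names the server response):

loop do
  begin
    job = beanstalk.reserve   # raises when beanstalkd answers DEADLINE_SOON
    send_to_thread(job, statustracker.new_job(job))
  rescue RuntimeError => e
    raise unless e.message =~ /DEADLINE_SOON/
    # reserve returned without a job; fall through to the cleanup below
  end

  # delete jobs that the worker threads have marked finished
  statustracker.finished_jobs.each { |j| j.delete }
end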

HTH
Erich

PS: If there are parts that are unclear or whatnot, let me know; I'm
hoping to put this on a wiki or something. More documentation is
always better :)

igrigorik

May 12, 2008, 7:21:45 AM
to beanstalk-talk
Erich, thanks for the detailed explanation. A few questions coming
out of it:

Let's assume you have a 'Manager' process, whose role is to talk to
beanstalk and reserve/release jobs when required. As well, there are
several workers hidden behind the Manager. The data flow is as
follows: the Manager takes a job off the queue; the Manager passes the
job to an idle worker; the worker begins processing; the Manager waits
for the next worker to finish, in order to take the next job off.

Now, given the scenario you've described, the Manager will only have
one connection open to beanstalk, correct?

Question: if a Manager has 3 workers, all doing computations on
separate jobs from beanstalkd, and one of the workers gets close to
its ttr, then the next time the Manager tries to reserve a job
(possibly for another, non-slow worker), beanstalkd will return a
DEADLINE_SOON error. Is that correct?

I can see why it's implemented the way it is, but how do you recommend
working around the scenario described above? One long-running worker
close to a deadline essentially forces all other workers to starve for
work...

ig

Erich

May 12, 2008, 9:45:37 AM
to beanstalk-talk
Good question. This is something I definitely should have been clearer
about. The case you describe will not starve workers; here is why:

beanstalkd will only return a DEADLINE_SOON when there is a job about
to expire AND there are no jobs to send to the requesting connection.
That is, DS prevents blocking, so if there is a job on the queue, it
will be sent immediately.
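
If you want to see this with your own eyes, here is a rough sketch
that talks the protocol over a raw socket (it assumes a local
beanstalkd on the default port and an otherwise empty queue):

require 'socket'

c = TCPSocket.new('localhost', 11300)

def put(sock, body, ttr)
  sock.write("put 0 0 #{ttr} #{body.size}\r\n#{body}\r\n")
  sock.gets                 # "INSERTED <id>"
end

put(c, 'one', 3)            # a short ttr, so its deadline arrives quickly
put(c, 'two', 60)

c.write("reserve\r\n")
puts c.gets                 # RESERVED <id> 3 -- job "one"; its ttr clock starts
c.gets                      # the job body

# Job "one" is heading toward its deadline, but job "two" is ready, so
# this reserve returns a job immediately instead of DEADLINE_SOON.
c.write("reserve\r\n")
puts c.gets                 # RESERVED <id> 3 -- job "two"
c.gets

# Now nothing is ready AND a reserved job is about to expire: only at
# this point does reserve come back with DEADLINE_SOON.
c.write("reserve\r\n")
puts c.gets                 # DEADLINE_SOON
c.close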

Regards,
Erich

Tomas

May 12, 2008, 9:52:13 AM
to beanstalk-talk
Yes, thanks for the detailed explanation.

One question though:

On May 11, 12:14 pm, Erich <sophac...@gmail.com> wrote:
> All of the above events are tied to a connection inside beanstalkd. A
> connection is a TCP/IP connection. If a single worker has 2
> connections, beanstalkd will see them as 2 separate and unrelated
> connections. Once a job is reserved, it is tied to a connection.
> (There is no concept of a worker or a client internally.)
>
If the TCP connection is dropped (say while a client has a job
reserved) but the client re-establishes the connection, does
beanstalkd consider it a brand new connection unrelated to the old
one? (even though it is the same client) What happens to the reserved
job? Is it automatically released when that connection is dropped?

Erich

May 12, 2008, 10:09:20 AM
to beanstalk-talk
Great question, which I don't know the answer to. I think Keith will
need to weigh in on this one. I'm pretty sure the connection is viewed
as a new, separate connection, and that the job is released, but I'm
not certain.

Regards,
Erich

igrigorik

May 12, 2008, 11:15:44 AM
to beanstalk-talk
Erich, a few more questions. Here is the scenario we're seeing:

1) Several workers are processing jobs; the beanstalk queue is empty.
2) One worker finishes and deletes its job; a second one is still
processing.
3) A worker makes a 'reserve' call, and a DEADLINE_SOON error is
returned (so as to not block the connection, and to allow the other
worker to release its job -- as far as I understand).
4) We catch the exception -- what's the best behavior from here?

Currently, after catching the exception we make the reserve call once
again, and it seems to go through. The client blocks on the reserve
call, but from this point on, there is no activity on the socket at
all - no jobs get sent to the workers. We've had a dozen workers block
for almost a day after encountering this edge case.

Any suggestions? It sounds like there may be a bug in this edge case.

ig

Erich

May 12, 2008, 11:38:56 AM
to beanstalk-talk
If I'm following your situation correctly:

1. After the DS is raised, I personally would have the code check for
finished jobs, and delete them.
2. This may be a bug in beanstalkd itself; perhaps Keith could comment
on that here. I'll try to come up with some code to reproduce the same
situation after work tonight.

A couple of questions for you:
1. The scenario you describe has an empty queue. I presume more jobs
get put on the queue at some point. Is this presumption correct? (Just
making sure I understand correctly.)
2. Do you have debug logging in place for your app, and if so, could
you please post a more complete record of events (like per-client/
connection tracking of commands)?
3. Which client is it you're using again?

Regards,
Erich

igrigorik

May 12, 2008, 2:01:35 PM
to beanstalk-talk
Erich, comments inline:

> 1. After the DS is raised, I personally would have the code check for
> finished jobs, and delete them.

That's not really a problem in our case. We're not losing track of
jobs; the problem appears to be with a scenario where you're trying to
process multiple jobs at once, coming from the same connection.

> 2. This may be a bug in beanstalkd itself; perhaps Keith could comment
> on that here. I'll try to come up with some code to reproduce the same
> situation after work tonight.
>
> A couple of questions for you:
> 1. The scenario you describe has an empty queue. I presume more jobs
> get put on the queue at some point. Is this presumption correct? (Just
> making sure I understand correctly.)

Yes.

> 2. Do you have debug logging in place for your app, and if so, could
> you please post a more complete record of events (like per-client/
> connection tracking of commands)?

Here is a quick excerpt from one of our workers:
[2008-05-11 17:18:04] INFO -- Discovery: Worker finished with status = 0
[2008-05-11 17:18:04] INFO -- Discovery: Waiting for job...
[2008-05-11 17:21:05] INFO -- Discovery: Waiting for worker 5117 to finish until Sun May 11 17:31:05 -0400 2008
[2008-05-11 17:31:05] WARN -- Discovery: Worker process timed out. PID: 5117
[2008-05-11 17:31:05] INFO -- Discovery: Waiting for job...
[2008-05-11 17:31:05] ERROR -- Discovery: DEADLINE_SOON
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:171:in `check_resp'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:183:in `read_job'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:89:in `reserve'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:213:in `send'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:213:in `method_missing'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:353:in `send'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:353:in `send_to_rand_conn'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:337:in `wrap'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:353:in `send_to_rand_conn'
/usr/lib/ruby/gems/1.8/gems/beanstalk-client-0.11.0/lib/beanstalk-client/connection.rb:262:in `reserve'
./overlord.rb:73:in `grunt_work'
./overlord.rb:56:in `run'
discovery.rb:21
[2008-05-11 17:31:25] INFO -- Discovery: Waiting for job... (calls reserve)

The problem is, after that it just hangs there forever. (As you can
tell, we're using the Ruby client.)

ig

Keith Rarick

May 13, 2008, 12:53:31 AM
to beansta...@googlegroups.com
On Mon, May 12, 2008 at 7:09 AM, Erich <soph...@gmail.com> wrote:
> On May 12, 8:52 am, Tomas <juuxs...@gmail.com> wrote:
>> Yes, thanks for the detailed explanation.
>>
>> One question though:
>>
>> If the TCP connection is dropped (say while a client has a job
>> reserved) but the client re-establishes the connection, does
>> beanstalkd consider it a brand new connection unrelated to the old
>> one? (even though it is the same client) What happens to the reserved
>> job? Is it automatically released when that connection is dropped?
>
> Great question, which I don't know the answer to. I think Keith will
> need to weigh in on this one. I'm pretty sure the connection is viewed
> as a new, separate connection, and that the job is released, but I'm
> not certain.

Yes. If a connection is closed, any reserved jobs will be released
immediately. If the worker opens another connection, it'll be fresh
and unrelated to the first one.
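
It's easy to watch this happen over raw sockets, if you're curious
(a sketch; it assumes a local beanstalkd with an empty queue):

require 'socket'

a = TCPSocket.new('localhost', 11300)
b = TCPSocket.new('localhost', 11300)

body = 'demo'
a.write("put 0 0 120 #{body.size}\r\n#{body}\r\n")
puts a.gets                # INSERTED <id>

a.write("reserve\r\n")
puts a.gets                # RESERVED <id> 4 -- reserved by connection a
a.gets                     # the job body

a.close                    # drop the reserving connection

# The job is released the moment connection a closes, so the unrelated
# connection b can reserve it right away.
b.write("reserve\r\n")
puts b.gets                # RESERVED <id> 4 -- the same job
b.close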

kr

Keith Rarick

May 13, 2008, 1:23:06 AM
to beansta...@googlegroups.com
On Mon, May 12, 2008 at 11:01 AM, igrigorik <igri...@gmail.com> wrote:
>
>> 2. Do you have debug logging in place for your app, and if so, could
>> you please post a more complete record of events (like per-client/
>> connection tracking of commands)?
>
> Here is a quick excerpt from one of our workers:

I suspect a bug in the client library. Can you reproduce this and send
a packet trace?

Before you start the server and clients, on the machine where you will
run beanstalkd, type:

$ sudo tcpdump -w /tmp/tcpdump.out -s 1024 -i any tcp port 11300

or some similar command.

Then start beanstalkd and reproduce the problem.

Then kill tcpdump and send the file /tmp/tcpdump.out. You can send it
to me directly if you don't want to post it to the list.

Thanks!

kr

Keith Rarick

May 13, 2008, 1:24:58 AM
to beansta...@googlegroups.com
Also, let me echo the others in saying thanks for this good explanation, Erich.

kr

Tomas

May 14, 2008, 6:53:07 PM
to beanstalk-talk
On May 13, 1:23 am, "Keith Rarick" <k...@causes.com> wrote:
>
> I suspect a bug in the client library. Can you reproduce this and send
> a packet trace?
>
So I managed to discover the source of our problem (in case it wasn't
obvious, I am working on the same project as Ilya). It stemmed from
tubes not being "re-watched" after a dropped and re-established
connection. I saw that this problem has been remedied in the latest
code base, but we're still using the 0.11.0 version of the Ruby
beanstalk-client gem. I guess the newest code has not been released,
because for me, "gem install beanstalk-client" always gives me the
0.11.0 version. We had to manually build a gem from the latest git
source.

In any event, the problem seems fixed now, though it had nothing to do
with the DEADLINE_SOON message directly. However, it very consistently
appeared right after the DEADLINE_SOON response in the log files, so I
wrote up a test script to try to find out why this correlation seemed
to manifest. It turns out that the connection between the server and
client is always dropped after a DEADLINE_SOON response. I'm not sure
why this is so, but I included my Ruby test script at the end of this
post.

In our code, this dropped connection caused a reconnect on the next
reserve call, which then appeared to hang because it was watching the
default tube and not the tubes which actually had jobs waiting. Thus
the DEADLINE_SOON was not directly related to our initial problem,
though because of this odd dropped-connection side effect, it seemed
to cause it. Hence our multitude of questions trying to understand and
explain this strange correlation. Thanks for all the patience, BTW.

Here's the test script:

<beanstalk.test.rb>
require 'rubygems'
require 'beanstalk-client'

@beanstalk = Beanstalk::Pool.new('localhost:11300')

@beanstalk.use('test-tube')
@beanstalk.watch('test-tube')
@beanstalk.ignore('default')

# clear the tube of ready jobs
while !@beanstalk.peek_ready.nil?
  @beanstalk.reserve.delete
end

puts "Ready Jobs: #{@beanstalk.stats['current-jobs-ready']}"
puts "Reserved Jobs: #{@beanstalk.stats['current-jobs-reserved']}"

@beanstalk.put('message 1', 65536, 0, 10)

job = @beanstalk.reserve
puts "Reserved: #{job.body}"

begin
  puts "Open Connections: #{@beanstalk.open_connections.length}"
  puts "Waiting for job..."
  puts @beanstalk.reserve
rescue
  puts "ERROR: #{$!}"
  puts "Open Connections: #{@beanstalk.open_connections.length}"
end
</beanstalk.test.rb>



Running with the 0.11.2 client:

[root@localhost ~]# gem list | grep beanstalk
beanstalk-client (0.11.2)


Here's the output:

[root@localhost ~]# ruby beanstalk-test.rb
connecting to beanstalk at localhost:11300
Ready Jobs: 0
Reserved Jobs: 0
Reserved: message 1
Open Connections: 1
Waiting for job...
ERROR: DEADLINE_SOON
Open Connections: 0
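
For anyone stuck on the released 0.11.0 gem, one possible workaround
is to re-assert the watch list whenever reserve raises, before
retrying. A rough sketch (the tube list is ours, and the message
matching assumes the client raises the response name as the exception
message, as in the output above):

TUBES = ['test-tube']

def reserve_with_rewatch(pool)
  pool.reserve
rescue RuntimeError => e
  raise unless e.message =~ /DEADLINE_SOON/
  # The client may have dropped and silently re-opened the connection,
  # which comes back watching only 'default'; re-watch our tubes first.
  TUBES.each { |t| pool.watch(t) }
  pool.ignore('default')
  retry
end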

Keith Rarick

May 14, 2008, 7:15:47 PM
to beansta...@googlegroups.com
On Wed, May 14, 2008 at 3:53 PM, Tomas <juux...@gmail.com> wrote:
> So I managed to discover the source of our problem
> ...

Thanks for the thorough information. I'll try to fix this connection
dropping error and release an updated client library soon.

kr
