stuck delayed jobs

Graham Barr

Nov 16, 2009, 5:36:59 PM
to beansta...@googlegroups.com
I thought that this bug was fixed by the latest commits in master, but
it is not. We compiled the latest master and are currently seeing


stats-job 311189
OK 170
id: 311189
state: delayed
pri: 20000
age: 4285
delay: 2
ttr: 30
time-left: 18446744069425

These jobs were put with a delay and they never become ready unless
we issue a kick command.

I am currently hunting for what causes this, but hopefully someone
else will find a fix first :-)

Graham.

Ask Bjørn Hansen

Nov 16, 2009, 6:21:26 PM
to beanstalk-talk


On Nov 16, 2:36 pm, Graham Barr <gmb...@gmail.com> wrote:

> time-left: 18446744069425
>
> these jobs were put with a delay and they never become ready, unless
> we issue a kick command

... or just wait 580,000 years. You're so impatient. :-)
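Both numbers can be checked with a little arithmetic. This is a sketch under assumptions: deadlines and the clock are taken to be unsigned 64-bit microsecond counts (the follow-up below mentions usec times and an unsigned overflow; the 64-bit width is an assumption here), and the deadline is taken to have passed about 4284 seconds earlier, roughly the job's age minus its 2-second delay. Subtracting "now" from a past deadline then wraps modulo 2^64. The function names are hypothetical, not beanstalkd's.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL
#define SEC_PER_YEAR 31557600ULL /* 365.25-day year */

/* Hypothetical: time left until a deadline, computed the naive way.
   Once the deadline is in the past, deadline_usec - now_usec wraps
   modulo 2^64 to a value just below 2^64 microseconds. */
static uint64_t naive_time_left_sec(uint64_t deadline_usec, uint64_t now_usec)
{
    return (deadline_usec - now_usec) / USEC_PER_SEC;
}

/* Whole years in a number of seconds. */
static uint64_t sec_to_years(uint64_t sec)
{
    return sec / SEC_PER_YEAR;
}
```

With any clock value larger than the overdue amount, naive_time_left_sec(now - 4284000000, now) returns 18446744069425, the same figure stats-job printed, and sec_to_years of that is 584542, which rounds to the 580,000 years above.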


- ask


--
http://www.solfo.com/

Graham Barr

Nov 17, 2009, 10:33:59 AM
to beansta...@googlegroups.com
On Mon, Nov 16, 2009 at 4:36 PM, Graham Barr <gmb...@gmail.com> wrote:
> I thought that this bug was fixed by the latest commits in master, but
> it is not. we compiled the latest master and we are currently seeing
>
>
> stats-job 311189
> OK 170
> id: 311189
> state: delayed
> pri: 20000
> age: 4285
> delay: 2
> ttr: 30
> time-left: 18446744069425

I think we found the problem. This issue is a race condition causing
an overflow.

In set_main_delay_timeout we find the first deadline_at that needs to
happen. set_main_timeout was simply subtracting the current time from
that value. However, since we now use usec times, it is quite possible
that the calculated deadline has already passed by the time we get to
set_main_timeout. That causes an unsigned overflow, and the timeout is
set for some time in the distant future. So the timeout never fires,
and since delayed jobs are only made ready again by the timeout
handler, they stay delayed. Ask has the patch in his repository at

http://github.com/abh/beanstalkd/commit/ee671fad03701def60af908da2b76b28f4808f0a

This sets the timeout to fire in 1 usec if deadline_at has already passed.

While doing this, we also found some warnings from the binlog code when
restarting. The issue is that the unused free space allocated at the
end of the file can cause EOF warnings when the binlog is read back.

The following patch fixes that issue by truncating off the free space
when closing the binlog file:

http://github.com/gbarr/beanstalkd/commit/d11faa0605e451a3a15420b28e5af59f9335ea96
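A minimal POSIX sketch of truncate-on-close, assuming the free space is reserved by extending the file with ftruncate and that the writer tracks how many bytes of real records it has written. struct binlog_file and the binlog_* functions are hypothetical; the commit above has the actual change.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* expose ftruncate/pwrite in strict modes */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical binlog handle. */
struct binlog_file {
    int fd;      /* open file descriptor */
    off_t used;  /* bytes of real records written so far */
};

/* Create the file and reserve `size` bytes of (zero-filled) free space. */
static int binlog_open_preallocated(struct binlog_file *b, const char *path,
                                    off_t size)
{
    b->fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (b->fd < 0)
        return -1;
    b->used = 0;
    if (ftruncate(b->fd, size) != 0) {
        close(b->fd);
        b->fd = -1;
        return -1;
    }
    return 0;
}

/* Write a record at the end of the used region. */
static int binlog_append(struct binlog_file *b, const void *buf, size_t len)
{
    ssize_t n = pwrite(b->fd, buf, len, b->used);
    if (n < 0 || (size_t)n != len)
        return -1;
    b->used += (off_t)len;
    return 0;
}

/* Cut the file back to the bytes actually used, then close it, so a
   later replay does not run into the unused space at the end.
   Returns 0 on success, -1 if either call failed. */
static int binlog_close_truncated(struct binlog_file *b)
{
    int rc = 0;
    if (ftruncate(b->fd, b->used) != 0)
        rc = -1;
    if (close(b->fd) != 0)
        rc = -1;
    b->fd = -1;
    return rc;
}
```

After reserving 4096 bytes and appending a 5-byte record, binlog_close_truncated leaves a 5-byte file, so a reader replaying it reaches EOF right after the last record.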

Graham.

Keith Rarick

Nov 17, 2009, 5:59:48 PM
to beansta...@googlegroups.com
On Tue, Nov 17, 2009 at 7:33 AM, Graham Barr <gmb...@gmail.com> wrote:
> http://github.com/abh/beanstalkd/commit/ee671fad03701def60af908da2b76b28f4808f0a
> ...
> http://github.com/gbarr/beanstalkd/commit/d11faa0605e451a3a15420b28e5af59f9335ea96

Thanks! I'll push these along with the pause command momentarily.

kr