On Mon, Nov 16, 2009 at 4:36 PM, Graham Barr <gmb...@gmail.com> wrote:
> I thought that this bug was fixed by the latest commits in master, but
> it is not. We compiled the latest master and are currently seeing:
>
>
> stats-job 311189
> OK 170
> id: 311189
> state: delayed
> pri: 20000
> age: 4285
> delay: 2
> ttr: 30
> time-left: 18446744069425
I think we found the problem. The issue is a race condition that causes
an unsigned overflow.

In set_main_delay_timeout we find the first deadline_at that needs to
happen. set_main_timeout was simply subtracting the current time from
that value. However, as we now use usec times, it is quite possible that
the calculated deadline has already passed by the time we get to
set_main_timeout. The unsigned subtraction then wraps around and the
timeout is set for some time in the distant future. Delayed jobs are
only made active again in the timeout handler, and since that timeout
never fires, they stay delayed.

Ask has the patch in his repository at
http://github.com/abh/beanstalkd/commit/ee671fad03701def60af908da2b76b28f4808f0a
This sets the timeout to happen in 1 usec if the deadline_at has already
passed.
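
To illustrate the wraparound, here is a minimal sketch in C. It is not the
beanstalkd code: the function names naive_timeout_usec and
guarded_timeout_usec are made up for this example, and I am assuming times
are held as 64-bit unsigned microsecond counts.

```c
#include <stdint.h>

/* Assumption: times are unsigned 64-bit microsecond counts. */
typedef uint64_t usec_t;

/*
 * Hypothetical "before" behaviour: plain subtraction.
 * If deadline_at < now, the unsigned result wraps around to a value
 * close to 2^64, i.e. a timeout in the distant future.
 */
static usec_t naive_timeout_usec(usec_t deadline_at, usec_t now)
{
    return deadline_at - now;
}

/*
 * Hypothetical "after" behaviour: if the deadline has already passed
 * (or is exactly now), ask for the timeout to fire in 1 usec instead.
 */
static usec_t guarded_timeout_usec(usec_t deadline_at, usec_t now)
{
    return deadline_at > now ? deadline_at - now : 1;
}
```

As a check on the numbers: with a deadline that passed 4284 seconds ago (a
value picked here only for illustration), the naive version gives
2^64 - 4284000000 usec, which is 18446744069425 when shown in whole
seconds, the same magnitude as the time-left in the report above. The
guarded version returns 1 for the same inputs.
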

While doing this we also found some warnings from the binlog code when
restarting. The unused free space allocated at the end of the file can
cause EOF warnings when the binlog is read back.

The following patch fixes that by truncating off the free space when
closing the binlog file:
http://github.com/gbarr/beanstalkd/commit/d11faa0605e451a3a15420b28e5af59f9335ea96
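
The general idea can be sketched as below. This is only an illustration
using POSIX calls, not the code from the commit: log_open_reserved and
log_close_trimmed are hypothetical helpers, and I am assuming the caller
tracks how many bytes of the file it has actually written.

```c
#include <fcntl.h>      /* open */
#include <sys/stat.h>   /* fstat, struct stat */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* ftruncate, close */

/* Hypothetical helper: create a log file with `reserved` bytes of space. */
static int log_open_reserved(const char *path, off_t reserved)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    /* Growing the file this way leaves a zero-filled tail. */
    if (ftruncate(fd, reserved) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper: cut the file back to the `used` bytes that were
 * really written, then close it. It only ever shrinks the file.
 */
static int log_close_trimmed(int fd, off_t used)
{
    struct stat st;
    int r = 0;

    if (fstat(fd, &st) == -1)
        r = -1;
    else if (used < st.st_size && ftruncate(fd, used) == -1)
        r = -1;

    if (close(fd) == -1)
        r = -1;
    return r;
}
```

With the tail trimmed on close, a reader that walks the file record by
record reaches the real end of file right after the last record instead
of running into the reserved space.
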
Graham.