Possible bug when inserting 179366 jobs and then deleting them

110 views
Skip to first unread message

ui

unread,
May 1, 2010, 3:41:31 PM5/1/10
to beanstalk-talk
Summary: Server returns internal err if we restart, put 179366 jobs,
then try to reserve+delete those same jobs. This happens only with -
b

I hope what follows isn't too much info. Pls. yell if I can be of any
assistance.

Beanstalkd-1.4.4 with -b

If we stop beanstalkd, wipe the binlog folder, restart beanstalkd
then run a script that adds 179366 jobs and then run another that
reserves and deletes all of those jobs, the last delete returns an
INTERNAL ERROR.

I see this: "/tmp/beanstalkd-1.4.4/beanstalkd: binlog.c:473 in
binlog_write_job: failed to use next binlog" . Prot.c:1302 returns
the err ( r == 0 )
pheanstalk throws an exception.


It takes at least that many jobs (on this linux machine) ; adding and
then deleting 179365 jobs doesnt seeem to do it.

The 179366th job does seem to get erased.

We invoke beanstalkd like this
beanstalkd -l 74.54.253.178 -p 11300 -b /tmp/beanstalkd.binlogs/

the PUSH script that adds the jobs to the test tube:


<?php
require_once('pheanstalk_init.php');
$pheanstalk = new Pheanstalk('74.54.253.178');
$MAX=179366; // smallest # which will throw exception on last item's
delete in drain.php
$pheanstalk->useTube('testtube');
$job = str_repeat("0123456789",1024);
for ($i=1; $i <= $MAX;$i++ )
{
$pheanstalk->put("job $i payloadx\n");
}
?>


the DRAIN script that triggers the exception on the last delete
<?php
require_once('pheanstalk_init.php');
$pheanstalk = new Pheanstalk('74.54.253.178');

$job = $pheanstalk
->watch('testtube')
->ignore('default');

$job = $pheanstalk->reserve(0);

while ($job )
{

$last_del = $job->getID() . "\n";

$pheanstalk->delete($job);

$job = $pheanstalk->reserve(0);

if (!$job ) break;



}

echo "last_del=$last_del\n";

?>

I can replicate under gdb. FWIW,

Starting program: /tmp/beanstalkd-1.4.4/beanstalkd -l 74.54.253.178 -p
11300 -b /tmp/beanstalkd.binlogs/
[Thread debugging using libthread_db enabled]
^C[New Thread 0xb7d446b0 (LWP 3967)]

Program received signal SIGINT, Interrupt.
[Switching to Thread 0xb7d446b0 (LWP 3967)]
0xb7f16424 in __kernel_vsyscall ()
(gdb) b prot.c:1299 // I did this just before I ran the the
final delete
Breakpoint 1 at 0x8050f46: file prot.c, line 1299.
(gdb) cont
Continuing.

Breakpoint 1, dispatch_cmd (c=0x914b8b8) at prot.c:1299
1299 r = binlog_write_job(j);
(gdb) p j
$1 = (job) 0xa8b3950
(gdb) p *j
$2 = {id = 179366, pri = 1024, delay = 0, ttr = 60000000, body_size =
22, created_at = 1272741574555196, deadline_at = 1272741908163805,
reserve_ct = 1, timeout_ct = 0, release_ct = 0, bury_ct = 0, kick_ct =
0, state = 0 '\0', pad = "\000\000\000\000\000", tube = 0x914bb78,
prev = 0xa8b3950, next = 0xa8b3950, ht_next = 0x0, heap_index = 0,
binlog = 0x914afd0, reserver = 0x0, reserved_binlog_space = 73, body
= 0xa8b39bc "job 179366 payloadx\n\r\n"}
(gdb) bt
#0 dispatch_cmd (c=0x914b8b8) at prot.c:1299
#1 0x0805237e in do_cmd (c=0x914b8b8) at prot.c:1586
#2 0x0805250f in h_conn_data (c=0x914b8b8) at prot.c:1624
#3 0x08052a33 in h_conn (fd=10, which=2, c=0x914b8b8) at prot.c:1735
#4 0xb7f02e60 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#5 0xb7f03039 in event_loop () from /usr/lib/libevent-1.4.so.2
#6 0xb7f0305e in event_dispatch () from /usr/lib/libevent-1.4.so.2
#7 0x0804a018 in main (argc=7, argv=0xbff30474) at beanstalkd.c:321
(gdb) next
/tmp/beanstalkd-1.4.4/beanstalkd: binlog.c:473 in binlog_write_job:
failed to use next binlog
1300 job_free(j);
(gdb) p r
$3 = 0
(gdb) p errno
Cannot access memory at address 0x6fac068c
(gdb) next
1302 if (!r) return reply_serr(c, MSG_INTERNAL_ERROR);
(gdb) p r
$4 = 0


output of stats command before last delete

uday@dev:~/devel/pda-pheanstalk-37c1842$ telnet 74.54.253.178 11300
Trying 74.54.253.178...
Connected to 74.54.253.178.
Escape character is '^]'.
stats
OK 827
---
current-jobs-urgent: 0
current-jobs-ready: 1
current-jobs-reserved: 0
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 179366
cmd-peek: 0
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 0
cmd-reserve-with-timeout: 179365
cmd-delete: 179365
cmd-release: 0
cmd-use: 1
cmd-watch: 1
cmd-ignore: 1
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 3
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 179366
max-job-size: 65535
current-tubes: 2
current-connections: 1
current-producers: 0
current-workers: 0
current-waiting: 0
total-connections: 5
pid: 3967
version: 1.4.4
rusage-utime: 9.252578
rusage-stime: 21.345334
uptime: 271
binlog-oldest-index: 2
binlog-current-index: 4
binlog-max-size: 10485760



output of stats command after last delete


uday@dev:/tmp/beanstalkd-1.4.4$ telnet 74.54.253.178 11300
Trying 74.54.253.178...
Connected to 74.54.253.178.
Escape character is '^]'.
stats
OK 827
---
current-jobs-urgent: 0
current-jobs-ready: 0
current-jobs-reserved: 0
current-jobs-delayed: 0
current-jobs-buried: 0
cmd-put: 179366
cmd-peek: 0
cmd-peek-ready: 0
cmd-peek-delayed: 0
cmd-peek-buried: 0
cmd-reserve: 0
cmd-reserve-with-timeout: 179366
cmd-delete: 179366
cmd-release: 0
cmd-use: 1
cmd-watch: 2
cmd-ignore: 2
cmd-bury: 0
cmd-kick: 0
cmd-touch: 0
cmd-stats: 4
cmd-stats-job: 0
cmd-stats-tube: 0
cmd-list-tubes: 0
cmd-list-tube-used: 0
cmd-list-tubes-watched: 0
cmd-pause-tube: 0
job-timeouts: 0
total-jobs: 179366
max-job-size: 65535
current-tubes: 2
current-connections: 2
current-producers: 0
current-workers: 1
current-waiting: 0
total-connections: 7
pid: 3967
version: 1.4.4
rusage-utime: 9.252578
rusage-stime: 21.349334
uptime: 433
binlog-oldest-index: 4
binlog-current-index: 4
binlog-max-size: 10485760




output of drain script
uday@dev:~/devel/pda-pheanstalk-37c1842$ php drain.php

Fatal error: Uncaught exception
'Pheanstalk_Exception_ServerInternalErrorException' with message
'INTERNAL_ERROR in response to 'delete 179366'' in /home/uday/devel/
pda-pheanstalk-37c1842/classes/Pheanstalk/Connection.php:91
Stack trace:
#0 /home/uday/devel/pda-pheanstalk-37c1842/classes/
Pheanstalk.php(345): Pheanstalk_Connection-
>dispatchCommand(Object(Pheanstalk_Command_DeleteCommand))
#1 /home/uday/devel/pda-pheanstalk-37c1842/classes/Pheanstalk.php(63):
Pheanstalk->_dispatch(Object(Pheanstalk_Command_DeleteCommand))
#2 /home/uday/devel/pda-pheanstalk-37c1842/drain.php(21): Pheanstalk-
>delete(Object(Pheanstalk_Job))
#3 {main}
thrown in /home/uday/devel/pda-pheanstalk-37c1842/classes/Pheanstalk/
Connection.php on line 91

--
You received this message because you are subscribed to the Google Groups "beanstalk-talk" group.
To post to this group, send email to beansta...@googlegroups.com.
To unsubscribe from this group, send email to beanstalk-tal...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/beanstalk-talk?hl=en.

Keith Rarick

unread,
May 1, 2010, 4:18:15 PM5/1/10
to beansta...@googlegroups.com
On Sat, May 1, 2010 at 12:41 PM, ui <uiva...@gmail.com> wrote:
> Summary:  Server returns internal err if we restart, put 179366 jobs,
> then try to reserve+delete those same jobs.   This happens only
> with -b

That is a bug. At first glance, it looks like something to do with our
code for allocating binlog space.

> I hope what follows isn't too much info. Pls. yell if I can be of any
> assistance.

It's excellent info! Thanks for being so thorough. Here's a ticket to
track the bug:

http://github.com/kr/beanstalkd/issues/issue/38

kr

Keith Rarick

unread,
May 10, 2010, 5:46:34 PM5/10/10
to beansta...@googlegroups.com
This is fixed in git along with a test case. Would you mind trying it
out? If someone can confirm the fix, I'll make another release.
Reply all
Reply to author
Forward
0 new messages