beanstalk halted, strange system state


Steve

Dec 16, 2009, 7:19:45 PM
to beanstalk-talk
My beanstalk stopped responding. I ran gdb and got the following
stack trace:

#0 rehash (j=<value optimized out>) at job.c:85
#1 store_job (j=<value optimized out>) at job.c:57
#2 0x0804c60e in make_job_with_id (pri=65536, delay=0, ttr=120000000, body_size=316, tube=0x95c4e80, id=0) at job.c:145
#3 0x0804f2c5 in dispatch_cmd (c=<value optimized out>) at prot.c:1115
#4 0x080506ea in do_cmd (fd=25, which=2, c=0x95c31c0) at prot.c:1494
#5 h_conn_data (fd=25, which=2, c=0x95c31c0) at prot.c:1532
#6 h_conn (fd=25, which=2, c=0x95c31c0) at prot.c:1643
#7 0xb778a248 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#8 0xb778a3c9 in event_loop () from /usr/lib/libevent-1.4.so.2
#9 0xb778a3ee in event_dispatch () from /usr/lib/libevent-1.4.so.2
#10 0x08049aab in main (argc=1, argv=0xbff4da54) at beanstalkd.c:321

I then *restarted* beanstalk, and it wouldn't finish launching. I got
this stack trace:

#0 0xb7773416 in __kernel_vsyscall ()
#1 0xb76c0738 in epoll_wait () from /lib/tls/i686/nosegneg/libc.so.6
#2 0xb7768792 in ?? () from /usr/lib/libevent-1.4.so.2
#3 0xb775af82 in event_base_loop () from /usr/lib/libevent-1.4.so.2
#4 0xb775b3c9 in event_loop () from /usr/lib/libevent-1.4.so.2
#5 0xb775b3ee in event_dispatch () from /usr/lib/libevent-1.4.so.2
#6 0x08049aab in main (argc=1, argv=0xbfdffbd4) at beanstalkd.c:321

The output from top was bizarre too: 0% CPU utilization and a 2.38 load
average. That wasn't just a timing fluke; I checked, and it reported
similar numbers for 5 minutes.

It only worked again after rebooting the system. I'm running Ubuntu
9.10 on EC2 with kernel 2.6.31-302-ec2.

Where do I start?

Keith Rarick

Dec 16, 2009, 8:18:27 PM
to beansta...@googlegroups.com
On Wed, Dec 16, 2009 at 4:19 PM, Steve <sfar...@gmail.com> wrote:
> My beanstalk stopped responding.  I ran gdb and got the following
> stack trace:

This stack trace looks pretty reasonable. The rehash function is one
of the few that do long-running (i.e. on the order of milliseconds)
work. A load average of 2.38 suggests that some other processes were
trying to run and beanstalkd couldn't get scheduled. Did you have a
high iowait? That would explain a lot.
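To illustrate why an insert can occasionally land in expensive work, here is a minimal sketch of a chained hash table keyed by job id. It is hypothetical and is NOT beanstalkd's job.c: the bucket layout, the growth threshold of 4 entries per bucket, and doubling on growth are all assumptions. Most calls to `store` are constant time, but the one that crosses the threshold calls `rehash`, which walks every stored entry in a single call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical job-table entry; not beanstalkd's real job struct. */
typedef struct entry {
    uint64_t id;
    struct entry *next;   /* next entry in the same bucket chain */
} entry;

typedef struct {
    entry **buckets;
    size_t cap;     /* number of buckets */
    size_t count;   /* number of stored entries */
} table;

static size_t bucket_of(uint64_t id, size_t cap) { return (size_t)(id % cap); }

static int table_init(table *t, size_t cap) {
    t->buckets = calloc(cap, sizeof *t->buckets);
    t->cap = cap;
    t->count = 0;
    return t->buckets ? 0 : -1;
}

/* Moves every entry into a table twice as large: O(count + cap) in one call. */
static int rehash(table *t) {
    size_t new_cap = t->cap * 2;
    entry **nb = calloc(new_cap, sizeof *nb);
    if (!nb) return -1;                  /* keep the old table on failure */
    for (size_t i = 0; i < t->cap; i++) {
        entry *e = t->buckets[i];
        while (e) {
            entry *next = e->next;
            size_t b = bucket_of(e->id, new_cap);
            e->next = nb[b];             /* push onto the new chain */
            nb[b] = e;
            e = next;
        }
    }
    free(t->buckets);
    t->buckets = nb;
    t->cap = new_cap;
    return 0;
}

/* Usually O(1); grows the table once there are 4 entries per bucket. */
static void store(table *t, entry *e) {
    if (t->count >= t->cap * 4)
        rehash(t);   /* on failure, keep using the old, more crowded table */
    size_t b = bucket_of(e->id, t->cap);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->count++;
}

static entry *find(const table *t, uint64_t id) {
    for (entry *e = t->buckets[bucket_of(id, t->cap)]; e; e = e->next)
        if (e->id == id) return e;
    return NULL;
}
```

With a starting capacity of 4, inserting 100 entries triggers three rehashes (to 8, 16 and 32 buckets); each of those inserts pays for moving everything stored so far, which is the kind of occasional millisecond-scale pause described above when the table holds many jobs.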

> I then *restarted* beanstalk, and it wouldn't finish launching.  I got
> this stack trace:

This stack trace is normal for a running idle beanstalkd. I'm not sure
exactly what you mean by "finish launching". Could you describe the
behavior you saw?

If rebooting solved the problem, maybe something else is wrong.

kr

Steve

Dec 16, 2009, 9:30:05 PM
to beanstalk-talk

On Dec 16, 5:18 pm, Keith Rarick <k...@xph.us> wrote:


> On Wed, Dec 16, 2009 at 4:19 PM, Steve <sfarr...@gmail.com> wrote:
> > My beanstalk stopped responding.  I ran gdb and got the following
> > stack trace:
>
> This stack trace looks pretty reasonable. The rehash function is one
> of the few that do long-running (i.e. on the order of milliseconds)
> work. A load average of 2.38 suggests that some other processes were
> trying to run and beanstalkd couldn't get scheduled. Did you have a
> high iowait? That would explain a lot.

No processes, including beanstalk, were registering any CPU usage
according to top. Here's the relevant line from top:

Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
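On Linux, top derives these percentages from the cumulative jiffy counters on the aggregate `cpu` line of /proc/stat, taking the difference between two samples. A minimal sketch of checking iowait the same way, assuming Linux and the field order user, nice, system, idle, iowait, irq, softirq, steal (the names `cpu_times`, `parse_cpu_line`, `iowait_pct` and `read_proc_stat` are illustrative, not from any library):

```c
#include <stdio.h>
#include <string.h>

/* Jiffy counters from the aggregate "cpu" line of /proc/stat (Linux). */
typedef struct {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
} cpu_times;

/* Returns 0 on success; expects the aggregate line ("cpu  ..."), not "cpu0". */
static int parse_cpu_line(const char *line, cpu_times *t) {
    memset(t, 0, sizeof *t);
    if (strncmp(line, "cpu ", 4) != 0) return -1;
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n >= 4 ? 0 : -1;   /* fields a kernel doesn't report stay 0 */
}

static unsigned long long total(const cpu_times *t) {
    return t->user + t->nice + t->system + t->idle +
           t->iowait + t->irq + t->softirq + t->steal;
}

/* Share of elapsed jiffies spent in iowait between samples a and b,
   comparable to the "%wa" figure top prints. */
static double iowait_pct(const cpu_times *a, const cpu_times *b) {
    unsigned long long dt = total(b) - total(a);
    if (dt == 0) return 0.0;
    return 100.0 * (double)(b->iowait - a->iowait) / (double)dt;
}

/* Reads the first line of /proc/stat; returns 0 on success. */
static int read_proc_stat(cpu_times *t) {
    FILE *f = fopen("/proc/stat", "r");
    if (!f) return -1;
    char buf[512];
    int rc = fgets(buf, sizeof buf, f) ? parse_cpu_line(buf, t) : -1;
    fclose(f);
    return rc;
}
```

Calling `read_proc_stat` twice a few seconds apart and passing both samples to `iowait_pct` gives a number to compare against the 0.0%wa above.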

I also grabbed the stack trace several times, and it was in exactly the
same place each time.

>
> > I then *restarted* beanstalk, and it wouldn't finish launching.  I got
> > this stack trace:
>
> This stack trace is normal for a running idle beanstalkd. I'm not sure
> exactly what you mean by "finish launching". Could you describe the
> behavior you saw?

Well, clients were unable to connect. They would just hang for
multiple minutes before I canceled them.

>
> If rebooting solved the problem, maybe something else is wrong.

Well, yeah, that's totally possible. Beanstalk did have the weirdest
behavior, though: other services still responded, albeit very slowly.

If it happens again, is there anything I should look for?
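One thing that fits "high load, idle CPU": on Linux the load average counts tasks in uninterruptible sleep (state `D`, usually blocked inside the kernel waiting on I/O) as well as runnable ones. A minimal sketch, assuming Linux's /proc layout, that lists such tasks by reading the state letter from each /proc/<pid>/stat (the helper names `stat_state` and `list_d_state` are illustrative):

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Returns the state letter from a /proc/<pid>/stat line, or '?'.
   The command name may contain spaces or ')', so use the LAST ')'. */
static char stat_state(const char *line) {
    const char *p = strrchr(line, ')');
    if (!p || p[1] != ' ' || p[2] == '\0') return '?';
    return p[2];
}

/* Prints every task currently in uninterruptible sleep ('D');
   returns how many were found, or -1 if /proc cannot be opened. */
static int list_d_state(void) {
    DIR *d = opendir("/proc");
    if (!d) return -1;
    int found = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0])) continue; /* pid dirs only */
        char path[300], buf[512];
        snprintf(path, sizeof path, "/proc/%s/stat", de->d_name);
        FILE *f = fopen(path, "r");
        if (!f) continue;                 /* the process may have exited */
        if (fgets(buf, sizeof buf, f) && stat_state(buf) == 'D') {
            printf("pid %s is in state D\n", de->d_name);
            found++;
        }
        fclose(f);
    }
    closedir(d);
    return found;
}
```

The same information is available from the STAT column of `ps -eo pid,stat,comm`; if beanstalkd itself shows up as `D` while hung, it is blocked in the kernel rather than spinning in user code.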
