[rabbitmq-discuss] RabbitMQ crash caused by channel leak?


Ian Ragsdale

Mar 11, 2010, 3:28:48 PM
to rabbitmq...@lists.rabbitmq.com
Hello all. I've had a couple of RabbitMQ crashes recently, with a backtrace that looks like this:

=CRASH REPORT==== 11-Mar-2010::08:13:21 ===
crasher:
pid: <0.16638.1>
registered_name: []
exception error: a system limit has been reached
in function spawn/3
called as spawn(rabbit_writer,mainloop,
[{wstate,#Port<0.451>,6539,131072}])
in call from rabbit_writer:start/3
in call from rabbit_reader:send_to_new_channel/3
in call from rabbit_reader:handle_frame/4
in call from rabbit_reader:handle_input/3
in call from rabbit_reader:mainloop/3
in call from rabbit_reader:start_connection/3
initial call: rabbit_reader:init(<0.396.0>)
ancestors: [rabbit_tcp_client_sup,rabbit_sup,<0.109.0>]
messages: [{'EXIT',<0.31524.15>,
{system_limit,
[{erlang,spawn_opt,
[proc_lib,init_p,
[<0.31524.15>,[],gen,init_it,
[gen_server2,<0.31524.15>,<0.31524.15>,
rabbit_channel,
[6528,<0.16638.1>,<0.31519.15>,
<<"stormcloud">>,<<"/">>],
[]]],
[link]]},
{proc_lib,start_link,5},
{rabbit_channel,start_link,5},
{rabbit_framing_channel,'-start_link/2-fun-0-',2}]}},

<large amount of backtrace, messages, and a huge dictionary full of channels removed>

trap_exit: true
status: running
heap_size: 514229
stack_size: 23
reductions: 68039905
neighbours:


Based on the number of channels in the logged dictionary, I'm guessing I hit a limit on the number of channels, which I'm guessing was the cause of the crash. Does this sound like a likely cause? I've identified and removed the code that was creating all the channels, but I'm concerned that it appears to be so easy for a single rogue client to take down the entire server. Is there a way for me to prevent this?

Thanks,
Ian

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Matthew Sackman

Mar 12, 2010, 6:37:09 AM
to Ian Ragsdale, rabbitmq...@lists.rabbitmq.com
On Thu, Mar 11, 2010 at 02:28:48PM -0600, Ian Ragsdale wrote:
> =CRASH REPORT==== 11-Mar-2010::08:13:21 ===
> crasher:
> pid: <0.16638.1>
> registered_name: []
> exception error: a system limit has been reached
> in function spawn/3

Yes, I think you've probably run out of processes. The default process
limit in Erlang is 32768, but this can be raised up to 134217727. To set
this for Rabbit, in your rabbitmq.conf file (which under Linux will be
at /etc/rabbitmq/rabbitmq.conf) add:

SERVER_START_ARGS="+P 1000000"

That will get you 1,000,000 processes. Note that the additional
accounting done by the Erlang VM will mean that you'll see more memory
used.
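
To see how close a node is to the process limit, a minimal sketch along
these lines may help. The module and function names (proc_headroom,
report/0) are made up for illustration; only the two
erlang:system_info/1 items are standard Erlang calls.

```erlang
-module(proc_headroom).
-export([report/0]).

%% Hypothetical helper, not part of RabbitMQ.
%% Returns {Count, Limit, PercentUsed} for the VM it runs in.
report() ->
    %% Number of processes currently alive on this node.
    Count = erlang:system_info(process_count),
    %% Maximum number of simultaneous processes (what +P sets).
    Limit = erlang:system_info(process_limit),
    {Count, Limit, (Count * 100) div Limit}.
```

Calling proc_headroom:report() on the broker node returns the current
process count, the configured limit, and the percentage in use, which
should show whether a raised +P value actually took effect.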

> Based on the number of channels in the logged dictionary, I'm guessing I hit a limit on the number of channels, which I'm guessing was the cause of the crash. Does this sound like a likely cause? I've identified and removed the code that was creating all the channels, but I'm concerned that it appears to be so easy for a single rogue client to take down the entire server. Is there a way for me to prevent this?

Mmmm, that's a good point, and no, there's no such knob. In
rabbit_reader.erl, at around line 589, you should find the following
code:

    ok = send_on_channel0(
           Sock,
           #'connection.tune'{channel_max = 0,
                              %% set to zero once QPid fix their negotiation
                              frame_max = 131072,
                              heartbeat = 0}),

That channel_max = 0 implies no limit on the number of channels (well,
the channel number is a 16-bit uint, so at most 65535). If you set that
to a lower number then well-behaved clients may take note of it.
However, it's still not enforced at the server. For that, you'll need to
add a check to the last head of handle_frame in rabbit_reader (at the
bottom, before it calls send_to_new_channel). The difficulty here is
that channels don't need to be allocated in order and can be reused, so
gaps can appear. That makes it a bit harder than just looking at the
channel number: you actually have to count. I'll raise a bug for this
and we'll fix it in due course.
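
As a rough sketch of the counting idea only, and not the actual
rabbit_reader change: the module below tracks the set of open channel
numbers and refuses a new one once a maximum is reached. The module
name, function names and the {Max, OpenSet} state shape are all
hypothetical; the real reader keeps its channel bookkeeping elsewhere.

```erlang
-module(channel_limit).
-export([new/1, open/2, close/2, count/1]).

%% Hypothetical sketch, not RabbitMQ code.
%% State is {Max, OpenSet}: Max is the number of channels allowed to be
%% open at once, OpenSet holds the channel numbers currently in use.
new(Max) when is_integer(Max), Max > 0 ->
    {Max, sets:new()}.

%% A frame arrived on Channel. An already-open channel is always fine;
%% a new channel is admitted only while the open count is below Max.
%% The channel number itself is never compared against Max, because
%% numbers can be allocated out of order and reused.
open(Channel, {Max, Open} = State) ->
    case sets:is_element(Channel, Open) of
        true ->
            {ok, State};
        false ->
            case sets:size(Open) < Max of
                true  -> {ok, {Max, sets:add_element(Channel, Open)}};
                false -> {error, channel_limit_reached}
            end
    end.

%% Closing a channel frees a slot, whatever its number was.
close(Channel, {Max, Open}) ->
    {Max, sets:del_element(Channel, Open)}.

%% Number of channels currently open.
count({_Max, Open}) ->
    sets:size(Open).
```

With a limit of 2, opening channels 7 and 3 succeeds even though the
numbers are out of order, a third channel is refused, and closing
channel 7 makes room for another one.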

Matthew

Matthew Sackman

Mar 12, 2010, 6:40:59 AM
to Ian Ragsdale, rabbitmq...@lists.rabbitmq.com
On Fri, Mar 12, 2010 at 11:37:09AM +0000, Matthew Sackman wrote:
> > Based on the number of channels in the logged dictionary, I'm guessing I hit a limit on the number of channels, which I'm guessing was the cause of the crash. Does this sound like a likely cause? I've identified and removed the code that was creating all the channels, but I'm concerned that it appears to be so easy for a single rogue client to take down the entire server. Is there a way for me to prevent this?
>
> Mmmm, that's a good point, and no, there's no such knob.

Having said all that however, be aware that you'll also crash if you run
out of file descriptors, and you can also explode rabbit by creating
lots of queues - there are no limits on either of these. So there are,
sadly, several vectors for DoS attacks just at the moment.

/me raises further bugs.

Ian Ragsdale

Mar 15, 2010, 3:22:32 PM
to rabbitmq...@lists.rabbitmq.com
On Mar 12, 2010, at 5:40 AM, Matthew Sackman wrote:

> On Fri, Mar 12, 2010 at 11:37:09AM +0000, Matthew Sackman wrote:
>>> Based on the number of channels in the logged dictionary, I'm guessing I hit a limit on the number of channels, which I'm guessing was the cause of the crash. Does this sound like a likely cause? I've identified and removed the code that was creating all the channels, but I'm concerned that it appears to be so easy for a single rogue client to take down the entire server. Is there a way for me to prevent this?
>>
>> Mmmm, that's a good point, and no, there's no such knob.
>
> Having said all that however, be aware that you'll also crash if you run
> out of file descriptors, and you can also explode rabbit by creating
> lots of queues - there are no limits on either of these. So there are,
> sadly, several vectors for DoS attacks just at the moment.

Good to know. We'll be controlling all the producers and consumers, so the possibility of a DoS isn't a huge worry; we'll just have to be a little more careful on our end. Thanks for confirming the likely cause of the crash. I now feel comfortable that it was a self-inflicted wound and that we don't have a problem with our RabbitMQ install.

- Ian
