[rabbitmq-discuss] Erlang client: function_clause error

Edwin Fine

Oct 26, 2008, 1:55:53 PM
to rabbitmq
I shut RabbitMQ down while an application still had connections to it (I am doing various recovery scenarios), and I got this:

** Reason for termination ==
** {function_clause,
       [{amqp_connection,handle_info,
            [{method,
                 {'connection.close',320,
                     <<"CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'">>,
                     0,0},
                 none},
             {connection_state,<<"xhg">>,<<"xhg">>,"0.0.0.0",
                 #Port<0.230>,<<"/xhg">>,<0.150.0>,<0.151.0>,undefined,0,0,
                 {dict,0,16,16,8,80,48,
                     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                     {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}]},
        {gen_server,handle_msg,5},
        {proc_lib,init_p,5}]}

Should there be a handle_info clause for this in amqp_connection, or is it something I need to code for somehow?

Regards,
Edwin Fine

Ben Hood

Oct 27, 2008, 8:04:02 AM
to Edwin Fine, rabbitmq
Edwin,

On Sun, Oct 26, 2008 at 5:55 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Should there be a handle_info clause for this in amqp_connection, or is it
> something I need to code for somehow?

This is a bug in the amqp_connection module; it should be handling
this message from the broker. I have started to fix this (19625
refers), but I'm going to have to think about the event propagation.
The current patch will at least handle the message.
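
A minimal sketch of the kind of clause under discussion, assuming a plain gen_server rather than the real amqp_connection module. The module name conn_sketch and the stop reason are hypothetical; only the shape of the message tuple is taken from the crash report above.

```erlang
-module(conn_sketch).
-behaviour(gen_server).

-export([start/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical stand-in for a connection process. This is NOT the
%% real amqp_connection module.
start() ->
    gen_server:start(?MODULE, [], []).

init([]) ->
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Broker-initiated close, in the tuple shape seen in the crash report.
%% Stopping with a {shutdown, _} reason lets supervisors and monitors
%% tell an orderly close apart from a crash.
handle_info({method,
             {'connection.close', Code, Text, _ClassId, _MethodId},
             none},
            State) ->
    {stop, {shutdown, {server_initiated_close, Code, Text}}, State};
%% Catch-all, so that an unexpected message no longer raises
%% function_clause.
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

A process that monitors or links to this server would then see an exit reason of the form {shutdown, {server_initiated_close, 320, Text}} instead of a function_clause crash.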

HTH,

Ben

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Edwin Fine

Oct 27, 2008, 10:06:08 AM
to Ben Hood, rabbitmq
Thanks, Ben.

Do you have any general suggestions as to how to recover cleanly from multiple connections dying in an application because the broker went down? Ideally, I'd like to be able to recover gracefully and not have to crash processes unnecessarily.

Regards,
Edwin

Ben Hood

Oct 27, 2008, 11:06:14 AM
to Edwin Fine, rabbitmq
Edwin,

On Mon, Oct 27, 2008 at 2:06 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Do you have any general suggestions as to how to recover cleanly from
> multiple connections dying in an application because the broker went down?
> Ideally, I'd like to be able to recover gracefully and not have to crash
> processes unnecessarily.

When you say multiple connections in an application, are you referring
to multiple TCP connections or multiple AMQP channels?

Valentino started a thread on a related topic so he may be able to
chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html

I'm not quite sure whether that discussion was about how to supervise
client connections using an OTP tree, or whether I had that discussion
with somebody else. If it was with somebody else, could that person
please chip in here?

Edwin Fine

Oct 27, 2008, 11:08:45 AM
to Ben Hood, rabbitmq
On Mon, Oct 27, 2008 at 11:06 AM, Ben Hood <0x6e...@gmail.com> wrote:
> Edwin,
>
> On Mon, Oct 27, 2008 at 2:06 PM, Edwin Fine wrote:
>> Do you have any general suggestions as to how to recover cleanly from
>> multiple connections dying in an application because the broker went down?
>> Ideally, I'd like to be able to recover gracefully and not have to crash
>> processes unnecessarily.
>
> When you say multiple connections in an application, are you referring
> to multiple TCP connections or multiple AMQP channels?

Both, actually. I have a connection pool of TCP connections to a rabbit broker, each connection of which supports multiple channels.

> Valentino started a thread on a related topic so he may be able to
> chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html

I'll take a look.

Ben Hood

Oct 27, 2008, 11:24:43 AM
to Edwin Fine, rabbitmq
Edwin,

On Mon, Oct 27, 2008 at 3:08 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Both, actually. I have a connection pool of TCP connections to a rabbit
> broker, each connection of which supports multiple channels.

One thing that the Erlang client doesn't have which the other clients
do is a facility to register a shutdown handler with the AMQP
connection. Maybe we should look into doing this.

Edwin Fine

Oct 27, 2008, 11:33:36 AM
to Ben Hood, rabbitmq
On Mon, Oct 27, 2008 at 11:24 AM, Ben Hood <0x6e...@gmail.com> wrote:
> Edwin,
>
> On Mon, Oct 27, 2008 at 3:08 PM, Edwin Fine wrote:
>> Both, actually. I have a connection pool of TCP connections to a rabbit
>> broker, each connection of which supports multiple channels.
>
> One thing that the Erlang client doesn't have which the other clients
> do is a facility to register a shutdown handler with the AMQP
> connection. Maybe we should look into doing this.

Music to my ears. I would really, really appreciate something like that.

> Ben


Ben Hood

Oct 27, 2008, 12:30:01 PM
to Edwin Fine, rabbitmq
Edwin,

On Mon, Oct 27, 2008 at 3:33 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Music to my ears. I would really, really appreciate something like that.

Ok, I've roadmapped it for consideration in the 1.0 release of the
Erlang client (when a patch transpires 19630 will refer).

Edwin Fine

Oct 27, 2008, 1:06:37 PM
to Ben Hood, rabbitmq
Ben,

Thanks. In the meantime, I may try to hack it for myself.

Regards,
Ed

Valentino Volonghi

Oct 27, 2008, 7:01:31 PM
to Edwin Fine, rabbitmq

On Oct 27, 2008, at 8:08 AM, Edwin Fine wrote:

>> Valentino started a thread on a related topic so he may be able to
>> chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html
>
> I'll take a look.

The way I solved this is simply to let the client crash and restart from the supervisor. Then
it starts a loop until it connects successfully to the other broker. This makes the code
really simple and very robust.

The only 'problem' is that this way I'm basically trusting the problem (connection error)
to be solved before RabbitMQ runs out of memory.
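
A rough sketch of that crash-and-restart pattern, using only standard OTP calls. ConnectFun and WorkFun are hypothetical placeholders (not part of any RabbitMQ client API): ConnectFun is assumed to return {ok, Conn} or {error, Reason}, and WorkFun is assumed to run for as long as the connection is healthy and to crash when it dies.

```erlang
-module(reconnect_sketch).
-behaviour(supervisor).

-export([start_link/2, start_worker/2, connect_loop/2]).
-export([init/1]).

%% ConnectFun: fun() -> {ok, Conn} | {error, Reason}   (hypothetical)
%% WorkFun:    fun(Conn) -> term(), expected to crash if Conn dies
start_link(ConnectFun, WorkFun) ->
    supervisor:start_link(?MODULE, {ConnectFun, WorkFun}).

init({ConnectFun, WorkFun}) ->
    %% Allow up to 10 restarts per minute before the supervisor gives up.
    SupFlags = #{strategy => one_for_one, intensity => 10, period => 60},
    Child = #{id => amqp_worker,
              start => {?MODULE, start_worker, [ConnectFun, WorkFun]},
              restart => permanent},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor, so the new process is linked to it.
%% After every restart the worker loops until it has a connection
%% again, then carries on with its normal work.
start_worker(ConnectFun, WorkFun) ->
    Pid = spawn_link(fun() ->
                             Conn = connect_loop(ConnectFun, 1000),
                             WorkFun(Conn)
                     end),
    {ok, Pid}.

%% Retry every DelayMs milliseconds until the broker accepts us.
connect_loop(ConnectFun, DelayMs) ->
    case ConnectFun() of
        {ok, Conn} ->
            Conn;
        {error, _Reason} ->
            timer:sleep(DelayMs),
            connect_loop(ConnectFun, DelayMs)
    end.
```

Because the retry loop runs inside the worker, a long broker outage costs one restart rather than exhausting the supervisor's restart intensity.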

Speaking of which... Is there any way (donations, manual labor, slavery or such) that
I can be of help to change the queue state to a new object that knows how to persist
without using all the memory up?

From what I see by reading the code I think that the change would be isolated inside
rabbit_amqqueue_process.erl and the exact variable is message_buffer. An object
with a similar API but a different storage strategy could maybe be swapped in instead
of the current queue.

--
Valentino Volonghi aka Dialtone
Now running MacOS X 10.5

Ben Hood

Oct 28, 2008, 7:31:08 AM
to Valentino Volonghi, rabbitmq
Valentino,

On Mon, Oct 27, 2008 at 11:01 PM, Valentino Volonghi <dial...@gmail.com> wrote:
> Speaking of which... Is there any way (donations, manual labor, slavery or
> such) that
> I can be of help to change the queue state to a new object that knows how to
> persist
> without using all the memory up?

In general, you can help by either contributing code yourself or by
financing the reprioritization of the roadmap.

Disk overflow or queue paging is on the mid term roadmap as something
we are going to do, but we still need to gather requirements.

Here are a few examples:

- When do you decide to page things to disk?
- Is it done on memory consumption or queue depth?
- Is this configurable per queue or across the broker?
- What are sensible defaults so that people who haven't even thought
about paging don't get affected by overly-aggressive defaults?
- When and how do you swap back in - is this automatic or manual?
- How do you decide when the low water mark has been reached after
having commenced the page-in? Do you resume flow control at this point?
- If you do page, are you interested in last image caching?
- Do you want to apply application level heuristics to selectively purge
overflowed queues?
- What role do TTLs play in this scenario?
- Furthermore, we do already have a fast message persister - it's just
that it's geared to write as quickly as possible, not read.
- Ask yourself, if we implement paging, are we potentially reinventing
a wheel that the OS has already invented?
- And as always, what do you do when your SAN fills up?

One suggestion to kick things off is to begin a more structured
analysis of the whole problem on the wiki and start a dedicated
discussion thread around this. For example, one could start a document
highlighting the motivation and requirements and let interested
parties comment on this.

> From what I see by reading the code I think that the change would be
> isolated inside
> rabbit_amqqueue_process.erl and the exact variable is message_buffer. An
> object
> with a similar API but a different storage strategy could maybe be swapped
> in instead
> of the current queue.

True - the intention is to have a code base that is as short as
possible so that it can be easily understood and adapted.

Having said that, whilst it may be straightforward to do a hello
world overflow, the devil is in the detail of all of the moving parts
and different scenarios that you have to account for whilst maintaining
the clarity of the code base. And testing it of course.

HTH,

Ben

Valentino Volonghi

Oct 28, 2008, 2:53:17 PM
to Ben Hood, rabbitmq
On Oct 28, 2008, at 4:31 AM, Ben Hood wrote:

> In general, you can help by either contributing code yourself or by
> financing the reprioritization of the roadmap.
>
> Disk overflow or queue paging is on the mid term roadmap as something
> we are going to do, but we still need to gather requirements.
>
> Here are a few examples:
>
> - When do you decide to page things to disk?

I'd say memory high watermark reached or, if reached before, a number
of messages in the queue.

> - Is it done on memory consumption or queue depth?

Both. Depends on which one is reached first, I can see use cases for
both of these triggers.

> - Is this configurable per queue or across the broker?

Across the broker. Other AMQP implementations have a 'maximum number
of messages in a queue' as a per-broker option.

> - What are sensible defaults so that people who haven't even thought
>  about paging don't get affected by overly-aggressive defaults?

A user who was not affected before won't be affected afterwards either, because
they won't need to overflow to disk anyway. If the limit in question is the number
of messages, simply set it to infinite when nothing different is specified. Then
only the memory watermark applies, and this gives them more robustness:
instead of simply crashing the Erlang VM because memory has run out, the broker
slows down because it is now reading from disk.

> - When and how do you swap back in - is this automatic or manual?

When the buffer in the disk is empty then rabbitmq can stop using it, I expect
it to deliver messages in order so once the buffer kicks in I'd queue stuff on
disk immediately.

> - How do you decide when the low water mark has been reached after
>  having commenced the page-in? Do you resume flow control at this point?

When the disk buffer is empty. And you simply resume working without it.
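
The answers so far (spill once a limit is hit, keep delivery in order, stop using the disk buffer once it is empty) could be sketched roughly as follows. This is only an illustration, not RabbitMQ code: the module name is made up, the limit is a simple message count rather than a memory watermark, and the "disk" buffer is simulated with a second in-memory queue.

```erlang
-module(overflow_sketch).
-export([new/1, in/2, out/1, on_disk/1]).

%% limit: maximum number of messages held in memory
%% mem:   in-memory part of the queue (always the oldest messages)
%% disk:  overflow part; a second queue stands in for real disk storage
-record(buf, {limit, mem = queue:new(), disk = queue:new()}).

new(MaxInMemory) when is_integer(MaxInMemory), MaxInMemory > 0 ->
    #buf{limit = MaxInMemory}.

%% Once anything is on disk, every new message also goes to disk, so
%% that messages in memory are always older than messages on disk.
in(Msg, #buf{limit = Limit, mem = Mem, disk = Disk} = B) ->
    case queue:is_empty(Disk) andalso queue:len(Mem) < Limit of
        true  -> B#buf{mem = queue:in(Msg, Mem)};
        false -> B#buf{disk = queue:in(Msg, Disk)}
    end.

%% Deliver from memory first, then from the disk buffer. When the disk
%% buffer drains, in/2 automatically goes back to using memory.
out(#buf{mem = Mem, disk = Disk} = B) ->
    case queue:out(Mem) of
        {{value, Msg}, Mem1} ->
            {{value, Msg}, B#buf{mem = Mem1}};
        {empty, _} ->
            case queue:out(Disk) of
                {{value, Msg}, Disk1} ->
                    {{value, Msg}, B#buf{disk = Disk1}};
                {empty, _} ->
                    {empty, B}
            end
    end.

%% Number of messages currently in the (simulated) disk buffer.
on_disk(#buf{disk = Disk}) ->
    queue:len(Disk).
```

The invariant that memory never accepts a message while the disk buffer is non-empty is what keeps delivery in FIFO order without any extra bookkeeping.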

> - If you do page, are you interested in last image caching?

I'm not sure what this means but looks like an optimization, it's probably
interesting to have but caching can come when the system can resist
prolonged consumer downtimes.

Anyway being an optimization in the queue process it should be fairly
isolated in it. I can totally see keeping the current page always in memory
(if this is the optimization we are talking about), the big problem is keeping
everything in memory.

> - Do you want to apply application level heuristics to selectively purge
>  overflowed queues?

ActiveMQ offers basically 3 different ways to deal with the problem:
kill the queue, drop all new messages, drop old messages.

This can be taken care of in the publish command of rabbitmq.
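
As an illustration of those three policies applied at publish time, here is a small sketch. The policy atoms (kill_queue, drop_new, drop_old), the return values and the module name are assumptions made up for this example; they are not RabbitMQ or ActiveMQ options.

```erlang
-module(policy_sketch).
-export([publish/4]).

%% publish(Msg, Queue, MaxLen, Policy) ->
%%     {ok, Queue1} | {dropped, Queue} | queue_killed
%% Policy is one of: drop_new | drop_old | kill_queue (hypothetical names).
publish(Msg, Q, MaxLen, Policy) when is_integer(MaxLen), MaxLen > 0 ->
    case queue:len(Q) < MaxLen of
        true ->
            {ok, queue:in(Msg, Q)};
        false ->
            overflow(Policy, Msg, Q)
    end.

%% Refuse the incoming message and keep the queue unchanged.
overflow(drop_new, _Msg, Q) ->
    {dropped, Q};
%% Discard the oldest message to make room for the new one.
overflow(drop_old, Msg, Q) ->
    {ok, queue:in(Msg, queue:drop(Q))};
%% Tell the caller that the whole queue should be thrown away.
overflow(kill_queue, _Msg, _Q) ->
    queue_killed.
```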

> - What role do TTLs play in this scenario?

It's not the role of the message broker to kill single messages, it's an application
level decision. I can see a configuration option in the queue though that together
with the 3 options above can provide a 'kill all the messages older than X seconds'.

This can be taken care of inside the queue itself when getting the top of the queue.
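
A sketch of such an age check at the head of the queue. It assumes each message is stored together with its enqueue time, with Now and MaxAge in the same unit (seconds, say); the module and function names are hypothetical.

```erlang
-module(ttl_sketch).
-export([in/3, out/3]).

%% Store each message together with the time it was enqueued.
in(Msg, Now, Q) ->
    queue:in({Now, Msg}, Q).

%% Skip over expired messages at the head of the queue and return the
%% first one that is still young enough, or empty if none is left.
out(Now, MaxAge, Q) ->
    case queue:out(Q) of
        {{value, {T, _Expired}}, Q1} when Now - T > MaxAge ->
            out(Now, MaxAge, Q1);
        {{value, {_T, Msg}}, Q1} ->
            {{value, Msg}, Q1};
        {empty, Q1} ->
            {empty, Q1}
    end.
```

Expired messages are only discarded lazily, when a consumer asks for the next message, so no timers are needed.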

> - Furthermore, we do already have a fast message persister - it's just
>  that it's geared to write as quickly as possible, not read.

And this is great to me.

> - Ask yourself, if we implement paging, are we potentially reinventing
>  a wheel that the OS has already invented?

Well, sure. But Erlang fails first, when it cannot malloc memory.

> - And as always, what do you do when your SAN fills up?

If a 500-600GB disk fills up it means I had at least ~80 times more time to
fix the problem somehow, given an average of 8GB of memory in a webserver.
So if with 8GB of memory I had 10 hours to fix it, with a disk I have 30 days to
fix it, and I can even add new disks with a good disk array or filesystem to buy
me more time.
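
Working those figures through, assuming a constant fill rate: 600 GB / 8 GB is 75 times the capacity (about 62 times for 500 GB), so 10 hours of headroom in memory becomes roughly 26 to 31 days on disk, in line with the 30-day estimate. The helper below is only an illustration and its name is made up.

```erlang
-module(headroom_sketch).
-export([days/3]).

%% Days until the disk fills, given the number of hours the same
%% backlog takes to fill memory. Assumes a constant fill rate.
days(DiskGB, MemGB, HoursToFillMem) ->
    (DiskGB / MemGB) * HoursToFillMem / 24.
```

For example, headroom_sketch:days(600, 8, 10) evaluates to 31.25.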

> One suggestion to kick things off is to begin a more structured
> analysis of the whole problem on the wiki and start a dedicated
> discussion thread around this. For example, one could start a document
> highlighting the motivation and requirements and let interested
> parties comment on this.

Good, I'm all for this. And I'm starting with this reply.

> Having said that, whilst it may be straightforward to do a hello
> world overflow, the devil is in the detail of all of the moving parts
> and different scenarios that you have to account for whilst maintaining
> the clarity of the code base. And testing it of course.

Yep, this is just right.

Valentino Volonghi

Oct 30, 2008, 3:22:58 PM
to Ben Hood, rabbitmq
On Oct 28, 2008, at 4:31 AM, Ben Hood wrote:

> One suggestion to kick things off is to begin a more structured
> analysis of the whole problem on the wiki and start a dedicated
> discussion thread around this. For example, one could start a document
> highlighting the motivation and requirements and let interested
> parties comment on this.

I've started doing this in the rabbitmq wiki, hopefully nobody will
complain about it.
Here's a link to the page: https://dev.rabbitmq.com/wiki/DiskOverflow
Anyone should feel free to comment, edit, add new stuff, their use cases,
etc.

Ben Hood

Nov 2, 2008, 5:17:26 AM
to Valentino Volonghi, rabbitmq
Valentino,

On Thu, Oct 30, 2008 at 7:22 PM, Valentino Volonghi <dial...@gmail.com> wrote:
> I've started doing this in the rabbitmq wiki, hopefully nobody will
> complain about it.
> Here's a link to the page: https://dev.rabbitmq.com/wiki/DiskOverflow
> Anyone should feel free to comment, edit, add new stuff, their use cases,
> etc.

This is exactly what the wiki is for - to be able to transition
loosely structured discussions about new features into something
cohesive that you can use to implement the feature, as opposed to just
hacking it down.

I've just commented on this inline:

https://dev.rabbitmq.com/wiki/DiskOverflow/diff?v1=20081030214344-7b340-374fd9ac3526c09882d27bdebc00a9988b1c2f39.gz&v2=20081102101409-7b340-d967384f374748b08950ee77b934cca4bf5fa093.gz

For the information of others, you can subscribe to the page to
receive update notifications.

And BTW, thanks for making the effort, this kind of thing helps us out a lot.

Ben Hood

Dec 10, 2008, 6:38:02 AM
to Edwin Fine, rabbitmq
Ed,

On Mon, Oct 27, 2008 at 4:30 PM, Ben Hood <0x6e...@gmail.com> wrote:
> On Mon, Oct 27, 2008 at 3:33 PM, Edwin Fine
> <rabbitmq-di...@usa.net> wrote:
>> Music to my ears. I would really, really appreciate something like that.
>
> Ok, I've roadmapped it for consideration in the 1.0 release of the
> Erlang client (when a patch transpires 19630 will refer).

I forgot to mention the other day that the handling for this has been
updated with the latest mainline.

It is known as 19625 and basically handles a forced connection closure more gracefully.

It still doesn't contain a shutdown handler, which will be in a
separate branch (19630) when something gets done on this.

Edwin Fine

Dec 10, 2008, 7:30:28 AM
to Ben Hood, rabbitmq
Thanks for the info, Ben.

One more question: Without using transactions, is there a way to basic.publish using a call (and getting either a positive or negative response from the server), rather than just doing a "blind" cast? I think we talked about this but I can't fully recall the conclusion. IIRC, you said you would add a call for it?

Regards,
Ed

Ben Hood

Dec 10, 2008, 7:42:15 AM
to Edwin Fine, rabbitmq
Ed,

On Wed, Dec 10, 2008 at 12:30 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> One more question: Without using transactions, is there a way to
> basic.publish using a call (and getting either a positive or negative
> response from the server), rather than just doing a "blind" cast? I think we
> talked about this but I can't fully recall the conclusion. IIRC, you said
> you would add a call for it?

Yep, I've added this in 19560 (which also has some descendants,
e.g. 19334, 19625, 19373).
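
For readers unfamiliar with the distinction, here is a generic gen_server sketch of a "blind" cast versus a call that returns a verdict. It does not show the API added in 19560; the module, the function names and the rule that empty payloads are rejected are all made up for illustration.

```erlang
-module(publish_sketch).
-behaviour(gen_server).

-export([start/0, publish_cast/2, publish_call/2]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical stand-in for a channel process.
start() ->
    gen_server:start(?MODULE, [], []).

%% Fire and forget: always returns ok, whatever the server decides.
publish_cast(Pid, Payload) ->
    gen_server:cast(Pid, {publish, Payload}).

%% Synchronous: blocks until the server replies ok or {error, Reason}.
publish_call(Pid, Payload) ->
    gen_server:call(Pid, {publish, Payload}).

init([]) ->
    {ok, []}.

handle_call({publish, Payload}, _From, Accepted) ->
    case acceptable(Payload) of
        true  -> {reply, ok, [Payload | Accepted]};
        false -> {reply, {error, rejected}, Accepted}
    end.

handle_cast({publish, Payload}, Accepted) ->
    case acceptable(Payload) of
        true  -> {noreply, [Payload | Accepted]};
        false -> {noreply, Accepted}   % the sender never finds out
    end.

handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.

%% Made-up acceptance rule: only non-empty binaries are accepted.
acceptable(Payload) ->
    is_binary(Payload) andalso byte_size(Payload) > 0.
```

The call costs one round trip to the server process per message, which is the price of learning whether the publish was accepted.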

HTH,
