[rabbitmq-discuss] Erlang client: function_clause error

Edwin Fine

Oct 26, 2008, 1:55:53 PM

to rabbitmq

I shut RabbitMQ down while an application still had connections to it (I am doing various recovery scenarios), and I got this:

** Reason for termination ==
** {function_clause,
       [{amqp_connection,handle_info,
            [{method,
                 {'connection.close',320,
                     <<"CONNECTION_FORCED - broker forced connection closure with reason 'shutdown'">>,
                     0,0},
                 none},
             {connection_state,<<"xhg">>,<<"xhg">>,"0.0.0.0",
                 #Port<0.230>,<<"/xhg">>,<0.150.0>,<0.151.0>,undefined,0,0,
                 {dict,0,16,16,8,80,48,
                     {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                     {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}}]},
        {gen_server,handle_msg,5},
        {proc_lib,init_p,5}]}

Should there be a handle_info clause for this in amqp_connection, or is it something I need to code for somehow?

Regards,
Edwin Fine

Ben Hood

Oct 27, 2008, 8:04:02 AM

to Edwin Fine, rabbitmq

Edwin,

On Sun, Oct 26, 2008 at 5:55 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Should there be a handle_info clause for this in amqp_connection, or is it
> something I need to code for somehow?

This is a bug in the amqp_connection module; it should be handling
this message from the broker. I have started to fix this (19625
refers), but I'm going to have to think about the event propagation.
The current patch will at least handle the message.
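
A minimal sketch of what such a clause could look like, assuming a plain
gen_server. This is not the actual amqp_connection code or the 19625 patch:
the module name, the locally defined record (which only mirrors the
5-element 'connection.close' tuple in the trace above, with assumed field
names) and the chosen stop reason are all hypothetical.

```erlang
-module(conn_close_sketch).
-behaviour(gen_server).

-export([start/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% Hypothetical record mirroring the tuple seen in the crash report:
%% {'connection.close', 320, <<"CONNECTION_FORCED ...">>, 0, 0}
-record('connection.close', {reply_code, reply_text, class_id, method_id}).

start() ->
    gen_server:start(?MODULE, [], []).

init([]) ->
    {ok, undefined}.

handle_call(_Request, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% The clause that was missing: the broker announces that it is closing
%% the connection. Stop with a {shutdown, _} reason so that OTP does not
%% treat the exit as a crash, while monitors still see the code and text.
handle_info({method,
             #'connection.close'{reply_code = Code, reply_text = Text},
             none},
            State) ->
    {stop, {shutdown, {server_initiated_close, Code, Text}}, State};
%% Ignore anything else instead of failing with function_clause.
handle_info(_Other, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

A process that monitors or links to this server receives the reply code and
text in the exit reason, which is one possible way to propagate the event.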

HTH,

Ben

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
http://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Edwin Fine

Oct 27, 2008, 10:06:08 AM

to Ben Hood, rabbitmq

Thanks, Ben.

Do you have any general suggestions as to how to recover cleanly from multiple connections dying in an application because the broker went down? Ideally, I'd like to be able to recover gracefully and not have to crash processes unnecessarily.

Regards,
Edwin

Ben Hood

Oct 27, 2008, 11:06:14 AM

to Edwin Fine, rabbitmq

Edwin,

On Mon, Oct 27, 2008 at 2:06 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Do you have any general suggestions as to how to recover cleanly from
> multiple connections dying in an application because the broker went down?
> Ideally, I'd like to be able to recover gracefully and not have to crash
> processes unnecessarily.

When you say multiple connections in an application, are you referring
to multiple TCP connections or multiple AMQP channels?

Valentino started a thread on a related topic so he may be able to
chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html

I'm not quite sure whether the discussion was about how to supervise
client connections using an OTP tree or whether I had that discussion
with somebody else.....if it was, could that person please chip in
here?

Edwin Fine

Oct 27, 2008, 11:08:45 AM

to Ben Hood, rabbitmq

On Mon, Oct 27, 2008 at 11:06 AM, Ben Hood <0x6e...@gmail.com> wrote:
> On Mon, Oct 27, 2008 at 2:06 PM, Edwin Fine
> <rabbitmq-di...@usa.net> wrote:
>> Do you have any general suggestions as to how to recover cleanly from
>> multiple connections dying in an application because the broker went down?
>> Ideally, I'd like to be able to recover gracefully and not have to crash
>> processes unnecessarily.
>
> When you say multiple connections in an application, are you referring
> to multiple TCP connections or multiple AMQP channels?

Both, actually. I have a connection pool of TCP connections to a rabbit broker, each connection of which supports multiple channels.

> Valentino started a thread on a related topic so he may be able to
> chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html

I'll take a look.

Ben Hood

Oct 27, 2008, 11:24:43 AM

to Edwin Fine, rabbitmq

Edwin,

On Mon, Oct 27, 2008 at 3:08 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Both, actually. I have a connection pool of TCP connections to a rabbit
> broker, each connection of which supports multiple channels.

One thing that the Erlang client doesn't have which the other clients
do is a facility to register a shutdown handler with the AMQP
connection. Maybe we should look into doing this.

Edwin Fine

Oct 27, 2008, 11:33:36 AM

to Ben Hood, rabbitmq

On Mon, Oct 27, 2008 at 11:24 AM, Ben Hood <0x6e...@gmail.com> wrote:
> On Mon, Oct 27, 2008 at 3:08 PM, Edwin Fine
> <rabbitmq-di...@usa.net> wrote:
>> Both, actually. I have a connection pool of TCP connections to a rabbit
>> broker, each connection of which supports multiple channels.
>
> One thing that the Erlang client doesn't have which the other clients
> do is a facility to register a shutdown handler with the AMQP
> connection. Maybe we should look into doing this.

Music to my ears. I would really, really appreciate something like that.

Ben

Ben Hood

Oct 27, 2008, 12:30:01 PM

to Edwin Fine, rabbitmq

Edwin,

On Mon, Oct 27, 2008 at 3:33 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> Music to my ears. I would really, really appreciate something like that.

Ok, I've roadmapped it for consideration in the 1.0 release of the
Erlang client (when a patch transpires 19630 will refer).

Edwin Fine

Oct 27, 2008, 1:06:37 PM

to Ben Hood, rabbitmq

Ben,

Thanks. In the meantime, I may try to hack it for myself.

Regards,
Ed

Valentino Volonghi

Oct 27, 2008, 7:01:31 PM

to Edwin Fine, rabbitmq

On Oct 27, 2008, at 8:08 AM, Edwin Fine wrote:
>> Valentino started a thread on a related topic so he may be able to
>> chime in here: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2008-October/002105.html
>
> I'll take a look.

The way I solved this is simply to let the client crash and restart from the
supervisor. Then it starts a loop until it connects successfully to the other
broker. This makes the code really simple and very robust.

The only 'problem' is that in this way I'm basically trusting the problem
(connection error) to be solved before RabbitMQ goes out of memory.
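
The connect loop described above could look roughly like the following
sketch. It is an illustration under assumptions, not the code from this
setup: the module name is made up, and ConnectFun is a hypothetical
zero-arity fun that wraps whatever client call opens the connection and
returns {ok, Conn} or {error, Reason}.

```erlang
-module(reconnect_sketch).
-export([connect_with_retry/2]).

%% Keep calling ConnectFun until it yields {ok, Conn}, sleeping DelayMs
%% milliseconds between attempts. Both {error, _} results and exceptions
%% raised by ConnectFun count as failed attempts.
connect_with_retry(ConnectFun, DelayMs) ->
    try ConnectFun() of
        {ok, Conn} ->
            {ok, Conn};
        {error, _Reason} ->
            retry(ConnectFun, DelayMs)
    catch
        _Class:_Reason ->
            retry(ConnectFun, DelayMs)
    end.

retry(ConnectFun, DelayMs) ->
    timer:sleep(DelayMs),
    connect_with_retry(ConnectFun, DelayMs).
```

A supervised worker would call connect_with_retry/2 when it starts, so a
crash caused by a lost connection leads the supervisor to restart the worker,
which then waits in this loop until the broker is reachable again. There is
deliberately no attempt limit, which matches the trade-off mentioned above.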

Speaking of which... Is there any way (donations, manual labor, slavery or
such) that I can be of help to change the queue state to a new object that
knows how to persist without using all the memory up?

From what I see by reading the code I think that the change would be isolated
inside rabbit_amqqueue_process.erl and the exact variable is message_buffer.
An object with a similar API but a different storage strategy could maybe be
swapped in instead of the current queue.

--

Valentino Volonghi aka Dialtone

Now running MacOS X 10.5

Home Page: http://www.twisted.it

http://www.adroll.com


Ben Hood

Oct 28, 2008, 7:31:08 AM

to Valentino Volonghi, rabbitmq

Valentino,

On Mon, Oct 27, 2008 at 11:01 PM, Valentino Volonghi <dial...@gmail.com> wrote:
> Speaking of which... Is there any way (donations, manual labor, slavery or
> such) that
> I can be of help to change the queue state to a new object that knows how to
> persist
> without using all the memory up?

In general, you can help by either contributing code yourself or by
financing the reprioritization of the roadmap.

Disk overflow or queue paging is on the mid term roadmap as something
we are going to do, but we still need to gather requirements.

Here are a few examples:

- When do you decide to page things to disk?
- Is it done on memory consumption or queue depth?
- Is this configurable per queue or across the broker?
- What are sensible defaults so that people who haven't even thought
about paging don't get affected by overly-aggressive defaults?
- When and how do you swap back in - is this automatic or manual?
- How do you decide when the low water mark has been reached after
having commenced the page-in? Do you resume flow control at this point?
- If you do page, are you interested in last image caching?
- Do you want to apply application level heuristics to selectively purge
overflowed queues?
- What role do TTLs play in this scenario?
- Furthermore, we do already have a fast message persister - it's just
that it's geared to write as quickly as possible, not read.
- Ask yourself, if we implement paging, are we potentially reinventing
a wheel that the OS has already invented?
- And as always, what do you do when your SAN fills up?

One suggestion to kick things off is to begin a more structured
analysis of the whole problem on the wiki and start a dedicated
discussion thread around this. For example, one could start a document
highlighting the motivation and requirements and let interested
parties comment on this.

> From what I see by reading the code I think that the change would be
> isolated inside
> rabbit_amqqueue_process.erl and the exact variable is message_buffer. An
> object
> with a similar API but a different storage strategy could maybe be swapped
> in instead
> of the current queue.

True - the intention is to have a code base that is as short as
possible so that it can be easily understood and adapted.

Having said that, whilst it may be straightforward to do a hello
world overflow, the devil is in the detail of all of the moving parts
and different scenarios that you have to account for whilst maintaining
the clarity of the code base. And testing it of course.

HTH,

Ben

Valentino Volonghi

Oct 28, 2008, 2:53:17 PM

to Ben Hood, rabbitmq

On Oct 28, 2008, at 4:31 AM, Ben Hood wrote:
> In general, you can help by either contributing code yourself or by
> financing the reprioritization of the roadmap.
>
> Disk overflow or queue paging is on the mid term roadmap as something
> we are going to do, but we still need to gather requirements.
>
> Here are a few examples:
>
> - When do you decide to page things to disk?

I'd say when the memory high watermark is reached or, if that happens first,
when a given number of messages is in the queue.

> - Is it done on memory consumption or queue depth?

Both. It depends on which one is reached first; I can see use cases for both
of these triggers.

> - Is this configurable per queue or across the broker?

Across the broker. Other AMQP implementations have a 'maximum number of
messages in a queue' as a per-broker option.

> - What are sensible defaults so that people who haven't even thought
> about paging don't get affected by overly-aggressive defaults?

A user who was not affected before won't be affected afterwards, because he
won't need to overflow to disk anyway. If the limit in question is the number
of messages, then simply set it to infinite when nothing different is
specified. Then only the memory watermark applies, and this gives users more
robustness: instead of simply crashing the Erlang VM because memory is
exhausted, the broker slows down because it is now reading from disk.

> - When and how do you swap back in - is this automatic or manual?

When the buffer on disk is empty, RabbitMQ can stop using it. I expect it to
deliver messages in order, so once the buffer kicks in I'd queue stuff on
disk immediately.

> - How do you decide when the low water mark has been reached after
> having commenced the page-in? Do you resume flow control at this point?

When the disk buffer is empty. And you simply resume working without it.

> - If you do page, are you interested in last image caching?

I'm not sure what this means, but it looks like an optimization. It's probably
interesting to have, but caching can come once the system can withstand
prolonged consumer downtimes.

Anyway, being an optimization in the queue process, it should be fairly
isolated in it. I can totally see keeping the current page always in memory
(if this is the optimization we are talking about); the big problem is keeping
everything in memory.

> - Do you want to apply application level heuristics to selectively purge
> overflowed queues?

ActiveMQ offers basically 3 different ways to deal with the problem:
kill the queue, drop all new messages, or drop old messages.
This can be taken care of in the publish command of RabbitMQ.

> - What role do TTLs play in this scenario?

It's not the role of the message broker to kill single messages; it's an
application-level decision. I can see a configuration option on the queue,
though, that together with the 3 options above can provide a 'kill all the
messages older than X seconds' behaviour. This can be taken care of inside
the queue itself when getting the top of the queue.

> - Furthermore, we do already have a fast message persister - it's just
> that it's geared to write as quickly as possible, not read.

And this is great to me.

> - Ask yourself, if we implement paging, are we potentially reinventing
> a wheel that the OS has already invented?

Well, sure. But Erlang fails before that, when it cannot malloc memory.

> - And as always, what do you do when your SAN fills up?

If a 500-600GB disk fills up, it means I had at least ~80 times more time to
fix the problem somehow, given an average of 8GB of memory in a webserver.
So if with 8GB of memory I had 10 hours to fix it, with a disk I have 30 days
to fix it, and I can even add new disks with a good disk array or filesystem
to buy me more time.

> One suggestion to kick things off is to begin a more structured
> analysis of the whole problem on the wiki and start a dedicated
> discussion thread around this. For example, one could start a document
> highlighting the motivation and requirements and let interested
> parties comment on this.

Good, I'm all for this. And I'm starting with this reply.

> Having said that, whilst it may be straightforward to do a hello
> world overflow, the devil is in the detail of all of the moving parts
> and different scenarios that you have to account for whilst maintaining
> the clarity of the code base. And testing it of course.

Yep, this is just right.
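
The trigger proposed in this reply (page to disk once either the memory high
watermark or a message count is reached, with the message count defaulting
to infinite) can be sketched as a small predicate. This is only an
illustration of the rule as stated here, not RabbitMQ code; the module name,
function name and the shape of the limits tuple are assumptions.

```erlang
-module(paging_trigger_sketch).
-export([should_page/3]).

%% Limits is {MemHighWatermarkBytes, MaxMessages}, where MaxMessages is
%% either a non-negative integer or the atom infinity (the proposed
%% default when nothing else is configured).
%%
%% Returns true as soon as either limit has been reached.
should_page(MemBytes, _Depth, {MemLimit, _MaxMessages})
  when MemBytes >= MemLimit ->
    true;
should_page(_MemBytes, _Depth, {_MemLimit, infinity}) ->
    %% No message limit configured: only the memory watermark applies.
    false;
should_page(_MemBytes, Depth, {_MemLimit, MaxMessages}) ->
    Depth >= MaxMessages.
```

With the default of infinity, a broker that never reaches its memory
watermark never pages, so users who were unaffected before see no change in
behaviour.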


Valentino Volonghi

Oct 30, 2008, 3:22:58 PM

to Ben Hood, rabbitmq

On Oct 28, 2008, at 4:31 AM, Ben Hood wrote:
> One suggestion to kick things off is to begin a more structured
> analysis of the whole problem on the wiki and start a dedicated
> discussion thread around this. For example, one could start a document
> highlighting the motivation and requirements and let interested
> parties comment on this.

I've started doing this in the rabbitmq wiki, hopefully nobody will
complain about it.

Here's a link to the page: https://dev.rabbitmq.com/wiki/DiskOverflow

Anyone should feel free to comment, edit, add new stuff, their usecases
etc etc.


Ben Hood

Nov 2, 2008, 5:17:26 AM

to Valentino Volonghi, rabbitmq

Valentino,

On Thu, Oct 30, 2008 at 7:22 PM, Valentino Volonghi <dial...@gmail.com> wrote:
> I've started doing this in the rabbitmq wiki, hopefully nobody will
> complain about it.
> Here's a link to the page: https://dev.rabbitmq.com/wiki/DiskOverflow
> Anyone should feel free to comment, edit, add new stuff, their usecases
> etc etc.

This is exactly what the wiki is for - to be able to transition
loosely structured discussions about new features into something
cohesive that you can use to implement the feature, as opposed to just
hacking it down.

I've just commented on this inline:

https://dev.rabbitmq.com/wiki/DiskOverflow/diff?v1=20081030214344-7b340-374fd9ac3526c09882d27bdebc00a9988b1c2f39.gz&v2=20081102101409-7b340-d967384f374748b08950ee77b934cca4bf5fa093.gz

For the information of others, you can subscribe to the page to
receive update notifications.

And BTW, thanks for making the effort, this kind of thing helps us out a lot.

Ben Hood

Dec 10, 2008, 6:38:02 AM

to Edwin Fine, rabbitmq

Ed,

On Mon, Oct 27, 2008 at 4:30 PM, Ben Hood <0x6e...@gmail.com> wrote:
> On Mon, Oct 27, 2008 at 3:33 PM, Edwin Fine
> <rabbitmq-di...@usa.net> wrote:
>> Music to my ears. I would really, really appreciate something like that.
>
> Ok, I've roadmapped it for consideration in the 1.0 release of the
> Erlang client (when a patch transpires 19630 will refer).

I forgot to mention the other day that the handling for this has been
updated with the latest mainline.

It is known as 19625 and basically handles a forced connection more gracefully.

It still doesn't contain a shutdown handler, which will be in a
separate branch (19630) when something gets done on this.

Edwin Fine

Dec 10, 2008, 7:30:28 AM

to Ben Hood, rabbitmq

Thanks for the info, Ben.

One more question: Without using transactions, is there a way to basic.publish using a call (and getting either a positive or negative response from the server), rather than just doing a "blind" cast? I think we talked about this but I can't fully recall the conclusion. IIRC, you said you would add a call for it?

Regards,
Ed

Ben Hood

Dec 10, 2008, 7:42:15 AM

to Edwin Fine, rabbitmq

Ed,

On Wed, Dec 10, 2008 at 12:30 PM, Edwin Fine
<rabbitmq-di...@usa.net> wrote:
> One more question: Without using transactions, is there a way to
> basic.publish using a call (and getting either a positive or negative
> response from the server), rather than just doing a "blind" cast? I think we
> talked about this but I can't fully recall the conclusion. IIRC, you said
> you would add a call for it?

Yep, I've added this in 19560 (which also has some descendants,
e.g. 19334, 19625, 19373).

HTH,
