[rabbitmq-discuss] Patching rabbitmq to get more information about framing exceptions

18 views
Skip to first unread message

Michael Klishin

unread,
Jul 20, 2011, 5:30:56 PM7/20/11
to rabbitmq...@lists.rabbitmq.com
Hi,

I am investigating a rare (but nonetheless top priority) amqp gem issue when frames are delivered out of order. Neither the Tracer tool nor Wireshark give me
good understanding of what frame exactly RabbitMQ chokes on, it just says "expected a method frame, got a non-method frame instead".

So I am trying to patch RabbitMQ to get some more information. What's the right way to do it? Right now I am trying to make
rabbit_command_assembler:process/2 do what I want and I am not sure how to get a string representation of

1. Method that was delivered out of order.
2. # of the channel on which it happened
3. Ideally, 10-20 most recent frames

Can I ask someone from the RabbitMQ team help me a little bit? Where should I be looking? Is it a good idea to make Rabbit log such
issues with more detail (it shouldn't affect hot code path performance anyway)?

At this point I have an app that reproduces the issue roughly every other time and I am using a Rabbit instance compiled from HEAD sources.
--
MK

http://github.com/michaelklishin
http://twitter.com/michaelklishin

Matthias Radestock

unread,
Jul 20, 2011, 6:01:20 PM7/20/11
to Michael Klishin, rabbitmq...@lists.rabbitmq.com
Michael,

On 20/07/11 22:30, Michael Klishin wrote:
> I am investigating a rare (but nonetheless top priority) amqp gem issue
> when frames are delivered out of order. Neither the Tracer tool nor
> Wireshark give me
> good understanding of what frame exactly RabbitMQ chokes on, it just
> says "expected a method frame, got a non-method frame instead".

That means the server got a content header or body frame, which in case
of client->server communication can only be as part of a publish. The
most likely cause is the client sending more content body frames than it
should, e.g. a trailing empty frame.

It should be possible to spot that from the tracer / wireshark.

> Is it a good idea to make Rabbit log such issues with more detail (it
> shouldn't affect hot code path performance anyway)?

We wouldn't want to log complete frame contents since that would
introduce a major DoS attack vector. Having said that, the error that is
logged (and indeed returned to the client) should include some more info
than it currently does. I have filed a bug for that.

Regards,

Matthias.
_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Michael Klishin

unread,
Jul 20, 2011, 6:09:16 PM7/20/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
2011/7/21 Matthias Radestock <matt...@rabbitmq.com>
That means the server got a content header or body frame, which in case of client->server communication can only be as part of a publish. The most likely cause is the client sending more content body frames than it should, e.g. a trailing empty frame.

It should be possible to spot that from the tracer / wireshark.


Yes, it is purely a publishing issue and the idea of extra frames has crossed my mind. Wireshark doesn't prove it though.
 

Is it a good idea to make Rabbit log such issues with more detail (it
shouldn't affect hot code path performance anyway)?

We wouldn't want to log complete frame contents since that would introduce a major DoS attack vector. Having said that, the error that is logged (and indeed returned to the client) should include some more info than it currently does. I have filed a bug for that.

I still need to find a way to patch my local rabbit copy to get more information. What's a good way to keep a few recent frames around? Is it possible to implement as a plugin?

Matthias Radestock

unread,
Jul 20, 2011, 6:33:48 PM7/20/11
to Michael Klishin, rabbitmq...@lists.rabbitmq.com
Michael,

On 20/07/11 23:09, Michael Klishin wrote:
> 2011/7/21 Matthias Radestock <matt...@rabbitmq.com
> <mailto:matt...@rabbitmq.com>>


>
> That means the server got a content header or body frame, which in
> case of client->server communication can only be as part of a
> publish. The most likely cause is the client sending more content
> body frames than it should, e.g. a trailing empty frame.
>
> It should be possible to spot that from the tracer / wireshark.
>
>
> Yes, it is purely a publishing issue and the idea of extra frames has
> crossed my mind. Wireshark doesn't prove it though.

Wireshark should have all the information you need. It probably won't
call this issue out as an error though. Ditto for the tracer.

> I still need to find a way to patch my local rabbit copy to get more
> information. What's a good way to keep a few recent frames around?

Given that you can reproduce the problem easily, you may just want to
log some details of all the frames - frame type, class id, method id,
and for content headers the expected body size, and for content bodies
the payload size.

> Is it possible to implement as a plugin?

No. Well, technically yes, but you are much better off just modifying
the code in rabbit_command_assembler.

Michael Klishin

unread,
Jul 20, 2011, 6:42:28 PM7/20/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
2011/7/21 Matthias Radestock <matt...@rabbitmq.com>

Given that you can reproduce the problem easily, you may just want to log some details of all the frames - frame type, class id, method id, and for content headers the expected body size, and for content bodies the payload size.

That's what I am doing now. One thing that has caught my attention is a 0-sized BodyFrame. that did not cause any exceptions to be raised (for good 10 seconds or so, anyway)
but there is absolutely no way I am publishing empty payloads. So you are likely to be right. We also have a long-stanging issue (not 0.8.x specific though) with empty messages.

I will try to investigate more on the client side and take a look at the Java client framing code. Thank you for your help. I did not suspect extra frames to be an issue initially.

Michael Klishin

unread,
Jul 20, 2011, 10:51:28 PM7/20/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
2011/7/21 Matthias Radestock <matt...@rabbitmq.com>

Given that you can reproduce the problem easily, you may just want to log some details of all the frames - frame type, class id, method id, and for content headers the expected body size, and for content bodies the payload size.

So far I narrowed it down to the "empty messages" problem: whenever there is an empty message sent by amqp gem master version, RabbitMQ
doesn't consider it a complete message and thus clients do not get anything. With a simple script that terminates as soon as the message is received
(by the same process), this looks like app is hanging. In the context of a larger application, sometimes there are empty messages transmitted (I am streaming a stdout output).

After comparing 2 clients I see the following picture: the data transmitted is exactly the same, frame order is correct but Wireshark GUI suggests that
one client transmits 3 frames as 3 IP packets. In this case, RabbitMQ is happy. The other transmits them all in a single IP packet and RabbitMQ doesn't seem to recognize it as a complete message.

Here are two Wireshark session exports:
https://gist.github.com/1096404

If the [frames] content is the same, what may be going on here?

Michael Klishin

unread,
Jul 21, 2011, 12:08:59 AM7/21/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com
2011/7/21 Michael Klishin <michael....@gmail.com>

Here are two Wireshark session exports:
https://gist.github.com/1096404

One more session: https://gist.github.com/1096472

Matthias Radestock

unread,
Jul 21, 2011, 2:31:01 AM7/21/11
to Michael Klishin, rabbitmq...@lists.rabbitmq.com
Michael,

On 21/07/11 03:51, Michael Klishin wrote:
> So far I narrowed it down to the "empty messages" problem: whenever
> there is an empty message sent by amqp gem master version, RabbitMQ
> doesn't consider it a complete message and thus clients do not get
> anything. With a simple script that terminates as soon as the message is
> received
> (by the same process), this looks like app is hanging. In the context of
> a larger application, sometimes there are empty messages transmitted (I
> am streaming a stdout output).
>
> After comparing 2 clients I see the following picture: the data
> transmitted is exactly the same, frame order is correct but Wireshark
> GUI suggests that
> one client transmits 3 frames as 3 IP packets. In this case, RabbitMQ is
> happy. The other transmits them all in a single IP packet and RabbitMQ
> doesn't seem to recognize it as a complete message.
>
> Here are two Wireshark session exports:
> https://gist.github.com/1096404
>
> If the [frames] content is the same, what may be going on here?

That's curious. Both should cause rabbit to complain with the "expected
method frame, got non method frame instead" error. Are you sure you are
seeing different behaviour here? If so, please send through the pcap
files showing the entire tcp session.

Content body frames should never be zero length. If the content length
is zero then no content body frame should be sent at all, just a header.

Michael Klishin

unread,
Jul 21, 2011, 6:15:12 AM7/21/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com


2011/7/21 Matthias Radestock <matt...@rabbitmq.com>


That's curious. Both should cause rabbit to complain with the "expected method frame, got non method frame instead" error. Are you sure you are seeing different behaviour here? If so, please send through the pcap files showing the entire tcp session.

Yes, I am very much sure. I have spent hours investigating this thing and ran it probably hundreds of times. Will do.
 

Content body frames should never be zero length. If the content length is zero then no content body frame should be sent at all, just a header.

Ah, yes, I read this line in the spec but then the fact that another Ruby client that *also sends an empty content body frame* works fine confused me.

So for a blank string payload, the sequence is always

[method frame] [content header frame]

not

[method frame] [content header frame] [content body frame with size = 0]

correct?

Matthias Radestock

unread,
Jul 21, 2011, 6:21:35 AM7/21/11
to Michael Klishin, rabbitmq...@lists.rabbitmq.com
On 21/07/11 11:15, Michael Klishin wrote:
> So for a blank string payload, the sequence is always
>
> [method frame] [content header frame]
>
> not
>
> [method frame] [content header frame] [content body frame with size = 0]
>
> correct?

Correct.

Michael Klishin

unread,
Jul 21, 2011, 7:28:18 AM7/21/11
to Matthias Radestock, rabbitmq...@lists.rabbitmq.com


2011/7/21 Matthias Radestock <matt...@rabbitmq.com>

On 21/07/11 11:15, Michael Klishin wrote:
So for a blank string payload, the sequence is always

[method frame] [content header frame]

not

[method frame] [content header frame] [content body frame with size = 0]

correct?

Correct.

Thanks Matthias. This was a one-line fix and it makes multiple issues that I has been investigating for some time go away.
New amq-protocol gem release is on its way to rubygems.org.

I am going to send you pcap dumps that demonstrate the aforementioned rabbit behavior with respect to # of IP packets
later today or tomorrow.

Tony Garnock-Jones

unread,
Jul 21, 2011, 11:23:27 AM7/21/11
to Michael Klishin, rabbitmq...@lists.rabbitmq.com
On 21 July 2011 06:15, Michael Klishin <michael....@gmail.com> wrote:
Ah, yes, I read this line in the spec but then the fact that another Ruby client that *also sends an empty content body frame* works fine confused me.

I could have sworn I made a pull-request with this commit https://github.com/tonyg/amqp/commit/d92a455dcc6763a4495f512f0d74f667354e5a60 some time ago, but I can't for the life of me find it now. Just FTR. It might be worth comparing with the solution you've adopted.

Regards,
  Tony

Michael Klishin

unread,
Jul 21, 2011, 2:08:04 PM7/21/11
to Tony Garnock-Jones, rabbitmq...@lists.rabbitmq.com
2011/7/21 Tony Garnock-Jones <tonygarnockj...@gmail.com>

I could have sworn I made a pull-request with this commit https://github.com/tonyg/amqp/commit/d92a455dcc6763a4495f512f0d74f667354e5a60 some time ago, but I can't for the life of me find it now. Just FTR. It might be worth comparing with the solution you've adopted.

Tony,

We discovered a separate (although similar) issue: framing on publishing. From what I see, your patch was for content boundary detection when frames are received.

But thanks for the example, I will try to do some thorough amq-protocol gem code review and compare it to a few other clients like the Java client and Pika.
Reply all
Reply to author
Forward
0 new messages