I am investigating a rare (but nonetheless top priority) amqp gem issue when frames are delivered out of order. Neither the Tracer tool nor Wireshark give me good understanding of what frame exactly RabbitMQ chokes on, it just says "expected a method frame, got a non-method frame instead".
So I am trying to patch RabbitMQ to get some more information. What's the right way to do it? Right now I am trying to make rabbit_command_assembler:process/2 do what I want and I am not sure how to get a string representation of
1. Method that was delivered out of order. 2. # of the channel on which it happened 3. Ideally, 10-20 most recent frames
Can I ask someone from the RabbitMQ team help me a little bit? Where should I be looking? Is it a good idea to make Rabbit log such issues with more detail (it shouldn't affect hot code path performance anyway)?
At this point I have an app that reproduces the issue roughly every other time and I am using a Rabbit instance compiled from HEAD sources. -- MK
> I am investigating a rare (but nonetheless top priority) amqp gem issue > when frames are delivered out of order. Neither the Tracer tool nor > Wireshark give me > good understanding of what frame exactly RabbitMQ chokes on, it just > says "expected a method frame, got a non-method frame instead".
That means the server got a content header or body frame, which in case of client->server communication can only be as part of a publish. The most likely cause is the client sending more content body frames than it should, e.g. a trailing empty frame.
It should be possible to spot that from the tracer / wireshark.
> Is it a good idea to make Rabbit log such issues with more detail (it > shouldn't affect hot code path performance anyway)?
We wouldn't want to log complete frame contents since that would introduce a major DoS attack vector. Having said that, the error that is logged (and indeed returned to the client) should include some more info than it currently does. I have filed a bug for that.
> That means the server got a content header or body frame, which in case of > client->server communication can only be as part of a publish. The most > likely cause is the client sending more content body frames than it should, > e.g. a trailing empty frame.
> It should be possible to spot that from the tracer / wireshark.
Yes, it is purely a publishing issue and the idea of extra frames has crossed my mind. Wireshark doesn't prove it though.
> Is it a good idea to make Rabbit log such issues with more detail (it >> shouldn't affect hot code path performance anyway)?
> We wouldn't want to log complete frame contents since that would introduce > a major DoS attack vector. Having said that, the error that is logged (and > indeed returned to the client) should include some more info than it > currently does. I have filed a bug for that.
I still need to find a way to patch my local rabbit copy to get more information. What's a good way to keep a few recent frames around? Is it possible to implement as a plugin?
> That means the server got a content header or body frame, which in > case of client->server communication can only be as part of a > publish. The most likely cause is the client sending more content > body frames than it should, e.g. a trailing empty frame.
> It should be possible to spot that from the tracer / wireshark.
> Yes, it is purely a publishing issue and the idea of extra frames has > crossed my mind. Wireshark doesn't prove it though.
Wireshark should have all the information you need. It probably won't call this issue out as an error though. Ditto for the tracer.
> I still need to find a way to patch my local rabbit copy to get more > information. What's a good way to keep a few recent frames around?
Given that you can reproduce the problem easily, you may just want to log some details of all the frames - frame type, class id, method id, and for content headers the expected body size, and for content bodies the payload size.
> Is it possible to implement as a plugin?
No. Well, technically yes, but you are much better off just modifying the code in rabbit_command_assembler.
> Given that you can reproduce the problem easily, you may just want to log > some details of all the frames - frame type, class id, method id, and for > content headers the expected body size, and for content bodies the payload > size.
That's what I am doing now. One thing that has caught my attention is a 0-sized BodyFrame. that did not cause any exceptions to be raised (for good 10 seconds or so, anyway) but there is absolutely no way I am publishing empty payloads. So you are likely to be right. We also have a long-stanging issue (not 0.8.x specific though) with empty messages.
I will try to investigate more on the client side and take a look at the Java client framing code. Thank you for your help. I did not suspect extra frames to be an issue initially.
> Given that you can reproduce the problem easily, you may just want to log > some details of all the frames - frame type, class id, method id, and for > content headers the expected body size, and for content bodies the payload > size.
So far I narrowed it down to the "empty messages" problem: whenever there is an empty message sent by amqp gem master version, RabbitMQ doesn't consider it a complete message and thus clients do not get anything. With a simple script that terminates as soon as the message is received (by the same process), this looks like app is hanging. In the context of a larger application, sometimes there are empty messages transmitted (I am streaming a stdout output).
After comparing 2 clients I see the following picture: the data transmitted is exactly the same, frame order is correct but Wireshark GUI suggests that one client transmits 3 frames as 3 IP packets. In this case, RabbitMQ is happy. The other transmits them all in a single IP packet and RabbitMQ doesn't seem to recognize it as a complete message.
> So far I narrowed it down to the "empty messages" problem: whenever > there is an empty message sent by amqp gem master version, RabbitMQ > doesn't consider it a complete message and thus clients do not get > anything. With a simple script that terminates as soon as the message is > received > (by the same process), this looks like app is hanging. In the context of > a larger application, sometimes there are empty messages transmitted (I > am streaming a stdout output).
> After comparing 2 clients I see the following picture: the data > transmitted is exactly the same, frame order is correct but Wireshark > GUI suggests that > one client transmits 3 frames as 3 IP packets. In this case, RabbitMQ is > happy. The other transmits them all in a single IP packet and RabbitMQ > doesn't seem to recognize it as a complete message.
> If the [frames] content is the same, what may be going on here?
That's curious. Both should cause rabbit to complain with the "expected method frame, got non method frame instead" error. Are you sure you are seeing different behaviour here? If so, please send through the pcap files showing the entire tcp session.
Content body frames should never be zero length. If the content length is zero then no content body frame should be sent at all, just a header.
> That's curious. Both should cause rabbit to complain with the "expected > method frame, got non method frame instead" error. Are you sure you are > seeing different behaviour here? If so, please send through the pcap files > showing the entire tcp session.
Yes, I am very much sure. I have spent hours investigating this thing and ran it probably hundreds of times. Will do.
> Content body frames should never be zero length. If the content length is > zero then no content body frame should be sent at all, just a header.
Ah, yes, I read this line in the spec but then the fact that another Ruby client that *also sends an empty content body frame* works fine confused me.
So for a blank string payload, the sequence is always
[method frame] [content header frame]
not
[method frame] [content header frame] [content body frame with size = 0]
>> So for a blank string payload, the sequence is always
>> [method frame] [content header frame]
>> not
>> [method frame] [content header frame] [content body frame with size = 0]
>> correct?
> Correct.
Thanks Matthias. This was a one-line fix and it makes multiple issues that I has been investigating for some time go away. New amq-protocol gem release is on its way to rubygems.org.
I am going to send you pcap dumps that demonstrate the aforementioned rabbit behavior with respect to # of IP packets later today or tomorrow. -- MK
On 21 July 2011 06:15, Michael Klishin <michael.s.klis...@gmail.com> wrote:
> Ah, yes, I read this line in the spec but then the fact that another Ruby > client that *also sends an empty content body frame* works fine confused me.
We discovered a separate (although similar) issue: framing on publishing. From what I see, your patch was for content boundary detection when frames are received.
But thanks for the example, I will try to do some thorough amq-protocol gem code review and compare it to a few other clients like the Java client and Pika. -- MK