frame_too_large error

Dmitry Andrianov

Jul 5, 2017, 7:37:52 PM
to rabbitmq-users
Hi, guys.
I am reading this http://john.eckersberg.com/debugging-rabbitmq-frame_too_large-error.html and am curious: what could produce the same errors on our load test cluster if the only client we are using is RabbitMQ's Java one?
Is it a known issue that the Java client can do that? Was it fixed already?
I believe the last load test run that produced some of these errors was using Java client 3.6.5, and it ran against broker version 3.5.6.

Cheers

PS: by "same errors" I mean not just random frame_too_large ones but exactly the ones in the link, with type 65, channel id 19793 etc. So definitely the same "AMQP header instead of a frame" problem.
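
For reference, a minimal sketch of where those exact numbers come from. It assumes the standard AMQP 0-9-1 frame layout (1-byte type, 2-byte channel, 4-byte payload size, all big-endian) and simply reads the 8-byte protocol header as if it were a frame header. The function name is made up for illustration.

```python
import struct

# The 8-byte AMQP 0-9-1 protocol header a client sends when it opens a connection.
PROTOCOL_HEADER = b"AMQP\x00\x00\x09\x01"


def misread_as_frame_header(data):
    """Interpret the first 7 bytes as a frame header: (type, channel, payload size)."""
    frame_type, channel, size = struct.unpack(">BHI", data[:7])
    return frame_type, channel, size


frame_type, channel, size = misread_as_frame_header(PROTOCOL_HEADER)
print(frame_type, channel, size)  # 65 19793 1342177289
```

'A' is 65, 'MQ' read as a 16-bit integer is 19793, and the claimed payload size of about 1.3 GB is far above any negotiated frame_max, which is why the broker reports frame_too_large.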

Michael Klishin

Jul 5, 2017, 7:59:22 PM
to rabbitm...@googlegroups.com
Concurrent operations (such as publishing) on a shared channel, which result in incorrect framing or frame interleaving. Naive or incorrect connection pooling mentioned in the post is one possible scenario but not the root cause per se.

Ultimately only a traffic capture can really tell.
Newest versions of the Java client can be used with RabbitMQ 3.5.6.

hivehome.com



Hive | London | Cambridge | Houston | Toronto
The information contained in or attached to this email is confidential and intended only for the use of the individual(s) to which it is addressed. It may contain information which is confidential and/or covered by legal professional or other privilege. The views expressed in this email are not necessarily the views of Centrica plc, and the company, its directors, officers or employees make no representation or accept any liability for their accuracy or completeness unless expressly stated to the contrary. 
Centrica Connected Home Limited (company no: 5782908), registered in England and Wales with its registered office at Millstream, Maidenhead Road, Windsor, Berkshire SL4 5GD.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Staff Software Engineer, Pivotal/RabbitMQ

Dmitry Andrianov

Jul 5, 2017, 8:02:06 PM
to rabbitmq-users
I googled a bit more and found this: https://github.com/lemenkov/rabbitmq-server/commit/a87c2616c8394294d10b143a6871627acfedb837
It says basically that a load balancer can do that kind of weirdness and multiplex two client connections into one.

We did use an Amazon ELB in the test, but it was an SSL connection and SSL termination was done by RabbitMQ, not by the load balancer.
Which IMHO makes that scenario of multiplexing into a single connection highly unlikely: the load balancer sees an encrypted stream, and if it forwarded any part of it into another connection, RabbitMQ would not be able to read an AMQP header from it.

Cheers

Dmitry Andrianov

Jul 6, 2017, 4:02:11 AM
to rabbitmq-users
Michael,
I am almost certain that there is no concurrent publishing on that channel as just a single thread does that (need to confirm).
However it can be that the same channel is used to both consume and publish.
So consumption is done by some other thread in the client library, I assume? If there is some ACK-ing involved during consumption, can that cause this?

In other words, is there a requirement to have separate channels for consumption and publishing?

Thanks

Michael Klishin

Jul 6, 2017, 5:21:12 AM
to rabbitm...@googlegroups.com
In the scenario described in Peter's commit it does not matter whether you use TLS or not.
If a load balancer and RabbitMQ node do not agree on the state of a connection/port
(which is very rare but possible with unlucky timing), RabbitMQ parser will get confused
since it does not expect a protocol handshake sequence on an already open and negotiated
connection.

That patch has been present in the mainline for months as far as I can tell:

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Carl Hörberg

Jul 6, 2017, 6:17:00 AM
to rabbitmq-users
Headers and arguments can't be sliced/framed, so if you have headers larger than the frame_max size then you can get frame_too_large even if the client is able to slice/frame the body.
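
A rough sketch of that asymmetry, assuming AMQP 0-9-1 framing where every frame carries 8 bytes of overhead (7-byte frame header plus 1-byte frame-end) and frame_max covers the whole frame. The function name and the simplified 14-byte fixed part of the content header are illustrative assumptions, not taken from any client.

```python
FRAME_OVERHEAD = 8        # 7-byte frame header + 1-byte frame-end octet
HEADER_FIXED_FIELDS = 14  # class id (2) + weight (2) + body size (8) + property flags (2)


def body_frame_count(encoded_properties_len, body_len, frame_max):
    """Return how many body frames a message needs, or raise if its
    content header cannot fit into a single frame."""
    max_payload = frame_max - FRAME_OVERHEAD
    header_payload = HEADER_FIXED_FIELDS + encoded_properties_len
    if header_payload > max_payload:
        # The content header (properties and headers table) is never split.
        raise ValueError(
            f"content header needs {header_payload} bytes, "
            f"but one frame carries at most {max_payload}"
        )
    # The body, in contrast, is cut into as many chunks as needed.
    return -(-body_len // max_payload)  # ceiling division


print(body_frame_count(200, 1_000_000, 4096))  # 245

try:
    body_frame_count(10_000, 100, 4096)  # tiny body, oversized headers
except ValueError as err:
    print(err)
```

A 1 MB body is fine with a 4096-byte frame_max because it is chunked, while 10 kB of headers is not, no matter how small the body is.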

Michael Klishin

Jul 6, 2017, 8:26:36 AM
to rabbitm...@googlegroups.com
That's an interesting scenario, thanks Carl.

I don't think I've seen more than 128 kB worth of headers :)

Dmitry Andrianov

Jul 6, 2017, 8:34:10 AM
to rabbitmq-users
I am not sure how this can be possible with TLS...
If the load balancer does not do TLS termination (that is, it does not decode traffic), then the only thing it can do is send encrypted TLS data from client connection #2 into an existing, already negotiated/established connection to the broker, right?
But because the load balancer only sees encrypted data, even if it sends it over a broker connection which has already passed the TLS handshake phase, that data is just going to be rejected by the TLS layer on the RabbitMQ side because it does not belong to the first connection. So it cannot lead to RabbitMQ seeing properly decoded traffic there, and so it cannot get the 'AMQP' header from the second connection...
Am I missing something obvious?

Carl Hörberg

Jul 6, 2017, 9:13:33 AM
to rabbitm...@googlegroups.com
We have ;) Also cases where the client negotiates a smaller frame max (e.g. Node's amqplib), say 4096, and then the user adds massive headers.

Michael Klishin

Jul 6, 2017, 10:29:53 AM
to rabbitm...@googlegroups.com
It's true that with TLS and its session-specific keys it is less likely that a load balancer would somehow end up
interleaving two client connections. But it's not impossible.

I'm afraid any further guesses without a traffic capture that demonstrates this scenario will lead
to a lot of time wasted for everyone.

Dmitry Andrianov

Jul 7, 2017, 7:11:04 AM
to rabbitmq-users
I understand you need a packet capture, Michael. I need to create an isolated reproducible case for that first.
But before I do that, I wanted to clarify whether consuming and publishing on the same channel is OK or whether two separate channels should be used for that. I do not remember anything in the docs saying we must use two, but my gut feeling tells me that there can be some data sent back even when you are consuming (like acks), and I am not sure whether rabbitmq-client already handles that properly or we should be separating them.

(If two channels MUST be used, then we probably do not even need a test case at all, because our test uses a single channel for that, so we know it is the problem.)

Thanks

Michael Klishin

Jul 7, 2017, 8:04:36 AM
to rabbitm...@googlegroups.com
Consuming with basic.consume and publishing on the same channel is fine in general.
Whether you may run into concurrency hazards when consumer and publisher use different threads
really depends on the client and the app.

Consuming and publishing on different *connections* has a benefit: should your publisher
be blocked, consumer connection won't be affected in any way (including the processing of acks/nacks).

By far the most common scenario which results in incorrect on-the-wire framing is when a channel is shared between
threads for publishing. Each message is published using at least 2 frames:

[basic.publish method] [content metadata] [body chunk]*

therefore, it is fairly easy to end up with this kind of framing with concurrent publishers:

[basic.publish method A] [content metadata A] [basic.publish method B] [content metadata B] [body chunk B] [body chunk A]

or something like that.

In theory it can happen with a basic.ack being sent concurrently:

[basic.publish method] [content metadata] [basic.ack] [body chunk A]

If you see this issue reappearing frequently, try using separate channels or adding explicit synchronisation for those operations,
run with that for a week and compare the outcome.
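
To illustrate the "explicit synchronisation" option, here is a language-neutral sketch in Python rather than the Java client's real API. FakeChannel and frames_are_contiguous are hypothetical stand-ins; the only point is that one lock is held across the whole method/metadata/body sequence, so frames of different messages cannot interleave.

```python
import threading


class FakeChannel:
    """Stand-in for a client channel: records frames in the order they hit the wire."""

    def __init__(self):
        self.wire = []
        self._publish_lock = threading.Lock()

    def publish(self, msg_id, body_chunks):
        # Hold the lock for the whole method/metadata/body sequence so that
        # frames of two messages can never interleave.
        with self._publish_lock:
            self.wire.append((msg_id, "basic.publish method"))
            self.wire.append((msg_id, "content metadata"))
            for i in range(body_chunks):
                self.wire.append((msg_id, f"body chunk {i}"))


def frames_are_contiguous(wire):
    """True if every message's frames form one unbroken run on the wire."""
    seen, previous = set(), None
    for msg_id, _ in wire:
        if msg_id != previous:
            if msg_id in seen:
                return False  # this message's frames were split by another's
            seen.add(msg_id)
            previous = msg_id
    return True


channel = FakeChannel()
threads = [
    threading.Thread(target=channel.publish, args=(name, 3)) for name in "ABCD"
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(frames_are_contiguous(channel.wire))  # True
```

Using one channel per publishing thread achieves the same thing without a lock, since frames on different channels are allowed to interleave.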


Dmitry Andrianov

Oct 17, 2017, 5:54:39 AM
to rabbitmq-users
Hello. Reviving this old thread.

So I managed to reproduce it more or less reliably and capture the traffic with tcpdump+ssldump.
To my surprise, ssldump clearly shows that the header is sent twice in one connection from the client.
Even more surprisingly, both copies are in the same SSL record. So they must really have been sent in quick succession or something like that.

...
1 4  0.6374 (0.0000)  S>C  Handshake
      CertificateRequest
        certificate_types                   rsa_sign
        certificate_authority
          ...
1 5  0.6374 (0.0000)  S>C  Handshake
      ServerHelloDone
1 6  0.7405 (0.1030)  C>S  Handshake
      Certificate
      ClientKeyExchange
1 7  0.7592 (0.0186)  C>S  Handshake
      CertificateVerify
        Signature[128]=
          ...
1 8  0.7592 (0.0000)  C>S  ChangeCipherSpec
1 9  0.7592 (0.0000)  C>S  Handshake
      Finished
1 10 0.7603 (0.0011)  S>C  ChangeCipherSpec
1 11 0.7603 (0.0000)  S>C  Handshake
      Finished
1 12 0.8784 (0.1180)  C>S  application_data
    ---------------------------------------------------------------
    41 4d 51 50 00 00 09 01 41 4d 51 50 00 00 09 01    AMQP....AMQP....
    ---------------------------------------------------------------
1 13 0.8784 (0.0000)  C>S  Alert
    level           warning
    value           close_notify
1    0.8784 (0.0000)  C>S  TCP FIN
1 14 0.8878 (0.0094)  S>C  application_data
    ---------------------------------------------------------------
    01                                                 .
    ---------------------------------------------------------------
1 15 0.8878 (0.0000)  S>C  application_data
    ---------------------------------------------------------------
    00 00 00 00 01 f0 00 0a 00 0a 00 09 00 00 01 cb    ................
    0c 63 61 70 61 62 69 6c 69 74 69 65 73 46 00 00    .capabilitiesF..
    00 c7 12 70 75 62 6c 69 73 68 65 72 5f 63 6f 6e    ...publisher_con
    66 69 72 6d 73 74 01 1a 65 78 63 68 61 6e 67 65    firmst..exchange
...

We can reproduce it with a scale test when trying to concurrently establish many connections from the load test generator. Both the broker and the generator are under significant load at that moment.
Note that the client closes the connection immediately after sending the header. It also logs that it cannot read from the socket at that time.
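
As a cross-check of the capture, a small sketch that decodes the first bytes the broker sent back (records 14 and 15 joined), assuming AMQP 0-9-1 framing. The variable names are only for illustration.

```python
import struct

# First 13 bytes from the S>C application_data records in the capture above.
server_bytes = bytes.fromhex("01 00 00 00 00 01 f0 00 0a 00 0a 00 09")

# Frame header (type, channel, payload size), then the start of the method
# payload (class id, method id, protocol version major/minor).
frame_type, channel, size, class_id, method_id, major, minor = struct.unpack(
    ">BHIHHBB", server_bytes
)
print(frame_type, channel, size)    # 1 0 496
print(class_id, method_id)          # 10 10
print(major, minor)                 # 0 9
```

Frame type 1 is a method frame, and class 10 / method 10 is connection.start, so the broker answered the first header normally. That is consistent with the second "AMQP...." copy being what it then tried to parse as a frame.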

The client is rabbitmq-java-client 3.6.5. I went back and forth in the client code, and I really cannot understand how it is even remotely possible to send that header twice.
The only thing I can think of is a race condition when closing / flushing FrameHandler - it will call flush() & close() on the underlying BufferedOutputStream without proper synchronisation.
So in theory, BufferedOutputStream can write the header collected so far in the buffer into the socket twice.
While it seems to be a bit far-fetched, still, out of curiosity - what is the reason for SocketFrameHandler.flush() to not synchronise on the stream as writeFrame or sendHeader does?

Any ideas where to look or what else to look for?

Cheers

Arnaud Cogoluègnes

Oct 17, 2017, 8:15:54 AM
to rabbitm...@googlegroups.com
By taking a look at the code I come to the same conclusion as you. Regarding the non-synchronization on SocketFrameHandler#flush(), maybe because BufferedOutputStream#flush() is already synchronized (I didn't write the code originally). I'm not sure this would change much anyway. Handling the need for a flush directly in SocketFrameHandler could avoid the double call to flush, but the implementation may be error-prone and the source of other problems.

Dmitry Andrianov

Oct 20, 2017, 6:38:22 AM
to rabbitmq-users
Ah, you are right, BufferedOutputStream#flush() is already synchronized. Missed that.
Then I do not have any ideas at all...