protocol question


Arnie

Feb 5, 2019, 8:18:48 PM
to rabbitmq-users
As part of the AMQP open connection sequence, the server sends a tune message to the client that includes frame-max, which according to the documentation, if set to 0 "means that the server does not impose any specific limit but may reject very large frames if it cannot allocate resources for them."

In response, the client sends a tune-ok message to the server that includes frame-max, which if set to 0 "means that the client does not impose any specific limit but may reject very large frames if it cannot allocate resources for them."

If the server sends the tune message with frame-max set to 0, and then the client sends back the tune-ok message with frame-max set to 0, is 0 (aka unlimited) now the negotiated value?  Is this valid?
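
For illustration, here is a minimal sketch of one rule a client might apply when combining the server's tune value with its own preference before sending tune-ok. The function name `negotiate_frame_max` is hypothetical and is not taken from any particular client library; the only assumption it encodes is the documented meaning of 0 as "no specific limit".

```python
def negotiate_frame_max(server_value: int, client_value: int) -> int:
    """Combine the server's tune value with the client's own preference.

    0 means "no specific limit", so 0 is only the result when both sides send 0.
    """
    if server_value == 0 or client_value == 0:
        # One side imposes no limit: the other side's value (possibly also 0) applies.
        return max(server_value, client_value)
    # Both sides impose a limit: the smaller one applies.
    return min(server_value, client_value)


print(negotiate_frame_max(0, 0))            # both unlimited
print(negotiate_frame_max(0, 131072))       # client caps an unlimited server
print(negotiate_frame_max(131072, 0))       # server cap applies
print(negotiate_frame_max(131072, 4096))    # smaller limit applies
```

Under this rule, a 0 from both sides leaves the connection with no negotiated cap, which is the case the question is about.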

Thanks,
Arnie

Michael Klishin

Feb 5, 2019, 8:22:07 PM
to rabbitm...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Arnie

Feb 5, 2019, 11:44:40 PM
to rabbitmq-users
Thanks for the response.  It seems like several client libraries (pika, node/amqp) artificially limit the frame-max in the tune-ok message, but as you point out the Java client does not, and I can see the configured frame_max as 0 in the connection properties via the management interface on those connections.  However, if I do a basic_consume on a queue over connections that have a 0 frame-max, and then publish a 0 length message (no body) to that queue, the client never receives the message, and the server quickly dies as it exhausts all memory.  I was wondering if the other clients were artificially capping the frame-max because this wasn't a valid configuration, but it sounds like maybe it's just a bug in the server instead?

Thanks,
Arnie

Michael Klishin

Feb 6, 2019, 12:00:42 AM
to rabbitm...@googlegroups.com
They limit it because capped buffer sizes are a standard, sensible thing in protocol design.

We are not aware of any server issues with zero frame max settings but you don’t need infinite buffers.

Michael Klishin

Feb 6, 2019, 12:04:12 AM
to rabbitm...@googlegroups.com
If by node/amqp you mean node-amqp and not amqp.node (amqplib), avoid that client like the plague. It is abandonware with terrible bugs. It is also known to cause connection storms (which may be what you are experiencing: collect logs and metrics [1] instead of guessing), and the most popular RabbitMQ-as-a-Service hosting provider banned it entirely years ago.

Use the client that our tutorials use. This is always a good starting point.



Michael Klishin

Feb 6, 2019, 12:19:20 AM
to rabbitm...@googlegroups.com
Given the claim that there is a RabbitMQ issue with framing I perhaps should explain what the frame max really controls.

Every published message is at least two frames on the wire:

[basic.publish the method][content header][body frame]*

So there are 0 to N body frames, each up to frame_max in size. Since the broker cannot possibly know the actual frame size in advance, it is specified in the frame prefix. If a client miscalculates the value in a way that RabbitMQ can detect, the connection will be closed with a fatal error. Pika will simply indicate an error to the client.
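
To make the frame prefix concrete, here is a minimal sketch of how a body could be cut into body frames, assuming the AMQP 0-9-1 general frame layout (1-byte type, 2-byte channel, 4-byte payload size, payload, then the frame-end octet 0xCE). The helper names `encode_frame` and `body_frames` are hypothetical, and the tiny frame_max of 12 is used only to keep the output readable; real connections negotiate far larger values.

```python
import struct

FRAME_BODY = 3          # AMQP 0-9-1 frame type for a content body frame
FRAME_END = 0xCE        # frame-end octet that closes every frame
FRAME_OVERHEAD = 8      # 7-byte prefix + 1-byte frame-end


def encode_frame(frame_type: int, channel: int, payload: bytes) -> bytes:
    """Prefix the payload with type, channel and size, then append frame-end."""
    prefix = struct.pack(">BHI", frame_type, channel, len(payload))
    return prefix + payload + bytes([FRAME_END])


def body_frames(channel: int, body: bytes, frame_max: int) -> list:
    """Split a body into body frames of at most frame_max bytes on the wire."""
    if frame_max <= FRAME_OVERHEAD:
        # This sketch only covers a finite cap larger than the frame overhead.
        raise ValueError("frame_max must exceed the 8-byte frame overhead")
    chunk = frame_max - FRAME_OVERHEAD
    return [encode_frame(FRAME_BODY, channel, body[i:i + chunk])
            for i in range(0, len(body), chunk)]


frames = body_frames(channel=1, body=b"x" * 10, frame_max=12)
print(len(frames))        # payloads of 4, 4 and 2 bytes
print(frames[0].hex())    # type, channel, size, payload, frame-end
print(body_frames(channel=1, body=b"", frame_max=12))  # empty body: no body frames
```

The size field in the prefix is what the receiving parser relies on to know how many payload bytes follow.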

Poorly implemented clients can do just about anything, such as try to reconnect, enter an infinite loop, or both, which obviously has an effect on the resource usage of the server one way or another.

Another potential issue is an overflow or underflow that makes a buggy client put an incorrect frame size. RabbitMQ doesn’t allocate memory directly but the parser is driven by the indicated size, of course, so at some point a supposed frame size of 1 GB will cause about 1 GB to be allocated.

Server logs and a traffic capture will make it very clear what is really going on.

All of this leads to a pretty obvious conclusion: simply use a frame size of 128 kB or so. It is “infinite” for most messages out there anyway (most messages are less than 4 kB in size), and it puts a reasonable limit on how much damage a buggy client can do in the 2nd scenario above.
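
A minimal sketch of why a cap limits the damage in that second scenario: a reader that checks the size announced in the frame prefix against frame_max can reject the frame before buffering any payload. The names `read_frame` and `FrameTooLarge` are hypothetical, and this is not RabbitMQ's parser; it only assumes the AMQP 0-9-1 frame layout.

```python
import io
import struct

FRAME_OVERHEAD = 8  # 7-byte prefix + 1-byte frame-end in AMQP 0-9-1


class FrameTooLarge(Exception):
    """Raised when a frame prefix announces more bytes than the cap allows."""


def read_frame(stream, frame_max: int):
    """Read one frame, checking the declared size before reading the payload."""
    prefix = stream.read(7)
    if len(prefix) < 7:
        raise EOFError("truncated frame prefix")
    frame_type, channel, size = struct.unpack(">BHI", prefix)
    if frame_max and size > frame_max - FRAME_OVERHEAD:
        # Reject on the prefix alone; no payload has been buffered yet.
        raise FrameTooLarge(f"declared {size} bytes, cap is {frame_max}")
    payload = stream.read(size)
    frame_end = stream.read(1)
    if len(payload) < size or frame_end != b"\xce":
        raise ValueError("malformed frame")
    return frame_type, channel, payload


# A buggy client announces a 1 GB body frame; the prefix is enough to refuse it.
bogus = io.BytesIO(struct.pack(">BHI", 3, 1, 1 << 30))
try:
    read_frame(bogus, frame_max=131072)
except FrameTooLarge as exc:
    print("rejected:", exc)
```

With frame_max set to 0 the check is skipped, so the reader in this sketch would attempt to read whatever size the prefix claims.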

The same issue exists with channel_max. Channels consume resources and an app that leaks channels (certainly not unheard of) could previously open up to 65K channels per connection, which is a lot.

This led to a default of 2047 (plus one special channel for connection negotiation and error reporting) and even a suggested default of 127, plus a way to cap the limit server-wide via configuration. Otherwise the first app that happens to leak channels will potentially affect service availability for every app in the system.
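
As a rough illustration of how such a cap contains a leak, here is a minimal sketch of a per-connection channel allocator. `ChannelAllocator` and `ChannelLimitReached` are hypothetical names, not part of RabbitMQ or any client; the sketch assumes channel 0 is the reserved connection channel and that usable channels are numbered from 1 to channel_max.

```python
class ChannelLimitReached(Exception):
    """Raised when a connection tries to open more channels than channel_max."""


class ChannelAllocator:
    """Hands out channel numbers 1..channel_max; channel 0 is reserved."""

    def __init__(self, channel_max: int):
        self.channel_max = channel_max
        self.in_use = set()

    def open(self) -> int:
        # Pick the lowest free channel number, if any is left.
        for number in range(1, self.channel_max + 1):
            if number not in self.in_use:
                self.in_use.add(number)
                return number
        raise ChannelLimitReached(f"all {self.channel_max} channels are in use")

    def close(self, number: int) -> None:
        self.in_use.discard(number)


# An app that leaks channels (opens but never closes) hits the cap quickly.
allocator = ChannelAllocator(channel_max=127)
opened = 0
try:
    while True:
        allocator.open()
        opened += 1
except ChannelLimitReached as exc:
    print(opened, "channels opened before:", exc)
```

The leak is stopped at the cap for that one connection instead of growing toward 65K channels.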


Arnie

Feb 6, 2019, 12:41:40 AM
to rabbitmq-users

So what I was noticing was in a client that I had modified to remove the arbitrary frame-max cap (in pika it was 128k).  Using this modified client against a server that specifies a frame-max of 0 showed this strange bug:

1. set server to use frame-max of 0
2. publish a 0 length (no body) message to a queue
3. perform a basic_consume

Result: the server would quickly exceed its memory high watermark and die. Note that if the message had a body, this would not occur, and the message would be delivered just fine with no extraneous memory usage on the server.

Anyway, I had looked at other clients and noticed that some client libraries did cap the frame-max, so I thought perhaps this capping was mandated and my modified client was not allowed to send back a frame-max of 0 in the tune-ok message. Thus my initial question. But as you pointed out, 0 is a valid frame-max setting, and the Java client (seemingly the best supported and most proper of the clients) does support this. So I tried the same simple test case using the Java client and it reproduced the problem I was seeing before, leading me to believe this is a rabbitmq-server bug.

Thanks,
Arnie

Michael Klishin

Feb 6, 2019, 12:54:56 AM
to rabbitm...@googlegroups.com
If you keep claiming that this is a bug, would you mind sharing an executable way to reproduce or at least a server and Erlang version used, a traffic capture and
some metrics reported by the node [1]?

Unlimited frame max is not something you commonly see used and I don't recall this specific issue being reported in the last 4-5 years
but if it's reproducible we are obviously interested in making the broker more defensive.

Arnie

Feb 6, 2019, 1:58:55 AM
to rabbitmq-users
Here are more concrete repro steps.  I'm seeing this on any recent server, from 3.7.7 to the most recent 3.7.11.

1. Start the most recent rabbit server with only a modified rabbitmq.config.

$ cat > rabbitmq.config <<EOF
[{rabbit, [{frame_max, 0}, {loopback_users, []}]}].
EOF

$ docker run --rm -it -p 15672:15672 -p 5672:5672 --name rabbit --hostname rabbit -v $(pwd)/rabbitmq.config:/etc/rabbitmq/rabbitmq.config rabbitmq:3.7.11-management

2. Browse to http://localhost:15672, log in with guest/guest, create a queue named test and publish an empty message to it.

3. In another window run a simple test java client to try to get the message from the test queue.  I attached the Test.java file to this message, but it's basically the same as the snippets from the api guide.

$ docker run --rm -it --link rabbit -v $(pwd)/amqp-client-5.6.0.jar:/a.jar -v $(pwd)/slf4j-api-1.7.25.jar:/s.jar -v $(pwd)/Test.java:/Test.java java /bin/bash -c 'javac -cp /a.jar:/s.jar Test.java && java -cp /:/a.jar:/s.jar Test'

Result:  The logs on the rabbit server console will show the high water mark being hit within 2s of the client connecting, before quickly dying shortly after.

Thanks,
Arnie
Test.java

Arnaud Cogoluègnes

Feb 7, 2019, 8:56:29 AM
to rabbitm...@googlegroups.com
Thanks for providing steps to reproduce. We filed an issue [1] and
a fix is available [2]. Did you manage to reproduce by sending a
message from something other than the management UI? I couldn't. It
seems an empty message from the management UI is packed up in a
specific way (empty binary in Erlang terms) and this would result in
an infinite loop of frame creation. The fix handles this case now.
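
To illustrate how an empty binary combined with an unlimited frame_max could keep a frame builder spinning, here is a minimal Python sketch. It is not the actual Erlang code from rabbitmq-common; `split_body` and `safe_split_body` are hypothetical, and the assumption that "unlimited" is treated as "one frame as large as the body" is made only for the sake of the example. The `max_frames` bound exists so the demonstration terminates.

```python
def split_body(fragments, frame_max, max_frames=1000):
    """Naive splitter: fill a frame until it has no room left, then emit it."""
    body_size = sum(len(f) for f in fragments)
    # Assumption: frame_max == 0 (unlimited) means one frame as large as the body.
    limit = body_size if frame_max == 0 else frame_max - 8
    frames, current, room = [], b"", limit
    pending = list(fragments)
    while pending:
        if room == 0:
            # The current frame is "full": emit it and start a new one.
            frames.append(current)
            if len(frames) >= max_frames:
                raise RuntimeError("still emitting frames; would never finish")
            current, room = b"", limit
            continue
        head = pending.pop(0)
        take, rest = head[:room], head[room:]
        current += take
        room -= len(take)
        if rest:
            pending.insert(0, rest)
    if current:
        frames.append(current)
    return frames


def safe_split_body(fragments, frame_max):
    """Guarded variant: drop empty fragments before splitting."""
    return split_body([f for f in fragments if f], frame_max)


print(split_body([b"hello"], frame_max=0))    # a non-empty body is fine
try:
    split_body([b""], frame_max=0)            # one empty fragment, no limit
except RuntimeError as exc:
    print("naive:", exc)
print(safe_split_body([b""], frame_max=0))    # guarded: no body frames at all
```

In this sketch the body size of 0 becomes the per-frame limit, so every new frame is immediately "full" while the empty fragment is never consumed. A body with content, or an empty body with no fragments at all, does not trigger the loop.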

[1] https://github.com/rabbitmq/rabbitmq-common/issues/299
[2] https://github.com/rabbitmq/rabbitmq-common/pull/300

Michael Klishin

Feb 7, 2019, 9:51:32 AM
to rabbitm...@googlegroups.com
Each iteration in that recursion loop seems to have allocated a binary chunk copy or similar.


Michael Klishin

Feb 7, 2019, 1:31:12 PM
to rabbitm...@googlegroups.com
A fix has been merged and will be available in an alpha build later today (we will post a link) and 3.7.12-rc.2.