RabbitMQ MQTT - queue.declare/queue.bind timeout at high connection load

Narendra Sharma

Jun 28, 2019, 4:23:33 PM
to rabbitmq-users
RabbitMQ: 3.7.15

I am seeing the following error under high load. The logs indicate that the reason for termination is a timeout on queue.declare. The use case is that a large number of MQTT clients (100K+) are establishing connections, subscribing to topics, and publishing messages.

** When Server state == {state,#Port<0.212346>,"CLIENTIP:24947 -> SERVERIP:1883",true,undefined,true,running,{none,<0.10535.293>},<0.10495.293>,false,none,{proc_state,#Port<0.212346>,#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,"MQTTCLIENTID",true,undefined,{<0.10463.293>,undefined},<0.10412.293>,<<"amq.topic">>,{amqp_adapter_info,{0,0,0,0,0,65535,2707,18272},1883,{0,0,0,0,0,65535,2707,18250},24947,<<"CLIENTIP:24947 -> SERVERIP:1883">>,{'MQTT',"N/A"},[{variable_map,#{<<"client_id">> => <<"MQTTCLIENTID">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"MQTTCLIENTID">>}]},{ssl,false}]},none,<0.1201.0>,{auth_state,<<"MQTTUSER">>,{user,<<"MQTTUSER">>,[],[{rabbit_auth_backend_internal,none}]},<<"/">>},#Fun<rabbit_mqtt_processor.0.130296119>,{0,0,0,0,0,65535,2707,18250}},<0.10412.293>,{state,fine,5000,#Ref<0.1316824539.369623055.41303>}}
** Reason for termination ==
 ** {timeout,{gen_server,call,[<0.10463.293>,{call,{'queue.declare',0,<<"mqtt-subscription-MQTTCLIENTIDqos0">>,false,false,false,true,false,[]},none,<0.10500.293>},60000]}}
2019-06-28 06:54:44.029 [error] <0.10500.293> CRASH REPORT Process <0.10500.293> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.10463.293>,{call,{'queue.declare',0,<<"mqtt-subscription-MQTTCLIENTIDqos0">>,false,false,false,true,false,[]},none,<0.10500.293>},60000]}} in gen_server2:terminate/3 line 1172
2019-06-28 06:54:44.029 [error] <0.10470.293> Supervisor {<0.10470.293>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.10495.293>, {acceptor,{0,0,0,0,0,0,0,0},1883}) at <0.10500.293> exit with reason {timeout,{gen_server,call,[<0.10463.293>,{call,{'queue.declare',0,<<"mqtt-subscription-MQTTCLIENTIDqos0">>,false,false,false,true,false,[]},none,<0.10500.293>},60000]}} in context child_terminated
2019-06-28 06:54:44.029 [error] <0.10470.293> Supervisor {<0.10470.293>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.10495.293>, {acceptor,{0,0,0,0,0,0,0,0},1883}) at <0.10500.293> exit with reason reached_max_restart_intensity in context shutdown

What could be the possible reason(s) for the above error?

I also frequently see the following warning, though not at the same time:
WARNING ** Mnesia is overloaded: {dump_log,write_threshold}

Could load on Mnesia be the cause of the above error?
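
(For what it's worth, I understand this warning can sometimes be mitigated by raising Mnesia's dump_log_write_threshold, e.g. via advanced.config along the lines below, but I have not tuned it yet; the value shown is only an example.)

[
  {mnesia, [
    %% example threshold only; pick a value appropriate for the workload
    {dump_log_write_threshold, 10000}
  ]}
].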

Listing plugins with pattern ".*" ...
 Configured: E = explicitly enabled; e = implicitly enabled
 | Status: * = running on rabbit@ip
 |/
[  ] rabbitmq_amqp1_0                  3.7.15
[  ] rabbitmq_auth_backend_cache       3.7.15
[  ] rabbitmq_auth_backend_http        3.7.15
[  ] rabbitmq_auth_backend_ldap        3.7.15
[  ] rabbitmq_auth_mechanism_ssl       3.7.15
[  ] rabbitmq_consistent_hash_exchange 3.7.15
[  ] rabbitmq_event_exchange           3.7.15
[  ] rabbitmq_federation               3.7.15
[  ] rabbitmq_federation_management    3.7.15
[  ] rabbitmq_jms_topic_exchange       3.7.15
[E*] rabbitmq_management               3.7.15
[e*] rabbitmq_management_agent         3.7.15
[E*] rabbitmq_mqtt                     3.7.15
[  ] rabbitmq_peer_discovery_aws       3.7.15
[  ] rabbitmq_peer_discovery_common    3.7.15
[  ] rabbitmq_peer_discovery_consul    3.7.15
[  ] rabbitmq_peer_discovery_etcd      3.7.15
[  ] rabbitmq_peer_discovery_k8s       3.7.15
[  ] rabbitmq_random_exchange          3.7.15
[  ] rabbitmq_recent_history_exchange  3.7.15
[  ] rabbitmq_sharding                 3.7.15
[  ] rabbitmq_shovel                   3.7.15
[  ] rabbitmq_shovel_management        3.7.15
[  ] rabbitmq_stomp                    3.7.15
[  ] rabbitmq_top                      3.7.15
[  ] rabbitmq_tracing                  3.7.15
[  ] rabbitmq_trust_store              3.7.15
[e*] rabbitmq_web_dispatch             3.7.15
[  ] rabbitmq_web_mqtt                 3.7.15
[  ] rabbitmq_web_mqtt_examples        3.7.15
[  ] rabbitmq_web_stomp                3.7.15
[  ] rabbitmq_web_stomp_examples       3.7.15


Michael Klishin

Jun 30, 2019, 5:36:37 PM
to rabbitmq-users
Most likely the reason is a high rate of concurrent schema operations, caused by high connection/queue/binding churn.

Every queue declaration or binding modification is a cluster-wide transaction. This is one scenario where making a subset of the cluster (fewer than a quorum) RAM nodes
instead of disk nodes makes a difference. I can't find a document with benchmarks right now, but using fewer nodes, or making more of them RAM nodes,
does help in such cases.
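
For reference, a rough sketch of converting an existing disk node to a RAM node with rabbitmqctl (run on the node being converted; rabbit@node1 below is just an example name):

rabbitmqctl stop_app
rabbitmqctl change_cluster_node_type ram
rabbitmqctl start_app

A fresh node can also join the cluster as a RAM node directly with rabbitmqctl join_cluster --ram rabbit@node1.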



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Narendra Sharma

Jul 1, 2019, 3:15:23 AM
to rabbitmq-users

I am not sure how a RAM node will help. If the transaction is cluster-wide, then even a single slow node could cause a timeout. Does RabbitMQ wait for all nodes to respond, or only for a quorum? If it waits for a quorum, then RAM nodes will definitely help.

I am planning to repeat the test on a smaller cluster with bigger nodes (more CPU and RAM).

I would appreciate it if you could locate and share the benchmark document. What is the largest RabbitMQ cluster (in terms of nodes) you have come across so far?



Michael Klishin

Jul 1, 2019, 4:21:10 AM
to rabbitmq-users
RAM nodes do help, according to our benchmarks and prior discussions of this scenario (which go back several years).
It is true that a slowdown on a single node can have an effect. Using RAM nodes makes them less likely to contend on disk
access, and having fewer nodes further reduces both latency and the probability of a slowdown.

Alternatively, you can move to faster disks. I don't remember whether we have numbers for this specific scenario (schema churn) with SSDs,
but the effect should be positive as well.


Narendra Sharma

Jul 2, 2019, 4:22:58 PM
to rabbitmq-users
I repeated the test with a smaller cluster of bigger nodes (vertical scaling) and was able to complete it successfully.


Michael Klishin

Jul 4, 2019, 6:05:22 AM
to rabbitmq-users
Thank you for reporting back to the list.


Woon Yung Liu

Aug 31, 2021, 11:58:25 PM
to rabbitmq-users
Hi Michael,

Apologies for reviving this old thread, but I am experiencing something similar under heavy load with a modern RabbitMQ (3.8.21, with Erlang 24.0.5). I don't really know what to look for, which is why I am replying here.

When 32K connections attempt to make subscriptions (one subscription per connection) via MQTT to our 3-node RabbitMQ cluster, I get a similar error to the OP's. Connections are made through a load balancer, which tries to balance them across the 3 nodes.
It happens a few minutes after the 32K connections start subscribing, while the queues are still in the process of being created.

The nodes are clustered with the K8s discovery plugin, without any form of HA configured. The 3 nodes run on t3.xlarge AWS instances, which gives them identical specifications. Mnesia is stored on AWS EFS. I can't change the architecture for this experiment; my team's goal is to understand the limitations of RabbitMQ within this cluster and how we may need to scale or redesign it in the future.

However, I can't see what the bottleneck is. If I look at the output of rabbitmq-diagnostics runtime_thread_stats, the only visible load is on the schedulers (only about 60% loaded). According to sar -u, surprisingly little CPU time is spent in iowait% (nearly everything is in user%). By both measures, CPU utilization is not maxed out during the subscription process.
I previously adjusted RabbitMQ's Erlang VM properties to improve CPU utilization during message publishing, increasing the number of dirty I/O schedulers (from 10 to 32) and regular schedulers (from 8 to 16).
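
For reference, the flags I mean are roughly the following, set via rabbitmq-env.conf (the exact values are the ones I used, not a recommendation):

# +S sets regular schedulers (total:online); +SDio sets dirty I/O schedulers
SERVER_ADDITIONAL_ERL_ARGS="+S 16:16 +SDio 32"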

I've also tried converting two of the three nodes into RAM nodes, but 32K concurrent subscriptions still take a long time to complete and eventually fail with the same error.
So I would like to ask: in your experience, why does using 2/3 RAM nodes not seem to help? For an action like an MQTT subscription, what sort of resource(s) would most likely affect its speed and its ability to complete in a timely manner? I suspect it might be related to the latency of the storage backing the Mnesia database; would that understanding be correct?

On the other hand, I think it's clearer that the performance bottleneck for publishing at QoS 1 within this cluster is the storage for Mnesia, as one of the three nodes shows very high iowait% and publishing performance plateaus at roughly 1000-1400 messages/s. For some reason, this is just not visibly the case during subscriptions.

Appreciate whatever help you can give.
Thanks for your time.

Best Regards,
Woon Yung

Michal Kuratczyk

Sep 3, 2021, 8:32:26 AM
to rabbitm...@googlegroups.com
Hi,

We hope to improve how RabbitMQ handles many MQTT connections in the future, but 32K should be achievable now (I ran some tests with 80K and things were not crashing). Some things are intrinsic to the current implementation, but there are also some you might be able to control. Specifically, you can play with these parameters:

# disable Management stats and use Prometheus instead
management.disable_stats = true
management_agent.disable_metrics_collector = true
# use less memory per connection
tcp_listen_options.sndbuf  = 8192
tcp_listen_options.recbuf  = 8192
mqtt.tcp_listen_options.sndbuf  = 8192
mqtt.tcp_listen_options.recbuf  = 8192
mqtt.tcp_listen_options.buffer  = 8192

Increase the number of Erlang processes (it seems you haven't hit that limit, but it's worth keeping in mind that you can):
SERVER_ADDITIONAL_ERL_ARGS="+P134217727"

Resources that play a role (I'm sure there are more):
1. CPU and Mnesia, as you are creating 32K queues
2. RAM for 32K queues and connections (currently each MQTT connection requires 16 Erlang processes; this is a key part of what we want to change in the future)
3. Network, for Mnesia and connection tracking: RabbitMQ uses Ra internally to keep track of MQTT connections. Since MQTT requires that any previous connection for a given client ID be closed immediately when a new connection with the same ID is established, we need to keep connection information synchronized across the cluster; this is another part that can likely be optimized
4. Disk, for all of these operations (writes to Mnesia, creation of message stores, Ra data, etc.)
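
If it helps, a few commands for keeping an eye on these while the test runs (a rough sketch; run them against each node):

# per-category memory usage, including queues, connections and binaries
rabbitmq-diagnostics memory_breakdown
# scheduler and dirty-scheduler utilization, as you already used
rabbitmq-diagnostics runtime_thread_stats
# rough progress of connection and queue creation
rabbitmqctl list_connections -q | wc -l
rabbitmqctl list_queues -q | wc -l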

Best,



--
Michał
RabbitMQ team

Woon Yung Liu

Sep 5, 2021, 10:00:49 PM
to rabbitmq-users
Hi Michal,

Thank you for sharing. I have tried adjusting the options you mentioned, and there might have been a slight improvement. Unfortunately, it does not seem to work significantly faster (it is just more consistent at completing 32K requests without any timeouts). With the statistics collection of the management agent disabled, the reductions incurred by the metric-collection processes are gone.
I have applied the parameters you suggested, with the exception of the Erlang +P option.

I've observed that approximately 3000 queues and consumers are created every 10 s. The default timeout seems to be 130000 ms (130 s). It seems to take about 10 s before consumers start getting created, leaving 120 s for all consumers to be created. At 3000 consumers every 10 s, only up to about 3000 * 12 = 36K requests can be served before some operation times out.
I presume this means that you got noticeably better numbers? May I know how many nodes and what sort of machine(s) you used to support 80K requests?

I've also been wondering about the sndbuf and recbuf options: the MQTT plugin translates MQTT operations into AMQP operations. The plugin interfaces directly with RabbitMQ, but do the AMQP TCP options (e.g. tcp_listen_options.sndbuf, tcp_listen_options.recbuf) still play a part? If so, does this mean that each MQTT connection has four buffers allocated (linked to tcp_listen_options.sndbuf, tcp_listen_options.recbuf, mqtt.tcp_listen_options.sndbuf and mqtt.tcp_listen_options.recbuf respectively)?

Thanks again.

Best Regards,
Woon Yung

Michal Kuratczyk

Sep 6, 2021, 6:21:31 AM
to rabbitm...@googlegroups.com
Hi,

tcp_listen_options don't affect the MQTT->AMQP part: the MQTT plugin talks directly to the AMQP code (using the so-called "direct" AMQP client), so there is no TCP connection for that.

I ran my tests using a 3-node cluster with 8 CPU and 64GB RAM per node (I'm not saying this is the "right" configuration though - it's just what I used).

One more thing that can affect queue creation time is the queue leader location strategy: https://www.rabbitmq.com/ha.html#queue-leader-location.
I'd recommend "random" or "client-local" (if your connections are well distributed). "min-masters" gets pretty expensive as the number of queues increases (I'm literally looking into this right now; hopefully it will get faster, but in most cases there is a relatively small practical difference between "min-masters" and "random" anyway).
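
For a cluster-wide default, a minimal rabbitmq.conf sketch (the chosen value is just an example):

# valid values: min-masters, client-local, random
queue_master_locator = random

The same can also be set for a group of queues via a policy using the queue-master-locator key.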

Best,




--
Michał

Woon Yung Liu

Mar 14, 2022, 9:22:24 PM
to rabbitmq-users
Hi again,

Thank you for sharing your experience, particularly on the AMQP side of things, as I am relatively new to it. From the get-go I've been using RabbitMQ as an MQTT broker, although I realize that RabbitMQ is an AMQP broker at heart.

I was busy and did not find the time to post an update here. We moved RabbitMQ out of the Kubernetes cluster and found that performance improved: it seemed to process connections faster than before, and thus no longer times out when a lot of new MQTT connections (32K in my case) are made.
So we plan to operate RabbitMQ outside of Kubernetes, directly on an EC2 instance.

Thanks for sharing your strategy on mirroring; it is also something I haven't explored much.
Our project has no requirement for high availability (HA), and we do not wish to spend the additional resources needed to support mirroring. So wouldn't running the cluster without queue mirroring give the best performance, given that I do not need HA? I place a load balancer in front of the RabbitMQ nodes, so the connections from my devices should be balanced across them.

Once again, thanks in advance.

Best Regards,
Woon Yung