Recovering RabbitMQ after a restart


Sergey Grigorev

May 26, 2020, 1:10:25 PM5/26/20
to rabbitmq-users
Hello everyone!

I have a 3-node RabbitMQ cluster that is running in Kubernetes. There are also 3000 MQTT connections that periodically send messages to RabbitMQ. These connections are initiated by mosquitto instances. Each connection creates 2 queues, so there are 6000 queues. Everything works well until I restart any RabbitMQ container.
In its normal state, RabbitMQ uses about 70% CPU and 1 GB of RAM. If I restart one RabbitMQ node, the other 2 instances start to use a lot more CPU and memory, and after 10-30 minutes RabbitMQ stops answering health checks (rabbitmqctl status), which leads Kubernetes to restart the containers. Removing the health checks doesn't solve the problem.
I've performed tests with an HA queue policy and without any policies; the result is the same. To me it looks like RabbitMQ cannot stabilize its state after a restart.

I see a lot of errors like the following in the log on the restarted node after it comes back online:
2020-05-26 09:24:46.450 [error] emulator Discarding message {'$gen_call',{<0.7665.0>,#Ref<0.480670917.2464677890.243959>},stat} from <0.7665.0> to <0.20346.24> in an old incarnation (1) of this node (2)


And I see the following errors in the logs of the other instances:
2020-05-26 06:28:20.971 [error] <0.1587.0> Channel error on connection <0.1571.0> (10.28.4.7:40624 -> 10.28.4.6:8883, vhost: '/', user: 'user-0000001'), channel 2:
operation queue.declare caused a channel exception not_found: home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible
2020-05-26 06:28:20.971 [debug] <0.1585.0> Supervisor {<0.1585.0>,rabbit_channel_sup} started rabbit_channel:start_link(2, <0.1584.0>, <0.1584.0>, <0.1571.0>, <<"10.28.4.7:40624 -> 10.28.4.6:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"user-0000001.">>,[testclients],[{rabbit_auth_backend_http,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.1575.0>, <0.1586.0>, {amqp_params_direct,<<"user-0000001.">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',...}) at pid <0.1587.0>
2020-05-26 06:28:20.972 [debug] <0.1581.0> auth_backend_http: POST http://rmqauth:8000/resource
2020-05-26 06:28:20.976 [debug] <0.1581.0> auth_backend_http: response code is 200, body: "allow"
2020-05-26 06:28:20.976 [error] <0.1581.0> Channel error on connection <0.1571.0> (10.28.4.7:40624 -> 10.28.4.6:8883, vhost: '/', user: 'user-0000001.'), channel 1:
operation queue.declare caused a channel exception not_found: home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible
2020-05-26 06:28:20.978 [error] <0.1567.0> ** Generic server <0.1567.0> terminating
** Last message in was {ssl,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},<<16,80,0,4,77,81,84,84,4,44,0,60,0,17,48,48,48,48,48,48,49,46,99,108,111,117,100,110,111,100,101,0,46,36,83,89,83,47,98,114,111,107,101,114,47,99,111,110,110,101,99,116,105,111,110,47,48,48,48,48,48,48,49,46,99,108,111,117,100,110,111,100,101,47,115,116,97,116,101,0,1,48>>}
** When Server state == {state,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},"10.28.4.7:40624 -> 10.28.4.6:8883",true,undefined,false,running,{none,none},<0.1566.0>,false,none,{proc_state,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"testclientss">>,{amqp_adapter_info,{0,0,0,0,0,65535,2588,1030},8883,{0,0,0,0,0,65535,2588,1031},40624,<<"10.28.4.7:40624 -> 10.28.4.6:8883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000001.,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T06:26:39Z - 2034-02-02T06:26:39Z">>}]},<<"user-0000001.">>,undefined,undefined,#Fun<rabbit_mqtt_processor.0.93352367>,{0,0,0,0,0,65535,2588,1031},#Fun<rabbit_mqtt_util.4.116266148>,#Fun<rabbit_mqtt_util.5.116266148>,undefined},undefined,{state,fine,5000,undefined}}
** Reason for termination ==
** {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}}
2020-05-26 06:28:20.978 [error] <0.1567.0> CRASH REPORT Process <0.1567.0> with 0 neighbours exited with reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}} in gen_server2:terminate/3 line 1183
2020-05-26 06:28:20.979 [error] <0.1565.0> Supervisor {<0.1565.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1566.0>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.1567.0> exit with reason {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}} in context child_terminated
2020-05-26 06:28:20.979 [debug] <0.569.0> MQTT connection with client id '0000001.testclients' failed
2020-05-26 06:28:20.979 [error] <0.1565.0> Supervisor {<0.1565.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1566.0>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.1567.0> exit with reason reached_max_restart_intensity in context shutdown
2020-05-26 06:28:20.981 [error] <0.1571.0> ** Generic server <0.1571.0> terminating

and
2020-05-26 16:47:55.818 [debug] <0.653.0> MQTT connection with client id '0001292.testclients' failed
2020-05-26 16:47:55.820 [warning] <0.18137.4> Channel (<0.18137.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.822 [error] <0.18137.4> ** Generic server <0.18137.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17759.4>,<0.18100.4>,direct,{[{{<0.17233.4>,#Ref<0.1213858664.1769996290.220559>},<0.17233.4>,{'queue.bind',0,<<"mqtt-subscription-0002361.testclientsqos1">>,<<"testclients">>,<<"cloud.0002361.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18174.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.827 [error] <0.17101.4> CRASH REPORT Process <0.17101.4> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.18496.4>,{call,{'queue.bind',0,<<"mqtt-subscription-0002120.testclientsqos1">>,<<"testclients">>,<<"cloud.0002120.#">>,false,[]},none,<0.17101.4>},60000]}} in gen_server2:terminate/3 line 1183
2020-05-26 16:47:55.838 [error] <0.18083.4> ** Generic server <0.18083.4> terminating
** Last message in was closing_timeout
** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}
** Reason for termination ==
** {closing_timeout,normal}
2020-05-26 16:47:55.841 [warning] <0.18184.4> Channel (<0.18184.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.841 [error] <0.18184.4> ** Generic server <0.18184.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17806.4>,<0.18163.4>,direct,{[{{<0.17265.4>,#Ref<0.1213858664.1769996290.220562>},<0.17265.4>,{'queue.bind',0,<<"mqtt-subscription-0001877.testclientsqos1">>,<<"testclients">>,<<"cloud.0001877.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18285.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.842 [error] <0.18184.4> CRASH REPORT Process <0.18184.4> with 0 neighbours exited with reason: timed_out_flushing_channel in gen_server:handle_common_reply/8 line 751

and many other errors.

RabbitMQ version is 3.8.2 with Erlang 22.3.4.1, but the issue reproduces on version 3.7.15 as well.

I would appreciate any help in solving this problem.

Michael Klishin

May 26, 2020, 3:43:48 PM5/26/20
to rabbitm...@googlegroups.com

 

The not_found error about the durable queue's home node being down or inaccessible is one immediately visible smoking gun that is enough to form a hypothesis, knowing that we are dealing with MQTT clients.

 

Your clients likely run loops trying to “recover” from this exception (we have seen this with MQTT clients all too often) by repeating the same operation, which fails in the same way, creating a connection leak and/or a high connection churn scenario.

 

So what does all of that mean? Let’s see.

 

The message is explained in [1]. Some of the clients use a non-mirrored queue; when its hosting node goes down, the queue becomes inaccessible and any client operation on it will fail until the node is back online.

 

MQTT clients then eagerly retry the same thing over and over, likely opening more and more connections to the node [2] until it runs out of file descriptors. There are obvious metrics you can watch to prove this hypothesis right or wrong [3][4][5]. Once a node is out of file descriptors, it cannot accept CLI tool connections or new MQTT connections, or even write to its local disk. Pretty much nothing can degrade gracefully or fail predictably under these conditions, and there is no recovery other than releasing all connections or restarting the node.

 

The condition is caused by a connection leak, high connection churn, or both. The RabbitMQ Connections guide explains both, including what metrics are available for detection and how (to some extent) they can be mitigated.

 

A durable non-mirrored classic queue works as expected in your test: it will not “recover” or “migrate” to another node, it will simply become unavailable until the hosting node comes back up.

Perhaps enable mirroring to two out of three nodes.
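For example, a policy along these lines would keep two replicas of each MQTT subscription queue. This is a sketch: the policy name and the queue-name pattern are illustrative, not taken from your setup, so adjust them to match your naming scheme.

```shell
# Hypothetical policy: mirror queues matching "mqtt-subscription-"
# to exactly 2 of the 3 nodes, with automatic synchronisation.
rabbitmqctl set_policy -p / ha-mqtt "^mqtt-subscription-" \
  '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}' \
  --apply-to queues
```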

 

It’s a good question how quorum queues could be exposed to MQTT clients, given that MQTT operations cannot be extended with optional headers the way, say, STOMP frames can.

 

I cannot comment on your “ha queue” tests since the log message clearly states a condition where this queue is not mirrored. A node can run out of file descriptors even with mirrored queues; we really need some metrics to work with here.

 

  1. https://www.rabbitmq.com/ha.html#non-mirrored-queue-behavior-on-node-failure
  2. https://www.rabbitmq.com/connections.html#monitoring
  3. https://www.rabbitmq.com/networking.html#open-file-handle-limit
  4. https://www.rabbitmq.com/monitoring.html
  5. https://www.rabbitmq.com/prometheus.html

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/a4a9dec2-ee7a-4af4-86b4-2faaab9dfd13%40googlegroups.com.

Michael Klishin

May 26, 2020, 3:52:50 PM5/26/20
to rabbitm...@googlegroups.com

Another possible scenario is that the nodes use the default open file descriptor limit (1024 on many UNIX-y systems). With 3000 clients that is barely enough to sustain them and a bit of disk I/O, even given an even connection distribution.

 

Once one node is stopped, its clients reconnect to the other nodes, running them out of file descriptors. This would also explain the higher memory usage: contrary to popular belief, TCP connections are not free [1] in terms of resources, and going from 3 nodes to 2 should increase per-node connection memory consumption alone by a factor of 1.5. The math is more or less the same for queue master replicas.
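The 1.5 factor is simple arithmetic; here is the calculation with the numbers from this thread, assuming an even connection distribution:

```shell
# 3000 connections over 3 nodes = 1000 sockets per node.
# After one node stops and its clients reconnect, the remaining
# 2 nodes carry 1500 each -- a 1.5x increase per surviving node.
CONNECTIONS=3000
NODES=3
PER_NODE=$((CONNECTIONS / NODES))
AFTER_ONE_DOWN=$((CONNECTIONS / (NODES - 1)))
echo "per node before: $PER_NODE, after one node down: $AFTER_ONE_DOWN"
```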

 

Of course, this is also mere speculation, since we don’t have any metrics to work with at the moment.

 

But there is no need to guess: RabbitMQ nodes can tell you what uses the most memory, you just have to ask [2] and monitor the breakdown. RabbitMQ Grafana dashboards [3] can store and display the same breakdown metrics over time, while CLI tools and the management UI can only cover substantially shorter periods of time.
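Asking a live node for the breakdown is a one-liner (this must run against a running node, so it is shown here only as a sketch):

```shell
# Print this node's memory use broken down by category (connection
# processes, queue processes, binaries, ETS tables, ...), in MB.
rabbitmq-diagnostics memory_breakdown --unit mb
```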

 

  1. https://www.rabbitmq.com/networking.html#tuning-for-large-number-of-connections
  2. https://www.rabbitmq.com/memory-use.html
  3. https://www.rabbitmq.com/prometheus.html

Sergey Grigorev

May 27, 2020, 10:01:42 AM5/27/20
to rabbitmq-users
Thank you for the really fast reply!

I've prepared screenshots with metrics from Grafana. At the beginning, I ran 3000 clients and then restarted rabbitmq-2 at about 15:19. The screenshot shows that 3 nodes are up, but actually, after I restarted the node, there was no period when all 3 instances were up. They only became healthy after I stopped the MQTT connections and restarted all RabbitMQ instances.
As you can see, there was no high usage of file descriptors. After I rebooted the node, the number of channels kept increasing and rabbitmq-2 stopped receiving queues.

rmq-metrics-1.png

rmq-mertics-2.png

rmq-metrics-3.png


I entered the container with RabbitMQ during this "unhealthy" state and checked the running processes. Is it normal to have several identical processes? I mean processes like "/usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B ..."
# ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
rabbitmq       1  0.0  0.0   1640  1240 ?        Ss   12:47   0:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
rabbitmq     118  0.0  0.0   1004   616 ?        S    12:47   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/epmd -daemon
rabbitmq     167 23.3  2.6 1471752 359596 ?      Sl   12:47   3:21 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 128000 -K true -sbwt none -sbwtdcpu none
rabbitmq     266  0.0  0.0    780   484 ?        Ss   12:47   0:00 erl_child_setup 1048576
rabbitmq     292  0.0  0.0    812    12 ?        Ss   12:47   0:00 inet_gethost 4
rabbitmq     293  0.0  0.0    812    44 ?        S    12:47   0:00 inet_gethost 4
root         441  0.1  0.3 1117228 48040 ?       Ssl  12:48   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root         536  0.0  0.0    784    16 ?        Ss   12:48   0:00 erl_child_setup 1048576
root         555  0.0  0.0    816    16 ?        Ss   12:48   0:00 inet_gethost 4
root         556  0.0  0.0    816    44 ?        S    12:48   0:00 inet_gethost 4
root         590  0.1  0.3 1115692 46800 ?       Ssl  12:48   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root         683  0.0  0.0    784    16 ?        Ss   12:48   0:00 erl_child_setup 1048576
root         704  0.0  0.0    816    16 ?        Ss   12:48   0:00 inet_gethost 4
root         705  0.0  0.0    816    52 ?        S    12:48   0:00 inet_gethost 4
root         784  0.1  0.3 1117740 48172 ?       Ssl  12:50   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root         876  0.1  0.0    784    16 ?        Ss   12:50   0:00 erl_child_setup 1048576
root         895  0.0  0.0    816    16 ?        Ss   12:50   0:00 inet_gethost 4
root         896  0.0  0.0    816    44 ?        S    12:50   0:00 inet_gethost 4
root         901  0.1  0.3 1116716 47684 ?       Ssl  12:51   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root         993  0.1  0.0    784    16 ?        Ss   12:51   0:00 erl_child_setup 1048576
root        1014  0.0  0.0    816    16 ?        Ss   12:51   0:00 inet_gethost 4
root        1015  0.0  0.0    816    44 ?        S    12:51   0:00 inet_gethost 4
root        1016  0.0  0.0   2532  2232 pts/0    Ss   12:51   0:00 bash
root        1104  0.1  0.3 1118252 48348 ?       Ssl  12:52   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1196  0.1  0.0    784    16 ?        Ss   12:52   0:00 erl_child_setup 1048576
root        1217  0.0  0.0    816    16 ?        Ss   12:52   0:00 inet_gethost 4
root        1218  0.0  0.0    816    48 ?        S    12:52   0:00 inet_gethost 4
root        1222  0.1  0.3 1117228 47632 ?       Ssl  12:53   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1314  0.1  0.0    784    16 ?        Ss   12:53   0:00 erl_child_setup 1048576
root        1335  0.0  0.0    816    16 ?        Ss   12:53   0:00 inet_gethost 4
root        1336  0.0  0.0    816    48 ?        S    12:53   0:00 inet_gethost 4
root        1420  0.2  0.3 1117740 48100 ?       Ssl  12:55   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1512  0.1  0.0    784    16 ?        Ss   12:55   0:00 erl_child_setup 1048576
root        1531  0.0  0.0    816    16 ?        Ss   12:55   0:00 inet_gethost 4
root        1532  0.0  0.0    816    48 ?        S    12:55   0:00 inet_gethost 4
root        1535  0.2  0.3 1117228 47276 ?       Ssl  12:55   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1627  0.1  0.0    784    16 ?        Ss   12:55   0:00 erl_child_setup 1048576
root        1648  0.0  0.0    816    16 ?        Ss   12:55   0:00 inet_gethost 4
root        1649  0.0  0.0    816    48 ?        S    12:55   0:00 inet_gethost 4
root        1730  0.3  0.3 1117228 48304 ?       Ssl  12:57   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1823  0.2  0.0    784    16 ?        Ss   12:57   0:00 erl_child_setup 1048576
root        1844  0.0  0.0    816    16 ?        Ss   12:57   0:00 inet_gethost 4
root        1845  0.0  0.0    816    48 ?        S    12:57   0:00 inet_gethost 4
root        1846  0.3  0.3 1117228 48052 ?       Ssl  12:57   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        1939  0.2  0.0    784    16 ?        Ss   12:57   0:00 erl_child_setup 1048576
root        1960  0.0  0.0    816    16 ?        Ss   12:57   0:00 inet_gethost 4
root        1961  0.0  0.0    816    44 ?        S    12:57   0:00 inet_gethost 4
root        2055  0.6  0.3 1118252 48368 ?       Ssl  12:59   0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root        2147  0.5  0.0    784    16 ?        Ss   12:59   0:00 erl_child_setup 1048576
root        2166  0.0  0.0    816    16 ?        Ss   12:59   0:00 inet_gethost 4
root        2167  0.0  0.0    816    44 ?        S    12:59   0:00 inet_gethost 4
root        3457  0.0  0.0   1620   540 pts/0    R+   13:01   0:00 ps aux


Mirrored queues configuration:
# rabbitmqctl list_policies    
Listing policies for vhost "/" ...
vhost   name    pattern apply-to        definition      priority
/       ha-queues       .*      queues  {"ha-mode":"exactly","ha-params":2,"queue-master-locator":"min-masters"}        0



Sergey Grigorev

Jun 2, 2020, 1:35:40 PM6/2/20
to rabbitmq-users
Today I noticed that RabbitMQ doesn't remove some non-durable queues after restarting an instance and stopping the MQTT connections. These queues should be removed when the MQTT connection is closed, because they have the auto-delete feature enabled. After I disabled mirroring for non-durable queues, that issue disappeared. Unfortunately, it is not the reason for the unhealthy cluster state after restarting a RabbitMQ instance. I can reproduce the main issue even with 1000 MQTT connections, but everything works well with 500 connections.
By the way, I noticed another problem with non-durable queues: RabbitMQ doesn't have the expected number of queues after restarting a node. For example, all 3000 clients reconnected, but the number of queues is 5339 instead of 6000; 661 non-durable queues (for QoS 0) are lost. I'm not sure why this happens, but I think it is related to the unhealthy state of the RabbitMQ cluster after restarting a node.



Michael Klishin

Jun 2, 2020, 1:50:04 PM6/2/20
to rabbitm...@googlegroups.com

Hi Sergey,

 

Perhaps my earlier responses did not make it to the list? You very likely experience a high connection churn condition [1] in your test. On top of that, the 1000-connection threshold sounds way too close to the default open file descriptor limit on Linux [2]. Have you verified that the nodes use a higher limit [3]? It’s not clear to me from the few Grafana charts you have posted.
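Two ways to verify this from inside the container, sketched here under the assumption that the beam.smp PID matches the `ps aux` output posted earlier (167 in that listing):

```shell
# Limit actually applied to the Erlang VM process (substitute the
# beam.smp PID from `ps aux`; 167 in the earlier listing):
grep "open files" /proc/167/limits

# Or ask the node itself; the reported limit should be comfortably
# above the expected connection count:
rabbitmq-diagnostics status | grep -A 3 "File Descriptors"
```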

 

A node that is out of file descriptors will not only be unable to accept new connections but also unable to *write data to disk* if doing so involves opening a new file, which can lead to all kinds of difficult-to-explain behaviors that pile up. According to the logs shared, your node runs into channel process flushing timeouts, which I can only attribute to very high connection churn [5]. How long-lived are your client connections?

 

According to the logs, you use the rabbitmq_auth_backend_http plugin. Are you aware of the load this plugin creates on the external HTTP service it uses? It can be substantial, as a new request is performed for *every* protocol operation. We highly recommend using it together with rabbitmq_auth_backend_cache [4].
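A minimal sketch of what enabling the cache could look like. The resource endpoint URL is taken from your logs; any other auth_http.* paths your backend needs (user_path, vhost_path, topic_path) must be configured as well, and the config file location is assumed:

```shell
# Enable the cache plugin, then wrap the HTTP backend in it so
# repeated authn/authz checks hit the cache instead of rmqauth.
rabbitmq-plugins enable rabbitmq_auth_backend_cache

cat >> /etc/rabbitmq/rabbitmq.conf <<'EOF'
auth_backends.1           = cache
auth_cache.cached_backend = http
auth_cache.cache_ttl      = 60000
auth_http.resource_path   = http://rmqauth:8000/resource
EOF
```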

 

  1. https://www.rabbitmq.com/connections.html
  2. https://www.rabbitmq.com/networking.html#open-file-handle-limit
  3. https://www.rabbitmq.com/production-checklist.html#resource-limits-file-handle-limit
  4. https://github.com/rabbitmq/rabbitmq-auth-backend-cache
  5. https://www.rabbitmq.com/connections.html#high-connection-churn

** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}


** Reason for termination ==
** {closing_timeout,normal}

 

and many other errors.

 

RabbitMQ version is 3.8.2, Erlang 22.3.4.1. But the issue is reproduced on version 3.7.15 as well.

 

I would appreciate any help in solving this problem.


Michael Klishin

Jun 2, 2020, 1:54:51 PM
to rabbitm...@googlegroups.com

I managed to open the screenshots at full size and can see that the open file handle limit is unlikely to be the issue.

 

Can full log files from all nodes be shared? We have one high-MQTT-connection scenario under investigation, so at the very least it would be great to see whether there may be a connection between them.

 

Besides the non-mirrored queue with an unavailable master, the only exception we have is a channel shutdown timeout, which in and of itself is just log noise but can indicate a node under some kind of stress. There isn't much here to narrow this one down further.

Michael Klishin

Jun 2, 2020, 2:04:11 PM
to rabbitm...@googlegroups.com

Not sure if you use team RabbitMQ’s Prometheus plugin and Grafana dashboards but the following metrics would also be interesting to see [1][2].

 

So far the biggest red flag I see on the charts is the build-up of channels. Not only is it a major problematic symptom in and of itself, it can and will create a lot of extra load on metric collection [3].

 

Now I am puzzled by how 1000 MQTT clients can lead to 30K channels at peak. Can you describe roughly what your clients are doing after connecting? Are there any code snippets we can use to reason about that? A traffic capture [4] or event trace [5]?

 

  1. https://www.rabbitmq.com/runtime.html#cpu
  2. https://www.rabbitmq.com/runtime.html#thread-stats
  3. https://www.rabbitmq.com/monitoring.html#monitoring-frequency
  4. https://www.rabbitmq.com/amqp-wireshark.html
  5. https://www.rabbitmq.com/logging.html#internal-events

Sergey Grigorev

Jun 3, 2020, 3:37:59 PM
to rabbitmq-users
Hello! Thank you very much for your comments!

Today I performed the stress test one more time. I started 3000 MQTT clients, and after all of them had connected I restarted the rabbitmq-2 instance. You can see the spike in CPU usage and a huge spike in network I/O on the screenshot after the container restart. That raises one more question: why was there so much network activity after the restart? Regarding the max open files limit, the metric didn't change after all clients connected.
Speaking about the auth backend, I have 4 running instances of the backend to spread the load, and I didn't notice that they were very busy. Do you think this could have such an effect? We plan to use the rabbitmq_auth_backend_cache plugin in the future.

What the stress test does.
I have a RabbitMQ cluster running in Kubernetes. I deploy containers there that simulate real clients. I start 25 mosquitto processes inside each container, and they connect to RabbitMQ. Then I start sending messages using the mosquitto_pub command inside the container, with a random delay (100 ms - 1000 ms), to a random mosquitto instance. Each mosquitto instance then forwards the messages to RabbitMQ. The RabbitMQ and mosquitto instances run on different nodes, so they shouldn't affect each other.
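The publish loop described above can be sketched as a dry run. Everything here (ports, topic names, the delay formula) is an illustrative assumption, since the real test scripts were not shared:

```shell
#!/bin/sh
# Dry-run sketch of the publish loop described above. Ports, topics, and the
# delay formula are illustrative assumptions; replace the echo with a real
# mosquitto_pub invocation to actually publish.
for i in 1 2 3 4 5; do
  port=$((1883 + i % 25))            # one of 25 local mosquitto listeners
  delay_ms=$((100 + i * 180 % 900))  # stand-in for a random 100-1000 ms delay
  echo "mosquitto_pub -p $port -t cloud/device$i -m payload-$i (after ${delay_ms} ms)"
done
```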

About RabbitMQ
As I said before, I use RabbitMQ version 3.8.3, but I also tried 3.7.15. List of enabled plugins: rabbitmq_management, rabbitmq_peer_discovery_k8s, rabbitmq_mqtt, rabbitmq_auth_backend_http, rabbitmq_prometheus. I set RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none". I used the rabbitmq_prometheus plugin to collect metrics and the RabbitMQ-Overview Grafana dashboard, to which I added graphs for CPU usage, memory usage, and network I/O.

Here is a link to download the logs (I cannot attach them because they are too big): https://drive.google.com/file/d/1Hv8CW2SLXTQsyGj06JMW-w3WrtNNXBj5/view?usp=sharing
Below you can find screenshots with metrics and the output of rabbitmq-diagnostics runtime_thread_stats when the cluster was in an unhealthy state. I hope this will help with the investigation of the problem.
I'll also try to simulate the same behavior without Kubernetes: I'll develop scripts that start the RMQ cluster, the auth backend, and mosquitto on a local machine in docker containers. I'll let you know if I succeed.
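A minimal docker-compose sketch of that local setup might look like this (image tags, service names, and ports are assumptions; the MQTT listener additionally requires the rabbitmq_mqtt plugin to be enabled in the image):

```yaml
# Hypothetical starting point for the local reproduction, not a tested setup.
services:
  rabbitmq-0:
    image: rabbitmq:3.8-management
    ports:
      - "5672:5672"   # AMQP
      - "1883:1883"   # MQTT (requires rabbitmq_mqtt enabled)
  mosquitto:
    image: eclipse-mosquitto:1.6
    depends_on:
      - rabbitmq-0
```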

rmq-metrics-1.png

rmq-metrics-2.png


Output of rabbitmq-diagnostics runtime_thread_stats from rabbitmq-0:
Average thread real-time    : 5001063 us
Accumulated system run-time : 6297497 us
Average scheduler run-time  : 2978465 us

        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 7)    0.04%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.08%    2.44%    0.00%   97.45%    0.00%
     async( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(11)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(13)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(14)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(15)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(16)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(17)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(18)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(19)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(20)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(21)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(22)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(23)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(24)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(25)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(26)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(27)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(28)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(29)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(30)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(31)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(32)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(33)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(34)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(35)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(36)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(37)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(38)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(39)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(40)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(41)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(42)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(43)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(44)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(45)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(46)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(47)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(48)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(49)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(50)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(51)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(52)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(53)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(54)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(55)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(56)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(57)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(58)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(59)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(60)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(61)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(62)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(63)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
       aux( 1)    0.00%    0.03%    0.00%    0.00%    0.05%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.92%    0.00%
dirty_cpu_( 1)    0.03%    0.00%    0.00%    0.00%    0.00%    0.08%    0.00%    0.73%    0.02%    0.00%    0.00%    0.00%    0.00%   99.14%    0.00%
dirty_cpu_( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.29%    0.00%    0.00%    0.00%    1.34%    0.01%    0.00%    0.00%   98.36%    0.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.32%    0.00%    0.00%    0.00%    1.30%    0.01%    0.00%    0.00%   98.37%    0.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.04%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.96%    0.00%
 scheduler( 1)    7.49%    1.36%    7.34%    0.30%    0.72%   17.35%    4.92%    2.14%    0.93%    1.83%    2.29%   12.12%    1.54%   38.94%    0.73%
 scheduler( 2)    6.76%    1.88%    6.01%    0.36%    0.81%   13.23%    3.18%    1.75%    0.64%    1.33%    3.09%   16.87%    1.29%   41.94%    0.86%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.04%    0.00%   99.96%    0.00%
           aux    0.00%    0.03%    0.00%    0.00%    0.05%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.92%    0.00%
dirty_cpu_sche    0.01%    0.00%    0.00%    0.00%    0.00%    0.04%    0.00%    0.37%    0.01%    0.00%    0.00%    0.00%    0.00%   99.57%    0.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%    0.00%    0.00%    0.26%    0.00%    0.00%    0.00%   99.67%    0.00%
          poll    0.00%    0.00%    0.00%    0.00%    0.04%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.96%    0.00%
     scheduler    7.12%    1.62%    6.67%    0.33%    0.77%   15.29%    4.05%    1.95%    0.78%    1.58%    2.69%   14.49%    1.42%   40.44%    0.79%

Output of rabbitmq-diagnostics runtime_thread_stats from rabbitmq-1:
Average thread real-time    : 5004544 us
Accumulated system run-time : 8780534 us
Average scheduler run-time  : 4071549 us

        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 7)    0.06%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.10%    1.96%    0.00%   97.88%    0.00%
     async( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(11)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(13)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(14)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(15)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(16)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(17)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(18)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(19)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(20)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(21)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(22)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(23)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(24)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(25)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(26)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(27)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(28)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(29)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(30)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(31)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(32)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(33)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(34)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(35)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(36)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(37)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(38)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(39)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(40)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(41)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(42)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(43)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(44)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(45)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(46)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(47)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(48)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(49)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(50)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(51)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(52)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(53)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(54)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(55)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(56)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(57)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(58)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(59)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(60)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(61)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(62)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(63)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
       aux( 1)    0.01%    0.04%    0.00%    0.00%    0.10%    0.00%    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%    0.00%   99.84%    0.00%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 2)    0.08%    0.00%    0.00%    0.00%    0.00%    0.08%    0.00%    1.46%    0.23%    0.00%    0.00%    0.00%    0.00%   98.15%    0.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 6)    0.01%    0.00%    0.00%    0.01%    0.00%    1.87%    0.00%    0.00%    0.00%    3.75%    0.07%    0.00%    0.00%   94.30%    0.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.01%    0.00%    0.46%    0.00%    0.00%    0.00%    2.36%    0.01%    0.00%    0.00%   97.15%    0.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.05%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.95%    0.00%
 scheduler( 1)    7.76%    1.19%   20.98%    0.17%    0.87%   23.61%    5.33%    2.90%    0.87%    2.60%    1.69%   11.20%    1.31%   18.72%    0.82%
 scheduler( 2)    8.59%    1.20%   16.91%    0.17%    0.99%   23.51%    3.32%    3.11%    0.84%    3.98%    2.12%   13.51%    2.32%   18.56%    0.87%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.03%    0.00%   99.97%    0.00%
           aux    0.01%    0.04%    0.00%    0.00%    0.10%    0.00%    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%    0.00%   99.84%    0.00%
dirty_cpu_sche    0.04%    0.00%    0.00%    0.00%    0.00%    0.04%    0.00%    0.73%    0.11%    0.00%    0.00%    0.00%    0.00%   99.07%    0.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.23%    0.00%    0.00%    0.00%    0.61%    0.01%    0.00%    0.00%   99.14%    0.00%
          poll    0.00%    0.00%    0.00%    0.00%    0.05%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   99.95%    0.00%
     scheduler    8.18%    1.19%   18.94%    0.17%    0.93%   23.56%    4.33%    3.00%    0.85%    3.29%    1.90%   12.36%    1.81%   18.64%    0.84%

Output of rabbitmq-diagnostics runtime_thread_stats from rabbitmq-2:
Average thread real-time    :  5000789 us
Accumulated system run-time : 11107729 us
Average scheduler run-time  :  4895317 us

        Thread    alloc      aux      bif busy_wait check_io emulator      ets       gc  gc_full      nif    other     port     send    sleep   timers

Stats per thread:
     async( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 3)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 7)    0.18%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.26%   11.46%    0.00%   88.10%    0.00%
     async( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(11)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(12)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(13)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(14)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(15)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(16)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(17)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(18)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(19)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(20)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(21)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(22)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(23)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(24)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(25)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(26)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(27)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(28)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(29)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(30)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(31)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(32)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(33)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(34)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(35)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(36)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(37)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(38)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(39)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(40)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(41)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(42)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(43)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(44)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(45)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(46)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(47)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%                                                                                                                          
     async(48)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(49)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(50)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(51)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(52)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(53)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(54)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(55)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(56)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(57)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(58)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(59)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(60)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(61)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(62)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     async(63)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
       aux( 1)    0.13%    0.48%    0.00%    0.00%    1.61%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   97.79%    0.00%
dirty_cpu_( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_cpu_( 2)    0.07%    0.00%    0.00%    0.00%    0.00%    0.11%    0.00%    0.63%    0.08%    0.00%    0.00%    0.00%    0.00%   99.11%    0.00%
dirty_io_s( 1)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 2)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 3)    0.00%    0.00%    0.00%    0.05%    0.00%    7.02%    0.00%    0.00%    0.00%    3.89%    0.36%    0.00%    0.00%   88.67%    0.00%
dirty_io_s( 4)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 5)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 6)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 7)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 8)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s( 9)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
dirty_io_s(10)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
      poll( 0)    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
 scheduler( 1)    8.06%    1.79%    6.86%    0.02%    1.85%   24.40%    3.01%    3.25%    0.01%    0.04%    3.77%   23.81%   20.48%    2.58%    0.07%
 scheduler( 2)    6.78%    1.83%    6.38%    0.02%    1.17%   37.59%    2.58%    2.57%    0.03%    0.19%    4.04%   19.99%   15.12%    1.63%    0.07%

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.18%    0.00%   99.81%    0.00%
           aux    0.13%    0.48%    0.00%    0.00%    1.61%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%   97.79%    0.00%
dirty_cpu_sche    0.03%    0.00%    0.00%    0.00%    0.00%    0.06%    0.00%    0.32%    0.04%    0.00%    0.00%    0.00%    0.00%   99.55%    0.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.01%    0.00%    0.70%    0.00%    0.00%    0.00%    0.39%    0.04%    0.00%    0.00%   98.87%    0.00%
          poll    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%    0.00%
     scheduler    7.42%    1.81%    6.62%    0.02%    1.51%   30.99%    2.80%    2.91%    0.02%    0.11%    3.90%   21.90%   17.80%    2.11%    0.07%




On Tuesday, June 2, 2020 at 21:04:11 UTC+3, Michael Klishin wrote:

** When Server state == {state,amqp_direct_connection,{state,'rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}


** Reason for termination ==
** {closing_timeout,normal}
2020-05-26 16:47:55.841 [warning] <0.18184.4> Channel (<0.18184.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.841 [error] <0.18184.4> ** Generic server <0.18184.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17806.4>,<0.18163.4>,direct,{[{{<0.17265.4>,#Ref<0.1213858664.1769996290.220562>},<0.17265.4>,{'queue.bind',0,<<"mqtt-subscription-0001877.testclientsqos1">>,<<"testclients">>,<<"cloud.0001877.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18285.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.842 [error] <0.18184.4> CRASH REPORT Process <0.18184.4> with 0 neighbours exited with reason: timed_out_flushing_channel in gen_server:handle_common_reply/8 line 751

 

and many other errors.

 

RabbitMQ version is 3.8.2, Erlang 22.3.4.1, but the issue reproduces on version 3.7.15 as well.

 

I would appreciate any help in solving this problem.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.

To unsubscribe from this group and stop receiving emails from it, send an email to rabbitm...@googlegroups.com.



Michael Klishin

Jun 3, 2020, 6:50:16 PM

to rabbitm...@googlegroups.com

Well, I have no evidence that an overloaded authN/authZ backend is playing a major role here, but the RabbitMQ MQTT frame processor cannot handle requests any faster than it receives responses from said backend. A backlog of pending operations can lead to timeouts, to which MQTT clients react as they usually do: by reconnecting and retrying exactly the same thing (server-to-client error communication is effectively non-existent in the MQTT 3.1.1 protocol).
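A common client-side mitigation for such retry storms (not something the broker provides; a sketch assuming the clients can control their own reconnect schedule) is exponential backoff with jitter:

```python
import random

def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff.

    Spreads reconnect attempts out in time so that thousands of MQTT
    clients do not all retry the exact same operation in lock-step.
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# Example: delays for the first few attempts (values are random each run).
delays = [reconnect_delay(n) for n in range(5)]
```

With a cap of 30 seconds, even clients that keep failing spread their reconnects over a wide window instead of hammering the node that just came back.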

 

According to the runtime thread stats, each node uses two cores, but they are not close to being maxed out. It may be that this snapshot covers a specific period of your test.

 

Unfortunately this reveals no new information that would allow me to come up with a hypothesis as to why your clients end up with tens of thousands of channels (which will swamp the 2 available cores and cause context switching to go up significantly).

 

Can the code that uses mosquitto_pub be shared? Are you certain that mosquitto_pub does not open a new connection per iteration? That's not generally how MQTT and other messaging protocols are expected to be used. We have a dedicated metric that detects such conditions [1].

 

  1. https://www.rabbitmq.com/connections.html#monitoring
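To illustrate the difference (broker hostname and topics are hypothetical; mosquitto_pub's -l flag reads one message per stdin line over a single connection):

```shell
BROKER=rabbitmq.default.svc.cluster.local   # hypothetical host

if command -v mosquitto_pub >/dev/null 2>&1; then
  # Anti-pattern: every invocation performs a full TCP/TLS connect, MQTT
  # CONNECT, PUBLISH and DISCONNECT -- with 3000 clients looping like this,
  # the result is pure connection churn.
  for i in 1 2 3; do
    mosquitto_pub -h "$BROKER" -p 8883 -t "cloud/$i/data" -m ping
  done

  # Better: one long-lived connection; -l publishes one message per line.
  printf 'msg1\nmsg2\nmsg3\n' | mosquitto_pub -h "$BROKER" -p 8883 -t cloud/data -l
fi

# The 100-1000 ms random delay from the stress test, expressed in shell:
delay_ms=$(( (RANDOM % 901) + 100 ))
```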

 

On 03.06.2020, 22:38, rabbitm...@googlegroups.com on behalf of Sergey Grigorev wrote:

 

Hello! Thank you very much for your comments!

 

Today I performed the stress test one more time. I started 3000 MQTT clients, and after all of them had connected I restarted the rabbitmq-2 instance. On the screenshot you can see the spike in CPU usage and the huge spike in network I/O after the container was restarted. That raises one more question: why was there so much network activity after the restart? Regarding the max open files limit, that metric didn't change after all clients connected.

Speaking of the auth backend, I have 4 instances of it running to spread the load, and I didn't notice them being particularly busy. Do you think this could have such an effect? We plan to use the rabbitmq_auth_backend_cache plugin in the future.

 

What the stress test does:

I have a RabbitMQ cluster running in Kubernetes. I deploy containers there that simulate real clients: I start 25 mosquitto processes inside each container, and they connect to RabbitMQ. Then I start sending messages, using the mosquitto_pub command inside the container, with a random delay (100 ms - 1000 ms) to a random mosquitto instance, which then forwards them to RabbitMQ. RabbitMQ and the mosquitto instances run on different nodes, so they shouldn't affect each other.

 

About RabbitMQ

As I said before, I use RabbitMQ version 3.8.3, but I also tried 3.7.15. The list of enabled plugins: rabbitmq_management, rabbitmq_peer_discovery_k8s, rabbitmq_mqtt, rabbitmq_auth_backend_http, rabbitmq_prometheus. I set RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none". I used the rabbitmq_prometheus plugin to collect metrics with the RabbitMQ-Overview Grafana dashboard, to which I added graphs for CPU usage, memory usage and network I/O.

 

Here is a link to download the logs (I cannot attach them because they are too big): https://drive.google.com/file/d/1Hv8CW2SLXTQsyGj06JMW-w3WrtNNXBj5/view?usp=sharing

Below you can find screenshots with metrics and the output of rabbitmq-diagnostics runtime_thread_stats from when the cluster was in an unhealthy state. I hope this helps with the investigation of the problem.

I'll also try to reproduce the same behavior without Kubernetes: I'll develop scripts that start the RMQ cluster, the auth backend and mosquitto in Docker containers on a local machine. I'll let you know if I succeed.

 

** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}


Michael Klishin

Jun 3, 2020, 6:53:42 PM

to rabbitm...@googlegroups.com

Given that this is a local test, you should be able to take a traffic capture using tcpdump, Wireshark or any other libpcap-based tool [1] and inspect/share it. I assume your tests use default or dummy credentials and connect to a LAN host?

 

And please enable rabbitmq_auth_backend_cache when you do, to eliminate one variable. The plugin really doesn't take a lot of time to hook up. It would also significantly reduce the size of the capture if your authN service's traffic is going to be included (which should not be particularly important).

 

  1. https://www.rabbitmq.com/amqp-wireshark.html
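For reference, putting the cache in front of the HTTP backend is a few lines of rabbitmq.conf (the TTL value here is an arbitrary example):

```ini
# rabbitmq.conf -- cache results from the HTTP authN/authZ backend
auth_backends.1 = cache
auth_cache.cached_backend = http

# how long cached decisions are kept, in milliseconds (example value)
auth_cache.cache_ttl = 30000
```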

 

On 03.06.2020, 22:38, rabbitm...@googlegroups.com on behalf of Sergey Grigorev wrote:

 

Hello! Thank you very much for your comments!

 

Today I performed the stress test one more time. I started 3000 MQTT clients, and after all clients connected I restarted the rabbitmq-2 instance. You can see the spike in CPU usage and huge spike in network I/O on the screenshot after restarting the container. It is one more question. Why there was a big network activity after restarting? Regarding max open files limit, the metric wasn't changed after all clients connected.

Speaking of the auth backend, I have 4 instances of it running to spread the load, and I didn't notice them being particularly busy. Do you think it could still have such an effect? We plan to use the rabbitmq_auth_backend_cache plugin in the future.

 

What the stress test does:

I have a RabbitMQ cluster running in Kubernetes. I deploy containers there that simulate real clients: I start 25 mosquitto processes inside each container, and they connect to RabbitMQ. Then I start sending messages with the mosquitto_pub command inside the container, with a random delay (100 ms - 1000 ms) between messages, to a randomly chosen mosquitto instance. Each mosquitto then forwards the messages to RabbitMQ. The RabbitMQ and mosquitto instances run on different nodes, so they shouldn't affect each other.

 

About RabbitMQ

As I said before, I use RabbitMQ 3.8.3, but I have also tried 3.7.15. Enabled plugins: rabbitmq_management, rabbitmq_peer_discovery_k8s, rabbitmq_mqtt, rabbitmq_auth_backend_http, rabbitmq_prometheus. I set RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none". I used the rabbitmq_prometheus plugin to collect metrics together with the RabbitMQ-Overview Grafana dashboard, to which I added graphs for CPU usage, memory usage and network I/O.
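In case it matters, this is roughly how that configuration is wired up (a sketch; the env var can equally go into rabbitmq-env.conf or the container spec, and file locations depend on the installation):

```erlang
%% enabled_plugins file
[rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_mqtt,
 rabbitmq_auth_backend_http,rabbitmq_prometheus].
```

with the scheduler flags passed through the environment:

```ini
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none"
```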

 

Here is a link to download the logs (I can't attach them because they are too big): https://drive.google.com/file/d/1Hv8CW2SLXTQsyGj06JMW-w3WrtNNXBj5/view?usp=sharing

And below you can find screenshots with metrics and the output of rabbitmq-diagnostics runtime_thread_stats taken while the cluster was in the unhealthy state. I hope this helps with the investigation of the problem.

I'll also try to simulate the same behavior without Kubernetes: I'll write scripts that start the RMQ cluster, auth backend and mosquitto in Docker containers on a local machine. I'll let you know if I succeed.

 


Sergey Grigorev

unread,
Jun 4, 2020, 1:24:50 PM6/4/20
to rabbitmq-users
I've enabled the auth backend cache plugin; the result is the same.

Are you certain that mosquitto_pub does not open a new connection per iteration?
Yes. mosquitto_pub is a tool that ships with mosquitto and is used to send messages to it: it opens a connection to a mosquitto instance, sends a message and closes the connection. Each mosquitto is configured to connect to a random RMQ instance and should reconnect only if that connection closes, and it routes the incoming messages (from mosquitto_pub) to RabbitMQ.
Here is the script that simulates clients:
#!/bin/bash

RMQ_HOST=${RMQ_HOST:-localhost}
START_MOSQUITTO_INDEX=${START_MOSQUITTO_INDEX:-0}
END_MOSQUITTO_INDEX=${END_MOSQUITTO_INDEX:-25}
MIN_MSG_SEND_DELAY_MS=${MIN_MSG_SEND_DELAY_MS:-100}
MAX_MSG_SEND_DELAY_MS=${MAX_MSG_SEND_DELAY_MS:-1000}
# internal number of the mosquitto instance; used to calculate its port number
INTERNAL_NUMBER=0

for COMMON_NUMBER in $(seq $START_MOSQUITTO_INDEX $END_MOSQUITTO_INDEX)
do
  # zero-pad the client number to 7 digits
  PANEL_ID=$(echo "$(printf '0%.0s' $(seq 1 $((7-${#COMMON_NUMBER}))))$COMMON_NUMBER")
  PORT=$((1800 + INTERNAL_NUMBER))
  DATA_DIR="/opt/mosquitto/$COMMON_NUMBER/data"
  CERTS_DIR="/opt/mosquitto/$COMMON_NUMBER/certs"
  mkdir -p "$DATA_DIR"
  mkdir -p "$CERTS_DIR"

  # generate a client cert
  openssl req -nodes -newkey rsa:2048 -keyout "$CERTS_DIR/client.key" -out "$CERTS_DIR/client.csr" -subj "/C=US/ST=Washington/O=Test/OU=client/CN=panel-$PANEL_ID.rmq-test.com"
  # sign the client cert
  openssl x509 -req -days 5000 -in "$CERTS_DIR/client.csr" -CA /etc/ssl/CA.crt -CAkey /etc/ssl/CA.key -CAcreateserial -out "$CERTS_DIR/client.crt"

  # prepare the per-instance config from the template
  CONFIG_PATH="/opt/mosquitto/$COMMON_NUMBER/mosquitto.conf"
  sed "s|PANEL_ID|$PANEL_ID|g" /etc/mosquitto.conf > "$CONFIG_PATH"
  sed -i "s|RMQ_HOST|$RMQ_HOST|g" "$CONFIG_PATH"
  sed -i "s|CERTS_DIR|$CERTS_DIR|g" "$CONFIG_PATH"
  sed -i "s|DATA_DIR|$DATA_DIR|g" "$CONFIG_PATH"
  sed -i "s|MQTT_PORT|$PORT|g" "$CONFIG_PATH"

  /usr/sbin/mosquitto -c "$CONFIG_PATH" &

  INTERNAL_NUMBER=$((INTERNAL_NUMBER + 1))
done

# periodically send a message to a random mosquitto instance
while true
do
  # ports run from 1800 to 1800 + INTERNAL_NUMBER - 1
  PORT=$(shuf -i 1800-$((1800 + INTERNAL_NUMBER - 1)) -n 1)
  mosquitto_pub -p "$PORT" -t test/message -m '{"id": "7a54ec4b-9ee2-42c9-a797-938e03b7eff0", "message": "hello"}'
  # random delay between MIN_MSG_SEND_DELAY_MS and MAX_MSG_SEND_DELAY_MS
  RANDOM_DELAY=$(shuf -i "$MIN_MSG_SEND_DELAY_MS"-"$MAX_MSG_SEND_DELAY_MS" -n 1)
  # convert the millisecond delay to a fractional seconds value
  DELAY=$(bc <<< "scale=2; $RANDOM_DELAY/1000")
  sleep "$DELAY"
done
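By the way, the two fiddly parts of the script (the zero-padding and the ms-to-seconds conversion) can be written more directly. This is just an equivalent sketch using printf and awk, not what the test actually runs:

```shell
# Zero-pad a client number to 7 digits (equivalent to the seq-based PANEL_ID line)
PANEL_ID=$(printf '%07d' 7)
echo "$PANEL_ID"    # prints 0000007

# Convert a millisecond delay into fractional seconds for sleep,
# using awk so it also works where bc is not installed
RANDOM_DELAY=250
DELAY=$(awk -v ms="$RANDOM_DELAY" 'BEGIN { printf "%.2f", ms / 1000 }')
echo "$DELAY"       # prints 0.25
```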

I forgot to mention: I've also tried limiting the number of channels by setting channel_max = 100. It didn't help.
When I restart a RabbitMQ instance, all mosquitto instances that were connected to it try to connect to another node. What is RabbitMQ's behavior in this case? Will it accept the connections? And what happens when the restarted instance comes back up? Will it remove its channels?
In the normal state there are two channels for each MQTT connection. But in the unhealthy state there are more than 2 channels per connection, for example several on rabbitmq-0 and several on rabbitmq-1.
It looks like after a node goes down, the 2 other instances start synchronization while also accepting new MQTT connections, and that leads to some race condition or something like that, I don't know.

Regarding traffic capturing, I've done it. The dump files are here: https://drive.google.com/drive/folders/1dzqTacwW-8L9UXdH6f074ItNIL5f0Enw?usp=sharing. rmq-0-dump-v2.tar.gz and rmq-1-dump-v2.tar.gz contain less data; for the others, I captured traffic for about 10 minutes.

On Thursday, June 4, 2020 at 1:50:16 AM UTC+3, Michael Klishin wrote:
...

Michael Klishin

unread,
Jun 4, 2020, 4:36:03 PM6/4/20
to rabbitm...@googlegroups.com

We know what mosquitto_pub is.

 

I’m not sure what you are trying to test, but running mosquitto_pub in a loop will eventually break. This workload creates a maximum possible connection churn scenario. The complications from it can vary, but connection and channel build-up are guaranteed, and channel termination timeouts are more or less expected.

High connection churn is a pretty well documented condition [1].

Duplicate IDs would require the MQTT broker to disconnect existing connections with the same ID, which obviously does not help with the connection churn rate. Your script and [2] lead me to believe that mosquitto_pub will generate a new client ID for each loop iteration. Which, if you use QoS 1, will also leak the queues used by the MQTT plugin. They have a TTL, but it’s 24 hours or so, and in a tight connection loop it might as well not exist.

A realistic deployment will sustain, say, a few hundred new connections a second, and that can be a reasonable workload to test. But with a tight infinite loop, connection and channel build-up is inevitable and will never ease. Environments where such workloads are possible should use HAProxy or similar to limit the inbound connection rate, and aggressively tune for rapid disposition of TCP sockets, as [1] mentions. RabbitMQ itself has a configurable number of TCP connection acceptors but does not provide rate limiting.

Heartbeats [3] (called keepalives in MQTT, with no direct relation to TCP keepalives) can be useful, but values < 5 seconds are known to be risky in practice, and again, in a tight loop such intervals likely won’t be very effective.

So you can tune socket settings and related kernel settings and achieve a certain rate (say, 300-500 connections a second) after which the connection and Erlang process build-up on the node becomes sustainable. In practice such workloads are a guaranteed way to run into operational headaches (and not just with RabbitMQ; there is no shortage of posts about such conditions with HTTP servers, databases and so on), so consider using a load testing tool that does not create a connection churn scenario. I haven’t evaluated MQTT 3.1 ones in a while, but you want something that works similarly to PerfTest [4], just using MQTT, and exposes its relevant features.

  1. https://www.rabbitmq.com/networking.html#dealing-with-high-connection-churn
  2. https://mosquitto.org/man/mosquitto_pub-1.html
  3. https://www.rabbitmq.com/heartbeats.html
  4. https://rabbitmq.github.io/rabbitmq-perf-test/stable/htmlsingle/
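To give an idea, "aggressively tune for rapid disposition of TCP sockets" usually means kernel settings along these lines (illustrative values only, not a recommendation; see [1] above for context):

```ini
# /etc/sysctl.conf (sketch)
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 10000 65535
```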

...


Michael Klishin

unread,
Jun 4, 2020, 5:20:30 PM6/4/20
to rabbitm...@googlegroups.com

In case it’s not obvious from the PerfTest docs: it opens N publisher and M consumer connections and uses them for the duration of the run.

FWIW this is how messaging protocols are assumed to be used, even though some connection churn is definitely expected with some “IoT workloads” and even “local microservice” ones as well.

While googling for discussions of this I randomly found [1], which is rather long but explains the trade-offs involved in dealing with accumulating connections, specifically in the TIME_WAIT state.

In theory, limiting the number of TCP acceptors from the default of 10 down to 1 might limit the ingress, but on a 2-core node I suspect a tight loop of new connections would take over anyway. It won’t take long to try, though [2].

  1. http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
  2. https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.conf.example#L37
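For reference, the setting from [2] looks like this in rabbitmq.conf (a sketch; these are the AMQP listener keys from the annotated example file, and protocol plugins such as rabbitmq_mqtt have their own listener settings):

```ini
# rabbitmq.conf (sketch)
num_acceptors.tcp = 1
num_acceptors.ssl = 1
```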

Sergey Grigorev

unread,
Jun 5, 2020, 11:40:42 AM6/5/20
to rabbitmq-users
Thank you for the useful links!
I disabled sending messages and set num_acceptors to 1, but this didn't help either. Unfortunately, I couldn't tune the net.ipv4.tcp_* settings because Kubernetes doesn't allow changing them; I would need to configure the cluster to permit that, and I didn't have time for it. Also, I can reproduce the issue with 1500 MQTT connections that don't send any messages via mosquitto_pub; they just keep their connections to RMQ open. Is that a high load for a RabbitMQ cluster?
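(For what it's worth, Kubernetes can expose net.* sysctls per pod via the pod securityContext, but the net.ipv4.tcp_* ones count as "unsafe" and have to be allowlisted on the kubelet first. A sketch of what that would look like, which I haven't tried yet:)

```yaml
# pod spec fragment
# requires kubelet started with --allowed-unsafe-sysctls=net.ipv4.tcp_fin_timeout
securityContext:
  sysctls:
  - name: net.ipv4.tcp_fin_timeout
    value: "30"
```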

You were right: several mosquitto instances start reconnecting to RabbitMQ, and they start doing it right after I restart an instance. The question is: why can't they connect?

In the mosquitto logs I see:
1591348858: Socket error on client local.0000007.client, disconnecting.

Here are logs from RabbitMQ:
2020-06-05 09:23:16.370 [debug] <0.2375.200> Received a CONNECT, client ID: "0000007.client" (expanded to "0000007.client"), username: undefined, clean session: false, protocol version: 4, keepalive: 300
2020-06-05 09:23:16.377 [debug] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} started amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3724.200>
2020-06-05 09:23:16.379 [debug] <0.3724.200> User 'panel-0000007.rmq-test.com' authenticated successfully by backend rabbit_auth_backend_cache
2020-06-05 09:23:16.407 [debug] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} started rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3838.200>
2020-06-05 09:23:16.409 [debug] <0.3842.200> Supervisor {<0.3842.200>,rabbit_channel_sup} started rabbit_channel:start_link(2, <0.3841.200>, <0.3841.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3843.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3844.200>
** When Server state == {state,{sslsocket,{gen_tcp,#Port<0.7330>,tls_connection,[{option_tracker,<0.15965.0>},{session_tickets_tracker,disabled}]},[<0.2294.200>,<0.2260.200>]},"10.28.5.74:43814 -> 10.28.4.15:8883",true,undefined,false,running,{none,none},<0.2325.200>,false,none,{proc_state,{sslsocket,{gen_tcp,#Port<0.7330>,tls_connection,[{option_tracker,<0.15965.0>},{session_tickets_tracker,disabled}]},[<0.2294.200>,<0.2260.200>]},#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"clients">>,{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},<<"panel-0000007.rmq-test.com">>,undefined,undefined,#Fun<rabbit_mqtt_processor.0.93352367>,{0,0,0,0,0,65535,2588,1354},#Fun<rabbit_mqtt_util.4.116266148>,#Fun<rabbit_mqtt_util.5.116266148>,undefined},undefined,{state,fine,10000,undefined}}
** {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}
2020-06-05 09:24:16.411 [error] <0.2375.200> CRASH REPORT Process <0.2375.200> with 1 neighbours exited with reason: {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in gen_server2:terminate/3 line 1183
2020-06-05 09:24:16.413 [error] <0.2317.200> Supervisor {<0.2317.200>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.2325.200>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.2375.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
** Last message in was {'EXIT',<0.2375.200>,{timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}}
** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',{user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]},<<"/">>,{amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000007.client">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000007.client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000007.client">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000007.client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},<0.3771.200>,undefined,1591348996378},<0.3770.200>,{amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...},...}
** "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                                           true,false,false,\n                                                           false,false,[]},\n                                                          none,<0.2375.200>},\n                                                         60000]}}"
2020-06-05 09:24:16.438 [error] <0.3777.200> Supervisor {<0.3777.200>,amqp_channel_sup} had child channel started with amqp_channel:start_link(direct, <0.3724.200>, 1, <0.3778.200>, {<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,1}) at <0.3779.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
2020-06-05 09:24:16.445 [error] <0.3724.200> CRASH REPORT Process <0.3724.200> with 0 neighbours exited with reason: "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                       ..." in gen_server:handle_common_reply/8 line 751
2020-06-05 09:24:16.446 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                       ..." in context child_terminated
2020-06-05 09:24:16.449 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason reached_max_restart_intensity in context shutdown
** Last message in was {'EXIT',<0.3779.200>,{timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}}
** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.3779.200>,<0.3779.200>,<0.3724.200>,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,undefined,{user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]},<<"/">>,<<>>,<0.3771.200>,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true}],none,10,134217728,undefined,#{<<"client_id">> => <<"0000007.client">>}},{lstate,<0.3837.200>,false},none,1,{0,{[],[]}},#{},{state,#{},erlang},#{},#{},{set,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},{state,fine,10000,undefined},false,1,{unconfirmed,{0,nil},#{},#{}},[],[],none,flow,[],#{},#Ref<0.3436069672.2065432577.135970>,1000000000}
** {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}
2020-06-05 09:24:16.449 [error] <0.3838.200> CRASH REPORT Process <0.3838.200> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in gen_server2:terminate/3 line 1183
2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason reached_max_restart_intensity in context shutdown
2020-06-05 09:24:19.452 [error] <0.3844.200> Channel error on connection <0.3724.200> (10.28.5.74:43814 -> 10.28.4.15:8883, vhost: '/', user: 'panel-0000007.rmq-test.com'), channel 2:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'mqtt-subscription-0000007.clientqos1' in vhost '/' due to timeout

I worry about this line:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'mqtt-subscription-0000007.clientqos1' in vhost '/' due to timeout

At that time, the queue 'mqtt-subscription-0000007.clientqos1' was shown as unsynchronised in the management UI.

Maybe this is the reason why mosquitto receives a socket error and tries to reconnect?


On Friday, June 5, 2020 at 12:20:30 AM UTC+3, Michael Klishin wrote:
...

Michael Klishin

unread,
Jun 9, 2020, 4:54:15 PM6/9/20
to rabbitm...@googlegroups.com

I’m pretty sure we have established that the issue is entirely with connection churn or the rapid initial connection rate, not message rates.

All the exceptions say is that a queue operation timed out. “Timed out” under a connection churn scenario can mean that the operation’s connection (and thus its underlying channel) were closed before the operation could complete.

My understanding is that the new connection *rate*, not the total number, is the key variable. I cannot say why without trying to reproduce.

I don’t have much to add at this point. Connection churn is a scenario that must be explicitly tuned for (not just with RabbitMQ) and avoided where possible. We have a somewhat similar report for MQTT with more data, and it will be investigated after 3.8.5 ships. It may be that the new client ID tracker in 3.8 is effectively a bottleneck in some scenarios (cluster-wide consensus sounds cool, but it certainly isn’t free in terms of latency or availability implications).

...


Sergey Grigorev

unread,
Jun 10, 2020, 10:22:25 AM6/10/20
to rabbitmq-users
Thank you for the response!

If I get any new information about the issue, then I will publish it here.

On Tuesday, June 9, 2020 at 11:54:15 PM UTC+3, Michael Klishin wrote:

...

** "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                                           true,false,false,\n                                                           false,false,[]},\n                                                          none,<0.2375.200>},\n                                                         60000]}}"

2020-06-05 09:24:16.438 [error] <0.3777.200> Supervisor {<0.3777.200>,amqp_channel_sup} had child channel started with amqp_channel:start_link(direct, <0.3724.200>, 1, <0.3778.200>, {<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,1}) at <0.3779.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated

2020-06-05 09:24:16.445 [error] <0.3724.200> CRASH REPORT Process <0.3724.200> with 0 neighbours exited with reason: "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                       ..." in gen_server:handle_common_reply/8 line 751

2020-06-05 09:24:16.446 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason "stopping because dependent process <0.2375.200> died: {timeout,\n                                                       {gen_server,call,\n                                                        [<0.3841.200>,\n                                                         {call,\n                                                          {'queue.declare',0,\n                                                           <<\"mqtt-subscription-0000007.clientqos1\">>,\n                                       ..." in context child_terminated

2020-06-05 09:24:16.449 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason reached_max_restart_intensity in context shutdown

** Last message in was {'EXIT',<0.3779.200>,{timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}}

** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.3779.200>,<0.3779.200>,<0.3724.200>,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,undefined,{user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]},<<"/">>,<<>>,<0.3771.200>,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true}],none,10,134217728,undefined,#{<<"client_id">> => <<"0000007.client">>}},{lstate,<0.3837.200>,false},none,1,{0,{[],[]}},#{},{state,#{},erlang},#{},#{},{set,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},{state,fine,10000,undefined},false,1,{unconfirmed,{0,nil},#{},#{}},[],[],none,flow,[],#{},#Ref<0.3436069672.2065432577.135970>,1000000000}

** {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}

2020-06-05 09:24:16.449 [error] <0.3838.200> CRASH REPORT Process <0.3838.200> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in gen_server2:terminate/3 line 1183

2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated

2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason reached_max_restart_intensity in context shutdown
