2020-05-26 09:24:46.450 [error] emulator Discarding message {'$gen_call',{<0.7665.0>,#Ref<0.480670917.2464677890.243959>},stat} from <0.7665.0> to <0.20346.24> in an old incarnation (1) of this node (2)
2020-05-26 06:28:20.971 [error] <0.1587.0> Channel error on connection <0.1571.0> (10.28.4.7:40624 -> 10.28.4.6:8883, vhost: '/', user: 'user-0000001'), channel 2:
operation queue.declare caused a channel exception not_found: home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible
2020-05-26 06:28:20.971 [debug] <0.1585.0> Supervisor {<0.1585.0>,rabbit_channel_sup} started rabbit_channel:start_link(2, <0.1584.0>, <0.1584.0>, <0.1571.0>, <<"10.28.4.7:40624 -> 10.28.4.6:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"user-0000001.">>,[testclients],[{rabbit_auth_backend_http,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.1575.0>, <0.1586.0>, {amqp_params_direct,<<"user-0000001.">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',...}) at pid <0.1587.0>
2020-05-26 06:28:20.972 [debug] <0.1581.0> auth_backend_http: POST http://rmqauth:8000/resource
2020-05-26 06:28:20.976 [debug] <0.1581.0> auth_backend_http: response code is 200, body: "allow"
2020-05-26 06:28:20.976 [error] <0.1581.0> Channel error on connection <0.1571.0> (10.28.4.7:40624 -> 10.28.4.6:8883, vhost: '/', user: 'user-0000001.'), channel 1:
operation queue.declare caused a channel exception not_found: home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible
2020-05-26 06:28:20.978 [error] <0.1567.0> ** Generic server <0.1567.0> terminating
** Last message in was {ssl,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},<<16,80,0,4,77,81,84,84,4,44,0,60,0,17,48,48,48,48,48,48,49,46,99,108,111,117,100,110,111,100,101,0,46,36,83,89,83,47,98,114,111,107,101,114,47,99,111,110,110,101,99,116,105,111,110,47,48,48,48,48,48,48,49,46,99,108,111,117,100,110,111,100,101,47,115,116,97,116,101,0,1,48>>}
** When Server state == {state,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},"10.28.4.7:40624 -> 10.28.4.6:8883",true,undefined,false,running,{none,none},<0.1566.0>,false,none,{proc_state,{sslsocket,{gen_tcp,#Port<0.179>,tls_connection,[{option_tracker,<0.585.0>},{session_tickets_tracker,disabled}]},[<0.1564.0>,<0.1563.0>]},#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"testclientss">>,{amqp_adapter_info,{0,0,0,0,0,65535,2588,1030},8883,{0,0,0,0,0,65535,2588,1031},40624,<<"10.28.4.7:40624 -> 10.28.4.6:8883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000001.,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T06:26:39Z - 2034-02-02T06:26:39Z">>}]},<<"user-0000001.">>,undefined,undefined,#Fun<rabbit_mqtt_processor.0.93352367>,{0,0,0,0,0,65535,2588,1031},#Fun<rabbit_mqtt_util.4.116266148>,#Fun<rabbit_mqtt_util.5.116266148>,undefined},undefined,{state,fine,5000,undefined}}
** Reason for termination ==
** {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}}
2020-05-26 06:28:20.978 [error] <0.1567.0> CRASH REPORT Process <0.1567.0> with 0 neighbours exited with reason: {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}} in gen_server2:terminate/3 line 1183
2020-05-26 06:28:20.979 [error] <0.1565.0> Supervisor {<0.1565.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1566.0>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.1567.0> exit with reason {{shutdown,{server_initiated_close,404,<<"NOT_FOUND - home node 'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local' of durable queue 'mqtt-subscription-0000001.testclientsqos1' in vhost '/' is down or inaccessible">>}},{gen_server,call,[<0.1578.0>,{call,{'queue.declare',0,<<"mqtt-subscription-0000001.testclientsqos1">>,false,true,false,false,false,[{<<"x-expires">>,long,86400000}]},none,<0.1567.0>},60000]}} in context child_terminated
2020-05-26 06:28:20.979 [debug] <0.569.0> MQTT connection with client id '0000001.testclients' failed
2020-05-26 06:28:20.979 [error] <0.1565.0> Supervisor {<0.1565.0>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.1566.0>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.1567.0> exit with reason reached_max_restart_intensity in context shutdown
2020-05-26 06:28:20.981 [error] <0.1571.0> ** Generic server <0.1571.0> terminating
2020-05-26 16:47:55.818 [debug] <0.653.0> MQTT connection with client id '0001292.testclients' failed
2020-05-26 16:47:55.820 [warning] <0.18137.4> Channel (<0.18137.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.822 [error] <0.18137.4> ** Generic server <0.18137.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17759.4>,<0.18100.4>,direct,{[{{<0.17233.4>,#Ref<0.1213858664.1769996290.220559>},<0.17233.4>,{'queue.bind',0,<<"mqtt-subscription-0002361.testclientsqos1">>,<<"testclients">>,<<"cloud.0002361.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18174.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.827 [error] <0.17101.4> CRASH REPORT Process <0.17101.4> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.18496.4>,{call,{'queue.bind',0,<<"mqtt-subscription-0002120.testclientsqos1">>,<<"testclients">>,<<"cloud.0002120.#">>,false,[]},none,<0.17101.4>},60000]}} in gen_server2:terminate/3 line 1183
2020-05-26 16:47:55.838 [error] <0.18083.4> ** Generic server <0.18083.4> terminating
** Last message in was closing_timeout
** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}
** Reason for termination ==
** {closing_timeout,normal}
2020-05-26 16:47:55.841 [warning] <0.18184.4> Channel (<0.18184.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.841 [error] <0.18184.4> ** Generic server <0.18184.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17806.4>,<0.18163.4>,direct,{[{{<0.17265.4>,#Ref<0.1213858664.1769996290.220562>},<0.17265.4>,{'queue.bind',0,<<"mqtt-subscription-0001877.testclientsqos1">>,<<"testclients">>,<<"cloud.0001877.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18285.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.842 [error] <0.18184.4> CRASH REPORT Process <0.18184.4> with 0 neighbours exited with reason: timed_out_flushing_channel in gen_server:handle_common_reply/8 line 751
There is one immediately visible smoking gun that is enough to form a hypothesis, knowing that we are dealing with MQTT clients.
Your clients likely run loops trying to “recover” from this exception (we have seen this with MQTT clients all too often) by repeating the same operation, which fails in the same way, creating a connection leak and/or a high connection churn scenario.
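What well-behaved clients do instead is back off between reconnect attempts rather than retry in a tight loop. A minimal sketch of capped exponential backoff with jitter (illustrative only, not taken from any specific MQTT client library):

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=6):
    """Compute capped exponential backoff delays (in seconds) with jitter.

    A client would sleep for each delay between successive reconnect
    attempts instead of hammering the broker immediately.
    """
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))   # 1, 2, 4, 8, ... capped
        delays.append(delay * random.uniform(0.5, 1.0))  # jitter spreads herds
    return delays
```

Jitter matters here: without it, thousands of clients disconnected by the same node failure would all retry at the same instants, recreating the thundering herd.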
So what does all of that mean? Let’s see.
The message is explained in [1]. Some of the clients use a non-mirrored queue which, when its hosting node goes down, becomes inaccessible; any client operation on it will fail until the node is back online.
MQTT clients then eagerly retry the same thing over and over, likely opening more and more connections to the node [2]
until it runs out of file descriptors. There are obvious metrics you can watch to prove this hypothesis right or wrong [3][4][5]. Once a node is out of file descriptors, it cannot accept CLI tool connections or new MQTT connections, or even write to its local disk. Pretty much nothing can degrade gracefully
or fail predictably under these conditions. There is no recovery other than releasing all connections or restarting the node.
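As a quick first check of the limits in play, a generic sketch using standard Linux tools (not RabbitMQ-specific tooling; the pgrep pattern is illustrative and may need adjusting for your deployment):

```shell
# Effective open file descriptor limit for the current shell session
ulimit -n

# Limits of a running Erlang VM (beam.smp) process, if one is present
pid="$(pgrep -f beam.smp | head -n 1 || true)"
if [ -n "$pid" ] && [ -r "/proc/$pid/limits" ]; then
    grep "open files" "/proc/$pid/limits"
fi
```

Comparing the per-process limit against the node's connection count (plus some headroom for file I/O) tells you how close to exhaustion it runs.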
The condition is caused by a connection leak, high connection churn, or both. The RabbitMQ Connections guide explains both, including what metrics are available for detection and how (to some extent) they can be mitigated.
A durable non-mirrored classic queue works as expected in your test and will not “recover” or “migrate” to another node; it will simply become unavailable until the hosting node comes back up.
Perhaps enable mirroring to two out of three nodes.
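For example, a policy along these lines would mirror matching queues to two nodes (the policy name and queue name pattern here are illustrative, not taken from your setup):

```shell
rabbitmqctl set_policy ha-mqtt "^mqtt-subscription-" \
  '{"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"}' \
  --apply-to queues
```

With `"ha-mode":"exactly"` and `"ha-params":2`, each matching queue has a master plus one mirror, so losing a single node leaves the queue reachable.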
It’s a good question how quorum queues can be exposed to MQTT clients given that MQTT operations cannot be extended with optional headers like, say, STOMP frames can.
I cannot comment on your “ha queue” tests since the log message clearly states a condition
where this queue is not mirrored. A node can run out of file descriptors even with mirrored queues; we really need some metrics to work with here.
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit
https://groups.google.com/d/msgid/rabbitmq-users/a4a9dec2-ee7a-4af4-86b4-2faaab9dfd13%40googlegroups.com.
Another possible scenario is that the nodes use the default open file descriptor limit (1024 on many UNIX-y systems). With 3000 clients that is barely enough to sustain them and a bit of disk I/O,
even given an even connection distribution.
Once one node is stopped, its clients reconnect to the other nodes, running them out of file descriptors.
This would also explain the higher memory usage because, contrary to popular belief,
TCP connections are not free [1] in terms of resources, and going from 3 nodes to 2 should increase per-node connection memory consumption alone by a factor of 1.5. The math is more or less the
same for queue master replicas.
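The arithmetic behind that 1.5x estimate, assuming the 3000 clients from this test are spread evenly:

```python
clients = 3000

# With all three nodes up, connections are spread evenly across them
per_node_3 = clients / 3   # 1000 connections per node

# After one node stops and its clients reconnect elsewhere,
# the same connections are spread across only two nodes
per_node_2 = clients / 2   # 1500 connections per node

print(per_node_2 / per_node_3)  # 1.5x per-node connection footprint
```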
Of course this is also mere speculation, since we don’t have any metrics to work with at the moment.
But there is no need to guess: RabbitMQ nodes can tell you what uses the most memory; you just
have to ask [2] and monitor the breakdown. RabbitMQ Grafana dashboards [3] can store and display
the same breakdown metrics over time, while CLI tools and the management UI can only cover
substantially shorter periods of time.
# ps aux
USER       PID %CPU %MEM     VSZ    RSS TTY   STAT START TIME COMMAND
rabbitmq     1  0.0  0.0    1640   1240 ?     Ss   12:47 0:00 /bin/sh /opt/rabbitmq/sbin/rabbitmq-server
rabbitmq   118  0.0  0.0    1004    616 ?     S    12:47 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/epmd -daemon
rabbitmq   167 23.3  2.6 1471752 359596 ?     Sl   12:47 3:21 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -W w -A 64 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t 5000000 -stbt db -zdbbl 128000 -K true -sbwt none -sbwtdcpu none
rabbitmq   266  0.0  0.0     780    484 ?     Ss   12:47 0:00 erl_child_setup 1048576
rabbitmq   292  0.0  0.0     812     12 ?     Ss   12:47 0:00 inet_gethost 4
rabbitmq   293  0.0  0.0     812     44 ?     S    12:47 0:00 inet_gethost 4
root       441  0.1  0.3 1117228  48040 ?     Ssl  12:48 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root       536  0.0  0.0     784     16 ?     Ss   12:48 0:00 erl_child_setup 1048576
root       555  0.0  0.0     816     16 ?     Ss   12:48 0:00 inet_gethost 4
root       556  0.0  0.0     816     44 ?     S    12:48 0:00 inet_gethost 4
root       590  0.1  0.3 1115692  46800 ?     Ssl  12:48 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root       683  0.0  0.0     784     16 ?     Ss   12:48 0:00 erl_child_setup 1048576
root       704  0.0  0.0     816     16 ?     Ss   12:48 0:00 inet_gethost 4
root       705  0.0  0.0     816     52 ?     S    12:48 0:00 inet_gethost 4
root       784  0.1  0.3 1117740  48172 ?     Ssl  12:50 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root       876  0.1  0.0     784     16 ?     Ss   12:50 0:00 erl_child_setup 1048576
root       895  0.0  0.0     816     16 ?     Ss   12:50 0:00 inet_gethost 4
root       896  0.0  0.0     816     44 ?     S    12:50 0:00 inet_gethost 4
root       901  0.1  0.3 1116716  47684 ?     Ssl  12:51 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root       993  0.1  0.0     784     16 ?     Ss   12:51 0:00 erl_child_setup 1048576
root      1014  0.0  0.0     816     16 ?     Ss   12:51 0:00 inet_gethost 4
root      1015  0.0  0.0     816     44 ?     S    12:51 0:00 inet_gethost 4
root      1016  0.0  0.0    2532   2232 pts/0 Ss   12:51 0:00 bash
root      1104  0.1  0.3 1118252  48348 ?     Ssl  12:52 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1196  0.1  0.0     784     16 ?     Ss   12:52 0:00 erl_child_setup 1048576
root      1217  0.0  0.0     816     16 ?     Ss   12:52 0:00 inet_gethost 4
root      1218  0.0  0.0     816     48 ?     S    12:52 0:00 inet_gethost 4
root      1222  0.1  0.3 1117228  47632 ?     Ssl  12:53 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1314  0.1  0.0     784     16 ?     Ss   12:53 0:00 erl_child_setup 1048576
root      1335  0.0  0.0     816     16 ?     Ss   12:53 0:00 inet_gethost 4
root      1336  0.0  0.0     816     48 ?     S    12:53 0:00 inet_gethost 4
root      1420  0.2  0.3 1117740  48100 ?     Ssl  12:55 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1512  0.1  0.0     784     16 ?     Ss   12:55 0:00 erl_child_setup 1048576
root      1531  0.0  0.0     816     16 ?     Ss   12:55 0:00 inet_gethost 4
root      1532  0.0  0.0     816     48 ?     S    12:55 0:00 inet_gethost 4
root      1535  0.2  0.3 1117228  47276 ?     Ssl  12:55 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1627  0.1  0.0     784     16 ?     Ss   12:55 0:00 erl_child_setup 1048576
root      1648  0.0  0.0     816     16 ?     Ss   12:55 0:00 inet_gethost 4
root      1649  0.0  0.0     816     48 ?     S    12:55 0:00 inet_gethost 4
root      1730  0.3  0.3 1117228  48304 ?     Ssl  12:57 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1823  0.2  0.0     784     16 ?     Ss   12:57 0:00 erl_child_setup 1048576
root      1844  0.0  0.0     816     16 ?     Ss   12:57 0:00 inet_gethost 4
root      1845  0.0  0.0     816     48 ?     S    12:57 0:00 inet_gethost 4
root      1846  0.3  0.3 1117228  48052 ?     Ssl  12:57 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      1939  0.2  0.0     784     16 ?     Ss   12:57 0:00 erl_child_setup 1048576
root      1960  0.0  0.0     816     16 ?     Ss   12:57 0:00 inet_gethost 4
root      1961  0.0  0.0     816     44 ?     S    12:57 0:00 inet_gethost 4
root      2055  0.6  0.3 1118252  48368 ?     Ssl  12:59 0:00 /usr/local/lib/erlang/erts-10.7.2.1/bin/beam.smp -B -- -root /usr/local/lib/erlang -progname erl -- -home /var/lib/rabbitmq -- -boot start_clean -noshell -noinput -noshell -hidden -smp enable -kernel inet
root      2147  0.5  0.0     784     16 ?     Ss   12:59 0:00 erl_child_setup 1048576
root      2166  0.0  0.0     816     16 ?     Ss   12:59 0:00 inet_gethost 4
root      2167  0.0  0.0     816     44 ?     S    12:59 0:00 inet_gethost 4
root      3457  0.0  0.0    1620    540 pts/0 R+   13:01 0:00 ps aux
# rabbitmqctl list_policies
Listing policies for vhost "/" ...
vhost name pattern apply-to definition priority
/ ha-queues .* queues {"ha-mode":"exactly","ha-params":2,"queue-master-locator":"min-masters"} 0
and many other errors.
RabbitMQ version is 3.8.2, Erlang 22.3.4.1, but the issue reproduces on version 3.7.15 as well.
I would appreciate any help in solving this problem.
Hi Sergey,
Perhaps my earlier responses did not make it to the list? You are very likely experiencing a high connection churn condition [1] in your test. On top of that, the 1000-connection
threshold sounds way too close to the default open file descriptor limit on Linux [2]. Have you verified that the nodes use a higher limit [3]?
It’s not clear to me from the few Grafana charts you have posted.
A node that is out of file descriptors will be unable not only to accept new connections but also to *write data to disk* when doing so involves opening a new file, which can lead to all kinds of
difficult-to-explain behaviors that pile up. According to the logs shared, your node runs into channel process flushing timeouts, which I can only attribute to
very high connection churn [5]. How long-lived are your client connections?
According to the logs, you use the rabbitmq_auth_backend_http plugin. Are you aware of the load this plugin creates on the external HTTP service it uses? It can be substantial, as
a new request is performed for *every* protocol operation. We highly recommend using it together with rabbitmq_auth_backend_cache [4].
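A minimal rabbitmq.conf sketch of that combination (assuming the rabbitmq_auth_backend_cache plugin is enabled; the TTL value is illustrative):

```ini
# Route all auth checks through the caching backend,
# which delegates cache misses to the HTTP backend
auth_backends.1 = cache
auth_cache.cached_backend = http

# Cache authentication/authorization decisions for 60 seconds (illustrative)
auth_cache.cache_ttl = 60000
```

With the cache in front, repeated resource checks for the same client hit the cache instead of generating one HTTP request per protocol operation.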
I managed to open the screenshots at full size and see that the open file handle limit is unlikely to be the issue.
Can full log files from all nodes be shared? We have one high-MQTT-connection scenario under investigation, so at the very least
it would be great to see if there may be a connection between them.
Besides the non-mirrored queue with an unavailable master, the only exception we have is a channel shutdown timeout, which
in and of itself is just log noise, but it can indicate a node under some kind of stress. There isn’t much to go on to narrow this one down, though.
Not sure if you use team RabbitMQ’s Prometheus plugin and Grafana dashboards, but the following metrics would also be interesting to see [1][2].
So far the biggest red flag I see on the charts is the build-up of channels. Not only is it a major problematic symptom in and of itself, it can and will create
a lot of extra load on metric collection [3].
Now I am puzzled by how 1000 MQTT clients can lead to 30K channels at peak. Can you describe roughly what your clients are doing after connecting?
Any code snippets that we can use to reason about that? A traffic capture [4] or event trace [5]?
Average thread real-time    : 5001063 us
Accumulated system run-time : 6297497 us
Average scheduler run-time  : 2978465 us

Thread      alloc  aux  bif  busy_wait  check_io  emulator  ets  gc  gc_full  nif  other  port  send  sleep  timers

Stats per thread:
async( 0)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 1)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 2)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 3)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 4)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 5)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 6)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 7)   0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.08% 2.44% 0.00% 97.45% 0.00%
async( 8)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async( 9)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(10)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(11)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(12)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(13)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(14)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(15)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(16)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(17)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(18)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(19)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(20)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(21)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(22)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(23)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(24)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(25)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(26)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(27)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(28)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(29)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(30)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(31)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(32)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(33)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(34)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(35)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(36)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(37)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(38)   0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%
async(39) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(40) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(41) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(42) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(43) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(44) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(45) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(46) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(47) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(48) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(49) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(50) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(51) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(52) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(53) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(54) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(55) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(56) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(57) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(58) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(59) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(60) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(61) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(62) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(63) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% aux( 1) 0.00% 0.03% 0.00% 0.00% 0.05% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.92% 0.00%dirty_cpu_( 1) 0.03% 0.00% 0.00% 0.00% 0.00% 0.08% 0.00% 0.73% 0.02% 0.00% 0.00% 0.00% 0.00% 99.14% 0.00%dirty_cpu_( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.29% 0.00% 0.00% 0.00% 1.34% 0.01% 0.00% 0.00% 98.36% 0.00%dirty_io_s( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.32% 0.00% 0.00% 0.00% 1.30% 0.01% 0.00% 0.00% 98.37% 0.00% poll( 0) 0.00% 0.00% 0.00% 0.00% 0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 99.96% 0.00% scheduler( 1) 7.49% 1.36% 7.34% 0.30% 0.72% 17.35% 4.92% 2.14% 0.93% 1.83% 2.29% 12.12% 1.54% 38.94% 0.73% scheduler( 2) 6.76% 1.88% 6.01% 0.36% 0.81% 13.23% 3.18% 1.75% 0.64% 1.33% 3.09% 16.87% 1.29% 41.94% 0.86%
Stats per type: async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.04% 0.00% 99.96% 0.00% aux 0.00% 0.03% 0.00% 0.00% 0.05% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.92% 0.00%dirty_cpu_sche 0.01% 0.00% 0.00% 0.00% 0.00% 0.04% 0.00% 0.37% 0.01% 0.00% 0.00% 0.00% 0.00% 99.57% 0.00%dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.06% 0.00% 0.00% 0.00% 0.26% 0.00% 0.00% 0.00% 99.67% 0.00% poll 0.00% 0.00% 0.00% 0.00% 0.04% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.96% 0.00% scheduler 7.12% 1.62% 6.67% 0.33% 0.77% 15.29% 4.05% 1.95% 0.78% 1.58% 2.69% 14.49% 1.42% 40.44% 0.79%
Average thread real-time : 5004544 usAccumulated system run-time : 8780534 usAverage scheduler run-time : 4071549 us
Thread alloc aux bifbusy_wait check_io emulator ets gc gc_full nif other port send sleep timers
Stats per thread: async( 0) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 7) 0.06% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.10% 1.96% 0.00% 97.88% 0.00% async( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(11) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(12) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(13) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(14) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(15) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(16) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(17) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(18) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(19) 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(20) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(21) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(22) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(23) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(24) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(25) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(26) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(27) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(28) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(29) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(30) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(31) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(32) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(33) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(34) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(35) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(36) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(37) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(38) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 
async(39) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(40) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(41) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(42) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(43) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(44) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(45) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(46) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(47) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(48) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(49) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(50) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(51) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(52) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(53) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(54) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(55) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(56) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(57) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(58) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(59) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(60) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(61) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(62) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(63) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% aux( 1) 0.01% 0.04% 0.00% 0.00% 0.10% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 99.84% 0.00%dirty_cpu_( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_cpu_( 2) 0.08% 0.00% 0.00% 0.00% 0.00% 0.08% 0.00% 1.46% 0.23% 0.00% 0.00% 0.00% 0.00% 98.15% 0.00%dirty_io_s( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 6) 0.01% 0.00% 0.00% 0.01% 0.00% 1.87% 0.00% 0.00% 0.00% 3.75% 0.07% 0.00% 0.00% 94.30% 0.00%dirty_io_s( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 9) 0.00% 0.00% 0.00% 0.01% 0.00% 0.46% 0.00% 0.00% 0.00% 2.36% 0.01% 0.00% 0.00% 97.15% 0.00%dirty_io_s(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% poll( 0) 0.00% 0.00% 0.00% 0.00% 0.05% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 99.95% 0.00% scheduler( 1) 7.76% 1.19% 20.98% 0.17% 0.87% 23.61% 5.33% 2.90% 0.87% 2.60% 1.69% 11.20% 1.31% 18.72% 0.82% scheduler( 2) 8.59% 1.20% 16.91% 0.17% 0.99% 23.51% 3.32% 3.11% 0.84% 3.98% 2.12% 13.51% 2.32% 18.56% 0.87%
Stats per type: async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.03% 0.00% 99.97% 0.00% aux 0.01% 0.04% 0.00% 0.00% 0.10% 0.00% 0.00% 0.00% 0.00% 0.00% 0.01% 0.00% 0.00% 99.84% 0.00%dirty_cpu_sche 0.04% 0.00% 0.00% 0.00% 0.00% 0.04% 0.00% 0.73% 0.11% 0.00% 0.00% 0.00% 0.00% 99.07% 0.00%dirty_io_sched 0.00% 0.00% 0.00% 0.00% 0.00% 0.23% 0.00% 0.00% 0.00% 0.61% 0.01% 0.00% 0.00% 99.14% 0.00% poll 0.00% 0.00% 0.00% 0.00% 0.05% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 99.95% 0.00% scheduler 8.18% 1.19% 18.94% 0.17% 0.93% 23.56% 4.33% 3.00% 0.85% 3.29% 1.90% 12.36% 1.81% 18.64% 0.84%
Average thread real-time : 5000789 usAccumulated system run-time : 11107729 usAverage scheduler run-time : 4895317 us
Thread alloc aux bifbusy_wait check_io emulator ets gc gc_full nif other port send sleep timers
Stats per thread: async( 0) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 3) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 7) 0.18% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.26% 11.46% 0.00% 88.10% 0.00% async( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(11) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(12) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(13) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(14) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(15) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(16) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(17) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(18) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(19) 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(20) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(21) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(22) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(23) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(24) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(25) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(26) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(27) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(28) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(29) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(30) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(31) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(32) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(33) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(34) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(35) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(36) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(37) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(38) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% 
async(39) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(40) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(41) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(42) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(43) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(44) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(45) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(46) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(47) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(48) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(49) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(50) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(51) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(52) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(53) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(54) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(55) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(56) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(57) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(58) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(59) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(60) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(61) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(62) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% async(63) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% aux( 1) 0.13% 0.48% 0.00% 0.00% 1.61% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 97.79% 0.00%dirty_cpu_( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_cpu_( 2) 0.07% 0.00% 0.00% 0.00% 0.00% 0.11% 0.00% 0.63% 0.08% 0.00% 0.00% 0.00% 0.00% 99.11% 0.00%dirty_io_s( 1) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 2) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 3) 0.00% 0.00% 0.00% 0.05% 0.00% 7.02% 0.00% 0.00% 0.00% 3.89% 0.36% 0.00% 0.00% 88.67% 0.00%dirty_io_s( 4) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 5) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 6) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 7) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 8) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s( 9) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00%dirty_io_s(10) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% poll( 0) 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 
0.00% 0.00% 100.00% 0.00% scheduler( 1) 8.06% 1.79% 6.86% 0.02% 1.85% 24.40% 3.01% 3.25% 0.01% 0.04% 3.77% 23.81% 20.48% 2.58% 0.07% scheduler( 2) 6.78% 1.83% 6.38% 0.02% 1.17% 37.59% 2.58% 2.57% 0.03% 0.19% 4.04% 19.99% 15.12% 1.63% 0.07%
Stats per type: async 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.18% 0.00% 99.81% 0.00% aux 0.13% 0.48% 0.00% 0.00% 1.61% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 97.79% 0.00%dirty_cpu_sche 0.03% 0.00% 0.00% 0.00% 0.00% 0.06% 0.00% 0.32% 0.04% 0.00% 0.00% 0.00% 0.00% 99.55% 0.00%dirty_io_sched 0.00% 0.00% 0.00% 0.01% 0.00% 0.70% 0.00% 0.00% 0.00% 0.39% 0.04% 0.00% 0.00% 98.87% 0.00% poll 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 0.00% 100.00% 0.00% scheduler 7.42% 1.81% 6.62% 0.02% 1.51% 30.99% 2.80% 2.91% 0.02% 0.11% 3.90% 21.90% 17.80% 2.11% 0.07%
** When Server state == {state,amqp_direct_connection,{state,'rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rabbit@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}
** Reason for termination ==
** {closing_timeout,normal}
2020-05-26 16:47:55.841 [warning] <0.18184.4> Channel (<0.18184.4>) closing: timed out flushing while connection closing
2020-05-26 16:47:55.841 [error] <0.18184.4> ** Generic server <0.18184.4> terminating
** Last message in was timed_out_flushing_channel
** When Server state == {state,1,<0.17806.4>,<0.18163.4>,direct,{[{{<0.17265.4>,#Ref<0.1213858664.1769996290.220562>},<0.17265.4>,{'queue.bind',0,<<"mqtt-subscription-0001877.testclientsqos1">>,<<"testclients">>,<<"cloud.0001877.#">>,false,[]},none,noflow}],[]},{connection,normal},<0.18285.4>,none,none,0,true,none,{0,nil},{0,nil},true,true}
** Reason for termination ==
** timed_out_flushing_channel
2020-05-26 16:47:55.842 [error] <0.18184.4> CRASH REPORT Process <0.18184.4> with 0 neighbours exited with reason: timed_out_flushing_channel in gen_server:handle_common_reply/8 line 751
and many other errors.
RabbitMQ version is 3.8.2, Erlang 22.3.4.1, but the issue is reproducible on version 3.7.15 as well.
I would appreciate any help in solving this problem.
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitm...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/d361f35f-c14c-4ab3-b5b0-da1699dceb19%40googlegroups.com.
Well, I have no evidence that an overloaded authN/authZ backend is playing a major role here, but the RabbitMQ MQTT frame processor
cannot handle its requests faster than it receives responses from said backend. A backlog of pending operations can lead to timeouts,
to which MQTT clients react as they usually do: by reconnecting and retrying exactly the same thing (server-to-client error communication
is effectively non-existent in the MQTT 3.1.1 protocol).
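The backlog dynamic can be sketched with a toy queueing model (all rates below are hypothetical, chosen only to illustrate the shape of the problem): once the auth backend's per-request service time exceeds the inter-arrival time of client operations, pending work grows without bound, so any fixed client-side timeout is eventually exceeded.

```python
# Toy model of a frame processor waiting on an auth backend.
# All rates are hypothetical; this only illustrates the backlog dynamic.

def backlog_over_time(arrival_ms: float, service_ms: float, steps: int) -> list:
    """Track the number of queued requests per millisecond of simulated time."""
    backlog = 0.0
    samples = []
    for _ in range(steps):
        backlog += 1.0 / arrival_ms                # new requests arriving
        backlog -= min(backlog, 1.0 / service_ms)  # requests completed
        samples.append(backlog)
    return samples

# Auth responses arrive faster than requests: backlog stays at zero.
healthy = backlog_over_time(arrival_ms=10, service_ms=5, steps=1000)

# Auth responses are slower than arrivals: backlog grows linearly,
# so client-side timeouts (and reconnect storms) become inevitable.
overloaded = backlog_over_time(arrival_ms=5, service_ms=10, steps=1000)

print(round(healthy[-1], 3), round(overloaded[-1], 1))
```

The reconnect-and-retry behavior of MQTT 3.1.1 clients makes this worse: each timed-out client adds a fresh connection (and a fresh auth round trip) on top of the existing backlog.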
According to the runtime thread stats, each node uses two cores, but neither is close to being maxed out. It may be that the stats only cover a specific period of time
during your test.
Unfortunately, this reveals no new information that would allow me to form a hypothesis as to why your clients end up with tens of thousands
of channels (which will swamp the two available cores and cause context switching to go up significantly).
Can the code that uses mosquitto_pub be shared? Are you certain that mosquitto_pub does not open a new connection per iteration? That is not generally how MQTT
and other messaging protocols are expected to be used. We have a dedicated metric that detects such conditions [1].
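To make the concern concrete: each mosquitto_pub invocation performs a full connect/publish/disconnect cycle, so N published messages cost N broker connections, N TLS handshakes and N auth-backend checks, whereas a long-lived client pays those costs once. A rough cost model (the client and message counts are hypothetical, matching the scale of this test):

```python
# Rough cost model: per-invocation publishing (mosquitto_pub style)
# versus one long-lived MQTT connection per client.

def auth_requests(messages: int, persistent: bool) -> int:
    """Connections (and thus TLS handshakes + HTTP auth checks) incurred."""
    # mosquitto_pub opens a fresh connection for every message;
    # a persistent client connects once and reuses the session.
    return 1 if persistent else messages

publishers = 3000   # simulated clients, as in this stress test
msgs_each = 100     # hypothetical messages sent per client

churn = publishers * auth_requests(msgs_each, persistent=False)
steady = publishers * auth_requests(msgs_each, persistent=True)
print(churn, steady)  # 300000 vs 3000 broker connections
```

With connection churn like this, the auth backend sees two orders of magnitude more requests for the same message volume, which ties back to the backlog concern above.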
Hello! Thank you very much for your comments!
Today I performed the stress test one more time. I started 3000 MQTT clients, and after all of them had connected I restarted the rabbitmq-2 instance. On the screenshot you can see the spike in CPU usage and a huge spike in network I/O after the container restart. That raises one more question: why was there so much network activity after the restart? As for the max open files limit, that metric did not change after all clients connected.
Speaking of the auth backend, I run 4 instances of it to spread the load, and I did not notice them being particularly busy. Do you think it could still have such an effect? We plan to use the rabbitmq_auth_backend_cache plugin in the future.
What the stress test does:
I have a RabbitMQ cluster running in Kubernetes. I deploy containers there that simulate real clients: each container starts 25 mosquitto processes, which connect to RabbitMQ. I then send messages using the mosquitto_pub command inside the container, with a random delay (100 ms - 1000 ms) between sends, to a randomly chosen mosquitto instance, which forwards the messages to RabbitMQ. RabbitMQ and the mosquitto instances run on different nodes, so they should not affect each other.
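For reference, one iteration of the loop I described could look roughly like the sketch below. The host, port scheme, topic and payload are placeholders, not my actual deployment values; the mosquitto_pub flags shown are the standard ones.

```python
import random

# Sketch of one iteration of the stress-test publish loop described above.
# Host, per-instance port numbering, topic and payload are placeholders.

def random_delay_s() -> float:
    """Random inter-publish delay between 100 ms and 1000 ms."""
    return random.uniform(0.1, 1.0)

def pub_command(instance: int, topic: str, payload: str) -> list:
    """argv for one mosquitto_pub invocation against a local mosquitto."""
    port = 1883 + instance  # assume one listener port per mosquitto process
    return ["mosquitto_pub", "-h", "127.0.0.1", "-p", str(port),
            "-t", topic, "-q", "1", "-m", payload]

# Pick one of the 25 mosquitto processes in this container at random.
instance = random.randrange(25)
cmd = pub_command(instance, "cloud/test", "hello")
# In the real driver: time.sleep(random_delay_s()); subprocess.run(cmd)
```

Note that each such invocation is a separate mosquitto_pub process, i.e. a fresh connect/publish/disconnect, which matters for the connection-churn question raised earlier in the thread.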
About RabbitMQ
As I said before, I use RabbitMQ version 3.8.3, but I have also tried 3.7.15. Enabled plugins: rabbitmq_management, rabbitmq_peer_discovery_k8s, rabbitmq_mqtt, rabbitmq_auth_backend_http, rabbitmq_prometheus. I set RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+sbwt none +sbwtdcpu none +sbwtdio none". I used the rabbitmq_prometheus plugin to collect metrics, together with the RabbitMQ-Overview Grafana dashboard, to which I added graphs for CPU usage, memory usage and network I/O.
Here is a link to download the logs (I cannot attach them because they are too big): https://drive.google.com/file/d/1Hv8CW2SLXTQsyGj06JMW-w3WrtNNXBj5/view?usp=sharing
And below you can find screenshots with metrics and the output of rabbitmq-diagnostics runtime_thread_stats captured while the cluster was in the unhealthy state. I hope this will help with the investigation of the problem.
I'll also try to simulate the same behavior without Kubernetes. I'll try to develop scripts that will start RMQ cluster, auth backend and mosquitto on the local machine in docker containers. I'll let you know If I will succeed with it.
** When Server state == {state,amqp_direct_connection,{state,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{user,<<"user-0000527o">>,[testclients],[{rabbit_auth_backend_http,none}]},<<"/">>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,2588,1476},39502,<<"10.28.5.196:39502 -> 10.28.1.149:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000527.testclients">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000527.testclients">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=user-0000527,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-05-26T16:30:58Z - 2034-02-02T16:30:58Z">>}]},<0.18335.4>,normal,1590511552269},<0.18328.4>,{amqp_params_direct,<<"user-0000527">>,none,<<"/">>,'rab...@rabbitmq-0.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,405},8883,{0,0,0,0,0,65535,...},...},...},...}
To unsubscribe from this group and stop receiving emails from it, send an email to
rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit
https://groups.google.com/d/msgid/rabbitmq-users/d2786044-5e22-4d72-b4a0-32a5d79e1787%40googlegroups.com.
Given that this is a local test you should be able to take a traffic capture using tcpdump, Wireshark or any other libpcap-based tool [1]
and inspect/share it. I assume your tests use default or dummy credentials and connect to a LAN host?
And please enable rabbitmq_auth_backend_cache when you do, to eliminate one variable. The plugin really doesn’t take a lot of time to hook up.
It would also significantly reduce the size of the capture if your authN service’s traffic ends up being included (that traffic should not be particularly important to inspect).
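In case it helps, hooking the cache up is mostly a matter of a few rabbitmq.conf lines. This is only a sketch: the key names come from the rabbitmq_auth_backend_cache plugin docs, and the TTL value here is an arbitrary example, so verify both against the docs for your version:

```ini
# rabbitmq.conf sketch: cache decisions made by the HTTP auth backend
auth_backends.1           = cache
auth_cache.cached_backend = http
# cache authN/authZ results for 30 seconds (value is in milliseconds; example only)
auth_cache.cache_ttl      = 30000
```

The plugin itself still has to be enabled, e.g. with rabbitmq-plugins enable rabbitmq_auth_backend_cache.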
Are you certain that mosquitto_pub does not open a new connection per iteration?
#!/bin/bash
RMQ_HOST=${RMQ_HOST:-localhost}
START_MOSQUITTO_INDEX=${START_MOSQUITTO_INDEX:-0}
END_MOSQUITTO_INDEX=${END_MOSQUITTO_INDEX:-25}
MIN_MSG_SEND_DELAY_MS=${MIN_MSG_SEND_DELAY_MS:-100}
MAX_MSG_SEND_DELAY_MS=${MAX_MSG_SEND_DELAY_MS:-1000}
# Internal mosquitto instance counter; used to compute each instance's port number.
INTERNAL_NUMBER=0
for COMMON_NUMBER in $(seq $START_MOSQUITTO_INDEX $END_MOSQUITTO_INDEX)
do
PANEL_ID=$(printf '%07d' "$COMMON_NUMBER")
PORT=$((1800 + $INTERNAL_NUMBER))
DATA_DIR="/opt/mosquitto/$COMMON_NUMBER/data"
CERTS_DIR="/opt/mosquitto/$COMMON_NUMBER/certs"
mkdir -p $DATA_DIR
mkdir -p $CERTS_DIR
# Generate client cert
openssl req -nodes -newkey rsa:2048 -keyout $CERTS_DIR/client.key -out $CERTS_DIR/client.csr -subj "/C=US/ST=Washington/O=Test/OU=client/CN=panel-$PANEL_ID.rmq-test.com"
# Sign the client cert
openssl x509 -req -days 5000 -in $CERTS_DIR/client.csr -CA /etc/ssl/CA.crt -CAkey /etc/ssl/CA.key -CAcreateserial -out $CERTS_DIR/client.crt
# prepare config
CONFIG_PATH="/opt/mosquitto/$COMMON_NUMBER/mosquitto.conf"
sed "s|PANEL_ID|$PANEL_ID|g" /etc/mosquitto.conf > $CONFIG_PATH
sed -i "s|RMQ_HOST|$RMQ_HOST|g" $CONFIG_PATH
sed -i "s|CERTS_DIR|$CERTS_DIR|g" $CONFIG_PATH
sed -i "s|DATA_DIR|$DATA_DIR|g" $CONFIG_PATH
sed -i "s|MQTT_PORT|$PORT|g" $CONFIG_PATH
/usr/sbin/mosquitto -c $CONFIG_PATH &
INTERNAL_NUMBER=$(($INTERNAL_NUMBER + 1))
done
# periodically send messages to the random mosquitto
while true
do
# pick only from ports that actually have a mosquitto listening (1800 .. 1800+N-1)
PORT=$(shuf -i 1800-$((1800 + INTERNAL_NUMBER - 1)) -n 1)
mosquitto_pub -p $PORT -t test/message -m '{"id": "7a54ec4b-9ee2-42c9-a797-938e03b7eff0", "message": "hello"}'
# random delay between MIN_MSG_SEND_DELAY_MS and MAX_MSG_SEND_DELAY_MS
RANDOM_DELAY=$(shuf -i $MIN_MSG_SEND_DELAY_MS-$MAX_MSG_SEND_DELAY_MS -n 1)
# get delay as float value
DELAY=$(bc <<< "scale=2; $RANDOM_DELAY/1000")
sleep $DELAY
done
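As an aside, the two fiddlier bits of the loop above can be sanity-checked in isolation with plain coreutils (the values here are just examples):

```shell
# Zero-pad an index to 7 digits, equivalent to the printf/seq dance used for PANEL_ID
printf '%07d\n' 42        # prints 0000042

# Pick a random port only from the range that actually has a mosquitto listening:
# with N instances the live ports are 1800 .. 1800+N-1
N=25
PORT=$(shuf -i 1800-$((1800 + N - 1)) -n 1)
echo "$PORT"
```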
...
We know what mosquitto_pub is.
I’m not sure what you are trying to test but running mosquitto_pub in a loop will eventually break.
This workload creates a maximum possible connection churn scenario, the complications from which can vary
but connection and channel build-up are guaranteed, and channel termination timeouts are more or less expected.
High connection churn is a pretty well documented condition [1].
Duplicate IDs would require the MQTT broker to disconnect existing connections with the same ID. This obviously does not help
with connection churn rate. Your script and [2] lead me to believe that mosquitto_pub will generate a new client ID for each loop iteration.
Which, if you use QoS 1, will also leak queues used by the MQTT plugin. They have a TTL but it’s 24 hours or so and in a tight connection loop
might as well not exist.
A realistic deployment will sustain, say, a few hundred new connections a second, and that can be a reasonable workload to test
but with a tight infinite loop connection and channel build-up is inevitable and will never ease. Environments where such workloads
are possible should use HAproxy or similar to limit the inbound connection rate and aggressively tune for rapid disposition of TCP sockets,
as [1] mentions. RabbitMQ itself has a configurable number of TCP connection acceptors but does not provide rate limiting.
[3] (called keepalives in MQTT, no direct relation to TCP keepalives) can be useful but values < 5 seconds are known to be risky in practice and again,
in a tight loop such intervals likely won’t be very effective.
So you can tune socket settings and related kernel settings and achieve a certain rate (say, 300-500 connections a second) after which the connection
and Erlang process build-up on the node becomes sustainable. In practice such a workload is a guaranteed way to run into operational headaches
(and not just with RabbitMQ; there is no shortage of posts about such conditions with HTTP servers, databases and so on), so consider
using a load testing tool that does not create a connection churn scenario. I haven’t evaluated MQTT 3.1 tools in a while, but look for something that works similarly to PerfTest [4],
just using MQTT, and exposes its relevant features.
...
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit
https://groups.google.com/d/msgid/rabbitmq-users/6ba312a4-978c-42db-9df8-3b685a5e6eab%40googlegroups.com.
In case it’s not obvious from PerfTest docs: it opens N publisher and M consumer connections and uses them
for the duration of the run.
FWIW this is how messaging protocols are assumed to be used, even though some connection
churn is definitely expected with some “IoT workloads” and even “local microservice” ones as well.
While googling for some discussions of this I randomly found [1], which is rather long but explains the trade-offs
involved in dealing with accumulating connections, specifically those in the TIME_WAIT state.
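On Linux the build-up is easy to observe while the test runs. A minimal sketch that counts sockets per kernel state code straight from /proc (01 is ESTABLISHED, 06 is TIME_WAIT; `ss -s` gives a similar summary if iproute2 is available):

```shell
# Count TCP sockets by state code from /proc/net/tcp and /proc/net/tcp6.
# Field 4 of each data line is the hex state: 01=ESTABLISHED, 06=TIME_WAIT
# (full list in the kernel's include/net/tcp_states.h). Header lines are
# filtered out by the hex-pair pattern match.
cat /proc/net/tcp /proc/net/tcp6 2>/dev/null \
  | awk '$4 ~ /^[0-9A-F][0-9A-F]$/ {count[$4]++} END {for (s in count) print s, count[s]}'
```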
In theory, limiting the number of TCP acceptors from the default of 10 to 1 might limit the ingress, but on a 2-core node
I suspect a tight loop of new connections would overwhelm it anyway. Still, it won’t take long to try this [2].
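For the record, a rabbitmq.conf sketch of that experiment. The mqtt.num_acceptors.* keys are the ones the MQTT plugin documents for its listeners; double-check them against the docs for your RabbitMQ version before relying on this:

```ini
# Reduce the MQTT listeners' acceptor pools from the default of 10 to 1
mqtt.num_acceptors.tcp = 1
mqtt.num_acceptors.ssl = 1
```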
I’m pretty sure we have established that the issue is entirely with connection churn or rapid initial connection rate, not message rates.
All the exceptions say is that there was a queue operation that timed out. “Timed out” under a connection churn scenario can mean
that its connection (and thus underlying channel) were closed before the operation could complete.
My understanding is that it is the new connection *rate*, not the total number of connections, that is the key variable. I cannot say why without trying to reproduce.
I don’t have much to add at this point. Connection churn is a scenario that must be explicitly tuned for (not just with RabbitMQ) and avoided where possible.
We have a somewhat similar report for MQTT with more data and it will be investigated after 3.8.5 ships. It can be that the new client ID tracker
in 3.8 is effectively a bottleneck in some scenarios (cluster-wide consensus sounds cool but it certainly isn’t free in terms of latency or availability implications).
...
On 05.06.2020, 18:40, rabbit...@googlegroups.com on behalf of Sergey Grigorev wrote:
Thank you for the useful links!
I disabled sending messages and set num_acceptors to 1, but that didn't improve the situation either. Unfortunately, I couldn't tune the net.ipv4.tcp_* settings because Kubernetes doesn't allow changing them; I would need to configure its service somehow to make that possible, and I didn't have time for it. Also, I can reproduce the issue with 1500 MQTT connections that don't send any messages via mosquitto_pub; they just keep their connections to RMQ open. Is that really a high load for a RabbitMQ cluster?
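For reference, this is the kind of host-level TCP tuning in question (illustrative values only; these were not applied in the test above, and on Kubernetes they typically require node-level configuration or a privileged init container with allowed unsafe sysctls):

```ini
# /etc/sysctl.d/ sketch - shown only for illustration, values are examples
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
```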
You were right: several mosquitto instances start reconnecting to RabbitMQ, and they start doing it after I restart an instance. The question is why they can't reconnect.
In the mosquitto logs I see:
1591348858: Socket error on client local.0000007.client, disconnecting.
Here are logs from RabbitMQ:
2020-06-05 09:23:16.370 [debug] <0.2375.200> Received a CONNECT, client ID: "0000007.client" (expanded to "0000007.client"), username: undefined, clean session: false, protocol version: 4, keepalive: 300
2020-06-05 09:23:16.377 [debug] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} started amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3724.200>
2020-06-05 09:23:16.379 [debug] <0.3724.200> User 'panel-0000007.rmq-test.com' authenticated successfully by backend rabbit_auth_backend_cache
2020-06-05 09:23:16.407 [debug] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} started rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3838.200>
2020-06-05 09:23:16.409 [debug] <0.3842.200> Supervisor {<0.3842.200>,rabbit_channel_sup} started rabbit_channel:start_link(2, <0.3841.200>, <0.3841.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3843.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at pid <0.3844.200>
** When Server state == {state,{sslsocket,{gen_tcp,#Port<0.7330>,tls_connection,[{option_tracker,<0.15965.0>},{session_tickets_tracker,disabled}]},[<0.2294.200>,<0.2260.200>]},"10.28.5.74:43814 -> 10.28.4.15:8883",true,undefined,false,running,{none,none},<0.2325.200>,false,none,{proc_state,{sslsocket,{gen_tcp,#Port<0.7330>,tls_connection,[{option_tracker,<0.15965.0>},{session_tickets_tracker,disabled}]},[<0.2294.200>,<0.2260.200>]},#{},{undefined,undefined},{0,nil},{0,nil},undefined,1,undefined,undefined,undefined,{undefined,undefined},undefined,<<"clients">>,{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"N/A"},[{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},<<"panel-0000007.rmq-test.com">>,undefined,undefined,#Fun<rabbit_mqtt_processor.0.93352367>,{0,0,0,0,0,65535,2588,1354},#Fun<rabbit_mqtt_util.4.116266148>,#Fun<rabbit_mqtt_util.5.116266148>,undefined},undefined,{state,fine,10000,undefined}}
** {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}
2020-06-05 09:24:16.411 [error] <0.2375.200> CRASH REPORT Process <0.2375.200> with 1 neighbours exited with reason: {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in gen_server2:terminate/3 line 1183
2020-06-05 09:24:16.413 [error] <0.2317.200> Supervisor {<0.2317.200>,rabbit_mqtt_connection_sup} had child rabbit_mqtt_reader started with rabbit_mqtt_reader:start_link(<0.2325.200>, {acceptor,{0,0,0,0,0,0,0,0},8883}) at <0.2375.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
** Last message in was {'EXIT',<0.2375.200>,{timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}}
** When Server state == {state,amqp_direct_connection,{state,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',{user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]},<<"/">>,{amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rab...@rabbitmq-2.rabbitmq.default.svc.cluster.local',{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000007.client">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000007.client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},[]},{amqp_adapter_info,{0,0,0,0,0,65535,2588,1039},8883,{0,0,0,0,0,65535,2588,1354},43814,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,{'MQTT',"3.1.1"},[{variable_map,#{<<"client_id">> => <<"0000007.client">>}},{channels,1},{channel_max,1},{frame_max,0},{client_properties,[{<<"product">>,longstr,<<"MQTT client">>},{client_id,longstr,<<"0000007.client">>}]},{ssl,true},{ssl_protocol,'tlsv1.2'},{ssl_key_exchange,ecdhe_rsa},{ssl_cipher,aes_256_gcm},{ssl_hash,aead},{peer_cert_issuer,<<"CN=ROOT_CA,OU=Sys-Sec,O=Test,ST=Washington,C=US">>},{peer_cert_subject,<<"CN=panel-0000007.rmq-test.com,OU=client,O=Test,ST=Washington,C=US">>},{peer_cert_validity,<<"2020-06-05T08:48:58Z - 2034-02-12T08:48:58Z">>}]},<0.3771.200>,undefined,1591348996378},<0.3770.200>,{amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...},...}
** "stopping because dependent process <0.2375.200> died: {timeout,\n {gen_server,call,\n [<0.3841.200>,\n {call,\n {'queue.declare',0,\n <<\"mqtt-subscription-0000007.clientqos1\">>,\n true,false,false,\n false,false,[]},\n none,<0.2375.200>},\n 60000]}}"
2020-06-05 09:24:16.438 [error] <0.3777.200> Supervisor {<0.3777.200>,amqp_channel_sup} had child channel started with amqp_channel:start_link(direct, <0.3724.200>, 1, <0.3778.200>, {<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,1}) at <0.3779.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
2020-06-05 09:24:16.445 [error] <0.3724.200> CRASH REPORT Process <0.3724.200> with 0 neighbours exited with reason: "stopping because dependent process <0.2375.200> died: {timeout,\n {gen_server,call,\n [<0.3841.200>,\n {call,\n {'queue.declare',0,\n <<\"mqtt-subscription-0000007.clientqos1\">>,\n ..." in gen_server:handle_common_reply/8 line 751
2020-06-05 09:24:16.446 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason "stopping because dependent process <0.2375.200> died: {timeout,\n {gen_server,call,\n [<0.3841.200>,\n {call,\n {'queue.declare',0,\n <<\"mqtt-subscription-0000007.clientqos1\">>,\n ..." in context child_terminated
2020-06-05 09:24:16.449 [error] <0.3722.200> Supervisor {<0.3722.200>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3723.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3724.200> exit with reason reached_max_restart_intensity in context shutdown
** Last message in was {'EXIT',<0.3779.200>,{timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}}
** When Server state == {ch,{conf,running,rabbit_framing_amqp_0_9_1,1,<0.3779.200>,<0.3779.200>,<0.3724.200>,<<"10.28.5.74:43814 -> 10.28.4.15:8883">>,undefined,{user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]},<<"/">>,<<>>,<0.3771.200>,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentication_failure_close">>,bool,true}],none,10,134217728,undefined,#{<<"client_id">> => <<"0000007.client">>}},{lstate,<0.3837.200>,false},none,1,{0,{[],[]}},#{},{state,#{},erlang},#{},#{},{set,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},{state,fine,10000,undefined},false,1,{unconfirmed,{0,nil},#{},#{}},[],[],none,flow,[],#{},#Ref<0.3436069672.2065432577.135970>,1000000000}
** {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}}
2020-06-05 09:24:16.449 [error] <0.3838.200> CRASH REPORT Process <0.3838.200> with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in gen_server2:terminate/3 line 1183
2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason {timeout,{gen_server,call,[<0.3841.200>,{call,{'queue.declare',0,<<"mqtt-subscription-0000007.clientqos1">>,true,false,false,false,false,[]},none,<0.2375.200>},60000]}} in context child_terminated
2020-06-05 09:24:16.451 [error] <0.3780.200> Supervisor {<0.3780.200>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.3779.200>, <0.3779.200>, <0.3724.200>, <<"10.28.5.74:43814 -> 10.28.4.15:8883">>, rabbit_framing_amqp_0_9_1, {user,<<"panel-0000007.rmq-test.com">>,[client],[{rabbit_auth_backend_cache,none}]}, <<"/">>, [{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,...},...], <0.3771.200>, <0.3837.200>, {amqp_params_direct,<<"panel-0000007.rmq-test.com">>,none,<<"/">>,'rabbit@rabbitmq-2.rabbitmq.default.svc.cluster.local',...}) at <0.3838.200> exit with reason reached_max_restart_intensity in context shutdown