Mirrored queues cause cluster partition

Rob A.

Aug 17, 2018, 7:40:33 AM
to rabbitmq-users
Hello,

I have set up a 3-node RabbitMQ 3.7.7 cluster with the queue policy "ha-mode":"exactly","ha-params":2, running in an OpenShift cluster on AWS.
My cluster uses pause_minority if a partition is detected.
Each node has 8 GB RAM (0.49 high watermark --> ~4 GB), 4 CPU cores and a 30 GB EBS volume.

In my test I have 3 queues: transform, format and tester.

A tester service produces persistent messages at a fixed rate to the transform queue. 14 consumers process these messages and send them back to the tester queue.
The tester service consumes these messages and forwards them to the format queue, which also has 14 consumers that send the final message back to the tester queue.
During this round trip the message size increases:
1x 30 KB
2x 120 KB
1x 130 KB

If my tester produces 50 msg/s we have an overall message rate of 200 msg/s with
50 * 30 KB = 1,5MB
100 * 120 KB = 12 MB
50 * 130 KB  = 8 MB
--> 21,5 MB/s
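
The aggregate load can be recomputed from the listed sizes with a small sketch. The helper name and the convention of 1 MB = 1000 KB are assumptions made for illustration. With exactly the sizes listed, 50 * 130 KB comes to 6.5 MB, so the total is about 20 MB/s at 50 msg/s and about 40 MB/s at 100 msg/s, slightly below the quoted figures; the conclusion that doubling the input rate doubles the load is unaffected.

```python
# Message sizes (KB) and how many times each size crosses the broker
# per produced message, taken from the list in the post.
HOPS = [(30, 1), (120, 2), (130, 1)]


def aggregate_load(input_rate, hops=HOPS):
    """Return (messages per second, MB per second) summed over all hops."""
    msg_rate = sum(count for _, count in hops) * input_rate
    kb_per_s = sum(size_kb * count for size_kb, count in hops) * input_rate
    return msg_rate, kb_per_s / 1000  # assumption: 1 MB = 1000 KB


for rate in (50, 100):
    msgs, mb = aggregate_load(rate)
    print(f"input {rate} msg/s -> {msgs} msg/s, {mb:.1f} MB/s")
```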

With this rate the cluster is running without any problems.
But if I double the rate (100 msg/s input rate) to 400 msg/s and 43 MB/s, the cluster crashes almost instantly (within 5 seconds). Every node detects a partition:



Eventually the queues are recreated, but the policy is no longer applied:


If I disable queue mirroring, the cluster runs fine, even under much higher load.
With an 800 msg/s input rate (the format and transform consumers can only consume 130 msg/s) the overall rate is
1.2K msg/s and 72 MB/s.

In this case the 72 MB/s load is split (unequally) across 3 EBS volumes. The 43 MB/s from the earlier HA test is duplicated, so about 86 MB/s is split across all nodes, and the disk pressure could be slightly higher there.
On the other hand, my second test ran much longer with a growing queue length, which led to disk reads. So I assume that the disk pressure is similar.
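
A rough per-node comparison of the two cases, as a sketch only. It assumes the load is spread evenly across the 3 nodes (the post notes the real split is unequal), that each message is written once per copy, and it ignores disk reads; the function name is made up for illustration.

```python
def avg_write_per_node(ingress_mb_s, copies, nodes=3):
    """Average MB/s written per node if every message is stored on `copies` nodes."""
    return ingress_mb_s * copies / nodes


# "ha-mode": "exactly", "ha-params": 2 -> every message is kept on 2 nodes
mirrored = avg_write_per_node(43, copies=2)
# mirroring disabled -> every message is kept on 1 node
unmirrored = avg_write_per_node(72, copies=1)

print(f"mirrored: {mirrored:.1f} MB/s per node")
print(f"unmirrored: {unmirrored:.1f} MB/s per node")
```

Under these assumptions the mirrored test writes roughly 29 MB/s per node against 24 MB/s per node without mirroring, which matches the estimate of "slightly higher" disk pressure.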
To be sure, I ran the last test with only 1 node and, as expected, the throughput is even higher:

With disk writes at 1300/s and reads at 300/s, I think we can exclude EBS performance as the cause.

I am also surprised that the partition appears within 5 seconds of starting the test. I didn't change the net_ticktime of 60s.

Additionally, some screenshots of the I/O graphs for a single node and for the 3-node HA cluster:

[Screenshot: single node]

[Screenshots: 3-node cluster at a test rate of 50 msg/s, one each for RabbitMQ-0, RabbitMQ-1 and RabbitMQ-2]

So even if I double the 40 MB/s I/O of Node-1 it is not much higher than the single node I/O.

environment.txt

Michael Klishin

Aug 17, 2018, 7:49:05 AM
to rabbitm...@googlegroups.com
Inter-node communication buffer that's too low or too high can result in similar behavior. In 3.7.7 it is 10 times the value
it should be by default [1]. Set it to 128 MB and compare.

Also, while the charts are nice to have, server logs and the Erlang version are a lot more important, and I do not see them
posted in this thread.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Rob A.

Aug 17, 2018, 9:00:46 AM
to rabbitmq-users
Thank you. I have set these values in my Docker entrypoint:
export RABBITMQ_DISTRIBUTION_BUFFER_SIZE=128000
export RABBITMQ_MAX_NUMBER_OF_PROCESSES=1048576
export RABBITMQ_MAX_NUMBER_OF_ATOMS=500000

Unfortunately it is still crashing. How can I see whether these values were applied?

I'm using Erlang 21.0.4. The logs are attached.
node1.log
node2.log
node0.log

Michael Klishin

Aug 17, 2018, 9:38:17 AM
to rabbitm...@googlegroups.com
You can see the values in `ps aux | grep beam` output, they should be passed to the beam.smp process.
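
As a sketch of that check, the value can be pulled out of the process command line programmatically. The runtime flag is `-zdbbl` (documented as `+zdbbl` for `erl`), and both it and `RABBITMQ_DISTRIBUTION_BUFFER_SIZE` are expressed in kilobytes. The function name and the abridged sample command line below are illustrative assumptions, not output from a real node.

```python
import re


def distribution_buffer_kb(cmdline):
    """Return the number following -zdbbl in a beam.smp command line, or None."""
    match = re.search(r"(?:^|\s)-zdbbl\s+(\d+)", cmdline)
    return int(match.group(1)) if match else None


# Abridged, illustrative command line; in practice this would be the
# beam.smp line from `ps aux` on the node being checked.
sample = (
    "/usr/lib64/erlang/erts-10.0.4/bin/beam.smp -W w -A 192 "
    "-stbt db -zdbbl 128000 -K true -B i"
)

kb = distribution_buffer_kb(sample)
print(f"distribution buffer limit: {kb} KB (~{kb / 1000:.0f} MB)")
```

If the function returns None, the flag was not passed and the node is running with whatever default its RabbitMQ version applies.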

The nodes report

> 2018-08-17 12:33:39.846 [info] <0.421.0> node 'rab...@rabbitmq-0.rabbitmq-cluster.dcrpi-omsf-dev0.svc.cluster.local' down: killed

which is not specific enough for us to suggest much.

What's the effective configuration of the nodes? [1]

How much RAM do the nodes have? Is there a way to capture their stdout/stderr (it is more
likely to contain errors from the runtime itself than the log)? RabbitMQ Debian packages redirect them to /var/log/rabbitmq, for example.



Michael Klishin

Aug 17, 2018, 9:41:19 AM
to rabbitm...@googlegroups.com
You can leave out RABBITMQ_MAX_NUMBER_OF_PROCESSES and RABBITMQ_MAX_NUMBER_OF_ATOMS as they
are very unlikely to be relevant here.

The runtime has a hard limit on the distribution buffer size (that kicks off when it is reached, not configured). My best guess
is that with a really large buffer (e.g. the 1.28 GB 3.7.7 uses by default) that limit is reached with enough traffic and the runtime just stops. RabbitMQ
has no chance to know why or log anything. The runtime will spit an error message to stdout/stderr (not sure which one), though.

Another possible cause is the OOM killer, since TCP connection buffers consume RAM.



Rob A.

Aug 20, 2018, 5:03:45 AM
to rabbitmq-users
The containers didn't get OOMKilled. If the cgroup limit is reached all container processes should be killed, which wasn't the case.

I did some additional testing:
* I have increased the RAM to 14 GB per node.
* I have stopped all consumers to exclude the consumer load and have a fixed rate to find the tipping point

The message size is 30 KB. Sending with a rate of 500 msg/s (15 MB/s) the cluster is stable.
Sending with a rate of 1000 msg/s (30 MB/s) the cluster crashes after max. 5 seconds. Therefore the buffer size should be max. 150 MB.
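
The 150 MB figure appears to be the publish rate multiplied by the time to failure. A minimal sketch of that arithmetic follows, assuming 1 MB = 1000 KB and using hypothetical helper names. It gives an upper bound on how much data was published before the crash, not a measurement of what actually sat in the distribution buffer.

```python
def inbound_mb_per_s(msg_rate, msg_kb=30):
    """Publish rate in MB/s for a fixed message size (1 MB = 1000 KB assumed)."""
    return msg_rate * msg_kb / 1000


def mb_published_before_crash(msg_rate, seconds=5, msg_kb=30):
    """Upper bound on the data published before the cluster went down."""
    return inbound_mb_per_s(msg_rate, msg_kb) * seconds


print(inbound_mb_per_s(500))             # stable case
print(inbound_mb_per_s(1000))            # crashing case
print(mb_published_before_crash(1000))   # data published in the 5 s window
```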

Rob A.

Aug 20, 2018, 7:44:17 AM
to rabbitmq-users
I have increased my resources:
30 GB RAM (14 GB high watermark) per node,
8 CPU cores per node,
500 GB EBS volume per node (to increase max. IOPS)

The outcome was exactly the same as in the previous tests.

I verified that the distribution buffer size was applied:
sh-4.2$ ps -aux | grep beam
1003860+    315  3.1  0.1 7570892 148932 ?      Sl   09:12   3:23 /usr/lib64/erlang/erts-10.0.4/bin/beam.smp -W w -A 192 -MBas ageffcbf -MHas ageffcbf -MBlmbcs 512 -MHlmbcs 512 -MMmcs 30 -P 1048576 -t
 500000 -stbt db -zdbbl 128000 -K true -B i -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.7.7/ebin -noshell -noinput -s rabbit boot
 -name rab...@rabbitmq-0.rabbitmq-cluster.dcrpi-omsf-dev0.svc.cluster.local -boot start_sasl -conf /etc/rabbitmq/rabbitmq -conf_dir /var/lib/rabbitmq/config -conf_script_dir /usr/lib/rabbitmq/bin -con
f_schema_dir /var/lib/rabbitmq/schema -conf_advanced /etc/rabbitmq/advanced -config /etc/rabbitmq/advanced -kernel inet_default_connect_options [{nodelay,true}] -pa "/usr/lib64/erlang/lib/ssl-9.0/ebin
" -proto_dist inet_tls -ssl_dist_opt server_depth 2 -ssl_dist_opt server_cacertfile /opt/cacerts/cacerts.pem -ssl_dist_opt server_certfile /opt/ssl/server/oms_server.cer -ssl_dist_opt server_keyfile /
opt/ssl/server/oms_server.key -ssl_dist_opt server_secure_renegotiate true client_secure_renegotiate true -sasl errlog_type error -sasl sasl_error_logger false -rabbit lager_log_root "/var/log/rabbitm
q" -rabbit lager_default_file "/var/log/rabbitmq/rabbit@rabbitmq-0.rabbitmq-cluster.dcrpi-omsf-dev0.svc.cluster.local.log" -rabbit lager_upgrade_file "/var/log/rabbitmq/rabbit@rabbitmq-0.rabbitmq-clus
ter.dcrpi-omsf-dev0.svc.cluster.local_upgrade.log" -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.7
.7/plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rab...@rabbitmq-0.rabbitmq-cluster.dcrpi-omsf-dev0.svc.cluster.local-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup
false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rab...@rabbitmq-0.rabbitmq-cluster.dcrpi-omsf-dev0.svc.cluster.local" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen
_max 25672


I also tested with RabbitMQ 3.7.6 and Erlang 20.3.8.
If I send at a rate of 1000 msg/s the UI becomes slightly unresponsive, but the messages were transmitted to the queue successfully.
The cluster didn't crash.

Michael Klishin

Aug 21, 2018, 2:25:44 PM
to rabbitm...@googlegroups.com
Without a way to reproduce I am out of ideas.
