HA queues cause RabbitMQ cluster to partition


Frederick

Dec 15, 2014, 11:01:35 PM
to rabbitm...@googlegroups.com
We are experiencing a problem where using HA queues causes the entire cluster to partition and become unresponsive when the total send rate exceeds 10k messages per second.
We are running performance tests on a 5-node cluster hosted on AWS EC2 instances.
For example, within a few minutes of starting a test with 10 producers each sending 1k messages per second, the rabbit nodes will partition and eventually the entire cluster becomes unresponsive. Around the time partitioning occurs, the rabbitmq logs show output like this:

Error log of rabbit connection loss event:
=ERROR REPORT==== 8-Dec-2014::21:25:34 ===
Partial partition detected:
 * We saw DOWN from rabbit@rabbit5
 * We can still see rabbit@rabbit4 which can see rabbit@rabbit5
We will therefore intentionally disconnect from rabbit@rabbit4

In addition, our Management UI flashes a red alert window stating:
"Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. This is a dangerous situation. RabbitMQ clusters should not be installed on networks which can experience partitions."

We have tried several approaches to increase performance on AWS, including upgrading to compute-optimized 8-CPU instances, EBS-optimized volumes, and placement groups, with not much better results.
We have also tried experimenting with autoheal and increasing net_ticktime to 180. Those have helped slightly, but only pushed back the inevitable partition by a couple of minutes.

Our HA policy setting was configured as follows:
rabbitmqctl set_policy ha-two-nodes ".*" '{"ha-mode":"exactly","ha-params":2}'
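
(For anyone trying to reproduce this: on a typical Linux shell the JSON definition needs to be quoted as above, and the applied policy and resulting mirrors can be double-checked afterwards. A minimal sketch, assuming the default vhost and 3.4.x queue info items:)

rabbitmqctl list_policies
rabbitmqctl list_queues name policy slave_pids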

With further detailed analysis using the collectd tool to monitor the AWS EC2 nodes, we see spikes in disk i/o and average write time on some rabbit nodes around the time of the partition.
Why do mirrored queues cause a spike in disk i/o activity when we are well below the high water mark?
Is there anything we missed with the set_policy config, or are HA queues simply not recommended with total throughput greater than 10k messages/sec?

Any insight would be appreciated.
RabbitMQ version: 3.4.2

Michael Klishin

Dec 16, 2014, 1:44:03 AM
to rabbitm...@googlegroups.com, Frederick
On 16 December 2014 at 07:01:37, Frederick (fred.y...@srbtech.com) wrote:
> Why do mirrored queues cause a spike in disk i/o activity when
> well below the high water mark?

Persistent messages routed to durable queues need to be stored on disk regardless of the
RAM watermark. Even transient messages can be moved to disk well before the watermark is hit.
If you don't want that, publish some messages as transient.
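
(A hedged aside: if it is unclear how much of the traffic is actually persistent or already paged to disk, per-queue counters can show it; the exact info items may vary slightly by version, but something like:)

rabbitmqctl list_queues name messages messages_persistent messages_ram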

> Is there anything we missed with set_policy config or are HA queues
> simply not recommended with total throughput greater than 10k
> messages/sec.

The rate may be a red herring.

Please send us logs from all nodes around the event
(say, +/- 5 minutes) off-list. Note that 2/3rds of the RabbitMQ engineering team are on holiday.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Dec 16, 2014, 1:54:57 AM
to rabbitm...@googlegroups.com, Frederick
 On 16 December 2014 at 09:43:57, Michael Klishin (mkli...@pivotal.io) wrote:
> Please send us logs from all nodes around the event
> (say, +/- 5 minutes) off-list.

…and their configs, too.

Thank you.

Laing, Michael

Dec 16, 2014, 6:50:10 AM
to Michael Klishin, rabbitm...@googlegroups.com, Frederick
We sustain similar rates on mirrored queues in AWS without partition.

However, having encountered such partitions early on in development, we have architected our systems to never use persistent messages. We persist to a Cassandra cluster.

Also we have parallelized our load and run multiple clusters to disperse the strain and add resilience.

You can play with the Erlang/OTP net_ticktime to increase its tolerance...

I suspect that if you are tracking IO_wait on these machines, it will correspond with your partitions.

We now use only SSDs for our mnesia db and logs.

Good luck.

ml




Frederick

Dec 16, 2014, 2:10:55 PM
to rabbitm...@googlegroups.com, fred.y...@srbtech.com
We appreciate the help. Attached are the logs, as requested, from all 5 nodes with the same test described earlier:
10 consumers/10 producers sending at a total rate of ~10K msg/sec with an HA policy of 2 mirrors per queue.

Also, we are not currently testing with durable queues or persistent messages.

Here is our current config on all nodes:

[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {rabbit, [
        {vm_memory_high_watermark, 0.4},
        {cluster_partition_handling, autoheal},
        {kernel, [{net_ticktime, 180}]}
  ]}
].
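
(A hedged aside on the config above: net_ticktime is an Erlang kernel application setting, so in the classic rabbitmq.config format the kernel section normally sits at the top level next to rabbit rather than nested inside it. Roughly:

[
  {mnesia, [{dump_log_write_threshold, 1000}]},
  {rabbit, [
        {vm_memory_high_watermark, 0.4},
        {cluster_partition_handling, autoheal}
  ]},
  {kernel, [{net_ticktime, 180}]}
].

If it stays nested under rabbit, the 180s value may never take effect and the nodes would keep the 60s default.)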

Thanks,
Fred

rabbit-logs-12-16.zip

Laing, Michael

Dec 22, 2014, 2:59:34 PM
to Frederick, rabbitm...@googlegroups.com
I wonder if there is any followup on this.

We peak at 5k msgs/sec on our 2 primary production clusters, each spread over 3 AWS zones.

Our production queues are mirrored across 3 machines, each a c3.xlarge.

Total queued messages may spike to 2k every now and then; rabbitmq does no disk IO.

Anyway, we have had no problems with partitions, hence my curiosity.

ml


Frederick

Jan 2, 2015, 10:12:33 AM
to rabbitm...@googlegroups.com, fred.y...@srbtech.com
Thanks for the feedback Michael, we still have not found a root cause. Also, after further analysis, the disk i/o spike may be a symptom of a larger issue rather than the main cause, as noted earlier.
The only answer for now has been switching to Google Compute Engine with the same setup: partitioning is no longer an issue and we get much better throughput.
So it may have something to do with the AWS network.

By the way, are you using HA queues in your production cluster? As noted, it is when using HA queues that we have primarily observed partitioning.

- fred

VNA

Feb 15, 2018, 3:57:53 PM
to rabbitmq-users
Did anyone find a solution for the "Network partition detected" issue? Please help.

Rob A.

Aug 17, 2018, 5:25:10 AM
to rabbitmq-users
I have the same issue. I have set up a 3-node cluster with the policy "ha-mode":"exactly","ha-params":2 within an OpenShift cluster in AWS.
My cluster uses pause_minority if a partition is detected.
Each node has 8 GB RAM (0.5 high watermark --> 4 GB), 4 CPU cores and a 30 GB EBS volume.

In my test I have 3 queues: transform, format, tester.
A tester service produces persistent messages at a fixed rate to the transform queue. 14 consumers process these messages and send them back to the tester queue.
The tester service consumes these messages and forwards each message to the format queue, which also has 14 consumers sending the final message back to the tester queue.
During this round trip the message sizes are:
1x 30 KB
2x 120 KB
1x 130 KB

If my tester produces 50 msg/s, we have an overall message rate of 200 msg/s with:
50 * 30 KB = 1.5 MB/s
100 * 120 KB = 12 MB/s
50 * 130 KB = 6.5 MB/s
--> 20 MB/s

With this rate the cluster runs without any problems.
But if I double the rate (100 msg/s input rate) to 400 msg/s and ~40 MB/s, the cluster crashes almost instantly (within 5 seconds). Every node detects a partition:



If I disable queue mirroring, the cluster runs fine, even with much higher loads.
With an 800 msg/s input rate (the format and transform consumers can only consume 130 msg/s), we have an overall rate of:
1.2K msg/s and 72 MB/s.

In this case the 72 MB/s load is (unevenly) split across the 3 EBS volumes. The ~40 MB/s in HA mode in the earlier test is duplicated by mirroring, so we have about 80 MB/s split over all nodes, so the disk pressure there could be a little higher.
On the other hand, my second test ran much longer with an increasing queue length, which led to disk reads. So I assume the disk pressure could be similar.
To be sure, I ran the last test with 1 node only and, as expected, the throughput is even higher:

With 1,300 disk writes/s and 300 reads/s, I think we can exclude EBS performance as the reason.

I also wonder why the partition appears within 5 seconds of starting the test. I didn't change the net_ticktime of 60s.
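
(One hedged way to rule that out is to ask a node what tick time it is actually running with, e.g.:

rabbitmqctl eval 'net_kernel:get_net_ticktime().'

If that returns 60, the configured value was not picked up.)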

Michael Klishin

Aug 17, 2018, 6:12:58 AM
to rabbitm...@googlegroups.com
This list uses one thread per question. Please start a new one instead of posting to existing threads. This is technical
forum etiquette 101.


Rob A.

Aug 17, 2018, 7:42:54 AM
to rabbitmq-users
Sorry, I have created a new thread.