How long should syncing take?


Dave Collins

Mar 28, 2017, 1:39:34 PM
to rabbitmq-users
Hello,

   We are still stress-testing our new RabbitMQ setup.  We have pounded it with some high volumes, and everything went extremely well.

   Now we are trying to stop one of the nodes in our mirrored HA cluster:


Two-node cluster on EC2 t2.large instances
Each runs RabbitMQ 3.6.8 and Erlang 19.3 on Ubuntu

We had a backlog of about 92,000 images (each about 300 kB).

We stopped one of the nodes and all seemed to keep going well.

Then we restarted the stopped node.  It appeared to start syncing, but then got stuck at 25%, and the RAM indicator on the "healthy" node went red at 3.1 GB with a question mark.

Stats look like this:

Messages:
Total: 98,042
Ready: 92,042
Unacked: 6,000
In Memory: 14,927
Persistent: 0
Transient, paged out: 60,838


Message body bytes:
Total: 11 GB
Ready: 10 GB
Unacked: 653 MB
In Memory: 1.4 GB
Persistent: 0
Transient, paged out: 7.1 GB

Very roughly, how long would you expect this sync to take?  10 minutes? Half an hour?  Two hours?

Eventually the entire cluster seemed to just hang.  Stress test successful!

Just looking for some overall level-setting on how long syncing SHOULD take.

Thanks,
Dave



Michael Klishin

Mar 28, 2017, 2:15:42 PM
to rabbitm...@googlegroups.com
The receiving node will block its source when it is overwhelmed (flow control in modern versions covers eager sync), and in your case all messages are transient, so you effectively *asked* RabbitMQ to keep them in RAM as much as possible.

If your queues are durable, publish messages as persistent; otherwise, lowering rabbit.vm_memory_high_watermark_paging_ratio or turning the queues into lazy queues
will help, as should lowering the eager sync batch size (IIRC it is 4,096 messages at a time by default).
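
A sketch of the paging-ratio change in the classic 3.6.x rabbitmq.config (Erlang-term) format; the value is illustrative, not a recommendation:

```
%% rabbitmq.config (3.6.x Erlang-term format); 0.25 is an illustrative value.
[
  {rabbit, [
    %% Start paging transient messages to disk at 25% of the memory
    %% high watermark instead of the default 50%.
    {vm_memory_high_watermark_paging_ratio, 0.25}
  ]}
].
```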

Michael Klishin

unread,
Mar 28, 2017, 2:20:00 PM3/28/17
to rabbitm...@googlegroups.com
It's worth mentioning that 300 kB is well above the average message size (it's usually < 4 kB), so using transient messages with the default sync batch size of 4,096
is not something I'd recommend.
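
A rough back-of-the-envelope figure, using the ~300 kB average from the original post:

```
4,096 messages per sync batch x ~300 kB each ≈ 1.2 GB of message bodies per batch
```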

Syncing 11 GB of messages will take maybe 150-200% of the time it takes for that much
data to be loaded from disk and transferred between the nodes (this is a ballpark
estimate, not a scientific number). Of course, if you run into alarms, the estimate
no longer applies.
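
Purely as an illustration of that ballpark (the ~100 MB/s effective disk-plus-network rate here is an assumption, not a measured number):

```
11 GB at ~100 MB/s         ≈ 110 s to read and transfer
x 150-200% (ballpark)      ≈ roughly 3-4 minutes, in the absence of alarms
```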


Dave Collins

Mar 28, 2017, 4:57:18 PM
to rabbitmq-users
Thanks as always Michael for your help!   

In the "real world" we are only going to have about 5-10 of these images in flight every second, and the consumer threads can prefetch 100 per thread and we have 30+ consumer threads on different consumer machines.  So there will "normally" be nothing in the RMQ image queue.  It is just when we simulate extremely high volume + dead consumers + loss of a node of the cluster that we are seeing this.

We had originally set this queue as durable, but we thought the extra writes to disk were going to slow things down too much in HA, and possibly create an unintended coupling between the producers and consumers, due to this bit in the docs (Publisher Confirms - When will messages be confirmed?): "For routable messages, the basic.ack is sent when a message has been accepted by all the queues. For persistent messages routed to durable queues, this means persisting to disk. For mirrored queues, this means that all mirrors have accepted the message."  So our concern was that, in this durable mirrored scenario, the disk writes would slow down the publisher-confirm basic.ack.
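
For context, a minimal sketch of the durable-queue / persistent-message / publisher-confirm combination being weighed here, again assuming pika (1.x) with illustrative names:

```
# Illustrative publisher sketch (pika 1.x); queue name, host, and file are made up.
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
ch = conn.channel()
ch.queue_declare(queue="images", durable=True)

# Enable publisher confirms; on a BlockingChannel, basic_publish then waits for
# the broker's confirm, which for persistent messages on durable mirrored queues
# includes the disk write and acceptance by all mirrors.
ch.confirm_delivery()

with open("image.jpg", "rb") as f:
    body = f.read()

ch.basic_publish(
    exchange="",
    routing_key="images",
    body=body,
    properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
)
conn.close()
```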

We'll try a few of these suggestions and determine the impact.

Thanks again,
Dave

Dave Collins

Mar 30, 2017, 12:25:27 PM
to rabbitmq-users
Hello Michael,

   OK, we tried another experiment.  This time we set the queue to Durable: true and the messages to Persistent.  We also set ha-sync-batch-size to 80.  Since the average image size is around 300 kB, that should mean that, when it syncs, each batch only has to deal with around 24 MB of data.
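
One way to set that batch size, assuming an HA policy named "ha-images" matching the image queue (the policy name and pattern are made up; note that only one policy applies per queue, so ha-sync-batch-size belongs in the same policy as ha-mode):

```
rabbitmqctl set_policy ha-images "^images" \
  '{"ha-mode":"all","ha-sync-batch-size":80}' \
  --apply-to queues
```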

   We then pushed this two-node cluster up to around 14K image backlog.  All seemed to be running fine.  The RAM seemed to, in general, stay balanced between the two machines.

    Then we killed a cluster node.  Let it sit for a bit, then brought the cluster node back.

   The results were pretty much the same (sync did not finish), although it got further through the sync process: the first time it got to about 70% and then seemed to stop.  We tried restarting that node again and it got to 69% and then stopped.  The first (surviving) node also has a red RAM alarm again: 3.2 GB (3.1 GB is the high watermark).

The error report is in this gist (note "reason for termination" around line 145).

Here is a summary of the system memory:

Messages
Total: 14,514
Ready: 14,514
Unacked: 0
In Memory: 0
Persistent: 14,514
Transient: 0

Message Body Bytes
Total: 4.0 GB
Ready: 4.0 GB
Unacked: 0
In Memory: 0
Persistent: 4.0 GB
Transient: 0

   It is a bit concerning that we can't survive the restart of a cluster node.  Can you please suggest other configuration changes we might make?  Should we try this experiment on Ubuntu servers with more RAM?

Thanks in advance
Dave

Michael Klishin

Mar 30, 2017, 12:31:23 PM
to rabbitm...@googlegroups.com
The problem here is that the receiving node hits an alarm. According to your summary
there are no messages in memory, but something still triggers the alarm.

You can use `rabbitmqctl status`, the node details page in the management UI, and finally rabbitmq-top to learn more.
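
A quick sketch of those diagnostics (your status output shows rabbitmq_top already running, so the enable step is only needed on nodes that don't have it):

```
# Overall node state, including the per-category memory breakdown.
rabbitmqctl status

# Per-process memory view, surfaced as extra admin pages in the management UI.
rabbitmq-plugins enable rabbitmq_top
```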

The last time we saw an issue with eager sync overwhelming a node due to a lack of flow control was around 3.6.1-3.6.2,
and it involved hundreds of gigabytes of data. Since we introduced flow control for sync, I don't recall similar reports.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

brend...@gmail.com

Mar 30, 2017, 4:29:35 PM
to rabbitmq-users
Hey Michael,
I'm working on the project with Dave.

Here is the output of rabbitmqctl status while our node is locked up with a memory alarm:

```
[{pid,1070},
 {running_applications,
     [{rabbitmq_top,"RabbitMQ Top","3.6.8"},
      {rabbitmq_management_visualiser,"RabbitMQ Visualiser","3.6.8"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.8"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.8"},
      {rabbit,"RabbitMQ","3.6.8"},
      {cowboy,"Small, fast, modular HTTP server.","1.0.4"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.3.0"},
      {ssl,"Erlang/OTP SSL application","8.1.1"},
      {public_key,"Public key infrastructure","1.4"},
      {cowlib,"Support library for manipulating Web protocols.","1.0.2"},
      {crypto,"CRYPTO","3.7.3"},
      {mnesia,"MNESIA  CXC 138 12","4.14.3"},
      {os_mon,"CPO  CXC 138 46","2.4.2"},
      {inets,"INETS  CXC 138 49","6.3.6"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.8"},
      {rabbit_common,
          "Modules shared by rabbitmq-server and rabbitmq-erlang-client",
          "3.6.8"},
      {compiler,"ERTS  CXC 138 10","7.0.4"},
      {xmerl,"XML parser","1.3.13"},
      {syntax_tools,"Syntax tools","2.1.1"},
      {asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
      {sasl,"SASL  CXC 138 11","3.0.3"},
      {stdlib,"ERTS  CXC 138 10","3.3"},
      {kernel,"ERTS  CXC 138 10","5.2"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:2:2] [async-threads:64] [hipe] [kernel-poll:true]\n"},
 {memory,
     [{total,3398023440},
      {connection_readers,463912},
      {connection_writers,9216},
      {connection_channels,824200},
      {connection_other,367504},
      {queue_procs,438344},
      {queue_slave_procs,16990856},
      {plugins,4379184},
      {other_proc,24018184},
      {mnesia,151312},
      {metrics,225664},
      {mgmt_db,1396112},
      {msg_index,1878640},
      {other_ets,2953144},
      {binary,3309367288},
      {code,24794668},
      {atom,1033401},
      {other_system,8954643}]},
 {alarms,[memory]},
 {listeners,
     [{clustering,25672,"::"},
      {amqp,5672,"::"},
      {'amqp/ssl',5671,"::"},
      {http,15672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,3347939328},
 {disk_free_limit,50000000},
 {disk_free,146945978368},
 {file_descriptors,
     [{total_limit,924},{total_used,10},{sockets_limit,829},{sockets_used,6}]},
 {processes,[{limit,1048576},{used,487}]},
 {run_queue,0},
 {uptime,472},
 {kernel,{net_ticktime,60}}]
```

rabbitmq-top doesn't seem to show anything odd.

Michael Klishin

Mar 30, 2017, 4:55:45 PM
to rabbitm...@googlegroups.com
So nearly all of the memory is the binary heap :/

Try enabling background GC (which was disabled by default in 3.6.7 because it's generally not worth
the effect it has on latency):
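
A minimal rabbitmq.config sketch of what enabling it looks like, to the best of my knowledge (3.6.x Erlang-term format; not quoted from the list):

```
[
  {rabbit, [
    %% Re-enable the background GC that 3.6.7 turned off by default.
    {background_gc_enabled, true}
  ]}
].
```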


brend...@gmail.com

Mar 30, 2017, 5:51:30 PM
to rabbitmq-users
GC didn't seem to help us out.

Honestly, the problem happens almost as soon as the node rejoins the cluster, so it probably doesn't have time to collect.
I will try making the GC interval shorter as well.
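
The relevant key, if I recall correctly, is rabbit.background_gc_target_interval (milliseconds, 60000 by default); an illustrative rabbitmq.config snippet:

```
[
  {rabbit, [
    {background_gc_enabled, true},
    %% Illustrative value: run background GC every 15 seconds instead of 60.
    {background_gc_target_interval, 15000}
  ]}
].
```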

```
 {memory,
     [{total,3444942128},
      {connection_readers,164248},
      {connection_writers,7680},
      {connection_channels,110080},
      {connection_other,210776},
      {queue_procs,158112},
      {queue_slave_procs,9818088},
      {plugins,2891992},
      {other_proc,18081448},
      {mnesia,133336},
      {metrics,219080},
      {mgmt_db,907672},
      {msg_index,1828352},
      {other_ets,2956648},
      {binary,3372915912},
      {code,24784275},
      {atom,1033401},
       {other_system,8937276}]}
```

Michael Klishin

Mar 30, 2017, 6:25:35 PM
to rabbitm...@googlegroups.com
The problem with binary heap GC is that it's lazy and RabbitMQ has no control
over when it happens. Usually the binary heap doesn't grow this quickly, or stay this large
for this long, on Erlang 19.x.
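
One heavy-handed workaround sometimes used to rule the binary heap in or out (illustrative, not something prescribed in this thread) is to force a full sweep of every Erlang process on the node:

```
rabbitmqctl eval '[garbage_collect(P) || P <- erlang:processes()], ok.'
```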

We will try reproducing this, but FWIW I don't recall this being reported in a while.
