RabbitMQ mirrored queues containing messages very slow IO

Jelle Smet

Feb 15, 2016, 10:19:39 AM

to rabbitmq-users

Hi all,

I have 2 clustered RabbitMQ nodes running RabbitMQ 3.6.0 on Erlang 18.2, CentOS 7.2, and Linux kernel-3.10.0-327.4.4.el7.x86_64.

I have set up a small test in which I consume messages from a queue and publish them back to the same queue, effectively creating a "feedback" loop.
I graph the throughput to get some insight into the impact of a given configuration.

The queue I have created is a durable, mirrored queue and is called "test_queue".
Consuming happens with a "prefetch" value of 1500 and "no_ack" true.

Scenario 1:

My stress test tool continuously produces 10 new messages of 233 bytes per second into the "test_queue" queue.
The same process then consumes from "test_queue" and publishes the consumed messages back into it, basically creating an endless loop.

Initially, throughput is fine at 15k msg/s. It gradually decreases but remains acceptable (presumably because of the growing number of messages; see attached graph).
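For readers who want to reproduce a loop like the one in Scenario 1 without Wishbone, here is a minimal sketch. It is not the original test tool: it assumes the third-party pika client (1.x API) and a broker reachable on "localhost", and the helper names `make_payload` and `run_feedback_loop` are made up for illustration. It seeds a few messages once instead of producing 10 new ones per second.

```python
QUEUE = "test_queue"   # queue name from the post
MESSAGE_SIZE = 233     # bytes, from the post
PREFETCH = 1500        # from the post


def make_payload(size=MESSAGE_SIZE):
    """Return a dummy message body of exactly `size` bytes."""
    return b"x" * size


def run_feedback_loop(host="localhost", seed_messages=10):
    """Consume from QUEUE and publish every message straight back to it."""
    import pika  # third-party client, imported lazily (an assumption of this sketch)

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_qos(prefetch_count=PREFETCH)

    # Seed the queue so the loop has something to circulate.
    for _ in range(seed_messages):
        channel.basic_publish(exchange="", routing_key=QUEUE, body=make_payload())

    def republish(ch, method, properties, body):
        # Send the consumed message back through the default exchange.
        ch.basic_publish(exchange="", routing_key=QUEUE, body=body)

    # auto_ack=True is pika's name for AMQP "no_ack".
    channel.basic_consume(queue=QUEUE, on_message_callback=republish, auto_ack=True)
    channel.start_consuming()  # blocks until interrupted
```

Calling `run_feedback_loop()` against a test broker starts the loop. Mirroring is not configured here; it comes from whatever policy matches the queue name.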

Scenario 2:

I first produce 1000000 messages of 233 bytes into the empty queue "test_queue".

Once done, I start my test tool to consume the messages and produce them back into the same queue.

Throughput now drops to ~100 msg/s, which, at least to my feeling, is too slow.

When the throughput is this slow, epmd is at 100% cpu constantly.

Tweaking ha-sync-batch-size does not seem to help at first glance.
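To put the Scenario 2 numbers in perspective, the short calculation below works out how long one full pass over the backlog takes at each observed rate. It only uses figures from the post (1000000 messages, 233 bytes each, ~100 msg/s versus 15k msg/s). The function name is made up for illustration, and the size figure counts payload bytes only, ignoring any per-message overhead.

```python
BACKLOG = 1_000_000      # messages preloaded in Scenario 2
MESSAGE_BYTES = 233      # payload size per message


def seconds_per_pass(backlog, rate_per_second):
    """Time needed to consume and republish every message once."""
    return backlog / rate_per_second


slow = seconds_per_pass(BACKLOG, 100)       # rate seen in Scenario 2
fast = seconds_per_pass(BACKLOG, 15_000)    # initial rate seen in Scenario 1

print(f"backlog payload: {BACKLOG * MESSAGE_BYTES / 1e6:.0f} MB")
print(f"at 100 msg/s: {slow / 3600:.1f} hours per pass")
print(f"at 15k msg/s: {fast:.0f} seconds per pass")
```

The backlog is only about 233 MB of payload, yet a single pass at ~100 msg/s takes close to three hours, compared with about a minute at 15k msg/s.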

If you have any tips/pointers about what is happening that would be great.

Cheers,

Jelle

[Attachment: Selection_122.png]

Michael Klishin

Feb 15, 2016, 10:25:24 AM

to rabbitm...@googlegroups.com, Jelle Smet

On 15 February 2016 at 18:19:44, Jelle Smet (smet...@gmail.com) wrote:
> I first produce 1000000 messages of 233 bytes into the empty
> queue "test_queue".
> Once done, I start my test tool to consume the messages and produce
> them back into the same queue.

Can you please post a code snippet that demonstrates what exactly you're doing?

> When the throughput is this slow, epmd is at 100% cpu constantly.
> Tweaking ha-sync-batch-size does not seem to help at first glance.

epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
it would need to go to 100% CPU under normal circumstances. So something is up
with your system.

Sync batch size does not control message replication, only initial sync when a mirror
is unsynchronised (e.g. when a new node or a new mirror comes up, or a node recovers from
being unavailable).

In tests like this, it's a good idea to use lazy queues to not be affected by unpredictable
message paging, too.
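As a rough illustration of the lazy queue suggestion: lazy mode can be enabled through a policy by adding the "queue-mode" key to its definition. Since only one policy applies to a queue at a time, the sketch below merges it with the HA keys into a single definition and prints a `rabbitmqctl set_policy` command line. The policy name "ha-lazy-test", the pattern, and the helper name are assumptions made for this example.

```python
import json
import shlex


def set_policy_command(name, pattern, definition, apply_to="queues", priority=0):
    """Build a shell-quoted `rabbitmqctl set_policy` command line."""
    args = [
        "rabbitmqctl", "set_policy",
        "--priority", str(priority),
        "--apply-to", apply_to,
        name, pattern, json.dumps(definition),
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# One policy carrying both the mirroring keys and lazy mode.
definition = {
    "ha-mode": "all",
    "ha-sync-mode": "automatic",
    "queue-mode": "lazy",
}
command = set_policy_command("ha-lazy-test", "^test_queue$", definition)
print(command)
```

The printed command is meant to be run on a cluster node. Nothing is executed by the script itself.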
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Jelle Smet

Feb 15, 2016, 10:56:39 AM

to rabbitmq-users, smet...@gmail.com

> Can you please post a code snippet that demonstrates what exactly you're doing?

There's no slimmed-down code snippet I can show you, but you should be able to replicate this behavior once you have installed wishbone (develop branch): https://github.com/smetj/wishbone/tree/develop

Once installed, you can run it with the following bootstrap file: https://gist.github.com/smetj/87f9469ecd031f9ea950

$ wishbone debug --config bootstrap.yaml

> epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
> it would need to go to 100% CPU under normal circumstances. So something is up
> with your system.

Good to know.

> Sync batch size does not control message replication, only initial sync when a mirror
> is unsynchronised (e.g. when a new node or a new mirror comes up, or a node recovers from
> being unavailable).

I see. That was a misunderstanding on my side.

> In tests like this, it's a good idea to use lazy queues to not be affected by unpredictable
> message paging, too.

OK, I could look into that.

Thanks for the feedback.

Jelle Smet

Feb 15, 2016, 11:57:23 AM

to rabbitmq-users, smet...@gmail.com

This is me deleting the HA policy whilst the Wishbone process is consuming/producing from the queue.

The HA policy looks like:

Name:        Test
Pattern:     ^ha_.*
Apply to:    all
Definition:  ha-mode: all
             ha-sync-mode: automatic
Priority:    0
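For reference, a policy like the one listed above can also be created through the management plugin's HTTP API (`PUT /api/policies/{vhost}/{name}`). The sketch below only builds the request and does not send it. The base URL "http://localhost:15672", the default vhost "/", and the helper name are assumptions, and a real call would also need credentials.

```python
import json
import urllib.parse
import urllib.request


def build_policy_request(base_url, vhost, name, pattern, definition,
                         priority=0, apply_to="all"):
    """Build (but do not send) a PUT request that creates or updates a policy."""
    url = "{}/api/policies/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(vhost, safe=""),   # "/" must be sent as %2F
        urllib.parse.quote(name, safe=""),
    )
    body = json.dumps({
        "pattern": pattern,
        "definition": definition,
        "priority": priority,
        "apply-to": apply_to,
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_policy_request(
    "http://localhost:15672", "/", "Test", "^ha_.*",
    {"ha-mode": "all", "ha-sync-mode": "automatic"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Sending it would be a matter of adding HTTP basic authentication and passing the request to `urllib.request.urlopen`.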

Jelle Smet

Feb 16, 2016, 4:41:56 AM

to rabbitmq-users, smet...@gmail.com

> epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
> it would need to go to 100% CPU under normal circumstances. So something is up
> with your system.

Sorry, I double-checked. It's "beam.smp" consuming 100% cpu when throughput is that low, not epmd.

Michael Klishin

Feb 16, 2016, 5:28:30 AM

to rabbitm...@googlegroups.com, Jelle Smet

On 16 February 2016 at 12:41:59, Jelle Smet (smet...@gmail.com) wrote:
> It's "beam.smp" consuming 100% cpu when throughput is that
> low.

I don’t know anything about wishbone or what it does, but my guess would be that it is
related to this thread: https://groups.google.com/forum/#!searchin/rabbitmq-users/slow$20GC/rabbitmq-users/6ucb0Dwns-M/gj9bvxnmDAAJ

You can try setting rabbit.hipe_compile to true to compare:
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L298

Michael Klishin

Feb 16, 2016, 5:33:38 AM

to rabbitm...@googlegroups.com, Jelle Smet

On 16 February 2016 at 13:28:23, Michael Klishin (mkli...@pivotal.io) wrote:
> You can try setting rabbit.hipe_compile to true to compare:
> https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L298

err, I meant to say “try ERL_FULLSWEEP_AFTER as in the post above and also enable HiPE compilation”.

Jelle Smet

Feb 16, 2016, 7:01:35 AM

to rabbitmq-users, smet...@gmail.com

> err, I meant to say “try ERL_FULLSWEEP_AFTER as in the post above and also enable HiPE compilation”.

I have set ERL_FULLSWEEP_AFTER to a value of 32 (I have *no* idea what I'm doing here).

Initially the outcome of (presumably) that change was promising, but after a while, without any intervention on my side or any change in environmental factors, throughput fell back to the original rate.

Besides that, enabling HiPE doesn't seem to work with the RabbitMQ-provided RPMs for rabbitmq-server (rabbitmq-server-3.6.0-1.noarch.rpm) and erlang (erlang-18.2-1.el7.centos.x86_64.rpm).

The last thing logged in "/var/log/rabbitmq/startup_log" is:

HiPE compiling: |---------------------------------------------------------|

|[FAILED]

Jelle Smet

Feb 16, 2016, 9:56:04 AM

to rabbitmq-users, smet...@gmail.com

> I don’t know anything about wishbone or what it does ...

If you (or someone else) would like to replicate the problem I'm experiencing, you can build a Docker container of wishbone using this build file:

https://gist.github.com/smetj/a7f8499e75c2855da417

To run Wishbone you can execute something like this:

$ docker run --volume ${PWD}/rabbit.yaml:/tmp/rabbit.yaml smetj/wishbone:2.1.0 debug --config /tmp/rabbit.yaml

The rabbit.yaml bootstrap file can be found here: https://gist.github.com/smetj/87f9469ecd031f9ea950 (change the hostname in it to the host where RabbitMQ runs)