RabbitMQ mirrored queues containing messages very slow IO


Jelle Smet

Feb 15, 2016, 10:19:39 AM
to rabbitmq-users
Hi all,

I have 2 clustered RabbitMQ nodes running RabbitMQ 3.6.0 on Erlang 18.2, CentOS 7.2, Linux kernel-3.10.0-327.4.4.el7.x86_64

I have set up a small test in which I consume messages from a queue and produce them back into the same queue, effectively creating a "feedback" loop.
I graph the throughput in order to get some insight into what the impact might be of a certain configuration.

The queue I have created is a durable, mirrored queue and is called "test_queue".
Consuming happens with a "prefetch" value of 1500 and "no_ack" true.

Scenario 1:

My stress test tool continuously produces 10 new messages of 233 bytes per second into "test_queue".
The same process then consumes "test_queue" and produces the consumed messages back into it, creating an endless loop.
Initially the throughput is fine at 15k msg/s. It then gradually decreases but remains acceptable, presumably because of the growing number of messages (see attached graph).
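
For anyone who wants to reproduce the loop without Wishbone, here is a minimal sketch of the setup described above. It is not the code used in this test: it assumes the third-party pika client (1.x API), and the default host name is a placeholder. Mirroring is not set in the code because it comes from a policy on the broker.

```python
"""Feedback-loop sketch: consume from a queue and publish each message back."""

QUEUE = "test_queue"   # queue name from the post
PAYLOAD_SIZE = 233     # message size in bytes, from the post
PREFETCH = 1500        # prefetch value from the post


def make_payload(size=PAYLOAD_SIZE):
    """Return a dummy message body of exactly `size` bytes."""
    return b"x" * size


def run_loop(host="localhost"):
    """Connect, then republish every consumed message into the same queue.

    `host` is a placeholder. pika is imported here so that the helper above
    can be used without the client installed.
    """
    import pika  # assumption: pika 1.x is the client library

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    channel.basic_qos(prefetch_count=PREFETCH)

    def on_message(ch, method, properties, body):
        # Put the consumed message straight back on the queue.
        ch.basic_publish(exchange="", routing_key=QUEUE, body=body)

    # auto_ack=True corresponds to "no_ack" true in the post.
    channel.basic_consume(queue=QUEUE, on_message_callback=on_message, auto_ack=True)
    channel.start_consuming()


# Usage (blocks until interrupted):
#   run_loop("rabbit-host")   # hypothetical host name
```

One thing this makes visible: prefetch limits the number of unacknowledged deliveries, so with automatic acknowledgement the value of 1500 most likely has no effect.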

Scenario 2:

I first produce 1000000 messages of 233 bytes into the empty queue "test_queue".
Once done, I start my test tool to consume the messages and produce them back into the same queue.
Throughput now crawls at ~100 msg/s, which feels too slow to me.
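
To put the two rates in perspective, a quick back-of-the-envelope calculation using only the numbers quoted in this post. The 15k msg/s figure comes from Scenario 1, where the queue starts empty, so the comparison is only meant to show scale.

```python
def seconds_per_pass(queue_depth, rate_per_second):
    """Time needed to cycle every queued message through the loop once."""
    return queue_depth / rate_per_second


QUEUE_DEPTH = 1_000_000   # messages preloaded in Scenario 2
SLOW_RATE = 100           # msg/s observed in Scenario 2
FAST_RATE = 15_000        # msg/s observed at the start of Scenario 1

slow = seconds_per_pass(QUEUE_DEPTH, SLOW_RATE)
fast = seconds_per_pass(QUEUE_DEPTH, FAST_RATE)

print(f"at {SLOW_RATE} msg/s: {slow / 3600:.1f} hours per pass")   # 2.8 hours
print(f"at {FAST_RATE} msg/s: {fast:.0f} seconds per pass")        # 67 seconds
print(f"slowdown factor: {FAST_RATE / SLOW_RATE:.0f}x")            # 150x
```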

When the throughput is this slow, epmd is constantly at 100% CPU.
Tweaking ha-sync-batch-size does not seem to help at first glance.


If you have any tips/pointers about what is happening that would be great.

Cheers,

Jelle

Selection_122.png

Michael Klishin

Feb 15, 2016, 10:25:24 AM
to rabbitm...@googlegroups.com, Jelle Smet
On 15 February 2016 at 18:19:44, Jelle Smet (smet...@gmail.com) wrote:
> I first produce 1000000 messages of 233 bytes into the empty
> queue "test_queue".
> Once done, I start my test tool to consume the messages and produce
> them back into the same queue.

Can you please post a code snippet that demonstrates what exactly you're doing?

> When the throughput is this slow, epmd is at 100% cpu constantly.
> Tweaking ha-sync-batch-size does not seem to help at first glance.

epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
it would need to go to 100% CPU under normal circumstances. So something is up
with your system.

Sync batch size does not control message replication, only initial sync when a mirror
is unsynchronised (e.g. when a new node or a new mirror comes up, or a node recovers from
being unavailable).

In tests like this, it's a good idea to use lazy queues to not be affected by unpredictable
message paging, too. 
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
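
As an illustration of the lazy queue suggestion above, here is a hedged sketch that builds a `rabbitmqctl set_policy` command line. The policy name and pattern are hypothetical. Since only one policy applies to a queue at a time, the sketch puts "queue-mode": "lazy" in the same definition as the HA keys rather than in a second policy.

```python
import json


def set_policy_argv(name, pattern, definition, apply_to="queues", priority=0):
    """Build the argument list for `rabbitmqctl set_policy`.

    Only builds the list; pass it to subprocess.run() on a broker node to apply it.
    """
    return [
        "rabbitmqctl", "set_policy",
        "--apply-to", apply_to,
        "--priority", str(priority),
        name,
        pattern,
        json.dumps(definition, sort_keys=True),
    ]


# Hypothetical policy name and pattern; the definition keys are standard policy keys.
argv = set_policy_argv(
    name="lazy_ha_test",
    pattern="^test_queue$",
    definition={
        "ha-mode": "all",
        "ha-sync-mode": "automatic",
        "queue-mode": "lazy",
    },
)
print(" ".join(argv))
```

When pasting the printed command into a shell, the JSON definition needs to be quoted.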


Jelle Smet

Feb 15, 2016, 10:56:39 AM
to rabbitmq-users, smet...@gmail.com

> Can you please post a code snippet that demonstrates what exactly you're doing?


There's no slimmed-down code snippet I can show you, but you should be able to replicate this behavior once you have installed wishbone (dev branch): https://github.com/smetj/wishbone/tree/develop
Once installed, you can run it with the following bootstrap file: https://gist.github.com/smetj/87f9469ecd031f9ea950

$ wishbone debug --config bootstrap.yaml


> epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
> it would need to go to 100% CPU under normal circumstances. So something is up
> with your system.


Good to know.
 
> Sync batch size does not control message replication, only initial sync when a mirror
> is unsynchronised (e.g. when a new node or a new mirror comes up, or a node recovers from
> being unavailable).


I see.  That's a misunderstanding from my side.
 
> In tests like this, it's a good idea to use lazy queues to not be affected by unpredictable
> message paging, too.

Ok I could have a look into that.

Thanks for the feedback.

Jelle Smet

Feb 15, 2016, 11:57:23 AM
to rabbitmq-users, smet...@gmail.com
This is me deleting the HA policy whilst the Wishbone process is consuming/producing from the queue.



The HA policy looks like:

Name:       Test
Pattern:    ^ha_.*
Apply to:   all
Definition: ha-mode: all, ha-sync-mode: automatic
Priority:   0
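
A small sketch for checking which queue names a policy pattern would select, reading the policy above as name "Test" with pattern `^ha_.*`. RabbitMQ evaluates patterns with Erlang's regex engine; Python's `re` is used here as a stand-in, which should behave the same for a pattern this simple. The name "ha_test_queue" is a made-up example, and the thread does not say which queue name the bootstrap file actually uses.

```python
import re

# Policy as listed in the post.
POLICY = {
    "name": "Test",
    "pattern": r"^ha_.*",
    "apply-to": "all",
    "definition": {"ha-mode": "all", "ha-sync-mode": "automatic"},
    "priority": 0,
}


def policy_matches(pattern, queue_name):
    """Return True if the policy pattern selects the given queue name."""
    return re.search(pattern, queue_name) is not None


for queue_name in ("ha_test_queue", "test_queue"):   # first name is hypothetical
    print(queue_name, policy_matches(POLICY["pattern"], queue_name))
# ha_test_queue True
# test_queue False
```

If the queue is literally named "test_queue", this pattern would not select it, so it is worth confirming in the management UI that the policy is listed on the queue.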

Jelle Smet

Feb 16, 2016, 4:41:56 AM
to rabbitmq-users, smet...@gmail.com

> epmd is not used to transfer messages; it's a node discovery daemon (similar in purpose to what DNS does). I can't see how
> it would need to go to 100% CPU under normal circumstances. So something is up
> with your system.

Sorry, I double-checked. It's "beam.smp" that is consuming 100% CPU when throughput is that low.

Michael Klishin

Feb 16, 2016, 5:28:30 AM
to rabbitm...@googlegroups.com, Jelle Smet
On 16 February 2016 at 12:41:59, Jelle Smet (smet...@gmail.com) wrote:
> It's "beam.smp" consuming 100% cpu when throughput is that
> low.

I don’t know anything about wishbone or what it does but my guess would be that it is
related to https://groups.google.com/forum/#!searchin/rabbitmq-users/slow$20GC/rabbitmq-users/6ucb0Dwns-M/gj9bvxnmDAAJ.

You can try setting rabbit.hipe_compile to true to compare:
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L298 

Michael Klishin

Feb 16, 2016, 5:33:38 AM
to rabbitm...@googlegroups.com, Jelle Smet
On 16 February 2016 at 13:28:23, Michael Klishin (mkli...@pivotal.io) wrote:
> You can try setting rabbit.hipe_compile to true to compare:
> https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L298

err, I meant to say “try ERL_FULLSWEEP_AFTER as in the post above and also enable HiPE compilation”. 

Jelle Smet

Feb 16, 2016, 7:01:35 AM
to rabbitmq-users, smet...@gmail.com

> err, I meant to say “try ERL_FULLSWEEP_AFTER as in the post above and also enable HiPE compilation”.

I have set ERL_FULLSWEEP_AFTER to a value of 32 (I have *no* idea what I'm doing here).
Initially the outcome of (presumably) that change was promising, but after a while, without any intervention on my side or any change in the environment, throughput fell back to the original rate:





Besides that, enabling HiPE doesn't seem to work with the RPMs provided by RabbitMQ for both rabbitmq-server (rabbitmq-server-3.6.0-1.noarch.rpm) and erlang (erlang-18.2-1.el7.centos.x86_64.rpm).

The last thing logged in "/var/log/rabbitmq/startup_log" is:

HiPE compiling:  |---------------------------------------------------------|
                 |[FAILED]
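
For reference, the two settings tried above can be written down as follows. This is a sketch under assumptions, not a recommended configuration: ERL_FULLSWEEP_AFTER is, as far as I understand it, read by the Erlang VM from its environment and sets how many generational collections run before a full sweep, while `hipe_compile` is a key of the `rabbit` application in rabbitmq.config. How the environment variable reaches the broker process depends on the installation and is left out here.

```python
import os


def erlang_env(fullsweep_after, base=None):
    """Return a copy of the environment with ERL_FULLSWEEP_AFTER set.

    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["ERL_FULLSWEEP_AFTER"] = str(fullsweep_after)
    return env


def hipe_config(enabled=True):
    """Render a minimal rabbitmq.config (Erlang terms) toggling HiPE compilation."""
    flag = "true" if enabled else "false"
    return (
        "[\n"
        "  {rabbit, [\n"
        f"    {{hipe_compile, {flag}}}\n"
        "  ]}\n"
        "].\n"
    )


env = erlang_env(32, base={})   # 32 is the value tried in this post
print(env)
print(hipe_config())
```

Generating the file this way keeps the trailing period that Erlang term files require, which is easy to lose when editing by hand.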

Jelle Smet

Feb 16, 2016, 9:56:04 AM
to rabbitmq-users, smet...@gmail.com
> I don’t know anything about wishbone or what it does ...

If you (or someone else) would like to replicate the problem I'm experiencing you can build a Docker container of wishbone using this build file:

To run Wishbone you can execute something like this:

$ docker run --volume ${PWD}/rabbit.yaml:/tmp/rabbit.yaml smetj/wishbone:2.1.0 debug --config /tmp/rabbit.yaml

The rabbit.yaml bootstrap file can be found here: https://gist.github.com/smetj/87f9469ecd031f9ea950 (change the hostname in it to the host where RabbitMQ runs)

Cheers,

Jelle

