RabbitMQ on OpenStack with a persistent queue runs Very Slow

Phill Tomlinson

Mar 30, 2015, 11:45:22 AM
to rabbitm...@googlegroups.com
Hi,

I have been running RabbitMQ brokers as Docker containers. This seemed to work fine on my local VM, but when I use them in my company's OpenStack environment the broker slows down dramatically (it's around 4-5x slower, even though the OpenStack VM has more resources).

I am using a persistent queue, so I'm wondering whether there is some sort of disk issue?

Has anyone else had the same issue?

Thanks,
Phill

Michael Klishin

Mar 30, 2015, 11:54:43 AM
to Phill Tomlinson, rabbitm...@googlegroups.com
On 30 March 2015 at 18:45:24, Phill Tomlinson (p.g.to...@gmail.com) wrote:
> I have been running RabbitMQ brokers as docker containers.
> This seemed to work fine on my local VM, but when I try to use them
> in my companies OpenStack environment the speed of the broker
> slows dramatically (its around 4-5x slower, but the Openstack
> VM has more resources).
>
> I am using a persistent queue so not sure if there is some sort of
> disk issue?

iostat can help answer that question. It could also be that you don't have mirroring locally but mirror to N
nodes on OpenStack. Mirroring adds significantly to intra-cluster traffic and the amount of coordination
nodes have to do.

You may also want to let the RabbitMQ node(s) on OpenStack use more RAM and start paging to disk later:
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L176-188 
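
As a rough illustration of what "page later" means in numbers, here is a minimal Python sketch. It assumes the settings in question are `vm_memory_high_watermark` (default 0.4) and `vm_memory_high_watermark_paging_ratio` (default 0.5); the linked lines are not reproduced here, so treat that as an assumption. The 8 GiB host and the raised values 0.6 and 0.75 are hypothetical, and the function name is made up for this example.

```python
def paging_thresholds(total_ram_bytes, high_watermark=0.4, paging_ratio=0.5):
    """Return (block_at, page_at) in bytes.

    block_at: memory use at which the node starts blocking publishers.
    page_at:  memory use at which queues start paging messages to disk,
              expressed as a fraction of block_at.
    """
    block_at = total_ram_bytes * high_watermark
    page_at = block_at * paging_ratio
    return block_at, page_at


if __name__ == "__main__":
    GIB = 1024 ** 3
    total = 8 * GIB  # hypothetical VM size
    # First pair: the assumed defaults. Second pair: hypothetical raised values.
    for watermark, ratio in [(0.4, 0.5), (0.6, 0.75)]:
        block_at, page_at = paging_thresholds(total, watermark, ratio)
        print(f"watermark={watermark}, paging ratio={ratio}: "
              f"paging starts near {page_at / GIB:.2f} GiB, "
              f"publishers are blocked near {block_at / GIB:.2f} GiB")
```

Raising either value delays the point at which messages are paged out, at the cost of leaving less headroom for the rest of the system.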
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Phill Tomlinson

Mar 30, 2015, 12:02:24 PM
to rabbitm...@googlegroups.com, p.g.to...@gmail.com
Thanks.

When I turn off persistence, OpenStack gives significantly better performance than my laptop - around 5x better. It's as soon as I write the messages to disk that OpenStack drops off.

When I used Kafka, another messaging technology, with disk persistence I did not observe this drop in performance. I will run iostat there and see if that gives any clues.

Phill

Michael Klishin

Mar 30, 2015, 12:13:07 PM
to Phill Tomlinson, rabbitm...@googlegroups.com
On 30 March 2015 at 19:02:26, Phill Tomlinson (p.g.to...@gmail.com) wrote:
> When I turn off persistence OpenStack gives significantly
> better performance than my laptop - around 5x better. Its as soon
> as I write the messages to disk that Openstack then drops off.

This is expected: not having to hit the disk is faster.

Depending on the workload, allowing RabbitMQ to use more RAM and page later may help.

You can also enable HiPE if throughput is your primary concern; it can offer a double-digit percentage improvement (again, workload-dependent):
https://github.com/rabbitmq/rabbitmq-server/issues/58#issuecomment-77812255
https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq.config.example#L240 

Kafka almost always offers higher throughput (and trades off many features for that).

Phill Tomlinson

Mar 30, 2015, 12:18:04 PM
to rabbitm...@googlegroups.com, p.g.to...@gmail.com
Sorry, I should have been clearer.

RabbitMQ with no disk persistence (OpenStack considerably quicker):
Laptop - 160 msg/s
OpenStack - 1,000 msg/s

With disk persistence (not sure what happens here, as the ratios are not as expected):
Laptop - 100 msg/s
OpenStack - 65 msg/s

So there is a large discrepancy.

When I mentioned Kafka (Kafka always writes to disk; you don't have the option not to), this was with full acks - but it went a lot faster in OpenStack compared to my laptop.

Thanks,
Phill

Michael Klishin

Mar 30, 2015, 12:22:13 PM
to Phill Tomlinson, rabbitm...@googlegroups.com
On 30 March 2015 at 19:18:05, Phill Tomlinson (p.g.to...@gmail.com) wrote:
> RabbitMQ with no disk persistence (openstack considerably
> quicker):
> Laptop - 160msg/s
> Openstack -1,000msg/s
>
> With disk persistence (not sure what happens here as the ratios
> are not as expected):
> Laptop - 100msg/s
> Openstack - 65msg/s
>
> So there is a large discrepancy.

Those are very low numbers regardless of whether you publish messages as persistent.

How are you measuring this, do you use our PerfTest tool? Something else?

Does your laptop have an SSD? What about your OpenStack node? 

Phill Tomlinson

Mar 30, 2015, 12:31:49 PM
to rabbitm...@googlegroups.com, p.g.to...@gmail.com
I'm using full acknowledgements on a per-message basis, so you are correct that it is slower than using something like publisher confirms.

I'm using JMeter to test messages going in.

My laptop has an SSD. However, as I said, Kafka was much quicker in OpenStack, and there I was using fully synchronized blocking calls, waiting for each message to be confirmed.

I have heard of a known issue with MySQL, for example, where disk writes do not perform well when it runs in OpenStack. I wondered if this was a similar issue.

Michael Klishin

Mar 30, 2015, 8:08:55 PM
to Phill Tomlinson, rabbitm...@googlegroups.com
On 30 March 2015 at 19:31:50, Phill Tomlinson (p.g.to...@gmail.com) wrote:
> I'm using full acknowledgements on a per message basis, therefore
> you are correct it is slower than using something like publisher
> confirms for example.

What are "full acknowledgements", if not publisher confirms? Publisher confirms is the recommended way
(waiting for each message is possible with the Java client, although not necessary).

Kafka is a different system with different trade-offs. It will offer better throughput most of the time. What is surprising
is how low the throughput you are observing is, regardless of whether Kafka is faster and by how much.

I highly recommend trying PerfTest:
http://www.rabbitmq.com/java-tools.html

Even on modest hardware with HDDs and (streaming) publisher confirms you should be seeing 5-10K small messages
a second with no or next to no configuration tuning.

Simon MacMullen

Mar 31, 2015, 6:02:09 AM
to Phill Tomlinson, rabbitm...@googlegroups.com
To expand a bit on what Michael said:

If you are publishing a message, then waiting for confirmation before
you publish the next message, you will get bad performance. Network
latency will quite possibly dominate, and even if it doesn't then the
persister will not perform as well as you might hope.

Why? Because queues attempt to write to disk in batches - essentially
they accept messages as they come in, then write out a batch whenever
they go idle, or after 200ms. And they only issue confirmations for
persistent messages after they have been written to disk. So in the
worst case (where the queue is servicing other requests and never goes
idle) you can wait 200ms for each confirm. Even if the queue is doing
nothing else it still has to go idle.
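
To put rough numbers on this, here is a minimal back-of-the-envelope model in Python. It is a sketch, not a description of the broker's internals: it assumes each wait costs one network round trip plus one confirm delay, and that a whole batch is confirmed after a single such wait. The function names, the 1 ms round trip and the 10 ms delay are hypothetical; the 200 ms figure is the worst case described above.

```python
def sync_publish_rate(round_trip_s, confirm_delay_s):
    """Messages/second when every publish waits for its own confirm."""
    return 1.0 / (round_trip_s + confirm_delay_s)


def batched_publish_rate(batch_size, round_trip_s, confirm_delay_s):
    """Messages/second when batch_size messages share a single wait.

    Simplification: the whole batch is assumed to be written and
    confirmed after one round trip plus one confirm delay.
    """
    return batch_size / (round_trip_s + confirm_delay_s)


if __name__ == "__main__":
    rtt = 0.001  # hypothetical 1 ms network round trip
    for delay in (0.010, 0.200):  # hypothetical 10 ms, and the 200 ms worst case
        one = sync_publish_rate(rtt, delay)
        hundred = batched_publish_rate(100, rtt, delay)
        print(f"confirm delay {delay * 1000:.0f} ms: "
              f"{one:.1f} msg/s one at a time, "
              f"{hundred:.0f} msg/s in batches of 100")
```

Under this model, a rate of 65 msg/s corresponds to roughly 15 ms per publish-and-confirm cycle, which is plausible for a slow fsync plus network latency.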

So one-publish-one-confirm is a performance antipattern. Much better to
accept confirms as they come in, or if you can't do that, at least
publish a batch of messages then wait for confirms for all of them
before continuing.
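
Accepting confirms as they come in mostly means keeping track of which published messages are still unconfirmed. A minimal sketch of that bookkeeping in plain Python follows; it talks to no broker, and the class and method names are made up for illustration. It relies on two properties of publisher confirms: sequence numbers start at 1 once a channel is in confirm mode, and an ack with `multiple` set confirms every message up to and including its delivery tag.

```python
class OutstandingConfirms:
    """Tracks published-but-unconfirmed messages by sequence number."""

    def __init__(self):
        self._next_seq = 1   # confirm sequence numbers start at 1
        self._pending = {}   # seq -> message, in publish order

    def published(self, message):
        """Record a message at publish time; returns its sequence number."""
        seq = self._next_seq
        self._pending[seq] = message
        self._next_seq += 1
        return seq

    def ack(self, delivery_tag, multiple=False):
        """Handle a confirm; returns the messages it settled."""
        if multiple:
            # Everything up to and including delivery_tag is confirmed.
            confirmed = [s for s in self._pending if s <= delivery_tag]
        else:
            confirmed = [delivery_tag] if delivery_tag in self._pending else []
        return [self._pending.pop(s) for s in confirmed]

    def unconfirmed(self):
        """Messages still waiting for a confirm, oldest first."""
        return [self._pending[s] for s in sorted(self._pending)]


if __name__ == "__main__":
    tracker = OutstandingConfirms()
    for i in range(5):
        tracker.published(f"msg-{i}")
    print(tracker.ack(3, multiple=True))  # settles the first three messages
    print(tracker.unconfirmed())          # the two still in flight
```

A real client would call `published()` when it sends a message and `ack()` from its confirm callback; anything left in `unconfirmed()` after a connection failure is a candidate for republishing.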

Also: which version are you running? 3.5.0 has a faster persister, as
well as more diagnostic information on how the persister is doing, see
http://www.rabbitmq.com/persistence-conf.html

Cheers, Simon

Simon MacMullen

Mar 31, 2015, 6:23:44 AM
to Phill Tomlinson, rabbitm...@googlegroups.com
On 31/03/15 11:02, Simon MacMullen wrote:
> Why? Because queues attempt to write to disk in batches - essentially
> they accept messages as they come in, then write out a batch whenever
> they go idle, or after 200ms. And they only issue confirmations for
> persistent messages after they have been written to disk. So in the
> worst case (where the queue is servicing other requests and never goes
> idle) you can wait 200ms for each confirm. Even if the queue is doing
> nothing else it still has to go idle.

One final point I forgot: we don't confirm persistent messages until
they are actually fsync()ed to disk. So if you publish one and wait for
one you are quite possibly limited by how fast your disk can fsync(). If
your laptop has an SSD and your server doesn't, that could explain a lot.
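
One way to check whether fsync() speed is the limit is to time it directly on both machines. The following is a minimal sketch using only the Python standard library; it is not how RabbitMQ writes its message store, and the function name, record size and write counts are arbitrary choices for illustration. Point it at a directory on the same volume as the broker's data directory, since the system temp directory may be memory-backed.

```python
import os
import tempfile
import time


def fsync_rate(directory, writes=200, batch_size=1, payload=b"x" * 512):
    """Return records/second when fsync() is issued once per batch_size records."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for i in range(1, writes + 1):
            os.write(fd, payload)
            if i % batch_size == 0:
                os.fsync(fd)  # force the batch to stable storage
        os.fsync(fd)          # flush any trailing partial batch
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return writes / max(elapsed, 1e-9)


if __name__ == "__main__":
    target = tempfile.gettempdir()  # replace with a directory on the broker's disk
    print(f"fsync per record:     {fsync_rate(target, batch_size=1):.0f} records/s")
    print(f"fsync per 50 records: {fsync_rate(target, batch_size=50):.0f} records/s")
```

If the per-record figure on the OpenStack VM is close to the message rate you observe, the disk is the likely bottleneck for one-publish-one-confirm.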

3.5.0 will make this clearer (the management plugin will let you see how
often fsync()s are happening) but obviously it can't go any faster.

Cheers, Simon

Phill Tomlinson

Apr 1, 2015, 3:49:40 AM
to rabbitm...@googlegroups.com, p.g.to...@gmail.com
Thanks for all the comments.

I agree - I've used publisher confirms before and the output is far superior. The existing application I work on uses the anti-pattern described, but I think if the application can be changed to detect duplicates in the event of failure (we can't have duplicates) then using this type of publish mechanism may be possible.

Laing, Michael

Apr 1, 2015, 4:56:46 AM
to Phill Tomlinson, rabbitm...@googlegroups.com
For high-performance, reliable apps, it would be a nice feature to have an exchange option that rejects duplicates passing through it based upon msg_id, or perhaps a configurable header, together with a ring buffer and/or time window, the size(s) of which are configurable. Perhaps they go to the alternate exchange.
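
A minimal sketch of the ring-buffer idea in Python, written as an application-side check rather than an exchange feature (nothing in this thread suggests the broker provides one). The class name and capacity are hypothetical, and a time window is left out for brevity.

```python
from collections import deque


class RecentIdFilter:
    """Rejects message ids seen within the last `capacity` accepted ids."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._order = deque()  # accepted ids, oldest first (the ring buffer)
        self._seen = set()     # same ids, for constant-time lookup

    def accept(self, msg_id):
        """Return True for a new id, False for a recent duplicate."""
        if msg_id in self._seen:
            return False
        if len(self._order) == self._capacity:
            # Buffer is full: forget the oldest id to make room.
            self._seen.discard(self._order.popleft())
        self._order.append(msg_id)
        self._seen.add(msg_id)
        return True


if __name__ == "__main__":
    recent = RecentIdFilter(capacity=1000)
    for msg_id in ["a1", "a2", "a1", "a3"]:
        verdict = "process" if recent.accept(msg_id) else "drop duplicate"
        print(msg_id, verdict)
```

The capacity bounds memory use, so a duplicate that arrives after more than `capacity` other messages is not detected; size it to cover the longest plausible redelivery gap.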

We do this with microservices everywhere currently.

Might also help with multi-hop exchange federation which is quite tricky for ordinary humans (myself included) to configure properly into a reliable delivery mesh.

ml
