Publishing to RabbitMQ degraded by F5?

Ryan Brown

May 28, 2015, 4:07:05 PM
to rabbitm...@googlegroups.com
Hello all,

I have an application that works as a RESTful pub/sub system, leveraging RabbitMQ for persistence and routing. The system was originally designed to handle load balancing across our 4-node RabbitMQ cluster in a simple round-robin fashion. We recently switched to an F5 to abstract the cluster implementation from the application. However, through a series of performance/load tests we have established that publication throughput into RabbitMQ drops by nearly 50% with the F5 handling the load balancing. To me this is a clear sign that we are doing something wrong with our configuration.

I am admittedly a bit out of my element with some of the F5 settings. Below is the VIP we have currently configured:

ltm virtual rabbitmq.edu4u.net {
    destination 10.52.165.51:5672
    ip-protocol tcp
    mask 255.255.255.255
    pool rabbitmq
    profiles {
        fastL4 { }
    }
    snatpool app_pool
    source-port change
    translate-address enabled
    translate-port enabled
    vlans-disabled
}

The rabbitmq pool looks like this:

ltm pool rabbitmq {
    load-balancing-mode least-connections-node
    members {
        rmq01:5672 {
            address 10.52.246.132
        }
        rmq02:5672 {
            address 10.52.246.133
        }
        rmq03:5672 {
            address 10.52.246.134
        }
        rmq04:5672 {
            address 10.52.246.135
        }
    }
    monitor tcp
}

I'm not entirely sure what I should be looking for here. I have confirmed that our F5 idle timeout is longer than our RabbitMQ heartbeat interval. Other than that, I have not found anything glaringly obvious that could cause the performance degradation we're seeing.
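One thing worth checking in the VIP above is the idle timeout on the fastL4 profile: its stock value is 300 seconds, which can silently reset long-lived AMQP connections that go quiet between messages. A sketch of a custom profile with a longer timeout (the profile name and the 3600-second value are illustrative assumptions, not from this thread):

ltm profile fastl4 fastL4_amqp {
    defaults-from fastL4
    idle-timeout 3600
}

The virtual server would then reference fastL4_amqp in its profiles block instead of the stock fastL4.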

Any help or guidance would be greatly appreciated.

Best.

Ryan


Michael Klishin

May 28, 2015, 9:32:09 PM
to rabbitm...@googlegroups.com, Ryan Brown
 On 28 May 2015 at 23:07:05, Ryan Brown (ryank...@gmail.com) wrote:
> a series of performance/load tests we have established that
> publication throughput into RabbitMQ is reduced by nearly 50%
> with the F5 handling the load-balancing. To me this is a clear
> sign that we are doing something wrong with our configuration.

In both tests, do publishers and consumers connect to the same nodes?
It can be a data locality issue, when a load balancer distributes
connections to different nodes and thus every message has to be moved
between them.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ
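Michael's data-locality point can be put in rough numbers. A minimal sketch (the model and figures are illustrative assumptions, not measurements from this thread):

```python
# Rough model of the data-locality cost described above: if a load
# balancer spreads publisher connections uniformly across an N-node
# cluster, and each queue's master process lives on one node, only
# 1/N of publishes arrive on the master's node; the rest take an
# extra inter-node hop before reaching the queue.

def expected_extra_hops(nodes: int) -> float:
    """Expected extra intra-cluster hops per publish, assuming a
    uniformly balanced connection and one master node per queue."""
    return (nodes - 1) / nodes

# For the 4-node cluster in this thread, 3 out of 4 publishes are
# forwarded between nodes before reaching the queue master.
print(expected_extra_hops(4))  # → 0.75
```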


Ryan Brown

May 28, 2015, 9:56:31 PM
to Michael Klishin, rabbitm...@googlegroups.com
Michael,

Publishers and consumers do connect to the same nodes. The way the application currently works, we have connections that publish incoming messages to RMQ; the same application nodes also subscribe to all of the queues and deliver to the subscribing endpoints. The bottleneck appears to be in the initial publishing to RMQ.

Your comment about data locality is interesting. I wonder if the issue could be related to the fact that we are using all HA queues with active/active replication? That would significantly increase the chatter between the nodes. Could that potentially slow down publishing? We have noticed backpressure being applied under higher loads (though not actually high compared to what I have seen RMQ handle in the past: ~250 msg/s).
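As a back-of-the-envelope check on the chatter mirroring adds, here is an editorial sketch (the counts are assumptions about classic mirrored queues, where each publish is copied from the queue master to every mirror):

```python
def internode_transfers(nodes: int, mirrors: int, locality_hop: bool) -> int:
    """Inter-node copies of one published message: one per mirror,
    plus an extra hop when the publisher's connection landed on a
    node other than the queue master's."""
    assert 0 <= mirrors < nodes
    return mirrors + (1 if locality_hop else 0)

# 4-node cluster mirrored to all nodes: 3 replication copies, plus 1
# forwarding hop when the balancer picks a non-master node.
print(internode_transfers(4, mirrors=3, locality_hop=True))  # → 4
```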

One additional note is that we are using a headers exchange to facilitate some fairly complex routing schemes. My understanding is that this is significantly slower than a topic or fanout exchange.
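For reference, headers-exchange matching amounts to comparing each binding's argument table against the message's headers, which is why it costs more than a hash lookup on a routing key. A simplified model of the semantics (my own sketch, not RabbitMQ's actual implementation):

```python
# Sketch of headers-exchange matching: "x-match" in the binding
# arguments selects whether all pairs must match ("all", the
# default) or any single pair suffices ("any"). Arguments whose
# keys begin with "x-" are not themselves matched.

def headers_match(binding_args: dict, message_headers: dict) -> bool:
    x_match = binding_args.get("x-match", "all")
    pairs = [(k, v) for k, v in binding_args.items()
             if not k.startswith("x-")]
    if x_match == "any":
        return any(message_headers.get(k) == v for k, v in pairs)
    return all(message_headers.get(k) == v for k, v in pairs)

# "any" needs one matching pair; "all" needs every pair.
print(headers_match({"x-match": "any", "type": "report", "region": "eu"},
                    {"type": "report"}))  # → True
```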

-rb

Michael Klishin

May 28, 2015, 10:10:08 PM
to Ryan Brown, rabbitm...@googlegroups.com
On 29 May 2015 at 04:56:28, Ryan Brown (ryank...@gmail.com) wrote:
> Your comment about data locality is interesting. I wonder if
> the issue could be related to the fact that we are using all HA queues
> with active/active replication? That would significantly
> increase the chatter between the nodes. Could that potentially
> slow-down publishing?

It will slow down the queue processes, which will eventually result in back pressure on publishers.

This is not something a load balancer can affect, though. 

Ryan Brown

May 28, 2015, 10:42:57 PM
to Michael Klishin, rabbitm...@googlegroups.com
Understood. My thought process took a bit of a tangent. Thank you.

-rb

Michael Klishin

May 28, 2015, 10:44:56 PM
to Ryan Brown, rabbitm...@googlegroups.com
On 29 May 2015 at 05:42:55, Ryan Brown (ryank...@gmail.com) wrote:
> Understood. My thought process took a bit of a tangent.

I should point out that the mirroring implementation we have today
over-emphasizes safety of delivery by trading off a lot of efficiency
when you mirror to less than "all" nodes.

We are working on a new implementation for future versions, based on a well-known algorithm.