10K message limit on RabbitMQ


Mark Galea

Oct 4, 2017, 4:34:30 PM
to rabbitmq-users
I'm using RabbitMQ to publish messages to interested consumers, but it seems that I have hit some sort of weird 10K msg/sec limit. My observations are as follows:

  • Sending messages to one queue (non-durable, no transactions) with no consumer attached, I achieve a throughput of 10K msg/sec.
  • Sending messages to a queue with a consumer (non-durable, no tx) reduces the production rate to around 8K msg/sec. Not sure why this happens.
  • Sending messages to a queue with multiple consumers (non-durable, no tx) also reduces the production rate to around 8K msg/sec. Same as above, so multiple consumers have no further effect on production.
  • Sending messages to `n` queues at a time results in roughly 10K/n msg/sec per queue: 2 queues reach (10K/2) 5K each, 3 queues reach (10K/3) ~3.3K each, and so on.
  • Sending messages to `n` queues at a time with consumers connected results in less than 10K/n msg/sec per queue.
  • Playing around with the credit flow settings has no effect on throughput. I experimented with {500,200}, {500,100}, {400,200} (the default), {400,100}, {200,100}, and {200,50} (as specified in your blog post). No change in throughput could be perceived.
  • We never hit the high water mark and no flow control is triggered.
  • HiPE does not make any significant difference to these rates.

The producer is a simple python producer:

#!/usr/bin/env python
import pika
import sys

queue_name = sys.argv[1]
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue=queue_name)

# Publish to the default exchange as fast as possible.
while True:
    channel.basic_publish(exchange='', routing_key=queue_name, body='Hello!')

and the consumer is a simple Python consumer:

import pika
import sys

queue_name = sys.argv[1]

connection = pika.BlockingConnection()
channel = connection.channel()
channel.basic_qos(prefetch_count=2048)
for method_frame, properties, body in channel.consume(queue_name):
    channel.basic_ack(method_frame.delivery_tag)

I start multiple producers as follows: 

python producer.py "a"
python producer.py "b"
python producer.py "c"
python producer.py "d"

and multiple consumers as follows: 

python consumer.py "a"
python consumer.py "b"
python consumer.py "c"
python consumer.py "d"

Does anyone have any idea how I can break this 10K msg/sec global limit? 

My computer is a MacBook Pro (15-inch, 2016): 

2.9 GHz Intel Core i7

16 GB 2133 MHz LPDDR3



Luke Bakken

Oct 4, 2017, 5:56:15 PM
to rabbitmq-users
Hi Mark,

Try running two instances each (or more!) of your producer and consumer using the same queue and see what throughput you get. Also, the for loop that acknowledges messages will limit how fast that consumer can run.
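
With the scripts from your message, that might look like the following (the queue name "shared" is arbitrary; run each command in its own shell, or background them):

python producer.py "shared" &
python producer.py "shared" &
python consumer.py "shared" &
python consumer.py "shared" &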

I did some tests running RabbitMQ 3.6.12 on one machine and instances of the producer / consumer apps on a separate machine over a 1 Gbit/s network link, and exceeded the 10K "limit" pretty easily. I used a QoS of 0 (i.e. deliver as fast as possible) and eliminated the basic_ack call and associated loop by using no_ack=True as described in this tutorial - https://www.rabbitmq.com/tutorials/tutorial-one-python.html

Depending on what you do with the messages and on your requirements, using auto-ack may not be feasible, and your QoS will most certainly be non-zero.
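
For illustration, a minimal auto-ack variant of the consumer script might look like this sketch (assuming pika 0.x, where the flag is named no_ack; pika 1.x renamed it auto_ack):

import pika
import sys

queue_name = sys.argv[1]

connection = pika.BlockingConnection()
channel = connection.channel()
# no_ack=True asks the broker to deliver without waiting for acks,
# which removes the per-message basic_ack round trip entirely.
for method_frame, properties, body in channel.consume(queue_name, no_ack=True):
    pass  # process body here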

Thanks,
Luke

Mark Galea

Oct 4, 2017, 6:20:56 PM
to rabbitm...@googlegroups.com
Hi Luke 

Thanks for your reply. 

In your reply you suggested using the same queue. Wouldn't this hurt performance, since a queue is handled by a single process in Erlang? Technically, having more channels sending to different queues should result in better throughput.

Would you be able to share what message rate you are able to achieve, and perhaps share your code for the producer and consumer?

-- 
Mark Galea

Michael Klishin

Oct 4, 2017, 6:32:05 PM
to rabbitm...@googlegroups.com
Python and Ruby clients cannot publish, let alone consume, more than about 10K messages a second. We've seen this in some of our own tests.

A much better idea is to use PerfTest.

Luke Bakken

Oct 4, 2017, 6:37:07 PM
to rabbitmq-users
Hi Mark,

Michael beat me to it but basically you're running into limitations in Python, not RabbitMQ. So in your case the only way to increase throughput via a single queue is to add producers and consumers.

I've committed the code I used to this gist - https://gist.github.com/lukebakken/1aefe2716f8bfc3d2ead59a54ad44d95

In my tests with two producers and two consumers using the same queue and my code I saw 13K - 15K msg/sec in the RabbitMQ management UI.

Thanks,
Luke

Mark Galea

Oct 4, 2017, 6:45:26 PM
to rabbitm...@googlegroups.com
PerfTest achieves around 28K msg/sec without persistence. That rate remains roughly constant as I increase producers: if I add more producers, the individual throughput rates for each queue go down whereas the global throughput increases slowly. Is this normal? From experience, what should be expected on a single server with the specs identified earlier?

Another thing I observed is that when the consumers are switched off, the throughput is higher (obviously, only until the memory limit). Is this normal? Why does this happen?

-- 
Mark Galea

Mark Galea

Oct 4, 2017, 6:48:33 PM
to rabbitm...@googlegroups.com
Hi Luke

How are you setting a QoS of 0? Is that the default with no acks?

-- 
Mark Galea

Luke Bakken

Oct 4, 2017, 7:19:47 PM
to rabbitmq-users
Hi Mark,

The behavior you're seeing is normal. In the second case, without consumers, RabbitMQ has "less to do", so you do see an increase in publish rate.

With regard to QoS and the Pika client (and RabbitMQ clients in general) - if you don't set QoS the default is 0 (unset). Please see the following documents:


We can't provide expected performance numbers based on hardware specs alone. Benchmarking on a single machine isn't very useful as that does not model a real-world architecture. The network, client library used and workload all have an effect. My advice is to closely model your expected application behavior and workload in an environment similar to production to ensure performance meets your expectations.

Thanks,
Luke

Mark Galea

Oct 5, 2017, 5:02:03 AM
to rabbitm...@googlegroups.com
Thank you both for the feedback.  

I'm starting to think that RabbitMQ is not a good fit for our use case - mind you, RabbitMQ is awesome and we use it extensively.

In our use case, we have a number of messages stored in a database and we want to push these messages to interested parties which are configured independently (think giant router!).  We want to achieve a throughput of around 10K msg/sec on non-durable, non-transactional queues (we just want a buffer in memory and we are sure that we will not reach the memory limits).  Using a direct socket-to-socket implementation we achieve a message rate of around 300K msg/sec on each configured "queue" (read socket here).  This, however, is not an ideal setup since the consumer needs to expose ports and the server configuration becomes a bit messier.  Rabbit provides a nice abstraction which we would really benefit from!

Reading through https://content.pivotal.io/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine, it is now clear that RabbitMQ is optimised to serve a large number of queues (on horizontally scaled rabbits) rather than for throughput on a small number of queues, and this boils down to the fact that queues are the concurrency primitive. In that blog post, a cluster of around 30 rabbits was created and 186 queues were distributed through the RabbitMQ fabric. With 30 rabbits and 186 queues you end up with 186/30 ≈ 6 queues on each node and a throughput of approx 33K msg/sec each. This is exactly what we observed on a single (monster) machine.

Would appreciate some comments (hopefully not insults) about this.  



--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/SKtMy3L44Qs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Mark Galea
CTO / SuprNation

GSM: +356 79 09 19 78
Skype: markgalea

Luke Bakken

Oct 5, 2017, 10:48:04 AM
to rabbitmq-users
Hi Mark,

If you must use Python, using additional processes for producers and consumers on the same queue should get you the performance you're looking for. Did you try that out in your environment?

The "Work Queues" tutorial gives a good example of sharing work between consumers (https://www.rabbitmq.com/tutorials/tutorial-two-python.html). In addition, I recommend using your preferred asynchronous Python library (Pika supports several) to consume messages as well as schedule the basic_ack call once the interested party has consumed the message. You will want to experiment with different values for channel.basic_qos(prefetch_count=N) as larger values aren't always better.

This is an interesting problem that I'm sure other Python users run into so I'm happy to continue helping out with a solution.

Thanks,
Luke

Mark Galea

Oct 5, 2017, 1:08:49 PM
to rabbitm...@googlegroups.com
Hi Luke, 

We are actually using Java. I cooked up the Python POC so that I could quickly explain what we are trying to do.

In Java we are using Spring Integration, the Rabbit Template, and Akka, all of which are optimised for throughput.

Whatever we do, we are stuck at 35K msg/sec per node. We have around 10 queues, so 35K/10 gives us 3.5K each. Ideally, we would be somewhere in the 300K region to start with. Any ideas?

--

Mark



Luke Bakken

Oct 5, 2017, 2:01:11 PM
to rabbitmq-users
Hi Mark,

Thanks for the additional information.

In my testing with your code, adding additional producer and consumer processes increased throughput. Have you tried this in your environment yet?

Luke

Mark Galea

Oct 5, 2017, 2:46:11 PM
to rabbitm...@googlegroups.com
Hi Luke, 

Yes, throughput increases but caps at around 15K msg/sec. That's the most I can push. After 6 queues, the global throughput stays constant.

I think the question here is: can one achieve a throughput of 100K msg/sec on 10 queues, where each has a throughput of 10K msg/sec? Any language would do, really.

--

Mark


Michael Klishin

Oct 5, 2017, 4:06:58 PM
to rabbitmq-users
Any messaging technology does a lot more than what a "direct socket connection" does.

With PerfTest if you publish to an exchange that has no bindings, it can go up to 80K/s or more.
However, that's basically reading from the socket, framing, performing authorization checks and throwing away the result.
Routing messages, storing them to disk, delivering them to consumers, and applying flow control each take their toll.
Mirroring and inter-node routing (data locality) add additional network hops as well.

To make sense of what exactly is the limiting factor in a multi-service, multi-node system you need
quite a few metrics, including network link saturation monitoring, I/O stats, CPU stats and so on.

You won't be able to achieve 100K messages a second when the message size is 10 kB on a 100 Mbit/s or even a 1 Gbit/s link,
regardless of how fast your client and server can go, how many queues are involved and so on.

The one-million-per-second benchmark cited earlier also used a workload without replication, with excellent producer/consumer locality, and without network link saturation (since the number of publishers per node is lower).

Michael Klishin

Oct 5, 2017, 4:13:52 PM
to rabbitmq-users
The answer is: we don't know. What are the message sizes? Network link throughput? How many CPU cores are on that node?
Can your consumers go that fast constantly? (It's quite often not the case on realistic workloads, even with JVM, .NET and Go clients.)

Achieving 100K/s per node will take collecting multiple metrics, tuning, trial, error and profiling your apps. And once you add more nodes,
unless your producers and consumers are perfectly colocated, the growth won't be linear because inter-node traffic and replication
affect more than a particular node.

Producers can, however, batch and compress messages: it's often a cheap and straightforward way to gain an order of magnitude in effective throughput
at the expense of CPU load on both ends.
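
As an illustration of that idea, a producer could pack many small messages into one compressed AMQP message; a sketch in the style of the scripts above, where the queue name 'batched' and the batch size are assumptions, and the consumer would decompress and unpack:

import json
import zlib
import pika

BATCH_SIZE = 100  # tunable; an assumption, not a recommendation

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='batched')

batch = []
while True:
    batch.append('Hello!')
    if len(batch) == BATCH_SIZE:
        # one publish now carries BATCH_SIZE logical messages
        body = zlib.compress(json.dumps(batch).encode('utf-8'))
        channel.basic_publish(exchange='', routing_key='batched', body=body)
        batch = []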


gl...@pivotal.io

Oct 11, 2017, 12:17:15 PM
to rabbitmq-users
Hi Mark,

I'm a bit late to the party, but I hope this will still be useful:

* a single queue will be limited by the speed of a single CPU core
* channels compete with queues for CPU time - the more channels a RabbitMQ node has, the less CPU time for queues
* channels are fast because they do little work
* queues are slower than channels because they do more work

The fastest RabbitMQ node has 4 CPU cores with 1 producer channel, 1 empty queue (no mirroring, no persistence) & 1 consumer channel (auto-ack). The reason for this is simple:

* each Erlang process has 1 CPU core to itself (it's more complicated than this, but this is easy to understand)
* when the queue is empty, the producer channel sends messages straight to the consumer channel, making the entire produce/consume process as efficient as possible
* queue mirrors require extra Erlang processes which require CPU time to do their work, and queue operations depend on other processes, possibly on other hosts which will involve the network, so things can only get slower
* message persistence is similar to the previous point, but now you have disks and the filesystem involved





Mark Galea

Oct 11, 2017, 8:57:03 PM
to rabbitm...@googlegroups.com
Hi, 

Luke had shared with me a configuration with which he was able to achieve 10K msg/sec on 10 queues. The PerfTest configuration to achieve such results is the following:

`mvn -q exec:java -Dexec.mainClass=com.rabbitmq.perf.PerfTest -Dexec.args="--uri amqp://localhost:5672 -x 10 -y 10 -q 512 -A 512"`

On our 8-CPU, 30 GB dev server we managed to see such throughput.



Seeing such results, I got curious and started reading the PerfTest code to see what's different. What I noticed is that the configuration -x 10 will only initiate one publisher and will broadcast to 10 unique queues by using the exchange (rather than create 10 producers). In a way, PerfTest is "cheating" by letting the exchange do the work.

Our use case is somewhat different: it consists of several producers publishing to different exchanges. To replicate this, I ran `n` instances of the following configuration

mvn -q exec:java -Dexec.mainClass=com.rabbitmq.perf.PerfTest -Dexec.args="--uri amqp://localhost:5672 -x 1 -y 1 -q 512 -A 512"

to simulate `n` producers and consumers.

The results achieved are as follows:

Producers    Total message rate
 1           26K
 2           38K
 3           46.8K
 4           45.5K
 5           44.9K
 6           45.1K
 7           44.2K
 8           44.3K
 9           44.2K
10           44.1K

From my investigation, it seems that independent producers slow down the system quite substantially (from approx 90K/s to 45K/s).  

My expectation here is that independent producers should not impact performance, given that each producer has its own connection and its own separate channel.

Can anyone justify such an observation? It seems totally weird to me.  

--

Mark


gl...@pivotal.io

Oct 13, 2017, 6:15:53 AM
to rabbitmq-users
Hi Mark,


What I noticed is that the configuration -x 10 will only initiate one publisher and will broadcast to 10 unique queues by using the exchange (rather than create 10 producers).

The `-x | --producers N` PerfTest flag will create `N` producer connections with 1 channel each.

`-x | --producers N` used in conjunction with `-u | --queue Q` will create `N` producers all publishing to the same queue `Q`.
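
For example, something along these lines should drive 10 producers and 10 consumers against one shared queue (the queue name here is made up):

mvn -q exec:java -Dexec.mainClass=com.rabbitmq.perf.PerfTest -Dexec.args="--uri amqp://localhost:5672 -x 10 -y 10 -u shared-q"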


 
In a way, the perftest is "cheating" by letting the exchange do the work.

Actually, the exchange doesn't do any work, it's just routing logic.
 
From my investigation, it seems that independent producers slow down the system quite substantially (from approx 90K/s to 45K/s).  

My expectation here is that independent producers should not impact performance given that each producer has its own connection and its own separate channel.
 
Can anyone justify such an observation? It seems totally weird to me. 

Every connection and every channel is an Erlang process. Every Erlang process gets a pre-defined number of reductions (1 reduction = 1 function call). Once a process consumes its reductions, it is preempted by the Erlang VM and placed back in the run queue, where it waits its turn for another scheduler slot. On a system with 8 CPUs dedicated to RabbitMQ you will have 8 Erlang schedulers, all running the Erlang processes that require scheduler time. Adding more connections or channels adds more work to a fixed number of schedulers, which, once saturated, cannot make your system go faster.

Your use case saturates the schedulers at 3 producers; adding more results in preemption, which slows down throughput. Once you go beyond 7 producers, the preemption penalty plateaus, and my guess is that you would have to go to 20 or more producers to observe further degradation.

Try the same workload on more cores to understand what I mean, and try Erlang's observer: `erl -sname observer -hidden -run observer`

Mark Galea

Nov 3, 2017, 4:48:37 PM
to rabbitm...@googlegroups.com
Hi 

Thanks a lot for your reply. Your reductions explanation has really helped us understand that CPU is as important as RAM in order to achieve throughput. 

After increasing our server to 16 CPUs, we have managed to reach 30K msg/sec and hit our next plateau. So much to learn, so little time!

Thanks for all the help guys!

-- 
Mark Galea

swati nair

Feb 14, 2019, 12:19:01 AM
to rabbitmq-users
Hi, I want to find the number of messages published per second to RabbitMQ. How can I find it?

Mark Galea

Feb 14, 2019, 2:23:25 AM
to rabbitm...@googlegroups.com
It largely depends on your setup and the type of queue you are publishing to. Run the performance test (PerfTest) packaged with RabbitMQ.

Martin Schröder

Feb 14, 2019, 2:42:01 AM
to rabbitm...@googlegroups.com
On Thu, Feb 14, 2019 at 06:19, swati nair <swat...@gmail.com> wrote:
> Hi , I want to find the number of messages published in one second to rabbitMQ .How can i find it?

Please start a new thread for new topics.

And please don't top post.

Best
Martin