Hi - Single Queue load


sarju Garg

Oct 11, 2021, 11:45:31 PM
to rabbitm...@googlegroups.com
Hi,

I am new to RabbitMQ.

As per the RabbitMQ documentation, a single queue can handle around 50,000 TPS.

This means a single producer, a single queue, and a single consumer should be able to handle 50,000 TPS.

The producer and consumer are single-threaded applications, each with a single connection.

RabbitMQ - 3.9.7
Erlang - 23.3.47
Server configuration - 12 cores, 32 GB RAM

The producer, consumer, and queue all run on separate servers in a LAN environment.

I am not able to achieve more than 15,000 TPS.

Does the message size or any RabbitMQ configuration also matter? My message size is 500 bytes.

Can someone suggest further steps?
Regards
Sarju

sarju Garg

Oct 12, 2021, 1:07:47 AM
to rabbitmq-users
Further to this:

We ran 1 producer and it gave us 4,000 TPS; when we ran 3 more producers, it peaked at 15k TPS. One consumer is running.

As soon as I ran the next producer, all connections went from the running state to the flow state, limiting us to 15,000 TPS.

As per the documentation, this is done to limit memory usage, but how can we control this and reach the stated limit of 50,000 TPS?

Regards
Sarju

sarju Garg

Oct 12, 2021, 6:02:29 AM
to rabbitmq-users
Further, 

If there is back pressure, the queue should build up first. But the queue length is 0, so the whole idea of messaging and buffering has gone for a toss.

As per the documentation:

This guide covers a back pressure mechanism applied by RabbitMQ nodes to publishing connections in order to avoid runaway memory usage growth. It is necessary because some components in a node can fall behind particularly fast publishers as they have to do significantly more work than publishing clients (e.g. replicate data to N peer nodes or store it on disk).

Now, how do we control memory usage growth? Please help.

Regards

Sarju

Wes Peng

Oct 12, 2021, 6:11:36 AM
to rabbitm...@googlegroups.com
Maybe you need a faster network.
If your hardware is capable enough, I'd suggest running the consumer, producer, and broker all on the same node for testing.
Also see this documentation:

Regards



Michal Kuratczyk

Oct 12, 2021, 6:57:10 AM
to rabbitm...@googlegroups.com
Hi,

Many factors impact the throughput you can expect:
1. As Wes said, network latency and throughput
2. Erlang 24 usually provides 30-40% better performance
3. Disks may be the bottleneck
4. You need sufficient CPU (at least 2, so that the queue process can use one and there is still at least one more for other processes)
5. Message size, of course (500 bytes is a perfectly reasonable size for a message, but I believe those tests used a smaller size)
6. The application itself (you can use https://github.com/rabbitmq/rabbitmq-perf-test which is our go-to testing/benchmarking tool)
7. And then there is still additional tuning that could be performed (Erlang flags, TCP buffers and many other things)

Also, if you need every last bit of performance you can get from RabbitMQ, then you should probably consider some alternative design choices. Options include:
1. Using multiple queues
2. Using sharding or a consistent hash exchange (see the sketch below)
3. Using a stream instead: https://rabbitmq.com/stream.html
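
To make option 2 a bit more concrete, here is a rough sketch with the Java client (illustrative only; the exchange and queue names are made up, and it assumes the rabbitmq_consistent_hash_exchange plugin is enabled):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class HashShardSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel ch = conn.createChannel()) {
            // Messages published to this exchange are spread across the
            // bound queues by hashing the routing key.
            ch.exchangeDeclare("work-hash", "x-consistent-hash", true);
            for (int i = 0; i < 4; i++) {
                String q = "work-" + i;
                ch.queueDeclare(q, true, false, false, null);
                // With this exchange type the binding key is a weight,
                // not a routing pattern.
                ch.queueBind(q, "work-hash", "10");
            }
            // Publish with a stable per-message key so related messages stay
            // on the same queue; each queue is its own Erlang process and can
            // keep another CPU core busy.
            ch.basicPublish("work-hash", "key-123", null, "payload".getBytes());
        }
    }
}

Each of the bound queues can then have its own consumer, which is usually the simplest way past the throughput ceiling of a single queue.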

Best,



--
Michał
RabbitMQ team

Wes Peng

Oct 12, 2021, 7:07:30 AM
to rabbitm...@googlegroups.com
That's a really nice answer. Thanks Michal.

sarju Garg

Oct 12, 2021, 7:24:00 AM
to rabbitmq-users
Hi Michal,

It is not about testing and extracting the last bit of performance, but just about seeing whether we can reach the figure suggested for RabbitMQ.

We have gone through many articles but did not find the exact configuration.

Can you share more information about point 7?

We are using transient messages, so disk should not be an issue.

Regards
Sarju

Michal Kuratczyk

Oct 12, 2021, 8:05:27 AM
to rabbitm...@googlegroups.com
Hi,

Simply running RabbitMQ (Erlang 24.1) and perf-test on my laptop (after a few seconds of warm-up):
id: test-135827-696, time: 8.000s, sent: 39169 msg/s, received: 40600 msg/s, min/median/75th/95th/99th consumer latency: 283094/397953/460863/532935/549410 µs
id: test-135827-696, time: 9.000s, sent: 49659 msg/s, received: 53161 msg/s, min/median/75th/95th/99th consumer latency: 245027/300542/331187/490269/517161 µs
id: test-135827-696, time: 10.000s, sent: 48452 msg/s, received: 47901 msg/s, min/median/75th/95th/99th consumer latency: 234996/309163/344532/393642/434959 µs

When I use 500B messages (-s 500):
id: test-140044-296, time: 7.000s, sent: 34661 msg/s, received: 35257 msg/s, min/median/75th/95th/99th consumer latency: 68129/92042/111841/229480/257584 µs
id: test-140044-296, time: 8.000s, sent: 42978 msg/s, received: 42794 msg/s, min/median/75th/95th/99th consumer latency: 70455/86348/94476/266890/307487 µs
id: test-140044-296, time: 9.000s, sent: 44366 msg/s, received: 44331 msg/s, min/median/75th/95th/99th consumer latency: 68639/89243/97144/114215/126132 µs

So 35-45k msg/s with 500B messages with everything running on one machine. Of course, the network is very good when everything is running locally.

You can find some tuning information here:
https://www.rabbitmq.com/runtime.html

but there is certainly more based on your operating system, deployment model, etc.

sarju Garg

Oct 12, 2021, 10:00:12 AM
to rabbitm...@googlegroups.com
Hi Michal,

Just to reconfirm: you mentioned that Erlang 24.1 gives a 30-40% improvement compared to the previous version, e.g. 23.x (we are using something like that)?

We downloaded the Erlang that came with the latest version of RabbitMQ.

As per the documentation, the latest version of RabbitMQ supports at most Erlang 24.0.5.

Regards
Sarju




Michal Kuratczyk

Oct 12, 2021, 10:50:04 AM
to rabbitm...@googlegroups.com
Erlang 24 introduced a Just In Time compiler which improves performance of any Erlang-based app significantly (exact numbers vary but 30-40% is pretty common and we saw that in many tests). 

The latest RabbitMQ supports any 24.x version.

We are also working on a new index implementation that improves single queue throughput significantly. It’s not ready but you can see many test results for both the old (current) and upcoming index implementation here:

--
Michał

sarju Garg

Oct 13, 2021, 10:08:42 AM
to rabbitmq-users
Hi Michal,

In your testing above, did you run a single instance of perf-test (one for the consumer and one for the producer)?

We are on CentOS 7, which does not support Erlang 24. It is supported on CentOS 8, so we need to plan an OS upgrade.

I have one confusion: this 50,000 TPS was reportedly achievable with earlier versions as well. Can you suggest what the architecture was? We are doing consumer acks as we want each message to be processed at least once.

Regards
Sarju

Loïc Hoguin

Oct 13, 2021, 10:36:01 AM
to rabbitm...@googlegroups.com

Hello,

I would be interested to know which version performed better than the current version. I can double-check and, if there's a performance difference, take a look at what's changed.

Cheers,

-- 

Loïc Hoguin

Michal Kuratczyk

Oct 13, 2021, 11:08:39 AM
to rabbitm...@googlegroups.com
Hi,

I was just running a single perf-test - literally started RabbitMQ and then `./runjava com.rabbitmq.perf.PerfTest`.
If I add `-s 500 -c 100` (500-byte messages; confirm every 100 messages), I still get about 25-30K msg/s (both published and consumed).
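
In case it helps to see what that looks like from application code, here is a rough sketch of batched publisher confirms with the Java client (illustrative only; the queue name and batch size are assumptions chosen to mirror the run above):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class BatchConfirmPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel ch = conn.createChannel()) {
            ch.queueDeclare("perf-queue", true, false, false, null); // made-up name
            ch.confirmSelect(); // enable publisher confirms on this channel
            byte[] body = new byte[500]; // 500-byte payload, as in the test
            int outstanding = 0;
            for (int i = 0; i < 100_000; i++) {
                ch.basicPublish("", "perf-queue", null, body);
                if (++outstanding == 100) {
                    // Block until the broker has confirmed the batch before
                    // publishing more; this bounds unconfirmed messages.
                    ch.waitForConfirmsOrDie(5_000);
                    outstanding = 0;
                }
            }
            ch.waitForConfirmsOrDie(5_000); // confirm any remainder
        }
    }
}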




--
Michał
RabbitMQ team

Karl Nilsson

Oct 13, 2021, 11:45:14 AM
to rabbitmq-users
Sarju, where in the documentation did you find the 50k TPS figure? Throughput of a single queue is highly variable and depends on many factors, many outside of our control.

If you are genuinely looking to scale, you should divide your workload up over multiple queues. This is the only way to make proper use of multi-core systems.
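
As a rough illustration of that idea (client-side only, no plugins; the queue names and shard count are made up, and this is the do-it-yourself counterpart of the consistent hash exchange sketched earlier in the thread):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class ManualSharding {
    static final int SHARDS = 4; // assumption: roughly one busy queue per core you want used

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: local broker
        try (Connection conn = factory.newConnection();
             Channel ch = conn.createChannel()) {
            for (int i = 0; i < SHARDS; i++) {
                ch.queueDeclare("work-" + i, true, false, false, null);
            }
            // Route each message by a stable key so per-key ordering is kept
            // while the overall load spreads over several queue processes.
            String key = "subscriber-42"; // hypothetical per-message key
            int shard = Math.floorMod(key.hashCode(), SHARDS);
            ch.basicPublish("", "work-" + shard, null, "payload".getBytes());
        }
    }
}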

Cheers
Karl



--
Karl Nilsson

sarju Garg

Oct 14, 2021, 12:30:16 AM
to rabbitmq-users
Hi Karl,

We have a system where we are using multiple queues, but we are somehow not getting the desired performance, hence we resorted to testing one queue at a time.

We are not adamant about 50K TPS, but I read it somewhere. What is the benchmarked figure for a queue otherwise? I will share the link where I read about 50K TPS.

We will go through the perf-test code; as per it, the TPS is about 25-30k. The only difference is that we are not doing multiple acks. With a single ack each time, it comes to 15K, which is a good number as well.

Our problem is that once we run our complete setup, the load works for some time (ranging from a few minutes to a few hours), then the queues build up and consumer utilisation is near 0. If we stop producers or increase consumers, it does not help. It looks like the consumers are not picking up messages at all for some time.

We have tried many options, including limiting queue length and code reviews, but have not reached any conclusion. It looks like we do not understand it well, or maybe something is badly missed.

For example, consumer utilisation -- how can we increase this KPI? The advice is to raise the prefetch count or increase consumers, but it does not help.

Any help in this regard would be highly appreciated. We have written to commercial support, but no response from their side so far. Each time, they just ask for our name.

Regards
Sarju

Karl Nilsson

Oct 14, 2021, 1:28:18 AM
to rabbitm...@googlegroups.com
I see. It does sound like there is an issue in your consumer code that causes them to hang. Which client library are you using? Do you do any RPC-type synchronous work in your consumers? When the system hangs, does the queue show as having messages in flight?

--
Karl Nilsson

sarju Garg

Oct 14, 2021, 1:55:44 AM
to rabbitmq-users
Hi Karl,

We are using the AMQPCPP library for this purpose: https://github.com/akalend/amqpcpp

Are we using the right library? There is no RPC-type work, but we are building a messaging system (an SMSC), so we want to ensure messages are delivered to the end user, hence the consumer acks.

Yes, the messages are in flight. They show as both ready and unacked in the management UI.

Thanks. Any help and support from the community is appreciated. We are badly stuck and see no way out.

Regards
Sarju


Karl Nilsson

Oct 14, 2021, 2:28:06 AM
to rabbitmq-users
OK, the C/C++ library isn't written or maintained by the RabbitMQ team, so we have very little familiarity with it. Feel free to post the simplest possible consumer code that reproduces the issue; perhaps someone else on the list is familiar with the C client (which the C++ library wraps). Alternatively, you could try contacting the maintainers of the C/C++ library to see if they can give advice.

Cheers
Karl





--
Karl Nilsson

sarju Garg

Oct 14, 2021, 2:49:29 AM
to rabbitmq-users
Hi Karl,

Thanks. Just as perf-test is a test tool written in Java, does anyone use test code in C++ as well?

Regards
Sarju

sarju Garg

Oct 18, 2021, 11:54:50 PM
to rabbitmq-users
Hi,

Thanks, with perf-test we were able to get 50k TPS. This means the consumer takes 1 second to process 50k messages. Now, if I run multiple consumers and assume the queue can deliver 50k messages per second, then with 10 consumers each of them should get 50k/10 = 5,000 messages. This should scale, and as consumers are added the load should be distributed evenly across all the consumer nodes. However, it does not scale like this and adding consumers seems to reduce performance. Do we have any benchmark figures for this? All consumers are single-threaded processes with a single connection to RabbitMQ.

In our case, the time to process a request is 5 ms, so each process should take 200 requests and process them in 1 second. We set the prefetch count to 200 and implemented a multiple-ack count of 400. This means the queue will give 400 requests, the consumer will process 200 requests and send an ack, and once the acks for those 200 are received, 200 more requests will be sent. So there is roughly an initial load of 2 seconds, settling to a steady load of 1 second, per consumer.
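
For reference, the prefetch/multiple-ack pattern described above looks roughly like this with the official Java client (illustrative only; our actual consumer is C++ with amqpcpp, the queue name is made up, and the prefetch and ack batch sizes are just examples):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class PrefetchConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumption: local broker
        Connection conn = factory.newConnection();
        Channel ch = conn.createChannel();
        ch.queueDeclare("work-queue", true, false, false, null); // made-up name
        ch.basicQos(200); // broker keeps at most 200 unacked deliveries in flight

        final int[] sinceLastAck = {0};
        ch.basicConsume("work-queue", false, // manual acks for at-least-once delivery
            (consumerTag, delivery) -> {
                // ... process the message (a few ms of work in our case) ...
                if (++sinceLastAck[0] >= 200) {
                    // One multiple-ack covers everything up to this delivery
                    // tag, so the broker can send the next batch.
                    ch.basicAck(delivery.getEnvelope().getDeliveryTag(), true);
                    sinceLastAck[0] = 0;
                }
                // A real consumer would also ack any remainder on a timer
                // and on shutdown, so nothing stays unacked indefinitely.
            },
            consumerTag -> { /* consumer cancelled by the broker */ });
    }
}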

So if we need 1,000 TPS we can run 5 such processes, and for 10,000 we should run 50 such processes to meet our requirement.

But it is not happening that way. Consumer utilisation is around 100% up to about 3,000 TPS, and after that messages sit in the ready state even though there are enough of them to be picked up.

The machines have 12-core CPUs, and the producer, queue, and consumer all run on different machines.

Has any analysis been done on such cases as well?

I think this is still not a candidate scenario for sharding etc., though we are exploring that.

Regards
Sarju