Lower than expected throughput on an 8-core, 32 GB server

Dave Cottlehuber

Jan 28, 2019, 5:10:40 PM

to rabbitm...@googlegroups.com, sujoy...@gmail.com

On Mon, 28 Jan 2019, at 17:31, Sujoy Ghosh wrote:
> Hi!!
>
> We seem to be stuck at a throughput of 1,000 msg/s on an 8-core, 32 GB server.

What did you expect to get?
What other data have you collected?
What other limits are being hit?

> We are serving an API endpoint in which we have 2 processes to be completed:
> 1. Hitting another HTTP API to
> 2. Save the response status to MongoDB.

Are you sure this isn't a bottleneck?

> We are using Celery with RabbitMQ and we are able to process 1k
> requests per second with 2 workers and a concurrency of 8. The
> server has 8 CPUs and 32 GB RAM.

> Any help is appreciated.
>
> How do we get to 30k+ messages per second on this node?
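A rough check with Little's law (concurrency = throughput × time each task holds a slot), using only the figures quoted above: 2 workers × 8 concurrency gives 16 task slots at about 1,000 msg/s. This sketch assumes every slot is always busy and that the per-task time stays constant as load grows, which real HTTP calls and MongoDB writes may not honour.

```python
def time_per_task_ms(concurrency: int, throughput_per_s: float) -> float:
    """Little's law, L = lambda * W, solved for W and converted to milliseconds."""
    return concurrency * 1000.0 / throughput_per_s


def slots_needed(target_per_s: float, task_ms: float) -> float:
    """Concurrency needed to sustain target_per_s if each task holds a slot for task_ms."""
    return target_per_s * task_ms / 1000.0


if __name__ == "__main__":
    slots = 2 * 8  # 2 workers x concurrency 8, as quoted in the thread
    w = time_per_task_ms(slots, 1000)  # ~1,000 msg/s observed
    print(f"each task holds a slot for about {w:.0f} ms")
    print(f"about {slots_needed(30000, w):.0f} slots needed for 30k msg/s")
```

Under those assumptions each task spends about 16 ms in a slot, so 30k msg/s would need roughly 480 concurrent slots unless the per-task time drops. That suggests looking at the HTTP call and the MongoDB write before looking at the broker.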

Welcome Sujoy. Next time, please start a new thread, not reply to an existing one. It's confusing.

You’ve not provided a lot of information here but generally:

- do actual benchmarks and share real data
- do as much batching as possible (prefetch + ack)
- use more queues and more workers; this increases overall concurrency
- make sure your consumer batch size is enough to keep each consumer fully occupied - see https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
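The queuing-theory post linked above sizes the client-side buffer so that a consumer never sits idle: enough messages must already be on the client to cover the network round trip. A minimal sketch of that sizing rule, assuming a fixed round-trip time and a steady per-message processing time, both measured rather than guessed; `min_prefetch` is a hypothetical helper name.

```python
import math


def min_prefetch(round_trip_ms: float, processing_ms: float) -> int:
    """Rough prefetch needed so a consumer never waits for new deliveries.

    Assumes a fixed client-broker round trip and a steady per-message
    processing time: while acks travel to the broker and new deliveries
    travel back, the consumer must already hold enough messages to keep
    working for the whole round trip.
    """
    if round_trip_ms < 0 or processing_ms <= 0:
        raise ValueError("need round_trip_ms >= 0 and processing_ms > 0")
    return max(1, math.ceil(round_trip_ms / processing_ms))


if __name__ == "__main__":
    # Hypothetical measurements: 50 ms round trip, 2 ms of work per message.
    print(min_prefetch(50, 2))
```

The result would then go into the consumer's QoS setting, for example `channel.basic_qos(prefetch_count=...)` in the pika client; benchmark around that value rather than treating it as exact.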

more detail:

- ensure your consumers can drain faster than your producers produce: the fastest queues are already empty
- sharding or consistent hash exchanges may help here to distribute work across more queues
- use RAM-backed queues for performance; avoid disk nodes and disk-backed queues
- look at how you handle acks and batching to see if you get better throughput with alternative approaches
- benchmark all of the stack: perhaps Celery isn’t the best choice here, or your MongoDB insertion can’t keep up, or you are CPU-bound, I/O-bound or network-bound, etc. In short, do your homework and benchmark
- use the latest OTP release and benchmark whether HiPE helps
- various HA policies and load balancing provide further throughput options at greater operational complexity
- my desktop is bigger than your server: what limits are you hitting already? I/O? Network? CPU?
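Sharding and consistent-hash exchanges both spread one logical stream over several queues, so more than one CPU core can work on it. The toy hash ring below only illustrates that idea in plain Python; it is not the routing algorithm of RabbitMQ's consistent-hash exchange plugin, and the queue names `work.0` to `work.7` are hypothetical.

```python
import bisect
import hashlib
from collections import Counter
from typing import Iterable


class HashRing:
    """Toy consistent-hash ring that maps routing keys to queue names.

    Illustration only; not the algorithm used by RabbitMQ's consistent-hash
    exchange plugin.
    """

    def __init__(self, queues: Iterable[str], points_per_queue: int = 100):
        # Each queue owns many points on the ring so the load evens out.
        ring = sorted(
            (self._hash(f"{q}#{i}"), q)
            for q in queues
            for i in range(points_per_queue)
        )
        if not ring:
            raise ValueError("need at least one queue")
        self._hashes = [h for h, _ in ring]
        self._queues = [q for _, q in ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of MD5 as an unsigned integer: stable across runs.
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def queue_for(self, routing_key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._hashes, self._hash(routing_key)) % len(self._hashes)
        return self._queues[idx]


if __name__ == "__main__":
    ring = HashRing([f"work.{n}" for n in range(8)])  # hypothetical queue names
    counts = Counter(ring.queue_for(f"order-{i}") for i in range(10_000))
    for queue, n in sorted(counts.items()):
        print(queue, n)
```

Adding a ninth queue only moves the keys that now land on the new queue's points; every other key keeps its queue, which is what makes consistent hashing attractive when the number of queues changes.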

For example, my desktop, with a single consumer and producer on a RAM-backed queue, easily hits over 20k req/s:
https://screenshotscdn.firefoxusercontent.com/images/6807d579-5ba9-4c26-9a42-a0bb357d0328.png

- bound to the default exchange
- a single Elixir producer sending a small JSON doc
- a single Rust consumer which simply writes out the JSON to a /tmp ramdisk
- localhost without TLS (plain AMQP stack)
- using under 300 MiB RAM resident
- roughly 3 Mb/s network I/O in and out
- but maxes out all 8 cores (i.e. CPU-bound)

In my specific case, if I want better throughput I don't need more consumers yet, just more CPU.
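To find out which part of your own pipeline is the bound, each stage can be timed on its own before the whole flow is tested. A minimal single-threaded harness, assuming you can wrap one unit of work (one HTTP call, one MongoDB insert, one publish) in a function; `simulated_io` is a hypothetical stand-in that just sleeps for about 1 ms.

```python
import time
from typing import Callable


def measure_throughput(op: Callable[[int], None], n: int) -> float:
    """Run op(0) .. op(n-1) sequentially and return operations per second."""
    if n <= 0:
        raise ValueError("n must be positive")
    start = time.perf_counter()
    for i in range(n):
        op(i)
    elapsed = time.perf_counter() - start
    return n / elapsed if elapsed > 0 else float("inf")


def simulated_io(_: int) -> None:
    """Hypothetical stand-in for one HTTP call or one database write (~1 ms)."""
    time.sleep(0.001)


if __name__ == "__main__":
    rate = measure_throughput(simulated_io, 200)
    print(f"{rate:.0f} ops/s on one thread")
```

If one stage alone tops out near the end-to-end rate you already observe, that stage is the likely bottleneck, and adding broker capacity won't help.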

Further reading, aside from the excellent sections on the official RabbitMQ site, is here: https://duckduckgo.com/?q=rabbitmq+performance including getting to 1 million messages/second: https://content.pivotal.io/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine

The first place to look is most likely how you can get more workers on those queues, and what bottlenecks, if any, the server sees. Once the queues basically stay at zero depth, you can start looking into whether your subscribers can handle batched operations more efficiently.
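Once queues stay near zero depth, one way for a subscriber to batch is to buffer several deliveries, write them with a single bulk call, and then acknowledge them all with one `basic.ack` carrying `multiple=true`. A sketch under these assumptions: `write_many` and `ack` are hypothetical callbacks (in pika the latter could wrap `channel.basic_ack(delivery_tag=tag, multiple=True)`), every delivery on the channel passes through one batcher, and the consumer's prefetch is at least the batch size, since otherwise the broker stops delivering before a batch can fill.

```python
from typing import Callable, List, Tuple


class AckBatcher:
    """Buffer deliveries, write them with one bulk call, then ack them together.

    `write_many` and `ack` are hypothetical callbacks supplied by the caller.
    Assumes every delivery on the channel passes through this batcher, because
    an ack with multiple=True also covers all earlier unacked tags.
    """

    def __init__(
        self,
        size: int,
        write_many: Callable[[List[bytes]], None],
        ack: Callable[[int, bool], None],
    ) -> None:
        if size < 1:
            raise ValueError("size must be at least 1")
        self.size = size
        self.write_many = write_many
        self.ack = ack
        self._pending: List[Tuple[int, bytes]] = []

    def add(self, delivery_tag: int, body: bytes) -> None:
        self._pending.append((delivery_tag, body))
        if len(self._pending) >= self.size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        # Write first, ack second: if the write raises, nothing is acked and
        # the broker redelivers the unacked messages once the channel closes.
        self.write_many([body for _, body in self._pending])
        last_tag = self._pending[-1][0]
        self.ack(last_tag, True)  # multiple=True: acks every tag up to last_tag
        self._pending.clear()


if __name__ == "__main__":
    written: List[List[bytes]] = []
    acked: List[Tuple[int, bool]] = []
    batcher = AckBatcher(3, written.append, lambda tag, multi: acked.append((tag, multi)))
    for tag in range(1, 8):  # delivery tags start at 1 on a fresh channel
        batcher.add(tag, b"{}")
    batcher.flush()  # push out the trailing partial batch
    print(written, acked)
```

A real consumer would also flush on a timer, so that a partial batch is not held indefinitely when traffic is light.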

I highly recommend both the RabbitMQ books published by Manning for background reading.

A+
Dave
—
Dave Cottlehuber
+43 67 67 22 44 78
Managing Director
Skunkwerks, GmbH
http://skunkwerks.at/
ATU70126204
Firmenbuch 410811i


Ashish V Kulshreshtha

Jan 29, 2019, 9:46:42 AM

to rabbitmq-users

Hi Dave

Continuing Sujoy's conversation:

So far we have tested with gunicorn and nginx, and with a simple ("Hello world") task using single as well as multiple queues and workers, but we could not get above 2,000 messages/sec.

- In our RabbitMQ queue, we are keeping messages in RAM

- We have also enabled HiPE

- As per our understanding, batching has been deprecated in the latest version.

- We haven't implemented sharding yet; currently we are focusing on increasing the existing throughput.

- We are using the latest OTP release

What version of RabbitMQ are you currently using? We are using 3.7.8.

Please find the attached screenshot

Screenshot from 2019-01-29 12-51-49 (copy).png
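On the batching point in the list above: if the deprecation refers to Celery's task-level batching, broker-level prefetch and acknowledgement behaviour is still tunable through ordinary Celery settings. A config fragment, assuming Celery 4.x lowercase setting names; the values are illustrative starting points for benchmarking, not recommendations.

```python
# celeryconfig.py: illustrative values only, assuming Celery 4.x setting names.
# Benchmark each change rather than copying these numbers.

# Number of concurrent worker processes/threads per worker node
# (matches the "8 concurrencies" mentioned in this thread).
worker_concurrency = 8

# Messages each worker reserves ahead of time; the effective AMQP prefetch
# is this value multiplied by worker_concurrency (Celery's default is 4).
worker_prefetch_multiplier = 16

# Acknowledge after the task has run instead of just before it starts
# (Celery's default). This changes delivery guarantees rather than speed,
# so benchmark both settings.
task_acks_late = True

# Do not store task return values if nothing reads them; each stored
# result is extra work for the result backend.
task_ignore_result = True
```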

Amer Hwitat

Jan 29, 2019, 11:52:22 AM

to rabbitm...@googlegroups.com

Dears,

I have been wondering: is an 8-core, 32 GB server enough to run a dedicated RabbitMQ, or are you also running other servers on it besides the web server? And what class of server is this considered to be? I'm interested technically.

Best regards

Amer Hwitat


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Klishin

Jan 29, 2019, 12:02:20 PM

to rabbitm...@googlegroups.com

According to your screenshot you are using fewer than 8 queues; most likely only 1 queue is under heavy load.

Are you aware that a single queue is limited to a single CPU core? [1] You could have 32 cores and, with this test, the result would not be meaningfully different.

rabbitmq-sharding [2] provides a way to address this. PerfTest [3] lets you easily vary the number of consumers (each with its own queue if you let it declare the topology), publishers and acknowledgement modes, use a predefined topology, and so on.

Consider using it, at least to establish a baseline.
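One way to check the one-queue-per-core limit is to run PerfTest twice, once against a single queue and once spread over eight, and compare the rates. A sketch that builds such a command line, assuming the binary distribution's `bin/runjava` launcher and PerfTest's queue-pattern options (`--queue-pattern`, `--queue-pattern-from`, `--queue-pattern-to`); confirm the flags against the documentation for your PerfTest version. The `perf-test-%d` queue names are hypothetical.

```python
import shlex
from typing import List


def perftest_args(queues: int, producers: int, consumers: int,
                  seconds: int = 60, prefetch: int = 0) -> List[str]:
    """Build a PerfTest command line that spreads load over several queues.

    Assumes PerfTest's queue-pattern options; flag names should be checked
    against the documentation for the PerfTest version in use.
    """
    args = [
        "bin/runjava", "com.rabbitmq.perf.PerfTest",
        "--queue-pattern", "perf-test-%d",  # hypothetical queue names
        "--queue-pattern-from", "1",
        "--queue-pattern-to", str(queues),
        "--producers", str(producers),
        "--consumers", str(consumers),
        "--time", str(seconds),
    ]
    if prefetch > 0:
        args += ["--qos", str(prefetch)]  # consumer prefetch (basic.qos)
    return args


if __name__ == "__main__":
    for n in (1, 8):  # single-queue baseline, then one queue per core
        print(" ".join(shlex.quote(a) for a in perftest_args(n, n, n)))
```

Keeping producers and consumers equal to the queue count should give each queue one of each, so the two runs differ mainly in how many queues (and therefore cores) carry the load.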

1. https://www.rabbitmq.com/queues.html#runtime-characteristics

2. https://github.com/rabbitmq/rabbitmq-sharding

3. https://rabbitmq.github.io/rabbitmq-perf-test/stable/htmlsingle/


--

MK

Staff Software Engineer, Pivotal/RabbitMQ

Luke Bakken

Jan 29, 2019, 1:07:27 PM

to rabbitmq-users

Hi Ashish -

I assume that you are following up on this message - https://groups.google.com/d/msg/rabbitmq-users/tKRt1GMV-B4/Cfkq78WEFwAJ

Please see Michael's response here - https://groups.google.com/d/msg/rabbitmq-users/6QWJbYsY43A/f1LGAy_JFwAJ

I would like to emphasize his recommendation to use PerfTest to test the throughput of a single queue in your environment. I can easily exceed 30K msgs/sec on my workstation running RabbitMQ and PerfTest on the same machine.

Based on Sujoy's description of your environment the most likely limiting factor is not RabbitMQ. Start by benchmarking just RabbitMQ with one queue using PerfTest.

Thanks,

Luke

Sujoy

Feb 1, 2019, 12:46:01 PM

to rabbitm...@googlegroups.com

Thank you everyone for the inputs and suggestions. We finally identified our usage of Celery to be a blocking component in achieving the throughput needed while keeping the flow intact.

--

Michael Klishin

Feb 4, 2019, 4:59:47 PM

to rabbitm...@googlegroups.com

Thank you for reporting back to the list.
