On Mon, 28 Jan 2019, at 17:31, Sujoy Ghosh wrote:
> Hi!!
>
> We see to be stuck at a throughput of 1000msg/second on a 8core,32gb server.
What did you expect to get?
What other data have you collected?
What other limits are being hit?
> We are serving an API endpoint in which we have 2 processes to be completed
> 1. Hitting another HTTP API to
> 2. Save the response status to mongodb .
Are you sure this isn't a bottleneck?
> We are using Celery with RabbitMq and we are able to process 1k
> requests per second with 2 workers and 8 concurrencies. The
> configuration of server is 8 CPU 32 GB Ram.
> Any help is appreciated.
>
> How do we get to 30+k messages per second on this node.
Welcome Sujoy. Next time, please start a new thread rather than replying to an existing one; it's confusing otherwise.
You’ve not provided a lot of information here but generally:
- do actual benchmarks and share real data
- do as much batching as possible (prefetch + ack)
- use more queues and more workers; this increases overall concurrency
- make sure your consumer batch size is large enough to keep each consumer fully occupied - see
https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/
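As a rough illustration of the prefetch + ack batching mentioned above, here is a minimal Python sketch. It assumes a pika-style channel object whose basic_ack accepts delivery_tag and multiple; the BatchAcker class and the batch size of 50 are made up for this example and are not something Celery provides.

```python
class BatchAcker:
    """Ack once per `batch_size` deliveries instead of once per message.

    `channel` is assumed to be a pika-style channel exposing
    basic_ack(delivery_tag=..., multiple=...). Hypothetical helper.
    """

    def __init__(self, channel, batch_size=50):
        self.channel = channel
        self.batch_size = batch_size
        self.pending = 0      # messages processed but not yet acked
        self.last_tag = None  # highest delivery tag seen so far

    def delivered(self, delivery_tag):
        """Record one processed message; ack the whole batch when full."""
        self.pending += 1
        self.last_tag = delivery_tag
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Ack everything up to and including the last seen delivery tag."""
        if self.pending:
            # multiple=True acks all unacked deliveries up to this tag
            self.channel.basic_ack(delivery_tag=self.last_tag, multiple=True)
            self.pending = 0
```

With pika you would also set channel.basic_qos(prefetch_count=...) to at least the batch size, otherwise the broker stops delivering before a batch fills. Call flush() on a timer or when the consumer goes idle so a partial batch is not left unacked.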
In more detail:
- ensure your consumers can drain faster than your producers produce: the fastest queues are already empty
- using sharding or consistent hash exchanges may help here to distribute work across more queues
- use RAM-backed queues for performance; avoid disk nodes and disk-backed queues
- look at how you handle acks and batching to see if you get better throughput with alternative approaches
- benchmark all of the stack: perhaps Celery isn’t the best choice here, or your MongoDB insertion can’t keep up, or you are CPU bound, I/O bound, or network bound, etc. In other words, do your homework and benchmark
- use the latest OTP release and benchmark whether HiPE helps
- various HA policies and load balancing provide further throughput options at greater operational complexity
- my desktop is bigger than your server: what limits are you hitting already? io? net? cpu?
For example, my desktop, with a single consumer and producer on a RAM-backed queue, easily hits over 20k req/s:
https://screenshotscdn.firefoxusercontent.com/images/6807d579-5ba9-4c26-9a42-a0bb357d0328.png
- bound to the default exchange
- a single elixir producer sending a small JSON doc
- a single rust consumer which simply writes out the JSON to /tmp ramdisk
- localhost without TLS (plain amqp stack)
- using under 300MiB RAM resident
- roughly 3Mb/s network io in & out
- but maxes out all 8 cores (i.e. cpu bound)
In my specific case, if I want better throughput I don't need more consumers yet, just more CPU.
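To find which stage is the limit in your own stack, timing each step in isolation is usually enough to start with. A minimal Python sketch follows; measure_rate and slowest_stage are hypothetical helpers, and the callables you pass in (the HTTP call, the MongoDB insert, a publish) are yours to supply.

```python
import time

def measure_rate(step, iterations=1000):
    """Run `step` repeatedly and return calls per second (wall clock)."""
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    elapsed = time.perf_counter() - start
    return iterations / elapsed if elapsed > 0 else float("inf")

def slowest_stage(stages, iterations=1000):
    """Given {name: callable}, return (name, rate) of the slowest stage."""
    rates = {name: measure_rate(fn, iterations) for name, fn in stages.items()}
    name = min(rates, key=rates.get)
    return name, rates[name]
```

This only gives the single-threaded cost per call, but that is the number you need to decide whether a stage wants more concurrency, more batching, or more CPU.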
Further reading, aside from the excellent sections on the official RabbitMQ site, is here:
https://duckduckgo.com/?q=rabbitmq+performance including getting to 1 million messages/second:
https://content.pivotal.io/blog/rabbitmq-hits-one-million-messages-per-second-on-google-compute-engine
The first place to look is most likely how you can get more workers on those queues, and what bottlenecks, if any, the server sees. Once you have the queues basically staying at zero depth, you can start looking into whether your subscribers can handle batched operations more efficiently.
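A back-of-envelope check on how many workers that means, using Little's law (slots in use = throughput x time per message) and the numbers from your mail: 2 workers x 8 concurrency at 1k msg/s. This assumes all 16 slots are kept busy and that per-message time stays constant as you scale, which will not hold exactly. The function names are just for this sketch.

```python
def per_message_ms(slots, msgs_per_sec):
    """Average time one slot spends on one message, in milliseconds."""
    return 1000 * slots / msgs_per_sec

def slots_needed(target_msgs_per_sec, slots, msgs_per_sec):
    """Slots required for the target rate at the same per-message time."""
    return -(-target_msgs_per_sec * slots // msgs_per_sec)  # ceiling division

current_slots = 2 * 8  # 2 workers x 8 concurrency, from the original mail
print(per_message_ms(current_slots, 1000))        # 16.0
print(slots_needed(30_000, current_slots, 1000))  # 480
```

At roughly 16 ms per message the time is probably going into the HTTP call and the MongoDB write rather than RabbitMQ, so 30k/s would take something like 480 concurrent slots, or a much shorter per-message time through batching.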
I highly recommend both the RabbitMQ books published by Manning for background reading.
A+
Dave
—
Dave Cottlehuber
+43 67 67 22 44 78
Managing Director
Skunkwerks, GmbH
http://skunkwerks.at/
ATU70126204
Firmenbuch 410811i