Question on throughput with RabbitMQ 3.1.1 on a single node as well as a cluster

Priyanki Vashi

Jun 25, 2013, 3:44:00 AM

to rabbitmq...@googlegroups.com

Hi there,

I am doing a performance study of RabbitMQ 3.1.1, and this is my first time doing such a study with any messaging broker :)

1) I have gone through 'RabbitMQ in Action' thoroughly and learnt the important concepts.

2) I first tried a single-node broker to get a feel for how it works, and then set up a four-node cluster (two disk nodes and two RAM nodes). I also configured an HAProxy TCP load balancer so that clients only need a single port to connect to the cluster.

3) I am simulating producers and consumers with Python scripts (using the Python pika library to connect to the server, publish, subscribe, etc.).

4) My scripts work, but I am stuck: no matter what I do, my throughput is always about 300 msg/sec.

5) I have defined durable exchanges and queues.

My final requirement is to run at least 10 to 15 producers and 60 to 70 consumers simultaneously. I want to start with a linear increase in the number of producers and consumers so that I can draw conclusions about throughput, fault handling, processor utilization, etc. But I am already stuck at the initial steps, so this group's help would be really appreciated.

I have tried the following scenarios, but no matter what I do, my throughput stays at more or less the same value (300 msg/sec), except in Scenario-1.

Scenario-1

- 1 producer, no consumer, and no queue bound to the exchange

- The producer runs in an infinite loop, publishing to one fanout exchange

- Publisher confirms disabled

- Publish rate: 6200 msg/sec (checked through the web management plugin)

I also tried Scenario-1 with a fanout exchange and got the same publish rate.

I know Scenario-1 is not really useful, since there are no queues and the messages are ultimately dropped, but I tried it as part of debugging and saw the results above.

Scenario-2

- 1 producer and 1 consumer

- The producer runs in an infinite loop, publishing to one direct exchange

- The consumer has its own dedicated queue bound to that exchange

- Publisher confirms and consumer acks disabled

Throughput: 300 msg/sec (i.e., publish rate = 300 msg/sec and deliver rate = 300 msg/sec)

I also tried Scenario-2 with a fanout exchange, and with publisher confirms and consumer acks enabled.

The throughput was still 300 msg/sec.

Scenario-3

- 1 producer and 4 consumers

- The producer runs in an infinite loop, publishing to four direct exchanges

- Each consumer has its own dedicated queue bound to its respective exchange

- Publisher confirms and consumer acks disabled

Throughput: 300 msg/sec (i.e., publish rate = 300 msg/sec and deliver rate = 300 msg/sec)

I also tried Scenario-3 with a fanout exchange, and with publisher confirms and consumer acks enabled.

The throughput was still 300 msg/sec.

I also tried setting prefetch_count to 100, but the throughput remains the same at 300 msg/sec.
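A rough way to see why prefetch_count alone cannot lift the rate: under a simplified window model (an assumption for illustration, not how the broker is implemented), a consumer is bounded by the rate at which messages arrive, by its own per-message cost, and by prefetch_count divided by the ack round trip. In AMQP basic.qos a prefetch_count of 0 means no limit. The function, its parameters and the numbers in the example are hypothetical.

```python
def consumer_rate_bound(arrival_rate: float, prefetch_count: int,
                        rtt_s: float, per_msg_cost_s: float) -> float:
    """Upper bound on deliveries/sec for one consumer (simplified model).

    arrival_rate    -- messages/sec the producer(s) put into the queue
    prefetch_count  -- basic.qos window; 0 means "no limit"
    rtt_s           -- assumed time for an ack to reach the broker and
                       for the next delivery to come back
    per_msg_cost_s  -- assumed consumer CPU time per message
    """
    if per_msg_cost_s <= 0:
        raise ValueError("per_msg_cost_s must be positive")
    bounds = [arrival_rate, 1.0 / per_msg_cost_s]
    if prefetch_count > 0:
        # At most `prefetch_count` unacked messages per round trip.
        bounds.append(prefetch_count / (rtt_s + per_msg_cost_s))
    return min(bounds)


# A producer that only manages 300 msg/sec caps the consumer at 300 msg/sec,
# whatever the prefetch setting (illustrative numbers, not measurements).
for prefetch in (1, 100, 0):
    print(prefetch, consumer_rate_bound(300.0, prefetch, 0.002, 0.0001))
```

In this model, raising prefetch_count only helps once the arrival rate is no longer the smallest bound, which points back at the producer.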

I am honestly going crazy with this.

After seeing this behavior, I strongly suspect some serious limitation in my simulated producers and consumers.

Has anyone else tried the Python pika client, and does anyone have an idea of the throughput to expect with this version of RabbitMQ?

Does anyone have a rough idea of the throughput of RabbitMQ 3.1.1?

I can also share my Python scripts if required, but I would really appreciate some light on this situation.

Also, what points should I take care of in order to improve throughput?

Best Regards,

Priyanki

Tim Watson

Jun 25, 2013, 4:38:47 AM

to Discussions about RabbitMQ, rabbitmq...@googlegroups.com

What does your publishing code look like? The figures you report are expected in the sense that the consumer keeps pace with the producer; it can hardly be expected to consume faster than messages arrive in the queue(s). So the slowness is very likely on the producing side.

Are you using persistent messages and either publisher confirms or transactions? If so, how often are you waiting on confirms/commits?

With the official clients we typically see average rates of 50-60 kHz (50,000-60,000 msg/sec) with non-persistent messages. Persistence slows things down a tad, as do confirms (and transactions even more so), but even with persistent messages and confirms, rates of 5 kHz or more are expected. It /sounds/ like you might be publishing persistent messages with confirms enabled and waiting for a confirm (ack) from the broker after each message. That involves disk I/O on the server for every message plus network latency, which effectively makes publishing synchronous (and very slow by comparison).
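Tim's point can be put in numbers. If each publish blocks until its confirm arrives, the publish rate is simply the inverse of the per-message round trip, so a steady 300 msg/sec corresponds to roughly 3.3 ms per message. Little's law (messages in flight = throughput × latency) then gives how many unconfirmed messages must be outstanding to reach a higher rate at the same latency. This is plain arithmetic; the function names are hypothetical, and the 5000 msg/sec target is just the figure Tim quotes for persistent messages with confirms.

```python
import math


def implied_latency_s(rate_msgs_per_s: float) -> float:
    """If every publish waits for its confirm, rate = 1 / per-message latency."""
    return 1.0 / rate_msgs_per_s


def confirms_in_flight_needed(target_rate: float, latency_s: float) -> int:
    """Little's law: messages in flight = throughput x latency (rounded up)."""
    return math.ceil(target_rate * latency_s)


latency = implied_latency_s(300.0)  # about 0.0033 s per message
window = confirms_in_flight_needed(5000.0, latency)
print(f"{latency * 1000:.2f} ms per message; ~{window} unconfirmed messages needed")
```

In practice this means publishing continuously and handling confirms asynchronously as they arrive, rather than blocking after each publish.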

Cheers,

Tim

_______________________________________________
rabbitmq-discuss mailing list
rabbitmq...@lists.rabbitmq.com
https://lists.rabbitmq.com/cgi-bin/mailman/listinfo/rabbitmq-discuss

Priyanki Vashi

Jun 25, 2013, 6:47:38 AM

to rabbitmq...@googlegroups.com

Hi Tim,

Here are my scripts. I have created a class each for the producer and the consumer, with their respective methods.

I named the queues and exchanges this way so that later I can automate multiple producers and consumers and their respective queues and exchanges.

As you mentioned, maybe I am doing something wrong in the way I use publisher confirms and consumer acks. But my requirement is also to make sure that no messages are lost at all.

Many thanks again, and I hope to hear back from you soon.

Best Regards,

Priyanki.

Producer_Consumer_Scripts.zip

PRIYANKI VASHI

Jun 25, 2013, 6:57:25 AM

to rabbitmq...@googlegroups.com

Hi again,

I forgot to mention that 5670 is the port of my TCP load balancer. Behind the load balancer I have a four-node RabbitMQ cluster with 2 disk and 2 RAM nodes.

I also have a few questions about automating producers and consumers, but first it would be good to know whether I have understood the basics properly.

I have working scripts for automated producers and consumers, but they don't behave the same as starting a producer and a consumer manually.

Sent from my iPhone

Tim Watson

Jun 25, 2013, 7:09:28 AM

to Tim Watson, Discussions about RabbitMQ, rabbitmq...@googlegroups.com

Reposting

Priyanki Vashi

Aug 1, 2013, 3:06:03 PM

to rabbitmq...@googlegroups.com, Discussions about RabbitMQ

Hi there,

I have done many small tests to understand how the throughput of a RabbitMQ node scales with the number of cores and with the number of producers and consumers. Here are my observations. The tests cover both a single-node and a cluster configuration.

Based on these observations, I would first like to confirm whether these are the expected behaviour/results or whether I can do something more to improve throughput. Secondly, I have some specific questions about them, and clarification would help me continue further.

Just so you know, I have also checked the performance statistics published on the RabbitMQ site for version 2.8.1, and based on the points there I tried different things, like changing prefetch_count values, non-persistent as well as persistent messages, DISK nodes and RAM nodes, etc.

My test configuration is as follows.

RabbitMQ version: 3.1.3, with Erlang R16B01

I have one virtual machine with 8 GB of RAM and 20 cores, dedicated mainly to the rabbit nodes.

I have another virtual machine with 8 GB of RAM and 20 cores, dedicated mainly to my producers and consumers. My producers and consumers are single-threaded (using the Python pika library), so I try to go up to 10 producers and 10 consumers, giving each of them 1 dedicated core, to find the system limit of RabbitMQ. I start linearly: first 1P and 1C, and so on.
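The same per-core pinning that taskset does can also be done from inside each Python process on Linux with os.sched_setaffinity. A minimal sketch, assuming one producer and one consumer per pair on consecutive cores; the helper names and the core layout are hypothetical, and pinning is skipped on platforms without sched_setaffinity.

```python
import os
from typing import List, Tuple


def plan_core_assignment(n_pairs: int, first_core: int = 0) -> List[Tuple[str, int]]:
    """Give producer i and consumer i their own cores: P0, C0, P1, C1, ..."""
    plan: List[Tuple[str, int]] = []
    for i in range(n_pairs):
        plan.append((f"producer-{i}", first_core + 2 * i))
        plan.append((f"consumer-{i}", first_core + 2 * i + 1))
    return plan


def pin_current_process(core: int) -> bool:
    """Restrict the calling process to one core, like `taskset -c <core>`.

    Returns False where os.sched_setaffinity is unavailable (non-Linux).
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    os.sched_setaffinity(0, {core})  # pid 0 means "this process"
    return True


# 10 pairs on a 20-core VM uses cores 0..19, one process per core.
for name, core in plan_core_assignment(10):
    print(name, core)
```

Each worker would call pin_current_process(core) at startup, before opening its connection.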

Since my interest is to benchmark performance and find the system limits of RabbitMQ, my producers and consumers are simulated, and the consumers do no processing after receiving the messages. This means, I believe, that I am producing as fast as the above VM configuration allows and consuming as fast as I can.

I am using the non-blocking connection method of pika (select.connection).

The message size is 100 bytes, and I have configured a 'direct' exchange.

I have also enabled publisher confirms as well as consumer acks, since I am interested in reliable delivery and confirmation of messages up to the application layer.
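With confirms enabled for reliability, the publisher has to keep every message until the broker confirms it. In confirm mode, delivery tags on a channel start at 1 and increase by one per publish, and the broker's basic.ack or basic.nack may carry multiple=True, which settles every tag up to and including the given one. The class below is a broker-independent sketch of that bookkeeping; its name and methods are hypothetical, and on_ack/on_nack would need to be wired to whatever confirm callback the pika version in use provides.

```python
from typing import Dict, List


class ConfirmTracker:
    """Bookkeeping for one channel in publisher-confirm mode.

    Delivery tags start at 1 and increase by one per publish on the channel.
    An ack/nack with multiple=True settles every pending tag <= delivery_tag.
    """

    def __init__(self) -> None:
        self._next_tag = 1
        self._pending: Dict[int, bytes] = {}

    def on_publish(self, body: bytes) -> int:
        """Record a message just published; return its expected delivery tag."""
        tag = self._next_tag
        self._pending[tag] = body
        self._next_tag += 1
        return tag

    def _settle(self, delivery_tag: int, multiple: bool) -> List[bytes]:
        if multiple:
            tags = sorted(t for t in self._pending if t <= delivery_tag)
        else:
            tags = [delivery_tag] if delivery_tag in self._pending else []
        return [self._pending.pop(t) for t in tags]

    def on_ack(self, delivery_tag: int, multiple: bool = False) -> int:
        """Broker has taken responsibility for these messages; return the count."""
        return len(self._settle(delivery_tag, multiple))

    def on_nack(self, delivery_tag: int, multiple: bool = False) -> List[bytes]:
        """Broker could not handle these; return the bodies so they can be republished."""
        return self._settle(delivery_tag, multiple)

    @property
    def outstanding(self) -> int:
        """Messages published but not yet acked or nacked."""
        return len(self._pending)


tracker = ConfirmTracker()
for i in range(5):
    tracker.on_publish(b"msg-%d" % i)
tracker.on_ack(3, multiple=True)  # one broker ack settles tags 1, 2 and 3
print(tracker.outstanding, tracker.on_nack(5))
```

Anything still outstanding when a connection drops has not been confirmed and would have to be republished, which can produce duplicates that consumers should tolerate.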

Here are my statistics.

Test-1) With Single Node configuration:

The maximum throughput I can get is around 5000 msg/sec, with 1 publisher and 1 consumer and prefetch_count = 0.

Node type = Disk

Both the producer and the consumer are given dedicated cores using the Linux taskset command (if I leave core assignment to Linux, the throughput is only around 3500 msg/sec). Core assignment for the rabbit nodes is left to Linux and not touched.

Here the limiting factor is the publisher, since it loads its assigned core to almost 100%. So, to get better throughput, I started another producer on another core and a corresponding consumer on its own dedicated core. I tried publishing both to the same queue and to different queues.

But the overall throughput stays around the same value, increasing only a little, and it is divided roughly as follows between the producers and consumers.

P1 and P2: a publish rate of roughly 2500-2700 msg/sec each, and the same for consumption, which adds up to a total of around 5000-5500 msg/sec.

Even if I set a prefetch_count value, it hardly changes the throughput.

I also tried both persistent and non-persistent messages, and the throughput does not change much; it's almost the same as listed above with a node of type DISK.

So it seems that the maximum capacity of a single DISK node is limited to about 5000 msg/sec when publisher confirms and consumer acks are enabled in this version. I thought the main reason for this could be server latency. Is this understanding correct, or am I missing something?

My specific questions about these observations are as follows.

1) Is this the expected throughput scaling when the number of producers and consumers increases linearly?

2) Can something be done to improve throughput with a single-node configuration while keeping publisher confirms and consumer acks enabled?

3) How can server latency be calculated approximately? I thought that adding the round-trip times (RTT) of the publisher confirm and the consumer ack gives the latency. Is this understanding correct? What is an effective method to measure RTT?
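For question 3, one simple way to measure end-to-end latency (publish to delivery), as opposed to the confirm round trip, is to stamp each message with its send time and compare at the consumer. This only works as written because the producers and consumers run on the same VM and therefore share a clock; across machines the clocks would need to be synchronized. A sketch, assuming a hypothetical layout of an 8-byte timestamp padded to the 100-byte message size used in these tests:

```python
import statistics
import struct
import time
from typing import Dict, List, Optional

MSG_SIZE = 100                # message size used in these tests
_STAMP = struct.Struct("!d")  # hypothetical layout: 8-byte send time first


def make_message(now: Optional[float] = None) -> bytes:
    """Build a MSG_SIZE-byte body whose first 8 bytes hold the send time."""
    sent = time.time() if now is None else now
    stamp = _STAMP.pack(sent)
    return stamp + b"\x00" * (MSG_SIZE - len(stamp))


def latency_s(body: bytes, now: Optional[float] = None) -> float:
    """Seconds between the embedded send time and now (same-host clocks)."""
    (sent,) = _STAMP.unpack_from(body, 0)
    received = time.time() if now is None else now
    return received - sent


def summarize(latencies: List[float]) -> Dict[str, float]:
    """Mean, median and approximate 99th percentile, in milliseconds."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]
    return {
        "mean_ms": statistics.mean(ordered) * 1000,
        "p50_ms": statistics.median(ordered) * 1000,
        "p99_ms": p99 * 1000,
    }


body = make_message()
print(len(body), summarize([latency_s(body)]))
```

The confirm round trip can be measured the same way on the publisher side by recording time.time() per delivery tag at publish and subtracting it when the broker's ack arrives.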

Test-2) With Cluster configuration:

First, I tried a cluster with 1 DISK node and 1 RAM node.

When my producer and consumer connect to the DISK node, the statistics are almost the same as in Test-1.

I tried 1 producer and 1 consumer, 2 producers and 2 consumers, and so on, with no observable difference in throughput.

When my producer and consumer connect to the RAM node, I see the following.

1P and 1C: throughput is around 4500-5000 msg/sec

2P and 2C: throughput is around 9000-10000 msg/sec

Beyond this, if I increase the number of producers and consumers, the throughput starts to drop a little, with the overall throughput reaching 12000 msg/sec and each producer/consumer getting 4000 msg/sec.

So again it seems that, beyond a certain number of producers and consumers, server latency comes into the picture even for a RAM node and slowly reduces the throughput.

Instead of adding another 5000 msg/sec for every additional producer-consumer pair, the per-pair rate becomes roughly 4000, 3500, 3000 msg/sec.
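The sub-linear scaling can be expressed as an efficiency: aggregate throughput divided by the number of pairs times the single-pair rate. Using the RAM-node figures above (about 5000 msg/sec for one pair, the midpoint 9500 msg/sec for two, and 12000 msg/sec for what the 4000 msg/sec per pair suggests are three pairs, an inference rather than a stated number), a quick calculation:

```python
def scaling_efficiency(aggregate_rate: float, pairs: int, single_pair_rate: float) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return aggregate_rate / (pairs * single_pair_rate)


# Figures from the RAM-node test above; 3 pairs is inferred from 12000 / 4000.
observed = {1: 5000.0, 2: 9500.0, 3: 12000.0}
for pairs, rate in observed.items():
    print(pairs, f"{scaling_efficiency(rate, pairs, observed[1]):.0%}")
```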

After this I added a third node, a fourth node, and so on to the cluster, all of type RAM. The maximum throughput I can get is around 22000-24500 msg/sec.

Changing prefetch_count or delivery_mode (from persistent to non-persistent and vice versa) does not make any big difference.

So my specific questions about the Test-2 observations are as follows.

1) Why is there no linear increase in throughput with a DISK node, as there is with a RAM node?

2) At least for non-persistent messages, I believe DISK and RAM nodes should behave similarly, but they do not. What are the main differences in the way DISK and RAM nodes handle non-persistent messages?

3) What can be done to improve throughput in both tests?

4) Since I have a VM with 20 cores dedicated to RabbitMQ, how can I load the CPU to its limit? With the current tests I can load the CPU to at most 800% at the throughput mentioned above. Currently the limiting factor seems to be server latency, so how can I overcome that?
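On question 4: a classic RabbitMQ queue is served by a single Erlang process, so traffic through one queue tends to be bounded by roughly one core, however many cores the VM has. A common way to use more cores is to spread the load over several queues, each with its own producers and consumers. The sketch below only computes such a layout; the naming scheme and the helper are hypothetical, and declaring and binding the queues is left to the existing scripts.

```python
from typing import Dict, List


def shard_layout(n_queues: int, n_producers: int,
                 prefix: str = "bench") -> Dict[str, List[str]]:
    """Assign producers round-robin to n_queues queues.

    Each queue name doubles as the routing key for a 'direct' exchange binding,
    and each queue would get its own consumer(s).
    """
    if n_queues < 1:
        raise ValueError("need at least one queue")
    layout: Dict[str, List[str]] = {f"{prefix}.q{i}": [] for i in range(n_queues)}
    for p in range(n_producers):
        layout[f"{prefix}.q{p % n_queues}"].append(f"producer-{p}")
    return layout


for queue, producers in shard_layout(4, 10).items():
    print(queue, producers)
```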