Queue size keeps increasing over time: the number of ready messages keeps growing


Jaikishan Rupani

May 25, 2016, 11:48:29 AM
to rabbitmq-users
Hi,
I am building a client-server application that uses RabbitMQ as the message broker between the users and the application server. As part of that, I am testing how many concurrent connections the broker can support at the desired throughput.

The test machine configurations are as follows:-
Broker machine:-
1. Disk Space - 98 GB
2. Memory - 60GB
3. CPU - Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz x 8
4. OS - Ubuntu 14.04 64-bit

Server and client machines (both identical):-
1. Disk Space - 19.55GB
2. Memory - 7.79GB
3. CPU - Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz x 2
4. OS - Ubuntu 14.04 64-bit

RabbitMQ specs:-
1. Version - 3.6.1
2. File Descriptor Limit - 6,000,000
3. High Memory Watermark - 0.8
4. Queue-mode: Lazy

Erlang specs:-
1. Version - 18.3
2. Process Limit - 4,194,304
3. Async Threads - 128

Details of the simulations :-
1. There are 30k concurrent users connected to the broker (using a script). There is also one connection from the application server.
2. Users are subscribed to a topic "/response/USER_ID/" and publish on a topic "/request/USER_ID/". The server is subscribed to all request topics using "/request/*/".
3. Users-to-broker message rate (total across all users) - 400/sec at 100 bytes/message. For this I pick 400 users at random every second. The application server receives each message and replies on the respective user's response topic (/response/USER_ID/) with a 3 KB message.
So the app server has a single queue on which the broker enqueues 400 messages per second, and for each message it receives it publishes a 3 KB reply to the corresponding user's queue.
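To put the setup above in perspective, here is a back-of-the-envelope check of the aggregate load it implies (just arithmetic on the rates and sizes stated above, nothing measured):

```python
# Back-of-the-envelope load implied by the numbers above.
req_rate = 400          # user -> broker messages per second, total
req_size = 100          # bytes per request message
rep_size = 3 * 1024     # bytes per 3 KB reply

ingress = req_rate * req_size    # broker ingress from users, bytes/sec
egress = req_rate * rep_size     # broker egress back to users, bytes/sec

print(f"ingress: {ingress / 1024:.1f} KiB/s")  # 39.1 KiB/s
print(f"egress:  {egress / 1024:.1f} KiB/s")   # 1200.0 KiB/s
```

So the raw byte rate is tiny for this hardware; the load here is in connection and message count, not bandwidth.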

I recorded the send and receive message frequency (per minute) at the app server as well as at the client machine (on which the user script is running). For some time (~1.5 hrs) the send and receive frequencies for both client and server matched, and the average latency per message was 60-70 ms. But after that the message frequencies dropped sharply. The message rate from users to the broker remained about 400/sec, but the receive rate at the app server fell to almost one third of that. The send and receive rates at the app server stayed equal, so I assume the server was replying instantly to each message it got. What was really happening was that the ready messages in the app server's queue kept increasing, at a rate of around a million per hour. CPU usage also dropped around that time, from 54% to 40%, while memory usage stayed almost constant. The client, server and broker are all in the same VPC, connected by a low-latency network, so the network does not appear to be the bottleneck.
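For what it's worth, the numbers are self-consistent: if the consume rate fell to roughly a third of the 400/s publish rate, the backlog should grow by close to a million messages per hour (a quick sanity check using the rates from this post):

```python
publish_rate = 400                 # messages/sec into the app server's queue
consume_rate = publish_rate / 3    # observed receive rate, roughly one third

backlog_per_hour = (publish_rate - consume_rate) * 3600
print(f"backlog growth: about {backlog_per_hour:,.0f} messages/hour")  # ~960,000
```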
I cannot understand why the broker isn't delivering messages to the app server as fast as they arrive, which is causing ready messages to accumulate over time. What might be going wrong here? Can anybody help me with this?

Regards
Jaikishan Rupani 


Jaikishan Rupani

May 25, 2016, 12:21:00 PM
to rabbitmq-users
Hi,
Sorry, I forgot to mention that I am using the MQTT plugin. The other plugin enabled is rabbitmq_management (message rate stats are turned on). The prefetch count is 10. I cannot increase it much because, unlike the simulation where users connect to the broker over a low-latency network, the actual production users will typically be on higher-latency, unreliable networks, so a large prefetch could leave a large number of messages unacknowledged. The app server, however, will always be connected to the broker over a low-latency network (probably in the same VPC). Is it possible to increase the prefetch for the app server connection without increasing it for the users? More precisely, can the prefetch be changed selectively for a single connection (or queue, for that matter)?
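(For anyone landing here later: as far as I understand, the MQTT plugin's prefetch is a per-node setting, not per-connection - it is configured in rabbitmq.config along these lines:

```erlang
%% rabbitmq.config - applies to every MQTT connection on this node
[
  {rabbitmq_mqtt, [
    {prefetch, 10}
  ]}
].
```

An AMQP client, by contrast, can set its own prefetch per channel with basic.qos, so connecting the app server over AMQP instead of MQTT would allow a larger prefetch for it alone while leaving the MQTT users at 10.)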

Michael Klishin

May 25, 2016, 2:57:28 PM
to rabbitmq-users
We cannot suggest much with so little information. If you see that the number of messages in the Ready state is increasing,
then nothing is consuming them. If they are also kept in RAM, by far the most likely reason is that they are not matched by a policy that is supposed to make them lazy.
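To illustrate the policy point (the pattern below is purely hypothetical, not taken from this thread): a policy's pattern is a regular expression matched against queue names, so a lazy policy only applies to queues whose names it matches. MQTT subscription queues are, as far as I know, named mqtt-subscription-<clientid>qosN, so a pattern anchored to some other prefix would silently miss them:

```python
import re

# A policy's "pattern" is a regex applied to queue names.
# "^app\." is an illustration - the real policy in this setup is unknown.
lazy_pattern = re.compile(r"^app\.")

for name in ["app.server", "mqtt-subscription-user42qos1"]:
    state = "lazy" if lazy_pattern.search(name) else "default (messages kept in RAM)"
    print(f"{name}: {state}")
```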

Jaikishan Rupani

May 26, 2016, 7:58:11 AM
to rabbitmq-users
Hi,
I upgraded my broker to 3.6.2 and ran the simulations again, this time with a prefetch count of 30 and 200 messages/second from the client side. I started the simulation at 23:50 IST (UTC+5:30).
Again everything worked fine for some time. The send and receive message counts and rates at the client matched those at the application server, and the ready-message count was zero, as seen from the graph in the RabbitMQ management web UI. But after around 5 hours the same thing happened - network traffic dropped within a few minutes (roughly 5-10), and message latency increased accordingly (here I calculate latency by averaging the round-trip time in milliseconds for a message to go from client to app server and for the reply to arrive back at the client). Network traffic (at the broker) and latency (measured at the client) have been stable since then. The clients are still sending 200 messages/s, but the receive rate at the application server is lower (and almost constant). The receive rate at the client matches the send (as well as receive) rate at the server, so messages are not queuing up in the client queues.


> If you see that the number of messages in Ready state is increasing, then nothing consumes them.
- The connection between the application server and the broker is over a low-latency network, as I mentioned earlier (they are in the same VPC). I am logging the timestamps and the received and sent message counts and frequencies at both the client and the application server. I can see from the logs that the server is always connected to the broker. Also, all the ready messages belong to the application server's queue. More importantly, the ready messages do not start increasing right away. I am attaching the graph from the rabbitmq management plugin.

> If they are also kept in RAM, by far the most likely reason is that they are not matched by a policy that is supposed to make them lazy.
I checked; they are not in RAM (only a few hundred are). All queues are durable and all messages are persistent (QoS 1).

I am also attaching the client and server logs, the memory and CPU utilization graphs, and the network in/out graphs here. It would be very helpful if you could have a look at them. I cannot explain the sudden drop in traffic after 5 hours that is visible in the graphs.
I am running two separate client scripts (each making 15,000 connections and sending 100 messages/s). The server should receive 200 messages/s in total - for 175 of them it replies with a 100-byte message, and for 25 it replies with a 40 KB message.

The columns of client<number>_latency.csv stand for the following, in order:-
Timestamp (in seconds), connected users, small-message receive frequency (per minute), small-message latency (in milliseconds), large-message receive frequency, large-message latency, send frequency (per minute), receive rate (per minute; the sum of the small- and large-message frequencies).

The columns of server_frequency.csv stand for the following, in order:-
Timestamp (in seconds), server connected (true or false), send frequency (per minute), receive frequency (per minute).
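In case it helps anyone reading the attachments, the client CSV layout above can be parsed with something like this (the field names are mine, taken from the column descriptions):

```python
import csv
from collections import namedtuple

# Field names invented here to match the column descriptions above.
ClientRow = namedtuple("ClientRow", [
    "timestamp", "connected_users",
    "small_recv_per_min", "small_latency_ms",
    "large_recv_per_min", "large_latency_ms",
    "send_per_min", "recv_per_min",
])

def read_client_latency(lines):
    """Yield one ClientRow per line of a client<N>_latency.csv file."""
    for row in csv.reader(lines):
        yield ClientRow(*map(float, row))

# Usage: rows = list(read_client_latency(open("client1_latency.csv", newline="")))
```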

The timestamps in the logs files are in UTC(sorry for the inconsistency between time zones in the graphs and logs). I will also provide anything else you need.

Also, as I asked earlier, is it possible to increase the prefetch count for only one connection, instead of as a global setting?
server1_frequency.csv
client1_latency.csv
client2_latency.csv
ready_messages.png
network_traffic.png
cpu_memory_utilization.png