Hi,
I upgraded my broker to 3.6.2 and ran the simulations again, this time with a prefetch count of 30 and 200 messages/second from the client side. I started the simulation at 23:50 IST (UTC+5:30).
Again, everything worked fine for some time. The send and receive message counts and rates at the client matched those at the application server, and the ready-message count was zero in the graph on the RabbitMQ management web UI.
But after around 5 hours the same thing happened: network traffic dropped within roughly 5-10 minutes, and the latency of the messages increased (I calculate latency as the average time in milliseconds it takes a message to go from the client to the app server and for the reply to arrive back at the client). Network traffic (at the broker) and measured latency (at the client) have been stable since then. The clients are still sending 200 messages/s, but the receive rate at the application server is lower (and almost constant). The receive rate at the client matches the send (as well as receive) rate at the server, so messages are not queuing up in the client queues.
> If you see that the number of messages in Ready state is increasing, then nothing consumes them.
- As I mentioned earlier, the connection between the application server and the broker is a low-latency network (they are in the same VPC). I am logging the timestamps and the received and sent message counts and frequencies at both the client and the application server, and the logs show that the server is always connected to the broker. Also, all the ready messages belong to the application server's queue. More importantly, the ready messages do not start increasing right away. I am attaching the graph from the RabbitMQ management plugin.
> If they are also kept in RAM, by far the most likely reason is that they
are not matched by a policy that is supposed to make them lazy.
I checked; they are not in RAM (only a few hundred are). All queues are durable (QoS 1) and messages are persistent (QoS 1).
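For reference, this is what I mean by durable/persistent, sketched with paho-mqtt (as I understand it, a QoS 1 subscription with clean_session=False is backed by a durable queue on the broker, and QoS 1 publishes are stored as persistent messages; the client id, host, and topics below are placeholders):

import paho.mqtt.client as mqtt

# clean_session=False + QoS 1 -> the broker keeps a durable queue
# for this subscriber
client = mqtt.Client(client_id="app-server", clean_session=False)
client.connect("broker.example.com")  # placeholder host
client.subscribe("requests/#", qos=1)

# QoS 1 publishes are stored by the broker as persistent messages
client.publish("requests/1", payload=b"ping", qos=1)
client.loop_forever()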
I am also attaching the client and server logs, along with the memory and CPU utilization and network in/out graphs. It would be very helpful if you could have a look at them; I cannot explain the sudden drop in traffic after 5 hours that is visible in the graphs.
I am running two separate client scripts (each making 15000 connections and sending 100 messages/s), so the server should receive 200 messages/s in total. For 175 of those it replies with 100-byte messages, and for the other 25 it replies with 40 kB messages.
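In code terms, the reply-size split looks roughly like this (a simplified sketch, not my actual server code):

import os

SMALL_REPLY = os.urandom(100)        # 100-byte reply (175 of every 200)
LARGE_REPLY = os.urandom(40 * 1024)  # 40 kB reply (25 of every 200)

def reply_for(n):
    # out of each 200 requests, 175 get the small reply, 25 the large one
    return SMALL_REPLY if n % 200 < 175 else LARGE_REPLY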
The columns of client<number>_latency.csv stand for the following, in order:
- Timestamp (in seconds)
- Connected users
- Small message receive frequency (per minute)
- Small message latency (in milliseconds)
- Large message receive frequency (per minute)
- Large message latency (in milliseconds)
- Send message frequency (per minute)
- Receive rate (per minute; the sum of the small and large message frequencies)
The columns of server_frequency.csv stand for the following, in order:
- Timestamp (in seconds)
- Server connected (true or false)
- Send frequency (per minute)
- Receive frequency (per minute)
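If it is easier to work with, the attached CSVs can be loaded and labelled like this (pandas is only a suggestion; the files have no header row, and client1_latency.csv stands in for any client<number>_latency.csv):

import pandas as pd

client_cols = ["timestamp_s", "connected_users",
               "small_recv_per_min", "small_latency_ms",
               "large_recv_per_min", "large_latency_ms",
               "send_per_min", "recv_per_min"]
server_cols = ["timestamp_s", "server_connected",
               "send_per_min", "recv_per_min"]

client_df = pd.read_csv("client1_latency.csv", names=client_cols)
server_df = pd.read_csv("server_frequency.csv", names=server_cols)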
The timestamps in the log files are in UTC (sorry for the inconsistency between the time zones in the graphs and the logs). I will provide anything else you need.
Also, as I asked earlier: is it possible to increase the prefetch count for only one connection, rather than as a global setting?
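To make the question concrete: with a plain AMQP client, prefetch can be set per channel rather than globally, e.g. with pika (the host is a placeholder). I am asking whether something equivalent is possible here:

import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("broker.example.com"))
ch = conn.channel()
# applies only to consumers on this channel, not to the whole broker
ch.basic_qos(prefetch_count=30)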