Stalls between prefetches in RabbitMQ


Roy Reznik

Jun 22, 2017, 1:25:35 PM
to rabbitmq-users
Hi,

We're seeing very large stalls between prefetches in our consumer.
We see that we're handling a batch of messages very quickly; then the thread that handled the messages waits for a new task (probably because it came from a pool), and a very long time (tens of seconds, sometimes even 90 seconds) passes before a new thread starts handling another batch.
We're using Java, RabbitMQ 3.6.10 & Erlang 19.3.6.
We can try to increase the prefetch count, but why does it take so long to fetch a batch?
Our current prefetch count is 250.
Message size is ~2K.
Our RabbitMQ node has plenty of free CPU, RAM, IO and network bandwidth.
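
As a back-of-envelope check on the numbers above, the prefetch (QoS) setting bounds how much data can sit unacknowledged per consumer at any moment. This is a sketch; the 250/2 KB figures are simply the ones quoted in this thread:

```java
// Back-of-envelope: upper bound on unacknowledged data per consumer
// implied by the QoS (prefetch) settings mentioned above.
public class PrefetchWindow {
    static long inFlightBytes(int prefetchCount, int avgMessageBytes) {
        return (long) prefetchCount * avgMessageBytes;
    }

    public static void main(String[] args) {
        // Figures from this thread: prefetch 250, ~2 KB messages.
        System.out.println(inFlightBytes(250, 2048) + " bytes unacked per consumer");
    }
}
```

So each consumer can have roughly half a megabyte in flight, which is small relative to the stall durations described.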

Thanks,
Roy.

Michael Klishin

Jun 22, 2017, 1:48:36 PM
to rabbitm...@googlegroups.com
Use Wireshark to see when exactly messages are sent (as well as acknowledgements).

RabbitMQ does not deliver batches; it delivers messages one by one, when prefetch allows for it and the queue process is not busy with something else, such as moving messages to disk or garbage collection.
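
To make the "one by one, as prefetch allows" behaviour concrete, here is a toy model (illustrative only, not the broker's actual implementation): the queue hands out one message at a time and never lets more than `prefetch` deliveries be unacknowledged at once.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model of QoS-limited delivery. Real queue processes also
// interleave disk paging and GC, which this sketch omits.
public class QosModel {
    private final int prefetch;
    private int unacked = 0;
    private final Queue<String> ready = new ArrayDeque<>();

    QosModel(int prefetch) { this.prefetch = prefetch; }

    void publish(String msg) { ready.add(msg); }

    // Returns the next message, or null if the prefetch window is full
    // (the queue must wait for an ack before delivering more).
    String deliver() {
        if (unacked >= prefetch || ready.isEmpty()) return null;
        unacked++;
        return ready.poll();
    }

    void ack() { if (unacked > 0) unacked--; }

    int unackedCount() { return unacked; }
}
```

A slow consumer that sits on acknowledgements therefore stalls its own deliveries, even when the queue itself is idle.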

Message breakdown for individual queues, GC activity for individual queues as well as the node as a whole, consumer utilisation, and effective QoS can all be inspected via the management UI. Those values
help in reasoning about the system.

The Java client provides metrics as well, starting with version 4.0.

Collecting data and metrics is step number 1 for any performance evaluation.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Jun 22, 2017, 1:52:06 PM
to rabbitm...@googlegroups.com
Here's a blog post that explains what Consumer Utilisation is in the UI and how it can be used
to reason about prefetch and consumer throughput (without any consumer application metrics, at least):

Here's how Java client metrics can be accessed, e.g. via JMX:


Roy Reznik

Jun 22, 2017, 1:59:02 PM
to rabbitmq-users

Hi Michael,

We see in Wireshark that basicGet doesn't return for tens of seconds.
Consumer utilization looks like a sine function (ups and downs).
Queue metrics look like this:




Michael Klishin

Jun 22, 2017, 1:59:14 PM
to rabbitm...@googlegroups.com
If I had to investigate a "slower than expected" JVM-based consumer, I'd use VisualVM or
similar to profile the app and see what that may reveal about Consumer interface implementations.

If there's evidence that consumer utilisation is high (not necessarily 100% but
in that ballpark),
see CPU Utilisation and Parallelism Considerations in http://www.rabbitmq.com/queues.html.

Existing rabbitmq-users thread with some very generic advice about
throughput:



Michael Klishin

Jun 22, 2017, 2:04:11 PM
to rabbitm...@googlegroups.com
basic.get is polling; there is a roundtrip for each message.
QoS prefetch only really makes sense for basic.consume (which
you definitely should use if throughput and latency matter).

With basic.consume there is no roundtrip, messages are "pushed" to the consumer
as QoS prefetch allows.

See "Receiving Messages by Subscription ("Push API")" in http://www.rabbitmq.com/api-guide.html.

According to the chart, the queue isn't overloaded. "Reductions" is an abstract VM unit,
so 150K isn't necessarily a high or low number; it's primarily useful for comparing how much
CPU and scheduler time different processes (queues, channels, connections) consume relative to each other.

Roy Reznik

Jun 22, 2017, 2:04:30 PM
to rabbitmq-users
We're pretty certain the problem lies in the server and not the client, which is why we haven't debugged it like this.
It looks like the server simply "halts" every once in a while.
We run all sorts of queue listing, status, etc. with rabbitmqctl — could this be holding locks of some kind?

Roy.

Michael Klishin

Jun 22, 2017, 2:06:36 PM
to rabbitm...@googlegroups.com
The above can be misleading.

basic.get is like HTTP GET: a roundtrip, after which the client sends an ack
(if manual acknowledgements are used). So it takes 2 to 3 "hops" per message,
where the last one is asynchronous.

With basic.consume the client subscribes once and receives deliveries with
optional acknowledgements: http://www.rabbitmq.com/confirms.html.

So it can be as low as 1 "hop", or 2, and both are entirely asynchronous in the protocol.
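
A simplified latency model of the difference (hypothetical numbers; real latency depends on many factors): with basic.get each message pays a full synchronous round trip before the next can be fetched, while basic.consume keeps up to a prefetch window of deliveries in flight concurrently.

```java
// Illustrative model only: basic.get pays one synchronous round trip
// per message; basic.consume pipelines deliveries, so roughly
// n/prefetch "round trip slots" cover n messages.
public class PollVsPush {
    // Time to receive n messages when each needs a synchronous roundtrip.
    static double pollMillis(int messages, double rttMillis) {
        return messages * rttMillis;
    }

    // Time when up to `prefetch` deliveries are in flight concurrently.
    static double pushMillis(int messages, double rttMillis, int prefetch) {
        return Math.ceil((double) messages / prefetch) * rttMillis;
    }
}
```

With a 1 ms round trip, fetching 1,000 messages one by one costs about a second, while a prefetch window of 250 brings the same transfer down to a handful of round trips.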

Michael Klishin

Jun 22, 2017, 2:13:07 PM
to rabbitm...@googlegroups.com
`rabbitmqctl list_queues` does not acquire locks per se, but it does query
the state of all queue processes, which means they all will be competing
for scheduler resources.

Listing via HTTP API uses tables with periodically collected stats, which
in theory should be less intrusive.

Unless you list all queues every few seconds and there are at least 1000s of them,
I'd be surprised if it blocked basic.get for 10 or 90 seconds.

Even if your connection or channel frequently goes into the flow control state
(throttled by its downstream components), tens of seconds is too long.
Only resource-driven (e.g. memory) alarms can throttle for several seconds or more.

I'd suggest switching to basic.consume first.



Roy Reznik

Jun 22, 2017, 2:22:37 PM
to rabbitmq-users
Hi Michael,

Thanks for the answers.
Our solution does use basic.consume; we did basic.get just to see if the queue was working.
I think the fact that basic.get took tens of seconds tells us that something is wrong, and just switching to basic.consume won't solve it.
There's something fundamentally not working here.
From the picture I pasted with the queue metrics, do they make any sense? I don't know what numbers should be expected there.

Thanks
Roy.

Michael Klishin

Jun 22, 2017, 2:33:12 PM
to rabbitm...@googlegroups.com
I mentioned the metric above:

> According to the chart, the queue isn't overloaded. "Reductions" is an abstract VM unit,
> so 150K isn't necessarily a high or low number; it's primarily useful for comparing how much
> CPU and scheduler time different processes (queues, channels, connections) consume relative to each other.

Please do not use basic.get if throughput is of any significance.

Steps I'd take if I were you:

 * Switch to basic.consume
 * Increase QoS or try automatic acknowledgements as an experiment to get consumer utilisation high

Then

 * See message location breakdown (RAM, disk, paged out transient messages) on the queue page
 * See disk activity charts on the same page: are they high?
 * See what queue's GC chart on the same page says (it's in MBs, not abstract units)
 * See node I/O metrics
 * Post details about how many queues are used, how many nodes are involved, how many cores they have
 * Ideally post some kind of diagram that demonstrates how your system operates

I'd suspect that the VM scheduler-to-CPU-core binding used by default can be suboptimal for your case,
but no amount of CPU context switching can take 10 to 90 seconds.

The shortest road to finding what is "fundamentally not working" is through collecting and analysing metrics.



Roy Reznik

Jun 22, 2017, 3:56:47 PM
to rabbitmq-users
Hi Michael, thanks. Yes, I'd like to collect any interesting metrics that will help us analyze the issue.

We're already using basic.consume.
Regarding automatic acknowledgements: durability is important for us, so that is not an option currently.

Here is the GC chart for the entire node:


Roy.

Michael Klishin

Jun 22, 2017, 4:20:09 PM
to rabbitm...@googlegroups.com
This chart alone only tells us that the node is far from idle.

Other metrics mentioned above, in particular for the queue(s) in question,
are more interesting.


Roy Reznik

Jun 22, 2017, 4:42:08 PM
to rabbitmq-users
Here are the queue metrics:




Roy.

Michael Klishin

Jun 22, 2017, 5:05:01 PM
to rabbitm...@googlegroups.com
There are 302 consumers on a single queue with 2M+ messages, at 50% consumer utilisation rate.

While there is no information on the number of channels or connections those consumers are on,
this is asking for

 * rabbitmq-sharding or more queues used
 * Experiments with a higher prefetch (finding the right number is usually a matter of trial and error, with fast consumers and their memory usage monitored you can safely go into 100s)

550 deliveries/second at 2 kB each is 1.1 MB/s, or 8.8 Mbit/s, which is low.
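
The arithmetic behind that figure (assuming 2,000-byte messages, a rounding of the ~2 kB quoted earlier):

```java
// Sanity check on the quoted rate: messages per second times message size.
public class ThroughputCheck {
    static double megabytesPerSecond(int msgsPerSec, int msgBytes) {
        return msgsPerSec * (double) msgBytes / 1_000_000.0;
    }

    static double megabitsPerSecond(int msgsPerSec, int msgBytes) {
        return megabytesPerSecond(msgsPerSec, msgBytes) * 8.0;
    }
}
```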

We are aware of a couple of scenarios where, after a backlog with sudden rate spikes and drops,
a queue can settle into delivering at a low rate; the exact reason is yet to be investigated.

However, the recommendation above still stands. This queue is mostly busy with moving data to disk
and GC. With N queues those activities can be parallelised (to some extent, eventually message store throughput
will become the limiting factor).



Roy Reznik

Jul 5, 2017, 1:46:37 PM
to rabbitmq-users
Eventually, we upgraded from network SSD to local SSD, and that resolved the issue for a while, but it returned after a week or so.
We came to the conclusion that RabbitMQ is simply unable to handle a rate of 10-15K messages/second of ~4 KB each on a server with 220 GB of RAM, 16 cores, and a local SSD.

Michael Klishin

Jul 5, 2017, 2:37:14 PM
to rabbitm...@googlegroups.com
I cannot argue with your conclusions, but we are aware of environments that handle at least that many
messages, an order of magnitude greater in size, with much less RAM and fewer cores.


Michael Klishin

Jul 5, 2017, 2:51:12 PM
to rabbitm...@googlegroups.com
…as well as environments with monster nodes (in the 200+ GB of RAM range, massive network links) that
go well above 100K (message size distribution is unknown).

But use whatever works for you.

Roy Reznik

Jul 5, 2017, 4:59:22 PM
to rabbitmq-users
Hi Michael,

I guess these environments probably don't have the ack/durability requirements we have (the strictest ones), or they are heavily optimized and tweaked (we're using pretty basic settings), or they are using a different version that works better.
We split all of our data across different queues with hundreds of consumers that are independent of each other, and we still see large stalls at the *node* level, which are probably a bug/lock of some kind (this image shows *node*-level message rates):


Like I said, all consumers are independent and therefore it doesn't make any sense that they all "halt" at the same second - the issue *must* be in RabbitMQ.
None of the resources are fully utilized - memory, CPU or IOPS.

Roy.

Michael Klishin

Jul 5, 2017, 5:50:07 PM
to rabbitm...@googlegroups.com
There are pretty much no "node wide locks" in RabbitMQ and even GC is per-process, not stop-the-world.
However, there are resources that are inherently shared by everything on a node: two message stores,
a file handle cache, internal database schema, and a few others. Except for the message stores
and file handle cache, they are rarely on the hot path.

Reviewing the data provided in this thread suggests that there is a single queue with hundreds
of consumers that is expected to deliver throughput higher than X. This is a known anti-pattern.

There is a section on this scenario in the docs (http://www.rabbitmq.com/queues.html) but let me repeat:
a workload based around a single queue is limited to one CPU core on the hot path and will be
affected by said queue's GC pauses. It does not matter how many cores, network bandwidth
or RAM you have: a single queue can only go so fast. RabbitMQ was built with the assumption that
there will be multiple queues in concurrent use.

rabbitmq-sharding can let you trade off ordering for parallelism on a "single logical queue".
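
The core idea can be sketched in a few lines (a hypothetical router, not rabbitmq-sharding's actual partitioning code): hash the routing key to pick one of N physical queues, which preserves per-key ordering but gives up global ordering across the logical queue.

```java
// Illustrative sharding: one logical queue backed by N physical queues.
// Messages with the same routing key always land on the same shard,
// so per-key ordering survives; ordering across keys does not.
public class ShardRouter {
    static String shardQueue(String logicalQueue, String routingKey, int shards) {
        // floorMod keeps the index non-negative for negative hash codes.
        int idx = Math.floorMod(routingKey.hashCode(), shards);
        return logicalQueue + "." + idx;
    }
}
```

Each physical queue then runs as its own Erlang process, so delivery, paging, and GC parallelise across cores.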

I cannot suggest much without access to application code and/or a fair amount of metrics — which in this thread were limited
to a chart at a time — but we hear claims that "RabbitMQ is the problem" every few days and 
in a pretty substantial % of cases they end up
being issues with applications, downstream consumer services, reliance on known anti-patterns,
users ignoring any advice given to them, and so on.

We plan on publishing a few posts about how such issues can be investigated in a manner
that does not rely on gut feeling.




Roy Reznik

Jul 6, 2017, 8:27:09 AM
to rabbitmq-users
Michael,

Like I stated above, we have *hundreds of queues* with *thousands of consumers* (overall, not per queue) - we are *not* using a single queue whatsoever.
When only producers run (consumers are turned off) - they produce data *very* quickly into the node.
When only consumers run (producers are turned off) - they consume data *very* quickly from the node.
When both of them run together, the node periodically halts and behaves as stated above with very, very poor performance.

We are not ignoring advice, we are just trying to understand what are the next steps in order to debug such an issue.
Unfortunately, after reading all the data available on the internet regarding RabbitMQ performance (that we were able to find or were referred to), and also asking here in the forum, we were unable to find any concrete next steps.
If there is any more data that will help to debug this issue and understand why RabbitMQ is behaving so poorly, please let me know.
We also tried to use rabbitmq-top - it seems like most of the queues are "waiting" (!?)




Roy.

Michael Klishin

Jul 6, 2017, 10:24:08 AM
to rabbitm...@googlegroups.com
Sorry, I did not suggest that you specifically were ignoring our advice.
It's not an uncommon thing to see, though.

Thanks for the details, we will investigate it as time permits.


Michael Klishin

Jul 6, 2017, 10:24:26 AM
to rabbitm...@googlegroups.com
This should read "I did not mean to suggest".

Roy Reznik

Jul 6, 2017, 4:36:30 PM
to rabbitmq-users
Thanks.

Is there any way to extract a stack trace of currently running tasks?
We want to try and see what it halts on.

Roy.

Michael Klishin

Jul 6, 2017, 6:31:00 PM
to rabbitm...@googlegroups.com
A current stack trace changes many times a second, so that is not a reliable approach.
Asking every process for its state can be a very expensive operation on production systems.

Erlang profiling tools are all based on VM tracing: dbg, Recon [1], and so on.
Except for dbg, this means a certain library must be on the code path.

Some examples of how the observer app and tracing can be used on remote nodes:


I'd highly recommend experimenting with this on a non-production node first,
e.g. loaded with PerfTest [2].

