Please help me understand more details of flow control

Scott Nichol

Mar 5, 2015, 5:27:00 PM
to rabbitm...@googlegroups.com
I've read several things about flow control including http://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/, http://www.rabbitmq.com/memory.html and the venerable and likely outdated http://www.lshift.net/wp-content/uploads/2009/12/slides.pdf.  I definitely understand flow control due to alarms.  I understand in theory the information in the "finding bottlenecks" link.  However, nothing is really satisfying my desire to understand the root cause of an ongoing intermittent flow control issue I have been experiencing for quite a while.

I have attached the rabbitmqctl report output for two scenarios, one when our system exhibits "healthy" behavior, the other when a main publishing connection and its channel are in flow.

In order to eliminate as many red herrings as possible, this RabbitMQ instance is in a dev environment being accessed by only a few applications.  Of particular interest regarding flow, there is one connection with a single channel that publishes to a topic exchange named exchange.publish.  The messages published to exchange.publish are routed to a durable queue named jenga.danube.  There is another connection with eight channels, where each channel has a consumer for jenga.danube with a prefetch of 1000.  (Note that I have tried many different prefetch values, none of which avoids flow altogether.)
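For concreteness, the shape of the publishing and consuming code is roughly the following (a Java-client sketch; the exchange and queue names are the real ones from above, while the host, routing key, payload and message handling are placeholders):

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;
    import com.rabbitmq.client.MessageProperties;

    public class FlowSetupSketch {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // placeholder

            // Publisher: one connection, a single channel, bursty persistent publishes
            Connection pubConn = factory.newConnection();
            Channel pubCh = pubConn.createChannel();
            pubCh.basicPublish("exchange.publish", "some.routing.key",   // routing key is a placeholder
                    MessageProperties.PERSISTENT_BASIC, "payload".getBytes("UTF-8"));

            // Consumers: one connection, eight channels, one consumer each, prefetch 1000
            Connection conConn = factory.newConnection();
            for (int i = 0; i < 8; i++) {
                final Channel ch = conConn.createChannel();
                ch.basicQos(1000);
                ch.basicConsume("jenga.danube", false, new DefaultConsumer(ch) {
                    @Override
                    public void handleDelivery(String consumerTag, Envelope envelope,
                                               AMQP.BasicProperties properties, byte[] body)
                            throws java.io.IOException {
                        // ... real processing happens here ...
                        ch.basicAck(envelope.getDeliveryTag(), false);
                    }
                });
            }
        }
    }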

The message publisher is very bursty.  The consumers are slow relative to the bursts, but over time windows of hours the queues always clear.  We see the flow scenario on both Windows and Linux and on both modest and quite overpowered (24 cores with SSDs) machines.  I never see indications in tools like iostat, vmstat or perfmon that the machines are stressed.

I can provide many more details, but none are entirely germane to my core question for this post, which is whether there is anything in the report output that indicates why the publishing connection is in flow.  I'm looking for something concrete that I might be able to act on.  My secondary question is why, in the flow scenario, messages are being delivered to the consumers at such a slow rate, with none unacknowledged, as if there were no prefetch.

If there is more information I can provide to help get to the bottom of this, I would be happy to oblige.  I am in a rare period where I have time to pursue truly nagging questions.

TIA.

Scott



report-healthy2.txt
report-flow.txt

Michael Klishin

Mar 5, 2015, 6:13:50 PM
to rabbitm...@googlegroups.com, Scott Nichol
On 6 March 2015 at 01:27:02, Scott Nichol (charlessc...@gmail.com) wrote:
> In order to eliminate as many red herrings as possible, this
> RabbitMQ instance is in a dev environment being accessed by only
> a few applications. Of particular interest regarding flow,
> there is one connection with a single channel that publishes
> to a topic exchange named exchange.publish. The messages published
> to exchange.publish are routed to a durable queue named jenga.danube.
> There is another connection with eight channels, where each
> channel has a consumer for jenga.danube with a prefetch of 1000.
> (Note that I have tried many different prefetch values, none
> of which avoids flow altogether.)
>
> The message publisher is very bursty. The consumers are slow
> relative to the bursts, but over time windows of hours the queues
> always clear. We see the flow scenario on both Windows and Linux
> and on both modest and quite overpowered (24 cores with SSDs)
> machines. I never see indications in tools like iostat, vmstat
> or perfmon that the machines are stressed.
>
> I can provide many more details, but none are entirely germane
> to my core question for this post, which is whether there is anything
> in the report output that indicates why the publishing connection
> is in flow?

Developers tend to see a system (e.g. a running RabbitMQ node) as mostly constant:
if you continuously feed it inputs of X, you should pretty much always get Ys back.

Unfortunately, real world systems (software, hardware, anything) are rarely
perfectly stable. There's quite a bit going on even on an idle machine. Disk performance
is not constant, CPU performance is not constant (modern CPUs almost always have
bursting which they employ for periods of time as they see fit), network throughput
is not constant. You may have "busy neighbours" in your environment
without realising it (certainly very common in public cloud environments but also
in private ones, where it's not uncommon for VMs to share a NAS device, for instance).

On top of that, the OS kernel and runtimes (the Erlang VM in our case) employ sophisticated
algorithms in their schedulers that are not necessarily easy to predict. They can do things
differently this minute compared to the previous one depending on many factors.

As a result, the throughput of your system *will* vary over time, even if the fluctuations are small
and sometimes very hard to observe even with solid monitoring in place.

Given all this, why do you see channels go into flow control for brief periods of time?
RabbitMQ is built with Erlang and uses a bunch of processes for all kinds of things internally.
Channels are represented as processes, for example. So are queues.
When [AMQP 0-9-1/STOMP/MQTT/etc] messages come in, they are parsed and passed along as Erlang
messages between processes. Processes have mailboxes — queues (as in data structures, not RabbitMQ
queues) that grow if, on average, more messages come in than are processed. This means that if
process A cannot keep up with process B and B continues sending it messages, A's mailbox will
keep growing, consuming more and more RAM. To guard against this, RabbitMQ has a small
utility module — credit_flow — that lets a receiving process limit how many messages each of its
senders is allowed to send to it before the sender has to pause and wait for more credit.
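Here is a toy illustration of the credit idea (this is not RabbitMQ's actual credit_flow code, just a sketch of the mechanism, and the numbers are made up): the sender starts with a fixed amount of credit, spends one unit per message and blocks once it runs out; the receiver grants credit back in batches only after it has actually processed messages.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.Semaphore;

    public class CreditFlowToy {
        public static void main(String[] args) throws InterruptedException {
            final int initialCredit = 200;  // made-up numbers, for illustration only
            final int grantAfter = 50;

            final BlockingQueue<Integer> mailbox = new ArrayBlockingQueue<Integer>(10000);
            final Semaphore credit = new Semaphore(initialCredit);

            Thread receiver = new Thread(new Runnable() {
                public void run() {
                    int processed = 0;
                    try {
                        while (true) {
                            mailbox.take();            // receive the next message
                            Thread.sleep(1);           // pretend processing is slow (disk I/O etc.)
                            if (++processed % grantAfter == 0) {
                                credit.release(grantAfter);  // grant a batch of credit back to the sender
                            }
                        }
                    } catch (InterruptedException ignored) {
                    }
                }
            });
            receiver.start();

            for (int i = 0; i < 1000; i++) {
                credit.acquire();   // out of credit => the sender blocks here ("goes into flow")
                mailbox.put(i);
            }
            System.out.println("sender finished without overrunning the receiver's mailbox");
            receiver.interrupt();
        }
    }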

If your channel runs into flow control, it means it tries to go faster than some other part
of the node, e.g. a queue or channel writer (a helper process that serialises protocol methods and
sends resulting data to the socket). This can happen for a bunch of reasons:

 * Some processes do a lot more work than others
 * Some processes (queues) have to do disk or network I/O, which is expensive relative to in-memory operations
 * The runtime has prioritised scheduling some processes over others
 * I/O operations temporarily run slower (e.g. a busy neighbour has loaded the network)
 * Clients that consume messages became slower
 * RabbitMQ decided to move a lot of messages to or from disk, e.g. as more or less RAM became available

and so on. Some part of the system temporarily becomes a bottleneck. Some are more likely to become
the bottleneck than others:

 * Those that do I/O, in particular disk I/O
 * Those that may transfer messages in bulk
 * Some processes, in particular channels, are naturally hot spots because so much goes through them

> I'm looking for something concrete that I might be 
> able to act on.

You need to first understand which part of the node slows down. Use tools such as iostat, for example,
and collect CPU usage metrics. Once you have a decent understanding of what it is, you'll probably know
how to act on it. RabbitMQ 3.5.0 will display some I/O statistics right in the management UI. Traffic
to and from the client on a connection is displayed in any reasonably recent release.

Sorry, there is little room for generic advice when it comes to tuning for throughput. Only measurements
can tell what exactly is the bottleneck.

Be prepared for the reason to be less than obvious. The Erlang VM generally does a great job of scheduling
things under load, but it cannot cover every workload equally well (more on this in [1] and [4]).
It has quite a few flags to tune [2]. Finding the right balance with them requires a lot of experimentation.
It may be easier and/or cheaper to just "throw more hardware at it" in some cases.

Before you blame the VM, keep in mind that it has its sophisticated scheduler for a reason:
the infamous GC pause problem pretty much does not exist in Erlang (certain language aspects
helped make this possible). So it may take a very smart person to outsmart it.

Another thing we mention over and over: eliminating flow control is trading [CPU] time
for space (RAM). Once you go above a particular threshold, you either hit resource
flow control or your process gets killed by the OOM killer (or similar). Pyrrhic victory.

Alternatively you
can drop some data on the floor — that is not a particularly popular option, but future
RabbitMQ versions may include features that would allow you to make a queue delete
messages instead of paging them to disk under memory pressure (as an example).

1. http://jlouisramblings.blogspot.ru/2013/01/how-erlang-does-scheduling.html
2. http://docs.basho.com/riak/2.0.4/ops/tuning/erlang/
3. http://erlang.org/doc/man/erl.html
4. https://www.erlang-solutions.com/resources/webinars/understanding-erlang-scheduler
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Scott Nichol

Mar 5, 2015, 10:55:37 PM
to rabbitm...@googlegroups.com
Thanks for the quick reply and for pointing me to credit_flow.  Now I understand that where http://www.rabbitmq.com/blog/2014/04/14/finding-bottlenecks-with-rabbitmq-3-3/ says 

  • A connection is in flow control, some of its channels are, but none of the queues it is publishing to are - This means that one or more of the queues is the bottleneck; the server is either CPU-bound on accepting messages into the queue or I/O-bound on writing queue indexes to disc. This is most likely to be seen when publishing small persistent messages.
I shouldn't think of CPU-bound or IO-bound in terms of the typical observables (vmstat, iostat, perfmon), but as the exhaustion of credit from, say, rabbit_msg_store.  I don't need to wonder why flow is indicating I am CPU- or IO-bound when OS monitoring tools are telling me that peak CPU usage is 8% or peak disk writes are 100/sec.

Now, of course, I am wondering how you guys came up with the magic values for DEFAULT_CREDIT and CREDIT_DISC_BOUND.  I am also thinking that if I build the server myself, I can instrument a couple of modules to see what's going on (exactly what is causing flow) and/or adjust the magic values to see if I can avoid flow in my usage scenario or even disable flow to see if I ever come close to running out of RAM.

Michael Klishin

Mar 6, 2015, 3:11:27 AM
to Scott Nichol, rabbitm...@googlegroups.com
Note that some very relevant values (e.g. the VM memory high watermark and the paging ratio) can be configured via the config file. Admittedly their defaults are also quite conservative, and we need to document more reasonable values for machines with, say, 8 or more GB of RAM.
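For example, in the classic rabbitmq.config (Erlang term) format both knobs look like this; the values shown are the usual defaults, not recommendations:

    [
      {rabbit, [
        {vm_memory_high_watermark, 0.4},
        {vm_memory_high_watermark_paging_ratio, 0.5}
      ]}
    ].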

We are curious to know if tweaking credit flow limits solves the problem for you and what kind of hardware configuration you target.

Please let us know.

MK

Scott Nichol

Mar 6, 2015, 8:04:56 AM
to rabbitm...@googlegroups.com
I am glad you brought up the other configurable parameters.  I really haven't exhausted my attempts to alleviate my problems by adjusting those.  I got sidetracked months back when I was working on them because I thought I had discovered the holy grail (adjusting Erlang's async threads), but that has not rid us of our problems.  The prospect of doing some code changes and finding the root cause is exciting, but that route may not get me to a solution any faster than adjusting the knobs already provided.



Michael Klishin

Mar 6, 2015, 8:11:43 AM
to rabbitm...@googlegroups.com, Scott Nichol
 On 6 March 2015 at 16:04:58, Scott Nichol (charlessc...@gmail.com) wrote:
> I am glad you brought up the other configurable parameters.
> I really haven't exhausted my attempts to alleviate my problems
> by adjusting those. I got side tracked months back when I was working
> on them because I thought I had discovered the holy grail (adjusting
> erlang's async threads), but that has not rid us of our problems.

I'm curious what kind of tweaking to async threads you had to do, on what Erlang
version and with what kind of hardware? What kind of effect did it have? 

Simon MacMullen

Mar 6, 2015, 8:32:37 AM
to Michael Klishin, rabbitm...@googlegroups.com, Scott Nichol
Note that our current thinking on async-thread tuning is here:
http://next.rabbitmq.com/persistence-conf.html#async-threads

It's only on next.rabbitmq.com because it talks about I/O metrics that
will be in 3.5.0, but the underlying causes are identical.

Cheers, Simon

Scott Nichol

Mar 11, 2015, 9:28:56 PM
to rabbitm...@googlegroups.com, charlessc...@gmail.com
> I'm curious what kind of tweaking to async threads you had to do, on what Erlang
> version and with what kind of hardware? What kind of effect did it have?
> --
> MK
> Staff Software Engineer, Pivotal/RabbitMQ

This reply is going to answer the above question, but also go into detail about the original post and what I have discovered regarding it.  In short, the problems we experienced correlated with a much larger than usual proportion of our messages being around 160k in size rather than 2-10k.  Ultimately, the investigation revealed that restricting RabbitMQ's memory counter-intuitively allowed the messages to be processed normally.  Of course, that's just our app, YMMV.

That was the TL;DR; the rest of this post is the long version.

I want to elaborate on the observed issues we had, our hardware and software configuration, and what we ultimately found.

A more complete description of our symptoms:

A queue has 2-8 million persistent messages totaling 20-80 GB on disk.  When either the publisher or consumer application is run alone, processing is "normal", which means little to no flow for the publisher, and the consumer keeps getting fed messages consistent with its prefetch (the consumer app has 8 channels with one consumer each and prefetch 100).  Other queues on the same machine have 10,000 or fewer messages and publishers and consumers are working fine.

The problems start when the publisher and consumer are both running.

1. In the management console, queue stats show 800 unacked (consistent with prefetch), a slower than usual ack rate (<10/sec), and no flow on the queue, channel or connection (consumer).  The consumer channels show 0 unacked messages (inconsistent with the queue stats claim).  The publisher channel and connection show flow and 0 messages published.  This is not a transient state.  I watched it continue for 30 minutes once before I gave up and stopped the applications.
2. Running "rabbitmqctl report" pauses after printing "Queues on /:".
3. Attempting to shut down the consumer app gracefully pauses at the first call to basicCancel; the Erlang process is issuing reads during this time.  I only noticed this on a Windows machine where I was running Process Monitor.  Our Linux machines have so much RAM that the entire mnesia database is cached, so reads never show up in iostat.
4. After anywhere from 3-15 minutes, the shutdown does finally complete, the Erlang reads stop, the "rabbitmqctl report" completes and the publisher resumes publishing messages.

This behavior is unusual.  We have run tens if not hundreds of billions of messages through RabbitMQ and have seen some problems, but none as dramatic as this.

I pursued the usual avenues to understand what was going on: posted to this list, played with configuration parameters, read the rabbitmq-server source, read the erlang tutorial, read the rabbitmq-server source again ;-), read the Java client source (a relief because I could understand it).

The breakthrough came when I realized that some of our machines were hardly affected by this problem, if at all, even though they were handling the same stream of messages.  The "happy" machines were homogeneous: all OpenStack VMs with 4 cpus, 8 GB of RAM and SSD storage running RabbitMQ 3.1.5 on R14B04 on RHEL 6.  The "unhappy" machines were a mixed bag, all running newer versions of RabbitMQ.  The most obvious difference (to me) was that the unhappy machines had spinning disks.  When I stopped fixating on that, I started methodically configuring away the differences.  The "unhappy" machines were all beefy: 16-24 CPUs, 32-160 GB RAM.  I used Erlang +S to reduce the effective CPU count, Erlang +A to try to aid the slower disks and RabbitMQ vm_memory_high_watermark to reduce the available memory.

What I found in my scenario was:

1. Bumping +A to 512 had an adverse effect.  RabbitMQ was unable to keep the prefetch filled when the consumer was running alone.  I stopped varying it after I made that observation.
2. Specifying +S 8 or 4 had no apparent effect.
3. Specifying a vm_memory_high_watermark of 0.02 on a 160 GB Linux machine got rid of all observed problems.  Bumping up to 0.03 was similar, with a small throughput improvement (as measured by our application logs, not the management console).  Bumping up to 0.04 looked good at first, but then started a "flow oscillation" on the publisher application, in which the publisher would be blocked from publishing anything for 10-30 seconds at a time, then would return to normal for a minute or so.
4. Specifying vm_memory_high_watermark of 0.1 on a 32 GB Windows machine, combined with the reduction of prefetch from 100 to 20, got rid of all observed problems on that machine.

I have continued to observe the servers that have been configured for reduced memory limits.  The large messages that created the initial issues have passed through those systems.  There are still a few million messages in the queues.  Throughput, as measured by our application logs, is a bit better than it had been with the default configuration.

I have posted this in case it can help anyone that has a similar experience as well as to alert the RabbitMQ developers to this scenario, which may or may not be specific to my application.

Raymond Rizzuto

May 12, 2015, 9:41:56 AM
to rabbitm...@googlegroups.com, charlessc...@gmail.com
Does RabbitMQ yet have the option of dropping messages instead of applying back-pressure to the publisher?  In our system we are receiving messages via multicast, and want to publish them to consumers via RabbitMQ.  If there is a high load that persists long enough to reach memory limits, we'd ideally prefer to drop messages.  Ideally we'd like to drop the oldest message that is in memory and not persisted since newer messages are more valuable.

Michael Klishin

May 12, 2015, 9:57:32 AM
to rabbitm...@googlegroups.com, Raymond Rizzuto, charlessc...@gmail.com
On 12 May 2015 at 16:41:57, Raymond Rizzuto (ray.r...@gmail.com) wrote:
> Does RabbitMQ yet have the option of dropping messages instead
> of applying back-pressure to the publisher? In our system we
> are receiving messages via multicast, and want to publish them
> to consumers via RabbitMQ. If there is a high load that persists
> long enough to reach memory limits, we'd ideally prefer to drop
> messages. Ideally we'd like to drop the oldest message that is
> in memory and not persisted since newer messages are more valuable.

Here are the relevant features we have at the moment:
https://www.rabbitmq.com/blog/2014/01/23/preventing-unbounded-buffers-with-rabbitmq/
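As a concrete example of one of those features, a per-queue length limit can be set when the queue is declared; once the limit is exceeded, messages are dropped from the head of the queue, i.e. the oldest ones go first. It does not distinguish persisted from unpersisted messages, so it is only an approximation of what you describe. A Java-client sketch, with a made-up queue name and limit:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import java.util.HashMap;
    import java.util.Map;

    public class MaxLengthQueue {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost"); // placeholder

            Connection conn = factory.newConnection();
            Channel ch = conn.createChannel();

            // Declare a durable queue that keeps at most 100000 messages;
            // once the limit is hit, the oldest messages at the head are dropped.
            Map<String, Object> queueArgs = new HashMap<String, Object>();
            queueArgs.put("x-max-length", 100000);
            ch.queueDeclare("multicast.feed", true, false, false, queueArgs); // queue name is made up

            ch.close();
            conn.close();
        }
    }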

Skrzypek, Jonathan

May 12, 2015, 10:25:35 AM
to Michael Klishin, rabbitm...@googlegroups.com, Scott Nichol
Hi,

This is a very nice write up!
Would you consider putting it on rabbitmq.com?
I know credit and flow control already have articles/blogs, but your paragraphs are more generic and give a good overview.

Michael Klishin

May 12, 2015, 10:42:55 AM
to Skrzypek, Jonathan, rabbitm...@googlegroups.com
On 12 May 2015 at 17:25:34, Skrzypek, Jonathan (jonathan...@gs.com) wrote:
> This is a very nice write up !
> Would you consider putting it on rabbitmq.com ?
> I know credit and flow control already have articles/blogs,
> but your paragraphs are more generic and give a good overview

It may be worth putting on our blog, with some editing.

We'll see. 

Raymond Rizzuto

May 12, 2015, 12:12:00 PM
to rabbitm...@googlegroups.com, ray.r...@gmail.com, charlessc...@gmail.com
I don't think any of the current options quite fit my need, but I am not quite sure.  

I am planning on using persistent queues, but it may be that the publisher is publishing messages faster than they can be persisted.  I suspect this may be further aggravated by a slow consumer, since there may be additional disk I/O to retrieve older messages that were evicted from memory.  In this case, I would ideally like to discard older unpersisted messages to make room for the newer messages, understanding that this creates a gap.



Alvaro Videla

May 12, 2015, 12:29:40 PM
to Raymond Rizzuto, rabbitm...@googlegroups.com, charlessc...@gmail.com
At the moment there are no options for dropping messages instead of paging them to disk.

Have you seen this, BTW? It's kinda related to how to tune your consumers/publishers: https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/


Raymond Rizzuto

May 12, 2015, 2:44:17 PM
to rabbitm...@googlegroups.com, charlessc...@gmail.com, ray.r...@gmail.com


On Tuesday, May 12, 2015 at 12:29:40 PM UTC-4, Alvaro Videla wrote:
> At the moment there are no options for dropping messages instead of paging them to disk.
>
> Have you seen this, BTW? It's kinda related to how to tune your consumers/publishers: https://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/

Thanks, that is interesting, though my issue is on the pub side, not the consumer.  

Alvaro Videla

May 12, 2015, 3:34:58 PM
to Raymond Rizzuto, rabbitm...@googlegroups.com, charlessc...@gmail.com
Slow consumers can ultimately lead to blocked publishers. 