Performance issue - 'flow' state for no reason?


Alex K

Oct 21, 2015, 12:28:11 PM
to rabbitmq-users
Hello guys,

I'm trying to measure the max throughput of my RabbitMQ node. RabbitMQ puts connections and channels into the 'flow' state even though there is enough RAM/CPU and the consumers are 'relaxed'.

# rabbitmqctl status
[{pid,11925},
 {running_applications,
     [{rabbitmq_tracing,"RabbitMQ message logging / tracing","3.5.6"},
      {rabbitmq_management,"RabbitMQ Management Console","3.5.6"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.5.6"},
      {webmachine,"webmachine","1.10.3-rmq3.5.6-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.5.6-git680dba8"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.5.6"},
      {rabbit,"RabbitMQ","3.5.6"},
      {os_mon,"CPO  CXC 138 46","2.3"},
      {inets,"INETS  CXC 138 49","5.10.4"},
      {mnesia,"MNESIA  CXC 138 12","4.12.4"},
      {amqp_client,"RabbitMQ AMQP Client","3.5.6"},
      {xmerl,"XML parser","1.3.7"},
      {sasl,"SASL  CXC 138 11","2.4.1"},
      {stdlib,"ERTS  CXC 138 10","2.3"},
      {kernel,"ERTS  CXC 138 10","3.1"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:24:24] [async-threads:900] [kernel-poll:true]\n"},
 {memory,
     [{total,78163824},
      {connection_readers,337600},
      {connection_writers,76648},
      {connection_channels,444360},
      {connection_other,795224},
      {queue_procs,1584376},
      {queue_slave_procs,0},
      {plugins,961168},
      {other_proc,14590240},
      {mnesia,66432},
      {mgmt_db,766520},
      {msg_index,46856},
      {other_ets,1169176},
      {binary,8081840},
      {code,20243042},
      {atom,711569},
      {other_system,28288773}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.5},
 {vm_memory_limit,16815964160},
 {disk_free_limit,50000000},
 {disk_free,893252980736},
 {file_descriptors,
     [{total_limit,249900},
      {total_used,28},
      {sockets_limit,224908},
      {sockets_used,26}]},
 {processes,[{limit,1048576},{used,494}]},
 {run_queue,0},
 {uptime,10781}]

Server specs: CentOS, 24 cores E5-...@2.00GHz, 32 GB RAM. Tests are performed over the local network. RabbitMQ servers' bandwidth is 10Gb/s.

My sample message size is ~900 bytes. I have only 1 queue and 'topic' routing. The queue is durable; messages are not persistent. Consumers simply read the message and send an ack. Producers try to send as many messages as they can in an endless loop.

/etc/rabbitmq/rabbitmq-env.conf: RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+A 512"

Here are a few screenshots:

1. Queue is in 'running' state. Consumer utilization is 100%.


2. Connections and channels are in 'flow' state


All screenshots were made at the same minute. 

Ignore the prefetch value on the screenshot. I tested with 0, 1, 10 and 1000: 0, 10 and 1000 behaved the same, but setting it to 1 made the total max throughput even a bit worse.


Server load during the test is: 

- Network IO: 200Mb/s (out of 10Gb/s)

- CPU load: ~20%. Half of cores were completely idle.

- Memory utilization: 600 MB (out of 32 GB)


Adding more consumers doesn't help. Adding more producers doesn't help either because the connections are already in the 'flow' state.


What else can I measure/tune to be able to load CPU and all cores? Happy to share more info to detect the bottleneck.


Thanks in advance!

Alex


Michael Klishin

Oct 21, 2015, 12:31:09 PM
to rabbitm...@googlegroups.com, Alex K
On 21 Oct 2015 at 19:28:14, Alex K (kazekoa...@gmail.com) wrote:
> I'm trying to measure max throughput of my RabbitMQ node. Rabbit
> sets 'flow' state for connections and channels though there
> is enough ram/cpu and consumers are 'relaxed'.

Flow control is not only resource-driven: when some parts of RabbitMQ cannot
keep up with other parts, publishers are throttled temporarily. Internal flow control can toggle
many times a second.
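
For intuition, here is a toy model of credit-based flow control between one sending process and one receiving process. It is only a sketch under assumptions: the function name `simulate`, the tick-based timing, and the credit values (200 initial, 50 granted back per 50 messages processed) are illustrative and are not taken from RabbitMQ's code. The sender spends one credit per message and stalls at zero; the receiver hands credit back as it works through its mailbox.

```python
from collections import deque


def simulate(ticks, send_rate, process_rate,
             initial_credit=200, more_credit_after=50):
    """Toy credit flow between one sender and one receiver.

    Each tick the sender tries to send `send_rate` messages and the
    receiver processes up to `process_rate` of the messages waiting
    in its mailbox. Returns (delivered, blocked_ticks).
    """
    credit = initial_credit        # sends the sender may still make
    mailbox = deque()              # messages waiting at the receiver
    processed_since_grant = 0
    delivered = 0
    blocked_ticks = 0

    for _ in range(ticks):
        # Sender: every send costs one credit; at zero credit it stalls.
        sent = 0
        while sent < send_rate and credit > 0:
            mailbox.append(1)
            credit -= 1
            sent += 1
        if sent < send_rate:
            blocked_ticks += 1     # the sender was throttled this tick

        # Receiver: after every `more_credit_after` processed messages,
        # grant that many credits back to the sender.
        for _ in range(min(process_rate, len(mailbox))):
            mailbox.popleft()
            delivered += 1
            processed_since_grant += 1
            if processed_since_grant == more_credit_after:
                credit += more_credit_after
                processed_since_grant = 0

    return delivered, blocked_ticks


if __name__ == "__main__":
    print(simulate(100, send_rate=10, process_rate=20))
    print(simulate(100, send_rate=20, process_rate=10))
```

In this model a sender that is slower than the receiver never stalls, while a sender that is faster stalls repeatedly even though nothing is short of memory or CPU.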

More details: 
http://videlalvaro.github.io/2013/09/rabbitmq-internals-credit-flow-for-erlang-processes.html
https://www.rabbitmq.com/blog/2015/10/06/new-credit-flow-settings-on-rabbitmq-3-5-5/
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Alex K

Oct 22, 2015, 5:27:40 AM
to rabbitmq-users, kazekoa...@gmail.com
Michael, 

Thanks for useful links!

But:
1. My messages are not persistent. So it seems it's not the message store blocking my channel.
2. I have many (extra) consumers and adding more of them does not let publishers publish faster.
3. I have 1 queue only.
4. Queue state is running but Publishers' Channels are in flow state.

How can modifying credit_flow_default_credit help? It gives you X after you spend X. Let me ask another question: how can I get the maximum out of RabbitMQ on this powerful server? Let's assume I always have more consumers than I need, so queue growth is not a problem for me. How can I publish faster? How can I detect exactly why RabbitMQ blocks the reader and the channel?

Thanks!
Alex

Alvaro Videla

Oct 22, 2015, 5:29:51 AM
to rabbitmq-users, kazekoa...@gmail.com
Paging performance affects both persistent and transient messages. When RabbitMQ is under memory pressure it will page messages to disk, whether they are persistent or not.
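
A quick way to check how close a node is to paging is to compare its memory use with the paging threshold. The sketch below uses the numbers from the `rabbitmqctl status` output earlier in this thread and assumes the paging threshold is `vm_memory_high_watermark_paging_ratio` (0.5 unless configured otherwise) times `vm_memory_limit`. The helper names are made up for illustration, and a status snapshot may not reflect memory use at peak load.

```python
# Values copied from the `rabbitmqctl status` output in the first post.
VM_MEMORY_LIMIT = 16_815_964_160   # bytes, vm_memory_limit
MEMORY_TOTAL = 78_163_824          # bytes, {memory, [{total, ...}]}

# "600Mb" reported under load in the first post, read here as
# megabytes (an assumption about the unit).
REPORTED_UNDER_LOAD = 600 * 10**6

# Assumption: paging starts at this fraction of the memory limit
# (vm_memory_high_watermark_paging_ratio, default 0.5).
PAGING_RATIO = 0.5


def paging_threshold(limit, ratio=PAGING_RATIO):
    """Bytes of memory use at which queues would start paging to disk."""
    return int(limit * ratio)


def fraction_of_threshold(used, limit, ratio=PAGING_RATIO):
    """How far `used` is toward the paging threshold (1.0 = at threshold)."""
    return used / paging_threshold(limit, ratio)


if __name__ == "__main__":
    print("paging threshold (bytes):", paging_threshold(VM_MEMORY_LIMIT))
    for label, used in (("status snapshot", MEMORY_TOTAL),
                        ("reported under load", REPORTED_UNDER_LOAD)):
        share = fraction_of_threshold(used, VM_MEMORY_LIMIT)
        print(f"{label}: {share:.2%} of the paging threshold")
```

If both figures come out as a small fraction of the threshold, memory-driven paging is less likely to be what throttles the publishers here, although that is a hint rather than a conclusion.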

Michael Klishin

Oct 22, 2015, 5:31:49 AM
to rabbitm...@googlegroups.com, Alex K
On 22 Oct 2015 at 12:27:45, Alex K (kazekoa...@gmail.com) wrote:
> 1. My messages are not persistent. So it seems it's not message
> store blocking my channel.

It doesn’t have to be the message store.

> 2. I have many (extra) consumers and adding more of them does not
> let publishers publish faster.

Channels and queues have certain throughput limits. Having more consumers
won’t help with that and internal flow control will still kick in every so often.

> 3. I have 1 queue only.

It doesn’t matter (and, in fact, it is the worst-case scenario, because 1 queue will only use 1 CPU core).

> 4. Queue state is running but Publishers' Channels are in flow
> state.

Publishers send [Erlang] messages to queues. It is *publishers* that will be blocked,
not both sender and receiver (again, I’m talking about Erlang processes here, not RabbitMQ clients).
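
One way to see which side is being throttled is to ask the management plugin's HTTP API for the `state` of each connection, channel and queue. This is a sketch with assumptions: the API is reachable at `localhost:15672` with the `guest` account, each returned object carries `name` and `state` fields, and the helper names are hypothetical. Because flow control can toggle many times a second, a single sample is only a hint; calling `report()` repeatedly gives a better picture.

```python
import base64
import json
import urllib.request


def fetch(path, host="localhost", port=15672,
          user="guest", password="guest"):
    """GET /api/<path> from the management plugin and decode the JSON."""
    req = urllib.request.Request(f"http://{host}:{port}/api/{path}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def names_in_state(items, state="flow"):
    """Names of the objects whose 'state' field equals `state`."""
    return [item.get("name") for item in items
            if item.get("state") == state]


def report():
    """Print which connections, channels and queues report 'flow' now."""
    for kind in ("connections", "channels", "queues"):
        print(kind, names_in_state(fetch(kind)))


if __name__ == "__main__":
    report()
```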

Take a look at this plugin, which in a way trades message ordering for higher parallelism and
total throughput, making internal flow control much less noticeable:
http://github.com/rabbitmq/rabbitmq-sharding 
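
The idea of trading ordering for parallelism can be sketched with a toy partitioner. This is not the plugin's actual routing logic; the CRC32-modulo scheme and the names `shard_for` and `distribute` are assumptions for illustration. Messages with the same routing key always land in the same shard and keep their relative order there, but there is no ordering across shards, and each shard can be backed by its own queue (and so its own CPU core).

```python
import zlib
from collections import defaultdict


def shard_for(routing_key, shard_count):
    """Pick a shard by hashing the routing key (toy scheme: CRC32 modulo)."""
    return zlib.crc32(routing_key.encode("utf-8")) % shard_count


def distribute(routing_keys, shard_count):
    """Assign messages to shards, tagging each with its publish sequence.

    Returns {shard_index: [(sequence_number, routing_key), ...]}.
    """
    shards = defaultdict(list)
    for seq, key in enumerate(routing_keys):
        shards[shard_for(key, shard_count)].append((seq, key))
    return dict(shards)


if __name__ == "__main__":
    keys = [f"sensor.{i % 10}" for i in range(20)]
    for shard, messages in sorted(distribute(keys, 4).items()):
        print(shard, messages)
```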