I'm trying to measure the maximum throughput of my RabbitMQ node. RabbitMQ puts connections and channels into the 'flow' state even though there is plenty of RAM and CPU available and the consumers are far from saturated. Here is the output of rabbitmqctl status:
[{pid,11925},
{running_applications,
[{rabbitmq_tracing,"RabbitMQ message logging / tracing","3.5.6"},
{rabbitmq_management,"RabbitMQ Management Console","3.5.6"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.5.6"},
{webmachine,"webmachine","1.10.3-rmq3.5.6-gite9359c7"},
{mochiweb,"MochiMedia Web Server","2.7.0-rmq3.5.6-git680dba8"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.5.6"},
{rabbit,"RabbitMQ","3.5.6"},
{os_mon,"CPO CXC 138 46","2.3"},
{inets,"INETS CXC 138 49","5.10.4"},
{mnesia,"MNESIA CXC 138 12","4.12.4"},
{amqp_client,"RabbitMQ AMQP Client","3.5.6"},
{xmerl,"XML parser","1.3.7"},
{sasl,"SASL CXC 138 11","2.4.1"},
{stdlib,"ERTS CXC 138 10","2.3"},
{kernel,"ERTS CXC 138 10","3.1"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 17 [erts-6.3] [source] [64-bit] [smp:24:24] [async-threads:900] [kernel-poll:true]\n"},
{memory,
[{total,78163824},
{connection_readers,337600},
{connection_writers,76648},
{connection_channels,444360},
{connection_other,795224},
{queue_procs,1584376},
{queue_slave_procs,0},
{plugins,961168},
{other_proc,14590240},
{mnesia,66432},
{mgmt_db,766520},
{msg_index,46856},
{other_ets,1169176},
{binary,8081840},
{code,20243042},
{atom,711569},
{other_system,28288773}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
{vm_memory_high_watermark,0.5},
{disk_free_limit,50000000},
{disk_free,893252980736},
{file_descriptors,
[{total_limit,249900},
{total_used,28},
{sockets_limit,224908},
{sockets_used,26}]},
{processes,[{limit,1048576},{used,494}]},
{run_queue,0},
{uptime,10781}]
Server specs: CentOS, 24 cores (E5-... @ 2.00 GHz), 32 GB RAM. Tests are performed over the local network; the RabbitMQ server's network bandwidth is 10 Gb/s.
My sample message size is ~900 bytes. I have only one queue, with 'topic' routing. The queue is durable, but messages are not persistent. Consumers simply read each message and send an ack. Producers publish as many messages as they can in an endless loop.
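For reference, here is roughly what my producer and consumer loops look like (a minimal pika sketch; the host, exchange, queue and routing-key names are placeholders, not my real ones):

import pika

params = pika.ConnectionParameters(host="rabbit-host")  # placeholder host

def produce():
    # Publish non-persistent ~900-byte messages in an endless loop.
    conn = pika.BlockingConnection(params)
    ch = conn.channel()
    ch.exchange_declare(exchange="bench", exchange_type="topic")
    body = b"x" * 900
    props = pika.BasicProperties(delivery_mode=1)  # 1 = non-persistent
    while True:
        ch.basic_publish(exchange="bench", routing_key="bench.msg",
                         body=body, properties=props)

def consume():
    # Read each message and ack it, nothing else.
    conn = pika.BlockingConnection(params)
    ch = conn.channel()
    ch.queue_declare(queue="bench-q", durable=True)  # durable queue
    ch.queue_bind(queue="bench-q", exchange="bench", routing_key="bench.#")
    def on_message(channel, method, properties, body):
        channel.basic_ack(delivery_tag=method.delivery_tag)
    ch.basic_consume(queue="bench-q", on_message_callback=on_message)
    ch.start_consuming()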
1. The queue is in the 'running' state. Consumer utilization is 100%.
2. Connections and channels are in the 'flow' state.
Ignore the prefetch value in the screenshot: I tested it with 0, 1, 10 and 1000. The values 0, 10 and 1000 behaved the same, while setting it to 1 made the total max throughput slightly worse.
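(By prefetch I mean the per-channel basic.qos setting; in the pika sketch above it would be set on the consumer channel before basic_consume, with the count being whichever value I was testing:

ch.basic_qos(prefetch_count=1000)  # prefetch_count=0 means unlimited
)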
- CPU load: ~20%; half of the cores were completely idle.
Adding more consumers doesn't help, and adding more producers doesn't help either, because the connections are already in 'flow'.
What else can I measure or tune so that the load spreads across the CPU and all of its cores? I'm happy to share more info to help pinpoint the bottleneck.
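In case it helps, this is how I'm watching the flow state, via the management plugin's HTTP API (a sketch; port 15672 and the guest/guest credentials are the defaults and are assumptions here). The same information is available from rabbitmqctl list_connections name state.

import requests

# Poll the management API for per-connection state (running/flow/blocked).
resp = requests.get("http://localhost:15672/api/connections",
                    auth=("guest", "guest"))
resp.raise_for_status()
for conn in resp.json():
    print(conn["name"], conn.get("state"))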