Hi RabbitMQ community,
We run a three-node RabbitMQ cluster, and for some time now we have been seeing the following warning quite often:
[warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> [{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},{message_queue_len,4}] {#Port<0.16>,unknown}
We checked the cluster, and there were no node failures or other issues at the times the warnings were logged.
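To line these warnings up with metrics, the fields in such a line can be pulled out with a short script. This is only a minimal sketch in Python, assuming the line layout matches the sample above. The function name parse_busy_dist_port and the regular expressions are illustrative and not part of RabbitMQ, and other RabbitMQ or Erlang versions may format the line differently.

```python
import re

# Pattern for the rabbit_sysmon_handler busy_dist_port line shown above.
# Assumption: the layout matches the sample warning exactly.
BUSY_DIST_PORT = re.compile(
    r"rabbit_sysmon_handler busy_dist_port "
    r"(?P<pid><[\d.]+>) "                     # process reported in the warning
    r"\[(?P<info>.*)\] "                      # process info list
    r"\{(?P<port>#Port<[\d.]+>),(?P<tag>[^}]*)\}"  # port and second tuple element
)
QUEUE_LEN = re.compile(r"\{message_queue_len,(\d+)\}")


def parse_busy_dist_port(line):
    """Return the fields of a busy_dist_port warning, or None if no match."""
    match = BUSY_DIST_PORT.search(line)
    if match is None:
        return None
    queue_len = QUEUE_LEN.search(match.group("info"))
    return {
        "pid": match.group("pid"),
        "port": match.group("port"),
        "tag": match.group("tag"),  # "unknown" in the sample line
        "message_queue_len": int(queue_len.group(1)) if queue_len else None,
    }


sample = (
    "[warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> "
    "[{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},"
    "{message_queue_len,4}] {#Port<0.16>,unknown}"
)
print(parse_busy_dist_port(sample))
```

Counting the parsed entries per port or per minute would make it easier to see whether the warnings cluster around particular times.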
Yesterday I installed the Grafana distribution monitoring and was able to capture some data when one of these warnings occurred.
The warning was logged at 16:38, and at that time I see a spike in both the data buffered in the distribution queue and the data in the port driver buffer.
I would like to get a better understanding of what is going on:
- Is the warning caused by the ~8 KB of data in the port driver buffer or by the ~30 KB of data in the distribution queue?
- What are the differences between these two metrics?
- Can we tune anything besides RABBITMQ_DISTRIBUTION_BUFFER_SIZE?
- What implications can such spikes have for the performance and stability of a RabbitMQ cluster?
RabbitMQ version: 4.1.2 (also reproducible with the latest 4.2)
Erlang version: 27.3.4.1
Thank you,
Radu.