Question about distribution warning

Radu Marian

Dec 3, 2025, 8:49:10 AM
to rabbitmq-users
Hi RabbitMQ community,

We run a three-node RabbitMQ cluster, and for some time we have been getting this warning quite often:

 [warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> [{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},{message_queue_len,4}] {#Port<0.16>,unknown}

We checked the cluster and there were no node failures or other issues at those times.
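
To count these warnings or line them up with other data, the fields in the warning line can be extracted with a small script. This is only a minimal sketch: the regular expression is inferred from the single sample line quoted above, not from a documented log format, and the name `port_info` for the last field is a label chosen for illustration.

```python
import re

# Pattern inferred from the one busy_dist_port line quoted in this thread;
# it is an assumption and may need adjusting for other log layouts.
BUSY_DIST_PORT = re.compile(
    r"rabbit_sysmon_handler busy_dist_port "
    r"(?P<pid><[\d.]+>) "
    r"\[\{initial_call,\{(?P<module>\w+),(?P<function>\w+),(?P<arity>\d+)\}\}"
    r".*?\{message_queue_len,(?P<queue_len>\d+)\}\] "
    r"\{(?P<port>#Port<[\d.]+>),(?P<port_info>[^}]+)\}"
)


def parse_busy_dist_port(line):
    """Return the fields of a busy_dist_port warning, or None if no match."""
    m = BUSY_DIST_PORT.search(line)
    if m is None:
        return None
    return {
        # Process that was suspended while sending over the distribution port
        "pid": m.group("pid"),
        "initial_call": "{}:{}/{}".format(
            m.group("module"), m.group("function"), m.group("arity")
        ),
        "message_queue_len": int(m.group("queue_len")),
        "port": m.group("port"),
        # Second element of the trailing tuple ("unknown" in the sample)
        "port_info": m.group("port_info"),
    }


sample = (
    "[warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> "
    "[{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},"
    "{message_queue_len,4}] {#Port<0.16>,unknown}"
)
parsed = parse_busy_dist_port(sample)
print(parsed)
```

Run over a whole log file, this makes it easy to see whether the warnings always come from `rabbit_channel` processes and the same port.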

We increased the distribution buffer to 1 GB (as instructed in https://www.rabbitmq.com/docs/runtime#distribution-buffer), but unfortunately the warnings did not go away.
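
For reference, a small sketch of how a 1 GB setting translates into the value for the variable. It assumes the value is given in kilobytes, as the linked page describes, and that the accepted range is that of the Erlang `+zdbbl` flag (1 to 2097151 kB); the helper name `buffer_size_kb` is hypothetical.

```python
# Valid range of the Erlang +zdbbl flag, in kilobytes.
ZDBBL_MIN_KB = 1
ZDBBL_MAX_KB = 2_097_151


def buffer_size_kb(gigabytes):
    """Convert a size in GB (binary, 1 GB = 1024 * 1024 kB) to kilobytes."""
    kb = int(gigabytes * 1024 * 1024)
    if not ZDBBL_MIN_KB <= kb <= ZDBBL_MAX_KB:
        raise ValueError(f"{kb} kB is outside the +zdbbl range")
    return kb


one_gb = buffer_size_kb(1)
print(f"RABBITMQ_DISTRIBUTION_BUFFER_SIZE={one_gb}")
```

Note that under these assumptions 2 GB is already just outside the allowed range, so 1 GB is close to the ceiling for this setting.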

Yesterday I installed the Grafana distribution monitoring dashboard and was able to capture some data at the moment one of these warnings occurred:

[Attachment: op-event-bus-3-distribution-stats.png]

The warning happened at 16:38, and at that time I see a spike in both the data buffered in the distribution queue and the data in the port driver buffer.

I would like to get a better understanding of what is going on:
  1. Is the warning caused by the ~8 KB of data in the port driver or the ~30 KB of data in the distribution queue?
  2. What are the differences between these two metrics?
  3. Can we tune anything else besides RABBITMQ_DISTRIBUTION_BUFFER_SIZE?
  4. What implications can such spikes have on the performance and stability of a RabbitMQ cluster?

RabbitMQ version: 4.1.2 (also reproducible with the latest 4.2)
Erlang version: 27.3.4.1

Thank you,
Radu.

jo...@cloudamqp.com

Dec 10, 2025, 3:16:14 PM
to rabbitmq-users
Hi,

Are you using quorum queues, classic queues, both, or something else?
Most likely you are occasionally sending some big messages that need to traverse the distribution link. You can check this with the message size histogram in the Prometheus stats.
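
A minimal sketch of how such a check could look, working on scraped text in the Prometheus exposition format. The metric name `rabbitmq_message_size_bytes` and every number in `SAMPLE` are assumptions made up for illustration; check the actual name and buckets in your own metrics output. The sketch also assumes a single label set per metric.

```python
import re

# Illustrative scrape text only: metric name and counts are invented.
SAMPLE = """\
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100"} 120
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000"} 450
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000"} 498
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="+Inf"} 500
"""

BUCKET = re.compile(r'^(?P<name>\w+)_bucket\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LE = re.compile(r'le="([^"]+)"')


def cumulative_buckets(text, metric):
    """Return sorted (upper_bound, cumulative_count) pairs for one histogram."""
    buckets = []
    for line in text.splitlines():
        m = BUCKET.match(line.strip())
        if not m or m.group("name") != metric:
            continue
        le = LE.search(m.group("labels"))
        if le:
            # float("+Inf") parses to infinity, so the last bucket sorts last
            buckets.append((float(le.group(1)), float(m.group("value"))))
    return sorted(buckets)


def messages_larger_than(buckets, limit):
    """Count observations above `limit`, which should be a bucket boundary."""
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    below = max((count for bound, count in buckets if bound <= limit), default=0.0)
    return total - below


buckets = cumulative_buckets(SAMPLE, "rabbitmq_message_size_bytes")
large = messages_larger_than(buckets, 10000)
print(f"messages larger than 10000 bytes: {large:.0f}")
```

If the count of large messages grows around the time of a warning, that supports the big-message explanation.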

Q: What implications can such spikes have on the performance and stability of a RabbitMQ cluster?
If there is no more data than what your plot shows, the effect is usually just higher latency.

/Johan