Question about distribution warning

Radu Marian

Dec 3, 2025, 8:49:10 AM
to rabbitmq-users
Hi RabbitMQ community,

We run a three-node RabbitMQ cluster, and for some time now we have been seeing the following warning quite often:

 [warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> [{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},{message_queue_len,4}] {#Port<0.16>,unknown}

We checked the cluster and there were no node failures or other issues at these times.

We increased the distribution buffer to 1 GB (as instructed in https://www.rabbitmq.com/docs/runtime#distribution-buffer), but sadly that didn't make the warnings go away.

Yesterday I installed the Grafana distribution monitoring and was able to capture some data when such a warning happens:

op-event-bus-3-distribution-stats.png

The warning happened at 16:38 and at that time I see a spike in both the data buffered in the distribution queue and the port driver buffer.

I would like to get a better understanding of what is going on:
  1. Is the warning caused by the ~8 KB of data in the port driver or the ~30 KB of data in the distribution queue?
  2. What are the differences between these two metrics?
  3. Can we tune anything else besides RABBITMQ_DISTRIBUTION_BUFFER_SIZE?
  4. What implications can such spikes have on the performance and stability of a RabbitMQ cluster?

RabbitMQ version: 4.1.2 (but also tested that it is reproducible with the latest 4.2)
Erlang version: 27.3.4.1

Thank you,
Radu.

jo...@cloudamqp.com

Dec 10, 2025, 3:16:14 PM
to rabbitmq-users
Hi,

Are you using quorum queues, classic queues, both, or something else?
Most likely you are occasionally sending some big messages that need to traverse the distribution link. You can check the message size histogram in the Prometheus stats.

Q: What implications can such spikes have on the performance and stability of a RabbitMQ cluster?
If there is no more data than what you are showing in the plot, it is usually only higher latency.

/Johan

Radu Marian

Jan 12, 2026, 4:14:37 AM
to rabbitmq-users
Hi Johan,

We are using a quorum queue with three replicas.

I managed to capture some data about message sizes when it happened again:

Node #1

# HELP rabbitmq_message_size_bytes Size of messages received from publishers
rabbitmq_message_size_bytes_sum{protocol="amqp10"} 0
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100"} 0
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000"} 7888410
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000"} 9463246
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000"} 9463514
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000000"} 9463514
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000000"} 9463514
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="50000000"} 9463514
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000000"} 9463514
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="+Inf"} 9463514
rabbitmq_message_size_bytes_count{protocol="amqp091"} 9463514
rabbitmq_message_size_bytes_sum{protocol="amqp091"} 7974536557

Node #2

# HELP rabbitmq_message_size_bytes Size of messages received from publishers
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100"} 0
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000"} 925443
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000"} 1205169
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000"} 1205218
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000000"} 1205218
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000000"} 1205218
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="50000000"} 1205218
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000000"} 1205218
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="+Inf"} 1205218
rabbitmq_message_size_bytes_count{protocol="amqp091"} 1205218
rabbitmq_message_size_bytes_sum{protocol="amqp091"} 1091645874

Node #3

rabbitmq_message_size_bytes_bucket{protocol="amqp10",le="+Inf"} 0
rabbitmq_message_size_bytes_count{protocol="amqp10"} 0
rabbitmq_message_size_bytes_sum{protocol="amqp10"} 0
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100"} 0
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000"} 6376109
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000"} 7694154
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000"} 7694369
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="1000000"} 7694369
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="10000000"} 7694369
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="50000000"} 7694369
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="100000000"} 7694369
rabbitmq_message_size_bytes_bucket{protocol="amqp091",le="+Inf"} 7694369
rabbitmq_message_size_bytes_count{protocol="amqp091"} 7694369
rabbitmq_message_size_bytes_sum{protocol="amqp091"} 6539963921

Seems we had a few messages between 10KB and 100KB sent over the distribution link at that time. Do you think these could explain the warning?
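
For reference, the histogram buckets above are cumulative, so the number of messages in a given size range is the difference between adjacent buckets. Below is a minimal Python sketch that does this subtraction with the values pasted above. The names `NODES` and `per_bucket` are only illustrative. Note also that these counters normally cover the whole time since the counter was last reset, so tying messages to a specific time window would require diffing two scrapes.

```python
# Cumulative counts for le=100, 1000, 10000, 100000, copied from the output above.
# The larger buckets repeat the le=100000 value, so nothing exceeded 100000 bytes.
BUCKET_BOUNDS = [100, 1000, 10000, 100000]

NODES = {
    "node1": {"cumulative": [0, 7888410, 9463246, 9463514],
              "sum": 7974536557, "count": 9463514},
    "node2": {"cumulative": [0, 925443, 1205169, 1205218],
              "sum": 1091645874, "count": 1205218},
    "node3": {"cumulative": [0, 6376109, 7694154, 7694369],
              "sum": 6539963921, "count": 7694369},
}


def per_bucket(cumulative):
    """Turn cumulative histogram counts into per-bucket counts."""
    counts = []
    previous = 0
    for value in cumulative:
        counts.append(value - previous)
        previous = value
    return counts


for name, data in NODES.items():
    counts = per_bucket(data["cumulative"])
    mean_size = data["sum"] / data["count"]
    print(name, dict(zip(BUCKET_BOUNDS, counts)), f"mean={mean_size:.0f} bytes")
```

With these numbers, the 10000 to 100000 byte range holds 268, 49 and 215 messages on nodes 1, 2 and 3 respectively, out of millions of messages in total.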

Thanks,
Radu.

jo...@cloudamqp.com

Jan 19, 2026, 11:42:21 AM
to rabbitmq-users
Hi,

I don't think those messages are to blame. Maybe the next time it happens you can check the Erlang process info for the rabbit channel process, something like: rabbitmqctl eval 'erlang:process_info(list_to_pid("<0.1805.0>")).' (where 0.1805.0 is from the log message). That might lead to which channel and maybe queue this is happening to.
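
If the warning shows up often, pulling the pid out of the log line by hand gets tedious. Here is a small Python sketch, assuming the log line format shown earlier in this thread, that extracts the suspended process pid and builds the command above. The helper names `suspended_pid` and `eval_command` are made up for illustration. Pids are local to a node, so the resulting command should be run against the node that logged the warning.

```python
import re

# Warning line as posted at the top of this thread.
LOG_LINE = (
    " [warning] <0.192.0> rabbit_sysmon_handler busy_dist_port <0.1805.0> "
    "[{initial_call,{rabbit_channel,init,1}},{erlang,bif_return_trap,2},"
    "{message_queue_len,4}] {#Port<0.16>,unknown}"
)

# The pid printed right after busy_dist_port is the process that was suspended.
PID_PATTERN = re.compile(r"busy_dist_port (<\d+\.\d+\.\d+>)")


def suspended_pid(line):
    """Return the pid following busy_dist_port, or None if the line does not match."""
    match = PID_PATTERN.search(line)
    return match.group(1) if match else None


def eval_command(pid):
    """Build the rabbitmqctl eval command for the given pid string."""
    return f"rabbitmqctl eval 'erlang:process_info(list_to_pid(\"{pid}\")).'"


pid = suspended_pid(LOG_LINE)
if pid is not None:
    print(eval_command(pid))
```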

How many QQs do you have? Do you publish to many of them (fan-out) at once, occasionally? 

/Johan 

Radu Marian

Jan 21, 2026, 4:01:43 AM
to rabbitm...@googlegroups.com
Hi Johan,

The topology we use is quite simple: an exchange with a single subscribed QQ, and many publishers publishing to that exchange over multiple connections.

I have a suspicion that we might have network bottlenecks from time to time on this system.

Does RabbitMQ provide metrics to assess whether communication between cluster nodes is slow?

Thanks,
Radu.

jo...@cloudamqp.com

Jan 23, 2026, 12:39:55 PM
to rabbitmq-users
Hi,
So there is just a single queue on the 3-node cluster?

AFAIK there is no easy way to trace only the latency of the cluster link, but it is easy to trace the latency of publisher confirms and full round-trip latency, see the rabbitmq-perftest [https://perftest.rabbitmq.com/#running-producers-and-consumers-on-different-machines].

/Johan