Network partitioned and erlang_vm_dist_port_queue_size_bytes

Xiaoyan Song

Sep 30, 2025, 7:24:04 AM

to rabbitmq-users

Dear support,

We are running a 3-node cluster with RabbitMQ 3.12.14 and Erlang 25.3.2.21.

All queues are mirrored queues (mode "exactly", count 2).

RABBITMQ_DISTRIBUTION_BUFFER_SIZE=1048576.

When we stop one node for maintenance, the other two nodes become partitioned from each other 1-2 minutes after node0 has been reset and stopped.

On the Prometheus/Grafana dashboard, we observed:

6:29:30: erlang_vm_dist_port_queue_size_bytes > 75k

6:32:00: erlang_vm_dist_node_queue_size_bytes > 200 MB (below the configured limit of 1048576 KB, i.e. 1 GB).

6:30:18: rmq1 and rmq2 logged that rmq0 was down.

6:32:09: rmq1 and rmq2 became network-partitioned from each other.

In my experience, when erlang_vm_dist_node_queue_size_bytes exceeds 1 GB, the cluster almost always ends up in a split-brain state. This time, however, it was only 200 MB.
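
As a rough check of that comparison, here is a minimal Python sketch of the arithmetic. It assumes RABBITMQ_DISTRIBUTION_BUFFER_SIZE is expressed in kilobytes (1 KB = 1024 bytes) and reads "200M" as 200 MiB; the helper names are hypothetical and only for illustration.

```python
# Assumption: RABBITMQ_DISTRIBUTION_BUFFER_SIZE is given in kilobytes
# (1 KB = 1024 bytes), like the Erlang +zdbbl flag it corresponds to.
BUFFER_SIZE_KB = 1048576


def buffer_limit_bytes(size_kb: int) -> int:
    """Convert the configured buffer size from kilobytes to bytes."""
    return size_kb * 1024


def fraction_of_limit(observed_bytes: int, size_kb: int) -> float:
    """Return the observed queue size as a fraction of the configured limit."""
    return observed_bytes / buffer_limit_bytes(size_kb)


limit = buffer_limit_bytes(BUFFER_SIZE_KB)
# Assumption: the "200M" seen on the dashboard is read as 200 MiB.
observed = 200 * 1024 * 1024

print(limit)                                                 # 1073741824 (1 GiB)
print(f"{fraction_of_limit(observed, BUFFER_SIZE_KB):.1%}")  # 19.5%
```

Under these assumptions, the observed queue size was about a fifth of the configured buffer limit when the partition occurred.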

Is there any difference between erlang_vm_dist_port_queue_size_bytes and erlang_vm_dist_node_queue_size_bytes?

What values of these metrics will trigger a network partition when mirrored queues are in use?