Network partition and erlang_vm_dist_port_queue_size_bytes


Xiaoyan Song

Sep 30, 2025, 7:24:04 AM
to rabbitmq-users
Dear support,

We are running a 3-node cluster with RabbitMQ 3.12.14 and Erlang 25.3.2.21.
All queues are mirrored queues (exactly 2).
RABBITMQ_DISTRIBUTION_BUFFER_SIZE=1048576.

When we stop one node for maintenance, the other 2 nodes become network partitioned 1-2 minutes after node0 has been reset and stopped in the cluster.

From the Prometheus/Grafana dashboard, we observed:
6:29:30: erlang_vm_dist_port_queue_size_bytes > 75k
6:32:00: erlang_vm_dist_node_queue_size_bytes > 200M (less than the configured 1048576 KB, i.e. 1 GB)

6:30:18: rmq1 and rmq2 logged that rmq0 was down.
6:32:09: rmq1 and rmq2 became network partitioned.

I find that when erlang_vm_dist_node_queue_size_bytes > 1 GB, the cluster almost always ends up in split-brain. But this time it was only 200 MB.
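
For reference, here is a small sketch of the arithmetic behind that comparison. It assumes RABBITMQ_DISTRIBUTION_BUFFER_SIZE is expressed in kilobytes (which is what the 1 GB figure implies) and takes the Grafana reading of 200M as roughly 200 * 10^6 bytes. The variable names are only illustrative.

```python
# Assumption: RABBITMQ_DISTRIBUTION_BUFFER_SIZE is given in kilobytes.
DIST_BUFFER_SIZE_KB = 1_048_576

# Approximate peak of erlang_vm_dist_node_queue_size_bytes read from Grafana.
observed_queue_bytes = 200 * 10**6

# Convert the configured limit to bytes so both values share a unit.
buffer_limit_bytes = DIST_BUFFER_SIZE_KB * 1024

# Fraction of the configured distribution buffer that the queue occupied.
fill_ratio = observed_queue_bytes / buffer_limit_bytes

print(f"buffer limit: {buffer_limit_bytes} bytes")
print(f"observed queue is {fill_ratio:.1%} of the limit")
```

So the queue was at less than a fifth of the configured limit when the partition happened.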

Is there any difference between erlang_vm_dist_port_queue_size_bytes and erlang_vm_dist_node_queue_size_bytes?

What values of these metrics will trigger a network partition in mirrored queue scenarios?

Many thanks if you could share some related documentation.

BR
xiaoyan

