Hi everyone,
I have a RabbitMQ cluster running version 3.12.13 with Erlang 25.0.4. Each node is equipped with 252GB of memory and 814GB of local disk. Things were running smoothly for a few weeks, but recently we've started seeing a lot of "busy_dist_port" warning messages in the logs, followed by the nodes hitting the vm_memory_high_watermark.
Our monitoring system indicates spikes in internode communication during these times. I'm wondering if there's any specific area I should be tuning to avoid this high memory usage. Below, I've included a snippet of the logs and important configuration details for reference.
Any insights or suggestions would be greatly appreciated!
Logs:
2024-06-28 07:17:34.336865-05:00 [warning] <0.203.0> rabbit_sysmon_handler busy_dist_port <0.14999.114> [{initial_call,{rabbit_mqtt_reader,init,1}},{erlang,bif_return_trap,2},{message_queue_len,0}] {#Port<0.29>,unknown}
2024-06-28 07:17:35.692013-05:00 [warning] <0.203.0> rabbit_sysmon_handler busy_dist_port <0.420.115> [{initial_call,{rabbit_mqtt_reader,init,1}},{erts_internal,dsend_continue_trap,1},{message_queue_len,1}] {#Port<0.29>,unknown}
2024-06-28 07:17:36.448316-05:00 [warning] <0.203.0> rabbit_sysmon_handler busy_dist_port <0.420.115> [{initial_call,{rabbit_mqtt_reader,init,1}},{erts_internal,dsend_continue_trap,1},{message_queue_len,1}] {#Port<0.29>,unknown}
2024-06-28 07:17:37.341571-05:00 [warning] <0.203.0> rabbit_sysmon_handler busy_dist_port <0.420.115> [{initial_call,{rabbit_mqtt_reader,init,1}},{erts_internal,dsend_continue_trap,1},{message_queue_len,1}] {#Port<0.29>,unknown}
2024-06-28 07:17:38.333252-05:00 [warning] <0.203.0> rabbit_sysmon_handler busy_dist_port <0.17932.114> [{initial_call,{rabbit_mqtt_reader,init,1}},{erlang,bif_return_trap,2},{message_queue_len,1}] {#Port<0.29>,unknown}
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0> memory resource limit alarm set on node 'rabbit@<hostname>'.
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0>
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0> **********************************************************
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0> *** Publishers will be blocked until this alarm clears ***
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0> **********************************************************
2024-06-28 07:17:39.099611-05:00 [warning] <0.470.0>
Some of the notable settings are as follows.
Important configurations in RABBITMQ_CONF_ENV_FILE
------------------------------------------------------------------------
# file descriptor
ulimit -n 50000
Important configuration in RABBITMQ_CONFIG_FILE
------------------------------------------------------------------
## Additional network and protocol related configuration
heartbeat = 600
frame_max = 131072
initial_frame_max = 4096
channel_max = 128
## Customising TCP Listener (Socket) Configuration.
tcp_listen_options.backlog = 128
tcp_listen_options.nodelay = false
tcp_listen_options.exit_on_close = false
tcp_listen_options.buffer = 3872198
tcp_listen_options.sndbuf = 3872198
tcp_listen_options.recbuf = 3872198
vm_memory_high_watermark.relative = 0.8
vm_memory_high_watermark_paging_ratio = 0.75
memory_monitor_interval = 2500
disk_free_limit.absolute = 50MB
Hi Johan,
I appreciate your response. I will increase the file descriptor limit and the distribution buffer size.
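For the record, this is roughly what I plan to change (a sketch only; RABBITMQ_DISTRIBUTION_BUFFER_SIZE is in kilobytes, and the exact values below are placeholders we still need to settle on):
# in RABBITMQ_CONF_ENV_FILE
# raise the file descriptor limit (currently 50000)
ulimit -n 65536
# raise the outgoing inter-node distribution buffer limit (in kB; default is 128000, i.e. ~128 MB)
RABBITMQ_DISTRIBUTION_BUFFER_SIZE=192000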
It looks like we were sending big messages (larger than 1 GiB) during the time when we experienced the issue.
We also noticed what looks like a memory leak during that time. Attached is the Erlang memory allocator graph for that time frame (when the node reached the vm_memory_high_watermark); you can see that “eheap_alloc” reached ~350 GB around 7:14 and never really released all of that memory. I am also attaching a snippet of the erl_crash.dump to the ticket. We are using v3.12.13 and Erlang v25.0.4.
On a side note, is there any way to prevent or throttle publishers from sending large messages?
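One thing we are considering on our end, in case there is nothing better, is capping the message size at the broker (a sketch only; the 1 MiB limit is just an example value, and I am assuming max_message_size also applies to MQTT publishes):
# in RABBITMQ_CONFIG_FILE
# reject any message larger than 1 MiB (the server default is much higher)
max_message_size = 1048576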
Thanks
Hi Johan,
The actual message size was less than 1 MB (apologies for the incorrect size, I was referring to the cumulative message size).
We had been using older versions of RabbitMQ (v3.8.3 and v3.11.13) for a long time without any issues; on those versions we were using "classic" queues.
However, we recently upgraded to RabbitMQ version 3.12.13 (with Erlang v25.0.4) and started using "rabbit_mqtt_qos0_queue" queues. We currently have around 1,000 queues, and our clients use MQTT connections.
Two issues I have noticed with v3.12.13 are:
I was able to reproduce the memory spike issue by publishing 1 MB messages to a single topic from 200 publishers every 0.1 seconds to one node while subscribing to the same topic from another node. Everything seems to be fine as long as the messages sent over the topic are being subscribed to. However, when I shut down the subscribers and continue to publish messages at the same rate, the memory spike occurs.
I am attaching the publisher/subscriber Python scripts that I used to reproduce the issue, along with the commands to run them (a simplified sketch of the publisher is also included further below):
python publisher.py -c <cluster_name> -n <cluster_node_1> -t test_topic -p 200
python subscriber.py -c <cluster_name> -n <cluster_node_2> -t test_topic
Note: You need the "paho.mqtt.client" package, and the <cluster_name>_USERNAME, <cluster_name>_PASSWORD, and <cluster_name>_VHOST environment variables set, in order to run the publisher/subscriber scripts.
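For quick reference, here is a simplified sketch of what publisher.py does (not the exact attached script; it assumes paho-mqtt 1.x, and the host/credential names are placeholders standing in for the CLI arguments and environment variables mentioned above):

import json
import os
import time

import paho.mqtt.client as mqtt

# Placeholders; the real script resolves these from its CLI arguments and the
# <cluster_name>_USERNAME / _PASSWORD / _VHOST environment variables.
HOST = os.environ.get("MQTT_HOST", "localhost")
USERNAME = os.environ.get("MQTT_USERNAME", "guest")  # RabbitMQ's MQTT plugin accepts "vhost:user" here
PASSWORD = os.environ.get("MQTT_PASSWORD", "guest")
TOPIC = "test_topic"
PAYLOAD = "x" * (1024 * 1024)  # ~1 MB of padding per message
INTERVAL = 0.1                 # publish every 0.1 seconds

client = mqtt.Client()         # paho-mqtt 1.x constructor
client.username_pw_set(USERNAME, PASSWORD)
client.connect(HOST, 1883)
client.loop_start()

index = 0
while True:
    # Embed an increasing index so the subscriber can verify ordering.
    client.publish(TOPIC, json.dumps({"index": index, "data": PAYLOAD}), qos=0)
    index += 1
    time.sleep(INTERVAL)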
Thanks
Roy
Hi Luke,
Thanks for your response and I appreciate any help in rectifying this issue. Each publisher in my test sends a message payload with an “index” embedded into the message, and the subscriber checks for the order of this “index” while consuming this message. We are noticing out-of-order indexes being received on the subscriber side when we spin up 200 publishers sending messages of size ~1MB every 0.1 seconds. The number of publishers, message size, and frequency of publishing were set to those numbers to replicate our production usage. Does RabbitMQ guarantee the order of messages?
As for the memory spike, we are noticing memory spikes from time to time in production, especially when a large number of clients send messages to a single topic which is then consumed by a single subscriber. One way we could reproduce the same spike in our test environment was to stop the subscriber for a brief period while the publishers continued to publish messages. The memory spike eventually reaches the “vm_memory_high_watermark” and all the publishers get blocked. vm_memory_high_watermark.relative is currently set to 0.8, and roughly 80% (~250 GB out of ~256 GB) of memory was in use before the publishers were blocked.
We tried to collect additional data about the memory usage, and it seems like Erlang’s “eheap_alloc” (the process heap allocator) is the one consuming all the memory when the spike occurs, and it looks like the memory is not being released afterward.
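If it would help, the next time the spike occurs we can also capture a per-category snapshot from the CLI on the affected node (assuming the standard diagnostics tooling is available there), e.g.:
rabbitmq-diagnostics memory_breakdown --unit MB -n rabbit@<hostname>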
I am attaching screenshots of RAM and Network usage from two different nodes in production (files: node1_ram_network_usage.png and node2_ram_network_usage.png). The subscriber was initially connected to node1, and then it reconnected to node2 when node1 reached the “vm_memory_high_watermark”, eventually causing node2 to also reach the “vm_memory_high_watermark”.
I am also attaching screenshots of Erlang-memory-allocator charts from node1 and node2 (files: node1_erlang_memory_allocator.png and node2_erlang_memory_allocator.png), which show the breakdown of Erlang's memory usage when we experienced the memory spike.
Please let me know if you need any additional information.
Thanks,
Roy
How could RabbitMQ ensure message order when 200 publishers are all working in parallel? There is no guarantee of message order going to RabbitMQ.
Hi Luke,
I have attached the config files as requested.
Regarding the subscriber process, we do fully stop the subscriber and then restart the process, effectively making a new connection using a different port. We haven’t checked if the associated rabbit_mqtt_qos0_queue queue was deleted during the stop/start of the subscriber.
Additionally, I wanted to mention that the mailbox_soft_limit is set to 200, which is the default value. According to the documentation, this setting should provide some overload protection in large fan-in scenarios by avoiding high memory usage. Unfortunately, this doesn’t seem to be the case in our situation.
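For completeness, this is the setting as we have it, i.e. the default value (shown only for reference; I am assuming the key lives under the MQTT plugin configuration):
# in RABBITMQ_CONFIG_FILE (default, not overridden by us)
mqtt.mailbox_soft_limit = 200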
Thank you for your assistance
Thanks
Hi Luke,
Thank you for addressing the reported issue. Could you please let me know if this fix will be included in the next v3.12.x release or in the v3.13.x release?
Thanks
Roy