High memory usage with a small number of messages/queues


Dmitry Bulgak

May 8, 2021, 8:44:41 AM
to rabbitmq-users
I've been using RabbitMQ to decouple external requests that go through an Apache web server from my PHP application. The load on Rabbit is quite low, with a total of 8 queues and about 100 msg/s, but there are periodic request spikes several times a day. These spikes may reach 1k connections per second and usually last about 20 minutes.

The problem was that Prometheus couldn't collect Rabbit status during these spikes, so it looked like the service was down. I checked the number of ephemeral ports, CPU load, and so on; everything was fine. Since I had a pretty old version of RabbitMQ for CentOS 7, I decided to upgrade to 3.8.14 (Erlang/OTP 23). However, this didn't solve the problem at all, except that now I have a memory issue too. With only about 100 messages total across all queues, RabbitMQ may use up to 11 GiB of RAM. Memory usage grows after each spike. If I restart the node, it shrinks back down to 200 MiB.
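To see which memory category grows after each spike, the per-category breakdown can be snapshotted around spike windows (a sketch, not from the original post; "rabbitmq-diagnostics memory_breakdown" ships with 3.8, and the log path is arbitrary):

```shell
# Append a timestamped per-category memory breakdown to a log, e.g. from
# cron, so growth after each spike can be attributed (binary vs other_proc
# vs queue_procs, etc.).
{ date; rabbitmq-diagnostics -q memory_breakdown --unit mb; } >> /tmp/rmq-mem.log
```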

So I have two problems:
- RabbitMQ uses too much memory after request spikes and doesn't free it up. As can be seen from the debug output below, other_proc at 4.3166 gb (48.89 %) and binary at 3.6104 gb (40.89 %) are the top consumers.
- during these spikes, "rabbitmqctl status" may take up to 20 seconds to respond, which makes the monitoring system think the service is down.
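Because "rabbitmqctl status" gathers a full node report, a lighter liveness probe may be a better fit for the monitoring check (a sketch, assuming the stock CLI tools on the node; adjust the timeout to taste):

```shell
# Cheap liveness probes, much lighter than a full `status` report:
# `ping` only checks that the node responds to the CLI, while
# `check_running` also confirms the rabbit application has booted.
# Both exit non-zero on failure, so they suit monitoring hooks directly.
rabbitmq-diagnostics -q ping --timeout 5
rabbitmq-diagnostics -q check_running --timeout 5
```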

I've read the RabbitMQ troubleshooting documentation and tried the
/usr/sbin/rabbitmqctl eval 'recon:bin_leak(10).' && /usr/sbin/rabbitmqctl force_gc && rabbitmqctl eval 'rabbit_mgmt_storage:reset().'
command to force GC to reclaim the memory, but it hardly helps.
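For reference, the same three steps run separately, so the effect of each can be observed on its own (a sketch of the chained command above; recon ships with RabbitMQ):

```shell
# 1. GC processes holding reference-counted binaries and report the 10
#    processes whose binary usage shrank the most (helps spot a binary leak).
rabbitmqctl eval 'recon:bin_leak(10).'
# 2. Force a garbage collection on every Erlang process on the node.
rabbitmqctl force_gc
# 3. Clear the management plugin's aggregated stats store.
rabbitmqctl eval 'rabbit_mgmt_storage:reset().'
```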

So I'm looking for advice. It would be great if anybody who has solved the same problem could shed some light on this.

Thanks in advance.

You can find the debug outputs below.

---

# CentOS Linux release 7.9.2009 (Core)

# uname -a
Linux node 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# free -m
              total        used        free      shared  buff/cache   available
Mem:         128770       57306        6155       10491       65308       60201
Swap:             0           0           0

# rabbitmq.conf
log.file.level = error

tcp_listen_options.backlog = 128
tcp_listen_options.nodelay = true
tcp_listen_options.linger.on = true
tcp_listen_options.linger.timeout = 0
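Since the growth tracks connection churn and the management plugin aggregates per-connection stats, trimming its stats collection may also be worth a try (a sketch, not from the original post; both are documented rabbitmq.conf keys, interval in milliseconds):

```ini
# Drop the message-rate time series kept by the management plugin and emit
# stats less often; both reduce memory pressure under connection churn.
management.rates_mode = none
collect_statistics_interval = 30000
```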

# rabbitmq status
warning: the VM is running with native name encoding of latin1 which may cause Elixir to malfunction as it expects utf8. Please ensure your locale is set to UTF-8 (which can be verified by running "locale" in your shell)
Status of node rabbit@node ...
Runtime

OS PID: 28315
OS: Linux
Uptime (seconds): 110749
Is under maintenance?: false
RabbitMQ version: 3.8.14
Node name: rabbit@node
Erlang configuration: Erlang/OTP 23 [erts-11.2] [source] [64-bit] [smp:32:32] [ds:32:32:10] [async-threads:1] [hipe]
Erlang processes: 602 used, 1048576 limit
Scheduler run queue: 1
Cluster heartbeat timeout (net_ticktime): 60

Plugins

Enabled plugin file: /etc/rabbitmq/enabled_plugins
Enabled plugins:

 * rabbitmq_management
 * amqp_client
 * rabbitmq_web_dispatch
 * cowboy
 * cowlib
 * rabbitmq_management_agent

Data directory

Node data directory: /var/lib/rabbitmq/mnesia/rabbit@node
Raft data directory: /var/lib/rabbitmq/mnesia/rabbit@node/quorum/rabbit@node

Config files

 * /etc/rabbitmq/rabbitmq.conf

Log file(s)

 * /var/log/rabbitmq/rab...@node.log
 * /var/log/rabbitmq/rabbit@node_upgrade.log

Alarms

(none)

Memory

Total memory used: 7.0288 gb
Calculation strategy: rss
Memory high watermark setting: 0.4 of available memory, computed to: 54.0104 gb

other_proc: 4.3166 gb (48.89 %)
binary: 3.6104 gb (40.89 %)
other_system: 0.8488 gb (9.61 %)
code: 0.0283 gb (0.32 %)
other_ets: 0.0057 gb (0.06 %)
plugins: 0.005 gb (0.06 %)
mnesia: 0.0044 gb (0.05 %)
mgmt_db: 0.0033 gb (0.04 %)
queue_procs: 0.002 gb (0.02 %)
atom: 0.0015 gb (0.02 %)
metrics: 0.0008 gb (0.01 %)
connection_channels: 0.0005 gb (0.01 %)
connection_writers: 0.0005 gb (0.01 %)
connection_other: 0.0004 gb (0.0 %)
connection_readers: 0.0002 gb (0.0 %)
quorum_ets: 0.0 gb (0.0 %)
msg_index: 0.0 gb (0.0 %)
allocated_unused: 0.0 gb (0.0 %)
queue_slave_procs: 0.0 gb (0.0 %)
quorum_queue_procs: 0.0 gb (0.0 %)
reserved_unallocated: 0.0 gb (0.0 %)

File Descriptors

Total: 18, limit: 32671
Sockets: 8, limit: 29401

Free Disk Space

Low free disk space watermark: 0.05 gb
Free disk space: 100.0893 gb

Totals

Connection count: 64
Queue count: 8
Virtual host count: 1

Listeners

Interface: [::], port: 15672, protocol: http, purpose: HTTP API
Interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0

real    0m13.190s
user    0m0.531s
sys     0m0.227s

Michal Kuratczyk

May 8, 2021, 10:55:40 AM
to rabbitm...@googlegroups.com
Hi,

We've recently seen memory issues when running on a very old Linux kernel. I'd recommend starting by upgrading your OS (or trying to reproduce the issue on a modern system). You can read the full details here, although the exact symptoms differed: https://github.com/rabbitmq/rabbitmq-server/discussions/2785.

I'm not sure this will solve your original problem, but at least the new one will likely go away.

Best,



--
Michał
RabbitMQ team