Memory consumption too high - queues


HGN

Jun 15, 2023, 4:31:51 AM
to rabbitmq-users
Hello,
I have a problem with huge memory consumption of RMQ (3.11.18-1) on Debian (v11) using esl-erlang (1:25.0.4-1). I suspect old queue indexes, because most of the memory is consumed by queue_procs. See:

# rabbitmq-diagnostics memory_breakdown
Reporting memory breakdown on node rabbit@tradepit-connector...
queue_procs: 2.2467 gb (89.8%)
reserved_unallocated: 0.1064 gb (4.25%)
code: 0.0359 gb (1.44%)
binary: 0.0293 gb (1.17%)
other_proc: 0.0232 gb (0.93%)
allocated_unused: 0.0146 gb (0.58%)
other_system: 0.0146 gb (0.58%)
connection_channels: 0.0099 gb (0.4%)
plugins: 0.0057 gb (0.23%)
connection_writers: 0.0055 gb (0.22%)
other_ets: 0.0034 gb (0.14%)
msg_index: 0.0018 gb (0.07%)
connection_other: 0.0015 gb (0.06%)
atom: 0.0015 gb (0.06%)
mgmt_db: 0.0007 gb (0.03%)
metrics: 0.0007 gb (0.03%)
mnesia: 0.0003 gb (0.01%)
connection_readers: 0.0001 gb (0.0%)
quorum_ets: 0.0 gb (0.0%)
quorum_queue_procs: 0.0 gb (0.0%)
quorum_queue_dlx_procs: 0.0 gb (0.0%)
stream_queue_procs: 0.0 gb (0.0%)
stream_queue_replica_reader_procs: 0.0 gb (0.0%)
queue_slave_procs: 0.0 gb (0.0%)
stream_queue_coordinator_procs: 0.0 gb (0.0%)


What's weird is that there are not many queues, and not many messages either. Is there a way to force RMQ to "clean" old (residual) data in the Mnesia DB? I've attached two screenshots with an overview of RMQ and an overview of _all_ queues in RMQ. You can see there is no big traffic or an abnormally huge number of connections, queues, or messages.
Since the (virtual) machine has only 4 GB of memory, this is quite a problem.

How can I lower the consumption? Thank you.

Best Regards
Marek

memprob-01_2023-06-15_10-27-41.png
memprob-02_2023-06-15_10-28-27.png

Michal Kuratczyk

Jun 15, 2023, 4:57:19 AM
to rabbitm...@googlegroups.com
Hi,

Please take screenshots of:
1. rabbitmq-diagnostics observer -> then "m" and ENTER (it will sort processes by memory)
2. then, still in observer, "P" to show details of classic queues

The publish rate was much higher early on; can you show the memory usage for the whole period as well?
What's the message size?
Are messages expiring right now?

rabbitmqctl force_gc could release some memory, but not necessarily, and if it does, it will be harder to continue investigating.

General suggestions:
* make the queues lazy
* switch to classic queues version 2 (https://rabbitmq.com/persistence-conf.html#queue-version; you can migrate them with a queue-version=2 policy, as sketched below)
* upgrade to 3.12 (all classic queues are lazy-like in 3.12), ideally moving to version 2 on 3.12
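
A sketch of applying the first two suggestions with a single policy (the policy name "cq-tuning" is just an example, ".*" matches all queues in the vhost, and both keys go into one definition because only one policy applies to a queue at a time):

$ rabbitmqctl set_policy cq-tuning ".*" '{"queue-mode":"lazy","queue-version":2}' --apply-to classic_queues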

3.12 with classic queues v2 is the "state of the art" in terms of what we can offer. If you upgrade to it and still have the problem (it'd need to happen again),
then it's definitely worth investigating.

Best,

--
Michał
RabbitMQ team

Marek Cermak

Jun 15, 2023, 5:38:43 AM
to rabbitm...@googlegroups.com
Hello Michal,
attached are both screenshots you asked for.

The initial publish rate is much higher due to a restart of RMQ. I tried restarting to "clean" the Mnesia DB, but RMQ "stabilized" at the same values, so it did not help. The message size is around 1 kB. Messages are expiring according to policy, and I must say that connections don't seem to be interrupted or limited.

I tried rabbitmqctl force_gc before, but it did not help much. The problem is probably in the Mnesia tables written to disk, which are read into memory and taken as "valid", though I'd guess they are not valid/used anymore.

Actually, I tried to upgrade RMQ to v3.12, but it refused to start and I was not able to find the reason. Downgrading to 3.11 fixed the startup problem but brought the memory problem.

----- systemd trying to start v3.12:

Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.021562+00:00 [error] <0.231.0> Feature flags: `classic_queue_type_delivery_support`: required feature flag not enabled! It must be enabled before upgrading RabbitMQ.
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.034233+00:00 [error] <0.231.0> Failed to initialize feature flags registry: {disabled_required_feature_flag,
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.034233+00:00 [error] <0.231.0>                                               classic_queue_type_delivery_support}
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.051709+00:00 [error] <0.231.0>
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.051709+00:00 [error] <0.231.0> BOOT FAILED
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.051709+00:00 [error] <0.231.0> ===========
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.051709+00:00 [error] <0.231.0> Error during startup: {error,failed_to_initialize_feature_flags_registry}
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:52.051709+00:00 [error] <0.231.0>
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: BOOT FAILED
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: ===========
Jun 14 13:25:52 tradepit-connector rabbitmq-server[3664]: Error during startup: {error,failed_to_initialize_feature_flags_registry}
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>   crasher:
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     initial call: application_master:init/4
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     pid: <0.230.0>
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     registered_name: []
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     exception exit: {failed_to_initialize_feature_flags_registry,
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>                         {rabbit,start,[normal,[]]}}
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>       in function  application_master:init/4 (application_master.erl, line 142)
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     ancestors: [<0.229.0>]
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     message_queue_len: 1
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     messages: [{'EXIT',<0.231.0>,normal}]
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     links: [<0.229.0>,<0.44.0>]
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     dictionary: []
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     trap_exit: true
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     status: running
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     heap_size: 233
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     stack_size: 28
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>     reductions: 168
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>   neighbours:
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.053580+00:00 [error] <0.230.0>
Jun 14 13:25:53 tradepit-connector rabbitmq-server[3664]: 2023-06-14 13:25:53.061762+00:00 [notice] <0.44.0> Application rabbit exited with reason: {failed_to_initialize_feature_flags_registry,{rabbit,start,[normal,[]]}}
Jun 14 13:25:54 tradepit-connector rabbitmq-server[3664]: {"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{failed_to_initialize_feature_flags_registry,{rabbit,start,[normal,[]]}}}"}
Jun 14 13:25:54 tradepit-connector rabbitmq-server[3664]: Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{failed_to_initialize_feature_flags_registry,{rabbit,start,[normal,[]]}}})
Jun 14 13:25:54 tradepit-connector rabbitmq-server[3664]:
Jun 14 13:25:54 tradepit-connector rabbitmq-server[3664]: Crash dump is being written to: erl_crash.dump...done
Jun 14 13:25:54 tradepit-connector systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=1/FAILURE
Jun 14 13:25:54 tradepit-connector systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.
Jun 14 13:25:54 tradepit-connector systemd[1]: Failed to start RabbitMQ broker.
Jun 14 13:25:54 tradepit-connector systemd[1]: rabbitmq-server.service: Consumed 3.197s CPU time.
Jun 14 13:26:01 tradepit-connector systemd[1]: Stopped RabbitMQ broker.

As I wrote, it worked in this configuration before the upgrade and downgrade. I would rather not have to switch to lazy queues; 4 GB of memory must be enough for (everybody :) such a small load. I assume there must be some "uncleaned trash" left over from the migration or something.

Thank you for your help.
Marek

On Thu, Jun 15, 2023 at 10:57, Michal Kuratczyk <mkura...@gmail.com> wrote:
observer1_2023-06-15_11-22-40.png
observer2_2023-06-15_11-24-37.png

Michal Kuratczyk

Jun 15, 2023, 5:57:53 AM
to rabbitm...@googlegroups.com
Hi,

Mnesia should have nothing to do with the problem - your screenshot clearly shows a few queue processes using a lot of memory.

I'm not clear what you mean by "it worked in the configuration before upgrade and downgrade": do you mean this problem only started
after you tried to upgrade to 3.12 and then downgraded to 3.11? If so, that's interesting, but not exactly something we officially support. :)

The upgrade failed because you didn't follow the necessary steps; see: https://groups.google.com/g/rabbitmq-users/c/sbQYOYfINRE
You can enable the feature flags, upgrade again, and see what happens.
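
For example (assuming rabbitmqctl from 3.11, where "all" enables every stable feature flag):

$ rabbitmqctl list_feature_flags
$ rabbitmqctl enable_feature_flag all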

No reason to be afraid of lazy queues - they are better in almost every regard and basically just mean that messages are not kept in memory.
3.12 makes this the default and only behaviour anyway (the 3.12 logic is not exactly the same as "lazy" in previous versions, but it is similar).

Best,



--
Michał
RabbitMQ team

HGN

Jun 15, 2023, 6:33:59 AM
to rabbitmq-users
Hi Michal,
why can't enabling the feature flags be part of the upgrade process of the Debian package?

However, I did what you recommended: I enabled all feature flags and upgraded to v3.12, and v3.12 is running. The memory problem stays the same, though.

Reporting memory breakdown on node rabbit@tradepit-connector...
queue_procs: 1.8693 gb (94.17%)
code: 0.0359 gb (1.81%)
other_proc: 0.0226 gb (1.14%)
other_system: 0.0146 gb (0.73%)
binary: 0.0143 gb (0.72%)
allocated_unused: 0.0124 gb (0.63%)
plugins: 0.0057 gb (0.29%)
other_ets: 0.0033 gb (0.17%)
atom: 0.0015 gb (0.07%)
msg_index: 0.0014 gb (0.07%)
connection_other: 0.0014 gb (0.07%)
mgmt_db: 0.0008 gb (0.04%)
metrics: 0.0007 gb (0.03%)
connection_writers: 0.0004 gb (0.02%)
connection_channels: 0.0004 gb (0.02%)
mnesia: 0.0002 gb (0.01%)
connection_readers: 0.0001 gb (0.0%)
quorum_ets: 0.0 gb (0.0%)
quorum_queue_procs: 0.0 gb (0.0%)
quorum_queue_dlx_procs: 0.0 gb (0.0%)
stream_queue_procs: 0.0 gb (0.0%)
stream_queue_replica_reader_procs: 0.0 gb (0.0%)
queue_slave_procs: 0.0 gb (0.0%)
stream_queue_coordinator_procs: 0.0 gb (0.0%)
reserved_unallocated: 0.0 gb (0.0%)

... and attached files.

Maybe there's a way to forcibly rebuild/reindex the message storage? Thank you.

Best regards
Marek



On Thursday, June 15, 2023 at 11:57:53 UTC+2, Michal Kuratczyk wrote:
observer3_2023-06-15_12-31-37.png
observer4_2023-06-15_12-32-15.png

Michal Kuratczyk

Jun 15, 2023, 7:13:12 AM
to rabbitm...@googlegroups.com
Migrating to v2 will lead to rebuilding the on-disk state.
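
For example (the policy name "version" is arbitrary, ".*" matches all queues, and the conversion is logged per queue):

$ rabbitmqctl set_policy version ".*" '{"queue-version":2}' --apply-to classic_queues
$ rabbitmq-diagnostics log_tail -N 200 | grep 'from v1 to v2'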

--
Michał

Michal Kuratczyk

Jun 15, 2023, 9:24:43 AM
to rabbitm...@googlegroups.com
Regarding your question about enabling feature flags automatically: it can't happen at the package level, because some people use these packages to manage clusters, and the whole point is to only enable the feature flags after a successful upgrade of the whole cluster. There are things we are looking into, though:

1. Single-node installations should enable all FFs automatically (since there are no other nodes to coordinate with), but it seems this is not happening currently.
2. We are looking into automatically enabling some feature flags on the RabbitMQ (not package) level: https://github.com/rabbitmq/rabbitmq-server/issues/5212

Feature flags (at least some of them) are tricky: if we automatically enabled all of them immediately after an upgrade, you would not be able to downgrade as you just did. Depending on what specific functionality a feature flag controls, the state of the node could no longer be compatible with older code.
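
For instance, after every node in a cluster has been upgraded, an operator deliberately enables the flags, one by one or all at once (the flag name below is just the one from the boot log earlier in this thread):

$ rabbitmqctl enable_feature_flag classic_queue_type_delivery_support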

Best,
--
Michał
RabbitMQ team

Marek Cermak

Jun 16, 2023, 1:10:58 AM
to rabbitm...@googlegroups.com
Hello Michal,
thank you for the explanation of the issues with automatically enabling feature flags during installation; it sounds reasonable.

I upgraded most queues to v2 and it really helped. It took around 12 hours, but now it is finished and the "consumed" memory is around 800 MB. Thank you very much for your help.

Best Regards
Marek


On Thu, Jun 15, 2023 at 15:24, Michal Kuratczyk <mkura...@gmail.com> wrote:

Michal Kuratczyk

Jun 16, 2023, 2:06:56 AM
to rabbitm...@googlegroups.com
12 hours?! Can you please tell us as much as you can about this environment and share the logs?
What's the message size?

We don't understand why the memory usage was high before, and 12 hours to move from v1 to v2 is orders of magnitude more than expected.

For example, I can convert a queue with 1 million 5 kB messages in a few seconds:

$ rabbitmqctl set_policy version ".*" '{"queue-version":1}' --apply-to classic_queues
Setting policy "version" for pattern ".*" to "{"queue-version":1}" with priority "0" for vhost "/" ...

$ perf-test -ad false -f persistent -u v1_v2_test -C 1000000 -s 5000 -c 1000 -y 0
...

$ rabbitmqctl list_queues
Listing queues for vhost / ...
name    messages
v1_v2_test      1000000

$ rabbitmq-diagnostics log_tail -N 2000 | rg -e 'Converting running queue' -e 'converted 1000000 total messages'
2023-06-16 08:01:33.478217+02:00 [info] <0.6481.0> Converting running queue v1_v2_test in vhost / from v1 to v2
2023-06-16 08:01:37.290262+02:00 [info] <0.6481.0> Queue v1_v2_test in vhost / converted 1000000 total messages from v1 to v2

We've run lots of tests like that and it never took more than a few seconds. Could you run a test like this in your environment?

Best,

RabbitMQ team

Marek Cermak

Jun 16, 2023, 3:09:21 AM
to rabbitm...@googlegroups.com
Hello Michal,
since I've started the migration of other queues and it is taking its time (again), I gathered various logs from our server while the migration is running.

As I've already mentioned, the whole queue structure was probably somehow compromised/unstable after the upgrade and downgrade, hence the memory problems and the length of the migration. I suppose that on a cleanly running RMQ the process would finish in a couple of seconds, as you wrote.

Logs are attached; you can investigate.

Let me know if you need more; I still have some queues to migrate.
Marek



On Fri, Jun 16, 2023 at 8:06, Michal Kuratczyk <mkura...@gmail.com> wrote:
rmq_logs.tar.xz

Michal Kuratczyk

Jun 16, 2023, 3:36:12 AM
to rabbitm...@googlegroups.com
Any chance you could share the files for one of the non-migrated queues with us?
(one of those that previously used a lot of memory)

Thanks,



--
Michał
RabbitMQ team