ANN Default node RAM consumption calculation strategy will change in 3.6.11

Michael Klishin

Jun 19, 2017, 5:29:16 PM

to rabbitm...@googlegroups.com

Hi RabbitMQ users,

We've been investigating a number of issues where nodes that were not approaching

memory usage alarms were killed by the OOM killer and similar kernel watchdog features.

Long story short, the mechanism RabbitMQ currently uses to calculate how much

RAM its Erlang VM uses is inaccurate, sometimes significantly so

(up to double digit %, in some cases as high as 60%!).

Starting with 3.6.11, nodes will use the kernel-reported RSS size instead, with an option

to go back to the old strategy for those who have reasons to do so:

https://github.com/rabbitmq/rabbitmq-server/issues/1223.

Note that the new strategy isn't entirely perfect either but it's much less likely to result

in underreported values. Some edge cases with the new strategy are

discussed in https://github.com/rabbitmq/rabbitmq-server/pull/1259.

What does this mean for you when you upgrade? If your nodes often hit memory

alarms, we recommend that you either bump the alarm threshold or provision more RAM.

Effective RAM usage as observed by the kernel has NOT changed but configuration

management and monitoring tools may still need adjustments.
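
A minimal sketch of what bumping the alarm threshold could look like, assuming the classic Erlang-term `rabbitmq.config` file. The value `0.6` is purely illustrative and not a recommendation; pick a figure that suits your own nodes. If the file already has a `rabbit` section, the key would go there rather than in a new one.

```erlang
%% rabbitmq.config (classic Erlang-term format, assumed here)
[
  {rabbit, [
    %% Fraction of available RAM at which the memory alarm fires.
    %% 0.6 is an illustrative value only.
    {vm_memory_high_watermark, 0.6}
  ]}
].
```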

We understand the pain that comes with this change but also feel that reporting an incorrect

value in this case was a very serious bug and we should ship a fix for it ASAP.

Setting `rabbit.vm_memory_calculation_strategy` to `erlang` in your config file

lets you go back to the behavior found in earlier versions if that's desired.
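
As a sketch of the setting described above, assuming the classic Erlang-term `rabbitmq.config` file (the file layout is an assumption; the key and the `erlang` value are the ones named in this message):

```erlang
%% rabbitmq.config (classic Erlang-term format, assumed here)
[
  {rabbit, [
    %% Go back to the pre-3.6.11 way of calculating node memory use.
    {vm_memory_calculation_strategy, erlang}
  ]}
].
```

A node restart would normally be needed for a config file change like this to take effect.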

--

MK

Staff Software Engineer, Pivotal/RabbitMQ

Sergey Yarkin

Jul 25, 2017, 8:41:30 AM

to rabbitmq-users

Hi,

Have you tried using allocator information to calculate memory usage? It gives pretty much the same results as RSS, but it's cheaper than calling an external program.

Something like this:

instance_size(Items, Init) ->
    lists:foldl(fun
            ({_, [{blocks_size,BS,_,_}, {carriers_size,CS,_,_}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            ({_,[{blocks_size,BS},{carriers_size,CS}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            (_, Acc) -> Acc
        end, Init, Items).

allocator_size(Instances, Init) ->
    lists:foldl(
        fun({instance, _No, Items}, Acc) ->
            instance_size(Items, Acc)
        end,
        Init,
        Instances).

total_size() ->
    AllocNames = erlang:system_info(alloc_util_allocators),
    AllAllocators = erlang:system_info({allocator_sizes, AllocNames}),
    lists:foldl(
        fun({_, Instances}, Acc) -> allocator_size(Instances, Acc) end,
        {0, 0},
        AllAllocators).

On Tuesday, June 20, 2017 at 12:29:16 AM UTC+3, Michael Klishin wrote:

Michael Klishin

Jul 25, 2017, 8:44:54 AM

to rabbitm...@googlegroups.com

Hi Sergey,

Thanks for sharing, it's useful to know.

We believe that ultimately this information should be retrieved from the kernel. Doing anything else

runs the same risk of nodes being killed by the OOM killer before we report a high enough value.

It's not a piece of code that's executed on the hot path, so efficiency is of little concern here.

However, it can be a useful fallback mechanism. We will discuss and see if it works on all the

Erlang releases we support.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sergey Yarkin

Sep 25, 2017, 1:39:08 PM

to rabbitmq-users

Hi Michael,

Have you already discussed taking memory consumption from the allocator information?

Thanks!

P.S. Here is a better-looking implementation; it takes a few milliseconds:

GetVmMemory = fun() ->
    AllocNames = erlang:system_info(alloc_util_allocators),
    AllocMems = erlang:system_info({allocator_sizes, AllocNames}),
    AllItems = [ Item ||
        {_, Instances} <- AllocMems,
        {instance, _, Items} <- Instances,
        Item <- Items
    ],

    lists:foldl(fun
            ({_, [{blocks_size,BS,_,_}, {carriers_size,CS,_,_}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            ({_, [{blocks_size,BS}, {carriers_size,CS}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            (_, Acc) -> Acc
        end, {0, 0}, AllItems)
    % returns {UsedBytes, AllocatedBytes}
end.

Michael Klishin

Sep 25, 2017, 9:54:11 PM

to rabbitm...@googlegroups.com

That's an interesting idea to investigate, thanks.

--

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Sep 27, 2017, 2:23:44 AM

to rabbitmq-users

A couple of updates on this thread.

First, 3.6.13 will significantly reduce the frequency of calls to the external tools

that report process memory consumption (RSS). These tools can currently be invoked on the hot path, which

is not a good idea on any platform and with any tool.

Second, as of 3.6.13 we will disable

this strategy on Windows because the best known [to us] way of retrieving

the amount of RAM a process uses with a sub-process is via wmic.exe, which consumes

a high enough amount of CPU even when invoked once a second (a reasonable rate for

important node metrics).

Lastly, we plan on investigating the allocator information strategy Sergey suggested above.

If it's accurate enough, it will be the new default on Windows (likely after 3.6.13) and will be available

elsewhere.

Sergey Yarkin

Oct 6, 2017, 6:56:45 AM

to rabbitmq-users

We have started testing 3.7.0rc1 and Erlang/OTP 20.1 in our use case and ran into a problem with the RSS mechanism.

0) The test server has 12GB of RAM and vm_memory_high_watermark=0.4, so RabbitMQ shows a memory limit of 4.6GB.

1) We created many objects (connections, consumers, queues) and pushed some load; Erlang consumed more than 5GB and RabbitMQ set the alarm.

2) We closed all connections, but the alarm didn't clear because Erlang didn't return the freed memory to the OS.

3) In fact, less than 750MB of memory was actually in use; the rest was cached for future use.

4) Then we started the test again and RabbitMQ continued blocking publishers, although there was enough memory.

P.S. The GetVmMemory function shows correct results for used and allocated memory, but I think some smart algorithm is needed to clear/set the alarm based on both of these numbers.

Michael Klishin

Oct 9, 2017, 6:53:44 PM

to rabbitm...@googlegroups.com

You are looking at the same `erlang:memory/1` stat we were using prior to 3.6.11. That information is wrong.

What Erlang VM says it is *using* is not the same as what it has *allocated*.

Nothing around alarms has changed, only the way we calculate the amount of memory actually used. We were expecting that

for some cases it will be a matter of having alarms vs. not having them. If you prefer to have inaccurate reporting but not have the alarms,

switch to the old strategy. Most users would be better off adjusting the config and using the more precise strategy.

As a side note, chances are that as of 3.6.13 RabbitMQ will use the allocators strategy suggested by Sergey.

Sergey Yarkin

Oct 9, 2017, 7:18:47 PM

to rabbitmq-users

Yes, the number of used bytes looks the same as `erlang:memory/1`, but the allocated bytes figure is close to RSS.
The problem is that the used bytes are less than 15% of the allocated bytes, so there is memory we could use, but RabbitMQ is blocking all publishes and Erlang doesn't free the allocated bytes, so everything is stuck.
I think it would help if the default memory watermark were increased for the rss strategy.

Michael Klishin

Oct 9, 2017, 7:22:31 PM

to rabbitm...@googlegroups.com

RabbitMQ does not allocate or free memory. The runtime does. You have different allocator settings to try.

Sergey Yarkin

Oct 9, 2017, 7:29:25 PM

to rabbitmq-users

Yes, I understand that RabbitMQ does not allocate memory, but this is the out-of-the-box behavior and it can cause some pain because it is the default.

Dmitry Andrianov

Oct 10, 2017, 3:20:39 PM

to rabbitmq-users

If I read it correctly, it is the real practical test for my "theoretical" questions here: https://groups.google.com/forum/#!searchin/rabbitmq-users/from$3A$20dmitry%7Csort:relevance/rabbitmq-users/wbq5X87QTbs/buOnup2GCAAJ

Now my question is: are there any guarantees that pre-allocated memory will eventually be released back to the OS? How much time does it take?
Just to explain why I am asking that. If memory_used goes above memory_limit, RabbitMQ stops accepting new messages until memory_used drops.
But what causes it to drop?

The first scenario is when the majority of memory is taken by queues: lots of messages get queued and take up memory, RabbitMQ memory_used grows above the threshold, and RabbitMQ blocks producers. Now consumers can go through the queues and release queue memory back to Erlang. But is that enough? It won't be reflected in RSS until Erlang returns that memory to the OS. When is that going to happen?

I could be wrong, of course...
