ANN Default node RAM consumption calculation strategy will change in 3.6.11

Michael Klishin

Jun 19, 2017, 5:29:16 PM
to rabbitm...@googlegroups.com
Hi RabbitMQ users,

We've been investigating a number of issues where nodes that were not approaching
memory usage alarms were killed by the OOM killer and similar kernel watchdog features.

Long story short, the mechanism RabbitMQ currently uses to calculate how much
RAM its Erlang VM uses is inaccurate, sometimes significantly so
(up to double digit %, in some cases as high as 60%!).

Starting with 3.6.11, nodes will use the kernel-reported RSS size instead, with an option
to go back to the old strategy for those who have reasons to do so.

Note that the new strategy isn't entirely perfect either but it's much less likely to result
in underreported values. Some edge cases with the new strategy are

What does this mean for you when upgrading? If your nodes often hit memory
alarms, we recommend that you either bump the alarm threshold or provision more RAM.
Effective RAM usage as observed by the kernel has NOT changed, but configuration
management and monitoring tools may still need adjustments.
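
For illustration, bumping the alarm threshold in the classic Erlang-term config file could look like the fragment below. This is only a sketch: the value 0.6 is an arbitrary example rather than a recommendation, and it assumes the relative `vm_memory_high_watermark` setting is the threshold in use.

```erlang
%% rabbitmq.config (classic Erlang-term format)
%% Assumption: 0.6 is an example value only; pick one that fits your host.
[
  {rabbit, [
    %% Fraction of total RAM at which the memory alarm is raised.
    {vm_memory_high_watermark, 0.6}
  ]}
].
```

The node has to be restarted for a config file change to take effect.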

We understand the pain that comes with this change but also feel that reporting an incorrect
value in this case was a very serious bug and we should ship a fix for it ASAP.

Setting `rabbit.vm_memory_calculation_strategy` to `erlang` in your config file
lets you go back to the behavior found in earlier versions if that's desired.
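
A minimal sketch of that setting in the classic Erlang-term config file, assuming the key sits under the `rabbit` application section like other node settings:

```erlang
%% rabbitmq.config (classic Erlang-term format)
[
  {rabbit, [
    %% erlang: the pre-3.6.11 calculation based on what the VM reports.
    %% Leave the key out to get the new kernel-reported (RSS) default.
    {vm_memory_calculation_strategy, erlang}
  ]}
].
```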
--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Sergey Yarkin

Jul 25, 2017, 8:41:30 AM
to rabbitmq-users
Hi,

Have you tried using allocator information to calculate memory usage? It gives pretty much the same results as RSS, but it's cheaper than calling an external program.


Something like this:
instance_size(Items, Init) ->
    lists:foldl(fun
            ({_, [{blocks_size,BS,_,_}, {carriers_size,CS,_,_}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            ({_, [{blocks_size,BS}, {carriers_size,CS}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            (_, Acc) -> Acc
        end, Init, Items).

allocator_size(Instances, Init) ->
    lists:foldl(
        fun({instance, _No, Items}, Acc) ->
            instance_size(Items, Acc)
        end,
        Init,
        Instances).

total_size() ->
    AllocNames = erlang:system_info(alloc_util_allocators),
    AllAllocators = erlang:system_info({allocator_sizes, AllocNames}),
    lists:foldl(
        fun({_, Instances}, Acc) -> allocator_size(Instances, Acc) end,
        {0, 0},
        AllAllocators).







On Tuesday, June 20, 2017 at 0:29:16 UTC+3, Michael Klishin wrote:

Michael Klishin

Jul 25, 2017, 8:44:54 AM
to rabbitm...@googlegroups.com
Hi Sergey,

Thanks for sharing, it's useful to know.

We believe that ultimately this information should be retrieved from the kernel. Doing anything else
runs the same risk of nodes being killed by the OOM killer before we report a high enough value.

It's not a piece of code that's executed on the hot path, so efficiency is of little concern here.
However, it can be a useful fallback mechanism. We will discuss and see if it works on all the
Erlang releases we support.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sergey Yarkin

Sep 25, 2017, 1:39:08 PM
to rabbitmq-users
Hi Michael,

Have you already discussed taking memory consumption from the allocator information?

Thanks!

P.S. Here is a better-looking implementation; it takes a few milliseconds:
GetVmMemory = fun() ->
    AllocNames = erlang:system_info(alloc_util_allocators),
    AllocMems = erlang:system_info({allocator_sizes, AllocNames}),
    AllItems = [ Item ||
        {_, Instances} <- AllocMems,
        {instance, _, Items} <- Instances,
        Item <- Items
    ],

    lists:foldl(fun
            ({_, [{blocks_size,BS,_,_}, {carriers_size,CS,_,_}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            ({_, [{blocks_size,BS}, {carriers_size,CS}]}, {Used, Total}) ->
                {Used+BS, Total+CS};
            (_, Acc) -> Acc
        end, {0, 0}, AllItems)
    % {UsedBytes, AllocatedBytes}
end.



Michael Klishin

Sep 25, 2017, 9:54:11 PM
to rabbitm...@googlegroups.com
that's an interesting idea to investigate, thanks 
--
Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Sep 27, 2017, 2:23:44 AM
to rabbitmq-users
A couple of updates on this thread.

First, 3.6.13 will significantly reduce the frequency of calls to the external tools
that report process memory consumption (RSS). Currently they can be called on the hot path, which
is not a good idea on any platform or with any tool.

Second, as of 3.6.13 we will disable
this strategy on Windows because the best way known to us of retrieving
the amount of RAM a process uses via a sub-process is wmic.exe, which consumes
a fairly high amount of CPU even when invoked once a second (a reasonable rate for
important node metrics).

Lastly, we plan on investigating the allocator information strategy Sergey suggested above.
If it's accurate enough, it will be the new default on Windows (likely after 3.6.13) and will be available
elsewhere.

On Tuesday, September 26, 2017 at 4:54:11 AM UTC+3, Michael Klishin wrote:
that's an interesting idea to investigate, thanks 

Sergey Yarkin

Oct 6, 2017, 6:56:45 AM
to rabbitmq-users
We have started testing 3.7.0rc1 and Erlang/OTP 20.1 with our use case and ran into a problem with the RSS mechanism.
 0) The test server has 12GB of RAM and vm_memory_high_watermark=0.4, so RabbitMQ shows the memory limit as 4.6GB.
 1) We created many objects (connections, consumers, queues) and pushed some load; Erlang consumed more than 5GB and RabbitMQ set the alarm.
 2) We closed all connections, but the alarm didn't clear, because Erlang didn't return the freed memory to the OS.
 3) In fact, less than 750MB of memory was actually in use; the rest was cached for future use.
 4) Then we started the test again and RabbitMQ continued blocking publishers, although there was enough memory.

 P.S. The GetVmMemory function shows correct results for used and allocated memory, but I think some smart algorithm is needed to set/clear the alarm based on both of these numbers.
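
To make the numbers in this test easier to follow, here is a small arithmetic sketch. The module and function names (`mem_report`, `watermark_limit/2`, `utilisation/2`) are hypothetical and are not part of RabbitMQ; the inputs are the rounded figures quoted above, in MiB.

```erlang
-module(mem_report).
-export([watermark_limit/2, utilisation/2]).

%% Hypothetical helpers for illustration only; not RabbitMQ code.

%% Absolute alarm threshold for a relative watermark such as 0.4,
%% in the same unit as TotalRam.
watermark_limit(Watermark, TotalRam) ->
    trunc(Watermark * TotalRam).

%% Percentage of allocated memory that is actually in use.
%% Returns 0.0 when nothing is allocated, to avoid dividing by zero.
utilisation(_Used, 0) ->
    0.0;
utilisation(Used, Allocated) ->
    Used * 100 / Allocated.
```

With 12GB of RAM, `mem_report:watermark_limit(0.4, 12 * 1024)` gives 4915 MiB (about 4.8GB). The 4.6GB that RabbitMQ displayed suggests the OS reported somewhat less than 12GB in total, but that is a guess. `mem_report:utilisation(750, 5 * 1024)` gives roughly 14.6, i.e. under 15% of the allocated memory was in use while the alarm stayed set.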


Michael Klishin

Oct 9, 2017, 6:53:44 PM
to rabbitm...@googlegroups.com
You are looking at the same `erlang:memory/1` stat we were using prior to 3.6.11. That information is wrong.
What Erlang VM says it is *using* is not the same as what it has *allocated*.

Nothing around alarms has changed, only the way we calculate the amount of memory actually used. We were expecting that
for some cases it will be a matter of having alarms vs. not having them. If you prefer to have inaccurate reporting but not have the alarms,
switch to the old strategy. Most users would be better off adjusting the config and using the more precise strategy.

As a side note, chances are that as of 3.6.13 RabbitMQ will use the allocator strategy suggested by Sergey.


Sergey Yarkin

Oct 9, 2017, 7:18:47 PM
to rabbitmq-users
Yes, the number of used bytes looks the same as `erlang:memory/1`, but the number of allocated bytes is close to RSS.
The problem is that the used bytes are less than 15% of the allocated bytes, so memory is available to use, yet RabbitMQ blocks all publishers and Erlang doesn't free the allocated bytes; everything is stuck.
I think it would help if the default memory watermark were increased for the rss strategy.

Michael Klishin

Oct 9, 2017, 7:22:31 PM
to rabbitm...@googlegroups.com
RabbitMQ does not allocate or free memory. The runtime does. You have different allocator settings to try.


Sergey Yarkin

Oct 9, 2017, 7:29:25 PM
to rabbitmq-users
Yes, I understand that RabbitMQ does not allocate memory, but this is the out-of-the-box behavior, and it can cause some pain because it is the default.

Dmitry Andrianov

Oct 10, 2017, 3:20:39 PM
to rabbitmq-users
If I read it correctly, it is the real practical test for my "theoretical" questions here: https://groups.google.com/forum/#!searchin/rabbitmq-users/from$3A$20dmitry%7Csort:relevance/rabbitmq-users/wbq5X87QTbs/buOnup2GCAAJ

Now my question is: are there any guarantees that pre-allocated memory will eventually be released back to the OS? How much time does it take?
Just to explain why I am asking: if memory_used goes above memory_limit, RabbitMQ stops accepting new messages until memory_used drops.
But what causes it to drop?

The first scenario is when the majority of memory is taken by queues: lots of messages get queued, taking up memory, RabbitMQ's memory_used grows above the threshold and RabbitMQ blocks producers. Now consumers can go through the queues and release queue memory back to Erlang. But is that enough? It won't be reflected in RSS until Erlang returns that memory to the OS. When is that going to happen?
 
can be wrong of course...