[erlang-questions] very long init:stop() and +P emulator flag


Vyacheslav Levytskyy

Sep 25, 2017, 5:11:50 PM
to erlang-q...@erlang.org
Hi,

I've experienced a very long Erlang node shutdown after init:stop(): about
8 minutes. I finally found a solution -- when I remove the emulator flag
for the maximum number of simultaneously existing processes
(+P 134217727), the shutdown of the node returns to the normal couple of
seconds.

I'm using Erlang/OTP 20.0 on Ubuntu LTS 16.04; the node runs in embedded
mode. The long shutdown happens after all applications have already gone,
when just 30-40 processes and about 10 kernel ports remain running. It
actually happens even after a clean boot with just the kernel and stdlib
applications started. The -shutdown_time option does not help. The only
solution that works for me at the moment is to remove +P 134217727 --
maybe decreasing the maximum number of simultaneously existing processes
helps as well (it should), but I did not check that, and did not search
for the value of +P below which the problem disappears.

I'm quite puzzled by the relation between +P and a really long
init:stop(), and would like to ask if somebody can explain it.
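
For anyone reproducing this, the configured limit is easy to confirm from the shell. A minimal sketch (the printed value simply reflects the +P flag above):

```erlang
%% Started as: erl +P 134217727
1> erlang:system_info(process_limit).
134217727
2> init:stop().  %% with the flag above this took ~8 minutes; without it, seconds
```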

Vyacheslav


_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions

Tristan Sloughter

Sep 25, 2017, 5:26:42 PM
to erlang-q...@erlang.org
I checked, and running erl with that +P 134217727 results in a beam
process using 1.517 GB of memory. So cleaning that up is probably what is
taking so much time on shutdown :)

134217727 certainly seems excessive... What size system would even be
able to handle dealing with that many processes?

--
Tristan Sloughter
"I am not a crackpot" - Abe Simpson
t...@crashfast.com

Vyacheslav Levytskyy

Sep 26, 2017, 12:13:19 AM
to t...@crashfast.com, erlang-q...@erlang.org

That's true, I don't need 134217727 processes to be running, but that's beside the point. It is just counterintuitive to me that a beam process running only kernel+stdlib should take 5-10 minutes to shut down.

If you are right and this indeed happens because of the 1.5 GB allocated (btw, does this mean that each of the 134217727 potential processes is pre-created?), I would say that from merely setting an upper bound I wouldn't expect immediate allocations. Out of curiosity, I wonder what needs to be cleaned up in the system immediately after a clean boot, and whether this behaviour existed earlier or was introduced recently (maybe as an optimization) in Erlang/OTP 19 or 20?

Vyacheslav



Vyacheslav Levytskyy

Jan 31, 2020, 1:22:03 PM
to erlang-q...@erlang.org
Hello,

I wonder why the memory used by the Erlang VM as reported by the kernel
via /proc/[pid] differs from erlang:memory(total). In my current
configuration I observe a realistic response from erlang:memory(total)
and much lower values from /proc/[pid].

I'm not surprised by the difference itself, but rather by the fact that
/proc/[pid] gives unrealistically low values -- I'm not 100% sure, but it
looks like RabbitMQ uses the /proc/[pid] approach by default, offering
recon_alloc:memory(allocated) and erlang:memory(total) as the other
available strategies for calculating Erlang VM memory consumption.

Does anybody have insight into what is going on with these calculations
of the memory used by the Erlang VM, and why? Is it possible to select
one strategy beforehand for my Erlang app, or must I measure on each new
configuration which one looks more precise? Should I compare and change
the strategy at run time, or once I have selected a strategy for my
configuration, can I be sure that the selected approach is always better
than the other two?

Thank you,
Vyacheslav

Jesper Louis Andersen

Jan 31, 2020, 2:00:44 PM
to Vyacheslav Levytskyy, Erlang (E-mail)
When you say /proc/pid, what are you looking at specifically in there? It is a bit different depending on which Unix you run, so a simple example will help a lot.

In particular, my early guess is going to be virtual memory vs. physical RSS mapping. The former can be much higher than the latter, especially in systems such as Linux, which allow overcommitting memory.
--
J.

Vyacheslav Levytskyy

Jan 31, 2020, 2:23:45 PM
to Jesper Louis Andersen, Erlang (E-mail)

It's Linux (Ubuntu). I read /proc/[pid]/statm, take the RSS number of pages (the 2nd value in the line of that file) and calculate the memory used by the Erlang VM as RSS x page size (read as "getconf PAGESIZE").
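
The same computation can be done from inside the VM. A sketch, assuming Linux; the helper names here are mine, not an existing API:

```erlang
%% RSS of the current VM in bytes, from /proc/self/statm (Linux only).
rss_bytes() ->
    {ok, Bin} = file:read_file("/proc/self/statm"),
    [_VmSize, Rss | _] = binary:split(Bin, <<" ">>, [global]),
    %% Read the page size at run time rather than hard-coding 4096.
    PageSize = list_to_integer(string:trim(os:cmd("getconf PAGESIZE"))),
    binary_to_integer(Rss) * PageSize.

%% Compare with what the VM itself reports:
compare() ->
    #{os_rss => rss_bytes(), erlang_total => erlang:memory(total)}.
```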

Vyacheslav Levytskyy

Jan 31, 2020, 2:44:17 PM
to Erlang (E-mail)

I also have zero swappiness:

cat /proc/sys/vm/swappiness
0

so I expect that RSS shows the total memory used.

Gerhard Lazu

Jan 31, 2020, 3:15:48 PM
to Jesper Louis Andersen, Erlang (E-mail)
I remember that battle with Erlang vs. OS memory reporting well; it was quite a show. This captures my thinking from 2017: https://github.com/rabbitmq/rabbitmq-server/issues/1223 . This has more context: https://github.com/rabbitmq/rabbitmq-server/pull/1259#issuecomment-308428057. There are a few other comments in that thread that you may find useful.

Erlang will either over-report or under-report the memory used. erlang:memory(total) will very rarely match the physical RSS mapping, as explained earlier by Jesper + https://stackoverflow.com/questions/7880784/what-is-rss-and-vsz-in-linux-memory-management. This detailed snapshot of the Erlang memory allocators' state shows this in the clearest way that I am aware of: https://grafana.gcp.rabbitmq.com/dashboard/snapshot/wM5JcgR9oQToU4CR54IbOq1NtAsa6jvu

As it stands, the Linux OOM killer takes action based on the RSS value. While from an Erlang perspective it may seem OK to ask the OS for more memory, we've had years of poor RabbitMQ experience in which the Erlang VM process would get killed by the Linux OOM killer because it was asking for memory that the system didn't have available. This has more information: https://www.rabbitmq.com/memory-use.html

Hope this helps, Gerhard.

Jesper Louis Andersen

Jan 31, 2020, 3:27:52 PM
to Vyacheslav Levytskyy, Erlang (E-mail)
It is expected that your RSS value is lower than what Erlang reports.

Erlang requests virtual memory from the operating system (Linux), but it is mapped into the process on demand. RSS is the resident set size: the currently physically mapped pages. As Erlang hits pages it hasn't used before, a kernel trap is generated and that page is mapped into the process.

For example, let's say we allocate a large array to hold processes, but we are only using a small smidgen of that array. Then there are many virtually mapped pages, but we aren't really "using" the memory yet. It is taken on demand. This on-demand method is smart because it lowers memory pressure: rather than doing all the work up front, we can amortize it over the course of the program's run, which is much more efficient. If the application requests, say, one gigabyte of memory and we want to grant it right away, we must zero one gigabyte of memory, which takes time. On demand, we can instead keep a background process for zeroing memory and hand it out as it is needed, among other things.

Where this can create a problem is if you have several processes all wanting lots of memory, and you have less memory in the machine than what they've been promised by the kernel. Then, if they all start wanting that memory, you should expect to see various kinds of trouble.

As Erlang runs more, I would expect the RSS to go up as it populates more of the memory space.
--
J.

Loïc Hoguin

Jan 31, 2020, 5:33:34 PM
to Jesper Louis Andersen, Vyacheslav Levytskyy, Erlang (E-mail)
On 31/01/2020 21:27, Jesper Louis Andersen wrote:
> It is expected your RSS value is lower than what Erlang reports.

And it would be fantastic if that were always true. But RSS is not that
simple. See Gerhard's link
https://grafana.gcp.rabbitmq.com/dashboard/snapshot/wM5JcgR9oQToU4CR54IbOq1NtAsa6jvu
which shows RSS lower than what the allocators report. Depending on the
allocator strategies, we've seen both RSS and Virt go all over the place.
I'm afraid that after spending months on this I still don't understand RSS.

Cheers,

--
Loïc Hoguin
https://ninenines.eu

Max Lapshin

Feb 2, 2020, 2:32:01 PM
to Loïc Hoguin, Erlang (E-mail)
So what is RSS?

Is it a badly designed aggregate, or a real value that is very hard to predict (just as it is hard to tell the temperature when you only know the speeds of a few molecules)?

Jesper Louis Andersen

Feb 2, 2020, 3:48:35 PM
to Max Lapshin, Erlang (E-mail)
RSS is the latter.

Your processes allocate virtual memory. It is mapped to physical memory (and secondarily to the swap device). RSS is the amount of memory currently resident in the physical memory of the computer. It is genuinely in use.

What makes it hard to predict is that you don't get it all straight away. If I allocate 8 megabytes of contiguous memory via, e.g., mmap(), I'm granted permission to access it (so the VM memory goes up). But since I haven't touched the pages yet, RSS does not change. As I start touching memory, pages are mapped in and physical pages are allocated, typically in either 4 KB or 2 MB blocks. Contrast this with memory-mapping a file: then, when I access the memory, I'm accessing the underlying file through the disk cache. Also consider fork(): it uses copy-on-write, so memory is allocated as the processes start diverging in the memory space by making different updates to different pages[0].

We also map libraries and the program code itself into the memory space, but these are marked read-only, so they can be shared among several processes that need the same library.



[0] By default. You can memory map so things are shared.


--
J.

Vyacheslav Levytskyy

Jun 15, 2021, 4:09:16 PM
to erlang-q...@erlang.org
Hello,

I'm surprised to see my gen_server process hanging forever while
executing a spawn/4 call. Process info shows spawn_opt/5 as the current
function, and the status is waiting:

> process_info(P).
[{current_function,{erlang,spawn_opt,5}},
 {status,waiting},
 {message_queue_len,13},
 {trap_exit,false},
 {priority,normal},
 ...]

The current stacktrace looks like:

> process_info(P, current_stacktrace).
{current_stacktrace,[{erlang,spawn_opt,5,[]},
                     {erlang,spawn,4,[]},
                     ...
                     {my_gen_server,handle_cast,2,
                                    [{file,"..."},{line,...}]},
                     {gen_server,try_dispatch,4,
                                 [{file,"gen_server.erl"},{line,695}]},
                     {gen_server,handle_msg,6,
                                 [{file,"gen_server.erl"},{line,771}]},
                     {proc_lib,init_p_do_apply,3,
                                 [{file,"proc_lib.erl"},{line,226}]}]}

I wonder what the conditions could be that made spawn_opt/5 get stuck in
this state. Is there something I can do on my side to prevent such
problems in the future?

It's Erlang/OTP 24.0.2:

Erlang/OTP 24 [erts-12.0.2] [source] [64-bit] [smp:8:8] [ds:8:8:10]
[async-threads:1] [jit]

Thank you,
Vyacheslav

Luke Bakken

Jun 15, 2021, 4:12:33 PM
to Vyacheslav Levytskyy, erlang-q...@erlang.org
Hi Vyacheslav,

It would be very helpful to share the code that spawn_opt is calling,
or provide code to reproduce this condition.

Does the function called by spawn_opt interact in _any_ way with the
gen_server that calls spawn_opt?

Thanks,
Luke

Vyacheslav Levytskyy

Jun 16, 2021, 3:15:27 AM
to Luke Bakken, erlang-q...@erlang.org
It's an old piece of code and the code is very simple: the gen_server
just spawns a function on several connected nodes. spawn_opt/5 is called
by spawn/4; I don't call it directly.

The function doesn't interact with the gen_server that calls spawn/4,
although I'd expect spawn/4 to start a process and return immediately
anyway -- am I wrong?

The code very literally looks like:

===

handle_cast(do_start, State) ->
    try
        my_start()
    catch
        What:Why:Stacktrace ->
            logger:warning("~p:~p~n~p", [What, Why, Stacktrace],
                           #{mod => ?MODULE, line => ?LINE})
    end,
    {noreply, State};

my_start() ->
    [spawn(Node, ?MODULE, my_action, [erlang:system_time(1)])
     || Node <- [node() | my_cluster_nodes()]].

my_cluster_nodes() ->
    {L, _} = lists:unzip(ets:tab2list(?TBL)),
    L.

my_action(Now) ->
    %% work with ETS, use an external database service
    ok.

===


The code worked for a couple of years: I can't be 100% sure whether this
happened before or not, but in any case I am seeing this situation for
the first time. A recent change is the upgrade to OTP 24. The environment
is Kubernetes, and the code runs inside a Docker container.

Thanks.

Rickard Green

Jun 16, 2021, 11:03:55 AM
to Vyacheslav Levytskyy, erlang-questions@erlang.org Questions
On Wed, Jun 16, 2021 at 9:15 AM Vyacheslav Levytskyy <v.lev...@yahoo.com> wrote:

> The function doesn't interact with the gen_server that calls spawn/4,
> although I'd expect spawn/4 to run a process and return immediately
> anyway, am I wrong?


All spawn operations except for spawn_request() (introduced in OTP 23) are synchronous and block until the new process has been created and the caller of the BIF has received the process identifier of the newly created process, or until an error is detected. In case the connection between the nodes stalls, the caller will be blocked until the network ticker takes down the connection (by default after 60 seconds).
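
As a sketch of the asynchronous alternative (reusing the placeholder names from the code posted earlier; not a tested drop-in):

```erlang
%% spawn_request/5 (OTP 23+) returns immediately; the outcome arrives
%% later as a {spawn_reply, ...} message, so the caller can time out
%% instead of blocking inside the spawn BIF.
start_remote(Node) ->
    ReqId = erlang:spawn_request(Node, ?MODULE, my_action,
                                 [erlang:system_time(1)], []),
    receive
        {spawn_reply, ReqId, ok, Pid}       -> {ok, Pid};
        {spawn_reply, ReqId, error, Reason} -> {error, Reason}
    after 5000 ->
        erlang:spawn_request_abandon(ReqId),  %% give up on this request
        {error, timeout}
    end.
```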
 

It would have been interesting to know what process_info(P, messages) returned. In the distributed case spawn_opt() is waiting for a message of the form: {spawn_reply, Ref, ok|error, Pid|Error}

What type of node is the node that you are trying to spawn the new process on? An ordinary Erlang node, a C-node, ...? Which OTP release is that node running?

Regards,
Rickard
--
Rickard Green, Erlang/OTP, Ericsson AB

Vyacheslav Levytskyy

Jun 17, 2021, 7:46:06 AM
to erlang-questions@erlang.org Questions, Rickard Green

Thank you for the details, I think that explains most of the situation. I did check the messages; they were all specific to my application -- no "{spawn_reply, Ref, ok|error, Pid|Error}" for sure, just the usual '$gen_cast' and system messages. Judging from the messages, the caller had been blocked for about 4 hours when I noticed it. The node is an ordinary Erlang node, nothing special except for the complicated environment.

The environment is Kubernetes with Istio used for networking. It's possible that one of the nodes of the cluster was restarted abruptly, maybe in connection with a version upgrade of the Istio networking, so we have either a restart of a node or a possible networking glitch to break the connection, plus a generally interesting networking implementation. One surprising issue, however, is that there were no timeouts and spawn_opt/5 just got stuck in that state. Could this be related to the environment? If yes, and the caller may be blocked under unfortunate circumstances in a K8s/Istio environment, would you suggest a way to prevent such situations?

Thank you,
Vyacheslav

Rickard Green

Jun 17, 2021, 4:41:47 PM
to Vyacheslav Levytskyy, Rickard Green, erlang-questions@erlang.org Questions
On Thu, Jun 17, 2021 at 1:45 PM Vyacheslav Levytskyy <v.lev...@yahoo.com> wrote:

> Thank you for details, I think it explains the most part of the situation. I checked messages indeed, they were all specific to my application - no "{spawn_reply, Ref, ok|error, Pid|Error}" for sure, just usual '$gen_cast' and system. Judging from messages, the caller was blocked for about 4 hours when I noticed that. The node is ordinary Erlang node, nothing special except for the complicated environment.

Is it an OTP 24 node as well?

> The environment is Kubernetes with istio used for networking. It's possible that one of nodes of the cluster was restarted abruptly, and may be it was related to version upgrade of istio networking, so we have either restart of a node or a possible glitch of networking to break connection, and also a generally interesting networking implementation. One surprising issue, however, is that there were no timeouts and spawn_opt/5 just stuck in that state. Could it be related to the environment? If yes, and the caller may be blocked in unfortunate circumstances in K8s/istio env, would you suggest a way to prevent such situations?

If the connection to the other node is lost, the local runtime system will send a {spawn_reply, Ref, error, noconnection} message to the process blocked in spawn_opt(), which will cause spawn_opt() to return. The local runtime system detects that the connection is lost either by the TCP socket being closed or by observing that there has been no incoming traffic for net_ticktime seconds <https://erlang.org/doc/man/kernel_app.html#net_ticktime> (which defaults to 60 seconds). I guess that you haven't increased net_ticktime to more than 4 hours, which indicates that there is a bug somewhere.
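
For reference, the tick time can be inspected at run time and tuned at boot; a sketch (60 is the default value, not a measurement from this system):

```erlang
%% In the shell of a distributed node:
1> net_kernel:get_net_ticktime().
60

%% Or set at boot, e.g. in sys.config:
%% [{kernel, [{net_ticktime, 60}]}].
```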

Please open a bug issue at <https://github.com/erlang/otp/issues> where we can continue this.

Regards,
Rickard