That's true, I don't need 134217727 processes to be running, but that is beside the point. It is just counterintuitive to me that a BEAM process running only kernel+stdlib takes 5-10 minutes to shut down.
If you are right and this indeed happens because of the 1.5 GB allocated (by the way, does this mean that each of the 134217727 potential processes is pre-created?), I would say that merely setting an upper bound is not something I would expect to trigger immediate work. Out of curiosity, I wonder what has to be cleaned up in the system immediately after a clean boot, and whether this behaviour existed earlier or was introduced recently (maybe as an optimization) with Erlang/OTP 19 or 20?
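For what it's worth, the node itself reports both the limit and the actual process count; a quick check (erlang:system_info/1 with process_limit and process_count are standard calls, the comments are just my expectations):

erlang:system_info(process_limit). %% configured maximum (the +P value, possibly rounded up by the VM)
erlang:system_info(process_count). %% processes actually alive; a few dozen after a clean boot, not millions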
I checked, and running erl with +P 134217727 results in a beam process using 1.517g of memory. So cleaning that up is probably what is taking so much time on shutdown :) 134217727 certainly seems excessive... What size of system would even be able to handle that many processes?

--
Tristan Sloughter

"I am not a crackpot" - Abe Simpson
It's Linux (Ubuntu). I read /proc/[pid]/statm, take the RSS number of pages (the 2nd value on the single line of that file) and calculate the memory used by the Erlang VM as RSS x page size (read as "getconf PAGESIZE").

I also have swappiness set to zero:

cat /proc/sys/vm/swappiness
0

so I expect that RSS shows the total memory used.
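In Erlang terms, the calculation I do is roughly this (a minimal sketch assuming Linux; rss_bytes/0 is just a name I made up for illustration):

rss_bytes() ->
    %% /proc/<pid>/statm: the 2nd field is the resident set size in pages
    {ok, Bin} = file:read_file("/proc/" ++ os:getpid() ++ "/statm"),
    [_Size, RssPages | _] = string:lexemes(binary_to_list(Bin), " "),
    %% page size via the same getconf call I use in the shell
    PageSize = list_to_integer(string:trim(os:cmd("getconf PAGESIZE"))),
    list_to_integer(RssPages) * PageSize.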
So what is RSS? Is it a badly designed aggregate, or some real value that is just very hard to predict (much as it is hard to tell the temperature when you only know the speeds of a few molecules)?
The function doesn't interact with the gen_server that calls spawn/4, although I'd expect spawn/4 to start the process and return immediately anyway. Am I wrong?
>>
>> I'm surprised to see my gen_server process hanging forever when
>> executing a spawn/4 call. Process info shows spawn_opt/5 as the current
>> function and the status is waiting:
>>
>> > process_info(P).
>> [{current_function,{erlang,spawn_opt,5}},
>> {status,waiting},
>> {message_queue_len,13},
>> {trap_exit,false},
>> {priority,normal},
>> ...]
>>
Thank you for the details, I think that explains most of the situation. I did check the messages; they were indeed all specific to my application - no "{spawn_reply, Ref, ok|error, Pid|Error}" for sure, just the usual '$gen_cast' and system messages. Judging from the messages, the caller had been blocked for about 4 hours by the time I noticed. The node is an ordinary Erlang node, nothing special except for the complicated environment.

The environment is Kubernetes with Istio used for networking. It's possible that one of the nodes of the cluster was restarted abruptly, and maybe that was related to a version upgrade of the Istio networking layer, so we have either a restart of a node or a possible networking glitch breaking the connection, plus a generally interesting networking implementation. One surprising thing, however, is that there were no timeouts and spawn_opt/5 just got stuck in that state. Could it be related to the environment? If yes, and the caller may end up blocked under unfortunate circumstances in a K8s/Istio environment, would you suggest a way to prevent such situations?
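One mitigation I can think of (assuming OTP 23+, where erlang:spawn_request/5 and erlang:spawn_request_abandon/1 exist) would be to issue the spawn asynchronously and bound the wait myself; a minimal sketch, with spawn_with_timeout/5 being a name I made up:

spawn_with_timeout(Node, M, F, A, Timeout) ->
    ReqId = erlang:spawn_request(Node, M, F, A, []),
    receive
        {spawn_reply, ReqId, ok, Pid} ->
            {ok, Pid};
        {spawn_reply, ReqId, error, Reason} ->
            {error, Reason}
    after Timeout ->
        %% abandon the request: if it did spawn, the process is
        %% killed, and no spawn_reply is delivered afterwards
        _ = erlang:spawn_request_abandon(ReqId),
        {error, timeout}
    end.

Would something like that be a reasonable workaround here?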
Thank you,
Vyacheslav