_______________________________________________
erlang-questions mailing list
erlang-q...@erlang.org
http://erlang.org/mailman/listinfo/erlang-questions
I saw this message in our RabbitMQ server. Investigation led me to the garbage collection. It happened only once. After a restart everything seems to be fine.
On May 23, 2013, at 1:04 PM, Morgan Segalis <mseg...@gmail.com> wrote:
Every process spawned is monitored and started by a supervisor…
Do you use start_child to spawn a new process? If so, do you clean it up? What is the supervision strategy?
- Dmitry
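One way to check the clean-up question from the shell is to compare how many child specs a supervisor holds with how many children are alive. For non-temporary children of a supervisor that is not simple_one_for_one, a spec added with supervisor:start_child/2 stays registered after the child stops until supervisor:delete_child/2 is called, so a specs count that keeps growing past the active count is a hint. The sketch below is only an illustration: ChildCounts is a hypothetical helper name, and kernel_sup stands in for your own supervisor.

```erlang
%% Hypothetical helper: compare registered child specs with live children.
%% A 'specs' count that keeps growing past 'active' hints at specs that
%% were never removed with supervisor:delete_child/2.
ChildCounts = fun(SupRef) ->
    Counts = supervisor:count_children(SupRef),
    {proplists:get_value(specs, Counts),
     proplists:get_value(active, Counts)}
end.

%% kernel_sup is used only because it exists on every node;
%% substitute the name or pid of your own supervisor.
{Specs, Active} = ChildCounts(kernel_sup).
```

Running this periodically and watching the two numbers drift apart is usually enough to confirm or rule out a leak of child specs.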
I made a little function a while back that gets all processes and removes the ones that were started at the beginning…
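The poster's function is not shown in the thread. A minimal sketch of the same idea, assuming a baseline of pids is captured once after startup, could look like this. Baseline and NewProcs are hypothetical names, not the original code.

```erlang
%% Take the baseline once, right after the node has finished starting.
Baseline = processes().

%% Hypothetical helper: pids alive now that were not in the baseline.
NewProcs = fun() -> processes() -- Baseline end.

%% Example: report name, initial call and memory for each newcomer.
%% process_info/2 returns 'undefined' for a pid that has already exited.
[{P, process_info(P, [registered_name, initial_call, memory])} || P <- NewProcs()].
```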
Keep etop running and capture the output to a file (e.g. etop ... | tee etop.log). After it gets into trouble, look back and see what was happening beforehand.
Have you analyzed the crash dump file with the crash dump viewer?
You can run a little function that writes process info to files every N seconds. Like this:

    F = fun(F2, T) ->
            %% One file per snapshot, named after the current time in Gregorian seconds.
            Seconds = calendar:datetime_to_gregorian_seconds(
                        calendar:now_to_local_time(os:timestamp())),
            Fname = lists:flatten(io_lib:format("/tmp/f-~p", [Seconds])),
            %% Append the info of every live process to that file.
            [begin
                 Info = process_info(Pid),
                 Data = io_lib:format("~p:~n~p~n", [Pid, Info]),
                 file:write_file(Fname, Data, [append])
             end || Pid <- processes()],
            timer:sleep(T),
            F2(F2, T)
        end.

Run it from the console with F(F, 5000) and you get a bunch of files in /tmp that can probably tell you something useful.
Have you set the ERL_CRASH_DUMP_SECONDS environment variable?
It won't create one unless you set it to a positive value. Set it to 60 or more to be sure it completes.
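A quick way to confirm that the setting actually reached the running node is to read it back from the Erlang shell. This is a small sketch: EnvOr is a hypothetical helper, and ERL_CRASH_DUMP (the variable that controls where the dump file is written) is included only for convenience.

```erlang
%% Returns the value as a string, or the atom 'unset' when the variable
%% is not visible to this node.
EnvOr = fun(Name) ->
    case os:getenv(Name) of
        false -> unset;
        Value -> Value
    end
end.

{EnvOr("ERL_CRASH_DUMP_SECONDS"), EnvOr("ERL_CRASH_DUMP")}.
```

If the first element comes back as unset, the variable was exported in a different environment from the one that started the node.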
It "freezes"? So it hasn't crashed at all. In that case you just need to be more patient and wait for it to either crash, and leave a crash dump, or output to etop. Possibly setting process priorities would help. Give the suspicious ones low priority.
If it's CPU resources which are being depleted, you would want to observe which processes have the most reductions. Use etop to order by reductions and see who's the busiest.
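If etop itself is not usable on the node, the same ranking can be approximated from the shell. The following is a rough sketch, not a replacement for etop: TopByReductions is a hypothetical helper, and the counts it reports are cumulative since each process started, so comparing two snapshots taken a few seconds apart gives a better picture of current activity.

```erlang
%% Hypothetical helper: the N processes with the highest reduction counts.
%% Processes that exit while we iterate return 'undefined' from
%% process_info/2 and are skipped by the pattern in the generator.
TopByReductions = fun(N) ->
    Counts = [{Reds, Pid} || Pid <- processes(),
                             {reductions, Reds} <- [process_info(Pid, reductions)]],
    lists:sublist(lists:reverse(lists:sort(Counts)), N)
end.

TopByReductions(10).
```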
Another way would be to run a debug emulator and interrupt it while it's frozen. Then inspect the backtrace to see what it has been doing.