There are two RabbitMQ pods in Kubernetes. They continually restart after being deployed.
$ kubectl get pods --watch
NAME              READY   STATUS    RESTARTS   AGE
crmq-rabbitmq-0   1/1     Running   0          1h
crmq-rabbitmq-1   1/1     Running   0          1h
crmq-rabbitmq-0   0/1     Running   0          1h
crmq-rabbitmq-0   0/1     Running   1          1h
crmq-rabbitmq-0   1/1     Running   1          1h
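When a pod restarts in a loop like this, the kubelet events and the previous container instance's logs are usually the quickest pointer to the reason (a sketch, reusing the pod name from the output above):

kubectl describe pod crmq-rabbitmq-0      # events plus the last container state and exit code
kubectl logs crmq-rabbitmq-0 --previous   # logs from the crashed container instance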
From the log file, we can see rabbit_disk_monitor is terminating:

2019-02-15 08:29:20.020 [error] <0.211.0> ** Generic server rabbit_disk_monitor terminating
** Last message in was update
** When Server state == {state,"/var/lib/rabbitmq/mnesia/rabbit@crmq-rabbitmq-0",50000000,8259809280,100,10000,#Ref<0.739307764.2369781761.195649>,false,true,10,120000}
** Reason for termination ==
** {eagain,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd"},[binary,stderr_to_stdout,stream,in,hide,out]],[]},{os,cmd,1,[{file,"os.erl"},{line,239}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,220}]},{rabbit_disk_monitor,internal_update,1,[{file,"src/rabbit_disk_monitor.erl"},{line,197}]},{rabbit_disk_monitor,handle_info,2,[{file,"src/rabbit_disk_monitor.erl"},{line,169}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]},{gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,686}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
2019-02-15 08:29:20.020 [error] <0.211.0> CRASH REPORT Process rabbit_disk_monitor with 0 neighbours crashed with reason: {eagain,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd"},[binary,stderr_to_stdout,stream,in,hide,out]],[]},{os,cmd,1,[{file,"os.erl"},{line,239}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,220}]},{rabbit_disk_monitor,internal_update,1,[{file,"src/rabbit_disk_monitor.erl"},{line,197}]},{rabbit_disk_monitor,handle_info,2,[{file,"src/rabbit_disk_monitor.erl"},{line,169}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]},{gen_server,handle_msg,...},...]}
2019-02-15 08:29:20.021 [error] <0.210.0> Supervisor rabbit_disk_monitor_sup had child rabbit_disk_monitor started with rabbit_disk_monitor:start_link(50000000) at <0.211.0> exit with reason {eagain,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd"},[binary,stderr_to_stdout,stream,in,hide,out]],[]},{os,cmd,1,[{file,"os.erl"},{line,239}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,220}]},{rabbit_disk_monitor,internal_update,1,[{file,"src/rabbit_disk_monitor.erl"},{line,197}]},{rabbit_disk_monitor,handle_info,2,[{file,"src/rabbit_disk_monitor.erl"},{line,169}]},{gen_server,try_dispatch,4,[{file,"gen_server.erl"},{line,616}]},{gen_server,handle_msg,...},...]} in context child_terminated
2019-02-15 08:29:20.022 [info] <0.6260.0> FORMAT ERROR: "Free disk space monitor encountered an error (e.g. failed to parse output from OS tools): ~p, retries left: ~s~n" [{{'EXIT',{eagain,[{erlang,open_port,[{spawn,"/bin/sh -s unix:cmd"},[binary,stderr_to_stdout,stream,in,hide,out]],[]},{os,cmd,1,[{file,"os.erl"},{line,239}]},{rabbit_disk_monitor,get_disk_free,2,[{file,"src/rabbit_disk_monitor.erl"},{line,220}]},{rabbit_disk_monitor,enable,1,[{file,"src/rabbit_disk_monitor.erl"},{line,273}]},{rabbit_disk_monitor,init,1,[{file,"src/rabbit_disk_monitor.erl"},{line,132}]},{gen_server,init_it,2,[{file,"gen_server.erl"},{line,365}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,333}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}},16335470592},10]
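The eagain in the stack trace comes from erlang:open_port: rabbit_disk_monitor periodically shells out via os:cmd ("/bin/sh -s unix:cmd") to read free disk space, and EAGAIN from a fork means the kernel refused to create another process, which usually points at a process/thread limit rather than a disk problem. A quick comparison from inside the pod (a sketch, reusing the pod name from the kubectl output above):

# Per-user process limit as seen by a shell in the container
kubectl exec crmq-rabbitmq-0 -- bash -c 'ulimit -u'
# Rough count of tasks visible in the container's pid namespace
kubectl exec crmq-rabbitmq-0 -- bash -c 'ls /proc | grep -c "^[0-9]"'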
05:58:05.278 [error] GenServer :inet_gethost_native_sup terminating
Actually, it seems to have failed to create many other threads as well, every one except thread 6. Is this because it fails to get the hostname?
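One thing worth noting: inet_gethost_native is Erlang's native resolver, and it works by spawning an external inet_gethost helper program through open_port, so its crash is consistent with the same process-creation failure rather than a DNS problem. To rule out name resolution anyway (a sketch, reusing the pod name from above):

kubectl exec crmq-rabbitmq-0 -- getent hosts crmq-rabbitmq-0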
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 34m kubelet, ip-10-15-13-86.us-east-2.compute.internal Readiness probe failed: Failed to create aux thread
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 33m kubelet, ip-10-15-13-86.us-east-2.compute.internal Liveness probe failed: Failed to create dirty io scheduler thread 5
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 33m kubelet, ip-10-15-13-86.us-east-2.compute.internal Readiness probe failed: Failed to create dirty io scheduler thread 9
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 32m kubelet, ip-10-15-13-86.us-east-2.compute.internal Liveness probe failed: Failed to create dirty io scheduler thread 4
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 32m kubelet, ip-10-15-13-86.us-east-2.compute.internal Readiness probe failed: Failed to create dirty io scheduler thread 8
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 31m kubelet, ip-10-15-13-86.us-east-2.compute.internal Liveness probe failed: Failed to create dirty io scheduler thread 3
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
Warning Unhealthy 31m kubelet, ip-10-15-13-86.us-east-2.compute.internal Readiness probe failed: Failed to create dirty io scheduler thread 7
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
Aborted (core dumped)
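The "Failed to create aux thread" and "Failed to create dirty io scheduler thread N" aborts point the same way as the eagain above: on Linux, each thread counts as a task against both the per-user RLIMIT_NPROC and the cgroup pids controller, so a low cap can stop the Erlang VM from even starting its scheduler threads. A hedged check from inside the pod (cgroup v1 paths assumed; the layout differs under cgroup v2):

kubectl exec crmq-rabbitmq-0 -- cat /sys/fs/cgroup/pids/pids.max       # cap imposed by the runtime/orchestrator, if the controller is enabled
kubectl exec crmq-rabbitmq-0 -- cat /sys/fs/cgroup/pids/pids.current   # tasks currently counted against it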
Hi,
From the checks inside the RabbitMQ container, the nproc limit is 4096:
bash-4.2# cat /etc/security/limits.d/20-nproc.conf
# Default limit for number of user's processes to prevent
# accidental fork bombs.
# See rhbz #432903 for reasoning.
* soft nproc 4096
root soft nproc unlimited
The OS limit reported by `ulimit -a` can be misleading in environments where containers and container orchestrators can have their own limits that take precedence over what a shell inside the container reports.
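Also, files under /etc/security/limits.d are applied by pam_limits during PAM login sessions, which container entrypoints typically never go through, so the 4096 shown above may not be what the broker actually runs with. One way to inspect the limits of the running VM itself (a sketch; beam.smp is the Erlang VM process, and the pod name is taken from earlier in the thread):

kubectl exec crmq-rabbitmq-0 -- bash -c 'cat /proc/$(pgrep -o -f beam.smp)/limits | grep -i "max processes"'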