Disk alarm during upstart with transient messages

jo...@cloudamqp.com

Jul 16, 2018, 11:38:05 PM
to rabbitm...@googlegroups.com
Hi,

When starting a node that has run out of disk space, would it make sense to remove the transient messages before checking the disk free limit?

Johan

(RabbitMQ 3.7.5 on Erlang 20.1)
2018-07-17 03:16:36.903 [info] <0.275.0> Memory high watermark set to 542 MiB (569118597 bytes) of 670 MiB (702615552 bytes) total
2018-07-17 03:16:36.929 [info] <0.277.0> Enabling free disk space monitoring
2018-07-17 03:16:36.929 [info] <0.277.0> Disk free limit set to 500MB
2018-07-17 03:16:36.947 [info] <0.277.0> Free disk space is insufficient. Free bytes: 487227392. Limit: 500000000
2018-07-17 03:16:36.947 [error] <0.273.0> ** gen_event handler rabbit_alarm crashed.
** Was installed in rabbit_alarm
** Last event was: {set_alarm,{{resource_limit,disk,'rabbit@node-01'},[]}}
** When handler state == {alarms,{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},{dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},[]}
** Reason == {badarg,[{gen_event,send,2,[{file,"gen_event.erl"},{line,265}]},{rabbit_alarm,handle_event,2,[{file,"src/rabbit_alarm.erl"},{line,134}]},{gen_event,server_update,4,[{file,"gen_event.erl"},{line,573}]},{gen_event,server_notify,4,[{file,"gen_event.erl"},{line,555}]},{gen_event,handle_msg,6,[{file,"gen_event.erl"},{line,296}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
2018-07-17 03:16:36.948 [info] <0.279.0> Limiting to approx 999900 file handles (899908 sockets)
2018-07-17 03:16:36.948 [info] <0.280.0> FHC read buffering: OFF
2018-07-17 03:16:36.948 [info] <0.280.0> FHC write buffering: ON
2018-07-17 03:16:36.957 [info] <0.263.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2018-07-17 03:16:37.104 [info] <0.263.0> Waiting for Mnesia tables for 30000 ms, 9 retries left
2018-07-17 03:16:37.104 [info] <0.263.0> Peer discovery backend rabbit_peer_discovery_classic_config does not support registration, skipping registration.
2018-07-17 03:16:37.105 [info] <0.263.0> Priority queues enabled, real BQ is rabbit_variable_queue
2018-07-17 03:16:37.146 [info] <0.305.0> Starting rabbit_node_monitor
2018-07-17 03:16:37.147 [error] <0.324.0> CRASH REPORT Process <0.324.0> with 0 neighbours exited with reason: bad argument in call to lists:member(disk, {error,bad_module}) in rabbit_memory_monitor:init/1 line 111
2018-07-17 03:16:37.148 [error] <0.323.0> Supervisor rabbit_memory_monitor_sup had child rabbit_memory_monitor started with rabbit_memory_monitor:start_link() at undefined exit with reason bad argument in call to lists:member(disk, {error,bad_module}) in rabbit_memory_monitor:init/1 line 111 in context start_error
2018-07-17 03:16:37.156 [error] <0.262.0> CRASH REPORT Process <0.262.0> with 0 neighbours exited with reason: {error,{{shutdown,{failed_to_start_child,rabbit_memory_monitor,{badarg,[{lists,member,[disk,{error,bad_module}],[]},{rabbit_memory_monitor,init,1,[{file,"src/rabbit_memory_monitor.erl"},{line,111}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,548}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}},{child,undefined,rabbit_memory_monitor_sup,{rabbit_restartable_sup,start_link,[rabbit_memory_monitor_sup,{rabbit_memory_monitor,start_link,[]},false]},transient,infinity,...}}} in application_master:init/4 line 134
2018-07-17 03:16:37.157 [info] <0.33.0> Application rabbit exited with reason: {error,{{shutdown,{failed_to_start_child,rabbit_memory_monitor,{badarg,[{lists,member,[disk,{error,bad_module}],[]},{rabbit_memory_monitor,init,1,[{file,"src/rabbit_memory_monitor.erl"},{line,111}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,548}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}}},{child,undefined,rabbit_memory_monitor_sup,{rabbit_restartable_sup,start_link,[rabbit_memory_monitor_sup,{rabbit_memory_monitor,start_link,[]},false]},transient,infinity,...}}}

Daniil Fedotov

Jul 17, 2018, 5:47:09 AM
to rabbitmq-users
Hi,

It might make sense, yes. However, the error you hit is caused by the boot sequence: the rabbit_event process is started after the rabbit_alarm process, which tries to publish an event. This should be fixed first. Thanks for the report.

I've created an issue on GitHub for that: https://github.com/rabbitmq/rabbitmq-server/issues/1644

If the space is taken by the msg_store_transient directory, you can delete it as a workaround.
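
Before deleting anything, it may help to check whether the transient store actually accounts for the shortfall. The following is a minimal sketch, not part of RabbitMQ: it walks a data directory, sums the size of every directory named msg_store_transient, and compares free space plus that reclaimable space against the 500MB limit from the log. The data directory path is an assumption and must be passed by the operator, and the ">=" comparison against the limit is an assumption about how the alarm clears. The script only reports; it deletes nothing.

```python
import os
import shutil
import sys

# From the log above: "Disk free limit set to 500MB" (Limit: 500000000).
DISK_FREE_LIMIT = 500_000_000


def dir_size(path):
    """Total size in bytes of the regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def transient_usage(data_dir):
    """Map each msg_store_transient directory under data_dir to its size."""
    usage = {}
    for root, dirs, _files in os.walk(data_dir):
        if "msg_store_transient" in dirs:
            path = os.path.join(root, "msg_store_transient")
            usage[path] = dir_size(path)
            dirs.remove("msg_store_transient")  # already measured, do not descend
    return usage


def would_clear_alarm(free_bytes, reclaimable_bytes, limit=DISK_FREE_LIMIT):
    """Assumption: the alarm clears once free space is at or above the limit."""
    return free_bytes + reclaimable_bytes >= limit


if __name__ == "__main__":
    # The data directory is supplied by the operator; no default is assumed.
    data_dir = sys.argv[1]
    free = shutil.disk_usage(data_dir).free
    usage = transient_usage(data_dir)
    for path, size in usage.items():
        print(f"{path}: {size} bytes")
    reclaimable = sum(usage.values())
    print(f"free: {free}, reclaimable: {reclaimable}, limit: {DISK_FREE_LIMIT}")
    print("enough after cleanup:", would_clear_alarm(free, reclaimable))
```

With the numbers from the log (487227392 bytes free), a little under 13 MB of transient data would have to be reclaimed to get back above the limit.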