Disable disk monitoring

Mc Polu

Feb 21, 2024, 10:23:12 AM
to rabbitmq-users
Hello,

Thank you very much for RabbitMQ, it is a fantastic product and community. I am using RabbitMQ 3.12.12 on Windows 2022. I can disable disk monitoring from the command line thanks to https://github.com/rabbitmq/rabbitmq-server/issues/8741. However, when the node is restarted, disk monitoring is enabled again.

Is there a way to permanently disable disk monitoring, please? Perhaps in a config file?

Thank you.

Luke Bakken

Feb 21, 2024, 5:32:16 PM
to rabbitmq-users
At this time you can't permanently disable disk monitoring.

Why would you like to permanently disable it?

Mc Polu

Feb 22, 2024, 4:47:32 AM
to rabbitmq-users
Thank you for the quick response, much appreciated.

I am getting the error trace below a few times a day. I am testing CQv2, as I understand it will become the new default in 3.13 and later the only option, and I am measuring latency. When I graph latency over time I get a spiky chart: most of the time latency is very low, but every now and then some messages take longer. With CQv2 the baseline is better and most messages flow faster, but the spikes are worse: when messages are slow, they are slower. Looking at the logs I found these errors, so I disabled disk monitoring. After disabling it, the spikes got better.
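
To make the "better baseline, worse spikes" comparison concrete, one option is to summarize each run by its median (the baseline) and its 99th percentile and maximum (the spikes). This is only a minimal sketch in Python: the function name `summarize` and the sample values are made up for illustration and are not measurements from this thread.

```python
import statistics

def summarize(latencies_ms):
    """Summarize a latency sample: median as baseline, p99 and max as spikes."""
    # quantiles(n=100) returns 99 cut points; the last one is the 99th percentile.
    p99 = statistics.quantiles(latencies_ms, n=100)[98]
    return {
        "median": statistics.median(latencies_ms),
        "p99": p99,
        "max": max(latencies_ms),
    }

if __name__ == "__main__":
    # Hypothetical sample: mostly fast messages with two slow outliers.
    sample = [1.0] * 98 + [50.0, 80.0]
    print(summarize(sample))
```

Comparing these three numbers between a CQv1 run and a CQv2 run, and between runs with disk monitoring on and off, would show whether the baseline and the spikes move in opposite directions.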

I am guessing this may be a result particular to my environment? I understand CQv2 writes more to disk and more often, so this may be an issue with my hardware? The error trace looks like it is getting a NaN, then failing and being restarted. I guess the NaN may be coming from something outside RabbitMQ, but perhaps it could be handled without a restart? It looks like restarting disk monitoring is correlated with increased latency.
 
Thank you very much for your time. Apologies if my guesses are wildly off, I don't know anything about RabbitMQ internals.


[error] <0.717169.0> ** Generic server rabbit_disk_monitor terminating
[error] <0.717169.0> ** Last message in was update
[error] <0.717169.0> ** When Server state == {state,"d:/RabbitMQLogsAndDB/db/rabbit@TESTSERVER1-mnesia",
[error] <0.717169.0>                                50000000,50144292864,100,10000,
[error] <0.717169.0>                                #Ref<0.499390760.3255828481.128053>,false,true,
[error] <0.717169.0>                                10,120000,
[error] <0.717169.0>                                {win32,nt},
[error] <0.717169.0>                                not_used}
[error] <0.717169.0> ** Reason for termination ==
[error] <0.717169.0> ** {badarith,[{erlang,'-',
[error] <0.717169.0>                       ['NaN',50000000],
[error] <0.717169.0>                       [{error_info,#{module => erl_erts_errors}}]},
[error] <0.717169.0>               {rabbit_disk_monitor,interval,1,
[error] <0.717169.0>                                    [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0>                                     {line,428}]},
[error] <0.717169.0>               {rabbit_disk_monitor,start_timer,1,
[error] <0.717169.0>                                    [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0>                                     {line,419}]},
[error] <0.717169.0>               {rabbit_disk_monitor,handle_info,2,
[error] <0.717169.0>                                    [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0>                                     {line,194}]},
[error] <0.717169.0>               {gen_server,try_handle_info,3,
[error] <0.717169.0>                           [{file,"gen_server.erl"},{line,1095}]},
[error] <0.717169.0>               {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1183}]},
[error] <0.717169.0>               {proc_lib,init_p_do_apply,3,
[error] <0.717169.0>                         [{file,"proc_lib.erl"},{line,241}]}]}
[error] <0.717169.0>
[error] <0.717169.0>   crasher:
[error] <0.717169.0>     initial call: rabbit_disk_monitor:init/1
[error] <0.717169.0>     pid: <0.717169.0>
[error] <0.717169.0>     registered_name: rabbit_disk_monitor
[error] <0.717169.0>     exception error: an error occurred when evaluating an arithmetic expression
[error] <0.717169.0>       in operator  -/2
[error] <0.717169.0>          called as 'NaN' - 50000000
[error] <0.717169.0>       in call from rabbit_disk_monitor:interval/1 (rabbit_disk_monitor.erl, line 428)
[error] <0.717169.0>       in call from rabbit_disk_monitor:start_timer/1 (rabbit_disk_monitor.erl, line 419)
[error] <0.717169.0>       in call from rabbit_disk_monitor:handle_info/2 (rabbit_disk_monitor.erl, line 194)
[error] <0.717169.0>       in call from gen_server:try_handle_info/3 (gen_server.erl, line 1095)
[error] <0.717169.0>       in call from gen_server:handle_msg/6 (gen_server.erl, line 1183)
[error] <0.717169.0>     ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.233.0>]
[error] <0.717169.0>     message_queue_len: 0
[error] <0.717169.0>     messages: []
[error] <0.717169.0>     links: [<0.365.0>]
[error] <0.717169.0>     dictionary: []
[error] <0.717169.0>     trap_exit: false
[error] <0.717169.0>     status: running
[error] <0.717169.0>     heap_size: 10958
[error] <0.717169.0>     stack_size: 28
[error] <0.717169.0>     reductions: 26706
[error] <0.717169.0>   neighbours:
[error] <0.717169.0>
[error] <0.365.0>     supervisor: {local,rabbit_disk_monitor_sup}
[error] <0.365.0>     errorContext: child_terminated
[error] <0.365.0>     reason: {badarith,
[error] <0.365.0>                 [{erlang,'-',
[error] <0.365.0>                      ['NaN',50000000],
[error] <0.365.0>                      [{error_info,#{module => erl_erts_errors}}]},
[error] <0.365.0>                  {rabbit_disk_monitor,interval,1,
[error] <0.365.0>                      [{file,"rabbit_disk_monitor.erl"},{line,428}]},
[error] <0.365.0>                  {rabbit_disk_monitor,start_timer,1,
[error] <0.365.0>                      [{file,"rabbit_disk_monitor.erl"},{line,419}]},
[error] <0.365.0>                  {rabbit_disk_monitor,handle_info,2,
[error] <0.365.0>                      [{file,"rabbit_disk_monitor.erl"},{line,194}]},
[error] <0.365.0>                  {gen_server,try_handle_info,3,
[error] <0.365.0>                      [{file,"gen_server.erl"},{line,1095}]},
[error] <0.365.0>                  {gen_server,handle_msg,6,
[error] <0.365.0>                      [{file,"gen_server.erl"},{line,1183}]},
[error] <0.365.0>                  {proc_lib,init_p_do_apply,3,
[error] <0.365.0>                      [{file,"proc_lib.erl"},{line,241}]}]}
[error] <0.365.0>     offender: [{pid,<0.717169.0>},
[error] <0.365.0>                {id,rabbit_disk_monitor},
[error] <0.365.0>                {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
[error] <0.365.0>                {restart_type,{transient,1}},
[error] <0.365.0>                {shutdown,300000},
[error] <0.365.0>                {child_type,worker}]

Mc Polu

Feb 22, 2024, 5:11:44 AM
to rabbitmq-users
Looking at the rabbit_disk_monitor.erl source code, I am not getting "failed to retrieve" messages from line 343 in the log, so it doesn't look like a timeout.

Luke Bakken

Feb 22, 2024, 1:42:46 PM
to rabbitmq-users
Thanks for providing the stack trace.


In your environment, that code returned "NaN" which was used as "Actual" later in this calculation:


... which resulted in the "badarith" error.
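
The failure mode in the trace ('NaN' - 50000000 inside interval/1) can be illustrated in miniature. The sketch below is Python, not Erlang, and it is not RabbitMQ's code. It assumes that 100 and 10000 in the server state are the minimum and maximum check intervals in milliseconds; the rate constant, the exact formula, and the guard are hypothetical and only show how a non-numeric sentinel breaks the subtraction and how a check could avoid the crash.

```python
NAN = "NaN"  # stand-in for the Erlang atom 'NaN' seen in the trace

def unguarded_interval_ms(actual, limit, min_ms=100, max_ms=10_000, rate=250_000):
    """Pick the next check interval from the headroom above the limit.

    `rate` (bytes per millisecond) and the formula itself are assumptions.
    """
    # Raises TypeError when `actual` is the NAN sentinel, much like badarith.
    ideal = 2 * (actual - limit) / rate
    return int(max(min_ms, min(max_ms, ideal)))

def guarded_interval_ms(actual, limit, min_ms=100, max_ms=10_000, rate=250_000):
    """Hypothetical guard: a non-numeric reading falls back to the slowest interval."""
    if isinstance(actual, bool) or not isinstance(actual, (int, float)):
        return max_ms
    return unguarded_interval_ms(actual, limit, min_ms, max_ms, rate)

if __name__ == "__main__":
    limit = 50_000_000
    print(guarded_interval_ms(50_144_292_864, limit))  # plenty of headroom
    print(guarded_interval_ms(NAN, limit))             # sentinel handled, no crash
    try:
        unguarded_interval_ms(NAN, limit)
    except TypeError as exc:
        print("unguarded version failed:", exc)
```

Falling back to the slowest interval is just one possible choice; retrying sooner or keeping the last known value would be equally valid designs.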

So, either the call to the external utility that gets free space timed out, or some other error happened, in which case you would see "Free disk space monitoring failed to retrieve the amount of available space" in your log file. Since you say you don't see that message, it must be due to a timeout.

We can add code to deal with this specific situation, but you should ask why a call to get the free disk space is timing out in your environment. Probably due to too much load, or a slow disk subsystem.
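
One way to start investigating is to time a free-space query against the data directory's volume and watch for slow responses under load. The following is a minimal sketch in Python using the standard library's `shutil.disk_usage`. It does not use the same mechanism as RabbitMQ's disk monitor, so it only gives a rough indication of how responsive the disk subsystem is; the helper name `timed_free_space` is made up for illustration.

```python
import shutil
import time

def timed_free_space(path):
    """Return (free_bytes, elapsed_seconds) for a free-space query on `path`."""
    start = time.perf_counter()
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    elapsed = time.perf_counter() - start
    return usage.free, elapsed

if __name__ == "__main__":
    # Replace "." with the node's data directory when running this for real.
    free_bytes, seconds = timed_free_space(".")
    print(f"free={free_bytes} bytes, query took {seconds * 1000:.2f} ms")
```

Running this repeatedly while the node is under test load, as the same user the RabbitMQ service runs as, may show whether slow queries line up with the latency spikes.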

Thanks,
Luke

Luke Bakken

Feb 22, 2024, 1:45:07 PM
to rabbitmq-users
Please follow this issue if you're interested - https://github.com/rabbitmq/rabbitmq-server/issues/10597

Vilius Šumskas

Feb 22, 2024, 5:22:09 PM
to rabbitm...@googlegroups.com

A couple of thoughts: check whether you have any disconnected but still mapped network drives, which usually do not play nicely with disk/free-space enumeration software. Remember to check this under the same user account that your RabbitMQ instance runs as.

Running RabbitMQ from network-attached storage could also be problematic.

--

    Vilius

Luke Bakken

Feb 22, 2024, 5:56:38 PM
to rabbitmq-users
Thank you Vilius, these are excellent suggestions!