Thank you for the quick response; it is much appreciated.
I am getting the error trace below a few times a day. I am testing CQv2 because I understand it will become the default in 3.13 and, later, the only option. I am measuring latency. When I graph latency over time, the chart is spiky: most of the time latency is very low, but every now and then some messages take much longer. With CQv2 the baseline is better (most messages flow faster), but the spikes are worse: when messages are slow, they are slower. Looking at the logs, I found these errors, so I disabled disk monitoring. After disabling it, the spikes got better.
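To put numbers on "the baseline is better but the spikes are worse", one option is to compare low and high percentiles of the per-message latencies from each run instead of only eyeballing the chart. The sketch below is a minimal nearest-rank percentile summary; it assumes latencies have already been collected as a list of milliseconds per message, and the function names and sample data are hypothetical.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of the samples, for p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least p% of samples at or below it.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float]) -> dict[str, float]:
    """p50 describes the baseline; p99, p99.9 and max describe the spikes."""
    return {
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "p99.9": percentile(latencies_ms, 99.9),
        "max": max(latencies_ms),
    }
```

For example, with hypothetical data of 990 messages at 1 ms and 10 messages at 50 ms, `summarize` reports a p50 and p99 of 1.0 but a p99.9 and max of 50.0. A CQv2 run with a lower p50 but a higher p99.9 than a CQv1 run would match the pattern described above.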
I am guessing this may be particular to my environment. I understand CQv2 writes to disk more, and more often, so could this be an issue with my hardware? The error trace looks like the disk monitor gets a NaN, fails, and is restarted. I guess the NaN may be coming from something outside RabbitMQ, but perhaps it could be handled without a restart? Restarts of the disk monitor appear to be correlated with increased latency.
Thank you very much for your time. Apologies if my guesses are wildly off; I don't know anything about RabbitMQ internals.
[error] <0.717169.0> ** Generic server rabbit_disk_monitor terminating
[error] <0.717169.0> ** Last message in was update
[error] <0.717169.0> ** When Server state == {state,"d:/RabbitMQLogsAndDB/db/rabbit@TESTSERVER1-mnesia",
[error] <0.717169.0> 50000000,50144292864,100,10000,
[error] <0.717169.0> #Ref<0.499390760.3255828481.128053>,false,true,
[error] <0.717169.0> 10,120000,
[error] <0.717169.0> {win32,nt},
[error] <0.717169.0> not_used}
[error] <0.717169.0> ** Reason for termination ==
[error] <0.717169.0> ** {badarith,[{erlang,'-',
[error] <0.717169.0> ['NaN',50000000],
[error] <0.717169.0> [{error_info,#{module => erl_erts_errors}}]},
[error] <0.717169.0> {rabbit_disk_monitor,interval,1,
[error] <0.717169.0> [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0> {line,428}]},
[error] <0.717169.0> {rabbit_disk_monitor,start_timer,1,
[error] <0.717169.0> [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0> {line,419}]},
[error] <0.717169.0> {rabbit_disk_monitor,handle_info,2,
[error] <0.717169.0> [{file,"rabbit_disk_monitor.erl"},
[error] <0.717169.0> {line,194}]},
[error] <0.717169.0> {gen_server,try_handle_info,3,
[error] <0.717169.0> [{file,"gen_server.erl"},{line,1095}]},
[error] <0.717169.0> {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,1183}]},
[error] <0.717169.0> {proc_lib,init_p_do_apply,3,
[error] <0.717169.0> [{file,"proc_lib.erl"},{line,241}]}]}
[error] <0.717169.0>
[error] <0.717169.0> crasher:
[error] <0.717169.0> initial call: rabbit_disk_monitor:init/1
[error] <0.717169.0> pid: <0.717169.0>
[error] <0.717169.0> registered_name: rabbit_disk_monitor
[error] <0.717169.0> exception error: an error occurred when evaluating an arithmetic expression
[error] <0.717169.0> in operator -/2
[error] <0.717169.0> called as 'NaN' - 50000000
[error] <0.717169.0> in call from rabbit_disk_monitor:interval/1 (rabbit_disk_monitor.erl, line 428)
[error] <0.717169.0> in call from rabbit_disk_monitor:start_timer/1 (rabbit_disk_monitor.erl, line 419)
[error] <0.717169.0> in call from rabbit_disk_monitor:handle_info/2 (rabbit_disk_monitor.erl, line 194)
[error] <0.717169.0> in call from gen_server:try_handle_info/3 (gen_server.erl, line 1095)
[error] <0.717169.0> in call from gen_server:handle_msg/6 (gen_server.erl, line 1183)
[error] <0.717169.0> ancestors: [rabbit_disk_monitor_sup,rabbit_sup,<0.233.0>]
[error] <0.717169.0> message_queue_len: 0
[error] <0.717169.0> messages: []
[error] <0.717169.0> links: [<0.365.0>]
[error] <0.717169.0> dictionary: []
[error] <0.717169.0> trap_exit: false
[error] <0.717169.0> status: running
[error] <0.717169.0> heap_size: 10958
[error] <0.717169.0> stack_size: 28
[error] <0.717169.0> reductions: 26706
[error] <0.717169.0> neighbours:
[error] <0.717169.0>
[error] <0.365.0> supervisor: {local,rabbit_disk_monitor_sup}
[error] <0.365.0> errorContext: child_terminated
[error] <0.365.0> reason: {badarith,
[error] <0.365.0> [{erlang,'-',
[error] <0.365.0> ['NaN',50000000],
[error] <0.365.0> [{error_info,#{module => erl_erts_errors}}]},
[error] <0.365.0> {rabbit_disk_monitor,interval,1,
[error] <0.365.0> [{file,"rabbit_disk_monitor.erl"},{line,428}]},
[error] <0.365.0> {rabbit_disk_monitor,start_timer,1,
[error] <0.365.0> [{file,"rabbit_disk_monitor.erl"},{line,419}]},
[error] <0.365.0> {rabbit_disk_monitor,handle_info,2,
[error] <0.365.0> [{file,"rabbit_disk_monitor.erl"},{line,194}]},
[error] <0.365.0> {gen_server,try_handle_info,3,
[error] <0.365.0> [{file,"gen_server.erl"},{line,1095}]},
[error] <0.365.0> {gen_server,handle_msg,6,
[error] <0.365.0> [{file,"gen_server.erl"},{line,1183}]},
[error] <0.365.0> {proc_lib,init_p_do_apply,3,
[error] <0.365.0> [{file,"proc_lib.erl"},{line,241}]}]}
[error] <0.365.0> offender: [{pid,<0.717169.0>},
[error] <0.365.0> {id,rabbit_disk_monitor},
[error] <0.365.0> {mfargs,{rabbit_disk_monitor,start_link,[50000000]}},
[error] <0.365.0> {restart_type,{transient,1}},
[error] <0.365.0> {shutdown,300000},
[error] <0.365.0> {child_type,worker}]
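In the trace, `rabbit_disk_monitor:interval/1` evaluates `'NaN' - 50000000`, where 50000000 is the disk free limit passed to `start_link`. So a value that is apparently the free-space reading is not a number when it reaches the arithmetic, the process crashes with `badarith`, and `rabbit_disk_monitor_sup` restarts it. The Python sketch below only illustrates the idea of validating the reading before that subtraction. It is not RabbitMQ's code: the function names, the `FAST_RATE_BYTES_PER_MS` constant, the interval formula, and the interpretation of 100 and 10000 in the state dump as the minimum and maximum check intervals in milliseconds are all assumptions.

```python
import math
from typing import Optional, Union

# Hypothetical constants for illustration only.
LIMIT_BYTES = 50_000_000           # the disk free limit seen in the trace
MIN_INTERVAL_MS = 100              # assumed minimum check interval
MAX_INTERVAL_MS = 10_000           # assumed maximum check interval
FAST_RATE_BYTES_PER_MS = 250_000   # assumed worst-case rate at which the disk fills

Reading = Union[int, float, str, None]


def is_valid_reading(free: Reading) -> bool:
    """True only for a finite, non-negative numeric free-space reading."""
    if isinstance(free, bool) or not isinstance(free, (int, float)):
        return False  # e.g. the string "NaN", None, or a bool
    return math.isfinite(free) and free >= 0


def next_interval(free: Reading, last_good: Optional[int]) -> tuple[int, Optional[int]]:
    """Hypothetical helper: return (interval_ms, free_value_to_keep) without raising."""
    if not is_valid_reading(free):
        # Bad sample: keep the last good value and re-check soon,
        # instead of letting the subtraction below fail.
        return MIN_INTERVAL_MS, last_good
    # More headroom above the limit means the next check can wait longer.
    ideal = 2 * (free - LIMIT_BYTES) / FAST_RATE_BYTES_PER_MS
    interval = int(max(MIN_INTERVAL_MS, min(MAX_INTERVAL_MS, ideal)))
    return interval, int(free)
```

Under these assumptions, `next_interval("NaN", 50144292864)` returns `(100, 50144292864)`: the bad sample is skipped, the last good value is kept, and the check is retried quickly rather than the process crashing and being restarted. Keeping the last good value, instead of treating a bad sample as zero free space, avoids raising a spurious disk alarm from a single bad reading; whether that is the right policy for RabbitMQ is a separate question.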