rabbitmq-server crash : Monotonic time stepped backwards!

142 views
Skip to first unread message

jose cm

unread,
Jan 8, 2024, 6:55:42 AMJan 8
to rabbitmq-users
Hi people,

We are using RabbitMQ server cluster with 4 nodes.
All 4 nodes are identical, running rabbit  3.11.10 and Erlang OTP 25.2.3
on Suse SLES 15 sp4

The issue is that randomly (twice a  week) systemd service reports
monotonic time has gone backwards and restarts the rabbitmq server service.

as seen in systemd logs :

Jan 03 14:16:49 srv1wc0 rabbitmq-server[27462]: Monotonic time stepped backwards!
Jan 03 14:16:49 srv1wc0 rabbitmq-server[27462]: Previous time: 2861975396900129
Jan 03 14:16:49 srv1wc0 rabbitmq-server[27462]: Current time:  2861975396900128
Jan 03 14:16:50 srv1wc0 systemd[1]: URabbitMQ.service: Main process exited, code=killed, status=6/ABRT

Jan 03 14:16:50 srv1wc0 systemd[1]: URabbitMQ.service: Failed with result 'signal'.
Jan 03 14:17:20 srv1wc0 systemd[1]: URabbitMQ.service: Scheduled restart job, restart counter is at 1.

on the rabbit node logs we see a msg about Mnesia db stopped then the typical restart logs (systemd controlled) :

2024-01-03 14:17:25.630012+01:00 [notice] <0.44.0> Application mnesia exited with reason: stopped
2024-01-03 14:17:25.630223+01:00 [notice] <0.232.0> Feature flags: checking nodes `rabbit@RMAHRF02` and `rabbit@RMAHRB01` compatibility...
2024-01-03 14:17:25.633573+01:00 [notice] <0.232.0> Feature flags: nodes `rabbit@RMAHRF02` and `rabbit@RMAHRB01` are compatible
2024-01-03 14:17:25.656600+01:00 [info] <0.232.0> ra: starting system quorum_queues
2024-01-03 14:17:25.656676+01:00 [info] <0.232.0> starting Ra system: quorum_queues in directory: /users/rmq00/rmq/mnesia/rabbit@RMAHRF02/quorum/rabbit@RMAHRF02
2024-01-03 14:17:25.657611+01:00 [info] <0.338.0> ra system 'quorum_queues' running pre init for 1 registered servers
2024-01-03 14:17:25.658642+01:00 [info] <0.339.0> ra: meta data store initialised for system quorum_queues. 1 record(s) recovered
2024-01-03 14:17:25.658804+01:00 [notice] <0.344.0> WAL: ra_log_wal init, open tbls: ra_log_open_mem_tables, closed tbls: ra_log_closed_mem_tables
2024-01-03 14:17:25.662368+01:00 [info] <0.232.0> ra: starting system coordination
2024-01-03 14:17:25.662412+01:00 [info] <0.232.0> starting Ra system: coordination in directory: /users/rmq00/rmq/mnesia/rabbit@RMAHRF02/coordination/rabbit@RMAHRF02
2024-01-03 14:17:25.663298+01:00 [info] <0.355.0> ra system 'coordination' running pre init for 1 registered servers
2024-01-03 14:17:25.666412+01:00 [info] <0.356.0> ra: meta data store initialised for system coordination. 1 record(s) recovered
2024-01-03 14:17:25.666536+01:00 [notice] <0.365.0> WAL: ra_coordination_log_wal init, open tbls: ra_coordination_log_open_mem_tables, closed tbls: ra_coordination_log_closed_mem_tables
2024-01-03 14:17:26.505286+01:00 [info] <0.232.0>
2024-01-03 14:17:26.505286+01:00 [info] <0.232.0>  Starting RabbitMQ 3.11.10 on Erlang 25.2.3 [jit]

We tried adding Erlang VM parameter modifier :

RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+C multi_time_warp"

into cfg file rabbitmq-env.conf, but with no effect.


We don't know if the issue comes from  Rabbit itself , systemd or maybe something OS related (cpu/ntp,  etc).

So, please if someone can shed a light it will be much appreciated !

Cheers!

José.

Luke Bakken

unread,
Jan 8, 2024, 10:35:12 AMJan 8
to rabbitmq-users
Hello,

Just FYI, RabbitMQ clusters must be an odd number of nodes, so that a clear quorum can be established in the event of a network partition.

What virtualization environment are you using to run RabbitMQ?

There is a surprising lack of hits for the following search -



Upgrading Erlang might help since the version you're using is almost a year old - https://github.com/erlang/otp/releases/tag/OTP-25.2.3

Thanks,
Luke

Luke Bakken

unread,
Jan 8, 2024, 11:00:07 AMJan 8
to rabbitmq-users
A colleague found this issue, which may be relevant: https://github.com/systemd/systemd/issues/28250
Reply all
Reply to author
Forward
Message has been deleted
0 new messages