We run RabbitMQ as a cluster of 3 nodes, all on version 3.7.8 with Erlang 21.1, on Linux VMs. We have some queues with the default setting (no mirroring), and partition handling is set to autoheal.
All nodes were started and messaging ran normally. After the VM upgrade, some instances, including the RabbitMQ node servers, hit network issues whenever the VM backup runs.
<< coremqprd4 >>
2020-05-11 18:04:05 =ERROR REPORT====
Mnesia(rabbit@coremqprd4): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@coremqprd5}
<< coremqprd5 >>
2020-05-11 18:04:18 =ERROR REPORT====
** Node rabbit@coremqprd4 not responding **
** Removing (timedout) connection **
2020-05-11 18:04:55 =ERROR REPORT====
Mnesia(rabbit@coremqprd5): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@coremqprd6}
2020-05-11 18:04:55 =ERROR REPORT====
** gen_event handler lager_exchange_backend crashed.
** Was installed in lager_event
** Last event was: {log,{lager_msg,[],[{pid,<0.42.0>}],info,{["2020",45,"05",45,"11"],["18",58,"04",58,"55",46,"616"]},{1589,195095,616397},[65,112,112,108,105,99,97,116,105,111,110,32,"mnesia",32,101,120,105,116,101,100,32,119,105,116,104,32,114,101,97,115,111,110,58,32,"stopped"]}}
** When handler state == {state,{mask,127},lager_default_formatter,[date," ",time," ",color,"[",severity,"] ",{pid,[]}," ",message,"\n"],-573485176,{resource,<<"/">>,exchange,<<"amq.rabbitmq.log">>}}
** Reason == {badarg,[{rabbit_misc,dirty_read,1,[]},{rabbit_basic,publish,1,[]}]}
2020-05-11 18:04:55 =ERROR REPORT====
Mnesia(rabbit@coremqprd5): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@coremqprd4}
2020-05-11 18:04:55 =ERROR REPORT====
Mnesia(rabbit@coremqprd5): ** ERROR ** mnesia_event got {inconsistent_database, starting_partitioned_network, rabbit@coremqprd6}
<< coremqprd6 >>
2020-05-11 18:04:47 =SUPERVISOR REPORT====
Supervisor: {<0.30081.13>,rabbit_channel_sup}
Context: shutdown_error
Reason: noproc
Offender: [{pid,<0.30082.13>},{name,writer},{mfargs,{rabbit_writer,start_link,[#Port<0.439721>,1,4096,rabbit_framing_amqp_0_9_1,<0.30075.13>,{<<"172.16.3.199:24530 -> 172.16.3.214:5672">>,1},true]}},{restart_type,intrinsic},{shutdown,70000},{child_type,worker}]
2020-05-11 18:04:49 =ERROR REPORT====
Mnesia(rabbit@coremqprd6): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, rabbit@coremqprd5}
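To make the order of events across the three excerpts easier to follow, the Mnesia partition errors can be pulled into a single timeline with a short script. This is only a sketch: the function name `partition_events` is hypothetical, and it assumes each report is a timestamp header line followed by the message line, as in the excerpts above.

```python
import re

# Report header, e.g. "2020-05-11 18:04:05 =ERROR REPORT===="
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) =\w+ REPORT====")

# Mnesia partition message, e.g.
# "Mnesia(rabbit@a): ** ERROR ** mnesia_event got
#  {inconsistent_database, running_partitioned_network, rabbit@b}"
EVENT = re.compile(
    r"^Mnesia\((?P<node>[^)]+)\): \*\* ERROR \*\* mnesia_event got "
    r"\{inconsistent_database, (?P<kind>\w+), (?P<peer>[^}]+)\}"
)


def partition_events(lines):
    """Return sorted (timestamp, reporting node, kind, peer) tuples.

    Hypothetical helper: it assumes the log layout shown in the excerpts,
    where the timestamp header precedes the Mnesia message line.
    """
    events = []
    timestamp = ""  # stays empty if a message appears before any header
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            timestamp = header.group(1)
            continue
        event = EVENT.match(line)
        if event:
            events.append(
                (timestamp, event["node"], event["kind"], event["peer"])
            )
    return sorted(events)


if __name__ == "__main__":
    sample = [
        "2020-05-11 18:04:05 =ERROR REPORT====",
        "Mnesia(rabbit@coremqprd4): ** ERROR ** mnesia_event got "
        "{inconsistent_database, running_partitioned_network, rabbit@coremqprd5}",
    ]
    for event in partition_events(sample):
        print(event)
```

Feeding it the concatenated log lines from all three nodes gives one list ordered by timestamp, which shows which node reported the partition first and against which peer.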
The next problem is that, after the network partition event, queues intermittently go into the down state. The delay is quite long: it occurs the next day, when many applications are accessing RabbitMQ.