Node not responding, partial partition detected, queue cannot be deleted and cannot be automatically restored


Wong Energy

Jun 17, 2024, 11:19:31 PM
to rabbitmq-users
Hi,
I have run into a problem with my RabbitMQ cluster. Can anyone help me find a solution?

Env:
3-node cluster running on CentOS 7.9 on Azure virtual machines
RabbitMQ version: 3.8.12
Erlang version: 22.3
Failure time: 2024-06-14 08:45
Failure symptoms: synchronization between nodes failed, manual synchronization cannot be performed, and queues cannot be deleted
Failure screenshots:



1st node logs
2024-06-14 08:45:19.960 [error] <0.28732.13> ** Node 'rabbit@RMQ-03' not responding **
** Removing (timedout) connection **
2024-06-14 08:45:21.227 [error] <0.579.0> Partial partition detected:
 * We saw DOWN from rabbit@RMQ-03
 * We can still see rabbit@RMQ-02 which can see rabbit@RMQ-03
We will therefore intentionally disconnect from rabbit@RMQ-02
2024-06-14 08:45:23.252 [error] <0.367.0> Mnesia('rabbit@RMQ-01'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@RMQ-02'}
2024-06-14 08:45:36.994 [error] <0.26213.4947> Error on AMQP connection <0.26213.4947> (90.180.108.9:33542 -> 90.180.108.9:5672 - Application-01-MessageApplicationCenter-23304-107, vhost: '/', user: 'google', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2024-06-14 08:45:36.994 [error] <0.2480.1741> Error on AMQP connection <0.2480.1741> (90.180.108.9:35584 -> 90.180.108.9:5672, vhost: '/', user: 'google', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2024-06-14 08:45:36.995 [error] <0.27912.2151> Error on AMQP connection <0.27912.2151> (90.180.108.10:48054 -> 90.180.108.9:5672 - Application-01-MessageApplicationCenter-34132-533, vhost: '/', user: 'google', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
2024-06-14 08:45:36.995 [error] <0.21691.2096> Error on AMQP connection <0.21691.2096> (90.180.108.10:34420 -> 90.180.108.9:5672, vhost: '/', user: 'google', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
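
The "Partial partition detected" entry above describes a simple rule: this node saw one peer go DOWN, it can still reach a second peer, and that second peer reports it can still see the first, so this node disconnects from the second peer. A minimal sketch of that rule as the log states it follows. It is a toy model for reading the log, not RabbitMQ's actual implementation, and the function name and arguments are hypothetical.

```python
def partial_partition_victim(down_node, visible_peers, peer_views):
    """Return the peer to disconnect from, or None if no partial partition.

    down_node:     node this node just saw go DOWN
    visible_peers: peers this node can still reach
    peer_views:    mapping of peer -> set of nodes that peer says it can see
    (All names are illustrative assumptions, not RabbitMQ identifiers.)
    """
    for peer in visible_peers:
        # A peer we can reach still sees the node we lost: the views disagree.
        if down_node in peer_views.get(peer, set()):
            return peer
    return None


# Values taken from the RMQ-01 log excerpt above.
victim = partial_partition_victim(
    "rabbit@RMQ-03",
    ["rabbit@RMQ-02"],
    {"rabbit@RMQ-02": {"rabbit@RMQ-01", "rabbit@RMQ-03"}},
)
print(victim)
```

With the values from the first node's log, the rule picks rabbit@RMQ-02, which matches the "We will therefore intentionally disconnect from rabbit@RMQ-02" line.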

2nd node logs
2024-06-14 08:45:21.228 [error] <0.583.0> Partial partition disconnect from rabbit@RMQ-01
2024-06-14 08:45:23.252 [error] <0.366.0> Mnesia('rabbit@RMQ-02'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@RMQ-01'}
2024-06-14 08:45:55.044 [error] <0.18652.6384> closing AMQP connection <0.18652.6384> (90.180.108.9:40604 -> 90.180.108.10:5672):
{handshake_timeout,frame_header}
2024-06-14 08:45:55.145 [error] <0.24850.6384> closing AMQP connection <0.24850.6384> (90.180.108.9:41194 -> 90.180.108.10:5672):
{handshake_timeout,frame_header}
2024-06-14 08:45:55.225 [error] <0.22671.4317> closing AMQP connection <0.22671.4317> (90.180.108.9:41564 -> 90.180.108.10:5672):
{handshake_timeout,frame_header}
2024-06-14 08:45:55.266 [error] <0.27855.6384> closing AMQP connection <0.27855.6384> (90.180.108.9:41696 -> 90.180.108.10:5672):



3rd node logs
2024-06-14 08:45:35.007 [error] <0.277.0> ** Node 'rabbit@RMQ-01' not responding **
** Removing (timedout) connection **
2024-06-14 08:45:35.222 [error] <0.367.0> Mnesia('rabbit@RMQ-03'): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, 'rabbit@RMQ-01'}
2024-06-14 08:45:36.897 [error] <0.14541.5425> CRASH REPORT Process <0.14541.5425> with 0 neighbours exited with reason: {shutdown,{gen_server,call,[<0.5695.0>,{fetch,#Fun<rabbit_mgmt_db.23.17456294>,[[[{name,<<"HaFINTG-WXScoreOrderConfirmV3TaskQueue">>},{vhost,<<"/">>},{durable,true},{auto_delete,false},{exclusive,false},{owner_pid,none},{arguments,#{}},{pid,<18945.808.0>},{type,classic},{state,live},{slave_nodes,['rabbit@RMQ-03']},{synchronised_slave_nodes,['rabbit@RMQ-03']},{node,'rabbit@RMQ-02'}],[{name,<<"HaPayMgr-BillDailyStatisticsDataFillTaskQueue">>},{vhost,<<"/">>},{durable,true},{...},...],...]]},...]}} in gen_server:call/3 line 223 in gen_server:call/3 line 223
2024-06-14 08:45:36.949 [error] <0.25091.3207> Error on AMQP connection <0.25091.3207> (90.180.108.10:36536 -> 90.180.108.11:5672 - Application-01-MessageApplicationCenter-23304-398, vhost: '/', user: 'Teld', state: running), channel 0:
 operation none caused a connection exception connection_forced: "broker forced connection closure with reason 'shutdown'"
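
Because each node logged the failure at a slightly different time, it can help to merge the partition-related entries from all three logs into one timeline. The following is a rough sketch that assumes the log lines have the same shape as the excerpts above (timestamp, level, pid, message). The helper names, the keyword list, and the short node labels are assumptions made for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2024-06-14 08:45:21.227 [error] <0.579.0> Partial partition detected:
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] <[\d.]+> (.*)$"
)
# Substrings that mark partition-related entries (assumed list, extend as needed).
KEYWORDS = ("not responding", "Partial partition", "inconsistent_database")


def partition_events(node, log_text):
    """Extract (timestamp, node, message) tuples for partition-related lines."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and any(key in match.group(3) for key in KEYWORDS):
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((ts, node, match.group(3)))
    return events


def merged_timeline(logs_by_node):
    """Merge events from several nodes and sort them by timestamp."""
    events = []
    for node, text in logs_by_node.items():
        events.extend(partition_events(node, text))
    return sorted(events)


# Lines copied from the three log excerpts in this post.
SAMPLE = {
    "RMQ-01": (
        "2024-06-14 08:45:19.960 [error] <0.28732.13> ** Node 'rabbit@RMQ-03' not responding **\n"
        "2024-06-14 08:45:21.227 [error] <0.579.0> Partial partition detected:\n"
    ),
    "RMQ-02": (
        "2024-06-14 08:45:21.228 [error] <0.583.0> Partial partition disconnect from rabbit@RMQ-01\n"
    ),
    "RMQ-03": (
        "2024-06-14 08:45:35.007 [error] <0.277.0> ** Node 'rabbit@RMQ-01' not responding **\n"
    ),
}

timeline = merged_timeline(SAMPLE)
for ts, node, message in timeline:
    print(ts.isoformat(sep=" ", timespec="milliseconds"), node, message)
```

Sorting by timestamp only makes sense if the clocks on the three VMs are reasonably in sync, which is an assumption here.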

My config (attached image):

2024-06-18_111034.jpg


What are the possible causes of this problem, and is there any configuration I can apply to mitigate it? I am also working on upgrading the version, but CentOS 7 cannot currently install the RHEL 8 RPM package, so I need something I can do in the meantime.



Thanks 
Joel Wong
