We have a strange problem that we can reproduce in both our production and development environments. We have a two-node RabbitMQ cluster with several queues, some of which are declared with the "x-single-active-consumer" argument.
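For reference, the queues are declared roughly as follows (a minimal sketch using the Java client; the queue name is anonymized, and the arguments match the server state shown in the logs below):

    import com.rabbitmq.client.Channel;
    import java.util.HashMap;
    import java.util.Map;

    // "channel" is an open Channel on one of the cluster nodes
    Map<String, Object> args = new HashMap<>();
    args.put("x-single-active-consumer", true);
    args.put("x-dead-letter-exchange", "DLX");
    args.put("x-dead-letter-routing-key", "queue-name");
    // durable, non-exclusive, non-auto-delete
    channel.queueDeclare("queue-name", true, false, false, args);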
If a RabbitMQ node fails, the applications quickly reconnect to the other node, and consumers re-activate and continue receiving messages from the server. However, once the previously failed node starts up again, consumers of the "x-single-active-consumer" queues get disconnected and stop receiving messages. The management interface shows "Consumers: 0" for those queues. We use the Java client, and we get no notification from the channel that it was shut down: from the application's perspective no exception occurs, it simply stops receiving messages from those queues. Other queues and consumers that are not "single active" are not affected.
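For what it's worth, our consumer setup looks roughly like this (a sketch; handler bodies are placeholders). When the problem occurs, none of the marked callbacks fire, which is why the application never notices:

    import com.rabbitmq.client.AMQP;
    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.DefaultConsumer;
    import com.rabbitmq.client.Envelope;
    import com.rabbitmq.client.ShutdownSignalException;
    import java.io.IOException;

    channel.addShutdownListener(cause ->                      // never fires
        System.err.println("channel shut down: " + cause));
    channel.basicQos(1); // prefetch 1, manual acks, as seen in the crash report
    channel.basicConsume("queue-name", false, new DefaultConsumer(channel) {
        @Override
        public void handleDelivery(String consumerTag, Envelope envelope,
                AMQP.BasicProperties properties, byte[] body) throws IOException {
            // ... process message ...
            getChannel().basicAck(envelope.getDeliveryTag(), false);
        }
        @Override
        public void handleCancel(String consumerTag) {        // never fires
            System.err.println("consumer cancelled by broker: " + consumerTag);
        }
        @Override
        public void handleShutdownSignal(String consumerTag,
                ShutdownSignalException sig) {                // never fires
            System.err.println("consumer shut down: " + consumerTag + " " + sig);
        }
    });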
On the server we find the following logs (unfortunately they have been processed by our log parsers, so they may be slightly incomplete or out of order; they were all written at the same time, during startup of the previously failed node, and all come from that node):
CRASH REPORT Process <0.8703.126> with 2 neighbours exited with reason: no match of right hand value [{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in gen_server2:terminate/3 line 1183
** Generic server <0.8703.126> terminating
Supervisor {<0.8702.126>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"/">>,queue,<<"queue-name">>},true,false,none,[{<<"x-...">>,...},...],...}, slave, <0.8701.126>) at <0.8703.126> exit with reason no match of right hand value [{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in context child_terminated
** Reason for termination ==
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"queue-name">>},true,false,none,[{<<"x-single-active-consumer">>,bool,true},{<<"x-dead-letter-exchange">>,longstr,<<"DLX">>},{<<"x-dead-letter-routing-key">>,longstr,<<"queue-name">>}],<0.8703.126>,[<17542.1063.0>],[<17542.1063.0>],[rabbit@hostname],[{vhost,<<"/">>},{name,<<"ha-all">>},{pattern,<<".*">>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>},{<<"ha-sync-mode">>,<<"automatic">>}]},{priority,0}],undefined,[{<17542.1065.0>,<17542.1063.0>},{<0.8704.126>,<0.8703.126>}],[],live,0,[],<<"/">>,#{user => <<"appname">>},rabbit_classic_queue,#{}},{<0.9154.223>,{consumer,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,[],<<"appname">>}},true,rabbit_mirror_queue_master,{state,{resource,<<"/">>,queue,<<"queue-name">>},<0.8704.126>,<0.8946.223>,rabbit_priority_queue,{passthrough,rabbit_variable_queue,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},{0,{[],[]}},{0,{[],[]}},38916174,{0,nil},{0,nil},{0,nil},{qistate,"/var/lib/rabbitmq/mnesia/rabbit@hostname/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/3ETGAHFRKN79CNNCAJ89U632V",{#{},[{segment,2375,"/var/lib/rabbitmq/mnesia/rabbit@hostname/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/3ETGAHFRKN79CNNCAJ89U632V/2375.idx",{array,16384,0,undefined,{{{{{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,...},...},...},...},...}},...},...]},...},...}},...},...}
** {{badmatch,[{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}]},[{rabbit_amqqueue_process,handle_cast,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1646}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}
Ranch listener rabbit_web_dispatch_sup_15671, connection process <0.9820.223>, stream 3 had its request process <0.10453.223> exit with reason {{error,{gen_tcp_error,timeout}},{gen_server2,call,[<0.1961.0>,{submit,#Fun<rabbit_auth_backend_ldap.18.112915216>,<0.10453.223>,reuse},infinity]}} and stacktrace []
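For completeness, the queues are mirrored by a classic HA policy. Reconstructed from the server state in the crash report above, it is equivalent to:

    rabbitmqctl set_policy -p / ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}' --apply-to all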
Can anyone suggest what we could do to fix this? I was unable to find any reports of similar problems on the web.