RabbitMQ channel hosting x-single-active-consumer queue crashes at startup of rabbitmq node

754 views
Skip to first unread message

Alexey Sharayev

unread,
Dec 18, 2020, 7:27:57 AM12/18/20
to rabbitmq-users
Hello!

We have a strange problem that we are able to reproduce at both our production and development environments. We have a two-node rabbitmq server cluster with several queues and some of which are declared as "x-single-active-consumer".

If a rabbitmq server node fails, the applications quickly reconnect to another node and consumers re-activate and continue receiving messages from server. However, once the previously failed rabbitmq server node starts up, consumers of "x-single-active-consumer" queues get disconnected and stop receiving messages from the server. The management interface shows "Consumers: 0". We use Java client and we get no notification from the channel that it was shut down. From the application perspective, no exceptions occur, but it just stops receiving messages from those queues. However, the issue does not affect other queues and consumers that are not "single active".

In the server we can find the following logs (unfortunately they are processed by our log parsers, so may be a bit incomplete or in incorrect order, but all they were written at the same time at launch of the previously failed node, from that node):

CRASH REPORT Process <0.8703.126> with 2 neighbours exited with reason: no match of right hand value [{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in gen_server2:terminate/3 line 1183  
** Generic server <0.8703.126> terminating 
Supervisor {<0.8702.126>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"/">>,queue,<<"queue-name">>},true,false,none,[{<<"x-...">>,...},...],...}, slave, <0.8701.126>) at <0.8703.126> exit with reason no match of right hand value [{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in context child_terminated 
** Reason for termination == 
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"queue-name">>},true,false,none,[{<<"x-single-active-consumer">>,bool,true},{<<"x-dead-letter-exchange">>,longstr,<<"DLX">>},{<<"x-dead-letter-routing-key">>,longstr,<<"queue-name">>}],<0.8703.126>,[<17542.1063.0>],[<17542.1063.0>],[rabbit@hostname],[{vhost,<<"/">>},{name,<<"ha-all">>},{pattern,<<".*">>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>},{<<"ha-sync-mode">>,<<"automatic">>}]},{priority,0}],undefined,[{<17542.1065.0>,<17542.1063.0>},{<0.8704.126>,<0.8703.126>}],[],live,0,[],<<"/">>,#{user => <<"appname">>},rabbit_classic_queue,#{}},{<0.9154.223>,{consumer,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,[],<<"appname">>}},true,rabbit_mirror_queue_master,{state,{resource,<<"/">>,queue,<<"queue-name">>},<0.8704.126>,<0.8946.223>,rabbit_priority_queue,{passthrough,rabbit_variable_queue,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},{0,{[],[]}},{0,{[],[]}},38916174,{0,nil},{0,nil},{0,nil},{qistate,"/var/lib/rabbitmq/mnesia/rabbit@hostname/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/3ETGAHFRKN79CNNCAJ89U632V",{#{},[{segment,2375,"/var/lib/rabbitmq/mnesia/rabbit@hostname/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/3ETGAHFRKN79CNNCAJ89U632V/2375.idx",{array,16384,0,undefined,{{{{{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined,undefined},{undefined,undefined,...},...},...},...},...}},...},...]},...},...}},...},...}
** {{badmatch,[{<0.9154.223>,<<"amq.ctag-ngd5CFZVeynE34YNLJ8N4A">>,true,1,true,up,[],<<"appname">>},{<0.9500.223>,<<"amq.ctag-4ydS_g_H91l2HKW_UP_ObQ">>,true,1,true,up,[],<<"appname">>}]},[{rabbit_amqqueue_process,handle_cast,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1646}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}
Ranch listener rabbit_web_dispatch_sup_15671, connection process <0.9820.223>, stream 3 had its request process <0.10453.223> exit with reason {{error,{gen_tcp_error,timeout}},{gen_server2,call,[<0.1961.0>,{submit,#Fun<rabbit_auth_backend_ldap.18.112915216>,<0.10453.223>,reuse},infinity]}} and stacktrace []

Can anyone suggest what could we do to fix that? I was unable to find any similar problems report on the web.

jane...@gmail.com

unread,
Jan 5, 2021, 9:34:21 PM1/5/21
to rabbitmq-users
I'm seeing the exact same behavior as well.  I'm using the .NET client v5.2.0, RabbitMQ v3.8.3, Erlang v22.3.2.  I will report back when I know more.

Luke Bakken

unread,
Jan 6, 2021, 9:11:15 AM1/6/21
to rabbitmq-users
Hello,

Since you can reproduce this easily, please try with a three-node cluster. Two nodes is not recommended.

Also, please let us know what version of RabbitMQ and Erlang you're using.

Thanks,
Luke

Alexey Sharayev

unread,
Jan 6, 2021, 4:12:50 PM1/6/21
to rabbitmq-users
Hello,

We have created a test cluster with three nodes and were able to reproduce the same problem. RabbitMQ version 3.8.4, Erlang 23.0.2 in our test environment.
After a failed node has got back online, the consumers were silently disconnected and Java clients did not get any notification about that.

Here is the fresh log of our trial:

This is how the cluster looks like in the Management Console:
image (34).png
This is the queue view (durable and single-active-consumer, 1 Consumer is connected):
image (35).png
Now the restart is performed:
:~/project$ docker-compose restart rabbit2 

After the node has started, the queue has 0 consumers:
image (36).png
And here is the rabbitmq log:

2021-01-06 14:37:57.011 [info] <0.594.0> rabbit on node rabbit@ampq_2 down
2021-01-06 14:37:57.028 [info] <0.594.0> Keeping rabbit@ampq_2 listeners: the node is already back
2021-01-06 14:37:57.059 [info] <0.594.0> node rabbit@ampq_2 down: connection_closed
2021-01-06 14:38:02.060 [info] <0.594.0> node rabbit@ampq_2 up
2021-01-06 14:38:04.628 [info] <0.594.0> rabbit on node rabbit@ampq_2 up
2021-01-06 14:38:05.012 [error] <0.2895.0> ** Generic server <0.2895.0> terminating
** Last message in was {'$gen_cast',{force_event_refresh,#Ref<45383.464957429.3372744705.125508>}}
** When Server state == {q,{amqqueue,{resource,<<"/">>,queue,<<"betserver.csl.snapshots.v1">>},true,false,none,[{<<"x-single-active-consumer">>,bool,true}],<0.2895.0>,[],[],[],undefined,undefined,[],[],live,0,[],<<"/">>,#{user => <<"guest">>},rabbit_classic_queue,#{}},{<0.3112.0>,{consumer,<<"amq.ctag-CAB_HaNl_4LmXWVKndgGbA">>,true,1,[],<<"guest">>}},true,rabbit_priority_queue,{passthrough,rabbit_variable_queue,{vqstate,{0,{[],[]}},{0,{[],[]}},{delta,undefined,0,0,undefined},{0,{[],[]}},{0,{[],[]}},0,{0,nil},{0,nil},{0,nil},{qistate,"/var/lib/rabbitmq/mnesia/rabbit@ampq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/queues/ET9K5TCP35YWSRZDEZ9GB8AL3",{#{},[]},undefined,0,32768,#Fun<rabbit_variable_queue.2.43870038>,#Fun<rabbit_variable_queue.3.43870038>,{0,nil},{0,nil},[],[],{resource,<<"/">>,queue,<<"betserver.csl.snapshots.v1">>}},{{client_msstate,<0.653.0>,<<73,78,189,230,179,155,177,244,62,176,22,248,49,255,248,100>>,#{},{state,#Ref<0.1991788419.955908098.86562>,"/var/lib/rabbitmq/mnesia/rabbit@ampq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent"},rabbit_msg_store_ets_index,"/var/lib/rabbitmq/mnesia/rabbit@ampq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_persistent",<0.656.0>,#Ref<0.1991788419.955908098.86563>,#Ref<0.1991788419.955908098.86559>,#Ref<0.1991788419.955908098.86564>,#Ref<0.1991788419.955908098.86565>,{4000,800}},{client_msstate,<0.649.0>,<<157,83,131,6,16,120,36,10,31,55,202,150,46,200,50,242>>,#{},{state,#Ref<0.1991788419.955908098.86532>,"/var/lib/rabbitmq/mnesia/rabbit@ampq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient"},rabbit_msg_store_ets_index,"/var/lib/rabbitmq/mnesia/rabbit@ampq/msg_stores/vhosts/628WB79CIFDYO9LJI6DKMI09L/msg_store_transient",<0.650.0>,#Ref<0.1991788419.955908098.86533>,#Ref<0.1991788419.955908098.86530>,#Ref<0.1991788419.955908098.86534>,#Ref<0.1991788419.955908098.86535>,{4000,800}}},true,0,4096,...}},...}
** Reason for termination ==
** {{badmatch,[{<0.3112.0>,<<"amq.ctag-CAB_HaNl_4LmXWVKndgGbA">>,true,1,true,up,[],<<"guest">>}]},[{rabbit_amqqueue_process,handle_cast,2,[{file,"src/rabbit_amqqueue_process.erl"},{line,1646}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,236}]}]}
2021-01-06 14:38:05.013 [error] <0.2895.0> CRASH REPORT Process <0.2895.0> with 0 neighbours exited with reason: no match of right hand value [{<0.3112.0>,<<"amq.ctag-CAB_HaNl_4LmXWVKndgGbA">>,true,1,true,up,[],<<"guest">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in gen_server2:terminate/3 line 1183
2021-01-06 14:38:05.013 [error] <0.3915.0> Restarting crashed queue 'betserver.csl.snapshots.v1' in vhost '/'.
2021-01-06 14:38:05.013 [error] <0.2894.0> Supervisor {<0.2894.0>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"/">>,queue,<<"betserver.csl.snapshots.v1">>},true,false,none,[{<<"x-singl...">>,...}],...}, declare, <0.2893.0>) at <0.2895.0> exit with reason no match of right hand value [{<0.3112.0>,<<"amq.ctag-CAB_HaNl_4LmXWVKndgGbA">>,true,1,true,up,[],<<"guest">>}] in rabbit_amqqueue_process:handle_cast/2 line 1646 in context child_terminated

Luke Bakken

unread,
Jan 6, 2021, 5:17:24 PM1/6/21
to rabbitmq-users
Hello,

Thank you for re-running your test. The most current version of RabbitMQ is 3.8.9. I'll check the code to see if the "badmatch" error has been addressed. It is most helpful to us when issues are reproduced using the current version.

Luke

Alexey Sharayev

unread,
Jan 6, 2021, 5:28:43 PM1/6/21
to rabbitmq-users
Hello,

We have also reproduced earlier the same problem with the most recent stable version: 3.8.9 (sorry, I don't remember exact Erlang version number), but we also got the same problem so we have reverted our test environment back to the same version, as we have in production.
If it may help, we can retry the test with the most recent RabbitMQ version available on Docker.

Thanks,
Alexey.

Alexey Sharayev

unread,
Feb 9, 2021, 6:23:54 AM2/9/21
to rabbitmq-users
Hello,

Luke, have you had a chance to look at this issue? Any progress?

Thanks,
Alexey.

appscurry

unread,
Mar 21, 2021, 3:09:56 PM3/21/21
to rabbitmq-users
We have also encountered the same error on rabbitMQ version 3.8.11 Erlang 23.2.3

Started the thread before  I came across this one:  https://groups.google.com/g/rabbitmq-users/c/PR_GSKNpQrk

jane...@gmail.com

unread,
Jun 8, 2021, 1:52:28 PM6/8/21
to rabbitmq-users
Looks like this is fixed in 3.8.17? (https://github.com/rabbitmq/rabbitmq-server/issues/3072)

Alexey Sharayev

unread,
Jun 8, 2021, 2:07:47 PM6/8/21
to rabbitmq-users
That's great news! We'll test it as soon as 3.8.17 will be publicly available.
Reply all
Reply to author
Forward
0 new messages