Channel error on connection : operation queue.declare caused a channel exception not_found: failed to perform operation on queue


Avik Hazra

Sep 2, 2019, 5:00:22 AM
to rabbitmq-users
Team,

We are running a 3-node RMQ cluster on Windows Server 2008 R2, with pause_minority enabled as the partition handling strategy on all three nodes. Recently we experienced an outage with our application: a partial network partition took place which then recovered, but just after the recovery we saw heaps of "Channel error on connection" for all the queues and exchanges.
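For reference, the setting in question looks like this in rabbitmq.conf syntax (the classic Erlang-term config equivalent is shown as a comment):

```ini
# Classic config equivalent:
#   [{rabbit, [{cluster_partition_handling, pause_minority}]}].
cluster_partition_handling = pause_minority
```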

I cannot understand how a "Channel error on connection" can happen after the services recovered. Also, what does "Channel error on connection" mean?



Node 1

2019-09-01 00:29:56.106 [info] <0.7922.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@mo-143443df1': <11916.12363.378>
2019-09-01 00:30:06.032 [error] <0.2393.389> Channel error on connection <0.2382.389> (<rab...@MO-4EB8ABE07.1.2382.389>, vhost: 'DC###', user: 'DC###RMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'aliveness-test' in vhost 'DC###' due to timeout
2019-09-01 00:30:07.770 [info] <0.9695.389> accepting AMQP connection <0.9695.389> (10.170.152.5:55844 -> 10.170.153.92:5672)
2019-09-01 00:30:07.780 [info] <0.9695.389> connection <0.9695.389> (10.170.152.5:55844 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:25.192 [info] <0.11837.389> accepting AMQP connection <0.11837.389> (10.170.152.5:60819 -> 10.170.153.92:5672)
2019-09-01 00:30:25.203 [info] <0.11837.389> connection <0.11837.389> (10.170.152.5:60819 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:25.679 [info] <0.11895.389> accepting AMQP connection <0.11895.389> (10.170.152.5:51191 -> 10.170.153.92:5672)
2019-09-01 00:30:25.684 [info] <0.11895.389> connection <0.11895.389> (10.170.152.5:51191 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:26.067 [info] <0.11963.389> accepting AMQP connection <0.11963.389> (10.170.152.5:62856 -> 10.170.153.92:5672)
2019-09-01 00:30:26.073 [info] <0.11963.389> connection <0.11963.389> (10.170.152.5:62856 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:26.164 [info] <0.11988.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@MO-C91C65961': <9848.13017.385>
2019-09-01 00:30:26.178 [info] <0.11988.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@mo-143443df1': <11916.12502.378>
2019-09-01 00:30:26.189 [error] <0.11979.389> Channel error on connection <0.11

Node 2

2019-09-01 00:28:06.163 [info] <0.15436.1> rabbit on node 'rabbit@MO-4EB8ABE07' down
2019-09-01 00:28:06.194 [info] <0.15436.1> Keeping rabbit@MO-4EB8ABE07 listeners: the node is already back
2019-09-01 00:28:06.210 [error] <0.11500.378> ** Generic server <0.11500.378> terminating
** Last message in was {'$gen_cast',go}
** When Server state == {not_started,{amqqueue,{resource,<<"PC17Prod">>,queue,<<"Job_Servers">>},true,false,none,[],<37715.842.0>,[],[],['rabbit@MO-4EB8ABE07'],[{vhost,<<"PC17Prod">>},{name,<<"ha-all">>},{pattern,<<"^">>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>}]},{priority,0}],undefined,[{<37715.843.0>,<37715.842.0>}],[],live,1,[],<<"PC17Prod">>,#{user => <<"PC17ProdRMQUser">>}}}
** Reason for termination == 
** {{badmatch,{error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}}},[{rabbit_variable_queue,delete_crashed,1,[{file,"src/rabbit_variable_queue.erl"},{line,624}]},{rabbit_mirror_queue_slave,handle_go,1,[{file,"src/rabbit_mirror_queue_slave.erl"},{line,123}]},{rabbit_mirror_queue_slave,handle_cast,2,[{file,"src/rabbit_mirror_queue_slave.erl"},{line,280}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1050}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
2019-09-01 00:28:06.210 [error] <0.11500.378> CRASH REPORT Process <0.11500.378> with 1 neighbours exited with reason: no match of right hand value {error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}} in rabbit_variable_queue:delete_crashed/1 line 624 in gen_server2:terminate/3 line 1166
2019-09-01 00:28:06.210 [warning] <0.15647.373> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:28:06.210 [error] <0.11481.378> Supervisor {<0.11481.378>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"PC17Prod">>,queue,<<"Job_Servers">>},true,false,none,[],<37715.842.0>,[],...}, slave, <0.11486.378>) at <0.11500.378> exit with reason no match of right hand value {error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}} in rabbit_variable_queue:delete_crashed/1 line 624 in context child_terminated
2019-09-01 00:28:06.210 [warning] <0.30924.373> Mirrored queue 'WFA_AppServer_MO-57992F620' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:28:06.210 [warning] <0.30946.373> Mirrored queue 'WFA_AppServer_MO-57992F620_MANAGER' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:30:10.106 [error] <0.12297.378> Channel error on connection <0.12279.378> (10.170.152.5:58244 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-3F493FF71_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:10.106 [info] <0.12402.378> accepting AMQP connection <0.12402.378> (10.170.152.5:58249 -> 10.170.152.92:5672)
2019-09-01 00:30:10.121 [info] <0.12402.378> connection <0.12402.378> (10.170.152.5:58249 -> 10.170.152.92:5672): user 'PC17ProdRMQUser' authenticated and granted access to vhost 'PC17Prod'
2019-09-01 00:30:25.206 [info] <0.12464.378> accepting AMQP connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672)
2019-09-01 00:30:25.222 [info] <0.12464.378> connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672): user 'PC17ProdRMQUser' authenticated and granted access to vhost 'PC17Prod'
2019-09-01 00:30:25.347 [info] <0.12483.378> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Adding mirror on node 'rabbit@MO-4EB8ABE07': <37716.11880.389>
2019-09-01 00:30:25.394 [info] <0.12483.378> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Adding mirror on node 'rabbit@MO-C91C65961': <37715.12717.385>
2019-09-01 00:30:25.503 [error] <0.12469.378> Channel error on connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:25.503 [info] <0.12464.378> closing AMQP connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser')
2019-09-01 00:30:40.666 [error] <0.12200.378> Channel error on connection <0.30764.373> (10.170.152.5:57719 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUs

Node 3

2019-09-01 00:30:25.503 [error] <0.12469.378> Channel error on connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:25.503 [info] <0.12464.378> closing AMQP connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser')
2019-09-01 00:30:40.666 [error] <0.12200.378> Channel error on connection <0.30764.373> (10.170.152.5:57719 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'Job_Servers' in vhost 'PC17Prod' due to timeout
2019-09-01 00:30:41.540 [error] <0.12219.378> Channel error on connection <0.12220.378> (10.170.152.5:55840 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'bus-MO-75030A1A9-ConsumerWindowsService-g6qyyyns1ryfbj66bdm15sqfnk' in vhost 'PC17Prod' due to timeout
2019-09-01 00:30:41.540 [info] <0.12220.378> closing AMQP connection <0.12220.378> (10.170.152.5:55840 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser')
2019-09-01 00:30:56.469 [error] <0.12196.378> Channel error on connection <0.30891.373> (10.170.152.5:55597 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'WFA_AppServer_MO-57992F620' in vhost 'PC17Prod' due to timeout

Regards
Avik Hazra

Wesley Peng

Sep 2, 2019, 5:04:50 AM
to rabbitm...@googlegroups.com
Hi

on 2019/9/2 17:00, Avik Hazra wrote:
> I am not able to get this clear as how Channel error on connection can
> happen when the services recovered. Also, what does Channel error on
> connection means?
>

When a client hits a connection error, it should re-connect to the cluster.

regards.

Wesley Peng

Sep 2, 2019, 5:14:36 AM
to rabbitm...@googlegroups.com

Hi

on 2019/9/2 17:00, Avik Hazra wrote:
> I am not able to get this clear as how Channel error on connection can happen when the services recovered. Also, what does Channel error on connection means?

I also provide a code snippet to show how to re-connect automatically.
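A minimal sketch of such a reconnect loop, using only the Ruby standard library. The helper names (`backoff_delays`, `with_reconnect`) are illustrative, not from any client library; substitute your client's connection exceptions (e.g. Bunny's Bunny::TCPConnectionFailed) for IOError/SystemCallError:

```ruby
require "logger"

LOG = Logger.new($stderr)

# Capped exponential backoff: base, 2*base, 4*base, ... up to +cap+ seconds.
def backoff_delays(attempts, base: 1.0, cap: 30.0)
  (0...attempts).map { |n| [base * (2**n), cap].min }
end

# Run the block, retrying on connection-level failures with backoff.
# The block should open a fresh connection each time it is called,
# because a channel that raised an error has been closed by the broker.
def with_reconnect(attempts: 5, base: 1.0, cap: 30.0)
  backoff_delays(attempts, base: base, cap: cap).each_with_index do |delay, i|
    begin
      return yield
    rescue IOError, SystemCallError => e
      LOG.warn("attempt #{i + 1} failed (#{e.class}: #{e.message}); retrying in #{delay}s")
      sleep delay
    end
  end
  raise "giving up after #{attempts} attempts"
end
```

Inside the block you would open a fresh connection and re-declare the queues, exchanges and bindings the consumer needs, rather than reusing the channel that saw the error.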


Please note that if both the exchange and the queue are durable, even re-connecting may not help.

When they are durable, the exchange and queue cannot simply be re-created on another broker in the cluster.

regards.


Michael Klishin

Oct 3, 2019, 4:58:43 AM
to rabbitmq-users
The stack trace looks awfully similar to [1].


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rabbitmq-users/b07f0435-e244-4572-8cad-53ea8164a622%40googlegroups.com.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

Oct 3, 2019, 4:59:37 AM
to rabbitmq-users
Please don't use block: true in Bunny::Queue#subscribe if you don't understand what it does and what effects
that might have on connection recovery.

I have removed it from all examples IIRC but people keep recommending it…

--