We are running a 3-node RabbitMQ cluster on Windows Server 2008 R2, with pause_minority enabled as the partition handling strategy on all three nodes. We recently had an application outage during which a partial network partition occurred. The partition recovered, but immediately afterwards the logs show a large number of "Channel error on connection" entries for all of the queues and exchanges.
I don't understand how channel errors on a connection can keep happening after the nodes have recovered. Also, what does "Channel error on connection" actually mean?
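For reference, the only non-default partition setting on each node is pause_minority. A minimal sketch of how it is configured, assuming the classic rabbitmq.config (Erlang-term) format with all other keys omitted:

[
  {rabbit, [
    %% pause the minority side of a partition instead of continuing split-brained
    {cluster_partition_handling, pause_minority}
  ]}
].

(In the newer rabbitmq.conf format the equivalent line is: cluster_partition_handling = pause_minority.)

Relevant excerpts from the node logs follow.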
2019-09-01 00:29:56.106 [info] <0.7922.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@mo-143443df1': <11916.12363.378>
2019-09-01 00:30:06.032 [error] <0.2393.389> Channel error on connection <0.2382.389> (<rab...@MO-4EB8ABE07.1.2382.389>, vhost: 'DC###', user: 'DC###RMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'aliveness-test' in vhost 'DC###' due to timeout
2019-09-01 00:30:07.780 [info] <0.9695.389> connection <0.9695.389> (10.170.152.5:55844 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:25.203 [info] <0.11837.389> connection <0.11837.389> (10.170.152.5:60819 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:25.684 [info] <0.11895.389> connection <0.11895.389> (10.170.152.5:51191 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:26.073 [info] <0.11963.389> connection <0.11963.389> (10.170.152.5:62856 -> 10.170.153.92:5672): user 'DC###RMQUser' authenticated and granted access to vhost 'DC###'
2019-09-01 00:30:26.164 [info] <0.11988.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@MO-C91C65961': <9848.13017.385>
2019-09-01 00:30:26.178 [info] <0.11988.389> Mirrored queue 'oData_Server_MO-B00827138' in vhost 'DC###': Adding mirror on node 'rabbit@mo-143443df1': <11916.12502.378>
2019-09-01 00:30:26.189 [error] <0.11979.389> Channel error on connection <0.11
2019-09-01 00:28:06.163 [info] <0.15436.1> rabbit on node 'rabbit@MO-4EB8ABE07' down
2019-09-01 00:28:06.194 [info] <0.15436.1> Keeping rabbit@MO-4EB8ABE07 listeners: the node is already back
2019-09-01 00:28:06.210 [error] <0.11500.378> ** Generic server <0.11500.378> terminating
** Last message in was {'$gen_cast',go}
** When Server state == {not_started,{amqqueue,{resource,<<"PC17Prod">>,queue,<<"Job_Servers">>},true,false,none,[],<37715.842.0>,[],[],['rabbit@MO-4EB8ABE07'],[{vhost,<<"PC17Prod">>},{name,<<"ha-all">>},{pattern,<<"^">>},{'apply-to',<<"all">>},{definition,[{<<"ha-mode">>,<<"all">>}]},{priority,0}],undefined,[{<37715.843.0>,<37715.842.0>}],[],live,1,[],<<"PC17Prod">>,#{user => <<"PC17ProdRMQUser">>}}}
** Reason for termination ==
** {{badmatch,{error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}}},[{rabbit_variable_queue,delete_crashed,1,[{file,"src/rabbit_variable_queue.erl"},{line,624}]},{rabbit_mirror_queue_slave,handle_go,1,[{file,"src/rabbit_mirror_queue_slave.erl"},{line,123}]},{rabbit_mirror_queue_slave,handle_cast,2,[{file,"src/rabbit_mirror_queue_slave.erl"},{line,280}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1050}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
2019-09-01 00:28:06.210 [error] <0.11500.378> CRASH REPORT Process <0.11500.378> with 1 neighbours exited with reason: no match of right hand value {error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}} in rabbit_variable_queue:delete_crashed/1 line 624 in gen_server2:terminate/3 line 1166
2019-09-01 00:28:06.210 [warning] <0.15647.373> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:28:06.210 [error] <0.11481.378> Supervisor {<0.11481.378>,rabbit_amqqueue_sup} had child rabbit_amqqueue started with rabbit_prequeue:start_link({amqqueue,{resource,<<"PC17Prod">>,queue,<<"Job_Servers">>},true,false,none,[],<37715.842.0>,[],...}, slave, <0.11486.378>) at <0.11500.378> exit with reason no match of right hand value {error,{"c:/RabbitMQ/DB/rabbit@mo-143443df1-mnesia/msg_stores/vhosts/EXX0MWPVGFQAT84PIHY1DG87N/queues/BRGEFF8MKFW55KBUJPOR36GVM",eexist}} in rabbit_variable_queue:delete_crashed/1 line 624 in context child_terminated
2019-09-01 00:28:06.210 [warning] <0.30924.373> Mirrored queue 'WFA_AppServer_MO-57992F620' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:28:06.210 [warning] <0.30946.373> Mirrored queue 'WFA_AppServer_MO-57992F620_MANAGER' in vhost 'PC17Prod': Stopping all nodes on master shutdown since no synchronised slave is available
2019-09-01 00:30:10.106 [error] <0.12297.378> Channel error on connection <0.12279.378> (10.170.152.5:58244 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-3F493FF71_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:10.121 [info] <0.12402.378> connection <0.12402.378> (10.170.152.5:58249 -> 10.170.152.92:5672): user 'PC17ProdRMQUser' authenticated and granted access to vhost 'PC17Prod'
2019-09-01 00:30:25.222 [info] <0.12464.378> connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672): user 'PC17ProdRMQUser' authenticated and granted access to vhost 'PC17Prod'
2019-09-01 00:30:25.347 [info] <0.12483.378> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Adding mirror on node 'rabbit@MO-4EB8ABE07': <37716.11880.389>
2019-09-01 00:30:25.394 [info] <0.12483.378> Mirrored queue 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod': Adding mirror on node 'rabbit@MO-C91C65961': <37715.12717.385>
2019-09-01 00:30:25.503 [error] <0.12469.378> Channel error on connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:25.503 [error] <0.12469.378> Channel error on connection <0.12464.378> (10.170.152.5:51190 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.bind caused a channel exception not_found: no exchange 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod'
2019-09-01 00:30:40.666 [error] <0.12200.378> Channel error on connection <0.30764.373> (10.170.152.5:57719 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'Job_Servers' in vhost 'PC17Prod' due to timeout
2019-09-01 00:30:41.540 [error] <0.12219.378> Channel error on connection <0.12220.378> (10.170.152.5:55840 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'bus-MO-75030A1A9-ConsumerWindowsService-g6qyyyns1ryfbj66bdm15sqfnk' in vhost 'PC17Prod' due to timeout
2019-09-01 00:30:56.469 [error] <0.12196.378> Channel error on connection <0.30891.373> (10.170.152.5:55597 -> 10.170.152.92:5672, vhost: 'PC17Prod', user: 'PC17ProdRMQUser'), channel 1:
operation queue.declare caused a channel exception not_found: failed to perform operation on queue 'WFA_AppServer_MO-57992F620' in vhost 'PC17Prod' due to timeout
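For context, the operations that fail in the logs (queue.declare and queue.bind) are the declare/bind steps our services run when they reconnect. The sketch below is illustrative only, not our production code: it is Python with pika, the queue and exchange names are copied from the log excerpts above, and the connection details are placeholders. It simply shows where a channel-level not_found exception would surface on the client side.

import pika

# Illustrative placeholders only; host/vhost/user are taken from the logs,
# the password is not real.
params = pika.ConnectionParameters(
    host="10.170.152.92",
    virtual_host="PC17Prod",
    credentials=pika.PlainCredentials("PC17ProdRMQUser", "<password>"),
)

connection = pika.BlockingConnection(params)
channel = connection.channel()

try:
    # queue.declare: the logs show this operation failing with
    # "not_found: failed to perform operation ... due to timeout".
    channel.queue_declare(queue="Job_Servers", durable=True)

    # queue.bind: the logs show this failing with
    # "not_found: no exchange 'Job_Servers_MO-FC283D0C6_TEST' in vhost 'PC17Prod'".
    channel.queue_bind(queue="Job_Servers",
                       exchange="Job_Servers_MO-FC283D0C6_TEST")
except pika.exceptions.ChannelClosedByBroker as err:
    # A channel exception such as 404 not_found closes the channel (not the
    # whole connection); the broker logs it as "Channel error on connection".
    print("channel closed by broker:", err.reply_code, err.reply_text)
    channel = connection.channel()  # a fresh channel is needed before retrying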