I have a dynamic shovel running between two nodes in one cluster and two nodes in another cluster (10.12.1.1 and 10.12.1.2 form one cluster; 10.12.1.3 and 10.12.1.4 form the other).
From time to time, because of network issues, the link disconnects, reconnects successfully several times, then disconnects again and no longer tries to reconnect.
When this happens, the shovel seems to try to connect from one node to another node in the same cluster (10.12.1.1 -> 10.12.1.2), but the shovel is not configured that way.
To get it running again, we have to shut down the node on which the shovel link is shown in the "terminated" state.
Thanks.
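To make it easier to see which endpoints each terminated connection was using, here is a rough Python sketch that pulls the client -> server pairs and the reported `cluster_name` out of crash reports like the ones in the log excerpt. The function name `connection_endpoints` and the regular expressions are my own assumptions based on how these particular log lines look; they are not part of RabbitMQ or any of its tools.

```python
import re

# Connection name as printed in the crash report, e.g.
# <<"client 10.12.1.1:42699 -> 10.12.1.3:5672">>.
# \s+ and \s* also tolerate line breaks inside the name.
CONN_RE = re.compile(r'client\s+([\d.]+):(\d+)\s*->\s*([\d.]+):(\d+)')

# cluster_name entry from the server properties in the same report.
CLUSTER_RE = re.compile(r'<<"cluster_name">>,longstr,<<"([^"]+)">>')


def connection_endpoints(log_text):
    """Return (client_ip, server_ip, server_port, cluster_name) per logged connection.

    cluster_name is None when no cluster_name entry appears between this
    connection name and the next one.
    """
    matches = list(CONN_RE.finditer(log_text))
    results = []
    for i, match in enumerate(matches):
        # Only search up to the start of the next connection name, so a
        # cluster_name is never attributed to the wrong connection.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log_text)
        cluster = CLUSTER_RE.search(log_text, match.end(), end)
        results.append((
            match.group(1),
            match.group(3),
            int(match.group(4)),
            cluster.group(1) if cluster else None,
        ))
    return results


if __name__ == "__main__":
    # Shortened, hand-made sample in the same shape as the crash reports.
    sample = (
        '<<"client 10.12.1.1:42699 -> 10.12.1.3:5672">>,10,'
        '{<<"cluster_name">>,longstr,<<"POLPRD">>}'
    )
    for row in connection_endpoints(sample):
        print(row)
```

Running it over the full log file (read with `open(path).read()`) lists one tuple per crash report, which shows at a glance which server and cluster each shovel connection was pointed at.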
2019-04-23 00:20:49.701 [error] <0.3620.14> ** Generic server <0.3620.14> terminating
** Last message in was heartbeat_timeout
** When Server state == {state,amqp_network_connection,{state,#Port<0.5396885>,<<"client 10.12.1.1:42699 -> 10.12.1.3:5672">>,10,<0.3623.14>,131072,<0.3619.14>,undefined,false},<0.3622.14>,{amqp_params_network,<<"POLUSER">>,<<"***">>,<<"/">>,"POL1",5672,2047,0,10,60000,none,[#Fun<amqp_uri.12.90191702>,#Fun<amqp_uri.12.90191702>],[{<<"connection_name">>,longstr,<<"Shovel POLtoCEN">>}],[]},2047,[{<<"capabilities">>,table,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"consumer_priorities">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"per_consumer_qos">>,bool,true},{<<"direct_reply_to">>,bool,true}]},{<<"cluster_name">>,longstr,<<"POLPRD">>},{<<"copyright">>,longstr,<<"Copyright (C) 2007-2018 Pivotal Software, Inc.">>},{<<"information">>,longstr,<<"Licensed under the MPL. See http://www.rabbitmq.com/">>},{<<"platform">>,longstr,<<"Erlang/OTP 20.3">>},{<<"product">>,longstr,<<"RabbitMQ">>},{<<"version">>,longstr,<<"3.7.5">>}],none,false}
** Reason for termination ==
** heartbeat_timeout
2019-04-23 00:20:49.701 [error] <0.3620.14> CRASH REPORT Process <0.3620.14> with 0 neighbours exited with reason: heartbeat_timeout in gen_server:handle_common_reply/8 line 726
2019-04-23 00:20:49.702 [error] <0.3618.14> Supervisor {<0.3618.14>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3619.14>, {amqp_params_network,<<"POLUSER">>,<<"***">>,<<"/">>,"POL1",5672,2047,...}) at <0.3620.14> exit with reason heartbeat_timeout in context child_terminated
2019-04-23 00:20:49.702 [error] <0.3618.14> Supervisor {<0.3618.14>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3619.14>, {amqp_params_network,<<"POLUSER">>,<<"***">>,<<"/">>,"POL1",5672,2047,...}) at <0.3620.14> exit with reason reached_max_restart_intensity in context shutdown
2019-04-23 00:20:49.702 [info] <0.3617.14> terminating static worker with {shutdown,{gen_server,call,[<0.3629.14>,{subscribe,{'basic.consume',0,<<"central">>,<<>>,false,false,false,false,[]},<0.3617.14>},60000]}}
2019-04-23 00:20:49.703 [error] <0.3633.14> ** Generic server <0.3633.14> terminating
** Last message in was {'EXIT',<0.3617.14>,{shutdown,{gen_server,call,[<0.3629.14>,{subscribe,{'basic.consume',0,<<"central">>,<<>>,false,false,false,false,[]},<0.3617.14>},60000]}}}
** When Server state == {state,amqp_network_connection,{state,#Port<0.5397121>,<<"client 10.12.1.1:60993 -> 10.12.1.2:5672">>,10,<0.3636.14>,131072,<0.3632.14>,undefined,false},<0.3635.14>,{amqp_params_network,<<"CENUSER">>,<<"*******">>,<<"/">>,"CEN2",5672,2047,0,10,60000,none,[#Fun<amqp_uri.12.90191702>,#Fun<amqp_uri.12.90191702>],[{<<"connection_name">>,longstr,<<"Shovel POLtoCEN">>}],[]},2047,[{<<"capabilities">>,table,[{<<"publisher_confirms">>,bool,true},{<<"exchange_exchange_bindings">>,bool,true},{<<"basic.nack">>,bool,true},{<<"consumer_cancel_notify">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"consumer_priorities">>,bool,true},{<<"authentication_failure_close">>,bool,true},{<<"per_consumer_qos">>,bool,true},{<<"direct_reply_to">>,bool,true}]},{<<"cluster_name">>,longstr,<<"CENPRD">>},{<<"copyright">>,longstr,<<"Copyright (C) 2007-2018 Pivotal Software, Inc.">>},{<<"information">>,longstr,<<"Licensed under the MPL. See http://www.rabbitmq.com/">>},{<<"platform">>,longstr,<<"Erlang/OTP 20.3">>},{<<"product">>,longstr,<<"RabbitMQ">>},{<<"version">>,longstr,<<"3.7.5">>}],none,false}
** Reason for termination ==
** "stopping because dependent process <0.3617.14> died: {shutdown,\n {gen_server,call,\n [<0.3629.14>,\n {subscribe,\n {'basic.consume',0,\n <<\"central\">>,<<>>,\n false,false,false,\n false,[]},\n <0.3617.14>},\n 60000]}}"
2019-04-23 00:20:49.703 [error] <0.3633.14> CRASH REPORT Process <0.3633.14> with 0 neighbours exited with reason: "stopping because dependent process <0.3617.14> died: {shutdown,\n {gen_server,call,\n [<0.3629.14>,\n {subscribe,\n {'basic.consume',0,\n <<\"central\">>,<<>>,\n false,fal..." in gen_server:handle_common_reply/8 line 726
2019-04-23 00:20:49.704 [error] <0.3631.14> Supervisor {<0.3631.14>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3632.14>, {amqp_params_network,<<"CENUSER">>,<<"*******">>,<<"/">>,"CEN2",5672,2047,...}) at <0.3633.14> exit with reason "stopping because dependent process <0.3617.14> died: {shutdown,\n {gen_server,call,\n [<0.3629.14>,\n {subscribe,\n {'basic.consume',0,\n <<\"central\">>,<<>>,\n false,fal..." in context child_terminated
2019-04-23 00:20:49.704 [error] <0.3631.14> Supervisor {<0.3631.14>,amqp_connection_sup} had child connection started with amqp_gen_connection:start_link(<0.3632.14>, {amqp_params_network,<<"CENUSER">>,<<"*******">>,<<"/">>,"CEN2",5672,2047,...}) at <0.3633.14> exit with reason reached_max_restart_intensity in context shutdown