Hi all! I could really use some help with this RabbitMQ federation timeout issue!
I'm using RabbitMQ broker servers that someone else originally set up. He had one exchange federated, and it works just fine. I have added a new exchange and set up bi-directional federation between the two brokers. Here are my observations:
1) Every 4 hours and 21 minutes (~15660 seconds), the federation stops for about 20 minutes. The messages are durable, so they back up at the upstream broker and then all "flood" in at the end of that 20-minute outage.
2) When I run the command rabbitmqctl eval 'rabbit_federation_status:status().', the federation status stays "running" the entire time. The timestamp does not update until the end of one of those outage periods; it then updates and stays the same for another 4 hours and 21 minutes.
3) In the RabbitMQ log, the following appears at the beginning of the outage period and again at the end of it:
=ERROR REPORT==== 27-Jun-2016::09:19:41 ===
** Generic server <0.8647.80> terminating
** Last message in was heartbeat_timeout
** When Server state == {state,amqp_network_connection,
<<"client [IP]:53556 -> [IP]:5672">>,
580,<0.8660.80>,131072,<0.8648.80>,undefined,
{amqp_params_network,<<"federation_user">>,
<<"##########">>,<<"/">>,
"[UPSTREAM]",5672,0,0,0,infinity,
[#Fun<amqp_uri.9.9354953>,
#Fun<amqp_uri.9.9354953>],
[{<<"capabilities">>,table,
[{<<"publisher_confirms">>,bool,true},
{<<"exchange_exchange_bindings">>,bool,true},
{<<"basic.nack">>,bool,true},
{<<"consumer_cancel_notify">>,bool,true},
{<<"connection.blocked">>,bool,true},
{<<"consumer_priorities">>,bool,true},
{<<"authentication_failure_close">>,bool,true},
{<<"per_consumer_qos">>,bool,true}]},
{<<"cluster_name">>,longstr,<<"\"[CLUSTER]\"">>},
{<<"copyright">>,longstr,
<<"Copyright (C) 2007-2014 GoPivotal, Inc.">>},
{<<"information">>,longstr,
{<<"platform">>,longstr,<<"Erlang/OTP">>},
{<<"product">>,longstr,<<"RabbitMQ">>},
{<<"version">>,longstr,<<"3.5.3">>}],
** Reason for termination ==
=ERROR REPORT==== 27-Jun-2016::09:19:41 ===
** Generic server <0.8633.80> terminating
** Last message in was {'DOWN',#Ref<0.0.428.138280>,process,<0.8663.80>,
** When Server state == {state,
[<<"amqp://federation_user:[##########]@[UPSTREAM]">>,
<<"amqp://federation_user:[##########]@]@[UPSTREAM 2]">>],
<<"[EXCHANGE]">>,<<"[EXCHANGE]">>,1000,1,1,3600000,none,
false,'on-confirm',none,<<"upstream-man">>},
<<"amqp://federation_user:[##########]@]@[UPSTREAM 2]">>,
{amqp_params_network,<<"federation_user">>,
<<"[##########]">>,<<"/">>,"[UPSTREAM 2]",
undefined,0,0,0,infinity,none,
[#Fun<amqp_uri.9.9354953>,#Fun<amqp_uri.9.9354953>],
{resource,<<"/">>,exchange,<<"[EXCHANGE]">>},
topic,true,false,false,[],
[{{<<"upstream-man">>,<<"[EXCHANGE]">>},<<"A">>}]}],
{name,<<"federation-scs">>},
{pattern,<<"^[EXCHANGE]$">>},
{'apply-to',<<"exchanges">>},
[{<<"federation-upstream">>,<<"upstream-man">>}]},
{[],[rabbit_federation_exchange]}},
{<<"exchange">>,longstr,<<"[EXCHANGE]">>}]},
<<"\"man\"">>,<0.8647.80>,<0.8663.80>,
<<"amq.ctag-Lm4wCtvhIudSdFc3hh5m1Q">>,
<<"federation: [EXCHANGE] -> [CLUSTER]">>,
<<"federation: [EXCHANGE] -> [CLUSTER B]">>,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[],[],[],[],[],[],[],[],[],[],
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[{resource,<<"/">>,queue,
[],[],[],[],[],[],[],[],[],[]}}}]],
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[{resource,<<"/">>,queue,
[],[],[],[],[],[],[],[],[]}}}]],
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
[{resource,<<"/">>,queue,
{resource,<<"/">>,exchange,<<"[EXCHANGE]">>},
** Reason for termination ==
** {upstream_channel_down,killed}
=INFO REPORT==== 27-Jun-2016::09:19:45 ===
Federation exchange '[EXCHANGE]' in vhost '/' connected to exchange '[EXCHANGE]' in vhost '/' on amqp://[UPSTREAM]
4) Note that I've tried setting up the federation parameter both WITH and WITHOUT the heartbeat in the URIs, with no observable difference in behavior.
5) If I change the federation policy or parameter, it resets the 4 hour countdown.
6) The good exchange, whose federation works properly, does not appear in the output of the rabbitmqctl eval 'rabbit_federation_status:status().' command.
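
In case it helps anyone check the timing from observation 1 against a log, here is a rough Python sketch that pulls the timestamps of ERROR reports mentioning heartbeat_timeout and prints the gaps between them in seconds. It assumes the report header format shown in the excerpt above ("=ERROR REPORT==== 27-Jun-2016::09:19:41 ===") and English month abbreviations. The function names and the way the log path is passed in are my own for illustration, not anything from RabbitMQ.

```python
import re
import sys
from datetime import datetime

# Matches report headers such as "=ERROR REPORT==== 27-Jun-2016::09:19:41 ==="
HEADER = re.compile(
    r"^=(\w+) REPORT==== (\d{1,2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}) ==="
)


def heartbeat_timeout_times(lines):
    """Return the timestamps of ERROR reports whose body mentions heartbeat_timeout."""
    times = []
    current = None  # timestamp of the ERROR report currently being read, if any
    for line in lines:
        match = HEADER.match(line)
        if match:
            level, stamp = match.groups()
            if level == "ERROR":
                # %b assumes English month names (e.g. "Jun")
                current = datetime.strptime(stamp, "%d-%b-%Y::%H:%M:%S")
            else:
                current = None
        elif current is not None and "heartbeat_timeout" in line:
            times.append(current)
            current = None  # count each report only once
    return times


def gaps_in_seconds(times):
    """Return the whole-second gaps between consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python script.py <path to the RabbitMQ log file>
    with open(sys.argv[1]) as log_file:
        print(gaps_in_seconds(heartbeat_timeout_times(log_file)))
```

If the pattern I described holds, I would expect the printed gaps to alternate between roughly 20 minutes and roughly 4 hours, since the error shows up at both the start and the end of each outage.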