Exchange federation links continuously starting/stopping


Andrew Wright

Nov 14, 2017, 11:06:10 PM
to rabbitmq-users
Hi all,

We are testing out exchange federation between a 3-node Rabbit cluster (behind a haproxy load balancer) and 3 standalone nodes. We have noticed slightly unexpected behaviour regarding the federation connections between all the nodes. That is, the federation user connects seemingly successfully, and then immediately closes the connection. Rabbit logs look like:

=INFO REPORT==== 15-Nov-2017::13:38:42 ===
accepting AMQP connection <0.11624.18> (<ip1>:7927 -> <ip2>:5671)

<other federation activity elided>

=INFO REPORT==== 15-Nov-2017::13:38:42 ===
connection <0.11624.18> (<ip1>:7927 -> <ip2>:5671): user 'federation' authenticated and granted access to vhost '/'

<other federation activity elided>

=INFO REPORT==== 15-Nov-2017::13:38:42 ===
closing AMQP connection <0.11624.18> (<ip1>:7927 -> <ip2>:5671, vhost: '/', user: 'federation')


There are many log entries like this every second (i.e., lots of connections are being created and closed all the time) on all nodes, both cluster and standalone. SASL logs are clean.

We're trying to federate all our exchanges, with a few exceptions - around 100 exchanges in all. Regex config below.

Interestingly, the admin UI indicates that federation is running successfully: the Federation Status page shows all green on both the cluster and standalone nodes. The main issue this appears to cause is excessive CPU usage on the servers, likely due to establishing many SSL connections.

I guess our main question is - would this be considered normal? Our understanding was that federation links would be long-running, similar to our application connections. We're also assuming that federating 100 exchanges is a reasonable number?

Kind regards,
Andrew


Some more details in case they're useful.
- RabbitMQ v 3.6.11
- Erlang/OTP 20 [erts-9.0.2] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64] [hipe] [kernel-poll:true]
- SLES 12, running as a systemd service

Setup on the standalone nodes to federate upstream to the cluster:
rabbitmqctl set_policy federation-policy '^((?!(monitoring.*))(?!(federation.*))(?!(amq.*))((?!dead-letter).)*$)' '{"federation-upstream":"mqcluster"}' --apply-to exchanges

rabbitmqctl set_parameter federation-upstream mqcluster '{"ack-mode":"on-confirm","trust-user-id":false,"uri":"amqps://user:pass@hostname:5671?cacertfile=/etc/rabbitmq/ssl/rabbit.cacrt&certfile=/etc/rabbitmq/ssl/rabbit.crt&keyfile=/etc/rabbitmq/ssl/rabbit.key&verify=verify_peer&fail_if_no_peer_cert=true"}'

Setup on the cluster to federate to the 3x individual nodes:
rabbitmqctl set_policy federation-policy '^((?!(monitoring.*))(?!(federation.*))(?!(amq.*))((?!dead-letter).)*$)' '{"federation-upstream-set":"all"}' --apply-to exchanges

(and then we configure the three standalone rabbit nodes as federation-upstreams)
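To sanity-check which exchange names the policy pattern above selects, the expression can be tried offline. The sketch below uses Python's `re` module as a stand-in for the regex engine RabbitMQ uses, on the assumption that these lookahead constructs behave the same way in both. The exchange names are made up for illustration.

```python
import re

# Policy pattern copied verbatim from the set_policy commands above.
POLICY_PATTERN = re.compile(
    r'^((?!(monitoring.*))(?!(federation.*))(?!(amq.*))((?!dead-letter).)*$)'
)


def is_federated(exchange_name: str) -> bool:
    """Return True if the policy pattern matches the given exchange name."""
    return POLICY_PATTERN.search(exchange_name) is not None


# Hypothetical exchange names, for illustration only.
SAMPLE_NAMES = [
    "orders",
    "monitoring.health",
    "federation.internal",
    "amq.topic",
    "orders-dead-letter",
]

if __name__ == "__main__":
    for name in SAMPLE_NAMES:
        verdict = "federated" if is_federated(name) else "skipped"
        print(f"{name!r}: {verdict}")
```

Note that the dot in `amq.*` is unescaped, so any name beginning with `amq` is excluded, not only names with an `amq.` prefix.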

Andrew Wright

Nov 15, 2017, 4:23:29 AM
to rabbitmq-users
Just an additional clarification - the exchange links are bi-directional between each standalone node and the cluster. The use case is a migration from the standalone nodes to the cluster. We'd like apps to be able to connect to either during the migration, and still receive messages published to the other 'set' (either the cluster, or one of the three standalone nodes).

Not sure if that complicates matters. We noticed the 'reconnect-delay' option but weren't sure it would have an effect.

At the time of setting these tests up, there are no messages flowing onto any queues - all are empty.

Cheers,
Andrew

Andrew Wright

Nov 17, 2017, 5:53:17 AM
to rabbitmq-users
Hi again,

After a bit more test reduction, it seems that connections are being established once every 30 seconds, per federated exchange link. This happens regardless of reconnect_delay/heartbeat/connection_timeout parameters, and activity on the exchange (or lack thereof).

The only obvious reference to a 30 second timeout I can see in the code is the 'internal_exchange_check_interval' parameter, defined in the federation plugin makefile and referenced in rabbit_federation_exchange_link.erl.

Unfortunately my Erlang is somewhat scratchy :-/ Would it be correct to interpret this as a hardcoded timeout that performs a polling check every 30s, for some reason?

I guess when we federate ~100 exchanges 6 ways, the SSL connection overhead of these 30s checks becomes significant on our VMs. Maybe time for beefier hardware, or an alternative approach.
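As a rough back-of-the-envelope check of that overhead, the aggregate connection rate can be estimated as below. This is only a sketch: it assumes "6 ways" means six link directions, each carrying about 100 exchange links, and that every link opens exactly one short-lived connection per 30-second interval. The function name is made up for illustration.

```python
def check_connections_per_second(exchanges: int, directions: int,
                                 interval_seconds: float) -> float:
    """Estimate how many short-lived connections are opened per second.

    Assumes each federated exchange link opens one connection per interval.
    """
    links = exchanges * directions
    return links / interval_seconds


if __name__ == "__main__":
    # Figures from the post: ~100 exchanges, 6 ways, one check every 30 s.
    rate = check_connections_per_second(100, 6, 30)
    print(f"{rate:.1f} short-lived connections per second across all nodes")
```

Under those assumptions the total is on the order of tens of TLS handshakes per second, spread across the cluster and standalone nodes, which is consistent with seeing new log entries every second.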

Cheers,
Andrew




Michael Klishin

Nov 17, 2017, 3:03:57 PM
to rabbitm...@googlegroups.com
You are on the right track. That interval is used
a one-off short lived connection.

Is this a sufficient answer?

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Andrew Wright

Nov 19, 2017, 5:26:38 AM
to rabbitmq-users
Hi Michael - thanks for that info. I guess our only other question would be: would there be value in making that interval a tunable parameter? (Even via some eval() magic?) I'm sure there would be complications to doing so that I'm unaware of :-)

Cheers,
Andrew



Michael Klishin

Nov 20, 2017, 11:21:50 AM
to rabbitm...@googlegroups.com

--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Andrew Wright

Nov 21, 2017, 4:25:38 AM
to rabbitmq-users
Aha! Perfect, thanks very much for that. We'll check it out and report back.

Would you be interested in a documentation update for the federation reference page? I'd be happy to submit a first draft, unless you'd rather keep it as an internal-only flag for the moment.

Cheers,
Andrew

Michael Klishin

Nov 28, 2017, 6:57:48 PM
to rabbitm...@googlegroups.com
Sure, doc PRs are very welcome. Please branch off `live`, see the README at https://github.com/rabbitmq/rabbitmq-website.

Thank you!
