On Thursday, December 22, 2016 at 4:04:12 AM UTC-6, Tuukka Lahtela wrote:
I have found a strange issue with federation links and cluster split recovery. The upstream stops working until I recreate the federation policy.
The problem occurs when I create a short network split on the downstream cluster. After autoheal finishes, the federation link seems to break. The admin UI shows that all links are running, but on the upstream side I can see that the exchange for that particular federation is missing the binding from the local exchange to the federation exchange (the "from -> this exchange" part is missing). The link starts working again if I recreate the federation upstream policy in the downstream cluster. I am testing with two-server clusters on both sides. Recreating the policy is enough; I do not need to reboot any of the servers.
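The symptom described above (links reported as running while the upstream binding is gone) could be checked with something like the following. This is only a sketch, not part of the original report. It assumes the binding list has already been fetched from the upstream broker's management HTTP API (`GET /api/exchanges/{vhost}/{exchange}/bindings/source`) and parsed from JSON. The `"federation: "` name prefix and the sample destination names are assumptions made for illustration.

```python
def federation_bindings(bindings, source_exchange, prefix="federation: "):
    """Return bindings from source_exchange whose destination looks federation-owned.

    `bindings` is a list of dicts with at least "source" and "destination" keys,
    as found in the management API's binding listings. `prefix` is an assumed
    naming convention for federation-created objects; adjust it if yours differs.
    """
    return [
        b for b in bindings
        if b.get("source") == source_exchange
        and b.get("destination", "").startswith(prefix)
    ]


def federation_binding_missing(bindings, source_exchange, prefix="federation: "):
    """True when no federation-looking binding exists for source_exchange."""
    return not federation_bindings(bindings, source_exchange, prefix)


if __name__ == "__main__":
    # Hypothetical sample data; real names depend on the clusters involved.
    healthy = [
        {"source": "heartbeats", "destination": "local-consumer-queue",
         "destination_type": "queue"},
        {"source": "heartbeats", "destination": "federation: heartbeats -> downstream",
         "destination_type": "exchange"},
    ]
    broken = [healthy[0]]
    print(federation_binding_missing(healthy, "heartbeats"))
    print(federation_binding_missing(broken, "heartbeats"))
```

Running such a check against the upstream after each autoheal would show whether the binding disappeared even though the status screen still reports the link as running.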
On Fri, Dec 23, 2016 at 6:21 AM, KH <khuf...@pivotal.io> wrote:
Tuukka,
Please refer to https://www.rabbitmq.com/heartbeats.html for more details on heartbeats, which may help resolve this issue. When the upstream connection is recreated, the link applies the federation policy, which in turn recreates the missing exchanges. You could also try modifying your upstream configuration like this:
amqp://user:user@yourserver?heartbeat=10
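If upstream URIs are generated by a script, the heartbeat parameter can be added programmatically. The following is a minimal sketch using only the Python standard library. The helper name `with_heartbeat` is made up for this example and is not part of any RabbitMQ tooling.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_heartbeat(uri, seconds=10):
    """Return the AMQP URI with its heartbeat query parameter set to `seconds`.

    Any existing heartbeat parameter is replaced; other query parameters
    are kept in their original order.
    """
    parts = urlsplit(uri)
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != "heartbeat"]
    query.append(("heartbeat", str(seconds)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(with_heartbeat("amqp://user:user@yourserver"))
    # prints amqp://user:user@yourserver?heartbeat=10
```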
KH
--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I did some more investigating on this issue and was able to reproduce it quite easily. The link breaks almost every time I cause a network partition that results in autoheal. The link is restored if the federation policy is recreated or if I restart the app. Since I can reproduce the problem reliably, let me know if there is something I could try.
I made a setup with two clusters, each with two brokers. I tried to keep all settings as close to the defaults as possible. The RabbitMQ version used was 3.6.6, but the problem also happened with the milestone release of 3.6.7. The problem occurs on at least Linux and Windows.
The config is the same for all nodes except for the ports and node names. I set net_ticktime to a low value so that the partition is detected quickly.
[
  {rabbit,
    [
      {log_levels, [{connection, debug}, {channel, debug}, {federation, debug}, {mirroring, debug}]},
      {tcp_listeners, [5672]},
      {heartbeat, 5},
      {cluster_partition_handling, autoheal},
      {cluster_nodes, {['rabbitOne@brokerOne', 'rabbitOne@brokerOne'], disc}}
    ]},
  {kernel, [ {net_ticktime, 5} ]},
  {rabbitmq_management, [ {listener, [{port, 6672}, {ip, "127.0.0.1"}]}]}
].
Policies:
vhost    name         apply-to   pattern      definition                         priority
cluster  message-ttl  queues     .*           {"message-ttl":1000}               0
cluster  federation   exchanges  heartbeats   {"federation-upstream-set":"all"}  0
cluster  mirror       queues     federation*  {"ha-mode":"all"}                  1
I only federated the heartbeats exchange, to rule out interference from logging and other traffic. We mirror the federated queues in our setup, so I did the same here, but I also tried without mirroring and got similar results.
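The workaround mentioned in this thread, recreating the federation policy, could be scripted roughly as follows. This is a sketch only. It builds `rabbitmqctl clear_policy` and `rabbitmqctl set_policy` argument lists from the federation policy listed above, reading the first columns of that listing as vhost, name, apply-to and pattern (an interpretation, not something stated in the post). It does not run anything itself, and the function name is made up for this example.

```python
import json


def recreate_policy_commands(vhost, name, pattern, definition,
                             apply_to="all", priority=0):
    """Build the two rabbitmqctl invocations that drop and re-add a policy.

    Returns a list of argument lists suitable for subprocess.run(); nothing
    is executed here.
    """
    clear_cmd = ["rabbitmqctl", "clear_policy", "-p", vhost, name]
    set_cmd = [
        "rabbitmqctl", "set_policy", "-p", vhost,
        "--priority", str(priority),
        "--apply-to", apply_to,
        name, pattern,
        json.dumps(definition, separators=(",", ":")),
    ]
    return [clear_cmd, set_cmd]


if __name__ == "__main__":
    commands = recreate_policy_commands(
        vhost="cluster",
        name="federation",
        pattern="heartbeats",
        definition={"federation-upstream-set": "all"},
        apply_to="exchanges",
    )
    for command in commands:
        print(" ".join(command))
```

To apply the commands, each list could be passed to `subprocess.run(command, check=True)` on a node of the downstream cluster.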
The federation configuration:
{
  "uri": ["amqp://user:pass...@127.0.0.1:5674/cluster?heartbeat=10&connection_timeout=60", "amqp://user:pass...@127.0.0.1:5674/cluster?heartbeat=10&connection_timeout=60"],
  "ack-mode": "no-ack",
  "trust-user-id": true,
  "message-ttl": 1000,
  "max-hops": 3,
  "expires": 10000
}
I made a very simple Java app that publishes messages to the heartbeats exchange and reads them back. Each message contains the PID of the Java app, so that I can detect when messages stop arriving.
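The detection idea described above can be sketched independently of any broker client. The following is not the Java test app from this thread. It is a small Python illustration of the same logic: remember when each sender PID was last seen and report the ones that have gone quiet. The class name, the timeout value and the injectable clock are choices made for this example.

```python
import time


class HeartbeatMonitor:
    """Tracks when each sender was last heard from and flags silent senders."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # sender PID -> time of last message

    def record(self, sender_pid):
        """Call this for every message received; the body is the sender PID."""
        self.last_seen[sender_pid] = self.clock()

    def stale_senders(self):
        """Return the PIDs whose last message is older than the timeout."""
        now = self.clock()
        return sorted(pid for pid, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


if __name__ == "__main__":
    fake_now = [0.0]
    monitor = HeartbeatMonitor(timeout_seconds=2.0, clock=lambda: fake_now[0])
    monitor.record("7068")   # message from a federated sender at t=0
    fake_now[0] = 1.0
    monitor.record("7100")   # message from a local sender at t=1
    fake_now[0] = 2.5
    print(monitor.stale_senders())
```

In a real consumer, `record` would be called from the message callback and `stale_senders` polled periodically; a sender that shows up as stale right after autoheal would indicate the broken link.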
Broker details:
[{pid,7068},
{running_applications,
[{rabbitmq_federation,"RabbitMQ Federation","3.6.6"},
{rabbitmq_federation_management,"RabbitMQ Federation Management",
"3.6.6"},
{rabbitmq_management,"RabbitMQ Management Console","3.6.6"},
{rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.6"},
{rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.6"},
{rabbit,"RabbitMQ","3.6.6"},
{amqp_client,"RabbitMQ AMQP Client","3.6.6"},
{rabbit_common,[],"3.6.6"},
{webmachine,"webmachine","1.10.3"},
{mochiweb,"MochiMedia Web Server","2.13.1"},
{ssl,"Erlang/OTP SSL application","8.0.2"},
{public_key,"Public key infrastructure","1.2"},
{crypto,"CRYPTO","3.7.1"},
{os_mon,"CPO CXC 138 46","2.4.1"},
{ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
{compiler,"ERTS CXC 138 10","7.0.2"},
{syntax_tools,"Syntax tools","2.1"},
{xmerl,"XML parser","1.3.12"},
{inets,"INETS CXC 138 49","6.3.3"},
{asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
{mnesia,"MNESIA CXC 138 12","4.14.1"},
{sasl,"SASL CXC 138 11","3.0.1"},
{stdlib,"ERTS CXC 138 10","3.1"},
{kernel,"ERTS CXC 138 10","5.1"}]},
{os,{win32,nt}},
{erlang_version,
"Erlang/OTP 19 [erts-8.1] [64-bit] [smp:4:4] [async-threads:64]\n"},
{memory,
[{total,53622272},
{connection_readers,0},
{connection_writers,5472},
{connection_channels,0},
{connection_other,77624},
{queue_procs,11024},
{queue_slave_procs,14000},
{plugins,1487760},
{other_proc,12923432},
{mnesia,90032},
{mgmt_db,1595840},
{msg_index,52128},
{other_ets,1584776},
{binary,216816},
{code,24979196},
{atom,1033401},
{other_system,9550771}]},
{alarms,[]},
{listeners,[{clustering,7672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,3390111744},
{disk_free_limit,50000000},
{disk_free,56448684032},
{file_descriptors,
[{total_limit,8092},
{total_used,3},
{sockets_limit,7280},
{sockets_used,1}]},
{processes,[{limit,1048576},{used,285}]},
{run_queue,0},
{uptime,3317},
{kernel,{net_ticktime,5}}]
Tuukka, what do you mean by "upstream stops working"? Do you see the federation link on the Federation Status screen, or is it gone?
--
You received this message because you are subscribed to a topic in the Google Groups "rabbitmq-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/rabbitmq-users/70p-7Udujzo/unsubscribe.
To unsubscribe from this group and all its topics, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
The upstream is visible in the status screen and it is showing the status to be ok.
We were, at least some variation thereof. We improved logging and discovered a case where federation links do not get a chance to start after a partial partition. Without making federation links distributed across nodes, I'm not sure what can be done about that in the general case.
On Mon, Jan 30, 2017 at 5:08 PM, Tuukka Lahtela <tuukka....@gmail.com> wrote:
Hi!
Any update on this issue? Were you able to reproduce the problem?
- Tuukka
=INFO REPORT==== 14-Mar-2017::22:46:15 ===
rabbit on node 'rab...@node01-vl3464.prod.msgq.b0.p.fti.net' down
--
=INFO REPORT==== 14-Mar-2017::22:46:22 ===
rabbit on node 'rab...@node01-vl3464.prod.msgq.b0.p.fti.net' up
--
=INFO REPORT==== 14-Mar-2017::22:46:22 ===
rabbit on node 'rab...@node03-vl3464.prod.msgq.b0.p.fti.net' up
The link can no longer locate its "source" queue or exchange.
=ERROR REPORT==== 14-Mar-2017::22:46:16 ===
** Generic server <0.18725.3908> terminating
** Last message in was {'DOWN',#Ref<0.0.9699332.120384>,process,
<0.17500.3908>,killed}
** When Server state == {state,
{upstream,
[<<"amqp://user1:XXXX@localhost/service_tram">>],
<<"tram.publish.content">>,
<<"modservices.tram.publish.document">>,1000,1,5,
none,none,false,'on-confirm',none,
<<"upstream-pfs_engen-service_tram">>,false},
{upstream_params,
<<"amqp://user1:XXXXX@localhost/service_tram">>,
[...]
** Reason for termination ==
** {upstream_channel_down,killed}
=ERROR REPORT==== 14-Mar-2017::22:46:16 ===
** Generic server <0.18213.718> terminating
** Last message in was {'DOWN',#Ref<0.0.11796483.245683>,process,
<0.17807.718>,killed}
** When Server state == {state,
{upstream,
[<<"amqp://user1:XXXXXX@localhost/pfs_engen">>],
<<"modservices.contentes.publish.document">>,
<<"content.publish.tram">>,1000,1,5,none,none,false,
'on-confirm',none,
<<"upstream-service_tram-pfs_engen">>,false},
{upstream_params,
<<"amqp://user1:XXXXX@localhost/pfs_engen">>,
[...]
** Reason for termination ==
** {downstream_channel_down,killed}
=INFO REPORT==== 14-Mar-2017::22:46:16 ===
node 'rab...@node03-vl3464.prod.msgq.b0.p.fti.net' up
=INFO REPORT==== 14-Mar-2017::22:46:16 ===
Federation exchange 'modservices.tram.publish.document' in vhost 'pfs_engen' connected to exchange 'tram.publish.content' in vhost 'service_tram' on amqp://localhost/service_tram
Hi!
Sounds good. I will certainly do some testing. We use the SUSE packages, but I can wait for the milestone release if it is already coming this week.
- Tuukka
On Mon, Aug 7, 2017 at 3:18 PM, Tuukka Lahtela <tuukka....@gmail.com> wrote:
Hi!
Sorry for the slow reply. I tried the latest RC release (3.6.11 RC2), which had the issue listed as one of the fixed items. I am happy to report that I was not able to reproduce the issue with the RC version, even though it reproduced quite consistently with older versions. Can't wait to get the release version.
- Tuukka
On 27.07.2017 00:10, Michael Klishin wrote:
Hi Tuukka,
Can you please try 3.6.11.M5? We have a couple of improvements in the federation plugin that might be relevant for your case: https://groups.google.com/forum/#!topic/rabbitmq-users/9O52laLiPAQ.
Thank you.
On Monday, July 17, 2017 at 6:35:05 PM UTC+3, Diana Corbacho wrote:
Thanks Tuukka. I'm using the Docker image that you provided, and I can consistently reproduce the issue.
I will continue the investigation now that we can reproduce it. Thanks again!
Thanks, Tuukka. 3.6.11 GA is a few days away.
--
Staff Software Engineer, Pivotal/RabbitMQ