Question about queue master migration


João Nuno Silva

Jan 16, 2019, 10:53:50 AM
to rabbitmq-users
Hi,

I'm trying to reproduce a long cluster downtime we saw during master migration, which was triggered by a scheduled rolling restart of the cluster. In that case, all queue masters were on the same RabbitMQ node*.

Locally, while trying to reproduce this behavior, I'm noticing that master migration time seems to grow with the number of queues, possibly worse than linearly.

Can you confirm if this behavior is expected or if it may be a problem?

If I try this with just one queue it takes a couple of milliseconds, but with 1000 empty durable queues it already takes more than 2 seconds.

The way I'm measuring the master migration is by publishing a message every second and showing the time it took to reach the subscriber (in the same process as the publisher).
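
For readers who want to try the same kind of measurement, the approach described above can be sketched roughly as follows. This is not the poster's actual code: it uses Python's in-process `queue.Queue` as a stand-in for the broker so that it runs without RabbitMQ, and the names `run_probe`, `count`, and `interval_s` are hypothetical. The only idea it illustrates is that each message carries its publish timestamp and the subscriber computes the delivery latency from it.

```python
import queue
import threading
import time


def run_probe(transport, count=3, interval_s=0.01):
    """Publish `count` timestamped messages and return their latencies in seconds.

    `transport` is any object with put()/get(); a queue.Queue stands in for
    the broker here (an assumption made to keep the sketch self-contained).
    """
    latencies = []

    def subscriber():
        for _ in range(count):
            sent_at = transport.get()      # the message body is the publish timestamp
            latencies.append(time.monotonic() - sent_at)

    consumer = threading.Thread(target=subscriber)
    consumer.start()

    for _ in range(count):
        transport.put(time.monotonic())    # "publish" with the send time embedded
        time.sleep(interval_s)             # the original post used 1 s between messages

    consumer.join()
    return latencies


latencies = run_probe(queue.Queue())
print(["%.3f ms" % (value * 1000) for value in latencies])
```

Because publisher and subscriber share one process, both timestamps come from the same clock. With a real broker, an outage would show up as a spike in these latencies, at a resolution no finer than the publish interval.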

I'm stopping the master node with `rabbitmqctl stop_app`. All queues are synchronized when I issue this command.

I'm using RabbitMQ 3.7.9 and the Java client 4.0.0.

Thanks!

* This happens because the min-masters queue master locator is unfortunately not honored when masters migrate, only on initial queue declaration.

Michael Klishin

Jan 16, 2019, 10:05:53 PM
to rabbitm...@googlegroups.com
I'm not sure what your question is about.

If you shut down a node, all queue masters on it will migrate if allowed to. If you shut down all but one node,
they will all end up on that node. In some cases this is expected and what you want, in others it is not.

Migration of one queue master is not really related to migration of other queue masters. I don't see anything
that would obviously do a scan over all queues, although some operations with classic mirrored queues are linear
w.r.t. the number of mirrors. Electing the next master from a list of eligible nodes is not one of them, though.

Publishing messages to a queue that's undergoing a promotion is probably not a great benchmarking strategy
as so many operations can fail and be retried or await new master election. This applies to both publishers and consumers.
You should be able to notice master promotion events in the log with millisecond precision.
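
One way to act on this is to pull the promotion timestamps out of a node's log and look at the spread between the first and the last one. The sketch below is only illustrative: the sample lines are made up, and both the timestamp layout (`YYYY-MM-DD HH:MM:SS.mmm` at the start of each line) and the marker text `Promoting` are assumptions that should be checked against what your RabbitMQ version actually logs.

```python
import re
from datetime import datetime

# Assumed layout: each log line starts with a millisecond-precision timestamp.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def promotion_window(lines, marker="Promoting"):
    """Return (event count, seconds between the first and last matching line)."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match and marker in line:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
    if not stamps:
        return 0, 0.0
    return len(stamps), (max(stamps) - min(stamps)).total_seconds()


# Hypothetical sample lines, not copied from a real log.
sample = [
    "2019-01-16 10:53:50.120 [info] queue 'q1': Promoting replica to master",
    "2019-01-16 10:53:50.480 [info] connection accepted",
    "2019-01-16 10:53:52.145 [info] queue 'q2': Promoting replica to master",
]
count, window_s = promotion_window(sample)
print(count, window_s)
```

Comparing this window for different queue counts would measure the promotion work itself, independent of client-side retries and recovery.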

Somewhat related: when a queue is declared, some master locators can do a lot more work than others,
including something that's a linear or worse algorithm [1].


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-user...@googlegroups.com.
To post to this group, send email to rabbitm...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
MK

Staff Software Engineer, Pivotal/RabbitMQ

João Nuno Silva

Jan 17, 2019, 4:54:45 AM
to rabbitm...@googlegroups.com
Thank you for the detailed comments. Re-reading my question I see that it's confusing because I omitted some important details. Let me try to clarify the question.

Motivation
Assume I have an N-node cluster with ha-mode set to all (for simplicity we can consider N=2, nodes A and B). I want to perform a rolling restart of all machines to apply an OS security patch. I want to do this while minimizing perceived downtime from the point of view of publishers/subscribers.

Context
I created 1 + 1000 queues. The 1 is the queue under test and the 1000 are empty queues which are not receiving messages. Node A is the master of all these queues. I already restarted node B and applied the security patch (stop_app, service rabbit stop, reboot machine and apply patch, service rabbit start, start_app).
Node B is fully replicated but is not the master of any queue. At this point I want to apply the patch to node A. When I run stop_app on node A, node B will become the master of all queues.

Problem
This process is taking time proportional to the number of queues.

Scenario 1) If I just have 1 queue (the queue under test), this master migration is immediate. The measured latency between publish and subscribe is just a couple of milliseconds.

Scenario 2) If I have the other 1000 queues (although these are not receiving messages), the latency increases to about two seconds.

Note that the publisher and subscriber configuration is the same in both scenarios: an AutorecoveringConnection with networkRecoveryInterval set to 100 ms and a publish rate of 1 msg/s.
I agree that this is not a perfect benchmarking setup, but it consistently reproduces the offending behavior we're seeing in production.

Questions
1) Is it expected that master migration time is proportional to the number of queues? If not, should I file a bug? Should I share the example code I'm using to reproduce this?
2) Is there a better way to perform these maintenance operations without downtime?



João Nuno Silva

Jan 25, 2019, 6:34:35 AM
to rabbitmq-users
FYI, I created issue https://github.com/rabbitmq/rabbitmq-server/issues/1847 with a video illustrating the problem.

Michael Klishin

Jan 25, 2019, 7:00:37 AM
to rabbitmq-users
Our team does not use GitHub issues for discussions or investigations. Consider posting the video here.

Michael Klishin

Jan 25, 2019, 7:14:01 AM
to rabbitmq-users
A queue is a stateful entity that reacts to cluster membership changes.

Promoting a different replica takes time, even if it's milliseconds for a replica that is in sync or an empty queue.
The process isn't sequential (at least I cannot immediately think of a part where it would be) but it modifies shared cluster state,
which involves acquiring write locks in the internal data store.
So yes, it is generally expected that the more queues there are, the more work will happen on other nodes when one node leaves.

Two data points are not enough to conclude whether this is a linear relationship.

A better question to ask here is: is this migration necessary? Can it be avoided entirely? Is there a way to prevent queues from migrating?

The answer is: there should be a way, but currently there's only a workaround that involves disabling mirroring or messing with mirroring
setups so that the node undergoing maintenance next won't have any queue masters. Some examples have definitely been
discussed on this list before. [3] mentions some relevant settings (you can tell a queue that its replica set involves specific nodes,
explicitly moving it away from the node that's about to be shut down).
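
As a rough illustration of that workaround, the sketch below builds an `ha-mode: nodes` policy definition that lists every node except the one about to be restarted, and assembles a `rabbitmqctl set_policy` invocation without running it. The node names, the policy name `ha-maintenance`, the `.*` pattern, and the helper `maintenance_policy` are all hypothetical. Changing a policy like this can itself cause masters to move, so treat it as a sketch of the idea rather than a tested procedure.

```python
import json
import shlex


def maintenance_policy(all_nodes, node_under_maintenance):
    """Build an ha-mode=nodes policy definition that excludes one node."""
    remaining = [n for n in all_nodes if n != node_under_maintenance]
    if not remaining:
        raise ValueError("at least one node must remain in the replica set")
    return {"ha-mode": "nodes", "ha-params": remaining}


# Hypothetical node names for a two-node cluster.
definition = maintenance_policy(["rabbit@A", "rabbit@B"], "rabbit@A")

# Assemble (but do not execute) the command line.
command = [
    "rabbitmqctl", "set_policy", "--apply-to", "queues",
    "ha-maintenance",          # policy name (hypothetical)
    ".*",                      # pattern matching every queue name
    json.dumps(definition),
]
print(" ".join(shlex.quote(part) for part in command))
```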

There is no consensus on what alternative would work better. Quorum queues will have a different leader election
and sync implementation which is a lot less drastic and, in some ways, significantly more efficient.
[1] covers this in detail.

We believe that Blue/Green deployment is generally the best way to do upgrades, but automating it is often non-trivial.
It would make this particular issue mostly irrelevant since your apps will gradually migrate to a different cluster and
won't be affected nearly as much.

João Nuno Silva

Jan 25, 2019, 7:17:42 AM
to rabbitmq-users
The video exceeds the size limit. The link to the video is: https://github.com/rabbitmq/rabbitmq-server/files/2795765/rmq-failover-latency.mp4.gz

João Nuno Silva

Jan 25, 2019, 7:45:53 AM
to rabbitm...@googlegroups.com
Inline

On Fri, Jan 25, 2019 at 12:14 PM Michael Klishin <mkli...@pivotal.io> wrote:
> A queue is a stateful entity that reacts to cluster membership changes.
>
> Promoting a different replica takes time, even if it's milliseconds for a replica that is in sync or an empty queue.
> The process isn't sequential (at least I cannot immediately think of a part where it would be) but it modifies shared cluster state,
> which involves acquiring write locks in the internal data store.
> So yes, it is generally expected that the more queues there are, the more work will happen on other nodes when one node leaves.
>
> Two data points are not enough to conclude whether this is a linear relationship.

It's already 3 data points:
  • 1 queue: ~2ms
  • 1000 queues: ~2s
  • 5000 queues: ~10s 
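
To see what these three measurements suggest about scaling, one can fit a straight line to them in log-log space; the slope is the exponent k in "time grows like queues^k". The sketch below is a plain least-squares fit that uses only the numbers quoted above, taken as exact even though they are approximate; `scaling_exponent` is a hypothetical helper name.

```python
import math


def scaling_exponent(points):
    """Least-squares slope of log(time) against log(queue count)."""
    xs = [math.log(n) for n, _ in points]
    ys = [math.log(t) for _, t in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


# (queue count, observed latency in seconds) as reported in this thread
measurements = [(1, 0.002), (1000, 2.0), (5000, 10.0)]
exponent = scaling_exponent(measurements)
print(round(exponent, 3))
```

With these numbers the fitted exponent comes out at essentially 1, i.e. about 2 ms per queue, which is consistent with linear growth rather than something worse. Three rough data points are still thin evidence.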

> A better question to ask here is: is this migration necessary? Can it be avoided entirely? Is there a way to prevent queues from migrating?

:) That was exactly my question "Is there a better way to perform these maintenance operations without downtime?"
 

> The answer is: there should be a way, but currently there's only a workaround that involves disabling mirroring or messing with mirroring
> setups so that the node undergoing maintenance next won't have any queue masters. Some examples have definitely been
> discussed on this list before. [3] mentions some relevant settings (you can tell a queue that its replica set involves specific nodes,
> explicitly moving it away from the node that's about to be shut down).

We already tried removing the node from the ha policy (by setting the mode to nodes and excluding this node) but unfortunately this also causes downtime.
 

> There is no consensus on what alternative would work better. Quorum queues will have a different leader election
> and sync implementation which is a lot less drastic and, in some ways, significantly more efficient.
> [1] covers this in detail.

Thanks, will keep an eye on this.
 

> We believe that Blue/Green deployment is generally the best way to do upgrades, but automating it is often non-trivial.
> It would make this particular issue mostly irrelevant since your apps will gradually migrate to a different cluster and
> won't be affected nearly as much.

Thank you for the link. We will learn more about the blue/green procedure and run some tests, but it seems contrived indeed. Even if this works well during a planned restart, the long downtime will still affect us in case of node failure. This, combined with the min-masters queue master locator not being honored during master migration, makes the problem even worse.
 



Michael Klishin

Jan 25, 2019, 8:19:37 AM
to rabbitmq-users
The Blue/Green deployment upgrade strategy has been used successfully for close to two years now. It is not particularly involved,
just harder to automate for the general case because not in every system publishers or consumers can be migrated independently.
There is no strategy that has lower risk.

I cannot comment on the video as I don't really understand what the applications do.
I keep forgetting to mention this but there are operations in RabbitMQ that wait for a new master to be promoted (up to a certain channel operation timeout).

Queue master locators are generally orthogonal to how the next-in-line replica is selected for promotion. It would be interesting to see if
the min-masters idea can be applied there but it would only slow down the promotion process and make it harder to reason about since there would have to be
an extra round of consensus or a clusterwide operation (depending on the implementation details).

When 3.8.0-beta.2 comes out it would be interesting to see how this scenario works for quorum queues. They will elect a new leader which takes time but will transfer as
little data as possible for up-to-date followers.
