ha-mode nodes breaks mirroring

Devin Christensen

May 10, 2017, 2:35:04 PM
to rabbitmq-users
A better-formatted version of this report is here: https://github.com/rabbitmq/rabbitmq-server/issues/1219

I'm using `ha-mode: nodes` policies to shuffle queues around in my RabbitMQ cluster to balance the load. I've found a couple of scenarios that break.

RabbitMQ Version: 3.6.9
Erlang Version: 18.2
Cluster size: 5

All tests start out with a single policy of:

```
ha-two queues .* {"ha-mode":"exactly","ha-params":2,"ha-sync-mode":"automatic"} 100
```

Each test then adds a policy in the form of:

```
test_policy full_queue_name {"ha-mode":"nodes","ha-params":[policy_nodes],"ha-sync-mode":"automatic"} 200
```

| current nodes | policy nodes | resulting nodes | active policy | result |
| --- | --- | --- | --- | --- |
| 2, 4 | 1, 3 | 1, 4 | ha-two (expected test_policy) | :skull_and_crossbones: |
| 2, 1 | 2, 3 | 2, 3 | test_policy | :white_check_mark: |
| 2, 3 | 4, 3 | 3, 4 | test_policy | :warning: |
| 3, 4 | 2, 1 | 2, 4 | ha-two (expected test_policy) | :skull_and_crossbones: |

- :skull_and_crossbones: did not apply the policy and ignores further policy changes
- :warning: applied the policy but did not honor node order
- :white_check_mark: result matched expectations

I can also pretty reliably get into situations where a policy remains applied to a queue even after the policy itself is deleted. I've found rolling restarts across the entire cluster to be the only way to recover from this state.
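
One way to confirm the stale-policy state described above is to delete the policy and then ask the broker which queues still report it. The following is a minimal sketch, not a verified reproduction: the function names `queues_with_policy` and `clear_and_verify` and the `RABBITMQCTL` variable are hypothetical, and it only inspects the default vhost (add `-p <vhost>` for others). It relies on `rabbitmqctl clear_policy` and on `rabbitmqctl -q list_queues name policy` printing tab-separated columns. The check runs immediately after the delete, so a single positive result is a hint to re-check rather than proof.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting a policy that is still reported on
# queues after it has been deleted. All names here are illustrative.

# Command used to reach the broker. Left unquoted at call sites on purpose
# so that a value such as "sudo rabbitmqctl" splits into two words.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# queues_with_policy POLICY_NAME
# Prints the name of every queue whose reported policy equals POLICY_NAME.
queues_with_policy() {
  $RABBITMQCTL -q list_queues name policy |
    awk -F '\t' -v p="$1" '$2 == p { print $1 }'
}

# clear_and_verify POLICY_NAME
# Deletes the policy, then returns 0 if no queue still reports it,
# 1 if the delete itself failed, 2 if some queue still reports it.
clear_and_verify() {
  local stale
  $RABBITMQCTL clear_policy "$1" || return 1
  stale="$(queues_with_policy "$1")"
  if [ -n "$stale" ]; then
    echo "policy '$1' is still reported on:"
    echo "$stale"
    return 2
  fi
  echo "policy '$1' is no longer reported on any queue"
}
```

For example, `clear_and_verify test_policy` would list the affected queues if the deleted `test_policy` is still attached to any of them.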

Devin Christensen

May 10, 2017, 2:42:22 PM
to rabbitmq-users
Command used to apply the test policy:

```
sudo rabbitmqctl set_policy test_policy "full_queue_name" '{"ha-mode":"nodes","ha-params":["node1","node2"],"ha-sync-mode":"automatic"}' --priority 200 --apply-to queues
```

Michael Klishin

May 10, 2017, 2:44:10 PM
to rabbitm...@googlegroups.com
Please post a transcript of the shell commands used as well as node logs from the time you begin running them.

--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Michael Klishin

May 10, 2017, 2:45:51 PM
to rabbitm...@googlegroups.com
Also `rabbitmqctl list_policies` and `rabbitmqctl list_queues` output after each `set_policy` run would help a lot.
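
A minimal sketch of how that output could be captured after every `set_policy` run follows. The helper names `snapshot_state` and `apply_and_snapshot` and the `RABBITMQCTL` variable are hypothetical. The `list_queues` columns (`name`, `policy`, `pid`, `slave_pids`, `synchronised_slave_pids`) are the queue info items assumed to be available on 3.6.x. Options are placed before the positional arguments, following the documented `set_policy` usage line.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: apply a policy, then record policies and mirror
# placement so each step of a test leaves a transcript entry.

# Command used to reach the broker. Left unquoted at call sites on purpose
# so that a value such as "sudo rabbitmqctl" splits into two words.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# snapshot_state
# Prints a UTC timestamp, the current policies, and per-queue placement.
snapshot_state() {
  echo "== $(date -u +%Y-%m-%dT%H:%M:%SZ) list_policies"
  $RABBITMQCTL list_policies
  echo "== list_queues"
  $RABBITMQCTL list_queues name policy pid slave_pids synchronised_slave_pids
}

# apply_and_snapshot NAME PATTERN DEFINITION PRIORITY
# Sets a queue policy and snapshots the state afterwards, even when
# set_policy itself reports an error.
apply_and_snapshot() {
  local name="$1" pattern="$2" definition="$3" priority="$4"
  echo "== set_policy $name $pattern $definition (priority $priority)"
  $RABBITMQCTL set_policy --priority "$priority" --apply-to queues \
    "$name" "$pattern" "$definition"
  snapshot_state
}
```

Piping each call through something like `2>&1 | tee -a transcript.log` (the file name is only an example) would produce a transcript that can be posted alongside the node logs.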

Michael Klishin

May 10, 2017, 2:50:00 PM
to rabbitm...@googlegroups.com
The title claims that it "breaks mirroring". I assume the "breaks" part comes down to the mirror location and/or the effective policy, not that mirroring stops or that there are any errors in the log?

Yeah, we are pretty big on being as specific as possible on this list.

Devin Christensen

May 10, 2017, 3:52:52 PM
to rabbitmq-users
Removing or adding policies becomes ineffective on a queue that has entered the broken state. Sometimes RabbitMQ will choose the same nodes for a primary and replica, even though only one of those nodes appears in the policy. These queues also become inaccessible to clients due to vhost timeouts: "MarchHare::NotFound: NOT_FOUND - failed to perform operation on queue 'ballista.abacus.category.updated' in vhost '/' due to timeout". I have found no way to correct the issue other than restarting the entire cluster.

Devin Christensen

May 10, 2017, 3:56:02 PM
to rabbitmq-users
"even though only one of those nodes appears in the policy" should read "even though the node appears exactly once (along with a different node) in the policy"

Michael Klishin

May 10, 2017, 4:04:28 PM
to rabbitm...@googlegroups.com
Can you please share server logs and a shell operation transcript?

Diana Corbacho

May 12, 2017, 9:52:59 AM
to rabbitmq-users
Hi Devin,

Could you please provide the information that Michael requested? I tried to reproduce what you described by applying those two policies, and with the right priorities everything works as expected.