Quorum queues and bouncing nodes.


Michael Day

May 5, 2021, 2:34:49 PM
to rabbitmq-users
Hello,

I'm running RabbitMQ on AWS Fargate (a container orchestration platform, reasonably similar to k8s but with fewer bells and whistles).
I have defined a service which should have 3 nodes in it. When a node dies Fargate will start another one. 
I am using quorum queues which each have 2 replicas.
I am using the rabbitmq-management container from dockerhub.
When a new node starts, it would be good if it automatically grew all the queues onto itself, rebalanced the queues, and also forgot the node which must have died to cause this one to stand up.
I'm wondering how best to address this.
A sidecar container which waits until the node is up and then runs the rabbitmqctl commands?
Or is there some way of running the rabbitmqctl commands inside the server container after the node has started?
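The rabbitmqctl commands I have in mind would be roughly these (the node names are placeholders, and I'm assuming await_startup blocks until the node is ready):

```shell
#!/bin/sh
# Rough sketch; rabbit@new-node and rabbit@dead-node are placeholders
# for the actual node names, which would have to be discovered somehow.
rabbitmqctl await_startup                      # wait until this node is up
rabbitmqctl forget_cluster_node rabbit@dead-node
rabbitmq-queues grow rabbit@new-node all       # replicate every QQ onto this node
rabbitmq-queues rebalance quorum               # spread queue leaders around
```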


Michal Kuratczyk

May 5, 2021, 3:34:20 PM
to rabbitm...@googlegroups.com
Hi,

First, quorum queues should be used with an odd number of replicas: https://www.rabbitmq.com/quorum-queues.html#quorum-requirements (the last sentence in this paragraph is not up to date - starting with 3.8.12 the default is 3 nodes; I've pushed a fix and it will be live soon).

As for your main question, I think there are a few things to consider:
1. How often does a Fargate node disappear?
2. If I understand correctly, a new node starts fresh - it's not a new incarnation of the old node but a completely new node. Does it have to be like this? When I kill a pod in Kubernetes, the restarted pod will keep the same persistent volume and therefore identity of the node. There is no need to grow the quorum queue in such a scenario.
3. Rebalancing is a good idea but keep in mind that it only affects who the leader is, it doesn't change cluster membership, so you need to rebalance after you grew the QQ clusters (if you can't avoid that as mentioned above). A node that is not a member of a quorum queue cluster will not be considered during rebalancing.
4. Rabbitmqctl commands can be executed inside a container. On Kubernetes that would be `kubectl exec POD -- rabbitmqctl ...`. I don't know about Fargate.

Best,



--
Michał
RabbitMQ team

Michael Day

May 5, 2021, 4:17:37 PM
to rabbitmq-users
Thanks for the response.

Firstly, when I said 2 replicas I meant one master and two replicas, so 3 in total. It shows +2 in the web UI next to the queue name.

1. Fargate nodes do not disappear often; I have been running RabbitMQ on Fargate without one disappearing since October of last year. But it can happen, and I'd rather the cluster self-healed.
2. Fargate nodes will have a new hostname, and thus a new FQDN, which becomes the RabbitMQ node name. I guess I could use EFS (Amazon's NFS) to mount the same volume back to a restarted node, but that means running 3 Fargate services, so as to set a different RABBITMQ_NODENAME in the ENV of each (it can only differ between services). This might turn out to be the best solution.
3. Yes, the post_start.sh would need to: find the old dead node (by piping rabbitmqctl cluster_status through awk, grepping a couple of times, and diffing the results), run rabbitmqctl forget_cluster_node for the dead node, run rabbitmq-queues grow all on the new node, and then rebalance.
4. There's no equivalent exec command for Fargate (that I've found yet). I could install an SSH daemon in the container and shell in, although I'd rather not expose more ports than necessary. But I was really asking about automated commands which run after the RabbitMQ server is up, so that growing the queues and cleaning up dead nodes happens as part of the startup sequence, clearly after the node is up. I don't think I can add a post_start.sh without modifying your Docker image significantly and then maintaining my mutant version, which I'd rather avoid.

I have been considering abusing the Fargate (Docker) healthcheck to do this. Growing queues onto a node which already has them doesn't seem to harm the cluster, but cleaning up dead nodes seems trickier: we have to make sure we don't accidentally kill all the nodes when the part of the script working out which nodes to forget produces unexpected output.

I have run Elasticsearch in a configuration similar to the suggestion in 2. (3 services running 1 node each, with EFS permanent storage). I think this is likely to prove the least kludgey.

Thank-you very much for your advice.

-- 
M

Michal Kuratczyk

May 5, 2021, 4:39:03 PM
to rabbitm...@googlegroups.com
As you pointed out, forgetting nodes is tricky, even when you know your environment and more or less what to expect. It's a human decision that a given node replaces a node that no longer exists. Some well-defined use cases could be automated - e.g. in the Kubernetes Operator we will probably have the Operator run forget_cluster_node after an intentional scale-down (if the human operator decided to scale the cluster down from 5 to 3, then it's pretty clear that nodes 4 and 5 can be forgotten) - but even that is not yet implemented, because we weren't sure whether it should just happen unconditionally and without any delay.



--
Michał
RabbitMQ team

Michael Day

May 7, 2021, 1:54:45 PM
to rabbitmq-users
So the plan of using a Fargate service per node and thus setting the RABBITMQ_NODENAME to a known fixed value for each node and persisting the data on EFS has run into a tricky issue.

A Fargate service can create a service discovery record for the task it's bringing up (service = deployment, task = pod, for the k8s folk), but that obviously takes some time to be set up (it's DNS), and the RabbitMQ node exits with

ERROR: epmd error for host a.rabbitmq.test: nxdomain (non-existing domain)

NB: rabbitmq.test does exist, so it's failing on the specific record we asked the service to create on startup.

I'm considering sticking a sleep 120 in the entrypoint.sh before starting the server, which seems like a terrible kludge.

But if the part after the @ in RABBITMQ_NODENAME needs to be a resolvable address (which it seems to need to be), and it needs to be constant through restarts of the container, then I can't see another way to achieve this.
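One alternative to a fixed sleep would be a wrapper entrypoint that polls for our own record. A minimal sketch, assuming getent is available in the image and that the stock docker-entrypoint.sh is on the PATH:

```shell
#!/bin/sh
# Sketch: block until our own DNS record resolves, then start the server.
# Assumes RABBITMQ_NODENAME is already set, e.g. rabbit@a.rabbitmq.test.
host="${RABBITMQ_NODENAME#*@}"               # the part after the @
until getent hosts "$host" >/dev/null 2>&1; do
  echo "waiting for DNS record $host to appear..."
  sleep 5
done
exec docker-entrypoint.sh rabbitmq-server
```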

-- 
M

Michael Day

May 7, 2021, 2:20:37 PM
to rabbitmq-users
The other solution I've come up with is using ECS on EC2 with host networking (rather than awsvpc), which means the nodes would always have the hostname of the underlying EC2 instance.

This would work until you need to cycle the underlying EC2 instance, whereupon you end up in the same situation as with Fargate, as the new EC2 instance would have a new name.

I think it will be possible to use the 'user data' section of the EC2 Launch Template to set the hostname of the EC2 instance it launches to a fixed value.
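The user data would be something trivial along these lines (the hostname is a placeholder for whatever fixed name this service's node should carry, and persisting it across reboots varies by distro), so treat it as a sketch:

```shell
#!/bin/bash
# Hypothetical EC2 user data: pin the instance hostname to a fixed value.
hostnamectl set-hostname a.rabbitmq.test
```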

I will have a go at this and report back on my success or lack of it.

-- 
M
