New feature flag prevents upgrade when using official Docker image

Jordan Snodgrass

Feb 18, 2021, 9:19:13 PM
to rabbitmq-users
Hi, I run a Rabbit cluster using the official Docker image: https://hub.docker.com/_/rabbitmq/

When attempting to upgrade from v3.8.9 to v3.8.12, the new node fails to join the existing cluster because of the new feature flag for user_limits.

In the past, our upgrade process would go like so:
  1. Spin up a new node using the new version of the Docker image, like so: 
    docker run -d --name rabbitmq --hostname $HOSTNAME --restart=always -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 15692:15692 -p 25672:25672 -p 35672-35682:35672-35682 -e RABBITMQ_ERLANG_COOKIE='${erlang_cookie}' -v /root/data:/var/lib/rabbitmq -v /root/conf/:/etc/rabbitmq -v /root/bin:/tmp/bin rabbitmq:${rabbitmq_version}-management
  2. Run a script that makes the new node join the existing cluster
  3. Wait for cluster to go green with the new node
  4. Shut down one of the old nodes
  5. Repeat until all nodes are on the new version
This process served us well until v3.8.10, which introduced a new feature flag for user_limits. Step 2 now fails with Error: incompatible_feature_flags. A container is tied to the image it was started from: a node started from the 3.8.9 image stays on 3.8.9, so in-place upgrades aren't really an option.
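
For context, the join script from step 2 is not shown in this post. A minimal sketch of what such a script typically does, assuming the container is named rabbitmq as in the command above; the function name and the seed node name (rabbit@old-node-1) are hypothetical placeholders, while stop_app, join_cluster and start_app are standard rabbitmqctl commands:

```shell
# Sketch only: join the local node (container "rabbitmq") to an existing cluster.
# The seed node name is a placeholder, not a value taken from this thread.
join_existing_cluster() {
  seed_node="$1"   # e.g. rabbit@old-node-1 (hypothetical)

  # Stop the RabbitMQ application but keep the Erlang VM running.
  docker exec rabbitmq rabbitmqctl stop_app || return 1

  # Ask the seed node to admit this node into its cluster.
  docker exec rabbitmq rabbitmqctl join_cluster "$seed_node" || return 1

  # Start the application again so the node comes up as a cluster member.
  docker exec rabbitmq rabbitmqctl start_app
}
```

It would be called as join_existing_cluster rabbit@old-node-1 on the host running the new container.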

We were able to find a workaround by going into the running container and manually doing an in-place upgrade by emulating the Dockerfile (https://github.com/docker-library/rabbitmq/blob/master/3.8/ubuntu/Dockerfile) like so:

# SSH to the instance and bash into the container:
ssh -i ~/.ssh/<key> ec2-user@<ip>
docker exec -ti rabbitmq bash

# Install some tools:
apt-get update; apt-get install --no-install-recommends wget xz-utils

# Download RabbitMQ
wget --no-check-certificate --progress dot:giga --output-document "/usr/local/src/rabbitmq-3.8.12.tar.xz" "https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.8.12/rabbitmq-server-generic-unix-latest-toolchain-3.8.12.tar.xz";

# Put the node into maintenance mode:
rabbitmq-upgrade drain

# Stop RabbitMQ Server
# NOTE: This kicks us out of the container, so we have to log back in.
rabbitmqctl stop
docker exec -ti rabbitmq bash

# Install:
tar --extract --file "/usr/local/src/rabbitmq-3.8.12.tar.xz" --directory "/opt/rabbitmq" --strip-components 1; chown -R rabbitmq:rabbitmq "/opt/rabbitmq";

# Configure:
export RABBITMQ_VERSION=3.8.12; sed -i 's/^SYS_PREFIX=.*$/SYS_PREFIX=/' "/opt/rabbitmq/sbin/rabbitmq-defaults";

# Start RabbitMQ:
rabbitmqctl start_app

# Verify:
rabbitmqctl version
rabbitmqctl cluster_status

# Exit maintenance mode:
rabbitmq-upgrade revive

# Re-balance queues (may need to run this a couple times)
rabbitmq-upgrade post_upgrade
rabbitmq-queues rebalance "all"

This worked fine. Once all existing nodes were manually updated, we were able to go back to the old method and spin up new nodes using Docker. However, it feels dirty and goes against everything that running in Docker is supposed to provide. It's tedious and error prone, and our images are no longer "immutable."

I'm wondering if it would be possible to add a feature to start a new node with a feature flag disabled by default. This would allow a node running a newer version to join an older cluster. Once all nodes are updated (and only then), the new feature flag could be enabled.

Or, I'm wondering if there's some other magic to upgrading when using Docker for all of our nodes. Any help would be appreciated.  Thanks!

Diana Parra Corbacho

Feb 19, 2021, 4:55:25 AM
to rabbitm...@googlegroups.com

Hi Jordan,

There is a way to start a new and unclustered node with all feature flags disabled, by setting the RABBITMQ_FEATURE_FLAGS environment variable to an empty string.
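
A minimal sketch of how that could look with the docker run command from the original post. The function name is hypothetical, and the port mappings, Erlang cookie and extra volumes are trimmed for brevity, so treat this as an illustration rather than a complete invocation:

```shell
# Sketch only: start a fresh, unclustered node with all feature flags disabled.
# Add the ports, cookie and volumes from your real command as needed.
start_node_flags_disabled() {
  version="$1"   # image version, e.g. 3.8.12

  # "-e RABBITMQ_FEATURE_FLAGS=" passes the variable as an empty string,
  # which asks the node to start with no feature flags enabled.
  docker run -d --name rabbitmq --hostname "$HOSTNAME" \
    -e RABBITMQ_FEATURE_FLAGS= \
    -v /root/data:/var/lib/rabbitmq \
    "rabbitmq:${version}-management"
}
```

Note the trailing "=": docker run -e NAME (without "=") would instead copy the value of NAME from the host environment.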

Jordan Snodgrass

Feb 22, 2021, 5:37:29 PM
to rabbitmq-users
Thank you Diana, that is exactly what I was looking for!  Going forward, I will use this variable to start up new instances with only explicitly enabled feature flags, like so:

docker run -d --name rabbitmq --hostname $HOSTNAME --restart=always -p 4369:4369 -p 5672:5672 -p 15672:15672 -p 15692:15692 -p 25672:25672 -p 35672-35682:35672-35682 -e RABBITMQ_ERLANG_COOKIE='${erlang_cookie}' -e RABBITMQ_FEATURE_FLAGS=drop_unroutable_metric,empty_basic_get_metric,implicit_default_bindings,maintenance_mode_status,quorum_queue,virtual_host_metadata -v /root/data:/var/lib/rabbitmq -v /root/conf/:/etc/rabbitmq -v /root/bin:/tmp/bin rabbitmq:${rabbitmq_version}-management

This will allow me to add a new v3.8.10 node to an existing v3.8.9 cluster, with the new user_limits feature flag disabled.  Then once the whole cluster is upgraded I'll be able to enable the new flag(s).  Is that right?
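
For reference, a sketch of that last step, assuming the container is named rabbitmq and that every node is already on the new version. The function name is hypothetical; list_feature_flags and enable_feature_flag are existing rabbitmqctl commands:

```shell
# Sketch only: run once every node in the cluster is on the new version.
enable_flags_after_upgrade() {
  # Show each feature flag and its current state.
  docker exec rabbitmq rabbitmqctl list_feature_flags || return 1

  # Enable the flag that was held back during the rolling upgrade.
  docker exec rabbitmq rabbitmqctl enable_feature_flag user_limits
}
```

rabbitmqctl enable_feature_flag all can be used instead to enable every flag the cluster supports in one go.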

Also, I noticed that RABBITMQ_FEATURE_FLAGS is only mentioned in passing at the bottom of https://www.rabbitmq.com/feature-flags.html (in the context of developing plugins). Perhaps this variable could be mentioned more prominently in the Feature Flag documentation? Also, is it possible to have a table of all feature flags and the versions/dates in which they became available?
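
In the meantime, the flags an existing cluster has enabled can be read from a running node with rabbitmqctl list_feature_flags. A minimal sketch that turns that output into a value for RABBITMQ_FEATURE_FLAGS, assuming whitespace-separated name and state columns; enabled_flags_csv is a hypothetical helper name:

```shell
# Sketch only: read "name<whitespace>state" rows on stdin and print a
# comma-separated list of the flags whose state is "enabled".
# Rows whose second column is anything else (header or banner lines,
# disabled flags) are skipped.
enabled_flags_csv() {
  awk '$2 == "enabled" { printf "%s%s", sep, $1; sep = "," } END { print "" }'
}

# Assumed usage against a node of the existing cluster:
#   docker exec rabbitmq rabbitmqctl list_feature_flags name state | enabled_flags_csv
```

The result can then be passed to the new node via -e RABBITMQ_FEATURE_FLAGS=..., so the list never has to be maintained by hand.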

Thanks!

-j

Diana Parra Corbacho

Mar 1, 2021, 9:33:16 AM
to rabbitm...@googlegroups.com

Thanks for your feedback. The documentation has just been updated; it’s available at https://www.rabbitmq.com/feature-flags.html#how-to-start-new-node-disabled-feature-flags
