Update Not Working


Shuaib Hussain

Oct 11, 2021, 8:17:04 AM
to rabbitmq-users
Hi,

I am trying to upgrade my RabbitMQ cluster in a production system from 3.8.7-management to 3.9.7-management.

Currently, I am using the following update strategy on the StatefulSet:

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2
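
To continue the rollout to the remaining nodes afterwards, the partition is lowered step by step, for example with something like (the StatefulSet name is taken from the pod specs later in this thread):

kubectl patch statefulset message-bus \
  -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":1}}}}'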

With partition: 2, only the third RabbitMQ node (message-bus-2) is updated at first, followed by the second and finally the first as the partition is lowered. The following logs are from the first node (message-bus-0), which is running the 3.8.15-management image:


2021-10-11 10:33:44.289 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:33:48.777 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:49:45.864 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' down
2021-10-11 10:49:45.869 [info] <0.525.0> Keeping rab...@message-bus-2.message-bus.default.svc.cluster.local listeners: the node is already back
2021-10-11 10:49:45.932 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' down: connection_closed
2021-10-11 10:49:45.984 [info] <0.728.0> Mirrored queue 'afs.assetnotificationworker' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.728.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.739.0> to be down
2021-10-11 10:49:45.986 [info] <0.772.0> Mirrored queue 'channelanalyticsservice.channeleventsynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.772.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.768.0> to be down
2021-10-11 10:49:45.986 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.696.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.691.0> to be down
2021-10-11 10:49:45.988 [info] <0.748.0> Mirrored queue 'ars-asset-registrations' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.748.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.743.0> to be down
2021-10-11 10:49:45.988 [info] <0.700.0> Mirrored queue 'ams.AssetRequestListener' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.700.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.695.0> to be down
2021-10-11 10:49:45.988 [info] <0.672.0> Mirrored queue 'channeleventservice.workflowinstancesynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.672.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.671.0> to be down
2021-10-11 10:49:45.992 [info] <0.760.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.760.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.759.0> to be down
2021-10-11 10:49:45.993 [info] <0.720.0> Mirrored queue 'channeleventvalidationservice.channeleventsynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.720.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.715.0> to be down
2021-10-11 10:49:45.997 [info] <0.768.0> Mirrored queue 'user-system-partition-cache' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.768.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.763.0> to be down
2021-10-11 10:49:46.005 [info] <0.784.0> Mirrored queue 'channeleventservice.assetusageregistrationstatusynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.784.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.788.0> to be down
2021-10-11 10:49:46.007 [info] <0.752.0> Mirrored queue 'wfs-notificationinterface' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.752.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.749.0> to be down
2021-10-11 10:50:06.408 [info] <0.1059.0> k8s endpoint listing returned nodes not yet ready: message-bus-2
2021-10-11 10:50:06.408 [warning] <0.1059.0> Peer discovery: node rab...@message-bus-2.message-bus.default.svc.cluster.local is unreachable
2021-10-11 10:50:09.985 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:50:14.811 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up

The following logs are from the second node (message-bus-1), which is running the 3.8.15-management image:

2021-10-11 10:50:14.929 [info] <0.10358.0> Mirrored queue 'channeleventvalidationservice.channeleventsynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.931 [info] <0.10359.0> Mirrored queue 'assetdiscovery.FileDiscoveryNotificationHandler' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.931 [info] <0.756.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.931 [info] <0.756.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.931 [info] <0.10371.0> Mirrored queue 'channeleventservice.workflowinstancesynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.933 [info] <0.10400.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.934 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.934 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.936 [info] <0.10367.0> Mirrored queue 'user-system-partition-cache' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.936 [info] <0.10365.0> Mirrored queue 'channeleventservice.trackstatusynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.939 [info] <0.10402.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.939 [info] <0.724.0> Mirrored queue 'channeleventservice.channeleventupdater' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.939 [info] <0.724.0> Mirrored queue 'channeleventservice.channeleventupdater' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.939 [info] <0.10378.0> Mirrored queue 'channeleventservice.channeldeleteprocessor' in vhost '/': Synchronising: all mirrors already synced

The following logs are from the third node (message-bus-2), which is running the 3.8.16-management image:

2021-10-11 10:50:15.635 [info] <0.826.0> Resetting node maintenance status
2021-10-11 10:50:16.161 [info] <0.1040.0> Successfully set policy 'ha' matching queues names in virtual host '/' using pattern '.*'
2021-10-11T10:50:04+0000 - HA-Fedderation policy for the RabbitMQ Cluster
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Ping succeeded
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.

Finally, I have attached an image to show the result in the UI:

RabbitMQUpdate.png

The cluster remains stuck in this phase, and the same happens with every image from 3.8.16-management upwards.

Sincerely yours,
Shuaib

Shuaib Hussain

Oct 11, 2021, 8:26:22 AM
to rabbitmq-users
Also, kubectl describe pods shows the following for message-bus-0:

Name:                 message-bus-0
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-90-151.eu-west-1.compute.internal/10.10.90.151
Start Time:           Mon, 11 Oct 2021 11:15:46 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-5688c54c5d
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-0
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.69.96
IPs:
  IP:           10.10.69.96
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://303a02fb261421d400ebfa4f6aedab4528f269e7e3dc32b8276cd311466e8e1c
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:15:55 +0100
      Finished:     Mon, 11 Oct 2021 11:15:55 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d357c78cd97d14e92b8274fedf6b9ecb2d21e80dced2918aac9777bab5c36f16
    Image:          rabbitmq:3.8.15-management
    Image ID:       docker-pullable://rabbitmq@sha256:65e167c9dbd55b108f4c400c1c6726370b10c551cff63fb4526db4185f05bc41
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:15:56 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-0 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-0
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

And the following for message-bus-1:

Name:                 message-bus-1
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-23-12.eu-west-1.compute.internal/10.10.23.12
Start Time:           Mon, 11 Oct 2021 11:12:17 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-5688c54c5d
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-1
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.18.213
IPs:
  IP:           10.10.18.213
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://63c9ae216c47c704fd24c1db647f9e70875ef5f5b9b752ec36c6061924bbc0a9
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:12:34 +0100
      Finished:     Mon, 11 Oct 2021 11:12:34 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d6454c3b3925ef1ca7e464fe262f0af3df04023db1e6e6305e1272852228d16c
    Image:          rabbitmq:3.8.15-management
    Image ID:       docker-pullable://rabbitmq@sha256:65e167c9dbd55b108f4c400c1c6726370b10c551cff63fb4526db4185f05bc41
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:12:35 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-1 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-1
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

And the following for message-bus-2:

Name:                 message-bus-2
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-138-73.eu-west-1.compute.internal/10.10.138.73
Start Time:           Mon, 11 Oct 2021 11:49:53 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-555c88c76f
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-2
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.146.39
IPs:
  IP:           10.10.146.39
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://865a8128596398ee87a8973c28612b2580f71ded3e06857ce921ba97272bf75d
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:50:03 +0100
      Finished:     Mon, 11 Oct 2021 11:50:03 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d18a48a931ccff4ee4a28f2d81c151df8df0697519ab542e91a0da5ac494511d
    Image:          rabbitmq:3.8.16-management
    Image ID:       docker-pullable://rabbitmq@sha256:09f73d00fc0d9eeb05d8dba8ab6aa5d0265af4d537c71f75229204bca4304dc7
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:50:04 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-2 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-2
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Sincerely yours,
Shuaib

Michal Kuratczyk

Oct 11, 2021, 9:15:36 AM
to rabbitm...@googlegroups.com
Hi,

I just performed this upgrade in a cluster deployed with the Operator and it worked just fine (I'm also pretty sure many people have performed this upgrade without issues).
This is probably specific to how you deploy RabbitMQ. In particular, please check whether the Erlang cookie gets regenerated. You have warnings about the deprecation of the environment variable - perhaps there is a bug in the Docker image scripts and the cookie gets regenerated or ignored (if I remember correctly, when it is not set at all, one is generated, which would explain why the new node can't join the cluster).

The way the Operator works, it simply creates a Secret with the Erlang cookie, so no upgrade should affect the value of the cookie.
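
In other words, roughly this (illustrative names, not the Operator's exact manifests):

apiVersion: v1
kind: Secret
metadata:
  name: mycluster-erlang-cookie
type: Opaque
stringData:
  .erlang.cookie: "SOME-STABLE-RANDOM-VALUE"   # generated once, never rewritten by upgrades

Each pod mounts this Secret and copies the file to /var/lib/rabbitmq/.erlang.cookie with mode 600, so an image upgrade never changes the cookie value.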

Best,



--
Michał
RabbitMQ team

Shuaib Hussain

Oct 11, 2021, 11:40:28 AM
to rabbitmq-users
Hi,

I have now tried mounting the .erlang.cookie file and chmod 600-ing it, and the RabbitMQ pod starts up. However, it still does not join the cluster, even though its Erlang cookie matches the other nodes' cookies:

erlang.png
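
For reference, the cookie values can be compared with something like this (container name and cookie path are taken from the kubectl describe output above):

for i in 0 1 2; do
  kubectl exec message-bus-$i -c rabbitmq -- cat /var/lib/rabbitmq/.erlang.cookie
  echo
done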

Also, as you can see here, it doesn't join the cluster:

RMQCluster.png

So what I am doing is creating a ConfigMap that references a file, which I mount onto a volume; the init script then copies it to the /var/lib/rabbitmq/ directory and chmod 600's it (sketched below). However, for some odd reason it still does not connect. Could you elaborate on what you mean by the Operator creating a Secret? How is the Secret wired up to the /var/lib/rabbitmq/.erlang.cookie file?
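
In script form, the init step boils down to something like this (the source file name inside the ConfigMap is a guess; the mount paths are from the pod spec above):

# copy the cookie from the ConfigMap mount into the data dir, then lock it down
cp /opt/rabbitmq/config/erlang.cookie /var/lib/rabbitmq/.erlang.cookie
chmod 600 /var/lib/rabbitmq/.erlang.cookie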

Michal Kuratczyk

Oct 11, 2021, 12:00:24 PM
to rabbitm...@googlegroups.com
Sounds like what you are doing is correct - the Operator does it similarly, with a Secret mounted as a volume and the cookie copied and chmoded.

I don't know what's going on in your environment, but we created the Operator so that people don't need to waste time on such issues. Perhaps you can install the Operator, deploy a cluster, and compare how things are done?
In the future, I'd recommend simply migrating to an Operator-deployed cluster.

Getting started should be trivial if you have a Kubernetes cluster available:
kubectl krew install rabbitmq
kubectl rabbitmq create mycluster --replicas 3 --image rabbitmq:3.8.15-management

With these commands you should have a cluster deployed, so you can see how everything is set up.
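
Then, to inspect the result (the Operator names the pods <cluster-name>-server-N):

kubectl exec mycluster-server-0 -- rabbitmqctl cluster_status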

Best,




--
Michał