Update Not Working


Shuaib Hussain

Oct 11, 2021, 8:17:04 AM
to rabbitmq-users
Hi,

I am trying to upgrade my RabbitMQ cluster in a production system from 3.8.7-management to 3.9.7-management.

Currently, I am using the following update strategy on the StatefulSet:

updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partition: 2
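
To continue the rollout to the remaining nodes afterwards, the partition is lowered step by step, for example with something like (the StatefulSet name is taken from the pod specs later in this thread):

kubectl patch statefulset message-bus \
  -p '{"spec":{"updateStrategy":{"rollingUpdate":{"partition":1}}}}'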

With partition: 2, only the third RabbitMQ node (message-bus-2) is updated at first, followed by the second and finally the first as the partition is lowered. The following logs are from the first node (message-bus-0), which is running the 3.8.15-management image:


2021-10-11 10:33:44.289 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:33:48.777 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:49:45.864 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' down
2021-10-11 10:49:45.869 [info] <0.525.0> Keeping rab...@message-bus-2.message-bus.default.svc.cluster.local listeners: the node is already back
2021-10-11 10:49:45.932 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' down: connection_closed
2021-10-11 10:49:45.984 [info] <0.728.0> Mirrored queue 'afs.assetnotificationworker' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.728.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.739.0> to be down
2021-10-11 10:49:45.986 [info] <0.772.0> Mirrored queue 'channelanalyticsservice.channeleventsynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.772.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.768.0> to be down
2021-10-11 10:49:45.986 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.696.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.691.0> to be down
2021-10-11 10:49:45.988 [info] <0.748.0> Mirrored queue 'ars-asset-registrations' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.748.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.743.0> to be down
2021-10-11 10:49:45.988 [info] <0.700.0> Mirrored queue 'ams.AssetRequestListener' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.700.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.695.0> to be down
2021-10-11 10:49:45.988 [info] <0.672.0> Mirrored queue 'channeleventservice.workflowinstancesynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.672.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.671.0> to be down
2021-10-11 10:49:45.992 [info] <0.760.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.760.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.759.0> to be down
2021-10-11 10:49:45.993 [info] <0.720.0> Mirrored queue 'channeleventvalidationservice.channeleventsynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.720.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.715.0> to be down
2021-10-11 10:49:45.997 [info] <0.768.0> Mirrored queue 'user-system-partition-cache' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.768.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.763.0> to be down
2021-10-11 10:49:46.005 [info] <0.784.0> Mirrored queue 'channeleventservice.assetusageregistrationstatusynchronizer' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.784.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.788.0> to be down
2021-10-11 10:49:46.007 [info] <0.752.0> Mirrored queue 'wfs-notificationinterface' in vhost '/': Secondary replica of queue <rab...@message-bus-0.message-bus.default.svc.cluster.local.1633947362.752.0> detected replica <rab...@message-bus-2.message-bus.default.svc.cluster.local.1633948424.749.0> to be down
2021-10-11 10:50:06.408 [info] <0.1059.0> k8s endpoint listing returned nodes not yet ready: message-bus-2
2021-10-11 10:50:06.408 [warning] <0.1059.0> Peer discovery: node rab...@message-bus-2.message-bus.default.svc.cluster.local is unreachable
2021-10-11 10:50:09.985 [info] <0.525.0> node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up
2021-10-11 10:50:14.811 [info] <0.525.0> rabbit on node 'rab...@message-bus-2.message-bus.default.svc.cluster.local' up

The following logs are from the second node (message-bus-1), which is running the 3.8.15-management image:

2021-10-11 10:50:14.929 [info] <0.10358.0> Mirrored queue 'channeleventvalidationservice.channeleventsynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.931 [info] <0.10359.0> Mirrored queue 'assetdiscovery.FileDiscoveryNotificationHandler' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.931 [info] <0.756.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.931 [info] <0.756.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.931 [info] <0.10371.0> Mirrored queue 'channeleventservice.workflowinstancesynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.933 [info] <0.10400.0> Mirrored queue 'channelstats.channellifecycleprocessor' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.934 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.934 [info] <0.696.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.936 [info] <0.10367.0> Mirrored queue 'user-system-partition-cache' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.936 [info] <0.10365.0> Mirrored queue 'channeleventservice.trackstatusynchronizer' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.939 [info] <0.10402.0> Mirrored queue 'channeleventvalidationservice.channeleventcachemanager' in vhost '/': Synchronising: all mirrors already synced
2021-10-11 10:50:14.939 [info] <0.724.0> Mirrored queue 'channeleventservice.channeleventupdater' in vhost '/': Synchronising: 0 messages to synchronise
2021-10-11 10:50:14.939 [info] <0.724.0> Mirrored queue 'channeleventservice.channeleventupdater' in vhost '/': Synchronising: batch size: 4096
2021-10-11 10:50:14.939 [info] <0.10378.0> Mirrored queue 'channeleventservice.channeldeleteprocessor' in vhost '/': Synchronising: all mirrors already synced

The following logs are from the third node (message-bus-2), which is running the 3.8.16-management image:

2021-10-11 10:50:15.635 [info] <0.826.0> Resetting node maintenance status
2021-10-11 10:50:16.161 [info] <0.1040.0> Successfully set policy 'ha' matching queues names in virtual host '/' using pattern '.*'
2021-10-11T10:50:04+0000 - HA-Fedderation policy for the RabbitMQ Cluster
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Error:
Failed to connect and authenticate to rab...@message-bus-2.message-bus.default.svc.cluster.local in 60000 ms
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.
Will ping rab...@message-bus-2.message-bus.default.svc.cluster.local. This only checks if the OS process is running and registered with epmd. Timeout: 60000 ms.
Ping succeeded
RABBITMQ_ERLANG_COOKIE env variable support is deprecated and will be REMOVED in a future version. Use the $HOME/.erlang.cookie file or the --erlang-cookie switch instead.

Finally, I have attached an image to show the result in the UI:

RabbitMQUpdate.png

The cluster remains stuck in this phase, and the same happens with every image from 3.8.16-management upwards.

Sincerely yours,
Shuaib

Shuaib Hussain

Oct 11, 2021, 8:26:22 AM
to rabbitmq-users
Also, kubectl describe pods shows the following for message-bus-0:

Name:                 message-bus-0
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-90-151.eu-west-1.compute.internal/10.10.90.151
Start Time:           Mon, 11 Oct 2021 11:15:46 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-5688c54c5d
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-0
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.69.96
IPs:
  IP:           10.10.69.96
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://303a02fb261421d400ebfa4f6aedab4528f269e7e3dc32b8276cd311466e8e1c
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:15:55 +0100
      Finished:     Mon, 11 Oct 2021 11:15:55 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d357c78cd97d14e92b8274fedf6b9ecb2d21e80dced2918aac9777bab5c36f16
    Image:          rabbitmq:3.8.15-management
    Image ID:       docker-pullable://rabbitmq@sha256:65e167c9dbd55b108f4c400c1c6726370b10c551cff63fb4526db4185f05bc41
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:15:56 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-0 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-0
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

And the following for message-bus-1:

Name:                 message-bus-1
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-23-12.eu-west-1.compute.internal/10.10.23.12
Start Time:           Mon, 11 Oct 2021 11:12:17 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-5688c54c5d
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-1
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.18.213
IPs:
  IP:           10.10.18.213
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://63c9ae216c47c704fd24c1db647f9e70875ef5f5b9b752ec36c6061924bbc0a9
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:12:34 +0100
      Finished:     Mon, 11 Oct 2021 11:12:34 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d6454c3b3925ef1ca7e464fe262f0af3df04023db1e6e6305e1272852228d16c
    Image:          rabbitmq:3.8.15-management
    Image ID:       docker-pullable://rabbitmq@sha256:65e167c9dbd55b108f4c400c1c6726370b10c551cff63fb4526db4185f05bc41
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:12:35 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-1 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-1
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

And the following for message-bus-2:

Name:                 message-bus-2
Namespace:            default
Priority:             1000000
Priority Class Name:  critical-service
Node:                 ip-10-10-138-73.eu-west-1.compute.internal/10.10.138.73
Start Time:           Mon, 11 Oct 2021 11:49:53 +0100
Labels:               app=rabbitmq
                      controller-revision-hash=message-bus-555c88c76f
                      coralbay.tv/container-type=infrastructure
                      coralbay.tv/scheduling-type=data
                      coralbay.tv/stack=coral-messaging
                      coralbay.tv/system=dev-demo2
                      statefulset.kubernetes.io/pod-name=message-bus-2
Annotations:          kubernetes.io/psp: eks.privileged
                      prometheus.io/port: 15692
                      prometheus.io/scrape: true
Status:               Running
IP:                   10.10.146.39
IPs:
  IP:           10.10.146.39
Controlled By:  StatefulSet/message-bus
Init Containers:
  config-setup:
    Container ID:  docker://865a8128596398ee87a8973c28612b2580f71ded3e06857ce921ba97272bf75d
    Image:         coralbaytv/cloud-tools:0.4
    Image ID:      docker-pullable://coralbaytv/cloud-tools@sha256:329013d11f4ccd9bdbd7f8bd06e46ff672aceffa215d9192fe5dff0f4a6d33e6
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/rabbitmq/init/rabbitmq-init.sh
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 11 Oct 2021 11:50:03 +0100
      Finished:     Mon, 11 Oct 2021 11:50:03 +0100
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /etc/rabbitmq from config (rw)
      /opt/rabbitmq/config from config-volume (rw)
      /opt/rabbitmq/init from init-volume (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Containers:
  rabbitmq:
    Container ID:   docker://d18a48a931ccff4ee4a28f2d81c151df8df0697519ab542e91a0da5ac494511d
    Image:          rabbitmq:3.8.16-management
    Image ID:       docker-pullable://rabbitmq@sha256:09f73d00fc0d9eeb05d8dba8ab6aa5d0265af4d537c71f75229204bca4304dc7
    Ports:          15672/TCP, 5672/TCP, 15692/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP
    State:          Running
      Started:      Mon, 11 Oct 2021 11:50:04 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     1500m
      memory:  2000Mi
    Requests:
      cpu:      100m
      memory:   750Mi
    Readiness:  exec [rabbitmqctl ping] delay=20s timeout=15s period=15s #success=1 #failure=10
    Environment:
      RABBITMQ_VM_MEMORY_HIGH_WATERMARK:  1.0
      RABBITMQ_ERLANG_COOKIE:             <set to the key 'erlang-cookie' of config map 'rabbitmq-config-b8gctg7894'>  Optional: false
      MY_POD_NAME:                        message-bus-2 (v1:metadata.name)
      MY_POD_NAMESPACE:                   default (v1:metadata.namespace)
      RABBITMQ_USE_LONGNAME:              true
      K8S_SERVICE_NAME:                   message-bus
      RABBITMQ_NODENAME:                  rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
      K8S_HOSTNAME_SUFFIX:                .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
    Mounts:
      /etc/rabbitmq from config (rw)
      /var/lib/rabbitmq from rabbitmq-data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from rabbitmq-peer-discovery-token-c6hl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  rabbitmq-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  rabbitmq-data-message-bus-2
    ReadOnly:   false
  config:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config-files-kb9hm5h2h7
    Optional:  false
  init-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-init-files-c2c26hmg2b
    Optional:  false
  rabbitmq-peer-discovery-token-c6hl8:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-peer-discovery-token-c6hl8
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>

Sincerely yours,
Shuaib

Michal Kuratczyk

Oct 11, 2021, 9:15:36 AM
to rabbitm...@googlegroups.com
Hi,

I just performed this upgrade in a cluster deployed with the Operator and it worked just fine (I'm also pretty sure many people have performed this upgrade without issues).
This is probably specific to how you deploy RabbitMQ. In particular, please check whether the Erlang cookie gets regenerated. You have warnings about the deprecation of the environment variable - perhaps there is a bug in the Docker image scripts and the cookie gets regenerated or ignored (if I remember correctly, when it is not set at all, one is generated, which would explain why the new node can't join the cluster).

The way the Operator works, it simply creates a Secret with the Erlang cookie, so no upgrade should affect the value of the cookie.
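
In other words, roughly this (illustrative names, not the Operator's exact manifests):

apiVersion: v1
kind: Secret
metadata:
  name: mycluster-erlang-cookie
type: Opaque
stringData:
  .erlang.cookie: "SOME-STABLE-RANDOM-VALUE"   # generated once, never rewritten by upgrades

Each pod mounts this Secret and copies the file to /var/lib/rabbitmq/.erlang.cookie with mode 600, so an image upgrade never changes the cookie value.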

Best,



--
Michał
RabbitMQ team

Shuaib Hussain

Oct 11, 2021, 11:40:28 AM
to rabbitmq-users
Hi,

I have now tried mounting the .erlang.cookie file and chmod 600-ing it, and the RabbitMQ pod starts up. However, it still does not join the cluster, even though its Erlang cookie matches the other nodes' cookies:

erlang.png
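
For reference, the cookie values can be compared with something like this (container name and cookie path are taken from the kubectl describe output above):

for i in 0 1 2; do
  kubectl exec message-bus-$i -c rabbitmq -- cat /var/lib/rabbitmq/.erlang.cookie
  echo
done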

Also, as you can see here, it doesn't join the cluster:

RMQCluster.png

So what I am doing is creating a ConfigMap that references a file, which I mount onto a volume; the init script then copies it to the /var/lib/rabbitmq/ directory and chmod 600's it (sketched below). However, for some odd reason it still does not connect. Could you elaborate on what you mean by the Operator creating a Secret? How is the Secret wired up to the /var/lib/rabbitmq/.erlang.cookie file?
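
In script form, the init step boils down to something like this (the source file name inside the ConfigMap is a guess; the mount paths are from the pod spec above):

# copy the cookie from the ConfigMap mount into the data dir, then lock it down
cp /opt/rabbitmq/config/erlang.cookie /var/lib/rabbitmq/.erlang.cookie
chmod 600 /var/lib/rabbitmq/.erlang.cookie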

Michal Kuratczyk

Oct 11, 2021, 12:00:24 PM
to rabbitm...@googlegroups.com
Sounds like what you are doing is correct - the Operator does it similarly, with a Secret mounted as a volume and the cookie copied and chmoded.

I don't know what's going on in your environment, but we created the Operator so that people don't need to waste time on such issues. Perhaps you can install the Operator, deploy a cluster, and compare how things are done?
In the future, I'd recommend simply migrating to an Operator-deployed cluster.

Getting started should be trivial if you have a Kubernetes cluster available:
kubectl krew install rabbitmq
kubectl rabbitmq create mycluster --replicas 3 --image rabbitmq:3.8.15-management

With these commands you should have a cluster deployed, so you can see how everything is set up.
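
Then, to inspect the result (the Operator names the pods <cluster-name>-server-N):

kubectl exec mycluster-server-0 -- rabbitmqctl cluster_status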

Best,




--
Michał