Readiness probe failed: Error

amin testing

Feb 5, 2021, 2:35:05 AM

to rabbitmq-users

I am trying to run a 3-node RabbitMQ cluster on AWS EKS using the Bitnami Helm chart. I set up 3 EC2 instances, one for each RabbitMQ pod, and changed the replicas to 3 in the StatefulSet configuration file after installing the Helm chart.

helm install my-release bitnami/rabbitmq --set persistence.existingClaim=efs-claim

(Note: I am using the file system storage type.)

kubectl get pods

NAME                    READY   STATUS    RESTARTS   AGE
my-release-rabbitmq-0   0/1     Running   0          46s

As you can see, the rabbitmq pod is in the Running status, but shortly afterwards the pod events show a readiness probe error, which I am trying to resolve.

kubectl describe pod my-release-rabbitmq-0

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  57s   default-scheduler  Successfully assigned default/my-release-rabbitmq-0 to ip-192-.compute.internal
  Normal   Pulling    52s   kubelet            Pulling image "docker.io/bitnami/rabbitmq:3.8.11-debian-10-r0"
  Normal   Pulled     45s   kubelet            Successfully pulled image "docker.io/bitnami/rabbitmq:3.8.11-debian-10-r0"
  Normal   Created    44s   kubelet            Created container rabbitmq
  Normal   Started    44s   kubelet            Started container rabbitmq
  Warning  Unhealthy  8s    kubelet            Readiness probe failed: Error: unable to perform an operation on node 'rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'. Please see diagnostics information and suggestions below.

Most common reasons for this are:

* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)

* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)

* Target node is not running

In addition to the diagnostics info below:

* See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more

* Consult server logs on node rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

* If target node is configured to use long node names, don't forget to use --longnames with CLI tools

DIAGNOSTICS

===========

attempted to contact: ['rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local']

rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local:

* connected to epmd (port 4369) on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

* epmd reports: node 'rabbit' not running at all
                no other nodes on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

* suggestion: start the node

Current node details:

* node name: 'rabbitmqcli...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'

* effective user's home directory: /opt/bitnami/rabbitmq/.rabbitmq

* Erlang cookie hash: 4ojJe==
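The diagnostics above can be read mechanically. The sketch below is only an illustration: `classify_diagnostics` is a hypothetical helper, not a RabbitMQ tool, and the two strings it looks for are copied from the output quoted in this post. The reasoning it encodes is that a line saying the CLI connected to epmd means the hostname resolved and port 4369 was reachable, so a following "not running at all" line points at the node process itself rather than at DNS or a firewall.

```shell
#!/bin/sh
# classify_diagnostics: hypothetical helper that labels a block of
# rabbitmq CLI diagnostics text. It only pattern-matches two phrases
# taken from the output quoted in this thread.
classify_diagnostics() {
  if ! printf '%s\n' "$1" | grep -q 'connected to epmd'; then
    # The CLI never reached epmd: suspect name resolution or networking.
    echo "epmd-unreachable"
  elif printf '%s\n' "$1" | grep -q 'not running at all'; then
    # epmd answered, but no node with that name is registered with it.
    echo "node-not-running"
  else
    echo "other"
  fi
}

# Two lines copied from the diagnostics in this post.
sample="* connected to epmd (port 4369) on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
* epmd reports: node 'rabbit' not running at all"

classify_diagnostics "$sample"   # prints node-not-running
```

With that reading, the next place to look would be the server's own startup output, for example via `kubectl logs my-release-rabbitmq-0`.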

It seems that rabbitmq did not start, which is why the readiness probe fails. After reading the diagnostics, I figured the problem could be due to the difference between the node names mentioned in the events, so I entered my pod using

kubectl exec --stdin --tty my-release-rabbitmq-0 -- /bin/bash

and my nodename value in rabbitmq-env.conf was

NODENAME=rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

Any suggestions about what my error could be linked to would be highly appreciated.

Wesley Peng

Feb 5, 2021, 2:40:21 AM

to RabbitMQ Users

All the nodes' hostnames should be resolvable and unique.

The nodes should be able to reach each other over the network.

Please check these points.


amin testing

Feb 5, 2021, 3:04:19 AM

to rabbitmq-users

My apologies, the node name values are not showing because the group treats them as email addresses.

attempted to contact: ['rabbit @ ....

Current node details:

node name: 'rabbitmqcli-255-rabbit @ ....

kubectl exec --stdin --tty my-release-rabbitmq-0 -- /bin/bash

my nodename value in rabbitmq-env.conf was

NODENAME=rabbit @ ...

Please note that only 1 pod is running and the other pods have not started yet. Regarding the network connection, I am 100% sure the nodes are reachable, as there are other pods running in my cluster as well. If possible, kindly point me to any sources on how to change the node name, since, as you can see above, the two names are different. Do I need to change rabbitmq-env.conf in the pod itself to match the current node details?

Output of cat /etc/hosts on the EC2 instance node:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4

::1 localhost6 localhost6.localdomain6

{IP} my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

amin testing

Feb 5, 2021, 4:00:21 AM

to rabbitmq-users

I was reading about how to change a rabbitmq node name here: rabbitmqnodename

It clearly states that editing the configuration file rabbitmq-env.conf and changing the node name would resolve this issue, but what is absurd is that the nodename value is in fact already rabbit @ ....

But, as mentioned above, in the events shown by describe pod, my rabbitmq cluster reports the node name as rabbitmqcli-255-rabbit @ ....
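For what it's worth, the two names compared here differ only in the part before the @. RabbitMQ CLI tools start as a short-lived Erlang node of their own, so the rabbitmqcli-255-rabbit name listed under "Current node details" is the CLI's name, not the server's, and the two are not expected to match. A minimal sketch of that comparison follows. The helper function names are hypothetical, and the host is the one from the diagnostics output quoted earlier in this thread.

```shell
#!/bin/sh
# Split an Erlang node name of the form name@host into its two parts.
# All helper names here are hypothetical, introduced only for illustration.
node_name_part() { printf '%s\n' "${1%%@*}"; }
node_host_part() { printf '%s\n' "${1#*@}"; }

# True when two node names share the same host part.
same_host() {
  [ "$(node_host_part "$1")" = "$(node_host_part "$2")" ]
}

# Host taken from the diagnostics output quoted in this thread.
host="my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local"
server_node="rabbit@${host}"                # NODENAME from rabbitmq-env.conf
cli_node="rabbitmqcli-255-rabbit@${host}"   # temporary name used by the CLI tool

if same_host "$server_node" "$cli_node"; then
  echo "same host; only the name part differs:"
  echo "  server: $(node_name_part "$server_node")"
  echo "  cli:    $(node_name_part "$cli_node")"
fi
```

If that reading is right, editing NODENAME to match the CLI's name would not help; the line to focus on is the epmd report saying node 'rabbit' is not running.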
