Readiness probe failed: Error


amin testing

Feb 5, 2021, 2:35:05 AM
to rabbitmq-users
I am trying to run a three-node RabbitMQ cluster on AWS EKS using the Bitnami Helm chart. I set up three EC2 instances, one for each RabbitMQ pod, and changed the replicas to 3 in the StatefulSet configuration after installing the Helm chart.

helm install my-release bitnami/rabbitmq --set persistence.existingClaim=efs-claim

(Note: I am using a file system storage type.)
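For reference, pods of a StatefulSet are named <statefulset>-<ordinal>, and when the StatefulSet is governed by a headless Service each pod gets a stable DNS name of the form <pod>.<headless-service>.<namespace>.svc.<cluster-domain>. The POSIX shell sketch below prints the names to expect for three replicas. It assumes the StatefulSet and headless Service names seen in the events below (my-release-rabbitmq, my-release-rabbitmq-headless), the default namespace and the default cluster domain cluster.local; the function name expected_pod_fqdns is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Bitnami chart): print the stable DNS
# name Kubernetes assigns to each StatefulSet pod behind a headless Service.
# Usage: expected_pod_fqdns <statefulset> <headless-service> <namespace> <replicas> [cluster-domain]
expected_pod_fqdns() (
  sts=$1 svc=$2 ns=$3 replicas=$4 domain=${5:-cluster.local}
  i=0
  # Pod ordinals run from 0 to replicas-1.
  while [ "$i" -lt "$replicas" ]; do
    printf '%s-%d.%s.%s.svc.%s\n' "$sts" "$i" "$svc" "$ns" "$domain"
    i=$((i + 1))
  done
)

# Names assumed from this thread; cluster.local is the usual default domain.
expected_pod_fqdns my-release-rabbitmq my-release-rabbitmq-headless default 3
```

Each printed name should resolve from every node, which is what the hostname in the readiness probe error below relies on.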

kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
my-release-rabbitmq-0   0/1     Running   0          46s

As you can see, the RabbitMQ pod is Running but not ready (0/1). Shortly afterwards, the pod events below report a failed readiness probe, which I am trying to resolve.

kubectl describe pod my-release-rabbitmq-0
Events:
  Type     Reason     Age  From               Message
  ----     ------     ---  ----               -------
  Normal   Scheduled  57s  default-scheduler  Successfully assigned default/my-release-rabbitmq-0 to ip-192-.compute.internal
  Normal   Pulling    52s  kubelet            Pulling image "docker.io/bitnami/rabbitmq:3.8.11-debian-10-r0"
  Normal   Pulled     45s  kubelet            Successfully pulled image "docker.io/bitnami/rabbitmq:3.8.11-debian-10-r0"
  Normal   Created    44s  kubelet            Created container rabbitmq
  Normal   Started    44s  kubelet            Started container rabbitmq
  Warning  Unhealthy  8s   kubelet            Readiness probe failed: Error: unable to perform an operation on node 'rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'. Please see diagnostics information and suggestions below.

  Most common reasons for this are:
   * Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
   * CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
   * Target node is not running

  In addition to the diagnostics info below:
   * See the CLI, clustering and networking guides on https://rabbitmq.com/documentation.html to learn more
   * Consult server logs on node rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
   * If target node is configured to use long node names, don't forget to use --longnames with CLI tools

  DIAGNOSTICS
  ===========
  attempted to contact: ['rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local']

  rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local:
   * connected to epmd (port 4369) on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
   * epmd reports: node 'rabbit' not running at all
                   no other nodes on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
   * suggestion: start the node

  Current node details:
   * node name: 'rabbitmqcli...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'
   * effective user's home directory: /opt/bitnami/rabbitmq/.rabbitmq
   * Erlang cookie hash: 4ojJe==
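An Erlang node name has the form name@host. When the host part is a fully qualified domain name (it contains dots, as in the names above), the node uses long names, which is what the --longnames hint refers to. The sketch below splits a node name with POSIX parameter expansion and reports whether it looks like a long name. The function name describe_node_name is hypothetical, the example name rabbit is taken from the epmd line ("node 'rabbit' not running"), and the dot test is a simplification of how Erlang distinguishes long names from short names.

```shell
#!/bin/sh
# Hypothetical helper: split an Erlang node name "name@host" and report
# whether the host part looks fully qualified (i.e. a long name).
describe_node_name() (
  node=$1
  case $node in
    *@*) ;;
    # exit only leaves this subshell, so the function returns status 1.
    *) echo "invalid: no @ in '$node'" >&2; exit 1 ;;
  esac
  name=${node%%@*}   # text before the first @
  host=${node#*@}    # text after the first @
  case $host in
    *.*) kind=long ;;  # dotted host: CLI tools need --longnames
    *)   kind=short ;;
  esac
  printf 'name=%s host=%s kind=%s\n' "$name" "$host" "$kind"
)

# Node name assumed from the epmd report plus the pod's DNS name.
describe_node_name rabbit@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
```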

It seems that RabbitMQ did not start, which is why the readiness probe fails. After reading the diagnostics, I suspected the problem could be the difference between the node names mentioned in the events, so I entered my pod using
kubectl exec --stdin --tty my-release-rabbitmq-0 -- /bin/bash
and found that the NODENAME value in rabbitmq-env.conf was:
NODENAME=rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
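The check above can also be scripted. This is a minimal sketch, assuming rabbitmq-env.conf contains plain shell-style NODENAME=value lines; the function name read_nodename is hypothetical, and because the file's exact location inside the Bitnami image is not given in this thread, the path is passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: print the NODENAME value set in a rabbitmq-env.conf file.
# Usage: read_nodename <path-to-rabbitmq-env.conf>
read_nodename() (
  conf=$1
  # Take the text after "NODENAME=", keep the last assignment (a later line
  # overrides an earlier one), then strip surrounding double quotes.
  sed -n 's/^[[:space:]]*NODENAME=//p' "$conf" | tail -n 1 | sed -e 's/^"//' -e 's/"$//'
)
```

Lines that are commented out (starting with #) are ignored by the sed pattern.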

Any suggestions about what my error could be linked to would be highly appreciated.





Wesley Peng

Feb 5, 2021, 2:40:21 AM
to RabbitMQ Users
All nodes' hostnames should be resolvable and unique, and the nodes must be able to reach each other over the network.
Please check this.

amin testing

Feb 5, 2021, 3:04:19 AM
to rabbitmq-users
My apologies, the node name values are not showing because the group treats them as email addresses.

attempted to contact: ['rabbit @ ....

Current node details:
node name: 'rabbitmqcli-255-rabbit @ ....


kubectl exec --stdin --tty my-release-rabbitmq-0 -- /bin/bash 
and my NODENAME value in rabbitmq-env.conf was:
NODENAME=rabbit @ ...

Please note that only one pod is running; the other pods have not started yet. Regarding the network connection, I am 100% sure the nodes are reachable, as other pods are running in my cluster as well. If possible, could you point me to any sources on how to change the node name? As you can see above, the two names are different. Do I need to change rabbitmq-env.conf in the pod itself to match the current node details?

Output of cat /etc/hosts on the EC2 instance node:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 
::1 localhost6 localhost6.localdomain6 
{IP} my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local

amin testing

Feb 5, 2021, 4:00:21 AM
to rabbitmq-users
While reading about how to change a RabbitMQ node name (see: rabbitmqnodename), I found that it clearly states that editing the configuration file rabbitmq-env.conf and changing NODENAME would resolve this issue. What is strange is that the NODENAME value already is rabbit @ ....
However, as mentioned above, the events in the kubectl describe pod output show the node name as rabbitmqcli-255-rabbit @ ....
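The two names are not expected to be identical: RabbitMQ CLI tools (such as rabbitmq-diagnostics, which produced the output above) connect from their own short-lived Erlang node, and the "Current node details" section describes that CLI node, which is why its name starts with rabbitmqcli-. What has to agree is the host the CLI targets and the host the server node runs on. The sketch below compares the host parts of two node names; the function name same_host is hypothetical, and the full names are assumed by combining the node name rabbit from the epmd report, the rabbitmqcli-255- prefix quoted above and the pod's DNS name.

```shell
#!/bin/sh
# Hypothetical helper: print the name and host parts of two Erlang node
# names ("name@host") and succeed only if the host parts are identical.
same_host() (
  a=$1 b=$2
  printf 'first:  name=%s host=%s\n' "${a%%@*}" "${a#*@}"
  printf 'second: name=%s host=%s\n' "${b%%@*}" "${b#*@}"
  # The function's exit status is the result of this comparison.
  [ "${a#*@}" = "${b#*@}" ]
)

host=my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
if same_host "rabbit@$host" "rabbitmqcli-255-rabbit@$host"; then
  echo "same host: only the node name parts differ"
fi
```

If the hosts match, the epmd line ("node 'rabbit' not running at all") points at the server node not having started rather than at a node-name mismatch.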
