I am trying to run a 3-node RabbitMQ cluster on AWS EKS using the Bitnami Helm chart. I set up 3 EC2 instances, one for each RabbitMQ pod, and changed the replicas to 3 in the StatefulSet configuration after installing the Helm chart.
helm install my-release bitnami/rabbitmq --set persistence.existingClaim=efs-claim
(Note: I am using a file system storage type.)
kubectl get pods
NAME READY STATUS RESTARTS AGE
my-release-rabbitmq-0 0/1 Running 0 46s
As you can see, the RabbitMQ pod is in the Running status but is not ready (0/1). Shortly afterwards, when I check the pod's events, I see an error on the readiness probe, which I am trying to resolve.
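For reference, the Running-but-not-Ready state can be picked out of the `kubectl get pods` output with a small filter. This is only a sketch: the function name `not_ready_pods` is made up for illustration, and it assumes the default column layout (NAME, READY, STATUS, ...) shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` output on stdin and prints
# the name and READY column of every pod whose ready count is below its total.
not_ready_pods() {
  awk 'NR > 1 {
         split($2, r, "/")          # READY looks like "0/1"
         if (r[1] + 0 < r[2] + 0) print $1, $2
       }'
}

# Usage against a live cluster (not run here):
#   kubectl get pods | not_ready_pods
```

Fed the output above, this would print `my-release-rabbitmq-0 0/1`.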
kubectl describe pod my-release-rabbitmq-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 57s default-scheduler Successfully assigned default/my-release-rabbitmq-0 to ip-192-.compute.internal
Normal Created 44s kubelet Created container rabbitmq
Normal Started 44s kubelet Started container rabbitmq
Warning Unhealthy 8s kubelet Readiness probe failed: Error: unable to perform an operation on node 'rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'. Please see diagnostics information and suggestions below.
Most common reasons for this are:
* Target node is unreachable (e.g. due to hostname resolution, TCP connection or firewall issues)
* CLI tool fails to authenticate with the server (e.g. due to CLI tool's Erlang cookie not matching that of the server)
* Target node is not running
In addition to the diagnostics info below:
* Consult server logs on node rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
* If target node is configured to use long node names, don't forget to use --longnames with CLI tools
DIAGNOSTICS
===========
attempted to contact: ['rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local']
rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local:
* connected to epmd (port 4369) on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
* epmd reports: node 'rabbit' not running at all
no other nodes on my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
* suggestion: start the node
Current node details:
* node name: 'rabbitmqcli...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local'
* effective user's home directory: /opt/bitnami/rabbitmq/.rabbitmq
* Erlang cookie hash: 4ojJe==
It seems that RabbitMQ did not start, which is why the readiness probe fails. After reading the diagnostics, I suspected the problem could be a mismatch between the node names mentioned in the events, so I opened a shell in my pod using
kubectl exec --stdin --tty my-release-rabbitmq-0 -- /bin/bash
and the NODENAME value in rabbitmq-env.conf was
NODENAME=rab...@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local
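To rule out a node name mismatch, the host part of NODENAME can be compared with the DNS name a StatefulSet pod gets behind a headless service. A minimal sketch follows. The helper names `node_host` and `expected_fqdn` are hypothetical, and the `cluster.local` cluster domain is an assumption taken from the names in the events above.

```shell
#!/bin/sh
# Hypothetical helper: print the host part of a node name (text after "@").
node_host() {
  printf '%s\n' "${1#*@}"
}

# Hypothetical helper: build <pod>.<headless-service>.<namespace>.svc.<domain>.
# The cluster domain defaults to cluster.local (an assumption).
expected_fqdn() {
  printf '%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}

# Example with a placeholder user part: the real user part is elided in the
# output above, so "user" here is only a stand-in.
name="user@my-release-rabbitmq-0.my-release-rabbitmq-headless.default.svc.cluster.local"
want=$(expected_fqdn my-release-rabbitmq-0 my-release-rabbitmq-headless default)

if [ "$(node_host "$name")" = "$want" ]; then
  echo "host part matches"
else
  echo "host part differs"
fi
```

As far as I can tell, the host part in rabbitmq-env.conf has the same form as the one in the events, so the name itself does not look like the mismatch.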
Any suggestions about what could be causing this error would be highly appreciated.