Kubernetes Peer discovery fails


Anirudh Dasu

Sep 10, 2018, 1:10:49 AM
to rabbitmq-users
Problem

I set up a RabbitMQ (3.7.7) cluster on Kubernetes on AWS using the official peer discovery plugin. Unfortunately, peer discovery doesn't seem to be working for me.

The only changes I made to the plugin's example were adding persistent storage and using hostnames instead of IPs for discovery. My modified files can be found here

Relevant portion of the logs, showing clustering failing:

  •  k8s endpoint listing returned nodes not yet ready: rabbitmq-1
  •  All discovered existing cluster peers: rab...@rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local
  •  Peer nodes we can cluster with: rab...@rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local
  •  Could not auto-cluster with node rab...@rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local: {badrpc,nodedown}
  •  Could not successfully contact any node of: rab...@rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local (as in Erlang distribution). Starting as a blank standalone node.

Attempted Solutions - 

I have gone through this thread and attempted all the solutions discussed there including 
  • Setting RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS env variable
  • Prepending a . to cluster_formation.k8s.hostname_suffix config
  • Setting cluster_formation.node_cleanup.only_log_warning = true
All attempted solutions are in the .yml files in the GitHub repo linked above. I hope this information is enough to figure out what the issue is. Any help would be really appreciated, as I have run out of things to try.
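
For readers following along, the settings named in the list above would sit in rabbitmq.conf roughly as sketched below. This is not the content of the linked repo. The ConfigMap name is hypothetical, and the namespace (test-rabbitmq) and service name (rabbitmq) are inferred from the node names in the log excerpt, so treat them as assumptions.

```yaml
# Sketch only: a ConfigMap carrying rabbitmq.conf for hostname-based discovery.
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config        # hypothetical name
  namespace: test-rabbitmq     # inferred from the logged node names
data:
  rabbitmq.conf: |
    cluster_formation.peer_discovery_backend = rabbit_peer_discovery_k8s
    cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
    # Discover peers by hostname rather than by pod IP.
    cluster_formation.k8s.address_type = hostname
    # Assumed to match the Service that fronts the RabbitMQ pods.
    cluster_formation.k8s.service_name = rabbitmq
    # Leading dot: the suffix is appended to the pod hostname (rabbitmq-0, ...).
    cluster_formation.k8s.hostname_suffix = .rabbitmq.test-rabbitmq.svc.cluster.local
    # Only log a warning about unknown nodes instead of removing them.
    cluster_formation.node_cleanup.only_log_warning = true
```

With these values, the node on pod rabbitmq-0 would be addressed as rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local, which matches the peer name in the log lines above. That name only helps if it actually resolves inside the cluster.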

Regards,
Anirudh Dasu.

Luke Bakken

Sep 10, 2018, 3:54:34 PM
to rabbitmq-users
Hello,

Thanks for providing all of that information. Rather than posting what you think is relevant from the logs, it helps us if the entire log contents are made available.

{badrpc,nodedown} is not a cookie issue, it means that the node isn't running, which might be logged ... but we don't have logs.

Luke

Anirudh Dasu

Sep 11, 2018, 12:44:33 AM
to rabbitmq-users
Hi,

Thanks for the reply. Please find attached the logs from both pods. rabbitmq-0 is the pod that boots up first.


Anirudh.
logs-from-rabbitmq-k8s-in-rabbitmq-0.txt
logs-from-rabbitmq-k8s-in-rabbitmq-1.txt

Michael Klishin

Sep 11, 2018, 8:37:12 AM
to rabbitm...@googlegroups.com
If you use hostnames you have to make sure that they resolve. It can be one of the
reasons for a "nodedown" (which simply means "the node is not reachable") [1].

According to the logs, both nodes only discovered themselves and started
as blank standalone nodes. They have to be reset, since they won't be performing
peer discovery after that.


--
You received this message because you are subscribed to the Google Groups "rabbitmq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rabbitmq-users+unsubscribe@googlegroups.com.
To post to this group, send email to rabbitmq-users@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
MK

Staff Software Engineer, Pivotal/RabbitMQ

Anirudh Dasu

Sep 14, 2018, 7:46:44 AM
to rabbitmq-users
Thank you for pointing me in the right direction. It turns out that when using hostnames for discovery in Kubernetes, we need two services: one headless service for internal DNS resolution and one NodePort service that exposes the node ports for external access. I wish this was mentioned somewhere in the plugin's docs, as I think using hostnames for peer discovery will be a common use case. I pushed my updated, working code to the same repo.
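
A minimal sketch of the two-service layout described above, for readers who want to see its shape. It is not copied from the repo: the external service name, the app: rabbitmq pod label, and the nodePort numbers are assumptions, while the headless service name and namespace are inferred from the node names logged earlier in the thread.

```yaml
# Headless Service: no cluster IP, so DNS returns per-pod records such as
# rabbitmq-0.rabbitmq.test-rabbitmq.svc.cluster.local
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq
  namespace: test-rabbitmq
spec:
  clusterIP: None
  selector:
    app: rabbitmq            # assumed pod label
  ports:
    - name: epmd             # Erlang port mapper, used for node lookup
      port: 4369
    - name: amqp
      port: 5672
    - name: dist             # inter-node (Erlang distribution) traffic
      port: 25672
---
# NodePort Service: exposes AMQP and the management UI outside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: rabbitmq-external    # hypothetical name
  namespace: test-rabbitmq
spec:
  type: NodePort
  selector:
    app: rabbitmq            # assumed pod label
  ports:
    - name: amqp
      port: 5672
      targetPort: 5672
      nodePort: 30672        # assumed; must fall in the cluster's NodePort range
    - name: http
      port: 15672
      targetPort: 15672
      nodePort: 31672        # assumed
```

For the per-pod DNS records to exist, the StatefulSet's spec.serviceName has to name the headless service (rabbitmq in this sketch).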

Michael Klishin

Sep 14, 2018, 11:45:30 AM
to rabbitm...@googlegroups.com
The docs are open source [1] so you can contribute.

Or you can explain what you wish was covered and I will extend the guide.
I'm not a Kubernetes expert, so some pointers at Kubernetes docs and slightly more elaborate explanation would be appreciated.


