join_cluster error - Error: unable to connect to nodes: nodedown despite matching erlang cookie key

Max Rebuschatis

Nov 2, 2015, 10:03:28 PM
to rabbitmq-users
I am trying to get a cluster going between a server in the cloud and a local computer.
The Erlang cookies are the same.
The Erlang versions are exactly the same (17.6.3).
The RabbitMQ server versions are exactly the same (3.5.5).

The error output is not helpful. It says:

lincolnfrog-macbookpro:scripts lincolnfrog$ sudo ./rabbitmqctl join_cluster <server node name here>
Clustering node 'rabbit@lincolnfrog-macbookpro' with '<server node name here>' ...
Error: unable to connect to nodes ['<server node name here>']: nodedown

DIAGNOSTICS
===========

attempted to contact: ['<server node name here>']

<node name here>:
  * connected to epmd (port 4369) on <server name here>
  * epmd reports node 'rabbitmq' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?
  * suggestion: is the Erlang distribution using TLS?

current node details:
- node name: 'rabbitmq-cli-29511@lincolnfrog-macbookpro'
- home dir: /var/root
- cookie hash: bHfp6OfD8JUupTUV2kDNXg==


I have nothing to go on here and no idea what to do to fix this. I have spent two days trying to get the same erlang version to be running on both of these machines as there appear to be serious and fatal mismatches between the erlang versions bundled with various versions of rabbitmq on different platforms. At this point, I am building both rabbitmq and erlang from source on both machines and am using the exact same source on each.

This error message is unhelpful in the extreme. Can anyone help me?

Thanks,
-Max

Michael Klishin

Nov 2, 2015, 10:12:49 PM
to rabbitm...@googlegroups.com, Max Rebuschatis
 On 3 November 2015 at 06:03:32, Max Rebuschatis (linco...@google.com) wrote:
> I
> have spent two days trying to get the same erlang version to be
> running on both of these machines as there appear to be serious
> and fatal mismatches between the erlang versions bundled with
> various versions of rabbitmq on different platforms. At this
> point, I am building both rabbitmq and erlang from source on both
> machines and am using the exact same source on each.

There is only one RabbitMQ distribution that bundles Erlang: the standalone OS X build.

There is *absolutely no* need to build Erlang or RabbitMQ from source, so please reconsider.
--
MK

Staff Software Engineer, Pivotal/RabbitMQ


Michael Klishin

Nov 2, 2015, 10:13:50 PM
to rabbitm...@googlegroups.com, Max Rebuschatis
 On 3 November 2015 at 06:03:32, Max Rebuschatis (linco...@google.com) wrote:
> * TCP connection succeeded but Erlang distribution failed
> * suggestion: hostname mismatch?
> * suggestion: is the cookie set correctly?
> * suggestion: is the Erlang distribution using TLS?

The above gives you a clue: you run rabbitmqctl using sudo, which changes its effective OS user,
and makes it look for the Erlang cookie in the wrong place (root's $HOME vs. your user's $HOME).
Therefore rabbitmqctl cannot authenticate to the running node — they have different cookies (shared
secrets).
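
To illustrate the point about `sudo`, here is a minimal shell sketch that checks whether two cookie files have identical contents. The function name `same_cookie` is hypothetical, and which two files matter depends on which OS user runs the node and which runs `rabbitmqctl`; that part is an assumption you would need to verify on your own machines.

```shell
# same_cookie FILE_A FILE_B  (hypothetical helper, not part of RabbitMQ)
# Succeeds (exit status 0) only if both files are readable and
# byte-for-byte identical. A cookie file that the current user cannot
# read counts as a mismatch.
same_cookie() {
  [ -r "$1" ] && [ -r "$2" ] && cmp -s "$1" "$2"
}
```

For example, from a root shell on the OS X machine in this thread, `same_cookie /Users/lincolnfrog/.erlang.cookie /var/root/.erlang.cookie && echo match` would compare the user's cookie with root's. Both home directories are taken from the `home dir` lines in the diagnostics posted in this thread.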

Max Rebuschatis

Nov 2, 2015, 10:53:02 PM
to rabbitmq-users, linco...@google.com
1) I have to build on OSX and Linux from source because there are no distributions that have matching erlang versions - I have tried many, many combinations and haven't managed to download something that matches.

2) I have edited the .erlang.cookie file for the sudo user as well - when I do my join_cluster command, it prints the same erlang cookie hash on both systems, so I am POSITIVE that they match.

Michael Klishin

Nov 2, 2015, 11:04:23 PM
to rabbitm...@googlegroups.com, Max Rebuschatis
On 3 November 2015 at 06:53:06, Max Rebuschatis (linco...@google.com) wrote:
> 1) I have to build on OSX and Linux from source because there are
> no distributions that have matching erlang versions - I have
> tried many, many combinations and haven't managed to download
> something that matches.

Can you be more specific? There are 2 distributions for OS X:

 * Standalone build which includes Erlang/OTP (17.x): I'm not sure what kind of
   mismatch there can be

 * Generic UNIX binary build that requires Erlang R13B03 or, for TLS, R16B03.
   Homebrew can install Erlang 18.1, which the generic UNIX release of RabbitMQ 3.5.6 can run on.

What mismatch are we talking about *exactly*?

> 2) I have edited the .erlang.cookie file for the sudo user as well
> - when I do my join_cluster command, it prints the same erlang
> cookie hash on both systems, so I am POSITIVE that they match.

Sorry, the error message is unambiguous.

 There is no need to run `rabbitmqctl` via sudo on OS X
(or development environments in general).

Michael Klishin

Nov 2, 2015, 11:06:22 PM
to rabbitm...@googlegroups.com, Max Rebuschatis
 On 3 November 2015 at 07:04:18, Michael Klishin (mkli...@pivotal.io) wrote:
> There are 2 distributions for OS X:
>
> * Standalone build which includes Erlang/OTP (17.x): I'm not
> sure what kind of
> mismatch there can be
>
> * Generic UNIX binary build that requires Erlang R13B03 or R16B03
> for TLS.
> Homebrew can install Erlang 18.1, which generic UNIX release
> of RabbitMQ 3.5.6 can run on.

Both are available from http://www.rabbitmq.com/download.html;
the former is also available via Homebrew.

Max Rebuschatis

Nov 3, 2015, 12:22:11 AM
to rabbitmq-users, linco...@google.com
Thanks for your help.
I installed the generic unix 3.5.6 on each machine (Just to be clear, I am trying to create a cluster between an OSX machine and a linux server) and have matching erlang on each as well.

Here is the output of the join_cluster command (note that "myles" is an entry in my /etc/hosts that points to the server IP address):

lincolnfrog-macbookpro:rabbitmq_server-3.5.6 lincolnfrog$ sbin/rabbitmqctl join_cluster rabbit@myles
Clustering node 'rab...@lincolnfrog-macbookpro.roam.corp.google.com' with rabbit@myles ...
Error: unable to connect to nodes [rabbit@myles]: nodedown

DIAGNOSTICS
===========

attempted to contact: [rabbit@myles]

rabbit@myles:
  * connected to epmd (port 4369) on myles
  * epmd reports node 'rabbit' running on port 25672
  * TCP connection succeeded but Erlang distribution failed
  * suggestion: hostname mismatch?
  * suggestion: is the cookie set correctly?
  * suggestion: is the Erlang distribution using TLS?

current node details:
- node name: 'rabbitmq-...@lincolnfrog-macbookpro.roam.corp.google.com'
- home dir: /Users/lincolnfrog
- cookie hash: bHfp6OfD8JUupTUV2kDNXg==


After this, I ran stop_app on the server and ran join_cluster a@b just to generate the cookie hash output so you can see that it is the same:

lincolnfrog@rabbitmq-1:~/stack/rabbitmq_server-3.5.6$ sudo sbin/rabbitmqctl join_cluster a@b
Clustering node 'rabbit@rabbitmq-1' with a@b ...
Error: unable to connect to nodes [a@b]: nodedown

DIAGNOSTICS
===========

attempted to contact: [a@b]

a@b:
  * unable to connect to epmd (port 4369) on b: nxdomain (non-existing domain)

current node details:
- node name: 'rabbitmq-cli-22796@rabbitmq-1'
- home dir: /root
- cookie hash: bHfp6OfD8JUupTUV2kDNXg==


Here is the status output on both machines so you can see all the details:

### OSX ###

lincolnfrog-macbookpro:rabbitmq_server-3.5.6 lincolnfrog$ sbin/rabbitmqctl status
Status of node 'rab...@lincolnfrog-macbookpro.roam.corp.google.com' ...
[{pid,34765},
 {running_applications,[{rabbit,"RabbitMQ","3.5.6"},
                        {mnesia,"MNESIA  CXC 138 12","4.12.4"},
                        {os_mon,"CPO  CXC 138 46","2.3"},
                        {xmerl,"XML parser","1.3.7"},
                        {sasl,"SASL  CXC 138 11","2.4.1"},
                        {stdlib,"ERTS  CXC 138 10","2.3"},
                        {kernel,"ERTS  CXC 138 10","3.1"}]},
 {os,{unix,darwin}},
 {erlang_version,"Erlang/OTP 17 [erts-6.3] [source-f9282c6] [64-bit] [smp:8:8] [async-threads:64] [hipe] [kernel-poll:true]\n"},
 {memory,[{total,38014488},
          {connection_readers,0},
          {connection_writers,0},
          {connection_channels,0},
          {connection_other,2808},
          {queue_procs,2808},
          {queue_slave_procs,0},
          {plugins,0},
          {other_proc,14188992},
          {mnesia,62208},
          {mgmt_db,0},
          {msg_index,47480},
          {other_ets,826208},
          {binary,16488},
          {code,16685819},
          {atom,654217},
          {other_system,5527460}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,4744903065},
 {disk_free_limit,50000000},
 {disk_free,268190482432},
 {file_descriptors,[{total_limit,156},
                    {total_used,3},
                    {sockets_limit,138},
                    {sockets_used,1}]},
 {processes,[{limit,1048576},{used,130}]},
 {run_queue,0},
 {uptime,6}]


### Server ###

lincolnfrog@rabbitmq-1:~/stack/rabbitmq_server-3.5.6$ sudo sbin/rabbitmqctl status
Status of node 'rabbit@rabbitmq-1' ...
[{pid,21713},
 {running_applications,[{rabbit,"RabbitMQ","3.5.6"},
                        {os_mon,"CPO  CXC 138 46","2.3"},
                        {xmerl,"XML parser","1.3.7"},
                        {mnesia,"MNESIA  CXC 138 12","4.12.4"},
                        {sasl,"SASL  CXC 138 11","2.4.1"},
                        {stdlib,"ERTS  CXC 138 10","2.3"},
                        {kernel,"ERTS  CXC 138 10","3.1"}]},
 {os,{unix,linux}},
 {erlang_version,"Erlang/OTP 17 [erts-6.3] [source] [64-bit] [async-threads:64] [hipe] [kernel-poll:true]\n"},
 {memory,[{total,35969688},
          {connection_readers,0},
          {connection_writers,0},
          {connection_channels,0},
          {connection_other,2728},
          {queue_procs,2728},
          {queue_slave_procs,0},
          {plugins,0},
          {other_proc,13685816},
          {mnesia,58928},
          {mgmt_db,0},
          {msg_index,39048},
          {other_ets,798008},
          {binary,14056},
          {code,16695484},
          {atom,654217},
          {other_system,4018675}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,249239961},
 {disk_free_limit,50000000},
 {disk_free,8063016960},
 {file_descriptors,[{total_limit,924},
                    {total_used,3},
                    {sockets_limit,829},
                    {sockets_used,1}]},
 {processes,[{limit,1048576},{used,123}]},
 {run_queue,0},
 {uptime,27}]


-Max

Michael Klishin

Nov 3, 2015, 12:34:01 AM
to rabbitm...@googlegroups.com, Max Rebuschatis
On 3 November 2015 at 08:22:13, Max Rebuschatis (linco...@google.com) wrote:
> I installed the generic unix 3.5.6 on each machine (Just to be
> clear, I am trying to create a cluster between an OSX machine and
> a linux server) and have matching erlang on each as well.
>
> Here is the output of the join_cluster command (note that "myles"
> is an entry in my /etc/hosts that points to the server IP address):
>
> lincolnfrog-macbookpro:rabbitmq_server-3.5.6 lincolnfrog$
> sbin/rabbitmqctl join_cluster rabbit@myles
>
>
> Clustering node 'rabbit@…
> with rabbit@myles ...
>
> Error: unable to connect to nodes [rabbit@myles]: nodedown

According to `rabbitmqctl status`, your Linux node believes it is named
"rabbitmq-1" and not "myles".

RabbitMQ (or rather, the underlying runtime) requires that hostnames resolve
the same way on every node, using DNS, /etc/hosts or a special /etc/hosts-like file [1].
In addition, all nodes should use either short host names
or FQDNs (RABBITMQ_USE_LONGNAME should be exported to use FQDNs).

What do `hostname` and `hostname -f` return on both OS X and Linux? Can you try
changing the hostname of the Linux node to `myles` (this will involve making sure it can resolve
itself as `myles`, e.g. via /etc/hosts) and try again?

1. http://www.erlang.org/doc/apps/erts/inet_cfg.html
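
The naming rule above can be checked mechanically. The following is a minimal shell sketch, with hypothetical helper names `node_host` and `name_kind`, that extracts the host part of a node name and reports whether it looks like a short name or an FQDN. Treating "contains a dot" as "long" is a simplifying assumption made for illustration.

```shell
# node_host NODE  (hypothetical helper)
# Prints everything after the first "@", e.g. "myles" for "rabbit@myles".
# If NODE contains no "@", the input is printed unchanged.
node_host() {
  printf '%s\n' "${1#*@}"
}

# name_kind NODE  (hypothetical helper)
# Prints "long" if the host part contains a dot, otherwise "short".
name_kind() {
  case "$(node_host "$1")" in
    *.*) echo long ;;
    *)   echo short ;;
  esac
}
```

Comparing `node_host rabbit@myles` with the output of `hostname` on the target machine shows the kind of mismatch discussed in this thread: the command addresses `myles`, while the Linux node reports itself as `rabbit@rabbitmq-1`. Running `name_kind` on every node name in the cluster should print the same word each time.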

Michael Klishin

Nov 3, 2015, 12:37:02 AM
to rabbitm...@googlegroups.com, Max Rebuschatis
On 3 November 2015 at 08:22:13, Max Rebuschatis (linco...@google.com) wrote:
> RabbitMQ (or rather, the underlying runtime) requires that
> hostnames resolve
> the same way on every node, using DNS, /etc/hosts or a special
> /etc/hosts-like file [1].
> In addition, all nodes should use either short host names
> or FQDNs (RABBITMQ_USE_LONGNAME should be exported to use FQDNs).

I've filed an issue to make this clearer in the docs:
https://github.com/rabbitmq/rabbitmq-website/issues/104

Suggestions on which guide(s) besides Clustering it should go into are welcome.

Max Rebuschatis

Nov 3, 2015, 1:15:31 AM
to rabbitmq-users, linco...@google.com
On server:

lincolnfrog@rabbitmq-1:~/stack/rabbitmq_server-3.5.6$ hostname
rabbitmq-1
lincolnfrog@rabbitmq-1:~/stack/rabbitmq_server-3.5.6$ hostname -f
rabbitmq-1.c.experience-centers-1084.google.com.internal

On local:

lincolnfrog-macbookpro:rabbitmq_server-3.5.6 lincolnfrog$ hostname
lincolnfrog-macbookpro.roam.corp.google.com
lincolnfrog-macbookpro:rabbitmq_server-3.5.6 lincolnfrog$ hostname -f
lincolnfrog-macbookpro.roam.corp.google.com


How do I change what the second half of the @ is set to?

Michael Klishin

Nov 3, 2015, 1:31:11 AM
to rabbitm...@googlegroups.com, Max Rebuschatis
On 3 November 2015 at 09:15:35, Max Rebuschatis (linco...@google.com) wrote:
> How do I change what the second half of the @ is set to?

That "second half" is the hostname, as returned by gethostname(2).
So you need to change the hostname and
make sure it resolves on the node itself and on the other nodes.

If you plan on using FQDNs, configure [1] RABBITMQ_USE_LONGNAME to be true.

Node name can be set using RABBITMQ_NODENAME but it does not eliminate
the resolution requirements and is more helpful when running multiple nodes
on the same host, thus needing to change node qualifier (the "first half of the @").

See "Issues with hostname" in [2].

1. http://www.rabbitmq.com/configure.html#define-environment-variables
2. http://www.rabbitmq.com/ec2.html 
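
As a hedged illustration of the environment variables mentioned above, the sketch below exports `RABBITMQ_USE_LONGNAME` in the shell that will later start the node and run `rabbitmqctl`. The commented-out `RABBITMQ_NODENAME` value is hypothetical and only there to show where the "first half" would be changed.

```shell
# Ask RabbitMQ to use fully qualified node names (rabbit@<FQDN>)
# instead of short ones. Export it before starting the node and
# before running rabbitmqctl in the same shell.
export RABBITMQ_USE_LONGNAME=true

# Optional and hypothetical: change the part before the "@".
# The host part must still resolve on every node.
# export RABBITMQ_NODENAME="rabbit2@$(hostname -f)"
```

Setting these in the environment does not remove the resolution requirement described in this message; it only changes which form of the name the nodes use for each other.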

Max Rebuschatis

Nov 3, 2015, 1:46:56 AM
to rabbitmq-users, linco...@google.com
I changed it to use rabbit@rabbitmq-1 on my computer and it still doesn't work. I also added an entry to /etc/hosts for my machine on the server.

I noticed it says: 
current node details:

- node name: 'rabbitmq-...@lincolnfrog-macbookpro.roam.corp.google.com'


But in other places it thinks my node is rab...@lincolnfrog-macbookpro.roam.corp.google.com. Why the difference?

If you are a developer, I recommend that you change this error to actually print out the real problem instead of saying it could be one of many things and not really giving any useful information.

Thanks,
-Max

Michael Klishin

Nov 3, 2015, 2:01:24 AM
to rabbitm...@googlegroups.com, Max Rebuschatis
On 3 November 2015 at 09:46:59, Max Rebuschatis (linco...@google.com) wrote:
> I changed it to use rabbit@rabbitmq-1 on my computer and it still
> doesn't work. I also added an entry to /etc/hosts for my machine
> on the server.
>
> I noticed it says:
> current node details:
>
> - node name: 'rabbitmq-...@lincolnfrog-macbookpro.roam.corp.google.com'
>
>
> But in other places it thinks my node is rab...@lincolnfrog-macbookpro.roam.corp.google.com.
> Why the difference?

Max,

According to your earlier response, your machine's hostname is lincolnfrog-macbookpro.roam.corp.google.com
and the server is rabbitmq-1.c.experience-centers-1084.google.com.internal (FQDN) or rabbitmq-1 (short name).

What you need to do is not change your machine's name to rabbitmq-1 (unless I misunderstand what
"it" refers to) but, most likely, use FQDNs and make sure each hostname resolves to the same IP on
both nodes.

I'm sorry, but this is getting into network configuration territory and has little to do with RabbitMQ.
We don't know your network settings or whether a VPN is involved (it can alter hostname resolution),
so we cannot really help. For example, "roam.corp" in your machine's domain name suggests
that may be the case.

We have a doc guide that explains manual clustering with short node names (rabbit1, rabbit2, rabbit3):
http://www.rabbitmq.com/clustering.html
We also have a guide that is specific to an environment that typically uses FQDNs and DNS:
http://www.rabbitmq.com/ec2.html
I've also explained hostname resolution expectations earlier in this thread.

I'd recommend getting to a point where `rabbitmqctl -n rabbit@[remote hostname] status`
works: this means that hostname resolution works and Erlang cookies match. You can find
out what the -n argument is for in the docs:
https://www.rabbitmq.com/man/rabbitmqctl.1.man.html
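
The check recommended above can be wrapped in a small helper so it is easy to repeat from every machine. This is a minimal sketch: the function name `can_reach_node` and the `RABBITMQCTL` override variable are hypothetical conveniences, and only the `-n <node> status` invocation comes from this thread.

```shell
# Path to rabbitmqctl. Override it for a generic UNIX build,
# e.g. RABBITMQCTL=sbin/rabbitmqctl (hypothetical variable name).
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# can_reach_node NODE  (hypothetical helper)
# Succeeds only if `rabbitmqctl -n NODE status` exits with status 0,
# which requires hostname resolution, open ports and matching cookies.
can_reach_node() {
  "$RABBITMQCTL" -n "$1" status >/dev/null 2>&1
}
```

A typical use would be `can_reach_node rabbit@rabbitmq-1 && echo reachable`, run once on each machine against the other machine's node name. The output is discarded here, so rerun the plain command to see the diagnostics when the check fails.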

> If you are a developer, I recommend that you change this error
> to actually print out the real problem instead of saying it could
> be one of many things and not really giving any useful information.

We appreciate your feedback.

There is no shortage of threads on this topic on our legacy mailing list and on this one, so we are
as eager to improve that error message as anybody else: our entire team does user support
every single day.

Unfortunately, the runtime doesn't give us enough information to find out what exactly
prevents distribution from succeeding.

Max Rebuschatis

Nov 3, 2015, 2:01:43 PM
to Michael Klishin, rabbitmq-users
Thanks for the help! I was able to get my cluster working last night with your advice - the problem turned out to be missing port forwarding on my network's side for 25672 and 4369, since I was able to connect to the server but it couldn't see my computer.
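
Since the root cause was unreachable ports, here is a minimal shell sketch of a reachability check for the two ports named above. It assumes a netcat (`nc`) that supports `-z` and `-w`, which is not guaranteed on every system; the `PROBE` variable and the function name `check_cluster_ports` are hypothetical.

```shell
# Ports named in this thread: epmd (4369) and inter-node traffic (25672).
CLUSTER_PORTS="4369 25672"

# Probe command (assumption): a netcat that supports -z (connect only)
# and -w (timeout in seconds). Override PROBE if yours differs.
PROBE="${PROBE:-nc -z -w 3}"

# check_cluster_ports HOST  (hypothetical helper)
# Prints one line per port and returns non-zero if any port is unreachable.
check_cluster_ports() {
  host="$1"
  failed=0
  for port in $CLUSTER_PORTS; do
    # $PROBE is intentionally unquoted so its options split into words.
    if $PROBE "$host" "$port" >/dev/null 2>&1; then
      echo "$host:$port reachable"
    else
      echo "$host:$port NOT reachable"
      failed=1
    fi
  done
  return "$failed"
}
```

The check has to be run in both directions, from each machine against the other, because in this case the local machine could reach the server while the server could not reach it back.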

I think "rabbitmqctl -n rabbit@[remote hostname] status" is a really useful command that should be covered in the clustering tutorial - it is really important to know how to test connectivity to the nodes from the other computers and it wasn't clear how to do this from the documentation.

The other issue I was running into was that the pre-built Mac package comes with a really strange version of Erlang. It was really hard to get a matching version of Erlang on Mac and Linux, and I feel like there is potential for improvement in this area.

Again, I really appreciate your help in this matter. Other than the difficulties I have been having with setting up this cluster, RabbitMQ is a really great messaging system!
-Max