Nomad fails server/client on two separate virtual machines


Konrad Gawlinski

Mar 22, 2019, 10:46:43 AM
to Nomad
Hi,

I am trying to set up a Nomad server and client on two separate virtual machines using Vagrant.
The server starts, but the client, after logging 'node registration complete', reports 'heartbeating failed'.
ping 192.168.10.10 // from the client VM: works
ping 192.168.10.20 // from the server VM: works

Can anyone give me hints on what to check or why it does not work?

VM/Server:

 - Vagrantfile:

config.vm.network "private_network", ip: "192.168.10.10"


 - server.hcl:

bind_addr = "0.0.0.0"
data_dir  = "/var/nomad/data"

server {
  enabled            = true
  bootstrap_expect   = 1
  rejoin_after_leave = true
}

 - output:

sudo nomad agent -config=/etc/nomad/server.hcl

==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from /etc/nomad/server.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:4646; RPC: 10.0.2.15:4647; Serf: 10.0.2.15:4648
            Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
                Client: false
             Log Level: INFO
                Region: global (DC: dc1)
                Server: true
               Version: 0.8.7

==> Nomad agent started! Log data will stream in below:

    2019/03/22 14:27:34 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.0.2.15:4647 Address:10.0.2.15:4647}]
    2019/03/22 14:27:34 [INFO] raft: Node at 10.0.2.15:4647 [Follower] entering Follower state (Leader: "")
    2019/03/22 14:27:34 [INFO] serf: EventMemberJoin: contrib-jessie.global 10.0.2.15
    2019/03/22 14:27:34.088813 [INFO] nomad: starting 4 scheduling worker(s) for [service batch system _core]
    2019/03/22 14:27:34 [WARN] serf: Failed to re-join any previously known node
    2019/03/22 14:27:34.089125 [INFO] nomad: adding server contrib-jessie.global (Addr: 10.0.2.15:4647) (DC: dc1)
    2019/03/22 14:27:34.089324 [ERR] consul: error looking up Nomad servers: server.nomad: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused
    2019/03/22 14:27:35 [WARN] raft: Heartbeat timeout from "" reached, starting election
    2019/03/22 14:27:35 [INFO] raft: Node at 10.0.2.15:4647 [Candidate] entering Candidate state in term 217
    2019/03/22 14:27:35 [INFO] raft: Election won. Tally: 1
    2019/03/22 14:27:35 [INFO] raft: Node at 10.0.2.15:4647 [Leader] entering Leader state
    2019/03/22 14:27:35.659736 [INFO] nomad: cluster leadership acquired



VM/Client:

 - Vagrantfile:

config.vm.network "private_network", ip: "192.168.10.20"

 - client.hcl:

bind_addr = "0.0.0.0"
data_dir  = "/var/nomad/data"

client {
  enabled = true
  servers = ["192.168.10.10:4647"]
}

 - output:

sudo nomad agent -config=/etc/nomad/client.hcl

==> Loaded configuration from /etc/nomad/client.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:

       Advertise Addrs: HTTP: 10.0.2.15:4646
            Bind Addrs: HTTP: 0.0.0.0:4646
                Client: true
             Log Level: INFO
                Region: global (DC: dc1)
                Server: false
               Version: 0.8.7

==> Nomad agent started! Log data will stream in below:

    2019/03/22 14:40:34.183227 [INFO] client: using state directory /var/nomad/data/client
    2019/03/22 14:40:34.183269 [INFO] client: using alloc directory /var/nomad/data/alloc
    2019/03/22 14:40:34.215315 [INFO] fingerprint.cgroups: cgroups are available
    2019/03/22 14:40:36.257422 [INFO] client: Node ID "90c1570b-81e8-ed32-7816-53a21fdb19af"
    2019/03/22 14:40:36.262103 [INFO] client: node registration complete
    2019/03/22 14:40:44.722651 [INFO] client: node registration complete
    2019/03/22 14:41:03.486243 [ERR] nomad: "Node.UpdateStatus" RPC failed to server 10.0.2.15:4647: rpc error: failed to get conn: dial tcp 10.0.2.15:4647: connect: connection refused
    2019/03/22 14:41:03.486280 [ERR] client: heartbeating failed. Retrying in 1.307639733s: failed to update status: rpc error: failed to get conn: dial tcp 10.0.2.15:4647: connect: connection refused
    2019/03/22 14:41:03.486415 [ERR] client.consul: error discovering nomad servers: client.consul: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused


Chris Baker

Mar 22, 2019, 11:40:12 AM
to Konrad Gawlinski, Nomad
It looks like you need to set the advertise address, per https://www.nomadproject.io/docs/configuration/index.html#advertise


Konrad Gawlinski

Mar 23, 2019, 5:01:29 AM
to Nomad
Thanks Chris,

I checked the 'advertise' option, and setting it like this solved the problem:

bind_addr = "0.0.0.0"
data_dir  = "/var/nomad/data"

advertise {
  http = "192.168.10.10"
  rpc  = "192.168.10.10"
  serf = "10.0.2.15"
}

server {
  enabled            = true
  bootstrap_expect   = 1
  rejoin_after_leave = true
}
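
For anyone hitting the same thing: the client output above also advertises 10.0.2.15 for HTTP, so it may be worth setting 'advertise' on the client VM as well. The following is only a sketch and was not part of the configuration tested in this thread. It assumes the client's private-network address 192.168.10.20 from its Vagrantfile; everything else is unchanged from the original client.hcl.

```hcl
# client.hcl (sketch; the advertise block is an assumption, not tested in this thread)
bind_addr = "0.0.0.0"
data_dir  = "/var/nomad/data"

advertise {
  # Private-network address of the client VM, taken from its Vagrantfile
  http = "192.168.10.20"
}

client {
  enabled = true
  servers = ["192.168.10.10:4647"]
}
```

Note that the server config above still advertises 10.0.2.15 for serf. That is probably fine with a single server, but if more servers are added later they would likely need a routable address there too, such as 192.168.10.10.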