Hi,
I am trying to set up a Nomad server and client on two separate virtual machines using Vagrant.
The server starts, but the client, after 'node registration complete', reports 'heartbeating failed'.
ping 192.168.10.10 // from client vm works
ping 192.168.10.20 // from server vm works
Can anyone give me any hints on what I can check, or why it does not work?
VM/Server:
- Vagrantfile:
config.vm.network "private_network", ip: "192.168.10.10"
- server.hcl:
bind_addr = "0.0.0.0"
data_dir = "/var/nomad/data"
server {
  enabled = true
  bootstrap_expect = 1
  rejoin_after_leave = true
}
- output:
sudo nomad agent -config=/etc/nomad/server.hcl
==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
==> Loaded configuration from /etc/nomad/server.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:
Advertise Addrs: HTTP: 10.0.2.15:4646; RPC: 10.0.2.15:4647; Serf: 10.0.2.15:4648
Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Client: false
Log Level: INFO
Region: global (DC: dc1)
Server: true
Version: 0.8.7
==> Nomad agent started! Log data will stream in below:
2019/03/22 14:27:34 [INFO] raft: Initial configuration (index=1): [{Suffrage:Voter ID:10.0.2.15:4647 Address:10.0.2.15:4647}]
2019/03/22 14:27:34 [INFO] raft: Node at 10.0.2.15:4647 [Follower] entering Follower state (Leader: "")
2019/03/22 14:27:34 [INFO] serf: EventMemberJoin: contrib-jessie.global 10.0.2.15
2019/03/22 14:27:34.088813 [INFO] nomad: starting 4 scheduling worker(s) for [service batch system _core]
2019/03/22 14:27:34 [WARN] serf: Failed to re-join any previously known node
2019/03/22 14:27:34.089125 [INFO] nomad: adding server contrib-jessie.global (Addr: 10.0.2.15:4647) (DC: dc1)
2019/03/22 14:27:34.089324 [ERR] consul: error looking up Nomad servers: server.nomad: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused
2019/03/22 14:27:35 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/03/22 14:27:35 [INFO] raft: Node at 10.0.2.15:4647 [Candidate] entering Candidate state in term 217
2019/03/22 14:27:35 [INFO] raft: Election won. Tally: 1
2019/03/22 14:27:35 [INFO] raft: Node at 10.0.2.15:4647 [Leader] entering Leader state
2019/03/22 14:27:35.659736 [INFO] nomad: cluster leadership acquired
VM/Client:
- Vagrantfile:
config.vm.network "private_network", ip: "192.168.10.20"
- client.hcl:
bind_addr = "0.0.0.0"
data_dir = "/var/nomad/data"
client {
  enabled = true
  servers = ["192.168.10.10:4647"]
}
- output:
sudo nomad agent -config=/etc/nomad/client.hcl
==> Loaded configuration from /etc/nomad/client.hcl
==> Starting Nomad agent...
==> Nomad agent configuration:
Advertise Addrs: HTTP: 10.0.2.15:4646
Bind Addrs: HTTP: 0.0.0.0:4646
Client: true
Log Level: INFO
Region: global (DC: dc1)
Server: false
Version: 0.8.7
==> Nomad agent started! Log data will stream in below:
2019/03/22 14:40:34.183227 [INFO] client: using state directory /var/nomad/data/client
2019/03/22 14:40:34.183269 [INFO] client: using alloc directory /var/nomad/data/alloc
2019/03/22 14:40:34.215315 [INFO] fingerprint.cgroups: cgroups are available
2019/03/22 14:40:36.257422 [INFO] client: Node ID "90c1570b-81e8-ed32-7816-53a21fdb19af"
2019/03/22 14:40:36.262103 [INFO] client: node registration complete
2019/03/22 14:40:44.722651 [INFO] client: node registration complete
2019/03/22 14:41:03.486243 [ERR] nomad: "Node.UpdateStatus" RPC failed to server 10.0.2.15:4647: rpc error: failed to get conn: dial tcp 10.0.2.15:4647: connect: connection refused
2019/03/22 14:41:03.486280 [ERR] client: heartbeating failed. Retrying in 1.307639733s: failed to update status: rpc error: failed to get conn: dial tcp 10.0.2.15:4647: connect: connection refused
2019/03/22 14:41:03.486415 [ERR] client.consul: error discovering nomad servers: client.consul: unable to query Consul datacenters: Get http://127.0.0.1:8500/v1/catalog/datacenters: dial tcp 127.0.0.1:8500: connect: connection refused
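
One possible lead from the logs above: the server advertises 10.0.2.15 on all three ports, and the failing heartbeat RPC on the client dials 10.0.2.15:4647 rather than 192.168.10.10:4647. Both VMs report 10.0.2.15, which is presumably the default Vagrant NAT interface (an assumption), so the client may be dialing itself. Below is a minimal, untested sketch of a server.hcl that advertises the private_network address explicitly using Nomad's `advertise` block. The address is taken from the server's Vagrantfile above; everything else is unchanged.

```hcl
# server.hcl (sketch, untested): same settings as above, plus an advertise block
bind_addr = "0.0.0.0"
data_dir  = "/var/nomad/data"

# Assumption: 192.168.10.10 is the address the client should use to reach
# the server (the private_network IP from the server's Vagrantfile).
advertise {
  http = "192.168.10.10"
  rpc  = "192.168.10.10"
  serf = "192.168.10.10"
}

server {
  enabled            = true
  bootstrap_expect   = 1
  rejoin_after_leave = true
}
```

If this is the cause, the server's "Advertise Addrs" line should then show 192.168.10.10 instead of 10.0.2.15. The existing raft state in data_dir was recorded under the old address, so testing with a fresh data_dir may be necessary; that part is a guess.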