Nomad + Consul bootstrapping


fjro...@paradigmadigital.com

Nov 14, 2016, 6:59:43 AM
to Nomad
I am starting a new cluster with Nomad and I am experiencing a problem with automatic bootstrapping.

- 3 consul servers
- 3 nomad servers + consul agent
- 3 nomad clients + consul agent

Nomad server config file:

bind_addr = "0.0.0.0"
datacenter = "dc1"
region = "aws-frnkt-1"
data_dir  = "/nomad"

advertise {
  http = "10.20.10.39:4646"
  rpc = "10.20.10.39:4647"
  serf = "10.20.10.39:4648"
}

server {
  enabled          = true
  bootstrap_expect = 3
}

consul {
  ssl = true
  verify_ssl = true
  address = "127.0.0.1:8443"
  server_service_name = "nomad-server"
  auto_advertise = true
  server_auto_join = true
  token = "b3b1e648-f9fd-f88d-1f16-d98b61fb81c2"
  ca_file = "/etc/consul/certs/ca.crt"
  cert_file = "/etc/consul/certs/consul.crt"
  key_file  = "/etc/consul/certs/private/consul.key"
}

The Nomad servers are working and the cluster is up.

# nomad server-members
Name                        Address        Port  Status  Leader  Protocol  Build  Datacenter  Region
nomad-server-1.aws-frnkt-1  10.20.10.39    4648  alive   false   2         0.4.1  dc1         aws-frnkt-1
nomad-server-2.aws-frnkt-1  10.20.110.126  4648  alive   false   2         0.4.1  dc1         aws-frnkt-1
nomad-server-3.aws-frnkt-1  10.20.10.17    4648  alive   true    2         0.4.1  dc1         aws-frnkt-1


Now I am trying to connect the Nomad clients to the cluster. This is the client configuration file:

bind_addr = "0.0.0.0"
datacenter = "dc1"
region = "aws-frnkt-1"
data_dir  = "/nomad"

advertise {
  http = "10.20.10.19:4646"
  rpc = "10.20.10.19:4647"
  serf = "10.20.10.19:4648"
}

client {
  enabled = true
  servers = ["nomad-server.service.consul"]
}

consul {
  ssl = true
  verify_ssl = true
  address = "127.0.0.1:8443"
  client_service_name = "nomad-client"
  auto_advertise = true
  server_auto_join = true
  token = "b3b1e648-f9fd-f88d-1f16-d98b61fb81c2"
  ca_file = "/etc/consul/certs/ca.crt"
  cert_file = "/etc/consul/certs/consul.crt"
  key_file  = "/etc/consul/certs/private/consul.key"
}


The client cannot connect to the cluster because the Consul DNS lookup fails:

==> Nomad agent started! Log data will stream in below:

    2016/11/14 06:54:56.081232 [INFO] client: using state directory /nomad/client
    2016/11/14 06:54:56.081269 [INFO] client: using alloc directory /nomad/alloc
    2016/11/14 06:54:56.081435 [INFO] fingerprint.cgroups: cgroups are available
    2016/11/14 06:54:56.084970 [WARN]: fingerprint.env_aws: Could not read value for attribute "public-ipv4"
    2016/11/14 06:54:56.086845 [WARN]: fingerprint.env_aws: Could not read value for attribute "public-hostname"
    2016/11/14 06:54:56.093444 [WARN] client.rpcproxy: unable to create new primary server from endpoint "nomad-server.service.consul": lookup nomad-server.service.consul on 10.20.0.2:53: no such host
    2016/11/14 06:54:56.093526 [WARN] client.rpcproxy: No servers available
    2016/11/14 06:54:56.093608 [WARN] client.rpcproxy: No servers available
    2016/11/14 06:54:56.093616 [ERR] client: failed to query for node allocations: no known servers

I do not know why the DNS lookup returns no records, even when I query the local Consul agent directly:

# dig nomad-server.service.consul @127.0.0.1 -p 8600

; <<>> DiG 9.9.4-RedHat-9.9.4-38.el7_3 <<>> nomad-server.service.consul @127.0.0.1 -p 8600
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 56627
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;nomad-server.service.consul. IN A

;; AUTHORITY SECTION:
consul. 0 IN SOA ns.consul. postmaster.consul. 1479124568 3600 600 86400 0

;; Query time: 2 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Nov 14 06:56:08 EST 2016
;; MSG SIZE  rcvd: 95

# dig consul.service.consul @127.0.0.1 -p 8600

; <<>> DiG 9.9.4-RedHat-9.9.4-38.el7_3 <<>> consul.service.consul @127.0.0.1 -p 8600
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50304
;; flags: qr aa rd; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;consul.service.consul. IN A

;; ANSWER SECTION:
consul.service.consul. 0 IN A 10.20.120.189
consul.service.consul. 0 IN A 10.20.20.24
consul.service.consul. 0 IN A 10.20.20.251

;; Query time: 1 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Mon Nov 14 06:56:18 EST 2016
;; MSG SIZE  rcvd: 87


Are the config files right? Or did I miss something in the Consul bootstrap?

Regards.

fjro...@paradigmadigital.com

Nov 14, 2016, 7:20:12 AM
to Nomad
It works if I set a server's IP address (with the RPC port) instead of the Consul DNS name.

client {
  enabled = true
  servers = ["10.20.10.39:4647"]
}


# nomad node-status -stats
ID        DC   Name            Class   Drain  Status
3c0ff6b9  dc1  nomad-client-1  <none>  false  ready

Regards.
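
The single-address workaround above depends on that one server staying reachable. A minimal sketch, assuming the other two servers from the `nomad server-members` output also advertise RPC on port 4647 (the thread only shows that port for 10.20.10.39), lists all three servers so a client may still register when one of them is down:

```hcl
client {
  enabled = true

  # RPC addresses of the three servers listed by `nomad server-members`.
  # Port 4647 for the second and third entries is an assumption; the thread
  # only shows the advertise block of 10.20.10.39.
  servers = [
    "10.20.10.39:4647",
    "10.20.110.126:4647",
    "10.20.10.17:4647"
  ]
}
```

Hard-coded addresses have to be updated whenever a server is replaced, which is the maintenance cost that Consul-based discovery is meant to avoid.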

Gabriele Paggi

Nov 14, 2016, 12:25:39 PM
to fjro...@paradigmadigital.com, Nomad
Hi,

Nomad is trying to resolve nomad-server.service.consul via 10.20.0.2, the system resolver (see the log).
If you want it to use the local Consul agent, you'll have to install dnsmasq and configure it to forward queries for .consul to the local Consul agent:
===
bind-interfaces
listen-address=127.0.0.1
server=/.consul/127.0.0.1#8600
cache-size=0
no-negcache
no-hosts
===

Gabriele

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode

fjro...@paradigmadigital.com

Nov 14, 2016, 12:33:10 PM
to Nomad
Hi Gabriele,

thanks for your response.

Do I really need dnsmasq? I think Nomad should provide parameters for DNS queries.

Gabriele Paggi

Nov 14, 2016, 12:45:23 PM
to Nomad
Hi,

Nomad is just doing a DNS lookup to find the configured servers and it uses the OS resolvers for that.
If you want it to use the Consul DNS interface you need a way to redirect these requests to Consul and that's where dnsmasq comes into play.

--
Gabriele

msch...@hashicorp.com

Nov 14, 2016, 12:54:41 PM
to Nomad
dnsmasq is useful if you'd like to do name resolution via consul.

However, dnsmasq is not required to bootstrap a nomad cluster. If you remove the servers line altogether in the client config, the clients will query their local consul agent for nomad servers and connect.

It should Just Work with consul setup even without dnsmasq.

I hope that helps! We tried to fix a lot of issues with consul bootstrapping in 0.5!
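
A minimal sketch of the client configuration described above, with the servers line removed so that the client asks its local Consul agent for the Nomad servers. One assumption that is not stated in the thread: the client is taken to look up the service named by server_service_name, which defaults to "nomad", so it is set here to match the "nomad-server" name the servers register under.

```hcl
client {
  enabled = true
  # No "servers" entry: discovery goes through the local Consul agent.
}

consul {
  address             = "127.0.0.1:8443"
  ssl                 = true
  verify_ssl          = true
  client_service_name = "nomad-client"

  # Assumption: this must match server_service_name in the server config,
  # otherwise the client would search Consul for the default service name.
  server_service_name = "nomad-server"

  auto_advertise = true

  # token, ca_file, cert_file and key_file as in the original client config.
}
```

If the clients still find no servers with a configuration like this, checking that the nomad-server service is visible in the Consul catalog to the token in use is a reasonable next step.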

fjro...@paradigmadigital.com

Nov 15, 2016, 11:07:34 AM
to Nomad
I will try next time!

Additionally, I had an issue with Consul DNS. I solved it by allowing anonymous read access to services in the Consul ACL:

service "" {
    policy = "read"
}