nomad servers: how do they find each other?


Pedro Melo

Oct 17, 2015, 4:28:07 PM
to nomad...@googlegroups.com
Hi,

just starting to look at Nomad, read the docs, did a quick demo, but there is one thing I can't "get".

In a networked environment (for example an AWS VPC with Nomad servers on different subnets, each in a different AZ), how would they find each other the first time?

With consul, using a similar network setup, we must bootstrap the servers by pointing them to another one (the -join parameter).

I get that agents need at least one server to hook up to, like consul, but unlike consul, I didn't see the configuration parameters to make sure nomad servers can find each other…

Looking at this https://www.nomadproject.io/docs/agent/config.html, in client mode the configuration file accepts a "servers" option to list the Nomad servers to connect to.

But on the "Server-specific options", I don't see an equivalent option…

Did I miss something?

Alex Dadgar

Oct 19, 2015, 2:51:19 PM
to Pedro Melo, Nomad
Hey Pedro,

There are two parts to getting the Nomad servers to join. First, whatever interface and port that Nomad binds to for serf must be accessible by the other servers (https://www.nomadproject.io/docs/agent/config.html#serf). After that we use a similar join command. You can see the documentation on it here: https://www.nomadproject.io/docs/commands/server-join.html

Thanks,
Alex

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Nomad" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nomad-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nomad-tool/CACCxPi3LSU8uSMHf2g%3DZ5g-jqgUQo1nJteC0abWLtcJ7RatqWQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Alvaro Miranda Aguilera

Oct 26, 2015, 6:35:59 AM
to Pedro Melo, nomad...@googlegroups.com
Hello Pedro

Not sure if you have got past this.

There are 2 ways of doing this.

If you know the IPs at the time you run the nomad command, you can use:

sudo nomad agent -server -data-dir=/var/nomad -bootstrap-expect=3 \
  -bind=${ip} -servers=192.168.219.101:4646,192.168.219.102:4646,192.168.219.103:4646

Or later, you can use:

#on server2
nomad server-join -address=http://${ip}:4646 192.168.219.101:4646

#on server3
nomad server-join -address=http://${ip}:4646 192.168.219.101:4646
nomad server-join -address=http://${ip}:4646 192.168.219.102:4646

Alvaro

ja...@fpcomplete.com

Nov 11, 2015, 12:31:29 AM
to Nomad
Ugh. I just posted a reply, thought Google had received a double post (the Google UI showed a duplicate entry), and deleted the duplicate, only to find out I had deleted my post. If you received this post by email, I would be grateful for a re-post; otherwise I will try to rewrite it.

/me dunce

ja...@fpcomplete.com

Nov 11, 2015, 11:29:42 AM
to Nomad


On Monday, October 26, 2015 at 6:35:59 AM UTC-4, Alvaro Miranda Aguilera wrote:
> Hello Pedro
>
> Not sure if you have got past this.
>
> There are 2 ways of doing this.
>
> If you know the IPs at the time you run the nomad command, you can use:
>
>   sudo nomad agent -server -data-dir=/var/nomad -bootstrap-expect=3 \
>     -bind=${ip} -servers=192.168.219.101:4646,192.168.219.102:4646,192.168.219.103:4646


Let's say you are deploying Nomad leaders to auto-scaling groups (ASGs) and aim to init the cluster (join/bootstrap the leaders) during cloud-init (e.g. user_data).

This means:

a) when you first create the cluster, the Nomad servers need to auto-join on their own, without interactive input
b) the servers need to join reliably, and are all told to do the same thing; no single node can be made to do something different during init
c) if the group loses a node, the ASG will spin up a new node, and that node needs to be able to join the existing cluster
d) we are unable to force a specific IP on each node that could later be relied on as a "known" IP
e) it must be easy for clients/agents to find the servers (and hopefully with minimal dependencies)


I have achieved this type of setup with consul, and it was actually pretty easy:

1) limit the size of the ASG's subnets, which restricts the available IP space for the cluster, so I have a list of known IPs, and it is reasonably small
2) create a DNS record (private zone on AWS) for the leaders, and add that full list of known IPs to it
3) tell configuration management (CM) to configure the Consul leaders with that full list of IPs

The net result is that the Consul leaders are given a list of IPs to use in bootstrap, and they are reasonably fast to find each other, build consensus, and elect a leader.
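As a rough illustration of steps 1 and 3, the sketch below enumerates every usable address of a small subnet so the same fixed list can be handed to each Consul server. It assumes a /28 subnet such as the hypothetical 10.0.1.0/28, and AWS's rule of reserving the first four and the last address of each subnet; `subnet_usable_ips`, `ip_to_int` and `int_to_ip` are illustrative helpers, not part of Consul, Nomad or the AWS CLI.

```shell
#!/usr/bin/env bash
# Sketch only: list the host addresses a small AWS subnet can hand out.
# Assumption: AWS reserves the first four addresses and the last address
# of every subnet, so a /28 (16 addresses) leaves 11 usable ones.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=. a b c d
  read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert an integer back to a dotted-quad IPv4 address.
int_to_ip() {
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# Print one usable address per line for a CIDR such as 10.0.1.0/28
# (hypothetical example subnet).
subnet_usable_ips() {
  local cidr=$1
  local base=${cidr%/*} prefix=${cidr#*/}
  local size=$(( 1 << (32 - prefix) ))
  local start i
  # Align to the network address in case a host address was passed in.
  start=$(( $(ip_to_int "$base") & ~(size - 1) ))
  # Skip the 4 reserved addresses at the start and the 1 at the end.
  for (( i = start + 4; i < start + size - 1; i++ )); do
    int_to_ip "$i"
  done
}
```

With the functions loaded, the list could be passed to every Consul server as repeated `-retry-join` flags, e.g. `consul agent -server -bootstrap-expect=3 -data-dir=/var/consul $(subnet_usable_ips 10.0.1.0/28 | sed 's/^/-retry-join=/')`. Addresses with no server behind them simply fail to connect, and `-retry-join` keeps retrying until a join succeeds.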


This past weekend I tested nomad with the same type of setup. I was disappointed with the results, specifically:

a) Nomad has no way to join/auto-bootstrap based on config or command-line args when running the agent; you have to run `nomad server-join` separately.
b) I could be wrong, but `server-join` does not seem to let you keep retrying until bootstrap succeeds. One of the nice aspects of my Consul/ASG setup is that 2 nodes can sit waiting while a 3rd node fails to init or do its thing, and those 2 will continue to wait until a 3rd node comes online and is available to complete bootstrap.
c) Due to b (I believe), I have not been able to use a DNS record (which, in the setup described above, would sometimes return a good IP and sometimes a bad one).
d) I might need to spend more time on it, but when I tested Nomad this weekend, I was only able to get the servers joined to each other manually by IP.


Some ideas I have:
1) give the ASG a profile that allows it to look up the nodes in the ASG, and use the AWS CLI to look up the private IPs of the other nodes, then pass those to `nomad server-join`
2) use Consul, but I avoided this at first because I would rather not make Nomad leader bootstrap dependent on Consul (having agents find the servers through Consul, as a service registration/check, is sensible)
3) write a wrapper script that loops over calling `nomad server-join` with the DNS record, allowing it to fail and retry until it receives a working IP from DNS (or get support for this in Nomad)
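Idea 3 can be sketched as a small generic retry helper. This is only an illustration: `retry` is a hypothetical function, the attempt count and delay are arbitrary, and it assumes `nomad server-join` exits non-zero when the join fails, which is worth confirming for your Nomad version.

```shell
#!/usr/bin/env bash
# Sketch of idea 3: keep re-running a command until it succeeds or we give up.
# retry is a hypothetical helper, not part of Nomad.
#
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  # Run the command exactly as given; stop as soon as it exits 0.
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed: $*; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

In user_data this might look like `retry 60 10 nomad server-join nomad-servers.example.internal`, where the DNS name is a placeholder for the private-zone record described above, and assuming `server-join` accepts a hostname there. Since the record may resolve to a dead address on any given attempt, each new invocation gets another chance at a live one.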


I would be interested in any recommendations the group has for a simple setup here, to auto-bootstrap nomad servers on ASG.


Thank you for reading along this far!  (and thanks to Brian for helping me to resend)

Alvaro Miranda Aguilera

Nov 14, 2015, 2:53:58 AM
to ja...@fpcomplete.com, Nomad
Hello,

Well, if you start with 3 servers and you can manage to know the IP assignments, you can create and bootstrap the cluster.

From there, the ASG can use either of the 2 methods to tell the new Nomad server to join the cluster.

Here it is similar to Consul: you have servers and clients.

I think the ASG part is more for the client side of things, am I correct? So you just need to point the clients (Nomad workers) to join 1 of the servers, or a range of server IPs.

What is your experience with Consul and auto-join with Atlas? I would love to hear about it, since Nomad should hopefully get that option soon too.

Alvaro.


ja...@fpcomplete.com

Nov 15, 2015, 6:15:49 PM
to Nomad
Hi Alvaro,

Thanks for the reply!


On Saturday, November 14, 2015 at 2:53:58 AM UTC-5, Alvaro Miranda Aguilera wrote:
> Hello,
>
> Well, if you start with 3 servers and you can manage to know the IP assignments, you can create and bootstrap the cluster.


Right, but that is one of the specific limitations of Auto-Scaling Groups on AWS: you will not know the IPs of the nodes in the ASG, and they cannot be set as you can with the `aws_instance` resource. You can look them up using the AWS CLI tools, and you can give an EC2 instance the permissions to look up those IPs (via profiles/IAM/etc.).
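A minimal sketch of that lookup, assuming the AWS CLI is installed, the instance profile allows `autoscaling:DescribeAutoScalingGroups` and `ec2:DescribeInstances`, and the EC2 instance metadata service (IMDSv1) is reachable. The ASG name is a placeholder, `asg_private_ips`, `peer_ips` and `join_asg_peers` are hypothetical helpers, and passing bare IPs to `nomad server-join` assumes it falls back to the Serf port that all servers share.

```shell
#!/usr/bin/env bash
# Sketch: discover the private IPs of the other instances in this node's ASG
# and ask the local Nomad server to join them. Helper names are hypothetical.

# Print the private IPs of all instances currently in the given ASG.
asg_private_ips() {
  local asg_name=$1 ids
  ids=$(aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$asg_name" \
    --query 'AutoScalingGroups[0].Instances[].InstanceId' \
    --output text) || return 1
  # Text output prints "None" for a null result (e.g. unknown ASG name).
  [ -n "$ids" ] && [ "$ids" != "None" ] || return 1
  # Intentionally unquoted: --instance-ids takes a space-separated list.
  aws ec2 describe-instances --instance-ids $ids \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}

# Print each IP argument on its own line, dropping our own address.
peer_ips() {
  local self=$1 ip
  shift
  for ip in "$@"; do
    [ "$ip" = "$self" ] || echo "$ip"
  done
}

# Join every peer through the local agent's HTTP API (assumed on port 4646).
join_asg_peers() {
  local asg_name=$1 self ip
  self=$(curl -fs http://169.254.169.254/latest/meta-data/local-ipv4) || return 1
  # Intentionally unquoted to split the CLI's tab-separated output.
  for ip in $(peer_ips "$self" $(asg_private_ips "$asg_name")); do
    nomad server-join -address="http://${self}:4646" "$ip"
  done
}
```

For example, user_data could call `join_asg_peers nomad-servers-asg` (placeholder name); since peers may still be booting when a new node starts, the call would likely need to be repeated until enough servers have joined.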

 
> I think the ASG part is more for the client side of things, am I correct? So you just need to point the clients (Nomad workers) to join 1 of the servers, or a range of server IPs.
 
Well, I am using ASGs for both the Nomad servers and the workers/clients (same with Consul), but that does not really matter. The important detail is that an ASG is used for the Nomad/Consul servers, and with Nomad it seems more complicated to set up the same leader election as I have with Consul. BTW, we do not want to use the `aws_instance` resource with `count`, because an ASG re-creates nodes automatically when they die, without admin intervention.
 

> What is your experience with Consul and auto-join with Atlas? I would love to hear your experience there.


I am not using atlas (unfortunately) - the services I am working with are based on private networks with extremely limited external access. I will be publishing details about the auto-join setup I have for consul, but that is not yet ready. The method is based on Terraform, Packer, and Saltstack. It has worked beautifully so far in my testing. That said, the ideas are simple, and outlined in my original post to this thread: consul leaders are deployed to a small network, defining a limited range of IPs, and Consul will try to reconnect/restart/retry until it finds leaders running in the network.


> ...since Nomad should hopefully get that option soon too.


I would be more interested in nomad having similar command line/config UX for the same leader election process - and it seems there is an existing ticket for this: https://github.com/hashicorp/nomad/issues/180.