Thinking about region, datacenter (and environment) nomenclature


Rusty Ross

Dec 17, 2015, 2:47:34 PM
to Nomad
I understand that the typical intended use case for "region" and "datacenter" in Nomad is something like:

region = "us"
datacenter = "us-east-1a"

I have some thoughts and some questions about this.

One regards the fact that Consul does not (yet?) have the concept of a "region". Is the intent that "datacenter" naming should ideally be consistent between Consul and Nomad, and if so, is the plan to build the concept of "regions" into Consul in the future?

If not, I wonder what "datacenter" really means to most people in this context. In Consul, I most often use "datacenter" to define a full infrastructural deployment (all availability zones) in an AWS region. If, however, I use "datacenter" in Nomad to define availability zones (as the docs seem to suggest as common/intended practice), then I am using that term inconsistently between Consul and Nomad. And if I start using the term "datacenter" in Consul to limit clusters to specific availability zones, then I am deploying additional Consul clusters in that environment (one to cover each zone as a "datacenter"), which I don't see as ideal.

Maybe there is no reason for the use of "datacenter" to be consistent between Nomad and Consul, but that seems confusing at best.

Additionally, I am thinking through the pros and cons of deploying a single Nomad cluster to serve multiple environments (dev, test, staging, prod, etc). It would be nice, I think, to have an "environment" config for Nomad clients and an ACL system which (1) enforces clients joining a unified Nomad cluster with an authorized environment tag, and (2) enforces job deployments into the correct environments from the same unified Nomad server cluster.

But, since this doesn't exist today in Nomad, and I don't know if this (or anything similar) is currently being considered for Nomad, I am thinking about the possibility (some pros, many cons) of using "datacenter" in Nomad for environmental identification, i.e.:

region = "us-east"
datacenter = "dev"

region = "us-east"
datacenter = "test"

etc

Anyway, I know this message is slightly multi-topical, but does anyone care to comment on how they are thinking about regions and datacenters in Nomad, particularly with regard to consistency with Consul, and/or on how folks are thinking about multiple staging environments? Are folks simply deploying multiple 3+ node server clusters, one for each env (i.e. dev, test, prod, etc.)?


Pires

Dec 18, 2015, 3:11:31 AM
to Nomad
+1 for regions in Consul.

Regarding multiple _environments_, Kubernetes has this concept of namespaces [1] that I find very interesting. If something like this is adopted, it would make it easier to define global or per-namespace ACLs.

Armon Dadgar

Dec 23, 2015, 2:50:18 AM
to Nomad, Rusty Ross
Hey Rusty,

We do intend for Nomad and Consul to have shared naming terminology to minimize confusion.
In that sense, we generally expect the Consul and Nomad datacenters to be the same, e.g. a Nomad datacenter
would be the entire region "us-east" and not per AZ. Currently, Nomad fingerprints the AZ and makes it
available as a node attribute that can be used to constrain placement if necessary.
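[For readers following along, a minimal sketch of that pattern: the attribute name below is what Nomad's AWS fingerprinter exposes on EC2, and will differ on other platforms.]

```hcl
job "example" {
  region      = "us"
  datacenters = ["us-east"]

  # Pin placement to a single availability zone using the
  # fingerprinted node attribute (AWS/EC2 attribute name shown;
  # other platforms expose different attributes).
  constraint {
    attribute = "${attr.platform.aws.placement.availability-zone}"
    value     = "us-east-1a"
  }
}
```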

With respect to a "region" in Consul, it's something we are considering, but the two systems have very
different failure domains. If your DC loses connectivity, you still want service discovery and routing to
work internally, so a regional controller for Consul becomes relatively useless. Nomad, conversely, we
anticipate being mostly deployed in a single-region configuration spanning all datacenters, since existing
services will continue to operate if the Nomad servers are unreachable or have lost quorum.

With respect to multiple clusters per environment, our goal is to support using a single Nomad cluster
for all environments. This will require the ACL system, quotas, and better QoS, so in practice it may
be more convenient to just run multiple clusters in the short term, but it's certainly a long-term goal for us.

I think, similar to K8s "namespaces", we've discussed "environments" in Consul as well, and it might be
that introducing either namespaces or environments is the simplest path for this in both tools.

Hope that helps!

Best Regards,
Armon Dadgar

--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
 
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Nomad" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nomad-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nomad-tool/962f20b4-4c29-429a-8338-ed114dd00967%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Pires

Jan 18, 2016, 6:29:02 AM
to Nomad
Armon,

Is there a meta issue on Github related to "namespaces" we can track?

Pires



Armon Dadgar

Jan 19, 2016, 2:09:04 PM
to Pires, Nomad
Hey Pires,

I don’t think there is a meta issue at this time. It’s not an immediate roadmap item, so it’ll take some time
for us to get there given the other low-hanging fruit.

Best Regards,
Armon Dadgar

Ouadie Benziane

Nov 13, 2016, 5:37:41 PM
to Nomad
We also see the necessity of multi-environment support, rather than creating a Nomad server cluster for each environment. Can someone share ideas/techniques on how to run multiple environments?
Thanks

Michael Schurter

Nov 14, 2016, 1:10:38 PM
to Ouadie Benziane, Nomad
Ouadie,

Is there a reason you only want a single Nomad server cluster for multiple environments? By default I would recommend multiple Nomad server clusters, although for dev and staging a single server may be enough depending on your cluster size. Multiple clusters not only help prevent leakage of non-prod apps, configs, etc. into the prod cluster, but also allow testing Nomad, Consul, etc. upgrades in dev and staging before they affect prod.


Ouadie Benziane

Nov 14, 2016, 1:32:57 PM
to Nomad
Michael,

There are cases when you have only a small project that runs 1 or 2 instances for test or staging environments, and we would end up spinning up one Nomad server just for one test instance. I'm not sure if this is an intended design from HashiCorp. Another question in the same context (either I'm ignorant about how to do it or I've missed a documentation detail): say you have 3 instances on staging and two projects (service01.nomad, service02.nomad) that you need to deploy. How do I get control if we need to deploy service01.nomad to 2 instances and service02.nomad to the 3rd instance?

Thanks


Ouadie Benziane

Nov 14, 2016, 1:39:55 PM
to Nomad
Sorry for the typos here, folks!



msch...@hashicorp.com

Nov 14, 2016, 2:04:41 PM
to Nomad
If you only have a single instance for testing or staging, feel free to enable both the client and server on a single node. That, among other things, is what the -dev flag does. Having a single node be both a client and a server works fine but is discouraged in production for a variety of performance and reliability reasons.

To control where jobs are scheduled, use constraints. The node_class is probably the easiest way to say, "deploy web services to 'web' nodes." However, it's worth noting that fine-grained control over where jobs are scheduled is something of an antipattern in Nomad. If you define the resources your jobs need in their configuration, Nomad should schedule them fairly optimally.
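[A minimal sketch of the node_class pattern; the class name "web" is illustrative.]

```hcl
# Client config on the "web" nodes:
client {
  enabled    = true
  node_class = "web"
}
```

```hcl
# In the job file, constrain placement to that class:
constraint {
  attribute = "${node.class}"
  value     = "web"
}
```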

I hope that helps!

Alex Dadgar

Nov 14, 2016, 5:28:14 PM
to Nomad
Hey Ouadie,

To add more to what Michael is saying:

1) With cluster schedulers you should try to abstract away from individual machines. You should hopefully not be targeting an individual machine for a deployment, but rather define the attributes required to run the job (resources, type of machine, networking, etc.). You would then add the constraints to your job so that it only lands on machines that satisfy them. If you must, though, you can use constraints to pick a single machine.

2) If you would like a single set of Nomad servers to manage multiple environments you have several options:

a) Have dedicated Nomad clients in their own DC, for example "us-east1-staging"; staging jobs can then use the staging datacenters.
b) Use hierarchical naming for jobs. So if you had testing, staging, and prod clusters each running redis, you could have the following jobs: "testing/redis", "staging/redis", "prod/redis", and allow your environments to run on the same machines.
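[Option (a) might look like this in a job file; the region and datacenter names are illustrative.]

```hcl
job "staging/redis" {
  region = "us-east1"

  # Only the dedicated staging clients register in this datacenter,
  # so the job can never land on prod machines.
  datacenters = ["us-east1-staging"]
}
```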

Thanks,
Alex 


Ouadie Benziane

Nov 14, 2016, 6:39:30 PM
to Nomad
Thank you guys! The constraint works great. I thought I could add another question to this topic if possible :)

What do you think about the ability to override the Docker tag, for example "nginx:v1.1", from the Nomad command line? It would be nice to run: nomad run -e tag:v1.1 service01.nomad.

Thanks




Alex Dadgar

Nov 14, 2016, 8:16:03 PM
to Ouadie Benziane, Nomad
Hey Ouadie,

Yeah, it would be nice :) It's something we have on our roadmap (in a more generic fashion), but it isn't currently available.

Thanks,
Alex


Ouadie Benziane

Nov 14, 2016, 9:08:17 PM
to Nomad
The constraint works well, but I had to set a different job name for this to work. If I leave the job name unchanged in both files (service-staging.nomad and service-prod.nomad), it will stop the one running on prod and start on staging, and vice versa.


example:

job "flower-staging" {
  # Run the job in the "us" region.
  region = "us"

  # Specify the datacenters within the region this job can run in.
  datacenters = ["prod"]

  # Service type jobs optimize for long-lived services. This is
  # the default but we can change to batch for short-lived tasks.
  # type = "service"

  # Priority controls our access to resources and scheduling priority.
  # This can be 1 to 100, inclusively, and defaults to 50.
  # priority = 50

  # Restrict our job to only Linux. We can specify multiple
  # constraints as needed.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  constraint {
    attribute = "${meta.ourenv}"
    operator  = "="
    value     = "staging"
  }

  # Configure the job to do rolling updates.
  update {
    # Stagger updates every 10 seconds.
    stagger = "10s"

    # Update two tasks at a time.
    max_parallel = 2
  }

  # Create a 'flower-app' group. Each task in the group will be
  # scheduled onto the same machine.
  group "flower-app" {
    # Control the number of instances of this group. Defaults to 1.
    count = 1

    constraint {
      attribute = "${meta.ourenv}"
      operator  = "="
      value     = "staging"
    }

    # Configure the restart policy for the task group. If not provided, a
    # default is used based on the job type.
    restart {
      # The number of attempts to run the job within the specified interval.
      attempts = 10
      interval = "5m"

      # A delay between a task failing and a restart occurring.
      delay = "25s"

      # Mode controls what happens when a task has restarted "attempts"
      # times within the interval. "delay" mode delays the next restart
      # till the next interval. "fail" mode does not restart the task if
      # "attempts" has been hit within the interval.
      mode = "delay"
    }

    # Define a task to run.
    task "flower" {
      # Use Docker to run the task.
      driver = "docker"

      # Configure the Docker driver with the image.
      config {
        port_map {
          db = 5555
        }
      }

      constraint {
        attribute = "${meta.ourenv}"
        operator  = "="
        value     = "staging"
      }

      service {
        name = "${TASKGROUP}"
        tags = ["staging", "flower-app"]
        port = "db"

        check {
          name     = "alive"
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }

      env {
        "REDIS_HOST" = "10.142.17.71"
        "USERNAME"   = "admin"
        "PASSWORD"   = "xxxxxx"
        "REDIS_PORT" = "6379"
        "OURVERSION" = "v1.5"
      }

      # We must specify the resources required for this task to ensure
      # it runs on a machine with enough capacity.
      resources {
        cpu    = 500 # 500 MHz
        memory = 256 # 256 MB

        network {
          mbits = 10

          port "db" {
            static = 5555
          }
        }
      }

      # The artifact block can be specified one or more times to download
      # artifacts prior to the task being started. This is convenient for
      # shipping configs or data needed by the task.
      # artifact {
      #   options {
      #     checksum = "md5:c4aa853ad2215426eb7d70a21922e794"
      #   }
      # }

      # Specify configuration related to log rotation.
      # logs {
      #   max_files     = 10
      #   max_file_size = 15
      # }

      # Controls the timeout between signalling a task it will be killed
      # and killing the task. If not set a default is used.
      # kill_timeout = "20s"
    }
  }
}

-------------
