Constraints for preventing job colocation


Faria Kalim
Mar 1, 2019, 1:29:55 AM
to Nomad
Hi,

I am using Nomad to run Heron jobs (when I say "job" here, I mean a Nomad job, not a Heron job). Jobs are submitted programmatically, so I cannot specify the job in a job file. However, I would like each job to run on a distinct machine so that there is absolutely no colocation of jobs. I am trying to submit the constraint programmatically as follows:

// Attempt: allow at most one allocation per distinct node.unique.id
Constraint c = new Constraint();
c.setOperand("distinct_property");
c.setLTarget("${node.unique.id}"); // attribute that must be distinct
c.setRTarget("1");                 // allocations allowed per attribute value

List<Constraint> list = new ArrayList<>();
list.add(c);
job.setConstraints(list);

However, jobs are still colocated on nodes. Is there a better way to do this? I realize that affinities might be the right mechanism, but I am not sure how to use them programmatically.

Thanks,

Chris Baker
Mar 1, 2019, 8:36:07 AM
to Nomad
Hello Faria Kalim, 

It is possible to do this using the following special constraint operator. From the constraint documentation:
    # All groups in this job should be scheduled on different hosts.
    constraint {
      operator  = "distinct_hosts"
      value     = "true"
    }
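
Since the jobs are submitted programmatically, the same operator can be set through the Java SDK. A minimal sketch, reusing the Constraint and Job classes from your snippet (value is optional and defaults to true, so no LTarget/RTarget is needed):

// At most one allocation of this job per host.
Constraint distinctHosts = new Constraint();
distinctHosts.setOperand("distinct_hosts");

List<Constraint> constraints = new ArrayList<>();
constraints.add(distinctHosts);
job.setConstraints(constraints);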

Note that a hard constraint like this means there will be absolutely no colocation: if a job has more instances than there are distinct hosts, the extra instances will simply not be placed. The spread support in Nomad 0.9.0 (currently in beta3) will let you express a preference to avoid colocation without blocking placement in that case.

Chris

Faria Kalim
Mar 1, 2019, 12:33:18 PM
to Nomad
Thanks for getting back to me, Chris. That was actually the first thing I tried, and it did not work. (I did not specify the LTarget or RTarget, just as the documentation says.)

Constraint c = new Constraint();
c.setOperand("distinct_hosts");

List<Constraint> list = new ArrayList<>();
list.add(c);
job.setConstraints(list);

This did not work: I ran 11 jobs and 2 of them were colocated across the 5 nodes. Is there any way to debug this?

Thanks,
Faria

Chris Baker
Mar 1, 2019, 2:05:27 PM
to Nomad
Which version of Nomad are you using?

Faria Kalim
Mar 5, 2019, 10:33:33 PM
to Nomad
Hi Chris,

The Nomad version is 0.8.6 (ab54ebcfcde062e9482558b7c052702d4cb8aa1b+CHANGES).

Thanks,
Faria

Chris Baker
Mar 6, 2019, 10:21:01 AM
to Nomad
I'd like to see whether the constraint is actually being set on the job. 

Can you check the output of  `nomad job inspect <job-name>`? Alternatively, you can use the JobsApi from the Java SDK to `info()` the job and print the constraints from the returned job.

Faria Kalim
Mar 7, 2019, 1:28:00 AM
to Nomad
Thanks Chris. The constraint is indeed present when I `info` the job. What is a good way to debug this? My code snippet and output log are as follows:

Snippet:
// Register the job, then read it back and log its constraints.
EvaluationResponse response = apiClient.getJobsApi().register(job);
Job jobActual = apiClient.getJobsApi().info(job.getId()).getValue();
List<Constraint> constraints = jobActual.getConstraints();
LOG.info("jobId: " + jobActual.getId() + " constraints num: " + constraints.size());
for (Constraint constraint : constraints) {
  LOG.info("jobId: " + jobActual.getId()
      + " constraint operand: " + constraint.getOperand()
      + " constraint ltarget: " + constraint.getLTarget()
      + " constraint rtarget: " + constraint.getRTarget());
}
Log:
[2019-03-06 23:19:19 -0700] [INFO] org.apache.heron.scheduler.nomad.NomadScheduler: jobId: t1563c8bb34-28d6-4284-935f-4c22c6451cb3-3 constraints num: 1
[2019-03-06 23:19:19 -0700] [INFO] org.apache.heron.scheduler.nomad.NomadScheduler: jobId: t1563c8bb34-28d6-4284-935f-4c22c6451cb3-3 constraint operand: distinct_hosts constraint ltarget:  constraint rtarget:

Thanks,
Faria

Chris Baker
Mar 8, 2019, 2:38:49 PM
to Nomad
I think I may be misunderstanding what you're trying to accomplish. 

To clarify, the distinct_hosts constraint I recommended will prevent multiple instances of the same job from being placed on a single host. It has no effect across different jobs. 

There currently is no mechanism to prevent one job from running on the same node as another job; in fact, Nomad's scheduler is a bin-packing scheduler, which will typically encourage tasks from different jobs to be colocated.
Explicit job constraints (e.g., on node metadata or the node name) can produce the behavior you're looking for, with the risk that if a constraint cannot be satisfied, the job will not be placed. The downside is that you then have to manage that metadata or node assignment externally, which can get complicated.
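
For example, a minimal sketch of pinning a job to a specific node with the Java SDK (the node name "worker-03" is hypothetical; you could equally match on a custom attribute such as ${meta.dedicated} set in each client's configuration):

// Pin this job to a single, explicitly chosen node.
Constraint pinToNode = new Constraint();
pinToNode.setLTarget("${node.unique.name}"); // or a custom meta attribute
pinToNode.setOperand("=");
pinToNode.setRTarget("worker-03");           // hypothetical node name

List<Constraint> constraints = new ArrayList<>();
constraints.add(pinToNode);
job.setConstraints(constraints);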

There is one hack that has been used in the past to provide this capability: by requesting a static port resource, you can avoid colocating jobs. For example, if I have some number of jobs (job1, job2, and job3) among which I want an anti-affinity, I could add a static port resource requirement (specific to these jobs) so that no more than one of them can be scheduled on a given node:
resources {
  ...
  network {
    ...
    port "schedulinghack" {
      static = 31234
    }
  }
}

This works because there is only one port 31234 per node, so only one task reserving that port can be placed on any given node. You still have the work of keeping track of which "hack" ports you are using for each "job anti-affinity class", and you should avoid using ports that might be needed for bona fide purposes.
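
Programmatically, the reservation above would look roughly like the sketch below, assuming the Resources, NetworkResource, and Port model classes from the same Java SDK (the label and port number just mirror the HCL example):

// Reserve a static port so at most one of these jobs fits on any node.
Port hackPort = new Port();
hackPort.setLabel("schedulinghack");
hackPort.setValue(31234); // only one task per node can hold this port

NetworkResource network = new NetworkResource();
network.setReservedPorts(Arrays.asList(hackPort));

Resources resources = new Resources();
resources.setNetworks(Arrays.asList(network));

task.setResources(resources); // "task" is the Task added to the job's task group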

Faria Kalim
Mar 8, 2019, 3:06:17 PM
to Nomad
Thanks Chris! That is what I was looking to accomplish. So I'll just set a specific node as the host for each job and that'll work for my use case. 

One more thing: how does Nomad determine the capacity of a node? Each job asks for a particular set of resources, but I have not been able to figure out how Nomad knows how much each node can hold for bin packing.


Thanks,
Faria

Chris Baker
Mar 8, 2019, 5:00:19 PM
to Faria Kalim, Nomad
The Nomad client fingerprints each node to discover its available drivers and resources, and that fingerprinted capacity is what the scheduler bin-packs against:
https://www.nomadproject.io/guides/operations/agent/index.html
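
If you want to see that capacity programmatically, a sketch along these lines should work, assuming the SDK exposes a NodesApi analogous to the JobsApi used earlier, together with Node/NodeListStub/Resources model classes:

// Print the fingerprinted capacity of every node in the cluster.
for (NodeListStub stub : apiClient.getNodesApi().list().getValue()) {
  Node node = apiClient.getNodesApi().info(stub.getId()).getValue();
  Resources capacity = node.getResources(); // totals discovered by fingerprinting
  LOG.info("node " + node.getName()
      + " cpu(MHz)=" + capacity.getCpu()
      + " memory(MB)=" + capacity.getMemoryMb());
}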

