System job with node.class constraint.

Cyril Gaudin

Feb 23, 2017, 11:18:03 AM
to Nomad
Hi,

We tried to use a system job with Nomad node-class filtering, and we see strange behavior in the "nomad run" exit code and output.

Our use case:

We have a system job that runs on an autoscaling group (on AWS).
The instances of this group have the Nomad class "foo", so the job definition looks like:

job "test" {
    datacenters = ["dc1"]

    type = "system"

    constraint {
        attribute = "${node.class}"
        value     = "foo"
    }

    [...]
}
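
For reference, the class on each instance comes from its own Nomad client config. A minimal sketch (node_class is the standard client config option; "foo" is just our value):

client {
    enabled    = true
    node_class = "foo"
}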

So the job is deployed on all servers in the autoscaling group, and if we scale the group up, Nomad automatically deploys the job on the newly instantiated server.

It's really cool, but at job submission we get strange output.

Here are our (simplified) cluster nodes:
 - A: Instance with class="bar"
 - B: Instance with class="bar"
 - C: Instance with class="baz"
Autoscaling group:
 - D1: Instance with class="foo"

When we run the job above we have the following output:

==> Monitoring evaluation "d1e000cd"
    Evaluation triggered by job "test"
    Allocation "51b3d960" modified: node "a45700d3", group "test"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "d1e000cd" finished with status "complete" but failed to place all allocations:
    Task Group "test" (failed to place 3 allocations):
      * Class "bar" filtered 1 nodes
      * Constraint "${node.class} = foo" filtered 1 nodes

I think it's because a system job has only one evaluation, but these numbers are weird:
 - Class "bar" actually filtered 2 nodes.
 - The node.class constraint filtered 3 nodes (or indeed 1 if we subtract the previous line).

Also, the output contains a specific line for class "bar" but not for class "baz"? (That's pretty weird.)
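
If it helps with debugging, the raw counters behind those lines can be fetched from the evaluation's FailedTGAllocs metrics via the HTTP API. A sketch, assuming a local agent and jq (the eval ID comes from the output above, and "test" is our task group name):

# Show the placement metrics for task group "test" from evaluation d1e000cd
curl -s http://localhost:4646/v1/evaluation/d1e000cd \
    | jq '.FailedTGAllocs.test | {NodesEvaluated, ClassFiltered, ConstraintFiltered}'

ClassFiltered and ConstraintFiltered are maps of filter reason to node count, so this should show where the CLI's numbers come from.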

And our main problem is that the exit code of the "nomad run" command is 2.

I read in the code:
"On successful job submission and scheduling, exit code 0 will be
returned. If there are job placement issues encountered
(unsatisfiable constraints, resource exhaustion, etc), then the
exit code will be 2. Any other errors, including client connection
issues or internal errors, are indicated by exit code 1."

But in this case we can't programmatically tell whether Nomad managed to place at least one allocation or placed none at all.
In both cases, the exit code will be 2.
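
As a workaround, we could probably query the evaluation ourselves after "nomad run" and decide based on what was actually placed. A rough sketch against the HTTP API (assuming a local agent and jq; not an official recipe):

EVAL_ID=d1e000cd  # from the "Monitoring evaluation" line
# Allocations actually created/updated by this evaluation
PLACED=$(curl -s "http://localhost:4646/v1/evaluation/${EVAL_ID}/allocations" | jq 'length')
# Task groups with placement failures (jq's length on null is 0)
FAILED=$(curl -s "http://localhost:4646/v1/evaluation/${EVAL_ID}" | jq '.FailedTGAllocs | length')
if [ "$PLACED" -gt 0 ] && [ "$FAILED" -gt 0 ]; then
    echo "partial placement: $PLACED allocation(s) placed, some failed"
elif [ "$PLACED" -gt 0 ]; then
    echo "all allocations placed"
else
    echo "no allocations placed"
fi

But it would be nicer if the exit code made this distinction itself.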

Is this a good way to use constraints with a system job?
If yes (and I hope so :)), are the output and exit code normal?

Otherwise I can open issues on GitHub!

Thanks !

Cyril.

Alex Dadgar

Feb 27, 2017, 1:48:51 PM
to Nomad, Cyril Gaudin
Hey Cyril,

I think you can open an issue to improve the exit code for system jobs.

Thanks,
Alex Dadgar

dma...@istreamplanet.com

Sep 11, 2018, 1:27:14 PM
to Nomad
I'm having this same problem.