Looking for tips to troubleshoot constraints that are not behaving as expected.

Marc Slayton

Oct 29, 2017, 3:37:50 PM
to Nomad
Hey all -- 

I have a job with some constraints at the TaskGroup level that aren't working as I'd expect. 
The cause is probably my own naivete, but I've had some difficulty understanding how
to troubleshoot scheduling constraints in general, so I'm reaching out for tips. 

Background: 

I'm using these two constraint examples, which are directly from the docs: 

  constraint {
    attribute = "${attr.platform.aws.instance-type}"
    value     = "m4.2xlarge"
  }
  constraint {
    distinct_hosts = true
  }
Also, I am using 'count = 2' to spawn two instances of my task on two separate hosts.

The constraints are defined at the TaskGroup level. I can see them represented in 
the output of 'nomad deployment status' when the job is run, like so: 

                "Constraints": [
                    {
                        "LTarget": "",
                        "Operand": "distinct_hosts",
                        "RTarget": "true"
                    },
                    {
                        "LTarget": "${attr.platform.aws.instance-type}",
                        "Operand": "=",
                        "RTarget": "m4.2xlarge"
                    }
                ],
                "Count": 2,

so I'm confident they are being added. 
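
When comparing these entries against scheduler output, it can help to flatten each constraint into a single line. The sketch below is only an illustration: it joins LTarget, Operand and RTarget with spaces, which matches the form `${attr.kernel.name} = linux` that appears in the eval-status output later in this thread. The helper names `describe` and `describe_all` are made up for this example and are not part of Nomad.

```python
import json

# Constraint list copied from the job output quoted above.
CONSTRAINTS_JSON = """
[
    {"LTarget": "", "Operand": "distinct_hosts", "RTarget": "true"},
    {"LTarget": "${attr.platform.aws.instance-type}", "Operand": "=", "RTarget": "m4.2xlarge"}
]
"""


def describe(constraint):
    """Join the three fields into one line, skipping any empty field."""
    parts = [constraint["LTarget"], constraint["Operand"], constraint["RTarget"]]
    return " ".join(p for p in parts if p)


def describe_all(raw_json):
    """Parse a JSON list of constraints and describe each one."""
    return [describe(c) for c in json.loads(raw_json)]


if __name__ == "__main__":
    for line in describe_all(CONSTRAINTS_JSON):
        print(line)
```

The resulting strings are convenient to search for in other output, for example when checking which constraint filtered a node.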

The expected behavior is that tasks would be scheduled on the two m4.2xlarge instances, one per host. 
What happens is the tasks are scheduled on two distinct hosts, but often on instances that are NOT m4.2xlarge. 

So the 'distinct_hosts' constraint seems to be honored, but the instance-type constraint seems
to be silently discarded.

I have some general questions about how to troubleshoot this: 

1.) Is there a way to find out how 'attr.platform.aws.instance-type' is being 
evaluated as part of the deployment? 

2.) It seems like there should be a way to ask nomad how it made its 
scheduling decisions, but the docs left me scratching my head. 

The documentation explains how to use 'nomad eval-status <eval>',
which I can use to track down the specific eval that was used to erroneously 
assign my task to the non-2xlarge host. 

However, the output doesn't give any additional clues: 

> nomad eval-status XXXXXXXX
ID                 = XXXXXXXX
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = testjob
Priority           = 50
Placement Failures = false

As I said, I'm not experienced with nomad but I'm interested in digging deeper. 
Any advice on how to troubleshoot scheduling constraints like this one would be 
greatly appreciated. 

Marc Slayton

Oct 29, 2017, 3:43:52 PM
to Nomad
I was able to answer my own question on this by using the command 
# nomad node-status -verbose <node_id>
That gave me the necessary output to find the problem. 
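
For anyone doing the same check by hand, here is a rough sketch of the comparison. It is a toy model, not Nomad's scheduler code. It assumes node attributes are keyed without the `attr.` prefix (for example `platform.aws.instance-type`), and that an `=` constraint fails when the attribute is missing. The node names and attribute values are hypothetical; real values would come from `nomad node-status -verbose <node_id>`.

```python
def resolve_target(target, attributes, meta=None):
    """Resolve an interpolated target such as "${attr.kernel.name}".

    Returns None when the target cannot be resolved against the node.
    Only the attr. and meta. prefixes are handled in this sketch.
    """
    meta = meta or {}
    if not (target.startswith("${") and target.endswith("}")):
        return target  # plain literal, nothing to look up
    name = target[2:-1]
    if name.startswith("attr."):
        return attributes.get(name[len("attr."):])
    if name.startswith("meta."):
        return meta.get(name[len("meta."):])
    return None


def equals_constraint_holds(ltarget, rtarget, attributes):
    """True only if the node has the attribute and it equals rtarget."""
    value = resolve_target(ltarget, attributes)
    return value is not None and value == rtarget


# Hypothetical attribute maps for three nodes.
NODES = {
    "node-a": {"kernel.name": "linux", "platform.aws.instance-type": "m4.2xlarge"},
    "node-b": {"kernel.name": "linux", "platform.aws.instance-type": "t2.medium"},
    "node-c": {"kernel.name": "linux"},  # attribute missing entirely
}

if __name__ == "__main__":
    for node_id, attrs in NODES.items():
        ok = equals_constraint_holds(
            "${attr.platform.aws.instance-type}", "m4.2xlarge", attrs
        )
        print(node_id, "eligible" if ok else "filtered")
```

Running the same comparison against each node's real attributes shows which nodes should have been eligible for the constraint.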

FWIW -- I'm still interested in any tips for understanding how a 
particular scheduling event was decided by nomad. 

Alex Dadgar

Oct 30, 2017, 1:30:31 PM
to Marc Slayton, Nomad
Hey Marc,

You can use `nomad plan` to do a dry run of the scheduler, and it will annotate the changes that cause the scheduler to make certain decisions. Further, you can look at the eval-status output to see which constraints caused nodes to not be considered.

So in this example I ran a job constrained to run on Linux on my Mac:

```
$ nomad eval-status -verbose a1475c78-e765-dfde-c714-118b5f214ac4
ID                 = a1475c78-e765-dfde-c714-118b5f214ac4
Status             = complete
Status Description = complete
Type               = service
TriggeredBy        = job-register
Job ID             = example
Priority           = 50
Placement Failures = true
Previous Eval      = <none>
Next Eval          = <none>
Blocked Eval       = 5886fb65-6943-4af5-06e7-4f947a518de9

Failed Placements
Task Group "cache" (failed to place 1 allocation):
  * Constraint "${attr.kernel.name} = linux" filtered 1 nodes

Evaluation "5886fb65-6943-4af5-06e7-4f947a518de9" waiting for additional capacity to place remainder

$ nomad eval-status -json a1475c78-e765-dfde-c714-118b5f214ac4
{
    "AnnotatePlan": false,
    "BlockedEval": "5886fb65-6943-4af5-06e7-4f947a518de9",
    "ClassEligibility": null,
    "CreateIndex": 8,
    "DeploymentID": "507e21e7-7ede-f103-761c-1e4e012b9047",
    "EscapedComputedClass": false,
    "FailedTGAllocs": {
        "cache": {
            "AllocationTime": 26687,
            "ClassExhausted": null,
            "ClassFiltered": null,
            "CoalescedFailures": 0,
            "ConstraintFiltered": {
                "${attr.kernel.name} = linux": 1
            },
            "DimensionExhausted": null,
            "NodesAvailable": {
                "dc1": 1
            },
            "NodesEvaluated": 1,
            "NodesExhausted": 0,
            "NodesFiltered": 1,
            "QuotaExhausted": null,
            "Scores": null
        }
    },
    "ID": "a1475c78-e765-dfde-c714-118b5f214ac4",
    "JobID": "example",
    "JobModifyIndex": 7,
    "ModifyIndex": 11,
    "Namespace": "default",
    "NextEval": "",
    "NodeID": "",
    "NodeModifyIndex": 0,
    "PreviousEval": "",
    "Priority": 50,
    "QueuedAllocations": {
        "cache": 1
    },
    "QuotaLimitReached": "",
    "SnapshotIndex": 8,
    "Status": "complete",
    "StatusDescription": "",
    "TriggeredBy": "job-register",
    "Type": "service",
    "Wait": 0
}
```
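
If the JSON form is easier to work with, a small script can pull out the filtering reasons. The following is only a sketch: it embeds a trimmed copy of the output above, and the function name `summarize_failures` is made up for this example. It assumes `ConstraintFiltered` and `DimensionExhausted` are either null or maps from a reason to a node count, as in the sample.

```python
import json

# Trimmed copy of the `nomad eval-status -json` output shown above.
EVAL_JSON = """
{
    "FailedTGAllocs": {
        "cache": {
            "ConstraintFiltered": {"${attr.kernel.name} = linux": 1},
            "DimensionExhausted": null,
            "NodesEvaluated": 1,
            "NodesExhausted": 0,
            "NodesFiltered": 1
        }
    },
    "ID": "a1475c78-e765-dfde-c714-118b5f214ac4",
    "JobID": "example"
}
"""


def summarize_failures(evaluation):
    """Return one readable line per filtering reason, per task group."""
    lines = []
    failed = evaluation.get("FailedTGAllocs") or {}
    for group, metrics in sorted(failed.items()):
        evaluated = metrics.get("NodesEvaluated", 0)
        for constraint, count in (metrics.get("ConstraintFiltered") or {}).items():
            lines.append(
                f'{group}: constraint "{constraint}" filtered {count} of {evaluated} nodes'
            )
        for dimension, count in (metrics.get("DimensionExhausted") or {}).items():
            lines.append(f"{group}: {dimension} exhausted on {count} nodes")
    return lines


if __name__ == "__main__":
    for line in summarize_failures(json.loads(EVAL_JSON)):
        print(line)
```

Note that an evaluation with no placement failures has nothing under `FailedTGAllocs`, so the summary comes back empty in that case.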

Thanks,
Alex
