$ nomad server-members
Name Address Port Status Protocol Build Datacenter Region
cptest-dev-master-00e94ad85f1d702fb.global 10.0.4.252 4648 alive 2 0.3.1 dc1 global
cptest-dev-master-048fc11bd951b6815.global 10.0.1.213 4648 alive 2 0.3.1 dc1 global
cptest-dev-master-0d8633708cf4778b7.global 10.0.2.222 4648 alive 2 0.3.1 dc1 global
$ nomad node-status
ID DC Name Class Drain Status
eeb88d9d dc1 cptest-dev-client-050b7e212203733f6 <none> false ready
$ nomad status
No running jobs
$ nomad init
Example job file written to example.nomad
$ nomad run example.nomad
==> Monitoring evaluation "96f07a2c"
Evaluation triggered by job "example"
Evaluation status changed: "pending" -> "failed"
==> Evaluation "96f07a2c" finished with status "failed"
$ nomad status example
ID = example
Name = example
Type = service
Priority = 50
Datacenters = dc1
Status = pending
Periodic = false
==> Evaluations
ID Priority Triggered By Status
01e73507 50 job-register blocked
96f07a2c 50 job-register failed
==> Allocations
ID Eval ID Node ID Task Group Desired Status
$ curl -s localhost:4646/v1/evaluations | python -m json.tool
[
{
"ID": "01e73507-8a4e-f961-3535-c6e9ca38e9de",
"Priority": 50,
"Type": "service",
"TriggeredBy": "job-register",
"JobID": "example",
"JobModifyIndex": 6,
"NodeID": "",
"NodeModifyIndex": 0,
"Status": "blocked",
"StatusDescription": "",
"Wait": 0,
"NextEval": "",
"PreviousEval": "96f07a2c-11a1-3916-f790-b37ab213794c",
"ClassEligibility": {
"v1:6305318303864028080": true
},
"EscapedComputedClass": false,
"CreateIndex": 8,
"ModifyIndex": 8
},
{
"ID": "96f07a2c-11a1-3916-f790-b37ab213794c",
"Priority": 50,
"Type": "service",
"TriggeredBy": "job-register",
"JobID": "example",
"JobModifyIndex": 6,
"NodeID": "",
"NodeModifyIndex": 0,
"Status": "failed",
"StatusDescription": "maximum attempts reached (5)",
"Wait": 0,
"NextEval": "",
"PreviousEval": "",
"ClassEligibility": null,
"EscapedComputedClass": false,
"CreateIndex": 7,
"ModifyIndex": 9
}
]
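To dig into which evaluation failed and why, a minimal sketch of filtering the failed evaluations out of that /v1/evaluations payload. Here evals.json is a hypothetical capture of the curl output above, trimmed to the relevant fields; against a live agent you would fill it with `curl -s localhost:4646/v1/evaluations > evals.json`:

```shell
# Hypothetical capture of the /v1/evaluations response (trimmed).
cat > evals.json <<'EOF'
[{"ID": "96f07a2c-11a1-3916-f790-b37ab213794c",
  "Status": "failed",
  "StatusDescription": "maximum attempts reached (5)"}]
EOF
# Print ID and StatusDescription for every failed evaluation.
python3 -c '
import json
for ev in json.load(open("evals.json")):
    if ev["Status"] == "failed":
        print(ev["ID"], ev["StatusDescription"], sep=": ")
'
```

The StatusDescription ("maximum attempts reached (5)") is often the only clue the API gives when no allocations were created.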
nomad fs ls <allocId of task> alloc/logs
$ curl -s localhost:4646/v1/allocations
[]
Try with either the HTTP APIs or:
nomad run -verbose example.nomad
It will give you more details about the job.
$ nomad run -verbose example.nomad
==> Monitoring evaluation "69bb3e59-8d56-f945-6331-d51faa9b1222"
Evaluation triggered by job "example"
Evaluation status changed: "pending" -> "failed"
==> Evaluation "69bb3e59-8d56-f945-6331-d51faa9b1222" finished with status "failed"
$ curl -XPOST localhost:4646/v1/job/example1 --data-binary '@example1.json'
{"EvalID":"","EvalCreateIndex":0,"JobModifyIndex":1358,"Index":0,"LastContact":0,"KnownLeader":false}
$ nomad status
ID Type Priority Status
example1 batch 50 running
$ nomad status example1/
ID = example1/periodic-1459831080
Name = example1/periodic-1459831080
Type = batch
Priority = 50
Datacenters = dc1
Status = pending
Periodic = false
==> Evaluations
ID Priority Triggered By Status
71a5b285 50 periodic-job blocked
1e44d4af 50 periodic-job failed
==> Allocations
ID Eval ID Node ID Task Group Desired Status
nomad agent -log-level DEBUG -config /var/lib/nomad/server.hcl -bind $COREOS_PRIVATE_IPV4
nomad agent -log-level DEBUG -config /var/lib/nomad/client.hcl -bind $COREOS_PRIVATE_IPV4
data_dir = "/var/lib/nomad/data"
disable_update_check = true
addresses {
  http = "0.0.0.0"
}
server {
  enabled          = true
  bootstrap_expect = 3
  retry_join       = ["consul.service.consul"]
}
data_dir = "/var/lib/nomad/data"
disable_update_check = true
client {
  enabled = true
  servers = ["server.nomad.service.consul"]
  reserved {
    cpu            = 500
    memory         = 512
    disk           = 10000
    reserved_ports = "22,8300-8600"
  }
}
Also, per the config files you have 3 servers with clients attached. Do you have Docker installed on each client?
$ nomad node-status
ID DC Name Class Drain Status
eeb88d9d dc1 cptest-dev-client-050b7e212203733f6 <none> false ready
$ nomad node-status eeb88d9d
ID = eeb88d9d
Name = cptest-dev-client-050b7e212203733f6
Class = <none>
DC = dc1
Drain = false
Status = ready
Attributes = arch:amd64, consul.datacenter:us-west-2, consul.revision:26a0ef8c41aa2252ab4cf0844fc6470c8e1d8256, consul.server:false, consul.version:0.6.4, cpu.frequency:2500.092000, cpu.modelname:Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz, cpu.numcores:1, cpu.totalcompute:2500.092000, driver.docker:1, driver.docker.version:1.9.1, driver.exec:1, driver.rkt:1, driver.rkt.appc.version:0.7.4, driver.rkt.version:1.0.0, hostname:cptest-dev-client-050b7e212203733f6, kernel.name:linux, kernel.version:4.3.6-coreos, memory.totalbytes:2101010432, os.name:coreos, os.version:899.13.0, platform.aws.instance-type:t2.small, platform.aws.placement.availability-zone:us-west-2c, unique.cgroup.mountpoint:/sys/fs/cgroup, unique.consul.name:cptest-dev-client-050b7e212203733f6, unique.network.ip-address:10.0.4.198, unique.platform.aws.ami-id:ami-5bc4313b, unique.platform.aws.hostname:ip-10-0-4-198.service.consul, unique.platform.aws.instance-id:i-050b7e212203733f6, unique.platform.aws.local-hostname:ip-10-0-4-198.service.consul, unique.platform.aws.local-ipv4:10.0.4.198, unique.platform.aws.public-hostname:ec2-54-191-238-185.us-west-2.compute.amazonaws.com, unique.platform.aws.public-ipv4:54.191.238.185, unique.storage.bytesfree:97222623232, unique.storage.bytestotal:101552205824, unique.storage.volume:/dev/xvda9
==> Allocations
ID Eval ID Job ID Task Group Desired Status Client Status
==> Resource Utilization
CPU Memory MB Disk MB IOPS
0 0 0 0
Everything looks fine. Try the commands below:
sudo docker stop $(docker ps -a -q)
sudo docker rm $(docker ps -a -q)
This stops all containers and then removes them.
I suggest this because some time ago I hit an issue where a job that had previously run perfectly started failing. The reason was that somehow the containers were not being destroyed. I ran the two commands above, re-ran the job, and it started working again.
core@cptest-dev-client-050b7e212203733f6 ~ $ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
core@cptest-dev-client-050b7e212203733f6 ~ $
--
This mailing list is governed under the HashiCorp Community Guidelines - https://www.hashicorp.com/community-guidelines.html. Behavior in violation of those guidelines may result in your removal from this mailing list.
GitHub Issues: https://github.com/hashicorp/nomad/issues
IRC: #nomad-tool on Freenode
---
You received this message because you are subscribed to the Google Groups "Nomad" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nomad-tool+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nomad-tool/c8e2997d-a696-4ceb-9f15-7ed4512a9685%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
servers = ["server.nomad.service.consul"]
retry_join = ["consul.service.consul"]
Can you try giving IP addresses instead of resolving them using interpreted variables?
Also check the Nomad log file; it will show which nodes it has joined.
If the interpreted variables are not returning anything, your Nomad server will be alone in the cluster and the job will never be sent to any client, as the server won't run jobs on its own machine.
The blocked eval was created because Nomad couldn't find a suitable node to run the job. So the failed allocation will have details around why Nomad couldn't find any suitable nodes. Can you please share the output of "nomad alloc-status 96f07a2c"?
$ nomad run example.nomad
==> Monitoring evaluation "b092d115"
Evaluation triggered by job "example"
Evaluation status changed: "pending" -> "failed"
==> Evaluation "b092d115" finished with status "failed"
$ nomad alloc-status b092d115
No allocation(s) with prefix or id "b092d115" found
You are using the evaluation ID there; you need to use the allocation ID. Running "nomad status example" should give you a list of allocations, where you can find the allocation ID of the failed allocation.
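As a sketch of pulling allocation IDs out of that table: the Allocations section of `nomad status <job>` output can be fed through awk to print just the first column. The sample table and the allocation ID a1b2c3d4 below are made up for illustration:

```shell
# Made-up sample of the Allocations table from `nomad status <job>`.
cat > status.txt <<'EOF'
ID        Eval ID   Node ID   Task Group  Desired  Status
a1b2c3d4  96f07a2c  eeb88d9d  cache       run      failed
EOF
# Print the first column of every row after the header line.
awk 'f && NF { print $1 } /^ID +Eval ID/ { f = 1 }' status.txt
```

Each printed ID can then be passed to "nomad alloc-status <id>".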
Oh I see, I didn't realize that no allocations were created.
What does nomad eval-monitor say for the failed eval?
==> Monitoring evaluation "b092d115"
Evaluation triggered by job "example"
Evaluation status changed: "pending" -> "failed"
==> Evaluation "b092d115" finished with status "failed"
Can you try giving IP addresses instead of resolving them using interpreted variables?
Also check the Nomad log file; it will show which nodes it has joined.
If the interpreted variables are not returning anything, your Nomad server will be alone in the cluster and the job will never be sent to any client, as the server won't run jobs on its own machine.
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] serf: EventMemberJoin: cptest-dev-master-0d8633708cf4778b7.global 10.0.2.222
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] nomad: starting 1 scheduling worker(s) for [service batch system _core]
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] agent: Joining cluster...
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] raft: Node at 10.0.2.222:4647 [Follower] entering Follower state
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] nomad: adding server cptest-dev-master-0d8633708cf4778b7.global (Addr: 10.0.2.222:4647) (DC: dc1)
Apr 04 06:47:36 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:36 [INFO] agent: Join completed. Synced with 1 initial agents
Apr 04 06:47:41 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:41 [INFO] serf: EventMemberJoin: cptest-dev-master-048fc11bd951b6815.global 10.0.1.213
Apr 04 06:47:41 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:41 [INFO] nomad: adding server cptest-dev-master-048fc11bd951b6815.global (Addr: 10.0.1.213:4647) (DC: dc1)
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] serf: EventMemberJoin: cptest-dev-master-00e94ad85f1d702fb.global 10.0.4.252
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] nomad: adding server cptest-dev-master-00e94ad85f1d702fb.global (Addr: 10.0.4.252:4647) (DC: dc1)
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] nomad: Attempting bootstrap with nodes: [10.0.2.222:4647 10.0.1.213:4647 10.0.4.252:4647]
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] raft: Node at 10.0.2.222:4647 [Candidate] entering Candidate state
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] raft: Election won. Tally: 2
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] raft: Node at 10.0.2.222:4647 [Leader] entering Leader state
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] nomad: cluster leadership acquired
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] raft: pipelining replication to peer 10.0.1.213:4647
Apr 04 06:47:43 cptest-dev-master-0d8633708cf4778b7 nomad[792]: 2016/04/04 06:47:43 [INFO] raft: pipelining replication to peer 10.0.4.252:4647
Can you test running this job?
job "test" {
[...]
==> Monitoring evaluation "0ee90e3d"
Evaluation triggered by job "test"
Evaluation status changed: "pending" -> "failed"
==> Evaluation "0ee90e3d" finished with status "failed"
reserved_ports = "22,8300-8600"
I am running 0.3.2 and came across this thread because I was having the same problem. I did the same: commented out reserved_ports = "22,4194,7301,8300-8600", restarted the client, and it worked. Did this fix actually get pushed out?
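Based on the reports in this thread, a sketch of the workaround: the client stanza from earlier in the thread with reserved_ports commented out. This is only what the posters above describe, not a confirmed fix for every version:

```hcl
client {
  enabled = true
  servers = ["server.nomad.service.consul"]

  reserved {
    cpu    = 500
    memory = 512
    disk   = 10000
    # Workaround reported in this thread: leave reserved_ports commented
    # out (and restart the client) until the underlying bug is fixed in
    # your Nomad version.
    # reserved_ports = "22,8300-8600"
  }
}
```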
I am certain that I was running 0.3.2, since I never ran anything before that. I am running a cluster of 3 servers and 2 client nodes; no machine acts as both. Not sure if that matters. I cannot reproduce it using the steps given in the issue, only when I set up the 5 servers.
FWIW - I can repro this on 0.4.0 and 0.4.1-rc1 - on a single node server+client, using a couple of reserved port ranges, everything works fine. Running separate server and client nodes, if the client config specifies _any_ reserved ports, no job can run - the evaluation fails with no useful output anywhere and no allocations are created. If the reserved ports are removed from the client config, everything is fine. This is reproducible with any job, including trivial examples.
Hey Bagelswitch,
I just verified this by adding reserved_ports = "20000-59990" to a client while asking for dynamic ports, and it worked. Please let me know how you are reproducing this. It would be best if you opened an issue.
Thanks,
Alex