Running the same batch task multiple times

Wes Higbee

Mar 29, 2016, 4:10:55 PM
to Nomad
Will Nomad run the same allocation multiple times even if I don't ask for a count of more than 1? I've run into several situations where Nomad runs the same task repeatedly. Is this a known issue? It happened when I didn't have enough nodes to handle the load of a batch job and work was queued up. Perhaps something duplicated allocations after a scheduling placement failure? The duplication mostly affected one task: there were 40 copies of it running simultaneously! A few other tasks in my batch were running 2 to 4 copies.

I'm using Nomad v0.3.1 with Docker 1.10.

Alex Dadgar

Mar 29, 2016, 5:30:17 PM
to Wes Higbee, Nomad
This sounds like a bug. Would you mind filing an issue with steps to reproduce? There are a few scheduler bug fixes arriving in Nomad 0.3.2, so we would love to fix this too before the release.

Thanks,
Alex Dadgar

Wes Higbee

Mar 29, 2016, 11:23:58 PM
to Alex Dadgar, Nomad
I'm close to making the issue reproducible. I'll let you know if I find the unique conditions.

I do know that the issue happens after I reboot my Nomad server after leaving it unused overnight. I have to nuke the tmp data directory to get Nomad to work again :(

Wes Higbee

Mar 30, 2016, 9:08:44 PM
to Nomad
I ran into this again today. Many jobs executed just fine, then all of a sudden one task in one job executed 170 times! Is it possible there's a bug in the batch scheduling logic that makes Nomad treat a batch job as a service job?

What can I pull from log files to help figure out why this might be happening?
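
One thing that might help in a bug report like this is a tally of how many allocations with the same name are running at once. The following is only a rough sketch, not something from the thread: it assumes you have already fetched the job's allocation list as JSON (for example from Nomad's HTTP API) and that each entry carries `Name` and `ClientStatus` fields. The function name, the `expected_count` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def duplicate_running_allocations(allocations, expected_count=1):
    """Return {allocation name: running count} for every name that has
    more running allocations than expected_count.

    `allocations` is a list of dicts; the `Name` and `ClientStatus` keys
    are assumed to match the shape of Nomad's allocation listing.
    """
    running = Counter(
        alloc["Name"]
        for alloc in allocations
        if alloc.get("ClientStatus") == "running"
    )
    return {name: n for name, n in running.items() if n > expected_count}


# Hypothetical sample data, shaped like a parsed allocation list.
sample = [
    {"Name": "etl.convert[0]", "ClientStatus": "running"},
    {"Name": "etl.convert[0]", "ClientStatus": "running"},
    {"Name": "etl.convert[0]", "ClientStatus": "running"},
    {"Name": "etl.convert[0]", "ClientStatus": "complete"},
    {"Name": "etl.upload[0]", "ClientStatus": "running"},
]

if __name__ == "__main__":
    print(duplicate_running_allocations(sample))
```

Attaching the resulting counts to the issue, together with the server logs from around the time the duplicates appeared, should make the report easier to act on.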