Using "system" scheduler for periodic jobs?


Michael Ihde

Feb 23, 2018, 4:58:08 PM
to Nomad
I wish to define a periodic job that executes on all nodes that match the job's constraints. An example would be a cleanup script that I want to run once an hour on every node.

Currently periodic jobs only work with the "batch" scheduler. I tried using the 'distinct_hosts' constraint:

job "cleanup" {
  type = "batch"
  periodic { ... }

  group "docleanup" {
    count = NUMNODES
    constraint {
      operator = "distinct_hosts"
      value = "true"
    }
    task "cleanup" { ... }
  }
}

but it wasn't clear whether that would guarantee that the task runs once and only once on every node. I was able to demonstrate that if NUMNODES was greater than the number of actual nodes, the task was scheduled multiple times on the same node.

One workaround is to use a normal "system" job and have an external cron script call 'nomad run cleanup.nomad', but that doesn't seem very elegant.

Allowing periodic jobs to use the "system" scheduler would be very useful.  As before, if there is agreement on this feature request I'm happy to go off and try to implement it.  If there are alternate approaches to this let me know.

Regards,
~Michael

msch...@hashicorp.com

Feb 26, 2018, 1:45:03 PM
to Nomad
Hi Michael,

Unfortunately Nomad doesn't currently support periodic system jobs. Please follow this issue to be notified when the feature eventually lands! https://github.com/hashicorp/nomad/issues/1944

Michael Ihde

Feb 27, 2018, 10:11:37 PM
to Nomad
I've started prototyping an implementation of system/periodic jobs. It was relatively easy to implement, with one small issue. Like the service scheduler, the system scheduler expects that the task never exits; if the task exits, the restart logic is triggered. One approach I considered was changing RestartTracker, but even if you set onSuccess to false the child jobs remain in the running state after the task exits. With enough changes it could be made to work, but it was easier, and seemed more appropriate, to have system/periodic jobs dispatch their child jobs as batch jobs instead of system jobs.

I made a prototype of this (https://github.com/maihde/nomad/tree/periodic-system-jobs) and it generally provides the desired behaviour.  I'd appreciate comments before going further with testing and documentation.
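
For reference, a rough sketch of the kind of job file this is meant to accept. It assumes the prototype reuses the existing periodic stanza unchanged on a system job, which stock Nomad rejects; the datacenter name, driver, and script path are placeholders, not taken from the branch.

```hcl
# Hypothetical job file: only meaningful against the prototype branch,
# since released Nomad does not allow "periodic" on a system job.
job "cleanup" {
  datacenters = ["dc1"]   # placeholder datacenter name
  type        = "system"  # one allocation per eligible node, no count needed

  periodic {
    cron             = "0 * * * *"  # at minute 0 of every hour
    prohibit_overlap = true         # skip a launch if the previous run is still active
  }

  group "docleanup" {
    task "cleanup" {
      driver = "raw_exec"  # placeholder driver

      config {
        command = "/usr/local/bin/cleanup.sh"  # placeholder script path
      }
    }
  }
}
```

Compared with the batch attempt above, there is no count and no distinct_hosts constraint; node coverage would come from the system scheduler itself.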

Thanks,
~Michael

