scheduler new feature: task dependencies


Niphlod

Aug 5, 2014, 4:51:25 AM
to web...@googlegroups.com
Hi @all,
   we have another feature in trunk for the scheduler... Jobs (i.e. task dependencies)

Directly from https://github.com/niphlod/w2p_scheduler_tests/ (which has been updated to accommodate the explanation of the new feature...)


What are "Jobs", you ask ? Well, it's a way to coordinate a set of tasks that have dependencies (what in Celery is called "Canvas").

As always, the Scheduler sticks to the basics. Every Job is considered to be a DAG (a  Directed Acyclic Graph).
Without going into silly details, every task can have one or more dependencies, but of course you can't have mutual dependencies among the same tasks.
If a "job" can't be represented as a DAG, then it can't be processed in its entirety. You can still queue it, but it won't ever complete (i.e. you could have a
  complete stall at the first task, or just one task out of 100 left queued...)

So... what can you do ?
Let's take a trivial example (there are a few based on mathematics, map/reduce, etc... but hey, this is an example!!!)
Suppose you need to create a job that describes what is needed to get dressed ( thanks to http://hansolav.net/sql/graphs.html )...

We have a few items to wear, and there's an "order" to respect...
Items are: watch, jacket, shirt, tie, pants, undershorts, belt, shoes, socks

Now, we can't put on the tie without wearing the shirt first, etc...




Suppose we have those tasks queued in a controller (for example's sake, the same function, with different task_name(s))...
watch = s.queue_task(fname, task_name='watch')
jacket = s.queue_task(fname, task_name='jacket')
shirt = s.queue_task(fname, task_name='shirt')
tie = s.queue_task(fname, task_name='tie')
pants = s.queue_task(fname, task_name='pants')
undershorts = s.queue_task(fname, task_name='undershorts')
belt = s.queue_task(fname, task_name='belt')
shoes = s.queue_task(fname, task_name='shoes')
socks = s.queue_task(fname, task_name='socks')


Now, there's a helper class to construct and validate a "job".
First, let's declare a job named "job_1"


from gluon.scheduler import JobGraph
myjob = JobGraph(db, 'job_1')



Next, we'd need to establish dependencies


# before the tie, comes the shirt
myjob.add_deps(tie.id, shirt.id)
# before the belt too comes the shirt
myjob.add_deps(belt.id, shirt.id)
# before the jacket, comes the tie
myjob.add_deps(jacket.id, tie.id)
# before the belt, come the pants
myjob.add_deps(belt.id, pants.id)
# before the shoes, comes the pants
myjob.add_deps(shoes.id, pants.id)
# before the pants, comes the undershorts
myjob.add_deps(pants.id, undershorts.id)
# before the shoes, comes the undershorts
myjob.add_deps(shoes.id, undershorts.id)
# before the jacket, comes the belt
myjob.add_deps(jacket.id, belt.id)
# before the shoes, comes the socks
myjob.add_deps(shoes.id, socks.id)



Then we can ask JobGraph whether the job we described can actually be completed:

myjob.validate('job_1')

And voilà, job done! If it's not a DAG, then an exception will be raised and the jobs won't be committed (and of course their dependencies won't be committed either).
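
For anyone wondering what "being a DAG" boils down to, here is a minimal standalone sketch of one way to check it: keep peeling off the tasks that have no unmet prerequisites, and complain if at some point nothing can be peeled off. It is plain Python over task names, not the JobGraph code from trunk; `DEPS` and `toposort_layers` are names made up for this illustration.

```python
# Illustrative only: this is NOT the JobGraph implementation.
# Each pair reads (task, prerequisite), mirroring add_deps(tie.id, shirt.id).
DEPS = [
    ("tie", "shirt"), ("belt", "shirt"), ("jacket", "tie"),
    ("belt", "pants"), ("shoes", "pants"), ("pants", "undershorts"),
    ("shoes", "undershorts"), ("jacket", "belt"), ("shoes", "socks"),
]


def toposort_layers(deps):
    """Return a list of sets; each set can run once the previous sets are done.

    Raises ValueError if the dependencies contain a cycle (i.e. not a DAG).
    """
    pending = {}
    for task, prereq in deps:
        pending.setdefault(task, set()).add(prereq)
        pending.setdefault(prereq, set())
    layers = []
    while pending:
        # Tasks with no remaining prerequisites can be scheduled now.
        ready = {t for t, prereqs in pending.items() if not prereqs}
        if not ready:
            raise ValueError("cyclic dependency among %s" % sorted(pending))
        layers.append(ready)
        pending = {t: prereqs - ready
                   for t, prereqs in pending.items() if t not in ready}
    return layers
```

With the dressing example, `toposort_layers(DEPS)` gives four layers: shirt, undershorts and socks first, then tie and pants, then belt and shoes, and finally the jacket. The watch has no dependencies at all, so it never shows up in `DEPS` and can be put on at any time. Something like `toposort_layers([("a", "b"), ("b", "a")])` raises `ValueError`.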

How does it work under the hood?

There's a new table called scheduler_task_deps that holds a reference to the job_name, the task parent, the task child and
a boolean to mark the "path" (the arrows in the graph) as "visitable".
To be fair, the job name isn't that important: you can have task dependencies among
different jobs, it's just not that easy to verify that the Job is a DAG at a later stage.
If a path is "visitable", it means that the graph can be "walked" in that direction.
Every time a task gets "COMPLETED", the "paths" get updated to be "visitable". The algorithm that picks up tasks has been updated
to fetch only tasks that have no dependencies, or whose dependencies have already been satisfied (i.e. tasks that depend
on nothing, or tasks that depend on tasks that are already COMPLETED).
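
To make the "visitable" idea a bit more concrete, here is a small in-memory simulation of the behaviour described above. It is only a sketch under assumptions: the dictionaries stand in for database rows, the field names `task`, `waits_for` and `visitable` are invented for the example, and none of this is the actual Scheduler code.

```python
# Illustrative simulation only: NOT the web2py Scheduler code.
# Each edge stands in for a row of the dependency table: the dependent task,
# the task it waits for, and whether that path is already "visitable".

def make_edges(deps):
    return [{"task": t, "waits_for": p, "visitable": False} for t, p in deps]


def runnable(status, edges):
    """Queued tasks whose incoming paths are all visitable (or that have none)."""
    blocked = {e["task"] for e in edges if not e["visitable"]}
    return sorted(t for t, s in status.items()
                  if s == "QUEUED" and t not in blocked)


def complete(task, status, edges):
    """Mark a task COMPLETED and open every path that was waiting on it."""
    status[task] = "COMPLETED"
    for e in edges:
        if e["waits_for"] == task:
            e["visitable"] = True


def run_all(tasks, deps):
    """Run tasks one at a time until nothing is runnable.

    Returns (order in which tasks completed, tasks still queued at the end).
    """
    status = {t: "QUEUED" for t in tasks}
    edges = make_edges(deps)
    order = []
    while True:
        ready = runnable(status, edges)
        if not ready:
            break
        complete(ready[0], status, edges)
        order.append(ready[0])
    leftovers = sorted(t for t, s in status.items() if s == "QUEUED")
    return order, leftovers
```

For example, `run_all(["shirt", "tie", "jacket"], [("tie", "shirt"), ("jacket", "tie")])` completes the three tasks in that order and leaves nothing queued. With mutual dependencies, as in `run_all(["a", "b", "c"], [("a", "b"), ("b", "a")])`, only `c` completes and `a` and `b` stay queued forever, which is the stall mentioned earlier for jobs that are not a DAG.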


Let me know what you think, and if you spot bugs.



Dave S

Aug 5, 2014, 2:52:40 PM
to web...@googlegroups.com


On Tuesday, August 5, 2014 1:51:25 AM UTC-7, Niphlod wrote:
Hi @all,
   we have another feature in trunk for the scheduler... Jobs (i.e. task dependencies)

Directly from https://github.com/niphlod/w2p_scheduler_tests/ (which has been updated to accommodate the explanation of the new feature...)


What are "Jobs", you ask ? Well, it's a way to coordinate a set of tasks that have dependencies (what in Celery is called "Canvas").
[...]
Let me know what you think, and if you spot bugs.

My first reaction is, "Sweet!"  The explanation is nice and simple and the hooks look very usable.  Now I just have to learn to use the scheduler   ;-)
(I haven't needed it yet, but I've been reading along with the posts here, so I have a vague idea of what to do.)

/dps

Limedrop

Aug 5, 2014, 4:30:22 PM
to web...@googlegroups.com
Thanks Niphlod, that's brilliant!  I've been using a work-around to schedule dependent jobs, and this will help to tidy things up.

Andrew W

Aug 6, 2014, 5:28:11 PM
to web...@googlegroups.com
Sounds great. Looking forward to testing it out. Thanks
Andrew W

Andrew W

Aug 6, 2014, 9:18:07 PM
to web...@googlegroups.com
P.S.  Although I expect they are not the same thing, I am interested to see how this dependency feature may be used to implement the workflow ideas expressed in:

Niphlod

Aug 7, 2014, 5:29:18 AM
to web...@googlegroups.com
I don't see exactly how. If you prepare a job, the "path" is already fixed... the idea behind a workflow is to make "turns" between several "paths" in reply to some events, user interactions or states.
That being said, you could theoretically queue tasks dynamically, and the added feature will help in case those tasks are "segments" of the same "path"... This means that a "workflow" that describes how to reach Los Angeles from Washington can be composed of "scheduler jobs" that connect 1-4 cities in a fixed way, but there still must be something ("workflows") that will "set the path to follow" by interconnecting "jobs".