Task scheduler as a library?

Alex Buchanan

Oct 11, 2017, 3:14:46 PM

to golang-nuts

Hey all,

In Funnel (a distributed task toolkit) we're sort of dancing around having a full-on scheduler. We have a scheduler that has grown from a development utility into a prototype and then into something we actually use, but it's missing many of the features you'd want in production. Mostly we aim to delegate scheduling to another application (SGE, Slurm, AWS Batch, Kubernetes, etc.), but being able to schedule tasks without extra infrastructure is undeniably attractive.

Writing a scheduler is one of those things people warn you away from though. I wish there was a solid library we could embed, but I haven't found anything.

I wanted to get some opinions from this community. Do you know of any scheduling libraries? Do you think having scheduling built in is a good idea? A bad idea? Should we keep chipping away at it? Would people be interested in a standalone scheduling library, or is this problem inherently too complex to be adequately captured in library form?

Thanks!

Funnel: https://github.com/ohsu-comp-bio/funnel

Jimmy Tang

Oct 15, 2017, 3:00:03 PM

to golang-nuts

Not to throw a spanner into the works, but we have a similar problem in my work environment: we need a scheduler for distributed jobs. One problem with writing a *nice* one for a given language is that you end up pigeonholed into one solution. We've been looking at using DRMAA as a way of accessing different schedulers in a more platform- and language-agnostic way. It may be worth your while to take a look at the Go bindings for DRMAA so you aren't left reinventing the wheel. Maybe creating a dumb scheduler for the DRMAA library might be the way to go?

Alex Buchanan

Oct 15, 2017, 8:53:25 PM

to golang-nuts

Not a spanner at all.

I think the Task Execution Schemas (TES) [1], which Funnel is based on, is a reinvention of DRMAA using technologies such as HTTP, REST, JSON, Protobuf. It's a pretty simple API and message type (Task) for create, get, list, cancel. But, admittedly, I don't know enough about DRMAA. I get a bit overwhelmed by its documentation, to be honest.
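For readers unfamiliar with TES, the shape of such a create/get/list/cancel API can be sketched in Go. This is a minimal in-memory illustration under stated assumptions, not Funnel's or the TES spec's actual interface: the `TaskService` interface, the pared-down `Task` struct, the two states, and the `task-N` ID scheme are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
	"sync"
)

// TaskState is a hypothetical subset of task states; the real spec defines more.
type TaskState string

const (
	Queued   TaskState = "QUEUED"
	Canceled TaskState = "CANCELED"
)

// Task is a hypothetical, pared-down task message.
type Task struct {
	ID      string
	Command []string
	State   TaskState
}

// TaskService exposes the four operations: create, get, list, cancel.
type TaskService interface {
	CreateTask(t Task) (string, error)
	GetTask(id string) (Task, error)
	ListTasks() []Task
	CancelTask(id string) error
}

var errNotFound = errors.New("task not found")

// memService is an in-memory TaskService guarded by a mutex.
type memService struct {
	mu     sync.Mutex
	nextID int
	tasks  map[string]Task
}

func newMemService() *memService {
	return &memService{tasks: make(map[string]Task)}
}

func (s *memService) CreateTask(t Task) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nextID++
	t.ID = fmt.Sprintf("task-%d", s.nextID)
	t.State = Queued
	s.tasks[t.ID] = t
	return t.ID, nil
}

func (s *memService) GetTask(id string) (Task, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tasks[id]
	if !ok {
		return Task{}, errNotFound
	}
	return t, nil
}

func (s *memService) ListTasks() []Task {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]Task, 0, len(s.tasks))
	for _, t := range s.tasks {
		out = append(out, t)
	}
	// Map iteration order is random, so sort by ID for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func (s *memService) CancelTask(id string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tasks[id]
	if !ok {
		return errNotFound
	}
	t.State = Canceled
	s.tasks[id] = t
	return nil
}

func main() {
	var svc TaskService = newMemService()
	id, _ := svc.CreateTask(Task{Command: []string{"echo", "hello"}})
	_ = svc.CancelTask(id)
	t, _ := svc.GetTask(id)
	fmt.Println(t.ID, t.State) // task-1 CANCELED
}
```

A real service would put this interface behind HTTP/JSON or gRPC and hand queued tasks to a scheduler or backend; the in-memory map just keeps the sketch self-contained.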

Funnel is an implementation of the TES spec. We'd like to keep it versatile for the reason you mentioned: many solutions end up feeling heavyweight and you get pigeonholed. We also think a lot about workflows, and the status quo is similar there. Funnel already supports many environments and schedulers: GCE, AWS, HTCondor, SGE, etc. We're talking about adding Kubernetes. We're always thinking of ways to make it easier and more flexible, hence the question of how far we should take the scheduler. If you need to run 10K tasks in a new GCE project on preemptible machines, how easy can we make that?

I encourage you to take a look at Funnel, let me know what you think. If it's lacking something that you need, I'd be interested in hearing about it.

Anywho, I'll stop ranting now. Thanks for the feedback!

-Alex

[1] https://github.com/ga4gh/task-execution-schemas

Jimmy Tang

Oct 17, 2017, 2:32:17 AM

to golang-nuts

On Monday, 16 October 2017 01:53:25 UTC+1, Alex Buchanan wrote:

> Not a spanner at all.
>
> I think the Task Execution Schemas (TES) [1], which Funnel is based on, is a reinvention of DRMAA using technologies such as HTTP, REST, JSON, Protobuf. It's a pretty simple API and message type (Task) for create, get, list, cancel. But, admittedly, I don't know enough about DRMAA. I get a bit overwhelmed by its documentation, to be honest.

Yeah, the DRMAA documentation is quite overwhelming; I think it was written a long time ago by Sun/SGI users.

> Funnel is an implementation of the TES spec. We'd like to keep it versatile for the reason you mentioned; many solutions end up feeling heavyweight and you get pigeonholed. We also think a lot about workflows, and the status quo is similar there. Funnel already supports many environments and schedulers: GCE, AWS, HTCondor, SGE, etc.
>
> We're talking about adding Kubernetes. We're always thinking of ways to make it easier and more flexible, hence the thoughts about how far we should take the scheduler. If you need to run 10K tasks in a new GCE project on preemptible machines, how easy can we make that?

Having tried things like gleam/glow (see https://github.com/chrislusf/gleam) and dask (which has similar ideas but is written in Python), I think a FIFO work-stealing scheduler wouldn't be a bad idea. In particular, dask's approach is quite nicely put together. One thing I have noticed with a lot of these non-HPC schedulers, though, is that they are great for embarrassingly parallel problems but fall apart reasonably quickly once you get into MPI territory, where you want to be careful about how you lay out the processes and network for your job. Those are probably things to consider when writing a scheduler.

> I encourage you to take a look at Funnel, let me know what you think. If it's lacking something that you need, I'd be interested in hearing about it.

Will do. This looks interesting and may address my need for a web API on top of Slurm.

Alex Buchanan

Oct 17, 2017, 2:04:55 PM

to golang-nuts

> Having tried things like gleam/glow (see https://github.com/chrislusf/gleam) and dask (which has similar ideas but is written in Python), I think a FIFO work-stealing scheduler wouldn't be a bad idea. In particular, dask's approach is quite nicely put together. One thing I have noticed with a lot of these non-HPC schedulers, though, is that they are great for embarrassingly parallel problems but fall apart reasonably quickly once you get into MPI territory, where you want to be careful about how you lay out the processes and network for your job. Those are probably things to consider when writing a scheduler.

Yeah, Funnel currently has a really simple scheduler along these lines, but the minute it had two users we realized we wanted something smarter (fair scheduling). I hadn't considered MPI yet, since I'm more interested in large batch jobs, but I can see people wanting it, and data locality, and worker admin, and, and, and... pretty soon you've just built Kubernetes :)

Many projects have done this, though, so it seems like a well-understood problem, one that could stand on its own as a library, built in Go and wrapped by other languages via C bindings.
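One way to picture the library version is a pure placement function with no I/O, daemons, or global state, which a host application (Funnel, or another language through a C wrapper) could call with its current view of tasks and nodes. The sketch below is hypothetical: the `Resources`, `Node`, `Task`, and `Place` names are invented for illustration, and first-fit is a deliberately naive policy that ignores fairness, data locality, and MPI topology.

```go
package main

import "fmt"

// Resources is a hypothetical resource request or free capacity.
type Resources struct {
	CPUs  int
	RAMGB int
}

// fits reports whether req can be satisfied from the capacity r.
func (r Resources) fits(req Resources) bool {
	return req.CPUs <= r.CPUs && req.RAMGB <= r.RAMGB
}

type Node struct {
	Name string
	Free Resources
}

type Task struct {
	ID  string
	Req Resources
}

// Place assigns each task, in order, to the first node with enough free
// capacity (first-fit). It returns a task ID -> node name map and the IDs
// of tasks that did not fit. The caller's nodes slice is not modified.
func Place(tasks []Task, nodes []Node) (map[string]string, []string) {
	free := make([]Resources, len(nodes))
	for i, n := range nodes {
		free[i] = n.Free
	}
	assigned := make(map[string]string)
	var unplaced []string
	for _, t := range tasks {
		placed := false
		for i := range free {
			if free[i].fits(t.Req) {
				free[i].CPUs -= t.Req.CPUs
				free[i].RAMGB -= t.Req.RAMGB
				assigned[t.ID] = nodes[i].Name
				placed = true
				break
			}
		}
		if !placed {
			unplaced = append(unplaced, t.ID)
		}
	}
	return assigned, unplaced
}

func main() {
	nodes := []Node{
		{Name: "n1", Free: Resources{CPUs: 4, RAMGB: 16}},
		{Name: "n2", Free: Resources{CPUs: 8, RAMGB: 32}},
	}
	tasks := []Task{
		{ID: "t1", Req: Resources{CPUs: 4, RAMGB: 8}},
		{ID: "t2", Req: Resources{CPUs: 2, RAMGB: 8}},
		{ID: "t3", Req: Resources{CPUs: 8, RAMGB: 64}},
	}
	assigned, unplaced := Place(tasks, nodes)
	fmt.Println(assigned, unplaced) // map[t1:n1 t2:n2] [t3]
}
```

Keeping the decision a pure function of its inputs is what would make it easy to embed, test, and expose through a thin C wrapper; the smarter policies mentioned in this thread would replace the inner loop, not the boundary.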
