Fwd: cloud scheduler stuff


Patrick Armstrong

May 18, 2010, 2:05:08 PM
to cloudsc...@googlegroups.com


Begin forwarded message:

> From: Andre Charbonneau <Andre.Ch...@nrc-cnrc.gc.ca>
> Date: May 18, 2010 9:03:32 AM PDT (CA)
> To: Patrick Armstrong <patr...@uvic.ca>
> Cc: Kyle Fransham <fran...@uvic.ca>
> Subject: cloud scheduler stuff
>
> Hi Patrick,
>
> I sat down with Kyle for an hour this morning so that he could give
> me an overview of the current state of the NEP-52 project and where
> we are at. We also tried to find some action items that might be
> options for me to get back into the project. Here is what we found
> so far:
>
> * grid certificate handling at the cloud scheduler level
>
> Review how authentication happens between the cloud scheduler and
> nimbus and see how it can be debugged, tested and improved.
> Initially, this smells like it could be a good candidate for a robot
> certificate. Maybe I could look into this and see what the
> options are.
>
>
> * resource selection logic at cloud scheduler level
>
> I think right now the cloud scheduler will use a round robin method
> for selecting which resource to boot next (assuming it can boot the
> vm type, has vm slots left, etc...). I was thinking that maybe we
> could add a plugin at that level in the scheduler that will ask a
> separate component to make a decision on which resource it is best
> to pick from to boot the next VM. I have a rule engine such as
> Drools in mind, but I'm sure there are other options available for
> this too. Such an approach would allow us to change the resource
> selection rules at runtime, without having to restart anything or
> cancel any running jobs. It would also give us a lot of flexibility
> to implement some wacky decision rules if needed, such as "favor
> cloud A on Sundays because they are cheaper than the other clouds",
> or "we are debugging something, so whenever a resource is selected
> from cloud B, send an email to such and such and increment a counter
> in a database table there....". Maybe all this would be overkill
> and may be outside the scope of the project; let me know what your
> thoughts are about this.
>
>
> Any other ideas of areas of the project I could work on?
>
> Thanks,
> Andre
>
> CC: Kyle
>
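For readers following the thread, the round-robin behaviour Andre describes (skip clouds that cannot boot the VM type or have no free slots, otherwise take the next one in turn) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not Cloud Scheduler's actual code: the `Cloud` class, its `vm_types` and `free_slots` fields, and `RoundRobinSelector` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Cloud:
    # Hypothetical description of a cloud resource (not Cloud Scheduler's class).
    name: str
    vm_types: Set[str] = field(default_factory=set)
    free_slots: int = 0


class RoundRobinSelector:
    """Cycle through clouds, skipping ones that cannot boot the requested VM."""

    def __init__(self, clouds: List[Cloud]):
        self.clouds = clouds
        self.next_index = 0

    def select(self, vm_type: str) -> Optional[Cloud]:
        n = len(self.clouds)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            cloud = self.clouds[idx]
            if vm_type in cloud.vm_types and cloud.free_slots > 0:
                # Resume the next search just after the cloud we picked.
                self.next_index = (idx + 1) % n
                return cloud
        return None  # no cloud can boot this VM type right now


if __name__ == "__main__":
    clouds = [Cloud("A", {"sl5"}, 2), Cloud("B", {"sl5"}, 1), Cloud("C", {"ubuntu"}, 4)]
    selector = RoundRobinSelector(clouds)
    print([selector.select("sl5").name for _ in range(3)])  # ['A', 'B', 'A']
```

The fixed rotation is exactly what a pluggable decision component would replace: the eligibility test stays the same, only the choice among eligible clouds changes.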

Patrick Armstrong

May 18, 2010, 2:21:18 PM
to Andre Charbonneau, cloudsc...@googlegroups.com, Kyle Fransham
Hey André,

I'm CCing this to the cloudscheduler mailing list, just to keep track
of these discussions, and so anyone who's interested can follow. The
list is at http://groups.google.com/group/cloudscheduler .

On 18-May-10, at 9:03 AM, Andre Charbonneau wrote:
> I sat down with Kyle for an hour this morning so that he could give
> me an overview of the current state of the NEP-52 project and where
> we are at. We also tried to find some action items that might be
> options for me to get back into the project. Here is what we found
> so far:
>
> * grid certificate handling at the cloud scheduler level
>
> Review how authentication happens between the cloud scheduler and
> nimbus and see how it can be debugged, tested and improved.
> Initially, this smells like it could be a good candidate for a robot
> certificate. Maybe I could look into this and see what the
> options are.

Yep, this sounds like a good idea, and actually, we already have it on
our roadmap for 0.6: http://wiki.github.com/hep-gc/cloud-scheduler/roadmap

We aren't really sure about the best way to do this so far, since
there doesn't seem to be a good way to get proxies from Condor via its
SOAP interface. We talked about making it a feature that only works
when cloudscheduler and condor are installed on the same machine or
share a common filesystem or something (which I imagine will be the
most common configuration), and looking at the job to determine the
proxy location (probably with the x509userproxy classad attribute).
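The shared-filesystem approach could look something like the sketch below. It assumes the job's ClassAd has already been fetched into a plain Python dict (how that happens, for example over Condor's SOAP interface, is not shown), that attribute values are unquoted strings, and that Cloud Scheduler sees the same filesystem as Condor. `find_job_proxy` is a hypothetical helper, not part of Cloud Scheduler.

```python
import os
from typing import Mapping, Optional


def find_job_proxy(job_ad: Mapping[str, str]) -> Optional[str]:
    """Return the job's proxy path if it is readable from this machine.

    Hypothetical helper: assumes Cloud Scheduler runs on the Condor host or
    shares its filesystem, so the path in x509userproxy is valid here.
    """
    # ClassAd attribute names are case-insensitive, so normalise the keys.
    attrs = {key.lower(): value for key, value in job_ad.items()}
    path = attrs.get("x509userproxy")
    if not path:
        return None  # job was submitted without a proxy
    # The file may be missing locally if the filesystem is not actually shared.
    if os.path.isfile(path) and os.access(path, os.R_OK):
        return path
    return None
```

Returning `None` rather than raising lets the caller fall back to whatever credential handling it already uses when the proxy is not reachable.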

If you have a better idea, I'd love to hear it, and if you're
interested in implementing it, that would be awesome.


> * resource selection logic at cloud scheduler level
>
> I think right now the cloud scheduler will use a round robin method
> for selecting which resource to boot next (assuming it can boot the
> vm type, has vm slots left, etc...). I was thinking that maybe we
> could add a plugin at that level in the scheduler that will ask a
> separate component to make a decision on which resource it is best
> to pick from to boot the next VM. I have a rule engine such as
> Drools in mind, but I'm sure there are other options available for
> this too. Such an approach would allow us to change the resource
> selection rules at runtime, without having to restart anything or
> cancel any running jobs. It would also give us a lot of flexibility
> to implement some wacky decision rules if needed, such as "favor
> cloud A on Sundays because they are cheaper than the other clouds",
> or "we are debugging something, so whenever a resource is selected
> from cloud B, send an email to such and such and increment a counter
> in a database table there....". Maybe all this would be overkill
> and may be outside the scope of the project; let me know what your
> thoughts are about this.

I think that's probably a pretty good idea, but I've spent very little
time with the scheduling code, so mhp might have some ideas about
implementing this. I'm a little wary of using a rules engine like
Drools, and just wonder whether the best language to do this in would
be python, and just do scheduling as python plugins with a common
interface. I think python is pretty high level and pretty expressive,
and we all know the syntax of python pretty well, so I don't really
think it makes sense to introduce another language. This is just my
opinion though, so if you have a different one, I'm okay to talk about
this.
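A "Python plugins with a common interface" design might look roughly like this sketch. Everything here is hypothetical (`SelectionPlugin`, `MostFreeSlots`, `PreferOnWeekday`, and the `Cloud` fields are illustrative names, not Cloud Scheduler's API); it only shows how a rule such as "favor cloud A on Sundays" can be plain Python behind a shared `select` method.

```python
import datetime
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cloud:
    # Hypothetical cloud description: name, bootable VM types, free VM slots.
    name: str
    vm_types: frozenset
    free_slots: int


class SelectionPlugin(ABC):
    """Common interface every resource-selection plugin would implement."""

    @abstractmethod
    def select(self, clouds: List[Cloud], vm_type: str,
               now: datetime.datetime) -> Optional[Cloud]:
        ...


def eligible(clouds: List[Cloud], vm_type: str) -> List[Cloud]:
    # Requirements shared by all plugins: can boot the VM and has a free slot.
    return [c for c in clouds if vm_type in c.vm_types and c.free_slots > 0]


class MostFreeSlots(SelectionPlugin):
    """Default rule: pick the eligible cloud with the most free slots."""

    def select(self, clouds, vm_type, now):
        return max(eligible(clouds, vm_type), key=lambda c: c.free_slots, default=None)


class PreferOnWeekday(SelectionPlugin):
    """Favour one cloud on a given weekday, e.g. 'cloud A on Sundays'."""

    def __init__(self, preferred: str, weekday: int, fallback: SelectionPlugin):
        self.preferred = preferred
        self.weekday = weekday  # Monday == 0 ... Sunday == 6, as in datetime
        self.fallback = fallback

    def select(self, clouds, vm_type, now):
        if now.weekday() == self.weekday:
            for cloud in eligible(clouds, vm_type):
                if cloud.name == self.preferred:
                    return cloud
        # Not the preferred day (or preferred cloud unavailable): delegate.
        return self.fallback.select(clouds, vm_type, now)


if __name__ == "__main__":
    clouds = [Cloud("A", frozenset({"sl5"}), 1), Cloud("B", frozenset({"sl5"}), 5)]
    plugin = PreferOnWeekday("A", weekday=6, fallback=MostFreeSlots())
    print(plugin.select(clouds, "sl5", datetime.datetime(2010, 5, 16)).name)  # A (a Sunday)
    print(plugin.select(clouds, "sl5", datetime.datetime(2010, 5, 17)).name)  # B (a Monday)
```

Because each rule is just an object with `select`, the scheduler could in principle swap the active plugin instance while running, which is one plain-Python route to changing selection rules without a restart.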

There's even a really good library for doing this kind of constraint
programming, called python-constraint (http://labix.org/python-constraint).
Also, see this Stack Overflow answer: http://bit.ly/al0VfI . It
shows how you can do stuff like "favor cloud A on Sundays because they
are cheaper than the other clouds" in python, without having to learn
a complicated BRMS.



Anyway, I'm excited to have you working on Cloud Scheduler! Welcome
back!

--patrick

Patrick Armstrong

May 18, 2010, 2:25:19 PM
to Andre Charbonneau, Kyle Fransham, cloudsc...@googlegroups.com
Also, if you're really keen, our support for Eucalyptus is pretty
poorly tested at this point (partially because we're having trouble
keeping a Eucalyptus cluster running), and our OpenNebula support is non-existent.
That would be a neat thing to have.

--patrick