Hey André,
I'm CCing this to the cloudscheduler mailing list, just to keep track
of these discussions, and so anyone who's interested can follow. The
list is at
http://groups.google.com/group/cloudscheduler .
On 18-May-10, at 9:03 AM, Andre Charbonneau wrote:
> I sat down with Kyle for an hour this morning so that he could give
> me an overview of the current state of the NEP-52 project and where
> we are at. We also tried to find some action items that might be
> options for me to get back into the project. Here is what we found
> so far:
>
> * grid certificate handling at the cloud scheduler level
>
> Review how authentication happens between the cloud scheduler and
> nimbus and see how it can be debugged, tested and improved.
> Initially, this smells like it could be a good candidate for a robot
> certificate. Maybe I could investigate this and see what the
> options are.
Yep, this sounds like a good idea, and actually, we already have it on
our roadmap for 0.6.
http://wiki.github.com/hep-gc/cloud-scheduler/roadmap
We aren't really sure about the best way to do this so far, since
there doesn't seem to be a good way to get proxies from Condor via its
SOAP interface. We talked about making it a feature that only works
when Cloud Scheduler and Condor are installed on the same machine or
share a common filesystem (which I imagine will be the most common
configuration), and then looking at the job to determine the proxy
location (probably with the x509userproxy ClassAd attribute).
If you have a better idea, I'd love to hear it, and if you're
interested in implementing it, that would be awesome.
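
Just to make the shared-filesystem idea concrete, here is a rough
sketch. It assumes the job's ClassAd has already been fetched and turned
into a plain dict somehow; the function name and the dict shape are made
up for illustration and are not existing Cloud Scheduler code.

```python
import os


def find_job_proxy(job_ad):
    """Return the proxy path named in a job ad, or None if unusable.

    job_ad is assumed (hypothetically) to be a dict of ClassAd
    attribute names to values for a single job.
    """
    # ClassAd attribute names are case-insensitive, so normalise keys.
    lowered = dict((key.lower(), value) for key, value in job_ad.items())
    path = lowered.get("x509userproxy")
    if not path:
        return None
    # This only helps when we can actually see Condor's filesystem.
    if os.path.isfile(path) and os.access(path, os.R_OK):
        return path
    return None
```

If the file isn't visible from the Cloud Scheduler host, this returns
None, and we would have to fall back to whatever credential we use now.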
> * resource selection logic at cloud scheduler level
>
> I think right now the cloud scheduler will use a round robin method
> for selecting which resource to boot next (assuming it can boot the
> vm type, has vm slots left, etc...). I was thinking that maybe we
> could add a plugin at that level in the scheduler that will ask a
> separate component to make a decision on which resource it is best
> to pick from to boot the next VM. I have a rule engine such as
> Drools in mind, but I'm sure there are other options available for
> this too. Such an approach would allow us to change the resource
> selection rules at runtime, without having to restart anything or
> cancel any running jobs. It will also give us a lot of flexibility
> and let us implement some wacky decision rules if needed, such as "favor
> cloud A on Sundays because they are cheaper than the other clouds",
> or "we are debugging something, so whenever a resource is selected
> from cloud B, send an email to such and such and increment a counter
> in a database table there....". Maybe all this would be overkill
> and may be outside the scope of the project; let me know what your
> thoughts are about this.
I think that's probably a pretty good idea, but I've spent very little
time with the scheduling code, so mhp might have some ideas about
implementing this. I'm a little wary of using a rules engine like
Drools, and I wonder whether the best language for this would just be
Python, with scheduling done as Python plugins that share a common
interface. Python is pretty high level and expressive, and we all
know its syntax well, so I don't really think it makes sense to
introduce another language. This is just my opinion though, so if you
have a different one, I'm happy to talk about it.
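
To give an idea of what I mean by plugins with a common interface, here
is a minimal sketch. None of these names exist in Cloud Scheduler today;
the resource and request dicts, and their keys, are assumptions made up
for the example.

```python
class ResourceSelector(object):
    """Hypothetical common interface for resource selection plugins."""

    def select(self, resources, vm_request):
        """Return the resource to boot on, or None if nothing fits."""
        raise NotImplementedError


def can_boot(resource, vm_request):
    # Assumed resource shape: {"name", "vm_slots", "vmtypes"}.
    return (resource["vm_slots"] > 0
            and vm_request["vmtype"] in resource["vmtypes"])


class RoundRobinSelector(ResourceSelector):
    """Walks the resource list in order, resuming after the last pick."""

    def __init__(self):
        self._next = 0

    def select(self, resources, vm_request):
        count = len(resources)
        for offset in range(count):
            index = (self._next + offset) % count
            if can_boot(resources[index], vm_request):
                # Remember where to resume on the next call.
                self._next = (index + 1) % count
                return resources[index]
        return None
```

A different policy would just be another subclass with its own select(),
and the scheduler would only ever talk to the interface.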
There's even a really good library for doing this kind of constraint
programming, called python-constraint
( http://labix.org/python-constraint ). Also, see this Stack Overflow
answer: http://bit.ly/al0VfI . It shows how you can do stuff like
"favor cloud A on Sundays because they are cheaper than the other
clouds" in Python, without having to learn a complicated BRMS.
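
Even with nothing but the standard library, a rule like the Sunday one
is only a few lines. This is just a sketch: the cloud names are made up,
and it doesn't use python-constraint or reflect how the linked answer
does it.

```python
import datetime


def favor_cloud_a_on_sundays(clouds, today=None):
    """Return cloud names in preference order.

    clouds is a list of (hypothetical) cloud names; today can be
    passed in so the rule is easy to test.
    """
    if today is None:
        today = datetime.date.today()
    # date.weekday() counts Monday as 0, so Sunday is 6.
    if today.weekday() == 6 and "cloud-a" in clouds:
        return ["cloud-a"] + [c for c in clouds if c != "cloud-a"]
    return list(clouds)
```

On any other day the order comes back unchanged, so a rule like this
could sit in front of whatever the default selection is.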
Anyway, I'm excited to have you working on Cloud Scheduler! Welcome
back!
--patrick