Job Queueing Framework


Russell

May 2, 2011, 3:24:34 PM5/2/11
to cast...@googlegroups.com
Earlier I pushed a bunch of commits on a new framework for job queueing and execution. So far it is completely decoupled from Cast itself, so I wanted to explain what it is and how we can integrate it.

The framework has four main components:

1. The DirectoryResource class (which implements the "Resource" interface that I didn't bother actually defining, since thus far we have no other type of resource). The premise of this entire framework is that jobs operate on "resources". We would, for example, make the Instance class inherit from DirectoryResource in order to allow jobs to operate on Instances.

2. The ResourceManager class. Each resource type is managed by a ResourceManager, which takes care of instantiating Resource objects as necessary. We would create an InstanceManager, etc.

3. The JobManager class. Each ResourceManager registers itself with the JobManager to handle jobs for resources of a given type. The InstanceManager would register itself to handle jobs for Instances.

4. The Job class. We subclass Job and implement the 'run' method. Job objects must have 'resourceName', 'resourceType', 'args' and 'options' fields by the time they are submitted to the JobManager.

In use, we might have an UpgradeInstanceJob class. We would instantiate it with "new UpgradeInstanceJob('foo', 'v2.0')", then pass the job to the JobManager, where it is assigned an id and routed to the appropriate ResourceManager. The ResourceManager instantiates an Instance named "foo" if one doesn't already exist, then enqueues the job for that resource.

When a Job is first enqueued for a resource, the framework attempts to verify that the resource will exist by the time the job executes. It traverses forward through the job queue looking for jobs with CREATE or DELETE options; if none are found, it checks whether the resource already exists. If the resource does not already exist, or the job sits somewhere behind a DELETE (without a subsequent CREATE) in the queue, the job will never emit a "ready" event and instead emits an "error" stating that the resource doesn't exist. While this adds some complexity, it allows us to quickly 404 requests for nonexistent resources while still letting us enqueue jobs based on predicted state, i.e. one can enqueue an "upgrade instance" job immediately after a "create instance" job without waiting for the latter to complete. If we're ok with accepting requests for resources that don't exist and just having them fail later, we could rip this out and remove a lot of complexity.
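The forward traversal might look roughly like this; the queue representation and function name are assumptions for illustration, with the last CREATE or DELETE ahead in the queue winning, and current existence used as the fallback:

```javascript
// Decide whether a resource will exist by the time a newly enqueued job
// runs. `queue` holds the pending jobs ahead of it, in execution order;
// each job's `options` array may contain 'CREATE' or 'DELETE'.
function resourceWillExist(queue, resourceExistsNow) {
  var exists = resourceExistsNow;
  for (var i = 0; i < queue.length; i++) {
    if (queue[i].options.indexOf('CREATE') !== -1) {
      exists = true;
    } else if (queue[i].options.indexOf('DELETE') !== -1) {
      exists = false;
    }
  }
  return exists;
}
```

A job for which this returns false would emit "error" instead of "ready", which is what lets us reject requests for nonexistent resources up front.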

Anyway, from there things work about how you would expect: the job waits in the queue for its resource until it reaches the front, then it executes. Throughout the lifetime of a job its 'status' is updated and it emits various events, including "ready", "start", and "error" or "success". When an "error" or "success" is emitted, the job sets its 'status' to completed and its result to the error or result passed to the 'run' callback.

The one thing I haven't done yet is implement "read" access to resources. There are two options here: we can either block reads until the currently running job completes, or we can cache the resource state before a job starts. The latter would almost always turn out to be a waste of resources, but would save us from blocking reads on potentially very long-running jobs.
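To make the trade-off concrete, the blocking option might look something like this; none of this is implemented, and all the names here are made up:

```javascript
// One possible shape for the blocking approach: reads are deferred while
// a job is running on the resource, then flushed when it completes.
function ReadGate() {
  this.busy = false;
  this.waiting = [];
}

ReadGate.prototype.read = function(getState, callback) {
  if (this.busy) {
    // A job is running: defer the read until it finishes.
    this.waiting.push(function() { callback(getState()); });
  } else {
    callback(getState());
  }
};

ReadGate.prototype.jobStarted = function() {
  this.busy = true;
};

ReadGate.prototype.jobCompleted = function() {
  this.busy = false;
  var waiting = this.waiting;
  this.waiting = [];
  // Deferred readers now see the post-job state.
  waiting.forEach(function(fn) { fn(); });
};
```

The caching option would instead snapshot the state in jobStarted and serve that snapshot immediately, trading memory for latency.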

This is getting long so I'm cutting myself off, but I think it's important we get this right, so I'd appreciate any feedback people have to offer. For some very contrived examples of this in use, see tests/simple/job.js

-Russell

Tomaž Muraus

May 2, 2011, 6:25:04 PM5/2/11
to cast...@googlegroups.com
I have just finished reviewing your changes.

Most of them look good. There were some minor issues with directly accessing "private" attributes instead of going through an accessor method to make the access more obvious, but other than that, the changes look pretty sane.

Russell and I also agreed on IRC that we should always open a pull request on GitHub before merging changes, because that makes it possible to add comments to the whole diff and not just to each commit separately.

The next step should probably be addressing those issues, adding some more tests, and adding an HTTP endpoint for the Jobs API.

The HTTP endpoint change is probably blocked by the "porting to Expresso" task, right?