Jobs API and REST API Refactoring (was: Cast roadmap and plans for the next release)


Tomaž Muraus

Apr 28, 2011, 5:08:27 PM
to cast...@googlegroups.com
On Thu, Apr 28, 2011 at 7:12 PM, Russell <russell...@gmail.com> wrote:
I've been thinking about the API refactoring (and discussing it with Tomaž in #cast-project), and what I'd like to do is:

1. Move all non-http logic out of lib/services/http/ into lib/control/ or somewhere.

Yes, we should do that, but I am not sure what we should do with some of the actions which are only composed of a single function call to the code in some module (e.g. performing an action on a service - in this case you only need to call a single function - "manager.runAction"...).

Adding a wrapper function and another layer of indirection seems unnecessary here, but we do need a clear separation between a public API (the classes and methods that plugins, among other things, have access to) and a private API.

2. The new code in lib/control/ should return (or pass back via a callback) "Job" objects on all calls that modify agent state (ie, anything not currently behind a GET). Jobs should go into a queue corresponding to the resource they operate on. In this case our top level resources are bundles, instances, and certificate signing requests. Logically services belong to instances, so service operations should go into a queue based on the instance they belong to (Tomaž and I were discussing modifying the public API to reflect this, but for now we want to limit the scope of our changes).

Yes, I think this will also make the cluster stuff and the handling of possible race conditions easier.
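To make the per-resource queueing idea concrete, here is a minimal sketch, not the actual Cast implementation. The class name `ResourceQueues`, the `enqueue(key, task)` signature, and the key format ('instance:a') are all hypothetical. It assumes each task receives a `done` callback and that tasks for the same key must run one at a time, while different resources proceed independently.

```javascript
// Hypothetical sketch: one FIFO queue per top-level resource.
class ResourceQueues {
  constructor() {
    this.queues = new Map(); // key -> { pending: [], running: false }
  }

  // key identifies the resource, e.g. 'instance:a' (format is an assumption).
  // task is a function that takes a done() callback.
  enqueue(key, task) {
    let q = this.queues.get(key);
    if (!q) {
      q = { pending: [], running: false };
      this.queues.set(key, q);
    }
    q.pending.push(task);
    this._drain(q);
  }

  // Start the next task on this queue unless one is already running.
  _drain(q) {
    if (q.running) return;
    const task = q.pending.shift();
    if (!task) return;
    q.running = true;
    task(() => {
      q.running = false;
      this._drain(q);
    });
  }
}
```

Under this scheme a service operation would be enqueued under the key of the instance it belongs to, so it serializes with other operations on that instance.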
 
Job objects should be EventEmitters that emit:

'queue' - Emitted when a job is validated, given a globally unique 'id' and assigned to a queue.

imo 'ready' is a better name than 'queue' for this event.
 
'error' - Emitted if the job fails due to an error.
'message' - I'd like to be able to emit 'message' events so we can provide users feedback on what is actually happening on the agent.
'success' - Emitted when the job succeeds.
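As a rough illustration of the events listed above, a Job could look something like the sketch below. This is only an assumption about the shape, not code from Cast. The method names (`markReady`, `message`, `succeed`, `fail`) and the `state` field are hypothetical, the first event uses the 'ready' name suggested above (the original proposal calls it 'queue'), and the counter is only unique within one process, so a real agent would need a different id source.

```javascript
const { EventEmitter } = require('events');

// Hypothetical id source; only unique within a single process.
let lastJobId = 0;

class Job extends EventEmitter {
  constructor(resource) {
    super();
    this.resource = resource; // e.g. the instance the job operates on
    this.id = null;           // assigned once the job has been validated
    this.state = 'new';
  }

  // Called after validation: assign an id and announce the job is queued.
  markReady() {
    this.id = ++lastJobId;
    this.state = 'queued';
    this.emit('ready', this.id);
  }

  // Progress feedback for the user.
  message(text) {
    this.emit('message', text);
  }

  succeed(result) {
    this.state = 'success';
    this.emit('success', result);
  }

  // Note: Node throws if 'error' is emitted with no listener attached.
  fail(err) {
    this.state = 'error';
    this.emit('error', err);
  }
}
```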

I've gone back and forth on whether jobs should be returned directly (implying no validation of the request, since most validation would require async calls) or passed back via a callback. Node's APIs all seem to pass back event emitters directly, even if they will be useless due to an error (see net.createConnection), but on the other hand it would be nice to, for example, 404 requests to instances that don't exist.

The solution would be to, for the HTTP API, not respond to the request until the Job emits a 'queue' event. At that point the Job has an 'id', so we return either the id or the entire serialized job (id included).
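A sketch of how the HTTP layer might hold the response until the job is queued, under a few assumptions: the helper name `respondWhenQueued` is hypothetical, the event is called 'ready' (the rename suggested above for 'queue'), validation failures arrive as an 'error' event before 'ready', and a hypothetical `notFound` flag on the error marks a missing resource. The 202 and 500 status codes are also just a guess at what the API would return.

```javascript
// Hypothetical helper: answer the HTTP request only once the job has
// either been queued (it has an id) or failed validation.
function respondWhenQueued(job, res) {
  let responded = false;

  job.once('ready', () => {
    if (responded) return;
    responded = true;
    res.writeHead(202, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ id: job.id }));
  });

  job.once('error', (err) => {
    // An error after the job was queued is not this response's concern;
    // it would be reported through the jobs API instead.
    if (responded) return;
    responded = true;
    // err.notFound is an assumed flag, e.g. for a nonexistent instance.
    res.writeHead(err.notFound ? 404 : 500, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ message: err.message }));
  });
}
```

This keeps the "return the emitter directly" style for the control API while still letting the HTTP API 404 requests to instances that don't exist.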

3. At this point someone is free to branch off and work on the plugin system which will basically give user plugin code access to the 'control' API.

4. The HTTP layer should be reworked to use node-swiz. If we want to switch to Express, this would probably be where we should do it.

+1, switching to Express will require a lot of work and means modifying a lot of tests, but it's easier to do it now than later on.
 
5. A new HTTP API (and corresponding 'control' API) should be added for accessing jobs. Given a job's id, callers should be able to retrieve the current state of a job, "tail" it for "messages" (ie, "GET /1.1/jobs/102/messages?next=0" should return a list of 1 or more messages as soon as at least 1 is available), or wait for it to complete ("GET /1.1/jobs/102/result").
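The "tail" behaviour for messages could be sketched roughly as follows. This is an illustration under assumptions, not a spec: the `MessageLog` class, its `append` and `tail` methods, and the reply shape (a `messages` list plus a `next` cursor for the following request) are all hypothetical. The idea is that a request with `next=N` is answered immediately if message N already exists, and is otherwise parked until one arrives.

```javascript
// Hypothetical in-memory message log for a single job.
class MessageLog {
  constructor() {
    this.messages = [];
    this.waiters = []; // parked tail requests: { next, callback }
  }

  // Record a message and re-check every parked request.
  append(text) {
    this.messages.push(text);
    const parked = this.waiters;
    this.waiters = [];
    for (const w of parked) {
      this.tail(w.next, w.callback);
    }
  }

  // Reply as soon as at least one message at index >= next exists.
  tail(next, callback) {
    if (next < this.messages.length) {
      callback({
        messages: this.messages.slice(next),
        next: this.messages.length // cursor for the caller's next request
      });
    } else {
      this.waiters.push({ next, callback });
    }
  }
}
```

An HTTP handler for "GET /1.1/jobs/102/messages?next=0" would then call `tail(0, ...)` and write the reply from the callback, so the client never has to poll.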

Obviously the Jobs stuff would be quite a bit of work, but I think it would give us a lot of good things. We can stream messages back to the user (it's nice to have UI feedback for long-running jobs), our API becomes "non-blocking" (personally I don't care about this, but I know Tomaž and Paul expressed an interest), and the "tail job" and "wait for job to complete" calls eliminate the need for polling. Plus it solves a lot of locking problems.

Anyway, curious to know what others think of this, if anyone has suggestions/improvements, etc.

-Russell 

Russell

Apr 29, 2011, 7:13:42 PM
to cast...@googlegroups.com
2011/4/28 Tomaž Muraus <ka...@k5-storitve.net>
On Thu, Apr 28, 2011 at 7:12 PM, Russell <russell...@gmail.com> wrote:
I've been thinking about the API refactoring (and discussing it with Tomaž in #cast-project), and what I'd like to do is:

1. Move all non-http logic out of lib/services/http/ into lib/control/ or somewhere.

Yes, we should do that, but I am not sure what we should do with some of the actions which are only composed of a single function call to the code in some module (e.g. performing an action on a service - in this case you only need to call a single function - "manager.runAction"...).

Adding a wrapper function and another layer of indirection seems unnecessary here, but we do need a clear separation between a public API (the classes and methods that plugins, among other things, have access to) and a private API.

We'll have to rework some of those types of calls anyway in order to queue them appropriately. I'm prototyping a system for this now; it's going to take a little while to get it working, though.
 
2. The new code in lib/control/ should return (or pass back via a callback) "Job" objects on all calls that modify agent state (ie, anything not currently behind a GET). Jobs should go into a queue corresponding to the resource they operate on. In this case our top level resources are bundles, instances, and certificate signing requests. Logically services belong to instances, so service operations should go into a queue based on the instance they belong to (Tomaž and I were discussing modifying the public API to reflect this, but for now we want to limit the scope of our changes).

Yes, I think this will also make the cluster stuff and the handling of possible race conditions easier.
 
Job objects should be EventEmitters that emit:

'queue' - Emitted when a job is validated, given a globally unique 'id' and assigned to a queue.

imo 'ready' is a better name than 'queue' for this event.

True.
 
'error' - Emitted if the job fails due to an error.
'message' - I'd like to be able to emit 'message' events so we can provide users feedback on what is actually happening on the agent.
'success' - Emitted when the job succeeds.

I've gone back and forth on whether jobs should be returned directly (implying no validation of the request, since most validation would require async calls) or passed back via a callback. Node's APIs all seem to pass back event emitters directly, even if they will be useless due to an error (see net.createConnection), but on the other hand it would be nice to, for example, 404 requests to instances that don't exist.

The solution would be to, for the HTTP API, not respond to the request until the Job emits a 'queue' event. At that point the Job has an 'id', so we return either the id or the entire serialized job (id included).

3. At this point someone is free to branch off and work on the plugin system which will basically give user plugin code access to the 'control' API.

4. The HTTP layer should be reworked to use node-swiz. If we want to switch to Express, this would probably be where we should do it.

+1, switching to Express will require a lot of work and means modifying a lot of tests, but it's easier to do it now than later on.

Yeah, that's my take as well. Considering it was Paul's idea, I think we should plan to move forward with it unless anyone else has any objections.