How to design URIs to initiate batch jobs in a RESTful service


API Dev

Jun 27, 2017, 8:22:15 AM
to API Craft
I need to create REST APIs for triggering, aborting, and checking the status of a batch job submitted in our enterprise application. The batch job basically includes a bunch of processes. Each process includes a bunch of operations that perform data transformations/calculations on the configured input data set. What REST guidelines, namely the URLs and HTTP methods, should the APIs adhere to?

To detail the requirements:
1. The trigger API accepts the job name along with other execution parameters. It returns a job ID.
2. The abort and check-status APIs accept the job ID that was returned by the trigger API.

Given that the API URLs should be focused on resources, and the requirements here talk about actions to be taken on a job, I am looking for input on what the API contract should look like. Any examples that I can look at?

Thanks.

André Tavares

Jun 27, 2017, 11:43:07 AM
to API Craft
Hi,

You can model it like any other CRUD (Create, Read, Update and Delete)...

POST /jobs --> Creates a new Job, triggers its execution, and returns HTTP 202 Accepted with a {JOB_ID}
GET /jobs/{JOB_ID} --> Returns the status of the Job
DELETE /jobs/{JOB_ID} --> Aborts execution of given Job
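A minimal in-memory sketch of that contract (the `JobStore` class and state names are illustrative, not from the thread; a real service would sit behind an HTTP framework):

```python
import uuid

class JobStore:
    """In-memory stand-in for the job resource behind /jobs."""

    def __init__(self):
        self.jobs = {}

    def create(self, name, **params):
        """POST /jobs -> 202 Accepted with a new job id."""
        job_id = uuid.uuid4().hex
        self.jobs[job_id] = {"name": name, "params": params, "status": "running"}
        return job_id

    def status(self, job_id):
        """GET /jobs/{JOB_ID} -> current status, or None (404)."""
        job = self.jobs.get(job_id)
        return job["status"] if job else None

    def abort(self, job_id):
        """DELETE /jobs/{JOB_ID} -> abort the job if it is still running."""
        job = self.jobs.get(job_id)
        if job and job["status"] == "running":
            job["status"] = "aborted"
        return job is not None
```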

Think it should do the trick!

Andrew Braae

Jun 27, 2017, 4:16:44 PM
to API Craft
To add to André's example, depending on your use case, DELETE might not be quite right. Do you want to:
- remove all trace of the job, or
- just abort it, leaving it there for future reference?

If the latter, you could break out the job's "activity" as a separate resource. Then, to abort a job, you would do: 

DELETE /jobs/{JOB_ID}/activity
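The distinction can be sketched as two handlers over the same store (function names and the `jobs` dict shape are assumptions for illustration):

```python
def delete_job(jobs, job_id):
    """DELETE /jobs/{JOB_ID}: remove all trace of the job."""
    return jobs.pop(job_id, None) is not None

def delete_activity(jobs, job_id):
    """DELETE /jobs/{JOB_ID}/activity: abort, but keep the job for reference."""
    job = jobs.get(job_id)
    if job is None:
        return False
    if job["status"] == "running":
        job["status"] = "aborted"  # the job record itself survives
    return True
```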

mca

Jun 27, 2017, 4:25:47 PM
to api-...@googlegroups.com
make sure you know how to handle failed jobs on the server side and how to explain that to the client.

ex: 20 recs, first 10 succeed, 11th fails.
- do you stop now or keep going?
- do you reject the whole job (e.g. roll back successful ones)?
- do you reject only the failed one(s)? how do you tell the client what failed and how to fix it?
- if one or more fail for differing reasons, how do you tell that to the client?

do you allow clients to view the progress of the job? if so, how?
can clients cancel jobs even if they are going w/o errors? how do you handle rollback?
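One hedged way to answer the "what failed and why" questions above is a per-record report in the status payload, so clients can retry or fix only the failed records. The payload shape here is an assumption, not something the thread prescribes:

```python
def job_report(results):
    """Build a status payload from a list of (record_id, error_or_None) tuples."""
    failed = [{"record": rid, "error": err} for rid, err in results if err]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "failed": failed,  # each failure carries its own reason
        "status": "failed" if failed else "completed",
    }
```

Because each failure keeps its own error message, records that fail for differing reasons are reported individually rather than as one opaque job-level error.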

job processing at a distance is messy. be ready to spend most of your time coding to errors instead of coding for success.



Ismael Celis

Jun 29, 2017, 6:55:29 PM
to API Craft
I recently built something similar to this. I split job management into the following:

* POST /jobs creates an initial job (which you can fetch with GET /jobs/{id})
* POST /jobs/{id}/tasks adds tasks to a job (bulk inventory data in this case). The idea is that the client can call this repeatedly to add tasks to an open job.
* PUT /jobs/{id}/run starts processing tasks in a job. I know, "run" isn't a noun and not very "RESTful" in the narrow sense, but I don't think it matters that much in this case.
* GET /jobs/{id}/status starts a server-sent-events response, if instructed by the client, and streams back real-time status updates (JSON-encoded). Or a single status response if it's a regular GET.
* PUT /jobs/{id}/revert reverts successful tasks in a job (tasks store the current state before running, so they can be reverted; there are edge cases in doing this, but it works for my use case).
* PUT /jobs/{id}/retry retries failed tasks in a job.
* GET /jobs/{id}/tasks{?status} lists tasks in a job, optionally filtering them by status (pending, failed, successful, reverted).


Job entities include links to the available actions depending on current status (using HAL). So for example an open job includes the "add_tasks" and "run_job" links, but not the "revert_job" one (since there's nothing to revert yet). This simplifies the client somewhat, because it just needs to present the user with relevant controls and views depending on the actions present in the API response at any given time.
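A sketch of that status-dependent link generation (the link relations and state names mirror the post; the function itself and the exact HAL document shape are hypothetical):

```python
def job_links(job_id, status):
    """Return a HAL-style _links object exposing only the actions valid now."""
    base = f"/jobs/{job_id}"
    links = {"self": {"href": base}}
    if status == "open":
        links["add_tasks"] = {"href": base + "/tasks"}
        links["run_job"] = {"href": base + "/run"}
    elif status == "completed":
        links["revert_job"] = {"href": base + "/revert"}
    elif status == "failed":
        links["retry_job"] = {"href": base + "/retry"}
    return {"_links": links}
```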

Hope that helps.

API Dev

Jun 30, 2017, 3:07:36 AM
to API Craft
I appreciate the comments on this topic. Thanks for sharing your insights.

The jobs API that we need to expose right now is to allow the user to execute the job. Job creation and task addition/deletion would be done through the UI.

@Ismael - Your api seems to match closely with the requirements we have. I have a few questions on the api contract

>  * PUT /jobs/{id}/run start processing tasks in a job.
> PUT /jobs/{id}/revert reverts successful tasks in a job 

Why do you choose PUT over POST for triggering/aborting a run of a job?
 
> GET /jobs/{id}/status this starts a server-sent-events response, if instructed by the client, and streams back real-time status updates

Could you please elaborate on how you implement this as a real-time streaming API?

Thanks

Ismael Celis

Jun 30, 2017, 3:29:52 PM
to API Craft
Hi


The job creation and task addition/deletion would be done through the UI. 

Yup, same here, but the UI uses the API endpoints described above to add tasks.
 

>  * PUT /jobs/{id}/run start processing tasks in a job.
> PUT /jobs/{id}/revert reverts successful tasks in a job 

Why do you choose PUT over POST for triggering/aborting a run of a job?

My thinking was that run, revert, abort, etc. should be idempotent operations, i.e. running a job that's already running shouldn't do anything, same with reverting an already reverted job, etc. PUT tends to be used for such operations, whereas POST tends to be used for creating new resources.
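The idempotency argument in miniature: calling run twice has exactly the same effect as calling it once, which is what makes PUT a reasonable fit. The handler below is a sketch, assuming a job dict with a `status` field:

```python
def run(job):
    """PUT /jobs/{id}/run: idempotent. Repeat calls are no-ops."""
    if job["status"] == "open":
        job["status"] = "running"  # only the first call changes state
    return job["status"]
```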

 
> GET /jobs/{id}/status this starts a server-sent-events response, if instructed by the client, and streams back real-time status updates
>
> Could you please elaborate on how you implement this as a real-time streaming API?

I guess it'll depend on the client. I'm using Ruby and the Sinatra library, which makes it really easy. Then I have a background worker process that runs tasks in a job and pushes updates onto a Redis key (using the job ID as the key). The server-sent-events GET handler has a loop that pops updates off of the Redis key and flushes them back to the client, using Redis' BLPOP operation: https://redis.io/commands/blpop

But the same can be implemented in any number of ways and languages, even in-process. I chose Redis so multiple nodes behind a load balancer can access updates for the same jobs.
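A self-contained, in-process version of that loop, with a `queue.Queue` standing in for the per-job Redis list (a real deployment would block on BLPOP instead; all names here are illustrative):

```python
import json
import queue

def sse_frame(update):
    """Encode one status update as a server-sent-events frame."""
    return f"data: {json.dumps(update)}\n\n"

def stream_updates(updates, timeout=0.1):
    """Pop updates until the queue drains or a terminal status arrives."""
    frames = []
    while True:
        try:
            update = updates.get(timeout=timeout)  # stand-in for Redis BLPOP
        except queue.Empty:
            break
        frames.append(sse_frame(update))
        if update.get("status") in ("completed", "failed"):
            break  # job finished; close the stream
    return frames
```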

Hope that helps

Ismael Celis

Jun 30, 2017, 3:31:59 PM
to API Craft
"I guess it'll depend on the Client" should have been "I guess it'll depend on the stack" you're using.

The client is JS running on the browser, using the built-in EventSource API which is pretty simple.