POST or GET for calculations on one or multiple resources


Rolph H

Mar 6, 2015, 11:51:38 AM
to api-...@googlegroups.com

Dear list,


We are having some discussions between teams and would like to know your thoughts on the following:


We have a collection of servers and we need to calculate the data traffic used by these servers (either a single server or multiple servers aggregated):


POST /servers/calculateUsage

{
    "servers": [
        {
            "id": 1,
            "startDate": "2015-01-01 00:00:00",
            "endDate": "2015-02-01 01:02:00"
        },
        {
            "id": 2,
            "startDate": "2015-02-01 00:00:00",
            "endDate": "2015-02-01 05:00:00"
        }
    ]
}


This will sum all the data traffic of the two servers in the specified timeframes and return a single value: { "usage": 1421342134 }


At first we thought to use POST for two reasons:

- the number of servers (1..n) the calculation needs to be performed on might exceed the maximum query string length

- to make a clear distinction between 'real' resources being created/requested and 'calculations' or 'actions' being performed (for example, to reboot a server we would also do a POST /servers/{id}/reboot)


Now we are debating whether this makes sense. What if we only need the usage for one server: would a GET request on a server resource be better? For example:


GET /servers/{id}/usage  (with query parameters "startDate" and "endDate")


Instead of the POST method with a single serverId, startDate and endDate?


Some say GET is better, more RESTful and cacheable; others argue that, to make our whole API look/feel uniform, we should stick to the POST method for all actions/calculations etc.


Regards,


Rolph



mca

Mar 6, 2015, 12:00:12 PM
to api-...@googlegroups.com
Rolph:

for cases where the collection of query data has the potential to be arbitrarily large (e.g. could over-run the safe length of an HTTP query string), i use the following pattern:

1) create a message body with all the query data
2) POST that message body to create a new "query resource" and have the server return a URI for that resource
3) perform a GET to (or with) that query resource in order to get the results.

benefits are:
- you don't run into "too big" errors on queries
- you get to collect up queries for analysis and possible optimizations for oft-used queries
- you still get the caching benefits of a safe/idempotent query (step 3)
- you have the option of allowing users to manage their stored queries (including sharing them, if that's appropriate)

BTW - for step three you can arrange your server back end to accept either a set of standard query params OR a single URI that *contains* the query params. this lets you start w/ a simple resource that accepts a query string and THEN move to supporting "large queries" by just adding a new URI param. think of it like linux where you can pass args on a line or point to a file that contains the args.
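The three steps above can be sketched in miniature. This is a toy, in-memory Python sketch of the pattern, not a real server: the names (QueryStore, create_query, run_query) and the /queries/{id} URI shape are hypothetical, and the usage calculation is a dummy stand-in.

```python
import uuid

class QueryStore:
    """Toy in-memory sketch of the POST-then-GET 'query resource' pattern."""

    def __init__(self):
        self._queries = {}

    def create_query(self, body):
        """Step 2: POST the query body; the server stores it and returns a URI."""
        query_id = uuid.uuid4().hex
        self._queries[query_id] = body
        return "/queries/" + query_id

    def run_query(self, uri):
        """Step 3: GET the query resource to compute the (cacheable) result."""
        query_id = uri.rsplit("/", 1)[-1]
        servers = self._queries[query_id]["servers"]
        # Dummy aggregation standing in for the real usage calculation.
        return {"usage": sum(s["id"] * 100 for s in servers)}

store = QueryStore()
uri = store.create_query({"servers": [{"id": 1}, {"id": 2}]})
result = store.run_query(uri)
```

Since step 3 is a plain GET on a stable URI, any intermediary cache can serve repeats of the same query without touching the back end.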

cheers.

--
You received this message because you are subscribed to the Google Groups "API Craft" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-craft+...@googlegroups.com.
Visit this group at http://groups.google.com/group/api-craft.
For more options, visit https://groups.google.com/d/optout.

todd

Mar 7, 2015, 3:47:31 PM
to api-...@googlegroups.com

Some say GET is better, more RESTful and cacheable; others argue that, to make our whole API look/feel uniform, we should stick to the POST method for all actions/calculations etc.

I just thought I'd follow up on your REST/cacheable point around the whys of the GET-POST-GET (GET-GET-GET) pattern. You might be taking this for granted, so I apologise in advance if that is the case. I usually think about a design that assumes activities are unlikely to be unique (i.e. there are cache hits) and are often repeated (e.g. hit refresh, or just run the same/similar scenario again). All of this means I want to avoid doing processing behind the external service interface (I read your POST /servers/calculateUsage as exactly that). Processing behind the interface reduces my scaling options (or, more likely, means I have to deal with the scaling issue myself rather than get the built-in benefits). So the aggregating function is something I would like to put out the front rather than keep behind (actually, more correctly, it can go out the front or to the side).

Taking Mike's pattern, we are finding that designing these solutions requires a little more:

1. GET-POST-GET to create the search resource (which could be an async-type pattern to read status until finished)
2. GET (code-on-demand, that last constraint), which then becomes the service that can do the data aggregation (if that is handed off, it would be a GET-POST-GET and then an async-type pattern to read status until finished)
3. GET-(GET-GET-GET) ... to return the collection links and then return items within the collection

The second step in a simple approach is a map function. It could run on the current client, or even be distributed if you really want to get fancy.

The whole point is to find the smallest optimal batch size based on the system characteristics. Our experience (like many others') is that by having lots and lots of GETs in step 3 we can build more performant systems for both the client and the server: it forces us to stay as simple as possible across request, query and response, which keeps the throughput up on the server. Those responses are cacheable when designed correctly (i.e. small and atomic).

When we talk about this type of design, we are often laughed at. One time, we demoed some work that used this design across an existing database. The existing system was very slow. We coded this design into the application in parallel. Ours worked quite nicely: you started to see sub-second responses compared with 10+ seconds. The person we showed it to was impressed. He responded that it was excellent, but that he would now like to see the live system; it was all very well with the mock-up, but he wanted to see it run against live data. It took a couple of explanations to point out that it was live.

Thomas Lörcher

Mar 8, 2015, 5:54:03 PM
to api-...@googlegroups.com

Hi Rolph,

actually, I think it's not a good decision to represent the calculation of big queries (in your case spanning many servers) with GET parameters.
First, many popular caches don't treat GET query parameters as part of the cache key.
Second, and probably more importantly, caches won't normalize the order of the parameters,
so two GET requests with the same parameters in different orders won't map to the same cache entry.

I had the same question in one of my projects. I chose the approach Mike suggested:
generate a "query resource" and afterwards perform a GET on it.

A shortcoming of this approach is that somebody who isn't aware of your search probably can't take advantage of your cached search entry,
since they don't know a suitable query resource already exists.

You can circumvent this by returning a resource with a unique URL. We generated the URL internally (on the server; the client didn't notice)
as a hash of the sorted query. This way you can get the caching advantages while still using POST.
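The hashed-URL trick can be sketched like this. The function name is hypothetical, and it assumes the query is representable as JSON:

```python
import hashlib
import json

def query_resource_url(params):
    """Derive a stable, server-internal URL by hashing a canonical
    (sorted) serialization of the query, so parameter order doesn't matter."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "/queries/" + digest

# The same parameters in a different order map to the same resource:
a = query_resource_url({"startDate": "2015-01-01", "endDate": "2015-02-01"})
b = query_resource_url({"endDate": "2015-02-01", "startDate": "2015-01-01"})
```

Because the hash is deterministic, two clients issuing equivalent queries end up GETting the same URL and therefore hitting the same cache entry.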

Regards Thomas

darrel...@gmail.com

Mar 8, 2015, 8:42:50 PM
to api-...@googlegroups.com
Thomas,

From my research, it is a myth that caches don’t use query parameters as part of the cache key.  There is some historical evidence of caches ignoring query parameters to prevent some sites from cache busting to boost their traffic stats, however, that default behavior changed almost seven years ago.


As far as the order of query parameters is concerned, the best solution to that is to provide clients with URI templates and encourage clients to use those templates.  That ensures that parameters always appear in the same order.

Also, popular caches like Varnish have plug-ins that will sort the order of query string parameters to improve hit rates. https://www.varnish-software.com/blog/essential-vmods-all-varnish-users-should-know-about
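Both points (a template fixing the parameter order at the client, or the cache sorting parameters) can be illustrated with a minimal stand-in for URI template expansion. The expand helper here is hypothetical and the template syntax is simplified; the real RFC 6570 spec is much richer:

```python
from urllib.parse import quote

# If clients expand a server-supplied URI template, parameters always come
# out in the template's order, so cache keys line up.
TEMPLATE = "/servers/{id}/usage?startDate={startDate}&endDate={endDate}"

def expand(template, **params):
    """Fill the template, percent-encoding each value."""
    return template.format(**{k: quote(str(v), safe="") for k, v in params.items()})

# Keyword-argument order doesn't matter; the expanded URI is identical:
u1 = expand(TEMPLATE, id=1, startDate="2015-01-01", endDate="2015-02-01")
u2 = expand(TEMPLATE, endDate="2015-02-01", id=1, startDate="2015-01-01")
```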

Having said all of this, I’m still a fan of the “POST to create query resource and then GET it” approach.

Darrel


Sent from Surface


Andrew Braae

Mar 9, 2015, 3:11:41 AM
to api-...@googlegroups.com
There are some additional hassles associated with the "POST to create query resource and then GET it" approach which are probably obvious, but worth mentioning just in case.

- You need to store those query resources somewhere, i.e. probably in a database
- You need to decide on the lifetime of the query resources (a minute? an hour? forever?)
- You need to build some daemon-like garbage collection code to clean up once that lifetime is reached (even if you provide a DELETE method the user may fail to call it)
- You may need some admin tools to let you see how many query resources you have, rates of growth, etc. etc.
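The lifetime and garbage-collection chores might look roughly like this. This is a toy in-memory sketch: QueryStore, put and sweep are hypothetical names, and a real reaper would run on a timer or cron job rather than being called by hand.

```python
import time

class QueryStore:
    """Toy store for query resources with a TTL-based reaper."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._queries = {}  # query_id -> (created_at, body)

    def put(self, query_id, body, now=None):
        self._queries[query_id] = (time.time() if now is None else now, body)

    def sweep(self, now=None):
        """Delete queries older than the TTL; return how many were removed."""
        now = time.time() if now is None else now
        expired = [qid for qid, (t, _) in self._queries.items() if now - t > self.ttl]
        for qid in expired:
            del self._queries[qid]
        return len(expired)

store = QueryStore(ttl_seconds=3600)
store.put("q1", {"servers": [1]}, now=0)
store.put("q2", {"servers": [2]}, now=3000)
removed = store.sweep(now=4000)  # q1 is 4000 s old, past the 3600 s TTL
```

Counting what sweep removes over time also gives you the beginnings of the admin visibility (growth rates, store size) mentioned above.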

Rolph H

Mar 9, 2015, 6:24:40 AM
to api-...@googlegroups.com
I like the idea of creating a query resource and requesting the results, however it seems a bit too complicated for the consumer and, as Andrew pointed out, also has some impact on the backend which is not really needed (storing the queries etc.).

The queries won't be requested often (maybe a few times in a very long period) and are mostly unique. Caching is not really needed (or even possible over very long periods).

I just read that Dropbox also switched from GET to POST (http://evertpot.com/dropbox-post-api/) and someone mentioned the REPORT method. Any opinions on that as a possibility? In my opinion it's not really an option, as it is hardly used.

While thinking it over, I realised the calculation might take a long time in quite a few situations, and we might want to make it async, so your pattern might be the way to go.
 
What would you do if you only needed the usage of a single resource over a certain period of time? Just implement a GET /servers/{id}/usage with query params as well, or stick to one implementation?




On Friday, March 6, 2015 at 18:00:12 UTC+1, Mike Amundsen wrote:

Thomas Lörcher

Mar 9, 2015, 5:59:10 PM
to api-...@googlegroups.com
Hi Darrel,

you're right, I exaggerated the issue about caching query parameters. I think you can expect caches to handle that appropriately.
But I think the order is still an obstacle. You cannot expect caches to handle it that way. Yes, you can implement and configure
your own cache to do that, but since the REST approach says you never know where there will actually be a cache (probably there's a cache at your client's
proxy), you cannot generally expect this.

But I agree with Andrew that there are some shortcomings, namely the storage of the query object. As an architect you have to weigh
whether it's worth doing in your specific scenario.

Regards Thomas

Rolph H

Mar 15, 2015, 3:23:00 AM
to api-...@googlegroups.com

Any advice on the last question:

What would you do if you only needed the usage of a single resource over a certain period of time? Just implement a GET /servers/{id}/usage with query params as well, or stick to one implementation?



On Monday, March 9, 2015 at 11:24:40 UTC+1, Rolph H wrote:

Jack Repenning

Mar 16, 2015, 1:02:36 PM
to api-...@googlegroups.com
On Mar 15, 2015, at 12:23 AM, Rolph H <merli...@gmail.com> wrote:

What would you do if you only needed the usage of a single resource over a certain period of time? Just implement a GET /servers/{id}/usage with query params as well, or stick to one implementation?

At one point in your discussion, you suggest that retrieving the data might take an inconveniently long time. If that applies, then it would trump all these other considerations. Assuming you can't do something in the back end to make it faster, you should seriously consider the "quickly create a query object that supports polling for progress and eventual results" approach.

-- 
Jack Repenning
Repenni...@gmail.com
