GET w/too large query vs. POST returning resource representation


Mike Schinkel

Sep 30, 2012, 11:13:38 PM
to api-...@googlegroups.com
Hi all,

I'm hoping you can help me validate the following API workflow and/or suggest better options.

Let's say I'm working on a server API that generates reports from complex criteria passed to the API.  If the criteria were relatively simple we might just do a GET on a report URL, like this:

GET /report?criteria1=value1&criteria2=value2&…criteriaN=valueN HTTP/1.1
      
However, if the criteria could easily be longer than 2000 characters, then GET isn't really an option, so we can look at POST instead:

POST /report HTTP/1.1
   
criteria1=value1&criteria2=value2&…criteriaN=valueN 
   
However, I came across this from Roy Fielding:

It isn’t RESTful to use POST for information retrieval when 
that information corresponds to a potential resource, because 
that usage prevents safe reusability and the network-effect of 
having a URI.
-----

So it would seem that I would be better doing the following:

POST /report/new HTTP/1.1
   
criteria1=value1&criteria2=value2&…criteriaN=valueN 
   
The server could then create a new report and locally store the JSON for a timeout period. The POST would then respond with a 303 "See Other" status code and a location header like this:

Location: /report/{report_guid}

Then the following GET would return the report's JSON as its response representation:

GET /report/{report_guid} HTTP/1.1
   
Then, after the timeout period, GETting the report URL would be invalid, and the response should tell the client to resubmit the POST along with the previously submitted criteria, which the API no longer knows; that is, the client will need to have remembered it.
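Sketched in Python, the whole flow above might look like the following (the function names, in-memory store, and one-hour timeout are all illustrative, not part of any real framework):

```python
import time
import uuid

# In-memory store: report_guid -> (report JSON, creation time)
REPORTS = {}
TIMEOUT_SECONDS = 3600  # hypothetical retention period

def post_report(criteria: dict):
    """Handle POST /report/new: build the report, store it, redirect."""
    guid = str(uuid.uuid4())
    result = {"criteria": criteria, "rows": []}  # stand-in for the real report JSON
    REPORTS[guid] = (result, time.time())
    # 303 "See Other" tells the client to GET the new resource
    return 303, {"Location": f"/report/{guid}"}, None

def get_report(guid: str):
    """Handle GET /report/{report_guid}."""
    entry = REPORTS.get(guid)
    if entry is None or time.time() - entry[1] > TIMEOUT_SECONDS:
        REPORTS.pop(guid, None)
        # Expired or unknown: the client must re-POST the criteria it remembered
        return 404, {}, {"error": "report expired, re-submit criteria"}
    return 200, {"Content-Type": "application/json"}, entry[0]
```

The open question below is exactly which status code the expired branch should use; 404 here is just a placeholder.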
   
So, if all that makes sense, what should my HTTP response code be on the last GET after the timeout period, and how should I tell the client to resubmit the previous report criteria that it needed to remember?

- 205 "Reset Content" with a Location header?  
- 302 "Found" with a Location header?  
- 307 "Temporary Redirect" with a Location header (ignoring that the spec says to keep the same HTTP Verb)?
- A different status code with a Location header?  
- A different status code with the URL in the body?
- Something else entirely? 

Thanks in advance for helping me think this one through.

-Mike

mca

Sep 30, 2012, 11:27:39 PM
to api-...@googlegroups.com
typically my pattern for handling requests w/ large filter criteria is...

1) allow client to formulate the filter criteria (usually using some hypermedia control(s) that provide options, suggested values, etc.)
2) instruct the client to POST the criteria to the server thereby creating a stored criteria resource
3) server then returns either a 3xx with instructions to do a GET on the returned URI, or a 201 w/ the created URI
4) client takes the returned URI and executes the GET to return the results of the filter.

usually steps 3 & 4 are seamless to users (i.e. no need to "re-request" the filter results, just handle the redirection)

note i do not store the executed results, just the filter. the execution of the filter happens when the client does a GET on the appropriate URI.

I also provide a URI that returns the list of available stored filters (in some cases shared among users, too).
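A rough sketch of the stored-filter variant described above: persist only the criteria and execute them on each GET. The store, hashing scheme, and function names are made up for illustration:

```python
import hashlib
import json

FILTERS = {}  # filter_id -> criteria dict

def store_filter(criteria: dict):
    """POST the criteria, creating a stored-filter resource (step 2)."""
    # Hash the canonical form so identical criteria map to the same URI
    filter_id = hashlib.sha256(
        json.dumps(criteria, sort_keys=True).encode()
    ).hexdigest()[:12]
    FILTERS[filter_id] = criteria
    return 201, {"Location": f"/filters/{filter_id}"}

def run_filter(filter_id: str):
    """GET the filter URI; execution happens now, not at POST time (step 4)."""
    criteria = FILTERS.get(filter_id)
    if criteria is None:
        return 404, None
    return 200, {"matched": sorted(criteria)}  # stand-in for real execution

def list_filters():
    """The extra URI mentioned above: enumerate the stored filters."""
    return sorted(FILTERS)
```

Because the filter id is content-derived, two users who POST the same criteria end up sharing one filter resource.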


Mike Schinkel

Sep 30, 2012, 11:43:24 PM
to api-...@googlegroups.com
Hi Mike,

On Sep 30, 2012, at 11:27 PM, mca <m...@amundsen.com> wrote:
typically my pattern for handling requests w/ large filter criteria is...

Thanks for your feedback. 

1) allow client to formulate the filter criteria (usually using some hypermedia control(s) that provide options, suggested values, etc.)
2) instruct the client to POST the criteria to the server thereby creating a stored criteria resource

Yep, that follows with what I have below.

3) server then returns either a 3xx with instructions to do a GET on the returned URI, or a 201 w/ the created URI

201 is a choice I hadn't considered. Reading about it, it does seem like a good choice.  Still, pros and cons of 201 vs. 303?

4) client takes the returned URI and executes the GET to return the results of the filter.

usually steps 3 & 4 are seamless to users (i.e. no need to "re-request" the filter results, just handle the redirection)

That makes sense.

note i do not store the executed results, just the filter. the execution of the filter happens when the client does a GET on the appropriate URI.

Hmm. In my case the processing of the criteria is very time-consuming (maybe up to 10 seconds in the worst case) and it might get called several times, so I would assume that caching on the server for a reasonable period of time would make sense?

Also, let's assume that this service could have so many clients that the cost of storage becomes a concern (yes, it's not unthinkable for this.) In that case I wouldn't want to store every filter criteria ever sent, and since I can't depend on clients to delete them once they are done with them, I'm still left with the problem: how best to tell the client that the report is no longer available, and what guidance, if any, should I provide to help the client resubmit?

I also provide a URI that returns the list of available stored filters (in some cases shared among users, too).

Yes, if I decide to keep them on the server that would be nice.

-Mike

mca

Sep 30, 2012, 11:53:31 PM
to api-...@googlegroups.com
if the processing time for request is long, then consider using 202 accepted as a response (not 201/303).

if you want to optimize the experience, you can use caching directives on the execution results including weak tags that would allow multiple users who share the same plan to share the same results.
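One concrete way to realize those caching directives, sketched in Python. The header values and the 300-second max-age are illustrative; the point is that a weak ETag derived from the filter lets every client sharing that filter validate against the same cached results:

```python
import hashlib

def caching_headers(filter_id: str, max_age: int = 300):
    """Build response headers so clients sharing a filter share cached results.

    The weak ETag (W/"...") claims semantic equivalence, not byte identity,
    which is good enough for a report that may be re-rendered each time.
    """
    etag = 'W/"' + hashlib.sha256(filter_id.encode()).hexdigest()[:16] + '"'
    return {"ETag": etag, "Cache-Control": f"public, max-age={max_age}"}

def not_modified(request_etag: str, filter_id: str) -> bool:
    """True if the client's If-None-Match matches, i.e. respond 304."""
    return request_etag == caching_headers(filter_id)["ETag"]
```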

as for storage, you're on your own there. FWIW, it's likely cheaper to store filter statements than completed reports. 
  

mcx

Mike Schinkel

Oct 1, 2012, 12:04:32 AM
to api-...@googlegroups.com
On Sep 30, 2012, at 11:53 PM, mca <m...@amundsen.com> wrote:
if the processing time for request is long, then consider using 202 accepted as a response (not 201/303).

Interesting, thanks again.  Does a 202 still respond with a Location header for a later GET?

if you want to optimize the experience, you can use caching directives on the execution results

By caching directives do you mean response headers?  

 including weak tags that would allow multiple users who share the same plan to share the same results.

I'm not familiar with "weak tags"?

FWIW, it's likely cheaper to store filter statements than completed reports. 

True.  

-Mike



mca

Oct 1, 2012, 12:07:11 AM
to api-...@googlegroups.com
check out RFC2616 for details on 202 responses and weak etags (left out the "e" in my last post).

Mike Schinkel

Oct 1, 2012, 12:42:58 AM
to api-...@googlegroups.com
On Oct 1, 2012, at 12:07 AM, mca <m...@amundsen.com> wrote:
check out RFC2616 for details on 202 responses and

Thanks.  From RFC2616:
The entity returned with this response SHOULD include an indication of the request's current status and either a pointer to a status monitor or some estimate of when the user can expect the request to be fulfilled.
Any suggestions on what a "status monitor" would be realized as?  A Location header where the URL either returns a 102 Processing or a redirect to the report URL when it is ready?

weak etags (left out the "e" in my last post).

Ah, thanks.  Yes it was the missing 'e' that stumped me.  

-Mike

mca

Oct 1, 2012, 1:07:04 AM
to api-...@googlegroups.com
"Any suggestions on what a "status monitor" would be realized as?  A Location header where the URL either returns with a 102 Processing or a redirect to the report URL when it is ready?"

i simply return a body that contains a link that points to a resource that 1) represents the resource that will eventually be created (often this will include some progress report on the expected resource), or 2) represents a processor of pending requests showing each one in flight along w/ stats/details of progress on each item, etc.

usually this is defined in my media type details for the service/api set.
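As a sketch, a 202 body along those lines might look like the following. The field names (`status`, `progress`, `links`, `rel`) are invented for illustration; as noted above, a real service would define them in its media type:

```python
def accepted_response(job_id: str, percent_done: int):
    """Body for a 202 Accepted: links to the pending resource plus progress."""
    return {
        "status": "pending" if percent_done < 100 else "done",
        "progress": percent_done,
        "links": [
            # where the report will appear once processing finishes
            {"rel": "result", "href": f"/report/{job_id}"},
            # the status-monitor resource the client can poll meanwhile
            {"rel": "monitor", "href": f"/report/{job_id}/status"},
        ],
    }
```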

Mike Schinkel

Oct 1, 2012, 1:29:01 AM
to api-...@googlegroups.com
On Oct 1, 2012, at 1:07 AM, mca <m...@amundsen.com> wrote:
"Any suggestions on what a "status monitor" would be realized as?  A Location header where the URL either returns with a 102 Processing or a redirect to the report URL when it is ready?"

i simply return a body that contains a link that points to a resource that 1) represents the resource that will eventually be created (often this will include some progress report on the expected resource), or 2) represents a processor of pending requests showing each one in flight along w/ stats/details of progress on each item, etc.

usually this is defined in my media type details for the service/api set.

So ultimately beyond the scope of HTTP 1.1, right?  Well, it works. :)

I *really* appreciate your time on this one tonight.  It's a side project I'm working on that, if realized, might be of interest to some on this list. Wish me luck.

-Mike

Glenn Block

Oct 1, 2012, 1:51:10 AM
to api-...@googlegroups.com
Yep, I do the same. 202 is perfect for whenever there is some potentially delayed or long running process. Once a client gets the location header they start polling the resource until they get a response.
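That polling loop can be sketched like this, against a hypothetical `fetch(url)` callable that returns a status code and body (stubbed here rather than a real HTTP call):

```python
import time

def poll_until_ready(fetch, url, interval=1.0, max_tries=30):
    """GET the Location URI repeatedly until it stops answering 202."""
    for _ in range(max_tries):
        status, body = fetch(url)
        if status != 202:          # e.g. 200 = report ready, 4xx = give up
            return status, body
        time.sleep(interval)
    raise TimeoutError(f"{url} still pending after {max_tries} tries")
```

A real client would also want backoff and should honor any Retry-After header the server sends, but the shape of the loop is the same.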

Mike Schinkel

Oct 1, 2012, 2:02:47 AM
to api-...@googlegroups.com
On Oct 1, 2012, at 1:51 AM, Glenn Block <glenn...@gmail.com> wrote:
> Yep, I do the same. 202 is perfect for whenever there is some potentially delayed or long running process. Once a client gets the location header they start polling the resource until they get a response.

Cool. Thanks!

-Mike

Darrel Miller

Oct 1, 2012, 9:53:33 AM
to api-...@googlegroups.com
Mike,

On Sun, Sep 30, 2012 at 11:13 PM, Mike Schinkel <mikesc...@me.com> wrote:
>
> So it would seem that I would be better doing the following:
>
> POST /report/new HTTP/1.1
>
> criteria1=value1&criteria2=value2&…criteriaN=valueN
>
>
> The server could then create a new report and locally store the JSON for a
> timeout period. The POST would then respond with a 303 "See Other" status
> code and a location header like this:
>
> Location: /report/{report_guid}
>
>
> Then the following GET would return the report's JSON as it's response
> representation:
>
> GET /report/{report_guid} HTTP/1.1
>
>

The interesting part of this pattern for me, is the server has the
option to be intelligent about the URL that it returns in the location
header. If the query string does fit in the URL then it can just put
the parameters as query string params. Or you could invent some
simple query string minification mechanism. Or the server could
create a transient resource to the filter, or a transient resource to
the store results.

The key thing is that the client doesn't care. The client retrieves
the URL from the location header and does a GET. The parameters in
the URL are the necessary breadcrumbs to allow the server to return
the right results. How it does that and what that URL looks like are
irrelevant to the client.

Which means the behaviour of the server can change over time. Today,
you might know that none of the filter criteria that exist will exceed
2000 chars, but you know it will happen in the future. So for the
moment, just return a query string. When it does become an issue,
change your server behaviour, because the client will just continue
doing its thing.
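That server-side choice can be sketched in a few lines; the 2000-character limit, store, and function name below are illustrative, and the point is that the client only ever sees an opaque Location URL:

```python
import uuid
from urllib.parse import urlencode

TRANSIENT = {}     # stored criteria for over-long queries
URL_LIMIT = 2000   # conservative maximum URL length

def location_for(criteria: dict) -> str:
    """Pick the Location URL: a plain query string if it fits, otherwise a
    transient stored-filter resource. The client never needs to know which."""
    url = "/report?" + urlencode(sorted(criteria.items()))
    if len(url) <= URL_LIMIT:
        return url
    key = str(uuid.uuid4())
    TRANSIENT[key] = criteria
    return f"/report/{key}"
```

Swapping in a minification scheme later only changes this function; existing clients keep following the Location header unchanged.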

I use this POST 303 GET mechanism in my own software, and so far I
have not found a need to store temporary filters. I believe this is
because the user is able to create their own filters. Most of the
time, queries are either based on an existing saved filter, or a saved
filter plus one or two modifications. It is very rare that someone
would create a filter condition with many criteria and not want to
actually save that filter. But maybe that's just my domain. Or maybe
its because I don't use a web browser, so I can have URIs up to 64K
:-)

Darrel

Mike Schinkel

Oct 1, 2012, 7:22:10 PM
to api-...@googlegroups.com
On Oct 1, 2012, at 9:53 AM, Darrel Miller <darrel...@gmail.com> wrote:
The interesting part of this pattern for me,..

Thanks for your comments on this.

is the server has the
option to be intelligent about the URL that it returns in the location
header.  If the query string does fit in the URL then it can just put
the parameters as query string params.  Or you could invent some
simple query string minification mechanism.  Or the server could
create a transient resource to the filter, or a transient resource to
the store results.

Yeah, I agree, and I felt pretty confident about that part of the architecture.

The key thing is that the client doesn't care.  The client retrieves
the URL from the location header and does a GET.  The parameters in
the URL are the necessary breadcrumbs to allow the server to return
the right results.  How it does that and what that URL looks like are
irrelevant to the client.

True about the URL, but unless I misunderstand, the client does care about which HTTP status codes and HTTP headers are returned, and about the resultant workflow actions the client is expected to take to complete its process, correct?  That was what I was trying to flesh out.

Today, you might know that none of the filter criteria that exist will
exceed 2000 chars, but you know it will happen in the future.  So for the
moment, just return a query string.  When it does become an issue,
change your server behavior, because the client will just continue
doing its thing.

Specifically for this use case I know immediately that the filter criteria are likely to exceed 2000 chars.  I used "report" as an example instead of describing what I'm actually doing because it's similar in concept and the report metaphor is very easy to understand. I didn't want to distract people who might spend more time trying to clarify what I was doing or might choose to debate the appropriateness of what I'm building. I wanted to keep the discussion focused on HTTP status codes and HTTP headers.

I use this POST 303 GET mechanism in my own software

303 vs. 201 or 202, as Mike Amundsen suggested?  I thought 303 at first but I'm leaning toward the 20x that Mike recommended.

, and so far I have not found a need to store temporary filters.  

So if you redirect via 303, then you have to store the filters or the results of the filter on the server for at least as long as it takes for the client to redirect, right?  If not, I'm confused. If so, how long do you hold on to it?

I believe this is because the user is able to create their own filters.  

To be clear on this use-case the API client will be submitting a Javascript-based script via HTTP POST to run server-side. The script will have access to a single sandboxed object that provides a simplified interface to calling the resource-based APIs on the server. The script could be used to compose a lot of over-the-web API calls into a single web API call. The results so far are promising.

So that's why it can easily be over 2000 characters.  And yes, users would create those but the users would be API developers.

Most of the time, queries are either based on an existing saved filter,
or a saved filter plus one or two modifications.  

Where is it saved?  On your API server, in a database, using some sort of key/value lookup? Is that because what your users are doing is ad-hoc querying?

It is very rare that someone would create a filter condition with many criteria and not want to
actually save that filter.

In my envisioned use-case, the user/developer would write the Javascript, test it out, and then store it embedded in a PHP, Ruby, Python, or other programming-language file.  So while storing them on the server might be useful for ad-hoc querying, the initial use-case really wouldn't need it.

But maybe that's just my domain.  Or maybe
its because I don't use a web browser, so I can have URIs up to 64K

Yeah, probably. I'm mostly concerned about routers and other intermediaries. But mainly I'm just being super conservative on this. Don't want somebody to hate the approach because they hate the use of large URLs.

Besides, super large URLs can be a real bitch to deal with in server logs, etc. :)

-Mike