How to handle a GET request when the URI is too long? (For bulk "get" requests)


Joe Public

Aug 21, 2013, 9:26:12 PM
to api-...@googlegroups.com
Hello,

We have a GET request that passes in multiple IDs at once (in a delimited list). The service is used when a client would like to do a bulk "get", passing in many IDs at once (perhaps hundreds), and the resulting URI is occasionally so long that it gets truncated.

I believe the solution to this problem is to use a POST instead.  With that in mind, I'd like to know if there are any other "best practices" or guidance for using a POST in this manner.
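
For illustration, here is a minimal sketch of what such a POST-based bulk "get" might look like from the client side; the /items/bulk endpoint and the JSON body shape are assumptions, not an established convention:

```python
# Minimal client sketch: send the ID list in a POST body instead of the URI.
# The endpoint name and body shape are assumptions for illustration only.
import requests

ids = [f"id-{n}" for n in range(500)]  # hundreds of IDs, too long for a URI

response = requests.post(
    "https://api.example.com/items/bulk",  # hypothetical bulk "get" endpoint
    json={"ids": ids},                     # the body has no practical length limit
    timeout=30,
)
response.raise_for_status()
items = response.json()
```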

Thanks,

Joe

Peter Monks

Aug 22, 2013, 12:34:47 AM
to api-...@googlegroups.com
G'day Joe,

HTTP status code 413 (Request Entity Too Large) or 414 (Request-URI Too Long) is pretty good for the initial GET, with documentation that describes what the client should do next (i.e. POST with a specifically formatted message body).
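
As a rough sketch of that pattern, assuming a Flask service; the endpoint names and the 2,000-character threshold are only illustrative:

```python
# Sketch: answer oversized GETs with 414 and point clients at a POST fallback.
# Endpoint names and the length threshold are illustrative assumptions.
from flask import Flask, jsonify, request

app = Flask(__name__)
MAX_QUERY_LENGTH = 2000  # arbitrary threshold for illustration

@app.route("/items", methods=["GET"])
def get_items():
    if len(request.query_string) > MAX_QUERY_LENGTH:
        return jsonify({
            "error": "Request-URI Too Long",
            "hint": "POST the ID list to /items/bulk instead",
        }), 414
    ids = request.args.get("ids", "").split(",")
    return jsonify(fetch(ids))

@app.route("/items/bulk", methods=["POST"])
def bulk_get_items():
    ids = request.get_json()["ids"]
    return jsonify(fetch(ids))

def fetch(ids):
    return [{"id": i} for i in ids]  # placeholder for the real lookup
```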

Cheers,
Peter

Philippe Mougin

Aug 22, 2013, 3:40:02 AM
to api-...@googlegroups.com
An interesting alternative to POST in this case is to use HTTP pipelining (i.e., pipeline as many GETs as you have IDs).

Best,

Philippe Mougin

Philip Nelson

Aug 22, 2013, 7:14:52 AM
to api-...@googlegroups.com
The best answer seems to be out of our control: use a request body with the GET, though I've heard that some servers/frameworks do support this. It is a fairly common scenario. Consider the intersection of two sets, where one set is provided by an external system: the operation is idempotent, but way beyond the designed use cases for a query string.
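
For what it's worth, here is a sketch of a GET carrying the ID set in a body; the /items/intersection endpoint is hypothetical, and many servers, frameworks, and intermediaries ignore or reject GET bodies, so this isn't reliably interoperable:

```python
# Sketch only: a GET carrying the ID set in the request body. Many servers and
# proxies drop or reject GET bodies, so treat this as non-portable.
import requests

ids = ["a1", "b2", "c3"]  # the externally supplied set
response = requests.request(
    "GET",
    "https://api.example.com/items/intersection",  # hypothetical endpoint
    json={"ids": ids},
    timeout=30,
)
print(response.status_code)
```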

You run into this with MongoDB and Apache Solr pretty often. The solutions I've seen used are to increase the allowable request size (recompiling Apache) or to switch to POST. I'm curious about the pipelining suggestion; I'm not sure how it would work, but it's interesting.

Joe Public

Aug 22, 2013, 9:17:16 AM
to api-...@googlegroups.com
Philippe,

Could you share more information about the HTTP pipelining approach?  I would like to learn more.

Thanks,

Joe

Steve Marshall

Aug 22, 2013, 9:52:48 AM
to api-...@googlegroups.com
If I understand correctly, pipelining would essentially be “send n GETs in quick succession over the same HTTP connection without waiting for each response, then collect the responses (which come back in request order) into one result”, which would be perfectly fine.
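
A rough sketch of what that could look like at the socket level, assuming the server actually supports HTTP/1.1 pipelining (many servers and intermediaries don't); the host and paths are made up:

```python
# Sketch of HTTP/1.1 pipelining over a raw socket: all GETs are written
# back-to-back on one connection before any response is read, and the
# responses arrive in the same order. Host and /items/{id} paths are made up.
import socket

HOST = "api.example.com"   # hypothetical host
ids = [101, 102, 103]

pipelined = b"".join(
    (
        f"GET /items/{item_id} HTTP/1.1\r\n"
        f"Host: {HOST}\r\n"
        "Connection: keep-alive\r\n"
        "\r\n"
    ).encode("ascii")
    for item_id in ids
)

with socket.create_connection((HOST, 80), timeout=5) as sock:
    sock.sendall(pipelined)            # all requests go out without waiting
    raw_responses = b""
    try:
        while True:
            chunk = sock.recv(4096)    # responses arrive in request order
            if not chunk:
                break
            raw_responses += chunk
    except socket.timeout:
        pass                           # crude end-of-data detection for a sketch

print(raw_responses.decode("utf-8", errors="replace"))
```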

In terms of POST, the way I’ve tackled this in the past (and am recommending in an API I’m designing at the moment) is to use parameters called `id[]` (with the `[]` signifying that multiple values are allowed). That way, a bunch of smarter web frameworks automagically treat the `id` (or `id[]`, depending on your framework) request parameter as an array, and if your framework doesn’t, it shouldn’t be horrendously hard to get at all the values yourself. (Though some servers, by default, may keep only the last value.) It also means that, if a GET fails with 413/414, the client should (for a given value of “should”, of course) be able to retry with a POST, simply dumping all the parameters into an x-www-form-urlencoded POST body.
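
A small sketch of that fallback (the /items endpoint is assumed); passing a list makes the parameter repeat in the x-www-form-urlencoded body:

```python
# Sketch: retry the failed GET as a POST with repeated id[] parameters in an
# x-www-form-urlencoded body. The endpoint is an assumption for illustration.
import requests

ids = [101, 102, 103]
response = requests.post(
    "https://api.example.com/items",   # hypothetical endpoint
    data={"id[]": ids},                # encodes as id[]=101&id[]=102&id[]=103
    timeout=30,
)
response.raise_for_status()
```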

I’d also be wary of anything that relies on GET for super-long queries, simply because you can’t guarantee what intermediate proxies and such will do to your query string.

Ultimately, pipelining and POST are the safer options, and the friendliest to your own systems and your consumers’ out of the box. If you want to be really nice to your consumers, support both.

Jørn Wildt

Aug 22, 2013, 10:07:08 AM
to api-...@googlegroups.com
One more thing about the POST solution: you can choose to let the POST operation create a new (temporary) "search result" resource and redirect to that. The following GET on that resource can then be cached in various proxies etc.
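
A minimal sketch of that POST-then-redirect pattern, assuming a Flask service; the /item-searches endpoints, the request body format, the in-memory store, and the cache lifetime are illustrative assumptions:

```python
# Sketch: the POST creates a temporary "search result" resource and redirects
# to it; the follow-up GET is cacheable. Names and storage are assumptions.
import uuid
from flask import Flask, jsonify, redirect, request, url_for

app = Flask(__name__)
searches = {}  # temporary "search result" resources, keyed by a generated id

@app.route("/item-searches", methods=["POST"])
def create_search():
    ids = request.form.getlist("id[]")          # bulk list of IDs from the body
    search_id = uuid.uuid4().hex
    searches[search_id] = lookup_items(ids)     # hypothetical bulk lookup
    # 303 See Other: the client should GET the newly created resource
    return redirect(url_for("get_search", search_id=search_id), code=303)

@app.route("/item-searches/<search_id>", methods=["GET"])
def get_search(search_id):
    response = jsonify(searches[search_id])
    response.headers["Cache-Control"] = "public, max-age=300"  # proxy-cacheable
    return response

def lookup_items(ids):
    return [{"id": i} for i in ids]  # placeholder for the real bulk fetch
```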

/Jørn

Kevin Swiber

Aug 22, 2013, 10:56:09 AM
to api-...@googlegroups.com
+1 to Jørn's advice.  This also allows interesting scenarios like breaking the result of the bulk fetch into pages.  Your response to the initial GET can offer links to the individual pages, and the client can even fetch all pages simultaneously.  Not exactly BitTorrent, but it offers some benefits.
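
For example, the representation of the temporary search resource might carry the page links directly; the field names and URI layout here are just an assumption:

```python
# Hypothetical paged representation of the temporary search resource; the
# field names and URI layout are assumptions for illustration only.
paged_search_result = {
    "self": "/item-searches/abc123?page=2",
    "prev": "/item-searches/abc123?page=1",
    "next": "/item-searches/abc123?page=3",
    "total_items": 250,
    "items": [{"id": 41}, {"id": 42}],  # this page's slice of the bulk fetch
}
```
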
Kevin Swiber
Projects: https://github.com/kevinswiber
Twitter: @kevinswiber

John Watson

Aug 22, 2013, 11:01:02 AM
to api-...@googlegroups.com
With solutions like this, how do people store the temporary results, in general?  Memory? Some datastore with built-in expiration features? Also, how is authorization handled on the temporary URIs? 

John

Kevin Swiber

Aug 22, 2013, 11:21:27 AM
to api-...@googlegroups.com
Well, that depends on the size of the representation, the hardware available, and/or the lifetime of that representation.

If it needs to be fast, I'd probably generate the response and use HTTP caching semantics to instruct a caching reverse proxy to store the results.  The storage mechanism for on-premise versions of tools like this is often configurable.  With Varnish[1], for instance, in-memory (malloc) and file-based (file) storage are configurable options.  Nginx[2] has extensibility around this and can even work with Memcached[3].

Otherwise, temporary storage in Your Favorite Database would work.  Both Memcached and Redis have auto-expire features.  I'm not sure about others.
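
As a small illustration of the auto-expire approach with Redis (the key format, TTL, and serialized payload are assumptions):

```python
# Sketch: store a temporary search result in Redis with an automatic expiry.
# Key format, TTL, and the serialized payload are assumptions for illustration.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

search_id = "abc123"
result = [{"id": 1}, {"id": 2}]

# SETEX stores the value and expires it automatically after 300 seconds.
r.setex(f"search:{search_id}", 300, json.dumps(result))

cached = r.get(f"search:{search_id}")
if cached is not None:
    items = json.loads(cached)
```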

