Solutions for returning huge API responses


API Dev

Oct 6, 2017, 7:44:27 AM
to API Craft

Hi,


We need to expose multiple internal APIs that must return large data sets to the API caller. Some of the solutions we thought of are below:

  1. Splitting the APIs at the conceptual level
  2. Pagination
    - Neither of the first two approaches helps, because the backend services the API calls to generate the data sets do not support these concepts; they fetch the data in one go.
  3. Stateful API - Fetch the data from the backend services and store it in a file. Provide a result identifier on the first API call and return the data in batches on subsequent calls. This did not look like a good solution, since it tightly couples the API calls and increases complexity.
  4. Pushing the file through FTP - Fetch the data from the backend services and store it in a file. This file is then pushed to the configured FTP server before returning a success HTTP status code (200) to the user. There are networking and operational implications to this approach, given that there are multiple deployments of the web service. It also does not seem conceptually right to push the file to FTP when the API is supposed to return the data. Let me know if you have inputs/thoughts here.


What are some of your suggestions for designing such an API, given that the backend services fetch the data in one go and cannot be changed?


Thanks.

Jørn Wildt

Oct 6, 2017, 7:49:54 AM
to api-...@googlegroups.com
What is the use case for your clients? Do they require all of the data in one request anyway? Or do they need to filter and/or paginate on top of what the backend returns?

/Jørn


API Dev

Oct 6, 2017, 7:59:09 AM
to API Craft
It does not matter to the clients whether the data is returned in batches or in one go. It's just that, on the server side, I cannot fetch the data in batches. The clients need to show all this data in a nice dashboard/report. They do not have a requirement to show this data in real time or near real time.

Thanks.

Jørn Wildt

Oct 6, 2017, 8:05:03 AM
to api-...@googlegroups.com
> It does not matter to the clients if the data is returned in batches or in one go

Then why complicate matters with a stateful/batched API or FTP? Why not just return all the data in one go?

/Jørn


API Dev

Oct 6, 2017, 8:51:03 AM
to API Craft
The problem is that the data is too large to keep in the app server's memory.

Andrew B

Oct 6, 2017, 2:15:57 PM
to API Craft
At the back end, pull the data all in one go as you have to, and then store it in a database.

Then have your APIs serve data out of that database with pagination or whatever.
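
Something like this, as a minimal sketch (SQLite and a hypothetical fetch_everything_from_backend() stand in for whatever your real staging store and backend call are):

    import sqlite3

    # Staging step: dump the one-shot backend result into a table once,
    # then let the API paginate over the staged copy at its leisure.
    conn = sqlite3.connect("staging.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS report (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO report (payload) VALUES (?)",
        ((row,) for row in fetch_everything_from_backend()),  # hypothetical helper
    )
    conn.commit()

    def get_page(page_number, page_size=100):
        # Simple offset pagination over the staged copy.
        cur = conn.execute(
            "SELECT payload FROM report ORDER BY id LIMIT ? OFFSET ?",
            (page_size, page_number * page_size),
        )
        return [row[0] for row in cur.fetchall()]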

mca

Oct 6, 2017, 2:29:10 PM
to api-...@googlegroups.com
The stateless solution for passing large responses in multiple bodies is HTTP Chunked Encoding.


There are a handful of examples if you search the term.

IIRC, the browser will handle this w/o any trouble (i.e. no code needed). If you're writing a native/desktop app, you might need to check the docs for the HTTP support library you are using.

I'd use the spec'd solution before doing your #3 or #4.
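
A minimal sketch in Python/Flask (my framework choice, not anything the spec mandates): returning a generator with no Content-Length makes a typical HTTP/1.1 server send the body with Transfer-Encoding: chunked.

    from flask import Flask, Response

    app = Flask(__name__)

    def generate_rows():
        # Hypothetical data source; in practice you'd stream from the
        # backend result instead of materializing everything in memory.
        for i in range(1_000_000):
            yield '{"row": %d}\n' % i

    @app.route("/report")
    def report():
        # No Content-Length is set, so the body goes out in chunks.
        return Response(generate_rows(), mimetype="application/x-ndjson")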




API Dev

Oct 9, 2017, 7:26:56 AM
to API Craft
Thanks for the inputs

@Andrew - I wanted to run through the flow of the paginated response approach:
1. On the first API call, the server would fetch the data and store it in either a database or a file.
2. The server would return the first batch along with a unique identifier. This identifier needs to be passed by the client on subsequent calls; it helps the server distinguish between concurrent requests.
3. Along with the identifier, the server would also pass a flag indicating whether there is more data. This helps the client determine if more calls need to be made.
4. The client keeps making API calls, passing the unique identifier, until the flag returned on the previous call is false (a sketch of this loop is below).
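
A minimal sketch of the client loop, assuming a hypothetical endpoint and field names (items, resultId, hasMore):

    import requests

    REPORT_URL = "https://api.example.com/report"  # hypothetical endpoint

    def fetch_all():
        items, result_id, more = [], None, True
        while more:
            params = {"resultId": result_id} if result_id else {}
            body = requests.get(REPORT_URL, params=params).json()
            items.extend(body["items"])
            result_id = body["resultId"]  # identifies this export on the server
            more = body["hasMore"]        # False once the stored data is drained
        return items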

Let me know if you have more comments.

@Mike Amundsen - 

The chunked encoding approach sounds similar to using the "Range" and "If-Range" headers. I will review them and let you know if I have questions.
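
For my own notes, a sketch of what such a range request might look like from the client (placeholder URL and ETag):

    import requests

    resp = requests.get(
        "https://api.example.com/reports/12345",  # placeholder URL
        headers={
            "Range": "bytes=0-1048575",           # ask for the first 1 MiB
            # If-Range: only honor the Range if the resource is unchanged;
            # otherwise the server replies 200 with the full body, not 206.
            "If-Range": '"etag-from-a-previous-response"',
        },
    )
    print(resp.status_code)                   # 206 Partial Content if honored
    print(resp.headers.get("Content-Range"))  # e.g. bytes 0-1048575/52428800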

Thanks!

Andrew Braae

Oct 9, 2017, 2:09:46 PM
to api-...@googlegroups.com
Without having studied the chunked encoding approach that Mike has highlighted, I think it's intended for the case where the client always wants the entire response. (I could be wrong though; I often am!)

Whereas pagination (basically what you are describing) is flexible, e.g. the client navigates steadily through the first third of the "response" until the end user has seen enough.

So which you choose would depend on the nature of your client.


Henry Andrews

Oct 19, 2017, 6:07:00 PM
to API Craft
I would recommend going with pagination and using intermediate storage (file, database, whatever) to deal with the back-end limitations. So a combination of #2 and #3.

Semantic pagination, as used by Stripe (https://stripe.com/docs/api#pagination), makes it easier to manage paging through a collection. If you use simple integer offsets and the collection contents change while you are paging through, you may end up off by one (or more) in either direction, resulting in duplicated or missing items between pages.
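
A minimal sketch of cursor-style paging over a SQL table, loosely modeled on Stripe's starting_after parameter (table and column names are made up):

    def get_page(conn, starting_after=None, limit=100):
        # The cursor is the id of the last item seen, so inserts or deletes
        # elsewhere in the collection cannot shift the window the way
        # integer offsets can.
        if starting_after is None:
            cur = conn.execute(
                "SELECT id, payload FROM report ORDER BY id LIMIT ?",
                (limit,))
        else:
            cur = conn.execute(
                "SELECT id, payload FROM report WHERE id > ? ORDER BY id LIMIT ?",
                (starting_after, limit))
        return cur.fetchall()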

You can send HTTP Link headers with relation types of "prev" and "next" to give the client the URIs of the next (or previous) page so that they do not have to directly understand your pagination mechanism.  When there are no more results, simply do not include a "next" link.  That tells the client that no "next" page exists, so they must be at the end of the collection.
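
And a sketch of emitting the Link header from a handler, with a hypothetical load_page() helper over whatever intermediate storage you choose:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/report")
    def report():
        cursor = request.args.get("starting_after")
        items, next_cursor = load_page(cursor)  # hypothetical helper
        resp = jsonify(items)
        if next_cursor is not None:
            # RFC 8288 web linking; omitting this header signals the last page.
            resp.headers["Link"] = (
                '<https://api.example.com/report?starting_after=%s>; rel="next"'
                % next_cursor)
        return resp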

thanks,
-henry

API Dev

Nov 11, 2017, 11:54:44 PM
to API Craft
Thanks for the replies. We went ahead and implemented pagination by accepting a byte range as an HTTP request parameter; on every response, the server returns the next byte offset for the client to request. This works for us and suits our current client requirements. We are now facing another problem in the implementation.
The back-end service collects all the information and saves it to a temporary file. This file is read by the REST API and returned to the client in chunks. For simplicity, the temp file is generated on the first API call, which takes so long that the web server times out. Increasing the timeout is the last option we want to look at. I am not sure whether adding multi-threading would complicate the implementation, i.e. the main thread listens for a callback from the file-generation thread and returns the chunked API response to the client in parallel. I wanted to check if there is a simpler workaround.
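
For context, the byte-range read itself is roughly this (1 MiB is an assumed batch size):

    import os

    CHUNK_SIZE = 1 << 20  # 1 MiB per response; an assumed batch size

    def read_chunk(path, offset):
        """Return (data, next_offset) for the temp file; next_offset is
        None once the client has reached the end of the file."""
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(offset)
            data = f.read(CHUNK_SIZE)
        next_offset = offset + len(data)
        return data, (next_offset if next_offset < size else None)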

What are the standard ways/ideas of dealing with this?

Thanks.

Parambir Singh

Nov 15, 2017, 4:58:12 PM
to API Craft
One option in such cases is to make the initial call asynchronous. The first call fires off a task to generate the temporary file (in another thread) and returns a task id to the caller immediately. The caller can subsequently use the task id to fetch paginated data, so the subsequent calls need two input parameters: the task id returned originally, and the byte range you use for pagination. The client may need to retry the call until the first page is available, due to the time consumed by the file-generation task.
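
A rough sketch of that handshake, with a hypothetical generate_temp_file() for the slow backend pull and an in-memory task table (multiple server deployments would need shared storage instead):

    import threading
    import uuid

    tasks = {}  # task_id -> {"status": ..., "path": ...}; in-memory only

    def start_export():
        # First call: kick off file generation and return immediately.
        task_id = str(uuid.uuid4())
        tasks[task_id] = {"status": "running", "path": None}

        def work():
            path = generate_temp_file()  # hypothetical long-running pull
            tasks[task_id].update(status="done", path=path)

        threading.Thread(target=work, daemon=True).start()
        return task_id

    def get_page(task_id, offset):
        # Subsequent calls: task id plus byte offset, as described above.
        task = tasks.get(task_id)
        if task is None or task["status"] != "done":
            return None  # caller should retry later (e.g. map to HTTP 202)
        return read_byte_range(task["path"], offset)  # hypothetical helper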

Thanks
Param