Efficient way to collect all Search Resources for a multi-page return?


Ed

Mar 22, 2012, 2:02:38 PM
to Otter API to Topsy
I am interested in collecting 100 returns per page and up to 10 pages
of returns from a Topsy Otter Search Resource. So, I would be
collecting somewhere between 0 and 10*100 returns ("tweets").

There is probably a more efficient way to do this than my current
approach, which is:

1) do an initial Search request and read the "total" field in the JSON
"response" field. I use this to calculate how many pages I need to
request ( numPages = min( ceil("total" / 100), 10 ) )

2) then do a separate call for each page number in [1, numPages] and
collect the data I need.
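In rough Python, the two steps look something like this. Note the endpoint URL and the `q`/`page`/`perpage` parameter names are my assumptions about the search resource, not copied from the Otter docs, so adjust them to whatever your requests actually use:

```python
import json
import math
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint and parameter names -- verify against the Otter API docs.
SEARCH_URL = "http://otter.topsy.com/search.json"

def num_pages(total, per_page=100, max_pages=10):
    """Pages needed to cover `total` results, capped at `max_pages`."""
    return min(math.ceil(total / per_page), max_pages)

def fetch_page(query, page, per_page=100):
    """Fetch one page and return the JSON 'response' field (hypothetical call shape)."""
    params = urlencode({"q": query, "page": page, "perpage": per_page})
    with urlopen(f"{SEARCH_URL}?{params}") as resp:
        return json.load(resp)["response"]

def collect_all(query, per_page=100, max_pages=10):
    """Step 1: probe for 'total'; step 2: request each remaining page."""
    first = fetch_page(query, 1, per_page)
    results = list(first["list"])
    for page in range(2, num_pages(first["total"], per_page, max_pages) + 1):
        results.extend(fetch_page(query, page, per_page)["list"])
    return results
```

The slowness comes from the serial round trips: one probe request plus up to ten sequential page requests per query.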

This approach is very slow...

I am thinking that I should instead be using "last_offset" and
"offset", so I do not have to do this pre-check of how many pages to
request and then loop through the page number index, searching one
page at a time...

So, my long-winded question really boils down to:

What is an efficient way to request all Topsy Search results, from 0
(if the query returns none) up to 100 per page * 10 pages = 1000 total
possible results?

Thanks in advance for any suggestions you have for me on this.
Ed


Mehdi Lahmam B.

Mar 25, 2012, 10:54:45 AM
to otte...@googlegroups.com
I use a do-while loop.
First I do a query, and while the results are not empty, I do another one for the next page.

The Topsy API also provides a 'last_offset' param on responses, which could be useful for knowing whether there are more result pages to fetch.
But its behavior is not clear to me.
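That loop might be sketched like this in Python; `fetch_page` here is a stand-in for whatever wrapper you use to call the search resource and return its "list" field, so no up-front "total" probe is needed:

```python
def collect_until_empty(fetch_page, query, per_page=100, max_pages=10):
    """Keep requesting pages until one comes back empty or short,
    or until the page cap is hit."""
    results = []
    page = 1
    while page <= max_pages:
        batch = fetch_page(query, page, per_page)
        if not batch:
            break  # empty page: nothing more to fetch
        results.extend(batch)
        if len(batch) < per_page:
            break  # short page: this was the last one
        page += 1
    return results
```

The short-page check saves one request versus looping until an empty page, at the cost of assuming full pages always contain exactly `per_page` items.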