clarification and possible doc typo for List Parameters

David Rees (@studgeek)

Feb 15, 2012, 12:36:24 PM
to otte...@googlegroups.com
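I would like to clarify the usage of the offset and page List Parameters (http://code.google.com/p/otterapi/wiki/ResListParameters). Am I correct that they are redundant with each other?

So given an initial query of
http://otter.topsy.com/search.json?q=New+York&perpage=25
will either of these second queries give the same results?
http://otter.topsy.com/search.json?q=New+York&perpage=25&page=2
http://otter.topsy.com/search.json?q=New+York&perpage=25&offset=25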

I also have a question about consistency across queries. Since no session key is used, can't the query result and its ordering change between the query for page 1 and the query for page 2, so that items are missed or don't line up between the two? Or is there some sort of time window during which the underlying query is guaranteed to be inviolate?

In either case, I think the examples need to be updated. The first URL example is the one with the problem. I think it has 3 or 4 issues:
  • It should come second (since it's getting the 2nd page)
  • It should still have the perpage=25
  • It only needs a page or an offset, not both, I think?
  • If it does have an offset, it should be 25, not 24.

Thanks,
dave

David Rees (@studgeek)

Feb 15, 2012, 2:24:21 PM
to otte...@googlegroups.com
Actually, as I play with these more, it seems that last_offset's main value is to let you know whether you are at the end of the query results. For example, if the query only has 167 results, then you will see the following:

http://otter.topsy.com/search.json?q=New+York&perpage=100
returns last_offset 100

http://otter.topsy.com/search.json?q=New+York&perpage=100&page=2
returns last_offset 167

So with the last call you can see that last_offset < perpage*page (167 < 100*2) so there is no need to make additional queries.
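
The check described above can be written as a one-line predicate. This is only a sketch of the heuristic observed in these two responses, not documented API behaviour; the function name is_last_page is made up for illustration.

```python
def is_last_page(last_offset: int, perpage: int, page: int) -> bool:
    """Heuristic from the observation above: the response was the final one
    when last_offset falls short of the end of a full page."""
    return last_offset < perpage * page


# Values taken from the two responses above (167 results, perpage=100).
print(is_last_page(100, 100, 1))  # False: page 1 was full, keep going
print(is_last_page(167, 100, 2))  # True: 167 < 200, no more queries needed
```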


Mehdi Lahmam B.

Mar 2, 2012, 5:57:01 AM
to otte...@googlegroups.com
Notes on loop exit condition:

Normally, the exit condition for the loop through pages should be last_offset < perpage * page (checked on the last call).
But for some unexplained reason, the offsets don't match the results. (Using nohidden results or not doesn't make a difference for me.)

Examples of responses:

The last response with non-empty results
 'window' => 'm'
 'page' => 9
 'total' => 969
 'perpage' => 100
 'last_offset' => 849
 'hidden' => 0
 'list' => array of 47 results
 'offset' => 800

And the response for the next page
 'window' => 'm'
 'page' => 10
 'total' => 969
 'perpage' => 100
 'last_offset' => 900
 'hidden' => 0
 'list' => empty array
 'offset' => 900

It seems that 69 more results should be there but are missing.
The working solution is to check whether the results array is empty when looping through pages, and not rely on the offsets.
That costs one additional query.
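
A minimal sketch of that workaround, in Python: keep requesting pages until one comes back with an empty 'list'. The fetch_page callable and the max_pages safety cap are assumptions added for illustration; fetch_page stands in for whatever code performs the HTTP request and decodes the JSON response.

```python
from typing import Any, Callable, Dict, Iterator


def iter_results(fetch_page: Callable[[int], Dict[str, Any]],
                 max_pages: int = 100) -> Iterator[Any]:
    """Yield results page by page, stopping at the first empty 'list'.

    fetch_page(page) is a hypothetical helper that returns the decoded
    response for the given page number. max_pages guards against an
    endless loop if the server never returns an empty page.
    """
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        items = response.get("list", [])
        if not items:
            return  # empty page: this is the one extra query
        yield from items


# Stand-in for the real API call, used only to exercise the loop.
def fake_fetch(page: int) -> Dict[str, Any]:
    pages = {1: ["a", "b"], 2: ["c"]}
    return {"page": page, "list": pages.get(page, [])}


print(list(iter_results(fake_fetch)))  # ['a', 'b', 'c']
```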

Vipul Ved Prakash

Mar 26, 2012, 5:58:07 PM
to otte...@googlegroups.com
The two queries you mention are not equivalent. You are required to pass the "last_offset" value as "offset" when fetching the next page.

In a process termed "post filtering", the API layer applies filters (de-duping, profanity, etc.) to the search results that come from the main search index. If results are removed by this post filtering, the last_offset value is incremented accordingly and communicated to the caller. For example, if there were 30 results and the first 4 were filtered, then q=QUERY&page=1&perpage=10 will return 10 results with last_offset set to 14. You should then get the next 10 results with q=QUERY&page=2&offset=14.
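
To make the 30-results example concrete, here is a rough sketch in Python. The make_fake_search function is only a toy model of post filtering written for this illustration (it is not the real server logic), and fetch_all is a hypothetical client loop that feeds each response's last_offset back in as the next offset.

```python
from typing import Any, Callable, Dict, List, Set


def make_fake_search(raw: List[Any], filtered: Set[int]) -> Callable[..., Dict[str, Any]]:
    """Toy model of post filtering (an assumption, not the real API):
    scan the raw index from `offset`, skip filtered positions, and report
    how far the scan got as last_offset."""
    def search(page: int, perpage: int, offset: int = 0) -> Dict[str, Any]:
        kept, pos = [], offset
        while pos < len(raw) and len(kept) < perpage:
            if pos not in filtered:
                kept.append(raw[pos])
            pos += 1
        return {"page": page, "perpage": perpage, "offset": offset,
                "last_offset": pos, "list": kept}
    return search


def fetch_all(search: Callable[..., Dict[str, Any]], perpage: int = 10) -> List[Any]:
    """Page through results, passing last_offset back as the next offset."""
    results, offset, page = [], 0, 1
    while True:
        response = search(page=page, perpage=perpage, offset=offset)
        if not response["list"]:
            break
        results.extend(response["list"])
        offset = response["last_offset"]  # not perpage * page
        page += 1
    return results


# 30 raw results, the first 4 removed by post filtering.
search = make_fake_search(list(range(30)), filtered={0, 1, 2, 3})
print(search(page=1, perpage=10)["last_offset"])  # 14
print(len(fetch_all(search)))                     # 26
```

Under this model, using offset=10 for the second page would return results 10 to 19, repeating four items already seen on page 1, which is why the computed offset and last_offset are not interchangeable.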

There's another finer point regarding consistency. Topsy is a real-time index, so it's possible that the result set has changed by the time you fetch page 2. If you want your results to be consistent, add a maxtime=TIMESTAMP parameter to both the page 1 and page 2 queries. TIMESTAMP can be the current clock time when the page 1 query is issued.
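
As a sketch of the maxtime tip, the Python below builds both page URLs from one timestamp captured before the first request. The build_search_url helper is made up for illustration, and treating TIMESTAMP as a Unix time in whole seconds is an assumption; check the parameter documentation for the exact format.

```python
import time
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "http://otter.topsy.com/search.json"


def build_search_url(query: str, perpage: int, page: int, maxtime: int,
                     offset: Optional[int] = None) -> str:
    """Build a search URL that pins the result set with maxtime."""
    params = {"q": query, "perpage": perpage, "page": page, "maxtime": maxtime}
    if offset is not None:
        params["offset"] = offset
    return BASE_URL + "?" + urlencode(params)


# Capture the clock once and reuse it for every page of the same query.
# Assumption: maxtime is a Unix timestamp in seconds.
snapshot = int(time.time())
page1_url = build_search_url("New York", 25, 1, snapshot)
# In real use, offset would be the last_offset returned for page 1;
# 25 is only a placeholder here.
page2_url = build_search_url("New York", 25, 2, snapshot, offset=25)
print(page1_url)
print(page2_url)
```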

Hope this helps. 

cheers,
vipul

David Rees

Mar 27, 2012, 10:27:58 AM
to otte...@googlegroups.com, Vipul Ved Prakash
Thanks Vipul!

That makes a lot of sense. Good tip on the TIMESTAMP also.

d


On Monday, March 26, 2012 5:58:07 PM, Vipul Ved Prakash wrote:
> The two queries you mention are not equivalent. It's required to pass
> "last_offset" as "offset" when you are fetching the next page.
>
> In a process termed "post filtering", the API layer applies filters to
> search results (for de-duping, profanity, etc) that come from the main
> search index. If results are filtered due to this post filtering, the
> last_offset value is incremented and communicated to the caller. eg:
> If there were 30 results, and first 4 were filtered, then
> q=QUERY&page=1&perpage=10 will return 10 results with last_offset set
> to 14. You should then get the next 10 results with
> q=QUERY&page=2&offset=14.
>
> There's another finer point re consistency. Topsy is a real-time
> index, so it's possible that the result set has changed by the time
> you fetch page #2. If you want your results to be consistent, you
> should add a maxtime=TIMESTAMP parameter to both page1 and page2.
> TIMESTAMP can be current clock time when page1 query is issued.
>
> Hope this helps.
>
> cheers,
> vipul
>
> On Wednesday, February 15, 2012 9:36:24 AM UTC-8, David Rees
> (@studgeek) wrote:
>
> I would like to clarify the usage of the *offset* and the *page*
> List Parameters
> <http://code.google.com/p/otterapi/wiki/ResListParameters>. Am I
> correct they are redundant with each other?
>
> So given an initial query of
> http://otter.topsy.com/search.json?q=New+York&perpage=25
> either of these 2nd queries will give the same results? Is that
> correct?
> http://otter.topsy.com/search.json?q=New+York&perpage=25&page=2
> http://otter.topsy.com/search.json?q=New+York&perpage=25&offset=25
>
> I also have a question about consistency across queries. Since
> there is no session key used, can't the query result and its
> ordering change between the query for page 1 and page 2? So items
> could be missed or not line up between the two queries? Or is
> there some sort of time window the underlying query is guaranteed
> to be inviolate?
>
> In either case, I think the examples need to be updated. The first
> URL example is the one with the problem. I think it has 3 or 4
> issues:
>
> * It should come second (since it's getting the 2nd page)
> * It should still have the perpage=25
> * It only needs a page or an offset, not both I think?
> * If it does have an offset, it should be 25 not 24.
>
> Thanks,
> dave