Cursor Expiration

Alan Gutierrez

Dec 9, 2009, 3:44:47 AM
Although million-follower accounts are rare, how do I design for a
million-follower user logged into my application, which uses the Social
Graph API?

If Barack Obama were to log into my application, it would take 566 API
calls to fetch his 2,828,782 followers, but I wouldn't have any left
after the 150 API calls to fetch his 747,127 friends.
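
The arithmetic behind those figures can be checked with a short sketch. The page size of 5,000 ids per call is an assumption inferred from the numbers above, and the 30-calls-per-hour budget is only the pacing I have in mind, not a documented limit:

```python
import math

PAGE_SIZE = 5000  # assumed ids returned per call (inferred from the figures above)


def calls_needed(count, page_size=PAGE_SIZE):
    """Number of paged API calls needed to fetch `count` ids."""
    return math.ceil(count / page_size)


def hours_needed(total_calls, calls_per_hour):
    """Hours needed to spend `total_calls` at a fixed hourly budget."""
    return math.ceil(total_calls / calls_per_hour)


follower_calls = calls_needed(2_828_782)  # 566
friend_calls = calls_needed(747_127)      # 150
total = follower_calls + friend_calls     # 716

print(follower_calls, friend_calls, hours_needed(total, 30))  # 566 150 24
```

At 30 calls an hour, a full crawl of both lists would take about a day.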

Obviously, I'd like to work my way through the list a little bit each
hour. I'd like to store the cursor after 30 API calls and resume my
iteration over an hour later.
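
Roughly what I have in mind, as a minimal sketch. `fetch_page` is a hypothetical callable standing in for the real API call; it takes a cursor and returns `(ids, next_cursor)`. The start cursor of -1 and the end marker of 0 are assumptions about how the API signals the first and last page:

```python
import json
import os

START_CURSOR = -1  # assumed value that requests the first page
END_CURSOR = 0     # assumed value meaning "no more pages"


def load_state(path):
    """Load the saved crawl state, or start fresh if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"cursor": START_CURSOR, "ids": []}


def save_state(path, state):
    """Write to a temp file, then rename, so a crash cannot leave a torn file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def crawl_some(fetch_page, path, budget=30):
    """Spend at most `budget` calls, checkpointing after each one.

    Returns True once the whole list has been read.
    """
    state = load_state(path)
    calls = 0
    while calls < budget and state["cursor"] != END_CURSOR:
        ids, next_cursor = fetch_page(state["cursor"])
        state["ids"].extend(ids)
        state["cursor"] = next_cursor
        calls += 1
        save_state(path, state)
    return state["cursor"] == END_CURSOR
```

An hourly job would call `crawl_some(fetch_page, "obama_followers.json")` until it returns True. Whether the stored cursor is still good an hour later is exactly what I'm asking.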

When do cursors expire? I assume they will still be valid an hour later,
but I've seen discussion on this group that says that they are opaque
and that they may change at some point. I suppose that when that time
comes, if my application is crawling a celebrity, it will not be able to
resume crawling with the cursor it stored an hour before.

Alan Gutierrez

Abraham Williams

Dec 9, 2009, 11:05:07 AM
Check out the section about whitelisting:

Abraham Williams | Community Evangelist |
Project | Intersect |
Hacker | |
This email is: [ ] shareable [x] ask first [ ] private.
Sent from Madison, WI, United States

John Kalucki

Dec 9, 2009, 12:28:09 PM
A cursor should be valid forever, but as it ages and rows are removed, you might see some minor data loss and probably more duplicates.

-John Kalucki
Services, Twitter Inc.

Marc Mims

Jan 17, 2010, 1:40:58 AM
* John Kalucki <> [091209 09:28]:

> A cursor should be valid forever, but as it ages and rows are removed, you
> might see some minor data loss and probably more duplicates.

Out of curiosity, what is a cursor? From our (the users') perspective,
it's just an opaque number. But I'm curious. How is it generated?
What does it represent internally?


John Kalucki

Jan 17, 2010, 10:43:49 AM
A cursor is an opaque, deletion-tolerant index into a B-tree keyed by source userid and modification time. It brings you to a point in time in the reverse-chronologically sorted list. So, since you can't change the past, other than erasing it, it's effectively stable. (Modifications bubble to the top.) But you have to deal with additions at the list head and also block shrinkage due to deletions, so your blocks begin to overlap quite a bit as the data ages. (If you cache cursors and read much later, you'll see the first few rows of cursor[n+1]'s block as duplicates of the last rows of cursor[n]'s block. The intersection cardinality is equal to the number of deletions in cursor[n]'s block.) Still, there may be value in caching these cursors and then heuristically rebalancing them when the overlap proportion crosses some threshold.
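
On the client side, the overlap can be handled with something like the sketch below. It is only an illustration, not how anything works internally: `merge_blocks`, `needs_rebalance`, and the 0.2 threshold are made-up names and values, and a "block" here just means the list of ids returned for one cached cursor.

```python
def merge_blocks(blocks):
    """Merge blocks read from cached cursors, dropping duplicate ids.

    Returns the merged ids in first-seen order, plus the fraction of
    each block that had already been seen in an earlier block.
    """
    seen = set()
    merged = []
    overlaps = []
    for block in blocks:
        dupes = 0
        for uid in block:
            if uid in seen:
                dupes += 1
            else:
                seen.add(uid)
                merged.append(uid)
        overlaps.append(dupes / len(block) if block else 0.0)
    return merged, overlaps


def needs_rebalance(overlaps, threshold=0.2):
    """True if any block's overlap proportion exceeds the (arbitrary) threshold."""
    return any(o > threshold for o in overlaps)


# Three aged blocks whose heads repeat the tails of the previous block.
blocks = [[9, 8, 7, 6], [7, 6, 5, 4], [4, 3, 2, 1]]
merged, overlaps = merge_blocks(blocks)
print(merged)                     # [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(overlaps)                   # [0.0, 0.5, 0.25]
print(needs_rebalance(overlaps))  # True
```

When the check trips, the simplest "rebalance" is to discard the cached cursors and re-crawl from the head to collect fresh ones.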

-John Kalucki
Infrastructure, Twitter Inc.