Twitter, Please Explain How Cursors Work

Dewald Pretorius

Oct 4, 2009, 9:10:14 PM
to Twitter Development Talk
For discussion purposes, let's assume I am cursoring through a very
volatile followers list of @veryvolatile. We have the following
cursors:

A = 5,000
B = 5,000
C = 5,000

I retrieve Cursor A and process it. Next I retrieve Cursor B and
process it. Then I retrieve Cursor C and process it.

While I am processing Cursor C, 200 of the people who were in Cursor A
unfollow @veryvolatile, and 400 of the people who were in Cursor B
unfollow @veryvolatile.

What do I get when I go back from C to B? Do I now get 4,600 ids in
the list?

Or, do I get 5,000 in B, which now includes a subset of 400 ids that
were previously in Cursor A?

Dewald

John Kalucki

Oct 5, 2009, 12:17:05 AM
to Twitter Development Talk
I haven't looked at all the parts of the system, so there's some
chance that I'm missing something.

The method returns the followers in the reverse chronological order of
edge creation. Cursor A will have the most recent 5,000 edges, by
creation time, B the next most recent 5,000, etc. The last cursor will
have the oldest edges.

Each cursor points to some arbitrary edge. If you go back and retrieve
cursor B, you should receive N edges created just before the edge-
pointed-to-by-B was created. I don't recall if N is always 5000,
generally 5000 or if it's at most 5000. This detail shouldn't matter,
other than, on occasion, you'll make an extra API call.

In any case, retrieving cursor B will never return edges created after
the edge-pointed-to-by-B was created. All edges returned by cursor B
will be no newer than, and generally older than, the
edge-pointed-to-by-B.

So, all future sets returned by cursor B are always disjoint from the
set originally returned by cursor A. In your example, if you refetched
both A and B, the result sets wouldn't be disjoint as there are no
longer 5,000 edges between cursor A and cursor B.
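
One way to read the description above is as a toy model (an illustrative sketch only, not Twitter's implementation): treat each cursor as a creation time and have a fetch return up to 5,000 edges no newer than it. The `fetch` helper, the dict-based `edges` store, and the use of plain integers as timestamps and cursors are all assumptions made for the example; as noted above, the real N may not always be 5,000.

```python
PAGE = 5000  # block size taken from the discussion; the real N may differ


def fetch(edges, cursor, page=PAGE):
    """Return up to `page` follower ids, newest edge first.

    `edges` maps follower id -> edge creation time (hypothetical store).
    `cursor` is a creation time; None means "start from the newest edge".
    """
    newest_first = sorted(edges, key=edges.get, reverse=True)
    eligible = [f for f in newest_first if cursor is None or edges[f] <= cursor]
    return eligible[:page]


# 15,000 followers; follower i followed at time i (higher = more recent).
edges = {i: i for i in range(15000)}
original_a = fetch(edges, None)      # ids 14999 down to 10000
cursor_b = 9999                      # the edge that cursor B points to
original_b = fetch(edges, cursor_b)  # ids 9999 down to 5000

# While C is being processed: 200 unfollows from A's range, 400 from B's.
for f in range(10000, 10200):
    del edges[f]
for f in range(5000, 5400):
    del edges[f]

refetched_b = fetch(edges, cursor_b)
print(len(refetched_b))                           # 5000
print(set(refetched_b).isdisjoint(original_a))    # True
print(len(set(refetched_b) - set(original_b)))    # 400
```

Under this model, refetching B after the unfollows still yields 5,000 ids: the 4,600 survivors plus 400 older edges, none of them from the set A originally returned.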

I think this, in part, answers your question?

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

Dewald Pretorius

Oct 6, 2009, 12:22:05 PM
to Twitter Development Talk
Thanks John. However, I will be the first to put up my hand and say
that I have no clue what you said.

Can someone please translate John's answer into easy to understand
language, with specific relation to the questions I asked?

Dewald

On Oct 5, 1:17 am, John Kalucki <jkalu...@gmail.com> wrote:
> I haven't looked at all the parts of the system, so there's some
> chance that I'm missing something.
>
> The method returns the followers in the reverse chronological order of
> edge creation. Cursor A will have the most recent 5,000 edges, by
> creation time, B the next most recent 5,000, etc. The last cursor will
> have the oldest edges.
>
> Each cursor points to some arbitrary edge. If you go back and retrieve
> cursor B, you should receive N edges created just before the edge-
> pointed-to-by-B was created. I don't recall if N is always 5000,
> generally 5000 or if it's at most 5000. This detail shouldn't matter,
> other than, on occasion, you'll make an extra API call.
>
> In any case, retrieving cursor B will never return edges created after
> the edge-pointed-to-by-B was created. All edges returned by cursor B
> will be no newer than, and generally older than, the
> edge-pointed-to-by-B.
>
> So, all future sets returned by cursor B are always disjoint from the
> set originally returned by cursor A. In your example, if you refetched
> both A and B, the result sets wouldn't be disjoint as there are no
> longer 5,000 edges between cursor A and cursor B.
>
> I think this, in part answers your question. ?
>
> -John Kalucki
> http://twitter.com/jkalucki

Jesse Stay

Oct 6, 2009, 2:06:43 PM
to twitter-deve...@googlegroups.com
I said the same thing in the last thread about this - still no clue what Twitter is doing with cursors and how it is any different than the previous paging methods.

Jesse

Brian Smith

Oct 6, 2009, 2:12:32 PM
to twitter-deve...@googlegroups.com
John,

Based on your description, it looks like you are on the verge of being able
to offer a very useful capability: the ability to query the follows AND
unfollows since the last time you checked. That would be a great addition to
the API.

For example, I'd really like to be able to page through A, B, C, etc. And
then, after that, say "OK, what's changed since then"?

Regards,
Brian

jmathai

Oct 6, 2009, 2:58:07 PM
to Twitter Development Talk
On Oct 6, 11:06 am, Jesse Stay <jesses...@gmail.com> wrote:
> I said the same thing in the last thread about this - still no clue what
> Twitter is doing with cursors and how it is any different than the previous
> paging methods.
> Jesse

Is the main advantage that the new method takes a snapshot of the
followers list and lets you page through them?

I'd be willing to sacrifice some accuracy for speed since I'm not
doing anything like auto-unfollow. From a sample set of 150k calls to
the api the average latency I have (from the west coast) is .85
seconds. Grabbing a follower list serially, 100 at a time is
painful. I much preferred what I was doing before (total # / 100 ->
fire off that many calls in parallel). If I dropped a few followers
in the process, that was ok because it's so much faster and I don't
need my copy of the social graph to be 100% accurate.

Tim Haines

Oct 6, 2009, 3:50:04 PM
to twitter-deve...@googlegroups.com


On Wed, Oct 7, 2009 at 7:58 AM, jmathai <jma...@gmail.com> wrote:

> I'd be willing to sacrifice some accuracy for speed since I'm not
> doing anything like auto-unfollow.  From a sample set of 150k calls to
> the api the average latency I have (from the west coast) is .85
> seconds.  Grabbing a follower list serially, 100 at a time is
> painful.  I much preferred what I was doing before (total # / 100 ->
> fire off that many calls in parallel).  If I dropped a few followers
> in the process, that was ok because it's so much faster and I don't
> need my copy of the social graph to be 100% accurate.


I'm in the same boat - and filed this recently:  http://code.google.com/p/twitter-api/issues/detail?id=1078&colspec=ID%20Stars%20Type%20Status%20Priority%20Owner%20Summary%20Opened%20Modified%20Component


John Kalucki

Oct 6, 2009, 6:34:38 PM
to Twitter Development Talk
There is no snapshotting. 5,000 edges are returned on each call. Few
users have more than 5,000 followers or more than 5,000 followings.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

John Kalucki

Oct 6, 2009, 6:39:00 PM
to Twitter Development Talk
No. If we are to offer real-time social graph changes, they'll be via
the Streaming API. In the meantime, there is no low-latency,
high-throughput way to determine changes to the social graph. Attempts to
simulate this at large scale via repeated polling are likely to be
frustrating.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

John Kalucki

Oct 6, 2009, 7:12:07 PM
to Twitter Development Talk
I described, in some detail, the reasons for cursors here:
http://groups.google.com/group/twitter-development-talk/msg/badfb7b6074aab10

If the details are uninteresting, the high-level summary is this: The
paged API was designed in a previous era. Paging is simply too
expensive and totally impractical to provide with the current
following counts. Also the QoS had deteriorated to the point where
some doubted that anyone was seriously using the methods. Paging is
going away and paging is not coming back.

The cursored approach allows us to continue to provide access to the
social graph via the REST API. As a benefit, QoS has been dramatically
improved and data quality is now pretty close to perfect.

If the implementation details and invariants described are confusing,
then stick to the well worn part of the path: Request the first block
with a cursor of -1. Keep requesting forward until you get a cursor of
0.
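
The well-worn path above can be sketched roughly as follows. This is a minimal illustration rather than an official client: `collect_ids`, `fetch_page`, and the fake pager are hypothetical names, and the response shape (a dict with `ids` and `next_cursor`) is an assumption. In real use, `fetch_page` would wrap the actual API request.

```python
def collect_ids(fetch_page):
    """Walk forward from cursor -1 until the service returns a cursor of 0.

    `fetch_page(cursor)` is a caller-supplied function (hypothetical) that
    returns a dict with "ids" and "next_cursor" keys.
    """
    ids = []
    cursor = -1  # -1 requests the first block
    while cursor != 0:
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]  # treat as an opaque token
    return ids


def make_fake_pager(all_ids, page_size):
    """Stand-in for the real API so the loop can be run offline."""
    def fetch_page(cursor):
        start = 0 if cursor == -1 else cursor
        end = start + page_size
        next_cursor = end if end < len(all_ids) else 0  # 0 marks the end
        return {"ids": all_ids[start:end], "next_cursor": next_cursor}
    return fetch_page


followers = list(range(12))
result = collect_ids(make_fake_pager(followers, 5))
print(result == followers)  # True
```

The loop never inspects the cursor beyond comparing it with 0, so it keeps working if the cursor format changes.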

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

On Oct 6, 11:06 am, Jesse Stay <jesses...@gmail.com> wrote:

Brian Smith

Oct 6, 2009, 7:16:04 PM
to twitter-deve...@googlegroups.com
John Kalucki wrote:
> No. If we are to offer real-time social graph changes, they'll be via
> the Streaming API. In the meantime, there is no low-latency,
> high-throughput way to determine changes to the social graph. Attempts to
> simulate this at large scale via repeated polling are likely to be
> frustrating.

Never mind. I was requesting this because previously statuses/followers was
documented to return followers "in the order they joined Twitter." However,
on Sept. 25th, Alex updated the documentation to say it returns followers
"in the order they followed the user" which is what I wanted.

http://apiwiki.twitter.com/sdiff.php?first=Twitter%2BREST%2BAPI%2BMethod%253A%2Bstatuses%25C2%25A0followers&second=Twitter%2BREST%2BAPI%2BMethod%253A%2Bstatuses%25C2%25A0followers.2009-09-25-16-55-57

I did not notice this change because it did not show up in the changelog.

Regards,
Brian

Jeffrey Greenberg

Oct 7, 2009, 9:57:31 AM
to twitter-deve...@googlegroups.com
John,
Please clarify this scenario. If one makes a complete set of calls starting from cursor -1 until the end at one moment, and then another set of the same calls later, is there any invariance? If so, what?

From the statements above I understand:
- 5000 followers are always returned per call (if the user has more than 5000), except that the last call will have fewer
- the order is the same: it's the time order that users followed this account

And thus:
- there is no correlation in the API between a particular cursor and a set of returned values (followers)

Is that it?

John Kalucki

Oct 7, 2009, 12:24:02 PM
to Twitter Development Talk
First you have to assume no changes to the set. Users with any
significant following will see constant churn. Factoring out natural
churn then:

Ideally, the results are the same. Practically, the results are the
same. In a very few corner cases they are not. For the next several
weeks, for edges that were created over ~2 weeks ago, there will be,
very very rarely, issues with cursor jitter: In theory and in practice
there will be some over-delivery -- the last userid, or so, in a block
may be duplicated in the first rows of a subsequent block. In theory
there might be similar under-delivery, but we haven't found an actual
case of under-delivery yet. You may need to deduplicate your results
if your app is very sensitive to duplication. In any case, new edges
no longer suffer from this jitter, and we're going to repair the whole
graph in a few weeks. I think this will require several megawatt-hours
of computation.
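
For apps that are sensitive to duplication, deduplicating across blocks can be as simple as the sketch below. `dedupe_blocks` is a hypothetical helper name, and the blocks are assumed to be plain lists of user ids in the order they were fetched.

```python
def dedupe_blocks(blocks):
    """Flatten fetched blocks, keeping the first occurrence of each user id.

    Order is preserved, so the result stays in the order the ids arrived.
    """
    seen = set()
    unique = []
    for block in blocks:
        for user_id in block:
            if user_id not in seen:
                seen.add(user_id)
                unique.append(user_id)
    return unique


# The last id of the first block reappears at the start of the second.
print(dedupe_blocks([[9, 8, 7], [7, 6, 5]]))  # [9, 8, 7, 6, 5]
```

This handles over-delivery only; it cannot detect an id that was never delivered.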

Your first two statements are correct. I don't understand your third
statement. But I think it is a false assertion. Could you briefly
restate?

An aside: There may be some signal in the cursors. Especially in the
most significant bytes. They're references into the edge-creation-time
index after all. I don't know how much obfuscation there is,
especially in the lsb's, but the cursors ideally should be treated as
opaque tokens. While unlikely, we may change their format at some time
in the future. And then various acts of derring-do could break.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

On Oct 7, 6:57 am, Jeffrey Greenberg <jeffreygreenb...@gmail.com>
wrote:
> John, please clarify this scenario. If one makes a complete set of calls
> starting from cursor -1 until the end at one moment, and then another set of
> the same calls later, is there any invariance?  If so, what?
>
> From the statements above I understand:
> - always 5000 followers are returned (if the user has more than 5000, and
> the last call will have less)
> - the order is the same: it's the time order that users followed this
> account
>
> And thus:
> - there is no correlation in the API between a particular cursor and a set
> of returned values (followers)
>
> Is that it?
>
> On Tue, Oct 6, 2009 at 4:12 PM, John Kalucki <jkalu...@gmail.com> wrote:
>
> > I described, in some detail, the reasons for cursors here:
>
> >http://groups.google.com/group/twitter-development-talk/msg/badfb7b60...