New cursor methods are way too slow


Tim Haines

Oct 14, 2009, 8:12:55 PM
to Twitter Development Talk
Hi'ya,

I'm migrating my code to use cursors at the moment. It's frustrating
that cursored calls have to be made one after another, whereas paged
calls could be made in parallel. Retrieving 7000 followers just took
more than 20 minutes for me.
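
A minimal sketch of why cursored calls end up sequential: each request
needs the cursor returned by the previous one, so the next call cannot
start early. Here fetch_page is a hypothetical stand-in for one HTTP
call, and the response field names and the -1/0 cursor values are
assumptions for illustration, not a confirmed description of the API.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one HTTP request to a cursored method.
# It takes a cursor and returns {"users": [...], "next_cursor": int}.
FetchPage = Callable[[int], Dict]


def fetch_all_followers(fetch_page: FetchPage) -> List[dict]:
    """Walk a cursor chain from start to finish, one call at a time."""
    users: List[dict] = []
    cursor = -1  # assumed "first page" value
    while cursor != 0:  # assumed "no more pages" value
        page = fetch_page(cursor)
        users.extend(page["users"])
        # The next cursor is only known once this response has arrived,
        # which is what prevents issuing the requests in parallel.
        cursor = page["next_cursor"]
    return users
```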

I filed an issue that proposes a solution here:
http://code.google.com/p/twitter-api/issues/detail?id=1078 If you
retrieve friends or followers, please take a look and give it a star
if it's important to you.

If anyone can suggest a workaround for this, I'd be happy to hear it.

Cheers,

Tim.

Chad Etzel

Oct 14, 2009, 9:42:23 PM
to twitter-deve...@googlegroups.com
Hi Tim,

You said "Retrieving 7000 followers just took > 20 minutes for me."
Can you explain what you meant by that?

Are you using the friends/ids, followers/ids methods or the
statuses/friends, statuses/followers methods?

-Chad

Tim Haines

Oct 14, 2009, 9:50:54 PM
to Twitter Development Talk
Hi Chad,

Statuses/followers.

I've just timed another attempt - it took 25 minutes to retrieve 17957
followers with statuses/followers.

Is there anything I can elaborate on in the filed issue to make it
clearer?

Tim.

Chad Etzel

Oct 14, 2009, 9:56:35 PM
to twitter-deve...@googlegroups.com
If you are pulling down the entire social graph, why not use the
social graph calls which would deliver all 7000 ids in 2 calls?

You can also parallelize this process by looping through different
users on each thread instead of using each thread to grab a different
page/cursor of the same user.

Regarding the code issue you submitted, if you have the users cached
locally, you could use the social graph methods to determine the
missing/new 2k users pretty quickly by comparing ids.
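
The id comparison can be sketched as a pair of set differences. This is
only an illustration; diff_followers and its arguments are hypothetical
names, and the two id lists are assumed to come from a local cache and
from a fresh social graph call respectively.

```python
from typing import Iterable, Set, Tuple


def diff_followers(
    cached_ids: Iterable[int], current_ids: Iterable[int]
) -> Tuple[Set[int], Set[int]]:
    """Return (new_ids, removed_ids) relative to the local cache."""
    cached, current = set(cached_ids), set(current_ids)
    new_ids = current - cached      # followers not seen before
    removed_ids = cached - current  # followers that have gone away
    return new_ids, removed_ids
```

Only the ids in new_ids would then need their full user objects fetched.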

-Chad

Tim Haines

Oct 14, 2009, 10:19:46 PM
to Twitter Development Talk
Are you suggesting I should retrieve the 2k users 1 at a time from
users/show once I have the ids? I'd essentially like to do this, but
100 at a time.

I know I can get the 7000 ids in 2 calls (1 even without the cursors)
- but I actually want the whole user objects.

Tim.

Josh Roesslein

Oct 14, 2009, 10:21:17 PM
to twitter-deve...@googlegroups.com
Yeah we really need a way to bulk request user payloads by giving a list of IDs.

--
Josh

Chad Etzel

Oct 14, 2009, 10:30:24 PM
to twitter-deve...@googlegroups.com
I agree. I'm lobbying the team for something like this.
-Chad

Tim Haines

Oct 14, 2009, 10:43:01 PM
to Twitter Development Talk
Thanks Chad.


Kyle Mulka

Oct 15, 2009, 7:46:23 AM
to Twitter Development Talk
I agree that there needs to be a faster way to retrieve a lot of data
than the cursor method allows.

I'd also like to add that if a user is waiting on an app to pull their
data from Twitter, seconds of waiting feels like hours. This can't be
sped up by parallelizing if we have to use cursors.

Thanks,

--
Kyle Mulka
http://twilk.com


Kyle Mulka

Oct 15, 2009, 7:48:40 AM
to Twitter Development Talk
I wonder if a query language like what Facebook has with its FQL might
help here. ;-)

--
Kyle Mulka
http://twilk.com


Michael Steuer

Oct 15, 2009, 12:17:40 PM
to twitter-deve...@googlegroups.com
That's great!! I'm currently using the suggested method (get IDs, then do
users/show for each of them) and it's horrendously slow and cumbersome. It'd
be great if you could get 100 user objects at a time, based on 100 ids
you provide.

jmathai

Oct 15, 2009, 9:07:25 PM
to Twitter Development Talk
I'm curious why you're using followers/ids and then users/show for
each id? I tried using that and using statuses/followers and found
that the total times were in the same ballpark. statuses/followers
requires far fewer api calls if you're interested in user objects.

FYI, I do want to add that I agree that either method is EXTREMELY
inefficient. Regardless of what the arguments against pages and for
cursors are, the current implementation is painful from an end user
perspective. Our backend doesn't really care, but our users don't
like to wait 10-30 minutes for a web page to gather a social graph.

I wish that instead of a cursor I could get a snapshot id, a number of
pages and a page parameter. I don't know how it's implemented, but the
ability to deterministically parallelize the calls is such a benefit to
the end user. Pages let me do that.
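
To illustrate the wish above: if the number of pages were known up
front, every page request could be issued at once. This is a sketch
under assumptions only. The snapshot/page interface does not exist in
the API discussed here, and fetch_page is a hypothetical stand-in for
one HTTP call that returns the user objects on a given page of a fixed
snapshot.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def fetch_pages_in_parallel(
    page_count: int,
    fetch_page: Callable[[int], List[dict]],
    max_workers: int = 8,
) -> List[dict]:
    """Fetch pages 1..page_count concurrently and keep them in page order."""
    page_numbers = range(1, page_count + 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if the requests
        # complete out of order.
        pages = list(pool.map(fetch_page, page_numbers))
    return [user for page in pages for user in page]
```

With cursors this is not possible, because the second request cannot be
built until the first response has come back.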


jmathai

Oct 15, 2009, 9:11:21 PM
to Twitter Development Talk
For clarification, an API to get user objects in bulk given a set of
ids would work just as well :).

Rooting for Chad on that one.

Tim Haines

Oct 15, 2009, 10:02:56 PM
to twitter-deve...@googlegroups.com
FYI, my backend cares.

Michael Steuer

Oct 20, 2009, 2:02:47 PM
to twitter-deve...@googlegroups.com
Hi,

The reason why I’m using followers/ids and then users/show is efficiency:

I’m maintaining a local cache of my users’ social graphs. I’m also maintaining local user objects for my users and for their followers. Since both the social graph and user info are subject to change, both need periodic updating... The way I’m doing that now is as follows:

  1. I request followers/ids for each of my users
  2. If I detect new followers, I add them to my user’s social graph; if I detect that followers were removed, I remove them from my user’s social graph

Subsequently I parse my user object table for users who:
  1. have info that hasn’t been updated in X days
  2. have no info because they were added as numeric IDs only via the followers/ids method described above

I then request users/show for each user matching condition 1 or 2 above.
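
The selection step above could look roughly like this. It is a sketch
under assumptions: the cache is modelled as a hypothetical dict mapping
user id to the time its info was last fetched, with None standing for
"added as a numeric ID only", and max_age plays the role of "X days".

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def ids_needing_refresh(
    last_updated: Dict[int, Optional[datetime]],
    now: datetime,
    max_age: timedelta,
) -> List[int]:
    """Return ids whose info is missing (None) or older than max_age."""
    return sorted(
        user_id
        for user_id, fetched_at in last_updated.items()
        if fetched_at is None or now - fetched_at > max_age
    )
```

Each returned id would then get one users/show request.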

This way, I only get an updated user object for each unique user once, when they’re first added, or when I expire a previous update to their info. When I get the followers of another new user, chances are I already know the majority of his followers’ user information.

I’m not using statuses/followers because I would be getting the same information over and over and over and over again... Especially when you’re talking about users with a lot of followers, it’s really inefficient considering you probably already store user info on most of the user’s followers... It would be an equally efficient method if overlap in followers didn’t exist... Since it does, I believe my approach is more efficient, and faster over time, as your user database grows and you’re basically just querying the social graph...

ALL THAT SAID, I would LOVE to have a method that allows me to get user objects in batch... If I could request 100 user objects by numeric id in one API call, the above would be far more efficient and result in far fewer calls to Twitter.
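
As a rough illustration of the saving, here is how the ids would be
grouped if such a batch method existed. The method itself is
hypothetical (it is what this thread is asking for), and the batch size
of 100 is taken from the request above; only the chunking is shown.

```python
from typing import Iterator, List, Sequence


def batches(ids: Sequence[int], size: int = 100) -> Iterator[List[int]]:
    """Split ids into consecutive groups of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


# Example: the 2k uncached users mentioned earlier in the thread.
uncached_ids = list(range(2000))
calls = list(batches(uncached_ids))
print(len(calls))  # 20 batched calls instead of 2000 users/show calls
```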

I am definitely interested in your feedback on my logic above and if you think it holds...

Thanks!

Michael.

Oren Rose

Oct 21, 2009, 7:23:34 AM
to Twitter Development Talk
I vote for that, too!

Same scenario, same issues... bulk status request is the right
solution, also for users you get from the Search API...

= Oren


Harshad

Oct 22, 2009, 9:10:13 AM
to Twitter Development Talk
Exactly the same scenario here [1] too. Querying with Bulk Ids would
save quite a bit of overhead for both parties.

Btw, if so many people are doing this graph-walking exercise, how
about collaborating and sharing this data? Feel free to contact me off-
list at { harshad.rj AT gmail }

[1] http://twinkler.in

Richard

Oct 22, 2009, 1:46:54 PM
to Twitter Development Talk
I've got the same problem too. We were fetching the friends list in
parallel, but this new method is going to be too slow, and I agree
with Josh that "we really need a way to bulk request user payloads by
giving a list of IDs".
