Need a list of Friends -- followers/ids.xml isn't enough

JakeS

Mar 5, 2009, 9:29:40 AM

to Twitter Development Talk

I'd like to implement an "address book" or tab-completion for
@replies, to make it easier for users to send a message to a specific
user without having to type out the whole username. Unfortunately,
there doesn't seem to be a way to get the full list of friends' names
for a user.

Any good ideas on how I can do this?

Doug Williams

Mar 5, 2009, 10:59:37 AM

to twitter-deve...@googlegroups.com

Jake,
There are two options. Since I don't know what you've looked into, I'll list them below.

You can use the social graph API [1] to get a list of all of a user's friend IDs in a single call. This method, however, does not return users' screen names, as explained in the linked discussion. It is nonetheless a good way to easily get a complete list of a user's friends' IDs, which can then be resolved with individual calls to the users/show method [2] so that the screen name values can be cached.

The second, less API-intensive method of retrieving all screen names is to page through a user's friends with paginated calls to the statuses/friends method [3] and parse the results.

[1] - http://groups.google.com/group/twitter-development-talk/browse_thread/thread/98f4c4d13954e8bf/8ab074e13a12fc82?#8ab074e13a12fc82
[2] - http://apiwiki.twitter.com/REST+API+Documentation#show
[3] - http://apiwiki.twitter.com/REST+API+Documentation#friends

Thanks,
Doug
@dougw

--
Doug Williams

do...@igudo.com
http://www.igudo.com

Nick Arnett

Mar 5, 2009, 11:04:11 AM

to twitter-deve...@googlegroups.com

There's no good way, as far as I know. You can get the list of IDs from the social graph calls, but to resolve those to names you either have to get the entire friends timeline (http://twitter.com/statuses/friends/), which won't include people who have never tweeted (not that that matters a lot), or do a users/show call for every one of them. And if you do the former (get the entire friends status timeline), there really isn't any point in getting the social graph friend ID list, since it should be identical apart from those who have never tweeted.

IIRC, Alex said that there would be a huge performance hit to return the social network data as names rather than IDs.

Nick

Nick Arnett

Mar 5, 2009, 11:07:28 AM

to twitter-deve...@googlegroups.com

On Thu, Mar 5, 2009 at 7:59 AM, Doug Williams <do...@igudo.com> wrote:

Jake,
There are two options. Since I don't know what you've looked into, I'll list them below.

You can use the social graph API [1] to get a list of all of a user's friend IDs in a single call. This method, however, does not return users' screen names, as explained in the linked discussion. It is nonetheless a good way to easily get a complete list of a user's friends' IDs, which can then be resolved with individual calls to the users/show method [2] so that the screen name values can be cached.

Um, "easily?" When people have hundreds or thousands of friends? Maybe easy to write the code, but extremely SLOW and a big consumer of API calls.

The second, less API-intensive method of retrieving all screen names is to page through a user's friends with paginated calls to the statuses/friends method [3] and parse the results.

I think TweetDeck populates its screen name lists (for creating groups) from the statuses it receives. That's a PITA for the user when they want to add somebody who hasn't tweeted recently, but it eventually catches up with all the active users.

Nick

Chad Etzel

Mar 5, 2009, 11:16:55 AM

to twitter-deve...@googlegroups.com

There is an open issue that is related:
"New API method request, return friend starting with..."
http://code.google.com/p/twitter-api/issues/detail?id=207

There is some discussion of its merit in the bug description itself,
but you should star it if it would suit your needs.

-Chad

Doug Williams

Mar 5, 2009, 11:25:00 AM

to twitter-deve...@googlegroups.com

Nick,
These methods aren't perfectly complete for every use case, and that's why we are here discussing them. Note that "easily" modifies the work necessary to retrieve the list of IDs. So yes, that is "easily" done. I then mentioned the con: that individual API calls to users/show were necessary to cache screen name values. For someone wanting a complete list of IDs, this is probably the best choice at this time, because as you've noted, the statuses/friends method is not always complete.

The usefulness of the social graph methods for use cases such as Jake's has already been discussed on this board so I'd like to avoid rehashing that argument (see [1] from post number 2 above).

Let's move to architecture. I can see the following being highly effective:

[APPLICATION LOGIC] <-> [CACHING LAYER/DATABASE] <-> [TWITTER]

Where the application logic uses the social graph method to download the list of friends, then checks with the caching layer if a screen name has already been resolved. In the case that it has not been previously resolved, the caching layer accesses the data via a users/show call to Twitter.

You will obviously pay a penalty with the first users of your application: almost all of their friends will be cache misses. But in my experience, users are followed according to a long-tail distribution, so after a while the caching layer begins to pay off and the number of calls needed to resolve screen names falls off precipitously. I have used this method with great success. (Note that cached screen names should eventually expire, since a user is allowed to change their screen name. Again, not a perfect solution, but programming is a practice of compromise, right?)
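A minimal Python sketch of the caching layer described above, under stated assumptions: `fetch_screen_name` is a hypothetical wrapper around a single users/show call (it is not a real library function), the in-memory dict stands in for whatever database or cache you actually use, and the three-day expiry is an arbitrary choice to allow for screen name changes.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class ScreenNameCache:
    """Resolve user IDs to screen names, calling the API only on a miss."""

    def __init__(self,
                 fetch_screen_name: Callable[[int], str],
                 ttl_seconds: float = 3 * 24 * 3600,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch_screen_name  # hypothetical users/show wrapper
        self._ttl = ttl_seconds          # assumed expiry; names can change
        self._clock = clock              # injectable so expiry can be tested
        self._entries: Dict[int, Tuple[str, float]] = {}
        self.misses = 0                  # number of API calls made so far

    def resolve(self, user_id: int) -> str:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]              # fresh hit: no API call
        self.misses += 1
        name = self._fetch(user_id)      # one users/show call per miss
        self._entries[user_id] = (name, now)
        return name

    def resolve_all(self, friend_ids: Iterable[int]) -> Dict[int, str]:
        """Resolve a friend ID list (e.g. from the social graph method)."""
        return {uid: self.resolve(uid) for uid in friend_ids}
```

Because the cache is shared across all of your application's users, a friend who is followed by many of them costs one lookup in total, which is where the long-tail payoff comes from. The `misses` counter gives a rough way to watch that happen.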

Thoughts?

Doug
@dougw

On Thu, Mar 5, 2009 at 11:07 AM, Nick Arnett <nick....@gmail.com> wrote:

Nick Arnett

Mar 5, 2009, 11:31:45 AM

to twitter-deve...@googlegroups.com

On Thu, Mar 5, 2009 at 8:25 AM, Doug Williams <do...@igudo.com> wrote:

Let's move to architecture. I can see the following being highly effective:

[APPLICATION LOGIC] <-> [CACHING LAYER/DATABASE] <-> [TWITTER]

Where the application logic uses the social graph method to download the list of friends, then checks with the caching layer if a screen name has already been resolved. In the case that it has not been previously resolved, the caching layer accesses the data via a users/show call to Twitter.

That's exactly what I'm doing, but for a slightly different purpose (social network analysis) than the original poster's... but there is one additional wrinkle. Since Twitter users can change their screen names, I'm refreshing the cache every few days for everybody I'm tracking. Come to think of it, I'd love to hear from Twitter how often people really do change their user names. I think I'll post a new thread on that.

Nick

Nick Arnett

Mar 6, 2009, 12:46:43 PM

to twitter-deve...@googlegroups.com

On Thu, Mar 5, 2009 at 7:59 AM, Doug Williams <do...@igudo.com> wrote:

The second, less API-intensive method of retrieving all screen names is to page through a user's friends with paginated calls to the statuses/friends method and parse the results

I've been trying this out for the last day or so... unfortunately, it turns out to be quite slow. The problem is that you have to slog through many, many statuses (100 at a time, of course) to get the screen_names. It looks to me as though the most efficient approach will be to use the statuses to get the most active users' names, abandon that when fetching more statuses isn't yielding many new names, then use the show call to get the rest if you really want them.
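The hybrid approach described above could be sketched roughly like this in Python. Everything named here is an assumption for illustration: `fetch_page` is a hypothetical callable returning the (user ID, screen name) pairs found on one page of statuses, `fetch_screen_name` is a hypothetical wrapper around one users/show call, and the cut-off of 5 new names per page is an arbitrary threshold you would tune.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Page = List[Tuple[int, str]]  # (user_id, screen_name) pairs from one page


def harvest_names(friend_ids: Iterable[int],
                  fetch_page: Callable[[int], Page],
                  fetch_screen_name: Callable[[int], str],
                  min_new_per_page: int = 5,
                  max_pages: int = 50) -> Dict[int, str]:
    """Collect names from statuses first, then fall back to per-user lookups."""
    wanted: Set[int] = set(friend_ids)
    names: Dict[int, str] = {}

    for page in range(1, max_pages + 1):
        if len(names) == len(wanted):
            break                      # every friend already resolved
        rows = fetch_page(page)
        if not rows:
            break                      # no more statuses to read
        new = 0
        for uid, name in rows:
            if uid in wanted and uid not in names:
                names[uid] = name
                new += 1
        if new < min_new_per_page:
            break                      # yield has dropped off; stop paging

    # Resolve whoever is left with one lookup each (the expensive part).
    for uid in sorted(wanted - set(names)):
        names[uid] = fetch_screen_name(uid)
    return names
```

If the per-user lookups are too costly to do up front, the final loop can be dropped or deferred, leaving the quieter users to be filled in lazily.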

Nick

David Neubauer

Mar 6, 2009, 1:57:22 PM

to twitter-deve...@googlegroups.com

Read the status IDs into a holding table and run a routine that gently matches those IDs to users/show API responses?

Regards,

David Neubauer

832-252-9004

Doug Williams

Mar 6, 2009, 2:15:21 PM

to twitter-deve...@googlegroups.com

Nick,
Are you using a caching layer? Initialization of the cache will of
course be slow since every user will need to be looked up with a
users/show call, but the cache should eventually pay off after the
most active users have been entered.

Doug
@dougw

Nick Arnett

Mar 6, 2009, 2:49:19 PM

to twitter-deve...@googlegroups.com

On Fri, Mar 6, 2009 at 11:15 AM, Doug Williams <do...@igudo.com> wrote:

Nick,
Are you using a caching layer? Initialization of the cache will of
course be slow since every user will need to be looked up with a
users/show call, but the cache should eventually pay off after the
most active users have been entered.

Yes, I'm putting them into a database... it is especially slow because I decided to capture the status data in addition to the user name, so there's a fair bit of overhead. I'm going to code up a light version that just grabs names and see how much faster it is. I'm fairly sure it's nothing people would want to wait for in real time... so as you say, building up the cache is the key.

The speed of my current code is also limited by the database, which is CPU-bound. I'm using an old server that will benefit from a lot more memory, which I'm going to go and purchase this afternoon! I haven't offered any hard numbers because of these constraints... and I may be bandwidth-limited some of the time.

I'll post some numbers at some point. I'm wondering how many unique screen names I'm getting, on average, per API call (it's less than 100 because there will be multiple statuses for some people) and what the average latency is. Those are things I can't control, so they ultimately will create the upper boundary.
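For anyone wanting to collect the same numbers, here is a small Python sketch. It assumes you log, for each API call, the list of screen names seen in the returned statuses (one entry per status) and the measured latency in seconds; the function name and the log format are made up for illustration. Since "unique names per call" can be read two ways, it reports both the average number of distinct names within a call and the average number of distinct names gained overall per call.

```python
from typing import Dict, Iterable, List, Set, Tuple


def call_stats(calls: Iterable[Tuple[List[str], float]]) -> Dict[str, float]:
    """Summarise a log of (screen names in response, latency in seconds)."""
    seen: Set[str] = set()
    per_call_unique = 0
    n_calls = 0
    total_latency = 0.0
    for names, latency in calls:
        per_call_unique += len(set(names))  # distinct names in this call
        seen.update(names)                  # distinct names across all calls
        n_calls += 1
        total_latency += latency
    if n_calls == 0:
        return {"unique_per_call": 0.0, "new_per_call": 0.0, "avg_latency": 0.0}
    return {
        "unique_per_call": per_call_unique / n_calls,
        "new_per_call": len(seen) / n_calls,
        "avg_latency": total_latency / n_calls,
    }
```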

By the way, I'm not just looking at this as a problem. It's also an opportunity and I may have a source for the resources to address it.

Nick

Doug Williams

Mar 6, 2009, 2:56:03 PM

to twitter-deve...@googlegroups.com

Nick,
Have you looked into memcached [1]? Attribute-value pair caching is
what it was designed to do. Perfect for the write-through cache that
is needed here. It will also handle the pesky details like resolution
expiry for you, too. If you would like help, ping me offline, I can
get you started.

[1] - http://www.danga.com/memcached/

Doug Williams
@dougw

Nick Arnett

Mar 6, 2009, 3:04:17 PM

to twitter-deve...@googlegroups.com

On Fri, Mar 6, 2009 at 11:56 AM, Doug Williams <do...@igudo.com> wrote:

Nick,

Have you looked into memcached [1]? Attribute-value pair caching is
what it was designed to do. Perfect for the write-through cache that
is needed here. It will also handle the pesky details like resolution
expiry for you, too. If you would like help, ping me offline, I can
get you started.

New to me and thanks, that definitely looks useful, especially since I have a couple of shell accounts that tend to do nothing most of the time.

This sentence really caught my eye: "MySQL's query cache destroys the entire cache for a given table whenever that table is changed. On a high-traffic site with updates happening many times per second, this makes the cache practically worthless. In fact, it's often harmful to have it on, since there's a overhead to maintain the cache."

I hadn't even thought to consider if the cache is harmful to performance, but the moment I read that, I realized it probably is, since my tables are changing constantly.

And good, there's a Python client. Oh, and a pooling one. Cool.

I'll let you know if I need help. Thanks!

Nick

JakeS

Mar 7, 2009, 8:16:44 AM

to Twitter Development Talk

Thanks for the discussion here. It's a shame there's no easy way to
get this information from Twitter itself, especially since the names
are subject to change.

I'll probably just have to keep a list of user names that have
appeared in the timelines already seen and use that. It's not perfect,
but it will save the users some hassle in trying to type out screen names.
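A rough Python sketch of that idea: keep the screen names seen so far in a sorted list and answer tab-completion requests with a binary search on the typed prefix. The class and method names are made up for illustration, and matching is done case-insensitively on the assumption that screen names are not case-sensitive.

```python
import bisect
from typing import Dict, List


class NameCompleter:
    """Prefix completion over screen names seen in timelines so far."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # lowercased names, kept sorted
        self._display: Dict[str, str] = {}  # lowercased name -> name as seen

    def add(self, screen_name: str) -> None:
        key = screen_name.lower()
        if key not in self._display:
            bisect.insort(self._keys, key)  # insert while keeping order
        self._display[key] = screen_name

    def complete(self, prefix: str, limit: int = 10) -> List[str]:
        key = prefix.lower()
        # All names sharing the prefix sit together, starting here.
        start = bisect.bisect_left(self._keys, key)
        matches: List[str] = []
        for candidate in self._keys[start:start + limit]:
            if not candidate.startswith(key):
                break
            matches.append(self._display[candidate])
        return matches
```

Calling `add` for every author that shows up in a fetched timeline keeps the list current, and `complete("ja")` then returns the known names starting with "ja" in alphabetical order.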
