Larger Users Not Returning Follower Data

Jesse Stay

Jun 3, 2009, 10:22:20 PM
to twitter-deve...@googlegroups.com
I was discussing this with Iain, and have also talked about it with Damon, so I know I'm not alone in this. I am having huge issues retrieving follower and friend data for the larger users (1 million+ followers); most of the time the requests return 502 Bad Gateway errors. I know there are a few of these users getting really frustrated about our apps not being able to retrieve data for them. Is there a plan to fix this? Is the API team aware of this? Any ETA by chance?

Thanks,

@Jesse

Doug Williams

Jun 3, 2009, 11:56:21 PM
to twitter-deve...@googlegroups.com
What methods in particular are you referring to? The social graph methods now support paging, so retrieving all of that data is now possible where it used to throw 502s. It does, however, require a bit of application logic to determine when paging is necessary (e.g. large follower counts). Additionally, we are making changes to the databases that cause latency, which results in periodic 502s. We are not able to give definitive ETAs on these fixes due to priorities that change as unforeseeable critical needs arise.

More specificity would be beneficial. Do you have a reproducible bug, problem, or suggestion that you would like to discuss?
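
Since the 502s described above are periodic, a client can often ride them out by retrying with a growing delay. The following is a minimal sketch under stated assumptions, not an official client: `BadGateway`, `call_with_retries`, and the delay values are hypothetical names and numbers, and `request` stands in for whatever function the application uses to call a social graph method.

```python
import time
from typing import Callable


class BadGateway(Exception):
    """Hypothetical error the caller's HTTP layer raises on a 502 response."""


def call_with_retries(
    request: Callable[[], list],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call request(), retrying with exponential backoff on BadGateway."""
    for attempt in range(max_attempts):
        try:
            return request()
        except BadGateway:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.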

Thanks,
Doug

Jesse Stay

Jun 4, 2009, 12:19:11 AM
to twitter-deve...@googlegroups.com
In my case specifically it's the Social Graph methods.  I didn't realize you had paging available now.  Is there some logic as to when I should expect to page and when I can just rely on the full result?

Jesse

Doug Williams

Jun 4, 2009, 2:26:47 AM
to twitter-deve...@googlegroups.com
I've heard that list sizes greater than 150K-200K start to return timeouts at higher rates, although I'd enjoy hearing first-hand experiences and recommendations.

Thanks,
Doug

Jesse Stay

Jun 4, 2009, 5:07:07 AM
to twitter-deve...@googlegroups.com
Yes, that's what appears to be happening. In my experience the timeouts start at around 500K+. I'm okay with having my script wait if you guys need to take longer to retrieve the info. Or if you'd prefer we paginate, I'll start doing that as well. Maybe a hard limit of 200K, and you have to page to get above that?

Jesse

Jesse Stay

Jun 4, 2009, 5:12:35 AM
to twitter-deve...@googlegroups.com
Also, how do you recommend we deal with the larger users that would like to follow back their followers? With the hard limit of 1,000 follows per day, there is no way they'll ever catch up, as some of them have more than 1k new followers per day as is.  If this limit were more dynamic based on the size of the user that would be nice.  Capabilities to follow people in bulk may also help.
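
The catch-up problem above is simple arithmetic. Here is a minimal sketch, assuming the 1,000 follows per day limit quoted in this message; the function name and the example figures are illustrative only.

```python
import math
from typing import Optional


def days_to_catch_up(
    backlog: int,
    new_followers_per_day: int,
    follow_limit_per_day: int = 1000,  # limit quoted in this thread
) -> Optional[int]:
    """Days needed to follow back everyone, or None if there is no headroom.

    Each day the account can follow at most follow_limit_per_day users while
    new_followers_per_day more arrive, so the backlog only shrinks when the
    limit exceeds the daily inflow.
    """
    net_per_day = follow_limit_per_day - new_followers_per_day
    if net_per_day <= 0:
        return None  # the backlog never shrinks
    return math.ceil(max(backlog, 0) / net_per_day)
```

For example, with a backlog of one million followers and 500 new followers per day, `days_to_catch_up(1_000_000, 500)` gives 2000 days; with 1,200 new followers per day it returns `None`.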

Of course, I think many of these would no longer need to follow back if they could just have the option to enable anyone to DM them if they choose.  I think that's the underlying cause to want to auto-follow for most people.  The only other cause is for an additional token/feeling of community, although I think many would be willing to forgo that if they had the ability to just allow everyone to DM them - it feels good to have someone you admire follow you back, even if it's not 100% sincere.

Jesse

Doug Williams

Jun 4, 2009, 4:23:19 PM
to twitter-deve...@googlegroups.com
We would like users to be judicious with their following habits and only follow users who contribute value to their timeline. This justifies the following limits we impose.

We are aware that many users would like to accept all incoming directs. This, along with the quid pro quo following to build community, captures the majority of the use cases for auto-following. We are discussing internally how best to approach these two uses within the bounds of the product we are trying to build. At this time we have nothing to report, but know that we are actively thinking about these ideas.

Thanks,
Doug

Sean Scott

Jun 4, 2009, 4:31:35 PM
to twitter-deve...@googlegroups.com
Just speaking from a user perspective, I'd love to see the debate about opening DMs to senders you are not following opened up to the community as a whole or to a representative subset of it. By opening DMs to non-followed Twitter users, it would be way too easy for spammers to start spamming via DMs. From a user perspective I don't see a compelling argument for opening DMs to folks I do not follow.

Off my soap box

Caliban Darklock

Jun 4, 2009, 6:18:46 PM
to twitter-deve...@googlegroups.com
On Thu, Jun 4, 2009 at 1:23 PM, Doug Williams <do...@twitter.com> wrote:
>
> We are aware that many users would like to accept all incoming directs.

Sounds like a checkbox in your profile to me.

JDG

Jun 4, 2009, 6:34:24 PM
to twitter-deve...@googlegroups.com
To provide a slightly different user perspective, I think it would be a nice user-level setting to be able to "accept DMs from users I do not follow." That said, I also understand the potential complexity and performance issues such a setting could present, so I'm not expecting it any time soon; I'm just injecting my opinion into the conversation.
--
Internets. Serious business.

Jesse Stay

Jun 4, 2009, 11:54:35 PM
to twitter-deve...@googlegroups.com
Sean, why not let the users decide that though? If I enable the option for my account it's my responsibility to weed out the spam.  If I don't want the spam then I won't enable it on my account.  Giving users multiple options is a good thing.

Jesse

Sean Scott

Jun 5, 2009, 12:09:41 AM
to twitter-deve...@googlegroups.com
Jesse,

If the implementation is to make that a preference which is turned off by default (no DMs from non-followers) and which users can toggle, then I am totally for it. As you point out, it's then the user's responsibility to clean their inbox if they get hit by spam after turning the feature on.

So for what it counts, I'm all in favor of allowing DMs from non-followers if it's a preference users can control.

Sean

Jesse Stay

Jun 8, 2009, 9:19:04 PM
to twitter-deve...@googlegroups.com
Doug, et al., here are the problems I'm running into. By forcing me to use paging for followers/ids and friends/ids, for someone like BritneySpears I now have to make over 350 requests to get through all her followers. Now I'm having huge rate limit issues because of that, not to mention how long it takes to get through the entire list. Would it be possible to set a number to specify how many user ids are returned per page so I don't have to make so many requests?

In addition, I'm finding the pages aren't returning consistent data. Some are returning fewer than 5,000 results, and some that should return data aren't returning any. So even with paging I'm still unable to get through all of @britneyspears' followers. Any suggestions?
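
To put numbers on the request count above, here is a rough sketch of the arithmetic. It assumes 5,000 ids per page, as reported in this thread; the `requests_per_hour` value is a made-up placeholder, since the thread does not state the actual rate limit.

```python
IDS_PER_PAGE = 5000  # page size reported in this thread


def pages_needed(follower_count: int, ids_per_page: int = IDS_PER_PAGE) -> int:
    """Minimum number of page requests if every page came back full."""
    # Ceiling division using integers only, to avoid float rounding.
    return -(-follower_count // ids_per_page)


def hours_to_fetch(pages: int, requests_per_hour: int) -> float:
    """Wall-clock hours to fetch all pages under an hourly request budget."""
    return pages / requests_per_hour


# About 1.75 million followers needs 350 full pages.
pages = pages_needed(1_750_000)
# Hypothetical budget of 100 requests per hour (an assumption, not a known limit).
hours = hours_to_fetch(pages, requests_per_hour=100)
```

This is a lower bound: pages that come back short of 5,000 ids push the real request count higher.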

@Jesse

Ho John Lee

Jun 9, 2009, 12:45:46 AM
to Twitter Development Talk
I'll share a few other wrinkles I've observed in the paged
friends / followers API responses:

- There are often more (sometimes many more) pages than you'd expect
based on the follower/friend counts listed in the profile
- Pages in the middle of a long series of output pages usually but not
always return 5000 elements
- The only reliable way to determine the end of available output pages
is to keep asking until an empty JSON response "[]" is returned
- Friends / followers pages occasionally contain some duplicate entries
from one page to another

On the positive side, the paged output methods are *far* more reliable
for high friend/follow lists than the old ones, when the list is long.
The old methods were more convenient but only worked a small fraction
of the time when the list was very long.

I have test data for britneyspears from a few days ago; at that time
the profile said 1,644,227, which would imply around 328 pages. There
were actually 356 pages, containing 1,772,771 entries. I just looked
at the profile page now and it says 1,745,417, which would still
suggest 349 pages of data, less than what is actually there. And
several of the pages return fewer than 5000 entries, so you can't
assume that an unfilled page represents the end of the data.
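
The observations above suggest a paging loop that ignores the profile count, keeps going past short pages, stops only on an empty page, and drops duplicates. Here is a minimal sketch under those assumptions. `fetch_page` is a hypothetical callable that returns the list of ids for a given page number (for example, a wrapper around followers/ids); numbering pages from 1 and the `max_pages` safety cap are also assumptions.

```python
from typing import Callable, List


def fetch_all_ids(
    fetch_page: Callable[[int], List[int]],
    max_pages: int = 10_000,  # safety cap so a misbehaving source cannot loop forever
) -> List[int]:
    """Collect every id across pages, de-duplicated, in first-seen order."""
    seen = set()
    ordered: List[int] = []
    for page in range(1, max_pages + 1):
        ids = fetch_page(page)
        if not ids:
            break  # an empty response is the only reliable end marker
        # A short page is NOT treated as the end; keep requesting.
        for user_id in ids:
            if user_id not in seen:  # pages may repeat entries
                seen.add(user_id)
                ordered.append(user_id)
    return ordered
```

In practice `fetch_page` would also need to handle transient errors and rate limits; those concerns are left out here to keep the loop itself clear.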

Doug Williams

Jun 9, 2009, 1:15:39 AM
to twitter-deve...@googlegroups.com
Jesse,
Please submit an issue if you feel that this would contribute to the community. There are issues for paging bugs with the social graph methods so star them appropriately.

I have some questions to the community at large using the social graph methods so please feel free to chime in:

What is your caching scheme? How dependent is your data on being real time and why? What type of value are you generating from this data?

What is your use case? There is interest in popular users' social graphs, but from what I've seen they are rather edge case in terms of the value they contribute back to the community. A valuable use-case outside of a very specific need would help in prioritizing requests like this.

I'm trying to understand where you (the community, not just Jesse) are generating value from these large follower lists so please feel free to chime in if you are doing projects on top popular users.

Thanks,
Doug

Jesse Stay

Jun 9, 2009, 3:46:38 AM
to twitter-deve...@googlegroups.com
Well, here are my answers:

On Mon, Jun 8, 2009 at 11:15 PM, Doug Williams <do...@twitter.com> wrote:
> Jesse,
> Please submit an issue if you feel that this would contribute to the community. There are issues for paging bugs with the social graph methods so star them appropriately.

Will do - I like to present here first because it opens it up for others to also share their issues and ideas, and helps us get to the bottom of the underlying issue.
 
> I have some questions to the community at large using the social graph methods so please feel free to chime in:
>
> What is your caching scheme? How dependent is your data on being real time and why? What type of value are you generating from this data?

I cache all followers, and when looking for new followers compare the followers list with that of my cache to find the differences.  I also keep a cache of the individual users.  All this plays a huge part in helping others learn what spammers are following them so they can take action appropriately.  We also provide several blacklisting options for each user to help prevent spam at the discretion of each user.  The goal is to provide as much information as possible about the actions of their followers - the big brands on Twitter are all interested in this.  If you contact me privately I would be happy to share with you a good list of those brands using this and interested in its value.
 
Real-time social graph changes are important - by knowing when a user follows another I can know what Tweet could have been the reason of the follow, where they were located at the time, the exact time of the follow, and keep a history of such.  If I notice multiple follows by the same user in a single day I can know very quickly whether to flag them so they can decide whether to mark them as spam or not.
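
The cache comparison and the same-day flagging described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation behind the service: the function names, the `(follower_id, datetime)` event format, and the `min_follows` threshold are all hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Set, Tuple


def diff_followers(
    cached_ids: Iterable[int], current_ids: Iterable[int]
) -> Tuple[Set[int], Set[int]]:
    """Return (new_followers, lost_followers) between two snapshots."""
    cached, current = set(cached_ids), set(current_ids)
    return current - cached, cached - current


def flag_repeat_followers(
    follow_events: Iterable[Tuple[int, datetime]], min_follows: int = 2
) -> Set[int]:
    """Ids seen following at least min_follows times on one calendar day."""
    per_day = Counter(
        (follower_id, when.date()) for follower_id, when in follow_events
    )
    return {fid for (fid, _day), n in per_day.items() if n >= min_follows}
```

Because the diff only sees the state at each snapshot, the recorded follow time is only as precise as the polling interval, which is one reason frequent refreshes matter for this use case.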


> What is your use case? There is interest in popular users' social graphs, but from what I've seen they are rather edge case in terms of the value they contribute back to the community. A valuable use-case outside of a very specific need would help in prioritizing requests like this.

In my case there are major celebrities, charities, and brands who want a) their followers to be able to contact them, and b) their followers to feel at least some semblance that they care about them. A simple follow, while in reality not much in meaning, means a lot to those being followed. People want to be part of these individuals' networks, and being followed by that individual gives them a sense of belonging.

In addition, these users want to learn more about the people that follow them.  Keeping a list of the new and the old and history and analytics around these users is very powerful. All this is very important to them, to the extent they are willing to pay for it.
 

> I'm trying to understand where you (the community, not just Jesse) are generating value from these large follower lists so please feel free to chime in if you are doing projects on top popular users.

In my case I actually have people willing to pay for this information, so the value to me personally is part of my business model.  The users themselves also take value from this for the reasons I stated above.  In fact, the more popular the user, the more important this becomes because it becomes much, much harder to manage themselves.

Jesse