Heads up: per-IP rate limits for unauthenticated API requests are pending


Alex Payne

Jul 10, 2008, 7:23:08 PM
to twitter-deve...@googlegroups.com
As we hope you've noticed, the site has been feeling much snappier for
both web and API requests over the past several days, even with the
increased rate limit. In our continued effort to keep things fast and
prevent abuse, we're planning to introduce rate limiting by IP for
unauthenticated API requests. We'll allow 100 unauthenticated
requests per IP per hour, just as we currently do with authenticated
requests.
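
For the curious, the mechanics are nothing exotic: roughly a fixed-window
counter keyed on IP. A minimal Python sketch for illustration only (the
names are made up, and this is not our actual implementation):

    import time
    from collections import defaultdict

    LIMIT = 100        # unauthenticated requests per IP
    WINDOW = 3600      # one hour, in seconds

    _counts = defaultdict(int)     # (ip, hour_number) -> requests seen

    def allow_unauthenticated(ip):
        """Return True if this IP still has budget in the current hour."""
        hour = int(time.time()) // WINDOW
        _counts[(ip, hour)] += 1
        return _counts[(ip, hour)] <= LIMIT

    # Example: the 101st call from one IP inside an hour gets refused.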

Please let us know if you foresee any ghastly issues with this change.
It won't go into production until early next week at the soonest.

--
Alex Payne
http://twitter.com/al3x

Brett Morgan

Jul 10, 2008, 8:11:55 PM
to twitter-deve...@googlegroups.com
How are you planning on dealing with people behind large ISP proxies?

--

Brett Morgan http://brett.morgan.googlepages.com/

Alex Payne

Jul 10, 2008, 8:16:49 PM
to twitter-deve...@googlegroups.com
Well, we can't trust cookies or user agents, so that's a tough one.
We're basically assuming that if a given IP is hammering on us, it's
far more likely to be an abusive user than a bunch of users behind a
proxy. To the best of my knowledge, we've never once seen users behind
a shared proxy show up in our abuse logs.

Rob Iles

Jul 10, 2008, 9:55:52 PM
to twitter-deve...@googlegroups.com
Hi Alex,

I'm new to Twitter, but I have a lot of experience administering large-scale web sites (commercial and otherwise).

Some of the issues you're likely to come across, as I believe Brett was alluding to, involve the likes of AOL, who implement "transparent" web proxies. Here in the UK, NTL (a cable provider) and others do a similar thing: they automagically intercept all traffic on port 80 and proxy/cache it, no doubt with the intention of serving content more quickly and incurring fewer transit charges or some such. The trouble is, the end user (the ISP's customer) is generally unaware of this, hence the name "transparent proxy". With these proxies in the pipeline, it's entirely possible that you will see multiple hits from a single IP (that of the prox(y/ies)) and may end up blocking "innocent" users.

It might be worth considering adding a whitelist to your rate limiter, so that you / the team can look at the IPs that are flooding you, and if they turn out to be "megaproxies", do $something_different; (either allow all, or be more generous).

As you'll have full access to all the data that's hammering you, perhaps some heuristic analysis could be employed: if it's the same username, it's clearly abuse; otherwise consider X, Y, Z, etc.
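
To illustrate the whitelist idea in very rough Python - the IPs, the more
generous limit, and the function names are all invented on my part, not
anything Twitter actually runs:

    import time
    from collections import defaultdict

    # Hand-maintained set of known megaproxy egress IPs (example values only).
    MEGAPROXY_WHITELIST = {"198.51.100.10", "198.51.100.11"}

    DEFAULT_LIMIT = 100       # per IP per hour, as proposed
    MEGAPROXY_LIMIT = 5000    # $something_different: be more generous

    _counts = defaultdict(int)     # (ip, hour_number) -> requests seen

    def allow(ip):
        hour = int(time.time()) // 3600
        _counts[(ip, hour)] += 1
        limit = MEGAPROXY_LIMIT if ip in MEGAPROXY_WHITELIST else DEFAULT_LIMIT
        return _counts[(ip, hour)] <= limit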

Hope this is of some help, keep up the good work :)

Rob

 

--


Rob Iles

RMIDevelopment

Web: http://www.rob-iles.co.uk/rmidevelopment
Twitter: http://twitter.com/Rob_Iles
Mobile: 079 6666 1092
Skype: rob_iles

Alex Payne

Jul 10, 2008, 9:58:56 PM
to twitter-deve...@googlegroups.com
Thanks for the suggestions, Rob!

tweetip

Jul 10, 2008, 11:05:18 PM
to Twitter Development Talk
> megaproxies

A few more examples:

- AT&T iPhone
- Verizon cell phones
- etc


Cameron Kaiser

Jul 10, 2008, 11:45:20 PM
to twitter-deve...@googlegroups.com
> > megaproxies
>
> A few more examples
>
> - AT&T iPhone

No. The AT&T network has multiple IP exit points, and the iPhone does not use
a proxy.

--
------------------------------------ personal: http://www.cameronkaiser.com/ --
Cameron Kaiser * Floodgap Systems * www.floodgap.com * cka...@floodgap.com
-- The only thing to fear is fearlessness -- R. E. M. -------------------------

tweetip

Jul 11, 2008, 12:22:52 AM
to Twitter Development Talk
> No. The AT&T network has multiple IP exit points, and the iPhone does not use
> a proxy.

Our testing shows a range of exit IPs - as iPhone growth ramps up, it
could become an issue.

Kee Hinckley

Jul 11, 2008, 1:44:50 AM
to twitter-deve...@googlegroups.com
On Jul 10, 2008, at 7:23 PM, Alex Payne wrote:
> unauthenticated API requests. We'll allow 100 unauthenticated
> requests per IP per hour, just as we currently do with authenticated
> requests.
>
> Please let us know if you foresee any ghastly issues with this change.
> It won't go into production until early next week at the soonest.

We are building a web-based Twitter reader. In particular, that means
we are proxying all requests from the user (which is a win for you,
because we're caching things like friendship relationships and tweets
read by multiple users...). The calls we make that currently don't
require authentication (e.g. getting the list of who someone follows,
getting status on the "@foo" references in a tweet) are made on behalf
of multiple Twitter users. We're going to blow past 100 requests per
hour without even trying.

So yes... I have ghastly issues with the change.

This isn't even just a matter of our exceeding the limits. Writing
code to work inside of them is as much of an issue. For instance, we
are caching tweets. When we fetched a tweet, we got the URL for the
user's image. But that gets stale, so we need to periodically query so
that when the user looks at old tweets, they don't get broken icons.
We can be (and are) smart about updating the info if we see a new tweet
from the user, but that doesn't handle all the cases. Previously,
getting that info was a "free" call. You're making it have a cost. So
now we need to figure out how to juggle those calls and spread them
out so we don't exceed the limit. It's a nice barrier to entry for
other developers, but frankly, I'd rather work on features. These
constant changes are making it very difficult to develop applications,
let alone plan ahead.
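
To make "juggle and spread out" concrete, this is roughly the shape of
code we now have to write. It's only a sketch with hypothetical helper
names; fetch_profile stands in for whichever unauthenticated call returns
a user's current image URL:

    import heapq
    import time

    HOURLY_BUDGET = 100       # the proposed unauthenticated per-IP limit
    _hour = int(time.time()) // 3600
    _calls_this_hour = 0

    # (last_refreshed_timestamp, username): stalest profiles come out first.
    _stale_queue = []

    def mark_profile_seen(username, last_refreshed):
        heapq.heappush(_stale_queue, (last_refreshed, username))

    def refresh_stale_profiles(fetch_profile):
        """Spend whatever is left of this hour's budget on the stalest profiles."""
        global _hour, _calls_this_hour
        now_hour = int(time.time()) // 3600
        if now_hour != _hour:                  # new hour, new budget
            _hour, _calls_this_hour = now_hour, 0
        while _stale_queue and _calls_this_hour < HOURLY_BUDGET:
            _, username = heapq.heappop(_stale_queue)
            fetch_profile(username)            # refresh the cached image URL
            _calls_this_hour += 1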

I also agree with others who point out that proxies and NATs are going
to cause problems for you, although I suspect the major ones won't be
with ISPs but with companies that have multiple Twitter users.

One suggestion. It won't help us very much, but it might help some of
the other cases: make the limit per-IP-per-user. Which is to say,
give a user 100 authenticated calls (the current set), plus 100
didn't-have-to-be-authenticated-but-are calls (a new set). Odds are that
most clients are making all on-behalf-of-a-user calls authenticated
anyway, even if they don't have to be. When that is the case, you
don't really care about the IP address.
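
In rough Python, the keying I have in mind is just this (a sketch only;
the function and argument names are mine, not Twitter's):

    def rate_limit_key(authenticated_user, endpoint_requires_auth, remote_ip):
        """Two buckets of 100 per user (auth-required calls and optional-auth
        calls), with per-IP counting reserved for truly anonymous requests."""
        if authenticated_user is not None:
            bucket = "required-auth" if endpoint_requires_auth else "optional-auth"
            return ("user", authenticated_user, bucket)
        return ("ip", remote_ip)

    # e.g. rate_limit_key("kee", False, "203.0.113.7") -> ("user", "kee", "optional-auth")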

Kee Hinckley
CEO/CTO Somewhere Inc.
Somewhere: http://www.somewhere.com/
TechnoSocial: http://xrl.us/bh35i
I'm not sure which upsets me more; that people are so unwilling to
accept responsibility for their own actions, or that they are so eager
to regulate those of everybody else.


Evan Weaver

Jul 11, 2008, 1:52:16 AM
to twitter-deve...@googlegroups.com
Yes, authenticated users will not be limited by IP for any request.

Evan

--
Evan Weaver

E.B.

Jul 11, 2008, 2:01:04 AM
to twitter-deve...@googlegroups.com
I think he means 100 unauthenticated per-IP-per-user.

If you can't do it, that's fine; we'll all just have to work something out via
par...@twitter.com as you suggested before.

Thanks for the heads up.

E.B.

Evan Weaver

Jul 11, 2008, 2:12:09 AM
to twitter-deve...@googlegroups.com
That would be the same as 200 requests of any type per authenticated
user; we can probably reach that goal.

Evan

--
Evan Weaver

Richard

Jul 11, 2008, 11:35:01 AM
to Twitter Development Talk
This is very short notice.

I can see going past this number easily. Is the limit 100 per hour,
every hour, or is it an average over a day or a week?
Are you assuming everyone is running desktop clients with one user?
What about web apps?
Will it matter how many users you have?
Does this mean we must stay a tiny site forever because I can never
request more than 100/hour?

Currently, when a user joins FriendBinder (my site) we have to do a
fetch for each of the user's friends that have updated within the last
two weeks (typically most of them).
This can often be 1,000 requests for one user.

An alternative to the method we use is the user_timeline method,
which requires authentication. That's annoying because it means we
have to ask each user for their password, and that puts many users
off. When is Twitter going to support OAuth or similar? I think
OAuth support is needed before you make this change and others like it
that push people onto authenticated requests.

Also, it seems that the user_timeline method only started allowing
paging and decent-sized pages (200 results instead of the old 20) on
the 7th of July. I based my code on the fact that I couldn't use
user_timeline properly, and now, very quickly, you are changing things
so that effectively I can *only* use it.
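
For what it's worth, the authenticated version we're being pushed toward
looks roughly like this (a Python sketch; the count/since_id parameters
and Basic Auth are my reading of the current docs, so treat the details
as assumptions):

    import base64
    import json
    import urllib.request

    def fetch_user_timeline(screen_name, username, password,
                            since_id=None, count=200):
        """One authenticated user_timeline call, returning up to `count` statuses."""
        url = ("http://twitter.com/statuses/user_timeline/"
               f"{screen_name}.json?count={count}")
        if since_id is not None:
            url += f"&since_id={since_id}"
        req = urllib.request.Request(url)
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        req.add_header("Authorization", f"Basic {token}")
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    # One 200-status call per friend instead of many 20-status fetches.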

Is the following issue going to be fixed before this limit comes into
place:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/6c1b09be52e4c591

I think you are changing the way the API works too quickly for people
to keep up and raise concerns; bear in mind we have to write, debug
and test code too.

Am I the only one who sees major problems with this, other than
the megaproxies issue?

jstrellner

Jul 11, 2008, 12:52:00 PM
to Twitter Development Talk
Alex,

This change will probably take Twitturly offline in the first hour.
We use the http://twitter.com/statuses/user_timeline/USER.json call
exclusively to get information about a user, to the tune of about
7,000 calls per hour, depending on how many tweets per hour have URLs
in them.

We have always identified our user agent as "Twitturly / v0.5" (or
whatever the current version at that time is).

We usually run multiple servers to do the updates, but it would be way
too prohibitive to run 70 of them, especially when we have gotten our
system optimized to the point that we can run fine on one or two
update servers even in the middle of a peak.

Are you going to allow exceptions to this new rule?

Sincerely,
Joel Strellner

Evan Weaver

Jul 11, 2008, 1:08:26 PM
to twitter-deve...@googlegroups.com
We will allow some exceptions, but this way at least we will know
about them, instead of getting hammered by anyone who wants to do 7000
requests an hour without warning.

Can't you get that user data inline from the tweets?

Evan

--
Evan Weaver

jungle

Jul 11, 2008, 2:14:57 PM
to Twitter Development Talk
Both our sites, twist.flaptor.com and twittersearch.flaptor.com, would
also be shut down by this.
If we can't get onto the exception list for the public timeline, either
through the API or the Jabber feed, we're history.
Please advise.


Evan Weaver

Jul 11, 2008, 2:45:30 PM
to twitter-deve...@googlegroups.com
What if we exempted *only* the public timeline?

Evan

--
Evan Weaver

tweetip

Jul 11, 2008, 3:06:10 PM
to Twitter Development Talk
> What if we exempted *only* the public timeline?

We feel Twitter assumes that we developers don't include the public
timeline when discussing rate limits. But with the Jabber fiasco,
developers feel out of the loop. Ask Alex about our endless emails
this week wondering what's going on :)

Michael

jstrellner

Jul 11, 2008, 3:09:56 PM
to Twitter Development Talk
Evan,

I posted it earlier, but it doesn't look like it took.

We need that call primarily for spam reasons. We analyze a user's past
tweets, as well as the information provided about the user in that call,
to see if the current tweet we are parsing is or could be spam.

We currently get the information from Summize to ease the load on the
Twitter servers. We do not currently have access to the XMPP feed, so
Summize was needed, and in a lot of ways it's better for us since they
can add filters and we don't need to parse tons of tweets that we would
just have to ignore. The one downside to using Summize that we found
was that they do not return the Twitter user ID, but an internal one
that they have assigned to that user. This call helps resolve that
issue too, without making any additional calls to the API.

Without this call, Twitturly would have a lot of spam showing in our
results and wouldn't be nearly as useful for our users.

FYI: you can see what Twitturly does by going here: http://twitturly.com

Sincerely,
Joel Strellner



Evan Weaver

Jul 11, 2008, 3:19:08 PM
to twitter-deve...@googlegroups.com
No, you're right that we're talking about limiting every request.

Evan

--
Evan Weaver

tweetip

Jul 11, 2008, 4:14:02 PM
to Twitter Development Talk
Evan & Alex,

Fwiw, here's how each install of our desktop app plans to use the API:

- we poll the API up to 75 times/hour for non-public timeline data
(100 is plenty and will create a near-realtime feel)

- we poll the API public timeline approx 3000 times/hour for
client-specific mining. As tweet volume is increasing 5% per day via
the API, we'll need to hit the API more or fail under current API
restrictions, namely that since_id only returns 20 tweets. We stayed
away from coding for Jabber - and hope we can continue to do what we
need via the API. Additionally, the Summize data feed does not fulfill
our needs. This rate issue may come down to us setting up our own
servers for our clients to thump. But even that plan may be restricted
in the future as Twitter refines its data use policy. Last night, we
made a decision to decrease our development efforts until we see
concrete policies coming from the biz side of Twitter. We feel, for the
first time, that what we're doing may not ultimately be acceptable to
Twitter. And we're in final beta stage...

- In other words, we view Twitter tech issues and Twitter biz issues as
out of sync with each other - enough so to cause us significant concern.
This doesn't imply we're mad - we rode into this corner of mirrors
voluntarily, knowing Twitter policies on the API and the data would
change.

hth

Michael :)
Moab

Stut

Jul 11, 2008, 4:27:55 PM
to twitter-deve...@googlegroups.com
On 11 Jul 2008, at 21:14, tweetip wrote:
> we poll the api public timeline approx 3000 times/hour for client
> specific mining.

Have you actually confirmed that you get more than 1,200 tweets an hour
by doing that? Last month Alex confirmed[1] that they cache the public
timeline API response every 60 seconds, so it shouldn't be possible to
get more than 1,200 an hour (60 responses of 20 tweets each), and
hitting it 3,000 times is a massive waste of your users' resources as
well as Twitter's.
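
Put differently, a poller along these lines gets you everything the
public timeline will ever hand out, at a fraction of the request volume
(a sketch only; the 60-second figure comes from Alex's post linked below):

    import json
    import time
    import urllib.request

    PUBLIC_TIMELINE = "http://twitter.com/statuses/public_timeline.json"
    seen_ids = set()

    def poll_public_timeline(handle_tweet, interval=60):
        """One request per cache window, de-duplicated on status id."""
        while True:
            with urllib.request.urlopen(PUBLIC_TIMELINE) as resp:
                for status in json.load(resp):
                    if status["id"] not in seen_ids:
                        seen_ids.add(status["id"])
                        handle_tweet(status)
            time.sleep(interval)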

Alex/Evan: Any word on when the Jabber feed will be open to all?

-Stut

--
http://stut.net/

[1] http://groups.google.com/group/twitter-development-talk/browse_thread/thread/f881564598a947a7/c5ee88b03b8d7faf?lnk=gst&q=waste+of+resources#c5ee88b03b8d7faf

Evan Weaver

Jul 11, 2008, 4:38:33 PM
to twitter-deve...@googlegroups.com
There are many ways to bypass the cache right now. Most of them are
going away. After that we might shorten the timeout on the public
timeline cache to 15 or 30 seconds.

However, please note that the public timeline does not have all tweets
even when uncached.

Sorry for the disconnect. I definitely understand your frustration.

Evan

--
Evan Weaver

tweetip

Jul 11, 2008, 5:11:29 PM
to Twitter Development Talk
Stut,

We're whitelisted - perhaps that's helping us bypass the cache? Today,
at the current rate, we'll filter & save 600,000+ tweets in realtime.
Both Twitter & their VCs are aware of what we're doing. Here's a chart
from a few days ago:

http://tweetip.us/lkxh3

FYI, Summize also accesses the API public timeline when Jabber is down.

Michael


PS: Evan - OK, thanks - we'll quietly wait for things to shut off :) and
for now we're shelving eight months of work :(

Evan Weaver

Jul 11, 2008, 5:15:53 PM
to twitter-deve...@googlegroups.com
On Fri, Jul 11, 2008 at 5:11 PM, tweetip <twe...@mac.com> wrote:
> ps: Evan-ok-thanks-we'll quietly wait for things to shut off :) and
> for now shelving eight months of work :(

No, don't get discouraged. That's why we're having this conversation
before and not after.

However, I'm confused as to whether you guys think that you're getting
a full tweet stream or not.

Evan

--
Evan Weaver

tweetip

Jul 11, 2008, 5:35:08 PM
to Twitter Development Talk
> However, I'm confused as to whether you guys think that you're getting
> a full tweet stream or not.

Evan,

We're not discouraged, just sad. The technology we've developed is the
value, and it can point at whatever data river.

We know we don't receive the full stream. In our discussions with
Summize, we've compared tweet counts and they've said we're within a
few percent (not counting Asia). We've documented the times we miss
mission-critical tweets. To resolve this, we've lobbied for the API
(not Jabber) to return as much as possible via since_id. For us, Jabber
is unneeded overkill.

Our understanding of "whitelisting" until recently: Twitter would
allow Source "tweetip" to poll the public timeline as needed. In other
words, not based on IP or screen name, but on Source.
Michael

Evan Weaver

Jul 11, 2008, 5:50:17 PM
to twitter-deve...@googlegroups.com
It looks to me like the Summize counts you got are wrong.

I am going to talk to Evan Williams about this and get back to you.

Evan

--
Evan Weaver

tweetip

Jul 11, 2008, 6:05:59 PM
to Twitter Development Talk
> It looks to me like the Summize counts you got are wrong.

Evan,

Perhaps Summize was comparing their count from the periods when they
switch to the API because Jabber is offline. We've felt we're close to
Summize because of the audits we do during major events, like the China
quake. We recorded tweets that Summize missed or didn't index. Also,
we're counting English tweets - not Asian-language or Spanish ones, etc.

Hope this convo helps.

Michael :)

Jode...@gmail.com

Jul 12, 2008, 7:16:57 AM
to Twitter Development Talk, Evan Weaver, Andrew Maizels
Alex,
As we do all our Twitter client requests server-side, this will blow us
out of the water.
We would have to rewrite our code so that all our client-side code
makes direct calls to Twitter. This would dramatically increase API
calls, i.e. no caching on our side.
Regards

Andrew Maizels

Jul 12, 2008, 11:44:51 AM
to Jode...@gmail.com, Twitter Development Talk, Evan Weaver
Here's what Evan said in that thread:


We will allow some exceptions, but this way at least we will know
about them, instead of getting hammered by anyone who wants to do 7000
requests an hour without warning.

Sounds reasonable.

Andrew M.

Chris Meller

Jul 12, 2008, 12:46:51 PM
to twitter-deve...@googlegroups.com
Just out of curiosity, has anyone actually checked to see if a problem exists with people hammering the API in such a manner? All the comments have seemed to be along the lines of "at least we'll know if someone's hammering us", making it sound as if there's no clear idea of whether or not the problem exists yet.

I have no stake in the matter either way, as I just started lurking here to keep up with the API limit changes, I'm just curious.

Oh yeah, and supporting some form of authentication that doesn't require a user to hand their password over to a 3rd-party service would be kick-ass - for both users and developers. I don't mind giving my local copy of twhirl my password, but giving it to a 3rd-party web service that has to store that password somewhere (hopefully securely) is different...

Alex Payne

Jul 12, 2008, 9:22:47 PM
to twitter-deve...@googlegroups.com
Yes, we know that people hammering on our API - where it defeats a
reasonable caching strategy - is tough on our service.

There's been a lot of discussion around providing an alternative to
password-based authentication. Please search the group for "OAuth".
It's coming later this year.

--
Alex Payne
http://twitter.com/al3x

tweetip

Jul 13, 2008, 1:27:54 AM
to Twitter Development Talk
I was asked to clarify my comments on another blog, and since no one
may ever see them there, I'll repost here. If Twitter changes its mind
and has no desire to become a highly reliable worldwide comm utility,
then my comments are 100% invalid.

---
from a mission-critical developer/end-user point of view, and Twitter
being a ---> worldwide comm utility <---

by not drafting a specification in advance, Twitter failed to ensure
the integrity of the data. Mobile app developers just started pumping
in whatever format/values they desired. This data "breaks" our app if
we are filtering values based on historical analysis, as is the case for
First Responder / Earthquake / Tornado / Emergency Services or, for
that matter, Market Movement / Breaking News / any realtime anything
where we feel Location is integral to our analysis.
---

Michael

tweetip

Jul 13, 2008, 1:44:20