Twitter Search API - Questions Regarding Scaling Out


Corey Ballou

Apr 11, 2011, 11:14:38 AM
to Twitter Development Talk
I tried speaking with Ryan Sarver directly, but he referred me here
so that the community advocates could answer. I believe the answer will
need to come from the top down at Twitter, as it's your rate limiting
that I'm most worried about.

I have a technical question for all of you regarding the Search API,
as I want to maintain full compliance. Currently, the old Search
API implementation (albeit slower) provides a fuller result set and
allows more flexibility in the types and combinations of searches.
The way I have built my application allows a number of daemonized
worker instances, running on different IP addresses, to call the
Search API on behalf of stored OAuth credentials in order to avoid
rate limiting issues.

I had a conversation with the Pluggio developer in which he stated
Twitter had threatened to shut down his application if he didn't switch
to a different implementation of the Search API. The problem indicated
was that he was performing searches for multiple Twitter accounts,
which is exactly my use case. Site Streams does not make as much sense
for my application given the search queries I wish to perform and the
necessity for logical AND operations on geo-location.

Do you foresee any problems with my current method of using different
IP addresses to stay under the rate limit? I'm trying to stay in full
compliance with Twitter's TOS and would love to find the most
applicable and API-friendly solution. I know headway is being made
with Twitter's new search implementation, so I would like to stay ahead
of the curve and not get myself stuck in a box.

I still need a method for polling for new search results (say, every
30 minutes, depending on the pricing plan) for non-logged-in users.

Below is a scaled down representation of how I'm currently handling
searches to help you decide the best plan of action:

1) Searches are performed on a rolling queue basis, say one search
every thirty minutes. There can be a finite number of searches per
Twitter user (say 5 searches per Twitter account). There can be any
number of Twitter accounts.
2) Search results are stored locally for retrieval by a JavaScript
AJAX long-poller that checks for changes every minute.
3) When a user visits the search results page and filters results, no
API calls to Twitter are made; only a local query is required.
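
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the application's actual code: `RollingSearchQueue`, `ScheduledSearch`, and the `fetch` callable (a stand-in for whatever performs the real Search API request) are hypothetical names, the interval and per-account cap are the example figures from step 1, and the current time is passed in explicitly so the scheduling logic is easy to follow.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

SEARCH_INTERVAL = 30 * 60      # seconds between runs of one search (example figure)
MAX_SEARCHES_PER_ACCOUNT = 5   # per-account cap (example figure)


@dataclass(order=True)
class ScheduledSearch:
    next_run: float                        # only this field is used for ordering
    account: str = field(compare=False)
    query: str = field(compare=False)


class RollingSearchQueue:
    """Runs each saved search on a fixed interval and caches results locally."""

    def __init__(self, fetch: Callable[[str, str], List[dict]]):
        self._fetch = fetch                # hypothetical: (account, query) -> tweets
        self._heap: List[ScheduledSearch] = []
        self._per_account: Dict[str, int] = {}
        self.results: Dict[Tuple[str, str], List[dict]] = {}  # local store

    def add(self, account: str, query: str, now: float) -> None:
        count = self._per_account.get(account, 0)
        if count >= MAX_SEARCHES_PER_ACCOUNT:
            raise ValueError(f"{account} already has {count} searches")
        self._per_account[account] = count + 1
        heapq.heappush(self._heap, ScheduledSearch(now, account, query))

    def run_due(self, now: float) -> int:
        """Step 1: run every search that is due, then reschedule it."""
        ran = 0
        while self._heap and self._heap[0].next_run <= now:
            job = heapq.heappop(self._heap)
            # Step 2: store the results locally for the page's poller to read.
            self.results[(job.account, job.query)] = self._fetch(job.account, job.query)
            job.next_run = now + SEARCH_INTERVAL
            heapq.heappush(self._heap, job)
            ran += 1
        return ran

    def local_results(self, account: str, query: str, contains: str = "") -> List[dict]:
        """Step 3: filter cached results without calling the API."""
        cached = self.results.get((account, query), [])
        return [t for t in cached if contains.lower() in t["text"].lower()]
```

With this shape, the number of API calls per interval grows linearly with the number of stored searches, which is where the rate limiting concern comes from.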

Due to this process, the queue is constantly looking for the next
searches and mentions to perform. I foresee rate limiting concerns
cropping up as searches are performed for any number of users.

Can you steer me in the right direction to avoid shutdown notices or
access revocation?

Regards,

Corey
@cballou

M. Edward (Ed) Borasky

Apr 11, 2011, 1:05:22 PM
to twitter-deve...@googlegroups.com
I don't see an answer here, but I'll tell you how *I* would go about implementing this:

1. Switch to the Streaming API. Using Search in an application puts a strain on Twitter's servers and makes it difficult for Twitter to manage capacity. That's why it's rate-limited and why the rate limits aren't publicly disclosed.

2. If your application is a desktop application, use User Streams. If it is a server, use User Streams on a desktop or the low-frequency free access to Streaming on a server to prototype and develop. Your target for a server will be Site Streams, but that's in closed beta at the moment IIRC.

3. *Concurrently with development*, your business development / sales / marketing / planning people, or yourself, if it's a one-person shop, should be negotiating with Twitter for access to Site Streams. I'm assuming an "agile" development methodology - customer-in-the-loop - and one of the parties that needs to be in the loop is Twitter for Site Streams. You simply *can't* build an at-scale Twitter application without direct business discussions with Twitter!


--
http://twitter.com/znmeb http://borasky-research.net

"A mathematician is a device for turning coffee into theorems." -- Paul Erdős

Corey Ballou

Apr 11, 2011, 5:50:21 PM
to Twitter Development Talk
Thanks for the reply, I appreciate it.

My main concerns with the Streaming APIs are the following:

* usage of logical OR when using locations
* firehose limitations
* the user’s location field is not used to filter tweets
* increased application complexity for parsing the resulting stream of
data back out into individual searches
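
To make the first and last concerns concrete, here is a rough sketch of the extra work involved: splitting one combined stream back into individual searches and applying the keyword AND location test locally. It is an assumption-laden illustration, not a real client. `SavedSearch`, `demultiplex`, and the simplified tweet dictionaries (with hypothetical `text` and `lonlat` keys) are made up for the example, and only tweets that carry coordinates can match.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass(frozen=True)
class SavedSearch:
    search_id: str
    keyword: str
    bbox: BBox


def in_bbox(point: Optional[Tuple[float, float]], bbox: BBox) -> bool:
    """True if a (longitude, latitude) point lies inside the bounding box."""
    if point is None:
        return False  # tweets without coordinates can never match
    lon, lat = point
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def demultiplex(tweets: Iterable[dict], searches: List[SavedSearch]) -> Dict[str, List[dict]]:
    """Assign each streamed tweet to every search whose keyword AND area match."""
    matches: Dict[str, List[dict]] = {s.search_id: [] for s in searches}
    for tweet in tweets:
        text = tweet.get("text", "").lower()
        point = tweet.get("lonlat")  # hypothetical field: (lon, lat) or None
        for s in searches:
            if s.keyword.lower() in text and in_bbox(point, s.bbox):
                matches[s.search_id].append(tweet)
    return matches
```

Every tweet is checked against every stored search, so the cost of this step grows with the number of searches times the volume of the stream.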

I know that the Search API is not Twitter's preferred choice, but it's
currently returning the best applicable results for my application.
It's also worth noting that the API recently received a drastic
improvement to speed which should theoretically relax the strain on
the API:

http://engineering.twitter.com/2011/04/twitter-search-is-now-3x-faster_1656.html

I guess I'm mainly interested in knowing whether @twitterapi will
allow me to use the Search API in the manner I indicated above.
Essentially, I would be willing to guarantee that the application's
worker nodes handle 420 rate limiting errors accordingly while still
supporting multiple Twitter accounts and searches.
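
As a rough idea of what handling 420 errors could look like in a worker, the sketch below backs off and retries when a search comes back rate limited. It is illustrative only: `search_with_backoff` and the `do_search` callable are hypothetical, the `(status, headers, body)` response shape is an assumption, the delays are arbitrary, and a `Retry-After` header is honored only if one is present and holds a number of seconds.

```python
import time
from typing import Callable, Dict, Tuple

RATE_LIMITED = 420  # status code for rate limiting, as mentioned above

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body), assumed shape


def search_with_backoff(
    do_search: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    max_delay: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call do_search, waiting and retrying while it reports a 420."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        response = do_search()
        status, headers, _body = response
        if status != RATE_LIMITED:
            return response
        if attempt == max_attempts:
            break
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            wait = float(headers["Retry-After"])
        except (KeyError, ValueError):
            # Otherwise fall back to exponential backoff.
            wait = delay
            delay = min(delay * 2, max_delay)
        sleep(min(wait, max_delay))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Backing off this way keeps a single worker polite, but it does not change the total number of requests made across workers and IP addresses.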


Corey Ballou

Apr 13, 2011, 1:28:57 PM
to Twitter Development Talk
I'm still looking for a community leader answer on this one.

Stuart Dallas

Apr 13, 2011, 1:40:35 PM
to twitter-deve...@googlegroups.com
You may want to take a look at http://datasift.net/

-Stuart

--
Stuart Dallas
3ft9 Ltd
http://3ft9.com/

Brian Sutorius

Apr 14, 2011, 1:12:36 PM
to Twitter Development Talk
While the Streaming API may not provide processed results to you in
the way that search queries can (logical ORs, etc.), it's a more
scalable solution for returning a lot of Tweets. Our search system can
rate limit queries if they become too computationally expensive (in
addition to the normal query limit), so continuing to add parameters
to the query up front rather than doing this processing yourself may
cause you to keep running into limits. Ultimately, circumventing the
limits put in place by our APIs is not allowed by our API ToS, and
building your architecture this way just to get around the defaults is
something we strongly discourage. If you keep being rate limited, you
should think about re-factoring your prioritization strategy.

Can you go into a little more detail about what your application does?
We might be able to guide you towards a mix of Streaming API and
search queries that gets you what you need but stays within the rate
limits.

Brian Sutorius
Twitter API Policy

Corey Ballou

May 16, 2011, 10:39:16 AM
to Twitter Development Talk
Thanks for the feedback, Brian. Late response here, but I'd be more
than willing to provide you with more details regarding our
application in a private email. You should be receiving said email
shortly.

Regards,

Corey