Search API: since_id is now unreliable

Chad Etzel
Jul 21, 2009, 2:57:58 PM
to twitter-deve...@googlegroups.com
Hi API Team,

A few of us have been discussing off-list some funky behavior we have
been noticing, and now users are starting to notice it too.

There is a problem for sites/apps like TweetGrid and TweetChat, which
auto-refresh results from the Search API using since_id. People
are noticing that these sites are "missing tweets" when compared to
the search.twitter.com results page for the same query.

We believe what is happening is that the search servers are not
indexing tweets serially, so a tweet with a higher id may reach a
search server and be indexed before a tweet with a lower id. This
means that when the since_id is returned with the query results (or
derived from the first result in the results array), using that
since_id to refresh the query will miss the lower-id tweets when they
finally do get indexed, which creates the illusion of "missing
tweets". You can run TweetGrid and TweetChat in separate tabs with the
same query and see that the results sometimes don't match up because
of this.

I'll try to give an example to be clear.

Let's say for the sake of simplicity that I'm searching for "twitter"
and that every 10th tweet in the public timeline matches. So all
tweets whose ids end in 0 match my query.

Search server 1 may index:

20
30
40
60
70

(notice missing 50)

At the same time, Search server 2 may index:

20
30
40
50

(notice hasn't indexed 60 or 70 yet)

I send a query, get a response from server 1, and receive a since_id
of 70. On my next request I use since_id=70, and I'll never see tweet
50, even after it is indexed. Thus the "missing tweets".
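The scenario above can be reproduced with a toy model. In this sketch each search server is just a set of matching tweet ids, and `search` and `next_since_id` are hypothetical helpers standing in for the Search API and a client's cursor logic (they are not Twitter's actual API). The late arrival of tweet 50, plus a genuinely new tweet 80, is an assumption added to show what the second poll returns.

```python
# Toy model of the example: each "server" is the set of matching tweet ids
# it has indexed so far. `search` and `next_since_id` are hypothetical.


def search(index, since_id=0):
    """Return matching ids newer than since_id, newest first."""
    return sorted((i for i in index if i > since_id), reverse=True)


def next_since_id(results, since_id):
    """Client cursor: the highest id seen, taken from the first result."""
    return results[0] if results else since_id


server1 = {20, 30, 40, 60, 70}  # has not indexed 50 yet
server2 = {20, 30, 40, 50}      # has not indexed 60 or 70 yet

# The first poll happens to hit server 1.
seen = search(server1)
since_id = next_since_id(seen, 0)

# Later, server 1 finally indexes 50 (and a new tweet, 80).
server1 |= {50, 80}

# The next poll sends since_id=70, so 50 is filtered out.
seen += search(server1, since_id)

print("since_id after first poll:", since_id)
print("tweets the client saw:", sorted(seen))
print("tweets the client missed:", sorted((server1 | server2) - set(seen)))
```

Because every later request asks only for ids above 70, tweet 50 can never be returned once the cursor has moved past it, no matter which server answers.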

This is quite annoying, especially now that users are noticing and
complaining to us (the app devs) that our apps are broken.

I cannot think of a good workaround for this that would be simple
enough to implement and be worth the effort.
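For illustration only, one client-side mitigation (not proposed in this thread) is to re-request an overlap window below the newest seen id and de-duplicate by id. The sketch assumes a hypothetical `search` call like the one in the toy model and an arbitrary `overlap` of 3; a real client would also have to fit the overlap within whatever page size the API allows.

```python
import heapq

# Illustrative only: a client-side overlap window, not an official fix.
# `search` is a hypothetical stand-in for a Search API call with since_id.


def search(index, since_id=0):
    """Matching ids newer than since_id, newest first."""
    return sorted((i for i in index if i > since_id), reverse=True)


class OverlappingPoller:
    def __init__(self, overlap=3):
        self.overlap = overlap  # number of already-seen ids to re-request each refresh
        self.seen = set()       # ids already delivered to the user

    def cursor(self):
        """since_id to send: just below the `overlap` newest ids already seen."""
        newest = heapq.nlargest(self.overlap + 1, self.seen)
        if len(newest) <= self.overlap:
            return 0            # too little history: request everything
        return newest[-1]

    def refresh(self, index):
        """Poll once and return only ids not delivered before, newest first."""
        fresh = [i for i in search(index, self.cursor()) if i not in self.seen]
        self.seen.update(fresh)
        return fresh


# Replaying the example: server 1 initially lacks tweet 50.
server1 = {20, 30, 40, 60, 70}
poller = OverlappingPoller(overlap=3)
first = poller.refresh(server1)   # [70, 60, 40, 30, 20]
server1 |= {50, 80}               # 50 arrives late, 80 is genuinely new
second = poller.refresh(server1)  # [80, 50]
```

The trade-off is that a tweet lagging more than `overlap` positions behind is still missed, and every refresh re-downloads up to `overlap` results, so this narrows the gap rather than closing it.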

Is this behavior something anyone else can confirm? Are tweets
supposed to be indexed/replicated serially by the search servers?

-Chad

Doug Williams
Jul 21, 2009, 4:45:36 PM
to twitter-deve...@googlegroups.com
Chad,
Your assessment is spot on. 

At the heart of search there are a number of data stores that accept queries (reads) while simultaneously performing writes from an indexer. Heavy load (a large number of queries, a large number of writes, or both) can cause write replication between the indexer and the various data stores to become inconsistent when a particular data store is blocked on a read.

Unfortunately, there is no easy fix for this problem at the moment. The search team has grown considerably in the last couple of weeks, so as the new members get up to speed, the feature set and stability of search should continue to improve.

Thanks,
Doug

Brooks Bennett
Jul 21, 2009, 11:03:34 PM
to Twitter Development Talk
Thanks for posting this Chad!

Doug, please keep us updated on how things progress with this issue so
we can pass along guidance to our user-base. Hopefully the
improvements will come in the near-term.

Thanks for all that you guys do!

Brooks

Brooks Bennett
Jul 30, 2009, 10:12:24 AM
to Twitter Development Talk
Doug,

Is there any status update on this issue? Users are really starting to
get frustrated with the results and are wondering when things will get
back to being consistent...

Thanks!

Brooks

Doug Williams
Jul 30, 2009, 1:50:14 PM
to twitter-deve...@googlegroups.com
Brooks,
As I stated previously, it is a large problem (much deeper than the API) that will take some time to fix. The search team is growing aggressively, and the new engineers are quickly getting up to speed. One of their tasks is to make progress on the bottleneck causing this problem. That said, there is no ETA on a fix.

Thanks,
Doug