A few of us have been discussing off-list some odd behavior that we
have been noticing, and that users are now starting to notice too.
There is a problem for sites/apps like TweetGrid and TweetChat which
auto-refresh tweets based on the Search API using the since_id. People
are noticing that these sites are "missing tweets" when compared to
the search.twitter.com results page for the same query.
We believe what is happening is that the search servers are not
indexing tweets in a serial manner, and so a tweet with a higher id
may sneak into a search server and be indexed first before a tweet
with a lower id. This means that when the since_id is sent back from
the query (or derived from the first result in the results array),
using that since_id to refresh the query will miss lower id tweets
when they finally do get indexed. So the illusion of "missing tweets"
is created. You can run TweetGrid and TweetChat in separate tabs using
the same query and see that sometimes the results don't match up
because of this.
I'll try to give an example to be clear.
Let's say for the sake of simplicity that I'm searching for "twitter"
and that every 10th tweet in the public timeline matches. So, all
tweets ending in 0 match my query.
Search server 1 may index:
(notice missing 50)
At the same time, Search server 2 may index:
(notice hasn't indexed 60 or 70 yet)
I send a query and get a response from Server 1 and get a since_id of
70. On my next request I use that since_id=70 and I'll never see
tweet 50. Thus the "missing tweets".
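Here is a rough Python simulation of that scenario, in case it helps anyone reproduce the reasoning. The index contents and the search() helper are made up for illustration. This is only a sketch of what I think is happening, not how the Search API is actually implemented.

```python
# Hypothetical simulation of out-of-order indexing on a single search server.
# The tweet ids and index contents are assumptions chosen to match the example.

def search(index, since_id=0):
    """Return matching tweet ids greater than since_id, newest first."""
    return sorted((i for i in index if i > since_id), reverse=True)

# State at the first request: 60 and 70 are indexed, but 50 is not yet.
server1_now = {10, 20, 30, 40, 60, 70}
# State at the second request: 50 has finally been indexed, plus a new tweet 80.
server1_later = server1_now | {50, 80}

seen = []

first = search(server1_now)
seen.extend(first)
since_id = first[0]  # highest id in the first response

second = search(server1_later, since_id=since_id)
seen.extend(second)

# Tweets that are in the index but were never delivered to the client.
missing = sorted(server1_later - set(seen))
print("since_id used for refresh:", since_id)
print("second response:", second)
print("never seen by the client:", missing)
```

The refresh only asks for ids above 70, so tweet 50 is never returned even though it is in the index by then.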
This is quite annoying, especially now that users are noticing and
complaining to us (the app devs) that our apps are broken.
I cannot think of a good workaround for this that would be simple
enough to implement and be worth the effort.
Is this behavior something anyone else can confirm? Are tweets
supposed to be indexed/replicated serially by the search servers?