search api: best practice to capture all tweets.


maestrojed

Jan 16, 2010, 6:40:36 PM
to Twitter Development Talk
I would like to capture and store all tweets that match a search query,
starting now and continuing indefinitely. My first attempt was to query
and store the matching results (tweets), with each later query passing
since_id set to the highest id already stored. However, the search API
does not seem reliable enough for this. I am missing tweets because the
API apparently does not return every match on every query. With this
approach, if a tweet is missed but a later one is captured, the later
one's higher id becomes the new since_id, so the missing tweet is never
recovered.
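
To make the failure mode concrete, here is a minimal simulation in Python. It does not call the real API; the function name poll_with_since_id and the tweet ids are made up for illustration, and each inner list stands in for what one query happened to return.

```python
def poll_with_since_id(responses):
    """Simulate repeated polls that only accept ids above the stored maximum.

    `responses` is a list of polls; each poll is the list of tweet ids the
    (hypothetical) search call returned at that moment.
    """
    stored = set()
    max_id = 0
    for available in responses:
        # Mimic since_id: anything at or below the stored maximum is never requested.
        new_ids = [tweet_id for tweet_id in available if tweet_id > max_id]
        stored.update(new_ids)
        if new_ids:
            max_id = max(new_ids)
    return stored


# Poll 1 omits tweet 102 even though it exists; poll 2 would have included it.
polls = [[101, 103], [101, 102, 103, 104]]
captured = poll_with_since_id(polls)
missing = {101, 102, 103, 104} - captured
print(sorted(captured), sorted(missing))
```

Once 103 is stored, the second poll only asks for ids above 103, so 102 can never be picked up, even though the API was willing to return it that time.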

This is discussed here:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/b7b6859620327bad/a31a88f8125c1c4e?lnk=gst&q=search+api+store+#a31a88f8125c1c4e

This is my code, which I run from cron once a minute:
http://pastebin.com/f6207f43

So if this is not a reliable method, what is?

I was thinking I could drop the since_id parameter, so each query
returns the 100 most recent results. My code would then check whether
each tweet is already stored and insert or update accordingly. A tweet
missing from one query might appear in the next and be added then.
However, this approach would fail if there were more than 100 results
per minute; the script would not keep up.
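
A rough sketch of that second idea in Python, using the standard sqlite3 module. The table layout and the names open_store and store_page are assumptions for illustration, not taken from the pastebin code, and fetching the page from the search API is left out. Keying the table on the tweet id makes re-inserting a tweet harmless, and counting how many rows were actually new gives a cheap warning sign for the "more than 100 per minute" case.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store whose primary key is the tweet id."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def store_page(conn, page):
    """Insert a page of (id, text) pairs, skipping ids that are already stored.

    Returns (inserted, gap_possible). gap_possible is True when every tweet on
    a non-empty page was new, meaning the page did not overlap what was stored
    before, so tweets between two polls may have been lost.
    """
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO tweets (id, text) VALUES (?, ?)", page
    )
    conn.commit()
    inserted = conn.total_changes - before
    gap_possible = len(page) > 0 and inserted == len(page)
    return inserted, gap_possible


conn = open_store()
first = store_page(conn, [(101, "first"), (103, "third")])
# The next poll overlaps the previous one and also surfaces the tweet missed earlier.
second = store_page(conn, [(101, "first"), (102, "second"), (103, "third"), (104, "fourth")])
total = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(first, second, total)
```

The very first run always reports a possible gap because nothing is stored yet. After that, a page with no overlap suggests polling more often or paging further back. This only helps with tweets the API returns on some later query; it cannot recover tweets the search index never returns.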

I really appreciate any advice.


John Kalucki

Jan 16, 2010, 10:23:27 PM
to twitter-deve...@googlegroups.com
The Search API does not return all tweets that match a query. See:
http://groups.google.com/group/twitter-api-announce/browse_thread/thread/c8c713bb63fac24c

-John Kalucki
Services, Twitter Inc.