Search API questions

enygmatic

Nov 26, 2009, 12:26:58 PM
to Twitter Development Talk
I have been using the Twitter Search API to query the public timeline for
Twitter status updates originating from a particular location.
Currently, I run one search every 15 minutes using an automated
script. However, I have found that the results returned contain
a number of old tweets. An average of 30 new tweets come up
for my location every 5 minutes or so, so this shouldn't be the
case. Also, the same search criteria on search.twitter.com show
different results, with no repeats of old results. Any idea why
this is so?

A second question is regarding the published date. Is the published date
returned by the search API in GMT? If so, is there any way to have
the search API return the published date in local time?

Raffi Krikorian

Nov 26, 2009, 8:49:31 PM
to twitter-deve...@googlegroups.com
> I have been using the Twitter Search API to query the public line for
> Twitter status updates originating out of a particular location.
> Currently, I run one search every 15 minutes using an automated
> script. However I have found that the search results returned contain
> a number of old search results . An average of 30 new tweets come up
> for my location every 5 minutes or so. Therefore this shouldn't be the
> case. Also Results for the same search criteria using
> search.twitter.com show different results, with no repeats of old
> search results. Any idea why this is so ?

i don't have a direct answer for this. however, if you are polling search every 15 minutes, this seems like a clear reason for you to switch over to the streaming API instead.

> A second question is regarding published date. Is the published date
> returned by the search API in GMT ? If so, is there any way to have
> the search API return the published date as per local time ?

the created_at strings in the search API look like

"created_at":"Fri, 27 Nov 2009 00:06:44 +0000"

the +0000 is the UTC offset, so these times are in UTC (GMT).  no, there is no way to ask search to return those values in local time -- just do the conversion yourself when you receive the status objects.
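
As a rough illustration of the client-side conversion described above, the following Python sketch parses a created_at string in the format shown and shifts it to a local zone. The choice of IST (UTC+05:30) as the target zone and the variable names are assumptions made for the example; they are not part of the API.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Timestamp in the format quoted above (RFC 2822 style with a numeric UTC offset).
created_at = "Fri, 27 Nov 2009 00:06:44 +0000"

# With an explicit "+0000" offset the result is a timezone-aware datetime.
created_utc = parsedate_to_datetime(created_at)

# Assumption for illustration: the reader's local zone is IST (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")
created_local = created_utc.astimezone(IST)

print(created_local.isoformat())  # 2009-11-27T05:36:44+05:30
```

A fixed-offset zone is enough here because IST has no daylight saving time; for zones that do, a full time zone database would be the safer choice.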

--
Raffi Krikorian
Twitter Platform Team




enygmatic

Nov 26, 2009, 11:55:31 PM
to Twitter Development Talk
@Raffi,
Thanks for the info.
Just a couple of queries: I'm using the Atom format for search results
(as mentioned on http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search).
I get the published date in the Atom feed, so I am not sure what you
mean by "created_at":"Fri, 27 Nov 2009 00:06:44 +0000". The format
available in the Atom feed looks like this: "2009-11-27T04:45:03Z". Do you
mean the JSON format, or are you referring to the results
returned by the streaming API?

Oddly, though, when I viewed the same feed in my browser, I could see the
correct local times reported. Maybe it's a browser thing. Anyway,
converting the reported time to my time zone shouldn't be much of
a problem.

The streaming API seems like a good idea, and I will probably consider
shifting to it. In the meantime, does anyone have any ideas about my
first problem? Why do I get some "stale" results (sometimes
a couple of hours old) when I query with the API, but the latest
results when I query using Twitter advanced search? Or will switching
to the feed generated for the advanced search results, instead of
using the API, solve my problem?

Regards,
Elroy

Raffi Krikorian

Nov 27, 2009, 9:44:45 AM
to twitter-deve...@googlegroups.com
> Just a couple of queries: I'm using the Atom format for search results
> (As mentioned on http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search).
> I get the published date in the atom feed. So I am not sure what you
> mean by "created_at":"Fri, 27 Nov 2009 00:06:44 +0000". The format
> available in the atom feed is like this "2009-11-27T04:45:03Z". Do you
> mean the JSON format or are you referring to the search results
> returned by the streaming API ?
>
> Oddly though if I viewed the same feed in my browser, I could see the
> correct local times reported. Maybe a browser thing I guess...Anyway,
> converting the time reported to my timezone, shouldn't be that much of
> a problem I guess.

the time reported as 2009-11-27T04:45:03Z is in ISO 8601 format
(http://en.wikipedia.org/wiki/ISO_8601), and the Z at the end means
"Zulu" time (otherwise known as UTC). i wouldn't be all that surprised
if a browser, when encountering an atom feed, converts the time into
local time.
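
For the Atom form of the timestamp, a minimal Python sketch of the same kind of conversion might look like this. It assumes the feed always uses the trailing "Z" shown above and, purely for illustration, that the target zone is IST (UTC+05:30); the printed layout loosely imitates a date, day, hour, minute log line.

```python
from datetime import datetime, timedelta, timezone

# Atom "published" value in the form quoted above; the trailing Z means UTC.
published = "2009-11-27T04:45:03Z"

# The literal "Z" in the format string carries no tzinfo, so UTC is attached explicitly.
published_utc = datetime.strptime(published, "%Y-%m-%dT%H:%M:%SZ").replace(
    tzinfo=timezone.utc
)

# Assumption for illustration: convert to IST (UTC+05:30).
IST = timezone(timedelta(hours=5, minutes=30), name="IST")
published_local = published_utc.astimezone(IST)

# Date, hour and minute fields of the local time.
print(published_local.strftime("%Y-%m-%d %H %M"))  # 2009-11-27 10 15
```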

> The streaming API seems like a good idea. Probably will consider
> shifting to it. In the meantime, does anyone have any ideas about my
> first problem? Any idea as to why I get some "stale" results (some
> times a couple of hours old) when I query with the API and the latest
> results when I query using Twitter advanced search? Or will switching
> to the feed generated for the advanced search results, instead of
> using the API solve my problem ?


the search API does have a cache on it, specifically because there are
a lot of applications which instead of using the streaming API are
hammering the search API instead. you are probably seeing a cache hit
as the search result.

enygmatic

Nov 27, 2009, 1:38:58 PM
to Twitter Development Talk
@Raffi, thanks for the reply. I now convert the time from UTC to my
local time zone, so my time zone problem is sorted out. On the issue
of search, I have been going through the streaming API docs. From what I have
gone through so far, there doesn't seem to be a way to query for
status updates from a certain geographical location, say limited to a
city. I may be mistaken here, so do correct me if I am wrong.

Anyway, I guess I will have to live with the "stale" results from
cache for now.
Thanks for the help.



Abraham Williams

Nov 27, 2009, 2:40:32 PM
to twitter-deve...@googlegroups.com


On Fri, Nov 27, 2009 at 12:38, enygmatic <enyg...@gmail.com> wrote:
> From what I have
> gone through so far, there doesn't seem to be a way to query for
> status updates from a certain geographical location, say limited to a
> city. I may be mistaken here, so do correct me if I am wrong.

Check out the search operators: http://search.twitter.com/operators


Abraham
--
Abraham Williams | Community Evangelist | http://web608.org
Hacker | http://abrah.am | http://twitter.com/abraham
Project | Awesome Lists | http://twitterli.st
This email is: [ ] blogable [x] ask first [ ] private.
Sent from Madison, WI, United States

enygmatic

Nov 27, 2009, 11:28:41 PM
to Twitter Development Talk
@Abraham
I actually use the geocode parameter with the search API in my script, so using
the search API isn't my problem. My problem is that I get "stale"
results from the search cache, even when querying after a sufficient
interval. The "stale" results also seem to be hours old at times; in fact,
yesterday at 23:00 hours I got a few results that were from
22:00-22:30 hours. (I didn't have the problem when using Twitter search
from the browser.) To overcome this, Raffi Krikorian suggested using
the streaming API instead of the search API. My question was: how do
I get a location-specific stream using the streaming API? From the
streaming API docs, there doesn't seem to be a way to do this at the moment,
which kind of defeats my purpose, as I need to deploy the script in
the next week or so. I guess I'll have to live with the stale
results...

Anyway thanks for the help.


Raffi Krikorian

Nov 28, 2009, 9:45:53 AM
to twitter-deve...@googlegroups.com
unfortunately, there is no (current) way to subscribe to the streaming
API for a particular location. as for the caching issue on the
search, that's unfortunate, and i'll try to raise the issue with the
search team next week.

enygmatic

Nov 28, 2009, 10:46:50 AM
to Twitter Development Talk

The streaming API would be ideal for my purposes, so I will eagerly wait
to see what new features the Twitter API dev team adds before the
final release. Until then, the search API is what I will use. Thanks a lot,
Raffi, for trying to raise the issue with the search team.

Regards,
Elroy

enygmatic

Nov 28, 2009, 11:18:19 AM
to Twitter Development Talk
I got some requests to post the query that I am using.
Here is the query:
http://search.twitter.com/search.atom?geocode=19.017656%2C72.856178%2C15.0mi&rpp=25
Do correct me if I am not querying or using the API correctly. (This should
have been my first question, actually :) )
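
For readers who want to reproduce the query string above, here is a small Python sketch that builds it from its parts. The function name build_search_url and its arguments are hypothetical; only the endpoint, the geocode and rpp parameters, and their values come from the query in this post. The sketch only constructs the URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint as given in the post above.
BASE_URL = "http://search.twitter.com/search.atom"


def build_search_url(lat, lon, radius, rpp):
    """Build the search URL; geocode is 'latitude,longitude,radius'."""
    params = {
        "geocode": f"{lat},{lon},{radius}",
        "rpp": rpp,  # results per page
    }
    # urlencode percent-encodes the commas in the geocode value as %2C.
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("19.017656", "72.856178", "15.0mi", 25)
print(url)
```

Passing the coordinates as strings keeps their decimal digits exactly as written, which matters if the server treats each distinct query string as a separate query.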

Also, here is a sample of the output from my Ruby script. It will give
you an idea of the "stale" results that I am getting. The script was
run at approximately 21:37 IST. As you can see, I'm getting tweets
all the way back to 14:00 hours in the afternoon. I'm pretty sure
there are more tweets for my location. I'm querying for tweets
originating out of Mumbai, and by querying through Twitter search I
have noticed that there are at least 40-50 tweets posted every 2
minutes or so.
Output follows. Fields: Date, Day, Hour, Minute, Tweet, User, Hashtags (CSV, if any),
Source of tweet. (All date/time info below is in IST.)
2009-11-28 Saturday 21 27 @Abhishek_Rai I too am huge fan of quizzing.. do let me kno if u find anythin interesting. ty Shakti_Shetty (Shakti Shetty) web
2009-11-28 Saturday 21 21 @surubhi hallow darlin, 'm fine doin great...how about u? dacku87 (darshan thacker) mobile web
2009-11-28 Saturday 20 40 powai mocha so full of people, smaloe conversations and music.. sumagambs (Sumit Singh Gambhir) web
2009-11-28 Saturday 20 25 @thetruboy idk we'll see. Ari should be home by then ronniebaby010 (Princess) UberTwitter
2009-11-28 Saturday 19 54 friends do look up www.clickthehorror.com - the website for my new film distirbuted by PNC has been launched - look 4ward to feedbacks .... sangeethsivan (sangeeth sivan) web
2009-11-28 Saturday 19 54 I'm guessing @Netra and @prolificd are the two few Twitterers who've had multi-city tweetups. How cool is that. National figures! b50 (Bombay Addict) Tweetie
2009-11-28 Saturday 19 36 RT: Trupti's Blog: What Commercial Floor Mats Offer: One of the best ways to keep any p.. http://bit.ly/6sZWJg #blog MishraNatty (Natasha Mishra) blog twitterfeed
2009-11-28 Saturday 19 09 @mattyza when launched back in 2005, the Xbox 360 was available in Core and Pro. Now it's Arcade and Elite. Same difference! aalaap (Aalaap Ghag) Tweetie
2009-11-28 Saturday 19 05 Profit with Google, Twitter &amp; affiliate marketing http://snipurl.com/tet1r Tiifani_Lurid (Tiifani Lurid) API
2009-11-28 Saturday 18 35 Just voted OOiZiT.com for Best Online Music Label http://mashable.com/owa #openwebawards ankit_9oct (Ankit Khandelwal) openwebawards Mashable Connect
2009-11-28 Saturday 18 35 @reginafetalvero HAHA. YUHH. Gift ko ah? :&quot;&gt; Jhoriiliee (Jorylie Cando) web
2009-11-28 Saturday 18 24 @Tweet_Words JAGGERY PALM gannirules (gaanish) Snaptu
2009-11-28 Saturday 17 34 @Karan_Talwar pls post that if you get an answer. champbox (champbox) Tweets60
2009-11-28 Saturday 17 34 Just Got Home! :) Wee. Had FUN tonight! :) HBD kathy! Sayang wala si Beb, complete na sana. Jhoriiliee (Jorylie Cando) web
2009-11-28 Saturday 17 34 I'm listening to Kurbaan: Kurbaan Hua (Soundtrack) - @Spinlet kmadvani (Kunal M Advani) API
2009-11-28 Saturday 17 03 Eastern Province Under-19s 322/7 &amp; 185/5 v South Western Districts Under-19s 92/10 &amp; 152/10 *: Eastern Province.. http://bit.ly/4rS1iA venky888 (venkatesh iyer) twitterfeed
2009-11-28 Saturday 16 52 Hey tweeps......Rocket Singh pics http://www.yashrajfilms.com/microsites/rocketsingh/fullpage.html check them out! ShazahnPadamsee (Shazahn Padamsee) web
2009-11-28 Saturday 16 08 Started IE assignment jyotiswaroopr (Jyoti Swaroop Repaka) Digsby
2009-11-28 Saturday 15 24 @PaulaAbdul Love you more than anything in this world. Thanks for being a huge part of my life. &lt;3 LuvPaula (Anahita Abdul Cowell) web
2009-11-28 Saturday 15 18 @richa_august84 fan of purane hindi gaane, hmm? me too!! sonali_k (sonali_k) web
2009-11-28 Saturday 14 54 Fruits and Vegetables for energyzing the Solar Plexus Chakra: http://bit.ly/4NQV9M AnamikaS (Anamika S) web
2009-11-28 Saturday 14 52 I'm off to read and then sleep. Don't dare disturb my slumber. eyemanut87 (Moo) Snaptu
2009-11-28 Saturday 14 52 White House gate-crashers met Obama, PM: American couple Michaele and Tareq Salahi, who gate-crashed into a State D... http://bit.ly/5TH6k5 RediffNews (Rediff News) twitterfeed
2009-11-28 Saturday 14 45 @SuButcher Good Morning. Have wonderful ahead :) nikhilwad1 (Nikhil Wad) UberTwitter
2009-11-28 Saturday 14 41 @venkatananth hey! 1- was standing at the bar. 2-liked the music. Very interesting. But had to leave cos friend had an early mrng flight :( wanderblah (wanderblah) dabr

dbasch

Nov 28, 2009, 12:16:41 PM
to Twitter Development Talk
Hi Elroy,

I tried your query from python several times within the same minute.
After running the query several times in a row I start getting fresh
results and they remain fresh for a while. I tried changing the least
significant decimal to make it a different query and I get stale
results immediately. Switching back yields fresh results.

This to me suggests that there may be two search tiers: one for low-
frequency queries that probably searches a subset of tweets, and
another one for frequent ones that searches everything and has an LRU
cache of important queries. It seems that we can force queries into
the LRU cache of the "good" tier by querying frequently enough. When I
stop querying for three minutes or so I see the old results again. The
question for the search team is how to have your query treated as an
"important" one without abusing the API.

Diego
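
The LRU behaviour hypothesised in this post can be pictured with a toy cache. The Python sketch below is only an illustration of least-recently-used eviction in general; the class, its capacity and the query names are invented for the example, and nothing here is known about how Twitter's search tier is actually built.

```python
from collections import OrderedDict


class LRUCache:
    """Toy least-recently-used cache; not a model of any real service."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        # A hit marks the entry as most recently used.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            # Evict the entry that has gone unused the longest.
            self._entries.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("mumbai", "fresh results")
cache.put("query-b", "results b")
cache.get("mumbai")                # touching the entry keeps it recent
cache.put("query-c", "results c")  # evicts query-b, the least recently used
```

In this picture, a query that is repeated often stays in the cache, while one that goes quiet is evicted as other queries arrive, which would be consistent with the behaviour observed above.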




enygmatic

Nov 29, 2009, 2:20:42 AM
to Twitter Development Talk
Hi Everyone,
I've been running my script as a cron task (every 15 minutes) since
last evening. So far I've got about 1375 results logged, out of which
973 are duplicates (meaning "stale" entries). That is 973/1375 = 70.76%,
or approximately 71%. This is way more than expected, so a shout-out
to the development team: is there any way to solve this problem or get
around it?
@Diego, thanks a lot for confirming what I found. I also tried
querying frequently like you suggested, and yes, I do hit "good"
results more frequently. I didn't get the idea of the least
significant decimal, though. Are you referring to the geocode?

@twitter dev team
I do agree with Diego: there has got to be a way of getting "good"
search results without finding ways to "trick" the API. Even with a
cache, I see no reason why I should be getting results from over 6
hours ago for my search query.

Regards,
Elroy

enygmatic

Dec 1, 2009, 11:49:49 PM
to Twitter Development Talk
Hi, Raffi
Were you able to raise the cache issue with the search team?
It seems the problem is worse than I thought. I have run my script
(getting 25 results from search every 15 minutes, for Mumbai) for two
days. The first day had 71% duplicate results due to the caching
issue, while the second day fetched an amazing 90% duplicates. With
these kinds of results, I think it’s probably quite useless for me to
even use the search API.
So I would appreciate it if you could let me know whether there is a chance that
this issue may be resolved in the near future, or whether location-specific
streams will be available via the streaming API anytime soon. I
understand that the Twitter dev team has a lot on its hands, so it
would be understandable if this isn’t anywhere on the list of features
they intend to ship in the near future. However, I would definitely
appreciate it if you could let me know whether anything can be done.
Thanks and Regards,
Elroy Serrao

AJ Chen

Dec 2, 2009, 8:31:14 PM
to twitter-deve...@googlegroups.com
Unless I'm missing something, it's usually the user's responsibility to dedup returned tweets on the client side. If you see duplicates between two feeds, just remove the duplicates. This is something a client application should do in any case.
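
A minimal Python sketch of the client-side dedup described above, assuming each parsed feed entry is available as a dict with a unique "id" key. That data shape, the function name new_entries and the sample IDs are hypothetical and chosen only for illustration.

```python
def new_entries(entries, seen_ids):
    """Return only entries whose id has not been seen before; updates seen_ids."""
    fresh = []
    for entry in entries:
        if entry["id"] not in seen_ids:
            seen_ids.add(entry["id"])
            fresh.append(entry)
    return fresh


# IDs seen so far; a real script would persist this between cron runs.
seen = set()

first_poll = [{"id": "tag:1", "text": "a"}, {"id": "tag:2", "text": "b"}]
second_poll = [{"id": "tag:2", "text": "b"}, {"id": "tag:3", "text": "c"}]

batch1 = new_entries(first_poll, seen)   # both entries are new
batch2 = new_entries(second_poll, seen)  # only tag:3 is new

# Share of the second poll that had already been seen.
duplicate_rate = 1 - len(batch2) / len(second_poll)
print(duplicate_rate)  # 0.5
```

Tracking the duplicate rate per poll in this way also gives a simple measure of how stale each response is.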

If you see no fresh tweets but only old tweets, it is possible that Twitter returns only cached results because your API calls exceed the rate limit. I'm not sure, though. Does anyone know the rate limit for the search feed (http://search.twitter.com/search.atom)?

-aj
--
AJ Chen, PhD
Chair, Semantic Web SIG, sdforum.org
http://web2express.org
@web2express on twitter
Palo Alto, CA, USA
650-283-4091
*Monitor realtime web and follow trending topics with semantic intelligence*

enygmatic

Dec 6, 2009, 11:50:03 PM
to Twitter Development Talk
@AJ Chen
You are 100% correct when you say that it’s the user’s responsibility
to clean up duplicates in the search results. My issue is not so much
that there are duplicates, but that there are so many of
them. My concept of search is that if new tweets have been
posted, say 30-odd since I last queried search, I ought to get the new
tweets on my next query. What I shouldn’t be getting is search
results from, say, two hours ago whenever I query search. Maybe I am
wrong here, but that’s how I expected the search API to work.
On the issue of rate limiting, I am really not sure what the rate
limit would be, since the documentation does not give a clear picture
of what that limit is. The documentation
(http://apiwiki.twitter.com/Rate-limiting) merely hints that it is
significantly higher than the 150 requests per hour limit for the REST
API. Considering this, I don’t think my application / script should be
exceeding that limit, since I only make 4 requests per hour.
Anyway, I would really appreciate it if someone could point me in the
right direction, or at least let me know if I am trying to do the wrong
thing with the search API.

Regards,
Elroy
