Should the search and streaming APIs return similar tweets for equivalent geolocation areas?


Colin Surprenant

Feb 10, 2011, 5:32:02 PM
to Twitter Development Talk
Hi,

I have been running some tests to gather tweets from users within a
geo area using both the search API (with the geocode parameter) and
the streaming API (with the statuses/filter method & locations
parameter).

I have noticed that the streaming API returns far fewer tweets than
the search API for an equivalent area, expressed as a lat/long plus
radius for the search API and as a bounding box for the streaming API.

Is this normal or should we expect a similar result set with both
methods?

The docs say that the streaming API only returns tweets created using
the Geotagging API (and located within the bounding box), whereas the
search API preferentially uses the Geotagging API but falls back to
the Twitter profile location.

Can this explain why I see many more results with the search API?
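
To make the difference concrete, here is a minimal sketch of the two matching policies as described in the docs. Everything in it is an assumption made for illustration: the tweet records, the `geo` and `profile_location` field names, the `toy_geocode` lookup, and the use of the same bounding box for both policies (the real search API takes a center plus radius). It is not Twitter's actual implementation.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]               # (lat, lon)
BBox = Tuple[float, float, float, float]  # (west, south, east, north)


def in_bbox(point: Point, bbox: BBox) -> bool:
    """True if the (lat, lon) point lies inside the bounding box."""
    lat, lon = point
    west, south, east, north = bbox
    return south <= lat <= north and west <= lon <= east


def geotag_only_match(tweet: dict, bbox: BBox) -> bool:
    """Streaming-style policy: only an explicit geotag can match."""
    geo = tweet.get("geo")
    return geo is not None and in_bbox(geo, bbox)


def geotag_or_profile_match(
    tweet: dict, bbox: BBox, geocode: Callable[[str], Optional[Point]]
) -> bool:
    """Search-style policy: prefer the geotag, else geocode the profile text."""
    geo = tweet.get("geo")
    if geo is None:
        geo = geocode(tweet.get("profile_location", ""))
    return geo is not None and in_bbox(geo, bbox)


# Hypothetical stand-in for a geocoder: a tiny lookup table.
KNOWN_PLACES = {"san francisco, ca": (37.7749, -122.4194)}


def toy_geocode(text: str) -> Optional[Point]:
    return KNOWN_PLACES.get(text.strip().lower())


SF_BBOX: BBox = (-122.901549008664, 37.3773810096865,
                 -121.992630991336, 38.0961869903135)

# Made-up sample records, not real data.
tweets = [
    {"geo": (37.78, -122.41), "profile_location": "Paris"},
    {"geo": None, "profile_location": "San Francisco, CA"},
    {"geo": None, "profile_location": "somewhere over the rainbow"},
    {"geo": (40.71, -74.00), "profile_location": "San Francisco, CA"},
]

strict = [t for t in tweets if geotag_only_match(t, SF_BBOX)]
greedy = [t for t in tweets if geotag_or_profile_match(t, SF_BBOX, toy_geocode)]
print(len(strict), len(greedy))
```

The second policy can only match the same tweets or more, so if most users set a profile location but do not geotag, a large gap in volume is what one would expect.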

Thanks,
Colin

Colin Surprenant

Feb 12, 2011, 2:24:50 PM
to Twitter Development Talk
Some metrics:

I just reran some tests to compare results for both the polling search
api + geocode and the streaming api statuses/filter + locations using
San Francisco as the geolocation.

Basically, the polling search api+geocode returns approximately 30x
more results than the streaming api statuses/filter + locations within
the same test period for the same geolocation.

The parameters used for the search api+geocode were:
37.736784, -122.44709, 40km

The parameters used for the streaming api statuses/filter + locations
were:
-122.901549008664,37.3773810096865,-121.992630991336,38.0961869903135
which correspond to the bounding box around 37.736784, -122.44709,
40km.
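
For reference, a bounding box like the one above can be derived from a center and radius roughly as follows. This is only a sketch under stated assumptions: a spherical Earth with a mean radius of 6371.0088 km (not a value from this thread) and a longitude span scaled by the cosine of the center latitude. The helper names `bounding_box` and `locations_param` are made up for illustration, and the result agrees with the numbers above only to about three decimal places, since the exact method used for the test is not stated.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius, not from the thread


def bounding_box(lat: float, lon: float, radius_km: float):
    """Return (west, south, east, north) in degrees for the box that
    encloses a circle of radius_km around (lat, lon).

    Spherical approximation; reasonable for small radii away from the poles.
    """
    dlat = math.degrees(radius_km / EARTH_RADIUS_KM)
    # Meridians converge toward the poles, so widen the longitude span.
    dlon = dlat / math.cos(math.radians(lat))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def locations_param(bbox) -> str:
    """Format a box as 'west,south,east,north' (longitude first), the same
    order as the locations value quoted above."""
    return ",".join(f"{value:.6f}" for value in bbox)


sf_box = bounding_box(37.736784, -122.44709, 40.0)
print(locations_param(sf_box))
```

Note that the box encloses the whole circle, so it covers about 4/pi (roughly 1.27) times the circle's area. The streaming query is therefore looking at a slightly larger region, not a smaller one.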

Why is there such a huge difference, and can we expect the streaming
API to eventually match what the search API produces for geolocalized
searches?

Thanks,
Colin

On Feb 10, 5:32 pm, Colin Surprenant <colin.surpren...@gmail.com>
wrote:

Taylor Singletary

Feb 14, 2011, 2:33:54 PM
to Twitter Development Talk
Hi Colin,

You hit the nail on the head with this observation:

> The docs say that the streaming API only returns tweets created using
> the Geotagging API (and located within the bounding box), whereas the
> search API preferentially uses the Geotagging API but falls back to
> the Twitter profile location.

The Search API is greedy with the location fields on users'
profiles. This behavior is unlikely to be emulated in the Streaming
API; the bright side is that you can be more confident in the location
accuracy of matches on the Streaming API.

Thanks,
Taylor

M. Edward (Ed) Borasky

Feb 14, 2011, 2:49:29 PM
to twitter-deve...@googlegroups.com
On Mon, 14 Feb 2011 11:33:54 -0800 (PST), Taylor Singletary
<taylorsi...@twitter.com> wrote:
> The Search API is greedy with the location fields on users'
> profiles. This behavior is unlikely to be emulated in the Streaming
> API; the bright side is that you can be more confident in the location
> accuracy of matches on the Streaming API.
>
> Thanks,
> Taylor

I wouldn't call the Search API "greedy" on location as much as I'd call
it "myopic" or "easily confused". ;-) Twitalyzer is now getting some of
their location data from PeerIndex when the Twitter profile isn't
accurate.
--
http://twitter.com/znmeb http://borasky-research.net

"A mathematician is a device for turning coffee into theorems." -- Paul
Erdős

Colin Surprenant

Feb 15, 2011, 2:19:35 PM
to Twitter Development Talk
So basically today we have two options for geo search:

- use the search api and get results that will include some
incorrectly geolocalized tweets, because it falls back on the user
location field.
- use the streaming api and retrieve significantly fewer tweets, but
with a higher degree of confidence in their geolocation, since it uses
only the geotagging api.

Can we expect these two methods to be available concurrently for the
next 3, 6, 12 months?

I have two problems with this:

- As developers we are asked to migrate toward the streaming api
instead of using periodic polling, which makes sense. But for geo
search, the streaming api is not a viable alternative for those who
actually prefer/require/want the behaviour of the geo search api.

- The fact that the streaming api returns far fewer but more precise
results is not necessarily better; it really depends on who you ask.
For me, having lots of geolocalized data that contains a fraction of
invalid data is far more valuable than having far less but more
accurate data.

My tests told me the streaming api currently returns only ~3% of the
volume of data the search api produces. If the only difference between
the search api and the streaming api is the usage of the user location
field, then we can certainly say that FAR more people are still only
using their user location field and not using the geotagging api.

Will you offer an option in the streaming api to choose whether or not
to fall back on the user location field when evaluating the
geolocation of a tweet?

Thanks,
Colin

On Feb 14, 2:33 pm, Taylor Singletary <taylorsinglet...@twitter.com>
wrote:

Karussell

Feb 15, 2011, 2:39:24 PM
to Twitter Development Talk
Hmmh, would you mind testing this without the geo location filter
and reporting your findings here? I'm having an issue even with that.
See:

http://groups.google.com/group/twitter-development-talk/browse_thread/thread/f5a0f2a416893c27

Kind Regards,
Peter.

--

http://jetwick.com Twitter Search without Noise

Colin Surprenant

Feb 15, 2011, 3:26:08 PM
to Twitter Development Talk
I did run tests on keyword search and found only very marginal
differences between polling the search api and using the streaming
api. I also followed up in your thread.

Colin

On Feb 15, 2:39 pm, Karussell <tableyourt...@googlemail.com> wrote:
> Hmmh, would you mind testing this without the geo location filter
> and reporting your findings here? I'm having an issue even with that.
> See:
>
> http://groups.google.com/group/twitter-development-talk/browse_thread...

Karussell

Feb 15, 2011, 3:44:08 PM
to Twitter Development Talk
Thanks Colin!

Colin Surprenant

Feb 24, 2011, 11:35:31 AM
to Twitter Development Talk
I did not receive any answer to the questions below. I think it is
important for us developers to understand the direction Twitter is
taking in relation to geolocalized searches.

In a nutshell:

For geolocalized searches, the streaming API returns a very small
fraction (3% in my tests) of what the search API returns. This is
because the streaming API only uses the geotagging API to locate
tweets, but the search API uses both the geotagging API and the user
location field.

Depending on your application, both methods can be valuable. In
particular, what the search API retrieves makes sense in my context,
but it is not possible to get this using the streaming API.

- Can we expect both methods to be supported in the future and can we
expect to get a streaming version of what the search API does today?

Thanks,
Colin

On Feb 15, 2:19 pm, Colin Surprenant <colin.surpren...@gmail.com>
wrote: