Searching for tweets containing a specific domain

Eric Marcoullier @ Gnip

Sep 21, 2010, 7:39:30 PM
to Twitter Development Talk
If you query search.twitter.com for a specific domain, such as
techcrunch.com, you'll get a list of all tweets that contain that
domain, even if it's contained in a shortened URL. Using domains as
predicates in Streaming Track doesn't produce the same behavior: it
matches only the actual body text, not the URL metadata.

What is the most effective strategy to consume a feed of domain-
specific tweets at this point?

Thanks!
Eric

snydeq

Sep 23, 2010, 2:32:08 PM
to Twitter Development Talk
As of Sept. 21, Twitter Search seems to have lost its ability to parse
shortened URLs, which is the key to surfacing domain matches hidden
inside shortened URLs.

Not sure if this is a glitch or an intentional change.

Brian Medendorp

Sep 24, 2010, 8:34:18 AM
to Twitter Development Talk
I just did a search for a domain, and it did indeed return results
from shortened URLs, so it seems to still work (or they fixed the
problem).

The only reasonable way I have found to consume the feed of a specific
domain is to use the search API. You'll need to periodically perform a
search, and keep track of the most recent ID that you found in the
previous search so you'll know when to stop. You may also need to
adjust the time between searches based on how many results you get.
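Brian's polling approach can be sketched in Python roughly as follows. The `fetch` callable is hypothetical: it stands in for whatever HTTP call returns one page of search results, newest first, as `(tweet_id, text)` pairs. The 15-page cap and the 30-second to one-hour interval bounds are arbitrary placeholders, not documented API limits.

```python
import time
from typing import Callable

# Hypothetical fetcher: fetch(query, page) returns (tweet_id, text) pairs for
# one page of search results, newest first. The HTTP call itself is left out.
Fetcher = Callable[[str, int], list[tuple[int, str]]]


def poll_once(fetch: Fetcher, query: str, last_seen_id: int,
              max_pages: int = 15) -> tuple[list[tuple[int, str]], int]:
    """Page back through results until reaching tweets already seen."""
    new: list[tuple[int, str]] = []
    for page in range(1, max_pages + 1):
        results = fetch(query, page)
        if not results:
            break  # ran out of results
        fresh = [t for t in results if t[0] > last_seen_id]
        new.extend(fresh)
        if len(fresh) < len(results):
            break  # this page reached the previous run's newest ID: stop
    newest = max([last_seen_id, *(t[0] for t in new)])
    return new, newest


def next_interval(current: float, n_new: int, page_size: int,
                  min_s: float = 30.0, max_s: float = 3600.0) -> float:
    """Poll faster after a busy run, slower after an empty one."""
    if n_new >= page_size:
        current /= 2
    elif n_new == 0:
        current *= 2
    return max(min_s, min(max_s, current))


def poll_forever(fetch: Fetcher, query: str,
                 handle: Callable[[tuple[int, str]], None],
                 page_size: int = 100, interval: float = 300.0) -> None:
    last_seen_id = 0
    while True:
        new, last_seen_id = poll_once(fetch, query, last_seen_id)
        for tweet in reversed(new):  # deliver oldest first
            handle(tweet)
        interval = next_interval(interval, len(new), page_size)
        time.sleep(interval)
```

On the first run `last_seen_id` is 0, so the sketch pages back as far as `max_pages` allows. Halving and doubling the interval is just one simple way to adjust the time between searches based on how many results come back.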

You could probably also use the streaming API with a filter for the
domain, though I am not sure if that will work with the shortened
URLs, and it would of course require keeping a connection open.

Eric Marcoullier @ OneTrueFan

Sep 24, 2010, 5:30:57 PM
to Twitter Development Talk
I'll start digging into the Search API and seeing how that performs.

I can tell you from experience that the streaming API does not
currently carry the expanded URL metadata in its payload. All you get
is a match on raw tweet content.
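Without expanded URLs in the streaming payload, one client-side workaround is to expand the links in the raw tweet text yourself and check where they land. This is a minimal Python sketch, assuming a plain HEAD request that follows redirects is enough to unwind a shortener; some servers handle HEAD poorly, and every link costs at least one HTTP round trip. `tweet_links_to` and `host_is` are hypothetical helper names, and the `resolve` parameter is injectable so the matching logic can be exercised without network access.

```python
import re
import urllib.request
from typing import Callable
from urllib.parse import urlparse

# Rough URL pattern for raw tweet text; not a full URL grammar.
URL_RE = re.compile(r"https?://\S+")


def resolve_redirects(url: str, timeout: float = 5.0) -> str:
    """Follow HTTP redirects (urlopen does this by default); return the final URL."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.geturl()


def host_is(url: str, domain: str) -> bool:
    """True if the URL's host is `domain` or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def tweet_links_to(text: str, domain: str,
                   resolve: Callable[[str], str] = resolve_redirects) -> bool:
    """Expand every link in the raw tweet text and check its final host."""
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".,;:!?)\"'")  # drop trailing punctuation
        try:
            final = resolve(url)
        except (OSError, ValueError):
            continue  # dead or malformed short link: skip it
        if host_is(final, domain):
            return True
    return False
```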

Matt Harris

Sep 24, 2010, 8:44:05 PM
to twitter-deve...@googlegroups.com
It's not there yet but we are adding support for this feature to the
Streaming API.
We'll let you know when it goes live.

Best,
@themattharris

--


Matt Harris
Developer Advocate, Twitter
http://twitter.com/themattharris

snydeq

Sep 25, 2010, 10:28:28 AM
to Twitter Development Talk
The method Brian describes is essentially what I have been using for
the past four weeks, tapping the Search API atom feed at intervals
(every three hours), using since_id to pull everything since the last
saved tweet ID.
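The since_id bookkeeping in that setup can be sketched in Python as two small pieces: building the Atom search URL with `since_id`, and pulling the highest tweet ID out of each response to save for the next run. This sketch assumes each Atom `<entry>` carries an `<id>` whose text ends in the numeric tweet ID after the last colon (for example `tag:search.twitter.com,2005:123`); that format is an assumption, not something stated in this thread. Fetching and the three-hour scheduling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM_NS = "{http://www.w3.org/2005/Atom}"
SEARCH_ATOM_URL = "http://search.twitter.com/search.atom"  # as quoted in this thread


def build_search_url(query: str, since_id: int = 0) -> str:
    """Build the Atom search URL, adding since_id once we have a saved ID."""
    params: dict[str, object] = {"q": query}
    if since_id > 0:
        params["since_id"] = since_id
    return SEARCH_ATOM_URL + "?" + urlencode(params)


def entry_ids(atom_xml: str) -> list[int]:
    """Extract numeric tweet IDs, assuming each <id> ends in ':<digits>'."""
    root = ET.fromstring(atom_xml)
    ids = []
    for entry in root.iter(ATOM_NS + "entry"):
        tail = (entry.findtext(ATOM_NS + "id") or "").rsplit(":", 1)[-1]
        if tail.isdigit():
            ids.append(int(tail))
    return ids


def next_since_id(atom_xml: str, since_id: int) -> int:
    """The highest ID seen so far becomes the next run's since_id."""
    return max([since_id, *entry_ids(atom_xml)])
```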

Since Sept. 21, tweet volumes from this method have dropped 99
percent. The only tweets coming through are those in which the searched
domain appears in the body text of the tweet. This behavior is
consistent across all six domains I am searching.

Spot checks of domain searches do, very occasionally, return tweets
whose shortened URLs point to these domains without the domains being
mentioned in the body text, as they should, but for the most part
this is not the case.

Even when some of these results are returned, scanning down the
complete list shows that shortened URLs have been parsed only for the
first few results; the rest revert to the non-parsing behavior,
presumably because they are served from a cache.

Refreshing these searches and scanning down the results shows
this behavior:

http://search.twitter.com/search.atom?q=pcworld.com

And refreshing Twitter's search page shows the same behavior, with
very occasional flickers of the "expand" option for shortened URLs
coming through. For the most part, however, the "expand" option does
not appear.

http://search.twitter.com/search?q=pcworld.com

Still seems like there's something amiss here.