Tweets2011 collection from NIST


Ian Soboroff

Sep 1, 2011, 3:15:11 PM
Hi, all,

I'm pleased to announce that the Tweets2011 collection, built as part of the Text REtrieval Conference's (TREC) microblog track, is now available to everyone.  Details are at

I shared information about this collection with a lot of people at ICWSM. In case you weren't there: what is distributed is a set of tweet identifiers sampled by Twitter. You sign a usage agreement and get those lists. You combine them with an open-source crawler tool, which fetches the actual tweets directly from Twitter via HTTP or the REST API, whichever you prefer. You cannot share the tweets themselves (that would violate the Twitter TOS), but you can share the ids and the crawler.
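
To make the "ids, not tweets" idea concrete, here is a minimal Python sketch of the first step a crawler performs: reading an ID list and turning each entry into a page to fetch over HTTP. This is not the actual crawler's code or API. The line format (tweet id and username separated by a tab) and the status-page URL pattern are assumptions for illustration, and the names `TweetRef`, `read_id_list`, and `status_url` are hypothetical. The sketch does no network access; the sample ids and usernames are made up.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class TweetRef(NamedTuple):
    """One ID-list entry: enough to re-fetch a tweet, but not its text."""
    tweet_id: int
    username: str


def read_id_list(stream: TextIO) -> Iterator[TweetRef]:
    """Parse lines of the ASSUMED form 'tweet_id<TAB>username', skipping blank lines."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        tweet_id, username = line.split("\t")[:2]
        yield TweetRef(int(tweet_id), username)


def status_url(ref: TweetRef) -> str:
    """Build the public status-page URL an HTTP-mode fetch might request (ASSUMED pattern)."""
    return f"https://twitter.com/{ref.username}/status/{ref.tweet_id}"


if __name__ == "__main__":
    # Made-up sample entries; a real list would come from the signed usage agreement.
    sample = io.StringIO("12345\tuser_a\n\n67890\tuser_b\n")
    for ref in read_id_list(sample):
        print(status_url(ref))
```

Only the ids (and the code) change hands; each recipient fetches the tweet content from Twitter themselves, which is what keeps redistribution within the TOS.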

As an added bonus, the Tweets2011 collection is contemporaneous with two weeks of the ICWSM 2011 Spinn3r collection.  That sounds like fun.
