New data release: Freebase annotations for TREC Million Query Track and Web Track queries

Evgeniy Gabrilovich

May 30, 2013, 3:40:35 PM
to knowledge-d...@googlegroups.com, IRL...@lists.shef.ac.uk
We have just released Freebase annotations of several commonly-used sets of queries, including TREC Million Query Track and Web Track queries. 

The annotation process was automatic (and hence imperfect :). For every Freebase entity we recognized in a query with high confidence, we provide the beginning and end byte offsets of the entity mention, its Freebase identifier (mid), and the confidence level.
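
Because the offsets are given in bytes rather than characters, slicing the query string directly can go wrong as soon as a query contains non-ASCII text. A minimal Python sketch of recovering a mention from its offsets follows. It assumes the offsets index into the UTF-8 encoding of the query and that the end offset is exclusive; neither is stated above, so check both against the released files. The `Annotation` type, the `mention_text` helper, the sample query, and the placeholder mid are all hypothetical and are not part of the release.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One entity annotation; field names are illustrative, not the release's."""
    begin: int         # byte offset where the mention starts
    end: int           # byte offset where the mention ends (assumed exclusive)
    mid: str           # Freebase identifier
    confidence: float  # confidence level reported for the annotation


def mention_text(query: str, ann: Annotation, encoding: str = "utf-8") -> str:
    """Slice the mention out of the query using byte (not character) offsets."""
    raw = query.encode(encoding)
    if not 0 <= ann.begin <= ann.end <= len(raw):
        raise ValueError(f"offsets {ann.begin}-{ann.end} fall outside the query")
    return raw[ann.begin:ann.end].decode(encoding)


# Made-up example: the mid below is a placeholder, not a real Freebase id.
query = "café near eiffel tower"
ann = Annotation(begin=11, end=23, mid="/m/placeholder", confidence=0.9)
print(mention_text(query, ann))
```

In this made-up query the "é" occupies two bytes in UTF-8, so the byte offsets sit one position later than the character offsets, and slicing the string itself with `query[11:23]` would drop the first letter of the mention.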

The annotated data is available at http://www.lemurproject.org/clueweb09.php/related-data.php, as well as at http://www.lemurproject.org/clueweb12.php/related-data.php (thanks to Jamie Callan and CMU for hosting this data!)

We hope this dataset will be particularly useful in conjunction with the forthcoming release of similarly annotated ClueWeb corpora (2009 and 2012 versions), which should become available in a couple of months.

Evgeniy.

P.S. You might want to subscribe to our mailing list (http://goo.gl/MJb3A) to get timely notifications of future data releases. You might also be interested in the relation extraction corpus we released last month: http://googleresearch.blogspot.com/2013/04/50000-lessons-on-how-to-read-relation.html

