Fwd: New data release: Freebase Annotations of the TREC KBA Stream Corpus 2014 (FAKBA1)

Jeff Dalton

Feb 3, 2015, 3:26:35 PM
to trec...@googlegroups.com
Hi everyone,

I'm happy to announce the public release of our largest corpus of Freebase-annotated data. The Freebase Annotations of the TREC KBA Stream Corpus 2014 (FAKBA1) contains over 9.4 billion entity annotations from over 496 million documents. More details, including a link to download the data, are available at:

We annotated all of the English documents from the TREC KBA Stream Corpus 2014 (http://trec-kba.org/kba-stream-corpus-2014.shtml) with entity links to Freebase. The entity links are resolved automatically and are imperfect. For each recognized named entity, we provide the mention text, begin and end byte offsets, Freebase MID, and confidence scores. We also provide manual annotations of the TREC KBA CCR 2014 entity queries.
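
As a rough illustration of the per-mention fields listed above, here is a minimal Python sketch. The class name, field names, the single confidence value, the exclusive end offset, and the placeholder MID are all assumptions made for the example; this message does not describe the actual file layout of the release. The sketch mainly shows why byte offsets must be applied to the raw bytes of a document rather than to its decoded characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityAnnotation:
    """Hypothetical in-memory record for one annotated mention."""
    mention: str       # mention text
    begin: int         # begin byte offset into the raw document
    end: int           # end byte offset (assumed exclusive in this sketch)
    mid: str           # Freebase MID
    confidence: float  # a single score here; the real layout is an assumption


def mention_from_bytes(doc: bytes, ann: EntityAnnotation) -> str:
    """Slice the raw bytes first, then decode, since offsets count bytes."""
    return doc[ann.begin:ann.end].decode("utf-8", errors="replace")


def matches_document(doc: bytes, ann: EntityAnnotation) -> bool:
    """Sanity check: the offsets should recover the stored mention text."""
    return mention_from_bytes(doc, ann) == ann.mention


# Made-up document: the two-byte character before the mention shifts the
# byte offset (14) one past the character offset (13).
doc = "Zürich is in Switzerland.".encode("utf-8")
ann = EntityAnnotation(
    mention="Switzerland",
    begin=14,
    end=25,
    mid="/m/placeholder",  # placeholder, not a real Freebase MID
    confidence=0.9,        # made-up value
)
```

A check such as matches_document can catch the case where offsets are mistakenly applied to a decoded string, which goes wrong as soon as a document contains non-ASCII text.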

FAKBA1 has 394,051,027 documents with at least one entity annotated. There are over 9.4 billion entity mentions with links to Freebase. On average, a Stream Corpus document has 19 annotated mentions. The annotations can be used alongside the data already in the KBA Stream Corpus, which includes named entities, within-doc coreference resolution, and dependency-parsed sentences.

Thanks to John R. Frank (MIT) for providing the KBA Stream Corpus and for his assistance throughout the annotation process.

Thanks,

Jeff Dalton

P.S. You might want to subscribe to our mailing list (http://goo.gl/MJb3A) to get timely notifications of future data releases. The list archives are open so you're welcome to browse them to learn more about our data releases to date.
