[TREC StreamCorpus] serif-only subset of 2014 streamcorpus and survey

John R. Frank

Jun 17, 2014, 12:47:02 AM

to stream...@googlegroups.com

The Serif-only filtering of the TREC StreamCorpus has finished.

579,838,246 of the 1.2B documents in the TREC StreamCorpus have been
tagged by BBN's Serif system for NER, sentence parse trees, and
within-document coreference chaining.

This 10.9TB subset has been filtered and stored separately here:

http://s3.amazonaws.com/aws-publicdatasets/trec/kba/kba-streamcorpus-2014-v0_3_0-serif-only/index.html

See details and download tools here:

http://aws-publicdatasets.s3.amazonaws.com/trec/kba/index.html

If you plan to use the corpus, please complete this three-question survey:

https://www.surveymonkey.com/s/RHK9T29

TREC KBA Organizers

http://trec-kba.org

Shun

Jun 17, 2014, 7:03:06 AM

to stream...@googlegroups.com

Hello.

I have two questions.

1. Is kba-streamcorpus-2014-v0_3_0-serif-only the English-only corpus?

2. Should KBA participants use this corpus?

Shun

On Tuesday, June 17, 2014 at 13:47:02 UTC+9, John Frank wrote:

John R. Frank

Jun 17, 2014, 8:13:36 AM

to stream...@googlegroups.com

Yes, this is the English and Unknown language subset of the corpus.

Yes, this is the only portion of the corpus being used in TREC KBA. Only
documents from this subset will be judged by assessors.

John

On Tue, 17 Jun 2014, Shun wrote:

> Hello. I have two questions.

tuan tran

Jun 17, 2014, 10:25:48 AM

to stream...@googlegroups.com

Dear John,

I registered for the StreamCorpus 2013 and had a GPG key to access the corpus, but missed the deadline for TREC StreamCorpus 2014. Is it still possible to get GPG keys for this year's corpus? If not, will a similar Serif-filtered portion of StreamCorpus 2013 be released soon?

Thanks,

Tuan

John R. Frank

Jun 17, 2014, 10:50:59 AM

to stream...@googlegroups.com

The GPG key is the same as last year.

jrf

Shun

Jun 17, 2014, 10:17:50 PM

to stream...@googlegroups.com

Thank you for your reply.

In that case, I will download kba-streamcorpus-2014-v0_3_0-serif-only for TREC KBA.

Shun

On Tuesday, June 17, 2014 at 21:13:36 UTC+9, John Frank wrote:

Parsa Ghaffari

Jun 19, 2014, 8:29:17 AM

to stream...@googlegroups.com

Hi John,

How can one obtain the GPG keys required for decrypting the corpus?

Parsa

John R. Frank

Jun 19, 2014, 8:31:01 AM

to stream...@googlegroups.com

> How can one obtain the GPG keys required for decrypting the corpus?

See the forms on this page:

http://trec.nist.gov/data/kba.html

jrf
