streamcorpus
Read more about the streamcorpus interfaces here:
https://github.com/trec-kba/
GB
,
John R. Frank
3
4/8/16
How to cite the KBA Stream Corpus in a publication?
Thank you for the quick response :) Warm regards, GB On Friday, April 8, 2016 at 11:52:36 AM UTC-4,
Daniel
3/3/16
How to parse .sc files for content
I am having trouble parsing the .sc files to get the content. Do the python examples given provide
Christopher Kedzie
, …
John R. Frank
10
1/27/16
Error trying to run example python code
The good and bad of Thrift is that it rigidly specifies what fields mean. If you want to change a
Matteo Bernardon
,
John R Frank
4
9/8/15
KBA with only content in English
Hi Matteo, See these dirs for info on how to use the streamcorpus thrift definitions in Java: https:/
姚应哲
7/22/15
a problem to get data from *.sc files
Hello, organizers: when I use the streamcorpus Python scripts to get data out of *.sc files, I come
Kanika Parashar
,
John R. Frank
2
6/28/15
streamcorpus_dump tool to get data out of the .sc files
> I installed streamcorpus using pip. I am trying to get data out of the > .sc files. Using the
Cristina Garbacea
,
John R. Frank
2
5/9/15
Extract text content out of .sc files
Hi Cristina, There is extensive tooling in Python. See http://streamcorpus.org/ and in particular the
Piji Li
,
John R. Frank
3
4/23/15
Where is the corpus collected from?
I see. Thanks a lot. Piji On Wed, Apr 22, 2015 at 11:29 AM, John R. Frank <j...@diffeo.com>
John R. Frank
1/14/15
Re: MD5 checksums for kba-2014-clean compressed files?
This question might come up for other people interested in using the TREC KBA StreamCorpora described
Jeroen Vuurens
,
John Frank
2
12/3/14
epoch and zulu
Yes, the epoch_ticks is seconds in the epoch since 1970. The zulu_timestamp is computed from the
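A minimal sketch of the relationship described in the "epoch and zulu" thread, assuming epoch_ticks is a count of seconds since 1970-01-01 UTC, as the reply states. The helper name epoch_ticks_to_zulu and the exact timestamp layout in ZULU_FORMAT are assumptions made for illustration; the preview above is cut off before it says how zulu_timestamp is formatted, so check the streamcorpus definitions before relying on this layout.

```python
from datetime import datetime, timezone

# Assumed layout for illustration only; the thread preview does not confirm it.
ZULU_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def epoch_ticks_to_zulu(epoch_ticks: float) -> str:
    """Render seconds since 1970-01-01 UTC as a Zulu (UTC) timestamp string.

    Hypothetical helper, not part of the streamcorpus package.
    """
    moment = datetime.fromtimestamp(epoch_ticks, tz=timezone.utc)
    return moment.strftime(ZULU_FORMAT)


if __name__ == "__main__":
    print(epoch_ticks_to_zulu(0))  # the start of the epoch
```

Passing tz=timezone.utc keeps the conversion independent of the local time zone of the machine running it.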
Wallace
, …
zhaochun ren
3
8/26/14
How to deal with Corpora?
Hi, When I used one of the Java examples, "ReadThrift.java", in the streamcorpus GitHub
Hung Nguyen
,
John R. Frank
2
8/20/14
Entities list
Hi Hung, if you are part of a team that registered for TREC and added itself to the KBA list, then
张东旭
,
John R. Frank
4
8/19/14
Hi, John.
> Do you mean that if two tokens have same value of mention_id, they will > refer to the same
John R. Frank
,
kita3...@gmail.com
2
7/23/14
Re: extracting stream items from the TS-specific corpus subset
Hi John Thank you very much for your helpful answer. I will use Serif in place of Lingpipe. Regards,
kita3...@gmail.com
7/21/14
extracting stream items from the TREC-TS-2014F dataset
Hello, I am one of participants in the TREC 2014 Temporal Summarization (TS) track. I use the TS-
Sayaka Kitaguchi
7/3/14
Re: [TREC-TS14] TREC 2014 Temporal Summarization Test Events and Guidelines
Hi Fernando, Thank you for answering my question. I see. I will use 2013 events to train our
John R. Frank
, …
Parsa Ghaffari
8
6/19/14
[TREC StreamCorpus] serif-only subset of 2014 streamcorpus and survey
> How can one obtain the GPG keys required for decrypting the corpus? See the forms on this page:
John R. Frank
, …
张东旭
12
6/4/14
TREC StreamCorpus 2014 released -- 1.2B docs, rich NLP tagging, >18 months of contiguous news, web, blogs
Thank you very much indeed for your patient reply. It's really helpful! And our team look
John R. Frank
6/3/14
Re: Big Corpora, how to deal with?
Hi Wallace, Moving your question from trec-kba to streamcorpus discussion forum. > Are there any
wim.g...@gmail.com
, …
ashwin
5
12/6/13
StreamItem.body.getClean_visible
StreamItem.body.clean_visible. body is a ContentItem with a clean_visible field. On Thursday, December 5,
John Frank
12/2/13
streamcorpus thrift definition update
users of the streamcorpus interfaces, Now that TREC 2013 is complete, we are switching the master
ashwin
,
John R. Frank
4
10/22/13
information on training
Hi Ashwani, I think your questions pertain to TREC KBA and not the other tracks using the
wim.g...@gmail.com
, …
John R. Frank
3
9/1/13
Directories are missing
That's probably expected. There is a gap in mid-summer 2012. See attached graph of counts per
John R. Frank
2
8/20/13
updated tarball with training data for SSF and fixed stream_ids for CCR training examples
> Can someone explain to me why there are two annotation files and how we > are supposed to
Laura Dietz
, …
wim.g...@gmail.com
12
8/7/13
Which fields are defined under which circumstances?
Hi, John Thank you for your prompt reply! Best, wim
Tom Kenter
,
John R. Frank
2
8/1/13
Missing fields in sc chunks?
> Could it be that in some cases the attributes for the processed fields > (clean_html,
j...@mit.edu
, …
Tom Kenter
5
7/25/13
TREC kba-streamcorpus-2013-v0_2_0 released
> But actually I am having a hard time finding any of the chunks mentioned > in trec-kba-ccr-
Fernando Diaz
, …
Craig Willis
7
7/18/13
stream item timestamps: stream_time.zulu_timestamp vs stream_time.epoch_ticks
>> Just to confirm, there will be an updated version of trec-kba-ccr-judgments-2013-04-08.
John R. Frank
,
wim.g...@gmail.com
15
7/4/13
decrypting the corpus
> I have tried to use boilerpipe to remove the chrome, but with no > success. The 'artice
Vincent Bouvier
,
John R. Frank
3
7/3/13
StreamIds map
It was posted on Monday: https://groups.google.com/forum/#!topic/streamcorpus/6KmaaUKSPIM jrf On Wed,