Several important updates for temporal summarization.
I. 2015 Test Events
We have posted the test events on the Temporal Summarization website.
Because these are test events, you should not train your system using these events. We stress that participants should not use information 'from the future' in their systems, including corpus-level IDF. This is explained further in the guidelines.
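As a rough illustration of the 'no information from the future' constraint, IDF can be maintained incrementally so that each document is weighted using only documents with earlier timestamps. This is a minimal sketch, not track-provided code: the class name StreamingIdf, the function score_stream, and the smoothing formula are all assumptions made for illustration, and the guidelines remain the authoritative description of what is allowed.

```python
import math
from collections import Counter


class StreamingIdf:
    """Document-frequency statistics built only from documents already seen.

    Hypothetical helper for illustration; not part of any track tooling.
    """

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def idf(self, term):
        # Smoothed IDF over the documents observed so far (assumed formula).
        return math.log((self.num_docs + 1) / (self.doc_freq[term] + 1)) + 1.0

    def update(self, tokens):
        # Count each term once per document.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))


def score_stream(docs):
    """docs: iterable of (timestamp, tokens) pairs.

    Returns a list of (timestamp, {term: idf}) in timestamp order, where each
    document's weights come only from documents that precede it.
    """
    stats = StreamingIdf()
    results = []
    for timestamp, tokens in sorted(docs, key=lambda d: d[0]):
        # Score first, then fold the document into the statistics.
        results.append((timestamp, {t: stats.idf(t) for t in set(tokens)}))
        stats.update(tokens)
    return results
```

The key design point is the ordering inside the loop: a document is scored before it is added to the statistics, so nothing later in the stream can influence its weights.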
II. 2015 (Filtered) Corpus
The Task 1 'Filtering and Summarization' corpus, denoted TREC-TS-2015F, is now available for download from Amazon Web Services (AWS). TREC-TS-2015F is a version of the KBA-2014 stream corpus that has been pre-filtered to contain only documents likely to be relevant to each of the 2015 test events. The list of files to download for each topic is available from the trec-ts.org website.
First-time participants will need to start by downloading the file lists within the above zip file. Then, for each topic, download the listed stream corpus files from AWS. Finally, decrypt each stream corpus file using the key provided by NIST. If you have not yet requested the decryption key, instructions are available at: http://trec.nist.gov/data/kba.html.
For participants also looking to participate in the Task 2 'Summarization Only' task, the associated corpus will be released at the end of this month.
III. 2015 Guidelines
We have posted the 2015 guidelines. Important changes from last year include,
IV. 2015 Metrics
We have updated the metrics to use the earliest matching update instead of the nugget time for evaluation.
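To make the change concrete, here is a minimal sketch of how a per-nugget reference time could be derived from matched updates rather than from the nugget's own timestamp. The function names, the (nugget_id, timestamp) input format, and the idea of measuring latency as a simple difference in seconds are assumptions for illustration only; the released evaluation scripts define the actual computation.

```python
def earliest_match_times(matches):
    """matches: iterable of (nugget_id, update_timestamp) pairs.

    Returns {nugget_id: earliest timestamp of any update matching that nugget}.
    Hypothetical helper; the input format is assumed for illustration.
    """
    earliest = {}
    for nugget_id, ts in matches:
        if nugget_id not in earliest or ts < earliest[nugget_id]:
            earliest[nugget_id] = ts
    return earliest


def latency(update_ts, nugget_id, earliest):
    """Seconds between an update and the reference time for its nugget.

    The reference time is the earliest matching update, not the nugget time.
    """
    return update_ts - earliest[nugget_id]
```

For example, if updates matching a nugget were emitted at times 100 and 50, the reference time for that nugget is 50, and an update emitted at time 80 has a latency of 30 under this sketch.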
V. 2013 and 2014 Data
You may use 2013 and 2014 events as training events for 2015. Data and evaluation scripts can be found in the downloads section.
We have released a filtered version of the 2013 corpus. The filtered version of the 2014 corpus was released last year.
VI. Location
All downloads can be found on the track website here:
https://sites.google.com/site/temporalsummarization/downloads
As always, please post any questions to this list.
Yes, you may download the entire TREC KBA 2014 corpus for local processing.
You can train on the 2013 and 2014 topics and judgments using either the entire corpus, as you suggest, or using the filtered sets for those years, as found on the website. We provide the filtered corpora in order to make processing easier for participants with resource constraints.
All sentences returned by participants are pooled and then judged, including sentences from both the filtered and complete corpora, so you would not be penalized if you find relevant content outside of the filtered set.
Keep in mind that participants should not be looking at test events or training systems toward them. You should only be running trained systems on them.
Let me know if you have any other questions.
F
I have a question. I started downloading TREC-TS-2015F by topic, but the topic ids do not match the downloaded documents. What I mean is that, for example, the text file 27.txt seems to contain relevant documents for topic 29 (folder name is 29), and so on. Am I missing something?
Thank you,
Anastasia