TREC 2015 Temporal Summarization Test Events and Guidelines


Fernando Diaz

Jun 10, 2015, 1:40:57 PM
to tre...@googlegroups.com


Several important updates for temporal summarization.


I. 2015 Test Events


We have posted the test events on the Temporal Summarization website.  


Because these are test events, you should not train your system on them.  We stress that participants should not use information 'from the future' in their systems, including corpus-level IDF.  This is explained further in the guidelines.
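
One way to avoid 'future' information in term weighting is to compute IDF only from documents already seen in the stream. The sketch below is a minimal illustration, not the method prescribed by the guidelines: the class and function names, the (timestamp, tokens) input format, and the smoothing formula are all assumptions made for the example.

```python
import math
from collections import Counter


class StreamingIdf:
    """Document-frequency statistics built only from documents already seen."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()

    def idf(self, term):
        # Smoothed IDF over the documents observed so far.
        # The +1 smoothing is an assumption, not taken from the guidelines.
        return math.log((self.num_docs + 1) / (self.doc_freq[term] + 1)) + 1.0

    def add_document(self, tokens):
        # Count each term once per document.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))


def score_stream(docs):
    """Yield (timestamp, score) pairs in timestamp order.

    `docs` is an iterable of (timestamp, tokens) pairs (hypothetical format).
    Each document is scored before it is added to the statistics, so its
    score never depends on later documents.
    """
    stats = StreamingIdf()
    for timestamp, tokens in sorted(docs, key=lambda d: d[0]):
        score = sum(stats.idf(term) for term in set(tokens))
        yield timestamp, score
        stats.add_document(tokens)
```

The important design choice is the ordering inside the loop: statistics are updated only after a document has been scored, so no score reflects content that arrives later in the stream.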


II. 2015 (Filtered) Corpus


The Task 1 'Filtering and Summarization' corpus, denoted TREC-TS-2015F, is now available for download from Amazon Web Services (AWS).  TREC-TS-2015F is a version of the KBA-2014 stream corpus that has been pre-filtered to contain only documents likely to be relevant to each of the 2015 test events. The list of files to download for each topic is available from the trec-ts.org website.


First-time participants will first need to download the file lists within the above zip file. Then, for each topic, download the stream corpus files listed from AWS. Finally, decrypt each stream corpus file using the key provided by NIST. If you have not yet requested the decryption key, instructions are available at: http://trec.nist.gov/data/kba.html.
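
The three steps above (read a topic's file list, download each file, decrypt it) could be scripted roughly as follows. This is a sketch under several assumptions that are not confirmed in this thread: that each file list holds one relative path per line, that the files are GPG-encrypted with a `.gpg` suffix, that the NIST key has already been imported into the local GPG keyring, and that `base_url` points at the corpus root on AWS. All function names are hypothetical.

```python
import subprocess
import urllib.request
from pathlib import Path


def read_file_list(list_path):
    """Return the non-empty lines of a topic file list (assumed one path per line)."""
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def plan_downloads(paths, base_url, out_dir):
    """Pair each listed path with its source URL and a local destination."""
    base = base_url.rstrip("/") + "/"
    return [(base + p.lstrip("/"), Path(out_dir) / p.lstrip("/")) for p in paths]


def decrypt_command(encrypted_path):
    """Build a gpg command line; assumes the file name ends in .gpg."""
    encrypted_path = Path(encrypted_path)
    decrypted_path = encrypted_path.with_suffix("")  # drop the trailing .gpg
    return ["gpg", "--batch", "--yes",
            "--output", str(decrypted_path),
            "--decrypt", str(encrypted_path)]


def fetch_topic(list_path, base_url, out_dir):
    """Download and decrypt every file listed for one topic."""
    for url, dest in plan_downloads(read_file_list(list_path), base_url, out_dir):
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(dest))
        subprocess.run(decrypt_command(dest), check=True)
```

A call might look like `fetch_topic("27.txt", base_url, "corpus/27")`, where `base_url` is whatever corpus root the track website gives; it is left as a parameter here because the exact location is not stated in this message.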


For participants also looking to take part in Task 2 ('Summarization Only'), the associated corpus will be released at the end of this month.


III. 2015 Guidelines


We have posted the 2015 guidelines.  Important changes from last year include:


  • addition of a new "summarization only" subtask for those wanting to work with a higher quality set of retrieved documents
  • limit of 1000 sentences per event per run
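
A quick pre-submission check for the 1000-sentence limit might look like the sketch below. The in-memory representation of a run as (event_id, sentence_id) pairs is hypothetical; it is not the official run file format, which is defined in the guidelines.

```python
from collections import Counter

MAX_SENTENCES_PER_EVENT = 1000  # limit stated in the announcement above


def events_over_limit(updates, limit=MAX_SENTENCES_PER_EVENT):
    """Return {event_id: count} for events whose sentence count exceeds the limit.

    `updates` is an iterable of (event_id, sentence_id) pairs for a single run
    (a hypothetical representation, not the official run format).
    """
    counts = Counter(event_id for event_id, _ in updates)
    return {event: n for event, n in counts.items() if n > limit}
```

An empty result means every event in the run is within the limit.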

IV. 2015 Metrics


We have updated the metrics to use the earliest matching update instead of the nugget time for evaluation.  


V. 2013 and 2014 Data


You may use 2013 and 2014 events as training events for 2015.  Data and evaluation scripts can be found in the downloads section.


We have released a filtered version of the 2013 corpus.  The filtered version of the 2014 corpus was released last year.


VI. Location


All downloads can be found on the track website here:


https://sites.google.com/site/temporalsummarization/downloads


As always, please post any questions to this list.




Matthew Ekstrand-Abueg

Jun 10, 2015, 8:06:06 PM
to tre...@googlegroups.com
Quick addendum/apology: if you downloaded the test events file in the first half hour after the email was sent, please download it again, as there were a couple of edits to event titles.

Happy Summarizing!

-Matthew



Jeroen Vuurens

Jun 11, 2015, 1:43:50 AM
to tre...@googlegroups.com
Hi,

Thanks for the great work in organizing this.

Is it possible to download the entire 2014 KBA corpus, so that we can process a single collection, train on the 2014 topics, and produce output for the new topics? Since we require an admin to do the downloading, it is easier for us to download the entire thing and work from there. Am I correct that this would be the second collection mentioned in http://s3.amazonaws.com/aws-publicdatasets/trec/kba/index.html ?

Also, suppose there is a relevant text outside the set you have filtered. Would returning it be regarded as irrelevant, or would it be annotated in the same way as texts from within the filtered set?

Thanks, Jeroen

Fernando Diaz

Jun 11, 2015, 9:43:31 AM
to Jeroen Vuurens, tre...@googlegroups.com

Yes, you may download the entire TREC KBA 2014 corpus for local processing.  


You can train on the 2013 and 2014 topics and judgments using either the entire corpus, as you suggest, or using the filtered sets for those years, as found on the website.  We provide the filtered corpora in order to make processing easier for participants with resource constraints.  


All sentences returned by participants are pooled and then judged, including sentences in both the filtered and complete corpora, so you would not be penalized if you find relevant content outside of the filtered set.  


Keep in mind that participants should not be looking at test events or training systems toward them.  You should only be running trained systems on them.


Let me know if you have any other questions.


F





agia...@gmail.com

Jun 12, 2015, 9:25:33 AM
to tre...@googlegroups.com
Hi, thank you for organising all this.

I have a question. I started downloading TREC-TS-2015F by topic, but the topic IDs do not match the downloaded documents. What I mean is that, for example, the text file 27.txt seems to contain relevant documents for topic 29 (the folder name is 29), and so on. Am I missing something?

Thank you,
Anastasia

Fernando Diaz

Jun 12, 2015, 9:54:04 AM
to agia...@gmail.com, tre...@googlegroups.com

Anastasia,

The current folder names on AWS use older IDs that we were using previously. The documents should be relevant to the ID of the filter set file (e.g. 27.txt does point to the right files). Apologies for the confusion. We will try to update this and let the group know.

F
