Laura Dietz

Jun 4, 2018, 5:37:51 PM
to trec...@googlegroups.com

Dear TREC CAR fans,

The ground truth for benchmarkY1test has been released.

You will find several new packages listed below. Please ask if you are not sure which one is right for you.

For the current V2.0 data set [1]:

Use this dataset when training for the 2018 TREC CAR evaluation. Be aware that, although it was created from the same Wikipedia dump (2016 Dec 20), its identifiers are incompatible with the V1.5 dataset.

However, many parsing errors were eliminated and a much larger dataset was released, so we believe this change is worth it.

For the old V1.5 data set that was used in the 2017 TREC CAR evaluation [2]:

  • trec-car-2017-qrels.tar.gz (NEW!) Manual ground truth assessments by NIST assessors. See the Overview and the README file in the archive for details. (To use with v2.0 data, please use the version from the v2.0 data release page.)

  • benchmarkY1test-v1.5.tar.xz (NEW!) Automatic ground truth assessments and full articles for the test set of the Y1 benchmark. Includes outlines, articles, qrels. (Selected with the same process as benchmarkY1train.)

Use this dataset if you want to compare to results reported in the Overview Report [3].

When you use the TREC CAR data, please cite the dataset you use.


Laura Dietz, Ben Gamari. 
"TREC CAR 2.0: A Data Set for Complex Answer Retrieval". 
Version 2.0, 2018. 


Laura Dietz, Ben Gamari. 
"TREC CAR: A Data Set for Complex Answer Retrieval". 
Version 1.5, 2017. 

[1] http://trec-car.cs.unh.edu/datareleases/index.html

[2] http://trec-car.cs.unh.edu/datareleases/v1.5-release.html

[3] https://trec.nist.gov/pubs/trec26/papers/Overview-CAR.pdf

Let me know if you have any questions, or think you have found any bugs.

Stay tuned for the new query set for 2018.

Laura Dietz (thanks to Jeff and Ben for the translation of the qrel files!)

