benchmarkY3train vs benchmarkY2test --- and the mythical compatibility file


Laura Dietz

Jun 28, 2019, 9:43:45 AM
to trec...@googlegroups.com

Dear TREC CAR participants,

This question came up, and I want to share the answer with you.



I am working with the benchmarkY3train dataset and I am confused about the queries. I am using benchmarkY3train.cbor-outlines.cbor to retrieve the queries and, as discussed, I would have to map the ids with benchmarkY3-Y2.json to get the queries. The mapping JSON is available for train but not for benchmarkY3test, so does that mean that just using benchmarkY3train.cbor-outlines.cbor would be sufficient?

In order to produce a ranking, you only need the outline file, as before. It contains the queries (titles and headings) and defines the query ids.
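As a rough illustration of how titles and headings can be turned into queries, here is a minimal Python sketch. It does not read the CBOR file itself; it assumes the outline has already been loaded into a simple in-memory structure. The names `build_queries` and `example_pages`, the toy page, and the id convention (page id and heading ids joined by "/") are assumptions made for this example, not something defined by the benchmark files.

```python
from typing import Iterable, List, Tuple

# Hypothetical in-memory outline entry: (page_id, page_title, heading_paths).
# Each heading path lists (heading_id, heading_text) pairs from the
# top-level section down to the target section.
HeadingPath = List[Tuple[str, str]]
OutlinePage = Tuple[str, str, List[HeadingPath]]


def build_queries(pages: Iterable[OutlinePage]) -> List[Tuple[str, str]]:
    """Return one (query_id, query_text) pair per heading path."""
    queries = []
    for page_id, title, heading_paths in pages:
        for path in heading_paths:
            # ASSUMPTION: the query id is the page id followed by the heading ids.
            query_id = "/".join([page_id] + [hid for hid, _ in path])
            # The query text is the page title followed by the heading texts.
            query_text = " ".join([title] + [text for _, text in path])
            queries.append((query_id, query_text))
    return queries


# Made-up toy outline, for illustration only.
example_pages = [
    ("demo:page", "Green Tea", [
        [("Health", "Health effects")],
        [("Health", "Health effects"), ("Caffeine", "Caffeine content")],
    ]),
]

queries = build_queries(example_pages)
for query_id, query_text in queries:
    print(query_id, "|", query_text)
```

In a real run, the ids should be taken from the outline file exactly as it defines them, since the ranking output has to use those ids.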

benchmarkY3train does not come with qrels, but it does come with gold articles. You can choose to train on the data that you think is most appropriate for your model.


If you want to train your system, I recommend using one of benchmarkY1train/benchmarkY1test/benchmarkY2test (each has both queries and qrels).


Usually you should not need to use the compatibility file. The exception is when your training makes use of two pieces of information, one only available in benchmarkY2test and the other only available in benchmarkY3train.

To clarify:
- benchmarkY3train topics are IR-friendly reformulations of a subset of benchmarkY2test topics
- benchmarkY3train topics have a set of gold keywords associated
- benchmarkY3train topics have gold articles (but no qrels)
- benchmarkY2test topics have qrels (auto/manual and entity/passage)
- benchmarkY2test topics have gold articles of lower quality (i.e., missing introductions)
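
For the case where training needs one piece from each side (for example, gold keywords from benchmarkY3train and qrels from benchmarkY2test), the join could look roughly like the sketch below. The actual layout of benchmarkY3-Y2.json is not described in this message, so the sketch assumes a flat JSON object that maps a Y3 query id to a Y2 query id. All ids, keywords, judgments, and the function name `join_y3_with_y2` are made up for illustration.

```python
import json
from typing import Dict, List

# ASSUMPTION: the compatibility file is modeled as a flat JSON object mapping
# a benchmarkY3train query id to a benchmarkY2test query id. The real
# benchmarkY3-Y2.json may be structured differently.
compat_json = '{"y3-demo-1": "y2-demo-a", "y3-demo-2": "y2-demo-b"}'

# Made-up toy data standing in for the two sources.
y3_keywords: Dict[str, List[str]] = {
    "y3-demo-1": ["kw1", "kw2"],
    "y3-demo-2": ["kw3"],
}
y2_qrels: Dict[str, Dict[str, int]] = {
    "y2-demo-a": {"para-1": 1, "para-2": 0},  # no judgments for y2-demo-b
}


def join_y3_with_y2(
    compat: Dict[str, str],
    keywords: Dict[str, List[str]],
    qrels: Dict[str, Dict[str, int]],
) -> Dict[str, dict]:
    """Pair Y3 keywords with Y2 qrels for topics present on both sides."""
    joined = {}
    for y3_id, y2_id in compat.items():
        # Keep only topics that have both pieces of information.
        if y3_id in keywords and y2_id in qrels:
            joined[y3_id] = {"keywords": keywords[y3_id], "qrels": qrels[y2_id]}
    return joined


training = join_y3_with_y2(json.loads(compat_json), y3_keywords, y2_qrels)
print(training)
```

If your training only uses information from one benchmark, this step is unnecessary.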


benchmarkY3test queries are brand new queries; they are not part of the Y2 benchmark.

Best,
Laura
