Dear TREC CAR participants,
This question came up, and I want to share the answer with you.
> I am working with the benchmarkY3train dataset and I have some
> confusion regarding the queries. I am using
> benchmarkY3train.cbor-outlines.cbor for retrieving the queries and,
> as discussed, I would have to map the id with the
> benchmarkY3-Y2.json to get the queries. The mapping json is
> available for train but it is not present for benchmarkY3test, so
> does that mean that just using the
> benchmarkY3train.cbor-outlines.cbor would be sufficient?
In order to produce a ranking, you only need the outline file, as
before. It contains the queries (titles and headings) and defines
the query ids.
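As a rough illustration of "titles and headings plus ids", here is a
minimal Python sketch. It does not read the cbor file; the nested
dict is a hypothetical stand-in for one already-parsed outline, and
the field names (id, title, heading, children), the example ids, and
the choice to join the title with the heading path are assumptions
made for the example, not the actual file layout.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_queries(outline: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (query_id, query_text) for the page and for every heading.

    The ids are read from the outline, never constructed here, since the
    outline file is what defines them.
    """
    title = outline["title"]
    yield outline["id"], title

    def walk(sections: List[Dict[str, Any]], path: List[str]):
        for section in sections:
            new_path = path + [section["heading"]]
            # Assumed query text: page title followed by the heading path.
            yield section["id"], " ".join([title] + new_path)
            yield from walk(section.get("children", []), new_path)

    yield from walk(outline.get("sections", []), [])


# Hypothetical outline, for illustration only.
example_outline = {
    "id": "page-1",
    "title": "Photosynthesis",
    "sections": [
        {
            "id": "page-1/sec-a",
            "heading": "Light reactions",
            "children": [
                {"id": "page-1/sec-a/sec-b", "heading": "Photosystem II", "children": []}
            ],
        },
        {"id": "page-1/sec-c", "heading": "Calvin cycle", "children": []},
    ],
}

queries = list(iter_queries(example_outline))
```

Each returned pair is one query id together with the text you would
hand to your retrieval system.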
benchmarkY3train does not come with qrels, but it does come with
gold articles. You can choose to train on the data that you think is
most appropriate for your model.
If you want to train your system, I recommend using one of
benchmarkY1train, benchmarkY1test, or benchmarkY2test, which provide
both queries and qrels.
Usually you should not need to use the compatibility file, except
when your training makes use of two pieces of information, one only
available in benchmarkY2test and the other only available in
benchmarkY3train.
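For that one case, the join could look roughly like the Python
sketch below, which rewrites benchmarkY2test qrels lines so they
carry benchmarkY3train query ids. Everything specific here is an
assumption: the mapping is shown as a plain dict from Y2 id to Y3 id
(the real layout of the compatibility json may differ), the ids are
made up, and the qrels lines are assumed to use the usual
four-column TREC format.

```python
from typing import Dict, Iterable, List


def translate_qrels(qrels_lines: Iterable[str], y2_to_y3: Dict[str, str]) -> List[str]:
    """Rewrite Y2 query ids in qrels lines to their Y3 counterparts.

    Lines whose query id has no Y3 counterpart are dropped, because
    benchmarkY3train covers only a subset of the benchmarkY2test topics.
    """
    translated = []
    for line in qrels_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, iteration, doc_id, relevance = parts
        y3_id = y2_to_y3.get(query_id)
        if y3_id is None:
            continue
        translated.append(f"{y3_id} {iteration} {doc_id} {relevance}")
    return translated


# Hypothetical ids and judgments, for illustration only.
example_mapping = {"y2-topic-7": "y3-topic-A"}
example_qrels = [
    "y2-topic-7 0 para-123 2",
    "y2-topic-9 0 para-456 1",  # no Y3 counterpart, so it is dropped
    "",
]

y3_qrels = translate_qrels(example_qrels, example_mapping)
```

If your training needs only one of the two benchmarks, you can skip
this step entirely.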
To clarify:
- benchmarkY3train topics are IR-friendly reformulations of a subset
of benchmarkY2test topics
- benchmarkY3train topics have a set of
gold keywords associated
- benchmarkY3train topics have gold
articles (but no qrels)
- benchmarkY2test topics have qrels
(auto/manual and entity/passage)
- benchmarkY2test topics have gold
articles of lower quality (i.e., missing introductions)
benchmarkY3test queries are brand new; they are not part of the Y2
benchmark.
Best,
Laura