On 07/28/2018 11:48 AM, Rodrigo Nogueira wrote:
> Hello TREC CAR organizers,
> Could you please clarify some questions regarding the dataset and the
We only provide training qrels for the Y1 benchmarks -- this includes
the train, the benchmarkY1train, the benchmarkY1test, and test200.
Note that v2.x versus v1.x does not refer to year 1 or year 2 -- it just
refers to the version of our conversion pipeline, so you know that these
are compatible with one another.
When I say "Y1 benchmark", I refer to the set of
topics/queries/outlines in the respective benchmarks.
You could have derived tree.qrels yourself -- but I figured it may help
to have training data to match the task.
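All of these qrels files use the standard four-column TREC qrels layout (query ID, iteration, paragraph ID, relevance grade). A minimal sketch of reading one line, assuming whitespace-delimited fields; the IDs shown are illustrative, not real TREC CAR identifiers:

```python
# Parse a TREC-style qrels line: "<query-id> <iteration> <doc-id> <relevance>".
# The query and paragraph IDs below are made up for illustration.
def parse_qrels_line(line):
    query_id, _iteration, doc_id, relevance = line.split()
    return query_id, doc_id, int(relevance)

qid, pid, rel = parse_qrels_line("enwiki:Example/Heading 0 0123456789abcdef 1")
```

TREC CAR query IDs encode the page/section path, so they contain no internal whitespace and a plain split suffices.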
> 2. Could you explain why someone would use
> "unprocessedAllButBenchmark.v2.1.tar.xz" if the correct paragraphs are
> in "paragraphCorpus.v2.0.tar.xz"? Maybe I'm missing preprocessing or
> extra information that I should include while retrieving the paragraphs?
These two datasets are complementary:
You are supposed to retrieve passages from paragraphCorpus.
You are supposed to retrieve entities that are (1) linked in
paragraphCorpus, (2) have an entry in allButBenchmark, or (3) are
linked in the allButBenchmark collection.
allButBenchmark is offered to teams who would like to build a
knowledge graph. Note that allButBenchmark will be missing paragraphs
that are relevant under the automatic tree/hierarchical qrels.
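In other words, the candidate entity set is the union of those three sources. A minimal sketch; the function and variable names are mine, not part of the distribution:

```python
# Candidate entities are the union of three sources (all names hypothetical):
#   linked_in_paragraphs        -- entities entity-linked in paragraphCorpus
#   pages_in_all_but_benchmark  -- entities with a page entry in allButBenchmark
#   linked_in_all_but_benchmark -- entities linked from allButBenchmark pages
def candidate_entities(linked_in_paragraphs,
                       pages_in_all_but_benchmark,
                       linked_in_all_but_benchmark):
    return (set(linked_in_paragraphs)
            | set(pages_in_all_but_benchmark)
            | set(linked_in_all_but_benchmark))
```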
> 3. If the tree qrels will be used in the evaluation, why there are
> hierarchical, top-level and article qrels? I found quite confusing
> which one to use to train my models by reading the website.
Hierarchical, top-level, and article qrels are provided for backwards
compatibility. I will try to state this more clearly on the website.
> 6. Since the evaluation is changing from hierarchical to tree qrels,
> could you postpone the deadline until the end of August? It takes
> quite some time (1-2 weeks) to train the models :)
The deadline is not my choice; it is set by NIST. In any case, it is
more than two weeks away.
You had a number of complaints about the way information is
disseminated. The intention is that the website is all you need to know.
If this is not the case, concrete feedback on which information is
missing or unclear would help.
If you have any general questions about information retrieval, please ask.