Article Titles in BenchmarkY2


Widad Machmouchi

Aug 3, 2018, 1:10:11 PM
to TREC Car
Hello, 

I noticed that article titles are not in the benchmarkY2.topics file, while benchmarkY2.titles contains only article titles. For passage ranking, can we just use all the headings obtained by iterating over benchmarkY2.cbor-outline.cbor? Also, for passage retrieval, we are supposed to use paragraphCorpus.v2.0.tar.xz, correct?

Thanks, 

Widad

Laura Dietz

Aug 3, 2018, 1:28:08 PM
to trec...@googlegroups.com
Dear Widad,

Yes! To create rankings, please iterate over all headings in benchmarkY2.cbor-outline.cbor (including internal headings).

The *.topics file is intended for validation. It works with the validation script (available for download from the trec-car site).

I followed TREC terminology, where topic = query, so *.topics lists everything for which a ranking is to be created.
The *.titles file is just the list of article titles.
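
Iterating over all headings, including internal ones, can be sketched as below. This is only an illustration: the `Section` and `Page` classes are hypothetical stand-ins for whatever objects your outline reader returns, and both the slash-joined query ID and the concatenated query text are assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    # Hypothetical stand-in for one heading in a page outline.
    heading: str
    heading_id: str
    children: List["Section"] = field(default_factory=list)


@dataclass
class Page:
    # Hypothetical stand-in for one article outline.
    page_name: str
    page_id: str
    sections: List[Section] = field(default_factory=list)


def heading_paths(
    sections: List[Section], prefix: Tuple[Section, ...] = ()
) -> Iterator[Tuple[Section, ...]]:
    """Yield the path to every heading, internal and leaf alike."""
    for section in sections:
        path = prefix + (section,)
        yield path  # emit the heading itself, even if it has children
        yield from heading_paths(section.children, path)


def queries(page: Page) -> List[Tuple[str, str]]:
    """Build one (query_id, query_text) pair per heading path."""
    result = []
    for path in heading_paths(page.sections):
        # Assumed ID convention: page ID and heading IDs joined by "/".
        query_id = "/".join([page.page_id] + [s.heading_id for s in path])
        # Assumed query text: page title followed by the headings on the path.
        query_text = " ".join([page.page_name] + [s.heading for s in path])
        result.append((query_id, query_text))
    return result


# Toy outline, invented for this example.
page = Page(
    "Green sea turtle",
    "toy:green-sea-turtle",
    [
        Section(
            "Ecology",
            "ecology",
            [Section("Diet", "diet"), Section("Predators", "predators")],
        ),
        Section("Taxonomy", "taxonomy"),
    ],
)

for query_id, query_text in queries(page):
    print(query_id, "|", query_text)
```

Note that "Ecology" produces a query of its own even though it has subsections; that is what including internal headings means here.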

Best,
Laura
--
You received this message because you are subscribed to the Google Groups "TREC Car" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trec-car+u...@googlegroups.com.
To post to this group, send email to trec...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trec-car/143b3295-de71-415d-bc1a-9b97f008d649%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Widad Machmouchi

Aug 6, 2018, 8:32:23 PM
to TREC Car
Thanks Laura. Can you also please comment on this: for passage retrieval, we are supposed to use paragraphCorpus.v2.0.tar.xz, correct? Or is there a smaller paragraph corpus to work with?

Widad


Laura Dietz

Aug 6, 2018, 10:08:52 PM
to trec...@googlegroups.com
Hi Widad,

Yes, you retrieve paragraphs from the paragraphCorpus.

By IR standards this is not a large collection. I am not sure what you are hoping a "smaller collection" would do.

If you only want paragraphs that are relevant for, say, benchmarkY1train, you can find those in the benchmarkY1-train archive as *paragraphs.cbor. But these will not be available for the benchmarkY2test queries used in the evaluation.

Best,
Laura