Input/output data and system constraints


Mateusz Kopeć

unread,
Oct 1, 2012, 9:46:44 AM10/1/12
to semeval-2013-ws...@googlegroups.com
I want to clarify a few things about the task description:

1. Will the input files to the system be exactly like topics.txt, subTopics.txt, and results.txt in the example? In particular, will we be given the Wikipedia senses for a given query, as in subTopics.txt?
2. What should the output look like? Exactly like STRel.txt? The task description says: "... WSD/WSI must provide a score for each snippet in each cluster and must rank clusters according to their diversity." Is it assumed that clusters are ranked simply by their order of appearance in the STRel.txt file, and likewise for the order of links within each cluster? What the evaluation procedure needs is not a score for each snippet in each cluster, but their order, am I right?
3. Will we be given the search queries as they were submitted to the search engine, or should we derive them from the topics.txt file? (E.g., was the "stephen_king" query "Stephen King" or "stephen king"?)
4. Are there any constraints on the system? For example, should it use only the information from the text snippets to cluster the results, or can it also fetch the original webpage each result points to?

Roberto Navigli

unread,
Oct 3, 2012, 5:18:52 AM10/3/12
to semeval-2013-ws...@googlegroups.com
Dear Mateusz,

thanks for your interest in the task! Very good questions (we will update the Semeval page based on our answers).

2012/10/1 Mateusz Kopeć <mkop...@gmail.com>

I want to clarify a few things about the task description:

1. Will the input files to the system be exactly like topics.txt, subTopics.txt, and results.txt in the example? In particular, will we be given the Wikipedia senses for a given query, as in subTopics.txt?

The only files that will be provided are topics.txt and results.txt, because subTopics.txt is the sense inventory you should aim to induce automatically.
 
2. What should the output look like? Exactly like STRel.txt? The task description says: "... WSD/WSI must provide a score for each snippet in each cluster and must rank clusters according to their diversity." Is it assumed that clusters are ranked simply by their order of appearance in the STRel.txt file, and likewise for the order of links within each cluster? What the evaluation procedure needs is not a score for each snippet in each cluster, but their order, am I right?

The clusters have to be ranked so as to be as diversified as possible. For instance, providing the ranking (C1, C2, C3) differs from (C1, C3, C2) if, e.g., C1 and C2 contain well-diversified snippets while C3 contains snippets similar to those in C1. The accepted output format will be exactly that of STRel.txt. However, we are also working on allowing participants to produce output in the same format as previous SemEval WSI competitions. We will keep you posted on this.
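The diversity-based ranking described above could be approached, for instance, with a greedy re-ranking. The sketch below is only an illustration, not the official evaluation: it assumes clusters are lists of snippet strings and uses Jaccard word overlap as a stand-in similarity measure (both choices are ours, not specified by the task).

```python
# Hypothetical sketch of diversity-based cluster ranking: greedily pick
# the next cluster that is least similar to the clusters already ranked.

def words(cluster):
    """Bag of lowercase words covering all snippets in a cluster."""
    return set(w.lower() for snippet in cluster for w in snippet.split())

def jaccard(a, b):
    """Jaccard similarity between two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def rank_by_diversity(clusters):
    """Start with the largest cluster, then repeatedly append the cluster
    whose maximum word overlap with already-ranked clusters is lowest."""
    bags = [words(c) for c in clusters]
    remaining = list(range(len(clusters)))
    order = [max(remaining, key=lambda i: len(clusters[i]))]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining,
                  key=lambda i: max(jaccard(bags[i], bags[j]) for j in order))
        order.append(nxt)
        remaining.remove(nxt)
    return [clusters[i] for i in order]

# Toy example: two "horror novelist" clusters and one "blues musician" cluster.
clusters = [
    ["stephen king horror novels", "king novels the shining"],
    ["stephen king horror books", "the shining horror"],   # similar to cluster 0
    ["b b king blues guitarist", "blues legend b b king"],
]
ranked = rank_by_diversity(clusters)
# The blues cluster is promoted ahead of the near-duplicate horror cluster.
```

Under this heuristic, the second horror cluster drops to last place because it adds little diversity over the first, which matches the (C1, C2, C3) vs. (C1, C3, C2) intuition above.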
 
3. Will we be given the search queries as they were submitted to the search engine, or should we derive them from the topics.txt file? (E.g., was the "stephen_king" query "Stephen King" or "stephen king"?)

Queries were formulated as "<query>" where <query> is exactly the query you find in topics.txt.
 
4. Are there any constraints on the system? For example, should it use only the information from the text snippets to cluster the results, or can it also fetch the original webpage each result points to?

There is no specific constraint, but the systems have to be unsupervised, i.e., they cannot use existing sense inventories. Yes, you can use both the snippets and the original webpages.

Note only that we will ask all participants to provide details about the resources they use in their system.

All the best,
Roberto Navigli

--
=====================================
Roberto Navigli
Dipartimento di Informatica
SAPIENZA Universita' di Roma
Via Salaria, 113 (now in: Viale Regina Elena 295)
00198 Roma Italy
Phone: +39 06 49255364 - Fax: +39 06 8541842
Home Page: http://wwwusers.di.uniroma1.it/~navigli
=====================================

Ted Pedersen

unread,
Jan 6, 2013, 7:19:29 AM1/6/13
to semeval-2013-ws...@googlegroups.com
Hi Roberto,
 
In the note below you mention perhaps working on an output format consistent with previous WSI semeval formats. Does that continue to be a possibility?
 
Also, has a date been set for the release of test data?
 
Cordially,
Ted

Roberto Navigli

unread,
Jan 7, 2013, 11:58:52 AM1/7/13
to semeval-2013-ws...@googlegroups.com
Hi Ted,

the format will be the same as in previous WSI SemEval evaluations. The test set will be released as soon as the evaluation starts, i.e., on March 1st, though it will be ready by February 15.

We are about to release the evaluation suite, so that participants can start to test their algorithms on the development set. 

Do not hesitate to ask more questions should you need further information.

Best,
Roberto


2013/1/6 Ted Pedersen <dulu...@gmail.com>