I would like to clarify a few things about the task description:
1. Will the input files to the system be exactly like topics.txt, subTopics.txt, and results.txt in the example? In particular, will we be given the Wikipedia senses for each query, as in subTopics.txt?
2. What should the output look like? Exactly like STRel.txt? The task description says: "... WSD/WSI must provide a score for each snippet in each cluster and must rank clusters according to their diversity." Will it be assumed that clusters are ranked simply by their order of appearance in the STRel.txt file, and likewise the links by their order within each cluster? What the evaluation procedure needs is not a score for each snippet in each cluster but their order, am I right?
3. Are we going to be given the search queries as they were submitted to the search engine, or should we derive them from the topics.txt file? (For example, was the "stephen_king" query submitted as "Stephen King" or "stephen king"?)
4. Are there any constraints on the system? For example, should it use only the information from the text snippets to cluster results, or can it also fetch the original webpage that a result points to?