Hi everybody,
Thanks to all of you! I am sure you had a great time participating,
and that we all learned something from this experiment.
We would like to start a discussion about the scoring metrics used in
the task, especially the 'oot normal' metric, which allows duplicates
and can therefore produce scores above 100. Some systems made use of
this and obtained high precision and recall for out-of-ten (oot),
while others did not. Additionally, some systems did not supply 10
translations for oot, which put them at a disadvantage. We will not,
of course, change any of the official scores, but we would like to
give the floor to any of you who have thoughts or analysis to share.
We can discuss in this group, and then any of us might do further
analysis for discussion.
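To make the effect of duplicates concrete, here is a minimal sketch
of how a per-item oot credit could behave. It assumes the credit is
the summed gold frequency of the (up to 10) guesses divided by the
total number of gold responses for the item; this is an assumption
for illustration, not the official scorer. The function name
oot_credit, the allow_duplicates flag, and the gold counts are all
hypothetical.

```python
def oot_credit(guesses, gold_counts, allow_duplicates=True):
    """Per-item oot credit under the assumption stated above.

    guesses: list of translations supplied by a system (at most 10 used).
    gold_counts: dict mapping a gold translation to how many annotators gave it.
    """
    if not allow_duplicates:
        # Keep only the first occurrence of each guess, preserving order.
        guesses = list(dict.fromkeys(guesses))
    guesses = guesses[:10]
    total_gold = sum(gold_counts.values())
    matched = sum(gold_counts.get(g, 0) for g in guesses)
    return matched / total_gold


# Hypothetical gold annotations for one item (made-up counts).
gold = {"lado": 3, "costado": 2, "parte": 1}

ten_copies = ["lado"] * 10                 # one translation repeated 10 times
three_distinct = ["lado", "costado", "parte"]  # fewer than 10 guesses

print(oot_credit(ten_copies, gold))                          # duplicates counted
print(oot_credit(ten_copies, gold, allow_duplicates=False))  # duplicates collapsed
print(oot_credit(three_distinct, gold))                      # short, distinct list
```

Under these assumptions, repeating the most frequent gold translation
ten times earns several times the credit of a distinct list that
covers every gold translation, which is how an item can score above
100 percent.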
We do hope that you can make the meeting at Uppsala, and we are
thinking of carrying on the discussion there, perhaps over an informal
lunch.
All thoughts and comments are welcome.
Ravi