Hi Nikolay,
Thanks for the feedback!
Indeed, the `test_ru_1126` instance was missing the correct example by
mistake. We are sorry for that. It is now fixed in the GitHub repository.
As for the Russian test instances from the "old" time period you
mentioned, most of them actually do contain the target word, just in an
archaic or derivative form. For these, we are sticking to our dataset
source. However, 4 of the instances from your list are indeed not
entirely correct:
test_ru_228
test_ru_568
test_ru_1867
test_ru_1965
Note that the examples from the "old" time period are not strictly
necessary for AXOLOTL'24, and can even be empty. Still, we decided to
fix them.
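For anyone who wants to re-run a similar sanity check on the updated file, here is a rough sketch. It is only an illustration, not the indexer used in the report and not our own tooling. The `example` field is the one discussed here; the column names `usage_id` and `word` are assumptions, so adjust them to the actual TSV header. The `demo_*` rows are made up for the illustration.

```python
import csv
import io


def _normalize(text):
    # Case-insensitive comparison; treat "ё" and "е" as the same letter.
    return text.casefold().replace("ё", "е")


def find_missing_target(tsv_text, id_col="usage_id", word_col="word",
                        example_col="example"):
    """Return ids of rows whose non-empty example lacks the target word.

    This is a plain substring match, so archaic spellings and derivative
    forms of the target word will be flagged as well.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    flagged = []
    for row in reader:
        example = (row.get(example_col) or "").strip()
        word = (row.get(word_col) or "").strip()
        if not example:
            continue  # empty examples are allowed, so skip them
        if _normalize(word) not in _normalize(example):
            flagged.append(row[id_col])
    return flagged


# Tiny in-memory sample: the first row mirrors the reported broken instance,
# the demo_* rows are hypothetical.
sample = (
    "usage_id\tword\texample\n"
    "test_ru_1126\tкоролёк\tъ\n"
    "demo_1\tкоролёк\tВ саду пел королек.\n"
    "demo_2\tкоролёк\t\n"
)
flagged = find_missing_target(sample)
```

Because the match is literal, a flagged row is only a candidate for manual inspection, not necessarily a dataset error.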
To sum up, we updated the contents of the `example` field in 1 "new" and
4 "old" instances out of 2126 total in the Russian test set. We believe
this will not influence the evaluation scores significantly, but we still
encourage all participants to regenerate their predictions on the
updated version of the dataset:
https://github.com/ltgoslo/axolotl24_shared_task/blob/main/data/test/axolotl.test.ru.tsv
Thanks again for your feedback. We will get in touch regarding the train
and development sets after the shared task is over.
On 08.04.2024 04:20, nick arefyev wrote:
> Looking at the test set in Russian, one example looks broken:
>
> test_ru_1126: the example is "ъ", does not contain the target word "королёк"
>
> Looks like the task doesn't make sense for this example.
> Also some old usages from the test set don't contain target words
> announced. Here I attach the file with
> examples where our indexer could not find usages of the target word
> specified. Some of them are errors of the indexer, but lots of them
> really are dataset errors. Hopefully, this will help you to improve
> quality of your dataset. Let me know if you want to collaborate on that,
> I can build similar files for dev/train sets.
>
--
Andrey
Language Technology Group (LTG)
University of Oslo