Confusing description of the shared task


nick arefyev

Feb 20, 2024, 11:16:46 AM
to AXOLOTL-24

Hello! Reading the description of the shared task, some formulations seem a bit confusing to me.

"The task is to find all usages of the target words belonging to newly gained senses, i.e., senses not covered by the provided sense inventory."

contradicts

"Predictions: sense id for every word usage of the second time period (either re-using an id from the provided dictionary or adding a novel one)."

It is also confusing how the description of the inputs corresponds to the published train/dev sets. "Inputs: a set of target words, two sets of usages for each target word (a usage is a text fragment containing a target word); target word dictionary entries with sense ids for the first of two time periods." In the published data:

1) Some examples don't contain the specified target word, e.g. dev_ru_4, train_fi_70.

2) In the Russian data there are lines with an empty string in the "Example" field, e.g. dev_ru_1, while in the Finnish data there are no such lines.

What should we return for empty examples or examples that don't contain the target word? It seems impossible to determine the correct sense id/gloss if there is actually no usage of the target word.

Could you please let us know which of these are intended and which are bugs, and what the logic behind them is?




Andrey Kutuzov

Feb 20, 2024, 11:59:18 AM
to axolo...@googlegroups.com
Hi Nick,

Thanks for your questions and for your interest in the AXOLOTL'24 shared
task!

1) In Subtask 1, the participants have to assign a sense id to _every_
target word usage in the second time period (be it an old sense or a
novel one). The description put a special emphasis on novel senses,
since they are arguably more interesting linguistically. However, the
participants still have to predict a sense for all the usages.
Thanks for your feedback; we have updated the description text to make
it clearer.
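
The prediction contract described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `assign_sense_ids`, the dictionary layouts, the token-overlap score and the threshold are all hypothetical and are not part of the official task code. The point is just that every usage from the second period receives exactly one sense id, either re-used from the inventory or newly created.

```python
from typing import Dict, Optional


def assign_sense_ids(
    new_usages: Dict[str, str],   # usage_id -> example text (hypothetical layout)
    inventory: Dict[str, str],    # sense_id -> gloss from the first period
    threshold: float = 0.2,       # arbitrary cut-off for the toy score below
) -> Dict[str, str]:
    """Return one sense id per new usage: an inventory id or a novel one."""

    def overlap(a: str, b: str) -> float:
        # Jaccard overlap of lowercased tokens; a placeholder, not a real WSD model.
        ta, tb = set(a.lower().split()), set(b.lower().split())
        return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

    predictions: Dict[str, str] = {}
    novel_count = 0
    for usage_id, text in new_usages.items():
        best_id: Optional[str] = None
        best_score = 0.0
        for sense_id, gloss in inventory.items():
            score = overlap(text, gloss)
            if score > best_score:
                best_id, best_score = sense_id, score
        if best_id is not None and best_score >= threshold:
            # Old sense: re-use the id from the provided dictionary.
            predictions[usage_id] = best_id
        else:
            # Simplification: each unmatched usage gets its own novel id;
            # a real system would group usages that share a novel sense.
            novel_count += 1
            predictions[usage_id] = f"novel_{novel_count}"
    return predictions
```

Whatever scoring method is used, the returned mapping has one entry per second-period usage, so usages with old senses are never left without a prediction.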

2) Historical data is not 100% clean. Because of that, some instances
might not contain the target word in the "example" field, or the
"example" might be empty. This reflects the noisiness of natural language.

Of course, in the test data, all the "new" usages will feature clean
examples containing the target word; the instances you mentioned are all
from the "old" time period, so you don't have to predict anything for
them. If you spot usages without examples from the "new" time period in
the train or dev sets, please send us a message and we will fix the issue.
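
A quick way to look for such instances is sketched below. It assumes the data is a tab-separated file with a header row and columns named `usage_id`, `word`, `example` and `period`; these names and the helper functions are assumptions for illustration, so adjust them to the actual files. The substring check is deliberately crude: inflected forms of the target word will not match, so flagged ids are candidates to inspect, not confirmed errors.

```python
import csv
from typing import Dict, Iterable, List


def find_noisy_usages(rows: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group ids of usages whose example is empty or lacks the target word, by period."""
    flagged: Dict[str, List[str]] = {}
    for row in rows:
        example = (row.get("example") or "").strip()
        word = (row.get("word") or "").strip()
        # Case-insensitive substring match on the listed word form only.
        if not example or word.lower() not in example.lower():
            period = row.get("period") or ""
            flagged.setdefault(period, []).append(row.get("usage_id") or "")
    return flagged


def load_rows(path: str) -> List[Dict[str, str]]:
    # Assumes a tab-separated file with a header row (hypothetical layout).
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)
```

Ids that end up under the "new" period after a manual check are the ones worth reporting; ids under the "old" period need no prediction.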

3) Please have a look at the example we created to explain the essence
of AXOLOTL'24:
https://github.com/ltgoslo/axolotl24_shared_task/blob/main/example.md


--
Andrey
Language Technology Group (LTG)
University of Oslo
