updates to Russian datasets


Andrey Kutuzov

Mar 9, 2024, 12:40:24 PM
to AXOLOTL-24
Dear AXOLOTL'24 participants,

We have significantly updated the Russian training and development
datasets. In particular, we:
1) fixed a bug that could lead to some words being omitted from
definitions and examples
2) standardized quotation marks
3) removed stress marks
4) did other minor cleanup tasks
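
As an illustration only, steps 2 and 3 above might look something like the following Python sketch. The announcement does not describe the organizers' scripts: the encoding of stress marks (combining accents U+0301 and U+0300 placed after the vowel) and the target quotation style (plain ASCII ") are assumptions, and the names STRESS_MARKS, QUOTES and clean are hypothetical.

```python
import re

# Assumption: stress is marked with combining accents placed after the vowel
# (U+0301 COMBINING ACUTE ACCENT, sometimes U+0300 for secondary stress).
STRESS_MARKS = re.compile("[\u0300\u0301]")

# Hypothetical target style: every quotation-mark variant becomes a plain ASCII ".
# The announcement does not say which style the organizers actually chose.
QUOTES = str.maketrans({q: '"' for q in "«»„“”‟″"})


def clean(text: str) -> str:
    """Remove stress marks and unify quotation marks (illustrative only)."""
    text = STRESS_MARKS.sub("", text)
    return text.translate(QUOTES)


print(clean("«Гла\u0301вная» „ста\u0301тья“"))  # → "Главная" "статья"
```

Stripping only the combining accents leaves precomposed letters such as й and ё untouched, which is why the sketch avoids a full Unicode decomposition.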

Please make sure to always use the latest version of the AXOLOTL datasets:
https://github.com/ltgoslo/axolotl24_shared_task/tree/main/data

Feel free to use this mailing list to ask any questions about the shared
task.
We hope to make AXOLOTL'24 a great success and a worthy addition to the
LChange'24 workshop.

On behalf of the other organizers,

--
Andrey
Language Technology Group (LTG)
University of Oslo

Mariia Fedorova

Mar 9, 2024, 3:16:50 PM
to AXOLOTL-24

Dear AXOLOTL'24 participants,


We recommend running git pull on https://github.com/ltgoslo/axolotl24_shared_task once again: we have fixed another important bug 🐞 in the data.


With kind regards,

Maria F.


From: axolo...@googlegroups.com <axolo...@googlegroups.com> on behalf of Andrey Kutuzov <and...@ifi.uio.no>
Sent: 09 March 2024 18:40:19
To: AXOLOTL-24
Subject: [axolotl] updates to Russian datasets
 
--
You received this message because you are subscribed to the Google Groups "AXOLOTL-24" group.
To unsubscribe from this group and stop receiving emails from it, send an email to axolotl-24+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/axolotl-24/181e0f16-1fb6-475c-b75e-757619987e3d%40ifi.uio.no.

Andrey Kutuzov

Mar 13, 2024, 6:57:29 PM
to axolo...@googlegroups.com
Dear AXOLOTL'24 participants,

We have updated the Russian training and development datasets once again,
making them cleaner and more consistent.

Please make sure to always use the latest version of the AXOLOTL datasets:
https://github.com/ltgoslo/axolotl24_shared_task/tree/main/data

Most important changes:
1) 19th-century spelling in definitions and examples has been
automatically converted to modern Russian orthography. For example,
"Нѣтъ, по струнамъ" is now "Нет, по струнам". We hope this will keep
the task focused on semantics rather than on the peculiarities of
historical orthography.
2) Some erroneous sense IDs have been fixed
3) Many OCR and parsing errors in definitions and usage examples have
been fixed manually
4) Redundant instances have been removed

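The orthography conversion in change 1 can be approximated by letter-level rules of the 1918 spelling reform. The following Python sketch is not the organizers' converter; it is a simplified illustration that replaces the abolished letters (ѣ, і, ѳ, ѵ) and drops the word-final hard sign. It ignores morphological changes such as the adjective endings -аго/-яго and the without-voicing of prefixes like без-/бес-. The names OLD_LETTERS, FINAL_HARD_SIGN and modernize are hypothetical.

```python
import re

# Simplified pre-1918 -> modern mapping (assumption: letter-level rules only).
OLD_LETTERS = str.maketrans({
    "ѣ": "е", "Ѣ": "Е",  # yat
    "і": "и", "І": "И",  # decimal i
    "ѳ": "ф", "Ѳ": "Ф",  # fita
    "ѵ": "и", "Ѵ": "И",  # izhitsa
})

# The hard sign written after every word-final consonant was dropped by the reform.
# Python's \b is Unicode-aware, so it also works for Cyrillic word boundaries;
# a hard sign inside a word (e.g. "объ-") is kept.
FINAL_HARD_SIGN = re.compile(r"[ъЪ]\b")


def modernize(text: str) -> str:
    """Approximate conversion of old Russian spelling to modern orthography."""
    text = text.translate(OLD_LETTERS)
    return FINAL_HARD_SIGN.sub("", text)


print(modernize("Нѣтъ, по струнамъ"))  # → Нет, по струнам
```

Real historical texts need more rules than this, so a production converter would typically add a word-level exception list on top of the character mapping.
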
The same changes were applied to the held-out test set, so there will be
no surprises when we publish it on March 25.

This week, we will finalize our Codalab instance (where you will submit
your test set predictions) and announce it on this mailing list.

Good luck!

