Wrong target words

Mike Kuklin

Mar 26, 2024, 2:12:22 PM
to AXOLOTL-24
Good evening!

We have found that the Finnish datasets, including the test dataset, have problems with target words. For example, some of them start with symbols like "[" or "(" and end with ",". Could you please fix that?

Kuklin Mike


Screenshot from 2024-03-26 21-07-06.png
Screenshot from 2024-03-26 21-08-41.png

Timothee Mickus

Mar 27, 2024, 1:35:27 AM
to AXOLOTL-24
Hi!

As noted previously, the orth column and the associated indices are regarded as non-essential for the shared task. They are provided as is: we assumed this extra information could prove useful to participants, but thoroughly checking the contents was not deemed a high priority for AXOLOTL-24. Participants are welcome to do further preprocessing if they feel this is relevant.

As for Finnish specifically:
1. Some of the punctuation marks in the target words within the examples of usage are intentional: they can indicate missing or reconstructed segments of text. We therefore decided against removing them globally. While commas, periods, colons and semicolons can likely be stripped safely, it would be more reasonable to preserve quotes, square brackets and hyphens.
2. As mentioned earlier, the information was not and will not be verified for the train set, owing to the time cost of fixing the automatic alignment.
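
For participants who want to do this preprocessing themselves, the guidance in point 1 could be sketched roughly as follows. This is only an illustration, not an official cleaning script: the function name `clean_target` and the sample words are hypothetical, and leaving parentheses untouched is an assumption, since the guidance above does not mention them.

```python
# Punctuation described above as likely safe to strip.
STRIPPABLE = ",.:;"


def clean_target(word: str) -> str:
    """Strip leading/trailing commas, periods, colons and semicolons.

    Quotes, square brackets and hyphens are preserved, because they may
    mark missing or reconstructed text. Parentheses are also left alone
    (an assumption; the guidance does not cover them).
    """
    cleaned = word.strip(STRIPPABLE)
    # Keep the original if stripping would leave an empty string.
    return cleaned if cleaned else word


if __name__ == "__main__":
    # Hypothetical examples, not taken from the dataset.
    for raw in ["talo,", "[talo],", '"talo".', "talo-", "(talo),"]:
        print(repr(raw), "->", repr(clean_target(raw)))
```

Note that if characters are stripped from the start of a target word, any character indices associated with the orth column would need to be shifted accordingly.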

Best,

On behalf of other organizers,
Timothee Mickus