Request to Merge Japanese Dataset into UniMorph


松﨑孝介

May 31, 2024, 3:08:08 AM
to unim...@googlegroups.com, masaya.t...@riken.jp, keisuke....@tohoku.ac.jp, kentar...@tohoku.ac.jp
Dear UniMorph Organizers,

I hope this message finds you well.
I am Kosuke Matsuzaki from Tohoku University in Japan.

I am writing to inform you that we have created a Japanese dataset in the UniMorph format. We believe it will be a valuable addition to the UniMorph project, and we would like to merge it into the main repository.

In addition, the existing Japanese dataset in UniMorph (automatically extracted from Wiktionary) is currently registered with the language code "jap". This code is considered outdated and potentially offensive in some contexts (https://en.wikipedia.org/wiki/Jap). Furthermore, changing the code to "jpn" would align it with the ISO 639-2 standard, ensuring consistency and accuracy in language representation.

I am also pleased to share that I will be presenting our dataset at the upcoming 21st SIGMORPHON Workshop at NAACL 2024.

Our dataset is already publicly available at the following URL: https://github.com/cl-tohoku/J-UniMorph.

Would it be okay to proceed with a pull request to integrate these updates?
Please let me know if there are any specific guidelines or requirements I should follow.

Thank you for your time and consideration.

Best regards,
Kosuke MATSUZAKI
Tohoku University, Japan

David Yarowsky

May 31, 2024, 8:21:47 AM
to 松﨑孝介, unim...@googlegroups.com, masaya.t...@riken.jp, keisuke....@tohoku.ac.jp, kentar...@tohoku.ac.jp
Dear J-UniMorph organizers,

First of all, please let me deeply apologize for the very unfortunate erroneous language-code usage on the 2023 inflection shared task TBA language list. I'm not sure how this error was introduced, but the erroneous code was not used elsewhere in the UniMorph project, and clearly it is appropriate to use the correct ISO 639-3 code 'jpn' in the full UniMorph repository.

Second, I've reviewed both your dataset and arXiv paper and applaud your excellent work. For many reasons it would be very welcome to incorporate this long-overdue contribution into UniMorph, and you are very welcome to proceed with a pull request.

Our warmest appreciation,

          David Yarowsky
          Professor, Department of Computer Science
          Johns Hopkins University


松﨑孝介

Jun 5, 2024, 6:55:43 AM
to yaro...@gmail.com, keisuke....@tohoku.ac.jp, masaya.t...@riken.jp, kentar...@tohoku.ac.jp, unim...@googlegroups.com
Dear David,

Thank you for allowing us to proceed with the pull request.

I noticed there was not a repository for Japanese (jpn), so I created one. However, when I attempted to transfer it, I received an error message saying, "You don't have the permission to create public repositories on unimorph."

Could you please grant me the necessary permissions to create and transfer the repository, or alternatively, could you create the repository on unimorph so that I can transfer my work there?

In order to continue contributing and making necessary improvements to the dataset, I kindly request to be registered as a maintainer for this repository. This will allow me to assist more effectively in ensuring the repository remains up-to-date and accurate.

Thank you for considering my request. I look forward to your positive response.

Best regards,
Kosuke MATSUZAKI
Tohoku University, Japan


On Fri, May 31, 2024 at 22:56, Keisuke Sakaguchi <keisuke....@tohoku.ac.jp> wrote:
Hi David,

It's great to hear from you. Thank you very much for your prompt response and for allowing us to proceed with the pull request. 
We appreciate your support and look forward to contributing to the UniMorph project.

Best,
Keisuke


Arya McCarthy

Jun 13, 2024, 10:24:46 PM
to 松﨑孝介, David Yarowsky, keisuke....@tohoku.ac.jp, masaya.t...@riken.jp, kentar...@tohoku.ac.jp, unim...@googlegroups.com
Hi Kosuke,

Thank you for writing. I’ve created the `jpn` repository and added you as a maintainer. (If I identified the wrong person, please let me know.) Thanks for your contributions!


Cheers,
Arya
