Call for Shared Task - Sixth Workshop on Indian Language Data: Resources and Evaluation (WILDRE-6)

Akanksha Bansal

Feb 1, 2022, 12:54:33 AM

to centre-for-linguistics-jnu, general-ling...@googlegroups.com, peerli...@googlegroups.com, researchg...@googlegroups.com

Apologies for cross-posting. Please circulate this call for wider publicity.
......................................................................................................

Sixth Workshop on Indian Language Data: Resources and Evaluation (WILDRE-6)

Shared Tasks@LREC 2022

The Sixth Workshop on Indian Language Data: Resources and Evaluation (WILDRE-6) at LREC-2022 will include two shared tasks on (a) Speech Technologies for Under-resourced Indian Languages (SpeechTech-IL) and (b) Universal Dependency based Morpho-Syntactic Parsing in Indian Languages (UDParse-IL).

(a) Speech Technologies for Under-resourced Indian Languages (SpeechTech-IL)

Neural (deep learning) techniques underpin current state-of-the-art automated systems and have brought significant performance improvements, but they typically require large amounts of high-quality data. To advance Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) systems for low-resource languages, one notable development in neural learning is the zero-shot/unsupervised approach, which builds ASR/TTS systems for languages where audio and/or transcribed speech data is scarce or even non-existent. In this shared task, we invite participants to submit systems for under-resourced Indian languages that use novel zero-shot (or similar) methods and/or linguistically encoded features. The goal is to ascertain the effectiveness of the implemented methods for the language pairs as well as for unseen, similar languages. The languages are Hindi, Odia, Marathi and Bhojpuri. During evaluation, participants will also receive two or three surprise test sets for closely related languages. Systems will be evaluated using WER, precision, recall and F-score.
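The call does not say how WER will be computed (text normalization, tokenization, or whether scores are averaged per utterance or pooled over the corpus). As an illustration only, the sketch below assumes whitespace tokenization and corpus-level pooling, where WER = (substitutions + deletions + insertions) / number of reference words, with the edit counts taken from a word-level Levenshtein alignment. The function names are hypothetical and this is not the organizers' scorer.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)   # deleting the first i reference words
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """WER = (S + D + I) / N, pooled over all utterances; N = reference word count."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference")
    errors = words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words, hyp_words = ref.split(), hyp.split()  # assumed whitespace tokenization
        errors += word_edit_distance(ref_words, hyp_words)
        words += len(ref_words)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words


if __name__ == "__main__":
    # One substitution (जा -> जाता) and one deletion (रहा) over 5 reference words.
    print(corpus_wer(["मैं घर जा रहा हूँ"], ["मैं घर जाता हूँ"]))  # 0.4
```

Pooling errors over the whole test set, rather than averaging per-utterance WER, keeps short utterances from dominating the score; a real submission should follow whatever normalization and aggregation the organizers specify.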

Shared Task Organizers

Atul Kr. Ojha, NUI Galway, Ireland and Panlingua Language Processing LLP

Kalika Bali, Microsoft Research India

Vivek Sheshadri, MSR, India

Esha Banerjee, Google USA

Sourabrata Mukherjee, Panlingua Language Processing LLP & Charles University, Prague

Swapnil, Goa University, Goa

Manu Chopra, Karya Inc.

(b) Universal Dependency based Morpho-Syntactic Parsing in Indian Languages (UDParse-IL)

The primary objective of the UDParse-IL task is to find notable techniques for developing Universal Dependency parsers, especially for low-resource languages. Participants will be provided with training, development and test datasets annotated with dependency relations in 10 Indian languages: Bhojpuri, Hindi (including Hindi-English code-switched data), Marathi, Sanskrit, Tamil, Telugu, Urdu, Punjabi, and Magahi. We invite participants to submit systems based on novel zero-shot or few-shot methods (or other cross-lingual and multilingual methods) for these low-resource Indian languages. With the exception of Hindi and Urdu, none of the languages in this task has more than 1,350 annotated sentences. The data for the first nine languages mentioned above will be shared by UFAL, Charles University, from the Universal Dependencies (UD) repositories. We will provide test data and an evaluation platform for the participants' parsers, which will be evaluated using LAS, UAS, precision, recall and F-score. One of the primary goals of the task is to ascertain the effectiveness of the implemented methods for unseen but closely related languages, in addition to the languages for which training data is provided. To this end, the test data will include some surprise languages: their names will be revealed only at test time, when test sets for these languages will also be provided.

Shared Task Organizers

Atul Kr. Ojha, NUI Galway, Ireland and Panlingua Language Processing LLP

Ritesh Kumar, Agra University

Akanksha Bansal, Panlingua Language Processing LLP

Aryaman Arora, Georgetown University

Girish Nath Jha, JNU, New Delhi, India

Sobha L., AU-KBC, India

Shared Task Dates

January 31, 2022: Registration

February 9, 2022: Training and Validation Data Release

March 17, 2022: Test Set Release

March 24, 2022: System Submission Due

April 8, 2022: System Results

April 18, 2022: System Description Paper Due

May 3, 2022: Notification of Paper Acceptance

May 23, 2022: Camera-Ready Papers Due

Contact

For questions related to shared tasks (a) and (b), please send an email to wildre-sp...@googlegroups.com and wildre-u...@googlegroups.com respectively.

For urgent or specific queries about the workshop or the shared tasks, please contact Atul Kr. Ojha at atulkum...@insight-centre.org