Call for Participation: Machine Translation Shared Task, DravidianLangTech-EACL 2021


Bharathi Raja Asoka Chakravarthi

Nov 30, 2020, 2:45:46 AM
to SIGMT

DravidianLangTech-2021

------------------------------------------
Call for Participation
--------------------------------------------

First Workshop on Speech and Language Technologies for Dravidian Languages.

 at EACL, 19-20 April 2021

 Workshop website: https://dravidianlangtech.github.io/2021/index.html

 ----------------------------------------------

 

As technology develops, internet use grows, and most of the world's languages have adopted it. Local and under-resourced languages, however, still pose challenges because they lack effective language technology [1]. One such language family is the Dravidian family. Dravidian languages are spoken mainly in South India and Sri Lanka, with pockets of speakers in Nepal, Pakistan, Malaysia, other parts of India, and elsewhere in the world. Although Dravidian languages are 4,500 years old [2], they remain under-resourced in speech and natural language processing [1]. The Dravidian languages are divided into four groups: South, South-Central, Central, and North. Dravidian morphology is agglutinating and exclusively suffixal. Syntactically, Dravidian languages are head-final and left-branching, and they have free constituent order. Speech and language technologies are needed to improve access to and production of information for monolingual speakers of Dravidian languages. The aim of this workshop series is to keep the Dravidian languages from becoming extinct in technology. This is the first workshop on speech and language technologies for Dravidian languages.


The broader objectives of DravidianLangTech-2021 are:

  • To investigate challenges related to speech and language resource creation for Dravidian languages.
  • To promote research in speech and language technology for Dravidian languages.
  • To adopt appropriate language technology models that suit Dravidian languages.
  • To provide opportunities for researchers from the Dravidian language community around the world to collaborate with other researchers.

DravidianLangTech-2021 welcomes theoretical and practical paper submissions on any Dravidian language (Tamil, Kannada, Malayalam, Telugu, Tulu, Allar, Aranadan, Attapadya Kurumba, Badaga, Beary, Betta Kurumba, Bharia, Bishavan, Brahui, Chenchu, Duruwa, Eravallan, Gondi, Holiya, Irula, Jeseri, Kadar, Kaikadi, Kalanadi, Kanikkaran, Khiwar, Kodava, Kolami, Konda, Koraga, Kota, Koya, Kurambhag Paharia, Kui, Kumbaran, Kunduvadi, Kurichiya, Kurukh, Kurumba, Kuvi, Madiya, Mala Malasar, Malankuravan, Malapandaram, Malasar, Malto, Manda, Muduga, Mullu Kurumba, Muria, Muthuvan, Naiki, Ollari, Paliyan, Paniya, Pardhan, Pathiya, Pattapu, Pengo, Ravula, Sholaga, Thachanadan, Toda, Wayanad Chetti, and Yerukala) that contribute to research in language processing, speech technologies, or resources for these languages. We particularly encourage studies that address practical applications or improve resources for a given language in the field.

We invite submissions on topics that include, but are not limited to, the following:


  • Code-mixing/Code-switching

  • Cognitive Modeling and Psycholinguistics

  • Computer-assisted language learning (CALL)

  • Corpus development, tools, analysis and evaluation

  • COVID-19 alerts, NLP applications for emergency situations and crisis management

  • Equality, Diversity, and Inclusion

  • Fake News, Spam, and Rumor Detection

  • Hate speech and offensive language detection

  • Lexicons and Machine-readable dictionaries

  • Linguistic Theories, Phonology, Morphological analysis, Syntax and Semantics

  • Machine Translation, Sentiment Analysis, and Text summarization

  • Multimodal Analysis

  • Speech technology and Automatic Speech Recognition

--------------------------------------------
Invited Speakers
-------------------------------------------- 

Vasu Renganathan, Department of South Asia Studies, University of Pennsylvania, Philadelphia (Confirmation Awaited)

John Phillip McCrae is a lecturer above-the-bar at the Data Science Institute and Insight Centre for Data Analytics at the National University of Ireland Galway and the leader of the Unit for Linguistic Data. The topic of the talk: “Under-resourced Languages

--------------------------------------------
Important Dates
-------------------------------------------- 

Jan 18th     -  Workshop paper due
Feb 18th     -  Notification of acceptance
March 1st    -  Camera-ready paper due
April 19-20  -  Workshop dates


Submission can be made through this link.

 ------------------------------------------
Shared Tasks
-------------------------------------------- 

Shared Task on Machine Translation in Dravidian languages (https://competitions.codalab.org/competitions/27650)

As technology develops, internet use grows, and most of the world's languages have adapted to it. Local and under-resourced languages, however, still pose challenges because they lack effective language technology [1]. One such language family is the Dravidian family of languages. Dravidian languages are spoken mainly in South India, with small pockets of speakers in Nepal, Pakistan, Sri Lanka, and a few other places in South Asia. Machine translation is needed to improve access to and production of information for monolingual speakers of Dravidian languages. This shared task will be evaluated using both automatic evaluation metrics and human evaluation; see the Evaluation tab on the competition page for more details.

Offensive language identification is a classification task in natural language processing (NLP) whose aim is to help moderate and minimise offensive content on social media. The goal of this task is to identify offensive language content in a code-mixed dataset of comments/posts in Dravidian languages (Tamil-English, Malayalam-English, and Kannada-English) collected from social media. A comment/post may contain more than one sentence, but on average each comment/post in the corpora is one sentence long. Each comment/post is annotated at the comment/post level. The dataset also exhibits class imbalance, reflecting real-world scenarios.

Traditional media such as television, radio, and newspapers are monitored and scrutinized for their content. Social media platforms, however, allow internet users to interact and contribute to their online communities without any moderation. Although most of these users are harmless, some produce offensive content because of the anonymity and freedom that social networks provide. Memes have become an integral part of online communication because of their ability to self-replicate and propagate across cultures. Most memes are meant to be funny, but some cross the line and become offensive to specific individuals or groups; such memes can be referred to as troll memes. The aim of this shared task is to bring people together to discuss the issue of trolling and to study the problem so that it can be addressed systematically.

--------------------------------------------  

Workshop contact:

dravidia...@gmail.com and bharathi...@gmail.com

------------------------------------------------------
Workshop Organizers
----------------------------------------------
Bharathi Raja Chakravarthi, Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway. 

Ruba Priyadharsini, ULTRA Arts and Science College, Madurai.

Anand Kumar M, National Institute of Technology Karnataka Surathkal, India.

Parameshwari K, Centre for Applied Linguistics and Translation Studies, University of Hyderabad, India.

Elizabeth Sherly, Indian Institute of Information Technology and Management Kerala, India. 


