1111 HOURS HINDI ASR CHALLENGE 2022


Sathvik Udupa

Feb 13, 2022, 10:17:08 PM
to Machine Learning News


A challenge on Automatic Speech Recognition (ASR) for Hindi is being organised as part of the INTERSPEECH 2022 Low-Resource ASR Development Special Session. We will be sharing spontaneous telephone speech recordings collected by Gram Vaani, a social technology enterprise. The regional variations of Hindi, the spontaneity of the speech, the natural background, and transcriptions of varying accuracy (a result of crowdsourcing) together make this a unique corpus for automatic recognition of spontaneous telephone speech.

More information is available at https://sites.google.com/view/gramvaaniasrchallenge/

Dataset

The dataset comprises telephone-quality speech data in Hindi from all across India. We will be releasing 1000 hours of unlabelled data and 105 hours of labelled speech data through this challenge. The details of the datasets released for this challenge are as follows:

1) Train set - 100 hours (labelled)

2) Development set - 5 hours (labelled)

3) 1000 hours of unlabelled data

Gram Vaani data consists of .mp3 files with a mix of sampling rates ranging from 8 kHz to 48 kHz, for both the 100 hours of labelled data and the 1000 hours of unlabelled data. Data can be accessed after registering here.
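Because the recordings come at mixed sampling rates, participants will probably want to bring them to one common rate before feature extraction. The sketch below is only an illustration: the 16 kHz target and the function names are assumptions, not something the challenge specifies. It computes the integer up/down factors that a polyphase resampler (for example `scipy.signal.resample_poly(x, up, down)`) would need for a given source rate.

```python
from fractions import Fraction

# Assumption: 16 kHz is a common choice for ASR front ends; the challenge
# announcement does not prescribe a target sampling rate.
TARGET_RATE_HZ = 16_000


def resample_factors(source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Return integer (up, down) so that source_rate * up / down == target_rate."""
    if source_rate_hz <= 0 or target_rate_hz <= 0:
        raise ValueError("sampling rates must be positive")
    ratio = Fraction(target_rate_hz, source_rate_hz)  # reduced automatically
    return ratio.numerator, ratio.denominator


def resampled_length(num_samples, source_rate_hz, target_rate_hz=TARGET_RATE_HZ):
    """Number of samples after resampling, rounded up to a whole sample."""
    up, down = resample_factors(source_rate_hz, target_rate_hz)
    return -(-num_samples * up // down)  # integer ceiling of num_samples * up / down


if __name__ == "__main__":
    for rate in (8_000, 16_000, 44_100, 48_000):
        print(rate, "->", TARGET_RATE_HZ, "factors:", resample_factors(rate))
```

Note that upsampling an 8 kHz recording does not restore frequency content above 4 kHz; it only makes the sample rate uniform across files.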

Challenge Overview

There are three types of challenges:

1) Closed Challenge - Participants can use only the Gram Vaani 100 hours Train dataset and the Gram Vaani 5 hours Development dataset for training models (both acoustic and language models).

2) Self Supervised Closed Challenge - Participants can use only the Gram Vaani 1000 hours unlabelled dataset, the Gram Vaani 100 hours Train dataset and the Gram Vaani 5 hours Development dataset for training models (both acoustic and language models).

3) Open Challenge - Participants can use any external/additional dataset for training models (both acoustic and language models).
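The data rules for the three challenges above can be summarised as a small lookup table. This is only an illustrative sketch: the track and dataset identifiers are hypothetical labels chosen here, not names used by the organisers.

```python
# Hypothetical identifiers for the released datasets and the three challenges.
ALLOWED_DATA = {
    "closed": {"train_100h", "dev_5h"},
    "self_supervised_closed": {"unlabelled_1000h", "train_100h", "dev_5h"},
    "open": None,  # None means any dataset, including external data, is allowed
}


def is_allowed(track, dataset):
    """Return True if `dataset` may be used for training in `track`."""
    allowed = ALLOWED_DATA[track]  # raises KeyError for an unknown track
    return allowed is None or dataset in allowed


if __name__ == "__main__":
    print(is_allowed("closed", "unlabelled_1000h"))
    print(is_allowed("self_supervised_closed", "unlabelled_1000h"))
    print(is_allowed("open", "some_external_corpus"))
```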

Important Dates

  • Release of training data: 1 February

  • Evaluation data release: 7 March

  • Closing of submission site: To be decided

  • Announcement of results: To be decided

Organisers


Please share this message with any interested colleagues.

