Fwd: Biggest Common Voice dataset yet!


Muthu A

May 1, 2022, 2:29:00 PM
to ThamiZha! - Free Tamil Computing(FTC)

Hello:
Thanks to the work of Mozilla, our Tamil volunteers, Wiki/Kaniyam, and the Mozilla Tamil Nadu friends, more than 700GB of Tamil audio-text datasets have been released here under a public license / in the public domain: https://commonvoice.mozilla.org/ta/datasets
These can be used to build artificial intelligence / learning tools.
Screen Shot 2022-05-01 at 11.24.42 AM.png
Thanks
-Muthu

---------- Forwarded message ---------
From: Hillary Juma, Mozilla <moz...@email.mozilla.org>
Date: Thu, Apr 28, 2022, 10:28 AM
Subject: Biggest Common Voice dataset yet!
To: <ezhi...@gmail.com>



 * * * Mozilla * * *
Common Voice 9 has just been released
     
  Hello Common Voice Contributor!

We are excited to announce the second dataset release in 2022 — Common Voice 9!

Your incredible contributions and community activities have made this latest version of the Common Voice Dataset possible. You can download the Common Voice dataset here for free. The dataset is now more than 20,000 hours!

It has doubled this year, and has 94 languages — the most diverse multilingual speech corpus in the world.
 
   

Dataset Highlights

Twenty-seven languages now have at least 100 hours of speech data. They include Bengali, Thai, Basque, and Frisian.

Nine languages now have at least 45% of their gender tags as female. They include Marathi, Dhivehi, and Luganda.

We’re excited to welcome the languages of Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona and Cantonese to the dataset.

We would also like to congratulate the Igbo, Catalan, Urdu, Norwegian Nynorsk and Marathi communities on their amazing dataset growth.

What could I do next?

Care about tech being more inclusive?

Share via your social media: I am a #CommonVoice contributor, we are making voice technology better for languages spoken across the world. Join us by visiting commonvoice.mozilla.org

Already using the Common Voice dataset?

Let us know what you're building via social media using the #CommonVoice hashtag or on the Community Discourse.


Hillary Juma
Common Voice Community Manager
Mozilla Foundation


 

 
 
 