Hello Common Voice Contributor!
We are excited to announce the second dataset release of 2022: Common Voice 9!
Your incredible contributions and community activities have made this latest version of the Common Voice Dataset possible. You can download the Common Voice dataset here for free. The dataset now contains more than 20,000 hours of speech!
It has doubled in size this year and now covers 94 languages, making it the most diverse multilingual speech corpus in the world.
Dataset Highlights
Twenty-seven languages now have at least 100 hours of speech data. They include Bengali, Thai, Basque, and Frisian.
Nine languages now have at least 45% of their gender tags as female. They include Marathi, Dhivehi, and Luganda.
We’re excited to welcome the languages of Tigre, Taiwanese (Minnan), Meadow Mari, Bengali, Toki Pona and Cantonese to the dataset.
We would also like to congratulate the Igbo, Catalan, Urdu, Norwegian Nynorsk and Marathi communities on their amazing dataset growth.
What can I do next?
Care about tech being more inclusive?
Share via your social media: I am a #CommonVoice contributor. We are making voice technology better for languages spoken across the world. Join us by visiting commonvoice.mozilla.org
Already using the Common Voice dataset?
Let us know what you're building via social media using the #CommonVoice hashtag, or on Community Discourse.
Hillary Juma
Common Voice Community Manager
Mozilla Foundation