AI4LAM Speech-to-Text WG, November 25: Community-driven Speech Data with Mozilla Common Voice and Mozilla Data Collective

Owen King

Nov 20, 2025, 11:24:10 AM

to AI4LAM group

The AI4LAM Speech-to-Text Working Group invites you to its next meeting on Tuesday, November 25 at 09:00 US-Pacific | 12:00 US-Eastern | 17:00 UK | 18:00 Central Europe | 03:00 (next day) Canberra.

Topic: Community-driven Speech Data with Mozilla Common Voice and Mozilla Data Collective

The ASR systems we discuss in this group were trained on many thousands of hours of speech data. Such data have many possible sources. This week we'll be focusing on community-driven approaches to speech data, with special attention to the Mozilla Common Voice project and the wider Mozilla Data Collective platform. We'll think about how the LAM community can benefit from these projects and also how we might be able to contribute back to them. Robert Pugh from Indiana University, who is a Language Community Manager at the Mozilla Data Collective, will be joining us to help us think through these questions!

In advance of the meeting, please take a moment to brainstorm: What kinds of datasets might benefit the projects you work on? Do you steward audio collections that could be contributed to a shared dataset?

Note that, because the general AI4LAM Community Call is a week later than usual this month, our Speech-to-Text call will take place immediately following the general call. Please join us for both!

Agenda and running notes: https://docs.google.com/document/d/1lUI1l_cfJ-hM7ZXgfITyjcevUxFfc0C6HRzhc_Ui8bU

Zoom: https://stanford.zoom.us/j/99320941121?pwd=AafIBuc5maw5mcsiHYrcW7uSQmB6t5.1&from=addon

Cheers,

Owen

(on behalf of the Speech-to-Text WG conveners)

Owen King (he/him)

Metadata Operations Manager

E: owen...@wgbh.org

One Guest Street, Boston, MA 02135
