Webinar: Text Language Identification with CommonCrawl and Mozilla Data Collective
Santiago M
May 20, 2026, 9:09:49 AM
to ai4lam
Dear AI4LAM community,
I'm happy to share this webinar, which I think may be of interest to many of you.
Mozilla Data Collective is partnering with Common Crawl Foundation for a hands-on webinar on Text Language Identification for under-represented languages, featuring two new open benchmarks on the Mozilla Data Collective platform: CommonLID and CommonVoiceLID. As you well know, most language identification models work well for English. For many of the world’s other languages, they still fall short. This gap matters because it shapes what data enters AI systems, what tools work for which communities, and whose languages are treated as first-class citizens online. In this session, Laurie Burchell and Pedro Ortiz Suarez from Common Crawl Foundation and Kostis Saitas Zarkias and Robert Pugh from Mozilla Data Collective will compare frontier LLMs with standard out-of-the-box tools, train local-first low-resource models from scratch, and show how to extend the pipeline to a language you care about. Join us as we work toward technology that is more inclusive, multilingual, and multicultural!