Join us for our 2023 Data Science for Society Seminar Series talk.
Lost in Translation: Large Language Models and Non-English Content Analysis
Aliya Bhatia and Gabriel Nicholas, Center for Democracy & Technology
Sign Up: RSVP
Date: 15 September 2023
In recent years, large language models (e.g., OpenAI’s GPT-4, Meta’s LLaMA, Google’s PaLM) have become the dominant approach for building AI systems to generate and analyze language online. However, most of the automated systems that increasingly mediate our interactions online (such as chatbots, content moderation systems, and search engines) are primarily designed for English and work far more effectively in it than in the world’s other 7,000 languages. Recently, researchers and technology companies have attempted to extend the capabilities of large language models into languages other than English by building what are called multilingual language models. In this talk, we will explain how these multilingual language models work and explore their capabilities and limits. We will also talk more broadly about how companies, researchers, and policymakers can raise the bar for language resourcing and work towards building tools that work equitably across languages and speakers.
Aliya Bhatia is a policy analyst on CDT’s Free Expression team, which works to promote users’ free expression rights in the United States and around the world. Aliya works on issues regarding online safety and content moderation, and is dedicated to upholding media freedom and creative expression online.
Gabriel Nicholas is a Research Fellow at the Center for Democracy & Technology, where his research focuses on automated content moderation and data governance. He is also a joint fellow at the NYU School of Law Information Law Institute and the NYU Center for Cybersecurity. Gabriel is a software engineer by training and has a Master's in Information Management and Systems from the UC Berkeley School of Information. His written work has appeared in academic journals, law reviews, and journalistic outlets, including The Atlantic, The Washington Post, Slate, and Wired.
Seminars are hosted by the Data Science for Social Impact Research Group, in the Department of Computer Science at the University of Pretoria.
Connect with us
If you have any questions, suggestions or comments, feel free to contact Fiskani Banda at fiskan...@tuks.co.za