#DS4SocietySeminar DS@UP🚀 - Aliya Bhatia and Gabriel Nicholas - Lost in Translation, Large Language Models and Non-English Content Analysis

Vukosi Marivate

Sep 6, 2023, 4:09:20 AM
to mlds-...@googlegroups.com, masakh...@googlegroups.com, ds...@googlegroups.com

2023 Data Science for Society Seminar Series #DS4SocietySeminar

 
Join us for our 2023 Data Science for Society Seminar Series talk.

Topic: Lost in Translation, Large Language Models and Non-English Content Analysis
Speakers: Aliya Bhatia and Gabriel Nicholas, Center for Democracy & Technology
Sign Up: RSVP
Date: 15 September 2023
Time: 2:00 PM-3:00 PM SAST
 
Abstract
In recent years, large language models (e.g., OpenAI’s GPT-4, Meta’s LLaMA, Google’s PaLM) have become the dominant approach for building AI systems to generate and analyze language online. However, most of the automated systems that increasingly mediate our interactions online -- such as chatbots, content moderation systems, and search engines -- are primarily designed for English and work far more effectively in English than in the world’s other 7,000 languages. Recently, researchers and technology companies have attempted to extend the capabilities of large language models into languages other than English by building what are called multilingual language models. In this talk, we will explain how these multilingual language models work and explore their capabilities and limits. We will also talk more broadly about how companies, researchers, and policymakers can raise the bar of language resourcing and work towards building tools that work equitably across languages and speakers.

Author Bios
Aliya Bhatia is a policy analyst on CDT’s Free Expression team, which works to promote users’ free expression rights in the United States and around the world. Aliya works on issues regarding online safety and content moderation, and is dedicated to upholding media freedom and creative expression online.

Gabriel Nicholas is a Research Fellow at the Center for Democracy & Technology, where his research focuses on automated content moderation and data governance. He is also a joint fellow at the NYU School of Law Information Law Institute and the NYU Center for Cybersecurity. Gabriel is a software engineer by training and has a Master's in Information Management and Systems from the UC Berkeley School of Information. His written work has appeared in academic journals, law reviews, and journalistic outlets, including The Atlantic, The Washington Post, Slate, and Wired. His website can be found here.

Seminars are hosted by the Data Science for Social Impact Research Group, in the Department of Computer Science at the University of Pretoria.

Connect with us

2023 Schedule

Date        Speaker          Organisation
21/07/2023  Maxamed Ahmed    Microsoft Africa Research Institute (MARI)
18/08/2023  Chijioke Okorie  University of Pretoria, Department of Private Law
15/09/2023  Aliya Bhatia     Center for Democracy & Technology
13/10/2023  Thipe Modipa     University of Limpopo, Department of Computer Science
10/11/2023
08/12/2023  Dalton Lunga     Oak Ridge National Laboratory

If you have any questions, suggestions or comments, feel free to contact Fiskani Banda at fiskan...@tuks.co.za 


This message and attachments are subject to a disclaimer.
Please refer to http://upnet.up.ac.za/services/it/documentation/docs/004167.pdf 
for full details.