"Scaling Laws for Multilingual Neural Machine Translation" - Patrick Fernandes (IST, CMU, Google Brain)

Diogo Pernes

Apr 26, 2023, 10:00:07 AM
to priberam_...@googlegroups.com, isr-...@isr.tecnico.ulisboa.pt, si...@omni.isr.ist.utl.pt

Dear all,

We are pleased to announce that we will hold the fifth session of this year's Priberam Machine Learning Seminars next Tuesday, May 2. Our featured speaker will be Patrick Fernandes, a PhD candidate at Carnegie Mellon University and Instituto Superior Técnico and a part-time Researcher at Google Brain. He will present his research on the scaling properties of multilingual neural machine translation models, investigating how model size and the composition of the training mixture affect translation performance.

The event will take place at 1 PM at Instituto Superior Técnico (room PA2), and lunch bags will be provided for attendees. To learn more about the event and to register (registration is mandatory if you plan to attend), please follow the link below:


We look forward to seeing you all there!

Kind regards,
Diogo Pernes


Priberam is hiring!
If you are interested in working with us, please consult the available positions at priberam.com/careers.

PRIBERAM SEMINARS

__________________________________________________

Priberam Machine Learning Lunch Seminar
Speaker: Patrick Fernandes (CMU, IST, and Google Brain)
Venue: Instituto Superior Técnico (room PA2)
Date: Tuesday, May 2, 2023
Time: 1 PM 
Title:
Scaling Laws for Multilingual Neural Machine Translation
Abstract:
In this talk, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in model size affect model performance and investigate the role of the training mixture composition in the scaling behavior. We find that changing the weights of the individual language pairs in the training mixture only affects the multiplicative factor of the scaling law. In particular, we observe that multilingual models trained with different mixing rates all exhibit the same scaling exponent. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, the direction of the multilinguality plays a significant role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, significantly reducing the effort required for language balancing in large multilingual models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.
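(For readers unfamiliar with scaling-law notation, here is a minimal sketch of the kind of per-language-pair law the abstract describes, written in our own illustrative notation; the symbols \alpha, \beta_i, L_i^{\infty}, and p_i below are assumptions for exposition, not necessarily the exact formulation used in the paper. With N the number of model parameters and p_i the weight of language pair i in the training mixture, the observations above are consistent with a power law of the form

    L_i(N, p_i) \approx \beta_i(p_i) \, N^{-\alpha} + L_i^{\infty},

where the exponent \alpha is shared across mixture weights and only the multiplicative factor \beta_i(p_i) depends on the weighting. Under such a law, an effective number of parameters for pair i can be read off by matching its loss to that of a model trained only on that pair:

    N_i^{\mathrm{eff}} = \big( \beta_i(p_i) / \beta_i(1) \big)^{-1/\alpha} \, N. )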
Short Bio:
Patrick Fernandes is a dual-degree PhD student at Carnegie Mellon University and Instituto Superior Técnico, supervised by Graham Neubig and André Martins. He is also a part-time Student Researcher at Google Brain. His main research interests revolve around understanding how neural networks learn, how they use the information they are provided to make decisions, and how to improve this decision process, with a special emphasis on Machine Translation. In a past life, he also worked with Graph Neural Networks for source code understanding.
