Title:
Semantic and Style Divergences in Machine Translation
Abstract:
Neural machine translation achieves impressive quality when trained on
millions of parallel sentences (translations produced by
humans). However, there is growing evidence that parallel sentences
are not equally useful. While noise and domain mismatch naturally
affect their usefulness, we hypothesize that more subtle semantic and
style divergences, reflecting choices made by human translators, also
matter for machine translation. We first show that mismatches in the
meaning of source and target are surprisingly frequent in parallel
corpora, and have a substantial impact on neural machine translation
at both training and decoding time (Vyas, Niu and Carpuat, NAACL 2018). We
then turn to the problem of producing machine translation for a
specific audience by controlling not only the content, but also the
style of the output (Niu, Rao and Carpuat, COLING 2018).
Biography:
Marine Carpuat is an Assistant Professor in Computer Science at the
University of Maryland. Her research focuses on multilingual natural
language processing and machine translation. Marine is the recipient
of an NSF CAREER award, research awards from Google and Amazon, best
paper awards at *SEM and TALN, and an Outstanding Teaching Award. She
received her PhD in Computer Science and an MPhil in Electrical
Engineering from the Hong Kong University of Science & Technology, and
a Diplôme d'Ingénieur from the French Grande École Supélec.