Dear all,
Next week we have a seminar by Marzieh Fadaee from the Institute of
Informatics at UvA. The talk will take place at 12pm on Tuesday, 20
November in SP B0.203.
The title and abstract are below. Hope to see many of you there!
If you'd like to meet with Marzieh, please email her directly
(M.Fa...@uva.nl); she is based here in Science Park.
Best,
Katia
Title: Understanding Rare Properties of the Language for Neural
Machine Translation
Abstract:
The quality of a Neural Machine Translation (NMT) system depends
substantially on the availability of sizable parallel corpora. For
low-resource language pairs, or for rare properties of language in
general, such corpora are usually not available, which results in poor
translation quality.
In this talk I look into these rare properties, namely the translation
of infrequent words, idiomatic expressions, and words that are
difficult to predict, and present several data modification and data
augmentation approaches to address these challenges.
We propose a novel data augmentation approach that targets
low-frequency words by generating new sentence pairs that contain rare
words in new, synthetically created contexts. We also introduce
several sampling strategies for back-translation that use prediction
losses to target difficult-to-predict words.
Relevant publications:
Marzieh Fadaee, Christof Monz. Back-Translation Sampling by Targeting
Difficult Words in Neural Machine Translation. EMNLP 2018.
Marzieh Fadaee, Arianna Bisazza, Christof Monz. Examining the Tip of
the Iceberg: A Data Set for Idiom Translation. LREC 2018.
Marzieh Fadaee, Arianna Bisazza, Christof Monz. Data Augmentation for
Low-Resource Neural Machine Translation. ACL 2017.