Meeting #128: [Online; 15:00] Model Selection for Cross-lingual Transfer
Karen Hambardzumyan
Dec 3, 2021, 11:15:51 AM12/3/21
to Machine Learning Reading Group Yerevan
Hi everyone,
This week Knarik Mheryan from YerevaNN will present a paper from EMNLP 2021, "Model Selection for Cross-lingual Transfer". In the traditional zero-shot cross-lingual transfer setting there is no access to target-language development/validation data (otherwise we could simply train on that data instead of transferring zero-shot and consistently get better results). The trivial alternative is to use the source-language dev set, but this has been shown to produce suboptimal results. This paper proposes an ML-based model selection method for cross-lingual transfer and shows consistently better results.
Original Abstract: Transformers that are pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target language performance between different fine-tuning runs, and in the zero-shot setup, no target-language development data is available to select among multiple fine-tuned models. Prior work has relied on English dev data to select among models that are fine-tuned with different learning rates, number of steps and other hyperparameters, often resulting in suboptimal choices. In this paper, we show that it is possible to select consistently better models when small amounts of annotated data are available in auxiliary pivot languages. We propose a machine learning approach to model selection that uses the fine-tuned model’s own internal representations to predict its cross-lingual capabilities. In extensive experiments we find that this method consistently selects better models than English validation data across twenty five languages (including eight low-resource languages), and often achieves results that are comparable to model selection using target language development data.
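To give a feel for the setup before the talk, here is a minimal, hypothetical sketch of learned model selection: each fine-tuned checkpoint is summarized by a feature vector (in the paper, derived from the model's own internal representations), a predictor is fit on the few checkpoints whose performance can be measured on auxiliary pivot-language data, and the checkpoint with the highest predicted score is selected. All names, dimensions, and the linear predictor below are illustrative assumptions, not the paper's actual features or model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 20 fine-tuned checkpoints, each summarized by a
# 5-dim feature vector (standing in for statistics of the model's
# internal representations).
n_ckpts, n_feats = 20, 5
features = rng.normal(size=(n_ckpts, n_feats))

# "True" target-language accuracy, unknown at selection time; here
# simulated as a noisy linear function of the features.
w_true = rng.normal(size=n_feats)
target_acc = features @ w_true + 0.1 * rng.normal(size=n_ckpts)

# Small amount of supervision: assume performance is measurable (via
# annotated pivot-language data) for only the first 8 checkpoints.
train_idx = np.arange(8)
w_hat, *_ = np.linalg.lstsq(
    features[train_idx], target_acc[train_idx], rcond=None
)

# Predict performance of every checkpoint and select the best one,
# without ever touching target-language dev data.
pred = features @ w_hat
best = int(np.argmax(pred))
print("selected checkpoint:", best)
```

The contrast with the baseline is that "English dev accuracy" would replace `pred` as the selection criterion; the paper's claim is that a learned predictor correlates better with target-language performance.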