Hi,
on Tuesday, February 7 (at 10:30 in N207), I will present a one-year-old paper
about weaknesses of COMET by Chantal Amrhein and Rico Sennrich:
https://arxiv.org/pdf/2202.05148.pdf
Identifying Weaknesses in Machine Translation Metrics Through Minimum Bayes Risk Decoding: A Case Study for COMET
COMET is now the most recommended automatic metric for MT evaluation (as a substitute for BLEU).
The paper shows that COMET models are not sensitive enough to discrepancies in numbers and named entities
and that these biases are hard to fully remove by simply training on additional synthetic data.
The paper also describes a way to explore and quantify such weaknesses: sample-based Minimum Bayes Risk decoding.
Looking forward to seeing you at the RG, either in person or via Zoom.
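For those who have not seen MBR decoding before, here is a minimal sketch of the sample-based variant. It is only an illustration under stated assumptions: the toy token-overlap utility stands in for COMET, and the function names and example sentences are made up, not taken from the paper. Each sampled translation is scored against the other samples (used as pseudo-references), and the one with the highest average utility is chosen.

```python
from collections import Counter


def overlap_utility(hypothesis: str, reference: str) -> float:
    """Toy stand-in for a learned metric such as COMET: token-level F1."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts the tokens shared by both sentences.
    common = sum((Counter(hyp) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(hyp)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(samples, utility=overlap_utility):
    """Pick the sample with the highest average utility against the other samples."""
    if len(samples) == 1:
        return samples[0]

    def expected_utility(i):
        # The remaining samples act as pseudo-references for candidate i.
        others = [s for j, s in enumerate(samples) if j != i]
        return sum(utility(samples[i], o) for o in others) / len(others)

    best = max(range(len(samples)), key=expected_utility)
    return samples[best]


# Hypothetical samples from an MT system (invented for this example).
samples = [
    "the meeting starts at 10:30",
    "the meeting starts at 10:30",
    "the meeting begins at 10:30",
    "the meeting starts at 11:30",
]
print(mbr_decode(samples))
```

If the utility barely penalizes a wrong number or name, a sample containing such an error can still win, which is how the MBR output can expose blind spots of the metric.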
Martin