CoCo4MT Shared Task: First Call for Participation
We are excited to introduce a new shared task for this year’s CoCo4MT workshop! Our aim is to encourage and facilitate research on corpus construction for low-resource machine translation.
Corpus creation for machine translation is typically constrained by the cost and availability of human translators. When a new dataset needs to be created for a low-resource language or a specialized domain, the annotation budget should be used efficiently: the sentences chosen for translation should be of high quality and as useful as possible for training a machine translation system.
In this shared task, we ask participants to propose methods for identifying such sentences for a target language that has no existing data. Specifically, given a parallel corpus between high-resource languages, the goal is to choose a subset of that corpus to be translated into the low-resource language, so that the translated subset forms a good training set for a machine translation system. The shared task winner will be the team whose selected instances yield the best final system after training.
Detailed information: https://sites.google.com/view/coco4mt/shared-task
Important dates
May 19, 2023: Release of train, dev and test data
May 30, 2023: Release of baselines
July 12, 2023: Deadline to submit results
July 20, 2023: System description papers due
Organizers (listed alphabetically)
Ananya Ganesh, University of Colorado Boulder
Constantine Lignos, Brandeis University
John E. Ortega, Northeastern University
Jonne Sälevä, Brandeis University
Katharina Kann, University of Colorado Boulder
Marine Carpuat, University of Maryland
Rodolfo Zevallos, Universitat Pompeu Fabra
Shabnam Tafreshi, University of Maryland
William Chen, Carnegie Mellon University