Hi Mathias,
I have several questions regarding this year’s WMT-SLT shared task.
1. I got the approval from SwissUbase but had a problem downloading the dataset from Chrome. I always got a network error because it took too long to download. I’m wondering if there is another way to download the dataset, e.g. using wget?
I am sorry to hear this. Unfortunately, there is no other way to download this data (SwissUbase currently does not allow automated downloads such as wget).
2. Are we allowed to use other datasets besides provided ones, e.g. PHOENIX 2014T, for pre-training?
Yes, you are allowed to use any dataset, as long as you declare that your submission is “unconstrained” (meaning: it is trained on other datasets that are not our primary training data). You will need to declare this when you are making a submission.
3. “The test data will consist of 50% Signsuisse and 50% SRF examples.” Does this mean that our models will be evaluated and ranked based on both test sets?
Yes, the final ranking will be based on the average performance in these two domains. But there will also be secondary rankings separated into the domains, like here:
https://aclanthology.org/2022.wmt-1.71/
Are we supposed to submit separate models for each dataset?
You can either a) submit one model and therefore regard one of the domains as out-of-domain for the model, or b) submit two different models. However, you can only mark one submission as primary. Only primary submissions will be evaluated by humans.
Please let me know if this helps.
Kind regards,
Mathias
Thank you,
Xuan Zhang (she, her, hers)
Ph.D. candidate
Center for Language and Speech Processing
Department of Computer Science
Johns Hopkins University