Re: [WMT-SLT sign language translation] Can't download the dataset

Mathias Müller

Jul 11, 2023, 11:52:13 AM

to estel...@gmail.com, wmt-...@googlegroups.com

Dear Xuan

(+ WMT google group in case other participants have the same questions)

Thank you for getting in touch!

On 10 Jul 2023, at 17:56, Xuan Zhang <estel...@gmail.com> wrote:

Hi Mathias,

I have several questions regarding this year’s WMT-SLT shared task.

1. I got the approval from SwissUbase but had a problem downloading the dataset from Chrome. I always got a network error because it took too long to download. I’m wondering if there is another way to download the dataset, e.g. using wget?

I am sorry to hear this. Unfortunately, there is no other way to download this data (SwissUbase currently does not allow automated downloads such as wget).

2. Are we allowed to use other datasets besides provided ones, e.g. PHOENIX 2014T, for pre-training?

Yes, you are allowed to use any dataset, as long as you declare that your submission is “unconstrained” (meaning: it is trained on other datasets that are not our primary training data). You will need to declare this when you are making a submission.

3. “The test data will consist of 50% Signsuisse and 50% SRF examples.” Does this mean that our models will be evaluated and ranked based on both test sets?

Yes, the final ranking will be based on the average performance in these two domains. But there will also be secondary rankings separated into the domains, like here: https://aclanthology.org/2022.wmt-1.71/

Are we supposed to submit separate models for each dataset?

You can either a) submit one model and therefore regard one of the domains as out-of-domain for the model, or b) submit two different models. However, you can only mark one submission as primary. Only primary submissions will be evaluated by humans.

Please let me know if this helps and kind regards

Mathias

Thank you,
Xuan Zhang (she, her, hers)
Ph.D. candidate
Center for Language and Speech Processing
Department of Computer Science
Johns Hopkins University

—

Dr. Mathias Müller
AND-2-20
Department of Computational Linguistics
University of Zurich
Switzerland
mmue...@cl.uzh.ch
