Multitask Training: Direct Speech-to-Speech Translation With Discrete Units

saranya v

Sep 12, 2023, 1:02:47 PM

to fairseq Users

Hi All,

I am trying to implement Direct Speech-to-Speech Translation With Discrete Units (fairseq/examples/speech_to_speech/docs/direct_s2st_discrete_units.md at main · facebookresearch/fairseq (github.com)). I have a few questions.

I trained a model with multitask data on a fr-to-en dataset, but the source and target task losses look very high, and the multitask model shows no improvement over the model trained without multitask data (S2UT reduced, no auxiliary task: 38.4; S2UT reduced, sc + tc: 37.8).

I am using characters for both the source and the target tasks. Below are samples of the multitask data for the two tasks (source letter and target letter).

Source_letter:

train.tsv

id tgt_text

dict.txt

| 1917002
e 1314627
t 927951

Target_letter:

train.tsv

id tgt_text
common_voice_fr_17299458 t h e | f l o o r | i s | t o | m r s | a n n i e | l e | h o u e r o u
common_voice_fr_17299459 t h a t | i s | c l e a r

dict.txt:

| 1881417
e 1570568
s 831638
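For reference, here is a minimal sketch of how letter targets and a dict.txt in the shape shown above could be produced and sanity-checked. It assumes the transcripts are plain text, that "|" marks word boundaries, and that dict.txt lists one "symbol count" pair per line; the function names (to_letters, count_symbols, dict_lines, unknown_symbols) are hypothetical helpers, not part of fairseq, and no punctuation stripping is done.

```python
from collections import Counter


def to_letters(text: str) -> str:
    """Lowercase the text and space-separate its characters, with '|' between words."""
    words = text.lower().split()
    return " | ".join(" ".join(word) for word in words)


def count_symbols(letter_rows):
    """Count every space-separated symbol over (id, letters) rows."""
    counts = Counter()
    for _, letters in letter_rows:
        counts.update(letters.split())
    return counts


def dict_lines(counts):
    """Render 'symbol count' lines, most frequent first (ties broken by symbol)."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{symbol} {n}" for symbol, n in ordered]


def unknown_symbols(letter_rows, vocab):
    """Return symbols that occur in the rows but are missing from the vocabulary."""
    seen = set()
    for _, letters in letter_rows:
        seen.update(letters.split())
    return seen - set(vocab)


# Example with the second target row from the post (the id is taken from the sample).
rows = [("common_voice_fr_17299459", to_letters("That is clear"))]
counts = count_symbols(rows)
print(rows[0][1])  # t h a t | i s | c l e a r
print("\n".join(dict_lines(counts)))
print(unknown_symbols(rows, counts.keys()))  # set()
```

Running unknown_symbols against the real train.tsv rows and the symbols in dict.txt is a quick way to rule out a vocabulary mismatch (for example, accented characters on the French side) before looking at the loss values themselves.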

Could anyone please help me understand why I am not getting an improvement, or whether I am doing something wrong?

Thanks in advance.

Regards,

Saranya V
