Multitask Training: Direct Speech-to-Speech Translation With Discrete Units


saranya v

Sep 12, 2023, 1:02:47 PM
to fairseq Users
Hi All,

I am trying to implement Direct Speech-to-Speech Translation With Discrete Units (fairseq/examples/speech_to_speech/docs/direct_s2st_discrete_units.md at main · facebookresearch/fairseq (github.com)), and I have a few questions.

I trained a model with multitask data on a French-to-English (fr to en) dataset, but the source and target auxiliary task losses look very high, and the multitask model shows no improvement over the model trained without multitask data (S2UT reduced, no auxiliary task: 38.4; S2UT reduced, sc + tc: 37.8).

I am using characters for both the source and target tasks. Below are samples of the multitask data for each task (source letter and target letter):

Source_letter:
  train.tsv
    id                          tgt_text
    common_voice_fr_19510547    l e u r | p r o l i f é r a t i o n | c e l l u l a i r e | e s t | i m p o r t a n t e | e t | i l s | p e u v e n t | é v o l u e r | r a p i d e m e n t | v e r s | u n | g l i o b l a s t o m e
    common_voice_fr_19510553    n ' y | a r r i v a n t | p a s | i l s | s ' e n | p r e n n e n t | a u | s u p p o r t | e t | c a u s e n t | d e | n o m b r e u s e s | d é g r a d a t i o n s
    common_voice_fr_19510554    c e t t e | s a i s o n | s e r a | t o u t e | l e u r | v i e
  dict.txt
    | 1917002
    e 1314627
    t 927951

Target_letter:
  train.tsv
    id                          tgt_text
    common_voice_fr_17299458    t h e | f l o o r | i s | t o | m r s | a n n i e | l e | h o u e r o u
    common_voice_fr_17299459    t h a t | i s | c l e a r
  dict.txt
    | 1881417
    e 1570568
    s 831638
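For reference, the character-level format shown above (characters separated by spaces, "|" marking word boundaries, and a dict.txt of "<symbol> <count>" lines in descending frequency) can be produced with a short script like the one below. This is a minimal sketch, not the fairseq preprocessing script: the helper names `to_char_units` and `build_dict` are hypothetical, and the input sentences are assumed to be already normalized (casing and punctuation handled beforehand).

```python
from collections import Counter


def to_char_units(sentence: str) -> str:
    """Hypothetical helper: split an already-normalized sentence into
    space-separated characters, joining words with ' | ' as the boundary."""
    words = sentence.strip().split()
    return " | ".join(" ".join(word) for word in words)


def build_dict(unit_lines):
    """Hypothetical helper: count every unit (including '|') and return
    '<symbol> <count>' lines sorted by descending count, like dict.txt above."""
    counts = Counter()
    for line in unit_lines:
        counts.update(line.split())
    return [f"{symbol} {count}" for symbol, count in counts.most_common()]


if __name__ == "__main__":
    lines = [to_char_units("cette saison sera toute leur vie")]
    print(lines[0])
    print("\n".join(build_dict(lines)))
```

Running it on one of the sentences from the source_letter sample reproduces the same tgt_text string, so the same two steps can be used to double-check that the source and target multitask files were generated consistently.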

Could anyone help me understand why I am not getting an improvement, or whether I am doing something wrong?

Thanks in advance.

Regards,
Saranya V
