Couple interesting things I was made aware of recently

Daniel Povey

Oct 24, 2019, 1:27:50 PM

to kaldi-help

Just to keep people up to date...

The first is some Facebook work on transformer stuff.

https://arxiv.org/abs/1910.09799

(Right now I am doing some related experiments... the transformer is such a complicated architecture that I'm wondering whether the attention part is even the key part. Other aspects of that setup might be what matters, and the attention block could maybe be replaced by something convolutional like a TDNN layer. I've had trouble finding anything simple that's attention-based and outperforms TDNN.)
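
To make the idea concrete, here is a toy sketch of what "replace the attention block with a TDNN layer" could look like: the residual connection that normally wraps the self-attention sublayer is kept, but the sublayer itself becomes a splice-then-affine layer over a fixed set of frame offsets. This is only an illustration under assumptions, not the setup from the paper or from Kaldi: the names `splice`, `tdnn_layer`, `residual_block`, the offsets `(-1, 0, 1)`, the edge clamping, and the toy weights are all hypothetical, and layer normalization and the feed-forward sublayer are left out.

```python
def splice(frames, t, offsets):
    """Concatenate the frames at t+offset, clamping indices at the edges."""
    last = len(frames) - 1
    spliced = []
    for off in offsets:
        idx = min(max(t + off, 0), last)  # assumption: repeat edge frames
        spliced.extend(frames[idx])
    return spliced


def tdnn_layer(frames, weight, bias, offsets=(-1, 0, 1)):
    """TDNN-style layer: splice a fixed context, then affine + ReLU.

    frames: list of T feature vectors, each of length D
    weight: list of D_out rows, each of length D * len(offsets)
    bias:   list of D_out values
    """
    out = []
    for t in range(len(frames)):
        x = splice(frames, t, offsets)
        y = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weight, bias)]
        out.append([max(0.0, v) for v in y])
    return out


def residual_block(frames, weight, bias, offsets=(-1, 0, 1)):
    """Residual wrapper, in the slot where self-attention would normally sit.

    Requires D_out == D so the input can be added to the layer output.
    """
    mixed = tdnn_layer(frames, weight, bias, offsets)
    return [[a + b for a, b in zip(f, m)] for f, m in zip(frames, mixed)]


# Toy input: 4 frames of 2-dim features; hypothetical weights that simply
# average each feature dimension over the three spliced frames.
frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
third = 1.0 / 3.0
weight = [[third, 0.0, third, 0.0, third, 0.0],
          [0.0, third, 0.0, third, 0.0, third]]
bias = [0.0, 0.0]
output = residual_block(frames, weight, bias)
```

The contrast with attention is that the context here is fixed by `offsets` and mixed with learned but input-independent weights, whereas self-attention computes data-dependent weights over all frames.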

And something interesting about vocoders...

https://www.reddit.com/r/MachineLearning/comments/dmdyat/p_melgan_vocoder_implementation_in_pytorch/

Dan

Armando

Oct 24, 2019, 4:29:20 PM

to kaldi-help


"Though compared with the sequence-to-sequence or neural transducer architecture, the hybrid approach is admittedly less appealing as it is not end-to-end trained, it is still the best performing system for authors’ practical problems. It also has the advantage that it can be easily integrated with other knowledge sources (e.g., personalized lexicon) that may not be available during training."

I think it's funny that the authors feel the need to "apologize" that their system is not end-to-end trained. The reasons they give for using it (better performance, seamless integration of new knowledge sources) are big advantages over end-to-end systems.

Rémi Francis

Oct 25, 2019, 6:35:02 AM

to kaldi-help

Transformers tend to work better when they're quite big.

Armando

Oct 28, 2019, 7:25:34 AM

to kaldi-help

Like SpecAugment, apparently.

In the paper, the smallest Transformer model has about 90M parameters.

Yasser Hifny

Oct 29, 2019, 11:47:02 AM

to kaldi-help

Related to the topic: https://arxiv.org/abs/1910.10352

Daniel Povey

Oct 29, 2019, 8:57:01 PM

to kaldi-help

Interesting paper!


Yasser Hifny

Oct 30, 2019, 7:07:20 AM

to kaldi-help

Another paper: https://arxiv.org/pdf/1910.10387.pdf
