Time delay neural network implementation

speechMachine

Feb 3, 2016, 1:47:29 PM

to kaldi-help

Hello,

I have a brief question: does Kaldi have an implementation of a time-delay neural network (TDNN)?

Thanks!

Daniel Povey

Feb 3, 2016, 1:49:37 PM

to kaldi-help

Yes. Search for TDNN in the scripts; it's the default recipe in many of them.

speechMachine

Feb 3, 2016, 2:16:27 PM

to kaldi-help

Thanks Dan. I did a grep and found the recipes that have the implementation. From the comments, I understand that the relevant paper is the multi-splice paper http://www.danielpovey.com/files/2015_interspeech_multisplice.pdf. I'm assuming the nnet3 binary for TDNN supports the layered architecture shown in Fig. 1, i.e. a TDNN with subsampling at multiple layers. Is that right?

Is a basic single-layer TDNN essentially the same as the ordinary splicing of frames over a fixed context window that regular DNNs use?

Daniel Povey

Feb 3, 2016, 2:19:59 PM

to kaldi-help

> Thanks Dan, I did a grep and found the recipes that do have the implementation. From the comments I understand that the referring paper would be the multi-splice paper http://www.danielpovey.com/files/2015_interspeech_multisplice.pdf. I'm assuming the nnet3 binary for TDNN allows the layered architecture portrayed in Fig 1 which allows TDNN with subsampling at multiple layers, is that right?

Yes. Well, there's no specific binary for that; it's done via config files. Any recipe that uses 'steps/nnet3/train_tdnn.sh' is a TDNN recipe. The corresponding nnet2 recipes are steps/nnet2/train_multisplice*.sh.

> Is a basic TDNN in a single layer essentially no different from a general splicing of the frames with a certain context window as is done in regular DNNs?

Yes, if there is just one layer.

Dan
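
To illustrate the "yes, if there is just one layer" answer, here is a minimal pure-Python sketch (not Kaldi code; the function names `splice`, `affine` and `conv_over_time` are hypothetical). It checks numerically that splicing frames over a context window of offsets -2..2 and then applying one affine transform gives the same output as a one-layer TDNN written as a 1-D convolution over time, with one weight matrix per offset. The assumption here is that edge frames without full context are simply dropped rather than padded.

```python
import random

def splice(frames, offsets):
    """Concatenate frames[t + o] for every offset o, at each t where all the
    offsets stay inside the utterance (edge frames are dropped, no padding)."""
    lo, hi = min(offsets), max(offsets)
    spliced = []
    for t in range(-lo, len(frames) - hi):
        row = []
        for o in offsets:
            row.extend(frames[t + o])
        spliced.append(row)
    return spliced

def affine(x, W, b):
    """y = W x + b for one input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def conv_over_time(frames, offsets, kernels, b):
    """The same layer written as a 1-D convolution over time: one weight
    matrix per offset, with the per-offset contributions summed."""
    lo, hi = min(offsets), max(offsets)
    out = []
    for t in range(-lo, len(frames) - hi):
        y = list(b)
        for o, K in zip(offsets, kernels):
            for i, row in enumerate(K):
                y[i] += sum(w * xi for w, xi in zip(row, frames[t + o]))
        out.append(y)
    return out

random.seed(0)
in_dim, out_dim, offsets = 3, 4, [-2, -1, 0, 1, 2]
frames = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(10)]
kernels = [[[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
           for _ in offsets]
b = [random.gauss(0, 1) for _ in range(out_dim)]

# Weight matrix for the spliced input: the per-offset kernels side by side,
# in the same order as the offsets used for splicing.
W = [sum((K[i] for K in kernels), []) for i in range(out_dim)]

spliced_out = [affine(x, W, b) for x in splice(frames, offsets)]
conv_out = conv_over_time(frames, offsets, kernels, b)
max_diff = max(abs(p - q) for rp, rq in zip(spliced_out, conv_out)
               for p, q in zip(rp, rq))
print(len(spliced_out), max_diff < 1e-9)  # 6 True
```

The two only differ in the order of floating-point additions, so the outputs agree up to rounding. The difference between a TDNN and a plain spliced DNN appears only once several such layers are stacked, each with its own splice offsets.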

Vijayaditya Peddinti

Feb 3, 2016, 2:44:49 PM

to kaldi-help

The core element of the TDNN is the splicing layer (the Append descriptor in nnet3), which can splice non-contiguous indices within the given context. Once this spliced input is formed, it is simply processed by an ordinary affine layer.

--Vijay
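
The splice-then-affine structure, stacked over several layers with non-contiguous offsets, can be sketched in pure Python as below. This is an illustration only, not Kaldi's implementation: the function names are hypothetical, `append` merely mimics what an nnet3 expression like `Append(Offset(x, -1), Offset(x, 2))` computes, the per-layer offsets are made up for the example, and the ReLU nonlinearity is an assumption. The sketch walks from the output layer down to find which times each layer must actually be evaluated at, which is the subsampling idea raised earlier in the thread.

```python
import random

def append(values, t, offsets):
    """Mimics Append(Offset(x, o1), Offset(x, o2), ...): concatenate the
    vectors at times t + o; the offsets need not be contiguous."""
    out = []
    for o in offsets:
        out.extend(values[t + o])
    return out

def relu_affine(x, W, b):
    """Affine transform followed by a ReLU (the nonlinearity is an assumption)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def required_times(output_times, offsets_per_layer):
    """Walk from the top layer down. Entry 0 is the set of input frames needed;
    entry k (k >= 1) is the set of times at which layer k must be evaluated."""
    needed = [set(output_times)]
    for offsets in reversed(offsets_per_layer):
        needed.insert(0, {t + o for t in needed[0] for o in offsets})
    return needed

def forward(inputs, output_times, offsets_per_layer, params):
    """inputs maps time -> feature vector. Each layer is evaluated only at
    the times the layer above asks for."""
    needed = required_times(output_times, offsets_per_layer)
    values = inputs
    for k, (offsets, (W, b)) in enumerate(zip(offsets_per_layer, params), start=1):
        values = {t: relu_affine(append(values, t, offsets), W, b)
                  for t in needed[k]}
    return values, needed

# Hypothetical per-layer splice offsets, chosen only for illustration.
layers = [[-2, -1, 0, 1, 2], [-1, 2], [-3, 3], [-7, 2]]
in_dim, hid = 4, 8
random.seed(0)

params, dim = [], in_dim
for offsets in layers:
    W = [[random.gauss(0, 0.1) for _ in range(dim * len(offsets))]
         for _ in range(hid)]
    params.append((W, [0.0] * hid))
    dim = hid

input_times = required_times([0], layers)[0]
inputs = {t: [random.gauss(0, 1) for _ in range(in_dim)] for t in input_times}
outputs, needed = forward(inputs, [0], layers, params)
print(min(needed[0]), max(needed[0]))  # -13 9
print([len(s) for s in needed])        # [23, 7, 4, 2, 1]
```

With these example offsets, one output frame depends on input frames -13..9, but layers 1 to 3 only need to be evaluated at 7, 4 and 2 time steps. Evaluating every layer at every frame in its range would instead take 19, 16 and 10 evaluations.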