About affine transforms and layer dimensions


Giovanni Rescia

Apr 24, 2017, 10:06:15 AM
to kaldi-help, Leandro Lichtensztein
Hi everyone!

I am doing some research on Kaldi, and I have a couple of questions, in particular about nnet3 and this script [1]. First, consider the following snippet:


input dim=100 name=ivector
input dim=40 name=input


# please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-2,-1,0,1,2,
ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat

# the first splicing is moved before the lda layer, so no splicing here
relu-renorm-layer name=tdnn1 dim=520  [**]
relu-renorm-layer name=tdnn2 dim=520 input=Append(-1,0,1)
fast-lstmp-layer name=lstm1 cell-dim=520 [*] recurrent-projection-dim=130 non-recurrent-projection-dim=130 decay-time=20 delay=-3


How does the input dimension get transformed from 40 to 520, and what operations are being applied? I read some material about LDA [2][3] (suggested in this group), but the idea is still not clear to me. Is the fixed-affine layer the one that changes the dimensions? If so, how?
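For concreteness, this is how I currently understand the dimension arithmetic feeding the lda layer (just my own sketch of the Append/ReplaceIndex semantics, not anything taken from the script):

```python
# My reading of the descriptor:
#   Append(-2,-1,0,1,2, ReplaceIndex(ivector, t, 0))
input_dim = 40      # per-frame acoustic features (name=input)
ivector_dim = 100   # speaker i-vector (name=ivector)
splice_offsets = [-2, -1, 0, 1, 2]

# Append concatenates the 40-dim feature vector at each of the 5 offsets,
# and ReplaceIndex(ivector, t, 0) appends one 100-dim i-vector to that.
lda_input_dim = len(splice_offsets) * input_dim + ivector_dim
print(lda_input_dim)  # 300
```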

Also, in some results[4], a configuration is specified as:


# bidirectional LSTM # -----------------------
# local/nnet3/run_lstm.sh --affix bidirectional
# --lstm-delay " [-1,1] [-2,2] [-3,3] "
# --label-delay 0
# --cell-dim 1024
# --recurrent-projection-dim 128
# --non-recurrent-projection-dim 128
# --chunk-left-context 40
# --chunk-right-context 40

The --cell-dim option is clearly setting the cell dimension in [*], but is it also affecting the relu-renorm-layer in [**]?


I would highly appreciate some clarification.

Cheers,
Giovanni

[1] https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/nnet3/tuning/run_tdnn_lstm_1a.sh
[2] http://www1.icsi.berkeley.edu/ftp/pub/speech/papers/panus_eur03.pdf
[3] http://www.danielpovey.com/files/2013_interspeech_nnet_lda.pdf
[4] https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/RESULTS

Daniel Povey

Apr 24, 2017, 1:59:18 PM
to kaldi-help, Leandro Lichtensztein
The dimensions of those layers in the xconfig-based setup are the *output* dimensions.  Implicitly there is an affine transform there, so it's affine+relu+renorm.
(renorm is a variant of BatchNorm that we were using before BatchNorm existed.)
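As a rough sketch of what a line like relu-renorm-layer name=tdnn1 dim=520 computes per frame (plain Python, not Kaldi code; the 300-dim input and the random weights are just placeholders for the trained parameters):

```python
import math
import random

random.seed(0)
input_dim, output_dim = 300, 520  # input dim assumed; 520 from dim=520

# stand-ins for the trained affine parameters
W = [[random.gauss(0, 0.01) for _ in range(input_dim)]
     for _ in range(output_dim)]
b = [0.0] * output_dim

x = [random.gauss(0, 1) for _ in range(input_dim)]  # one input frame

# affine: 300 -> 520
y = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
     for row, b_j in zip(W, b)]
# relu
y = [max(v, 0.0) for v in y]
# renorm: rescale so the root-mean-square of the vector is 1
rms = math.sqrt(sum(v * v for v in y) / len(y)) or 1.0
y = [v / rms for v in y]

print(len(y))  # 520
```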


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Giovanni Rescia

Apr 24, 2017, 3:00:25 PM
to kaldi-help, leandr...@deepvisionai.com, dpo...@gmail.com
Oh, I see. Thanks!

About the input size: in the BLSTM configuration, is it 140 x 81, that is (100 + 40) x (chunk-left-context 40 + current frame + chunk-right-context 40)? If so, will the whole matrix be mapped to a (520, 1) vector after the relu layer?

Daniel Povey

Apr 24, 2017, 3:05:58 PM
to Giovanni Rescia, kaldi-help, Leandro Lichtensztein
It's never the case that we collapse the time dimension; if there are 100 't' values before the relu layer, there would be 100 't' values after the relu layer.
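Sketching that in plain Python (shapes only; the zero-filled frames and the apply_layer helper are placeholders, not Kaldi code):

```python
# Each layer maps every frame independently, so the time axis is
# preserved: T frames in -> T frames out; only the per-frame dim changes.
def apply_layer(frames, out_dim):
    # stand-in for affine+relu+renorm: one out_dim vector per input frame
    return [[0.0] * out_dim for _ in frames]

T, d_in, d_out = 100, 300, 520
X = [[0.0] * d_in for _ in range(T)]   # 100 frames of dim 300
Y = apply_layer(X, d_out)
print(len(Y), len(Y[0]))  # 100 520
```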