Hi everyone!
I am doing some research on Kaldi, and I have a couple of questions,
in particular about nnet3 and this script [1]. First, consider the
following snippet:
input dim=100 name=ivector
input dim=40 name=input # please note that it is important to have input layer with the name=input
# as the layer immediately preceding the fixed-affine-layer to enable
# the use of short notation for the descriptor
fixed-affine-layer name=lda input=Append(-2,-1,0,1,2,ReplaceIndex(ivector, t, 0)) affine-transform-file=$dir/configs/lda.mat
# the first splicing is moved before the lda layer, so no splicing here
relu-renorm-layer name=tdnn1 dim=520 [**]
relu-renorm-layer name=tdnn2 dim=520 input=Append(-1,0,1)
fast-lstmp-layer name=lstm1 cell-dim=520 [*] recurrent-projection-dim=130 non-recurrent-projection-dim=130 decay-time=20 delay=-3
How does the input dimension get transformed from 40 to 520, and what
operations are being applied? I read some material about LDA [2][3]
(suggested in this group), but the idea is still not clear to me. Is the
fixed-affine-layer the one that changes the dimension? If so, how?
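To make the question more concrete, here is how I am currently working
out the dimensions. This is only my rough reading (the assumptions about
the fixed-affine-layer and the relu-renorm-layer are mine), so please
correct me if it is wrong:

# Rough dimension bookkeeping for the snippet above (my reading, may be wrong)
feat_dim = 40        # dim of the "input" layer
ivector_dim = 100    # dim of the "ivector" layer
splice_offsets = [-2, -1, 0, 1, 2]   # from Append(-2,-1,0,1,2, ...)

# Append concatenates the spliced frames plus the (time-constant) i-vector
lda_input_dim = len(splice_offsets) * feat_dim + ivector_dim   # 5*40 + 100 = 300

# My assumption: the fixed-affine-layer only applies the precomputed transform
# in lda.mat (nothing is learned there), so its output dim is whatever that
# matrix produces -- presumably 300 again here?
lda_output_dim = lda_input_dim

# My assumption: relu-renorm-layer tdnn1 owns a learned affine component
# (weight matrix of shape 520 x lda_output_dim), and that is what actually
# maps 300 -> 520
tdnn1_dim = 520

print("Append output dim:", lda_input_dim)
print("tdnn1 affine:", lda_output_dim, "->", tdnn1_dim)

If that reading is right, then the 40 -> 520 change is really two steps
(splicing/appending up to 300, then the learned affine inside tdnn1 up to
520), and the LDA-like transform itself does not change the dimension at
all. Is that correct?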
Also, in some results[4], a configuration is specified as:
# bidirectional LSTM
# -----------------------
# local/nnet3/run_lstm.sh --affix bidirectional
# --lstm-delay " [-1,1] [-2,2] [-3,3] "
# --label-delay 0
# --cell-dim 1024
# --recurrent-projection-dim 128
# --non-recurrent-projection-dim 128
# --chunk-left-context 40
# --chunk-right-context 40
The --cell-dim option clearly sets the cell dimension in [*], but does it
also affect the relu-renorm-layer dim in [**]?
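Part of my confusion is that in the snippet from [1] the two values happen
to coincide, so I cannot tell from the numbers alone whether the two
settings are tied together. A small sketch of how I read the dimensions
around the lstmp layer (assuming its output is the concatenation of the
two projections, which I am not sure about):

# Values from the snippet in [1]
tdnn_dim = 520            # relu-renorm-layer dim in [**]
lstm_cell_dim = 520       # cell-dim in [*] -- same number, coincidence?
recurrent_proj = 130      # recurrent-projection-dim
nonrecurrent_proj = 130   # non-recurrent-projection-dim

# My assumption: the lstmp layer outputs the concatenated projections,
# not the raw cell state, so its output dim would be 260 here
lstm_output_dim = recurrent_proj + nonrecurrent_proj

print("lstmp output dim:", lstm_output_dim)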
I would greatly appreciate some clarification.
Cheers,
Giovanni
[1] https://github.com/kaldi-asr/kaldi/blob/master/egs/wsj/s5/local/nnet3/tuning/run_tdnn_lstm_1a.sh
[2] http://www1.icsi.berkeley.edu/ftp/pub/speech/papers/panus_eur03.pdf
[3] http://www.danielpovey.com/files/2013_interspeech_nnet_lda.pdf
[4] https://github.com/kaldi-asr/kaldi/blob/master/egs/swbd/s5c/RESULTS