LSTM+DNN training


Yanhua Long

Sep 6, 2015, 10:48:10 PM
to kaldi-help
Dear everyone, 

Recently I have been using Kaldi to implement an LSTM+DNN system, as proposed in Tara Sainath's paper "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks"; however, the results I got were not what I expected.


Here are some details of my experiments:


1. An LSTM baseline system (2 layers) was trained on 300 hours of speech data using Kaldi's rm recipe, nnet run_lstm.sh; it got around 15% relative WER improvement over the DNN system.


2. Two DNN layers (fully connected layers) were added after the output of the LSTM to train an LSTM+DNN system; the results are listed below:

 

LSTM:      WER = 19.4%

LSTM+DNN:  WER = 29.4%

 

All the LSTM+DNN training parameter configurations are the same as for the LSTM, except for the two extra DNN layers added in the nnet proto file: e.g., learning rate = 0.0001, splice = 0, momentum = 0.9, BPTT = 20, etc. The nnet initialization method is also the same.


I am not sure whether I can do this implementation directly with nnet. Also, in the recently updated Kaldi I found an LSTM recipe, wsj/s5/steps/nnet3/lstm/train.sh; in this script, "--hidden-dim" is already taken as the LSTM input. Does that mean I can use this recipe to reach my LSTM+DNN goal?


I would really appreciate it if someone could give me some pointers.


-Yanhua

 






Xingyu Na

Sep 6, 2015, 10:57:55 PM
to kaldi...@googlegroups.com
If you are borrowing the run_cnn script to do LSTM+DNN, try adding shift and rescale components before attaching the DNN components.
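A sketch of what this might look like in an nnet1 proto, placed between the last LSTM layer and the first affine layer (hypothetical fragment, not tested; the <InitParam> values are placeholders, the 256 dimension assumes the LSTM's projected output dimension, and I may be misremembering the exact nnet1 init tokens for <AddShift>/<Rescale>):

<LstmProjectedStreams> <InputDim> 256 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<AddShift> <InputDim> 256 <OutputDim> 256 <InitParam> 0.0
<Rescale> <InputDim> 256 <OutputDim> 256 <InitParam> 1.0
<AffineTransform> <InputDim> 256 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.039938

The idea is that the trainable shift and rescale components let the network normalize the LSTM's output range before it hits the sigmoid DNN layers, similar to how the CNN and DNN parts are joined in the run_cnn recipes.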

Yanhua Long

Sep 6, 2015, 11:02:12 PM
to kaldi-help
Oh, the script I borrowed is rm/s5/local/nnet/run_lstm.sh.

Here is the proto of my LSTM+DNN:

<NnetProto>
<LstmProjectedStreams> <InputDim> 90 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<LstmProjectedStreams> <InputDim> 256 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<AffineTransform> <InputDim> 256 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.039938 <MaxNorm> 0.000000
<Sigmoid> <InputDim> 1024 <OutputDim> 1024
<AffineTransform> <InputDim> 1024 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.109375 <MaxNorm> 0.000000
<Sigmoid> <InputDim> 1024 <OutputDim> 1024
<AffineTransform> <InputDim> 1024 <OutputDim> 3331 <BiasMean> 0.000000 <BiasRange> 0.0 <ParamStddev> 0.075005 <LearnRateCoef> 1.000000 <BiasLearnRateCoef> 0.100000
<Softmax> <InputDim> 3331 <OutputDim> 3331
</NnetProto>

Is there something wrong? Many thanks.

-Yanhua



On Monday, September 7, 2015 at 10:57:55 AM UTC+8, Xingyu Na wrote: