LSTM+DNN training


Yanhua Long

Sep 6, 2015, 10:48:10 PM
to kaldi-help
Dear everyone, 

Recently I have been using Kaldi to implement an LSTM+DNN system, as proposed in Tara Sainath's paper "Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks"; however, the results I got were not what I expected.


Here are some details of my experiments:


1. An LSTM baseline system (2 layers) was trained on 300 hours of speech data using Kaldi's rm recipe, nnet run_lstm.sh; it got around 15% relative WER improvement over the DNN system.


2. Two DNN layers (fully connected layers) were added after the output of the LSTM to train an LSTM+DNN system; the results are listed below:

 

LSTM:      WER = 19.4%

LSTM+DNN:  WER = 29.4%

 

All the LSTM+DNN training parameter configurations are the same as for the LSTM, except for the two extra DNN layers added in the nnet proto file: e.g., learning rate = 0.0001, splice = 0, momentum = 0.9, BPTT = 20, etc. The nnet initialization method is also the same.


I am not sure whether I can do this implementation directly with nnet. Also, in the recently updated Kaldi I found an LSTM recipe, wsj/s5/steps/nnet3/lstm/train.sh; in this script, "--hidden-dim" is already taken as the LSTM input. Does that mean I can use this recipe to reach my LSTM+DNN goal?


I would really appreciate it if someone could give me some pointers.


-Yanhua

 






Xingyu Na

Sep 6, 2015, 10:57:55 PM
to kaldi...@googlegroups.com
If you are borrowing the run_cnn script to do LSTM+DNN, try adding shift and rescale components before attaching the DNN components.
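A sketch of what this might look like in an nnet1 proto, placed between the last LSTM layer and the first affine layer (hypothetical fragment, not tested; the <InitParam> values are placeholders, the 256 dimension assumes the LSTM's projected output dimension, and I may be misremembering the exact nnet1 init tokens for <AddShift>/<Rescale>):

<LstmProjectedStreams> <InputDim> 256 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<AddShift> <InputDim> 256 <OutputDim> 256 <InitParam> 0.0
<Rescale> <InputDim> 256 <OutputDim> 256 <InitParam> 1.0
<AffineTransform> <InputDim> 256 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.039938

The idea is that the trainable shift and rescale components let the network normalize the LSTM's output range before it hits the sigmoid DNN layers, similar to how the CNN and DNN parts are joined in the run_cnn recipes.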

Yanhua Long

Sep 6, 2015, 11:02:12 PM
to kaldi-help
Oh, the script I borrowed is rm/s5/local/nnet/run_lstm.sh.

Here is the proto of my LSTM+DNN:

<NnetProto>
<LstmProjectedStreams> <InputDim> 90 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<LstmProjectedStreams> <InputDim> 256 <OutputDim> 256 <CellDim> 2000 <ParamScale> 0.010000 <ClipGradient> 5.000000
<AffineTransform> <InputDim> 256 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.039938 <MaxNorm> 0.000000
<Sigmoid> <InputDim> 1024 <OutputDim> 1024
<AffineTransform> <InputDim> 1024 <OutputDim> 1024 <BiasMean> -2.000000 <BiasRange> 4.000000 <ParamStddev> 0.109375 <MaxNorm> 0.000000
<Sigmoid> <InputDim> 1024 <OutputDim> 1024
<AffineTransform> <InputDim> 1024 <OutputDim> 3331 <BiasMean> 0.000000 <BiasRange> 0.0 <ParamStddev> 0.075005 <LearnRateCoef> 1.000000 <BiasLearnRateCoef> 0.100000
<Softmax> <InputDim> 3331 <OutputDim> 3331
</NnetProto>

Is there something wrong? Many thanks.

-Yanhua



On Monday, September 7, 2015 at 10:57:55 AM UTC+8, Xingyu Na wrote: