LSTM discriminative training


Xiang Li

May 23, 2016, 3:11:42 AM
to kaldi-help
Hi, all,
I tried LSTM discriminative training recently.
The configuration for the LSTM is:
splice_indexes="-2,-1,0,1,2 0 0"
label_delay=5
num_lstm_layers=3
cell_dim=1024
hidden_dim=1024
recurrent_projection_dim=256
non_recurrent_projection_dim=256
chunk_width=20
chunk_left_context=40
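(For reference, these are the variables from a run_lstm.sh-style recipe; roughly, they get passed to the training script as below. The flag names are assumed to mirror the variable names, so please check them against the steps/nnet3/lstm/train.sh in your Kaldi version, and the data/alignment directories are just examples.)

steps/nnet3/lstm/train.sh \
  --splice-indexes "$splice_indexes" --label-delay $label_delay \
  --num-lstm-layers $num_lstm_layers \
  --cell-dim $cell_dim --hidden-dim $hidden_dim \
  --recurrent-projection-dim $recurrent_projection_dim \
  --non-recurrent-projection-dim $non_recurrent_projection_dim \
  --chunk-width $chunk_width --chunk-left-context $chunk_left_context \
  data/train_hires data/lang exp/tri4_ali exp/nnet3/lstm   # example directories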

Here are the results of LSTM and my best nnet2 TDNN (for comparison):
MODEL       WER
TDNN-Xent   11.45
TDNN-SMBR   10.00
LSTM-Xent   10.89
 
And below are the WERs of LSTM SMBR training, for different learning rates:
LearningRate   Epoch1  Epoch2  Epoch3  Epoch4
0.0000125      11.40   11.73   11.93   11.95
0.00000125     10.53   10.52   10.59   11.71
0.000000125    10.60   10.60   10.59   10.62

So, from the results above, LSTM-Xent is better than TDNN-Xent,
but SMBR does not help the LSTM as much as it helps the TDNN, which is abnormal, and I can't figure out why.

BTW, the LR used in the wsj recipe is 0.0000125; I suspect it is too large.

So, is there an LSTM-SMBR result that I can refer to? I can't find one in the RESULTS file.
And do you have any suggestions to improve my LSTM-SMBR WER?

Thanks.
Xiang




Xingyu Na

May 23, 2016, 3:19:15 AM
to kaldi...@googlegroups.com
My suggestion is to try a BLSTM instead of an LSTM. And 0.0000125 is for a small setup like wsj.

Xingyu

Xiang Li

May 23, 2016, 3:35:13 AM
to kaldi-help
Hi, Xingyu,
I prefer the LSTM, because it's faster than a BLSTM when decoding.
The WER of my Xent LSTM is very promising, so I think that with SMBR there's a chance of getting a better WER than my best TDNN.
Any other ideas?

On Monday, May 23, 2016 at 3:19:15 PM UTC+8, Xingyu Na wrote:

Xiang Li

May 23, 2016, 3:44:38 AM
to kaldi-help
Below is the plot of objectives and gradients (LR = 0.000000125):
[plot attachment not shown]

Daniel Povey

May 23, 2016, 5:32:16 PM
to kaldi-help, Vimal Manohar
Vimal may have more comments.
I think there have been some fixes to the nnet3 discriminative training recently; not sure if they would be relevant.
Dan


Vimal Manohar

May 23, 2016, 5:41:33 PM
to kaldi-help
Are you using the same discriminative-training configuration as the one used for nnet2, e.g. one-silence-class, adjust-priors, etc.? You could try tuning these.
Which run script are you using for the discriminative training? It is possible there is some mismatch in the LSTM left-context or right-context when you are decoding for denominator lattice generation. Look at the logs in the alignment or denominator-lattice generation directories and check whether anything is very different from the TDNN, e.g. lots of UNK or NOISE.
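For example, something roughly like this (the paths are just placeholders; substitute your own experiment directories):

# check the model context that the egs/decoding scripts should be matching
nnet3-am-info exp/nnet3/lstm/final.mdl | grep -E 'left-context|right-context'

# decode a few denominator lattices back to words and look for lots of UNK/noise
lattice-best-path "ark:gunzip -c exp/nnet3/lstm_denlats/lat.1.gz |" ark,t:- 2>/dev/null \
  | utils/int2sym.pl -f 2- data/lang/words.txt | head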
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University

Songjun Cao

Jan 10, 2017, 9:20:38 PM
to kaldi-help
Hi, Xiang Li
I have the same problem. When the learning rate is larger, the results get worse. When the learning rate is small enough, the results get better, but only a little.
Have you found the reason, or any recipe? Thanks!

On Monday, May 23, 2016 at 3:11:42 PM UTC+8, Xiang Li wrote:

Daniel Povey

Jan 10, 2017, 9:26:11 PM
to kaldi-help
... Yes, if you are training an LSTM, the frames-per-chunk, extra-left-context and
extra-right-context used in decoding are important and need to be matched;
and the scripts to dump egs and maybe also to train have similar options, IIRC.
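For example, if training used chunk_width=20 and chunk_left_context=40, you would pass matching values when generating denominator lattices and when decoding; something like the following, though the exact option names should be checked against steps/nnet3/decode.sh in your Kaldi version, and the directories are only examples:

steps/nnet3/decode.sh --nj 10 --cmd "$decode_cmd" \
  --frames-per-chunk 20 --extra-left-context 40 --extra-right-context 0 \
  exp/nnet3/lstm/graph data/test exp/nnet3/lstm_smbr/decode_test   # example directories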