LSTM discriminative training


Xiang Li

May 23, 2016, 3:11:42 AM
to kaldi-help
Hi, all,
I tried LSTM discriminative training recently.
The configuration for the LSTM is:
splice_indexes="-2,-1,0,1,2 0 0"
label_delay=5
num_lstm_layers=3
cell_dim=1024
hidden_dim=1024
recurrent_projection_dim=256
non_recurrent_projection_dim=256
chunk_width=20
chunk_left_context=40
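(For reference, these are the variables from a run_lstm.sh-style recipe; roughly, they get passed to the training script as below. The flag names are assumed to mirror the variable names, so please check them against the steps/nnet3/lstm/train.sh in your Kaldi version, and the data/alignment directories are just examples.)

steps/nnet3/lstm/train.sh \
  --splice-indexes "$splice_indexes" --label-delay $label_delay \
  --num-lstm-layers $num_lstm_layers \
  --cell-dim $cell_dim --hidden-dim $hidden_dim \
  --recurrent-projection-dim $recurrent_projection_dim \
  --non-recurrent-projection-dim $non_recurrent_projection_dim \
  --chunk-width $chunk_width --chunk-left-context $chunk_left_context \
  data/train_hires data/lang exp/tri4_ali exp/nnet3/lstm   # example directories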

Here are the results of LSTM and my best nnet2 TDNN (for comparison):
MODEL       WER
TDNN-Xent   11.45
TDNN-SMBR   10.00
LSTM-Xent   10.89
 
And below are the WERs of LSTM SMBR training, for different learning rates:
LearningRate   Epoch1  Epoch2  Epoch3  Epoch4
0.0000125      11.40   11.73   11.93   11.95
0.00000125     10.53   10.52   10.59   11.71
0.000000125    10.60   10.60   10.59   10.62

So, from the results above, LSTM-Xent is better than TDNN-Xent,
but SMBR does not help the LSTM as much as it helps the TDNN, which is abnormal, and I can't figure out why.

BTW, the LR used in the wsj recipe is 0.0000125; I suspect it is too large.

So, is there an LSTM-SMBR result that I can refer to? I can't find one in the RESULTS file.
And do you have any suggestions to improve my LSTM-SMBR WER?

Thanks.
Xiang




Xingyu Na

May 23, 2016, 3:19:15 AM
to kaldi...@googlegroups.com
My suggestion is to try a BLSTM instead of an LSTM. And 0.0000125 is for a small setup like wsj.

Xingyu

Xiang Li

May 23, 2016, 3:35:13 AM
to kaldi-help
Hi, Xingyu,
I prefer the LSTM, because it's faster than a BLSTM when decoding.
The WER of my Xent LSTM is very promising, so I think that with SMBR there's a chance of getting a better WER than my best TDNN.
Any other ideas?

On Monday, May 23, 2016 at 3:19:15 PM UTC+8, Xingyu Na wrote:

Xiang Li

May 23, 2016, 3:44:38 AM
to kaldi-help
Below is the plot of objectives and gradients (LR = 0.000000125):
[plot attachment not shown]

Daniel Povey

May 23, 2016, 5:32:16 PM
to kaldi-help, Vimal Manohar
Vimal may have more comments.
I think there have been some fixes to the nnet3 discriminative training recently; not sure if they would be relevant.
Dan


Vimal Manohar

May 23, 2016, 5:41:33 PM
to kaldi-help
Are you using the same discriminative-training configuration as the one used for nnet2, e.g. one-silence-class, adjust-priors, etc.? You could try tuning these.
Which run script are you using for the discriminative training? It is possible there is some mismatch in the LSTM left-context or right-context when you are decoding for denominator lattice generation. Look at the logs in the alignment or denominator-lattice generation directories and check whether anything is very different from the TDNN, e.g. lots of UNK or NOISE.
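For example, something roughly like this (the paths are just placeholders; substitute your own experiment directories):

# check the model context that the egs/decoding scripts should be matching
nnet3-am-info exp/nnet3/lstm/final.mdl | grep -E 'left-context|right-context'

# decode a few denominator lattices back to words and look for lots of UNK/noise
lattice-best-path "ark:gunzip -c exp/nnet3/lstm_denlats/lat.1.gz |" ark,t:- 2>/dev/null \
  | utils/int2sym.pl -f 2- data/lang/words.txt | head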
--
Vimal Manohar
PhD Student
Electrical & Computer Engineering
Johns Hopkins University

Songjun Cao

Jan 10, 2017, 9:20:38 PM
to kaldi-help
Hi, Xiang Li
I have the same problem. When the learning rate is larger, the results get worse. When the learning rate is small enough, the results get better, but only a little.
Have you found the reason, or any recipe? Thanks!

On Monday, May 23, 2016 at 3:11:42 PM UTC+8, Xiang Li wrote:

Daniel Povey

Jan 10, 2017, 9:26:11 PM
to kaldi-help
... Yes, if you are training an LSTM, the frames-per-chunk, extra-left-context and
extra-right-context used in decoding are important and need to be matched;
and the scripts to dump egs and maybe also to train have similar options, IIRC.
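For example, if training used chunk_width=20 and chunk_left_context=40, you would pass matching values when generating denominator lattices and when decoding; something like the following, though the exact option names should be checked against steps/nnet3/decode.sh in your Kaldi version, and the directories are only examples:

steps/nnet3/decode.sh --nj 10 --cmd "$decode_cmd" \
  --frames-per-chunk 20 --extra-left-context 40 --extra-right-context 0 \
  exp/nnet3/lstm/graph data/test exp/nnet3/lstm_smbr/decode_test   # example directories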