About chunk width and context in LSTM


Xiang Li

Mar 21, 2017, 12:35:44 AM3/21/17
to kaldi-help
Hi, Dan,
I tried tdnn-lstm recently, and the result was good.
In the script, the training chunk width was 150 with a chunk left context of 40,
while at decoding time the chunk width was 150 with an extra left context of 50.
So there's a mismatch in the extra left context, and I found that decoding with 50 frames of extra left context gave the best result.
Is there a theory behind this?
I know you have tuned the chunk width, so what was the result for chunk width 50?
And what's your suggestion for extra left context when the chunk width changes?

Best,
Xiang

Daniel Povey

Mar 21, 2017, 12:50:14 AM3/21/17
to kaldi-help
Cool.

In general we expect that increasing the left-context will help, as it gets to see more context.
In practice this is true only up to a point, and eventually it starts to degrade.  [at least, this
was true before we introduced the decay-time option.]
In the future I was thinking of just hardcoding the scripts to use the same extra-left-context in testing
as in training, for simplicity.
BTW, if you look at the latest scripts (in kaldi 5.1), you'll see that we generally supply a comma-separated list
of chunk-widths, like 150,120,100,90 or something like that; and the options
--extra-left-context-initial 0 --extra-right-context-final 0
for training and decoding (and also decay-time=20 in the LSTM layers),
and also a variety of minibatch sizes, e.g. --minibatch-size=64,32
(this is important to avoid discarding too many examples when you have a variety of
chunk sizes).
These changes make it easier to do online decoding, and should make the models a little
less sensitive to the exact chunk sizes (e.g. decay-time helps to make it fine to use
infinitely large context without it degrading results).
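For concreteness, the options Dan describes might be invoked roughly like this in a 5.1-era chain recipe. This is a sketch, not quoted from any particular script: the specific values (150,110,100; 64,32; contexts of 40) are illustrative, though the flag names match those used by the nnet3 chain scripts.

```shell
# Training with steps/nnet3/chain/train.py (values illustrative):
steps/nnet3/chain/train.py \
  --egs.chunk-width 150,110,100 \
  --egs.chunk-left-context 40 \
  --egs.chunk-left-context-initial 0 \
  --egs.chunk-right-context-final 0 \
  --trainer.num-chunk-per-minibatch 64,32 \
  ...

# Decoding with steps/nnet3/decode.sh, matching the training context:
steps/nnet3/decode.sh \
  --frames-per-chunk 150 \
  --extra-left-context 40 \
  --extra-left-context-initial 0 \
  --extra-right-context-final 0 \
  ...
```

The comma-separated chunk widths and minibatch sizes let the egs-generation and training code pick combinations that waste as little data as possible, which is why the variety of minibatch sizes matters once chunk sizes vary.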

Dan



--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Xiang Li

Mar 21, 2017, 1:05:40 AM3/21/17
to kaldi-help, dpo...@gmail.com
Wow, a lot of new stuff. 
That's awesome. I'll try it.
Thanks a lot.

On Tuesday, March 21, 2017 at 12:50:14 PM UTC+8, Dan Povey wrote:

Xiang Li

Mar 21, 2017, 2:02:41 AM3/21/17
to kaldi-help, dpo...@gmail.com
Hi, Dan,
I can't find the code for multiple chunk widths in chain_lib or chain/get_egs.sh.
Has it been committed to master?


On Tuesday, March 21, 2017 at 12:50:14 PM UTC+8, Dan Povey wrote:

Rudolf A. Braun

Mar 4, 2019, 5:33:52 PM3/4/19
to kaldi-help
Sorry to butt in, but I don't understand how decay-time is supposed to help. Looking at the code of FastLstmp, for example, the only thing it is used for is to define recurrence_scale, which is then passed as a parameter to `BackpropTruncationComponent`. How is that going to have any effect at test time? (I did one experiment and found it did not help at test time, though there may have been another reason for that; I'll know for sure tomorrow.)

Daniel Povey

Mar 4, 2019, 5:57:49 PM3/4/19
to kaldi-help
Its purpose is to stop gradient explosion during backprop in training.
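To sketch the intuition behind this (my own gloss on the thread, assuming the scale is applied once per recurrent step; the exact formula Kaldi uses may differ): if the recurrent connection is scaled by a constant $\alpha$ slightly below $1$, say $\alpha = 1 - 1/T$ for decay-time $T$, then the influence of a frame $k$ steps in the past is attenuated by

\[
\alpha^{k} = \left(1 - \frac{1}{T}\right)^{k} \approx e^{-k/T},
\]

so backpropagated gradients are damped geometrically with time constant $T$ frames rather than being free to grow without bound. This directly targets gradient explosion in training; any benefit at test time (e.g. reduced sensitivity to very long left context) is a side effect of the forward contributions decaying with the same time constant.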
