Latency regression with more recent tdnn models


David van Leeuwen

Oct 15, 2019, 10:16:30 AM
to kaldi-help
Hello, 

In moving from the very old nnet2 multisplice training scripts to the more recent nnet3 tdnn_1{c,d} scripts, performance on, e.g., Librispeech has improved a lot -- and that is great!

However, I've noticed that the right context has also gone up from 40 ms to 400 ms over the same progression of network configurations.  I've tried to study the configurations of the networks, and I think the network specification syntax has also progressed over the years. 

It seems that in the Librispeech `local/chain/tuning/run_tdnn_1d.sh` recipe the context is indicated in `tdnnf-layer` statements with a `time-stride=$n` option, which is then converted to `.linear` layers with time-offsets `-$n,0` and `.affine` layers with time-offsets `0,$n`.  I gather these have the combined effect of a symmetric context {-$n,0,$n}.  All in all these result in an overall (left, right) context of (40, 40) frames.
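The context arithmetic above can be sketched as follows. This is illustrative Python, not Kaldi code: the layer strides are a plausible stand-in for the tdnn_1d stack rather than the exact recipe, and the semantics of the asymmetric modes discussed later in the thread (`shift-left`, `left-only`) are my reading of their names, not verified behavior.

```python
# Hedged sketch: how per-layer time-strides accumulate into the network's
# overall (left, right) context, counted in 10 ms frames.

def layer_context(stride, mode="symmetric"):
    """Per-layer (left, right) context contribution, in frames."""
    if stride == 0:
        return (0, 0)
    if mode == "symmetric":    # .linear sees {-n, 0}, .affine sees {0, n}
        return (stride, stride)
    if mode == "shift-left":   # assumption: same span shifted entirely left of t
        return (2 * stride, 0)
    if mode == "left-only":    # assumption: right half dropped (fewer parameters)
        return (stride, 0)
    raise ValueError(f"unknown mode: {mode}")

def total_context(layers):
    """Sum per-layer contributions over a list of (stride, mode) pairs."""
    lefts, rights = zip(*(layer_context(s, m) for s, m in layers))
    return sum(lefts), sum(rights)

# Illustrative stack (not the exact tdnn_1d layer list): four stride-1
# layers followed by twelve stride-3 layers.
stack = [(1, "symmetric")] * 4 + [(3, "symmetric")] * 12
print(total_context(stack))  # (40, 40): 400 ms of right context at 10 ms/frame

# Shifting only the stride-3 layers left removes most of the right context:
low_latency = [(1, "symmetric")] * 4 + [(3, "shift-left")] * 12
print(total_context(low_latency))  # (76, 4): only 40 ms of right context
```

With these (assumed) semantics, shifting the stride-3 layers left keeps the same span per layer, which is consistent with the unchanged parameter counts reported later in the thread.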

If I wanted to reduce the latency (probably at some cost in ASR accuracy) for the chain models and specify asymmetric contexts, as in `local/nnet3/run_tdnn.sh`, what would be my best approach, and which nnet configuration tool should I use?

Cheers, 

--david

Daniel Povey

Oct 15, 2019, 3:12:43 PM
to kaldi-help
I just created a PR here
https://github.com/kaldi-asr/kaldi/pull/3658
where I added some options to tdnnf-layer (not tested).
Can you try adding
  context=left-only
or
  context=shift-left
to some of the TDNN-F layers (presumably the ones with time-stride 3) and let us know how the results change?
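For concreteness, the change might look like this in the recipe's xconfig block (a hedged sketch: the other options of the layer are elided, and whether the PR accepts `context=` in exactly this form is untested):

```
# before (symmetric context; other layer options elided):
tdnnf-layer name=tdnnf7 ... time-stride=3
# after (proposed option from the PR, untested):
tdnnf-layer name=tdnnf7 ... time-stride=3 context=shift-left
```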


--
Go to http://kaldi-asr.org/forums.html to find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/63eb618f-87f6-4798-93d7-d06b2506dbbe%40googlegroups.com.

David van Leeuwen

Oct 16, 2019, 5:17:48 AM
to kaldi-help
Hello, 


Thanks for the quick response and patch, wonderful! I will definitely do this.  In my current setup Librispeech 960 / tdnn_1d takes about two days to train, so I hope to be able to report results next week.

Cheers, 
 
--david



David van Leeuwen

Oct 18, 2019, 7:05:10 AM
to kaldi-help
Hi, 

Results with `context=shift-left` for all `time-stride=3` layers are somewhat worse than the symmetric configuration, but the latency is down to a snappy 40 ms.
Here is a comparison with its symmetric counterpart, a scaled-down version of the Librispeech tdnn_1d (narrower layers: bottleneck-dim 128, hidden dim 1024):

# local/chain/compare_wer.sh tdnn_1d_narrow context-shift-left
# System                      tdnn_1d_s2_sp tdnn_1d_s3_sp
# WER on dev(fglarge)              3.36      3.49
# WER on dev(tglarge)              3.44      3.60
# WER on dev(tgmed)                4.29      4.51
# WER on dev(tgsmall)              4.83      5.00
# WER on dev_other(fglarge)        8.77      9.15
# WER on dev_other(tglarge)        9.24      9.66
# WER on dev_other(tgmed)         11.30     11.69
# WER on dev_other(tgsmall)       12.51     12.98
# WER on test(fglarge)             3.79      3.94
# WER on test(tglarge)             3.94      4.04
# WER on test(tgmed)               4.82      4.88
# WER on test(tgsmall)             5.30      5.36
# WER on test_other(fglarge)       9.07      9.38
# WER on test_other(tglarge)       9.34      9.77
# WER on test_other(tgmed)        11.41     11.74
# WER on test_other(tgsmall)      12.60     13.11
# Final train prob              -0.0401   -0.0410
# Final valid prob              -0.0438   -0.0514
# Final train prob (xent)       -0.7275   -0.8232
# Final valid prob (xent)       -0.7367   -0.9041
# Num-parameters               12782368  12782368

Up next will be `context=left-only` for those layers, which has fewer parameters, so I expect a further loss in performance there.

--david

Daniel Povey

Oct 18, 2019, 5:01:57 PM
to kaldi-help
Thanks!!


David van Leeuwen

Oct 20, 2019, 5:36:45 AM
to kaldi-help
Hello, 


Results with `context=left-only` are in, so below are the results for all three conditions (symmetric, shift-left, left-only), with the old nnet2 multisplice system added as the first column for nostalgic reasons:

# local/chain/compare_wer.sh exp/nnet2_online/nnet_ms_a exp/chain_cleaned/tdnn_1d_s2_sp exp/chain_cleaned/tdnn_1d_s3_sp exp/chain_cleaned/tdnn_1d_s4_sp
# System                      nnet_ms_a tdnn_1d_s2_sp tdnn_1d_s3_sp tdnn_1d_s4_sp
# WER on dev(fglarge)              4.35      3.36      3.49      3.53
# WER on dev(tglarge)              4.62      3.44      3.60      3.56
# WER on dev(tgmed)                5.77      4.29      4.51      4.60
# WER on dev(tgsmall)              6.53      4.83      5.00      5.11
# WER on dev_other(fglarge)       12.50      8.77      9.15      9.28
# WER on dev_other(tglarge)       13.05      9.24      9.66      9.77
# WER on dev_other(tgmed)         15.57     11.30     11.69     11.78
# WER on dev_other(tgsmall)       16.78     12.51     12.98     13.02
# WER on test(fglarge)             4.96      3.79      3.94      3.96
# WER on test(tglarge)             5.15      3.94      4.04      4.09
# WER on test(tgmed)               6.26      4.82      4.88      5.00
# WER on test(tgsmall)             7.14      5.30      5.36      5.57
# WER on test_other(fglarge)      13.02      9.07      9.38      9.48
# WER on test_other(tglarge)      13.45      9.34      9.77      9.84
# WER on test_other(tgmed)        15.74     11.41     11.74     11.93
# WER on test_other(tgsmall)      17.21     12.60     13.11     13.24
# Final train prob                        -0.0401   -0.0410   -0.0451
# Final valid prob                        -0.0438   -0.0514   -0.0551
# Final train prob (xent)                 -0.7275   -0.8232   -0.9250
# Final valid prob (xent)                 -0.7367   -0.9041   -0.9962
# Num-parameters                      0  12782368  12782368  11209504

--david

Daniel Povey

Oct 20, 2019, 6:48:44 PM
to kaldi-help
Thanks a lot!

