another follow-up on my "chain models for embedded platforms"
experiment: my 1000+ hour English models (trained on
LibriSpeech + VoxForge data) have just finished training (using the same
scripts as the German models), and I am pretty happy with the results:
%WER 2.48 [ 12525 / 504653, 737 ins, 2720 del, 9068 sub ]
exp/nnet3_chain/tdnn_sp/decode_test/wer_10_0.0
%WER 3.03 [ 15269 / 504653, 948 ins, 3260 del, 11061 sub ]
exp/nnet3_chain/tdnn_250/decode_test/wer_9_0.0
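For anyone who wants to double-check the numbers: as far as I understand the scoring output, %WER is the sum of insertions, deletions and substitutions divided by the number of reference words. A minimal sketch of that arithmetic follows; the function name `wer_percent` is made up for illustration and is not part of Kaldi or my scripts.

```python
def wer_percent(ins, dels, subs, ref_words):
    """Word error rate in percent: all edit errors over the reference word count."""
    errors = ins + dels + subs
    return 100.0 * errors / ref_words


# Counts copied from the two scoring lines above.
tdnn_sp = wer_percent(737, 2720, 9068, 504653)
tdnn_250 = wer_percent(948, 3260, 11061, 504653)

print("tdnn_sp  %%WER %.2f" % tdnn_sp)
print("tdnn_250 %%WER %.2f" % tdnn_250)
```

So the smaller model costs roughly half a percentage point of WER on this test set.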
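plus: the smaller (250) model still achieves near-realtime performance
on a Raspberry Pi 3 (once the model is loaded, it decodes 4.625s of
audio in 4.95s, i.e. a real-time factor of about 1.07):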
[bofh@donald py-kaldi-asr]$ python examples/chain_incremental.py
tdnn_250 loading model...
tdnn_250 loading model... done, took 23.394126s.
tdnn_250 creating decoder...
tdnn_250 creating decoder... done, took 14.411979s.
decoding data/dw961.wav...
0.087s: 4000 frames ( 0.250s) decoded.
0.400s: 8000 frames ( 0.500s) decoded.
0.742s: 12000 frames ( 0.750s) decoded.
1.021s: 16000 frames ( 1.000s) decoded.
1.263s: 20000 frames ( 1.250s) decoded.
1.497s: 24000 frames ( 1.500s) decoded.
1.714s: 28000 frames ( 1.750s) decoded.
1.992s: 32000 frames ( 2.000s) decoded.
2.370s: 36000 frames ( 2.250s) decoded.
2.642s: 40000 frames ( 2.500s) decoded.
2.873s: 44000 frames ( 2.750s) decoded.
3.112s: 48000 frames ( 3.000s) decoded.
3.333s: 52000 frames ( 3.250s) decoded.
3.668s: 56000 frames ( 3.500s) decoded.
3.876s: 60000 frames ( 3.750s) decoded.
4.092s: 64000 frames ( 4.000s) decoded.
4.305s: 68000 frames ( 4.250s) decoded.
4.517s: 72000 frames ( 4.500s) decoded.
4.951s: 74000 frames ( 4.625s) decoded.
*****************************************************************
** data/dw961.wav
** i cannot follow you she said
** tdnn_250 likelihood: 1.99656772614
*****************************************************************
tdnn_250 decoding took 4.95s
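The real-time factor can be read off the last progress line of the log: wall-clock time divided by audio time. Here is a small sketch that does this, assuming the progress lines keep exactly the format shown above; the regex and the helper name `real_time_factor` are my own for illustration and not part of py-kaldi-asr. Note that the "frames" in the log appear to be 16 kHz audio samples (4000 per 0.250s), and that the model load and decoder setup times are not included.

```python
import re

# Matches progress lines such as "4.951s: 74000 frames ( 4.625s) decoded."
LOG_LINE = re.compile(r"^\s*([\d.]+)s:\s*(\d+) frames \(\s*([\d.]+)s\) decoded\.")


def real_time_factor(log_text):
    """Return wall-clock time / audio time, taken from the last progress line."""
    last = None
    for line in log_text.splitlines():
        m = LOG_LINE.match(line)
        if m:
            last = m
    if last is None:
        raise ValueError("no progress lines found")
    wall_seconds = float(last.group(1))
    audio_seconds = float(last.group(3))
    return wall_seconds / audio_seconds


# Three lines copied from the log above; the last one is the one that counts.
sample = """\
0.087s: 4000 frames ( 0.250s) decoded.
4.517s: 72000 frames ( 4.500s) decoded.
4.951s: 74000 frames ( 4.625s) decoded.
"""

rtf = real_time_factor(sample)
print("real-time factor: %.2f" % rtf)
```

A value of 1.0 would mean decoding keeps pace exactly with the incoming audio; anything slightly above that means the decoder falls a little behind on longer utterances.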
in case anybody is interested in trying these models, they are
available for download here:
http://goofy.zamia.org/voxforge/en/kaldi-chain-voxforge-en-r20171129.tar.xz
and, as always, all my scripts are open source and freely available on GitHub:
https://github.com/gooofy/speech
guenter