Online decoding performs much worse than offline decoding


John Locke

Feb 24, 2021, 11:54:43 AM
to kaldi-help
Hello,

I am using the WSJ recipe with my own data, and I cannot figure out what I am doing wrong. I do not see any useful error messages in the log files, but I do not know exactly what to look for either. I suspect it may be the ivector extraction, but I don't know where to start.

Can you please give me a hint where I could start debugging this?

For offline decoding, I use this decode command and obtain a best_wer:

# nnet3-latgen-faster-parallel --num-threads=4 --online-ivectors=scp:exp/nnet3_online_cmn/ivectors_dev_hires/ivector_online.scp --online-ivector-period=10 --frame-subsampling-factor=3 --frames-per-chunk=140 --extra-left-context=35 --extra-right-context=35 --extra-left-context-initial=0 --extra-right-context-final=0 --minimize=false --max-active=7000 --min-active=200 --beam=15.0 --lattice-beam=8.0 --acoustic-scale=1.0 --allow-partial=true --word-symbol-table=exp/chain2_online_cmn/tree_a_sp/graph_tgpr/words.txt exp/chain2_online_cmn/tdnn1i_sp/final.mdl exp/chain2_online_cmn/tree_a_sp/graph_tgpr/HCLG.fst "ark,s,cs:apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/dev_hires/split36/17/utt2spk scp:data/dev_hires/split36/17/cmvn.scp scp:data/dev_hires/split36/17/feats.scp ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain2_online_cmn/tdnn1i_sp/decode_tgpr_dev/lat.17.gz"

%WER 1.56 [ 283 / 18199, 32 ins, 83 del, 168 sub ] exp/chain2_online_cmn/tdnn1i_sp/decode_tgpr_dev/wer_17_0.0

For online decoding:

# online2-wav-nnet3-latgen-faster --do-endpointing=false --frames-per-chunk=20 --extra-left-context-initial=0 --online=true --config=exp/chain2_online_cmn/tdnn1i_sp_online/conf/online.conf --min-active=200 --max-active=7000 --beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 --word-symbol-table=exp/chain2_online_cmn/tree_a_sp/graph_tgpr/words.txt exp/chain2_online_cmn/tdnn1i_sp_online/final.mdl exp/chain2_online_cmn/tree_a_sp/graph_tgpr/HCLG.fst ark:data/dev_hires/split36/7/spk2utt "ark,s,cs:wav-copy scp,p:data/dev_hires/split36/7/wav.scp ark:- |" "ark:|lattice-scale --acoustic-scale=10.0 ark:- ark:- | gzip -c >exp/chain2_online_cmn/tdnn1i_sp_online/decode_tgpr_dev/lat.7.gz"

%WER 98.32 [ 17894 / 18199, 47 ins, 15126 del, 2721 sub ] exp/chain2_online_cmn/tdnn1i_sp_online/decode_tgpr_dev/wer_7_0.0
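
Breaking the two %WER lines above into their components shows which kind of error dominates. The following is a minimal sketch, assuming the line format shown in this thread; `parse_wer_line` is a hypothetical helper and not part of Kaldi. For the online result it shows that the large majority of errors are deletions, i.e. the decoder emits very few words.

```python
import re

# Hypothetical helper (not part of Kaldi): parses a "%WER" summary line
# in the format quoted in this thread.
WER_RE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*(\d+)\s+ins,"
    r"\s*(\d+)\s+del,\s*(\d+)\s+sub\s*\]"
)

def parse_wer_line(line):
    m = WER_RE.search(line)
    if m is None:
        raise ValueError("not a %WER line: " + line)
    errors, words = int(m.group(2)), int(m.group(3))
    ins, dele, sub = int(m.group(4)), int(m.group(5)), int(m.group(6))
    return {
        "reported_wer": float(m.group(1)),
        "errors": errors,
        "words": words,
        "ins": ins,
        "del": dele,
        "sub": sub,
        # WER = (insertions + deletions + substitutions) / reference words
        "computed_wer": 100.0 * (ins + dele + sub) / words,
        # Fraction of all errors that are deletions
        "del_share": dele / errors if errors else 0.0,
    }

offline = parse_wer_line("%WER 1.56 [ 283 / 18199, 32 ins, 83 del, 168 sub ]")
online = parse_wer_line("%WER 98.32 [ 17894 / 18199, 47 ins, 15126 del, 2721 sub ]")

for name, r in (("offline", offline), ("online", online)):
    print(f"{name}: WER {r['computed_wer']:.2f}%, "
          f"deletions are {100 * r['del_share']:.1f}% of errors")
```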


nshm...@gmail.com

Feb 25, 2021, 7:03:15 AM
to kaldi-help
Most likely it is related to CMN. It is strange that you use the online_cmn recipe but your batch command has '--norm-means=false'; something has likely gone out of sync.

Please share exp/chain2_online_cmn/tdnn1i_sp_online/conf/online.conf and the other conf files (for the ivector extractor). Also share the batch ivector extraction command.

John Locke

Feb 25, 2021, 1:23:05 PM
to kaldi-help
Hello,

Hmm.. that is strange indeed. Here is my online.conf:

--feature-type=mfcc
--mfcc-config=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/mfcc.conf
--ivector-extraction-config=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/ivector_extractor.conf
--endpoint.silence-phones=1:2:3:4:5:6:7:8:9:10:11:12:13:14:15

Here is my ivector_extractor.conf:

--splice-config=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/splice.conf
--cmvn-config=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/online_cmvn.conf
--lda-matrix=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/ivector_extractor/final.mat
--global-cmvn-stats=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/ivector_extractor/global_cmvn.stats
--diag-ubm=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/ivector_extractor/final.dubm
--ivector-extractor=/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/ivector_extractor/final.ie
--num-gselect=5
--min-post=0.025
--posterior-scale=0.1
--max-remembered-frames=1000
--max-count=100
--ivector-period=10

My mfcc.conf:

--use-energy=false # use average of log energy, not energy.
--num-mel-bins=40 # similar to Google's setup.
--num-ceps=40 # there is no dimensionality reduction.
--low-freq=20 # low cutoff frequency for mel bins... this is high-bandwidth data, so
--high-freq=-400 # high cutoff frequency, relative to Nyquist of 8000 (=7600)
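
The `(=7600)` in the last comment can be checked with a little arithmetic: in Kaldi's mel filterbank options, a non-positive high cutoff is interpreted as an offset from the Nyquist frequency. The sketch below assumes 16 kHz audio, which is what "Nyquist of 8000" implies; `effective_high_freq` is a hypothetical name used only for illustration.

```python
def effective_high_freq(sample_rate_hz, high_freq):
    """Resolve a --high-freq value to an absolute frequency in Hz.

    A value <= 0 is treated as an offset from the Nyquist frequency.
    """
    nyquist = sample_rate_hz / 2.0
    return nyquist + high_freq if high_freq <= 0 else high_freq

# Assumption: 16 kHz audio, as implied by "Nyquist of 8000".
print(effective_high_freq(16000, -400))
```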

Command for extracting ivectors:

# ivector-extract-online2 --config=exp/nnet3_online_cmn/ivectors_train_sp_hires/conf/ivector_extractor.conf ark:exp/nnet3_online_cmn/ivectors_train_sp_hires/train_sp_hires_max2/split30/1/spk2utt scp:exp/nnet3_online_cmn/ivectors_train_sp_hires/train_sp_hires_max2/split30/1/feats.scp ark:- | copy-feats --compress=true ark:- ark,scp:/opt/kaldi/egs/my/s5/exp/nnet3_online_cmn/ivectors_train_sp_hires/ivector_online.1.ark,/opt/kaldi/egs/my/s5/exp/nnet3_online_cmn/ivectors_train_sp_hires/ivector_online.1.scp

Thank you for looking into this! I appreciate it!

nshm...@gmail.com

Feb 25, 2021, 1:37:16 PM
to kaldi-help
Is there a difference between these two files?

exp/nnet3_online_cmn/ivectors_train_sp_hires/conf/ivector_extractor.conf

and

/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/ivector_extractor.conf

And what is inside this one?

/opt/kaldi/egs/ro/s5/exp/chain2_online_cmn/tdnn1i_sp_online/conf/online_cmvn.conf

John Locke

Feb 25, 2021, 1:53:07 PM
to kaldi-help
Hello,

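Yes, there is a slight difference. The first file (the one used for batch ivector extraction) has:

--max-count=0
--online-cmvn-iextractor=true

whereas the second (the tdnn1i_sp_online one) has:

--max-count=100

The other file, online_cmvn.conf, contains only a comment: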

# configuration file for apply-cmvn-online, used in the script ../local/online/run_online_decoding_nnet2.sh
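
A comparison like the one between the two ivector_extractor.conf files can be automated. The following is a minimal sketch, assuming the `--name=value` format with `#` comments seen in the files quoted in this thread; `parse_kaldi_conf` and `diff_confs` are hypothetical helpers, not Kaldi tools, and the two strings contain only the differing options reported here plus one shared option.

```python
def parse_kaldi_conf(text):
    """Return {option: value} from '--name=value' lines; '#' starts a comment."""
    opts = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("--"):
            continue  # skip blank lines and comment-only lines
        name, _, value = line[2:].partition("=")
        opts[name.strip()] = value.strip()
    return opts

def diff_confs(a, b):
    """Return {option: (value_in_a, value_in_b)} where they differ; None = absent."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# Assumption: the first set of options belongs to the batch extraction config
# and the second to the online decoding config, as the thread suggests.
batch_conf = """
--ivector-period=10
--max-count=0
--online-cmvn-iextractor=true
"""
online_conf = """
--ivector-period=10
--max-count=100
"""

differences = diff_confs(parse_kaldi_conf(batch_conf), parse_kaldi_conf(online_conf))
for name, (batch_value, online_value) in differences.items():
    print(f"{name}: batch={batch_value} online={online_value}")
```

Options whose values are file paths will naturally differ between directories, so in practice such a diff needs to be read with that in mind.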

软件开发工作经验

Feb 25, 2021, 7:43:03 PM
to kaldi-help
Hi, decoding has hung twice, so I suspect CUDA is not working. My environment has NVIDIA driver 460 and CUDA 11. Last week I had an older driver and CUDA 10, but when I ran the DNN stage in run_nnet2.sh I got a message about a bad CUDA environment, which forced me to upgrade the NVIDIA driver and CUDA. After upgrading, the error message went away.
Now I want to use nnet3, but decoding hangs even with a short test dataset, and I am not sure whether CUDA 11 is the problem. It is a mess, so please help me.
Thanks.

Daniel Povey

Feb 25, 2021, 9:50:20 PM
to kaldi-help
Usually this kind of thing is due to hardware problems.


软件开发工作经验

Feb 25, 2021, 10:06:49 PM
to kaldi-help
The command is
"gmm-latgen-faster --max-active=7000 --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.083333 --allow-partial=true --word-symbol-table=exp/mono0a/graph_nosp_tgpr/words.txt exp/mono0a/final.mdl"
I got it from htop. Is any parameter wrong?
Decoding runs with --nj 5. Can --nj jobs be multithreaded?
I have already run the thchs30 dataset and that was fine, so maybe the hardware is good and something else is wrong.



------------------ Original message ------------------
From: "kaldi-help" <dpo...@gmail.com>;
Sent: Friday, February 26, 2021, 10:50 AM
To: "kaldi-help" <kaldi...@googlegroups.com>;
Subject: Re: [kaldi-help] wsj run.sh mono decode have no response in 6 hours,Is kaldi support CUDA 11?

Daniel Povey

Feb 25, 2021, 10:44:27 PM
to kaldi-help
That would not usually be related to CUDA, as gmm-latgen-faster doesn't use CUDA.
It's best if you find someone local who is good at Linux and can debug system problems to narrow it down. It may be running fine, or your system may be stuck.
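
One rough way to tell "running fine" from "stuck" is to check whether the job is still writing to its log or output files. The sketch below is only a heuristic under that assumption; `seconds_since_update` and `looks_stalled` are hypothetical helpers, and the path in the demo is a placeholder rather than a real Kaldi location.

```python
import os
import time

def seconds_since_update(path, now=None):
    """Seconds elapsed since the file at path was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)

def looks_stalled(paths, max_idle_seconds=600):
    """True if none of the existing files changed within max_idle_seconds.

    Also returns True if none of the files exist, i.e. nothing was written.
    """
    existing = [p for p in paths if os.path.exists(p)]
    if not existing:
        return True
    return min(seconds_since_update(p) for p in existing) > max_idle_seconds

# Placeholder path (hypothetical): point this at the decode job's log files.
print(looks_stalled(["path/to/decode/log/decode.1.log"]))
```

A job can be idle on its log while working on a long utterance, so a result of `True` is a prompt to look closer, not proof of a hang.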

nshm...@gmail.com

Feb 26, 2021, 3:06:52 PM
to kaldi-help
So you had better add this option to the other file, the ivector_extractor.conf under tdnn1i_sp_online/conf:

--online-cmvn-iextractor=true

Let us know if it solves your problem.
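
Adding the option by hand in an editor is enough. For anyone who wants to script it, here is a minimal sketch that appends an option to a config file only if no uncommented line already sets it. `ensure_option` is a hypothetical helper, and the demo runs on a throwaway file rather than on the real config.

```python
import os
import tempfile

def ensure_option(conf_path, option_line):
    """Append option_line to conf_path unless that option is already set.

    Returns True if the file was modified, False otherwise.
    """
    name = option_line.split("=", 1)[0]
    with open(conf_path) as f:
        content = f.read()
    for line in content.splitlines():
        active = line.split("#", 1)[0].strip()  # ignore trailing comments
        if active.split("=", 1)[0] == name:
            return False
    with open(conf_path, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # make sure the new option starts on its own line
        f.write(option_line + "\n")
    return True

# Demo on a temporary file (not the real ivector_extractor.conf).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "ivector_extractor.conf")
    with open(demo_path, "w") as f:
        f.write("--max-count=100\n--ivector-period=10")
    print(ensure_option(demo_path, "--online-cmvn-iextractor=true"))
    print(ensure_option(demo_path, "--online-cmvn-iextractor=true"))
    with open(demo_path) as f:
        print(f.read())
```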

软件开发工作经验

Feb 26, 2021, 8:43:21 PM
to kaldi-help
The length of CLG_3_1.fst is 0.
I ran WSJ, including decoding, without problems on my PC, but on the server I get an error.
There are 10 wavs on the PC but at least 5000 wavs on the server. The day before, I commented out the decoding step and ran WSJ, and it ran to the end; with decoding uncommented, I get the error in the title.
I am very confused about the tmp dir and am now studying the code of mkgraph.sh. Please give me some clues, thanks.
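
To see whether CLG_3_1.fst is the only empty intermediate file, one can scan the directory for zero-byte FSTs. This is a minimal sketch; `find_empty_files` is a hypothetical helper, and the directory passed in the demo is a placeholder that should be replaced with wherever mkgraph.sh wrote its files.

```python
import os

def find_empty_files(root, suffix=".fst"):
    """Return a sorted list of zero-byte files under root with the given suffix."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                if os.path.getsize(full) == 0:
                    empty.append(full)
    return sorted(empty)

# Placeholder directory (hypothetical): replace with the lang/graph directory in use.
for path in find_empty_files("path/to/lang_dir"):
    print("empty:", path)
```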