Getting error in online2-wav-nnet3-latgen-grammar


crpatel

Feb 23, 2021, 1:55:34 AM
to kaldi-help
Hi,

I got the following error in a few instances while decoding the same audio file 1000 times. Most runs decode successfully, but some fail with this error.


Error occured in Kaldi loop
ERROR (ThreadedOnlineDecoder[5.5.0~1523-8283]:ExpandStateEnd():grammar-fst.cc:266) FST with index 0 ends with left-context-phone 92 but parent FST does not support that left-context at the return point.

Thanks in advance,

软件开发工作经验

Feb 23, 2021, 3:07:50 AM
to kaldi-help

Hi all. When I run wsj/run.sh and run_nnet2.sh, I get errors like:

"

steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training

LOG (nnet-train-simple[5.5.839~1-0c6a]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 2097152 bytes; current memory info is: free:0M, used:0M, total:0M, free/total:1

LOG (nnet-train-simple[5.5.839~1-0c6a]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 2097152 bytes; current memory info is: free:0M, used:0M, total:0M, free/total:1

LOG (nnet-train-simple[5.5.839~1-0c6a]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 2097152 bytes; current memory info is: free:0M, used:0M, total:0M, free/total:1

# Accounting: time=441 threads=1

"
There is no ERROR line in the log, so why did it stop?

The output of the run is:
steps/nnet2/train_multisplice_accel2.sh --stage -10 --exit-stage -100 --num-epochs 8 --num-jobs-initial 1 --num-jobs-final 1 --num-hidden-layers 4 --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --cmvn-opts --norm-means=false --norm-vars=false --num-threads 1 --minibatch-size 512 --parallel-opts --gpu 1 --io-opts --max-jobs-run 12 --initial-effective-lrate 0.005 --final-effective-lrate 0.0005 --cmd run.pl --mem 8G --pnorm-input-dim 2000 --pnorm-output-dim 250 --mix-up 12000 data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a
data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a
steps/nnet2/make_multisplice_configs.py contexts --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 exp/nnet2_online/nnet_ms_a
Namespace(bias_stddev=None, initial_learning_rate=None, ivector_dim=None, lda_dim=None, lda_mat=None, mode='contexts', num_hidden_layers=None, num_targets=None, online_preconditioning_opts=None, output_dir='exp/nnet2_online/nnet_ms_a', pnorm_input_dim=None, pnorm_output_dim=None, splice_indexes='layer0/-1:0:1 layer1/-2:1 layer2/-4:2', total_input_dim=None)
['', '0/-1:0:1 ', '1/-2:1 ', '2/-4:2']
[-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
nnet_left_context=7; nnet_right_context=4 first_left_context=1; first_right_context=1
steps/nnet2/train_multisplice_accel2.sh: calling get_lda.sh
steps/nnet2/get_lda.sh --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --transform-dir exp/tri4b_ali_si284 --left-context 1 --right-context 1 --cmd run.pl --mem 8G data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a
steps/nnet2/get_lda.sh: feature type is raw
feat-to-dim 'ark,s,cs:utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- |' - 
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- 
WARNING (feat-to-dim[5.5.839~1-0c6a]:Close():kaldi-io.cc:515) Pipe utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | had nonzero return status 36096
feat-to-dim scp:exp/nnet2_online/ivectors_train_si284/ivector_online.scp - 
feat-to-dim "ark,s,cs:utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:- |" - 
apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- 
splice-feats --left-context=1 --right-context=1 ark:- ark:- 
paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:- 
ivector-randomize --randomize-prob=0.0 ark:- ark:- 
subsample-feats --n=-10 scp:- ark:- 
WARNING (feat-to-dim[5.5.839~1-0c6a]:Close():kaldi-io.cc:515) Pipe utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:- | had nonzero return status 36096
steps/nnet2/get_lda.sh: Accumulating LDA statistics.
steps/nnet2/get_lda.sh: Finished estimating LDA
steps/nnet2/train_multisplice_accel2.sh: calling get_egs2.sh
steps/nnet2/get_egs2.sh --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --transform-dir exp/tri4b_ali_si284 --left-context 7 --right-context 4 --samples-per-iter 400000 --stage 0 --io-opts --max-jobs-run 12 --cmd run.pl --mem 8G --frames-per-eg 8 data/train_si284_hires exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a/egs
steps/nnet2/get_egs2.sh: feature type is raw
feat-to-dim scp:exp/nnet2_online/ivectors_train_si284/ivector_online.scp - 
steps/nnet2/get_egs2.sh: working out number of frames of training data
steps/nnet2/get_egs2.sh: creating 4 archives, each with 302208 egs, with
steps/nnet2/get_egs2.sh:   8 labels per example, and (left,right) context = (7,4)
steps/nnet2/get_egs2.sh: Getting validation and training subset examples.
steps/nnet2/get_egs2.sh: ... extracting validation and training-subset alignments.
copy-int-vector ark:- ark,t:- 
LOG (copy-int-vector[5.5.839~1-0c6a]:main():copy-int-vector.cc:83) Copied 16961 vectors of int32.
... Getting subsets of validation examples for diagnostics and combination.
steps/nnet2/get_egs2.sh: Generating training examples on disk
steps/nnet2/get_egs2.sh: recombining and shuffling order of archives on disk
steps/nnet2/get_egs2.sh: removing temporary archives
steps/nnet2/get_egs2.sh: Finished preparing training examples
steps/nnet2/train_multisplice_accel2.sh: initializing neural net
steps/nnet2/make_multisplice_configs.py --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 --total-input-dim 140 --ivector-dim 100 --lda-mat exp/nnet2_online/nnet_ms_a/lda.mat --lda-dim 220 --pnorm-input-dim 2000 --pnorm-output-dim 250 --online-preconditioning-opts alpha=4.0 num-samples-history=2000 update-period=4 rank-in=20 rank-out=80 max-change-per-sample=0.075 --initial-learning-rate 0.005 --bias-stddev 0.5 --num-hidden-layers 4 --num-targets 7068 configs exp/nnet2_online/nnet_ms_a
Namespace(bias_stddev=0.5, initial_learning_rate=0.005, ivector_dim=100, lda_dim='220', lda_mat='exp/nnet2_online/nnet_ms_a/lda.mat', mode='configs', num_hidden_layers=4, num_targets=7068, online_preconditioning_opts='alpha=4.0 num-samples-history=2000 update-period=4 rank-in=20 rank-out=80 max-change-per-sample=0.075', output_dir='exp/nnet2_online/nnet_ms_a', pnorm_input_dim=2000, pnorm_output_dim=250, splice_indexes='layer0/-1:0:1 layer1/-2:1 layer2/-4:2', total_input_dim=140)
['', '0/-1:0:1 ', '1/-2:1 ', '2/-4:2']
[-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
Training transition probabilities and setting priors
prepare initial vector for FixedScaleComponent before softmax
use priors^-0.25 and rescale to average 1
insert an additional layer of FixedScaleComponent before softmax
nnet-am-info exp/nnet2_online/nnet_ms_a/0.mdl 
LOG (nnet-am-info[5.5.839~1-0c6a]:main():nnet-am-info.cc:76) Printed info about exp/nnet2_online/nnet_ms_a/0.mdl
nnet-init exp/nnet2_online/nnet_ms_a/per_element.config - 
LOG (nnet-init[5.5.839~1-0c6a]:main():nnet-init.cc:69) Initialized raw neural net and wrote it to -
nnet-insert --insert-at=6 --randomize-next-component=false exp/nnet2_online/nnet_ms_a/0.mdl - exp/nnet2_online/nnet_ms_a/0.mdl 
LOG (nnet-insert[5.5.839~1-0c6a]:main():nnet-insert.cc:106) Inserted 1 components at position 6
LOG (nnet-insert[5.5.839~1-0c6a]:main():nnet-insert.cc:132) Write neural-net acoustic model to exp/nnet2_online/nnet_ms_a/0.mdl
steps/nnet2/train_multisplice_accel2.sh: Will train for 8 epochs = 256 iterations
steps/nnet2/train_multisplice_accel2.sh: Will not do mix up
On iteration 0, learning rate is 0.005.
Training neural net (pass 0)
bash: line 1: 14088 Killed                  ( nnet-train-simple --minibatch-size=256 --srand=0 "nnet-am-copy --learning-rate=0.005 exp/nnet2_online/nnet_ms_a/0.mdl -|" "ark,bg:nnet-copy-egs --frame=0 ark:exp/nnet2_online/nnet_ms_a/egs/egs.1.ark ark:-|nnet-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-|" exp/nnet2_online/nnet_ms_a/1.1.mdl ) 2>> exp/nnet2_online/nnet_ms_a/log/train.0.1.log >> exp/nnet2_online/nnet_ms_a/log/train.0.1.log
run.pl: job failed, log is in exp/nnet2_online/nnet_ms_a/log/train.0.1.log
steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training
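
(Side note on the log above: the reported nnet_left_context=7 and nnet_right_context=4 are consistent with summing, over the layers in --splice-indexes, the most negative and the most positive offset of each layer. The sketch below only reproduces that arithmetic. The function name total_context is hypothetical, and this is not claimed to be how make_multisplice_configs.py computes it.)

```python
def total_context(splice_indexes: str) -> tuple:
    """Sum per-layer extreme offsets of a 'layerN/a:b:c ...' string into (left, right) context."""
    total_min = 0
    total_max = 0
    for layer in splice_indexes.split():
        # "layer1/-2:1" -> offsets [-2, 1]
        offsets = [int(x) for x in layer.split("/", 1)[1].split(":")]
        total_min += min(offsets)
        total_max += max(offsets)
    # Context sizes are reported as non-negative frame counts.
    return max(0, -total_min), max(0, total_max)


print(total_context("layer0/-1:0:1 layer1/-2:1 layer2/-4:2"))  # (7, 4)
```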

Daniel Povey

Feb 23, 2021, 3:24:30 AM
to kaldi-help
Killed; possibly out of memory.
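
For readers of the log: two different statuses appear in it. The bash "Killed" line means the process received SIGKILL, which is what the kernel's out-of-memory killer sends. The earlier "nonzero return status 36096" warnings are a separate matter. Below is a minimal sketch for decoding such a number, assuming it is a raw POSIX wait status with the exit code in the high byte. The helper names are hypothetical and not part of Kaldi.

```python
import signal


def signal_name(num: int) -> str:
    """Return the symbolic name for a signal number, or a numeric fallback."""
    try:
        return signal.Signals(num).name
    except ValueError:
        return f"signal {num}"


def decode_wait_status(status: int) -> str:
    """Decode a raw 16-bit POSIX wait status (assumed layout: exit code in the high byte)."""
    term_signal = status & 0x7F       # non-zero if the process itself died from a signal
    exit_code = (status >> 8) & 0xFF  # exit code if it exited normally
    if term_signal:
        return f"killed by {signal_name(term_signal)}"
    if exit_code > 128:
        # Shell convention: exit code 128 + N means a child was killed by signal N.
        return f"exit code {exit_code} (128 + {signal_name(exit_code - 128)})"
    return f"exit code {exit_code}"


print(decode_wait_status(36096))
```

Since 36096 = 141 * 256 and 141 = 128 + 13, the status points to signal 13 (SIGPIPE on Linux). That would fit a reader closing the pipe before the writer finished, and it is probably unrelated to the training failure.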

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/tencent_A010B636D85FB204B435892D8DFDEC272907%40qq.com.

Daniel Povey

Feb 23, 2021, 3:25:17 AM
to kaldi-help
This error has been reported before, but I was not able to fix it because no one has sent me the files required to reproduce it. If you could do that, it would be great.
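
One way to gather material for such a report is to rerun the failing command and keep the full log of every run that fails. This is a minimal sketch, assuming the decode can be launched as a single command line. The function name, the log file naming, and the placeholder command are all hypothetical; substitute the real decoder invocation.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def collect_failures(cmd, runs, out_dir):
    """Run `cmd` `runs` times; save the combined output of each failing run, return their indices."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for i in range(runs):
        proc = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        # Kaldi error lines start with "ERROR (", as in the message quoted in this thread.
        if proc.returncode != 0 or "ERROR (" in proc.stdout:
            (out / f"run_{i:04d}.log").write_text(proc.stdout)
            failed.append(i)
    return failed


if __name__ == "__main__":
    # Placeholder command that always succeeds; replace with the real decode command.
    placeholder = [sys.executable, "-c", "print('LOG (demo) decoded ok')"]
    with tempfile.TemporaryDirectory() as tmp:
        print(collect_failures(placeholder, runs=3, out_dir=tmp))
```

The saved logs, together with the model, graph, and audio file used, are the kind of material that makes an intermittent failure reproducible.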


软件开发工作经验

Feb 23, 2021, 3:36:44 AM
to kaldi-help
Total available memory is 61G and available swap is 1.1G. However, I ran './local/online/run_nnet2.sh' directly rather than 'run.sh', because I thought run.sh would take a long time. Please check this for me, thanks.
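
The totals quoted here do not show how much memory was actually free while nnet-train-simple was running. Below is a minimal sketch, assuming a Linux host with /proc/meminfo, for checking that just before or during training. The function names are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value} (kB for sized fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def report(info):
    """Return MemAvailable and SwapFree converted from kB to GiB, rounded to one decimal."""
    kib_per_gib = 1024 ** 2
    return {
        name: round(info[name] / kib_per_gib, 1)
        for name in ("MemAvailable", "SwapFree")
        if name in info
    }


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        print(report(parse_meminfo(meminfo.read_text())))
```

If MemAvailable is already small before training starts, another process is holding the memory and reducing the minibatch size is unlikely to help.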


------------------ Original message ------------------
From: "kaldi-help" <dpo...@gmail.com>;
Sent: Tuesday, February 23, 2021, 4:24 PM
To: "kaldi-help"<kaldi...@googlegroups.com>;
Subject: Re: [kaldi-help] steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training

软件开发工作经验

Feb 23, 2021, 8:41:38 PM
to kaldi-help
I tried the following:
decreased minibatch-size to 64
decreased epochs to 4
ran nvidia-smi -c 1
I still get the error. Is there another way to resolve the problem?
Thanks

Daniel Povey

Feb 23, 2021, 9:37:00 PM
to kaldi-help
That nnet2 code is very old and rarely used; I don't want to spend time supporting it. nnet3 is much better (but requires a GPU).


软件开发工作经验

Feb 23, 2021, 9:44:59 PM
to kaldi-help
Thanks.
I need an online mode that can recognize audio as it arrives in chunks. A web page I read said that a DNN with p-norm gives good recognition results, and nnet2 is the DNN-with-p-norm setup. If I use nnet3 instead, which directory under egs should I choose?
I am short on time; perhaps you could point me to some articles about nnet2. Thanks.


------------------ Original message ------------------
From: "kaldi-help" <dpo...@gmail.com>;
Sent: Wednesday, February 24, 2021, 10:36 AM
To: "kaldi-help"<kaldi...@googlegroups.com>;
Subject: Re: [kaldi-help] nnet-train-simple out of memory,

Daniel Povey

Feb 23, 2021, 9:45:47 PM
to kaldi-help
nnet3 can do that too. That page must have been out of date. I recommend following the mini_librispeech example.

软件开发工作经验

Feb 23, 2021, 9:51:04 PM
to kaldi-help
Can I use nnet3 with the wsj recipe?
# # A couple of nnet3 recipes:
# local/nnet3/run_tdnn_baseline.sh  # designed for exact comparison with nnet2 recipe
# local/nnet3/run_tdnn.sh  # better absolute results
# local/nnet3/run_lstm.sh  # lstm recipe
# bidirectional lstm recipe
# local/nnet3/run_lstm.sh --affix bidirectional \
#                         --lstm-delay " [-1,1] [-2,2] [-3,3] " \
#                         --label-delay 0 \
#                         --cell-dim 640 \
#                         --recurrent-projection-dim 128 \
#                         --non-recurrent-projection-dim 128 \
#                         --chunk-left-context 40 \
#                         --chunk-right-context 40



------------------ Original message ------------------
From: "kaldi-help" <dpo...@gmail.com>;
Sent: Wednesday, February 24, 2021, 10:45 AM