Getting error in online2-wav-nnet3-latgen-grammar

crpatel

Feb 23, 2021, 1:55:34 AM

to kaldi-help

Hi,

I got the following error in a few instances while decoding the same audio file 1000 times. Most runs decode successfully, but a few fail with this error:

Error occured in Kaldi loop
ERROR (ThreadedOnlineDecoder[5.5.0~1523-8283]:ExpandStateEnd():grammar-fst.cc:266) FST with index 0 ends with left-context-phone 92 but parent FST does not support that left-context at the return point.

Thanks in advance,

软件开发工作经验

Feb 23, 2021, 3:07:50 AM

to kaldi-help

Hi all, when I run run_nnet2.sh (from wsj/run.sh), I get errors like:

"

steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training

LOG (nnet-train-simple[5.5.839~1-0c6a]:AllocateNewRegion():cu-allocator.cc:506) About to allocate new memory region of 2097152 bytes; current memory info is: free:0M, used:0M, total:0M, free/total:1

# Accounting: time=441 threads=1

"

There is no error in the log, so why did it stop?

The output of the run is:

steps/nnet2/train_multisplice_accel2.sh --stage -10 --exit-stage -100 --num-epochs 8 --num-jobs-initial 1 --num-jobs-final 1 --num-hidden-layers 4 --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --cmvn-opts --norm-means=false --norm-vars=false --num-threads 1 --minibatch-size 512 --parallel-opts --gpu 1 --io-opts --max-jobs-run 12 --initial-effective-lrate 0.005 --final-effective-lrate 0.0005 --cmd run.pl --mem 8G --pnorm-input-dim 2000 --pnorm-output-dim 250 --mix-up 12000 data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a

data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a

steps/nnet2/make_multisplice_configs.py contexts --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 exp/nnet2_online/nnet_ms_a

Namespace(bias_stddev=None, initial_learning_rate=None, ivector_dim=None, lda_dim=None, lda_mat=None, mode='contexts', num_hidden_layers=None, num_targets=None, online_preconditioning_opts=None, output_dir='exp/nnet2_online/nnet_ms_a', pnorm_input_dim=None, pnorm_output_dim=None, splice_indexes='layer0/-1:0:1 layer1/-2:1 layer2/-4:2', total_input_dim=None)

['', '0/-1:0:1 ', '1/-2:1 ', '2/-4:2']

[-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

nnet_left_context=7; nnet_right_context=4 first_left_context=1; first_right_context=1

steps/nnet2/train_multisplice_accel2.sh: calling get_lda.sh

steps/nnet2/get_lda.sh --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --transform-dir exp/tri4b_ali_si284 --left-context 1 --right-context 1 --cmd run.pl --mem 8G data/train_si284_hires data/lang exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a

steps/nnet2/get_lda.sh: feature type is raw

feat-to-dim 'ark,s,cs:utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- |' -

apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:-

WARNING (feat-to-dim[5.5.839~1-0c6a]:Close():kaldi-io.cc:515) Pipe utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | had nonzero return status 36096

feat-to-dim scp:exp/nnet2_online/ivectors_train_si284/ivector_online.scp -

feat-to-dim "ark,s,cs:utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:- |" -

apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:-

splice-feats --left-context=1 --right-context=1 ark:- ark:-

paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:-

ivector-randomize --randomize-prob=0.0 ark:- ark:-

subsample-feats --n=-10 scp:- ark:-

WARNING (feat-to-dim[5.5.839~1-0c6a]:Close():kaldi-io.cc:515) Pipe utils/subset_scp.pl --quiet 434 data/train_si284_hires/split23/1/feats.scp | apply-cmvn --norm-means=false --norm-vars=false --utt2spk=ark:data/train_si284_hires/split23/1/utt2spk scp:data/train_si284_hires/split23/1/cmvn.scp scp:- ark:- | splice-feats --left-context=1 --right-context=1 ark:- ark:- | paste-feats --length-tolerance=10 ark:- 'ark,s,cs:utils/filter_scp.pl data/train_si284_hires/split23/1/utt2spk exp/nnet2_online/ivectors_train_si284/ivector_online.scp | subsample-feats --n=-10 scp:- ark:- | ivector-randomize --randomize-prob=0.0 ark:- ark:- |' ark:- | had nonzero return status 36096

steps/nnet2/get_lda.sh: Accumulating LDA statistics.

steps/nnet2/get_lda.sh: Finished estimating LDA

steps/nnet2/train_multisplice_accel2.sh: calling get_egs2.sh

steps/nnet2/get_egs2.sh --cmvn-opts --norm-means=false --norm-vars=false --feat-type raw --online-ivector-dir exp/nnet2_online/ivectors_train_si284 --transform-dir exp/tri4b_ali_si284 --left-context 7 --right-context 4 --samples-per-iter 400000 --stage 0 --io-opts --max-jobs-run 12 --cmd run.pl --mem 8G --frames-per-eg 8 data/train_si284_hires exp/tri4b_ali_si284 exp/nnet2_online/nnet_ms_a/egs

steps/nnet2/get_egs2.sh: feature type is raw

feat-to-dim scp:exp/nnet2_online/ivectors_train_si284/ivector_online.scp -

steps/nnet2/get_egs2.sh: working out number of frames of training data

steps/nnet2/get_egs2.sh: creating 4 archives, each with 302208 egs, with

steps/nnet2/get_egs2.sh: 8 labels per example, and (left,right) context = (7,4)

steps/nnet2/get_egs2.sh: Getting validation and training subset examples.

steps/nnet2/get_egs2.sh: ... extracting validation and training-subset alignments.

copy-int-vector ark:- ark,t:-

LOG (copy-int-vector[5.5.839~1-0c6a]:main():copy-int-vector.cc:83) Copied 16961 vectors of int32.

... Getting subsets of validation examples for diagnostics and combination.

steps/nnet2/get_egs2.sh: Generating training examples on disk

steps/nnet2/get_egs2.sh: recombining and shuffling order of archives on disk

steps/nnet2/get_egs2.sh: removing temporary archives

steps/nnet2/get_egs2.sh: Finished preparing training examples

steps/nnet2/train_multisplice_accel2.sh: initializing neural net

steps/nnet2/make_multisplice_configs.py --splice-indexes layer0/-1:0:1 layer1/-2:1 layer2/-4:2 --total-input-dim 140 --ivector-dim 100 --lda-mat exp/nnet2_online/nnet_ms_a/lda.mat --lda-dim 220 --pnorm-input-dim 2000 --pnorm-output-dim 250 --online-preconditioning-opts alpha=4.0 num-samples-history=2000 update-period=4 rank-in=20 rank-out=80 max-change-per-sample=0.075 --initial-learning-rate 0.005 --bias-stddev 0.5 --num-hidden-layers 4 --num-targets 7068 configs exp/nnet2_online/nnet_ms_a

Namespace(bias_stddev=0.5, initial_learning_rate=0.005, ivector_dim=100, lda_dim='220', lda_mat='exp/nnet2_online/nnet_ms_a/lda.mat', mode='configs', num_hidden_layers=4, num_targets=7068, online_preconditioning_opts='alpha=4.0 num-samples-history=2000 update-period=4 rank-in=20 rank-out=80 max-change-per-sample=0.075', output_dir='exp/nnet2_online/nnet_ms_a', pnorm_input_dim=2000, pnorm_output_dim=250, splice_indexes='layer0/-1:0:1 layer1/-2:1 layer2/-4:2', total_input_dim=140)

['', '0/-1:0:1 ', '1/-2:1 ', '2/-4:2']

[-7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4]

Training transition probabilities and setting priors

prepare initial vector for FixedScaleComponent before softmax

use priors^-0.25 and rescale to average 1

insert an additional layer of FixedScaleComponent before softmax

nnet-am-info exp/nnet2_online/nnet_ms_a/0.mdl

LOG (nnet-am-info[5.5.839~1-0c6a]:main():nnet-am-info.cc:76) Printed info about exp/nnet2_online/nnet_ms_a/0.mdl

nnet-init exp/nnet2_online/nnet_ms_a/per_element.config -

LOG (nnet-init[5.5.839~1-0c6a]:main():nnet-init.cc:69) Initialized raw neural net and wrote it to -

nnet-insert --insert-at=6 --randomize-next-component=false exp/nnet2_online/nnet_ms_a/0.mdl - exp/nnet2_online/nnet_ms_a/0.mdl

LOG (nnet-insert[5.5.839~1-0c6a]:main():nnet-insert.cc:106) Inserted 1 components at position 6

LOG (nnet-insert[5.5.839~1-0c6a]:main():nnet-insert.cc:132) Write neural-net acoustic model to exp/nnet2_online/nnet_ms_a/0.mdl

steps/nnet2/train_multisplice_accel2.sh: Will train for 8 epochs = 256 iterations

steps/nnet2/train_multisplice_accel2.sh: Will not do mix up

On iteration 0, learning rate is 0.005.

Training neural net (pass 0)

bash: line 1: 14088 Killed ( nnet-train-simple --minibatch-size=256 --srand=0 "nnet-am-copy --learning-rate=0.005 exp/nnet2_online/nnet_ms_a/0.mdl -|" "ark,bg:nnet-copy-egs --frame=0 ark:exp/nnet2_online/nnet_ms_a/egs/egs.1.ark ark:-|nnet-shuffle-egs --buffer-size=5000 --srand=0 ark:- ark:-|" exp/nnet2_online/nnet_ms_a/1.1.mdl ) 2>> exp/nnet2_online/nnet_ms_a/log/train.0.1.log >> exp/nnet2_online/nnet_ms_a/log/train.0.1.log

run.pl: job failed, log is in exp/nnet2_online/nnet_ms_a/log/train.0.1.log

steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training

Daniel Povey

Feb 23, 2021, 3:24:30 AM

to kaldi-help

Killed; possibly out of memory.
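
For readers hitting the same symptoms, the numeric clues in the log above can be decoded with a little arithmetic. The following is a minimal sketch, not part of Kaldi: `decode_wait_status` is a hypothetical helper name, and it assumes that the "nonzero return status" in the feat-to-dim warnings is a raw wait status (exit code in the high byte) and that the shell uses the usual "128 + signal number" convention for exit codes.

```shell
# Hypothetical helper (not part of Kaldi): decode a raw wait status such as
# the "nonzero return status 36096" printed in the feat-to-dim warnings.
# Assumptions: the exit code sits in the high byte of the status, and exit
# codes above 128 follow the shell convention "128 + signal number".
decode_wait_status() {
  dws_status=$1
  dws_code=$(( (dws_status >> 8) & 255 ))  # exit code reported by the shell
  dws_low=$(( dws_status & 127 ))          # non-zero if killed directly by a signal
  if [ "$dws_low" -ne 0 ]; then
    echo "killed by signal $dws_low ($(kill -l "$dws_low"))"
  elif [ "$dws_code" -gt 128 ]; then
    dws_sig=$(( dws_code - 128 ))
    echo "exit code $dws_code = 128 + signal $dws_sig ($(kill -l "$dws_sig"))"
  else
    echo "exit code $dws_code"
  fi
}

decode_wait_status 36096   # status from the feat-to-dim pipe warnings
decode_wait_status 9       # a process killed outright by signal 9 ("Killed")
```

Under these assumptions, 36096 is 141 * 256, that is exit code 141 = 128 + 13, which corresponds to SIGPIPE. That is what one would expect when feat-to-dim closes the pipe after reading only the first matrix, so those warnings are probably harmless. The "Killed" line for nnet-train-simple is a different matter: it indicates signal 9 (SIGKILL). If the kernel's out-of-memory killer sent it, the kernel log usually says so; on most Linux systems something like `dmesg | grep -i -E 'out of memory|killed process'` (possibly as root) will show such entries.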

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/tencent_A010B636D85FB204B435892D8DFDEC272907%40qq.com.

Daniel Povey

Feb 23, 2021, 3:25:17 AM

to kaldi-help

This error has been reported before, but I was not able to fix it because no one has sent me the files required to reproduce it. If you could do that, it would be great.


软件开发工作经验

Feb 23, 2021, 3:36:44 AM

to kaldi-help

Total available memory is 61G and available swap is 1.1G. Note that I ran './local/online/run_nnet2.sh' directly rather than 'run.sh', because I think run.sh takes a long time. Please check this for me, thanks.
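
The total installed memory is not necessarily what a single job can use when it starts, since other processes and per-process limits also matter. As a rough sketch, assuming a Linux system that exposes /proc/meminfo (the helper name `mem_available_gib` is hypothetical), the following reports the memory currently available and the shell's virtual-memory limit just before launching training:

```shell
# Hypothetical helper: print MemAvailable from a meminfo-style file, in GiB.
# Assumption: Linux /proc/meminfo layout, with values given in kB.
mem_available_gib() {
  meminfo_file=${1:-/proc/meminfo}
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1048576 }' "$meminfo_file"
}

if [ -r /proc/meminfo ]; then
  echo "MemAvailable (GiB): $(mem_available_gib)"
fi
# "unlimited" means the shell imposes no virtual-memory cap on its children.
echo "Virtual memory limit (ulimit -v, kB): $(ulimit -v)"
```

If the available figure is far below the installed total, or the limit is not "unlimited", the job may be killed well before the machine's full memory is used.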

------------------ Original message ------------------

From: "kaldi-help" <dpo...@gmail.com>;

Sent: Tuesday, Feb 23, 2021, 4:24 PM

To: "kaldi-help" <kaldi...@googlegroups.com>;

Subject: Re: [kaldi-help] steps/nnet2/train_multisplice_accel2.sh: error on iteration 0 of training


软件开发工作经验

Feb 23, 2021, 8:41:38 PM

to kaldi-help

I tried the following:

decreased minibatch-size to 64

decreased epochs to 4

ran nvidia-smi -c 1

I still get the error. Is there another way to resolve the problem?

thanks

Daniel Povey

Feb 23, 2021, 9:37:00 PM

to kaldi-help

That nnet2 code is very old and rarely used; I don't want to spend time supporting it. nnet3 is much better (but requires a GPU).


软件开发工作经验

Feb 23, 2021, 9:44:59 PM

to kaldi-help

Thanks.

I need an online mode that recognizes audio chunk by chunk as the data arrives. On the web page, I read that DNNs with p-norm give good recognition results, and nnet2 is about DNNs and p-norm. So if I use nnet3, which directory in egs should I choose?

I am short on time; maybe you can point me to an article that would guide me on nnet2. Thanks.

------------------ Original message ------------------

From: "kaldi-help" <dpo...@gmail.com>;

Sent: Wednesday, Feb 24, 2021, 10:36 AM

To: "kaldi-help" <kaldi...@googlegroups.com>;

Subject: Re: [kaldi-help] nnet-train-simple out of memory,


Daniel Povey

Feb 23, 2021, 9:45:47 PM

to kaldi-help

nnet3 can do that too. That page must have been out of date. I recommend following the mini_librispeech example.


软件开发工作经验

Feb 23, 2021, 9:51:04 PM

to kaldi-help

Can I use nnet3 in wsj?

# # A couple of nnet3 recipes:
# local/nnet3/run_tdnn_baseline.sh # designed for exact comparison with nnet2 recipe
# local/nnet3/run_tdnn.sh # better absolute results
# local/nnet3/run_lstm.sh # lstm recipe
# bidirectional lstm recipe
# local/nnet3/run_lstm.sh --affix bidirectional \
#   --lstm-delay " [-1,1] [-2,2] [-3,3] " \
#   --label-delay 0 \
#   --cell-dim 640 \
#   --recurrent-projection-dim 128 \
#   --non-recurrent-projection-dim 128 \
#   --chunk-left-context 40 \
#   --chunk-right-context 40

------------------ Original message ------------------

From: "kaldi-help" <dpo...@gmail.com>;

Sent: Wednesday, Feb 24, 2021, 10:45 AM

