online decode

aliiire...@gmail.com

Jan 13, 2019, 8:29:52 AM

to kaldi-help

Hi

When I decode with "online2-wav-nnet3-latgen-faster", it seems to first load the models into RAM and then decode. When decoding ends, the RAM is freed, so to decode another wave file the models must be loaded into RAM again.

Is it possible to keep the model in RAM and decode new wave files?

best regards

Daniel Povey

Jan 13, 2019, 1:31:25 PM

to kaldi-help

When the program exits it will obviously free all resources, but it only loads the model once. You'd have to write some kind of server program based on it if you wanted to keep the things in memory.

There is a PR here https://github.com/kaldi-asr/kaldi/pull/2938

Dan
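
A minimal sketch of the "server program" idea: load the model once at startup, keep it in memory, and decode each incoming request with the same object. Everything here is hypothetical (`LoadedModel`, `DecodeHandler`, `make_server`, the reply text); it does not call Kaldi and is not the code from the PR, it only shows the load-once pattern.

```python
import socketserver


class LoadedModel:
    """Hypothetical stand-in for the acoustic model and decoding graph."""

    load_count = 0  # counts how many times the expensive load ran

    def __init__(self, model_path: str):
        # A real server would read the model and graph files here.
        LoadedModel.load_count += 1
        self.model_path = model_path

    def decode(self, audio: bytes) -> str:
        # Placeholder result; a real server would run the decoder on the audio.
        return f"decoded {len(audio)} bytes with {self.model_path}"


class DecodeHandler(socketserver.StreamRequestHandler):
    """Handles one client: read all audio bytes, reply with the transcript."""

    def handle(self):
        audio = self.rfile.read()  # returns once the client closes its write side
        reply = self.server.model.decode(audio)
        self.wfile.write(reply.encode("utf-8"))


def make_server(model_path: str, host: str = "127.0.0.1", port: int = 0):
    """Load the model once and attach it to a TCP server (port 0 picks a free port)."""
    server = socketserver.TCPServer((host, port), DecodeHandler)
    server.model = LoadedModel(model_path)  # stays in RAM for every request
    return server
```

Calling `make_server("final.mdl").serve_forever()` would then answer any number of clients without reloading, because the model object lives as long as the server process.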

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/509e204b-f288-42b8-9f08-0350b0d315ce%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

aliiire...@gmail.com

Feb 26, 2019, 3:38:47 AM

to kaldi-help

Thanks, all.

I cloned that PR and rebuilt src, but I get "bash: online2-net-nnet3-latgen-faster: command not found".

What is wrong?

akbar

Mar 2, 2019, 11:56:55 AM

to kaldi-help

Sorry about that. I tested it and it works really well. Thanks, all.

In my tests, decoding does not detect the start and end of speech well: it cuts off the end of the speech or fails to recognize it. Consonant phones (non-vowels) are also not recognized well. How can I improve this?

Which chain configuration do you suggest for online decoding?

I used this:


online2-net-nnet3-latgen-faster --samp-freq=16000 --frames-per-chunk=20 --extra-left-context-initial=0 \
	--frame-subsampling-factor=3 --config=conf/conf/online.conf --min-active=200 --max-active=7000 \
	--beam=15.0 --lattice-beam=6.0 --acoustic-scale=1.0 \
	--debug-computation=true --computation.debug=true \
	exp/chain/tdnn3g_sp/final.mdl exp/chain/tree_a_sp//graph/HCLG.fst exp/chain/tree_a_sp//graph/words.txt 7000

best regards

Daniel Povey

Mar 2, 2019, 11:58:43 AM

to kaldi-help

Possibly there is some kind of mismatch in acoustic conditions between training and test?

Data augmentation during training (adding noise/reverb) often helps in these scenarios, but it may be complicated for you to set up.

Dan
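
As an illustration of what "adding noise" means, here is a minimal sketch that mixes a noise signal into speech at a chosen signal-to-noise ratio. It is not what the Kaldi augmentation scripts do internally; the function names and the plain-list signal representation are assumptions made for the example.

```python
import math


def power(samples):
    """Mean power of a signal given as a list of float samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Return speech + scaled noise so that the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_speech, p_noise = power(speech), power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # Choose the scale so that p_speech / (scale^2 * p_noise) == 10^(snr_db / 10).
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Lower `snr_db` values give noisier training copies; a typical setup would draw the SNR at random from a range per utterance.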

akbar

Mar 2, 2019, 12:38:25 PM

to kaldi-help

Thanks.

A small question. When I use utils/data/perturb_data_dir_speed_3way.sh, the utterance IDs in utt2uniq have a prefix, like this:

sp1.1-sk005701_c75321d7-3495-42e1-9275-0f93e9a92061 sk005701_c75321d7-3495-42e1-9275-0f93e9a92061

However, when I augment the data, the utterance IDs get a suffix instead, like this:

sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916-babble

and utt2uniq looks like this:

sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916 sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916
sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916-babble sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916
sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916-noise sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916
sp0.9-sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916 sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916
sp1.1-sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916 sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916

Does this cause a problem when training a chain model?

If so, is it possible to change the suffixes to prefixes? I have already prepared my training data (extracted i-vectors, etc.) and it is almost ready for training the chain model on GPU.

Thanks.

Daniel Povey

Mar 2, 2019, 12:39:17 PM

to kaldi-help, David Snyder

I don't think there is a problem (David might know).

Just run the script, if it runs, it is probably fine.
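
A quick sanity check one could run on utt2uniq before training, sketched under the assumption that what matters is the mapping from each copy to its source utterance rather than whether the marker is a prefix or a suffix. The function name `group_by_source` is made up for this example; the sample lines are the ones quoted above.

```python
from collections import defaultdict


def group_by_source(utt2uniq_lines):
    """Group utterance IDs by the unique (source) ID in the second column."""
    groups = defaultdict(list)
    for line in utt2uniq_lines:
        utt_id, uniq_id = line.split()
        groups[uniq_id].append(utt_id)
    return dict(groups)


base = "sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916"
lines = [
    f"{base} {base}",
    f"{base}-babble {base}",
    f"{base}-noise {base}",
    f"sp0.9-{base} {base}",
    f"sp1.1-{base} {base}",
]
groups = group_by_source(lines)
# Prefix-style (sp0.9-) and suffix-style (-babble) copies end up in the same group.
print(len(groups), len(groups[base]))
```

If every perturbed or augmented copy lands in the group of its original utterance, the file is at least internally consistent.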

David Snyder

Mar 3, 2019, 4:56:02 PM

to kaldi-help

I forwarded this email to Phani who might know better.

Akbar, what scripts did you use to augment your data? Where do the "-babble" and "-noise" suffixes come from?

akbar

Mar 3, 2019, 5:59:43 PM

to kaldi-help

Thanks, all.

I followed the SRE16 recipe:

https://github.com/kaldi-asr/kaldi/blob/2e26464accd93b6d9949406e2dbda9450273e5f1/egs/sre16/v2/run.sh#L148

I did not pay attention to perturb_data_dir_speed_3way.sh, and I had to set the augment_data_dir.py parameters to augment the data like perturb_data.

I want to know whether those suffixes hurt chain model training

(https://github.com/kaldi-asr/kaldi/blob/2e26464accd93b6d9949406e2dbda9450273e5f1/egs/wsj/s5/local/chain/tuning/run_tdnn_1g.sh#L213)

and whether I must change them to prefixes.

best regards

Daniel Povey

Mar 3, 2019, 6:02:11 PM

to kaldi-help

The real difference is in whether the speaker identities are changed when you add babble, noise, etc. This can make a bit of difference in systems with i-vectors. For speaker-identification applications you want the identities to remain the same, but for ASR they should probably be different, since we want the i-vectors to reflect the acoustic conditions and not just the speaker identity. But it could also be more robust to train with mismatched i-vectors. Bottom line: it probably won't make a huge difference; we may do some tests to figure out the best strategy, but for now, you can just ignore the issues.
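
The two labeling strategies can be sketched as follows. This is only an illustration: the helper names, the `babble` tag, and the assumption that the speaker ID is the part of the utterance ID before the underscore are all hypothetical, and neither function is taken from a Kaldi script.

```python
def same_speaker_copy(utt_id, spk_id, tag):
    """Suffix style: the copy gets a new utterance ID but keeps its speaker ID."""
    return f"{utt_id}-{tag}", spk_id


def new_speaker_copy(utt_id, spk_id, tag):
    """Prefix style: utterance and speaker both get the tag, so each
    acoustic condition is treated as a separate 'speaker'."""
    return f"{tag}-{utt_id}", f"{tag}-{spk_id}"


utt = "sk000002_1a0be8f0-7086-4391-adc9-c07b7903c916"
spk = "sk000002"  # assumed speaker ID, not given in the thread

kept = same_speaker_copy(utt, spk, "babble")
split = new_speaker_copy(utt, spk, "babble")
# In both styles the speaker ID remains a prefix of the utterance ID.
print(kept)
print(split)
```

With the first style, i-vector statistics are shared across clean and noisy copies of a speaker; with the second, each condition gets its own.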

David Snyder

Mar 3, 2019, 6:03:47 PM

to kaldi-help

Unfortunately the speaker recognition augmentation script (augment_data_dir.py) doesn't currently work for ASR, without some modifications. Phani might be able to explain what those are.

For now, you need to use the script reverberate_data_dir.py. This also supports adding noises. You can find example usage in the ASPIRE recipe.

akbar

Mar 18, 2019, 12:34:10 PM

to kaldi-help

Hi Dan,

As I understand it, online decoding does not do any rescoring, so its error rate is worse.

Is it possible to add some rescoring to these scripts to get a lower error rate, for example at a time cost of about 1/4 of the wave duration?

How can I change them?

Daniel Povey

Mar 18, 2019, 12:34:45 PM

to kaldi-help

It outputs lattices the same as any other decoding script, and they can be rescored in the same way.

akbar

Apr 13, 2019, 4:09:39 PM

to kaldi-help

> Possibly there is some kind of mismatch in acoustic conditions between training and test?
> Data augmentation during training (adding noise/reverb) often helps in these scenarios, but it may be complicated for you to set up.

Hi Dan,

I augmented the data using, as I said, the scripts steps/data/augment_data_dir.py and steps/data/reverberate_data_dir.py.

I also applied speed perturbation and volume perturbation. 3-way speed perturbation plus 3-fold augmentation comes to 7-fold data.

My dataset is about 250 hours.

I used the WSJ TDNN network configuration.

Decoding with nnet3-latgen-faster-parallel gives the same result as the model trained with only 3-way speed perturbation: the WER is about 10% for both.

But the online results from online2-wav-nnet3-latgen-faster are terrible.

With the model trained on only 3-way speed-perturbed data, the WER is about 25%:

%WER 25.50 [ 14772 / 57940, 1875 ins, 2772 del, 10125 sub ] exp/chain/tdnn3g_sp_online/decode_bglm/wer_7_1.0

But with the model trained on augmented data, the WER is about 98%:

%WER 98.66 [ 57166 / 57940, 482 ins, 39417 del, 17267 sub ]

It cannot recognize the utterances. The results look like this:

hyp *** *** *** *** *** *** <unk>
op D D D D D D S

hyp *** *** *** *** *** *** *** *** *** <unk>
op D D D D D D D D D S

What do you suggest? What is wrong?
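
To see what kind of errors dominate, the two %WER lines above can be broken down by error type. This is a small sketch; the regular expression and the function names are written for this example and assume the line layout shown in the post.

```python
import re

WER_RE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*(\d+)\s+ins,"
    r"\s*(\d+)\s+del,\s*(\d+)\s+sub\s*\]"
)


def parse_wer(line):
    """Extract the counts from a '%WER ... [ err / total, I ins, D del, S sub ]' line."""
    match = WER_RE.search(line)
    if match is None:
        raise ValueError(f"not a %WER line: {line!r}")
    errors, total, ins, dele, sub = (int(g) for g in match.groups()[1:])
    return {"wer": float(match.group(1)), "errors": errors, "total": total,
            "ins": ins, "del": dele, "sub": sub}


def error_shares(line):
    """Fraction of all errors that are insertions, deletions and substitutions."""
    r = parse_wer(line)
    if r["ins"] + r["del"] + r["sub"] != r["errors"]:
        raise ValueError("error counts do not add up")
    return {k: r[k] / r["errors"] for k in ("ins", "del", "sub")}


speed_only = "%WER 25.50 [ 14772 / 57940, 1875 ins, 2772 del, 10125 sub ]"
augmented = "%WER 98.66 [ 57166 / 57940, 482 ins, 39417 del, 17267 sub ]"
print(error_shares(speed_only))
print(error_shares(augmented))
```

For the augmented model more than two thirds of the errors are deletions, which matches the alignments above where nearly every reference word is dropped.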

Daniel Povey

Apr 13, 2019, 4:11:21 PM

to kaldi-help

> Hi Dan
> I augment data and as I said used script steps/data/augment_data_dir.py and steps/data/reverberate_data_dir.py.
> also, I do speed perturb and volume perturb. There is 3-way speed perturb + 3 fold augmentation equal 7 fold data.
> my dataset is about 250 hours.
> I used wsj tdnn configure network.
> The result decode by nnet3-latgen-faster-parallel is the same the model that used only 3-way speed perturbed. both wer is about 10%.

The multi-style training / reverb+noise stuff mostly helps for out-of-domain data; it may not help much for data from the same source.

Regarding your problems when running online2-wav-nnet3-latgen-faster: that sounds like a script bug, e.g. using the wrong config or graph or something.

Dan
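
One way to look for a "wrong config" is to compare the options in the config used for training with those in the config passed to the online decoder. The sketch below parses the usual `--name=value` config format and reports differences; the function names and the two example configs are hypothetical and are not taken from this thread.

```python
def parse_kaldi_conf(text):
    """Parse '--name=value' lines; '#' starts a comment, bare '--flag' means true."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if not line.startswith("--"):
            raise ValueError(f"unexpected config line: {raw!r}")
        name, _, value = line[2:].partition("=")
        options[name] = value if value else "true"
    return options


def diff_confs(train, online):
    """Return {option: (train_value, online_value)} for every option that differs."""
    names = sorted(set(train) | set(online))
    return {n: (train.get(n), online.get(n)) for n in names
            if train.get(n) != online.get(n)}


# Hypothetical example configs, for illustration only.
train_conf = parse_kaldi_conf("--sample-frequency=16000\n--use-energy=false  # MFCC\n")
online_conf = parse_kaldi_conf("--sample-frequency=8000\n--use-energy=false\n")
print(diff_confs(train_conf, online_conf))
```

Any option that shows up in the diff is a candidate for the mismatch; the same idea applies to checking that the graph and model paths belong to the same experiment.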

Alex Gurianov

Apr 15, 2019, 2:15:28 AM

to kaldi-help

Hello,

Try decoding without the endpointing logic, or change the endpoint settings.

See the code comments in https://github.com/kaldi-asr/kaldi/blob/master/src/online2/online-endpoint.h

According to the online2-wav-nnet3-latgen-faster.cc implementation:

If a file starts with silence, you could get "* * * * *" as the result. If a file has silence after some speech, you could get something like "text * * * * *" as the result.

if (do_endpointing && decoder.EndpointDetected(endpoint_opts)) {
  break;
}

Best regards,
Alex
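
A toy simulation of why that early `break` produces deletions. The rule used here (stop after a fixed number of consecutive silent chunks) is a simplification invented for this sketch and is not one of the actual endpoint rules; `decode_chunks` and `max_trailing_silence` are hypothetical names.

```python
def decode_chunks(chunks, do_endpointing, max_trailing_silence=2):
    """Walk through audio chunks; each chunk is a recognized word or None for silence.

    With endpointing on, decoding stops as soon as the toy rule fires,
    and everything after that point is never decoded.
    """
    words = []
    trailing_silence = 0
    for chunk in chunks:
        if chunk is None:
            trailing_silence += 1
        else:
            trailing_silence = 0
            words.append(chunk)
        if do_endpointing and trailing_silence >= max_trailing_silence:
            break  # plays the role of the break in the decoding loop
    return words


leading_silence = [None, None, "hello", "world"]
mid_silence = ["hello", None, None, "world"]
print(decode_chunks(leading_silence, do_endpointing=True))
print(decode_chunks(mid_silence, do_endpointing=True))
print(decode_chunks(mid_silence, do_endpointing=False))
```

With endpointing on, the file that starts with silence yields nothing and the file with a pause loses everything after the pause; with endpointing off, all words are kept.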
