Are there recipes available for training nnet3 models without i-vectors?


anind...@snips.ai

Apr 18, 2017, 6:05:14 AM
to kaldi-help, Mael Primet
Dear all, 

Are there recipes available for training nnet3 models without using i-vectors?

I might be wrong, but all the nnet3 recipes I have checked so far include i-vectors. So I started editing the script at https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/run_tdnn.sh to try to remove the i-vector-related code. But this has proved somewhat involved, as various scripts involving i-vectors are called from this top-level one, e.g. https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/run_tdnn.sh#L58 , https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/run_tdnn.sh#L97 , and https://github.com/kaldi-asr/kaldi/blob/master/egs/librispeech/s5/local/chain/run_tdnn.sh#L123

In this context, any feedback on existing nnet3 recipes without i-vectors, or advice on editing the existing scripts to remove the i-vector integration, would be appreciated.

Thanks.

ASR_OCEAN

Apr 18, 2017, 6:25:01 AM
to kaldi-help, mael....@snips.ai
Hello,

You can check the updated Kaldi swbd recipe.

anind...@snips.ai

Apr 18, 2017, 6:37:57 AM
to kaldi-help, mael....@snips.ai
Hello,

Thanks for the quick response.

Following your advice, I checked the most recent nnet3 and chain recipes: 

Recipe 1:
-> uses i-vector.

Recipe 2:
linked to
-> uses i-vector.

and both seem to use i-vectors.

Maybe I am missing something?

Thanks and regards.

Maël Primet

Apr 18, 2017, 12:41:15 PM
to kaldi-help, mael....@snips.ai
It does indeed seem non-trivial to update all the scripts to remove the i-vector computation, as there seem to be many dependencies between the scripts.

Perhaps it would be better to keep the ivector computation commands in the scripts to minimize the number of changes, but change the nnet definition so it does not use the ivectors when training the neural network?

If I manage to do a recipe without ivectors I can send it if people are interested, but I'm wondering if there is a suggestion for the best way to do this?

Daniel Povey

Apr 18, 2017, 1:01:58 PM
to kaldi-help, mael....@snips.ai
You can just remove any ivector-related things from the xconfig definition and remove any ivector-related options from the invocation of train.py and from the decoding script.
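For anyone following along, here is a sketch of the kind of edits Dan describes. This is illustrative only: exact layer names, descriptor expressions, and option names vary by recipe and Kaldi version, so check your own run_tdnn.sh before editing.

```
# 1) xconfig: delete the ivector input line and any descriptor that
#    references it, e.g.
#      input dim=100 name=ivector                      <- delete
#      input dim=40 name=input                         <- keep
#      ... input=Append(-2,-1,0,1,2,ReLU(ivector)) ...
#          becomes input=Append(-2,-1,0,1,2)

# 2) training: drop the online-ivector option from the train.py call
steps/nnet3/chain/train.py \
  --feat.online-ivector-dir $train_ivector_dir \      # <- remove this line
  ...

# 3) decoding: likewise remove the --online-ivector-dir option from the
#    decoding invocation (e.g. steps/nnet3/decode.sh).
```

With those three changes the i-vector extractor can still be trained and run by the recipe (as Maël suggested, to minimize script changes), but the network never sees the i-vectors.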


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

anind...@snips.ai

Apr 18, 2017, 1:06:12 PM
to kaldi-help, mael....@snips.ai, dpo...@gmail.com
Hello Dan,

Thanks for the advice, duly noted.

Regards.

Vijayaditya Peddinti

Apr 18, 2017, 2:19:31 PM
to kaldi-help, mael....@snips.ai, Daniel Povey
Usually I find that adding mean normalization of the features helps a lot when i-vectors are removed.
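(For reference, per-utterance cepstral mean normalization is just a per-dimension mean subtraction over the utterance. A minimal NumPy sketch of the idea follows; in Kaldi itself this is handled by the feature pipeline, e.g. the apply-cmvn binary, not by code like this.)

```python
import numpy as np

def cmn(feats):
    """Per-utterance cepstral mean normalization.

    feats: (num_frames, feat_dim) array, e.g. MFCC features.
    Subtracting the per-dimension mean over the utterance removes
    stationary channel/speaker offsets -- a cheap partial substitute
    for i-vector adaptation.
    """
    return feats - feats.mean(axis=0, keepdims=True)

# toy example: 4 frames of 3-dimensional features
feats = np.array([[1.0, 2.0, 3.0],
                  [3.0, 2.0, 1.0],
                  [1.0, 2.0, 3.0],
                  [3.0, 2.0, 1.0]])
normed = cmn(feats)
print(normed.mean(axis=0))  # each dimension now has zero mean
```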

--Vijay


Vassil Panayotov

Apr 19, 2017, 4:09:48 AM
to kaldi...@googlegroups.com, mael....@snips.ai, Daniel Povey
Vijay, I remember you and Dan recommending the use of CMN when possible.
What fraction of the accuracy lost by not using iVectors can typically be regained by using mean normalization (without i-vectors; I gather they are complementary to some extent)? I understand that it probably depends on the context, such as domain, accent and so on, but perhaps you can recall some approximate numbers?

As far as I understand, people are mostly concerned about the computational cost of i-vector extraction on resource-constrained devices. Are there any alternatives to CMN, or i-vectors with reduced dimensions, that are easier to compute than standard i-vectors and not much worse in terms of adaptation? Also, could there be a faster way to compute i-vectors than what's implemented in Kaldi (I'm not familiar with that part of the toolkit)? A quick search turned up a paper named "Efficient approximated i-vector extraction" by Aronowitz and Barkan, where they compare different approximate methods (http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202012/pdfs/0004789.pdf). Is Kaldi's implementation already optimized along these lines?

Vassil

anind...@snips.ai

Apr 19, 2017, 4:25:02 AM
to kaldi-help, mael....@snips.ai, dpo...@gmail.com
Hello Vijay and Vassil,

Thanks for the advice, duly noted.

Quite relevant points, I would be curious to know the answers too.

Regards.

Daniel Povey

Apr 19, 2017, 1:18:52 PM
to anind...@snips.ai, kaldi-help, Mael Primet
Regarding the speed of iVector extraction: we only use low-dimensional iVectors (dim=100) and various tricks, so the speed of iVector extraction is not an issue we even think about, as the nnet computation always dominates.

You'll have to wait for Vijay re the WER-related questions; I don't recall.

Vijayaditya Peddinti

Apr 19, 2017, 2:40:30 PM
to kaldi-help, anind...@snips.ai, Mael Primet
The first row in the table corresponds to the CMN-processed features. IIRC, without CMN or iVectors the performance was abysmal.

@Dan

Could you please add the ASRU paper I attached to your site?

Peddinti, V., Chen, G., Manohar, V., Ko, T., Povey, D., & Khudanpur, S. (2015, December). JHU ASpIRE system: Robust LVCSR with TDNNs, i-vector adaptation and RNN-LMs. In 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 539-546. IEEE.


--Vijay



jhu-aspire-system.pdf

Vijayaditya Peddinti

Apr 19, 2017, 2:44:33 PM
to kaldi-help, anind...@snips.ai, Mael Primet
Forgot to mention that in the ASpIRE setup there is one caveat.

"We noticed that the iVector adaptation was not sufficiently effective in adapting to test signals that had substantially different energy levels than the training data. For the results reported here, this issue was resolved by normalizing the test-signal energies to be the same as the average of the training data. We compare this approach with volume perturbation approach (2.3.1) in Section 4."

However, I believe the comparison in the table above would hold in normal LVCSR tasks even without this global scaling, as the ASpIRE data had a lot of energy variation, which I have not seen in other Kaldi test sets so far.

--Vijay

anind...@snips.ai

Apr 20, 2017, 5:06:42 AM
to kaldi-help, anind...@snips.ai, mael....@snips.ai
Hello Dan and Vijay,

Thanks for the detailed feedback, duly noted.

Regards.

- Anindya . 

Vassil Panayotov

Apr 20, 2017, 5:14:27 AM
to kaldi...@googlegroups.com, anind...@snips.ai, Mael Primet
Thanks a lot, Vijay, this is very helpful! So basically iVectors still give a ~10% reduction in WER even after CMN is applied.

Dan, I know that i-vector computation is not an issue on the server, because, as you say, the nnet calculations are likely to dominate the run time. I haven't done any experiments in that direction, but I seem to remember that iVector extraction was still a slowish operation compared to, say, feature extraction. So I thought it might become an issue for someone interested in running "leaner" nnet models on less powerful platforms. This was just a guess, however, and I could be wrong.

Vassil

Vassil Panayotov

Apr 20, 2017, 5:50:18 AM
to kaldi...@googlegroups.com, anind...@snips.ai, Mael Primet
Actually, scratch that. I've found some old log files on my laptop, and iVector extraction seems to be only about 1.5 times slower than feature extraction, so it's unlikely to be an issue after all (even though this is with an i5 CPU, and I don't know how fast or slow it would be on ARM, for example).
Anyways, sorry for the noise :)

Vassil

Mike Kim

Aug 28, 2020, 4:33:08 AM
to kaldi-help

Mike Kim

Aug 28, 2020, 4:36:31 AM
to kaldi-help