SRE10 full UBM training on 32k utterances


Steven Du

Feb 16, 2016, 11:04:49 AM
to kaldi-help
Hi,

In egs/SRE08:


The full UBM is first trained on 8k utterances, then trained on all of the male/female data.


but in SRE10 V1:


The full UBM is only trained once on 32k utterances.

So the questions are:


1) Why not train this full UBM again on all the data?


2) Is it a good idea to train the full UBM on lots of data?



(This question was first posted at https://github.com/kaldi-asr/kaldi/issues/477.)

David Snyder

Feb 17, 2016, 12:05:23 PM
to kaldi-help

In egs/SRE08:


The full UBM is first trained on 8k utterances, then trained on all of the male/female data.


It's actually trained on only an 8k subset; take a look at the script again.
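
For reference, that part of the sre08/v1 recipe looks roughly like the sketch below. The subset sizes, directory names, Gaussian counts and option values here are approximate and from memory, so check run.sh in your Kaldi checkout rather than copying this verbatim:

  # Take small subsets of the training data for UBM initialization (sizes approximate).
  utils/subset_data_dir.sh data/train 4000 data/train_4k
  utils/subset_data_dir.sh data/train 8000 data/train_8k

  # Train a diagonal-covariance UBM on the smaller subset, then a
  # full-covariance UBM on the 8k subset.
  sid/train_diag_ubm.sh --nj 30 --cmd "$train_cmd" data/train_4k 2048 exp/diag_ubm_2048
  sid/train_full_ubm.sh --nj 30 --cmd "$train_cmd" data/train_8k exp/diag_ubm_2048 exp/full_ubm_2048

  # Adapt gender-dependent full UBMs from the gender-independent one,
  # using the male/female portions of the training data.
  sid/train_full_ubm.sh --nj 30 --remove-low-count-gaussians false --num-iters 1 \
    --cmd "$train_cmd" data/train_male exp/full_ubm_2048 exp/full_ubm_2048_male
  sid/train_full_ubm.sh --nj 30 --remove-low-count-gaussians false --num-iters 1 \
    --cmd "$train_cmd" data/train_female exp/full_ubm_2048 exp/full_ubm_2048_female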

in SRE10 V1:


The full UBM is only trained once on 32k utterances.

 
The SRE08 example splits the evaluation into male and female portions, and also trains separate models for each portion. The SRE10 example uses one gender-independent UBM and one gender-independent i-vector extractor, since we're focusing on the gender-independent results in that example (although we also report gender-dependent results).
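
The corresponding part of sre10/v1 is roughly the following (again just a sketch; the subset sizes and option values are approximate, and the actual run.sh is the reference):

  # One gender-independent UBM, trained on subsets of the pooled training data.
  utils/subset_data_dir.sh data/train 16000 data/train_16k
  utils/subset_data_dir.sh data/train 32000 data/train_32k
  sid/train_diag_ubm.sh --nj 40 --cmd "$train_cmd" data/train_16k 2048 exp/diag_ubm_2048
  sid/train_full_ubm.sh --nj 40 --cmd "$train_cmd" data/train_32k exp/diag_ubm_2048 exp/full_ubm_2048

  # One gender-independent i-vector extractor, trained on all of the training data.
  sid/train_ivector_extractor.sh --cmd "$train_cmd" --ivector-dim 600 \
    exp/full_ubm_2048/final.ubm data/train exp/extractor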

1) Why not train this full UBM again on all the data?

Training it on more data probably wouldn't help, and training it on less probably wouldn't hurt. If you have all the data referenced in the example but want to train on less of it, it might be better to use the --subsample-feats option to reduce the number of frames seen by the models.
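
To make that concrete: the UBM training scripts can subsample frames rather than dropping utterances (internally this goes through the subsample-feats binary). The exact option name may differ between versions of the scripts, so treat this as a sketch:

  # Keep all 32k utterances but accumulate stats on only every 10th frame
  # (the option may be spelled --subsample or similar in your version of the script).
  sid/train_full_ubm.sh --nj 40 --cmd "$train_cmd" --subsample 10 \
    data/train_32k exp/diag_ubm_2048 exp/full_ubm_2048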

2) Is it a good idea to train the full UBM on lots of data?

I doubt that going beyond 32k utterances will either help or hurt.

Steven Du

Feb 17, 2016, 9:52:56 PM
to kaldi-help
Hi David,

Thanks.

Yes, you are right; the SRE08 full UBM is trained on an 8k subset.


On Thursday, February 18, 2016 at 1:05:23 AM UTC+8, David Snyder wrote: