Whitening in speaker diarisation with ivectors

Rémi Francis

Jan 18, 2018, 1:16:55 PM

to kaldi-help

Hi,

In the diarization work that was recently merged, stage 4 computes the PLDA on "whitened" ivectors using a separate data set.

Do you have any resources that explain what this is about?

Best regards.

Matthew Maciejewski

Jan 18, 2018, 2:46:35 PM

to kaldi-help

Hi,

This usage of whitening is fairly standard in speaker recognition and ivector diarization pipelines. It serves to "gaussianize" the ivectors, while also performing some kind of adaptation to the test set (the PLDA is trained on SRE ivectors that have been whitened using a transform estimated on CallHome).

This is covered in Daniel Garcia-Romero's PhD thesis, "Robust Speaker Recognition Based on Latent Variable Models", in sections 5.3.1 and 5.3.2.

Thanks,

Matt

entn-at

Jan 18, 2018, 2:46:44 PM

to kaldi-help

The whitening transform is sometimes also called "Radial Gaussianization". It is a common step in speaker verification, applied in addition to length normalization, before training or scoring with a (Gaussian) PLDA model. See Garcia-Romero and Espy-Wilson (2011), "Analysis of I-vector Length Normalization in Speaker Recognition Systems", Interspeech 2011.
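For readers who want to see the two steps concretely, here is a minimal NumPy sketch of whitening followed by length normalization. It is not the Kaldi implementation: the function names, the eigenvalue floor, and the choice to scale to unit length are assumptions made for illustration, and the random data merely stands in for real ivectors.

```python
import numpy as np

def estimate_whitener(ivectors):
    """Estimate (mean, W) so that (x - mean) @ W.T has identity covariance
    on the data it was estimated from. Rows of `ivectors` are vectors."""
    mean = ivectors.mean(axis=0)
    cov = np.cov(ivectors, rowvar=False)
    # Symmetric eigendecomposition: cov = V diag(e) V^T.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Floor tiny eigenvalues (assumed value) to avoid dividing by ~0.
    eigvals = np.maximum(eigvals, 1e-10)
    # W = diag(e)^(-1/2) V^T, so W cov W^T = I.
    W = (eigvecs / np.sqrt(eigvals)).T
    return mean, W

def whiten(ivectors, mean, W):
    """Center with `mean`, then rotate and scale with `W`."""
    return (ivectors - mean) @ W.T

def length_normalize(ivectors):
    """Scale every vector to unit Euclidean length (an assumed target norm)."""
    norms = np.linalg.norm(ivectors, axis=1, keepdims=True)
    return ivectors / np.maximum(norms, 1e-12)

# Synthetic stand-in for a set of in-domain ivectors (500 vectors, dim 8).
rng = np.random.default_rng(0)
in_domain = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 8)) + 3.0

mean, W = estimate_whitener(in_domain)
whitened = whiten(in_domain, mean, W)
normalized = length_normalize(whitened)
```

The transform is estimated on one set and can then be applied to any other set (for example, the PLDA training ivectors) by passing the same `mean` and `W` to `whiten`.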

Armando

Jan 19, 2018, 3:08:24 AM

to kaldi-help

Does anyone know where I can find the PDF of this thesis (if it's available)? I can't find it right now, just the abstract.

entn-at

Jan 19, 2018, 3:14:36 AM

to kaldi-help

https://drum.lib.umd.edu/handle/1903/13092

Armando

Jan 19, 2018, 3:21:07 AM

to kaldi-help

Oops, I didn't see that the PDF was available as well.
Thanks.

Rémi Francis

Jan 19, 2018, 8:57:16 AM

to kaldi-help

So, how critical is it that the whitening is done with the in-domain data?

David Snyder

Jan 19, 2018, 10:52:02 AM

to kaldi-help

> So, how critical is it that the whitening is done with the in-domain data?

It's not critical, but how much it helps will depend on your data and on how large a domain mismatch you have.

I tested this on the callhome_diarization recipe. The first row is with in-domain (IND) whitening (which is what the recipe currently does). The second row whitens with the out-of-domain (OOD) PLDA training data. The last row omits whitening altogether. So, if you don't have in-domain data to whiten to, it's probably best to just omit it.

System	DER (%), supervised calibration	DER (%), oracle # speakers
IND whiten	11.04	8.62
OOD whiten	14.61	11.56
No whiten	12.57	10.33
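To put the table in relative terms, the short sketch below computes the change in DER of each whitening variant against the "No whiten" row. The numbers are copied from the table above; the helper name `relative_change` is just an illustrative choice.

```python
# DER (%) from the table: (supervised calibration, oracle # speakers).
der = {
    "IND whiten": (11.04, 8.62),
    "OOD whiten": (14.61, 11.56),
    "No whiten": (12.57, 10.33),
}

def relative_change(value, baseline):
    """Relative change in percent; negative means lower (better) DER."""
    return 100.0 * (value - baseline) / baseline

baseline_sup, baseline_oracle = der["No whiten"]
for name in ("IND whiten", "OOD whiten"):
    sup, oracle = der[name]
    print(
        f"{name}: {relative_change(sup, baseline_sup):+.1f}% (supervised calibration), "
        f"{relative_change(oracle, baseline_oracle):+.1f}% (oracle # speakers)"
    )
```

In-domain whitening lowers DER by roughly 12% and 17% relative in the two conditions, while out-of-domain whitening raises it by roughly 16% and 12% relative, which is why omitting whitening is the safer choice when no in-domain data is available.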

Rémi Francis

Jan 19, 2018, 1:08:50 PM

to kaldi-help

Thanks. In that case you still do length normalization, right?

Also, what are the differences in the ivectors from the `sid` and `diarization` dirs, compared to the ones used for neural nets?

David Snyder

Jan 19, 2018, 3:21:50 PM

to kaldi-help

> Thanks. In that case you still do length normalization, right?

Yes, it's a good idea in either case.

> Also, what are the differences in the ivectors from the `sid` and `diarization` dirs, compared to the ones used for neural nets?

I think the differences are mostly at the script-level, and are due to differences in how the features are prepared (e.g., ASR recipes splice together features, and apply PCA or LDA), and how i-vectors are extracted (e.g., online or offline). Also, the ASR recipes use a diagonal covariance UBM, whereas the speaker ID and diarization recipes use a full covariance UBM.

Rémi Francis

Jan 26, 2018, 12:28:41 PM

to kaldi-help

Does it make sense to perform the whitening with the test data?

David Snyder

Jan 26, 2018, 2:10:52 PM

to kaldi-help

> Does it make sense to perform the whitening with the test data?

It depends.

If you assume that the data you need to diarize is given to you in one large pile, then there's no reason you can't use it for centering/whitening (it's completely unsupervised). If you need to perform diarization one file at a time using a fixed system, then it makes less sense to use your test data for centering/whitening. All NIST SREs that I'm aware of use the latter assumption.

The callhome_diarization recipe takes a middle approach... We split the test dataset into two halves. Each half is treated as a heldout dataset for the other half. Things like centering and whitening are computed on one half and used on the other half.
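The two-halves scheme can be sketched in a few lines of NumPy. This is only an illustration of the idea, not the recipe's scripts: the function names and the synthetic data are hypothetical, and the real recipe operates on Kaldi archives rather than in-memory arrays.

```python
import numpy as np

def center_and_whiten_stats(vectors):
    """Estimate a mean and whitening matrix from one set of vectors (rows)."""
    mean = vectors.mean(axis=0)
    cov = np.cov(vectors, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # W = diag(e)^(-1/2) V^T; the eigenvalue floor is an assumed safeguard.
    W = (eigvecs / np.sqrt(np.maximum(eigvals, 1e-10))).T
    return mean, W

def cross_whiten(half_a, half_b):
    """Whiten each half using statistics estimated only on the other half."""
    mean_a, W_a = center_and_whiten_stats(half_a)
    mean_b, W_b = center_and_whiten_stats(half_b)
    a_out = (half_a - mean_b) @ W_b.T  # half A never sees its own statistics
    b_out = (half_b - mean_a) @ W_a.T  # half B never sees its own statistics
    return a_out, b_out

# Synthetic stand-in for the test set, split into two halves.
rng = np.random.default_rng(1)
mixing = rng.normal(size=(5, 5))
test_set = rng.normal(size=(4000, 5)) @ mixing + 2.0
half_a, half_b = test_set[:2000], test_set[2000:]

a_out, b_out = cross_whiten(half_a, half_b)
```

Because each half is transformed with statistics from the other half, no file is whitened with statistics that include its own data, while the statistics still come from the same domain as the test set.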

Nicanor García

Feb 12, 2018, 2:34:14 PM

to kaldi-help

What is the difference between ivector-transform and transform-vec used in this recipe?

David Snyder

Feb 12, 2018, 2:39:45 PM

to kaldi-help

They appear to be functionally equivalent (compare https://github.com/kaldi-asr/kaldi/blob/master/src/bin/transform-vec.cc with https://github.com/kaldi-asr/kaldi/blob/master/src/ivectorbin/ivector-transform.cc). The binary ivector-transform prints out a little more information, though.
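As a rough picture of what "applying a transform to a vector" means here, the sketch below multiplies a vector by a matrix and, when the matrix has one extra column, treats that last column as an offset. This is not the Kaldi code: `apply_transform` is a hypothetical helper, and the offset-in-last-column convention is an assumption that should be checked against the two linked source files.

```python
import numpy as np

def apply_transform(transform, vec):
    """Apply a linear (rows x dim) or affine (rows x (dim + 1)) transform.

    Assumed convention: if the matrix has dim + 1 columns, the last column
    is an offset that is added after the matrix-vector product.
    """
    rows, cols = transform.shape
    dim = vec.shape[0]
    if cols == dim:
        return transform @ vec
    if cols == dim + 1:
        return transform[:, :dim] @ vec + transform[:, dim]
    raise ValueError(f"transform has {cols} columns, expected {dim} or {dim + 1}")

vec = np.array([3.0, 4.0])
linear = np.array([[1.0, 0.0],
                   [0.0, 2.0]])
affine = np.array([[1.0, 0.0, 1.0],
                   [0.0, 2.0, -1.0]])

linear_out = apply_transform(linear, vec)  # plain matrix-vector product
affine_out = apply_transform(affine, vec)  # product plus offset column
```

A whitening transform that also centers the data fits the affine form, since subtracting the mean before multiplying by W is the same as multiplying by W and adding the offset -W @ mean.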
