the score of gmm-ubm


Kevin Paul

May 12, 2021, 10:40:31 PM

to bob-devel

Hi,

I am trying to use GMM-UBM to distinguish between natural and synthetic speech.

I chose CQCC as the input feature, using the implementation at https://github.com/stoneMo/ASVspoof/tree/main/CQCC. I trained the UBM on a mixed set of natural and synthetic speech:

import numpy as np
import bob.bio.gmm as bgmm

# CQCC features of a mixed set of natural and synthetic utterances (one array per utterance)
Xf = np.load('Xf_ubm_cqcc_orig.npy', allow_pickle=True)

algoritmo_orig = bgmm.algorithm.GMM(number_of_gaussians=256, kmeans_training_iterations=25, gmm_training_iterations=25, training_threshold=0.0, variance_threshold=0.0005, update_weights=True, update_means=True, update_variances=True, relevance_factor=4, gmm_enroll_iterations=1, responsibility_threshold=0, INIT_SEED=5489)

# Stack all frames into one matrix and train the UBM (k-means initialization, then EM)
algoritmo_orig.train_projector([np.vstack(Xf)], project_folder + "ubm_20_orig_Projector.hdf5")

Then I used the natural speech to enroll a GMM-UBM model for natural speech, and the synthetic speech to enroll one for synthetic speech:

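# algoritmo: presumably a GMM object configured like algoritmo_orig above
algoritmo.load_projector(project_folder + "ubm_20_orig_Projector.hdf5")

# Enroll (MAP-adapt) the UBM with the natural-speech features
Xf = np.load('Xf_gmm_ubm_nat_cqcc_orig.npy', allow_pickle=True)
modelo = algoritmo.enroll([np.vstack(Xf)])

archivo_modelo = project_folder + "ubm_orig_20_nat_modelo.hdf5"
algoritmo.write_model(modelo, archivo_modelo)

# The synthetic model (ubm_orig_20_synt_modelo.hdf5) is enrolled the same way from the synthetic features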
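In a classic GMM-UBM system, enrolling a model means MAP-adapting the UBM toward the enrollment data, and relevance_factor (4 in the GMM call above) controls how far each component moves. To make that concrete, here is a plain NumPy sketch of mean-only MAP adaptation in the Reynolds style, assuming diagonal covariances. It only illustrates the idea; it is not bob.bio.gmm's implementation, and map_adapt_means and the toy UBM are hypothetical.

```python
import numpy as np


def map_adapt_means(X, weights, means, variances, relevance_factor=4.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to enrollment frames X (T, D)."""
    # log(w_c * N(x_t; mu_c, var_c)) for every frame and component, shape (T, C)
    diff = X[:, None, :] - means[None, :, :]
    log_comp = np.log(weights) - 0.5 * (np.sum(diff ** 2 / variances, axis=2)
                                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    # Responsibilities gamma_t(c), normalized per frame
    log_comp -= log_comp.max(axis=1, keepdims=True)
    gamma = np.exp(log_comp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    n = gamma.sum(axis=0)                                 # zeroth-order statistics n_c
    ex = (gamma.T @ X) / np.maximum(n, 1e-10)[:, None]    # per-component data mean E_c[x]
    alpha = (n / (n + relevance_factor))[:, None]         # data-dependent adaptation weight
    return alpha * ex + (1.0 - alpha) * means


# Toy example: a 2-component UBM; enrollment frames lie near the first component only
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0, 0.0], [10.0, 10.0]])
ubm_var = np.ones((2, 2))
X_enroll = rng.normal(loc=1.0, size=(200, 2))
adapted_mu = map_adapt_means(X_enroll, ubm_w, ubm_mu, ubm_var, relevance_factor=4.0)
```

Components that see many enrollment frames move almost all the way to the data (alpha close to 1), while components that see none keep the UBM mean; a larger relevance_factor makes every component more conservative.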

After that I tried to score the test utterances, but for some I got a positive score and for others a negative one, and I don't know why:

algoritmo_orig.load_projector('ubm_20_orig_Projector.hdf5')
modelo_orig_nat = algoritmo_orig.read_model('ubm_orig_20_nat_modelo.hdf5')
modelo_orig_synt = algoritmo_orig.read_model('ubm_orig_20_synt_modelo.hdf5')

# CQcc: CQCC feature matrix of the k-th test utterance
# project() computes the utterance's statistics with respect to the UBM
ubm_orig_feature = algoritmo_orig.project(CQcc)

gmm_orig_nat_score[k] = algoritmo_orig.score(modelo_orig_nat, ubm_orig_feature)
gmm_orig_synt_score[k] = algoritmo_orig.score(modelo_orig_synt, ubm_orig_feature)

Does a larger absolute value mean a higher probability?
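A note on what these numbers are: to my understanding, the default scoring in bob.bio.gmm is bob.learn.em.linear_scoring, a linear approximation of the frame-averaged log-likelihood ratio between the enrolled model and the UBM. If so, a positive score only means the model explains the utterance better than the UBM, a negative one that the UBM explains it better, and the magnitude is not a probability. Because both models share the same UBM, the UBM term cancels when the two scores are subtracted. The sketch below shows this with the exact (not linearized) ratio on toy 2-D diagonal GMMs; frame_loglik, llr_vs_ubm and all model parameters are hypothetical.

```python
import numpy as np


def frame_loglik(X, w, mu, var):
    """Per-frame log p(x_t | GMM) for a diagonal-covariance GMM, shape (T,)."""
    diff = X[:, None, :] - mu[None, :, :]
    log_comp = np.log(w) - 0.5 * (np.sum(diff ** 2 / var, axis=2)
                                  + np.sum(np.log(2 * np.pi * var), axis=1))
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))


def llr_vs_ubm(X, model, ubm):
    """Frame-averaged log-likelihood ratio of a model against the UBM (exact, not linearized)."""
    return float(np.mean(frame_loglik(X, *model) - frame_loglik(X, *ubm)))


# Toy models: same weights and variances, only the first component's mean differs
w = np.array([0.5, 0.5])
var = np.ones((2, 2))
ubm = (w, np.array([[0.0, 0.0], [4.0, 4.0]]), var)
nat = (w, np.array([[-1.0, -1.0], [4.0, 4.0]]), var)
synt = (w, np.array([[1.0, 1.0], [4.0, 4.0]]), var)

X = np.random.default_rng(0).normal(loc=1.0, size=(300, 2))  # toy test utterance
s_nat = llr_vs_ubm(X, nat, ubm)    # negative: UBM fits X better than the "nat" model
s_synt = llr_vs_ubm(X, synt, ubm)  # positive: the "synt" model fits X better than the UBM
# The UBM term cancels: decision = mean log p(X|nat) - mean log p(X|synt)
decision = s_nat - s_synt
```

So the sign of each individual score is expected to vary; for a natural-vs-synthetic decision the difference of the two scores (here negative, i.e. "synthetic") is the quantity to threshold.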

Pavel Korshunov

May 19, 2021, 9:09:37 AM

to bob-devel

Hi,

It is not really clear to me what you are trying to do here. A typical approach to using GMMs for distinguishing real speech from fake is to train two GMMs: one for real speech and another for fake speech. I haven't seen people do an enrollment in this situation. The two classes, real and synthetic, are very different, and it's probably not a good idea to adapt a UBM trained on real speech to synthetic speech.

So, normally, one trains two different GMMs independently of each other. Then, during testing, you project the test features onto both GMMs and take the difference; this is your score. Basically, you compute the difference between the log-likelihood of your features under the GMM trained on real speech and their log-likelihood under the GMM trained on synthetic speech. If the difference is positive, the sample is more likely to be real; if it is negative, it is more likely to be synthetic. But this is a crude estimate: it is better to compute the threshold on a development set to find which value gives the best separation between the real and fake classes for the specific dataset you are training and evaluating on.
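Choosing the threshold on a development set instead of using 0 can be sketched as below. The equal error rate (EER) criterion is one common choice and is assumed here for illustration; eer_threshold and the toy score lists are hypothetical, and scores follow the convention above (higher means more likely real).

```python
import numpy as np


def eer_threshold(real_scores, fake_scores):
    """Pick the threshold where the real-rejection and fake-acceptance rates are closest.

    Candidate thresholds are the observed development scores; a sample is
    classified as real when score >= threshold.
    """
    real = np.asarray(real_scores, dtype=float)
    fake = np.asarray(fake_scores, dtype=float)
    best_thr, best_gap = None, np.inf
    for thr in np.sort(np.concatenate([real, fake])):
        frr = np.mean(real < thr)    # real samples wrongly rejected
        far = np.mean(fake >= thr)   # fake samples wrongly accepted
        if abs(frr - far) < best_gap:
            best_thr, best_gap = float(thr), abs(frr - far)
    return best_thr


# Toy development scores: all are positive, so a threshold of 0 would accept every fake
dev_real = [0.5, 1.0, 1.5, 2.0]
dev_fake = [0.1, 0.2, 0.4, 0.8]
threshold = eer_threshold(dev_real, dev_fake)  # 0.8 for these scores
```

The chosen threshold is then fixed and applied unchanged to the evaluation set.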

To see how the two GMMs are trained, check this function: https://gitlab.idiap.ch/bob/bob.pad.voice/-/blob/master/bob/pad/voice/algorithm/GMM.py#L146.

A sample is tested by computing the log-likelihoods of its features under both GMMs, as in this function: https://gitlab.idiap.ch/bob/bob.pad.voice/-/blob/master/bob/pad/voice/algorithm/GMM.py#L196

And here is how the score is computed (simply the difference of the outputs of the project_feature() function): https://gitlab.idiap.ch/bob/bob.pad.voice/-/blob/master/bob/pad/voice/algorithm/GMM.py#L239
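The recipe in the three links above (train one GMM per class, compute the log-likelihood of the test features under each, subtract) can be sketched in plain NumPy as follows. This is not the bob.pad.voice code: train_gmm, frame_loglik and real_vs_fake_score are hypothetical helpers, initialization uses random frames instead of k-means, covariances are diagonal, and toy Gaussian data stands in for real CQCC features.

```python
import numpy as np


def frame_loglik(X, w, mu, var):
    """Per-frame log p(x_t | GMM) for a diagonal-covariance GMM, shape (T,)."""
    diff = X[:, None, :] - mu[None, :, :]
    log_comp = np.log(w) - 0.5 * (np.sum(diff ** 2 / var, axis=2)
                                  + np.sum(np.log(2 * np.pi * var), axis=1))
    m = log_comp.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))


def train_gmm(X, n_components, n_iter=25, var_floor=5e-4, seed=0):
    """Maximum-likelihood EM for a diagonal GMM, initialized from random frames."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), n_components, replace=False)].copy()
    var = np.tile(X.var(axis=0), (n_components, 1))
    w = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities gamma_t(c) of each component for each frame
        diff = X[:, None, :] - mu[None, :, :]
        log_comp = np.log(w) - 0.5 * (np.sum(diff ** 2 / var, axis=2)
                                      + np.sum(np.log(2 * np.pi * var), axis=1))
        log_comp -= log_comp.max(axis=1, keepdims=True)
        gamma = np.exp(log_comp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and floored variances
        n = gamma.sum(axis=0) + 1e-10
        w = n / n.sum()
        mu = (gamma.T @ X) / n[:, None]
        var = np.maximum((gamma.T @ X ** 2) / n[:, None] - mu ** 2, var_floor)
    return w, mu, var


def real_vs_fake_score(X, real_gmm, fake_gmm):
    """Average log-likelihood under the real-speech GMM minus that under the fake-speech GMM."""
    return float(np.mean(frame_loglik(X, *real_gmm)) - np.mean(frame_loglik(X, *fake_gmm)))


# Toy stand-ins for feature frames: "real" centred at 0, "fake" centred at 1.5
rng = np.random.default_rng(1)
real_gmm = train_gmm(rng.normal(0.0, 1.0, size=(500, 3)), n_components=4)
fake_gmm = train_gmm(rng.normal(1.5, 1.0, size=(500, 3)), n_components=4)
score_real = real_vs_fake_score(rng.normal(0.0, 1.0, size=(100, 3)), real_gmm, fake_gmm)
score_fake = real_vs_fake_score(rng.normal(1.5, 1.0, size=(100, 3)), real_gmm, fake_gmm)
```

Averaging the log-likelihoods over frames (rather than summing them) keeps scores comparable across utterances of different lengths; that normalization is an assumption of this sketch.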

I hope my answer cleared some things up.

Best,

Pavel
