Hello,
I'm trying to implement a synthetic speech detection system and apply it to a subset of the ASVspoof 2015 dataset. However, I'm new to this field, and I'm struggling to work out all the necessary steps from the rather minimal documentation.
Let U, X_nat, X_synth and W be NumPy arrays of MFCC coefficients for, respectively, a large number of speakers, the natural-speech training speakers, the synthetic-speech training speakers, and the test speakers (both natural and synthetic). I believe I need to use train_projector() to train the UBM and enroll() to obtain the MAP-adapted natural and synthetic models from the training data (see the code below). Is this correct? Then, how do I apply these models to the test set and compute the scores? And how do I extract GMM supervectors (i.e. the stacked MAP-adapted mean vectors) from the training and test data?
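To make concrete what I mean by MAP adaptation and supervectors, here is a minimal plain-NumPy sketch of my current understanding. It does not use bob at all and is not necessarily what bob.bio.gmm does internally; the assumptions are mine: diagonal covariances, means-only relevance MAP, a relevance factor of 4.0, and hypothetical helper names (diag_gmm_posteriors, map_adapt_means, supervector).

```python
import numpy as np


def diag_gmm_posteriors(X, weights, means, variances):
    """Per-frame component posteriors of a diagonal-covariance GMM.

    X: (T, D) frames; weights: (C,); means, variances: (C, D).
    Returns a (T, C) array whose rows sum to 1.
    """
    # log N(x_t | mu_c, diag(var_c)) for every frame/component pair -> (T, C)
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = log_gauss + np.log(weights)[None, :]
    # normalise in the log domain for numerical stability
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)


def map_adapt_means(X, weights, means, variances, relevance_factor=4.0):
    """Relevance-MAP adaptation of the UBM means only (weights/variances kept)."""
    post = diag_gmm_posteriors(X, weights, means, variances)  # (T, C)
    n = post.sum(axis=0)              # zeroth-order statistics, (C,)
    f = post.T @ X                    # first-order statistics, (C, D)
    # per-component mean of the data; guard against empty components
    ex = f / np.maximum(n, 1e-10)[:, None]
    alpha = (n / (n + relevance_factor))[:, None]  # adaptation coefficients
    return alpha * ex + (1.0 - alpha) * means


def supervector(X, weights, means, variances, relevance_factor=4.0):
    """Stack the MAP-adapted mean vectors into one (C * D,) supervector."""
    return map_adapt_means(X, weights, means, variances, relevance_factor).ravel()


# Toy usage with a random stand-in UBM (not a trained one)
rng = np.random.default_rng(0)
C, D = 4, 3
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
utt = rng.normal(size=(200, D))  # stand-in for one utterance's MFCC frames
sv = supervector(utt, ubm_w, ubm_mu, ubm_var)
print(sv.shape)  # (12,)
```

Only the means are adapted here because the supervector is built from them; a larger relevance factor keeps the adapted means closer to the UBM means, and the value 4.0 is just an assumed starting point.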
Thank you,
Lorenzo
from bob.bio.gmm import algorithm
# UBM training
ubm_model = algorithm.GMM(512)
ubm_model.train_projector(U,'ubm.hdf')
ubm_model.load_projector('ubm.hdf')
# MAP adaptation of the UBM to the natural-speech training data
nat_model = ubm_model.enroll(X_nat)
# MAP adaptation of the UBM to the synthetic-speech training data
synth_model = ubm_model.enroll(X_synth)
# How to apply the models to test data and compute the scores?
# How to extract the GMM supervectors from training and test data?
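For the scoring question, this is the kind of decision rule I have in mind, again as a plain-NumPy sketch rather than bob code. I assume each model is a (weights, means, variances) tuple of a diagonal-covariance GMM, and the helper names diag_gmm_loglik and llr_score are hypothetical. The score is the average per-frame log-likelihood ratio between the natural and synthetic models; I don't know whether this matches what the score() method of the bob.bio.base Algorithm interface computes for the GMM algorithm.

```python
import numpy as np


def diag_gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood log p(x_t) of a diagonal-covariance GMM, shape (T,)."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = log_gauss + np.log(weights)[None, :]
    # log-sum-exp over the mixture components
    m = log_joint.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True)))[:, 0]


def llr_score(X, nat_gmm, synth_gmm):
    """Average per-frame log-likelihood ratio; positive values favour natural speech."""
    return float(np.mean(diag_gmm_loglik(X, *nat_gmm) - diag_gmm_loglik(X, *synth_gmm)))


# Toy usage: two hand-made models and one test utterance near the "natural" model
rng = np.random.default_rng(1)
D = 2
w = np.array([0.5, 0.5])
var = np.ones((2, D))
nat = (w, np.array([[0.0, 0.0], [1.0, 1.0]]), var)
synth = (w, np.array([[5.0, 5.0], [6.0, 6.0]]), var)
test_utt = rng.normal(loc=0.5, size=(100, D))
print(llr_score(test_utt, nat, synth) > 0)  # True
```

Averaging over frames makes the scores of utterances with different lengths comparable; a threshold on this score (for example, one tuned on development data) would then give the natural/synthetic decision.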