help! about I-vectors in kaldi


汪子锐

Mar 22, 2016, 10:27:00 AM
to kaldi...@googlegroups.com
Dear all,
   In steps/online/nnet2, train_ivector_extractor.sh is used to train the i-vector extractor. When we train the extractor, do we train the total variability matrix T with every utterance treated as a different speaker, as in the paper "Front-End Factor Analysis for Speaker Verification"? Or can you point me to a paper that describes your i-vectors?
 My second question: once we have T, we use extract_ivectors.sh, which needs utt2spk. Suppose a certain speaker has many utterances. Is an i-vector extracted from every utterance and the results averaged, or is a single i-vector extracted from all of his utterances together?

Thanks very much!

Daniel Povey

Mar 22, 2016, 2:24:10 PM
to kaldi-help
   In steps/online/nnet2, train_ivector_extractor.sh is used to train the i-vector extractor. When we train the extractor, do we train the total variability matrix T with every utterance treated as a different speaker, as in the paper "Front-End Factor Analysis for Speaker Verification"? Or can you point me to a paper that describes your i-vectors?

Every utterance is treated as a different speaker; it's just like normal iVectors.
 
 My second question: once we have T, we use extract_ivectors.sh, which needs utt2spk. Suppose a certain speaker has many utterances. Is an i-vector extracted from every utterance and the results averaged, or is a single i-vector extracted from all of his utterances together?

That script extracts them per utterance, and then also computes per-speaker ones using this command:

if [ $stage -le 2 ]; then
  # Be careful here: the speaker-level iVectors are now length-normalized,
  # even if they are otherwise the same as the utterance-level ones.
  echo "$0: computing mean of iVectors for each speaker and length-normalizing"
  $cmd $dir/log/speaker_mean.log \
    ivector-normalize-length scp:$dir/ivector.scp  ark:- \| \
    ivector-mean ark:$data/spk2utt ark:- ark:- ark,t:$dir/num_utts.ark \| \
    ivector-normalize-length ark:- ark,scp:$dir/spk_ivector.ark,$dir/spk_ivector.scp || exit 1;
fi


Dan
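
For readers who want to see the arithmetic, the pipeline quoted above (normalize each utterance iVector, average per speaker, normalize again) can be sketched in plain Python. This is only an illustration, not Kaldi code: the function names and toy data are hypothetical, and it assumes that length normalization scales each vector to Euclidean norm sqrt(dim), which is my understanding of the default behavior of ivector-normalize-length.

```python
import math


def normalize_length(vec):
    """Scale vec so its Euclidean norm equals sqrt(dim) (assumed convention)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)  # leave an all-zero vector unchanged
    scale = math.sqrt(len(vec)) / norm
    return [x * scale for x in vec]


def speaker_ivectors(utt_ivectors, spk2utt):
    """Normalize utterance vectors, average them per speaker, normalize again."""
    normalized = {utt: normalize_length(v) for utt, v in utt_ivectors.items()}
    result = {}
    for spk, utts in spk2utt.items():
        dim = len(normalized[utts[0]])
        mean = [sum(normalized[u][d] for u in utts) / len(utts) for d in range(dim)]
        result[spk] = normalize_length(mean)
    return result


# Hypothetical toy data: two speakers, 2-dimensional "iVectors".
utt_ivectors = {
    "spk1-utt1": [3.0, 4.0],
    "spk1-utt2": [0.0, 2.0],
    "spk2-utt1": [1.0, 0.0],
}
spk2utt = {"spk1": ["spk1-utt1", "spk1-utt2"], "spk2": ["spk2-utt1"]}

spk_ivectors = speaker_ivectors(utt_ivectors, spk2utt)
for spk, vec in sorted(spk_ivectors.items()):
    print(spk, vec)
```

Because each utterance vector is normalized before averaging, a long or loud utterance does not dominate the speaker mean simply by having a larger iVector norm.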



汪子锐

Mar 22, 2016, 8:59:00 PM
to kaldi...@googlegroups.com
Thank you, Dan!
As you say, "ivector-extract" extracts an i-vector from every utterance, so what does the "--spk2utt" option mean?
Its help text says: "supply this option if you want ivectors to be output at the per-speaker level, estimated using stats accumulated from multiple utterances".
Is that not the average for each speaker?

Daniel Povey

Mar 22, 2016, 9:04:00 PM
to kaldi-help
That option would sum the stats over the speaker's utterances and compute a single iVector from the summed stats. It gives a different answer from averaging, and I assume it was not better than averaging the iVectors, or we'd be using it in the scripts.
Dan
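
To see why the two approaches give different answers, here is a minimal sketch using the textbook i-vector posterior mean for a single one-dimensional Gaussian with a standard-normal prior on the i-vector w: w = (t f / var) / (1 + n t^2 / var), where n is the frame count and f is the first-order statistic centered on the UBM mean. This is an assumption-laden toy, not Kaldi's implementation (whose formulation may differ in details); the function name, the scalar loading t, and the statistics are all hypothetical.

```python
def posterior_mean(n, f, t=1.0, var=1.0):
    """Posterior mean of a 1-D i-vector w with prior N(0, 1).

    n:   zeroth-order statistic (frame count)
    f:   first-order statistic, centered on the UBM mean
    t:   scalar total-variability loading (toy stand-in for the matrix T)
    var: variance of the single Gaussian
    """
    return (t * f / var) / (1.0 + n * t * t / var)


# Hypothetical (n, f) statistics for two utterances of one speaker.
utt_stats = [(10.0, 5.0), (100.0, -20.0)]

# Option 1: extract one i-vector per utterance, then average them.
per_utt = [posterior_mean(n, f) for n, f in utt_stats]
averaged = sum(per_utt) / len(per_utt)

# Option 2: sum the statistics first, then extract a single i-vector.
total_n = sum(n for n, _ in utt_stats)
total_f = sum(f for _, f in utt_stats)
from_summed = posterior_mean(total_n, total_f)

print("average of per-utterance i-vectors:", averaged)
print("i-vector from summed stats:       ", from_summed)
```

The posterior mean is not linear in the statistics, because the frame count appears in the denominator. Summing stats lets the long utterance dominate, whereas averaging gives each utterance equal weight, so the two results generally differ.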

ZiRui W

Mar 23, 2016, 12:22:16 AM
to kaldi...@googlegroups.com
Thanks, Dan.
Sorry to bother you again!
One last question:
 When we train the i-vector extractor, is the covariance matrix also updated, or is it kept the same as in the UBM? I notice that the covariance matrix is first converted from a diagonal matrix to a full matrix. Why?

Daniel Povey

Mar 23, 2016, 1:51:47 AM
to kaldi-help

 When we train the i-vector extractor, is the covariance matrix also updated, or is it kept the same as in the UBM?

It is updated but it's not 100% clear that this is helpful.  There is a flag in ivector-extractor-acc-stats (IIRC) which can turn this off.
 
I notice that the covariance matrix is first converted from a diagonal matrix to a full matrix. Why?

To handle the general case when it is full.
Dan
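
The point about the general case can be illustrated with a small sketch: a diagonal covariance (or precision) is just a full matrix whose off-diagonal entries are zero, so code written once for full matrices also covers the diagonal case. The helper names and numbers below are hypothetical and are not taken from Kaldi.

```python
def diag_to_full(diag):
    """Embed a list of diagonal entries in a full square matrix."""
    dim = len(diag)
    return [[diag[i] if i == j else 0.0 for j in range(dim)] for i in range(dim)]


def quadratic_form(matrix, x):
    """Compute x^T * matrix * x for a full square matrix."""
    dim = len(x)
    return sum(x[i] * matrix[i][j] * x[j] for i in range(dim) for j in range(dim))


# Hypothetical inverse variances (a diagonal precision) and a data vector.
inv_vars = [0.5, 2.0, 4.0]
x = [1.0, -2.0, 0.5]

# The full-matrix code path, fed the converted diagonal precision...
via_full = quadratic_form(diag_to_full(inv_vars), x)
# ...agrees with the specialized diagonal formula.
via_diag = sum(p * xi * xi for p, xi in zip(inv_vars, x))

print(via_full, via_diag)
```

Converting up front costs some memory and computation, but it means only one code path (the full-covariance one) needs to be written and maintained.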