How to use Kaldi for speaker recognition


maruf...@gmail.com

Nov 1, 2015, 1:52:19 AM
to kaldi-help
Hi,

I am trying to use Kaldi to extract i-vectors from wav files for speaker recognition purposes. However,
as far as I understand, the data preparation for speech recognition and for speaker recognition need not
be the same. Data preparation for speaker recognition should be easier than for speech recognition.
I can't use the SRE10 example because the SRE2010 dataset is not available to me.
Is there any tutorial available for extracting i-vectors (using freely available wav files)?

Thanks in advance

wbgl...@gmail.com

Nov 1, 2015, 10:22:14 AM
to kaldi-help, maruf...@gmail.com
Maybe you can use the online i-vector extractor. The example is at http://kaldi-asr.org/doc/online_decoding.html.

On Sunday, November 1, 2015 at 2:52:19 PM UTC+8, maruf...@gmail.com wrote:

David Snyder

Nov 1, 2015, 10:29:57 AM
to kaldi-help, maruf...@gmail.com
Hi,

We also have the NIST SRE08 example. You could check whether you have access to that data. It's in egs/sre08/v1.

In either case, the SRE10 data is only used for the evaluation portion of the setup (e.g., the enrollment and test ivectors). If you already have data you want to use for enrollment and testing, and you have access to the training data (e.g., data to train the UBM and ivector extractor), you can run the entire example, and just replace the SRE10 data with your own.

Best,
David
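Replacing the SRE10 data with your own mostly means building Kaldi data directories for your enrollment and test audio. A minimal sketch, assuming your audio is described by a list of (speaker ID, wav path) pairs; `write_data_dir` is a hypothetical helper, not part of Kaldi. It writes the three files a speaker-ID data directory needs at minimum (wav.scp, utt2spk, spk2utt). Prefixing each utterance ID with its speaker ID follows the usual Kaldi convention so both files sort consistently; Kaldi's utils/fix_data_dir.sh and utils/validate_data_dir.sh can check the result.

```python
import os
import tempfile
from collections import defaultdict


def write_data_dir(pairs, data_dir):
    """Write wav.scp, utt2spk and spk2utt for (speaker_id, wav_path) pairs.

    Hypothetical helper, not part of Kaldi. Utterance IDs are built as
    "<speaker_id>-<wav basename>" so each speaker ID prefixes its utterances.
    """
    os.makedirs(data_dir, exist_ok=True)
    wav, utt2spk = {}, {}
    for spk, path in pairs:
        # Kaldi tables are whitespace-delimited, so IDs and paths must not contain spaces.
        if any(ch.isspace() for ch in spk + path):
            raise ValueError(f"whitespace in speaker id or path: {spk!r}, {path!r}")
        utt = f"{spk}-{os.path.splitext(os.path.basename(path))[0]}"
        if utt in wav:
            raise ValueError(f"duplicate utterance id: {utt}")
        wav[utt], utt2spk[utt] = path, spk

    # Group utterances by speaker, in sorted utterance order.
    spk2utt = defaultdict(list)
    for utt in sorted(utt2spk):
        spk2utt[utt2spk[utt]].append(utt)

    # Kaldi expects each file sorted on its first field in C-locale (byte) order;
    # Python's default string sort matches that for ASCII IDs.
    def write(name, rows):
        with open(os.path.join(data_dir, name), "w") as f:
            f.writelines(f"{key} {value}\n" for key, value in rows)

    write("wav.scp", ((u, wav[u]) for u in sorted(wav)))
    write("utt2spk", ((u, utt2spk[u]) for u in sorted(utt2spk)))
    write("spk2utt", ((s, " ".join(spk2utt[s])) for s in sorted(spk2utt)))


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    write_data_dir([("spk1", "/corpus/spk1/a.wav"), ("spk2", "/corpus/spk2/b.wav")], out)
    print(open(os.path.join(out, "spk2utt")).read(), end="")
```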

David Snyder

Nov 1, 2015, 10:54:48 AM
to kaldi-help, maruf...@gmail.com
To clarify: I would not recommend using the online i-vector system for speaker recognition purposes. The online i-vector systems have been optimized for ASR, and I suspect they will give subpar performance for speaker recognition relative to the usual scripts.

To reiterate my previous point, if you have the training data, you can always use the SRE10 setup as is. Just comment out the code that prepares the SRE10 data and extracts i-vectors for it. Of course, you'll need your own enrollment and test data to replace it with. Also, you'll probably want some in-domain data (relative to your enroll + test data) to train the PLDA system. Just make sure its speakers are disjoint from those in the enroll and test data.
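The requirement that PLDA training speakers be disjoint from the enroll and test speakers can be checked mechanically. A minimal sketch, assuming each set is described by a Kaldi-style utt2spk file (one `<utterance-id> <speaker-id>` pair per line); the function names are hypothetical. It only catches overlap when the same person carries the same speaker ID in both sets, which may not hold across corpora.

```python
def speakers_from_utt2spk(lines):
    """Collect the speaker IDs (second field) from Kaldi utt2spk lines."""
    speakers = set()
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            speakers.add(parts[1])
    return speakers


def speaker_overlap(plda_lines, eval_lines):
    """Return the sorted speaker IDs found in both sets (should be empty)."""
    return sorted(speakers_from_utt2spk(plda_lines) & speakers_from_utt2spk(eval_lines))


# Usage with hypothetical paths:
#   with open("data/plda_train/utt2spk") as a, open("data/enroll/utt2spk") as b:
#       assert not speaker_overlap(a, b), "PLDA and enroll speakers overlap"
```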

Quazi Marufur Rahman

Nov 1, 2015, 1:27:07 PM
to David Snyder, kaldi-help
Hi,

So, my goal is to execute run.sh using some wav files. Is it possible to create MFCCs and run voice activity detection without the 142 lines of code in make_mfcc.sh and the 72 lines of code in compute_vad_decision.sh? There are lots of dependencies in those scripts :(. I tried to comment out some code to make it work, but failed.

If I have understood your reply properly, these are the basic steps I need to follow from run.sh:
  • make mfcc
  • compute voice activity detection
  • assuming that all data are from male users
  • execute diag_ubm and full_ubm to create the model
  • train ivector extractor
  • extract ivector
--
Quazi Marufur Rahman
Department of Computer Science and Engineering
University of Dhaka
Bangladesh
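The steps listed above map roughly onto scripts in egs/sre08/v1. A minimal sketch that runs them in order and stops at the first failure, to be run from the recipe directory so that steps/, sid/ and path.sh resolve. The script names come from that recipe, but the positional arguments, the directory names and the 2048-Gaussian UBM size are assumptions modeled on its run.sh, and options such as --nj and --cmd are omitted; check each script's usage message for your Kaldi version. The gender split is left out because it is a data-directory subset rather than a separate script.

```python
import shlex
import subprocess

# (step, command) pairs; directory names and the UBM size are placeholders.
STAGES = [
    ("make mfcc", ["steps/make_mfcc.sh", "data/train", "exp/make_mfcc", "mfcc"]),
    ("compute vad", ["sid/compute_vad_decision.sh", "data/train", "exp/make_vad", "mfcc"]),
    ("train diagonal UBM", ["sid/train_diag_ubm.sh", "data/train", "2048", "exp/diag_ubm_2048"]),
    ("train full UBM", ["sid/train_full_ubm.sh", "data/train", "exp/diag_ubm_2048", "exp/full_ubm_2048"]),
    ("train i-vector extractor", ["sid/train_ivector_extractor.sh",
                                  "exp/full_ubm_2048/final.ubm", "data/train", "exp/extractor"]),
    ("extract i-vectors", ["sid/extract_ivectors.sh", "exp/extractor", "data/test", "exp/ivectors_test"]),
]


def run_pipeline(stages, dry_run=True):
    """Run each stage in order; check=True raises on the first non-zero exit.

    With dry_run=True nothing is executed and only the command lines are returned.
    """
    lines = []
    for name, cmd in stages:
        lines.append(shlex.join(cmd))
        if not dry_run:
            print(f"== {name} ==", flush=True)
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    print("\n".join(run_pipeline(STAGES, dry_run=True)))
```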

David Snyder

Nov 1, 2015, 4:47:15 PM
to kaldi-help, david.ry...@gmail.com, maruf...@gmail.com
I'm not sure what you want to do. Lines 142 and 72 of make_mfcc.sh and compute_vad_decision.sh just print out a line saying that the data was successfully processed. Also, I'm not sure what dependencies you are referring to. If something is failing there, it's probably because you haven't installed Kaldi properly.

Yes, the steps you listed seem correct. BTW, you don't need to separate the training data by gender; you can include male and female utterances together and create gender-independent models.

Quazi Marufur Rahman

Nov 2, 2015, 4:23:03 AM
to David Snyder, kaldi-help
OK, let's start from the beginning.
I need to run compute-mfcc-feats to extract MFCCs from wav files, and here is its usage:

Usage:  compute-mfcc-feats [options...] <wav-rspecifier> <feats-wspecifier>

What is meant by wav-rspecifier and feats-wspecifier? This command is located around line 112
in make_mfcc.sh. Is it possible to run compute-mfcc-feats using wav files only?

David Snyder

Nov 2, 2015, 9:20:25 AM
to kaldi-help, david.ry...@gmail.com, maruf...@gmail.com
The wav-rspecifier refers to a list of utterance IDs, each followed by the path to its wav file (this list is called wav.scp in the scripts). For example:
utt1 path-to-wav1
utt2 path-to-wav2
.
.

The feats-wspecifier is the destination, i.e., this is where the MFCC features (aka "feats") get saved. When the script finishes, the resulting feats.scp will be of the form:

utt1 path-to-feats1
utt2 path-to-feats2
.
.
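Both listings above are Kaldi script (.scp) files: one `<utterance-id> <value>` pair per line, where the `.` lines only stand for "and so on". A minimal reader, assuming well-formed input; `read_scp` is a hypothetical helper. The value is everything after the first whitespace, because wav.scp entries may also be commands ending in `|` (Kaldi's extended filenames), and feats.scp values are archive paths with a byte offset, such as `raw_mfcc.ark:17`.

```python
def read_scp(lines):
    """Parse Kaldi script-file lines into an {utterance_id: value} dict.

    The key is the first whitespace-delimited token; the value is the rest of
    the line, which may itself contain spaces (e.g. a piped command).
    """
    table = {}
    for n, line in enumerate(lines, 1):
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {n}: expected '<key> <value>': {line!r}")
        key, value = parts
        if key in table:
            raise ValueError(f"line {n}: duplicate key {key!r}")
        table[key] = value
    return table


# Usage (hypothetical path):
#   with open("data/train/wav.scp") as f:
#       wavs = read_scp(f)
```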

The "r" in rspecifier indicates that it is for reading and the "w" in wspecifier indicates that it is for writing. These conventions (and many others) are common to all Kaldi programs and scripts. You will probably need to go through the Kaldi tutorial: http://kaldi-asr.org/doc/tutorial.html. It is written with ASR in mind, but most of it applies to the other examples as well. File I/O in Kaldi and many other things you will need to know are described there.
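In practice the specifiers are short strings that prefix a file with its table type: `scp:` reads through a script file, and `ark,scp:` writes a binary archive plus an index .scp at the same time, which is how feats.scp gets its `utt path:offset` lines. A minimal sketch that builds a compute-mfcc-feats invocation from those pieces, answering "can I run it on wav files only" with "yes, once they are listed in a wav.scp". The helper names and file paths are hypothetical; `--config` is the generic option every Kaldi program accepts.

```python
import shlex


def rspecifier(scp_path):
    """Read a table through a script file."""
    return f"scp:{scp_path}"


def wspecifier(ark_path, scp_path):
    """Write a binary archive together with an index script file."""
    return f"ark,scp:{ark_path},{scp_path}"


def mfcc_command(wav_scp, ark_path, feats_scp, config=None):
    """Build the argument list for compute-mfcc-feats (not executed here)."""
    cmd = ["compute-mfcc-feats"]
    if config:
        cmd.append(f"--config={config}")  # e.g. the recipe's conf/mfcc.conf
    cmd += [rspecifier(wav_scp), wspecifier(ark_path, feats_scp)]
    return cmd


if __name__ == "__main__":
    print(shlex.join(mfcc_command("data/train/wav.scp", "mfcc/raw_mfcc.ark", "mfcc/raw_mfcc.scp")))
```

Passing the list to `subprocess.run` (with Kaldi's binaries on PATH) would run the extraction; make_mfcc.sh does the same thing, plus splitting the work into parallel jobs.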

David Snyder

Nov 2, 2015, 9:25:13 AM
to kaldi-help, david.ry...@gmail.com, maruf...@gmail.com
You should probably go over the entire tutorial, but some of the most helpful bits might be the following:


netnet...@gmail.com

Nov 22, 2015, 2:15:22 AM
to kaldi-help, david.ry...@gmail.com, maruf...@gmail.com
Hi,
I am trying to follow the sre08 example and have also run into a lack of data.
Is there any way to get all the data that is required? Paying for it is also an option.
Thanks!

On Monday, November 2, 2015 at 10:25:13 PM UTC+8, David Snyder wrote:

David Snyder

Nov 22, 2015, 9:10:36 AM
to kaldi-help, david.ry...@gmail.com, maruf...@gmail.com, netnet...@gmail.com
If you don't mind paying, you can probably get everything you need from LDC.

You'll need the following SRE08 data for enrollment and test utterances:

Also, you'll need some data to train the UBM and i-vector extractor. The run.sh script uses several data sources, including past SREs. However, you could probably get away with (i.e., not suffer too much degradation from) just using Fisher:

If you want to include any of the other data, it probably has an LDC number in the example. 

alyam...@gmail.com

Mar 17, 2016, 1:43:16 PM
to kaldi-help, david.ry...@gmail.com
When I click on the links below, I'm redirected to the LDC site, and I have to pay $2000 to get any dataset. Is there a way to get them for free for research purposes?

On Sunday, November 22, 2015 at 8:10:36 PM UTC+6, David Snyder wrote:

David Snyder

Mar 17, 2016, 2:12:56 PM
to kaldi-help, david.ry...@gmail.com, alyam...@gmail.com
Probably not. If you're affiliated with an institution that works on speech technology, you could ask whether they have a license for any LDC corpora.

I'm not sure what, if any, free speaker ID resources exist. You'll have to look into it. 

Jan Trmal

Mar 17, 2016, 2:16:48 PM
to kaldi-help, David Snyder, alyam...@gmail.com
LDC has a grant program that can be used to obtain the corpora. I have no experience with how difficult or time-consuming it is.
y.

