bob.spear - How to use spkverif_ivector.py so it can use for_models.lst and for_scores.lst


Maria Garcia

Feb 12, 2015, 12:47:46 AM
to bob-...@googlegroups.com
Hi and Good Morning.
I have a general spkverif_ivector.py question about bob spear. I looked over the voxforge example, didn't fully understand how it worked, and am interested in setting up my own database with my own enrollment and scoring. From reading the documentation on setting up a database, I believe I can create a database, make a 'for_models.lst' with all my enrollment data, and then make a 'for_scores.lst' file with all the data I want tested against the enrollment data.

However I do have some questions.
In the for_scores.lst file, what is the difference between the claimed_client_id and the client_id columns? In the examples I have seen, those two columns always seem to be identical, if I am reading the data correctly.

How do I get spkverif_ivector.py to automatically use my for_models.lst and for_scores.lst files when it starts its run?

I saw the -d (database) parameter but when I looked at the config file for database in the example I could not figure out how to specify it to tell it to use my enrollment and score files.

How would I tell it to use a training file (world)? What world file does it use if I don't specify one? I see in the documentation that it wants to use something in norm/train_world.lst. I don't have a model at the moment. When I looked at the voxforge example I didn't see it pulling in any of these files at all, but I would like to use this approach with for_models.lst and for_scores.lst.

If I use a for_znorm.lst file will it automatically perform z normalization?

Are there any examples?  I have been unable to find any.

Thank you in advance.

Maria

elie khoury

Feb 12, 2015, 1:31:32 AM
to bob-...@googlegroups.com
Dear Maria, 
We have some deadlines here, so we will reply to you in detail during the weekend.
Please remind us if we don't.
Best regards,
Elie


--
-- You received this message because you are subscribed to the Google Groups bob-devel group. To post to this group, send email to bob-...@googlegroups.com. To unsubscribe from this group, send email to bob-devel+...@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/bob-devel or directly the project website at http://idiap.github.com/bob/

elie khoury

Feb 16, 2015, 2:33:03 AM
to bob-...@googlegroups.com
Dear Maria,

On Feb 11, 2015, at 9:47 PM, Maria Garcia <mg0...@gmail.com> wrote:

Hi and Good Morning.
I have a general spkverif_ivector.py question about bob spear. I looked over the voxforge example, didn't fully understand how it worked, and am interested in setting up my own database with my own enrollment and scoring. From reading the documentation on setting up a database, I believe I can create a database, make a 'for_models.lst' with all my enrollment data, and then make a 'for_scores.lst' file with all the data I want tested against the enrollment data.

If you would like to set up your own database, you can use the bob.db.verification.filelist interface (which is installed along with bob.spear).
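For illustration, a Spear database configuration file built on that interface might look roughly like the sketch below. The module path is taken from the interface named above, but the exact constructor arguments and any extra attributes the scripts expect (sample paths, extensions, etc.) are assumptions here; copy the exact names from the TIMIT configuration file shipped with Spear.

```python
# Hypothetical sketch of a filelist-based database configuration.
# Verify the constructor signature and required attributes against
# the TIMIT config file shipped with Spear; this is not the real API.
import bob.db.verification.filelist

# Point the interface at the directory that holds the norm/, dev/
# and eval/ list files for your own database.
database = bob.db.verification.filelist.Database('/path/to/my/lists')
```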

I suggest you look at the TIMIT example instead of voxforge:
- Here is its configuration file:
- The lists for the background training set (norm), development set (DEV) and evaluation set (EVAL) are here:

- You could create your lists and configuration file in the same manner as for TIMIT.
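Putting the pieces of this thread together, the list directory for a custom database would presumably mirror the TIMIT layout with those three sets. The tree below is an assumption assembled from the norm/train_world.lst path and the for_models.lst / for_scores.lst names mentioned in this thread, not a verified listing:

```text
mylists/
    norm/
        train_world.lst    (background/world training data)
    dev/
        for_models.lst     (enrollment data)
        for_scores.lst     (test data scored against the models)
    eval/
        for_models.lst
        for_scores.lst
```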

However I do have some questions.
In the for_scores.lst file, what is the difference between the claimed_client_id and the client_id columns? In the examples I have seen, those two columns always seem to be identical, if I am reading the data correctly.

- claimed_client_id refers to the target speaker
- client_id refers to the true speaker of the test utterance
Indeed, the two columns you're talking about are almost always identical. They are duplicated because, in some rare situations, one could enroll different models for the same client.
Alternatively, you could use "for_probes.lst", which contains only the list of test utterances. But this supposes that you want to compute a full score matrix [all_clients x all_test_utterances].
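To make the column layout concrete, here is a small sketch that generates minimal for_models.lst and for_scores.lst files. The column order (filename, model_id, client_id for models; filename, model_id, claimed_client_id, client_id for scores) is my reading of the filelist interface's documentation, so double-check it against the version you have installed; the speaker names and paths are invented for illustration.

```python
# Sketch: write tiny example list files for the filelist interface.
# Column order is an assumption; verify against your installed docs.

models = [
    # (filename without extension, model_id, client_id)
    ("enroll/alice_01", "alice", "alice"),
    ("enroll/bob_01",   "bob",   "bob"),
]

scores = [
    # (filename, model_id, claimed_client_id, client_id)
    ("test/alice_02", "alice", "alice", "alice"),  # genuine trial
    ("test/bob_02",   "alice", "alice", "bob"),    # impostor: bob claims to be alice
]

with open("for_models.lst", "w") as f:
    for row in models:
        f.write(" ".join(row) + "\n")

with open("for_scores.lst", "w") as f:
    for row in scores:
        f.write(" ".join(row) + "\n")
```

Note how claimed_client_id and client_id agree on the genuine trial but differ on the impostor trial, which is exactly the distinction asked about above.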


How do I get spkverif_ivector.py to automatically use my for_models.lst and for_scores.lst files when it starts its run?

I saw the -d (database) parameter but when I looked at the config file for database in the example I could not figure out how to specify it to tell it to use my enrollment and score files.

Once you have created the config file for your database, you just need to provide it as an argument to the script.

How would I tell it to use a training file (world)? What world file does it use if I don't specify one? I see in the documentation that it wants to use something in norm/train_world.lst. I don't have a model at the moment. When I looked at the voxforge example I didn't see it pulling in any of these files at all, but I would like to use this approach with for_models.lst and for_scores.lst.

Currently we do not provide any pre-trained model. You need to train your own model. You could use the voxforge example for that purpose.
Please check how to download the voxforge dataset here:

But note that this model will be under-trained; it was given just as an example of using the bob.spear tool with free data. You may need more data to train your model and get decent results.
Check here if you want to download more similar data:
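As for the norm/train_world.lst file mentioned in the question: as far as I can tell from the filelist interface's documentation, it is simply one line per file, with the filename (typically without extension) followed by the client id. Treat the exact layout as an assumption to verify; the entries below are invented for illustration:

```text
world/spk01_utt1 spk01
world/spk01_utt2 spk01
world/spk02_utt1 spk02
```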

If I use a for_znorm.lst file will it automatically perform z normalization?

Yes; you may deactivate its use by adding the "-z" option.

Are there any examples?  I have been unable to find any.

Yes, voxforge should be a good example to start with. Then, if you have the TIMIT dataset, it should be a piece of cake to run any of the scripts.

Thank you in advance.

Maria


Please feel free to contact us for more details.
Sorry for the delay and best regards,
Elie

Maria Garcia

Feb 23, 2015, 7:10:08 AM
to bob-...@googlegroups.com
Thank you. I have a couple of questions now.
I studied your letter and looked at everything. I don't know what TIMIT is; what does it mean? Why is the number 2 part of the directory path? I looked into the API and saw that the 2 can be part of it, but I don't know why it was chosen. I see all the files that make up TIMIT but do not know where they are. For example, there is apparently a file called sx217.wav, but I cannot find it; it is listed in spear/protocols/timit/2/norm/train_world.lst. Do I have to make that file myself, or is it just an example?

Elie Khoury

Feb 23, 2015, 1:15:33 PM
to bob-...@googlegroups.com
* TIMIT corpus:
 https://catalog.ldc.upenn.edu/LDC93S1

* In Spear, we created an interface for its protocol (using a part of the TIMIT database; the protocol is named "2", hence that directory name)

* The reason I asked you to look at it is to give you an idea of how to create your own database. You cannot run the speaker recognition experiments unless you acquire the corpus yourself.

Best
Elie 


Maria Garcia

Feb 24, 2015, 7:47:06 AM
to bob-...@googlegroups.com
Thank you for explaining. I am new, but I have started to give that a try.