Use bob spear for speaker recognition in an app

Benoit Rouxel

Sep 19, 2016, 4:11:24 AM
to bob-devel
Hello,

I'm working on a mobile app that allows the user to log in with his/her voice.
To build this authentication system, I want to use the speaker recognition from bob spear, but I need two features that I haven't found in the documentation yet.
First, I would like to evaluate new speech samples one by one, because users give me just one audio file for authentication.
Second, if it is a new user, I would like to add him/her to the trained model without recreating the whole model from all the other files.

Is it possible to do these two things? Could you point me to concrete documentation or a way to do that?
Thank you 
Benoit

Amir Mohammadi

Sep 19, 2016, 4:26:05 AM
to bob-devel
Hi Benoit,

It is best that you try this on a PC (Linux) first, and please follow our installation instructions here: https://gitlab.idiap.ch/bob/bob/wikis/Installation

What you are trying to do is definitely possible but you need to dig into code and documentation.
Please look into: https://gitlab.idiap.ch/bob/bob.bio.base and https://pythonhosted.org/bob.bio.base/index.html

Try to read the documentation of bob.bio.base, bob.bio.spear and bob.bio.gmm **completely** before you start anything.

P.S. Bob packages are mainly used for reproducible research, and we do not have documentation or scripts for using them easily in an app or a product.

Thank you,
Amir

Benoit Rouxel

Sep 19, 2016, 5:28:54 AM
to bob-devel
Thank you Amir for your answer!

In fact, I have already read the base and spear docs; I will read the gmm documentation. For my app, I have already installed and run the tutorial experiment with a subset of the voxforge database, using bob.db.voxforge-2.0.3, because I am trying to create an API that will be called by my mobile app.
I'm searching for documentation, if it exists, that can help me use this scientific framework in a "production application".

Manuel Günther

Sep 19, 2016, 8:58:23 PM
to bob-devel
Indeed, the documentation of bob.bio.spear is a bit weak, but it mainly relies on bob.learn.em to run its experiments. You might find more documentation about the algorithms that are applied there: http://pythonhosted.org/bob.learn.em/index.html
For the theoretical background (which you might be interested in, too), please read the papers that are linked on that web page.
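
For example, the UBM training that bob.bio.spear performs internally boils down to something like the following with bob.learn.em (a minimal sketch; the feature data, the number of Gaussians and the iteration counts are placeholders, not recommended values):

    import numpy
    import bob.learn.em

    # stacked MFCC frames from all training files: (number of frames, 60)
    training_features = numpy.random.rand(10000, 60)  # placeholder data

    # initialize the Gaussian means with k-means
    kmeans = bob.learn.em.KMeansMachine(512, 60)
    bob.learn.em.train(bob.learn.em.KMeansTrainer(), kmeans, training_features,
                       max_iterations=25, convergence_threshold=1e-5)

    # train the UBM with the maximum-likelihood EM algorithm
    ubm = bob.learn.em.GMMMachine(512, 60)
    ubm.means = kmeans.means
    trainer = bob.learn.em.ML_GMMTrainer(update_means=True, update_variances=True,
                                         update_weights=True)
    bob.learn.em.train(trainer, ubm, training_features,
                       max_iterations=25, convergence_threshold=1e-5)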

I agree with Amir that Bob (and therewith also bob.bio.spear) is a toolbox for researchers, and its main purpose is not on the production side.

Manuel

Benoit Rouxel

Sep 26, 2016, 5:05:23 AM
to bob-devel
Hello, 
By adding scripts on top of bob and replacing the probes list in DEV with only one element, I think I can use bob for speaker recognition, though with low reliability (due to my limited knowledge of this research domain).
I have some questions about the algorithms and databases:
1. I read in a previous topic that the train list is used to train the UBM and the T matrix, and DEV to evaluate parameters.
    a. Can I use the same data in the training list and in the for_models.lst of the DEV database?
    b. In my case (just recognizing the speaker), I don't use the EVAL database. Am I right?
2. In my database, the recording conditions of my files are really different (different noise, different microphones, ...).
    a. Which algorithm is best to deal with this type of data?

Benoit

Manuel Günther

Sep 26, 2016, 11:20:51 AM
to bob-devel
To answer your questions:

1a) You can use the same data in training and development set. However, you have to realize that the verification rates that you obtain with this approach are over-optimistic. This means that your verification capabilities on unseen data will be lower than what you obtain on the development set.
1b) Correct. The evaluation set would be the "unseen data" mentioned above, with which unbiased results can be obtained.

2) I am not an expert on speaker recognition, so maybe someone else can reply here.

Finally, you can also use ./bin/score.py to compute a score for one model and one probe, so you might not need to create your own probes list. On the other hand, you can also check the implementation of ./bin/score.py (inside bob.bio.base, in folder bob/bio/base/script/score.py) in order to write your own verification script using a single probe file.
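
As a rough sketch of what such a single-probe script could do (I am writing this from memory; treat the resource name and the file paths as placeholders):

    import bob.bio.base

    # load the algorithm, e.g., the UBM-GMM algorithm registered by bob.bio.gmm
    algorithm = bob.bio.base.load_resource('gmm', 'algorithm')
    algorithm.load_projector('Projector.hdf5')  # the trained UBM

    # a model enrolled by a previous experiment, and the projected
    # features of the single probe file
    model = algorithm.read_model('model.hdf5')
    probe = algorithm.read_feature('probe.hdf5')

    print(algorithm.score(model, probe))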

Manuel

Benoit Rouxel

Sep 26, 2016, 12:15:10 PM
to bob-devel
Thank you very much Manuel.
According to what you told me, it seems that I didn't understand the purpose of the three databases. Because I want to recognize the speaker of an unseen speech sample, I have to use EVAL. Am I right?
I will ask each user to record 5 speech samples. I will put one of them in the training list, two of them in the dev models list, and the last two in the dev probes list and in the eval models list. Finally, when an unseen speech sample comes in, I'll put it in the eval probes.
Is it a stupid idea? 

Benoit

Manuel Günther

Sep 26, 2016, 1:22:57 PM
to bob-devel
Hmm... difficult to say. Usually, the evaluation set contains unseen identities. In your case, you should have all of them in the development set, as you are not adding identities.

As mentioned before, the Bob framework is not really designed to run in application mode, i.e., where you just classify unseen samples online. For such an application, our terminology does not apply, and neither do our scripts.
So, what you can do is:

1. Put some of your files into the training set. I am afraid that a single file per identity is too little -- I rather assume that our algorithms require several files per identity in the training set.

2. Put some files into the enrollment set for DEV (dev/for_models.lst). These files might be the same as in the training set, or even only part of it.

3. Put the remaining files into the probe set for DEV (dev/for_probes.lst). These files might be the same as in the training set, but not the same files as used for enrollment (an example of these list files is sketched after this list).

4. Using these files, run an experiment using only the DEV set. Store the generated models, and estimate a classification threshold.
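
Just as an illustration of these list files (I write the column order from memory of the bob.bio.base filelist documentation, so please double-check it), for a user with client ID 1:

    # dev/for_models.lst: <file path without extension> <model id> <client id>
    user1/sample1 1 1
    user1/sample2 1 1

    # dev/for_probes.lst: <file path without extension> <client id>
    user1/sample3 1
    user1/sample4 1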

Now, you have to write your own application, where you can handle new, unseen, online sample(s) of your user, using the models and the threshold from step 4.
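
For example, you can estimate the threshold from the score file that the DEV experiment writes, and reuse it online (a minimal sketch, assuming the four-column score file format; the file name and location depend on your setup):

    import bob.measure

    # 'scores-dev' is the four-column score file written by the verification script
    negatives, positives = bob.measure.load.split_four_column('scores-dev')
    threshold = bob.measure.eer_threshold(negatives, positives)

    def verify(score):
        # accept the claimed identity if the score reaches the DEV threshold
        return score >= threshold

The score for the new sample would be computed as in the score.py sketch above.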

Sebastien Marcel

Sep 27, 2016, 2:31:56 AM
to bob-devel
If I am not mistaken, Benoit doesn't want to use Spear to run experiments; he wants to be able to hook into it easily to build a demo app.
In that case, I am afraid it is best to build this demo from the components of Bob directly, looking at Spear only for how the verification pipeline is assembled from these components.

Benoit Rouxel

Sep 27, 2016, 2:49:02 AM
to bob-devel

Thank you again Manuel!
I will try what you told me and give you some news after that!

Benoit Rouxel

Sep 27, 2016, 2:56:31 AM
to bob-devel
Sebastien, I have already checked (a part of) the spear code. The problem is that I'm really new to the speaker recognition domain. I have already made an app that works with low reliability, and Manuel is helping me deal with the databases.

Michal Bavlsik

Aug 17, 2017, 11:34:13 AM
to bob-devel
Hello,

I'm a student of cybernetics and artificial intelligence, and I chose deep learning in biometric security as my Bachelor's thesis.

Like Benoit, I was wondering whether I would be able to make a full (simple) demo app with an input and output pipeline for real-time speaker verification.

Currently I am working with GPU-enabled TensorFlow and the FaceNet deep-learning face recognition library. I would love to implement speaker verification as well as fingerprint verification in the same demo app. The point is that bob is currently only available on Linux (I installed all the dependencies on a Mac), while I have a GPU-enabled Windows machine. I would like to use the .py scripts for enrollment and verification with custom DL algorithms in the same project. But if that's impossible, I will split the work across 2 machines. However..

I would like to create a user profile database, train a neural net for feature extraction as well as for verification on large datasets, and use that net to verify short, new, real-time inputs from the microphone and try to find a match in the database of known users.

How is your progress looking Benoit?

Thank you,
Michal.

Manuel Günther

Aug 17, 2017, 12:16:49 PM
to bob-devel
Dear Michal,

Generating such a demo application would surely be possible, but I guess that we will not take the lead on such an application. We have provided command-line scripts to perform each of the steps (preprocessing, feature extraction, enrollment, scoring) individually. For some reason, these scripts have not yet made it into the documentation. Please refer to "preprocess.py --help", "extract.py --help", "enroll.py --help" and "score.py --help", which are part of the bob.bio.base package (which is automatically installed together with bob.bio.spear). These files are also available in the source (see: https://gitlab.idiap.ch/bob/bob.bio.base/tree/master/bob/bio/base/script), so having a look into these scripts might help you to build your demo application.
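
In Python, the four stages roughly map onto the preprocessor, extractor and algorithm classes behind these scripts. A very rough sketch (the resource names 'energy-2gauss', 'mfcc-60' and 'gmm', as well as the exact input/output formats, are from my memory of bob.bio.spear and bob.bio.gmm, so please verify them against your installed versions):

    import numpy
    import bob.bio.base

    preprocessor = bob.bio.base.load_resource('energy-2gauss', 'preprocessor')
    extractor = bob.bio.base.load_resource('mfcc-60', 'extractor')
    algorithm = bob.bio.base.load_resource('gmm', 'algorithm')

    rate, audio = 16000, numpy.random.randn(16000)  # placeholder audio data

    preprocessed = preprocessor((rate, audio))  # preprocess.py: voice activity detection
    features = extractor(preprocessed)          # extract.py: cepstral (MFCC) features

    algorithm.load_projector('Projector.hdf5')  # the UBM trained beforehand
    model = algorithm.enroll([features])        # enroll.py: MAP-adapt a client model
    probe = algorithm.project(features)         # turn the probe into GMM statistics
    print(algorithm.score(model, probe))        # score.py: log-likelihood ratio score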

Bob currently does not support the Windows operating system, and I am not sure if it ever will. Bob is meant to be a research toolbox, and (in our area) little research is done on Windows machines. To avoid running your demo on two separate machines, we recommend using a virtual Linux environment inside Windows, e.g., VirtualBox. We have had good experiences with that, but I do not know whether it supports GPU access (which is not required by any function of bob.bio.spear, though).

Best regards
Manuel

Michal Bavlsik

Aug 18, 2017, 4:50:23 PM
to bob-devel
Thank you for your answer Manuel.

I will definitely dig deeper into the code and maybe try to train the neural nets remotely on a GPU-enabled machine and modify the structure of the processing pipeline in bob accordingly. However, if someone has tried something similar before, I would love to hear from them.

Best regards,

Michal.