Voice Print Merge method and Mean Count

jhun retumban

Jan 27, 2015, 9:52:49 AM

to reco...@googlegroups.com

Hi Amaury,

I am currently working on a school project about speaker recognition, and I came across this library. I tested it and found it easy to use.

Now I want to dig deeper into this library. While studying it, I came across this:

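private void merge(double[] inner, double[] outer) {
    // meanCount = number of values already averaged into inner
    for (int i = 0; i < inner.length; i++) {
        // running mean: undo the old average, add outer, re-average over meanCount + 1 values
        inner[i] = (inner[i] * meanCount + outer[i]) / (meanCount + 1);
    }
}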

As far as I understand, this method merges new LPC features (outer) into the existing LPC features (inner) of a person registered in the UBM. This led me to the following questions:

1. Is the merging algorithm you wrote based on a mathematical model? If so, which one? (Reason for asking: I badly need references for my paper :( ) If not, how did you come up with the code?

2. What does the meanCount do in the merge method?

3. Would regularly updating the features of a specific person in the UBM help keep track of his vocal state, or increase recognition accuracy?

I hope to learn more about this library. Your reply would be much appreciated :)

Jhun Retumban

Amaury Crickx

Feb 24, 2015, 6:25:42 AM

to reco...@googlegroups.com

1. Is the merging algorithm you wrote based on a mathematical model? If so, which one? (Reason for asking: I badly need references for my paper :( ) If not, how did you come up with the code?

It's just a roundabout way of calculating a mean value: I multiply the inner mean value back by the number of values used to calculate it (meanCount), add the outer value, and divide by meanCount + 1 to obtain the new inner mean.
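Written as a formula, this is the standard incremental (cumulative) mean, which may be easier to cite. Here n stands for meanCount, \mu_n for inner and x_{n+1} for outer, applied to each coefficient separately, assuming each merge counts as exactly one sample:

```latex
\mu_{n+1} = \frac{1}{n+1}\sum_{k=1}^{n+1} x_k
          = \frac{n\,\mu_n + x_{n+1}}{n+1}
          = \mu_n + \frac{x_{n+1} - \mu_n}{n+1}
```

The quoted merge method computes the middle form; the last form is the same quantity rewritten as "old mean plus a correction".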

2. What does the meanCount do in the merge method?

It counts the number of values added together, so that we can divide by it to obtain the mean value.
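To see how meanCount and the update rule work together, here is a small self-contained sketch. The thread doesn't show where meanCount is incremented, so incrementing it right after each merge is an assumption, and RunningMeanDemo is a hypothetical class, not Recognito's VoicePrint:

```java
import java.util.Arrays;

// Hypothetical class for illustration; not Recognito's actual VoicePrint.
public class RunningMeanDemo {

    private final double[] mean; // plays the role of "inner"
    private int meanCount = 0;   // how many vectors have been averaged so far (assumed bookkeeping)

    public RunningMeanDemo(int dimension) {
        this.mean = new double[dimension];
    }

    // Same update rule as the quoted merge method, then the count is incremented.
    public void add(double[] sample) {
        if (sample.length != mean.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        for (int i = 0; i < mean.length; i++) {
            mean[i] = (mean[i] * meanCount + sample[i]) / (meanCount + 1);
        }
        meanCount++;
    }

    public double[] getMean() {
        return Arrays.copyOf(mean, mean.length);
    }

    public int getMeanCount() {
        return meanCount;
    }

    public static void main(String[] args) {
        RunningMeanDemo m = new RunningMeanDemo(2);
        m.add(new double[] {1.0, 10.0});
        m.add(new double[] {2.0, 20.0});
        m.add(new double[] {6.0, 30.0});
        // Same result as the batch mean of the three samples: [3.0, 20.0]
        System.out.println(Arrays.toString(m.getMean()) + " after " + m.getMeanCount() + " samples");
    }
}
```

With meanCount at 0 the old contents of the mean are multiplied by zero, so the first sample simply becomes the mean.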

3. Would regularly updating the features of a specific person in the UBM help keep track of his vocal state, or increase recognition accuracy?

The UBM is a single voice print. Ideally, it should not be updated all the time in a production environment, but updating it is very practical for testing things out.

Once the recording for a given user is long enough, merging more data for that user will not improve things much.
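One mechanical reason, assuming the running-mean update above and that each merge counts as one sample regardless of recording length: the weight of a newly merged sample is 1 / (meanCount + 1), so each additional merge moves the stored mean less. A minimal sketch (DiminishingUpdateDemo is a hypothetical name):

```java
// Illustrates diminishing updates under the running-mean rule:
// new mean - old mean = (sample - old mean) / (meanCount + 1).
public class DiminishingUpdateDemo {

    // Weight given to a newly merged sample when meanCount samples are already in the mean.
    public static double newSampleWeight(int meanCount) {
        return 1.0 / (meanCount + 1);
    }

    // How far the mean moves when merging a sample that lies `offset` away from the current mean.
    public static double meanShift(int meanCount, double offset) {
        return offset * newSampleWeight(meanCount);
    }

    public static void main(String[] args) {
        int[] counts = {1, 9, 99, 999};
        for (int n : counts) {
            System.out.printf("meanCount=%d -> new sample weight=%.4f, shift for offset 1.0=%.4f%n",
                    n, newSampleWeight(n), meanShift(n, 1.0));
        }
    }
}
```

The printed weights drop from 0.5 to 0.1, 0.01 and 0.001, which is why a well-established voice print barely changes after further merges.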

jhun retumban

Feb 24, 2015, 10:40:25 PM

to reco...@googlegroups.com

Hi

Thank you for your response. What if we create a target model for each speaker and update it progressively? Would it make a difference?

By the way, my aim is to develop a speaker recognition system that can identify a speaker even when there is a slight change in his or her normal speaking voice. I thought that progressive modeling was the answer. Any comment from you would be much appreciated.

Amaury Crickx

Mar 9, 2015, 4:33:07 PM

to reco...@googlegroups.com

I'm not sure what you mean by target model.

Merging multiple samples of the same person in order to cover as many of the voice characteristics as possible is a good idea.

Once you have covered the different phonemes that could show up in his speech, further merging won't improve speaker recognition any more.
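Merging several samples per speaker can be sketched as one progressively updated model per speaker. This is a minimal sketch assuming each model is simply the running mean of that speaker's feature vectors (feature extraction is not shown); SpeakerModels, enrollOrMerge, meanOf and sampleCount are hypothetical names, not Recognito's API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one progressively updated model per speaker.
public class SpeakerModels {

    // Running mean of one speaker's feature vectors plus the number of samples merged.
    static final class Model {
        final double[] mean;
        int count;

        Model(double[] first) {
            this.mean = first.clone();
            this.count = 1;
        }

        void merge(double[] sample) {
            for (int i = 0; i < mean.length; i++) {
                mean[i] = (mean[i] * count + sample[i]) / (count + 1);
            }
            count++;
        }
    }

    private final Map<String, Model> models = new HashMap<>();

    // Creates the speaker's model from the first sample, merges later samples into it.
    public void enrollOrMerge(String speaker, double[] features) {
        Model existing = models.get(speaker);
        if (existing == null) {
            models.put(speaker, new Model(features));
            return;
        }
        if (features.length != existing.mean.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        existing.merge(features);
    }

    // Returns a copy of the speaker's mean vector, or null if the speaker is unknown.
    public double[] meanOf(String speaker) {
        Model m = models.get(speaker);
        return m == null ? null : m.mean.clone();
    }

    public int sampleCount(String speaker) {
        Model m = models.get(speaker);
        return m == null ? 0 : m.count;
    }

    public static void main(String[] args) {
        SpeakerModels models = new SpeakerModels();
        models.enrollOrMerge("alice", new double[] {1.0, 2.0});
        models.enrollOrMerge("alice", new double[] {3.0, 4.0});
        System.out.println(Arrays.toString(models.meanOf("alice"))
                + " from " + models.sampleCount("alice") + " samples");
    }
}
```

Because the per-speaker weight shrinks with every merge, this setup also shows the saturation described above: early samples shape the model a lot, later ones very little.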

Voice variability is so large that I doubt any system could cope with huge variations (yelling vs. a soft tone).

A slight change should be OK for most systems.
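Why a slight change is usually tolerated can be illustrated with nearest-model identification: a slightly shifted feature vector still lands closer to its own speaker's model than to anyone else's. This sketch uses plain Euclidean distance as an assumption for illustration; it is not necessarily the measure Recognito uses, and NearestSpeaker is a hypothetical name:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative nearest-model identification with plain Euclidean distance (an assumption).
public class NearestSpeaker {

    public static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the key of the enrolled model closest to the probe vector (null if there are none).
    public static String identify(Map<String, double[]> models, double[] probe) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : models.entrySet()) {
            double d = euclidean(e.getValue(), probe);
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> models = new LinkedHashMap<>();
        models.put("alice", new double[] {1.0, 0.0, 0.5});
        models.put("bob", new double[] {0.0, 1.0, -0.5});
        // A slightly perturbed version of Alice's features is still closest to her model.
        double[] probe = {0.9, 0.1, 0.45};
        System.out.println(identify(models, probe));
    }
}
```

A large variation (such as yelling) can move the vector far enough that another model ends up closer, which is where misidentifications come from.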

The difficulty lies in recording the user in similar conditions:

- distance to the microphone (affects the signal-to-noise ratio)

- surrounding noise (outdoor noise is very hard: it's not just a background that you could somehow discard; there might be someone else passing by and speaking at the same time)
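The effect of microphone distance on the signal-to-noise ratio can be made concrete with the usual definition SNR = 10 log10(P_signal / P_noise) in dB. This is a simplified sketch: it assumes the speech and noise are available as separate sample arrays (rarely true in practice) and models "farther from the mic" as simply scaling the speech amplitude down while the noise stays constant. SnrDemo and its methods are hypothetical:

```java
// Simplified SNR illustration: speech and noise are given separately (an assumption).
public class SnrDemo {

    public static double meanPower(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return sum / x.length;
    }

    // SNR in decibels from mean signal power and mean noise power.
    public static double snrDb(double[] speech, double[] noise) {
        return 10.0 * Math.log10(meanPower(speech) / meanPower(noise));
    }

    public static double[] scale(double[] x, double factor) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = x[i] * factor;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] speech = new double[1000];
        double[] noise = new double[1000];
        for (int i = 0; i < speech.length; i++) {
            speech[i] = Math.sin(2 * Math.PI * 200 * i / 16000.0);              // stand-in for a voiced sound
            noise[i] = 0.05 * Math.sin(2 * Math.PI * 3100 * i / 16000.0 + 1.0); // stand-in for steady background noise
        }
        // Halving the speech amplitude while the noise stays put lowers the SNR by about 6 dB.
        System.out.printf("near: %.1f dB, far: %.1f dB%n",
                snrDb(speech, noise), snrDb(scale(speech, 0.5), noise));
    }
}
```

Halving the amplitude divides the signal power by four, i.e. a drop of 20 log10(2), roughly 6 dB, for the same background noise.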

The human brain is very good at filtering out irrelevant information (we don't really notice the noise), but teaching that to a computer is a whole different story...

SUDHAKAR RAJU

Aug 3, 2016, 1:05:07 AM

to Recognito

Hi,

Can you give me some sample code to test speaker recognition? We have tried a lot but couldn't get proper results.