How to use Kaldi online decoding methods for a speaker recognition task

Soroush Gooran

Apr 23, 2021, 7:00:10 PM

to kaldi-help

Hi everyone!

I'm building an online speaker recognition system. I'm using the model from the sre16/v2 example (an x-vector-based model), and my back-end is Java Spring Boot.

1. How can I apply the online_decoding methods suggested for speech recognition to my speaker recognition task? Is there any example code for that?

2. I want to use the sre16/v2 model from my Java code. What is the appropriate way to do this?

Regards,

Soroush

nshm...@gmail.com

Apr 24, 2021, 2:17:59 AM

to kaldi-help

You can see how speaker identification is implemented in the Vosk library:

https://github.com/alphacep/vosk-api/blob/91a128b3edf7e84d55649d8fa9a60664b5386292/src/kaldi_recognizer.cc#L321

https://github.com/alphacep/vosk-api/blob/master/java/demo/src/main/java/org/vosk/demo/DecoderDemo.java
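A minimal sketch of pulling the speaker vector out of a recognizer result, written in Python for brevity (the logic ports directly to Java). It assumes, as in Vosk's speaker-identification examples, that the result JSON carries the x-vector under "spk" and the number of frames it was computed from under "spk_frames"; please verify the exact fields against the kaldi_recognizer.cc linked above for your Vosk version. The function name and the sample JSON are hypothetical.

```python
import json
from typing import List, Optional, Tuple


def extract_xvector(result_json: str) -> Optional[Tuple[List[float], int]]:
    """Return (x-vector, frame count) from a recognizer result, or None.

    Assumes the x-vector is stored under "spk" and its frame count under
    "spk_frames"; a result produced without a speaker model (or without
    enough speech) is assumed to have no "spk" key.
    """
    result = json.loads(result_json)
    if "spk" not in result:
        return None
    return [float(v) for v in result["spk"]], int(result.get("spk_frames", 0))


# Hypothetical result string, shortened to a 3-dimensional vector for readability.
sample = '{"text": "hello", "spk": [0.1, -0.2, 0.3], "spk_frames": 120}'
print(extract_xvector(sample))          # ([0.1, -0.2, 0.3], 120)
print(extract_xvector('{"text": ""}'))  # None
```

The frame count is worth keeping: x-vectors computed from very few frames tend to be unreliable, so you may want to skip them when enrolling or identifying.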

Soroush Gooran

Apr 24, 2021, 1:44:58 PM

to kaldi-help

Thank you.

The Vosk library is wonderful; I love it. Thanks to this pointer, my plan has changed: I've decided to use this library for my speaker recognition task. I ran the Vosk sample code for speaker recognition.

1. Important question: there is no place for speaker enrollment in the sample code. How can I register some speakers (10 people in my case) and identify which speaker the incoming voice belongs to?

2. The sample code takes two models, one for the speaker and one for speech. I only want to recognize the speaker; speech recognition is not my concern at the moment. Do I have to override the constructor myself?

3. I'm using the Electron.js framework for the front-end. Can I use the Vosk Node.js API on the client side for online speaker recognition?

Thank you so much

Soroush

nshm...@gmail.com

Apr 24, 2021, 1:52:45 PM

to kaldi-help

> 1. Important question: there is no place for speaker enrollment in the sample code. How can I register some speakers (10 people in my case) and identify which speaker the incoming voice belongs to?

To enroll a speaker, run recognition on their audio and store the resulting x-vector as a reference. You can store multiple x-vectors per speaker for more reliable detection.
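A minimal sketch of that enrollment scheme with cosine scoring, in Python for brevity. The SpeakerRegistry class and its methods are hypothetical, not part of Vosk. It keeps every enrollment x-vector, averages the length-normalized vectors per speaker (similar in spirit to the per-speaker x-vector averaging in Kaldi SRE recipes), and returns the closest speaker, or None when the best score falls below a threshold. The 0.5 threshold is a placeholder that you would need to tune on your own recordings.

```python
import math
from typing import Dict, List, Optional, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (a zero vector stays all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else [0.0 for _ in v]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (1.0 means same direction)."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


class SpeakerRegistry:
    """Hypothetical in-memory store of enrollment x-vectors per speaker."""

    def __init__(self, threshold: float = 0.5):
        # Minimum similarity to accept a match; placeholder value, tune it.
        self.threshold = threshold
        self.references: Dict[str, List[List[float]]] = {}

    def enroll(self, speaker: str, xvector: Sequence[float]) -> None:
        # Keep every enrollment vector; more recordings give a steadier reference.
        self.references.setdefault(speaker, []).append(normalize(xvector))

    def reference(self, speaker: str) -> List[float]:
        # Element-wise mean of the speaker's length-normalized vectors.
        vectors = self.references[speaker]
        dim = len(vectors[0])
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def identify(self, xvector: Sequence[float]) -> Optional[str]:
        """Return the best-matching enrolled speaker, or None if below threshold."""
        best_speaker, best_score = None, float("-inf")
        for speaker in self.references:
            score = cosine_similarity(self.reference(speaker), xvector)
            if score > best_score:
                best_speaker, best_score = speaker, score
        return best_speaker if best_score >= self.threshold else None


# Toy 3-dimensional vectors; real x-vectors are much longer.
registry = SpeakerRegistry(threshold=0.5)
registry.enroll("alice", [1.0, 0.1, 0.0])
registry.enroll("alice", [0.9, 0.0, 0.1])
registry.enroll("bob", [0.0, 1.0, 0.2])
print(registry.identify([0.95, 0.05, 0.05]))  # alice
print(registry.identify([0.0, 0.0, -1.0]))    # None
```

With 10 enrolled people this is just 10 cosine computations per incoming x-vector, so it is cheap enough to run on every recognizer result.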

> 2. The sample code takes two models, one for the speaker and one for speech. I only want to recognize the speaker; speech recognition is not my concern at the moment. Do I have to override the constructor myself?

You can still use a lightweight ASR model as a VAD, or you can replace that code with a VAD, as is done here:
https://github.com/igorsitdikov/lid_kaldi/blob/master/native/kaldi_recognizer.cc
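For reference, here is a rough Python sketch of an energy-based VAD modeled on Kaldi's compute-vad logic as we understand it: a frame counts as voiced when enough frames in a small window around it have log energy above a threshold that is partly absolute and partly scaled by the utterance's mean log energy. The function names are hypothetical, the default values are assumptions in the range typically used in Kaldi SRE recipes, and the log energy here is a plain sum of squares rather than Kaldi's exact feature-pipeline energy. Samples are assumed to be at int16 scale and 16 kHz.

```python
import math
from typing import List, Sequence


def frame_log_energies(samples: Sequence[float], frame_len: int = 400,
                       hop: int = 160) -> List[float]:
    """Log energy per frame (25 ms window, 10 ms shift at 16 kHz by default)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        energy = sum(x * x for x in samples[start:start + frame_len])
        energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    return energies


def energy_vad(log_energies: Sequence[float], energy_threshold: float = 5.5,
               mean_scale: float = 0.5, frames_context: int = 2,
               proportion_threshold: float = 0.12) -> List[bool]:
    """Per-frame voiced/unvoiced decisions (assumed defaults, tune them)."""
    total = len(log_energies)
    if total == 0:
        return []
    # Threshold adapts to the recording level via the mean log energy.
    threshold = energy_threshold + mean_scale * sum(log_energies) / total
    decisions = []
    for t in range(total):
        num = den = 0
        for t2 in range(t - frames_context, t + frames_context + 1):
            if 0 <= t2 < total:
                den += 1
                if log_energies[t2] > threshold:
                    num += 1
        decisions.append(num >= den * proportion_threshold)
    return decisions


# Hypothetical test signal: 1 s of near-silence followed by 1 s of loud audio.
signal = [10.0 * (-1) ** i for i in range(16000)] + [3000.0 * (-1) ** i for i in range(16000)]
flags = energy_vad(frame_log_energies(signal))
print(flags[0], flags[-1])  # False True
```

Note that this version uses the mean over the whole utterance, so for online use you would need a running mean instead; x-vectors would then be computed only from frames marked True.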

> 3. I'm using the Electron.js framework for the front-end. Can I use the Vosk Node.js API on the client side for online speaker recognition?

For Electron you need to rebuild ffi-napi; see here:
https://github.com/node-ffi-napi/node-ffi-napi/issues/144

You can also use a pure JS build:
https://github.com/ccoreilly/vosk-browser

Soroush Gooran

May 18, 2021, 4:33:39 PM

to kaldi-help

Hi Nickolay

I have implemented a Node.js version of the speaker recognition system using a cosine distance function, and now I need both VAD and PLDA in Vosk. How can I add these features to my fork, and how do I rebuild the library for my Node.js implementation?

Thanks again,

nshm...@gmail.com

May 19, 2021, 6:26:41 AM

to kaldi-help

Hi. Unfortunately, Vosk doesn't have PLDA or VAD methods, but you can probably add them yourself. You can find an example in this project:

https://github.com/igorsitdikov/lid_kaldi/blob/00c06f9565e3fe69fb8f57a24664312ceb7697ae/native/kaldi_recognizer.cc#L66
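If you add PLDA yourself, the scoring step is small once the vectors are in PLDA space. Below is a Python sketch of the two-covariance log-likelihood ratio in the diagonalized form that, as we understand it, Kaldi's Plda::LogLikelihoodRatio uses: within-class covariance is the identity and between-class covariance is a diagonal vector psi. It assumes the x-vectors have already gone through the recipe's preprocessing (mean subtraction, LDA, length normalization) and the PLDA transform, and that psi has been loaded from a trained model; none of that loading is shown, and the function name is hypothetical.

```python
import math
from typing import Sequence


def plda_llr(enroll_mean: Sequence[float], num_enroll: int,
             test: Sequence[float], psi: Sequence[float]) -> float:
    """Log-likelihood ratio of 'same speaker' vs 'different speaker'.

    enroll_mean: average of num_enroll enrollment vectors in PLDA space.
    test: one test vector in the same space.
    psi: diagonal between-class variances (within-class variance is 1).
    """
    n = num_enroll
    same = 0.0  # log N(test; posterior mean given enrollment, posterior variance + 1)
    diff = 0.0  # log N(test; 0, psi + 1), i.e. an unrelated speaker
    for m, x, p in zip(enroll_mean, test, psi):
        mean = n * p / (n * p + 1.0) * m
        var = 1.0 + p / (n * p + 1.0)
        same += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        var0 = 1.0 + p
        diff += -0.5 * (math.log(2.0 * math.pi * var0) + x ** 2 / var0)
    return same - diff


# Toy 2-dimensional example with a hypothetical psi.
print(plda_llr([2.0, 1.0], 1, [2.0, 1.0], [5.0, 5.0]) > 0)    # True
print(plda_llr([2.0, 1.0], 1, [-2.0, -1.0], [5.0, 5.0]) > 0)  # False
```

Passing the number of enrollment recordings lets the score trust an averaged reference more when it comes from several utterances. For identification among your 10 speakers you would take the speaker with the highest LLR and accept it only above a tuned threshold.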
