How to use Kaldi online decoding methods for a speaker recognition task


Soroush Gooran

Apr 23, 2021, 7:00:10 PM
to kaldi-help
Hi everyone!
I am building an online speaker recognition system. I use the sre16/v2 example model (an x-vector based model) for this task, with Java Spring Boot on the back end.

1. How can I use the online_decoding methods suggested for the speech recognition task in my speaker recognition task? Is there any example code for that?

2. I want to use the sre16/v2 model in my Java code. What is the appropriate way to do that?

Regards,
Soroush

nshm...@gmail.com

Apr 24, 2021, 2:17:59 AM
to kaldi-help

Soroush Gooran

Apr 24, 2021, 1:44:58 PM
to kaldi-help
Thank you.
The Vosk library is wonderful. I love it. After this introduction my plan has changed: I have decided to use this library for my speaker recognition task. I ran the Vosk sample code for speaker recognition.

1. Important question: There is no place for speaker enrollment in the sample code. How can I register some speakers (in my case 10 people) and identify which speaker the incoming voice belongs to?

2. The sample code takes two models, one for the speaker and one for the speech. I only want to recognize the speaker; speech recognition is not my concern at the moment. Do I have to override the constructor method myself?

3. I'm using the Electron.js framework for the front end. Can I use the Vosk Node.js API on the client side for online speaker recognition?

Thank you so much
Soroush

nshm...@gmail.com

Apr 24, 2021, 1:52:45 PM
to kaldi-help
> 1. Important question: There is no place for speaker enrollment in the sample code. How can I register some speakers (in my case 10 people) and identify which speaker the incoming voice belongs to?

To enroll a speaker, run recognition on their audio and store the resulting x-vector as a reference. You can store multiple x-vectors per speaker for more reliable detection.
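
The enrollment and lookup described here can be sketched roughly as follows. This is a minimal illustration, not part of Vosk: the `SpeakerRegistry` class, its method names, and the 0.5 threshold are all hypothetical, and the x-vectors are assumed to be plain lists of floats taken from the recognizer result (the Vosk speaker sample reads them from the `spk` field). Scoring uses cosine distance, and each speaker is scored by the closest of their stored references.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical helper: keeps reference x-vectors per enrolled speaker."""

    def __init__(self, threshold=0.5):
        # Assumed value; tune it on your own enrollment and test data.
        self.threshold = threshold
        self.references = {}  # speaker name -> list of x-vectors

    def enroll(self, name, xvector):
        # Storing several x-vectors per speaker makes detection more reliable.
        self.references.setdefault(name, []).append(list(xvector))

    def identify(self, xvector):
        """Return (name, distance); name is None if nobody is close enough."""
        best_name, best_dist = None, float("inf")
        for name, refs in self.references.items():
            dist = min(cosine_distance(xvector, ref) for ref in refs)
            if dist < best_dist:
                best_name, best_dist = name, dist
        if best_dist > self.threshold:
            return None, best_dist
        return best_name, best_dist
```

For 10 speakers you would call `enroll` one or more times per person with x-vectors from their enrollment recordings, then call `identify` on the x-vector of each incoming utterance.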


> 2. The sample code takes two models, one for the speaker and one for the speech. I only want to recognize the speaker; speech recognition is not my concern at the moment. Do I have to override the constructor method myself?

You can keep a lightweight ASR model to act as the VAD, or you can replace that code with a VAD, as is done here:
https://github.com/igorsitdikov/lid_kaldi/blob/master/native/kaldi_recognizer.cc
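
To give a rough idea of what a VAD step does before x-vector extraction, here is a simplified energy-threshold sketch. It is an assumption-laden illustration and not necessarily what the linked file does: the function names, the 25 ms / 10 ms framing at 16 kHz, the log-energy threshold of 10.0, and the minimum of 50 speech frames are all hypothetical values for 16-bit PCM samples.

```python
import math


def frame_log_energies(samples, frame_len=400, hop=160):
    """Log mean energy per frame (400/160 samples = 25 ms/10 ms at 16 kHz)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        # The small constant avoids log(0) on digital silence.
        energies.append(math.log(energy + 1e-10))
    return energies


def speech_frames(samples, threshold=10.0, frame_len=400, hop=160):
    """Mark each frame as speech (True) if its log energy exceeds the threshold."""
    return [e > threshold for e in frame_log_energies(samples, frame_len, hop)]


def has_enough_speech(samples, min_speech_frames=50, threshold=10.0,
                      frame_len=400, hop=160):
    """Decide whether a chunk holds enough speech to be worth scoring."""
    voiced = sum(speech_frames(samples, threshold, frame_len, hop))
    return voiced >= min_speech_frames
```

A fixed threshold like this is fragile in noisy conditions; it only shows where the gating happens, so that silent chunks never reach the speaker model.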

> 3. I'm using the Electron.js framework for the front end. Can I use the Vosk Node.js API on the client side for online speaker recognition?

For Electron you need to rebuild ffi-napi; see here:
https://github.com/node-ffi-napi/node-ffi-napi/issues/144

You can also use a pure JS build:
https://github.com/ccoreilly/vosk-browser

Soroush Gooran

May 18, 2021, 4:33:39 PM
to kaldi-help
Hi Nickolay,
I have implemented a Node.js version of the speaker recognition system using a cosine distance function, and now I need both VAD and PLDA in Vosk. How can I add these features to my fork, and how do I rebuild the library for my Node.js implementation?
Thanks again

nshm...@gmail.com

May 19, 2021, 6:26:41 AM
to kaldi-help

Hi. Unfortunately, Vosk doesn't have PLDA or VAD methods, but you can probably add them yourself. You can find an example in this project:

https://github.com/igorsitdikov/lid_kaldi/blob/00c06f9565e3fe69fb8f57a24664312ceb7697ae/native/kaldi_recognizer.cc#L66
