Speaker Diarization using Kaldi for beginners

VITTHAL BHANDARI

Mar 10, 2021, 1:06:27 AM

to kaldi-help

I am a CS undergrad working on a project involving diarization at my university. I spent 2-3 weeks on a literature survey, but when I got down to implementing the basic diarization pipeline, I had trouble working with Kaldi.

GOAL/TASK --> I want to apply speech segmentation to the AMI corpus. With the segments obtained, I want to try some novel clustering algorithms and analyze their performance against the current state of the art. I have a few ideas that I want to implement.
PROBLEMS FACED --> It is difficult to work with Kaldi owing to its scripted nature. As a final-year undergrad, I have hardly ever worked with bash scripts (except for some minor makefiles in course projects). So the script wrappers act as an invisible wall, preventing me from even starting to code.
The tutorial by Yoav Ramon is not very hands-on, and it is literally the only source I could find for beginners.
Most authors have used either Kaldi or pyannote (which is also not very easy to use). Also, the same small set of authors seems to publish each year at ICASSP and Interspeech, leading me to believe that not many researchers get into audio and speech processing and that this is a relatively closed, gated community (although most authors have been very generous in clarifying any doubts I had regarding their papers).
WHAT TO DO? --> Is there an expected time frame in which one can become comfortable with Kaldi (say, 4 weeks or 8 weeks)?
If not, would you suggest that I change my project and work on something else? I am afraid Kaldi is not suited to short-term projects or to people who do not plan to do prolonged research in speech processing.

Sorry for the long query, but this help group was the only place where I could voice my concerns. I could not find any other popular blogs, communities on Reddit, or a dedicated help book to get clarity.

Daniel Povey

Mar 10, 2021, 1:13:07 AM

to kaldi-help

I think it will be hard unless you are willing to throw yourself into learning UNIX shell and bash.

Very hard to predict how long it would take without knowing you and your environment.


Desh Raj

Mar 10, 2021, 7:45:56 AM

to kaldi...@googlegroups.com

Hi Vitthal

Yes, Kaldi can be a little overwhelming and there is a learning curve involved. But since you already have a defined task at hand, you do not need to worry about understanding everything going on in Kaldi, which makes your task much easier. All you need to do is to go through the AMI diarization recipe (egs/ami/s5c), and understand what is happening in each of the "stages". Most of the Kaldi recipes are organized into such stages. For the diarization recipes, these stages roughly correspond to: Data preparation -> Feature extraction -> Speech activity detection -> x-vector extraction -> Clustering.

In your case, since you only wish to experiment with new clustering algorithms, you can pretty much run through the rest of the stages as is, and only focus on implementing the last stage. If you look at the run.sh script in the AMI recipe, you can just blindly run stages 1 to 6 (although I would recommend running them one at a time and then looking at the outputs produced). Your actual work would be to modify stage 7. Stages 8 onwards perform overlap detection which is not relevant to you, so you can remove those stages.
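For anyone more at home in Python than bash, the "stages" idea can be sketched in a few lines. Kaldi recipe scripts guard each block with a test like `if [ $stage -le N ]; then ... fi`; the sketch below plays the same role, with an added upper bound so one stage can be run alone and its outputs inspected. The stage names follow the rough order listed above, but the numbering, the function names (`make_stage`, `run_pipeline`) and the placeholder stage bodies are all assumptions for illustration and do not match the stage numbers in the AMI recipe's run.sh.

```python
from typing import Callable, List, Optional, Tuple


def make_stage(name: str, log: List[str]) -> Callable[[], None]:
    """Return a placeholder stage that only records that it ran.

    In a real pipeline each stage would call the corresponding tool or script.
    """
    def stage() -> None:
        log.append(name)
    return stage


def run_pipeline(stages: List[Tuple[str, Callable[[], None]]],
                 start: int = 1,
                 stop: Optional[int] = None) -> None:
    """Run the stages numbered start..stop (1-indexed, inclusive)."""
    if stop is None:
        stop = len(stages)
    for number, (name, func) in enumerate(stages, start=1):
        if start <= number <= stop:
            print(f"Stage {number}: {name}")
            func()


# Hypothetical numbering: these five names are the rough stages listed above,
# not the actual stage numbers used in egs/ami/s5c/run.sh.
log: List[str] = []
names = [
    "data preparation",
    "feature extraction",
    "speech activity detection",
    "x-vector extraction",
    "clustering",
]
stages = [(name, make_stage(name, log)) for name in names]

# Re-run only the final (clustering) stage, e.g. after changing the algorithm.
run_pipeline(stages, start=5)
```

Because the earlier stages write their outputs to disk in a real recipe, re-running only the last stage is cheap, which is what makes iterating on a clustering method practical.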

Stage 7 calls one of diarize_ahc.sh / diarize_spectral.sh / diarize_vbx.sh, which are all in the local/ directory, depending on which clustering method you have specified. I would recommend making a copy of local/diarize_spectral.sh and working with that copy. In the diarize_spectral.sh script, you would only need to modify the last stage, since the first 3 stages correspond to x-vector extraction and cosine scoring (you may decide to modify the scoring part and use a different scoring method). The last stage calls scluster.sh, which is basically just a wrapper around the actual clustering script (spec_clust.py): it splits up the whole data so that the clustering can be done in parallel. In fact, you'll find that most of these internal scripts serve to split your data, call the actual binary on each split, and combine the outputs.
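The "split the data, run on each split, combine the outputs" pattern described above can be sketched in Python as follows. This is only an illustration of the idea, not what scluster.sh does internally: the round-robin split, the function names (`split_round_robin`, `run_in_parallel`, `dummy_worker`) and the recording IDs are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def split_round_robin(keys: List[str], num_jobs: int) -> List[List[str]]:
    """Deal the recording IDs into at most num_jobs non-empty lists."""
    splits = [keys[i::num_jobs] for i in range(num_jobs)]
    return [s for s in splits if s]


def run_in_parallel(keys: List[str],
                    num_jobs: int,
                    worker: Callable[[List[str]], Dict[str, int]]) -> Dict[str, int]:
    """Run worker on each split concurrently and merge the per-split results."""
    splits = split_round_robin(keys, num_jobs)
    combined: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(splits))) as pool:
        for partial in pool.map(worker, splits):
            combined.update(partial)
    return combined


def dummy_worker(recordings: List[str]) -> Dict[str, int]:
    """Placeholder for the real per-split job (e.g. clustering one recording).

    Here it just maps each recording ID to the length of its name.
    """
    return {rec: len(rec) for rec in recordings}


# Hypothetical recording IDs, used only to exercise the pattern.
recordings = ["rec_a", "rec_bb", "rec_ccc", "rec_dddd", "rec_eeeee"]
result = run_in_parallel(recordings, num_jobs=2, worker=dummy_worker)
print(result)
```

Each recording is clustered independently of the others in diarization, so the per-split jobs need no communication and the merge step is a plain union of their outputs.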

Your main task would be to understand scluster.sh and spec_clust.py, and then modify these according to the algorithm you wish to implement. Since the actual clustering is implemented in Python, it should be quite easy to work with (it uses kaldi_io to load x-vectors into numpy matrices, so that you can use those for clustering).
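To give a feel for what a replacement clustering step might look like, here is a minimal sketch of threshold-based agglomerative clustering with average linkage on cosine similarity. It is a stand-in, not the code in spec_clust.py or diarize_ahc.sh: the loading step with kaldi_io is left out, the plain Python lists stand in for the x-vectors of one recording, and the function names, the toy vectors and the threshold value are all assumptions. Vectors are assumed to be non-zero.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cluster_segments(vectors: List[Sequence[float]], threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering on cosine similarity.

    Repeatedly merges the most similar pair of clusters and stops when the
    best average similarity falls below threshold. Returns one cluster label
    (speaker index) per input vector.
    """
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def average_similarity(c1: List[int], c2: List[int]) -> float:
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, best_pair = float("-inf"), (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                score = average_similarity(clusters[a], clusters[b])
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        if best_score < threshold:
            break
        a, b = best_pair
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * n
    for label, members in enumerate(clusters):
        for index in members:
            labels[index] = label
    return labels


# Toy 2-dimensional "embeddings": two segments per speaker (illustrative only).
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = cluster_segments(vectors, threshold=0.5)
print(labels)
```

A new algorithm only has to keep the same contract (embeddings or a similarity matrix in, one label per segment out) for the rest of the pipeline to stay unchanged.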

I don't think there's just a small set of authors publishing on diarization. There have been amazing advances recently, with lots of new paradigms coming up (such as end-to-end neural diarization). You can find more details in this review paper: https://arxiv.org/pdf/2101.09624.pdf. I hope the above explanation helps you get going with your project.

Best,

Desh

VITTHAL BHANDARI

Mar 10, 2021, 8:00:46 AM

to kaldi-help

Thank you, Desh Raj.

From your advice, it seems that I can make this work. I will definitely try following it.

Daniel's advice is also correct. I should be able to spend some more time learning the basics of script programming. Hopefully, I will be able to drive my project to completion!

The professor I am working under has advised me to convert the Kaldi pipeline into Python if that helps me circumvent the difficulty associated with script programming. If anybody has any advice on how to do that, kindly let me know.