Multi-threading speaker diarization

François Hernandez

Jan 30, 2018, 10:51:17 AM

to kaldi-help

Hi,

I'm currently working on adapting Matthew and David's diarization recipe to our data and production setup.

I have a fairly simple setup working: it starts from an 'unsegmented' recording (treated as one big utterance) and runs through the MFCC, VAD, ivector extraction and clustering pipeline to output an RTTM file at the end.
This is working rather well and is quite efficient, thanks a lot for your work!

To make it even more efficient, I tried splitting it into n jobs (e.g. n=8), but I seem to lose a lot of diarization quality.
It seems the clustering steps are handled completely separately in each job. Would it be possible, in principle, to make them work jointly in order to have a proper multi-threaded solution?

Thanks in advance for any comment about this!

François

Daniel Povey

Jan 30, 2018, 2:34:39 PM

to kaldi-help

Can you be precise about what part of the script you changed?

--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
To post to this group, send email to kaldi...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/839ac12e-f151-4930-ae02-918edf4002a4%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

François Hernandez

Jan 30, 2018, 2:42:29 PM

to kaldi-help

Sure.

I segmented the audio using VAD and assigned a different speaker to each segment, so that extract_ivectors.sh, called with --nj 8, could split the data into several jobs. I then called score_plda.sh and cluster.sh with --nj 8.
This doesn't feel totally right (the different jobs seem to be completely separate tasks), but I thought it would be a rather simple way to test a multi-threaded approach.
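As a rough illustration of the data preparation described above, here is a minimal Python sketch that turns VAD segments into Kaldi-style `segments` and `utt2spk` lines, with each segment acting as its own speaker. The helper name `make_kaldi_entries`, the recording ID and the utterance-ID format are assumptions for illustration only; they are not taken from the recipe.

```python
def make_kaldi_entries(rec_id, vad_segments):
    """Build Kaldi-style 'segments' and 'utt2spk' lines.

    vad_segments is a list of (start_sec, end_sec) tuples.
    Each segment is given its own speaker ID (equal to its utterance ID),
    so a per-speaker split can place segments in different jobs.
    """
    segments_lines, utt2spk_lines = [], []
    for start, end in vad_segments:
        # Hypothetical ID format: <recording>-<start>-<end>, in centiseconds.
        utt_id = f"{rec_id}-{round(start * 100):07d}-{round(end * 100):07d}"
        # segments: <utterance-id> <recording-id> <start> <end>
        segments_lines.append(f"{utt_id} {rec_id} {start:.2f} {end:.2f}")
        # utt2spk: <utterance-id> <speaker-id>
        utt2spk_lines.append(f"{utt_id} {utt_id}")
    return segments_lines, utt2spk_lines


segments_lines, utt2spk_lines = make_kaldi_entries(
    "rec1", [(0.0, 4.25), (5.1, 9.8)]
)
for line in segments_lines + utt2spk_lines:
    print(line)
```

This only covers the bookkeeping that lets the data be split; it says nothing about whether the downstream scoring and clustering remain valid once the segments are spread over several jobs.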

Matthew Maciejewski

Jan 30, 2018, 4:00:41 PM

to kaldi-help

While I have not tested it myself, unless I'm missing something, I believe you can maintain identical performance while using a higher number of jobs on the ivector extraction code by adding the "--per-utt" option to utils/split_data.sh on line 95 of the diarization/extract_ivectors.sh script.

Splitting up a recording for parallelization for the PLDA scoring and clustering is not trivially possible. The PLDA scoring creates a single matrix with all possible pairwise scores between sliding-window ivectors in a session. Splitting up a recording for multiple jobs will mean you could not compute the PLDA scores between segments across the splits. Similarly, the problem with the clustering code is that it would not be able to merge clusters across splits.
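To make the point about missing cross-split scores concrete, here is a small Python sketch that counts how many pairwise comparisons survive when the sliding-window ivectors of one recording are divided among several jobs. The numbers (400 windows, 8 jobs) and the contiguous splitting scheme are assumptions chosen for illustration, not values from the recipe.

```python
def contiguous_splits(num_items, num_jobs):
    """Split range(num_items) into num_jobs contiguous, near-equal chunks."""
    base, extra = divmod(num_items, num_jobs)
    chunks, start = [], 0
    for job in range(num_jobs):
        size = base + (1 if job < extra else 0)
        chunks.append(list(range(start, start + size)))
        start += size
    return chunks


def num_pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) // 2


num_windows = 400  # hypothetical number of sliding-window ivectors
num_jobs = 8       # hypothetical number of parallel jobs

chunks = contiguous_splits(num_windows, num_jobs)
all_pairs = num_pairs(num_windows)
# Each job can only score pairs that fall inside its own chunk.
scored_pairs = sum(num_pairs(len(chunk)) for chunk in chunks)
missing_fraction = 1 - scored_pairs / all_pairs

print(f"pairs in full score matrix: {all_pairs}")
print(f"pairs scored within splits: {scored_pairs}")
print(f"fraction never scored:      {missing_fraction:.3f}")
```

With these hypothetical numbers, only 9,800 of the 79,800 pairs are ever scored, so the clustering in each job has no information linking a speaker in one split to the same speaker in another.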

If it is absolutely necessary to split up a single recording to run PLDA scoring and clustering in multiple jobs, I think the most feasible solution would be to split the recording into multiple temporally continuous sections and perform scoring and clustering on each section. After that, you would have to find a way to map the speaker labels of the sections onto each other in order to merge the per-section diarization results. It might even be possible, without too much work, to extract speaker-level ivectors from the per-section diarization results and then use the existing PLDA scoring and clustering code to merge those results.
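The label-mapping idea above could be sketched roughly as follows in Python. This is not the recipe's code: it replaces PLDA scoring with plain cosine similarity and replaces the clustering step with a greedy threshold rule, purely to show the shape of the merge. The function names, the toy 2-dimensional embeddings and the 0.8 threshold are all assumptions.

```python
import math
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def section_speaker_means(sections):
    """Average the embeddings of each (section index, local label) pair."""
    grouped = defaultdict(list)
    for section_idx, windows in enumerate(sections):
        for local_label, embedding in windows:
            grouped[(section_idx, local_label)].append(embedding)
    return {key: mean_vector(vecs) for key, vecs in grouped.items()}


def map_to_global(means, threshold=0.8):
    """Greedily map per-section speakers to global speaker IDs.

    A section-level speaker joins the most similar existing global speaker
    if the similarity reaches the threshold; otherwise it starts a new one.
    """
    global_members = []  # one list of section-level mean vectors per global ID
    mapping = {}
    for key in sorted(means):
        vec = means[key]
        best_id, best_sim = None, threshold
        for global_id, members in enumerate(global_members):
            sim = cosine(vec, mean_vector(members))
            if sim >= best_sim:
                best_id, best_sim = global_id, sim
        if best_id is None:
            global_members.append([vec])
            best_id = len(global_members) - 1
        else:
            global_members[best_id].append(vec)
        mapping[key] = best_id
    return mapping


# Toy data: two sections whose local labels "A" and "B" are swapped.
sections = [
    [("A", [1.0, 0.1]), ("A", [0.9, 0.0]), ("B", [0.0, 1.0])],
    [("A", [0.1, 0.9]), ("B", [1.0, 0.05])],
]
mapping = map_to_global(section_speaker_means(sections))
for key in sorted(mapping):
    print(key, "->", mapping[key])
```

In the toy data the two sections use opposite local labels for the same two speakers, and the mapping assigns matching global IDs across sections. A real implementation would presumably feed the speaker-level ivectors back through the existing PLDA scoring and clustering, as suggested above, rather than rely on a fixed cosine threshold.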

—Matt

François Hernandez

Jan 30, 2018, 4:04:05 PM

to kaldi-help

That's what I thought: the PLDA scoring and clustering part is not that easy. Thanks for the ivector extraction trick, I'll implement it!

Thanks a lot for your detailed answer!

Daniel Povey

Jan 30, 2018, 5:03:31 PM

to kaldi-help

I believe in addition to that you'd have to change the previous line to say:

sub_sdata=$sub_data/split${nj}utt

(it adds the 'utt' part to the per-utterance split subdirectories).

Maybe you could make this the default, if there's no reason not to.

Sîng-hông Sih

Jan 31, 2018, 5:14:37 AM

to kaldi-help

Is the recipe in egs/callhome_diarization, from https://github.com/kaldi-asr/kaldi/pull/1894?

Thanks

François Hernandez

Jan 31, 2018, 6:00:35 AM

to kaldi-help

Yes.

Vassil Panayotov

Jan 31, 2018, 9:44:36 AM

to kaldi...@googlegroups.com

Is there any documentation or paper that provides an overview of the diarization pipeline?

Vassil

François Hernandez

Jan 31, 2018, 9:47:41 AM

to kaldi-help

https://www.dropbox.com/s/bj5bc6brtzt52u4/slt_gks_dgr.pdf?dl=0

Vassil Panayotov

Jan 31, 2018, 9:49:48 AM

to kaldi...@googlegroups.com

Thank you!
