Speech Activity Detection (SAD) for Diarization


Mesut Toruk

Feb 14, 2018, 6:36:17 AM
to kaldi-help
Hello

Would DER decrease if we used SAD instead of VAD?
Are there any SAD scripts in Kaldi?

Note: my false alarm rate is high (about 13-16%).

Thanks
Mesut

David Snyder

Feb 14, 2018, 9:48:22 AM
to kaldi-help
There are some SAD recipes here: https://github.com/kaldi-asr/kaldi/tree/master/egs/swbd/s5c/local/segmentation/tuning

However, it's not going to improve your results; it will make them worse. The egs/callhome_diarization recipe currently uses the oracle segmentation, so any other segmentation is going to increase the error rate. Of course, using the oracle segmentation isn't realistic; it provides a lower bound on the error rate. In the near future, we'll extend the egs/callhome_diarization recipe to include segmentation from a SAD (rather than oracle).

> Note: my false alarm rate is high (about 13-16%).

We'd need much more information to help with this. For starters, can you copy and paste some relevant portions of the results log? Did you actually run the callhome_diarization recipe using roughly the same datasets, or are you trying to adapt it to your own data?

David Snyder

Feb 14, 2018, 10:00:04 AM
to kaldi-help
Or maybe you're asking about using a better SAD on the training data? It could help, but usually the energy-based SAD is good enough for that purpose. What's important is to apply a high-quality SAD to the speech that you want to diarize. But again, the recipe is already using the oracle segmentation there (unless you modified the recipe, in which case you'll need to explain what you did in more detail), so that's probably not why your error rate is high.
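As a rough illustration of what an energy-based SAD does, here is a minimal sketch (not Kaldi's implementation) that marks a frame as speech when enough frames in a small window around it have log energy above a recording-adaptive threshold. It assumes the per-frame log energies are already computed. The parameter names loosely echo the options of Kaldi's compute-vad, but the default values below are illustrative assumptions, not the recipe's configuration.

```python
from typing import List


def energy_vad(log_energy: List[float],
               energy_threshold: float = 5.5,
               mean_scale: float = 0.5,
               frames_context: int = 2,
               proportion_threshold: float = 0.12) -> List[int]:
    """Return a 0/1 speech decision per frame from per-frame log energies.

    Illustrative sketch only; default values are assumptions.
    """
    n = len(log_energy)
    if n == 0:
        return []
    # Adaptive threshold: a fixed offset plus a fraction of the mean log energy.
    threshold = energy_threshold + mean_scale * sum(log_energy) / n
    decisions = []
    for t in range(n):
        lo = max(0, t - frames_context)
        hi = min(n, t + frames_context + 1)
        window = log_energy[lo:hi]
        # Count the frames in the context window that are loud enough.
        loud = sum(1 for e in window if e > threshold)
        decisions.append(1 if loud >= proportion_threshold * len(window) else 0)
    return decisions
```

Because such a detector only looks at loudness, it tends to pass loud non-speech (noise, music, breathing) as speech, which is scored as false alarm once it reaches the diarization output.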

Mesut Toruk

Feb 14, 2018, 10:51:34 AM
to kaldi-help
I did not make any changes to the scripts, but I use different datasets.

Train set: sre{04, 05, 06, 08 (except summed)}
Test set: sre08/test/summed (every recording in this set contains 2 speakers, so during data preparation I wrote 2 as the number of speakers for each recording in reco2num_spk)

reco2num_spk:
gabzh 2
gabzo 2
gabzv 2
...

My error rates:
DER_threshold
MISSED SPEAKER TIME =     (  0.1 percent of scored speaker time)
FALARM SPEAKER TIME =    ( 16.7 percent of scored speaker time)
 SPEAKER ERROR TIME =     (  3.2 percent of scored speaker time)

DER_num_spk
MISSED SPEAKER TIME =    (  0.1 percent of scored speaker time)
FALARM SPEAKER TIME =    ( 16.7 percent of scored speaker time)
 SPEAKER ERROR TIME =    (  2.1 percent of scored speaker time)
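For reference, the DER reported by NIST's md-eval scoring is the sum of these three components, each already expressed as a percentage of scored speaker time. A quick check using only the percentages above (the seconds values are missing from the pasted log):

```python
def der(missed: float, false_alarm: float, speaker_error: float) -> float:
    """Diarization error rate in percent.

    Each component is a percentage of scored speaker time, so the DER is
    simply their sum.
    """
    return missed + false_alarm + speaker_error


# Percentages copied from the two scoring runs above.
runs = {
    "DER_threshold": (0.1, 16.7, 3.2),
    "DER_num_spk": (0.1, 16.7, 2.1),
}

for name, (missed, falarm, spk_err) in runs.items():
    total = der(missed, falarm, spk_err)
    # Fraction of the total error that comes from false alarm alone.
    print(f"{name}: DER = {total:.1f}%, false alarm share = {falarm / total:.1%}")
```

This gives a DER of about 20.0% with the threshold-based clustering and 18.9% with the oracle number of speakers, with false alarm making up roughly 83.5% and 88.4% of it respectively, which is consistent with the segmentation, rather than the clustering, being the main source of error.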

Thanks
Mesut

On Wednesday, February 14, 2018 at 18:00:04 UTC+3, David Snyder wrote:

David Snyder

Feb 14, 2018, 11:57:59 AM
to kaldi-help
Ok, it sounds like you might not be using any SAD currently. So, yes, you'll want to use something to get a segments file before performing diarization. The SAD scripts I pointed you to earlier could help you there. 

For development purposes, you might find it helpful to also use the oracle segmentation. You can get that information from the RTTM you used for scoring. That way, you can separate the sources of error (e.g., segmentation errors from clustering errors).
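A minimal sketch of deriving an oracle segments file from a reference RTTM, assuming the standard RTTM field order (type, file, channel, onset, duration, ..., speaker name) and Kaldi's segments format (`<segment-id> <recording-id> <start> <end>`). Overlapping turns from different speakers are merged into one speech region, since clustering is what assigns speakers afterwards. The segment-id naming scheme and the `max_gap` parameter are hypothetical choices, not what the Kaldi recipes use.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rttm_to_segments(rttm_lines: Iterable[str], max_gap: float = 0.0) -> List[str]:
    """Convert reference RTTM speaker turns into Kaldi-style segments lines.

    Turns that overlap (or are separated by at most max_gap seconds) are
    merged into a single speech region per recording.
    """
    turns: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for line in rttm_lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        reco, onset, dur = fields[1], float(fields[3]), float(fields[4])
        turns[reco].append((onset, onset + dur))

    segments = []
    for reco in sorted(turns):
        merged: List[List[float]] = []
        for start, end in sorted(turns[reco]):
            if merged and start <= merged[-1][1] + max_gap:
                merged[-1][1] = max(merged[-1][1], end)  # extend current region
            else:
                merged.append([start, end])
        for start, end in merged:
            # Hypothetical id scheme: recording + start/end in centiseconds.
            seg_id = f"{reco}-{round(start * 100):07d}-{round(end * 100):07d}"
            segments.append(f"{seg_id} {reco} {start:.2f} {end:.2f}")
    return segments
```

Scoring diarization on top of these segments removes segmentation errors from the picture, so whatever DER remains comes from embedding extraction and clustering.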

Mesut Toruk

Feb 15, 2018, 12:17:38 AM
to kaldi-help
Thanks for the reply, David.

I already used "sid/compute_vad_decision.sh" and "diarization/vad_to_segments.sh" to get the segments file. I have a reference RTTM, and I scored against it. My best guesses are that either the reference is not good or the VAD isn't good enough (which is why I asked about SAD).
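For intuition about the VAD-to-segments step, here is a hypothetical sketch (not the implementation of diarization/vad_to_segments.sh) that turns a per-frame 0/1 speech decision into contiguous speech regions. It assumes a 10 ms frame shift and an optional minimum run length counted in frames (`min_frames`, a made-up parameter).

```python
from typing import List, Optional, Tuple


def vad_to_regions(decisions: List[int],
                   frame_shift: float = 0.01,
                   min_frames: int = 1) -> List[Tuple[float, float]]:
    """Turn per-frame 0/1 speech decisions into (start, end) times in seconds."""
    regions = []
    start: Optional[int] = None
    # A sentinel non-speech frame at the end closes any trailing speech run.
    for t, d in enumerate(list(decisions) + [0]):
        if d and start is None:
            start = t  # a speech run begins at frame t
        elif not d and start is not None:
            # Keep the run only if it is long enough (compared in frames,
            # which avoids floating-point surprises).
            if t - start >= min_frames:
                regions.append((round(start * frame_shift, 3),
                                round(t * frame_shift, 3)))
            start = None
    return regions
```

Every non-speech frame the VAD wrongly labels as speech ends up inside one of these regions and is later scored as false alarm, so the quality of the frame-level decisions directly bounds the DER.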

Thanks again

On Wednesday, February 14, 2018 at 19:57:59 UTC+3, David Snyder wrote:

David Snyder

Feb 15, 2018, 12:37:57 AM
to kaldi-help
Yes, you need a real SAD. The energy-based SAD is suitable for training the models, but it will probably not be enough for the segmentation that needs to be performed prior to diarization. You can take a look at the SAD training scripts I pointed you to earlier.

For development purposes, you might find it helpful to use the reference RTTM to compute the segments file. That way, you can see what your error rate looks like when you don't need to worry about errors from the segmentation.

Mesut Toruk

Feb 15, 2018, 6:39:35 AM
to kaldi-help
Thanks again for your help, David.
Mesut

On Thursday, February 15, 2018 at 08:37:57 UTC+3, David Snyder wrote: