Running CALLHOME pretrained Diarization Xvector Model


VITTHAL BHANDARI

Apr 10, 2021, 4:07:34 AM
to kaldi-help
Dear All

I have the CALLHOME Diarization Xvector Model from here.
I also have the CALLHOME dataset along with the rttm file. 

How do I reproduce the baseline for CALLHOME diarization recipe using the pre-trained model?
There is the run.sh file from egs/callhome_diarization/v2/run.sh and also the final.raw file (along with others) in the pre-trained model directory.
Which scripts should I run?
Would I also have to create a few files myself before?
The readme does not specify anything.

Thanks 
Vitthal

Desh Raj

Apr 10, 2021, 5:08:38 PM
to kaldi...@googlegroups.com
You should run the following stages from the run.sh script:

1. CALLHOME data preparation (only this command in stage 0; other datasets are for x-vector training which you don't need): `local/make_callhome.sh /export/corpora/NIST/LDC2001S97/ data/`
2. MFCC extraction (stage 1): only for callhome data.
3. x-vector extraction (stage 7): you would only need to extract them for callhome1 and callhome2 (you can comment out the part where it extracts x-vectors for PLDA training)
4. PLDA scoring (stage 9)
5. Clustering (stage 10) -> this will give you the DER using AHC diarization

If you also want to run VB resegmentation on top of the AHC, you can run stages 12 and 13 (but you would have to prepare training data for i-vector extractor training). The reason the README does not contain explicit instructions is that the run.sh script contains detailed comments about what each stage does.

Desh


Desh Raj

Apr 10, 2021, 5:10:06 PM
to kaldi...@googlegroups.com
If you need an explicit tutorial, I think some people have found this article helpful: https://towardsdatascience.com/speaker-diarization-with-kaldi-e30301b05cc8

VITTHAL BHANDARI

Apr 11, 2021, 11:23:57 AM
to kaldi-help
Hi Desh
Thank you for the detailed answer.
  1. In point 2, you mention that I should do MFCC extraction "only for callhome data". Does that mean I should exclude callhome1 and callhome2 from line 58 and only run the command for callhome?
  2. Do I need to run all the other commands in stage 1 (if not, could you specify which lines to exclude)?
  3. The nj parameter is set to 40. Should it be decreased if I am running the commands on a single laptop with an Nvidia 1050 Ti?
[Attachment: Screenshot from 2021-04-11 20-47-30.png]

Desh Raj

Apr 11, 2021, 11:48:06 AM
to kaldi...@googlegroups.com
1. No, I meant callhome1, callhome2, and the combined callhome set. I believe the thresholds are tuned on one of these subsets during diarization. You can just exclude the "train" dataset since you're not training any x-vector extractors.
2. You need to run prepare_feats.sh for callhome1 and callhome2. As mentioned in the comment, it applies CMN to the features before x-vector extraction. You don't need to do it for the SRE data, however. You also don't need to run the vad_to_segments.sh command since it is only required for PLDA training.
3. Yes, you would probably need to set it equal to the number of CPU cores you are using.

Desh
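
A minimal sketch of the nj rule of thumb from point 3, assuming you want to cap the job count by the number of CPU cores. The function name `choose_nj` and the extra cap by the number of recordings are illustrative assumptions, not part of the recipe.

```python
import os


def choose_nj(requested_nj, num_recordings, num_cores=None):
    """Cap a requested job count by the CPU cores and by the amount of data.

    num_cores defaults to what the OS reports; the cap by num_recordings is
    a precaution added for this sketch, not something run.sh does.
    """
    if num_cores is None:
        num_cores = os.cpu_count() or 1
    return max(1, min(requested_nj, num_cores, num_recordings))


# The recipe default is 40 jobs; on a 4-core laptop this drops to 4.
print(choose_nj(40, 250, num_cores=4))
```

The resulting value would then replace the 40 passed as nj in the stages you run.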

VITTHAL BHANDARI

Apr 13, 2021, 12:12:44 PM
to kaldi-help
Hi Desh

Greetings for the day!

Thank you so much for the detailed steps. I was able to run the baseline for the CALLHOME dataset successfully.
The final output is: Using supervised calibration, DER: 9.29%
However, the expected DER is 8.39%. What could be the reason for the mismatch?
  1. I suppose a DER difference of about 1% absolute is not insignificant. Or is it acceptable?
  2. Could it be due to some error in following the procedure (maybe the wav.scp or another intermediate file was corrupted)?
  3. I did not run VB resegmentation and I am aware that it would improve DER. Should I run it on top of AHC, then?

Desh Raj

Apr 13, 2021, 12:40:28 PM
to kaldi...@googlegroups.com, Zili Huang
Yeah, that's not insignificant. I'm not sure why the difference would occur. I haven't tried running the CH diarization recipe myself --- maybe Zili (cc'd) would know more. In any case, you should look at the DER breakdown (missed speech, false alarm, speaker confusion) to get a clearer picture.

Desh

Zili Huang

Apr 13, 2021, 1:02:32 PM
to kaldi-help
Hi,

1. I think you need to check the results in detail, because I don't see a reason for this. Since the model is downloaded directly from Kaldi, I don't think there is much randomness. I personally was getting even better results with this model.
2. It is definitely possible. You can simply count the number of utterances in the output RTTM file to make sure it is 500, and make sure that your data is complete and correct.
3. VB resegmentation uses the output from AHC as initialization. Generally it will improve the results.

Best,
Zili
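
The count suggested in point 2 can be sketched as follows, assuming the usual RTTM layout where each SPEAKER line carries the recording ID in its second field. The function name and the sample lines are made up for illustration.

```python
def recording_ids(rttm_lines):
    """Return the set of recording IDs (second field) found on SPEAKER lines."""
    ids = set()
    for line in rttm_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "SPEAKER":
            ids.add(fields[1])
    return ids


# Hypothetical RTTM content: two recordings, three segments.
example_rttm = [
    "SPEAKER iaaa 0 0.00 1.50 <NA> <NA> 1 <NA> <NA>",
    "SPEAKER iaaa 0 1.50 2.25 <NA> <NA> 2 <NA> <NA>",
    "SPEAKER iaab 0 0.00 3.10 <NA> <NA> 1 <NA> <NA>",
]
print(len(recording_ids(example_rttm)))
```

For a real file, pass the open file object instead of the list, since the function only iterates over lines.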

VITTHAL BHANDARI

Apr 13, 2021, 3:01:23 PM
to kaldi-help
Hi

From what I could gather from DER_threshold.txt file, the Missed speech = 1% and speaker error = 8.3% (total DER = 9.3%).
Is the expected DER of 8.39% achieved with VB resegmentation or without resegmentation?

Zili Huang

Apr 14, 2021, 10:13:49 AM
to kaldi-help
That is a little bit weird. I think the recipe is using the ground truth VAD information and when we compute DER, we are not scoring the overlapped region (using option -1). So I don't think there will be any missed speech. Do you think so, Desh?

VITTHAL BHANDARI

Apr 14, 2021, 10:45:40 AM
to kaldi-help
Here are the contents of the DER_threshold.txt file, just to double-check if I'm inferring it right.

*** Performance analysis for Speaker Diarization for ALL ***

    EVAL TIME =  62119.44 secs
  EVAL SPEECH =  55679.74 secs ( 89.6 percent of evaluated time)
  SCORED TIME =  35732.42 secs ( 57.5 percent of evaluated time)
SCORED SPEECH =  33577.25 secs ( 94.0 percent of scored time)
   EVAL WORDS =      0        
 SCORED WORDS =      0         (100.0 percent of evaluated words)
---------------------------------------------
MISSED SPEECH =    348.03 secs (  1.0 percent of scored time)
FALARM SPEECH =      0.00 secs (  0.0 percent of scored time)
 MISSED WORDS =      0         (100.0 percent of scored words)
---------------------------------------------
SCORED SPEAKER TIME =  33577.25 secs (100.0 percent of scored speech)
MISSED SPEAKER TIME =    348.03 secs (  1.0 percent of scored speaker time)
FALARM SPEAKER TIME =      0.00 secs (  0.0 percent of scored speaker time)
 SPEAKER ERROR TIME =   2771.90 secs (  8.3 percent of scored speaker time)
SPEAKER ERROR WORDS =      0         (100.0 percent of scored speaker words)
---------------------------------------------
 OVERALL SPEAKER DIARIZATION ERROR = 9.29 percent of scored speaker time  (ALL)
---------------------------------------------
 Speaker type confusion matrix -- speaker weighted
  REF\SYS (count)      unknown               MISS              
unknown                1158 /  90.3%        125 /   9.7%
  FALSE ALARM           174 /  13.6%
---------------------------------------------
 Speaker type confusion matrix -- time weighted
  REF\SYS (seconds)    unknown               MISS              
unknown            33229.22 /  99.0%     348.03 /   1.0%
  FALSE ALARM          0.00 /   0.0%
---------------------------------------------
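
As a cross-check on the report above, the overall figure is the sum of missed, false-alarm, and speaker-error time divided by the scored speaker time. A minimal sketch with the numbers copied from the report; the function name is illustrative.

```python
def der_percent(missed, false_alarm, speaker_error, scored_speaker_time):
    """Diarization error rate in percent, from times given in seconds."""
    return 100.0 * (missed + false_alarm + speaker_error) / scored_speaker_time


# Values taken from the MISSED / FALARM / SPEAKER ERROR lines of the report.
der = der_percent(348.03, 0.00, 2771.90, 33577.25)
missed_share = 100.0 * 348.03 / 33577.25
error_share = 100.0 * 2771.90 / 33577.25
print(f"DER {der:.2f}%, missed {missed_share:.1f}%, speaker error {error_share:.1f}%")
```

This reproduces the 9.29 percent total and the 1.0 / 8.3 percent split quoted earlier in the thread.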


Desh Raj

Apr 14, 2021, 10:56:08 AM
to kaldi...@googlegroups.com
Yeah there shouldn't be any missed speech, for the reasons Zili mentioned. Check that you have not made any changes to the scoring stage (do a git diff).

Desh

Desh Raj

Apr 14, 2021, 11:00:06 AM
to kaldi...@googlegroups.com, Zili Huang
@Zili Huang It seems in stage 1 we use compute_vad_decision.sh for  the CH1 and CH2 subsets. So we are not actually using oracle VAD? That should explain the 1% missed speech. 

Desh

Zili Huang

Apr 14, 2021, 11:10:02 AM
to kaldi-help
No Desh, I think the VAD results are never used (I am quite sure about this). Remember that we start from segments in this recipe, and if we were using energy-based VAD, the results would be much worse. I still think that the MISS error should not exist. Maybe you can check where the missed speech comes from. You can use a command like `md-eval.pl -f -1 -c 0.25 -r ref.txt -s pred.txt` to get the DER per utterance and locate the problem. I suspect you are missing one utterance or something.
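
One way to test the "missing utterance" suspicion without md-eval.pl is to compare the recording IDs in the reference and system RTTM files. This is only a sketch: the function names and sample lines are hypothetical, and it assumes the recording ID is the second field of each SPEAKER line.

```python
def _ids(rttm_lines):
    """Collect recording IDs (second field) from SPEAKER lines."""
    return {
        fields[1]
        for fields in (line.split() for line in rttm_lines)
        if len(fields) >= 2 and fields[0] == "SPEAKER"
    }


def missing_recordings(ref_lines, sys_lines):
    """Recordings present in the reference but absent from the system output."""
    return sorted(_ids(ref_lines) - _ids(sys_lines))


# Hypothetical data: the system output has no segments for 'iaac'.
ref = [
    "SPEAKER iaaa 0 0.00 1.50 <NA> <NA> A <NA> <NA>",
    "SPEAKER iaab 0 0.00 2.00 <NA> <NA> A <NA> <NA>",
    "SPEAKER iaac 0 0.00 4.00 <NA> <NA> B <NA> <NA>",
]
hyp = [
    "SPEAKER iaaa 0 0.00 1.50 <NA> <NA> 1 <NA> <NA>",
    "SPEAKER iaab 0 0.00 2.00 <NA> <NA> 2 <NA> <NA>",
]
print(missing_recordings(ref, hyp))
```

Any recording listed here is scored entirely as missed speech, which would be consistent with a non-zero MISS figure.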

VITTHAL BHANDARI

Apr 14, 2021, 2:18:19 PM
to kaldi-help
I think Zili is right. I am missing a few utterances. Maybe that's why the DER is a bit bloated.

I counted the number of unique values in column 2 of the final output RTTM files for both CH1 and CH2, at the same PLDA threshold.
Both returned 247. This means I am missing 6 (500 - 2*247) utterances. I will go back over the steps I followed to see why this happened.
Meanwhile, if you could point to a potential cause, it would help me find it faster.

Vitthal

Zili Huang

Apr 14, 2021, 5:48:55 PM
to kaldi-help
You could check data/callhome1 and data/callhome2 to see whether the utterances are already missing after the first data preparation step.

VITTHAL BHANDARI

Apr 15, 2021, 2:59:04 PM
to kaldi-help
*** ISSUE RESOLVED ***

As you mentioned, it was a problem with the utterances. After running the local/make_callhome.sh script, 3 rows in data/callhome1/wav.scp and 3 rows in data/callhome2/wav.scp did NOT have the trailing pipe/vertical bar ( | ); I don't know why. As a result, utils/fix_data_dir.sh filtered out those 6 utterances. Now the final output is: Using supervised calibration, DER: 8.57%, which I suppose is much closer to the expected DER of 8.39% and within an acceptable margin (please tell me if I'm wrong here).
Thank you so much Desh and Zili for your detailed answers. It helped a lot.

Regards
Vitthal
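
A check for the problem described above can be sketched as below. It rests on a heuristic assumption: a wav.scp entry with several tokens after the recording ID is treated as a command and should end with `|`, so plain paths containing spaces would be flagged by mistake. The function name and sample paths are hypothetical.

```python
def entries_missing_pipe(wav_scp_lines):
    """Return recording IDs whose entry looks like a command but lacks a trailing '|'."""
    bad = []
    for line in wav_scp_lines:
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # empty line or ID without a value; not handled in this sketch
        rec_id, rest = parts
        # Several tokens suggest a command (e.g. sph2pipe ...), not a plain file path.
        if len(rest.split()) > 1 and not rest.endswith("|"):
            bad.append(rec_id)
    return bad


# Hypothetical entries: the second one has lost its trailing pipe.
sample = [
    "iaaa sph2pipe -f wav -p /path/to/iaaa.sph |",
    "iaab sph2pipe -f wav -p /path/to/iaab.sph",
    "iaac /path/to/iaac.wav",
]
print(entries_missing_pipe(sample))
```

Running this on data/callhome1/wav.scp and data/callhome2/wav.scp before utils/fix_data_dir.sh would show which rows are at risk of being filtered out.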

Sangramsing kayte

May 23, 2021, 7:37:39 AM
to kaldi-help
I am running callhome_diarization/v2 and I am having trouble pointing the recipe to the CALLHOME dataset on my computer.

For example, the script has data_root=/export/corpora5/LDC. I changed it to data_root=/LDC, but this is not working.

Could anyone write a document showing exactly where to change the data path and how to run the recipe properly?

VITTHAL BHANDARI

May 23, 2021, 7:40:08 AM
to kaldi...@googlegroups.com
If you have the data, simply put it into sub folders as shown in the path. It would be much easier than changing the path in multiple locations. 

Desh Raj

May 23, 2021, 8:39:19 AM
to kaldi...@googlegroups.com
You should explain what you mean by "not working", i.e., what is the error message you get?

Desh

Sangramsing

May 24, 2021, 8:09:09 AM
to kaldi...@googlegroups.com

(Speech) Sangram:v2 sing$ ./run.sh
--2021-05-24 14:07:03--  http://www.openslr.org/resources/15/speaker_list.tgz
Resolving www.openslr.org... 46.101.158.64
Connecting to www.openslr.org|46.101.158.64|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163742 (160K) [application/x-gzip]
Saving to: 'data/local/speaker_list.tgz.5'
speaker_list.tgz.5     100%[===========================>] 159.90K  --.-KB/s    in 0.08s
2021-05-24 14:07:03 (2.06 MB/s) - 'data/local/speaker_list.tgz.5' saved [163742/163742]
x speaker_list
find: /LDC/LDC2006S44/: No such file or directory
Error getting list of sph files at local/make_sre.pl line 23.
(Speech) Sangram:v2 sing$

Desh Raj

May 24, 2021, 8:18:53 AM
to kaldi...@googlegroups.com
It seems that the path you have specified as the corpus root does not exist. Make sure you provide the correct path to the downloaded LDC corpora.

Desh
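
Before rerunning run.sh, the corpus root can be sanity-checked with a few lines like the following. This is a sketch only: the function name is made up, and the demo builds a temporary directory instead of touching a real corpus location. The catalog IDs are the two mentioned in this thread.

```python
import tempfile
from pathlib import Path


def missing_corpus_dirs(data_root, catalog_ids):
    """Return the catalog IDs that have no directory under data_root."""
    root = Path(data_root).expanduser()
    return [cid for cid in catalog_ids if not (root / cid).is_dir()]


# Demo with a throwaway root that contains only one of the two corpora.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "LDC2001S97").mkdir()
    demo_missing = missing_corpus_dirs(tmp, ["LDC2001S97", "LDC2006S44"])
print(demo_missing)
```

An empty list for your own data_root would mean the directories the scripts look for are at least present; anything listed corresponds to a "No such file or directory" error like the one in the log.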

Sangramsing

May 24, 2021, 8:27:06 AM
to kaldi...@googlegroups.com

(Speech) Sangram:v2 sing$ ./run.sh
Resolving www.openslr.org... 46.101.158.64
Connecting to www.openslr.org|46.101.158.64|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 163742 (160K) [application/x-gzip]
Saving to: 'data/local/speaker_list.tgz.20'
speaker_list.tgz.20      100%[================================>] 159.90K   883KB/s    in 0.2s
2021-05-24 14:25:15 (883 KB/s) - 'data/local/speaker_list.tgz.20' saved [163742/163742]
x speaker_list
utils/combine_data.sh data/sre data/sre2004 data/sre2005_train data/sre2005_test data/sre2006_train data/sre2006_test_1 data/sre2006_test_2 data/sre2008_train data/sre2008_test
utils/combine_data.sh [info]: not combining utt2uniq as it does not exist
utils/combine_data.sh [info]: not combining segments as it does not exist
utils/combine_data.sh: combined utt2spk
utils/combine_data.sh [info]: not combining utt2lang as it does not exist
utils/combine_data.sh [info]: not combining utt2dur as it does not exist
utils/combine_data.sh [info]: not combining utt2num_frames as it does not exist
utils/combine_data.sh [info]: not combining reco2dur as it does not exist
utils/combine_data.sh [info]: not combining feats.scp as it does not exist
utils/combine_data.sh [info]: not combining text as it does not exist
utils/combine_data.sh [info]: not combining cmvn.scp as it does not exist
utils/combine_data.sh [info]: not combining vad.scp as it does not exist
utils/combine_data.sh [info]: not combining reco2file_and_channel as it does not exist
utils/combine_data.sh: combined wav.scp
utils/combine_data.sh: combined spk2gender
fix_data_dir.sh: no utterances remained: not proceeding further.
(Speech) Sangram:v2 sing$

Sangramsing

May 24, 2021, 8:28:13 AM
to kaldi...@googlegroups.com

Desh Raj

May 24, 2021, 9:22:32 AM
to kaldi...@googlegroups.com
You need to specify the absolute path to the corpus. It might look something like "/Users/<name>/LDC..." (you can do pwd from your terminal to check).

Sangramsing

May 24, 2021, 9:32:59 AM
to kaldi...@googlegroups.com
My dataset is located at /Users/sing/kaldi/egs/callhome_diarization/v2/export


Sangramsing

May 25, 2021, 6:19:47 AM
to kaldi...@googlegroups.com

x speaker_list
find: /LDC/LDC2006S44/: No such file or directory
Error getting list of sph files at local/make_sre.pl line 23.
(Speech) Sangram:v2 sing$

LDC2006S44 is available in the database.

Desh Raj

May 25, 2021, 7:40:14 AM
to kaldi...@googlegroups.com
As I said earlier, /LDC is not the right path to the corpus. You should open the corpus directory in a terminal and run pwd to get the correct path.

Sangramsing

May 25, 2021, 8:33:15 AM
to kaldi...@googlegroups.com
Okay, I changed it to /Users/sing/kaldi/egs/callhome_diarization/v2/LDC
but I still get an error.

Sangramsing kayte

Aug 30, 2021, 9:04:56 AM
to kaldi-help
[Attachment: Screenshot 2021-08-30 at 2.50.25 PM.png]

lei...@gmail.com

Aug 30, 2021, 9:15:01 AM
to kaldi-help
Do you have the SRE datasets?

Paola

Sangramsing kayte

Aug 30, 2021, 9:17:07 AM
to kaldi-help
Hi,
   Can you help me run the CALLHOME Diarization Xvector Model?

lei...@gmail.com

Aug 30, 2021, 9:21:18 AM
to kaldi-help
To run the complete recipe you need the SRE datasets (data/sreXXXX). Do you have those datasets?

Paola

Sangramsing

Aug 30, 2021, 9:50:45 AM
to kaldi...@googlegroups.com

Sangramsing

Aug 30, 2021, 9:51:26 AM
to kaldi...@googlegroups.com
and this catalog LDC2001S97

Kashif Inam

Oct 12, 2021, 6:37:42 AM
to kaldi-help
Hi, can we run the CALLHOME pretrained diarization model on a GPU?

Desh Raj

Oct 12, 2021, 7:17:21 AM
to kaldi...@googlegroups.com
You can perform x-vector extraction on GPU by passing `--use-gpu true` in the extract_xvectors.sh invocation. But the clustering step is CPU-only.

Desh
