Running the DIHARD 3 baseline recipe 2

`.

Jan 19, 2021, 3:31:24 AM

to kaldi-help

Hi,

I am trying to train an SD model using the DIHARD3 baseline recipe 2. It involves training an NN-based VAD. After training the nnet, there is a decoding step that should output a segments file; however, the file I get is empty. Below is the command I ran, followed by its stdout:

jupyter@diarization:~/dihard3_baseline/recipes/track2$ local/segmentation/detect_speech_activity.sh --nj 1 --stage 0 $DIHARD_DEV_DIR/ $DIHARD_DEV_DIR/exp/dihard3_sad_tdnn_stats $DIHARD_DEV_DIR/mfcc2 $DIHARD_DEV_DIR/exp/dihard3_sad_tdnn_stats_decode $DIHARD_DEV_DIR/data/dihard3_seg

--nj 1 --stage 0 /home/jupyter/small_test/dev/ /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats /home/jupyter/small_test/dev/mfcc2 /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode /home/jupyter/small_test/dev/data/dihard3_seg

fix_data_dir.sh: kept all 4159 utterances.

fix_data_dir.sh: old files are kept in /home/jupyter/small_test/dev//.backup

steps/make_mfcc.sh --mfcc-config conf/mfcc_sad.conf --nj 1 --cmd run.pl --write-utt2num-frames true /home/jupyter/small_test/dev/ exp/make_mfcc/dev_whole /home/jupyter/small_test/dev/mfcc2

steps/make_mfcc.sh: moving /home/jupyter/small_test/dev//feats.scp to /home/jupyter/small_test/dev//.backup

utils/validate_data_dir.sh: Successfully validated data-directory /home/jupyter/small_test/dev/

steps/make_mfcc.sh [info]: segments file exists: using that.

steps/make_mfcc.sh: Succeeded creating MFCC features for dev

steps/compute_cmvn_stats.sh /home/jupyter/small_test/dev/ exp/make_mfcc/dev_whole /home/jupyter/small_test/dev/mfcc2

Succeeded creating CMVN stats for dev

fix_data_dir.sh: kept all 4159 utterances.

fix_data_dir.sh: old files are kept in /home/jupyter/small_test/dev//.backup

local/segmentation/detect_speech_activity.sh: Computing non-speech/speech/garbage posteriors...

steps/nnet3/compute_output.sh --nj 1 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 150 --apply-exp true --frame-subsampling-factor 3 /home/jupyter/small_test/dev/ /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode/posts

utils/data/get_utt2dur.sh: /home/jupyter/small_test/dev//utt2dur already exists with the expected length. We won't recompute it.

local/segmentation/detect_speech_activity.sh: Preparing SAD decoding graph...

local/segmentation/detect_speech_activity.sh: Running Viterbi decoding...

local/segmentation/detect_speech_activity.sh: Post-processing Viterbi segmentation...

utils/data/get_utt2dur.sh: /home/jupyter/small_test/dev//utt2dur already exists with the expected length. We won't recompute it.

local/segmentation/detect_speech_activity.sh: Generating new data directory from SAD...

utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]

utils/data/get_utt2num_frames.sh: /home/jupyter/small_test/dev//utt2num_frames already present!

utils/data/subsegment_data_dir.sh: subsegmented data from /home/jupyter/small_test/dev/ to /home/jupyter/small_test/dev/data/dihard3_seg

Empty list of recordings (bad file /home/jupyter/small_test/dev/data/dihard3_seg/segments)?
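The last message says the new segments file lists no recordings. One quick way to confirm this is to count the segments and the distinct recording IDs (second column) directly. The helper below is hypothetical and not part of the recipe; it assumes the standard Kaldi segments format (segment-id, recording-id, start, end) and only roughly mirrors the check that appears to print this error in utils/fix_data_dir.sh.

```shell
# Hypothetical helper: summarize a Kaldi-style segments file
# (<segment-id> <recording-id> <start-time> <end-time>).
# Prints "<num-segments> <num-recordings>".
# Returns 0 if at least one recording is listed, 1 if none, 2 if the file is missing.
summarize_segments() {
  local seg="$1" nseg nrec
  if [ ! -f "$seg" ]; then
    echo "missing: $seg" >&2
    return 2
  fi
  # Count non-blank lines as segments.
  nseg=$(awk 'NF > 0' "$seg" | wc -l | tr -d ' ')
  # Count distinct values of column 2 (the recording ID).
  nrec=$(awk 'NF >= 2 {print $2}' "$seg" | sort -u | wc -l | tr -d ' ')
  echo "$nseg $nrec"
  [ "$nrec" -gt 0 ]
}
```

Run on /home/jupyter/small_test/dev/data/dihard3_seg/segments, this would presumably print "0 0", which would point upstream of subsegmentation, at whatever step produced the SAD segments.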

Daniel Povey

Jan 19, 2021, 3:33:41 AM

to kaldi-help

You'll probably have to supply more details and find approximately where things went wrong before people can help.
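One way to follow this advice is to check each stage's output in pipeline order and stop at the first file that is missing or empty. The helper below is hypothetical and uses only coreutils and gzip; the file names you pass it are up to you, and the example names in the note after it are assumptions, not something the recipe is known to produce.

```shell
# Hypothetical helper: walk a list of regular files in pipeline order and
# stop at the first one that is missing or empty. Files ending in .gz are
# judged by their decompressed size, since even an empty gzip file is ~20 bytes.
first_empty_output() {
  local f size
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "MISSING $f"
      return 1
    fi
    case "$f" in
      *.gz) size=$(gzip -dc "$f" 2>/dev/null | wc -c | tr -d ' ') ;;
      *)    size=$(wc -c < "$f" | tr -d ' ') ;;
    esac
    if [ "$size" -eq 0 ]; then
      echo "EMPTY $f"
      return 1
    fi
    echo "OK $f ($size bytes)"
  done
}
```

For example, assuming compute_output.sh leaves an output.scp in the posts directory: first_empty_output $DIHARD_DEV_DIR/exp/dihard3_sad_tdnn_stats_decode/posts/output.scp $DIHARD_DEV_DIR/data/dihard3_seg/segments, with the Viterbi alignment archive inserted between the two once you know where the script writes it.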

--
Go to http://kaldi-asr.org/forums.html to find out how to join the kaldi-help group
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/48eb3a7e-4528-4289-b086-8b222292aa0cn%40googlegroups.com.

`.

Jan 19, 2021, 4:46:02 AM

to kaldi-help

Thanks for the reply, Dan. My guess is that the problem lies in the post_process_sad_to_segments script and/or the alignment file.
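Conceptually, the post-processing turns a per-frame label sequence into time segments. The sketch below illustrates that idea only; it is not the recipe's post_process_sad_to_segments implementation. It assumes text-format input lines of the form "utt-id label label ...", and that one integer label marks speech. Which integer that is (the speech_label argument) is an assumption you would need to check against the recipe. The log reports a 0.01 s frame shift together with --frame-subsampling-factor 3, so the alignment may carry one label per 0.03 s; that is also an assumption. The sketch shows one way an empty segments file can arise: if no frame carries the expected speech label, nothing is written.

```shell
# Hypothetical sketch: convert text-format per-frame labels into Kaldi-style
# segments lines. Input:  <utt-id> <label-1> <label-2> ...
# Output: <utt-id>-<NNNN> <utt-id> <start-sec> <end-sec>, one per speech run.
# Usage:  labels_to_segments <speech_label> <frame_shift_sec> < labels.txt
labels_to_segments() {
  local speech_label="$1" frame_shift="$2"
  awk -v sp="$speech_label" -v fshift="$frame_shift" '
    NF >= 1 {
      utt = $1; n = 0; start = -1
      # Field i holds the label of frame i-2; i == NF+1 is a sentinel that
      # closes a speech run reaching the last frame.
      for (i = 2; i <= NF + 1; i++) {
        is_speech = (i <= NF && $i == sp)
        if (is_speech && start < 0) {
          start = i - 2
        } else if (!is_speech && start >= 0) {
          printf "%s-%04d %s %.2f %.2f\n", utt, n, utt, start * fshift, (i - 2) * fshift
          n++
          start = -1
        }
      }
    }'
}
```

For instance, echo "utt1 1 2 2 1 2" | labels_to_segments 2 0.01 yields two segments, utt1-0000 from 0.01 to 0.03 and utt1-0001 from 0.04 to 0.05, while an input containing no 2s yields nothing at all.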

The last line of that script creates the segments file, which turns out to be empty. I also checked smart_open in the common lib, which opens the gzipped ali file from the pipe.

I ran this command in the shell and it doesn't return anything. I suspect the alignment file is not correct; I have attached it here as well.

ali (1).1.gz
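Before digging into the post-processing code, it may help to confirm that the attached archive is a valid gzip stream and actually contains data. The helper below is hypothetical and relies only on gzip and coreutils; it says nothing about whether the contents are correct, only whether anything was written.

```shell
# Hypothetical helper: sanity-check a gzipped alignment archive.
#  1. gzip -t verifies that the compressed stream is intact.
#  2. The decompressed byte count shows whether anything was written at all.
check_gz_archive() {
  local f="$1" nbytes
  if ! gzip -t "$f" 2>/dev/null; then
    echo "corrupt or not gzip: $f"
    return 1
  fi
  nbytes=$(gzip -dc "$f" | wc -c | tr -d ' ')
  echo "$f: $nbytes bytes after decompression"
  [ "$nbytes" -gt 0 ]
}
```

If the archive passes both checks and Kaldi's binaries are on PATH, copy-int-vector "ark:gunzip -c ali.1.gz |" ark,t:- | head should print it as text (an utterance ID followed by per-frame integers). From there, you can see whether the integer the post-processing treats as speech appears at all.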

Daniel Povey

Jan 19, 2021, 4:53:36 AM

to kaldi-help

I don't have time to work out what's wrong with that file, sorry.
