Running the DIHARD baseline 3 recipe 2


`.

Jan 19, 2021, 3:31:24 AM
to kaldi-help
Hi,

I am trying to train an SD model using the DIHARD3 baseline recipe 2. It involves training an NN-based VAD. After training the nnet, there is a decoding step that should output a segments file, but I am getting an empty file. Below is the command with its stdout:

jupyter@diarization:~/dihard3_baseline/recipes/track2$ local/segmentation/detect_speech_activity.sh --nj 1 --stage 0 $DIHARD_DEV_DIR/ $DIHARD_DEV_DIR/exp/dihard3_sad_tdnn_stats $DIHARD_DEV_DIR/mfcc2 $DIHARD_DEV_DIR/exp/dihard3_sad_tdnn_stats_decode $DIHARD_DEV_DIR/data/dihard3_seg 
--nj 1 --stage 0 /home/jupyter/small_test/dev/ /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats /home/jupyter/small_test/dev/mfcc2 /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode /home/jupyter/small_test/dev/data/dihard3_seg
fix_data_dir.sh: kept all 4159 utterances.
fix_data_dir.sh: old files are kept in /home/jupyter/small_test/dev//.backup
steps/make_mfcc.sh --mfcc-config conf/mfcc_sad.conf --nj 1 --cmd run.pl --write-utt2num-frames true /home/jupyter/small_test/dev/ exp/make_mfcc/dev_whole /home/jupyter/small_test/dev/mfcc2
steps/make_mfcc.sh: moving /home/jupyter/small_test/dev//feats.scp to /home/jupyter/small_test/dev//.backup
utils/validate_data_dir.sh: Successfully validated data-directory /home/jupyter/small_test/dev/
steps/make_mfcc.sh [info]: segments file exists: using that.
steps/make_mfcc.sh: Succeeded creating MFCC features for dev
steps/compute_cmvn_stats.sh /home/jupyter/small_test/dev/ exp/make_mfcc/dev_whole /home/jupyter/small_test/dev/mfcc2
Succeeded creating CMVN stats for dev
fix_data_dir.sh: kept all 4159 utterances.
fix_data_dir.sh: old files are kept in /home/jupyter/small_test/dev//.backup
local/segmentation/detect_speech_activity.sh: Computing non-speech/speech/garbage posteriors...
steps/nnet3/compute_output.sh --nj 1 --cmd run.pl --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 150 --apply-exp true --frame-subsampling-factor 3 /home/jupyter/small_test/dev/ /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode /home/jupyter/small_test/dev/exp/dihard3_sad_tdnn_stats_decode/posts
utils/data/get_utt2dur.sh: /home/jupyter/small_test/dev//utt2dur already exists with the expected length.  We won't recompute it.
local/segmentation/detect_speech_activity.sh: Preparing SAD decoding graph...
local/segmentation/detect_speech_activity.sh: Running Viterbi decoding...
local/segmentation/detect_speech_activity.sh: Post-processing Viterbi segmentation...
utils/data/get_utt2dur.sh: /home/jupyter/small_test/dev//utt2dur already exists with the expected length.  We won't recompute it.
local/segmentation/detect_speech_activity.sh: Generating new data directory from SAD...
utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]
utils/data/get_utt2num_frames.sh: /home/jupyter/small_test/dev//utt2num_frames already present!
utils/data/subsegment_data_dir.sh: subsegmented data from /home/jupyter/small_test/dev/ to /home/jupyter/small_test/dev/data/dihard3_seg
Empty list of recordings (bad file /home/jupyter/small_test/dev/data/dihard3_seg/segments)?
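
For reference, a quick sanity check on a segments file could look roughly like the sketch below. It assumes the usual Kaldi layout of four whitespace-separated fields per line (utterance ID, recording ID, start time, end time, in seconds) and treats an end time of -1 as "until the end of the recording". The function name `check_segments` is made up for illustration and is not part of the recipe.

```python
def check_segments(lines):
    """Return a list of human-readable problems found in segments lines.

    Assumed layout per line: <utt-id> <rec-id> <start-sec> <end-sec>.
    """
    problems = []
    seen = 0
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        seen += 1
        if len(fields) != 4:
            problems.append(
                "line %d: expected 4 fields, got %d" % (lineno, len(fields))
            )
            continue
        start, end = fields[2], fields[3]
        try:
            start_s, end_s = float(start), float(end)
        except ValueError:
            problems.append("line %d: start/end are not numbers" % lineno)
            continue
        # Assumption: -1 as end time means "to the end of the recording".
        if end_s != -1 and end_s <= start_s:
            problems.append("line %d: end time is not after start time" % lineno)
    if seen == 0:
        problems.append("no segments found (input is empty)")
    return problems


if __name__ == "__main__":
    # Tiny made-up example, not taken from the actual run.
    for problem in check_segments(["utt1 rec1 0.00 1.50", "utt2 rec1 2.0 1.0"]):
        print(problem)
```

To run it on a real file, something like `check_segments(open(path))` with the path of the generated segments file should work, since the function only iterates over lines.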

Daniel Povey

Jan 19, 2021, 3:33:41 AM
to kaldi-help
You'll probably have to supply more details and find approximately where things went wrong, before people can help.


`.

Jan 19, 2021, 4:46:02 AM
to kaldi-help
Thanks for the reply, Dan. My guess is that the problem lies in the post_process_sad_to_segments script and/or in the alignment file.

The last line creates the segments file, which turns out to be empty. I also looked at smart_open in the common lib, which opens the gzipped ali file through a pipe.
I ran this command in the shell and it does not return anything. I suspect the alignment file is not correct; I have attached it here as well.
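
One way to confirm whether the alignment archive itself is empty, independent of the pipe, is sketched below. It uses Python's standard `gzip` module rather than smart_open, and it assumes the archive is gzip-compressed text with one line per utterance (utterance ID followed by per-frame labels). That layout is an assumption about this file, and the helper name `summarize_gz_text` is made up for illustration.

```python
import gzip
import os
import tempfile


def summarize_gz_text(path, preview=3, width=80):
    """Count non-blank lines in a gzip text file and keep a short preview.

    Returns (num_lines, first_lines), where first_lines holds at most
    `preview` lines, each truncated to `width` characters.
    """
    count = 0
    head = []
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line.strip():
                continue  # skip blank lines
            count += 1
            if len(head) < preview:
                head.append(line[:width])
    return count, head


if __name__ == "__main__":
    # Self-contained demo on a tiny made-up archive, not the attached file.
    tmp_dir = tempfile.mkdtemp()
    demo_path = os.path.join(tmp_dir, "demo_ali.gz")
    with gzip.open(demo_path, "wt", encoding="utf-8") as f:
        f.write("utt1 1 1 2 2 2 1\n")
    num_lines, first_lines = summarize_gz_text(demo_path)
    print(num_lines, first_lines)
```

If the count is zero for the real file, the problem is upstream of the post-processing step (in the decoding that wrote the alignments); if lines are present, the post-processing script or its options would be the next place to look.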


Screenshot 2021-01-19 at 5.42.28 PM.png

ali (1).1.gz

Daniel Povey

Jan 19, 2021, 4:53:36 AM
to kaldi-help
I don't have time to work out what's wrong with that file, sorry.
