I've trained a speaker identification model using the Kaldi sre16/v2 recipe. Now I want to use the model to identify the speaker in a new WAV file that is not in the training data. This is the script that I am using:
#!/bin/bash
. ./path.sh || exit 1
. ./cmd.sh || exit 1
set -e
segmentsAudioDir=$1 #/home/cle-dl-1/Downloads/kaldi-master/Data_For_SE/Anonymous #$1 # audio segments path
identificationFolder=$2 #/home/cle-dl-1/New_Kaldi/SpeechandSpeakerRecognizer/SpeakerIdentification #$2 # Identification module path
outputFilePath=$3 #Anonymous_123 #$3 # output.txt file path
train_dir=train
#test_dir_name_path=$1
readarray -d / -t strarr <<< "$segmentsAudioDir"
test_dir=${strarr[-1]}
test_dir=${test_dir: 0:-1}
echo "-------------------------------------------------------------------", $test_dir
####################################
#test_dir=Anonymous_123
################################
ft=$identificationFolder/mfcc_$test_dir
mfccdir=$ft
vaddir=$ft
stage=1
nj=1
k=128
d=600
#Fix_data_directory
DIRECTORY=$identificationFolder/data/$test_dir
if [ -d "$DIRECTORY" ]; then
rm -r $DIRECTORY
fi
mkdir -p $identificationFolder/data/$test_dir
if [ $stage -eq 1 ]; then
wav=$identificationFolder/data/$test_dir/wav.scp
cp $segmentsAudioDir/wav.scp $wav
utt=$identificationFolder/data/$test_dir/utt2spk
cp $segmentsAudioDir/utt2spk $utt
t_ref=$identificationFolder/data/$test_dir/target_reference
cp $segmentsAudioDir/target_reference $t_ref
cp $identificationFolder/trial_tst.py $identificationFolder/data/$test_dir/trial_tst.py
python3 $identificationFolder/data/$test_dir/trial_tst.py $identificationFolder/data/$test_dir
cp $identificationFolder/target_unkn_generator.py $identificationFolder/data/$test_dir/target_unkn_generator.py
cp $identificationFolder/used_ids.txt $identificationFolder/data/$test_dir/used_ids.txt
python3 $identificationFolder/data/$test_dir/target_unkn_generator.py $identificationFolder/data/$test_dir/target_reference $identificationFolder/data/$test_dir/used_ids.txt $identificationFolder/data/$test_dir/target_reference_unkn
uttspk=$identificationFolder/data/$test_dir/spk2utt
chmod +x $identificationFolder/utils/utt2spk_to_spk2utt.pl
$identificationFolder/utils/utt2spk_to_spk2utt.pl $utt >$uttspk
chmod +x $identificationFolder/utils/fix_data_dir.sh
$identificationFolder/utils/fix_data_dir.sh $identificationFolder/data/$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
if [ $stage -eq 2 ]; then
echo "Making MFCCs"
chmod +x $identificationFolder/steps/make_mfcc.sh
$identificationFolder/steps/make_mfcc.sh --mfcc-config $identificationFolder/conf/mfcc.conf --nj $nj --cmd run.pl $identificationFolder/data/$test_dir $identificationFolder/exp/make_mfcc_$test_dir $mfccdir
echo "Computing VADs"
chmod +x $identificationFolder/sid/compute_vad_decision.sh
$identificationFolder/sid/compute_vad_decision.sh --vad-config $identificationFolder/conf/vad.conf --nj $nj --cmd run.pl $identificationFolder/data/$test_dir $identificationFolder/exp/make_vad_$test_dir $vaddir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
if [ $stage -eq 3 ]; then
chmod +x $identificationFolder/sid/extract_xvectors.sh
$identificationFolder/sid/extract_xvectors.sh --cmd run.pl --nj $nj $identificationFolder/exp/xvector_nnet_1a $identificationFolder/data/$test_dir $identificationFolder/exp/xvectors_$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
# This stage scores every test utterance against every enrolled speaker; the speaker with the highest score is assigned in stage 5
if [ $stage -eq 4 ]; then
spk=$identificationFolder/exp/xvectors_train_combined/spk_xvector.scp
chmod +x $spk
xvec=$identificationFolder/exp/xvectors_$test_dir/xvector.scp
target=$identificationFolder/data/$test_dir/target
python3 $identificationFolder/local/target.py $spk $xvec $target
trials=$identificationFolder/data/$test_dir/target
cat $trials | awk '{print $1, $2}' | \
ivector-compute-dot-products - \
scp:$spk \
"ark:ivector-normalize-length scp:$identificationFolder/exp/xvectors_$test_dir/xvector.1.scp ark:- |" \
$identificationFolder/mfcc_cosine_$test_dir'_'1
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
if [ $stage -eq 5 ]; then
#python3 $identificationFolder/local_o/max_score_new.py $identificationFolder/mfcc_cosine_$test_dir'_'1 $identificationFolder/mfcc_cosine_$test_dir
python3 $identificationFolder/local_o/max_score.py $identificationFolder/mfcc_cosine_$test_dir'_'1 $identificationFolder/mfcc_cosine_$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
if [ $stage -eq 6 ]; then
trials=$identificationFolder/data/$test_dir/testPLDA.trials
trialsEER=$identificationFolder/data/$test_dir/test.trials
#python3 $identificationFolder/testing.py $identificationFolder/mfcc_cosine_$test_dir $identificationFolder/data/$test_dir/target_reference_unkn
#python3 $identificationFolder/threshold_tst.py $identificationFolder/mfcc_cosine_$test_dir'_'Predicted $identificationFolder/data/$test_dir/target_reference_unkn $identificationFolder/mfcc_cosine_$test_dir'_'Predicted_threshold_$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
if [ $stage -eq 7 ]; then
#finding accuracy
t_ref=$identificationFolder/data/$test_dir/target_reference
#python3 $identificationFolder/output_formater_new.py $identificationFolder/spkMapping.txt $identificationFolder/mfcc_cosine_$test_dir'_'Predicted_threshold_$test_dir $outputFilePath
python3 $identificationFolder/output_formater_orig.py $identificationFolder/spkMapping.txt $identificationFolder/mfcc_cosine_$test_dir $outputFilePath
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi
I previously used this script with an i-vector based model and have modified it for an x-vector based model. Can anyone tell me whether it is okay? In the output file, it assigns the same speaker to all utterances. When I replaced my x-vector network (xvector_nnet_1a) with a pre-existing one, I started to get more speakers. They were still incorrect, but from that I deduced that my model wasn't trained properly. I am trying to find the error in my training script, but is it possible that the script I am using (provided above) is faulty?
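Since max_score.py is not shown, here is only a rough sketch of what the "highest score" step in stage 5 is assumed to do: keep the best-scoring speaker for each utterance. It assumes the score file written by ivector-compute-dot-products has one trial per line in the form `<speaker-id> <utterance-id> <score>`. The function name best_speaker_per_utt is hypothetical and is not part of Kaldi or of the scripts above.

```shell
#!/bin/bash
# Hypothetical helper (not part of Kaldi): for each utterance, print the
# speaker with the highest score.
# Assumed input format, one trial per line: <speaker-id> <utterance-id> <score>
best_speaker_per_utt() {
  local scores_file=$1
  awk '
    # Update the stored winner when this utterance is new or the score is higher.
    !($2 in best) || ($3 + 0) > best[$2] {
      best[$2] = $3 + 0   # numeric copy used for comparisons
      raw[$2]  = $3       # original text of the score, printed unchanged
      spk[$2]  = $1
    }
    END {
      for (utt in best) print utt, spk[utt], raw[utt]
    }
  ' "$scores_file" | sort   # sort so the output order is deterministic
}
```

Running it on the stage 4 output, for example `best_speaker_per_utt "$identificationFolder/mfcc_cosine_${test_dir}_1"`, prints one line per utterance, which makes it easy to check whether every utterance really receives the same speaker before any later formatting step.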
Thanks.