Using X-vector based Speaker Identification Model

Ayesha Khalid


Mar 25, 2024, 2:32:59 AM

to kaldi-help

Hey Everyone,

I've trained a speaker identification model using the recipe in Kaldi/sre16/v2. Now I want to use the model to identify the speaker in a new WAV file that is not in the training data. This is the script that I am using:

#!/bin/bash

. ./path.sh || exit 1
. ./cmd.sh || exit 1

set -e

segmentsAudioDir=$1 #/home/cle-dl-1/Downloads/kaldi-master/Data_For_SE/Anonymous #$1 # audio segments path
identificationFolder=$2 #/home/cle-dl-1/New_Kaldi/SpeechandSpeakerRecognizer/SpeakerIdentification #$2 # Identification module path
outputFilePath=$3 #Anonymous_123 #$3 # output.txt file path

train_dir=train
#test_dir_name_path=$1
# Use the last path component of the segments directory as the test set name.
test_dir=$(basename "$segmentsAudioDir")
echo "-------------------------------------------------------------------", $test_dir

####################################
#test_dir=Anonymous_123

################################

ft=$identificationFolder/mfcc_$test_dir
mfccdir=$ft
vaddir=$ft
stage=1
nj=1
k=128
d=600

#Fix_data_directory
DIRECTORY=$identificationFolder/data/$test_dir
# Start from a clean data directory for this test set.
if [ -d "$DIRECTORY" ]; then
rm -r "$DIRECTORY"
fi
mkdir -p "$DIRECTORY"

if [ $stage -eq 1 ]; then

wav=$identificationFolder/data/$test_dir/wav.scp
cp $segmentsAudioDir/wav.scp $wav

utt=$identificationFolder/data/$test_dir/utt2spk
cp $segmentsAudioDir/utt2spk $utt

t_ref=$identificationFolder/data/$test_dir/target_reference
cp $segmentsAudioDir/target_reference $t_ref

cp $identificationFolder/trial_tst.py $identificationFolder/data/$test_dir/trial_tst.py
python3 $identificationFolder/data/$test_dir/trial_tst.py $identificationFolder/data/$test_dir

cp $identificationFolder/target_unkn_generator.py $identificationFolder/data/$test_dir/target_unkn_generator.py
cp $identificationFolder/used_ids.txt $identificationFolder/data/$test_dir/used_ids.txt

python3 $identificationFolder/data/$test_dir/target_unkn_generator.py $identificationFolder/data/$test_dir/target_reference $identificationFolder/data/$test_dir/used_ids.txt $identificationFolder/data/$test_dir/target_reference_unkn

uttspk=$identificationFolder/data/$test_dir/spk2utt
chmod +x $identificationFolder/utils/utt2spk_to_spk2utt.pl
$identificationFolder/utils/utt2spk_to_spk2utt.pl $utt >$uttspk
chmod +x $identificationFolder/utils/fix_data_dir.sh
$identificationFolder/utils/fix_data_dir.sh $identificationFolder/data/$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]

fi

if [ $stage -eq 2 ]; then
echo "Making MFCCs"
chmod +x $identificationFolder/steps/make_mfcc.sh
$identificationFolder/steps/make_mfcc.sh --mfcc-config $identificationFolder/conf/mfcc.conf --nj $nj --cmd run.pl $identificationFolder/data/$test_dir $identificationFolder/exp/make_mfcc_$test_dir $mfccdir

echo "Computing VADs"
chmod +x $identificationFolder/sid/compute_vad_decision.sh
$identificationFolder/sid/compute_vad_decision.sh --vad-config $identificationFolder/conf/vad.conf --nj $nj --cmd run.pl $identificationFolder/data/$test_dir $identificationFolder/exp/make_vad_$test_dir $vaddir

echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi

if [ $stage -eq 3 ]; then
chmod +x $identificationFolder/sid/extract_xvectors.sh
$identificationFolder/sid/extract_xvectors.sh --cmd run.pl --nj $nj $identificationFolder/exp/xvector_nnet_1a $identificationFolder/data/$test_dir $identificationFolder/exp/xvectors_$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi

#This stage scores every test utterance against every enrolled speaker; the highest-scoring speaker is picked in the next stage
if [ $stage -eq 4 ]; then
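# Speaker-level x-vectors of the training (enrollment) speakers; this is a data file, so it needs no execute permission.
spk=$identificationFolder/exp/xvectors_train_combined/spk_xvector.scp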

xvec=$identificationFolder/exp/xvectors_$test_dir/xvector.scp
trials=$identificationFolder/data/$test_dir/target
# target.py writes the trial list that is scored below.
python3 $identificationFolder/local/target.py $spk $xvec $trials

# Length-normalize the test x-vectors on the fly, then score every trial by dot product.
awk '{print $1, $2}' $trials | \
ivector-compute-dot-products - \
scp:$spk \
"ark:ivector-normalize-length scp:$xvec ark:- |" \
$identificationFolder/mfcc_cosine_${test_dir}_1

echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi

if [ $stage -eq 5 ]; then
#python3 $identificationFolder/local_o/max_score_new.py $identificationFolder/mfcc_cosine_$test_dir'_'1 $identificationFolder/mfcc_cosine_$test_dir
python3 $identificationFolder/local_o/max_score.py $identificationFolder/mfcc_cosine_$test_dir'_'1 $identificationFolder/mfcc_cosine_$test_dir
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]

fi

if [ $stage -eq 6 ]; then

trials=$identificationFolder/data/$test_dir/testPLDA.trials
trialsEER=$identificationFolder/data/$test_dir/test.trials

#python3 $identificationFolder/testing.py $identificationFolder/mfcc_cosine_$test_dir $identificationFolder/data/$test_dir/target_reference_unkn
#python3 $identificationFolder/threshold_tst.py $identificationFolder/mfcc_cosine_$test_dir'_'Predicted $identificationFolder/data/$test_dir/target_reference_unkn $identificationFolder/mfcc_cosine_$test_dir'_'Predicted_threshold_$test_dir

echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]
fi

if [ $stage -eq 7 ]; then
#finding accuracy

t_ref=$identificationFolder/data/$test_dir/target_reference
#python3 $identificationFolder/output_formater_new.py $identificationFolder/spkMapping.txt $identificationFolder/mfcc_cosine_$test_dir'_'Predicted_threshold_$test_dir $outputFilePath
python3 $identificationFolder/output_formater_orig.py $identificationFolder/spkMapping.txt $identificationFolder/mfcc_cosine_$test_dir $outputFilePath
echo "8888888888888888888888888888888 {Stage: $stage} 88888888888888888888888888888888888"
stage=$[ stage+1 ]

fi

I was previously using this script with an i-vector based model and have modified it for an x-vector based model. Can anyone tell me whether it is okay? In the output file, it gives me the same speaker for all utterances. When I replaced my xvector_nnet_1a with a pre-existing one, I started to get more speakers. They were incorrect, but from that I deduced that my model wasn't trained properly. I am trying to find the error in my training script, but is it possible that the script I am using (provided above) is faulty?
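To narrow down where the single-speaker output comes from, the stage 4 score file can be inspected directly before the later Python scripts touch it. The following is a minimal sketch, not the max_score.py used above. It assumes the score file has one trial per line in the form `<speaker-id> <utterance-id> <score>`, matching the order in which the two inputs are passed to ivector-compute-dot-products in the script. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of a sanity check on a score file.
# Assumed line format: <speaker-id> <utterance-id> <score>

# Print "<utterance-id> <speaker-id> <score>" for the top-scoring speaker of each utterance.
best_speaker_per_utt() {
  awk '
    !($2 in best) || $3 + 0 > best[$2] {
      best[$2] = $3 + 0   # numeric value used for comparisons
      raw[$2]  = $3       # original text of the score, printed unchanged
      spk[$2]  = $1
    }
    END { for (u in best) print u, spk[u], raw[u] }
  ' "$1" | sort
}

# Print "<speaker-id> <count>": how many utterances each speaker wins.
predicted_speaker_counts() {
  best_speaker_per_utt "$1" | awk '{ n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}
```

Calling `predicted_speaker_counts` on `$identificationFolder/mfcc_cosine_${test_dir}_1` would show whether one speaker already wins every trial at the scoring stage. If so, the problem lies in the x-vectors or the scoring step rather than in the output formatting scripts.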