#!/bin/bash
. cmd_local.sh
. path.sh
set -e
tmp_dir_name=tmp_data
src=$1
dst=data/$tmp_dir_name
mfccdir=`pwd`/mfcc
vaddir=`pwd`/mfcc
nnet_dir=exp/xvector_nnet_1a
utt_id="temp_utterance"
utt2spk=$dst/utt2spk; [[ -f "$utt2spk" ]] && rm $utt2spk
utt2dur=$dst/utt2dur; [[ -f "$utt2dur" ]] && rm $utt2dur
mkdir -p $dst || exit 1;
# all utterances are FLAC compressed
if ! which flac >&/dev/null; then
echo "Please install 'flac' on ALL worker nodes!"
exit 1
fi
wav_scp=$dst/wav.scp; [[ -f "$wav_scp" ]] && rm $wav_scp
echo "$utt_id flac -c -d -s $src |" > $wav_scp || exit 1
echo "$utt_id S0" >> $utt2spk || exit 1
spk2utt=$dst/spk2utt
utils/utt2spk_to_spk2utt.pl <$utt2spk >$spk2utt || exit 1
echo "$0: successfully prepared data in $dst"
# make mfcc features
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf --nj 1 --cmd "$train_cmd" --validate-data-dir false \
data/${tmp_dir_name} exp/make_mfcc $mfccdir
sid/compute_vad_decision.sh --nj 1 --cmd "$train_cmd" \
data/${tmp_dir_name} exp/make_vad $vaddir
# extract xvectors
sid/nnet3/xvector/extract_xvectors.sh --cmd "$train_cmd" --nj 1 \
$nnet_dir data/$tmp_dir_name \
exp/xvectors_$tmp_dir_name
# Get results using the adapted PLDA model.
$train_cmd exp/scores/log/${tmp_dir_name}.log \
ivector-plda-scoring --normalize-length=true \
--num-utts=ark:exp/xvectors_${tmp_dir_name}/num_utts.ark \
"ivector-copy-plda --smoothing=0.0 exp/xvectors_sre16_major/plda_adapt - |" \
"ark:ivector-mean ark:data/${tmp_dir_name}/spk2utt scp:exp/xvectors_${tmp_dir_name}/xvector.scp ark:- | ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec ark:- ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
"ark:ivector-subtract-global-mean exp/xvectors_sre16_major/mean.vec scp:exp/xvectors_${tmp_dir_name}/xvector.scp ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
"cat '/tmp/trials.txt' | cut -d\ --fields=1,2 |" exp/scores/${tmp_dir_name} || exit 1;
My question is: what is the trials file? In the recipe it is sre16_trials=data/sre16_eval_test/trials, which I replaced with /tmp/trials.txt. I assumed it was an output file, but it turns out to be an input file.
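If I understand it correctly, each line of a trials file names an enrollment speaker (or model) id and a test utterance id, optionally followed by a target/nontarget label (the cut in my command keeps only the first two fields). For my single-utterance case, I think the file would just be:

echo "S0 temp_utterance target" > /tmp/trials.txt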
Do you see anything wrong with my scripts?
Thanks,
Srikar
(Scoring is done with the same ivector-plda-scoring command as above.)
Does my approach/understanding of the tools sound correct? flac -> mfcc -> ivector -> ivector-plda-scoring -> unnormalized score.
I now need to figure out how to normalize this score and apply a threshold. Any suggestions for this?
Thanks,
Srikar
3. Identify which segments have voice in them (sid/compute_vad_decision.sh)
Thank you, David.
Also, what is the role of the utterance_id and speaker_id in wav.scp and the other files? I intend to start with an audio file and expect to get a set of labelled segments indicating the speaker in each segment. If I need to process just one audio file, do I just create dummy speaker and utterance ids to get the scripts to work (and modify the splitting logic to not do any splitting)?
Yes -- in general, prior to diarization, the speaker-id and utterance-id would be the same, IIRC, just reflecting the identity of the audio file you are splitting. But David or Vimal may correct me.
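For a single recording, a minimal data directory along those lines might look like the sketch below (the ids and paths are placeholders):

mkdir -p data/rec1
echo "rec1 flac -c -d -s /path/to/recording.flac |" > data/rec1/wav.scp
echo "rec1 rec1" > data/rec1/utt2spk   # speaker-id == utterance-id before diarization
utils/utt2spk_to_spk2utt.pl < data/rec1/utt2spk > data/rec1/spk2utt
utils/validate_data_dir.sh --no-feats --no-text data/rec1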
1. You mentioned that training the PLDA model on my data specifically would help. Do you know how much data is needed for this?
2. Are there any parameters that I can tune to improve results, e.g. target_energy, window, overlap, or MFCC extraction?
3. What is the difference between plda_adapt in xvectors_sre16_major and the plda model in xvectors_sre_combined (the pretrained x-vector models)?
4. You mentioned that I could also get paid consultation/help. Do you know who can help me with this and how much it would cost?
Vimal
Thank you, David.
Your problem is more complicated than just speaker recognition. Since you have multiple speakers per recording, you'll (probably) want to first perform speaker diarization to split the recording into segments that belong to different speakers. There's a speaker diarization example in https://github.com/kaldi-asr/kaldi/tree/master/egs/callhome_diarization.
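For reference, a rough outline of that recipe's pipeline (the v1, i-vector based version; script names from memory, so check the recipe's run.sh):

# 1. steps/make_mfcc.sh              -- MFCC features for the whole recording
# 2. sid/compute_vad_decision.sh     -- mark which frames contain speech
# 3. diarization/extract_ivectors.sh -- embeddings over short sliding windows
# 4. diarization/score_plda.sh       -- PLDA similarity scores between pairs of windows
# 5. diarization/cluster.sh          -- agglomerative clustering of windows into speakers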
1. You mentioned that training the PLDA model on my data specifically would help. Do you know how much data is needed for this?
More is better. I'd aim for at least 1,000 speakers, with several recordings per speaker. I suggest obtaining VoxCeleb1 and VoxCeleb2 (http://www.robots.ox.ac.uk/~vgg/data/voxceleb/ and http://www.robots.ox.ac.uk/~vgg/data/voxceleb2/). In total, that will give you over 7,000 speakers with plenty of recordings per speaker. This data will be a better match for your application (which is wideband mic, right?). Also, you'll have enough data to train a new x-vector DNN from scratch (see egs/voxceleb/v2 for some help with that), with wideband features.
2. Are there any parameters that I can tune to improve results, e.g. target_energy, window, overlap, or MFCC extraction?
The biggest impact will be from tuning the agglomerative clustering stopping threshold. You could also try increasing the target-energy option (try something like 0.95 and decrease from there). There's nothing you can do about MFCC extraction without retraining the x-vector DNN.
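As a concrete pointer, those two knobs live in the callhome_diarization recipe's scoring and clustering scripts; if I remember the option names right (check each script's usage message for the exact arguments), a sketch looks like this, with hypothetical directory names:

# hypothetical directories; only the option names matter here
scores_dir=exp/xvectors_recording/plda_scores
# target-energy is an option of the PLDA scoring stage, roughly:
# diarization/score_plda.sh --cmd "$train_cmd" --nj 1 --target-energy 0.95 \
#   <plda-model-dir> <xvector-dir> $scores_dir
# the agglomerative clustering stopping threshold is an option of cluster.sh:
diarization/cluster.sh --cmd "$train_cmd" --nj 1 --threshold 0.5 \
  $scores_dir ${scores_dir}_clustered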
3. What is the difference between plda_adapt in xvectors_sre16_major and the plda model in xvectors_sre_combined (the pretrained x-vector models)?
This adaptation is specific to the SRE16 recipe. The SRE16 eval consists of Cantonese and Tagalog speech, but most of our training data is English, so the PLDA model in xvectors_sre_combined is trained mostly on English. We adapted it to a small pile of Cantonese and Tagalog data to get the adapted PLDA model. The adapted model, trained on Cantonese and Tagalog, is probably not going to be helpful for you unless those are your target languages.
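For context, plda_adapt is produced roughly like this in the SRE16 recipe (reconstructed from memory of egs/sre16/v2/run.sh, so treat the exact options and paths as approximate):

$train_cmd exp/xvectors_sre16_major/log/plda_adapt.log \
  ivector-adapt-plda --within-covar-scale=0.75 --between-covar-scale=0.25 \
  exp/xvectors_sre_combined/plda \
  "ark:ivector-subtract-global-mean scp:exp/xvectors_sre16_major/xvector.scp ark:- | transform-vec exp/xvectors_sre_combined/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" \
  exp/xvectors_sre16_major/plda_adapt

In other words, plda_adapt is just the xvectors_sre_combined PLDA model with its covariances re-estimated toward the unlabeled "major" (Cantonese/Tagalog) data.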
4. You mentioned that I could also get paid consultation/help. Do you know who can help me with this and how much it would cost?
I'll let you know if I think of something. Someone might contact you if they're interested.