Are there any good egs to learn how to extract x-vectors?


Algebrian

Jul 22, 2022, 5:01:35 AM
to kaldi-help
Hi.

I am new to Kaldi (and I apologize for my poor English).
I want to extract x-vectors using a pre-trained model.
Are there any good examples or pages that show how to do that?
(I just need to extract the x-vectors themselves, not run SRE or diarization.)

Thanks.

Jan Yenda Trmal

Jul 22, 2022, 4:15:05 PM
to kaldi-help
The pretrained model page points to the recipe that was used to train the network and extract the x-vectors.
Review the recipe (run.sh is a good starting point) or grep for usages of extract_xvectors.sh.
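For example, that grep can be run from the egs directory of a Kaldi checkout (the `kaldi/egs` path below is illustrative):

```shell
# List recipe scripts that call the x-vector extraction script.
cd kaldi/egs
grep -rl 'extract_xvectors.sh' --include='*.sh' .
```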
y.



Algebrian

Jul 26, 2022, 5:50:38 AM
to kaldi-help
Thank you for the reply.

So I guess what I should do is:
  1.  copy the files in "00003_sre16_v2_1a" into kaldi/egs/sre16/v2 (carefully, so as not to break the symbolic links)
  2.  run "bash extract_xvectors.sh $1 $2 $3"

Is this correct?
And if so, I have some questions about the arguments:
  •  Is $1 exp/xvector_nnet_1a?
  •  Is $2 a directory that I have to make myself? If so, can it contain my own files (.wav format), or do I have to download the NIST SRE16 datasets?
  •  What should $3 be? I would guess "exp/xvectors_sre16_major" or "exp/xvectors_sre_combined".

Regards.
On Saturday, July 23, 2022 at 5:15:05 UTC+9, Yenda wrote:

Jan Yenda Trmal

Jul 26, 2022, 10:48:53 AM
to kaldi-help
I'm not familiar with the recipe. You shouldn't need the pretrained model to train the vectors (but you will need data).
I just thought you wanted a code reference. Sorry.
y.

Desh Raj

Jul 26, 2022, 10:52:40 AM
to kaldi...@googlegroups.com
Yes, look at the "usage" string in this file: https://github.com/kaldi-asr/kaldi/blob/master/egs/sre08/v1/sid/nnet3/xvector/extract_xvectors.sh.

Your main task would be to prepare your data directory in the Kaldi style (see this: https://kaldi-asr.org/doc/data_prep.html) and then extract features from it (check out feature extraction stage in any of the run.sh scripts). Once that is done, you can use the extract_xvectors.sh script.
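Put together, the flow described above looks roughly like this. This is a sketch assuming the sre16/v2 recipe layout; the `data/train` and `exp/xvectors_train` directory names are examples, not fixed:

```shell
# Run from egs/sre16/v2. Assumes data/train/{wav.scp,utt2spk,spk2utt} exist.
utils/fix_data_dir.sh data/train

# Feature extraction: MFCCs plus voice-activity decisions.
steps/make_mfcc.sh --nj 4 data/train
sid/compute_vad_decision.sh --nj 4 data/train
utils/fix_data_dir.sh data/train

# X-vector extraction with the pretrained network.
sid/nnet3/xvector/extract_xvectors.sh --nj 4 \
  exp/xvector_nnet_1a data/train exp/xvectors_train
```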

Desh

Algebrian

Aug 2, 2022, 6:41:36 AM
to kaldi-help
My thanks to both of you for replying.

I checked the page (https://kaldi-asr.org/doc/data_prep.html) and prepared files named "text", "utt2spk", and "wav.scp".
Then I ran $ utils/fix_data_dir.sh data/train, and spk2utt was created automatically.
So the directory (sre16/v2/data/train/) now contains:
 text, utt2spk, spk2utt, wav.scp, data (a directory containing the wav files)
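For reference, spk2utt is just utt2spk inverted: one speaker per line followed by all of that speaker's utterance ids. A minimal awk equivalent of what Kaldi's utils/utt2spk_to_spk2utt.pl produces, run on hypothetical demo data:

```shell
# Build a tiny utt2spk (demo values, not real JVS ids).
mkdir -p demo_data
printf 'utt1 spkA\nutt2 spkA\nutt3 spkB\n' > demo_data/utt2spk

# Group utterance ids under their speaker; sort for deterministic order.
awk '{utts[$2] = utts[$2] " " $1} END {for (s in utts) print s utts[s]}' \
    demo_data/utt2spk | sort > demo_data/spk2utt
cat demo_data/spk2utt
```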

Then I tried $ steps/make_mfcc.sh data/train

And the result is:
steps/make_mfcc.sh data/train
utils/validate_data_dir.sh: Error: in data/train, utterance lists extracted from utt2spk and wav.scp
utils/validate_data_dir.sh: differ, partial diff is:
--- /tmp/kaldi.Cuc2/utts    2022-08-02 18:11:37.127440531 +0900
+++ /tmp/kaldi.Cuc2/utts.wav    2022-08-02 18:11:37.135440438 +0900
@@ -1,400 +1,400 @@
-jvs001-001_jvs001-001
-jvs001-002_jvs001-002
-jvs001-003_jvs001-003
...
+jvs004-095
+jvs004-096
+jvs004-097
+jvs004-098
+jvs004-099
+jvs004-100
[Lengths are /tmp/kaldi.Cuc2/utts=400 versus /tmp/kaldi.Cuc2/utts.wav=400]


How can I solve this?

(The corpus I plan to use is the JVS corpus, particularly its parallel100 set.)
I prepared the IDs and filenames as below:
  • speaker-id = jvs$1 ($1 is a number from 001 to 004)
  • recording-id = jvs$1-$2 ($2 is a number from 001 to 100)
  • utterance-id = jvs$1-$2_jvs$1-$2 (I want to treat one file as one utterance for now, so I made it a duplicate of the recording-id)
  • extended-filename = the path to the data file
So the files look like:
text;
jvs001-001_jvs001-001 *Japanese text*
jvs001-002_jvs001-002 *Japanese text*
  ...
jvs004-100_jvs004-100 *Japanese text*

utt2spk;
jvs001-001_jvs001-001 jvs001
jvs001-002_jvs001-002 jvs001
...
jvs004-100_jvs004-100 jvs004

wav.scp;
jvs001-001 ./speakers/jvs001/VOICEACTRESS100_001.wav
jvs001-002 ./speakers/jvs001/VOICEACTRESS100_002.wav
...
jvs004-100 ./speakers/jvs004/VOICEACTRESS100_100.wav
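A quick way to see the mismatch that validate_data_dir.sh is reporting is to compare the id columns of the two files directly. This generic sketch uses two hypothetical entries (one per file); in practice, point the `cut` commands at the real data/train/ files:

```shell
# Demo files mimicking the mismatch: utt2spk keys carry the "_<id>" suffix,
# wav.scp keys do not.
mkdir -p demo
printf 'jvs001-001_jvs001-001 jvs001\n' > demo/utt2spk
printf 'jvs001-001 ./speakers/jvs001/VOICEACTRESS100_001.wav\n' > demo/wav.scp

# With no segments file, the two id sets must be identical.
cut -d' ' -f1 demo/utt2spk | sort > demo/ids.utt
cut -d' ' -f1 demo/wav.scp | sort > demo/ids.wav
if diff -q demo/ids.utt demo/ids.wav >/dev/null; then
    echo "ids match"
else
    echo "ids differ"   # this is what validate_data_dir.sh complains about
fi
```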


Regards.
On Tuesday, July 26, 2022 at 23:52:40 UTC+9, r.de...@gmail.com wrote:

Algebrian

Aug 2, 2022, 6:53:11 AM
to kaldi-help
To supplement the last message I sent:

spk2utt;
jvs001 jvs001-001_jvs001-001 jvs001-002_jvs001-002 ... jvs001-100_jvs001-100
...
jvs004 jvs004-001_jvs004-001 jvs004-002_jvs004-002 ... jvs004-100_jvs004-100
On Tuesday, August 2, 2022 at 19:41:36 UTC+9, Algebrian wrote:

Desh Raj

Aug 2, 2022, 10:40:41 AM
to kaldi...@googlegroups.com
Since you don't have a segments file, the utterance ids in utt2spk should be the same as the recording ids in wav.scp (i.e., each utterance is a whole recording).
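One way to apply that fix, sketched on hypothetical data (the actual `text` file would need the same renaming of its first column): drop the duplicated "_&lt;recording-id&gt;" suffix so utt2spk keys equal the wav.scp recording ids.

```shell
# Demo utt2spk with the doubled-id keys described above.
mkdir -p demo_fix
printf 'jvs001-001_jvs001-001 jvs001\njvs001-002_jvs001-002 jvs001\n' \
    > demo_fix/utt2spk

# Keep only the part before the first "_" as the utterance id.
awk '{split($1, a, "_"); print a[1], $2}' demo_fix/utt2spk \
    > demo_fix/utt2spk.fixed
mv demo_fix/utt2spk.fixed demo_fix/utt2spk
cat demo_fix/utt2spk
```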

Algebrian

Aug 3, 2022, 2:54:01 AM
to kaldi-help
>"Since you don't have a segments file, the utterance ids in utt2spk should be the same as the recording ids in wav.scp."
Ah, I got it, thank you.
After fixing that (and some other things), I ran $ steps/make_mfcc.sh data/train and successfully created the MFCC features for train.

Then I tried $ diarization/nnet3/xvector/extract_xvectors.sh exp/xvector_nnet_1a data/train exp/xvectors_jvs_corpus
and the result is:
diarization/nnet3/xvector/extract_xvectors.sh exp/xvector_nnet_1a data/train xvector_nnet_1a
diarization/nnet3/xvector/extract_xvectors.sh: using exp/xvector_nnet_1a/extract.config to extract xvectors
usage: get_uniform_subsegments.py [-h]
                                  [--max-segment-duration MAX_SEGMENT_DURATION]
                                  [--overlap-duration OVERLAP_DURATION]
                                  [--max-remaining-duration MAX_REMAINING_DURATION]
                                  [--constant-duration CONSTANT_DURATION]
                                  segments_file
get_uniform_subsegments.py: error: argument segments_file: can't open 'data/train/segments': [Errno 2] No such file or directory: 'data/train/segments'
utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]
utils/data/get_utt2num_frames.sh: data/train/utt2num_frames already present!
utils/data/subsegment_data_dir.sh: subsegmented data from data/train to xvector_nnet_1a/subsegments_data
utils/split_scp.pl: Refusing to split data because number of speakers 0 is less than the number of output .scp files 30


The error is: [Errno 2] No such file or directory: 'data/train/segments'.
How can I fix it?
On Tuesday, August 2, 2022 at 23:40:41 UTC+9, r.de...@gmail.com wrote:

Desh Raj

Aug 3, 2022, 9:57:22 AM
to kaldi...@googlegroups.com
You should use the extract_xvectors.sh in sid, not in diarization. See the sre16 recipe.

Desh

Algebrian

Aug 4, 2022, 4:42:03 AM
to kaldi-help
Thanks for the reply.
I should've checked run.sh.

I prepared vad.scp with $ steps/compute_vad_decision.sh data/train
and then ran $ utils/fix_data_dir.sh .

Then I did $ sid/nnet3/xvector/extract_xvectors.sh --nj 4 exp/xvector_nnet_1a data/train exp/xvectors_jvs_corpus
and the result is:
sid/nnet3/xvector/extract_xvectors.sh --nj 4 exp/xvector_nnet_1a data/train exp/xvectors_jvs_corpus
sid/nnet3/xvector/extract_xvectors.sh: using exp/xvector_nnet_1a/extract.config to extract xvectors
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors for data/train
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors from nnet
sid/nnet3/xvector/extract_xvectors.sh: combining xvectors across jobs
sid/nnet3/xvector/extract_xvectors.sh: computing mean of xvectors for each speaker


Seems ok I guess.

Now there are spk_xvector.ark and xvector.*.ark (and some other files) in exp/xvectors_jvs_corpus.
I think spk_xvector.ark contains the x-vectors I wanted.
If so, I need to convert the binary files into text files.

Any recommendations?
(I don't know how to invoke the copy-feats command.)
On Wednesday, August 3, 2022 at 22:57:22 UTC+9, r.de...@gmail.com wrote:

Algebrian

Aug 5, 2022, 3:47:24 AM
to kaldi-help
I converted the x-vector files into readable ones using the copy-vector command.
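For reference, that conversion can be sketched like this. It assumes the Kaldi binaries are on PATH and uses the archive names produced by extract_xvectors.sh; the output filenames are arbitrary:

```shell
cd exp/xvectors_jvs_corpus

# "ark,t:" writes Kaldi text format: one line per key, "<key>  [ v1 v2 ... ]".
copy-vector ark:spk_xvector.ark ark,t:spk_xvector.txt   # per-speaker means
copy-vector scp:xvector.scp     ark,t:utt_xvector.txt   # per-utterance vectors
```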

So I finally got it all done.
Now I can process them with Python or whatever I like to use.
I really thank you guys.
I had been stuck for about a month before posting my first question,
since I wasn't familiar with the acoustics field, shell scripting, or Kaldi scripts, and I'm also bad at English.
Your replies got me motivated.

My thanks to both of you.
On Thursday, August 4, 2022 at 17:42:03 UTC+9, Algebrian wrote:

Algebrian

Aug 5, 2022, 6:29:46 AM
to kaldi-help
Oh no, I still have a problem...

I extracted x-vectors for 4 speakers, and then tried to do the same thing for 100 speakers.
But when I run $ sid/nnet3/xvector/extract_xvectors.sh , a core dump occurs.
Is there a good way to handle this?

Regards
On Friday, August 5, 2022 at 16:47:24 UTC+9, Algebrian wrote: