Beginner's question about utt2spk to spk2utt

2021 MS

May 22, 2021, 11:27:52 AM

to kaldi-help

Hello there!

I am an absolute beginner in speaker recognition. I recently came across your open-source project Kaldi; thank you very much for maintaining it and helping its users. I want to use it with my own data set (just a small speaker recognition program that I will run on my laptop). I chose the aishell folder to hold all my data, and I have also read the Kaldi for Dummies tutorial on the official Kaldi website. But when I run my own setup, I hit a problem converting the utt2spk file to the spk2utt file.

As the pictures show, I recorded 32 audio files from 4 different speakers. I followed a tutorial on using one's own data set with this project, and split the test and training data into /data/local/test and /data/local/train as the example does. But as you can see, the original utt2spk gets changed into an incorrect utt2spk file, which leads to an incorrect spk2utt file. As a result, only 1 speaker is treated as test data and recognized by the program. Does this program support only 1 speaker, or have I made a mistake somewhere? Please point out anything I have omitted or done wrong. Thank you for taking your precious time to read this and help me! I look forward to your reply and wish you all the best!
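For reference, the relationship between the two files can be sketched in a few lines of Python. This is only an illustration of the mapping, not Kaldi's own code (Kaldi ships utils/utt2spk_to_spk2utt.pl for this step). It assumes each utt2spk line has the form `<utterance-id> <speaker-id>`; the function name and the IDs in the example are hypothetical.

```python
from collections import defaultdict


def utt2spk_to_spk2utt(utt2spk_lines):
    """Group utterance IDs by speaker and return spk2utt lines, sorted by speaker."""
    spk2utt = defaultdict(list)
    for line in utt2spk_lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 2:
            raise ValueError(f"expected '<utt-id> <spk-id>', got: {line!r}")
        utt, spk = parts
        spk2utt[spk].append(utt)
    # One line per speaker: the speaker ID followed by all of its utterance IDs.
    return [
        f"{spk} {' '.join(sorted(utts))}"
        for spk, utts in sorted(spk2utt.items())
    ]


# Hypothetical IDs, for illustration only.
example = [
    "speaker1_hw speaker1",
    "speaker1_gm speaker1",
    "speaker2_hw speaker2",
]
for row in utt2spk_to_spk2utt(example):
    print(row)
```

With 4 speakers in utt2spk, the result should contain 4 lines. If spk2utt ends up with a single speaker, the utt2spk file fed into the conversion is the first thing to inspect.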

2021 MS

May 22, 2021, 11:33:13 AM

to kaldi-help

This is what the original utt2spk file should look like, but it gets changed into the incorrect utt2spk shown in the picture above.

2021 MS

May 22, 2021, 11:43:30 AM

to kaldi-help

These are my folders under /egs/aishell/v1. My own speaker data set is located at /v1/data/wav/test and /v1/data/wav/train. The spk2utt, utt2spk, and wav.scp files are all in /v1/data/local/test and /v1/data/local/train. The incorrect files are in /v1/data/local/test (the files in /v1/data/local/train are normal).

2021 MS

May 22, 2021, 11:49:36 AM

to kaldi-help

PS: Since I don't use the speech recognition code, I have deleted it (including the transcipts.txt text file and other related files).

The screenshot below shows the current output after I execute ./run.sh.

This is my first post here; sorry for any inconvenience.

Maple Ma

May 22, 2021, 1:58:14 PM

to kaldi-help

Update: I have solved this issue! But I have another question. As you can see in the earlier screenshot, the output shows "[Info] no segments file exists: assuming wav.scp indexed by utterances". How should I deal with this, or can I just ignore it (since it doesn't affect the execution of run.sh)? Thanks in advance!

Karla Rehn

May 25, 2021, 10:31:04 AM

to kaldi-help

Is your wav.scp indexed by utterance? That is, does each line in wav.scp point to a file containing only one utterance? (Judging from your file names, that is the case.)

In that case you can just ignore the [Info]. However, if your sound files contain more than one utterance, you'll need the segments file and cannot ignore the [Info].
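To make the distinction concrete, here is a minimal Python sketch (not part of Kaldi) of the check described above. It assumes the usual data directory formats: wav.scp lines start with a recording ID, utt2spk lines start with an utterance ID, and a segments line reads `<utterance-id> <recording-id> <segment-begin> <segment-end>` with times in seconds. The helper names, IDs, and paths are hypothetical.

```python
def first_fields(lines):
    """Collect the first whitespace-separated field of every non-blank line."""
    return {line.split()[0] for line in lines if line.strip()}


def needs_segments(wav_scp_lines, utt2spk_lines):
    """Return True when some utterance ID has no wav.scp entry of its own.

    If every utterance ID is also a wav.scp key, wav.scp is indexed by
    utterance and no segments file is required.
    """
    return not first_fields(utt2spk_lines) <= first_fields(wav_scp_lines)


def segment_line(utt_id, rec_id, start, end):
    """Format one segments line; start and end are in seconds."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return f"{utt_id} {rec_id} {start:.2f} {end:.2f}"


# Case 1: one utterance per file, so wav.scp keys are the utterance IDs.
wav_scp = ["speaker1_hw /path/to/speaker1_hw.wav"]  # hypothetical path
utt2spk = ["speaker1_hw speaker1"]
print(needs_segments(wav_scp, utt2spk))  # False

# Case 2: one long recording that holds several utterances.
long_wav_scp = ["speaker1_rec1 /path/to/speaker1_rec1.wav"]  # hypothetical path
long_utt2spk = [
    "speaker1_rec1_0001 speaker1",
    "speaker1_rec1_0002 speaker1",
]
print(needs_segments(long_wav_scp, long_utt2spk))  # True
print(segment_line("speaker1_rec1_0001", "speaker1_rec1", 0.0, 1.5))
```

In the second case, each utterance needs its own segments line that points into the long recording.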

Maple Ma

May 26, 2021, 12:07:03 PM

to kaldi-help

Thanks for your reply! I want to check that I understood you correctly. What kind of sound file counts as containing only one utterance? One with only a single speaker, or one with multiple speakers whose parts are all separated? (And if one speaker says multiple words in a single wav file, do I need a segments file?) Does the segments file apply to the second situation I mentioned above? (I have read about the segments file format in the Kaldi docs, but I am not sure I understood it correctly.) I look forward to your reply; thanks in advance!

PS: My files are all named after their contents, e.g. if speaker1 said "hello world", the file is named speaker1_hw.wav.
