Hello, I have some questions regarding the "text" file format in the data preparation step.
Consider the following line in the "text" file.
AlGore_2009-0001304-0002346 last year i showed these two slides so that demonstrated something
Does this mean that "AlGore_2009" is speaking from "0001304" to "0002346", and that the transcription is "last year..."?
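To make my reading of that line concrete, here is a minimal Python sketch. It assumes (this is my guess, not something I have confirmed) that the first whitespace-separated field is an ID of the form `<speaker>-<start>-<end>` and that everything after it is the transcription. `parse_text_line` is a hypothetical helper name I made up, not part of any toolkit.

```python
def parse_text_line(line):
    """Split one "text" line into (speaker, start, end, transcription).

    Assumption: the ID looks like <speaker>-<start>-<end>. The start and end
    fields are kept as strings because their unit is not known here. If the
    ID carries no such fields, start and end are returned as None; that is
    only how this sketch behaves, not a statement about the real tools.
    """
    parts = line.strip().split(None, 1)
    utt_id = parts[0]
    transcription = parts[1] if len(parts) > 1 else ""

    fields = utt_id.rsplit("-", 2)
    if len(fields) == 3:
        speaker, start, end = fields
    else:
        speaker, start, end = utt_id, None, None
    return speaker, start, end, transcription


line = ("AlGore_2009-0001304-0002346 last year i showed these two slides "
        "so that demonstrated something")
print(parse_text_line(line))
```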
What if my audio file (either *.sph or *.wav) is not a long speech but only a single sentence? In that case, what should I put for the start and end times?
Consider the following line in my "text" file.
sp01_train_sn0 the birch canoe slid on the smooth planks
Can this mean that "sp01_train_sn0" is speaking from the start to the end of the audio, and that the transcription is "the birch canoe..."?
While writing this question, I also noticed that I have two underscores as part of the speaker name. I think this could cause problems in differentiating between the speaker and the utterance. Would it? If so, would changing "sp01_train_sn0" to "sp01_trainsn0" solve the problem?
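The ambiguity I am worried about can be shown with a short Python sketch. It assumes, purely for illustration, that some script recovers the speaker by cutting the ID at an underscore; I do not know whether the actual tools do this. `split_at_first` and `split_at_last` are hypothetical names.

```python
def split_at_first(utt_id, sep="_"):
    """Treat everything before the first separator as the speaker."""
    speaker, _, utterance = utt_id.partition(sep)
    return speaker, utterance


def split_at_last(utt_id, sep="_"):
    """Treat everything before the last separator as the speaker."""
    speaker, _, utterance = utt_id.rpartition(sep)
    return speaker, utterance


# Compare both rules on the original ID and on the proposed replacement.
for utt_id in ("sp01_train_sn0", "sp01_trainsn0"):
    print(utt_id, split_at_first(utt_id), split_at_last(utt_id))
```

With two underscores the two rules give different speakers ("sp01" versus "sp01_train"); with a single underscore they agree.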
Regardless of whether I messed up the underscore part, is omitting the time information in the "text" file interpreted as using the audio from its start to its end?
If not, what would you suggest?
I would appreciate your help.