Hey out there, apologies for the beginner question. I promise I have done a lot of searching already, but I can't seem to solve this one on my own.
I think the .sh scripts are reading my data differently than they should be. I am running all of this in Ubuntu under VirtualBox because I'm on Windows; I don't know if that has anything to do with it...
I'm getting the errors:
steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc
utils/validate_text.pl: The line for utterance 763_cora contains CR (0x0D) character and
utils/validate_text.pl: ERROR: text file 'data/train/text' contains disallowed UTF-8 whitespace character(s)
It's also telling me:
steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc
utils/validate_data_dir.sh: WARNING: you have only one speaker. This is probably a bad idea.
But that's not true.
My text file looks like this (though not actually bolded):
763_anton i am finally done with my english homework anton told his mom his mom gave him a look and asked oh really it took me an hour and a half now i better not wait to start my math anton quickly took out his work this should not take too long in fact it should only take about ten minutes
763_cora cora and her baby brother theo are playing in the snow although its chilly the bright sun feels like its already spring as cora pushes theos sled to the top of the hill she smiles he looks exactly like a bear in his furry snowsuit theo laughs and then surprise he rolls off the sled and starts sliding down the hill he quickly gains speed over a steep icy section racing after him cora sees someone appear at the bottom of the hill with care her mother scoops up her speeding son standing up she laughs no broken bones on this little bear now lets go inside and have lunch
773_anton i am finally done with my english homework anton told his mom his mom gave him a look and asked oh really it took me an hour and a half now i better not wait to start my math anton quickly took out his work this should not take too long in fact it should only take about ten minutes
773_cora cora and her baby brother theo are playing in the snow although its chilly the bright sun feels like its already spring as cora pushes theos sled to the top of the hill she smiles he looks exactly like a bear in his furry snowsuit theo laughs and then surprise he rolls off the sled and starts sliding down the hill he quickly gains speed over a steep icy section racing after him cora sees someone appear at the bottom of the hill with care her mother scoops up her speeding son standing up she laughs no broken bones on this little bear now lets go inside and have lunch
...it goes on.
I have two passages (anton and cora) that each student read once, so each speaker ID is attached to two utterances. As far as I can tell, there are no carriage return characters except the line breaks that start each new utterance ID.
I've tried playing with the encoding, but I don't know how to change it, or what I would need to change it to in order to satisfy the program. I created all of these files myself as instructed in the Kaldi for Dummies tutorial, so I don't see how I could have put whitespace or CR characters into them that weren't meant to be there.
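For anyone who wants to check the same thing, here is a minimal sketch of how the file could be inspected for CR (0x0D) bytes, which is the byte validate_text.pl complains about. It assumes the files were saved with Windows-style CRLF line endings, which I have not confirmed; the function names find_cr_lines and strip_cr are made up for illustration and are not part of Kaldi.

```python
from pathlib import Path


def find_cr_lines(path):
    """Return 1-based line numbers of lines that contain a CR (0x0D) byte."""
    data = Path(path).read_bytes()
    # Split on LF only, so a CRLF line ending leaves its CR visible at the end of the line.
    return [i for i, line in enumerate(data.split(b"\n"), start=1) if b"\r" in line]


def strip_cr(path):
    """Rewrite the file in place with every CR byte removed (CRLF becomes LF)."""
    p = Path(path)
    p.write_bytes(p.read_bytes().replace(b"\r", b""))


if __name__ == "__main__":
    import sys

    # Default path is the file named in the error message; pass another path as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "data/train/text"
    if Path(target).exists():
        print(find_cr_lines(target))
```

If find_cr_lines reports line numbers for data/train/text, the same check could be run on the other files in data/train and data/test, and strip_cr (on a backup copy first) would remove the CR bytes.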
Any help is greatly appreciated. Thank you!