Invalid id-list file line

jiaru...@gmail.com

Jul 8, 2020, 1:08:30 AM

to kaldi-help

Format of my files:

text: 001abrir abrir

utt2spk: 001abrir 001

wav.scp: 001abrir /my/path/to/001.wav
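Each of these files maps an utterance id to a value, and the ids have to agree across files. The Python sketch below uses hypothetical helpers (`parse_kaldi_map`, `check_utterance_ids`, not part of Kaldi). It checks four things: every line has an id plus a value, ids are unique, ids are in C-locale sorted order, and text, utt2spk and wav.scp list the same utterances. It assumes there is no segments file, so wav.scp is indexed by utterance, which matches what make_mfcc.sh reports in the log below. The real check is still utils/validate_data_dir.sh.

```python
from typing import Dict, List, Optional


def parse_kaldi_map(lines: List[str]) -> Dict[str, str]:
    """Parse '<id> <value>' lines into a dict, rejecting malformed input."""
    table: Dict[str, str] = {}
    prev_id: Optional[str] = None
    for lineno, line in enumerate(lines, start=1):
        # Split off the id; the value may itself contain spaces (e.g. a transcript).
        fields = line.split(maxsplit=1)
        if len(fields) < 2:
            raise ValueError(f"line {lineno}: expected '<id> <value>', got {line!r}")
        utt_id, value = fields[0], fields[1].strip()
        if utt_id in table:
            raise ValueError(f"line {lineno}: duplicate id {utt_id!r}")
        # Code-point order, which matches LC_ALL=C byte order for UTF-8 text.
        if prev_id is not None and utt_id < prev_id:
            raise ValueError(f"line {lineno}: id {utt_id!r} is out of sorted order")
        table[utt_id] = value
        prev_id = utt_id
    return table


def check_utterance_ids(text: List[str], utt2spk: List[str], wav_scp: List[str]) -> List[str]:
    """Return the shared utterance ids, or raise if the three files disagree."""
    t, u, w = parse_kaldi_map(text), parse_kaldi_map(utt2spk), parse_kaldi_map(wav_scp)
    if not (t.keys() == u.keys() == w.keys()):
        raise ValueError("text, utt2spk and wav.scp do not list the same utterance ids")
    return list(t)
```

To use it on real files, pass the result of `open(path).readlines()` for each of data/train/text, utt2spk and wav.scp.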

I believe my path.sh file is configured correctly, and I'm not sure what the problem is. Please help, thanks!

===== PREPARING ACOUSTIC DATA =====

===== FEATURES EXTRACTION =====

utils/validate_data_dir.sh: no such file data/train/feats.scp (if this is by design, specify --no-feats)

Invalid id-list file line

steps/make_mfcc.sh --nj 1 --cmd run.pl data/train exp/make_mfcc/train mfcc

steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.

steps/make_mfcc.sh: Succeeded creating MFCC features for train

steps/make_mfcc.sh --nj 1 --cmd run.pl data/test exp/make_mfcc/test mfcc

steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.

steps/make_mfcc.sh: Succeeded creating MFCC features for test

steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc

Succeeded creating CMVN stats for train

steps/compute_cmvn_stats.sh data/test exp/make_mfcc/test mfcc

Succeeded creating CMVN stats for test

===== PREPARING LANGUAGE DATA =====

utils/prepare_lang.sh data/local/dict <UNK> data/local/lang data/lang

Checking data/local/dict/silence_phones.txt ...

--> reading data/local/dict/silence_phones.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/dict/silence_phones.txt is OK

Checking data/local/dict/optional_silence.txt ...

--> reading data/local/dict/optional_silence.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/dict/optional_silence.txt is OK

Checking data/local/dict/nonsilence_phones.txt ...

--> reading data/local/dict/nonsilence_phones.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/dict/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt

--> disjoint property is OK.

Checking data/local/dict/lexicon.txt

--> reading data/local/dict/lexicon.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/dict/lexicon.txt is OK

Checking data/local/dict/extra_questions.txt ...

--> data/local/dict/extra_questions.txt is empty (this is OK)

--> SUCCESS [validating dictionary directory data/local/dict]

**Creating data/local/dict/lexiconp.txt from data/local/dict/lexicon.txt

fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int

prepare_lang.sh: validating output directory

utils/validate_lang.pl data/lang

Checking existence of separator file

separator file data/lang/subword_separator.txt is empty or does not exist, deal in word case.

Checking data/lang/phones.txt ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/lang/phones.txt is OK

Checking words.txt: #0 ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/lang/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...

--> silence.txt and nonsilence.txt are disjoint

--> silence.txt and disambig.txt are disjoint

--> disambig.txt and nonsilence.txt are disjoint

--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...

--> found no unexplainable phones in phones.txt

Checking data/lang/phones/context_indep.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 10 entry/entries in data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.{txt, int, csl} are OK

Checking data/lang/phones/nonsilence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 164 entry/entries in data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang/phones/silence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 10 entry/entries in data/lang/phones/silence.txt

--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt

--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt

--> data/lang/phones/silence.{txt, int, csl} are OK

Checking data/lang/phones/optional_silence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 1 entry/entries in data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang/phones/disambig.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 2 entry/entries in data/lang/phones/disambig.txt

--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt

--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt

--> data/lang/phones/disambig.{txt, int, csl} are OK

Checking data/lang/phones/roots.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 43 entry/entries in data/lang/phones/roots.txt

--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt

--> data/lang/phones/roots.{txt, int} are OK

Checking data/lang/phones/sets.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 43 entry/entries in data/lang/phones/sets.txt

--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt

--> data/lang/phones/sets.{txt, int} are OK

Checking data/lang/phones/extra_questions.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 9 entry/entries in data/lang/phones/extra_questions.txt

--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt

--> data/lang/phones/extra_questions.{txt, int} are OK

Checking data/lang/phones/word_boundary.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 174 entry/entries in data/lang/phones/word_boundary.txt

--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt

--> data/lang/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ...

--> reading data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1

--> data/lang/phones/disambig.txt has "#0" and "#1"

--> data/lang/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...

--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols

--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt

--> data/lang/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols...

--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)

Checking word_boundary.int and disambig.int

--> generating a 46 word/subword sequence

--> resulting phone sequence from L.fst corresponds to the word sequence

--> L.fst is OK

--> generating a 33 word/subword sequence

--> resulting phone sequence from L_disambig.fst corresponds to the word sequence

--> L_disambig.fst is OK

Checking data/lang/oov.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 1 entry/entries in data/lang/oov.txt

--> data/lang/oov.int corresponds to data/lang/oov.txt

--> data/lang/oov.{txt, int} are OK

--> data/lang/L.fst is olabel sorted

--> data/lang/L_disambig.fst is olabel sorted

--> SUCCESS [validating lang directory data/lang]

===== LANGUAGE MODEL CREATION =====

===== MAKING lm.arpa =====

===== MAKING G.fst =====

arpa2fst -

LOG (arpa2fst[5.5.733~1-84a6e]:Read():arpa-file-parser.cc:94) Reading \data\ section.

LOG (arpa2fst[5.5.733~1-84a6e]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.

===== MONO TRAINING =====

steps/train_mono.sh --nj 1 --cmd run.pl data/train data/lang exp/mono

Bad line

, could not get first field. at utils/filter_scps.pl line 123, <F> line 1.
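The last error is raised from utils/filter_scps.pl. The message suggests that line 1 of the file it was reading had no first field, which is what an empty or whitespace-only line produces. A minimal Python sketch for locating such lines (the helper `find_lines_without_first_field` is hypothetical, not part of Kaldi) scans every regular file directly inside a data directory. It reports the file name and 1-based line number of each line that has no whitespace-separated fields:

```python
import os
from typing import List, Tuple


def find_lines_without_first_field(data_dir: str) -> List[Tuple[str, int]]:
    """List (file name, 1-based line number) for every empty or whitespace-only line."""
    problems: List[Tuple[str, int]] = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                # str.split() with no argument returns [] for a blank line,
                # so there is no first field to use as a key.
                if not line.split():
                    problems.append((name, lineno))
    return problems
```

Running it on data/train (and data/test) lists any such lines before train_mono.sh trips over them.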

Daniel Povey

Jul 8, 2020, 1:52:27 AM

to kaldi-help

Try running utils/validate_data_dir.sh data/train.

It could be spk2utt.
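spk2utt is the inverse of utt2spk. It has one line per speaker, followed by that speaker's utterance ids. Kaldi normally generates it with `utils/utt2spk_to_spk2utt.pl data/train/utt2spk > data/train/spk2utt`. The Python sketch below (hypothetical helper `utt2spk_to_spk2utt`) only illustrates the mapping. It keeps speakers in order of first appearance and utterances in input order, and it rejects utt2spk lines that do not have exactly two fields:

```python
from typing import Dict, List


def utt2spk_to_spk2utt(utt2spk_lines: List[str]) -> List[str]:
    """Invert '<utt-id> <spk-id>' lines into '<spk-id> <utt-id> ...' lines."""
    spk2utts: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for lineno, line in enumerate(utt2spk_lines, start=1):
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"utt2spk line {lineno}: expected '<utt-id> <spk-id>', got {line!r}")
        utt, spk = fields
        spk2utts.setdefault(spk, []).append(utt)
    return [spk + " " + " ".join(utts) for spk, utts in spk2utts.items()]
```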

jiaru...@gmail.com

Jul 8, 2020, 2:40:31 AM

to kaldi-help

Thanks for the help!

I ran validate_data_dir.sh and it returned: utils/validate_data_dir.sh: Mal-formed spk2gender file

It turned out there was an extra line at the beginning of my spk2gender file. After removing it, the script now runs to completion.
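A minimal Python sketch of the spk2gender fix uses a hypothetical helper, `clean_spk2gender`. It drops blank lines and rejects any other line that is not a speaker id followed by `m` or `f`, which appears to be the format validate_data_dir.sh checks for. It returns the cleaned lines together with the numbers of the lines it dropped, so the change can be reviewed before overwriting the file:

```python
from typing import List, Tuple


def clean_spk2gender(lines: List[str]) -> Tuple[List[str], List[int]]:
    """Return (cleaned lines, 1-based numbers of dropped blank lines).

    Raises ValueError for any non-blank line that is not '<spk-id> <m|f>'.
    """
    cleaned: List[str] = []
    dropped: List[int] = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            # A blank line, like the extra first line described above.
            dropped.append(lineno)
            continue
        if len(fields) != 2 or fields[1] not in ("m", "f"):
            raise ValueError(f"spk2gender line {lineno}: expected '<spk-id> m|f', got {line!r}")
        cleaned.append(" ".join(fields))
    return cleaned, dropped
```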
