Bad Variable name & L_disambig.fst is not olabel sorted

GMD Baloch

Apr 22, 2021, 1:11:11 AM

to kaldi-help

Hi everyone,

This is my first experience with Ubuntu and Kaldi. I installed Ubuntu on Windows 10 from the Microsoft Store, and I need to use Kaldi for a speech recognition project.

I have followed this tutorial and successfully created all the files listed up to step 5.3.5, except for two: segments (5.2.2) and extra_questions.txt (5.3.5). Our teaching staff told us to follow this tutorial and said that these two files are optional, so we could leave them out.

When I run the following command, I get an error saying that L_disambig.fst is not olabel sorted. A little earlier in the output there is also a "bad variable name" error, which I think is caused by the space in the path C:\Program Files.

gmd144@DESKTOP-T8H4M35:~/kaldi/egs/mycorpus$ utils/prepare_lang.sh data/local/lang '<oov>' data/local/ data/lang

utils/prepare_lang.sh data/local/lang <oov> data/local/ data/lang

Checking data/local/lang/silence_phones.txt ...

--> reading data/local/lang/silence_phones.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/lang/silence_phones.txt is OK

Checking data/local/lang/optional_silence.txt ...

--> reading data/local/lang/optional_silence.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/lang/optional_silence.txt is OK

Checking data/local/lang/nonsilence_phones.txt ...

--> reading data/local/lang/nonsilence_phones.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/lang/nonsilence_phones.txt is OK

Checking disjoint: silence_phones.txt, nonsilence_phones.txt

--> disjoint property is OK.

Checking data/local/lang/lexicon.txt

--> reading data/local/lang/lexicon.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/lang/lexicon.txt is OK

Checking data/local/lang/lexiconp.txt

--> reading data/local/lang/lexiconp.txt

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/local/lang/lexiconp.txt is OK

Checking lexicon pair data/local/lang/lexicon.txt and data/local/lang/lexiconp.txt

--> lexicon pair data/local/lang/lexicon.txt and data/local/lang/lexiconp.txt match

Checking data/local/lang/extra_questions.txt ...

--> data/local/lang/extra_questions.txt is empty (this is OK)

--> SUCCESS [validating dictionary directory data/local/lang]

fstaddselfloops data/lang/phones/wdisambig_phones.int data/lang/phones/wdisambig_words.int

prepare_lang.sh: validating output directory

utils/validate_lang.pl data/lang

Checking existence of separator file

separator file data/lang/subword_separator.txt is empty or does not exist, deal in word case.

Checking data/lang/phones.txt ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/lang/phones.txt is OK

Checking words.txt: #0 ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> data/lang/words.txt is OK

Checking disjoint: silence.txt, nonsilence.txt, disambig.txt ...

--> silence.txt and nonsilence.txt are disjoint

--> silence.txt and disambig.txt are disjoint

--> disambig.txt and nonsilence.txt are disjoint

--> disjoint property is OK

Checking sumation: silence.txt, nonsilence.txt, disambig.txt ...

--> found no unexplainable phones in phones.txt

Checking data/lang/phones/context_indep.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 10 entry/entries in data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.int corresponds to data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.csl corresponds to data/lang/phones/context_indep.txt

--> data/lang/phones/context_indep.{txt, int, csl} are OK

Checking data/lang/phones/nonsilence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 240 entry/entries in data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.int corresponds to data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.csl corresponds to data/lang/phones/nonsilence.txt

--> data/lang/phones/nonsilence.{txt, int, csl} are OK

Checking data/lang/phones/silence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 10 entry/entries in data/lang/phones/silence.txt

--> data/lang/phones/silence.int corresponds to data/lang/phones/silence.txt

--> data/lang/phones/silence.csl corresponds to data/lang/phones/silence.txt

--> data/lang/phones/silence.{txt, int, csl} are OK

Checking data/lang/phones/optional_silence.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 1 entry/entries in data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.int corresponds to data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.csl corresponds to data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.{txt, int, csl} are OK

Checking data/lang/phones/disambig.{txt, int, csl} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 4 entry/entries in data/lang/phones/disambig.txt

--> data/lang/phones/disambig.int corresponds to data/lang/phones/disambig.txt

--> data/lang/phones/disambig.csl corresponds to data/lang/phones/disambig.txt

--> data/lang/phones/disambig.{txt, int, csl} are OK

Checking data/lang/phones/roots.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 62 entry/entries in data/lang/phones/roots.txt

--> data/lang/phones/roots.int corresponds to data/lang/phones/roots.txt

--> data/lang/phones/roots.{txt, int} are OK

Checking data/lang/phones/sets.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 62 entry/entries in data/lang/phones/sets.txt

--> data/lang/phones/sets.int corresponds to data/lang/phones/sets.txt

--> data/lang/phones/sets.{txt, int} are OK

Checking data/lang/phones/extra_questions.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 9 entry/entries in data/lang/phones/extra_questions.txt

--> data/lang/phones/extra_questions.int corresponds to data/lang/phones/extra_questions.txt

--> data/lang/phones/extra_questions.{txt, int} are OK

Checking data/lang/phones/word_boundary.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 250 entry/entries in data/lang/phones/word_boundary.txt

--> data/lang/phones/word_boundary.int corresponds to data/lang/phones/word_boundary.txt

--> data/lang/phones/word_boundary.{txt, int} are OK

Checking optional_silence.txt ...

--> reading data/lang/phones/optional_silence.txt

--> data/lang/phones/optional_silence.txt is OK

Checking disambiguation symbols: #0 and #1

--> data/lang/phones/disambig.txt has "#0" and "#1"

--> data/lang/phones/disambig.txt is OK

Checking topo ...

Checking word_boundary.txt: silence.txt, nonsilence.txt, disambig.txt ...

--> data/lang/phones/word_boundary.txt doesn't include disambiguation symbols

--> data/lang/phones/word_boundary.txt is the union of nonsilence.txt and silence.txt

--> data/lang/phones/word_boundary.txt is OK

Checking word-level disambiguation symbols...

--> data/lang/phones/wdisambig.txt exists (newer prepare_lang.sh)

Checking word_boundary.int and disambig.int

sh: 1: export: Files/WindowsApps/CanonicalGroupLimited.Ubuntu20.04onWindows_2004.2021.222.0_x64__79rhkp1fndgsc:/mnt/c/Windows/system32:/mnt/c/Windows:/mnt/c/Windows/System32/Wbem:/mnt/c/Windows/System32/WindowsPowerShell/v1.0/:/mnt/c/Windows/System32/OpenSSH/:/mnt/c/Program: bad variable name

--> generating a 31 word/subword sequence

--> ERROR: number of reconstructed words 0 does not match real number of words 31; indicates problem in L.fst or word_boundary.int. phoneseq = , wordseq = منگوا اپنا بیوہ فوراً روزگار لائحہ فیسیں تنظیم دھڑے دکھا انسان چوتھی عزت ڈھارس کمرے گو غیظ کرتی شاہین پیسنے کھیلنی موئسچرائزر نویں پاسداری دماغی نوجوانوں نیو جچے گمبھیر لہجہ پنجگانہ

--> generating a 28 word/subword sequence

Checking data/lang/oov.{txt, int} ...

--> text seems to be UTF-8 or ASCII, checking whitespaces

--> text contains only allowed whitespaces

--> 1 entry/entries in data/lang/oov.txt

--> data/lang/oov.int corresponds to data/lang/oov.txt

--> data/lang/oov.{txt, int} are OK

--> ERROR: data/lang/L.fst is not olabel sorted

--> ERROR: data/lang/L_disambig.fst is not olabel sorted

--> ERROR (see error messages above)

prepare_lang.sh: error validating output

I don't know how to resolve this issue. Any help is appreciated.

Regards,

GMD

GMD Baloch

Apr 22, 2021, 1:13:58 AM

to kaldi-help

Here is the link to the tutorial I am following:

https://www.eleanorchodroff.com/tutorial/kaldi/training-acoustic-models.html

Sorry for the inconvenience.

Daniel Povey

Apr 22, 2021, 5:37:39 AM

to kaldi-help

Kaldi scripts generally won't work if there are spaces in the directory where you are running them.
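In the log above, the "bad variable name" message is cut at "/mnt/c/Program", which suggests the space comes from a Windows directory that WSL added to PATH rather than from the working directory itself. As a rough sketch under that assumption, the entries containing spaces could be dropped from PATH before running the Kaldi scripts. The function name strip_space_entries and the example path are made up for illustration and are not part of Kaldi.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kaldi): print a colon-separated
# PATH-style string with every entry that contains a space removed.
strip_space_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -v ' ' | paste -sd ':' -
}

# Demonstration on a made-up PATH value.
example='/usr/bin:/mnt/c/Program Files/WindowsApps/foo:/bin'
strip_space_entries "$example"   # prints /usr/bin:/bin

# To apply it to the current shell session (assumption: none of the
# dropped Windows directories are needed for the Kaldi run):
#   PATH=$(strip_space_entries "$PATH"); export PATH
```

This only affects the current shell session. Another option that may work is to stop WSL from adding the Windows directories at all, by setting appendWindowsPath = false under [interop] in /etc/wsl.conf and restarting WSL.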
