ERROR (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:127) line 1 []: \data\ section missing or empty.

Rohith Ramagani

Aug 27, 2019, 4:15:26 AM
to kaldi-help
I have my own corpus, and I have prepared all of the data according to the Kaldi requirements described in the Kaldi data preparation documentation.
While preparing the data/lang directory, I don't know where data/local/lm/foo.kn.gz comes from. When I run utils/format_lm.sh to convert 'data/local/lm/foo.kn.gz' to an FST, I get the output below.

Can anyone tell me what's going on, and what foo.kn.gz is?
 
Converting 'data/local/lm/foo.kn.gz' to FST
gzip: data/local/lm/foo.kn.gz: No such file or directory
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst 
ERROR (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:127) line 1 []: \data\ section missing or empty.

[ Stack-Trace: ]
/home/rohith/Desktop/ASR/kaldi/src/lib/libkaldi-base.so(kaldi::MessageLogger::LogMessage() const+0xb42) [0x7fd74ed436a2]
arpa2fst(kaldi::MessageLogger::LogAndThrow::operator=(kaldi::MessageLogger const&)+0x21) [0x5589464fb4df]
/home/rohith/Desktop/ASR/kaldi/src/lib/libkaldi-lm.so(kaldi::ArpaFileParser::Read(std::istream&)+0xa8f) [0x7fd74f19b11f]
arpa2fst(main+0xd24) [0x5589464fa15e]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7) [0x7fd74dc62b97]
arpa2fst(_start+0x2a) [0x5589464f935a]

Daniel Povey

Aug 27, 2019, 3:50:29 PM
to kaldi-help
I think foo was just an example filename.  It needs an ARPA-format LM.  I suggest reading the first couple of pages of "A Bit of Progress in Language Modeling" for an intro to language modeling, and maybe reading some documentation or tutorials about SRILM.
Many Kaldi scripts have examples of using various toolkits to create the ARPA-format LM.

Dan
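
(Editor's note) The error in the first post appears to follow from the gzip line above it: data/local/lm/foo.kn.gz does not exist, so arpa2fst receives empty input on stdin and finds no \data\ header. A minimal sketch of a pre-flight check is shown here; `check_arpa_gz` is a hypothetical helper name, not a Kaldi script, and it only tests for the header line, not for a fully valid ARPA file.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): succeed only if the gzipped file
# exists and its decompressed text contains a line starting with \data\.
check_arpa_gz() {
  local lm="$1"
  if [ ! -f "$lm" ]; then
    echo "missing file: $lm" >&2
    return 1
  fi
  # gunzip -c writes the decompressed text to stdout; grep -q only reports
  # whether the header line was found.
  if gunzip -c "$lm" 2>/dev/null | grep -q '^\\data\\'; then
    echo "ok: $lm"
  else
    echo "no data section header in: $lm" >&2
    return 1
  fi
}
```

Running `check_arpa_gz data/local/lm/foo.kn.gz` before utils/format_lm.sh would report the missing file directly instead of the less obvious arpa2fst error.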


--
Go to http://kaldi-asr.org/forums.html find out how to join
---
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kaldi-help/0b924d96-1585-4960-8bd8-f778c5ac54f3%40googlegroups.com.

Rohith Ramagani

Aug 28, 2019, 1:43:17 PM
to kaldi-help
As you said, I have created the ARPA-format LM, but after running utils/format_lm.sh I got the warnings below.
Can I ignore them? If not, what do I have to do to sort them out?
---------------------------------------------------------------------------------------------------------------------------------
utils/format_lm.sh data/lang data/local/lm/language.tgz data/local/dict/lexicon.txt data/lang_test
Converting 'data/local/lm/language.tgz' to FST
arpa2fst --disambig-symbol=#0 --read-symbol-table=data/lang_test/words.txt - data/lang_test/G.fst 
LOG (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:94) Reading \data\ section.
LOG (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:149) Reading \1-grams: section.
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 459 [-3.1841 CHECK_UP -0.2848] skipped: word 'CHECK_UP' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 497 [-3.1841 EYES. -0.2999] skipped: word 'EYES.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 533 [-2.8831 HOUSE. -0.2801] skipped: word 'HOUSE.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 548 [-3.1841 IT. -0.2848] skipped: word 'IT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 558 [-3.1841 LEFT. -0.2988] skipped: word 'LEFT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 588 [-3.1841 MUCH.CAN -0.2959] skipped: word 'MUCH.CAN' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 617 [-2.8831 PM -0.2842] skipped: word 'PM' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 638 [-2.8831 ROOM. -0.2845] skipped: word 'ROOM.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 660 [-3.1841 SPACE. -0.2944] skipped: word 'SPACE.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 661 [-3.1841 SPARE.THERE -0.2979] skipped: word 'SPARE.THERE' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 683 [-3.1841 THAT. -0.2988] skipped: word 'THAT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 695 [-3.1841 TINY. -0.2848] skipped: word 'TINY.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 699 [-3.1841 TOGETHER. -0.2965] skipped: word 'TOGETHER.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 707 [-3.1841 VALUABILE -0.3005] skipped: word 'VALUABILE' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 709 [-3.1841 VERY, -0.3007] skipped: word 'VERY,' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 713 [-3.1841 VISIT. -0.2999] skipped: word 'VISIT.' not in symbol table
LOG (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:149) Reading \2-grams: section.
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 740 [-0.3010 7 PM 0.0000] skipped: word 'PM' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 791 [-1.6435 A ROOM. -0.1761] skipped: word 'ROOM.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 853 [-0.6021 BY THAT. 0.0000] skipped: word 'THAT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 866 [-0.3010 CHECK_UP </s> -0.3010] skipped: word 'CHECK_UP' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 913 [-0.3010 EYES. THEY -0.2430] skipped: word 'EYES.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 942 [-0.3010 GINGERBREAD HOUSE. -0.1761] skipped: word 'HOUSE.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 965 [-0.3010 HEALTH CHECK_UP 0.0000] skipped: word 'CHECK_UP' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 984 [-0.6021 HOUSE. </s> -0.3010] skipped: word 'HOUSE.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 985 [-0.6021 HOUSE. IT -0.2109] skipped: word 'HOUSE.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 1028 [-0.3010 IT. </s> -0.3010] skipped: word 'IT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 1039 [-0.3010 LEFT. HE -0.2730] skipped: word 'LEFT.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 1058 [-0.3010 MEAL TOGETHER. 0.0000] skipped: word 'TOGETHER.' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 1072 [-0.3010 MUCH.CAN YOU -0.1919] skipped: word 'MUCH.CAN' not in symbol table
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:219) line 1128 [-0.6021 PM </s> -0.3010] skipped: word 'PM' not in symbol table
LOG (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:149) Reading \3-grams: section.
WARNING (arpa2fst[5.5.424~1-b5385]:Read():arpa-file-parser.cc:259) Of 102 parse warnings, 30 were reported. Run program with --max_warnings=-1 to see all warnings
LOG (arpa2fst[5.5.424~1-b5385]:RemoveRedundantStates():arpa-lm-compiler.cc:359) Reduced num-states from 844 to 821
fstisstochastic data/lang_test/G.fst 
0.803737 -0.405488
Succeeded in formatting LM: 'data/local/lm/language.tgz'

Daniel Povey

Aug 28, 2019, 1:44:22 PM
to kaldi-help
It's because those words were not in your words.txt, but they mostly look like misspellings or instances where the punctuation was not stripped.
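
(Editor's note) To see every affected word at once rather than the 30 warnings arpa2fst prints, the unigram section of the LM can be compared against words.txt. The following is a rough sketch, assuming a gzipped ARPA file and a words.txt with one "word id" pair per line; `list_oov_unigrams` is a hypothetical helper name, not a Kaldi tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Kaldi): print each word from the \1-grams:
# section of a gzipped ARPA LM that has no entry in the given words.txt.
list_oov_unigrams() {
  local arpa_gz="$1" words="$2"
  gunzip -c "$arpa_gz" | awk -v words="$words" '
    BEGIN {
      # Load the first column of words.txt into a lookup table.
      while ((getline line < words) > 0) {
        split(line, f, " ")
        known[f[1]] = 1
      }
    }
    /^\\1-grams:/ { in_uni = 1; next }   # start of the unigram section
    /^\\/         { in_uni = 0 }         # any other section header ends it
    # Unigram lines look like: logprob word [backoff]
    in_uni && NF >= 2 && !($2 in known) { print $2 }
  '
}
```

For the run above this could be called as `list_oov_unigrams data/local/lm/language.tgz data/lang_test/words.txt`.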
You should probably do some text normalization to remove punctuation and maybe split sentences apart.
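
(Editor's note) A minimal sketch of that kind of normalization for LM training text follows. It is an illustration under stated assumptions, not a Kaldi script: `normalize_text` is a hypothetical name, every '.', '!' and '?' is treated as a sentence boundary (which would also split abbreviations and decimal numbers), and the output is upper-cased to match the words in the log. It does not fix misspellings such as VALUABILE, and it leaves tokens like CHECK_UP and PM unchanged, so those still need lexicon entries or corrections in the corpus.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read raw text on stdin, write one normalized sentence
# per line on stdout.
normalize_text() {
  tr '.!?' '\n\n\n' |                 # sentence-ending punctuation -> newline
    tr -d ',;:"()' |                  # drop other common punctuation
    tr '[:lower:]' '[:upper:]' |      # upper-case, as in the lexicon above
    tr -s ' ' |                       # squeeze repeated spaces
    sed -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d'                    # trim each line and drop empty lines
}
```

This is intended for the text used to train the LM. Applying it to a Kaldi transcript file that starts each line with an utterance ID would break the one-utterance-per-line layout, so such files would need punctuation removal without the sentence splitting.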

