$KALDI_ROOT/tools/srilm/bin/i686-m64/ngram-count -order 5 -text lang/dict/lexicon.txt -lm lm/amharic.train.lm.data.arpa -unk -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 -kndiscount5 -gt1min 1 -gt2min 1 -gt3min 1 -gt4min 1 -gt5min 1
# Convert to FST format for Kaldi
cat lm/amharic.train.lm.data.arpa | $KALDI_ROOT/egs/wsj/s5/utils/find_arpa_oovs.pl lang/words.txt > lang/oovs.txt
cat lm/amharic.train.lm.data.arpa | grep -v '<s> <s>' | grep -v '</s> <s>' | grep -v '</s> </s>' | arpa2fst - | fstprint | $KALDI_ROOT/egs/wsj/s5/utils/remove_oovs.pl lang/oovs.txt | $KALDI_ROOT/egs/wsj/s5/utils/eps2disambig.pl | $KALDI_ROOT/egs/wsj/s5/utils/s2eps.pl | fstcompile --isymbols=lang/words.txt --osymbols=lang/words.txt --keep_isymbols=false --keep_osymbols=false | fstrmepsilon > lm/G.fst
# Arc-sort G.fst on input labels (fstarcsort, from tools/openfst/bin) to fix "ERROR: data/lang/G.fst is not ilabel sorted"
fstarcsort --sort_type=ilabel lm/G.fst lm/newG.fst
mv lm/newG.fst lang/
mv lang/newG.fst lang/G.fst
#utils/validate_lang.pl lang
--
You received this message because you are subscribed to the Google Groups "kaldi-help" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kaldi-help+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Here is the output after running the commands you suggested (I added line numbers for clarity). As you can see, there is a space on line 9 and a '<s>' token on line 10. Could those lines be the cause? Should I edit or remove them?
On Monday, November 21, 2016 at 10:14:24 PM UTC+3, Dan Povey wrote:
one of modified KneserNey discounts is negative
error in discount estimator for order 1
/home/melese/toolkit/srilm/bin/i686-m64/ngram-count -order 5 -text lm/amharic.lm.data.segmented -lm lm/amharic.train.lm.data.arpa -unk -kndiscount1 -kndiscount2 -kndiscount3 -kndiscount4 -kndiscount5 -gt1min 1 -gt2min 1 -gt3min 1 -gt4min 1 -gt5min 1
However, the file passed to -text, amharic.lm.data.segmented, does not exist. To make the command run, I replaced it with lang/dict/lexicon.txt. I have opened an issue in the git repository about the missing file. Could that be the problem?
one of modified KneserNey discounts is negative
error in discount estimator for order 2
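A note on the question above: the modified Kneser-Ney discounts are estimated from count-of-counts (n1..n4, the number of distinct n-grams seen exactly 1, 2, 3 and 4 times). Input that is not running text, such as a lexicon where each entry appears once, can plausibly produce the "discounts is negative" error. The sketch below is not from the thread: `count_of_counts` is a hypothetical helper and the sample text is made up. It only tallies unigram count-of-counts with standard Unix tools and does not call SRILM.

```shell
# Hypothetical helper: print n1 n2 n3 n4, i.e. how many distinct words
# occur exactly 1, 2, 3 and 4 times in the given text file.
count_of_counts() {
  tr -s '[:space:]' '\n' < "$1" | grep -v '^$' | sort | uniq -c |
    awk '{ n[$1]++ } END { printf "%d %d %d %d\n", n[1], n[2], n[3], n[4] }'
}

# Made-up sample: "a" occurs 4 times, "b" 3, "c" 2, "d" 1.
sample=$(mktemp)
printf 'a b c d\na b c\na b\na\n' > "$sample"

coc=$(count_of_counts "$sample")
echo "n1 n2 n3 n4 = $coc"

rm -f "$sample"
```

Running it on the file given to -text, for example `count_of_counts lang/dict/lexicon.txt`, shows whether any of n1..n4 is zero or far out of proportion to the others. That would suggest the input is unsuitable for Kneser-Ney estimation.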
-wbdiscount -gt1min 1 -gt2min 1 -gt3min 1
- CPU: 1 core (Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz)
- RAM: 3.5GB
- HDD: 50GB
FstHeader::Read:Bad FST header: standard input
Error while loading shared libraries: libkaldi-fstext.so..
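A "Bad FST header: standard input" message usually means the tool received empty or non-FST data on its pipe, which is what happens when an earlier stage dies, for example on a shared-library load error like the one above. One way to find the failing stage is to write each stage to a file and check it before moving on. The following is a minimal sketch under that assumption. `run_stage` is a hypothetical helper, not part of Kaldi or OpenFst, and the demo commands are placeholders.

```shell
# Hypothetical helper: run one stage, writing its stdout to a file, and
# report failure if the command exits non-zero or writes nothing.
run_stage() {
  out=$1; shift
  if ! "$@" > "$out"; then
    echo "stage failed: $*" >&2
    return 1
  fi
  if [ ! -s "$out" ]; then
    echo "stage produced empty output: $*" >&2
    return 1
  fi
}

tmp=$(mktemp -d)

# Placeholder stages standing in for real pipeline steps.
ok_status=0;    run_stage "$tmp/ok.txt" echo "some output"      || ok_status=$?
empty_status=0; run_stage "$tmp/empty.txt" true  2>/dev/null    || empty_status=$?
fail_status=0;  run_stage "$tmp/fail.txt" false 2>/dev/null     || fail_status=$?

echo "$ok_status $empty_status $fail_status"
rm -rf "$tmp"
```

Applied to the long conversion pipeline earlier in the thread, each `|` stage would become one `run_stage` call reading the previous stage's file, so the first stage that fails or writes nothing is named explicitly.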