Thanks for adding the extra documentation to the git repo.
I tried the "Example for using your own language model with existing online-nnet2 models" section of the new documentation, which gives instructions for building a new language model with the same vocabulary. However, during the conversion from ARPA to WFST, I got the error shown in the log below.
Could you give some hints on how to fix this error, please? My train.txt file contains only sentences, one per line, without punctuation or special tokens such as <s> and </s>.
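To make the train.txt description concrete, here is a minimal sketch of the check I have in mind. The helper name `find_bad_lines` is hypothetical, and the rule that only letters, digits, apostrophes and whitespace are allowed is my own assumption about what "no punctuation or special characters" means; it is not taken from the Kaldi documentation.

```python
import sys

# Sentence-boundary markers that should not appear in the training text.
FORBIDDEN_TOKENS = {"<s>", "</s>"}


def find_bad_lines(lines):
    """Return (line_number, reason) pairs for lines that break the format.

    Assumption: a valid line is non-empty and contains only letters,
    digits, apostrophes and whitespace.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            problems.append((number, "empty line"))
        elif any(token in FORBIDDEN_TOKENS for token in text.split()):
            problems.append((number, "contains <s> or </s>"))
        elif any(not (ch.isalnum() or ch.isspace() or ch == "'") for ch in text):
            problems.append((number, "contains punctuation or special characters"))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_train.py train.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for number, reason in find_bad_lines(handle):
            print(f"line {number}: {reason}")
```

If this prints nothing for train.txt, the file matches the format described above under those assumptions.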
--> generating a 28 word sequence
--> resulting phone sequence from L.fst corresponds to the word sequence
--> L.fst is OK
--> generating a 19 word sequence
--> resulting phone sequence from L_disambig.fst corresponds to the word sequence
--> L_disambig.fst is OK
Checking data/lang_own/oov.{txt, int} ...
--> 1 entry/entries in data/lang_own/oov.txt
--> data/lang_own/oov.int corresponds to data/lang_own/oov.txt
--> data/lang_own/oov.{txt, int} are OK
--> data/lang_own/L.fst is olabel sorted
--> data/lang_own/L_disambig.fst is olabel sorted
ERROR: FstHeader::Read: Bad FST header: data/lang_own/G.fst
--> ERROR: data/lang_own/G.fst is not ilabel sorted
awk: cmd. line:1: BEGIN{while((getline<disambig)>0) is_disambig[]=1; is_disambig[0] = 1; while((getline<words)>0){ if($1=="<s>"||$1=="</s>") is_forbidden[$2]=1;}} {if(NF<3 || is_disambig[$3]) print; else if(is_forbidden[$3] || is_forbidden[$4]) { print "Error: line " $0 " in G.fst contains forbidden symbol <s> or </s>" | "cat 1>&2"; exit(1); }}
awk: cmd. line:1: ^ syntax error
awk: cmd. line:1: error: invalid subscript expression
ERROR: FstHeader::Read: Bad FST header: data/lang_own/G.fst
--> ERROR: failure running command to check for disambig-sym loops [possibly G.fst contained the forbidden symbols <s> or </s>, or possibly some other error.. Output was:
fst type vector
arc type standard
input symbol table none
output symbol table none
# of states 0
# of arcs 0
initial state -1
# of final states 0
# of input/output epsilons 0
# of input epsilons 0
# of output epsilons 0
# of accessible states 0
# of coaccessible states 0
# of connected states 0
# of connected components 0
# of strongly conn components 0
input matcher y
output matcher y
input lookahead n
output lookahead n
expanded y
mutable y
error n
acceptor y
input deterministic y
output deterministic y
input/output epsilons n
input epsilons n
output epsilons n
input label sorted y
output label sorted y
weighted n
cyclic n
cyclic at initial state n
top sorted y
accessible y
coaccessible y
string y
--> G.fst did not contain cycles with only disambig symbols or epsilon on the input, and did not contain
the forbidden symbols <s> or </s> (if present in vocab) on the input or output.
--> ERROR (see error messages above)
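Since the log reports "Bad FST header" and 0 states, my guess (and it is only a guess) is that G.fst is empty or is not a binary FST at all. Below is a minimal sketch of how that could be checked. The helper name `describe_fst_file` is hypothetical; the sketch assumes the file was written in OpenFst's binary format on a little-endian machine, and it only inspects the leading magic number, not the rest of the header.

```python
import os
import struct

# Magic number at the start of OpenFst binary files (kFstMagicNumber in fst.h).
OPENFST_MAGIC = 2125659606


def describe_fst_file(path):
    """Classify a file by its first four bytes (little-endian assumption)."""
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as handle:
        head = handle.read(4)
    if len(head) < 4:
        return "too short"
    (magic,) = struct.unpack("<i", head)
    if magic == OPENFST_MAGIC:
        return "starts with the OpenFst magic number"
    return "bad header"


if __name__ == "__main__":
    # Path taken from the log above.
    print(describe_fst_file("data/lang_own/G.fst"))
```

A result of "empty" or "bad header" would suggest that the ARPA-to-FST step failed before validation ran, rather than a problem with the lang directory itself.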