Dear Oliver!
Thanks a lot for updating the script. I am now trying it with Japanese
characters, but I get the following error.
*************
./LatticeWordSegmentation -KnownN 3 -UnkN 1 -InputFilesList
train.scp -NumIter 10 -OutputDirectoryBasename results/
-ReferenceTranscription train.ref -EvalInterval 1 -InitLM
train.word.txt -CalculateWER -UseViterby 1 -BeamWidth 1000
-PdinConcent 0.1-NoThreads 4
-----------------------------------------------------------------------------------------------------------
LatticeWordSegmentation: build date: May 29 2017 - time: 18:54:19
-----------------------------------------------------------------------------------------------------------
WARNING: Reference transcription contains more characters than input
transcription!
<eps> 0
<phi> 1
<sigma> 2
・
・
Starting word segmentation!
Initializing empty language model with KnownN=3, UnkN=1!
Parsing initialization sentences and initializing language model!
Sentence: 1 of 1
Word length: 0, word length probability: 0.0000
Word length: 1, word length probability: 0.0000
Word length: 2, word length probability: 0.3854
Word length: 3, word length probability: 0.4773
Word length: 4, word length probability: 0.0907
Word length: 5, word length probability: 0.0290
Word length: 6, word length probability: 0.0088
Word length: 7, word length probability: 0.0050
Word length: 8, word length probability: 0.0025
Word length: 9, word length probability: 0.0000
Word length: 10, word length probability: 0.0013
Mean word length: 2.8312, number of word: 794
Perplexity: 4.58
CHPYLM statistics:
Contexts: 1
Tables: 540
Characters: 2248
Concentration: 0.14
Discount: 0.66
WHPYLM statistics:
Contexts: 1 490 1573
Tables: 794 1797 2404
Words: 1797 2404 2808
Concentration: 0.08 0.20 0.14
Discount: 0.66 0.77 0.74
Training language model!
<eps> 0
<phi> 1
<sigma> 2
・
・
Params.PDicConcent = 0.10
Error : word あっ not found in language model.
*****************
I think this error is caused by the handling of 3-byte characters
(most Japanese characters take 3 bytes each in UTF-8).
How can I make the tool handle 3-byte characters?
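
To illustrate what I mean, here is a minimal sketch in Python. It is not
part of LatticeWordSegmentation, and the helper names utf8_summary and
bytewise_units are made up for this example. It only shows that the word
from the error message is 2 characters but 6 bytes in UTF-8, so any code
that walks the string one byte at a time would see 6 units instead of 2.
Whether the tool actually does this is only my assumption.

```python
# Illustration only: hypothetical helpers, not LatticeWordSegmentation code.

def utf8_summary(word):
    """Return (number of characters, number of UTF-8 bytes, bytes as hex)."""
    data = word.encode("utf-8")
    hex_bytes = " ".join(f"{b:02x}" for b in data)
    return len(word), len(data), hex_bytes


def bytewise_units(word):
    """Split a word into single bytes, as byte-oriented code would see it."""
    data = word.encode("utf-8")
    return [data[i:i + 1] for i in range(len(data))]


if __name__ == "__main__":
    word = "あっ"  # the word reported in the error message
    print(utf8_summary(word))         # (2, 6, 'e3 81 82 e3 81 a3')
    print(len(bytewise_units(word)))  # 6
```

If the symbol tables are built per byte on one side and per character on
the other, a lookup for the whole word could fail in this way.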
Thanks and kind regards,
Yuki.
---------------------------------------------------------
Tokyo Institute of Technology
School of Engineering, Department of Information and Communications
Engineering, Master's 1st year
Yuki Ikeshita
mail :
ikeshi...@m.titech.ac.jp
---------------------------------------------------------