HUGE MEMORY usage of LatticeWordSegmentation

Xiaobin

Dec 19, 2017, 5:09:27 AM
to latticewordsegmentation
Hi,
I found your excellent work LatticeWordSegmentation on GitHub
(https://github.com/fgnt/LatticeWordSegmentation), but when I tried to run
it on the SIGHAN 2005 MSR training data (80,000+ sentences) using the
script StarSim_text_fixed.bash, I ran out of memory. (My machine has 32 GB
of RAM.) How much memory does the program need in total? And do you have
any suggestions for reducing the memory usage?

Thank you very much

best regards

Bin

Thomas Glarner

Dec 22, 2017, 10:36:27 AM
to latticeword...@googlegroups.com
Hello Bin,

Thank you for your kind words.
Unfortunately, the program can indeed be very memory-hungry, and there
is no fixed upper bound on its total memory usage.

The reasons are:
1) The necessity to store multiple WFSTs for every utterance. This
   scales with the number of utterances M times the number of links
   per transducer L_m, where L_m depends on the length of the
   utterance and on the acoustic uncertainty; the total is thus
   roughly O(M · L_max).
2) The unboundedness of the (nested) HPY language models. The upper
   bound on the language model's memory usage scales exponentially
   with the number of discovered words and with the n-gram order.
3) The absence of hard-disk caching. This is deliberate: the system
   would otherwise slow down massively due to excessive file I/O.
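To make point 1 concrete, here is a purely illustrative
back-of-the-envelope estimate; the average lattice size and the cost
per link are assumptions, not measured values of the implementation:

    80,000 utterances x 5,000 links x 100 bytes/link = 40 GB

Under these assumed numbers, the lattices alone would already exceed
your 32 GB before a single language model count is stored.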

Two ways to compensate are:
1) Using lower n-gram orders through the flags
   -KnownN, -UnkN and -AddCharN.
2) Making use of the pruning option, which reduces the number of links
   in the input WFST lattices. The relevant flag is
   -PruneFactor: prune paths in the input lattice whose score is more
   than PruneFactor times the score of the lowest-scoring path
   (usage: -PruneFactor X; default inf, i.e. no pruning).
A sketch combining both measures follows below.
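As a hypothetical illustration only (the four flags are the ones listed
above, but the binary name, the concrete values and the remaining
arguments are assumptions; adapt them from your StarSim_text_fixed.bash
setup):

    # Assumed conservative starting point: unigram orders, strong pruning.
    LatticeWordSegmentation \
        -KnownN 1 -UnkN 1 -AddCharN 1 \
        -PruneFactor 2 \
        <your remaining input/output arguments>

The low orders and the low prune factor keep memory consumption down,
at the cost of segmentation quality.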

The best choice of parameters will depend on the combination of
database and acoustic decoder, so you might need to try out a variety
of combinations. Start with low n-gram orders (especially for the
word LM) and with strong pruning through a low pruning factor, then
gradually increase these parameters, e.g. with a small sweep like the
one below.
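A minimal sketch of such a sweep, assuming your existing call is
wrapped in a hypothetical helper run_segmentation that forwards extra
flags to the program:

    # Try increasingly expensive settings: higher word-LM order,
    # weaker pruning. Stop once memory usage or runtime becomes
    # unacceptable.
    for KNOWN_N in 1 2 3; do
        for PRUNE_FACTOR in 2 4 8; do
            run_segmentation -KnownN "$KNOWN_N" -UnkN 1 -AddCharN 1 \
                             -PruneFactor "$PRUNE_FACTOR"
        done
    done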

Best regards
Thomas