What's the relation between the repo nhpylm and the repo LatticeWordSegmentation?

Xiang Ji

Mar 22, 2018, 6:56:14 PM
to LatticeWordSegmentation
Hi, I'm looking for implementations related to the Teh 2006 and Mochihashi 2009 papers and found the nhpylm repo, which seems really promising. However, I'm a bit lost as to how it relates to the repo at https://github.com/fgnt/LatticeWordSegmentation. Both seem to share the same core C++ implementation of the nested hierarchical Pitman-Yor language model. So is it the case that nhpylm emphasizes Python bindings, while in that repo everything is done in C++? Are there any other differences in intended use cases and functionality? I want to modify the model and apply it to word segmentation, as described in Mochihashi 2009. Thanks.

Oliver Walter

Mar 24, 2018, 4:21:10 AM
to Xiang Ji, LatticeWordSegmentation

Hi Ji,

Both the Python version and the C++ version use the same implementation of the NHPYLM.

The Python version only provides bindings to train the language model on a segmented training corpus and to apply the language model to test sentences (characters, words, or a mixture of both). It also contains some demo code.

The C++ code serves an extended purpose. It implements the unsupervised segmentation of (partially) unsegmented data according to Mochihashi and our extensions (see papers by Heymann and Walter).

You could use the Python bindings as well, but you would have to reimplement the iterations and parsing of results.


   Oliver
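
For readers following the thread, the "iterations" mentioned in the reply can be pictured as a blocked Gibbs sampling loop in the style of Mochihashi 2009: remove a sentence's words from the model, resample its segmentation by forward filtering and backward sampling, then add the new words back. The sketch below is only an illustration under stated assumptions. `ToyUnigramLM`, `sample_segmentation`, and `train` are hypothetical names, and the toy unigram model stands in for the NHPYLM; none of this is the nhpylm binding API or the LatticeWordSegmentation code.

```python
import random
from collections import Counter


class ToyUnigramLM:
    """Hypothetical stand-in for the language model (NOT the nhpylm API)."""

    def __init__(self, alphabet_size=27, max_len=4):
        self.counts = Counter()
        self.total = 0
        self.alphabet_size = alphabet_size
        self.max_len = max_len  # longest word considered by the sampler

    def add(self, words):
        self.counts.update(words)
        self.total += len(words)

    def remove(self, words):
        self.counts.subtract(words)
        self.total -= len(words)

    def prob(self, word):
        # Smoothed unigram estimate; the base term favours short words and
        # keeps the probability of unseen words strictly positive.
        base = (0.5 / self.alphabet_size) ** len(word)
        return (self.counts[word] + base) / (self.total + 1.0)


def sample_segmentation(lm, text, rng):
    """Forward filtering, backward sampling over word boundaries."""
    n = len(text)
    # alpha[t] = total probability of all segmentations of text[:t].
    alpha = [1.0] + [0.0] * n
    for t in range(1, n + 1):
        for k in range(1, min(lm.max_len, t) + 1):
            alpha[t] += lm.prob(text[t - k:t]) * alpha[t - k]
    # Walk backwards from the end, sampling the length of the last word.
    words = []
    t = n
    while t > 0:
        lengths = list(range(1, min(lm.max_len, t) + 1))
        weights = [lm.prob(text[t - k:t]) * alpha[t - k] for k in lengths]
        k = rng.choices(lengths, weights=weights)[0]
        words.append(text[t - k:t])
        t -= k
    words.reverse()
    return words


def train(corpus, iterations=20, seed=0):
    """Blocked Gibbs sampling over unsegmented strings."""
    rng = random.Random(seed)
    lm = ToyUnigramLM()
    segmentations = []
    for text in corpus:  # initial segmentation of every sentence
        words = sample_segmentation(lm, text, rng)
        lm.add(words)
        segmentations.append(words)
    for _ in range(iterations):
        for i, text in enumerate(corpus):
            lm.remove(segmentations[i])  # take the sentence out of the model
            words = sample_segmentation(lm, text, rng)
            lm.add(words)  # put the resampled words back
            segmentations[i] = words
    return lm, segmentations
```

Calling `train(["thecat", "thedog", "acat"])` returns the model and one sampled segmentation per input string. With real bindings, the `add`, `remove`, and `prob` calls would have to be mapped onto whatever the binding actually exposes, which this sketch does not attempt to guess.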

