Class-based LM in cdec


Junhui Li

Feb 14, 2014, 4:17:22 PM
to cdec-...@googlegroups.com
Hi all,

I saw that the paper "Translating into Morphologically Rich Languages with Synthetic Phrases" (http://aclweb.org/anthology//D/D13/D13-1174.pdf) used a class-based n-gram language model in cdec. Does this mean the latest cdec supports class-based LMs? If so, could anyone kindly explain how to use them?

Thanks,

Junhui

Chris Dyer

Feb 14, 2014, 11:27:41 PM
to <cdec-users@googlegroups.com>
You can enable the class-based LM by adding the following line to your
cdec.ini file:

feature_function=KLanguageModel -n ClassLM -m /path/to/classmap.gz /path/to/classLM.klm

The class map has the following form:

11111111111000 front -0.972053782688111
11111111111000 terms -1.2822293653211
11111111111000 favor -2.59594683726262
11111111111000 favour -3.03623129459572
11111111111000 excess -3.16138051519471

The first column is the class ID, the second column is the word, and
the third column is the emission log-probability of the word given its
class (you can set this to 0 if it's a pain to compute; it typically
doesn't help much).
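
To make the three-column layout concrete, here is a minimal Python sketch that reads a class map in the format shown above. It is not part of cdec; the function names `parse_class_map` and `load_class_map` are hypothetical, and treating a missing third column as 0 is an assumption made for convenience.

```python
import gzip

def parse_class_map(lines):
    """Parse 'class_id word score' lines into {word: (class_id, score)}."""
    class_map = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        class_id, word = fields[0], fields[1]
        # Assumption: if the third column is absent, use 0 as the score.
        score = float(fields[2]) if len(fields) > 2 else 0.0
        class_map[word] = (class_id, score)
    return class_map

def load_class_map(path):
    """Read a gzipped class map file (hypothetical helper, not a cdec API)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return parse_class_map(handle)

# Two rows copied from the example class map above.
sample = [
    "11111111111000 front -0.972053782688111",
    "11111111111000 terms -1.2822293653211",
]
class_map = parse_class_map(sample)
print(class_map["front"][0])  # class ID of the word "front"
```

Keeping the class ID as a string preserves leading zeros and avoids treating the bit-string IDs as numbers.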

The classLM.klm file is a language model built from a corpus in which
every token has been replaced with its class ID (or <unk> for OOVs).
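
The token replacement step could be sketched in Python roughly as follows. This is only an illustration: `to_class_ids` and the small `word_to_class` dictionary are hypothetical, and in practice the mapping would come from your own class map file. Training the LM itself (for instance with KenLM) is a separate step not shown here.

```python
def to_class_ids(sentence, word_to_class, unk="<unk>"):
    """Replace each whitespace-separated token with its class ID.

    Tokens missing from the mapping (OOVs) become the unk symbol.
    """
    return " ".join(word_to_class.get(token, unk) for token in sentence.split())

# Hypothetical mapping built from two rows of the example class map.
word_to_class = {
    "front": "11111111111000",
    "terms": "11111111111000",
}

mapped = to_class_ids("front runner terms", word_to_class)
print(mapped)  # 11111111111000 <unk> 11111111111000
```

Running this over every line of the training corpus yields the class-ID corpus from which the LM is estimated.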

Hope this helps,
Chris