Word-to-word alignment information in CDEC decoder

Sandipan Dandapat

Sep 5, 2014, 4:24:07 AM

to cdec-...@googlegroups.com

Moses allows us to retrieve word-to-word alignments during decoding using the following options:

-use-alignment-info -- to activate this feature (required for binarized ttables, see "Binary Phrase Table").
-print-alignment-info -- to output the word-to-word alignments into the verbose output.
-print-alignment-info-in-n-best -- to output the word-to-word alignments into the n-best lists.
-alignment-output-file outfilename -- to output word-to-word alignments into a separate file in a compact format (one line per sentence).

Is something similar available in cdec to obtain the alignment?

Felix H

Sep 5, 2014, 4:40:17 AM

to cdec-...@googlegroups.com

Hi Sandipan,

As far as I know, there is nothing as convenient as in Moses. But assuming you are using SCFGs, the word alignment information is always given in the rules you extracted for your input, i.e.

[X] ||| src ||| target ||| feats ||| alignments
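
As a rough illustration of that rule format, here is a minimal Python sketch that splits one rule line on `|||` and reads the last field as alignment links. It is not part of cdec: the function name `parse_rule` and the sample rule are made up, and it assumes the links are written as space-separated `i-j` pairs of 0-based positions in the rule's source and target sides.

```python
def parse_rule(line):
    """Split a '[X] ||| src ||| target ||| feats ||| alignments' rule line.

    Assumption: the alignment field holds space-separated 'i-j' pairs,
    where i indexes the rule's source side and j its target side.
    """
    fields = [f.strip() for f in line.split("|||")]
    if len(fields) < 5:
        raise ValueError("rule has no alignment field: %r" % line)
    lhs, src, tgt, feats, align = fields[:5]
    links = []
    for pair in align.split():
        i, j = pair.split("-")
        links.append((int(i), int(j)))
    return {
        "lhs": lhs,
        "src": src.split(),
        "tgt": tgt.split(),
        "feats": feats,
        "links": links,
    }


# Invented example rule, for illustration only.
rule = parse_rule("[X] ||| das haus ||| the house ||| SomeFeature=0.5 ||| 0-0 1-1")
```

Here `rule["links"]` holds the rule-local links as integer pairs; they are relative to the rule, not to the full sentence.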

If you need the alignments for the full sentence, you can run cdec with --extract_rules <fname>, which dumps the rules used in the Viterbi output.

In combination with --show_derivations <fname> you can probably recover the alignment for the full sentence.

Alternatively, you could write something like decoder/viterbi.h::ViterbiPathTraversal to collect the alignments from each rule at the Viterbi edges, along with the rule spans, and build the final alignment from them.
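
The idea of combining per-rule alignments with rule spans can be sketched in Python, independently of cdec's C++ code. This is only an illustration under stated assumptions: the `Node` class, the way the derivation tree is represented, and the nonterminal markers (`[X,1]` or `[1]`) are hypothetical choices for this example, and links are assumed to index positions in each rule's own source and target sides, nonterminals included.

```python
import re
from dataclasses import dataclass, field

# Assumed nonterminal markers: "[X,1]" or "[1]"; anything else is a terminal.
NT = re.compile(r"^\[(?:[^,\]]+,)?(\d+)\]$")


@dataclass
class Node:
    """One rule application in a (hypothetical) derivation tree."""
    src: list                 # source side tokens of the rule
    tgt: list                 # target side tokens of the rule
    links: list               # rule-local (src_idx, tgt_idx) pairs
    children: dict = field(default_factory=dict)  # nonterminal index -> Node


def nt_index(tok):
    m = NT.match(tok)
    return int(m.group(1)) if m else None


def side_len(node, side):
    """Number of words the subtree yields on the given side."""
    total = 0
    for tok in getattr(node, side):
        k = nt_index(tok)
        total += 1 if k is None else side_len(node.children[k], side)
    return total


def layout(node, side, offset):
    """Map rule-local terminal positions and child subtrees to sentence offsets."""
    term_pos, child_off, pos = {}, {}, offset
    for i, tok in enumerate(getattr(node, side)):
        k = nt_index(tok)
        if k is None:
            term_pos[i] = pos
            pos += 1
        else:
            child_off[k] = pos
            pos += side_len(node.children[k], side)
    return term_pos, child_off


def sentence_alignment(node, src_off=0, tgt_off=0):
    """Collect terminal-to-terminal links for the whole derivation."""
    s_pos, s_child = layout(node, "src", src_off)
    t_pos, t_child = layout(node, "tgt", tgt_off)
    out = [(s_pos[i], t_pos[j]) for i, j in node.links
           if i in s_pos and j in t_pos]
    for k, child in node.children.items():
        out.extend(sentence_alignment(child, s_child[k], t_child[k]))
    return sorted(out)


# Invented derivation with reordering: "geht nicht" -> "not goes".
child = Node(src=["geht"], tgt=["goes"], links=[(0, 0)])
root = Node(src=["[X,1]", "nicht"], tgt=["not", "[X,1]"],
            links=[(1, 0)], children={1: child})
alignment = sentence_alignment(root)
```

Links that touch a nonterminal position are skipped here, since the child rules supply the word-level links for that span. Building the `Node` tree from actual decoder output is left out, because it depends on how the derivation is dumped.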

Hope this helps,

Felix

Sandipan Dandapat

Sep 5, 2014, 6:16:09 AM

to cdec-...@googlegroups.com

Thanks Felix,
I understand now. However, when using the realtime.py script, I cannot figure out where to specify these options (please excuse my Python skills).

An additional follow-up question:
If I want to skip the MIRA-based retraining, which weights do I need to adjust in the weights.init file? In short, I deliberately do not want to tune my system, so which weights should I define? (Would keeping them all at zero be a problem?)
regards,
sandipan

Sandipan Dandapat

Sep 5, 2014, 8:41:30 AM

to cdec-...@googlegroups.com

I specified --extract_rules <fname> and --show_derivations <fname> in the cdec.ini file, but this does not work. My cdec.ini looks like this:
formalism=scfg
cubepruning_pop_limit=200
feature_function=WordPenalty
feature_function=ArityPenalty
add_pass_through_rules=true
extract_rules=/home/sdandapat/file1
show_derivations=/home/sdandapat/file2

It creates a directory <file1>, which contains an empty 0.gz file.
Please let me know what I am doing wrong.
Thanks and regards,
sandipan
