Over the past couple of weeks, I've made a few small changes to cdec
that should make it easier to run MERT and/or PRO training in more
places and with better outcomes. First, there is finally some "getting
started" documentation available:
http://www.cdec-decoder.org/index.php?title=How_to_run_MERT_or_PRO
It does assume you know how to configure cdec with a translation and
language model (hopefully one day there will be documentation on that,
too). There is also better (and more correct) documentation in the
command line tools themselves.
Second, and this is of particular importance for existing users: you
MUST specify --qsub on the command line if you want dist-vest.pl or
dist-pro.pl to submit jobs using qsub. Otherwise, the scripts will run
the jobs locally. If cdec doesn't know how qsub is configured in your
environment, it will give you an error.
Third, if you've been using PRO, I've recently changed the semantics
of the regularization parameters from something like a "variance" to
something like a "penalty", i.e., the higher you make them, the more
you regularize. If you have tuned these values in the past, you will
need to retune them.
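
To illustrate the direction of the change, here is a minimal Python
sketch. It is not cdec's code, and the exact form of the objective is an
assumption: it supposes the old parameter acted like the variance
sigma^2 of a Gaussian prior (a term ||w||^2 / (2 * sigma^2)) and the new
one acts like an L2 penalty strength C (a term C * ||w||^2). The function
names are hypothetical. Under that assumption a small variance
corresponds to a large penalty, which is why old values cannot be reused
directly. Treat the converted value only as a starting point for
retuning.

```python
def variance_to_penalty(sigma_sq: float) -> float:
    """Map an assumed Gaussian-prior variance to an L2 penalty strength.

    Assumes the old term was ||w||^2 / (2 * sigma_sq) and the new term is
    C * ||w||^2, so C = 1 / (2 * sigma_sq). This is an illustration, not
    the formula cdec uses.
    """
    if sigma_sq <= 0:
        raise ValueError("variance must be positive")
    return 1.0 / (2.0 * sigma_sq)


def l2_term(weights, penalty: float) -> float:
    """Regularization term under the 'penalty' reading: C * ||w||^2."""
    return penalty * sum(w * w for w in weights)


if __name__ == "__main__":
    # A smaller variance maps to a larger penalty (more regularization).
    for sigma_sq in (10.0, 1.0, 0.1):
        print(sigma_sq, variance_to_penalty(sigma_sq))
```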
If you haven't updated recently, you are strongly encouraged to do so.
cdec is faster than ever, uses less memory than ever, and has more
scoring functions (feature functions) than ever!
-Chris
I would like to try running MERT with cdec, but the above link seems
to be broken. Is the documentation still available somewhere else?
My thanks to Hieu for the recompiled LM to make the example work.
Further, I am wondering whether cdec provides wrapper scripts for
distributed decoding in an SGE environment?
Thanks,
Joern
Please see the instructions for running MERT here:
http://www.cdec-decoder.org/index.php?title=How_to_run_MERT_or_PRO
If you have qsub set up, or large, shared-memory machines, it should
be relatively easy to get your environment to run the decoder and
optimizer in parallel. Let me know if you have any questions.
Chris
<seg grammar="/path/to/grammar.sent0.gz" id="0"> sentence 0 input </seg>
This is a little different from how other decoders do it, and it means
you can't do online translation of arbitrary inputs, but it is much
faster and less memory-intensive than other options.
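
As a rough sketch of how an input file in the format above could be
produced, the Python below wraps each sentence in a <seg> element that
points at its own grammar file. The function name make_seg_lines and the
path pattern /path/to/grammar.sent{idx}.gz are hypothetical (the pattern
is only inferred from the single example line), and the sentence text is
written as-is with no escaping, which may or may not match what cdec
expects.

```python
def make_seg_lines(sentences, grammar_pattern="/path/to/grammar.sent{idx}.gz"):
    """Yield one <seg> line per sentence, each naming its own grammar file.

    grammar_pattern is a hypothetical naming scheme; {idx} is replaced by
    the zero-based sentence index, which is also used as the seg id.
    """
    for idx, sentence in enumerate(sentences):
        grammar = grammar_pattern.format(idx=idx)
        yield f'<seg grammar="{grammar}" id="{idx}"> {sentence.strip()} </seg>'


if __name__ == "__main__":
    for line in make_seg_lines(["sentence 0 input", "sentence 1 input"]):
        print(line)
```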
-Chris