Hello cdec users!
This is just a quick announcement about a non-backward-compatible
change that has been committed to cdec. Going forward, cdec parameter
optimization tools (MERT, PRO etc.) will no longer accept devsets
specified with parallel source/reference files. Instead, source
segments and their reference translations must be provided in a
single-line format: each line holds the source sentence followed by
any number of references, with the fields separated by a triple pipe
(|||). Such files can be constructed from parallel files using the
corpus/paste-files.pl command included in cdec.
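
For anyone who wants to see what the combined format looks like, here is
a minimal Python sketch of the conversion. It is only an illustration,
not the paste-files.pl script itself: the function name paste_lines and
the example sentences are hypothetical, and the single space on each side
of the triple pipe is an assumption about the expected spacing.

```python
def paste_lines(source_lines, *reference_lists):
    """Merge parallel line lists into 'source ||| ref1 ||| ref2' lines.

    Hypothetical helper for illustration; not part of cdec.
    """
    # Parallel files must line up one-to-one with the source.
    for refs in reference_lists:
        if len(refs) != len(source_lines):
            raise ValueError("source and reference line counts differ")

    merged = []
    for fields in zip(source_lines, *reference_lists):
        # Strip trailing newlines and join with the triple-pipe separator
        # (the surrounding spaces are an assumption).
        merged.append(" ||| ".join(field.strip() for field in fields))
    return merged


# Made-up example: one source segment with two reference translations.
source = ["das ist ein haus\n"]
ref_a = ["this is a house\n"]
ref_b = ["that is a house\n"]

for line in paste_lines(source, ref_a, ref_b):
    print(line)
# prints: das ist ein haus ||| this is a house ||| that is a house
```

In practice the lists would come from reading the parallel files line by
line, and the merged lines would be written to a single devset file.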
I prefer to avoid non-backward-compatible changes, but I'm trying to
simplify code and processing pipelines by getting rid of parallel
files, which will make things simpler to maintain going forward. My
apologies for any inconvenience.
Finally, cdec has a new website at the same location:
http://www.cdec-decoder.org -- check it out!
Best,
Chris