Hi Philipp and other WMT09 participants,
Great to see another shared task coming up.
However, I wonder whether promoting the use of non-free corpora in the
2010 shared task is counterproductive (see the WMT10 web page).
The 2nd edition French Gigaword corpus, for instance, costs $4000.00
for sites that are not LDC members in 2009.
When we initially discussed the idea of a shared task years ago, two
of the main ideas were:
(1) lower the barrier to participation in the shared task for sites
that might not have an LDC membership
(2) try to make the experiments repeatable later by anyone at any time
The latter is not true for the NIST MT evaluation, for instance. Its
licensing reduces participation (sites that do not already have access
to the data must wait until the evaluation license is available, which
leaves little time to build a competitive system) and makes it
difficult to repeat experiments later (the evaluation license is not
available at times when there is no NIST task).
Up until now the "free corpora only" policy for the WMT shared tasks
has been quite successful, and in my opinion it should be continued.
Cheers, Alex
On Fri, Dec 4, 2009 at 9:00 AM, Philipp Koehn <pko...@inf.ed.ac.uk> wrote:
> ACL 2010 FIFTH WORKSHOP ON STATISTICAL MACHINE TRANSLATION
> Shared Task: Machine Translation for European Languages
> July 15-16, in conjunction with ACL 2010 in Uppsala, Sweden
>
> http://www.statmt.org/wmt10/translation-task.html
>
> As part of the ACL WMT 2010 workshop, as in previous years, we organize
> a shared task on machine translation between European language pairs.
>
> Translation quality will be evaluated on an unseen test set of news stories.
> We provide a parallel corpus as training data, a baseline system, and
> additional resources for download. Participants may augment the baseline
> system or use their own system.
>
> The goals of the shared translation task are:
> * To investigate the applicability of current MT techniques when
> translating into languages other than English
> * To examine special challenges in translating between European languages,
> including word order differences and morphology
> * To create publicly available corpora for machine translation and machine
> translation evaluation
> * To generate up-to-date performance numbers for European languages in
> order to provide a basis of comparison in future research
> * To offer newcomers a smooth start with hands-on experience in
> state-of-the-art statistical machine translation methods
>
> We hope that both beginners and established research groups will participate
> in this task.
>
> You may participate in any or all of the following language pairs:
> * French-English
> * Spanish-English
> * German-English
> * Czech-English
>
> For all language pairs we will test translation in both directions. To have a
> common framework that allows for comparable results, and also to lower
> the barrier to entry, we provide a common training set and baseline system.
>
> Dates
> * December 4: Training data released
> * February 15: Test data released (available on this web site)
> * February 19: Results submissions
> * March 26: Short paper submissions (4 pages)
>
> Organizers
> * Chris Callison-Burch (Johns Hopkins University)
> * Philipp Koehn (University of Edinburgh)
> * Christof Monz (University of Amsterdam)
> * Kay Peterson (NIST)
>