Dear participant,
We have something new planned for this year's text alignment task.
Instead of inviting implementations of new text alignment approaches,
we invite you to submit a text alignment corpus of your own design.
Since 2009, we have constructed and released a new text alignment
evaluation corpus almost every year. We have invented new ways to
generate text reuse and plagiarism and new ways to obfuscate it
automatically, all of that at large scale.
Now, we believe, it is time to further diversify the corpus creation
efforts. Many of you have your own ideas of what instances of text
reuse and plagiarism an evaluation corpus should consist of. For
example, text reuse can be found in many different genres of writing,
and we have still explored only a few languages, let alone
cross-language text reuse. This is the opportunity to execute on your
ideas and let the whole community around this task benefit from your
efforts.
Thanks to TIRA and our software submission initiative, we are now in a
position to create an exciting challenge around corpus construction:
every corpus submitted this year will be fed into the text alignment
prototypes that were submitted in previous years.
This way, we can, to some extent, assess the validity of a corpus as
well as its difficulty. To further assess corpus validity, each
submitted corpus will be made available to all other participants so
they can peer-review its instances of text reuse and plagiarism and
answer the question: how realistic are the problem instances?
To the best of our knowledge, this is the first time that corpus
creation has been done in this way in a shared task, and we hope you
will pick up the challenge and contribute in order to create a
community-driven text reuse and plagiarism corpus for PAN 2015 and
beyond.
Please take a quick look at the task web page for details:
http://www.uni-weimar.de/medien/webis/research/events/pan-15/pan15-web/plagiarism-detection.html
If you have any questions, please don't hesitate to ask.
Best,
Martin
--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de
www.netspeak.org