Congratulations to the winners!
To my surprise, our rather simple approach turned out to be
pretty good. Also, by evaluating the development corpus, I have discovered
several ways to improve our existing system. So thanks also to the
competition organizers!
-Jan Kasprzak
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
>> If we wanted to trade simplicity and kewl design for usability I think <<
>> we all know the URL of the Apple Store. --jmorris42 @LWN <<
Martin and the other guys who organized this, thank you, that was a very
interesting challenge!
Martin, do I understand correctly that we can still submit a paper,
even though we took only 6th place?
I would like to express my gratitude to the people who worked hard on
the Plagiarism Detection Competition, and to Martin in particular!
You have not only developed a profound model of plagiarism detection
and effectiveness estimation, but you have also managed to overcome all
the difficulties that arose along the way. The developed framework (both
corpora) and the resulting system have no match (I have been working on
plagiarism detection for quite a long time, but to the best of my
knowledge this is the first time I have been so impressed by the
results).
It's so good to see that scientific research can solve really
interesting and hot problems today!
As one of the developers, I feel an enormous impulse to continue
polishing the detection algorithms and to make our results better in the
next competition.
I've seen a number of commercial solutions participating in the
competition, and I think it is great when a really objective comparison
is done.
I hope this will become a good annual scientific occasion!
I would definitely like to have a corpus without the large number
of "accidental" similarities that were not generated by the machine
plagiarist.
Also, it could be interesting to unify the source and suspicious
documents into one base, with "find all the similarities amongst them"
as a competition task.