Detailed Document Comparison Test Corpus Size


J.A. Palkovskii Plagiarism-Detector Project Leading Programmer

May 24, 2012, 5:05:12 AM
to PAN Workshop Series. Uncovering Plagiarism, Authorship, and Social Software Misuse.
Dear Martin,

Could you please comment on the size of the test corpus for the
Detailed Document Comparison subtask: is it the same as the training
corpus, or different? If it differs, an approximate percentage
difference would be much appreciated!

Thank you in advance!
--
Best regards, Yurii

tim gollub

May 24, 2012, 5:19:44 AM
to pan-works...@googlegroups.com
Dear Yurii,
the size of the test corpus is on the order of 1000 document pairs,
similar to the size of each of the training datasets.
However, the test corpus contains a mix of all kinds of obfuscation
methods, as well as a small number of real plagiarism cases.
Hope this is helpful,
regards,
Tim.

