Dear Yurii,
The size of the test corpus is on the order of 1000 document pairs,
similar to the size of each of the training datasets.
However, the test corpus contains a mix of all kinds of obfuscation
methods as well as a small number of real plagiarism cases.
Hope this is helpful.
Regards,
Tim.
On 05/24/2012 11:05 AM, J.A. Palkovskii Plagiarism-Detector Project