Evaluation Platform


Gabriel Oberreuter

Jun 4, 2012, 1:31:21 PM
to PAN Workshop Series. Uncovering Plagiarism, Authorship, and Social Software Misuse.
Hi Tim, Martin,

You posted some information regarding the platform you will be running
the algorithms on, can you be more specific?
- What are the hardware characteristics of the machine?
- Do we have access to disk (for example to handle temporary files)?
- I understand that the corpus is generated based on the information
published in the Overview of the 1st International Competition on
Plagiarism Detection. Is there any update to the methods for generating
artificial plagiarism cases that should be made public?

thanks!

Martin Potthast

Jun 4, 2012, 4:23:09 PM
to pan-works...@googlegroups.com
Hi Gabriel,

> You posted some information regarding the platform you will be running
> the algorithms on, can you be more specific?
> - What are the hardware characteristics of the machine?
> - Do we have access to disk (for example to handle temporary files)?

Tim will answer these two.

> - I understand that the corpus is generated based on the information
> published in the Overview of the 1st International Competition on
> Plagiarism Detection. Is there any update to the methods for generating
> artificial plagiarism cases that should be made public?

With regard to that, the artificial test cases will be based on the
latest version of the random plagiarist that was also used last year.

Best,
Martin

--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de  ---  www.netspeak.org

tim gollub

Jun 5, 2012, 3:21:54 AM
to pan-works...@googlegroups.com
Hi Gabriel,
>> You posted some information regarding the platform you will be running
>> the algorithms on, can you be more specific?
>> - What are the hardware characteristics of the machine?
The programs are run on computers with two quad-core Intel(R) Xeon(R)
E5520 CPUs (@ 2.27GHz) and 70GiB RAM. We have six of these machines, and
we use TIRA to distribute single program runs across them. Since the
virtual machines eat up some of the RAM, use no more than 60GiB to
avoid swapping. If you want to know additional numbers, just ask.
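For anyone who wants a safety net against the 60GiB ceiling, here is a
rough sketch of how a Python program might cap itself at startup. It is
only an illustration: the function names are made up for this example,
it relies on the Unix-only resource module, and RLIMIT_AS limits virtual
address space, which is not exactly the same thing as RAM in use.

```python
import resource

GIB = 1024 ** 3  # bytes per GiB


def gib_to_bytes(gib):
    """Convert a size in GiB to bytes."""
    return int(gib * GIB)


def cap_address_space(limit_gib=60):
    """Lower the soft address-space limit to at most limit_gib.

    Hypothetical helper, not part of TIRA. Returns the soft limit
    (in bytes) that is in effect afterwards.
    """
    wanted = gib_to_bytes(limit_gib)
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Never try to exceed the hard limit set by the system.
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    # Keep an existing soft limit if it is already stricter.
    if soft != resource.RLIM_INFINITY and soft <= wanted:
        return soft
    resource.setrlimit(resource.RLIMIT_AS, (wanted, hard))
    return wanted


if __name__ == "__main__":
    print("soft limit in bytes:", cap_address_space(60))
```

With such a cap in place, an allocation beyond the limit fails with a
MemoryError inside the program instead of pushing the machine into swap.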
>> - Do we have access to disk (for example to handle temporary files)?
Yes, your program has sandboxed access to the file system. For each
program run (= each pair of suspicious and source document), TIRA
creates a new run directory and starts the program from there. Your
program can then write to and read from this directory ("./") and its
subdirectories ("./<subdir>/"), but it cannot access anything outside
of it ("../"). Your program should write the detection result XML into
an accessible directory.
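
To make the directory rule concrete, here is a minimal Python sketch of
a program that keeps its temporary files and its result XML inside the
run directory. The helper names, the "tmp" subdirectory, the output file
name, and the XML element and attribute names are all assumptions made
for illustration; please check the task description for the exact
result format.

```python
import os
import xml.etree.ElementTree as ET


def resolve_inside(run_dir, relative_path):
    """Return the absolute path of relative_path, refusing to leave run_dir."""
    root = os.path.realpath(run_dir)
    target = os.path.realpath(os.path.join(root, relative_path))
    if os.path.commonpath([root, target]) != root:
        raise ValueError("path escapes the run directory: " + relative_path)
    return target


def scratch_dir(run_dir=".", name="tmp"):
    """Create (if needed) a subdirectory for temporary files."""
    path = resolve_inside(run_dir, name)
    os.makedirs(path, exist_ok=True)
    return path


def write_result(run_dir, suspicious_name, detections, out_name="detections.xml"):
    """Write one result file; each detection is a dict of attributes.

    The element and attribute names are illustrative assumptions.
    """
    out_path = resolve_inside(run_dir, out_name)
    doc = ET.Element("document", reference=suspicious_name)
    for detection in detections:
        ET.SubElement(doc, "feature", {k: str(v) for k, v in detection.items()})
    ET.ElementTree(doc).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path


if __name__ == "__main__":
    tmp = scratch_dir(".")  # "./tmp/" is inside the run directory
    write_result(".", "suspicious-document.txt",
                 [{"name": "detected-plagiarism", "this_offset": 0, "this_length": 100}])
```

Here resolve_inside(".", "../other") raises a ValueError, which mirrors
the "../" restriction described above.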

Best regards,
Tim.