>> For author identification, each training dataset now contains a file contents.json in which the language of
>> each problem instance in the dataset is revealed. This file will also be present in the test dataset.
>> In your software, please use only this file to learn the language of a problem instance found in the dataset.
>> Do not rely on file names or folder names to do so.
Actually, this is a problem for us, since our software relies on the assumption that a sub-corpus consists only of problems in one language,
e.g. "DU" ---> "all problems are Dutch". The new specification, however, can be interpreted to mean that a sub-corpus can
contain problems in more than one language, which would break our software!
Is there any reason why the specification has changed? Why not keep it the same as in PAN13 and PAN14?
best regards,
Oren
________________________________________
From: pan-works...@googlegroups.com [pan-works...@googlegroups.com] on behalf of Martin Potthast [martin....@uni-weimar.de]
Sent: Tuesday, 3 March 2015 17:16
To: pan-workshop-series
Subject: [PAN'15] Training datasets updated for author identification and author profiling
Hi everyone,
Best,
Martin
--
You received this message because you are subscribed to the Google Group "PAN".
Visit this group at http://groups.google.com/group/pan-workshop-series
To unsubscribe send email to pan-workshop-se...@googlegroups.com.
Thanks a lot for the quick response!
>> no worries: all the sub-corpora are still in the same language.
Great, then everything is fine ;-)
Just one organisational question: Are the VMs going to be deployed this week?
best regards,
Oren
________________________________________
From: pan-works...@googlegroups.com [pan-works...@googlegroups.com] on behalf of Martin Potthast [martin....@uni-weimar.de]
Sent: Tuesday, 3 March 2015 22:20
To: pan-workshop-series
Subject: Re: [PAN'15] Training datasets updated for author identification and author profiling
>> Many VMs have already been deployed, and I can deploy one for you right now:
That would be great!
>> Have you answered to the welcome mail asking what OS you prefer?
Well, I can't remember receiving any welcome mail... However, I believe the options are
the same as last year, right? If so, I'll choose Windows 7 ;-)
Best regards,
Oren
________________________________________
From: pan-works...@googlegroups.com [pan-works...@googlegroups.com] on behalf of Martin Potthast [martin....@uni-weimar.de]
Sent: Tuesday, 3 March 2015 22:48
To: pan-workshop-series
Subject: Re: [PAN'15] Training datasets updated for author identification and author profiling
Hi Oren,
Best,
Martin
--