I have to apologize for delaying the release of the annotated test
collections for so long, but now they're finally available. Please visit
the task Web pages to find the respective downloads.
@Task 1:
We have simply recompiled the whole test collection, this time
including the annotations in the XML files. You will find a couple of
new attributes to the feature tags, detailing the variables you read
about in our COLING paper. Please note that this is NOT the final
release of the PAN-PC-10. We will compile this release some time after
the workshop.
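To see which attributes the feature tags in a given annotation file actually carry, something like the following may help. It is only a minimal sketch using Python's standard library; the sample document and its attribute names (including example_attr) are made up for illustration and are not taken from the released files.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def feature_attribute_counts(xml_text):
    """Count how often each attribute name occurs on <feature> tags."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for feature in root.iter("feature"):
        counts.update(feature.attrib.keys())
    return counts


# Hypothetical annotation file, invented for this example only.
SAMPLE = """<document reference="suspicious-document00001.txt">
  <feature name="plagiarism" this_offset="10" this_length="50"/>
  <feature name="plagiarism" this_offset="200" this_length="30" example_attr="x"/>
</document>"""

if __name__ == "__main__":
    # Print every attribute name together with its number of occurrences.
    for attr, n in sorted(feature_attribute_counts(SAMPLE).items()):
        print(attr, n)
```

Running this over an old and a new annotation file and comparing the two listings would show which attributes are new.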
Along with the annotations we also release a new version of the perfmeasures
script. Please use this script instead of the old one. There was an
error in the computation of the micro-averaged performance values,
sometimes resulting in precision and recall values above 1.
Fortunately, the implementation of the macro-averaged performance
values was not affected by this error... but as you can imagine, this gave
me the creeps. ;-)
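For readers unfamiliar with the two averaging schemes, here is a generic sketch of micro- versus macro-averaged precision and recall over sets of character positions. It is not the code from perfmeasures.py, and the exact definitions used there may differ; the function names and the toy data are assumptions made for illustration. It also shows why correctly computed values cannot exceed 1: the overlap is never larger than either of the two sets.

```python
def precision_recall(detected, actual):
    """Precision and recall for one pair of character-position sets."""
    overlap = len(detected & actual)
    precision = overlap / len(detected) if detected else 0.0
    recall = overlap / len(actual) if actual else 0.0
    return precision, recall


def macro_average(pairs):
    """Average the per-pair precision and recall values."""
    scores = [precision_recall(d, a) for d, a in pairs]
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


def micro_average(pairs):
    """Pool all character counts first, then compute one precision and recall."""
    overlap = sum(len(d & a) for d, a in pairs)
    n_detected = sum(len(d) for d, _ in pairs)
    n_actual = sum(len(a) for _, a in pairs)
    precision = overlap / n_detected if n_detected else 0.0
    recall = overlap / n_actual if n_actual else 0.0
    return precision, recall


# Toy data (hypothetical): one short, half-correct detection and one
# long, fully correct detection.
PAIRS = [
    (set(range(0, 10)), set(range(5, 15))),
    (set(range(0, 100)), set(range(0, 100))),
]

if __name__ == "__main__":
    print("macro:", macro_average(PAIRS))
    print("micro:", micro_average(PAIRS))
```

In the toy data the long detection dominates the micro average, while the macro average weights both pairs equally.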
@Task 2:
The annotations for the test collection are those by which your
algorithms were judged. We release these annotations in the format of
the result submissions for your convenience. Please note that this is
NOT the final release of the PAN-WVC-10. Again, this corpus will be
provided later on.
Best regards,
Martin Potthast
--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de --- www.netspeak.cc
--
You received this message because you are subscribed to the Google Group "PAN".
Visit this group at http://groups.google.com/group/pan-workshop-series
To unsubscribe send email to pan-workshop-se...@googlegroups.com.
> Could you possibly release the ROC performance measures script that you have
> been using for the second task? Or was it Tom Fawcett's ROC_algs.tar.gz
> (http://home.comcast.net/~tom.fawcett/public_html/ROCCH/index.html)
> mentioned in the referenced paper?
No problem: it was the implementation used at KDDCUP 2004.
http://kodiak.cs.cornell.edu/kddcup/software.html
Best,
Martin
>Along with the annotations we also release a new version of the perfmeasures
>script. Please use this script instead of the old one.
Is it the one currently at http://pan.webis.de/, or is there another
link for download?
Thanks,
Salha
> Is it the one currently at http://pan.webis.de/, or is there another
> link for download?
Look here at the bottom:
http://www.uni-weimar.de/medien/webis/research/workshopseries/pan-10/task1-plagiarism-detection.html#measures
Or use the direct link:
http://www.uni-weimar.de/medien/webis/research/corpora/pan-pc-09/perfmeasures.py
Martin
> I guess the download link contains the test corpus with updated
> annotations (XMLs). Could we get only the annotations (XMLs only)?
> We will have very low bandwidth for a week or so, and the download
> would take around 24 hrs. There are also only 2 days left until the
> submission, so could you provide just the XMLs, zipped, so that we
> can include some error analysis in the report? If not on the Web
> page, could you send me an external link to the zipped file? We
> would be much obliged. I hope you understand the concern.
Yes, good idea! I have changed the downloads so that the annotations
alone can be downloaded, in addition to the full test collection.
So, visit the Web pages and you'll find everything as requested (mind
your browser cache, though).
Martin
Martin, would it be possible to release the exact version
of the evaluation script (perfmeasures.py) that was used for
computing the official results? I am getting slightly different
output with the current perfmeasures.py
(MD5 202d4a05c71d40dddca07db47082fe9b, downloaded ~12 hours ago),
with the original one I downloaded some time before the deadline
(MD5 822ad6c12c57f18c217f4e9d3202b4b7), and even with my own
implementation in Perl.
If anyone is interested, my detections can be downloaded
from http://www.fi.muni.cz/~kas/tmp/plagiarism-kasprzak-2010-06-23-1418.zip
(MD5 77c3b8bb6654b2b7bda8427483884fa6; the file will expire in ~10 days).
Thanks,
-Jan Kasprzak
--
| Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox
See my earlier answer to Parth's request.
Martin
> I got an error from the RAR file while extracting, but it still
> extracted some files. Are there only 9188 suspicious files in the
> /suspicious_document/ directory, or more? In the Source/ directory I
> got 11148 docs, so that must be fine. But please verify the number of
> suspicious docs.
I have just now inflated the archive which is on the Web pages, and it
worked fine. Did you verify the MD5 sum?
There are exactly as many XML files in this RAR as dummy XML files in
the test collection: 15925.
Best,
Martin
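For anyone unsure how to check a download against a published MD5 sum, a minimal sketch with Python's hashlib follows. The helper names and the file name in the usage note are assumptions for illustration; the expected sum has to be taken from wherever it is published.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_md5):
    """Compare a file's MD5 digest with an expected hex string."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Usage would look like matches("annotations.rar", "<published MD5 sum>"), where the file name is hypothetical. A mismatch usually points to an incomplete or corrupted download, which would also explain extraction errors.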
We got the same error. Some double-checking revealed that the
performance measure script is fine, but that some error must have
occurred while generating the fully annotated test collection. We are
investigating this issue and will get back to you as soon as possible.
Martin