PAN Author Identification Task - Survey question...


Oren Halvani

Mar 10, 2014, 8:17:32 PM
to pan-works...@googlegroups.com
Hi folks,

since we missed the early-bird submission, I would like to know how well your current
approaches perform on the PAN 2014 AI training set. Is it difficult only for us, or is it difficult
for the majority of the participants to achieve good results?

We tested four different (and really complicated) methods and could only reach a maximum of ~60% accuracy (for the binary Y/N cases).
Compared to PAN 2013, this is a very bad result...

Therefore, I would really like to know how well your approaches perform before we give up...


Best regards,
Oren


Martin Potthast

Mar 11, 2014, 4:24:02 AM
to pan-workshop-series
Hi Oren,

just a side note about difficulty: we made an effort to provide
challenging corpora for author identification this year. The fact that
the performance values are numerically smaller than last year's doesn't
tell us much.

Martin



--
Martin Potthast
Bauhaus-Universität Weimar
www.webis.de --- www.netspeak.org

Efstathios Stamatatos

Mar 11, 2014, 5:09:45 AM
to pan-works...@googlegroups.com
Hi Oren,

Given the diversity of genres and properties of the corpus this year, it should be expected that some sub-corpora will be really challenging while others will be easier. So, don't be disappointed by comparisons of averages with the previous year's results.

Best,
Stathis

Halvani, Oren

Mar 11, 2014, 7:49:22 AM
to pan-works...@googlegroups.com
Hi Martin & Stathis,

Please do not misunderstand me ;-) I am just wondering whether we're the only group facing low results
or whether other participants have similar results for the current PAN 2014 AI corpus...

It's very good that you provide us with realistic corpora. However, I'm just curious about the current baseline
(or the average performance of the participants) for the given corpus...


Best regards,
Oren

________________________________________
From: pan-works...@googlegroups.com [pan-works...@googlegroups.com] on behalf of Efstathios Stamatatos [stama...@aegean.gr]
Sent: Tuesday, March 11, 2014 10:09
To: pan-works...@googlegroups.com
Subject: RE: [PAN'14] PAN Author Identification Task - Survey question...

Martin Potthast

Mar 11, 2014, 8:23:43 AM
to pan-workshop-series
Hi Oren,

no worries, asking is perfectly alright.

But as Stathis and I were saying, a performance value of 0.6 is not low,
as you call it. If it happens to be the best performance, then it is high!
Comparing this year's performances to last year's just by their
numerical values does not work.

Martin

Robert Layton

Mar 12, 2014, 1:29:28 AM
to pan-works...@googlegroups.com
As a benchmark, I ran code similar (but not identical) to my submission from last year and got around 60% on cross-validation for the author identification corpus.
That seems to be about the mark that needs to be beaten. I came third last year, but was a fair few percentage points behind the winner, who did really well.
Also, as a caveat, I didn't get time to put in an early-bird submission, so this score is based purely on the training data.
Dr. Robert Layton
Research Fellow
e: r.la...@icsl.com.au   t: @robertlayton
Internet Commerce Security Laboratory
University of Ballarat, Australia


Oren Halvani

Mar 12, 2014, 7:51:52 AM
to pan-works...@googlegroups.com
Hi Robert,

thanks for the info! Well, it looks like we are on the right track...
Let's see if we can improve our approaches in the coming weeks...

Good luck buddy ;-)


Best regards,
Oren