PAN 17: Author Profiling Training Dataset Available

Martin Potthast

Feb 22, 2017, 4:29:04 PM
to pan-workshop-series
Hi everyone,

The training dataset for author profiling has been uploaded: http://www.uni-weimar.de/medien/webis/corpora/corpus-pan-labs-09-today/pan-17/pan17-data/pan17-author-clustering-training-dataset-2017-02-15.zip

This year, author profiling will address gender and language variety. The Twitter corpus has been annotated with the authors' gender and the specific variety of their native language: English (Australia, Canada, Great Britain, Ireland, New Zealand, United States), Spanish (Argentina, Chile, Colombia, Mexico, Peru, Spain, Venezuela), Portuguese (Brazil, Portugal), and Arabic (Egypt, Gulf, Levantine, Maghrebi).

Please download it and start training your system.

Best,
Martin

--
Dr. Martin Potthast
Bauhaus-Universität Weimar
Digital Bauhaus Lab
Bauhausstr. 9a
99423 Weimar
Germany

+49 3643 58 3567
+49 171 809 1945

www.potthast.net

Roddy Fuentes Alba

Feb 22, 2017, 4:58:23 PM
to PAN Workshop Series on Digital Text Forensics
Hi Martin,

I would like to know how long the registration form will remain open.

Thanks!!!

smis...@illinois.edu

Feb 22, 2017, 4:58:24 PM
to PAN Workshop Series on Digital Text Forensics

Hi Martin,

Thanks for sharing the details. However, the link you provided is for the author clustering task. The correct link on the task page is: https://www.uni-weimar.de/medien/webis/corpora/corpus-pan-labs-09-today/pan-17/pan17-data/pan17-author-profiling-training-dataset-2017-02-22-password-protected.zip

The zip file at this link, however, requires a password to extract its contents. Could you please provide the password to the participants?

Regards,

Shubhanshu

Martin Potthast

Feb 22, 2017, 6:23:29 PM
to pan-workshop-series
The registration will be open basically forever. Once the deadline for this year has passed, you may already register for next year. :-)

Martin

--
--
You received this message because you are subscribed to the Google Group "PAN".
Visit this group at http://groups.google.com/group/pan-workshop-series
To unsubscribe send email to pan-workshop-series+unsub...@googlegroups.com.
---
You received this message because you are subscribed to the Google Groups "PAN Workshop Series on Digital Text Forensics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pan-workshop-series+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Nils Schaetti

Mar 8, 2017, 5:13:10 PM
to PAN Workshop Series on Digital Text Forensics
Hi Martin,

The link you provided points to the author clustering dataset. On the author profiling task page, the link points to a password-protected zip file. Can you please provide the password to the participants?

Best regards,

Nils

Roddy Fuentes Alba

Mar 8, 2017, 8:06:31 PM
to PAN Workshop Series on Digital Text Forensics
Hi Martin,

I noticed that the training corpus contains no documents annotated with the language variety "United States". However, the task description says that the test corpus may include documents with this variety.
Can you please let me know if this is some kind of mistake?
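
For anyone who wants to repeat this check on their own copy, here is a minimal sketch of how the missing label could be spotted. It assumes a ground-truth file with one line per author in the form `<author-id>:::<gender>:::<variety>`; that layout, the separator, the lowercase label spelling, and the helper name `missing_varieties` are all assumptions for illustration and are not stated anywhere in this thread.

```python
from collections import Counter

# English varieties as listed in the announcement at the top of this thread.
# The lowercase spelling of the labels is an assumption.
EXPECTED_ENGLISH = {
    "australia", "canada", "great britain",
    "ireland", "new zealand", "united states",
}


def missing_varieties(truth_lines, expected, sep=":::"):
    """Return the expected variety labels that never occur in truth_lines.

    Each line is assumed to look like '<author-id>:::<gender>:::<variety>'.
    Lines with fewer than three fields are skipped.
    """
    seen = Counter()
    for line in truth_lines:
        parts = line.strip().split(sep)
        if len(parts) < 3:
            continue  # malformed or empty line
        seen[parts[2].strip().lower()] += 1
    # A Counter returns 0 for labels it has never seen.
    return sorted(v for v in expected if seen[v] == 0)


if __name__ == "__main__":
    # Made-up sample lines, not taken from the real corpus.
    sample = ["a1:::female:::canada", "a2:::male:::ireland"]
    print(missing_varieties(sample, {"canada", "ireland", "united states"}))
```

With the real corpus, the lines would be read from whatever ground-truth file ships in the archive and compared against `EXPECTED_ENGLISH` (or the equivalent set for another language).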

Best regards,
Roddy


Francisco Rangel

Mar 9, 2017, 8:28:16 AM
to pan-workshop-series, p...@webis.de
Dear participants,

There was an error in packaging the dataset: we forgot to include the US variety.

We hope to fix the problem ASAP and upload a new version. We will let you know.

Thank you very much for letting us know, Roddy.

Best regards,

--
Francisco M. Rangel Pardo
CTO Autoritas Consulting S.A.
Twitter: @kicorangel

abo majed

Feb 16, 2018, 2:41:27 AM
to PAN Workshop Series on Digital Text Forensics

I have downloaded it, but it is protected with a password. Could you please give us the password?