fabio celli <fabio...@live.it> wrote:
> Hi Scott
>
> coming to your questions:
> · Can you say anything about from where the data came?
>
>
> We collected the data from Twitter by means of an advertising campaign.
>
>
>
>
> o In particular, I'm interested to know how the labels were
> obtained. Are the personality scores gold standard manually
> collected from questionnaire (self-assessed or judged), or "silver
> standard," derived from automatic labelling?
>
>
> These labels are gold standard, self-assessed with the short Big Five
> test (BFI-10) and normalized between -0.5 and +0.5.
>
>
>
>
> · The website mentions providing personality in terms of "a)
> scores (between -0.5 and 0.5) and; b) binary classes (y/n that
> correspond to >0 and <=0)." However, there are no binary classes
> provided in any of the truth.txt files.
>
>
> Yes, we wanted to release both scores and binary classes, but in the
> end we preferred to distribute only scores, since there is some
> imbalance in the classes.
>
>
>
>
> o Related, binary distinction by 0 will result in unbalanced
> datasets (for example English Openness has only 4 users <=0). Is
> this the intention?
>
>
> Well, no. For this reason we suggest using scores rather than
> classes for training and testing.
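
The imbalance mentioned above can be checked with a few lines of Python. This is only a sketch, assuming the y/n rule quoted from the website (y for scores >0, n for scores <=0); the function names are hypothetical and not part of any PAN tooling.

```python
from typing import Dict, Iterable


def to_binary_class(score: float) -> str:
    """Map a score in [-0.5, 0.5] to 'y' (>0) or 'n' (<=0), per the quoted rule."""
    return "y" if score > 0 else "n"


def class_counts(scores: Iterable[float]) -> Dict[str, int]:
    """Count how many scores fall into each binary class."""
    counts = {"y": 0, "n": 0}
    for score in scores:
        counts[to_binary_class(score)] += 1
    return counts
```

Running `class_counts` over one trait column shows how lopsided a split at 0 is for that trait.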
>
>
> · We have only explored the English dataset so far, but in
> truth.txt, we found user213 to be listed twice, with different
> characteristics.
> o user213:::M:::35-49:::0.1:::0.1:::0.1:::0.2:::0.2
> o user213:::M:::18-24:::0.1:::0.4:::0.0:::0.1:::0.2
>
>
> This is surely an error in the anonymization. We will fix this ASAP.
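
A duplicate such as user213 can be found mechanically. The sketch below assumes only what the quoted lines show: one record per line, fields separated by ":::", user id first. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_duplicate_users(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Return {user_id: [records]} for every user id that occurs more than once."""
    seen = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id = line.split(":::", 1)[0]
        seen[user_id].append(line)
    return {uid: rows for uid, rows in seen.items() if len(rows) > 1}
```

It can be applied to an open file handle, for example `find_duplicate_users(open("truth.txt"))`.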
>
>
>
>
> · There are no labels in the truth.txt files. Though most
> traits are self-explanatory, can we assume that the personality
> values are in the order as listed in the output example? I.e., E, N,
> A, C, O?
>
>
> Yes, I confirm that the labels are in the order ENACO. The polarity
> of trait N is: >0 = stable; <0 = neurotic.
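
Given the confirmed ENACO order, a truth.txt line could be parsed roughly as follows. This is a sketch based on the two user213 lines quoted in this thread (user id, gender, age group, then five scores, separated by ":::"); the record type and field names are hypothetical.

```python
from typing import NamedTuple


class TruthRecord(NamedTuple):
    user_id: str
    gender: str
    age_group: str
    e: float  # trait E
    n: float  # trait N; per the reply above, >0 = stable, <0 = neurotic
    a: float  # trait A
    c: float  # trait C
    o: float  # trait O


def parse_truth_line(line: str) -> TruthRecord:
    """Split one truth.txt line on ':::' and convert the five scores to floats."""
    fields = line.strip().split(":::")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {line!r}")
    user_id, gender, age_group = fields[:3]
    scores = [float(value) for value in fields[3:]]
    return TruthRecord(user_id, gender, age_group, *scores)
```

For example, `parse_truth_line("user213:::M:::18-24:::0.1:::0.4:::0.0:::0.1:::0.2").n` gives the N score of that record.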
>
>
> o Though gender and age are in reverse order.
> · In the English data, there is one user (72) still labelled
> 50-64 instead of 50-XX
>
>
> We will fix this ASAP.
>
>
> thanks for the feedback!
>
>
>
>
> regards
>
>
> ============================
> Fabio Celli - post doc,
> University of Trento.
> Dept. of computer science (DISI): room 133
> Via Sommarive 5, Povo (TN)
> http://clic.cimec.unitn.it/fabio/
> ============================
>
>
> From: francisc...@autoritas.es
> Date: Tue, 17 Feb 2015 14:10:53 +0100
> Subject: Re: RE: PAN 2015: Author Profiling Task (Update)
> To: martin....@uni-weimar.de
> CC: p...@webis.de
>
> I'll try to fix all the issues with the data, but wrt. the personality
> questions, maybe Fabio could explain them better.
>
>
> 2015-02-17 14:05 GMT+01:00 Martin Potthast <martin....@uni-weimar.de>:
> This one is for Francisco.
> Martin
> ---------- Forwarded message ----------
> From: "NOWSON, Scott" <scott....@xrce.xerox.com>
> Date: Feb 17, 2015 12:08 PM
> Subject: RE: PAN 2015: Author Profiling Task (Update)
> To: "Martin Potthast" <martin....@uni-weimar.de>
> Cc: "PEREZ, Julien" <julien...@xrce.xerox.com>
>
>
>
>
>
>
>
>
>
> Hi Martin,
>
>
>
> Thanks for sending out the details of the data. I have a number of
> questions, if I may. Some about the data generally, and some on
> what seem to be issues.
>
>
>
>
> · Can you say anything about from where the data came?
>
> o In particular, I'm interested to know how the labels were obtained.
> Are the personality scores gold standard manually collected from
> questionnaire (self-assessed or judged), or "silver standard,"
> derived from automatic labelling?
>
> · The website mentions providing personality in terms of "a) scores
> (between -0.5 and 0.5) and; b) binary classes (y/n that correspond
> to >0 and <=0)." However, there are no binary classes provided in any
> of the truth.txt files.
>
> o Related, binary distinction by 0 will result in unbalanced datasets
> (for example English Openness has only 4 users <=0). Is this the
> intention?
>
> · We have only explored the English dataset so far, but in truth.txt,
> we found user213 to be listed twice, with different characteristics.
>
> o user213:::M:::35-49:::0.1:::0.1:::0.1:::0.2:::0.2
>
> o user213:::M:::18-24:::0.1:::0.4:::0.0:::0.1:::0.2
>
> · There are no labels in the truth.txt files. Though most traits are
> self-explanatory, can we assume that the personality values are in
> the order as listed in the output example? I.e., E, N, A, C, O?
>
> o Though gender and age are in reverse order.
>
> · In the English data, there is one user (72) still labelled 50-64
> instead of 50-XX
>
>
> I hope this is helpful feedback. I look forward to hearing from you.
>
>
>
> Cheers,
>
> Scott
>
>
>
> --
>
> Scott Nowson, Ph.D.
>
> Global Research Lead - Customer Modelling
>
> Xerox Research Centre Europe
>
> scott....@xrce.xerox.com
>
>
>
>
>
>
>
> -----Original Message-----
>
> From: martin....@gmail.com [mailto:martin....@gmail.com]
> On Behalf Of Martin Potthast
>
> Sent: 13 February 2015 11:39
>
> To: pan-workshop-series
>
> Subject: PAN 2015: Author Profiling Task (Update)
> --
>
> Francisco M. Rangel Pardo
> CTO Autoritas Consulting S.A.
> http://www.autoritas.es
> Twitter: @kicorangel
> Tel. +34 656 493 023
>