[Bots and Gender Profiling] Human bias in the bot documents?

Oren Halvani

Feb 20, 2019, 11:40:25 AM

to PAN Workshop Series on Digital Text Forensics

Dear Author Profiling organizers,

I would like to highlight an observation I've made regarding the provided training corpus.

There are several XML files in which the documents are quotations from famous people.

Here are a number of such examples, taken from the XML file "2855276210aea6dad744fdcbca0e633e.xml"

According to "train-truth.txt", this specific XML file represents an example of a bot:

2855276210aea6dad744fdcbca0e633e:::bot:::bot
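For anyone who wants to look up such labels programmatically, here is a minimal Python sketch of reading the truth file. It assumes, based only on the single line quoted above, that each line of train-truth.txt has the form `author_id:::type:::gender` and that bots carry `bot` in both label fields; `parse_truth` and `TruthEntry` are hypothetical names for illustration, not part of any official PAN tooling.

```python
from typing import Dict, NamedTuple


class TruthEntry(NamedTuple):
    """One labelled author from the truth file (format assumed, see lead-in)."""
    author_id: str
    kind: str    # assumed: "bot" or "human"
    gender: str  # assumed: "bot" for bots, otherwise the human's gender label


def parse_truth(text: str) -> Dict[str, TruthEntry]:
    """Parse 'author_id:::kind:::gender' lines into a dict keyed by author id."""
    entries: Dict[str, TruthEntry] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(":::")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 ':::'-separated fields, got {len(fields)}"
            )
        author_id, kind, gender = fields
        entries[author_id] = TruthEntry(author_id, kind, gender)
    return entries


if __name__ == "__main__":
    sample = "2855276210aea6dad744fdcbca0e633e:::bot:::bot\n"
    truth = parse_truth(sample)
    entry = truth["2855276210aea6dad744fdcbca0e633e"]
    print(entry.kind, entry.gender)  # bot bot
```

Splitting on the literal `:::` separator keeps the parser independent of whitespace inside fields, and the field-count check makes a malformed line fail loudly instead of silently producing a wrong label.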

However, the quotations come from humans.

Hence my question: isn't the given corpus somewhat biased?

Best regards

Oren Halvani

Francisco Rangel

Feb 21, 2019, 9:01:12 AM

to pan-workshop-series

Hi Oren, how are you?

You're right: this account automatically tweets quotations from famous people. By definition, software that automatically republishes content is often considered a bot.

Best regards,

--
You received this message because you are subscribed to the Google Group "PAN".
Visit this group at http://groups.google.com/group/pan-workshop-series
To unsubscribe send email to pan-workshop-se...@googlegroups.com.
---
You received this message because you are subscribed to the Google Groups "PAN Workshop Series on Digital Text Forensics" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pan-workshop-se...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Francisco M. Rangel Pardo

CTO Autoritas Consulting S.A.

http://www.autoritas.net

http://www.kicorangel.com

Twitter: @kicorangel

tlf. +34 656 493 023

Oren

Feb 24, 2019, 4:26:09 PM

to PAN Workshop Series on Digital Text Forensics

Hi Francisco,

>> Hi Oren, how are you?

Thanks, very well ;-)
I hope you are too...

>> By definition, software that automatically republishes content is often considered a bot.

OK, thank you very much for the clarification!
This simplifies the task :-)

Best regards
Oren