Sentiment140 training data set only contains Neg & Pos, but no Neutral?


Kevin Cocco

Apr 25, 2012, 9:04:25 PM
to sentim...@googlegroups.com
I was looking to test modeling with the "tweet corpus" data linked on: http://help.sentiment140.com/for-students

I only see 800k records marked 0/neg and 800k marked 4/pos. Is there no 2/neutral training data?

Here is how the data is described:
Format
Data file format has 6 fields:
0 - the polarity of the tweet (0 = negative, 2 = neutral, 4 = positive)
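
For anyone who wants to check the label distribution themselves, here is a minimal sketch. It assumes the file is plain comma-separated text with the polarity in the first field, as the description above says. The function name `count_polarity` and the sample rows are made up for illustration; they are not taken from the real corpus.

```python
import csv
import io
from collections import Counter

# Label meanings as given in the format description above.
POLARITY_NAMES = {"0": "negative", "2": "neutral", "4": "positive"}


def count_polarity(lines):
    """Count how many rows carry each polarity label (first CSV field)."""
    counts = Counter()
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        counts[POLARITY_NAMES.get(row[0], "unknown")] += 1
    return counts


# Made-up sample rows with 6 fields each; only field 0 matters here.
sample = io.StringIO(
    '"0","1","d","q","u","bad day"\n'
    '"4","2","d","q","u","great day"\n'
    '"0","3","d","q","u","so tired"\n'
)
counts = count_polarity(sample)
print(counts)
```

To run it on the downloaded file, pass an open file handle instead of the `StringIO` object (opened with `newline=""`; the right text encoding is something you may need to adjust). A missing "neutral" key in the result would match what I am seeing.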

Does anyone know where I could get a training dataset that includes neutral tweets?
Thank you
-Kevin Cocco


Kevin Cocco

Apr 26, 2012, 8:45:52 AM
to sentim...@googlegroups.com
I got an update: 

"We haven't posted a file with neutrals. I haven't really studied neutrals in that much detail. For the classifier on our site, we just used random tweets for our training data. I hope this helps."

Thanks for sharing data!
-Kevin

fj.mar...@gmail.com

May 21, 2012, 12:46:42 PM
to sentim...@googlegroups.com
Hi, does that corpus include data in Spanish or only in English?

Thanks in advance,
Fernando

Alec

May 22, 2012, 11:21:21 PM
to sentim...@googlegroups.com
Hi Fernando,

The corpus only includes English tweets for now.

Alec

diego...@gmail.com

Aug 24, 2012, 9:32:05 AM
to sentim...@googlegroups.com
Hi Fernando,

I'm also looking to make something work for the Spanish language, but I have no idea where to start.


KR.

sumi

Oct 27, 2013, 4:56:41 PM
to sentim...@googlegroups.com
Hi Kevin

I was just trying to use the Sentiment140 data for training and discovered the same thing: no neutrals. Because I do need them for training, I wondered if you found anything regarding this matter?

Thanks!
Greetings