Dear all,
We have developed a simple algorithm for Twitter sentiment analysis and tested it on the SemEval 2017 general task of predicting the overall sentiment of a tweet. However, because we do not use word embeddings, many of the words in the task's test set do not appear in the training set, so the algorithm cannot perform well. We tried to use senti140 [a] (distant supervision; positive and negative labels only) as a proxy, creating a neutral class ourselves, but it is not performing well. We believe this is because the labeling differs between the two data sets: in the distantly supervised senti140, "the event is starting at 9 :) " would be marked positive, whereas in SemEval it would be marked neutral.
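To make the labeling mismatch concrete, here is a minimal Python sketch of emoticon-based distant labeling. The emoticon lists and the function name distant_label are illustrative assumptions, not the exact procedure used to build senti140.

```python
# Hypothetical emoticon lists, chosen only for illustration.
POSITIVE_EMOTICONS = (":)", ":-)", ":D", "=)")
NEGATIVE_EMOTICONS = (":(", ":-(")


def distant_label(tweet):
    """Assign a label from emoticons alone, ignoring the tweet's wording."""
    has_pos = any(e in tweet for e in POSITIVE_EMOTICONS)
    has_neg = any(e in tweet for e in NEGATIVE_EMOTICONS)
    if has_pos and not has_neg:
        return "positive"
    if has_neg and not has_pos:
        return "negative"
    # No emoticon, or mixed emoticons: the tweet gets no distant label.
    return None


# A factual announcement is labeled positive purely because of ":)",
# while a human annotator would most likely mark it neutral.
print(distant_label("the event is starting at 9 :) "))
```

Under such a scheme there is no natural source of neutral examples, which is why we had to construct the neutral class ourselves.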
So we would like to ask whether you know of any data set that contains a sufficient number of tweets labeled as positive, neutral, or negative, so that we can train on it and then test on the SemEval task.
Many thanks for your consideration,
Damianos Melidis