Is there a way to improve the existing sentiment analysis?

punforgettable

Apr 21, 2015, 9:53:55 AM

to pattern-f...@googlegroups.com

For some reason TextBlob, which wraps Pattern's sentiment analysis, seems to return better sentiment scores than Pattern itself, although they seem to use the same base XML file.

Obviously you guys cannot speak to the source of a different project, but is there a native way to improve sentiment scoring without spinning up a new classifier? I ask because I like the ability to ask for assessments and want to use that in conjunction with the improved scoring.

If spinning up a new classifier is necessary, what do you recommend?

Thank you very much!

Tom De Smedt

Apr 21, 2015, 8:20:44 PM

to pattern-f...@googlegroups.com

The XML file is complemented with a Python algorithm that deals with negation ("not good"), intensity ("very good") and emoticons ("baaad >:-D"). This algorithm has changed a lot and might be entirely different from what is wrapped in TextBlob.
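To make the idea of "lexicon + algorithm" concrete, here is a minimal toy sketch of how negation and intensity can modify a lexicon score. It is not Pattern's actual implementation: the word lists, the scores, the 1.3 multiplier and the name toy_polarity are all made up for illustration, and emoticons are left out.

```python
# Toy lexicon: word -> polarity in [-1.0, 1.0]. Values are invented for illustration.
LEXICON = {"good": 0.7, "great": 0.8, "bad": -0.7}
# Hypothetical modifier lists; a real system has far larger ones.
INTENSIFIERS = {"very": 1.3, "really": 1.3}  # scale the next scored word
NEGATIONS = {"not", "never"}                  # flip the next scored word


def toy_polarity(text):
    """Average polarity of the lexicon words in text, adjusted by preceding modifiers."""
    scores = []
    negate, boost = False, 1.0
    for token in text.lower().split():
        word = token.strip(".,!?")
        if word in NEGATIONS:
            negate = True
        elif word in INTENSIFIERS:
            boost *= INTENSIFIERS[word]
        elif word in LEXICON:
            # Apply the pending intensifier, clamp to [-1, 1], then apply negation.
            score = max(-1.0, min(1.0, LEXICON[word] * boost))
            scores.append(-score if negate else score)
            negate, boost = False, 1.0  # modifiers are consumed by the scored word
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("not good"))   # negative: the negation flips "good"
print(toy_polarity("very good"))  # higher than plain "good"
```

Two systems that share the same lexicon file can still disagree on "not very good" simply because rules like these differ, which may be what you are seeing between the two libraries.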

The only way to reliably measure the accuracy of a sentiment analysis system is to compare its output to (thousands of) human assessments. "Seems better" is always a problem, because the accuracy is statistical and the system might be wrong about specific cases, which the human eye is very good at spotting; and humans tend to disagree about any personal opinion 30% of the time.
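As a rough sketch of what such a comparison looks like in code: the helper below takes any function that maps a text to a positive/negative label and scores it against human-labeled examples. The names accuracy and polarity_to_label are hypothetical, and the 0.1 cut-off for "positive" is an assumption you should tune for your own data.

```python
def polarity_to_label(polarity, threshold=0.1):
    """Map a polarity score to True (positive) or False (negative).

    The threshold is an assumed cut-off, not a fixed rule.
    """
    return polarity >= threshold


def accuracy(classify, labeled):
    """Fraction of (text, human_label) pairs for which classify(text) == human_label."""
    if not labeled:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, label in labeled if classify(text) == label)
    return correct / float(len(labeled))


# Usage with Pattern installed (assumes `reviews` is your own list of
# (text, is_positive) pairs labeled by humans):
#
#   from pattern.en import sentiment
#   print(accuracy(lambda t: polarity_to_label(sentiment(t)[0]), reviews))
```

Running the same labeled set through both libraries gives you two numbers to compare instead of an impression, provided the set is large enough (thousands of examples rather than a handful).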

Typical problems are domain adaptation (e.g., what works well on book reviews might not work very well on hotel reviews or political tweets) and sarcasm. The sentiment analysis in Pattern has been tested on book reviews and movie reviews. The accuracy has dropped by 1-2% with new updates to the algorithm, but this actually means that it has become stronger in other domains; in other words, it generalizes better.

Overall, classifiers will reach the same 70-80% accuracy as the lexicon + algorithm approach used by Pattern, unless you have a lot of training data and time to fine-tune the classifier. Classifiers offer a prediction, but they do not offer insight such as the assessments.

It is not difficult to extend Pattern's lexicon with your own scores:

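from pattern.en import sentiment

# Add custom entries to the lexicon (polarity ranges from -1.0 to +1.0).
sentiment.annotate("wicked party", polarity=0.7)
sentiment.annotate("nice job stupid", polarity=-0.9)
# The result is a (polarity, subjectivity) tuple.
print(sentiment("wicked party this weekend!"))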

Have a look at the Sentiment.annotate() method in pattern/text/__init__.py

If you do want to use classifiers, use SVM and focus on lots of high-quality training data instead of tweaking parameter values.

Tom
