Using NLTK's VADER for paragraphs (not single sentences)

Nathan Sherburn
Jun 26, 2016, 3:24:15 AM
to nltk-users
If I want the combined sentiment of a paragraph of text, is there any reason I can't just generate a score for the whole paragraph like this:

Code:
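import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")  # VADER's lexicon must be downloaded once

# Adjacent string literals are concatenated, so this is a single paragraph.
paragraph = ("It was one of the worst movies I've seen, despite good reviews. "
             "Unbelievably bad acting!! Poor direction. VERY poor production. "
             "The movie was bad. Very bad movie. VERY bad movie. VERY BAD movie. VERY BAD movie!")

sid = SentimentIntensityAnalyzer()
ss = sid.polarity_scores(paragraph)  # scores the whole paragraph as one unit

print(ss)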


Note: This code does produce seemingly valid output. My question is: is there any reason this output is not valid?

Output:
{
  'neg': 0.609,
  'neu': 0.391,
  'pos': 0.0,
  'compound': -0.9921
}
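One way to check whether the whole-paragraph score is misleading is to also score each sentence separately and compare the two. The sketch below is only an illustration: `split_sentences`, `compare_scores` and `make_vader_scorer` are hypothetical helpers, the sentence splitter is a deliberately naive regex (`nltk.sent_tokenize` would be more robust but needs the Punkt models downloaded), and averaging per-sentence `compound` values is just one possible aggregation, not something VADER itself prescribes. The scorer is passed in as a parameter so the helpers can be tried without NLTK installed.

```python
import re
from statistics import mean
from typing import Callable, Dict, List

# A scorer maps text to a VADER-style dict with 'neg', 'neu', 'pos', 'compound'.
Scorer = Callable[[str], Dict[str, float]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def compare_scores(paragraph: str, scorer: Scorer) -> Dict[str, float]:
    """Score the paragraph as one unit and as the mean of its sentences."""
    sentence_scores = [scorer(s)["compound"] for s in split_sentences(paragraph)]
    return {
        "whole_paragraph": scorer(paragraph)["compound"],
        "mean_of_sentences": mean(sentence_scores),
        "n_sentences": len(sentence_scores),
    }


def make_vader_scorer() -> Scorer:
    """Build NLTK's VADER scorer; imported lazily so the helpers work without NLTK."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
    return SentimentIntensityAnalyzer().polarity_scores
```

With the paragraph above, `compare_scores(paragraph, make_vader_scorer())` reports both numbers side by side. If the two disagree noticeably on your own texts, that is a hint the paragraph-level score deserves a closer look.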

Dimitriadis, A. (Alexis)
Jun 26, 2016, 6:03:13 PM
to nltk-...@googlegroups.com
Hi Nathan,

If the sentiment analyzer was trained on single sentences, it’s possible that it could be thrown off by being fed several sentences’ worth of words. But there’s only one way to tell: are you getting satisfactory performance on your test corpus?
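Checking performance on a test corpus can be as simple as mapping each `compound` score to a label and counting matches against hand-labelled texts. This is a minimal sketch under stated assumptions: `label_from_compound` and `accuracy` are hypothetical helpers, the labelled examples must come from your own data, and the ±0.05 cutoff is the one commonly cited for VADER's compound score but should be treated as a tunable assumption.

```python
from typing import Callable, Dict, Iterable, Tuple

# A scorer maps text to a VADER-style dict containing at least 'compound'.
Scorer = Callable[[str], Dict[str, float]]


def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a compound score to 'pos', 'neg' or 'neu' (cutoff is an assumption)."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


def accuracy(examples: Iterable[Tuple[str, str]], scorer: Scorer,
             threshold: float = 0.05) -> float:
    """Fraction of (text, gold_label) pairs whose predicted label matches."""
    pairs = list(examples)
    if not pairs:
        raise ValueError("need at least one labelled example")
    hits = sum(
        label_from_compound(scorer(text)["compound"], threshold) == gold
        for text, gold in pairs
    )
    return hits / len(pairs)
```

Running `accuracy` once with `SentimentIntensityAnalyzer().polarity_scores` on whole paragraphs, and once with a scorer that aggregates per-sentence scores, answers the question empirically for a given corpus.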

Alexis

Ewan Klein
Jun 27, 2016, 5:24:36 AM
to nltk-...@googlegroups.com

> On 27 Jun 2016, at 00:03, Dimitriadis, A. (Alexis) <A.Dimi...@uu.nl> wrote:
>
> If the sentiment analyzer was trained on single sentences, it’s possible that it could be thrown off by being fed several sentences worth of words. But there’s only one way to tell: Are you getting satisfactory performance on your test corpus?

VADER's approach is intended for single sentences and, more importantly, is tuned to the kind of vocabulary likely to occur in casual discourse such as Twitter posts and SMS messages.

I've tried it on multi-sentence texts (public comments on a planning proposal), and the results weren't great. As I recall, this was partly because of the particular vocabulary (e.g., "I object strongly..." comes out as slightly positive in VADER) and partly because, with this approach, a number of neutral sentences will outweigh one very negative sentence in the same text.
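If per-sentence scores are averaged, the dilution effect described above is plain arithmetic. The numbers in this sketch are made up for illustration (one strongly negative sentence among neutral ones) and are not VADER output; `label` and `aggregate` are hypothetical helpers, and taking the `min` is shown only as one alternative aggregation in which a single strong sentence is not washed out.

```python
from statistics import mean
from typing import List


def label(compound: float, threshold: float = 0.05) -> str:
    """Symmetric cutoff on a compound-style score in [-1, 1] (assumed threshold)."""
    if compound >= threshold:
        return "pos"
    if compound <= -threshold:
        return "neg"
    return "neu"


def aggregate(scores: List[float]) -> dict:
    """Compare mean and min aggregation of per-sentence compound scores."""
    return {"mean": mean(scores), "min": min(scores)}


# Hypothetical per-sentence scores: one very negative sentence plus n neutral ones.
for n_neutral in (2, 10, 20):
    scores = [-0.9] + [0.0] * n_neutral
    agg = aggregate(scores)
    print(n_neutral, round(agg["mean"], 3), label(agg["mean"]), label(agg["min"]))
```

With 20 neutral sentences the mean is -0.9 / 21 ≈ -0.043, which falls inside the neutral band, while the minimum stays at -0.9 and is still labelled negative.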

-- Ewan