WordNet synset frequencies

Benjamin Goldenberg

Jul 6, 2009, 7:40:32 PM

to nltk-...@googlegroups.com

Hello all,

I'm working to classify utterances as having either a positive or
negative sentiment. One of the techniques I'm exploring is using the
SentiWordNet database to assign sentiment scores to individual words.
The database contains two scores, positive and negative, for each
synset in WordNet. But, I only have the raw text and POS of the
utterance and need to determine the appropriate synset(s). Is there a
way to get frequencies for each possible synset given a lemma word, so
I can weight the scores appropriately? If not, does anyone know of a
data source that provides these frequencies?
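One way to do the weighting described above, as a minimal sketch under stated assumptions: each candidate synset for the lemma/POS pair comes with a sense frequency count and a positive/negative score pair. NLTK's WordNet reader exposes per-lemma counts through `Lemma.count()`, which is one possible source. The `SenseScore` type, the `frequency_weighted_sentiment` helper, and all numbers below are hypothetical placeholders, not real SentiWordNet values. Add-one smoothing keeps senses with a zero count from being dropped entirely.

```python
from dataclasses import dataclass


@dataclass
class SenseScore:
    """One candidate synset: its sense frequency and its sentiment scores (hypothetical)."""
    count: int   # how often this sense was observed (source is an assumption)
    pos: float   # positive score for the synset
    neg: float   # negative score for the synset


def frequency_weighted_sentiment(senses, smoothing=1.0):
    """Return (pos, neg) averaged over senses, each weighted by count + smoothing."""
    if not senses:
        return (0.0, 0.0)
    weights = [s.count + smoothing for s in senses]
    total = sum(weights)
    pos = sum(w * s.pos for w, s in zip(weights, senses)) / total
    neg = sum(w * s.neg for w, s in zip(weights, senses)) / total
    return (pos, neg)


# Hypothetical senses for one adjective, most frequent first.
senses = [
    SenseScore(count=8, pos=0.75, neg=0.0),
    SenseScore(count=1, pos=0.0, neg=0.5),
    SenseScore(count=0, pos=0.25, neg=0.25),
]
print(frequency_weighted_sentiment(senses))
```

The smoothing constant is a judgment call: a larger value pulls the result toward a plain unweighted average of all senses.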

Thanks,
Ben

Keith Stevens

Jul 6, 2009, 9:04:27 PM

to nltk-...@googlegroups.com

As far as I know, the synsets are ordered by frequency of use, so the first synset for a lemma is the most frequently used sense, the next is the second most frequent, and so on. So a really simple thing you could do is just take the first synset for a word and run with it.
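A minimal sketch of the first-synset heuristic, assuming the candidate synsets are already sorted most-frequent-first and paired with their scores. In NLTK, `wn.synsets(word, pos)` should return synsets in WordNet's sense order; the `first_sense_sentiment` helper and the `(pos, neg)` tuples here are hypothetical placeholders.

```python
def first_sense_sentiment(ranked_scores):
    """Return the (pos, neg) score of the top-ranked sense, or None if there are no senses."""
    # Assumes ranked_scores is ordered most-frequent sense first.
    return ranked_scores[0] if ranked_scores else None


# Hypothetical (pos, neg) scores for one word's synsets, in sense order.
ranked = [(0.75, 0.0), (0.0, 0.5), (0.25, 0.25)]
print(first_sense_sentiment(ranked))  # (0.75, 0.0)
```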

Benjamin Goldenberg

Jul 7, 2009, 2:20:55 PM

to nltk-...@googlegroups.com

Thanks for the information. I'll try using just the first one, or
maybe a weighted average of the first three or something. I suppose
this popularity data must exist somewhere. I'll try to search for it.
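A "weighted average of the first three" could look like the sketch below. The rank weights (1/rank, i.e. 1, 1/2, 1/3) are an arbitrary illustrative choice, not something taken from the thread or from WordNet; any decreasing weights would express the same preference for earlier senses. The `rank_weighted_sentiment` helper and the scores are hypothetical.

```python
def rank_weighted_sentiment(ranked_scores, k=3):
    """Average the (pos, neg) scores of the top-k senses, weighting the sense at rank i by 1/i."""
    top = ranked_scores[:k]
    if not top:
        return (0.0, 0.0)
    weights = [1.0 / rank for rank in range(1, len(top) + 1)]
    total = sum(weights)
    pos = sum(w * p for w, (p, _) in zip(weights, top)) / total
    neg = sum(w * n for w, (_, n) in zip(weights, top)) / total
    return (pos, neg)


# Hypothetical (pos, neg) scores in sense order; the fourth sense is ignored when k=3.
ranked = [(0.75, 0.0), (0.0, 0.5), (0.25, 0.25), (0.0, 1.0)]
print(rank_weighted_sentiment(ranked))
```

Setting `k=1` reduces this to the first-synset heuristic.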

Ben
