[nltk-users] Opinion mining using NLTK

Dilip kola

Mar 29, 2009, 12:57:09 PM

to nltk-users

Hi all,
I am new to NLTK and I want to classify movie reviews as positive,
negative or neutral using NLP techniques, and I thought that NLTK might
help me do this. Can anyone suggest how to do these things using
NLTK?

Thanks In Advance :)

-Dilip Kumar Kola

Adam Oakman

Mar 29, 2009, 2:15:50 PM

to nltk-...@googlegroups.com

NLTK has lots of possibilities in this realm. Do you specifically want to
do it by word meaning or by sentiment? Take a look at Pang and Lee's work if
you haven't done so already. The movie review corpus included with NLTK is a
derivative of their work on sentiment analysis, though they don't
categorize neutral, only positive and negative. I just completed my MSc
dissertation on meaning analysis and abandoned the concept of neutral, since
there is really no hard definition of it; every review will wind up as
slightly positive or negative, so I developed a scale to measure this. If
you can be a little more specific about your algorithm ideas, then we might be
able to provide some more detail.

Thanks

Adam

Dilip kola

Mar 29, 2009, 2:27:59 PM

to nltk-users

Thanks Adam Oakman,

I will look into Pang and Lee's work. OK, I realize neutral is quite
difficult to capture, so I am fine with just positive or negative. First, I
want to know which machine learning algorithms NLTK supports, and whether
I can use an SVM for this purpose. Is SVM implemented in NLTK?

Adam Oakman

Mar 29, 2009, 2:30:58 PM

to nltk-...@googlegroups.com

I don't believe there is an SVM implemented in NLTK, but there are definitely
Python implementations of SVM available. I am playing with Pang and Lee's
work now and using http://www.cs.cornell.edu/~tomf/svmpython/

Adam

Dilip kola

Mar 29, 2009, 3:09:50 PM

to nltk-users

Ok Thanks again Adam Oakman,

How are the results coming along? What accuracy are you getting using SVM?
Could you send me some sample code showing how to use svmpython?

Adam Oakman

Mar 29, 2009, 3:51:13 PM

to nltk-...@googlegroups.com

Hi,
I can't supply source code at this point, though I may be able to later
on. I need to talk to the sponsor about it. To date we are seeing 83%
accuracy; Pang and Lee, from memory, achieved 85%, so we are not too far off,
though we have yet to begin tweaking the SVM.

Thanks

A

Steven Bird

Mar 29, 2009, 5:01:32 PM

to nltk-...@googlegroups.com

2009/3/30 Dilip kola <dilip...@gmail.com>:

>
> Hi all,
> I am new to NLTK and I want to classify movie reviews as positive,
> negative or neutral using NLP techniques and I thought that NLTK may
> help me doing this. Can any one suggest how to these things using
> NLTK?

In addition to the other excellent suggestions people have sent,
please see chapter 6 of the NLTK book on text classification.

-Steven Bird

Dilip kola

Mar 30, 2009, 12:53:35 AM

to nltk-users

> In addition to the other excellent suggestions people have sent,
> please see chapter 6 of the NLTK book on text classification.
>
> -Steven Bird

Thanks Steven Bird,

I am reading chapter 6; it is very interesting :).

-Dilip Kola

Dilip kola

Mar 30, 2009, 12:57:44 AM

to nltk-users

On Mar 30, 12:51 am, Adam Oakman <adam.oak...@gmail.com> wrote:
> Hi,
> I can't supply source code at this point, though may be able to later
> on. I need to talk to the sponsor about it. To date we are seeing 83%
> accuracy, Pang and Lee from memory achieved 85% so not too far off though we
> are yet to begin tweaking the SVM.
>
> Thanks
>
> A

Hi Adam Oakman,

It is fine if you are not able to send me that code :). I am
reading chapter 6 of the NLTK book; it is very interesting.

Thanks
-Dilip Kola.

Dilip kola

Mar 30, 2009, 3:14:30 AM

to nltk-users

Hi,

The movie_reviews example given in NLTK ch06 is taking a lot of time
and memory. Any help?

Thanks,

Dilip Kola

Andrew Lee

Mar 30, 2009, 3:08:09 PM

to nltk-users

On Mar 30, 3:14 am, Dilip kola <dilip.i...@gmail.com> wrote:
> Hi,
>
> The example given in NLTK ch06 about movie_reviews taking lot of time
> and memory. Any help ?
>

After reading your post this morning I started a
python -m trace --count code_document_classify_fd.py

I am still waiting for output.

Andrew Lee

Mar 30, 2009, 5:43:03 PM

to nltk-users

I finally killed the process.

That was over 5 hours on a multi-CPU machine -- I'd say there's a bug,
or at least that the example, as written, is useless.

Dilip kola

Mar 30, 2009, 6:35:30 PM

to nltk-users

The same thing happened to me. I finally decided to use svmlight for my
task. Can anyone suggest how to combine NLTK and svmlight for this
purpose? I still don't have a very clear idea of how to get training
vectors from the review data. Can anyone help in this regard?

-Dilip Kola
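
One possible way to get training vectors, sketched under some assumptions: treat each review as a bag of words, give every vocabulary word an integer index starting at 1, and write one line per review in SVMlight's sparse input format (`<label> <index>:<value> ...`, with indices in increasing order). The two toy reviews, the helper names `build_vocabulary` and `to_svmlight_line`, and the choice of binary presence values are all illustrative and not something prescribed in this thread. With NLTK, the token lists could come from `movie_reviews.words(fileid)` instead.

```python
# Toy stand-ins for tokenized reviews; labels use SVMlight's +1 / -1 convention.
reviews = [
    (["a", "great", "fun", "movie"], +1),
    (["a", "dull", "boring", "movie"], -1),
]


def build_vocabulary(labeled_docs):
    """Map each distinct word to a 1-based feature index."""
    words = sorted({word for tokens, _label in labeled_docs for word in tokens})
    return {word: index for index, word in enumerate(words, start=1)}


def to_svmlight_line(tokens, label, vocabulary):
    """Encode one document as '<label> <index>:1 ...', indices ascending."""
    # Words missing from the vocabulary (e.g. in unseen test data) are skipped.
    indices = sorted({vocabulary[word] for word in tokens if word in vocabulary})
    features = " ".join("%d:1" % index for index in indices)
    return "%+d %s" % (label, features)


vocabulary = build_vocabulary(reviews)
lines = [to_svmlight_line(tokens, label, vocabulary) for tokens, label in reviews]
print("\n".join(lines))
```

Writing `lines` to a text file should give something the SVMlight `svm_learn` tool can read. Restricting the vocabulary to the most frequent words, rather than every word, would keep the vectors smaller.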

Javier Pueyo

Apr 15, 2009, 11:41:14 AM

to nltk-...@googlegroups.com

Sorry for my late response. I just tried the example today and it
seems to me that there is a typo in the book listing (I hope it can be
fixed before the book goes to print).

The function:

    def document_features(document):
        document_words = set(document)
        features = {}
        for word in all_words:
            features['contains(%s)' % word] = (word in document_words)
        return features

should read

    def document_features(document):
        document_words = set(document)
        features = {}
        for word in word_features:
            features['contains(%s)' % word] = (word in document_words)
        return features

So the problem seems to be in the for loop: the "all_words" list
contains every single word in the corpus instead of the 2000 most
frequent words, as was intended in the example. The "word_features"
list is the one containing those 2000 most frequent words.

    >>> len(all_words)
    39768
    >>> len(word_features)
    2000
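
To put rough numbers on the difference, here is a back-of-the-envelope sketch. It assumes the example builds one feature dictionary per review and that the corpus holds 2000 reviews; the document count is an assumption and is not stated in this thread.

```python
num_documents = 2000        # assumed corpus size, not stated in this thread
all_words_count = 39768     # len(all_words), from the session above
word_features_count = 2000  # len(word_features), from the session above

# Total dictionary entries created across all documents in each variant.
entries_with_typo = num_documents * all_words_count
entries_with_fix = num_documents * word_features_count

print(entries_with_typo)
print(entries_with_fix)
print(entries_with_typo / entries_with_fix)
```

This only counts the feature entries that get built; the classifier then has to train on all of them, which would likely add to the time and memory reported earlier.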

My understanding is that the purpose of the function is to check which
of the 2000 most frequent words can be found in a given document.

Now the example takes seconds instead of hours.
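
For anyone who wants to see the fix in isolation, here is a small sketch that runs without the corpus. The toy `documents` list and `NUM_FEATURES = 3` are made up for illustration (the book example keeps the 2000 most frequent words), and `collections.Counter` stands in for the frequency distribution used in the book.

```python
from collections import Counter

# Hypothetical tokenized documents standing in for the movie reviews.
documents = [
    ["good", "plot", "good", "cast"],
    ["bad", "plot", "bad", "script"],
    ["good", "plot", "good", "bad"],
]
NUM_FEATURES = 3  # illustrative; the book example keeps 2000 words

# Count every token in the toy corpus, then keep only the most frequent words.
word_counts = Counter(word for doc in documents for word in doc)
all_words = list(word_counts)
word_features = [word for word, _count in word_counts.most_common(NUM_FEATURES)]


def document_features(document):
    """Return one boolean 'contains(word)' feature per word in word_features."""
    document_words = set(document)  # a set makes each membership test cheap
    features = {}
    for word in word_features:  # iterate over the short list, not all_words
        features['contains(%s)' % word] = (word in document_words)
    return features


print(len(all_words), len(word_features))
print(document_features(documents[0]))
```

Each feature dictionary now has exactly `NUM_FEATURES` entries, no matter how large the full vocabulary grows.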

Hope it helps!

Javier
