Sentence fragmenting


Ambarish Jash

Dec 5, 2012, 12:25:36 PM
to nltk-...@googlegroups.com
Hi,
I am trying to split sentences at conjunctions for sentiment analysis.
For example:
Case 1
    "The coffee was great but not the juice"
Case 2
    "The coffee and the juice was great"

In case 1, two independent sentiments are combined in one sentence, so the sentence should be split into two phrases. In case 2, however, the sentiment is the same for both the juice and the coffee, so the sentence need not be split.

Is there a technique for doing this in NLTK?
If not, can someone suggest a technique or research paper that might help me?

Thanks
Ambarish Jash
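
A minimal sketch of the split described in the question, assuming that only contrastive conjunctions mark a change of sentiment and that "and" does not. It uses a plain regular expression rather than any NLTK function; the conjunction list and the name `split_on_contrast` are illustrative assumptions, not part of NLTK.

```python
import re

# Assumed list of contrastive conjunctions. "and" is deliberately left out,
# so that case 2 stays in one piece.
CONTRASTIVE = ("but", "however", "although", "though", "yet", "whereas")

# Match a contrastive conjunction as a whole word, with optional commas
# and whitespace around it.
_SPLIT_RE = re.compile(
    r"\s*,?\s*\b(?:" + "|".join(CONTRASTIVE) + r")\b\s*,?\s*",
    re.IGNORECASE,
)

def split_on_contrast(sentence):
    """Split a sentence into phrases at contrastive conjunctions."""
    parts = _SPLIT_RE.split(sentence)
    return [p.strip() for p in parts if p.strip()]

print(split_on_contrast("The coffee was great but not the juice"))
print(split_on_contrast("The coffee and the juice was great"))
```

Case 1 comes back as two phrases and case 2 as one. A word list like this is only a heuristic: it will also split where "yet" or "though" is not acting as a conjunction, so a parser-based approach may be needed for real reviews.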

Nigel Legg

Dec 6, 2012, 2:47:46 AM
to nltk-...@googlegroups.com
I wouldn't split the sentences. I'd use a supervised classifier and have two measures of sentiment, one for coffee and one for juice.
So on "coffee sentiment" both sentences are positive, but on "juice sentiment" the first is negative and the second positive:
Coffee: +1, +1
Juice: -1, +1
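
One way to lay out training data for this per-target approach is sketched below, assuming each sentence carries one label per target and that a separate classifier would be trained on each target's pairs. The names `LABELED` and `training_set_for` are hypothetical, and the labels follow the description above (coffee positive in both sentences, juice negative then positive).

```python
# Hypothetical layout: each sentence carries one sentiment label per target.
LABELED = [
    ("The coffee was great but not the juice", {"coffee": +1, "juice": -1}),
    ("The coffee and the juice was great", {"coffee": +1, "juice": +1}),
]

def training_set_for(target, labeled=LABELED):
    """Return (sentence, label) pairs for one target.

    Sentences that carry no label for the target are skipped.
    """
    return [(text, labels[target]) for text, labels in labeled if target in labels]

for target in ("coffee", "juice"):
    print(target, [label for _, label in training_set_for(target)])
```

Each list of pairs can then be turned into feature sets for whichever supervised classifier is used.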



--
Regards,
Nigel Legg
07722 652866
http://twitter.com/nigellegg
http://uk.linkedin.com/in/nigellegg

Ambarish Jash

Dec 6, 2012, 8:44:29 AM
to nltk-...@googlegroups.com
If I do not break the sentences up into phrases, the scope of each adjective needs to be figured out.
In fact, down the road a negation detector would have to figure out the scope of negation as well.

To avoid all this, I thought of breaking the sentences into relevant phrases.
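
To illustrate what figuring out the scope of negation involves, here is a minimal sketch of one common heuristic: every token after a negation word is marked until the next punctuation mark or clause boundary. The word lists, the `_NEG` suffix, and the name `mark_negation_scope` are assumptions made for this example, and the tokenizer is a simple regular expression rather than an NLTK tokenizer.

```python
import re

# Assumed negation cues and clause boundaries for this sketch.
NEGATIONS = {"not", "no", "never", "cannot"}
CLAUSE_END = {".", ",", ";", ":", "!", "?", "but"}

def mark_negation_scope(sentence):
    """Append _NEG to tokens that fall inside the scope of a negation."""
    # Words (keeping contractions such as "didn't" whole) or single punctuation marks.
    tokens = re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence.lower())
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in CLAUSE_END:
            in_scope = False          # a clause boundary closes the scope
            marked.append(tok)
        elif tok in NEGATIONS or tok.endswith("n't"):
            in_scope = True           # a negation cue opens the scope
            marked.append(tok)
        elif in_scope:
            marked.append(tok + "_NEG")
        else:
            marked.append(tok)
    return marked

print(mark_negation_scope("The coffee was great but not the juice"))
```

In the example sentence only "the" and "juice" end up marked, which matches the intended reading. The heuristic breaks down when the negated clause has no closing punctuation or when the cue is implicit.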


--
Ambarish Jash

Nigel Legg

Dec 6, 2012, 9:51:14 AM
to nltk-...@googlegroups.com
Wouldn't that be covered by having sufficiently solid and differentiated training sets to define the classifications? Producing the training sets may take time, but there may be occasions where it will be impossible to disentangle the two threads, e.g. "I only liked the coffee" implies negative sentiment for the juice.

Ambarish Jash

Dec 6, 2012, 10:00:59 AM
to nltk-...@googlegroups.com
I am assuming that if "juice" is not mentioned, then it's not there. Also, in reality the sentences could contain any number of nouns (not only coffee and juice), so disentangling a thread might not be that simple.

I am working on reviews of internet products, so coffee and juice were just an example.


--
Ambarish Jash
