Help me with best use of NLTK API for this research
From: Edmon <ebeg...@gmail.com>
Date: Sun, 11 Mar 2012 12:16:19 -0700 (PDT)
Local: Sun, Mar 11 2012 3:16 pm
Subject: Help me with best use of NLTK API for this research

I am new to NLTK, but I am thoroughly impressed with this package.

I am using it for my research in computational linguistics and I am
learning as I go.
I have a particular statistic I would like to collect, and I am hoping
someone on this list could give me some tips on implementation.

I would like to study patterns of word groupings and their
frequencies around particular families of words.

For example, I would like to find all occurrences of the word "dislike"
in all of its variations (inflections, ...) and the most common words
that precede and follow it at the sentence level.

I imagine I would take the raw text (I already know how to do this),
break it down into sentences (I know how to do this too), and then,
within each sentence, search for occurrences of the word in all of its
forms and build a set of preceding patterns (pre-bigram, pre-trigram,
etc.) and the same for the following patterns.

For example, let's take this simple, made-up text:

"Every Sunday they gather at the Mall. She dislikes the crowd, but she
likes the company of her friends.
Movie that they are going to see is a typical blockbuster. Her boyfriend
likes movies like that. She does not."

From this text I would collect:

(She, dislikes)
(dislikes, the)
(she, likes)
(likes, the)
(boyfriend, likes)
(likes, movies)

(She, dislikes)
(dislikes, the, crowd)
(but, she, likes)
(likes, the, company)
(Her, boyfriend, likes)
(likes, movies, like)
...

and then I would finally collect statistics on the frequencies of
particular patterns (I know how to do this).

Would someone please suggest a tip or an approach for doing the
neighborhood pattern collection for such word families at the sentence
level using the NLTK API?

Thank you in advance,
Edmon


 
From: Morten Minde Neergaard <x...@8d.no>
Date: Sun, 11 Mar 2012 21:53:40 +0100
Local: Sun, Mar 11 2012 4:53 pm
Subject: Re: [nltk-users] Help me with best use of NLTK API for this research
At 12:16, Sun 2012-03-11, Edmon wrote:
[…]
> From this text I would collect:

> (She, dislikes)
> (dislikes, the)
> (she, likes)
> (likes, the)
> (boyfriend, likes)
> (likes, movies)

[…]

This approach might give you what you want. Replace the 2 with 3 to get
trigrams.

import nltk
filter_words = ('like', 'likes', 'dislike', 'dislikes', 'enjoy', 'enjoys')
# Keep only the n-grams that start or end with one of the target words.
matches = [gram for gram in nltk.ngrams(nltk.tokenize.word_tokenize(text), 2)
           if gram[0] in filter_words or gram[-1] in filter_words]
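Note that the snippet above tokenizes the whole text at once, so an
n-gram can straddle a sentence boundary. A minimal sketch of a
sentence-level variant with frequency counts might look like the
following. To stay self-contained it uses a naive regex sentence splitter
and tokenizer as stand-ins (an assumption, not NLTK's behavior); with
NLTK installed, nltk.sent_tokenize, nltk.word_tokenize, nltk.ngrams and
nltk.FreqDist would be the natural replacements. The helper names
split_sentences, tokenize, ngrams and neighborhood_counts are
hypothetical, and tokens are lowercased so that "She" and "she" are
counted together.

```python
import re
from collections import Counter

TARGETS = {"like", "likes", "dislike", "dislikes", "enjoy", "enjoys"}

def split_sentences(text):
    # Naive stand-in for a real sentence splitter: break after ., ! or ?
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    # Naive stand-in for a real tokenizer: keep words, drop punctuation
    return re.findall(r"[A-Za-z']+", sentence)

def ngrams(tokens, n):
    # All runs of n consecutive tokens
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def neighborhood_counts(text, n=2, targets=TARGETS):
    # Count n-grams that start or end with a target word, one sentence at a time
    counts = Counter()
    for sentence in split_sentences(text):
        tokens = [t.lower() for t in tokenize(sentence)]
        for gram in ngrams(tokens, n):
            if gram[0] in targets or gram[-1] in targets:
                counts[gram] += 1
    return counts

text = ("Every Sunday they gather at the Mall. She dislikes the crowd, but she "
        "likes the company of her friends. Movie that they are going to see is "
        "a typical blockbuster. Her boyfriend likes movies like that. She does not.")

for gram, freq in neighborhood_counts(text, n=2).most_common():
    print(gram, freq)
```

Calling neighborhood_counts(text, n=3) gives the trigram counts in the
same way.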

You may want to use a tagger if you want to avoid false positives on e.g.
"I am not like you" while keeping e.g. "I do not like you".
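The tagger idea could be sketched roughly as below. This is an
illustration under stated assumptions: the input is a list of
(word, tag) pairs with Penn Treebank style tags, the kind of output
nltk.pos_tag gives for a tokenized sentence, and the tags in the example
are written by hand rather than produced by a tagger, so a real tagger
may label them differently. The helper name verb_ngrams is hypothetical.

```python
TARGETS = {"like", "likes", "dislike", "dislikes", "enjoy", "enjoys"}

def verb_ngrams(tagged, n=2, targets=TARGETS):
    """Return word n-grams whose first or last token is a target word tagged as a verb."""
    def is_hit(pair):
        word, tag = pair
        # Penn Treebank verb tags all start with "VB" (VB, VBP, VBZ, ...)
        return word.lower() in targets and tag.startswith("VB")

    grams = []
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if is_hit(window[0]) or is_hit(window[-1]):
            grams.append(tuple(word for word, _ in window))
    return grams

# Hand-written tags (assumed for illustration, not tagger output)
preposition_use = [("I", "PRP"), ("am", "VBP"), ("not", "RB"),
                   ("like", "IN"), ("you", "PRP")]
verb_use = [("I", "PRP"), ("do", "VBP"), ("not", "RB"),
            ("like", "VB"), ("you", "PRP")]

print(verb_ngrams(preposition_use))  # "like" tagged IN, so nothing is kept
print(verb_ngrams(verb_use))         # "like" tagged VB, so its bigrams are kept
```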

Cheers,
--
Morten Minde Neergaard


 
From: Edmon <ebeg...@gmail.com>
Date: Sun, 11 Mar 2012 15:09:49 -0700 (PDT)
Local: Sun, Mar 11 2012 6:09 pm
Subject: Re: [nltk-users] Help me with best use of NLTK API for this research

Thanks Morten.


 