Filtering bigrams by POS-tag

Louis Corbel

Apr 26, 2016, 7:05:34 AM

to nltk-users

Hi guys,

So I'm trying to filter the output bigrams depending on the POS tags of the words they contain. To be clear, I would like to get only bigrams composed of a noun ('NN') and an adjective ('JJ').

Here is the relevant part of the code:

finder = BigramCollocationFinder.from_words(tokens)
finder.apply_ngram_filter(lambda w1, w2: ???)

Here tokens is a list of words that have been POS-tagged and lemmatized. I don't know what to write in the '???' part to achieve what I described above.

If you need more details, explanations, or other parts of my code to understand the problem, don't hesitate to ask.

Thanks,
Louis

Alexis

May 1, 2016, 10:10:08 AM

to nltk-...@googlegroups.com

If your "words" are actually tuples of (word, tag), as in the output of `pos_tag()`, your filter function can look like this:

def filterNNJJ(pair1, pair2):
    (w1, t1), (w2, t2) = pair1, pair2
    return not ((t1 == "NN") and (t2 == "JJ"))

Or less readably as a lambda function:

lambda w1, w2: not ((w1[1] == "NN") and (w2[1] == "JJ"))

According to the documentation, the filter must return True for the pairs that you want to remove, hence the test is negated. In any event, it should take as many arguments as your ngram size (2 for bigrams), and it can unpack them as necessary. If this doesn't work right away, have your function print out its two arguments, and take it from there.
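To see the negated test in action without building a finder, here is a minimal sketch in plain Python. The sample tokens and the helper `surviving_bigrams` are made up for illustration; the helper only imitates the "remove when the filter returns True" behaviour on adjacent pairs and is not part of NLTK. The second filter is an assumption on my part: it accepts the noun and the adjective in either order, in case that is what you meant.

```python
# Hypothetical sample data: (word, tag) tuples shaped like pos_tag() output.
tagged = [
    ("the", "DT"), ("sky", "NN"), ("blue", "JJ"),
    ("and", "CC"), ("big", "JJ"), ("dog", "NN"),
]

def filter_nn_jj(pair1, pair2):
    """Return True (meaning: remove) unless the bigram is NN followed by JJ."""
    (w1, t1), (w2, t2) = pair1, pair2
    return not (t1 == "NN" and t2 == "JJ")

def filter_nn_jj_any_order(pair1, pair2):
    """Return True (meaning: remove) unless the tags are one NN and one JJ."""
    return {pair1[1], pair2[1]} != {"NN", "JJ"}

def surviving_bigrams(tokens, ngram_filter):
    """Illustration only: keep adjacent pairs for which the filter is False."""
    return [(a, b) for a, b in zip(tokens, tokens[1:]) if not ngram_filter(a, b)]

kept_strict = surviving_bigrams(tagged, filter_nn_jj)
kept_any = surviving_bigrams(tagged, filter_nn_jj_any_order)

print(kept_strict)  # only ("sky", "NN") + ("blue", "JJ") survives
print(kept_any)     # additionally keeps ("big", "JJ") + ("dog", "NN")
```

With a real finder built from tagged tokens, either function would be passed directly, as in finder.apply_ngram_filter(filter_nn_jj). Note that both tests compare tags exactly, so tags such as "NNS" or "JJR" would be filtered out unless you loosen the comparison.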

Alexis
