Named Entity Extraction subtrees error

Bio

Jan 25, 2016, 1:34:55 PM

to nltk-users

Hi, I'm trying to update some of my nltk code from 2.7 to 3, and I am running into some problems extracting named entities. Specifically, I have a text document that I am tokenizing and assigning POS tags to; then I'm using nltk.ne_chunk_sents to get a chunk tree for each sentence. That part of the process is straightforward. Here is the code I'm using to accomplish that:

sentences = nltk.sent_tokenize(corpus)
tokenized = [nltk.word_tokenize(sentence) for sentence in sentences]
pos_tags = [nltk.pos_tag(sentence) for sentence in tokenized]
trees = nltk.ne_chunk_sents(pos_tags, binary=True)

But now I want to extract all the named entities and place them in a list. As traversing trees is not my strong suit, I pulled some code from Jacob Perkins' NLTK 3 cookbook that looks like it does what I want. Here is that code, straight from Jacob's book:

def sub_leaves(tree, label):
    return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]

However, when I run this function on my trees (I've tried it with a single sentence tree and with an entire text's worth of trees), I get the error message:

File "<pyshell#41>", line 2, in sub_leaves
    return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
AttributeError: 'list' object has no attribute 'subtrees'

Any ideas on how to fix this would be greatly appreciated. And if Jacob happens to read this, I just want to say how much I appreciate your cookbook code; it has been very helpful on several occasions. Thanks, George

Alexis

Jan 25, 2016, 7:21:06 PM

to nltk-...@googlegroups.com

However, when I run this function on my trees (I've tried it with a single sentence tree and with an entire text's worth of trees), I get the error message:

File "<pyshell#41>", line 2, in sub_leaves
return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
AttributeError: 'list' object has no attribute 'subtrees'

Looks like you're calling `sub_leaves()` with a list of trees, rather than a single tree as you should.

Also, are you sure you copied this function correctly? `leaves()` is a tree method.

Alexis
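
A minimal sketch of both points, under a few assumptions: the chunk trees carry the `NE` label that `binary=True` produces, `sub_leaves()` is applied to one tree at a time, and the lambda calls `s.label()` on the subtree it receives. In the function as posted, the bare `label` inside the lambda refers to the string argument, so calling it would fail once a real tree is passed in. The two trees below are hand-built stand-ins for chunker output, not data from this thread.

```python
from nltk.tree import Tree


def sub_leaves(tree, label):
    # The filter receives each subtree as `s`, so label() is called on `s`.
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Hand-built stand-ins for what nltk.ne_chunk_sents(..., binary=True) yields.
trees = [
    Tree('S', [
        Tree('NE', [('George', 'NNP')]),
        ('lives', 'VBZ'),
        ('in', 'IN'),
        Tree('NE', [('New', 'NNP'), ('York', 'NNP')]),
    ]),
    Tree('S', [('It', 'PRP'), ('rains', 'VBZ')]),
]

# Passing the whole list, sub_leaves(trees, 'NE'), raises the AttributeError
# from the traceback, because a list has no subtrees() method.
# Calling the function once per tree avoids that.
per_sentence = [sub_leaves(tree, 'NE') for tree in trees]
print(per_sentence)
```

Each element of `per_sentence` holds the entity chunks found in one sentence, with every chunk given as a list of `(word, tag)` pairs.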

On 25 Jan 2016, at 20:34, Bio <con...@bioasys.net> wrote:


Bio

Jan 26, 2016, 1:09:18 PM

to nltk-users

Hi Alex, thanks for taking a look at my problem code. For some reason I was unable to reproduce the AttributeError I was having yesterday. My goal was to create a list of all the named entities from the trees that the ne_chunk method created from my text. Fortunately, after taking a different approach from the one I tried yesterday, I was successful. Thanks again for taking a look at my code. Sincerely, George
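
The post does not say what the different approach was. For readers with the same goal, one possible way to gather every named entity into a flat list is sketched below. The helper name `named_entities` is hypothetical, the `NE` label assumes the chunker was run with `binary=True`, and the sample trees are hand-built stand-ins for chunker output.

```python
from nltk.tree import Tree


def named_entities(trees, label='NE'):
    """Collect the text of every subtree with the given label (hypothetical helper)."""
    entities = []
    for tree in trees:  # one chunk tree per sentence
        for subtree in tree.subtrees(lambda s: s.label() == label):
            # Leaves are (word, tag) pairs; keep the words and join them.
            entities.append(' '.join(word for word, tag in subtree.leaves()))
    return entities


# Hand-built stand-ins for the trees from nltk.ne_chunk_sents(..., binary=True).
trees = [
    Tree('S', [
        Tree('NE', [('George', 'NNP')]),
        ('lives', 'VBZ'),
        ('in', 'IN'),
        Tree('NE', [('New', 'NNP'), ('York', 'NNP')]),
    ]),
    Tree('S', [('It', 'PRP'), ('rains', 'VBZ')]),
]

entities = named_entities(trees)
print(entities)
```

Because the helper only iterates over `trees`, it should work the same whether the chunker hands back a list or a lazy iterator.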
