Hi, I'm trying to update some of my NLTK code from Python 2.7 to Python 3, and I'm running into problems extracting named entities. Specifically, I have a text document that I tokenize and POS-tag, and then I use nltk.ne_chunk_sents to get a chunk tree for each sentence. That part of the process is straightforward; here is the code I'm using to accomplish it:
```python
import nltk

sentences = nltk.sent_tokenize(corpus)
tokenized = [nltk.word_tokenize(sentence) for sentence in sentences]
pos_tags = [nltk.pos_tag(sentence) for sentence in tokenized]
# binary=True labels every entity chunk "NE" instead of PERSON, GPE, etc.
trees = nltk.ne_chunk_sents(pos_tags, binary=True)
```
Now I want to extract all the named entities and put them in a list. Since traversing trees is not my strong suit, I pulled some code from Jacob Perkins' NLTK 3 cookbook that looks like it does what I want. Here is that code, straight from Jacob's book:
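```python
def sub_leaves(tree, label):
    # Note: the filter lambda ignores its argument s; label() calls the label argument itself
    return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
```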
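As quoted, the filter lambda never uses its argument `s`: `label()` tries to call the string passed in as `label`, so once a real `Tree` is passed in, it would fail with `TypeError: 'str' object is not callable`. Presumably the intended test is `s.label() == label`. A minimal sketch of that reading, assuming the NLTK 3 `Tree` API and using a hand-built tree as a stand-in for real `ne_chunk_sents` output (with `binary=True`, entity chunks are labelled `NE`):

```python
from nltk.tree import Tree

def sub_leaves(tree, label):
    # s is each candidate subtree; keep those whose own label matches
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]

# Hand-built stand-in for one sentence of ne_chunk output with binary=True
tree = Tree("S", [
    ("I", "PRP"), ("thank", "VBP"),
    Tree("NE", [("Jacob", "NNP"), ("Perkins", "NNP")]),
    (".", "."),
])

print(sub_leaves(tree, "NE"))
# [[('Jacob', 'NNP'), ('Perkins', 'NNP')]]
```

Each match comes back as a list of `(word, tag)` leaves, so the tags still need to be stripped if only the entity text is wanted.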
However, when I run this function on my trees (I've tried it with a single sentence tree and with an entire text's worth of trees), I get this error message:
```
  File "<pyshell#41>", line 2, in sub_leaves
    return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
AttributeError: 'list' object has no attribute 'subtrees'
```
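The traceback says that whatever was passed as `tree` was a plain `list`, while `sub_leaves` expects a single `Tree`. One way around that, sketched here under the assumption that each element is an NLTK 3 `Tree` with entity chunks labelled `NE`, is to loop over the sentence trees and join each entity's words. `named_entities` is a hypothetical helper name, not an NLTK function:

```python
from nltk.tree import Tree

def named_entities(trees, label="NE"):
    # trees: any iterable of chunk trees, one per sentence
    return [
        " ".join(word for word, tag in st.leaves())  # drop the POS tags
        for tree in trees
        for st in tree.subtrees(lambda s: s.label() == label)
    ]

# Hand-built stand-ins for two sentences of ne_chunk output with binary=True
trees = [
    Tree("S", [Tree("NE", [("George", "NNP")]), ("wrote", "VBD")]),
    Tree("S", [("Thanks", "NNS"), Tree("NE", [("Jacob", "NNP"), ("Perkins", "NNP")])]),
]

print(named_entities(trees))
# ['George', 'Jacob Perkins']
```

If the trees come straight from `ne_chunk_sents`, they may arrive as a one-shot iterator rather than a list, depending on the NLTK version; wrapping the call in `list(...)` makes it safe to traverse more than once.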
Any ideas on how to fix this would be greatly appreciated. And if Jacob happens to read this, I just wanted to say how much I appreciate your cookbook code; it has been very helpful on several occasions.

Thanks, George