Named Entity Extraction subtrees error


Bio

Jan 25, 2016, 1:34:55 PM
to nltk-users
Hi, I'm trying to update some of my nltk code from 2.7 to 3, and I am running into some problems extracting named entities. Specifically, I have a text document that I am tokenizing and assigning POS tags to; then I'm using nltk.ne_chunk_sents to get a chunk tree for each sentence. That part of the process is straightforward. Here is the code I'm using to accomplish that:
    
    sentences = nltk.sent_tokenize(corpus)
    tokenized = [nltk.word_tokenize(sentence) for sentence in sentences]
    pos_tags  = [nltk.pos_tag(sentence) for sentence in tokenized]
    trees = nltk.ne_chunk_sents(pos_tags, binary=True)

But now I want to extract all the named entities and place them in a list. As traversing trees is not my strong suit, I pulled some code from Jacob Perkins' NLTK 3 cookbook that looks like it does what I want. Here is that code, straight from Jacob's book:

    def sub_leaves(tree, label):
        return [t.leaves() for t in tree.subtrees(lambda s: label() == label)] 

However, when I run this function on my trees (I've tried it with a single sentence tree and with an entire text's worth of trees), I get the error message:

      File "<pyshell#41>", line 2, in sub_leaves
        return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
    AttributeError: 'list' object has no attribute 'subtrees'


Any ideas on how to fix this would be greatly appreciated. And if Jacob happens to read this, I just wanted to say how much I appreciate your cookbook code; it has been very helpful on several occasions. Thanks, George

Alexis

Jan 25, 2016, 7:21:06 PM
to nltk-...@googlegroups.com
However, when I run this function on my trees (I've tried it with a single sentence tree and with an entire text's worth of trees), I get the error message:

      File "<pyshell#41>", line 2, in sub_leaves
        return [t.leaves() for t in tree.subtrees(lambda s: label() == label)]
    AttributeError: 'list' object has no attribute 'subtrees'

Looks like you're calling `sub_leaves()` with a list of trees, rather than a single tree as you should.

Also, are you sure you copied this function correctly? `label()` is a tree method, so the filter should presumably read `s.label() == label`.
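
A minimal sketch of the fix, assuming the filter in the book reads `s.label() == label` (as posted, the bare `label()` would try to call the string argument) and that the function is then called once per tree rather than on the whole list. The hand-built trees below are hypothetical stand-ins for the output of `nltk.ne_chunk_sents(..., binary=True)`, used only so the example runs without downloading any models:

```python
from nltk.tree import Tree


def sub_leaves(tree, label):
    # Collect the leaves of every subtree whose label matches `label`.
    return [t.leaves() for t in tree.subtrees(lambda s: s.label() == label)]


# Hypothetical stand-ins for chunk trees produced with binary=True,
# where every named-entity chunk carries the label "NE".
trees = [
    Tree("S", [
        Tree("NE", [("George", "NNP")]),
        ("lives", "VBZ"),
        ("in", "IN"),
        Tree("NE", [("New", "NNP"), ("York", "NNP")]),
    ]),
    Tree("S", [
        ("He", "PRP"),
        ("likes", "VBZ"),
        Tree("NE", [("Utrecht", "NNP")]),
    ]),
]

# Call sub_leaves() once per tree, not on the list itself.
named_entities = [leaves for tree in trees for leaves in sub_leaves(tree, "NE")]
print(named_entities)
```

Passing `trees` itself to `sub_leaves()` reproduces the `AttributeError`, because a list has no `subtrees()` method.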

Alexis

Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis


Bio

Jan 26, 2016, 1:09:18 PM
to nltk-users
Hi Alex, Thanks for taking a look at my problem code. For some reason I was unable to reproduce the AttributeError I was getting yesterday. My goal was to create a list of all the named entities from the trees that the ne_chunk method created from my text. Fortunately, after taking a different approach from the one I tried yesterday, I was successful. Thanks again for taking a look at my code. Sincerely, George
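
The thread does not show the approach that ended up working. One possible way to build such a list, offered only as a sketch, is to walk the direct children of each sentence tree and join the words of every chunk into a string. The helper name `entity_strings` and the sample tree are hypothetical, and the sketch assumes the chunks sit directly under the sentence node and are labelled `NE` (the `binary=True` case):

```python
from nltk.tree import Tree


def entity_strings(trees, label="NE"):
    """Return each chunk labelled `label` as a space-joined string."""
    entities = []
    for tree in trees:
        for child in tree:
            # Chunks are Tree nodes; ordinary tokens are (word, tag) tuples.
            if isinstance(child, Tree) and child.label() == label:
                entities.append(" ".join(word for word, tag in child.leaves()))
    return entities


# Hypothetical stand-in for one chunked sentence.
sample = [
    Tree("S", [
        Tree("NE", [("Jacob", "NNP"), ("Perkins", "NNP")]),
        ("wrote", "VBD"),
        ("it", "PRP"),
    ]),
]
print(entity_strings(sample))
```

Because it iterates over any iterable of trees, this also accepts a generator of trees, not just a list.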

