using nltk.findall

Xavier Oremor

Oct 19, 2010, 5:48:59 PM
to nltk-users
ok. i must be doing something completely wrong, but i can't figure out
how to properly use the nltk.findall function mentioned on p.105 of
the nl book.

from what I can see, it prints to stdout but it always assigns a value
of 'None' to the variable 'result'

any thoughts on how to access the value I'm looking for?

thx.

xavier

//** code snippet **//

import nltk

text_list = ['Will', 'meet', 'at', 'the', 'ball', 'park', 'to', 'play', 'ball']
text = nltk.Text(text_list)

pattern = r"<.*> <ball>"
result = text.findall(pattern)  # prints the matches, but result ends up as None

//** end code snippet **//

Tim McNamara

Oct 19, 2010, 6:38:01 PM
to nltk-...@googlegroups.com
Hi Xavier,

It's not you, it's the code. That method only prints to the screen; it does not return anything. I was talking a few months ago about making nltk.Text more generic, rather than focusing on one general use case: getting things done in the interactive Python prompt.

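def findall(self, regexp):
    # Build the token searcher lazily on first use.
    if "_token_searcher" not in self.__dict__:
        self._token_searcher = TokenSearcher(self)

    hits = self._token_searcher.findall(regexp)
    hits = [' '.join(h) for h in hits]
    # The hits are printed, not returned, so the method returns None.
    print tokenwrap(hits, "; ")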

 

Richard Careaga

Oct 20, 2010, 5:19:48 PM
to nltk-...@googlegroups.com
Tim, I encourage the change to Text to permit assignment. It would still be useful for interactive learning and would also provide a facility for simple applications. Although we are encouraged to use the underlying methods for non-interactive purposes, that ability comes fairly late along the learning curve.

Thanks,

Richard


AdamL

Oct 21, 2010, 2:03:04 AM
to nltk-users
Interesting that you posted this, Xavier. I wanted to use the findall
function for a few simple tests. After some digging, and realizing that
Text.findall only prints its output, I found that the underlying findall
is a method of the class TokenSearcher, and that one returns the hits.

Text is only a wrapper that was intended for interactive exploration
but I agree with Richard that Text has a lot of really useful methods
that would be great to have without having to re-create with the
underlying classes. Text and TokenSearcher are both classes under the
module text: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.text-module.html

You can simply create an object of TokenSearcher then use the function
findall (make sure your text is tokenized):

import nltk
mash = nltk.text.TokenSearcher(mish)  # mish: your list of tokens
hits = mash.findall("<not><.*>?<good>")  # returns a list of token lists
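
As a minimal sketch of the approach above, here it is applied to the tokens from Xavier's original snippet. The variable names (tokens, searcher, hits, phrases) are only illustrative, and the joining step simply mirrors what Text.findall does before printing:

```python
from nltk.text import TokenSearcher

# Tokens from the original question; any list of strings works here.
tokens = ['Will', 'meet', 'at', 'the', 'ball', 'park', 'to', 'play', 'ball']

# TokenSearcher.findall returns the matches instead of printing them.
searcher = TokenSearcher(tokens)
hits = searcher.findall(r"<.*> <ball>")

# Each hit is a list of tokens; join them if you want plain strings.
phrases = [" ".join(hit) for hit in hits]
print(phrases)
```

Here hits is a list of token lists, so it can be assigned, filtered, or counted like any other Python list.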

Hope that helps,
Adam

Xavier Oremor

Oct 21, 2010, 3:27:40 AM
to nltk-...@googlegroups.com
thank you! 

Steven Bird

Oct 21, 2010, 4:23:25 PM
to nltk-...@googlegroups.com
On 21 October 2010 17:03, AdamL <adam.p...@gmail.com> wrote:
> Text is only a wrapper that was intended for interactive exploration
> but I agree with Richard that Text has a lot of really useful methods
> that would be great to have without having to re-create with the
> underlying classes.

I would welcome suggestions for what functionality should be made more
accessible. Note that TokenSearcher can be imported from the nltk
namespace already (i.e. "from nltk import TokenSearcher").

You may also have suggestions for what should be included in a HOWTO
document about this functionality, and it could live with the other
HOWTOs here: http://nltk.googlecode.com/svn/trunk/doc/howto/index.html

Concrete suggestions, code, patches, documentation, etc, should be
posted to our issue tracker at
http://code.google.com/p/nltk/issues/list

Thanks,
-Steven Bird

AdamL

Oct 24, 2010, 5:21:36 PM
to nltk-users
Thanks Steve. I will gather some thoughts and post some concrete
suggestions on the issue tracker.

My only thought would be to keep the same functionality that's
available in nltk.Text but provide the ability to return the output
rather than print it. The Text wrapper is so easy and convenient, and it
wraps many useful methods that would be nice to have in one place, in
addition to being able to call each underlying method individually.
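
One way this could look, purely as a sketch: a small subclass of nltk.Text with an extra method that returns the hits. The class name ReturningText and the method name find_hits are hypothetical and not part of NLTK; only nltk.Text and TokenSearcher.findall are existing API.

```python
import nltk
from nltk.text import TokenSearcher


class ReturningText(nltk.Text):
    """Hypothetical wrapper: same as nltk.Text, plus a findall that returns."""

    def find_hits(self, regexp):
        # Same search that Text.findall performs, but the joined hits
        # are returned to the caller instead of being printed.
        searcher = TokenSearcher(self)
        return [" ".join(hit) for hit in searcher.findall(regexp)]


text = ReturningText(['Will', 'meet', 'at', 'the', 'ball',
                      'park', 'to', 'play', 'ball'])
result = text.find_hits(r"<.*> <ball>")
print(result)
```

The existing Text.findall is left untouched, so interactive use still prints as before.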

I think it might be useful to have a HOWTO on how to use the
individual methods bundled under the Text module.

Thanks for providing such a useful toolkit!

-Adam
