NLTK clustering: working examples or a tutorial?


mystickahuna

Jul 13, 2010, 11:32:18 PM
to nltk-users
Hi guys,
I am wondering whether there is a tutorial with working examples of
the nltk.cluster package. I am working on a semantic clustering problem
and think this tool could be very helpful for my analysis. However,
after searching the web and this group, I couldn't find any resources
that would help me build a working example. I also noticed that the
NLTK documents on the web cover different versions with different
interfaces. E.g. nltk.cluster.kmeans.KMeansClusterer in nltk 2.0b8 was
named nltk.cluster.kmeans.KMeans in nltk 0.9.5, while the example for
nltk 2.0b8 is still based on the old implementation in 0.9.5.
Anyway, has anybody successfully run the example on the following page
using nltk 2.0b8?
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.cluster-module.html
I know there are workarounds, such as classifying the data separately
in some stats software, but I would still like to try implementing it
with the nltk library first so that I don't have to move data back and
forth between environments.
Thanks in advance,
David

mystickahuna

Jul 15, 2010, 11:30:44 AM
to nltk-users
Anybody could do me a favor here?

Alex Rudnick

Jul 16, 2010, 4:56:45 PM
to nltk-...@googlegroups.com
On Thu, Jul 15, 2010 at 11:30 AM, mystickahuna <chen...@gmail.com> wrote:
> Anybody could do me a favor here?

Hey David,

I'll look into this over the weekend or sometime soon. It might be that
there's no good up-to-date tutorial, and you (or somebody, maybe me)
should write one and put it on the new wiki!

Happy hacking,

--
-- alexr

mystickahuna

Jul 17, 2010, 3:27:33 PM
to nltk-users
Thanks for spending time solving my problem. I will keep tracking this
thread. Hope NLTK can become more popular.

Alex Rudnick

Jul 18, 2010, 11:37:07 PM
to nltk-...@googlegroups.com
Hey again David,

You're right -- the example on that page doesn't work! I'll make sure to fix it.

There is a working example in the demo() function of kmeans.py, though.
http://code.google.com/p/nltk/source/browse/trunk/nltk/nltk/cluster/kmeans.py#167

Also, it works for me to type this into the Python REPL:

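# Requires numpy and nltk.
from numpy import array
from nltk import cluster
from nltk.cluster import euclidean_distance

# Four 2-D points to split into two clusters.
vectors = [array(f) for f in [[3, 3], [1, 2], [4, 2], [4, 0]]]
# k=2; repeats=10 reruns k-means from different random starting means.
clusterer = cluster.KMeansClusterer(2, euclidean_distance, repeats=10)
# assign_clusters=True returns one cluster index per input vector.
print(clusterer.cluster(vectors, True))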
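
For anyone wondering what that call is doing conceptually, here is a rough pure-Python sketch of k-means on the same four points. It is only an illustration, not NLTK's implementation: the helper names (euclidean, kmeans_once, kmeans), the seed and max_iter parameters, and the rule of keeping the lowest-cost trial are all assumptions made for this sketch, and NLTK may combine its repeated trials differently.

```python
import math
import random


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans_once(vectors, k, rng, max_iter=100):
    """Run one k-means trial and return (labels, means, cost)."""
    # Start from k distinct input vectors chosen at random.
    means = rng.sample(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest mean for every vector.
        new_labels = [
            min(range(k), key=lambda i: euclidean(v, means[i]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the trial has converged
        labels = new_labels
        # Update step: move each mean to the centroid of its members.
        for i in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = [sum(col) / len(members) for col in zip(*members)]
    # Cost: total squared distance from each vector to its assigned mean.
    cost = sum(euclidean(v, means[lab]) ** 2 for v, lab in zip(vectors, labels))
    return labels, means, cost


def kmeans(vectors, k, repeats=10, seed=0):
    """Repeat k-means from random starts and keep the lowest-cost trial."""
    rng = random.Random(seed)
    trials = [kmeans_once(vectors, k, rng) for _ in range(repeats)]
    return min(trials, key=lambda trial: trial[2])


vectors = [[3, 3], [1, 2], [4, 2], [4, 0]]
labels, means, cost = kmeans(vectors, k=2)
print(labels)  # one cluster index (0 or 1) per input vector
print(means)   # the two cluster centroids
```

Which cluster gets index 0 and which gets index 1 depends on the random starting means, so only the grouping is meaningful, not the numbers themselves.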

Hope this helps!

On Sat, Jul 17, 2010 at 3:27 PM, mystickahuna <chen...@gmail.com> wrote:
> Thanks for spending time solving my problem. I will keep tracking this
> thread. Hope NLTK can become more popular.

--
-- alexr

Alex Rudnick

Jul 19, 2010, 1:32:20 AM
to nltk-...@googlegroups.com
OK, I updated the docstring. The new version of the documentation will
be up (I believe) when the docs for 2.0b9 get pushed.

http://code.google.com/p/nltk/issues/detail?id=578

Thanks for pointing out the problem!

--
-- alexr
