Kmeans clustering with Scipy and Gensim


Vinay B

Feb 6, 2013, 9:02:32 PM
to gen...@googlegroups.com

Here's an example of how to cluster with scipy
http://glowingpython.blogspot.com/2012/04/k-means-clustering-with-scipy.html

If I wanted to integrate this kmeans with gensim, what would I need to pass in as "data" to kmeans? Would it be the corpus?

Thank You

Radim Řehůřek

Feb 7, 2013, 3:29:31 PM
to gensim
Hello Vinay,

as far as I recall, the kmeans in scipy accepts dense numpy arrays
(but check the scipy docs to be sure).

You can use `gensim.matutils` to convert between a memory-friendly
gensim corpus and a numpy array that you'd feed into scipy:
http://radimrehurek.com/gensim/matutils.html#gensim.matutils.corpus2dense

HTH,
Radim



Vinay B

Feb 11, 2013, 1:51:36 PM
to gen...@googlegroups.com
Hi,

Thanks, the matrix transformation helped (to an extent).

I tried a couple of things:
1. Gensim + scipy integration
The clusters don't seem well formed at all, and the run seems to have completed too fast.

2. Gensim + scikit-learn integration
It seems to converge too fast (within an iteration or so; this isn't normal).

Source at https://gist.github.com/balamuru/4756543 . Feel free to run it on any known test set you have.

I also have some other questions on the labels (commented out), but am not yet concerned with that

Thanks in advance
Vinay