K-means cluster membership

Vinodh Krishnaraju

Jul 19, 2014, 8:07:06 AM
to rha...@googlegroups.com
Hi.

I have used the sample k-means code given at 

It ran correctly and returned the cluster centers. But how do I determine the cluster association, that is, which cluster each point belongs to?
I understand this has to be done somewhere in the reducer code. Can someone help me with this?

Antonio Piccolboni

Jul 19, 2014, 12:02:30 PM
to RHadoop Google Group
I think you will need an additional job, as the reducer has to return the centers, and the centers are then loaded into memory. The associations take one row per point in the data set and won't fit in main memory. I am guessing here; you know the data sizes at play. So you will need a final map-only pass in which you calculate and return the associations. The Java API has a multiple-output feature, but we can't access it from R AFAIK, so I don't think there's another way.
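
A minimal sketch of what that final map-only pass could compute. It is written in plain Python, not rmr2, to show the per-record logic only. The names `squared_distance`, `nearest_center` and `assign_map` are hypothetical, and squared Euclidean distance is an assumption about the metric. In a real job the centers would be the small in-memory result of the k-means run, and the body of `assign_map` would be the map function of a job with no reducer.

```python
from typing import Iterable, Iterator, List, Sequence, Tuple

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point: Point, centers: List[Point]) -> int:
    """Index of the center closest to `point` (ties go to the lowest index)."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def assign_map(points: Iterable[Point],
               centers: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map-only step: emit one (cluster index, point) pair per input point.

    Only the centers are held in memory; points are streamed one at a time,
    so the full association table never has to fit in main memory.
    """
    for p in points:
        yield nearest_center(p, centers), p


# Illustrative data only: two centers and three points in 2-D.
centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(1.0, 2.0), (9.0, 8.0), (0.5, -1.0)]
assignments = list(assign_map(points, centers))
```

Each emitted key is the cluster index and each value is the point itself, so the job's output is the association, one row per point, written back to distributed storage, not returned to the client.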


Antonio

