K-Means and MongoDB


nimrod...@gmail.com

Oct 15, 2014, 10:32:18 AM
to mongod...@googlegroups.com
Hi,
I am new to MongoDB.
Let's say I have a collection of documents and a new document, call it X, that is not in the collection.

1. I want to run the K-Means algorithm on the collection.
2. I want to find the cluster that X is closest to (let's say the metric is cosine similarity or another well-known metric).
3. I want to return all the documents in the cluster found in step 2.
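
To make the three steps concrete, here is a minimal sketch in plain Python rather than in MongoDB itself. It assumes each document carries a numeric vector in a field called `features` (a hypothetical name), and it uses a k-means variant that assigns by cosine similarity and then averages cluster members. All function names, the seed, and the iteration count are illustrative assumptions.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(vector, centroids):
    """Index of the centroid most similar to `vector`."""
    return max(range(len(centroids)),
               key=lambda i: cosine_similarity(vector, centroids[i]))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(docs, k, iterations=20, seed=0):
    """Step 1: cluster the documents; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly chosen documents.
    centroids = [list(d["features"]) for d in rng.sample(docs, k)]
    for _ in range(iterations):
        labels = [nearest(d["features"], centroids) for d in docs]
        for i in range(k):
            members = [d["features"] for d, lab in zip(docs, labels) if lab == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = mean(members)
    # Recompute labels so they match the final centroids.
    labels = [nearest(d["features"], centroids) for d in docs]
    return centroids, labels


def similar_documents(docs, x, k):
    """Steps 2 and 3: find X's cluster and return that cluster's documents."""
    centroids, labels = kmeans(docs, k)
    cluster = nearest(x["features"], centroids)
    return [d for d, lab in zip(docs, labels) if lab == cluster]
```

In this sketch the documents would first be read out of the collection into a list of dicts; whether the clustering itself can run inside the database is the open question.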


Can I do that in MongoDB?

John De Goes

Oct 15, 2014, 12:50:10 PM
to mongod...@googlegroups.com

It's well known that K-means can be done using ordinary SQL. Most of the "pain" of this approach comes from the large number of intermediate tables, which will be less painful with MongoDB.

While MongoDB does not support SQL itself, you can execute SQL on MongoDB using an open source project called SlamEngine, which compiles ordinary SQL down to efficient MongoDB query plans. Note that you will probably have to modify the "classic" approach to doing this with SQL in order to achieve optimal performance.

Regards,

John

César Antonio Pérez Quintana

Oct 15, 2014, 6:17:18 PM
to mongod...@googlegroups.com
I think the SQL approach is not as good as one with MapReduce... Give me a couple of days to figure out how to do this.
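
As a rough illustration of why MapReduce fits, one K-means iteration can be phrased as a map step that emits each point under its nearest centroid and a reduce step that sums the points per centroid. The sketch below shows only that shape in plain Python. It is not a MongoDB implementation (MongoDB's `mapReduce` command takes JavaScript functions), it assumes squared Euclidean distance, and all function names are hypothetical.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_point(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    idx = min(range(len(centroids)),
              key=lambda i: squared_distance(point, centroids[i]))
    return idx, (point, 1)


def reduce_cluster(values):
    """Reduce: sum the vectors and counts emitted for one centroid index."""
    total = [sum(col) for col in zip(*(vec for vec, _ in values))]
    count = sum(n for _, n in values)
    return total, count


def kmeans_iteration(points, centroids):
    """Run one map/reduce pass and return the updated centroids."""
    grouped = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        grouped[key].append(value)
    new_centroids = [list(c) for c in centroids]
    for key, values in grouped.items():
        total, count = reduce_cluster(values)
        new_centroids[key] = [t / count for t in total]
    return new_centroids
```

The pass would be repeated, feeding the returned centroids back in, until they stop moving.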