Hadoop MongoDB connector vs ETL tools (Talend & Pentaho Kettle)

Peter Packer

Jul 14, 2014, 12:12:22 AM
to mongod...@googlegroups.com
Dear all, 

I am planning to do extensive data mining in Mahout on data that is collected in MongoDB.

I am reading http://docs.mongodb.org/ecosystem/use-cases/hadoop/. This doc highlights "moving data first to HDFS, then compute". A quick search gives me the following tools:

 
In the meantime, I found out that I can use the Hadoop MongoDB connector to run Hadoop jobs seamlessly on top of MongoDB without moving gigabytes of data to HDFS.

Could you point out the most popular way to do extensive Hadoop mining on MongoDB data? I really mean in production.
Should I definitely stick with the official Hadoop MongoDB connector?

Thanks very much, 
Peter

Will Berkeley

Jul 17, 2014, 11:51:13 AM
to mongod...@googlegroups.com
Hi Peter. The mongo-hadoop connector does let you run your Hadoop jobs directly against MongoDB. I'd recommend starting this way and considering appropriate sync/dump options once there's a need for it.
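For illustration only, here is a minimal sketch of the settings involved in pointing a job at MongoDB instead of HDFS. The property names (mongo.input.uri, mongo.output.uri, mongo.job.input.format, mongo.job.output.format) and the class names are as I recall them from the mongo-hadoop documentation, so check them against the connector version you use. The host, database, and collection names are hypothetical placeholders.

```python
# Sketch: build the Hadoop configuration properties that point a job at MongoDB.
# Property names are recalled from the mongo-hadoop docs (an assumption to verify);
# the URIs passed in below are hypothetical placeholders.

def mongo_hadoop_properties(input_uri, output_uri):
    """Return the properties a mongo-hadoop job would read from its configuration."""
    return {
        "mongo.input.uri": input_uri,
        "mongo.output.uri": output_uri,
        "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
        "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
    }


def as_generic_options(props):
    """Render the properties as -Dkey=value arguments for the hadoop command line."""
    return ["-D{}={}".format(key, value) for key, value in sorted(props.items())]


props = mongo_hadoop_properties(
    "mongodb://localhost:27017/mydb.events",   # hypothetical input collection
    "mongodb://localhost:27017/mydb.results",  # hypothetical output collection
)
print(" ".join(as_generic_options(props)))
```

The job reads its input splits from the collection named in the input URI, so no export to HDFS is needed to get started.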

-Will

TJ Tang

Jul 19, 2014, 8:08:39 PM
to mongod...@googlegroups.com
If I run a Hadoop job directly against MongoDB, would there be a bottleneck on the Mongo instances serving the data? Consider the scenario where I have 20 Hadoop compute nodes, all accessing data from a 2-shard Mongo cluster in real time.

On Thursday, July 17, 2014 at 11:51:13 PM UTC+8, Will Berkeley wrote:

Will Berkeley

Jul 21, 2014, 12:44:42 PM
to mongod...@googlegroups.com
That's correct: for large enough Hadoop jobs the MongoDB instances can become a bottleneck, in which case you should consider dumping and syncing to HDFS instead of using the mongo-hadoop connector. The point is that using the mongo-hadoop connector is much easier than doing the syncing, so start with the connector and move to syncing to HDFS only if you actually hit a bottleneck.
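As a rough way to judge when that point might come, the sketch below counts how many map tasks would read from each shard at once in the scenario above. The 20 nodes and 2 shards come from this thread; the number of map slots per node is a hypothetical value, and the even spread of splits across shards is an assumption.

```python
# Sketch: concurrent readers per shard for a job reading directly from MongoDB.
# Assumes input splits are spread evenly across shards, which may not hold in practice.

def readers_per_shard(hadoop_nodes, map_slots_per_node, shards):
    """Return the number of map tasks reading from each shard at the same time."""
    return hadoop_nodes * map_slots_per_node / shards


# 20 nodes and 2 shards are from the thread; 4 map slots per node is hypothetical.
print(readers_per_shard(20, 4, 2))
```

If that number is far beyond what a shard can serve alongside its normal workload, that is a hint that dumping or syncing to HDFS is worth the extra effort.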

-Will