MongoDB to Hadoop using Sqoop connectivity

Ashvanth R

Jul 27, 2016, 8:33:56 PM
to mongodb-user

Hi Team,


I could not find much information on how to connect MongoDB to Hadoop using Sqoop. Does Sqoop work with the MongoDB Hadoop connector?

If Sqoop works with the MongoDB Hadoop connector, what should the query look like to pull incremental data?

Thanks & Regards,

Ashvanth Rameshkumar.

Shiva Ramagopal

Jul 28, 2016, 2:23:16 AM
to mongod...@googlegroups.com
Hi,

Sqoop is for connecting an RDBMS to Hadoop.

To connect MongoDB and Hadoop (typically Hive), use the mongo-hadoop connector.

[1] https://docs.mongodb.com/ecosystem/tools/hadoop/
[2] https://github.com/mongodb/mongo-hadoop
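
For the incremental-pull part of the question, the connector can restrict its input with a MongoDB query instead of reading the whole collection. The following is only a minimal sketch: it assumes the property names `mongo.input.uri` and `mongo.input.query` from the connector's configuration reference, and it assumes the documents carry a timestamp field, here called `updatedAt` (a hypothetical name, as are the database and collection). Whether your connector version accepts the `$date` extended-JSON form should be checked against its documentation.

```python
import json
from datetime import datetime, timezone


def incremental_query(field, last_seen):
    """Return a MongoDB query (as a JSON string) that matches documents
    whose `field` is newer than `last_seen` (a timezone-aware datetime)."""
    # Extended JSON writes dates as {"$date": "<ISO-8601 string>"}.
    # The timestamp is truncated to whole seconds here.
    stamp = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return json.dumps({field: {"$gt": {"$date": stamp}}})


def job_properties(uri, field, last_seen):
    """Build the job properties for one incremental run."""
    return {
        "mongo.input.uri": uri,
        "mongo.input.query": incremental_query(field, last_seen),
    }


# Hypothetical values: database `mydb`, collection `events`, field `updatedAt`.
props = job_properties(
    "mongodb://localhost:27017/mydb.events",
    "updatedAt",
    datetime(2016, 7, 27, tzinfo=timezone.utc),
)
for key, value in props.items():
    print(f"{key}={value}")
```

Each run would store the newest `updatedAt` value it saw and pass it as `last_seen` to the next run. This only works if the field is set on every insert and update, and it does not capture deletes.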

Also make sure that the versions of Hadoop/Hive/Pig/Spark, MongoDB, and the drivers you use are compatible with each other.

From [2] above:

Version Compatibility

These are the minimum versions tested with the Hadoop connector. Earlier versions may work, but haven't been tested.

  • Hadoop: 2.4
  • Hive: 1.1
  • Pig: 0.11
  • Spark: 1.4
  • MongoDB: 2.2

Dependencies

You must have at least version 3.0.0 of the MongoDB Java Driver installed in order to use the Hadoop connector.
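
A quick way to compare a cluster against these minimums is sketched below. It is only an illustration: the `installed` inventory is made up, and the parser assumes plain dotted numeric versions (a vendor suffix such as a distribution tag would need to be stripped first). Versions are compared as integer tuples because comparing them as strings or floats would rank 0.9 above 0.11.

```python
def parse_version(text, width=3):
    """Turn '0.11' or '2.7.3' into a zero-padded tuple of ints, e.g. (0, 11, 0)."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Minimum versions as quoted from the connector README above.
MINIMUMS = {
    "Hadoop": "2.4",
    "Hive": "1.1",
    "Pig": "0.11",
    "Spark": "1.4",
    "MongoDB": "2.2",
    "Java driver": "3.0.0",
}


def below_minimum(installed):
    """Return the names of components that are older than the tested minimum."""
    return sorted(
        name
        for name, version in installed.items()
        if name in MINIMUMS and parse_version(version) < parse_version(MINIMUMS[name])
    )


# Hypothetical cluster inventory, for illustration only.
installed = {
    "Hadoop": "2.7.2",
    "Hive": "1.2.1",
    "Pig": "0.9",
    "Spark": "1.6.2",
    "MongoDB": "3.2",
    "Java driver": "3.2.2",
}
print(below_minimum(installed))
```

With this made-up inventory only Pig is flagged, since 0.9 is older than 0.11.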

Have fun,

Shiv

Ashvanth R

Jul 28, 2016, 11:47:09 AM
to mongodb-user
Hi,

If I am using the MongoDB Hadoop connector, what will the format of the data be at the output?

Example: I am pulling BSON data from MongoDB into Hadoop. Once the data reaches the destination, what will its format/type be?

Wan Bachtiar

Jul 29, 2016, 1:04:42 AM
to mongodb-user

> I am pulling BSON data from MongoDB into Hadoop. Once the data reaches the destination, what will its format/type be?

Hi Ashvanth,

The MongoDB Connector for Hadoop makes it easy to use MongoDB databases, or MongoDB backup files in bson format, as the input source or output destination for Hadoop Map/Reduce jobs.

If you are reading MongoDB backup files in BSON format (i.e. the output of mongodump), see Using BSON Files for usage and examples. If you are using a MongoDB database as the input source or output destination, see the Configuration Reference for more information.

As for the input format, it depends on what you are trying to do. There are a number of input formats in the com.mongodb.hadoop package. Check out the MongoDB Hadoop Connector Job Examples for example Map/Reduce jobs.
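
The choice between the two kinds of input can be sketched as a small helper. This is a hedged illustration, not the connector's own API: the helper `input_settings` and the example locations are hypothetical, while the class names `com.mongodb.hadoop.MongoInputFormat` and `com.mongodb.hadoop.BSONFileInputFormat` are the connector's input formats for a live collection and for mongodump output respectively, as far as I know. Check them against the connector version you use.

```python
def input_settings(source, location):
    """Pick the connector input format for a given kind of source.

    source:   "mongodb" for a live collection, "bson" for mongodump output.
    location: a MongoDB URI, or the path of a .bson file.
    """
    if source == "mongodb":
        return {
            "input_format": "com.mongodb.hadoop.MongoInputFormat",
            # A live collection is addressed through a job property.
            "conf": {"mongo.input.uri": location},
            "path": None,
        }
    if source == "bson":
        return {
            "input_format": "com.mongodb.hadoop.BSONFileInputFormat",
            "conf": {},
            # A dump file is addressed like any other Hadoop input path.
            "path": location,
        }
    raise ValueError(f"unknown source: {source!r}")


# Hypothetical locations, for illustration only.
live = input_settings("mongodb", "mongodb://localhost:27017/mydb.events")
dump = input_settings("bson", "hdfs:///dumps/events.bson")
print(live["input_format"])
print(dump["input_format"])
```

In a Map/Reduce job these values would go into the job configuration. From PySpark, they would typically be handed to `SparkContext.newAPIHadoopRDD` (live collection) or `SparkContext.newAPIHadoopFile` (BSON file), along with key and value classes appropriate to the connector version.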

You may also find these resources useful:

Regards,

Wan.

Shiva Ramagopal

Jul 29, 2016, 1:44:56 AM
to mongod...@googlegroups.com
Hi,

If the destination is Hive (which is typical), the output data format will be the storage format specified for the corresponding Hive table (the 'STORED AS <format name>' clause in the table's CREATE TABLE statement). The Hive SerDes take care of this automatically. If you want to access the table as a file, you can find it by default in the HDFS directory /user/hive/warehouse/<table name>. However, there is rarely a need to do this.

What's your use-case?

Cheers,
Shiv