Could anyone please tell me how to load Pig data into MongoDB?
Hi Tejashwini,
Please see mongo-hadoop: Pig usage to get started with MongoDB and Apache Pig integration.
For example to insert directly into a MongoDB collection from Pig:
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name' USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
If I need to include the JARs, where do I include them?
You can use the REGISTER statement in your Pig script to include the JARs (mongo-hadoop-core, mongo-hadoop-pig, and the MongoDB Java driver). An example:
REGISTER /home/ubuntu/mongo-java-driver-3.3.0.jar;
REGISTER /home/ubuntu/mongo-hadoop-pig-2.0.1.jar;
REGISTER /home/ubuntu/mongo-hadoop-core-2.0.1.jar;
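Putting the REGISTER statements and the STORE together, a minimal end-to-end script might look like the sketch below. The input path, field names, and schema are placeholders for illustration only; adjust them to your data.

```pig
-- Register the mongo-hadoop connector JARs and the MongoDB Java driver
REGISTER /home/ubuntu/mongo-java-driver-3.3.0.jar;
REGISTER /home/ubuntu/mongo-hadoop-pig-2.0.1.jar;
REGISTER /home/ubuntu/mongo-hadoop-core-2.0.1.jar;

-- Load some data (hypothetical path and schema)
data = LOAD '/user/ubuntu/input.tsv' AS (first:chararray, last:chararray, age:int);

-- Insert each tuple as a document into db_name.coll_name
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name'
    USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
```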
One way to get the JARs is from the Maven central repository, e.g. maven: mongo-java-driver, maven: mongo-hadoop.
You may also find the MongoDB Pig presentation a useful reference.
Regards,
Wan
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
Each time I run the Pig script, the data is inserted twice into MongoDB.
Hi Tejashwini,
I've tested loading data from Apache Pig into MongoDB using the JAR versions below:
I couldn't replicate the double loading of data into MongoDB. If you are using a previous version of the mongo-hadoop connector, you may be encountering the issue described in HADOOP-26.
You can also try turning off speculative execution by setting both mapred.map.speculative and mapred.reduce.speculative to false in grunt.
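For example, using the property names as given above, these can be set from the grunt shell or at the top of a Pig script (note: exact property names vary by Hadoop version; on Hadoop 2.x they are typically mapreduce.map.speculative and mapreduce.reduce.speculative):

```pig
-- Disable speculative execution so the store step is not attempted twice
SET mapred.map.speculative false;
SET mapred.reduce.speculative false;
```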
If you have more questions related to Hadoop/Pig speculative execution, you should post a question on StackOverflow to reach wider audience.
When I run:
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name' USING com.mongodb.hadoop.pig.MongoUpdateStorage('', '');
it throws: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
You need to specify 'query' and 'update' in the form of valid JSON to use MongoUpdateStorage correctly. Please refer to mongo-connector: updating a MongoDB Collection to see an example and usage guide for MongoUpdateStorage.
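As a sketch of the idea (the field names and schema here are hypothetical, and the '$'-prefixed placeholders inside the JSON strings refer to fields of the Pig relation):

```pig
-- Update documents matched on 'first'/'last', setting 'age' from the Pig data
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name'
    USING com.mongodb.hadoop.pig.MongoUpdateStorage(
        '{first: "\$first", last: "\$last"}',       -- query: match criteria
        '{\$set: {age: "\$age"}}',                  -- update: modification to apply
        'first:chararray, last:chararray, age:int'  -- schema of the Pig relation
    );
```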
If you have further questions relating to the mongo-hadoop connector, could you provide the following:
Regards,
Wan.