Could anyone please tell me how to load Pig data into MongoDB?
Hi Tejashwini,
Please see mongo-hadoop: Pig usage to get started with MongoDB and Apache Pig integration.
For example to insert directly into a MongoDB collection from Pig:
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name' USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
If I need to include the JARs, where do I include them?
You can use the REGISTER statement in your Pig script to include the JARs (mongo-hadoop-core, mongo-hadoop-pig, and the MongoDB Java driver). An example:
REGISTER /home/ubuntu/mongo-java-driver-3.3.0.jar;
REGISTER /home/ubuntu/mongo-hadoop-pig-2.0.1.jar;
REGISTER /home/ubuntu/mongo-hadoop-core-2.0.1.jar;
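Putting the REGISTER statements and the STORE together, a minimal end-to-end script might look like the sketch below. The input path, field names, and schema are placeholders for illustration only; adjust them to your data.

```pig
-- Register the mongo-hadoop connector JARs and the MongoDB Java driver
REGISTER /home/ubuntu/mongo-java-driver-3.3.0.jar;
REGISTER /home/ubuntu/mongo-hadoop-pig-2.0.1.jar;
REGISTER /home/ubuntu/mongo-hadoop-core-2.0.1.jar;

-- Load some data (hypothetical path and schema)
data = LOAD '/user/ubuntu/input.tsv' AS (first:chararray, last:chararray, age:int);

-- Insert each tuple as a document into db_name.coll_name
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name'
    USING com.mongodb.hadoop.pig.MongoInsertStorage('', '');
```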
One way to get the JARs is from the Maven central repository, e.g. maven: mongo-java-driver, maven: mongo-hadoop.
You may also find the MongoDB Pig presentation a useful reference.
Regards,
Wan
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
Each time I run the Pig script, the data is inserted twice into MongoDB.
Hi Tejashwini,
I've tested loading data from Apache Pig into MongoDB using the JAR versions below:
I couldn't replicate the double loading of data into MongoDB. If you are using a previous version of the mongo-hadoop connector, you may be encountering the issue described in HADOOP-26.
You can also try turning off speculative execution by setting both mapred.map.speculative and mapred.reduce.speculative to false in grunt.
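For example, using the property names as given above, these can be set from the grunt shell or at the top of a Pig script (note: exact property names vary by Hadoop version; on Hadoop 2.x they are typically mapreduce.map.speculative and mapreduce.reduce.speculative):

```pig
-- Disable speculative execution so the store step is not attempted twice
SET mapred.map.speculative false;
SET mapred.reduce.speculative false;
```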
If you have more questions related to Hadoop/Pig speculative execution, you should post a question on StackOverflow to reach wider audience.
When I run:
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name' USING com.mongodb.hadoop.pig.MongoUpdateStorage('', '');
it throws: ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse:
You need to specify 'query' and 'update' in the form of valid JSON to use MongoUpdateStorage correctly. Please refer to mongo-connector: updating a MongoDB Collection to see an example and usage guide for MongoUpdateStorage.
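As a sketch of the idea (the field names and schema here are hypothetical, and the '$'-prefixed placeholders inside the JSON strings refer to fields of the Pig relation):

```pig
-- Update documents matched on 'first'/'last', setting 'age' from the Pig data
STORE data INTO 'mongodb://localhost:27017/db_name.coll_name'
    USING com.mongodb.hadoop.pig.MongoUpdateStorage(
        '{first: "\$first", last: "\$last"}',       -- query: match criteria
        '{\$set: {age: "\$age"}}',                  -- update: modification to apply
        'first:chararray, last:chararray, age:int'  -- schema of the Pig relation
    );
```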
If you have further questions relating to the mongo-hadoop connector, could you provide the following:
Regards,
Wan.