MongoDB connector to Hadoop

hadar bey

Sep 10, 2017, 4:40:06 PM
to mongodb-user
Hi,
I am a beginner trying to connect MongoDB to Hadoop.
I am following this tutorial (I saw the same steps in other guides), but I got the following error:

super29@lab11-rd29-05:/usr/local/hadoop$ hadoop jar ImportWeblogsFromMongo.jar

Exception in thread "main" java.io.IOException: Error opening job jar: ImportWeblogsFromMongo.jar
at org.apache.hadoop.util.RunJar.run(RunJar.java:173)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:221)
at java.util.zip.ZipFile.<init>(ZipFile.java:151)
at java.util.jar.JarFile.<init>(JarFile.java:154)
at java.util.jar.JarFile.<init>(JarFile.java:91)
at org.apache.hadoop.util.RunJar.run(RunJar.java:171)
... 1 more


How can I fix this?

Wan Bachtiar

Oct 9, 2017, 9:08:52 PM
to mongodb-user

Hi Hadar,

It’s been a while since you posted this question. Have you found an answer?

Based on the stack trace that you posted, it’s likely that your hadoop command either could not find ImportWeblogsFromMongo.jar or could not open it.
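
To narrow this down, a small check like the one below can help tell a missing file apart from an unreadable or corrupt one. It is only a sketch using the standard java.util.zip API. The class name JarCheck and the method diagnose are made up for illustration and are not part of Hadoop or mongo-hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipFile;

final class JarCheck {
    /** Returns a short diagnosis of why a jar might fail to open. */
    static String diagnose(Path jar) {
        if (!Files.exists(jar)) {
            return "missing";
        }
        if (!Files.isRegularFile(jar)) {
            return "not a regular file";
        }
        if (!Files.isReadable(jar)) {
            return "not readable";
        }
        // A jar is a zip archive, so opening it as a ZipFile exercises the
        // same kind of check that fails in the stack trace above.
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            return "ok (" + zip.size() + " entries)";
        } catch (IOException e) {
            return "not a valid zip archive: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Defaults to the jar name from the original post (relative to the current directory).
        Path jar = Paths.get(args.length > 0 ? args[0] : "ImportWeblogsFromMongo.jar");
        System.out.println(jar.toAbsolutePath() + ": " + diagnose(jar));
    }
}
```

Run it from the same directory in which you ran the hadoop jar command, so that the relative path resolves the same way.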

It’s also worth noting that if you are following the tutorial link you posted, you’re using an older version of the MongoDB Hadoop Connector. The current release version of mongo-hadoop is v2.0.2.

I would also recommend reviewing the wiki page MapReduce Usage on the mongo-hadoop project page for more information and guidance.

If you still have further questions on this, please also include:

  • The Hadoop version that you’re using
  • The MongoDB Hadoop Connector version that you’re using
  • A code snippet, along with the content of your pom.xml file
  • Example MongoDB input documents

Regards,
Wan.

Akshesh Doshi

May 28, 2018, 8:00:18 AM
to mongodb-user
Hi Wan

Thank you for your help in the previous mail.


I am trying the MongoDB-Hadoop connector to store my data in HDFS and query it using MongoDB.

I followed the steps given in the MongoDB documentation & this getting started tutorial successfully. I also see the data in MongoDB when I run db.yield_historical.in.find() as shown in the article. But when I run hdfs dfs -ls / to check if the data has been stored in Hadoop HDFS I am NOT able to find any new data.

Is there anything that I am doing wrong here, or anything that I've missed? I would really appreciate it if you could help me, as I seem to be stuck and there seems to be no step-by-step guide on the web on how to use or configure the connector. I would really like to know if there's any article I can refer to (else I'll write one myself if I'm able to solve this riddle).

If it helps, I am using Hadoop 2.7.3.2.6.3.0-235 with MongoDB v3.6.5

Wan Bachtiar

May 31, 2018, 2:15:05 AM
to mongodb-user

> I also see the data in MongoDB when I run db.yield_historical.in.find() as shown in the article. But when I run hdfs dfs -ls / to check if the data has been stored in Hadoop HDFS I am NOT able to find any new data.

Hi Akshesh,

The example listed on Getting Started with Hadoop is showcasing mongo-hadoop: Treasury Yield Example. The first task of the example is importing the provided yield_historical_in.json into MongoDB. This is why you can see data when you run db.yield_historical.in.find().

> I am trying the MongoDB-Hadoop connector to store my data in HDFS and query it using MongoDB.

The MongoDB Connector for Hadoop is a library which allows MongoDB to be used as an input source or output destination for Hadoop MapReduce tasks. It is not meant for writing data to HDFS. You can use the class org.apache.hadoop.fs.FileSystem to create a stream and write to HDFS.
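
As a rough illustration of the FileSystem approach, here is a minimal sketch. It assumes the Hadoop client libraries (for example hadoop-client) are on the classpath and that fs.defaultFS in your core-site.xml points at your cluster. The class name HdfsWriteSketch and the target path /tmp/example.txt are placeholders chosen for this example, not something taken from the tutorial.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class HdfsWriteSketch {
    public static void main(String[] args) throws IOException {
        // Picks up core-site.xml / hdfs-site.xml from the classpath if present.
        // Without them, fs.defaultFS falls back to the local file system.
        Configuration conf = new Configuration();

        // Placeholder path; pass a different one as the first argument.
        Path target = new Path(args.length > 0 ? args[0] : "/tmp/example.txt");

        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(target, true)) { // true = overwrite
            out.write("hello from HdfsWriteSketch\n".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Wrote " + target);
    }
}
```

If the configuration does point at your cluster, hdfs dfs -ls /tmp should list the new file afterwards.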

If you have further questions relating to MongoDB, please open a new discussion thread with the following information:

  • The MongoDB Hadoop Connector version that you’re using
  • A code snippet
  • Any error messages that you’re seeing

If you have further questions relating to Hadoop, however, please post a question on StackOverflow: Hadoop to reach a wider audience with Hadoop expertise.

Regards,
Wan.

Akshesh Doshi

Jun 3, 2018, 6:02:20 AM
to mongodb-user
Thank you for your time and insights, Wan! I'll look further into this and see if this is the right solution to my problem.

Regards
Akshesh

Akshesh Doshi

Jun 7, 2018, 8:00:37 AM
to mongodb-user