MongoDB Hive Handler - Mongo URI with collection name and replica set not working


Rohit Garg

Aug 24, 2015, 4:41:10 PM
to mongodb-user

I am using the Hive MongoDB handler: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage

I am using a table like the following to dump data from Hive to Mongo:

 CREATE EXTERNAL TABLE IF NOT EXISTS t1 (
            col1 string,
            col2 string
    ) STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
    TBLPROPERTIES ( "mongo.uri" = "mongodb://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_HOST}/{MONGO_DBNAME}.{COL1}");

This works fine. But now I have multiple servers in a replica set, and I am not able to create a Mongo URI that contains both the replica set and the collection name. Can someone please help?

Here is the new Mongo URI I tried, which doesn't work:

 TBLPROPERTIES ("mongo.uri" = "mongodb://{MONGO_USERNAME}:{MONGO_PASSWORD}@hostname1,hostname2,hostname3/{MONGO_DBNAME}.{COL1}?replicaSet=test_setName");

Can someone please tell me what's wrong with it or how to fix it?

Luke Lovett

Aug 25, 2015, 1:13:21 PM
to mongodb-user
Hey RG,

Can you describe in more detail what's not working about your "CREATE TABLE" statement? Are there any Exceptions printed to the Hive logs when you issue the statement? What version of the Hadoop connector are you using?

Thanks,
Luke

RG

Aug 25, 2015, 3:23:17 PM
to mongodb-user
Hi Luke,

Thanks for the reply.

Earlier I was using mongo-hadoop-hive-1.3.2.jar, mongo-hadoop-core-1.3.2.jar, and mongo-java-driver-3.0.2.jar.

With them, the error I was getting was:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Unable to connect to MongoDB Output Collection.
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:473)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
	at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
	at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
	at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
	at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
	... 9 more

If I changed the Mongo URI to have just one hostname, it worked. But I want to connect to the whole replica set.

So, something like this works: TBLPROPERTIES ("mongo.uri" = "mongodb://{MONGO_USERNAME}:{MONGO_PASSWORD}@hostname1/{MONGO_DBNAME}.{COL1}");

I just upgraded the mongo-hadoop jars to mongo-hadoop-hive-1.4.0.jar and mongo-hadoop-core-1.4.0.jar.

Now I am getting a different error, which I am investigating.

The error is:

Exception in thread "main" java.lang.NoClassDefFoundError: org/bson/BSONObject
	at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:132)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:340)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
	at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:281)
	at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:631)
	at org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:189)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1018)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1328)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.getTable(BaseSemanticAnalyzer.java:1312)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeDropTable(DDLSemanticAnalyzer.java:804)
	at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:280)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:422)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:975)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1040)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:275)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:227)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:430)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:366)
	at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:463)
	at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:479)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:697)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: org.bson.BSONObject
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Luke Lovett

Aug 25, 2015, 4:05:58 PM
to mongodb-user
I think you probably hit https://jira.mongodb.org/browse/HADOOP-179 when you were using 1.3.2 (bug that wouldn't let you use URIs that contained commas or spaces).

Now it looks like the Java driver isn't properly installed on Hive's CLASSPATH. Make sure that it's somewhere that Hive can find it (such as in hive/lib, or anywhere else configured on "hive.aux.jars.path"). Also make sure that these jars are on Hadoop's CLASSPATH (such as in $HADOOP_HOME/share/hadoop/common).
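As a rough sketch of the jar placement described above (paths, versions, and directory layout here are illustrative assumptions, not taken from this thread; adjust to your installation):

```shell
# Copy the connector and driver jars somewhere Hive can find them,
# e.g. Hive's own lib directory (path is an assumption):
cp mongo-hadoop-core-1.4.0.jar \
   mongo-hadoop-hive-1.4.0.jar \
   mongo-java-driver-3.0.2.jar \
   "$HIVE_HOME/lib/"

# Or point Hive at them explicitly via hive.aux.jars.path
# (hypothetical /opt/jars location):
hive --hiveconf hive.aux.jars.path=file:///opt/jars/mongo-hadoop-core-1.4.0.jar,file:///opt/jars/mongo-hadoop-hive-1.4.0.jar,file:///opt/jars/mongo-java-driver-3.0.2.jar

# Also put them on Hadoop's CLASSPATH so MapReduce tasks can load them:
cp mongo-*.jar "$HADOOP_HOME/share/hadoop/common/"
```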

RG

Aug 25, 2015, 4:06:45 PM
to mongodb-user
I am getting this error now:

Logging initialized using configuration in jar:file:/home/hadoop/.versions/hive-0.13.1-amzn-2/lib/hive-common-0.13.1-amzn-2.jar!/hive-log4j.properties
converting to local s3://path/mongo-hadoop-core-1.4.0.jar
Added /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-core-1.4.0.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-core-1.4.0.jar
converting to local s3://path/mongo-hadoop-hive-1.4.0.jar
Added /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-hive-1.4.0.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-hive-1.4.0.jar
converting to local s3://path/mongodb-driver-3.1.0-20150824.155105-65.jar
Added /mnt/var/lib/hive/downloaded_resources/mongodb-driver-3.1.0-20150824.155105-65.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongodb-driver-3.1.0-20150824.155105-65.jar
OK
Time taken: 1.573 seconds
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/bson/BSONObject
/mnt/var/lib/hadoop/steps/abc-1213/./hive-script:617: Error executing cmd: /usr/share/aws/emr/scripts/hive-script "--base-path" "s3n://us-east-1.elasticmapreduce/libs/hive/" "--hive-versions" "latest" "--run-hive-script" "--args" "-f" "s3://path/tmp/c57291b3-ff48-464f-a0a2-30137128c04a" "--hiveconf" "hive.exec.compress.output=false" "--hiveconf" "hive.exec.dynamic.partition=true" "--hiveconf" "mapred.job.name=tableName(jobflow_id=j-2qweDAMD, force=False, process_date=2015-07-16)" Command exiting with ret '1'

Not sure how to fix it.

Luke Lovett

Aug 25, 2015, 4:10:37 PM
to mongodb-user
Ah, I think I see what the issue is. You need to use the "uber jar" for the Java driver. Check out the bottom section on this page: http://mongodb.github.io/mongo-java-driver/3.0/driver/getting-started/installation-guide/. Look for the jar called "mongo-java-driver.jar," not "mongodb-driver.jar".
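A quick way to check which jar actually bundles the missing class (a diagnostic sketch, assuming the jar file is in the current directory):

```shell
# The mongo-java-driver "uber jar" bundles the BSON classes;
# the slimmer mongodb-driver jar leaves them in a separate bson jar.
jar tf mongo-java-driver-3.0.2.jar | grep 'org/bson/BSONObject.class'
# No output means this jar does not contain the class Hive is failing to load.
```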

RG

Aug 25, 2015, 4:38:11 PM
to mongodb-user
Thanks Luke for the reply. I changed it to use mongo-java-driver.jar, but it's still the same error:

Logging initialized using configuration in jar:file:/home/hadoop/.versions/hive-0.13.1-amzn-2/lib/hive-common-0.13.1-amzn-2.jar!/hive-log4j.properties
converting to local s3://path/mongo-hadoop-core-1.4.0.jar
Added /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-core-1.4.0.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-core-1.4.0.jar
converting to local s3://path/mongo-hadoop-hive-1.4.0.jar
Added /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-hive-1.4.0.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongo-hadoop-hive-1.4.0.jar
converting to local s3://path/mongo-java-driver-3.0.2.jar
Added /mnt/var/lib/hive/downloaded_resources/mongo-java-driver-3.0.2.jar to class path
Added resource: /mnt/var/lib/hive/downloaded_resources/mongo-java-driver-3.0.2.jar
OK
Time taken: 1.445 seconds
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/bson/BSONObject
/mnt/var/lib/hadoop/steps/s-7dasf3e/./hive-script:617: Error executing cmd: /usr/share/aws/emr/scripts/hive-script "--base-path" "s3n://us-east-1.elasticmapreduce/libs/hive/" "--hive-versions" "latest" "--run-hive-script" "--args" "-f" "s3://path/tmp/29f6c81a-5abc-4130-8d4a-4460f985a032" "--hiveconf" "hive.exec.compress.output=false" "--hiveconf" "hive.exec.dynamic.partition=true" "--hiveconf" "mapred.job.name=tableName(jobflow_id=j-adsafc, force=False, process_date=2015-07-16)" "--hiveconf" "mapred.reduce.taskCommand exiting with ret '1'

Not sure what's going on. The same code was working with 1.3.2 and a single hostname.

Luke Lovett

Aug 26, 2015, 7:21:41 PM
to mongodb-user
A ClassNotFoundException shouldn't have anything to do with version 1.3.2 vs version 1.4.0 of the connector, especially since the class not found is part of the Java driver. Perhaps this different behavior has to do with testing this out with different queries? Is this error happening with simple queries (no filter or computation) like SELECT * FROM table_name ? Or is this only happening when triggering MapReduce jobs? If the latter case, these jars are probably still missing from Hadoop's CLASSPATH. Add them to the HADOOP_CLASSPATH environment variable.
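As a sketch of what that environment setup might look like when launching Hive from a script (the jar locations and script name below are hypothetical):

```shell
# Make the connector and driver jars visible to Hadoop-side tasks as well,
# not just to the Hive CLI session, before launching Hive:
export HADOOP_CLASSPATH="/opt/jars/mongo-hadoop-core-1.4.0.jar:/opt/jars/mongo-hadoop-hive-1.4.0.jar:/opt/jars/mongo-java-driver-3.0.2.jar:${HADOOP_CLASSPATH}"

hive -f copy_to_mongo.hql   # hypothetical script name
```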

Are you launching Hive from a script of some kind? Is anything in those scripts setting Hive's or Hadoop's classpath unintentionally?

RG

Aug 27, 2015, 6:02:00 PM
to mongodb-user
Here is exactly what the copy-to-Mongo task looks like:


        add jar s3://path/mongo-hadoop-core-1.3.2.jar; 
        add jar s3://path/mongo-hadoop-hive-1.3.2.jar; 
        add jar s3://path/mongo-java-driver-3.0.2.jar; 
        DROP TABLE IF EXISTS MongTable;  
        CREATE EXTERNAL TABLE IF NOT EXISTS MongTable (
                date string,
                token string,
                uniques INT
                
        ) STORED BY "com.mongodb.hadoop.hive.MongoStorageHandler"
        TBLPROPERTIES ("mongo.uri" = "mongodb://{MONGO_USERNAME}:{MONGO_PASSWORD}@{MONGO_HOST}/{MONGO_DBNAME}.{MONGO_COLLECTION_NAME}?replicaSet={MONGO_REPLICA_SET}");

    
        CREATE EXTERNAL TABLE IF NOT EXISTS sourceTable (
                date string,
                token string,
                uniques INT
        ) PARTITIONED BY (day STRING)
        ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        LOCATION 's3://path/{MONGO_COLLECTION_NAME}/';
        ALTER TABLE sourceTable ADD IF NOT EXISTS PARTITION (day='{DATE}') LOCATION 's3://path/{MONGO_COLLECTION_NAME}/day={DATE}';
        FROM sourceTable
        INSERT OVERWRITE TABLE MongTable  
        SELECT date,token,uniques
        where day='{DATE}'; 
        """.format(**self.params())

In the above code, I use the 1.3.2 jars and it works perfectly fine.

If I change the code slightly just to use the new jars, i.e. 1.4.0, it throws the class-not-found error I posted before: Caused by: java.lang.ClassNotFoundException: org.bson.BSONObject

HADOOP_CLASSPATH is set correctly, and the 1.3.2 jars work just fine with the same exact code. I have these Hive jobs handled by Luigi (https://github.com/spotify/luigi); a task written in Python launches the cluster, and then these Hive tasks are run on it.

Not sure what's going wrong. It's a simple jar upgrade.