Check that JARs for hive datasets are on the classpath (and they are)

Dominik Hübner

Jul 16, 2015, 2:29:50 AM
to cdk...@cloudera.org
I am not able to load Hive datasets and get the exception shown below. I printed the classpath of the running application and the Hive libraries are there.
The app was built with kite-data-hive, kite-data-mapreduce, kite-data-core, and hadoop-client. I have no issues working with HDFS datasets.
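For reference, this is roughly how I checked, assuming the yarn wrapper applies YARN_USER_CLASSPATH to the classpath subcommand as well (the grep is just illustrative):

# print the effective classpath one entry per line and look for the Hive jars
YARN_USER_CLASSPATH=/usr/hdp/2.2.4.2-2/hive/lib/*:target/logging-1.1.0.jar yarn classpath | tr ':' '\n' | grep -i hive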

I run the following:
HIVE_HOME=/usr/hdp/2.2.4.2-2/hive/ HCAT_HOME=/usr/hdp/2.2.4.2-2/hive-hcatalog/ YARN_USER_CLASSPATH=/usr/hdp/2.2.4.2-2/hive/lib/*:target/logging-1.1.0.jar yarn jar target/logging-1.1.0.jar org.kitesdk.examples.logging.CreateDataset
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.4.2-2/hive/lib/hive-jdbc-0.14.0.2.2.4.2-2-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/kite-examples/logging/target/logging-1.1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI pattern: dataset:hive:/tmp/data/default/events
Check that JARs for hive datasets are on the classpath
at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:128)
at org.kitesdk.data.Datasets.create(Datasets.java:228)
at org.kitesdk.data.Datasets.create(Datasets.java:307)
at org.kitesdk.data.Datasets.create(Datasets.java:335)
at org.kitesdk.examples.logging.CreateDataset.run(CreateDataset.java:36)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.kitesdk.examples.logging.CreateDataset.main(CreateDataset.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)


I am sure that something is missing, as the Kite CLI tool works perfectly with both Hive and HDFS.

Joey Echeverria

Jul 16, 2015, 11:37:20 AM
to Dominik Hübner, cdk...@cloudera.org
If you add or edit a log4j.properties file on your classpath with the following line:

log4j.logger.org.kitesdk.data.spi.Registration = DEBUG

you'll get a more detailed reason why the Hive URI wasn't registered.
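If you don't already have one, a minimal log4j.properties might look something like this (the console appender setup is just an example; only the last line matters here):

log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c: %m%n
log4j.logger.org.kitesdk.data.spi.Registration=DEBUG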

-Joey



--
Joey Echeverria
Director of Engineering

Ryan Blue

Jul 16, 2015, 6:17:03 PM
to Dominik Hübner, cdk...@cloudera.org
The CLI probably works because its startup script creates the classpath
for you. My guess is that this example doesn't work because that isn't
happening and you're missing the Hive dependencies, like hive-exec and
hive-metastore.

You should be able to move past this by updating the classpath with the
information you get from Joey's suggestion. I think we typically
recommend running this example from Maven, though; that way Maven
populates the classpath for you. Another way is to run `mvn
dependency:copy-dependencies` and use the target/dependency/ directory
as your classpath.
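A sketch of that second approach, reusing the jar name from your first
message (the exact invocation may vary):

# copy all compile/runtime dependencies into target/dependency/
mvn package dependency:copy-dependencies
# put them (plus the app jar) on the yarn classpath and rerun
YARN_USER_CLASSPATH="target/dependency/*:target/logging-1.1.0.jar" yarn jar target/logging-1.1.0.jar org.kitesdk.examples.logging.CreateDataset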

Alternatively, you should be able to do the dataset setup with the CLI
instead of this program.
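The exact invocation may differ, but something like (the schema file
name is illustrative):

kite-dataset create dataset:hive:/tmp/data/default/events --schema event.avsc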

rb
Ryan Blue
Software Engineer
Cloudera, Inc.

Dominik Hübner

Jul 16, 2015, 7:31:30 PM
to cdk...@cloudera.org, dominikhue...@gmail.com
Thanks a lot for all the help!
I finally got it running by checking how the classpath was set up by the CLI tool.
Using the same classpath, my little test application can access the Hive datasets.
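In case it helps others, one way to see the java invocation (and the classpath) that a wrapper script builds is to trace it; the command and dataset name here are just for illustration:

bash -x kite-dataset info events 2>&1 | grep java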

Damian Smith

Apr 14, 2016, 4:44:42 PM
to CDK Development, dominikhue...@gmail.com
Hi Dominik -
Could you post what the final classpath was?
We are evaluating this tool here at Cigna and are running into the same issue.
We have set the HIVE_HOME and HCAT_HOME env vars and included all the Hive jars in our spark-submit command, and still no joy.
I've downloaded the sources jar for the CLI tool and am looking through it, but I can't see which Hive jars we're missing.

export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog

/opt/app/dev/spark-1.6.1/bin/spark-submit --verbose \
  --jars $LIB/hive-jdbc-1.1.0-cdh5.5.2.jar,$LIB/hive-common-1.1.0-cdh5.5.2.jar,$LIB/hive-service-1.1.0-cdh5.5.2.jar,$LIB/hive-serde-1.1.0-cdh5.5.2.jar,$LIB/hive-metastore-1.1.0-cdh5.5.2.jar,$LIB/hive-shims-0.23-1.1.0-cdh5.5.2.jar,$LIB/hive-shims-1.1.0-cdh5.5.2.jar,$LIB/hive-exec-1.1.0-cdh5.5.2.jar,$LIB/hive-shims-common-1.1.0-cdh5.5.2.jar,$LIB/hive-shims-scheduler-1.1.0-cdh5.5.2.jar,$LIB/commons-logging-1.2.jar,$LIB/kite-data-core-1.0.0.jar,$LIB/kite-data-mapreduce-1.1.0.jar,$LIB/kite-hadoop-compatibility-1.0.0.jar,$LIB/metrics-core-2.2.0.jar,$LIB/kafka-clients-0.8.2.1.jar,$LIB/kafka_2.10-0.8.2.1.jar,$LIB/zkclient-0.3.jar,$LIB/datanucleus-core-3.2.2.jar,$LIB/datanucleus-api-jdo-3.2.1.jar,$LIB/datanucleus-rdbms-3.2.1.jar,$LIB/hbase-common-1.0.0-cdh5.5.2.jar,$LIB/hbase-client-1.0.0-cdh5.5.2.jar,$LIB/hbase-protocol-1.0.0-cdh5.5.2.jar,$LIB/htrace-core4-4.0.1-incubating.jar,$LIB/avro-tools-1.7.6-cdh5.5.2.jar,$LIB/hbase-annotations-1.0.0-cdh5.5.2.jar,$LIB/accumulo-core-1.6.0.jar,$LIB/hbase-server-1.0.0-cdh5.5.2.jar,$LIB/avro-mapred-1.7.6-cdh5.5.2-hadoop2.jar,$LIB/htrace-core-3.2.0-incubating.jar \
  --queue g_hadoop_d_developers \
  --master yarn-cluster \
  --files $FILES \
  --conf "spark.driver.extraJavaOptions=-XX:MaxPermSize=4G -XX:+UseConcMarkSweepGC -Dlog4j.configuration=log4j-elt.properties" \
  --conf "spark.sql.tungsten.enabled=false" \
  --conf "spark.eventLog.dir=hdfs://nameservice1/user/spark/applicationHistory" \
  --conf "spark.eventLog.enabled=true" \
  --conf "spark.sql.codegen=false" \
  --conf "spark.sql.unsafe.enabled=false" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC -Dlog4j.configuration=log4j-elt.properties" \
  --conf "spark.streaming.backpressure.enabled=true" \
  --conf "spark.locality.wait=1s" \
  --conf "spark.cores.max=12" \
  --conf "spark.streaming.blockInterval=1500ms" \
  --class com.cigna.damian.AvroRecordReceiver \
  /home/damian/ELT2Hive-0.10.jar
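For reference, since Dominik's build pulled in kite-data-hive, a quick way to double-check which Kite and Hive jars are actually in our $LIB (pattern illustrative):

ls $LIB | grep -iE 'kite|hive'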
