Hive to hive copy between different HDP versions and from HDFS to S3


Manasee Kamble

Oct 20, 2016, 6:22:00 AM
to reair
Use case:

Source cluster: HDP 2.1 (Hadoop 2.4 and Hive 0.13) with high availability enabled (identified by a nameservice ID such as hdp21). The Hive table is created with an HDFS location, e.g. hdfs://hdp21/user/hive/data_reair

Destination cluster: HDP 2.4 (Hadoop 2.7 and Hive 1.2.1) with high availability enabled (identified by a nameservice ID, e.g. hdp24).

1. Does ReAir allow copying between such different Hadoop versions? Does anything need to be configured in the pom file?
2. Will the distcp used in the code be able to copy from HDFS on Hadoop 2.4 to S3 storage on the HDP 2.4 cluster?

My observation after a few trials: ReAir works when copying Hive tables whose data goes from HDFS on one cluster to S3 on another cluster, as long as both clusters run HDP 2.4. But since HDP 2.1 doesn't support S3A, we add a few jars to the distcp command using -libjars. How can we achieve the same thing in code?

Your help will be highly appreciated. If any further details are required, I will be happy to provide them.

Thanks in advance,
Manasee

Manasee Kamble

Oct 21, 2016, 2:34:37 AM
to reair
Batch mode works: it copies from HDFS to S3. But for incremental changes, when I add the hive-hooks jar on the Hadoop 2.4 cluster, Hive throws this error:
hive> show tables;
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/airbnb/reair/hive/hooks/CliAuditLogHook : Unsupported major.minor version 52.0

I see this problem with HDP 2.1, i.e. Hadoop 2.4. So do I need to build the hive-hooks jar against the lower Hadoop version, 2.4? Should I specify that in the hive-hooks module's pom file?

Thanks,
Manasee

Paul Yang

Oct 21, 2016, 2:23:16 PM
to Manasee Kamble, reair
Yeah, it sounds like building the JAR with a different version of Hadoop may address the issue. Do you have the full stack trace of the exception?

Manasee Kamble

Oct 22, 2016, 4:13:20 AM
to reair, manaseeka...@gmail.com
Full stack trace:

hive> show databases;
Exception in thread "main" java.lang.UnsupportedClassVersionError: com/airbnb/reair/hive/hooks/CliAuditLogHook : Unsupported major.minor version 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:59)
        at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1177)
        at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1161)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1379)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1093)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:916)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:906)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

And the Java version on that node is:
[hive@ashnee2 ~]$ java -version
openjdk version "1.8.0_101"
OpenJDK Runtime Environment (build 1.8.0_101-b13)
OpenJDK 64-Bit Server VM (build 25.101-b13, mixed mode)

I was building the utils module first, as it is required for hive-hooks. After adding the dependencies in the pom, building the module gives this error:
[ERROR] Failed to execute goal on project airbnb-reair-utils: Could not resolve dependencies for project com.airbnb:airbnb-reair-utils:jar:1.0.0: Could not find artifact jdk.tools:jdk.tools:jar:1.6 at specified path /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el6_8.x86_64/jre/../lib/tools.jar -> [Help 1]

It is trying to find the 1.6 tools.jar inside the 1.8 JRE.

Paul Yang

Oct 25, 2016, 8:50:31 PM
to Manasee Kamble, reair
Actually, that error seems to be related to a different version of Java that was used to compile the class vs. the one used to run it.
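Editor's note: class-file major version 52 corresponds to Java 8, so "Unsupported major.minor version 52.0" means the class was compiled for Java 8 but loaded by an older JVM. The JVM that launches the Hive CLI may therefore differ from the `java` found on the shell path. The following is a minimal sketch for checking which Java release a class file targets; the class and method names are hypothetical and not part of ReAir.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassVersionCheck {

    /** Reads the major version from a class-file header: magic, minor, major. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        data.readUnsignedShort();        // minor version, not needed here
        return data.readUnsignedShort(); // major version, e.g. 52
    }

    /** Maps a major version to a Java release: 50 is 6, 51 is 7, 52 is 8. */
    static int javaRelease(int major) {
        return major - 44;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a .class file extracted from the jar under test
        try (InputStream in = new FileInputStream(args[0])) {
            int major = majorVersion(in);
            System.out.println("major=" + major + ", Java " + javaRelease(major));
        }
    }
}
```

If the class turns out to target a newer release than the JVM running Hive, the usual options are to rebuild with the compiler's source and target levels lowered to match that JVM, or to run Hive on the newer JVM.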

Manasee Kamble

Oct 27, 2016, 12:50:14 AM
to reair, manaseeka...@gmail.com
Yes, we solved the tools.jar error by pointing to the one in our JDK 1.8 (specified in the pom), and we were able to build the utils jar.

The issue we now face while building the hive-hooks jar is:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.4:compile (default-compile) on project airbnb-reair-hive-hooks: Compilation failure: Compilation failure:
[ERROR] /home/hive/reair-master/hive-hooks/src/main/java/com/airbnb/reair/hive/hooks/SessionStateLite.java:[54,49] error: cannot find symbol
[ERROR] symbol:   method getMapRedStats()
[ERROR] location: variable sessionState of type SessionState
[ERROR] /home/hive/reair-master/hive-hooks/src/main/java/com/airbnb/reair/hive/hooks/AuditCoreLogModule.java:[182,13] error: an enum switch case label must be the unqualified name of an enumeration constant
[ERROR] /home/hive/reair-master/hive-hooks/src/main/java/com/airbnb/reair/hive/hooks/AuditCoreLogModule.java:[185,39] error: cannot find symbol
[ERROR] symbol:   method getUDF()
[ERROR] location: variable e of type Entity
[ERROR] /home/hive/reair-master/hive-hooks/src/main/java/com/airbnb/reair/hive/hooks/AuditLogHookUtils.java:[169,16] error: cannot find symbol
[ERROR] symbol:   method setMapRedStats()
[ERROR] location: variable sessionState of type SessionState

Reason: The ReAir code base uses cdh5.3.3 jars, which are for Hadoop 2.5. Since we have Hadoop 2.4, we changed the hive-exec dependency to 0.13.0.2.1.7.0-784, and we checked that this jar does not contain the getMapRedStats() and setMapRedStats() methods above. We also tried the CDH jar, which lets the hive-hooks jar build, but at run time it throws:
hive> show databases;
FAILED: Hive Internal Error: java.lang.SecurityException(Invalid signature file digest for Manifest main attributes)
java.lang.SecurityException: Invalid signature file digest for Manifest main attributes

These methods are crucial for saving the MapReduce stats, so how should we proceed?

Thanks,
Manasee
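Editor's note: a common cause of "Invalid signature file digest for Manifest main attributes" is a jar that still carries signature files (META-INF/*.SF, *.DSA, *.RSA) after its contents were repackaged or merged with other jars. Whether that applies here is an assumption. The sketch below lists such entries in a jar so the theory can be checked; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class SignatureFileScan {

    /** True for jar signature entries such as META-INF/NAME.SF or NAME.RSA. */
    static boolean isSignatureEntry(String name) {
        String upper = name.toUpperCase(Locale.ROOT);
        if (!upper.startsWith("META-INF/")) {
            return false;
        }
        return upper.endsWith(".SF") || upper.endsWith(".DSA")
                || upper.endsWith(".RSA") || upper.endsWith(".EC");
    }

    /** Lists the signature entries in a jar, opened without verification. */
    static List<String> signatureEntries(String jarPath) throws IOException {
        List<String> found = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath, false)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (isSignatureEntry(name)) {
                    found.add(name);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the jar that triggers the SecurityException
        for (String name : signatureEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

If the scan reports entries in a repackaged jar, excluding those files when the jar is built is the usual remedy.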

Paul Yang

Oct 27, 2016, 1:15:34 AM
to Manasee Kamble, reair
Unfortunately, it looks like you'll have to make some code changes to work with 1.7.0. My guess is that in the later Hive versions, they changed the definition of some enums and so one of the classes is not compatible without changing the case statement.
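Editor's note: one possible shape for such a code change is to look up version-specific methods by reflection, so the same source compiles against both Hive versions and degrades gracefully when a method is absent. This is a sketch under stated assumptions, not ReAir's actual approach. `NewerSession` and `OlderSession` are hypothetical stand-ins for Hive's `SessionState` in versions with and without `getMapRedStats()`.

```java
import java.lang.reflect.Method;

public class OptionalMethodCall {

    /** Hypothetical stand-in for an API version that has the getter. */
    public static class NewerSession {
        public String getMapRedStats() {
            return "stats";
        }
    }

    /** Hypothetical stand-in for an API version that lacks the getter. */
    public static class OlderSession {
    }

    /** Invokes a public no-arg method if it exists, else returns the fallback. */
    static Object callIfPresent(Object target, String methodName, Object fallback) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (NoSuchMethodException e) {
            return fallback; // method is absent in this version of the API
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to call " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callIfPresent(new NewerSession(), "getMapRedStats", null));
        System.out.println(callIfPresent(new OlderSession(), "getMapRedStats", null));
    }
}
```

The trade-off is that the audit log would simply omit the MapReduce stats on the older Hive version instead of failing to compile. The enum switch error would still need its own fix, for example by comparing on the constant's name.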
