I'm having an issue with guava version conflicts.
I'm using Hadoop 1 (Hortonworks HDP 3.2 distribution) which has the guava-11.0.2.jar in /usr/lib/hadoop/lib.
Lingual requires guava-14. I created a "shaded" jar (Maven's term for an "uber" jar) that has all the dependencies Lingual needs, including guava-14 and Lingual itself, in a single jar.
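For reference, the maven-shade-plugin also supports package relocation, which renames Guava's packages inside the uber jar so Hadoop's own guava-11 copy can never collide with it. A minimal sketch of that configuration (the `shaded.guava` prefix is an arbitrary, hypothetical choice):

```xml
<!-- maven-shade-plugin sketch: relocate Guava so the uber jar's copy
     cannot clash with the guava-11.0.2.jar in /usr/lib/hadoop/lib.
     The "shaded.guava" prefix is a hypothetical name. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.2</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>shaded.guava.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With relocation applied, the shaded jar's own bytecode references `shaded.guava.com.google.common.*`, so the classpath order of the two Guava versions no longer matters.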
But when I run it on the Hadoop system I get the error:
14/02/20 09:15:37 ERROR jdbc.LingualConnection: read catalog from: hdfs://trvlapp0049.tsh.thomson.com:8020/user/diuser/lingual/employees/.lingual/catalog
Exception in thread "main" java.sql.SQLException: java.lang.NoSuchMethodError: com.google.common.collect.Lists.newCopyOnWriteArrayList(Ljava/lang/Iterable;)Ljava/util/concurrent/CopyOnWriteArrayList;
at cascading.lingual.platform.PlatformBroker.startConnection(PlatformBroker.java:180)
at cascading.lingual.platform.hadoop.HadoopPlatformBroker.startConnection(HadoopPlatformBroker.java:126)
at cascading.lingual.jdbc.LingualConnection.initialize(LingualConnection.java:128)
at cascading.lingual.jdbc.LingualConnection.&lt;init&gt;(LingualConnection.java:80)
The Lists#newCopyOnWriteArrayList(Ljava/lang/Iterable;) method exists only in guava-12 and above, so it looks like Hadoop is putting the guava-11 jar on the classpath ahead of my uber jar.
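One way to confirm which Guava jar actually won is to ask the JVM where it loaded the class from. A small, hypothetical helper (on the cluster you would pass `com.google.common.collect.Lists.class` instead of the placeholder class used in `main`):

```java
// Debugging aid: print which jar (or directory) a class was loaded from,
// to confirm whether guava-11 or guava-14 won on the classpath.
public class WhichJar {

    public static String locationOf(Class<?> cls) {
        java.security.CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes from the bootstrap loader (e.g. java.lang.String) report
        // a null code source, so guard against that.
        return src == null ? "(bootstrap classloader)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // On the cluster, substitute: locationOf(com.google.common.collect.Lists.class)
        System.out.println(locationOf(WhichJar.class));
    }
}
```

Printing `locationOf(Lists.class)` just before the failing call would show the path of the jar that actually supplied the class (e.g. `/usr/lib/hadoop/lib/guava-11.0.2.jar`).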
I can fix this on my personal Hadoop VM by replacing the guava jar in /usr/lib/hadoop/lib with version 14, but that is less desirable on our production Hadoop clusters (which I don't have permissions to change).
I'd rather modify the Hadoop classpath so that guava-14 appears first.
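The stock `bin/hadoop` launcher script does provide a knob for exactly this: `HADOOP_USER_CLASSPATH_FIRST`, combined with `HADOOP_CLASSPATH`. A sketch, assuming a guava-14 jar is available locally (the path below is hypothetical):

```shell
# Tell the hadoop launcher to put user-supplied jars ahead of its own
# lib/ directory when it builds the classpath.
export HADOOP_USER_CLASSPATH_FIRST=true
# Hypothetical path to the newer guava jar:
export HADOOP_CLASSPATH=/home/diuser/jars/guava-14.0.1.jar

hadoop jar ling-shaded-1.0-SNAPSHOT.jar quux00.ling.App
```

This only affects the client-side JVM that `hadoop jar` launches; whether it is honored depends on the launcher script shipped with the distribution in use.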
I'm running the code like so:
hadoop jar ling-shaded-1.0-SNAPSHOT.jar quux00.ling.App
I tried using the -libjars switch to hadoop jar, but that only works if your MR job uses the ToolRunner, which Cascading/Lingual do not, AFAIK.
There are suggestions here:
http://stackoverflow.com/a/11698561/871012 on other ways to solve this, but some of those require a fair bit of work, so I'd like to ask: what is the standard Cascading/Lingual way to solve this? How can I adjust the classpath when running Cascading/Lingual jobs?
When I use the lingual shell on this same Hadoop cluster, it runs just fine, spawning MR jobs and completing successfully. Why does that work while invocations of my code fail? Does the lingual shell use the distributed cache, for example?
-Michael