Guava version conflicts with Lingual and Hadoop 1

Michael Peterson

Feb 20, 2014, 11:15:25 AM
to lingua...@googlegroups.com
I'm having an issue with guava version conflicts.

I'm using Hadoop 1 (Hortonworks HDP 3.2 distribution) which has the guava-11.0.2.jar in /usr/lib/hadoop/lib.

Lingual requires guava-14.  I created a "shaded" jar (maven's term for an "uber" jar) that bundles all the dependencies needed by Lingual, including guava-14 and Lingual itself, in a single jar.
       
But when I run it on the Hadoop system I get the error:

        14/02/20 09:15:37 ERROR jdbc.LingualConnection:     read catalog from: hdfs://trvlapp0049.tsh.thomson.com:8020/user/diuser/lingual/employees/.lingual/catalog
Exception in thread "main" java.sql.SQLException: java.lang.NoSuchMethodError: com.google.common.collect.Lists.newCopyOnWriteArrayList(Ljava/lang/Iterable;)Ljava/util/concurrent/CopyOnWriteArrayList;
    at cascading.lingual.platform.PlatformBroker.startConnection(PlatformBroker.java:180)
    at cascading.lingual.platform.hadoop.HadoopPlatformBroker.startConnection(HadoopPlatformBroker.java:126)
    at cascading.lingual.jdbc.LingualConnection.initialize(LingualConnection.java:128)
    at cascading.lingual.jdbc.LingualConnection.<init>(LingualConnection.java:80)



The Lists#newCopyOnWriteArrayList(Ljava/lang/Iterable;) method exists only in guava12 and above, so it looks like hadoop is putting the guava11 jar on the classpath ahead of my uber jar.
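
One way to check that guess is to ask the JVM where it actually loaded a class from. The following is a minimal diagnostic sketch, not part of Lingual or Hadoop: the class name `ClasspathProbe` and the marker strings are made up for illustration, and it only uses standard JDK calls. Run inside the same JVM as the failing code, it should print the jar that supplied Guava's `Lists` class.

```java
import java.security.CodeSource;

// Diagnostic sketch (hypothetical helper): report which jar or directory a class came from.
final class ClasspathProbe {

    // Returns the code-source location of the named class, or a marker string.
    static String locationOf(String className) {
        try {
            // initialize=false: look the class up without running its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for JDK classes loaded by the bootstrap loader.
                return "(bootstrap or unknown source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // The class named in the NoSuchMethodError above.
        System.out.println(locationOf("com.google.common.collect.Lists"));
    }
}
```

If the printed location is the guava-11.0.2 jar under /usr/lib/hadoop/lib rather than the uber jar, the ordering theory holds for that JVM.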

I can fix this on my personal Hadoop VM by replacing the guava jar in /usr/lib/hadoop/lib with version 14, but that is not an option on our production Hadoop clusters (which I don't have permissions to change).

I'd rather modify the Hadoop classpath to have guava14 appear first.

I'm running the code like so:

    hadoop jar ling-shaded-1.0-SNAPSHOT.jar quux00.ling.App               

I tried using the -libjars switch to hadoop jar, but that only works if your MR job uses the ToolRunner, which Cascading/Lingual do not AFAIK.

There are suggestions here: http://stackoverflow.com/a/11698561/871012 on other ways to solve this, but some of those require a fair bit of work, so I'd like to ask: what is the standard Cascading/Lingual way to solve this?  How can I adjust the classpath when running Cascading/Lingual jobs?

When I use the lingual shell on this same Hadoop cluster, it runs just fine, spawning MR jobs and completing successfully, so why does that work, but invocations of my code fail?  Does the lingual shell use the distributed cache, for example?

-Michael

Andre Kelpe

Feb 20, 2014, 11:25:08 AM
to lingua...@googlegroups.com
Please try setting mapreduce.user.classpath.first=true in your JobConf. That should force hadoop to use your CLASSPATH before its own.
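
For a plain Cascading application (not the JDBC driver), a sketch of what that could look like is below. It only builds a `java.util.Properties` object with the property named above; the class name `UserClasspathFirst` is hypothetical, and the hand-off to `HadoopFlowConnector` is shown as a comment because it is an assumption about how the application is wired, not something stated in this thread. Note also that this is a job-level setting, so it is best read as influencing the classpath of the launched tasks.

```java
import java.util.Properties;

// Sketch (hypothetical class): collect job properties before creating a flow connector.
final class UserClasspathFirst {

    // Property name as given in the reply above (Hadoop 1 naming).
    static final String KEY = "mapreduce.user.classpath.first";

    static Properties jobProperties() {
        Properties props = new Properties();
        props.setProperty(KEY, "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = jobProperties();
        // Assumption: a Cascading app would pass these on, e.g.
        //   new HadoopFlowConnector(props)
        // so that they end up in the job configuration.
        props.list(System.out);
    }
}
```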

-- André

Michael Peterson

Feb 20, 2014, 11:33:38 AM
to lingua...@googlegroups.com
Oh, cool. I didn't know about that setting.

However, the example I'm trying uses the Lingual JDBC driver like so:

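  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public void lingualInstallExample() throws Exception {
    // Register the Lingual JDBC driver.
    Class.forName( "cascading.lingual.jdbc.Driver" );
    Connection connection = DriverManager.getConnection(
        "jdbc:lingual:hadoop;schema=employees;resultPath=empout");

    Statement statement = connection.createStatement();

    // Run the query; the result set is not read here, only opened and closed.
    ResultSet resultSet =
        statement.executeQuery(
            "select * from \"EMPLOYEES\".\"TITLES\" where title = 'Engineer'");

    resultSet.close();
    statement.close();
    connection.close();
  }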

How would I set a JobConf in this context?  Can you provide a code example or link to an example?  Or would I have to use the more traditional FlowDef / FlowConnector model?

Andre Kelpe

Feb 20, 2014, 11:43:49 AM
to lingua...@googlegroups.com
You can put it into /user/<yourname>/.lingual/config/default.properties in your lingual catalog on HDFS.

- André

Julian Hyde

Feb 20, 2014, 12:47:37 PM
to lingua...@googlegroups.com
If jar shading & skullduggery doesn’t work, FYI, we just fixed this issue in Optiq master.


Julian

Joe Posner

Feb 20, 2014, 1:37:40 PM
to lingua...@googlegroups.com
Jar shading does work to resolve this and that's how we get Guava 14
working for lingual.

For people who want to use lingual as a JDBC driver directly, as you
do, we provide the lingual-hadoop-*-jdbc.jar, since that contains
lingual and all its dependencies in shaded form. If you change your
dependency to that rather than the lingual-hadoop-*-hadoop.jar you
should be good.

Michael Peterson

Feb 20, 2014, 2:09:10 PM
to lingua...@googlegroups.com
Thanks everyone for the input, but I still don't have it working.

So far specifying the properties in /user/<yourname>/.lingual/config/default.properties is not working for me.

If I replace guava-11 with guava-14 in /usr/lib/hadoop/lib my code example works, so I know the system is set up correctly except for this issue.  When I swap guava-11 back into /usr/lib/hadoop/lib and then change default.properties like so, it fails with the same guava error message:

$ hadoop fs -cat .lingual/config/default.properties
# place all default properties here, for example
# some.property=someValue
mapreduce.user.classpath.first=true


In a fit of desperation I also tried:

$ hcat /user/diuser/.lingual/config/default.properties
# place all default properties here, for example
# some.property=someValue
mapreduce.user.classpath.first=true
mapreduce.task.classpath.user.precedence=true
mapreduce.job.user.classpath.first=true

(based on: http://stackoverflow.com/questions/11685949/overriding-default-hadoop-jars-in-class-path)

But that also fails.  See anything I'm doing wrong?

I'm going to try Julian's suggestion next.

And question for Joe:
When I search conjars I don't see any jars called lingual-hadoop-*-jdbc.  I searched by "lingual" and by "jdbc".  I also don't know what the star in lingual-hadoop-*-hadoop.jar is referring to.  My pom has lingual-hadoop, lingual-core and lingual-platform.  Could you provide a link to the jdbc jars you are referring to, or the maven xml to use?

Thanks again,
Michael

Joe Posner

Feb 20, 2014, 2:14:13 PM
to lingua...@googlegroups.com
The * was just my shorthand way of saying "whatever the latest version is"

At the moment, it's
http://conjars.org/repo/cascading/lingual-hadoop/1.0.3-wip-320/lingual-hadoop-1.0.3-wip-320-jdbc.jar

We keep all old releases on conjars so that link will always work. But
there will likely be a 1.0.3-wip-321 release, as well as 1.1 releases
so we encourage people to use the latest.

Michael Peterson

Feb 20, 2014, 2:33:53 PM
to lingua...@googlegroups.com
Excellent - that's great news!

But when I pull down the transitive dependencies of the latest version in conjars (http://conjars.org/net.hydromatic/optiq-core/versions/0.4.12.3) it still depends on guava-14, even though that version was pushed this morning to conjars (by Joe Posner, it looks like).  Is that issue (#141) going to be in a later release or should it be in 0.4.12.3?  If a later release, I guess I'll just need to build it locally?

Thanks,
Michael

Joe Posner

Feb 20, 2014, 2:49:29 PM
to lingua...@googlegroups.com
I'd suggest you start by trying out the lingual JDBC jar. It's got a
compatible mix of optiq and guava, included and shaded as appropriate.

Regarding your general question on optiq, Lingual doesn't use the
latest optiq version. The 0.4.12.3 release isn't the latest optiq
release, it's just the latest maintenance release of the 0.4.12 series.
Julian was talking about a fix in optiq master, which is at 0.4.18.

Andre Kelpe

Feb 20, 2014, 4:13:35 PM
to lingua...@googlegroups.com
On Thu, Feb 20, 2014 at 8:09 PM, Michael Peterson <quu...@gmail.com> wrote:

> And question for Joe:
> When I search conjars I don't see any jars called lingual-hadoop-*-jdbc.  I searched by "lingual" and by "jdbc".  I also don't know what the star in lingual-hadoop-*-hadoop.jar is referring to.  My pom has lingual-hadoop, lingual-core and lingual-platform.  Could you provide a link to the jdbc jars you are referring to, or the maven xml to use?
 

The problem is that clojars (the repo software that conjars.org is based on) does not support searching for classifiers, and "jdbc" is a classifier in this case. I opened an issue six months ago, but nobody has had time to look at it yet: https://github.com/ato/clojars-web/issues/162
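
To make the classifier point concrete, here is a small sketch of how the standard Maven repository layout turns coordinates into a file URL, with the classifier placed after the version. The class `ArtifactUrl` is hypothetical, and the groupId `cascading` is inferred from the conjars URL Joe posted earlier in this thread rather than taken from a pom.

```java
// Sketch (hypothetical helper): build a repository URL from Maven coordinates.
final class ArtifactUrl {

    // Standard Maven layout:
    //   <group path>/<artifactId>/<version>/<artifactId>-<version>[-<classifier>].<ext>
    static String of(String repo, String groupId, String artifactId,
                     String version, String classifier, String ext) {
        String file = artifactId + "-" + version
            + ((classifier == null || classifier.isEmpty()) ? "" : "-" + classifier)
            + "." + ext;
        return repo + "/" + groupId.replace('.', '/') + "/" + artifactId
            + "/" + version + "/" + file;
    }

    public static void main(String[] args) {
        // Coordinates taken from the link posted earlier in this thread.
        System.out.println(of("http://conjars.org/repo", "cascading", "lingual-hadoop",
                              "1.0.3-wip-320", "jdbc", "jar"));
    }
}
```

In a pom the same artifact would be selected by adding a classifier element with the value jdbc to the lingual-hadoop dependency.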
 
- André

Michael Peterson

Feb 20, 2014, 4:32:19 PM
to lingua...@googlegroups.com
Hi Joe,

You are right - using this semi-shaded "jdbc" jar does work.  I call it "semi-shaded" because I see that the jar lacks guava entirely, so you must be compiling in an older version of optiq that only needs guava-11?

In any case, thanks.  This will do the trick until Julian's latest changes get pulled into the Lingual transitive dependencies.

It would probably be good to put this information somewhere on the Lingual main documentation page for the time being.  I spent about 8 hours total yesterday and today getting this resolved.

-Michael

Julian Hyde

Feb 20, 2014, 4:40:04 PM
to lingua...@googlegroups.com
On Feb 20, 2014, at 1:32 PM, Michael Peterson <quu...@gmail.com> wrote:

> In any case, thanks.  This will do the trick until Julian's latest changes get pulled into the Lingual transitive dependencies.

Andre, Joe,

Can you quickly review my fix, so you know it will work when you do integrate it? I use a range of versions:

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <!-- Version 11.0.2 is possible (Hadoop uses it); 16.0 is preferred. -->
      <version>[11.0.2,]</version>
    </dependency>

so projects that integrate with Optiq can use guava-11.0.2 or guava-16 or anything in between. It looks bad in conjars: go to http://conjars.org/net.hydromatic/optiq-parent and click on the link "com.google.guava/guava [11.0.2,]". But I think that's just a cosmetic issue.
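
As a rough illustration of what the open-ended range `[11.0.2,]` admits, the sketch below checks dotted numeric versions against a lower bound. It is a deliberate simplification: the class `GuavaRange` is hypothetical, it handles only purely numeric versions, and it is not Maven's own version comparison (which also deals with qualifiers such as -SNAPSHOT).

```java
// Sketch (hypothetical helper): does a numeric version satisfy the range [lowerBound,] ?
final class GuavaRange {

    // Compares dotted numeric versions component by component; missing components count as 0.
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True when version is at or above the inclusive lower bound.
    static boolean atLeast(String version, String lowerBound) {
        return compare(version, lowerBound) >= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"10.0.1", "11.0.2", "14.0.1", "16.0"}) {
            System.out.println(v + " in [11.0.2,] ? " + atLeast(v, "11.0.2"));
        }
    }
}
```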

Julian

Michael Peterson

Feb 20, 2014, 5:07:10 PM
to lingua...@googlegroups.com
I see. That explains why "jdbc" comes after the version number.  Thanks for the clarification.

-Michael

Joe Posner

Feb 20, 2014, 5:16:28 PM
to lingua...@googlegroups.com
Glad that's working for you. I'll add clarification to the docs; I
feel your pain since I spent many hours setting up that shading. :-)

You're basically correct in calling it "semi-shaded." For performance
reasons we don't include jars that are runtime provided by all the
Hadoop distros on our compatibility list (
http://www.cascading.org/support/compatibility/ ). We do include and
shade jars where our code, or any libs we use, need a different
version than is always there.

So the goal behind the JDBC jar is that when we sign off on it we're
signing off on it managing internal transitive dependencies like that
for you.

Joe Posner

Feb 21, 2014, 6:05:33 PM
to lingua...@googlegroups.com
FYI:
https://github.com/Cascading/lingual/issues/17 (assigned to me) is the
bug to clarify the documentation on the JDBC jar, shading, and what to
use when.

Joe Posner

Feb 24, 2014, 1:22:15 PM
to lingua...@googlegroups.com
Julian,

A quick smoke test of that new build file works: if I change the
pom.xml for the 0.4.12.4-SNAPSHOT to match it, the dependency list
shows optiq pointing at the 14.0.1 version of Guava that Lingual
includes, and the unit tests pass.