Cascading + libjars = ClassNotFoundException. Sometimes


Sasha Ovsankin

Jul 25, 2013, 12:32:02 PM
to cascadi...@googlegroups.com

I am running a Cascading (actually Scalding) Hadoop job that puts its dependency jars into the DistributedCache.

The first time it works fine (meaning that the jars are there on HDFS and the classpath is set up correctly), but on the next run it starts failing with a ClassNotFoundException:

java.io.IOException: Split class cascading.tap.hadoop.io.MultiInputSplit not found
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:387)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:412)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.ClassNotFoundException: cascading.tap.hadoop.io.MultiInputSplit
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:385)
    ...

Did anybody else have success with Cascading and jars in the DistributedCache?

This message seems to imply that Cascading has some internal handling of the distributed cache jars. Any light you can shed on this?

Thanks,

-- Sasha

cross-posted to http://stackoverflow.com/questions/17861614/cascading-libjars-classnotfoundexception-sometimes


Chris K Wensel

Jul 25, 2013, 12:58:20 PM
to cascadi...@googlegroups.com
Current Cascading releases perform zero classloading and make no concessions for how jars are loaded in the child JVM. Your errors are Hadoop-related.

Cascading 2.2 WIP adds a new API allowing for dynamic classloading of additional jars in the child JVM (not the client JVM) via the distributed cache. Even in this case, Cascading will not perform any classloading (in Hadoop mode); it just notifies the platform (Hadoop) of the additional dependencies.

ckw
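
Since classloading here is left to Hadoop, one way to narrow down a failure like the one above is to check, from inside the task JVM, whether the class is visible and which jar it came from. The following is only a minimal sketch using plain JDK calls. The class name `ClassLocator` and the method `locate` are hypothetical helpers, not part of Cascading, Scalding, or Hadoop, and where you call it from (for example, early in a map task) is an assumption about your job.

```java
import java.security.CodeSource;

public class ClassLocator {

    /**
     * Reports where className would be loaded from using the given loader.
     * Returns the jar or directory URL, "bootstrap" for classes without a
     * code source, or "NOT FOUND" if the loader cannot resolve the class.
     */
    static String locate(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            return src == null ? "bootstrap" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Defaults to the class named in the stack trace above
        String name = args.length > 0
                ? args[0]
                : "cascading.tap.hadoop.io.MultiInputSplit";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();

        System.out.println(name + " -> " + locate(name, loader));
        // The classpath the JVM was started with, for comparison against the expected jars
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Comparing this output between a run that succeeds and one that fails may show whether the dependency jar is actually on the task classpath in both cases.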



Sasha Ovsankin

Jul 26, 2013, 9:46:00 AM
to cascadi...@googlegroups.com
Thanks, Chris, for the clarification. This was a problem in my code.