Cascading and CDH4 (Hadoop 0.23)

Jeremy Bennett

Jun 8, 2012, 12:23:52 AM
to cascading-user
CDH4 was released earlier this week! We set up YARN (M/R v2),
recompiled our Cascading 1.2.4 M/R jobs against CDH4 (Hadoop 0.23), and
tried to run them. However, we get the following exception, which we did
not get under CDH3.

java.io.IOException: Split class cascading.tap.hadoop.MultiInputSplit not found
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:350)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:375)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:334)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:152)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:147)
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.MultiInputSplit not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1350)
    at org.apache.hadoop.mapred.MapTask.getSplitDetails(MapTask.java:348)

It appears that the map task is not able to locate the Cascading jars
bundled in our M/R job jars.

A jar tvf listing shows that the jars are bundled in the job jar, and
the same jobs run fine on CDH3 M/R.

0 Wed Jun 06 17:08:00 PDT 2012 META-INF/
192 Wed Jun 06 17:07:58 PDT 2012 META-INF/MANIFEST.MF
0 Wed Jun 06 17:08:00 PDT 2012 lib/
535037 Mon Sep 05 15:22:40 PDT 2011 lib/cascading-core-1.2.4.jar
234759 Thu May 05 17:47:02 PDT 2011 lib/jgrapht-jdk1.6-0.8.1.jar
11351 Thu May 05 17:47:00 PDT 2011 lib/riffle-0.1-dev.jar
616904 Thu May 05 17:47:02 PDT 2011 lib/janino-2.5.16.jar
101784 Mon Sep 05 15:22:40 PDT 2011 lib/cascading-xml-1.2.4.jar
90023 Mon Sep 05 15:22:40 PDT 2011 lib/tagsoup-1.2.jar
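
The same check can be done programmatically. The following is a minimal sketch, assuming the job jar keeps its dependencies as nested jars under lib/ as in the listing above. The class name JobJarInspector and the method findClass are hypothetical names chosen for illustration. It only confirms that a class is packaged somewhere in the job jar; it says nothing about whether the task's classloader will actually see it at runtime.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: reports where a class lives inside a job jar.
final class JobJarInspector {

    /**
     * Returns the names of the nested lib/*.jar entries that contain the
     * given class, or "(top level)" if the class sits directly in the job jar.
     */
    static List<String> findClass(InputStream jobJar, String className) throws IOException {
        String wanted = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        ZipInputStream outer = new ZipInputStream(jobJar);
        ZipEntry entry;
        while ((entry = outer.getNextEntry()) != null) {
            String name = entry.getName();
            if (name.equals(wanted)) {
                hits.add("(top level)");
            } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                // Read the nested jar straight from the current outer entry.
                // The inner stream is deliberately not closed, because closing
                // it would also close the outer stream.
                ZipInputStream inner = new ZipInputStream(outer);
                ZipEntry nested;
                while ((nested = inner.getNextEntry()) != null) {
                    if (nested.getName().equals(wanted)) {
                        hits.add(name);
                        break;
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: JobJarInspector <job.jar> <fully.qualified.ClassName>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(findClass(in, args[1]));
        }
    }
}
```

Run against our job jar with cascading.tap.hadoop.MultiInputSplit as the second argument, this would be expected to report the lib/ entry that packages the class, if any.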

Has anyone else experienced this?

Thanks,

Jeremy

Chris K Wensel

Jun 8, 2012, 11:07:44 AM
to cascadi...@googlegroups.com
Cascading only supports stable Apache Hadoop releases (1.0.x) on the stable APIs.

Hortonworks has made some relevant comments on this:

http://hortonworks.com/blog/balancing-community-innovation-and-enterprise-stability/

ckw
> --
> You received this message because you are subscribed to the Google Groups "cascading-user" group.
> To post to this group, send email to cascadi...@googlegroups.com.
> To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.
>

--
Chris K Wensel
ch...@concurrentinc.com
http://concurrentinc.com

Andrew Purtell

Jun 8, 2012, 12:54:03 PM
to cascadi...@googlegroups.com
CDH4 includes a "mr1" package that is a port of the MR from Hadoop 1.x
onto Hadoop 2.x core and HDFS. Try using that instead of YARN.

Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet
Hein (via Tom White)

Jeremy Bennett

Jun 8, 2012, 1:24:27 PM
to cascadi...@googlegroups.com
Thanks, Andy. That is exactly what I was researching. This may allow us to use CDH4 without rewriting our Cascading jobs. I assume we will still need to recompile our jobs against CDH4 to get them to work with M/R v1 in CDH4.

Andrew Purtell

Jun 8, 2012, 2:26:46 PM
to cascadi...@googlegroups.com
On Fri, Jun 8, 2012 at 10:24 AM, Jeremy Bennett
<jeremy.w...@gmail.com> wrote:
> I assume we will still
> need to recompile our jobs with CDH4 to get them to work with M/R v1 in
> CDH4.

I can't say for certain but I would certainly do that anyway.

David McNeil

Sep 27, 2012, 3:31:08 PM
to cascadi...@googlegroups.com
We also are investigating running a Cascading application on Cloudera CDH4 using MRv1. Has anyone been successful doing this? If so do you have any tips or links to a procedure to follow?

Thanks.
-David

Ken Krugler

Sep 27, 2012, 3:46:08 PM
to cascadi...@googlegroups.com
On Sep 27, 2012, at 12:31pm, David McNeil wrote:

> We also are investigating running a Cascading application on Cloudera CDH4 using MRv1. Has anyone been successful doing this? If so do you have any tips or links to a procedure to follow?

I know Philippe Laflamme was running into an issue getting all of the Cascading unit tests to pass. He had a post back in June that said:

There seems to be an issue with serialization when using Cascading 2.0 on CDH4:


I opened an issue at Cloudera, you can contribute what you know or vote for it here:


I'm not sure what's happening, but it seems like the Cascading serializers aren't registered correctly when the job moves around the cluster.

The last comment by (I think) somebody from Cloudera suggests disabling AvroReflectSerialization:
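
One way that suggestion might be applied is sketched below. It assumes that "disabling" means removing the class from the comma-separated value of Hadoop's io.serializations property; the linked comment is not reproduced here, so this is a guess at the mechanics and not a confirmed fix. The helper name SerializationList and the method without are hypothetical, and the sample property value used in main is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: edits a comma-separated io.serializations value.
final class SerializationList {

    // Fully qualified name of the serialization the comment suggests disabling.
    static final String AVRO_REFLECT =
        "org.apache.hadoop.io.serializer.avro.AvroReflectSerialization";

    /** Returns the list with every occurrence of className removed. */
    static String without(String ioSerializations, String className) {
        List<String> kept = new ArrayList<>();
        for (String item : ioSerializations.split(",")) {
            String trimmed = item.trim();
            // Skip empty items and the class being disabled.
            if (!trimmed.isEmpty() && !trimmed.equals(className)) {
                kept.add(trimmed);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // Illustrative value only; check the real value in your job configuration.
        String current = "org.apache.hadoop.io.serializer.WritableSerialization,"
            + AVRO_REFLECT + ","
            + "cascading.tuple.hadoop.TupleSerialization";
        System.out.println(without(current, AVRO_REFLECT));
    }
}
```

The resulting string would then be set as the io.serializations value in the job configuration before the flow is submitted. Whether that is enough to resolve the issue on CDH4 is not something this thread confirms.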


-- Ken

Philippe Laflamme

Sep 27, 2012, 4:17:29 PM
to cascadi...@googlegroups.com
Here's a link to the latest CDH4 + Cascading 2.0 situation we have:


The flows we're running are fairly complex but they don't mix Dfs and Lfs.

Hope that helps,
Philippe
