hadoop2 support

Alex Cozzi

Oct 18, 2013, 5:35:51 PM
to scoob...@googlegroups.com
We got a new cluster running HortonWorks' hadoop 2 version, and I needed to make some simple changes to get scoobi to run correctly on it. I am not sure whether you want to support another branch, but the differences are quite small (see the attached patch).

When I tried to run jobs using the cdh4 branch, I got the rather strange behavior of the job running without any error and creating the output directories, but not writing any data into them. Rather puzzling.
The cdh3 version fails, throwing an exception about a class being replaced by an interface.

Alex


hadoop2.patch

Eric Torreborre

Oct 20, 2013, 11:24:57 PM
to scoob...@googlegroups.com
Hi Alex,

I've taken your patch and incorporated it into our master branch by augmenting our "Compatibility" object.

This object tries to make the API uniform across cdh3, cdh4 and Hadoop V2.

> When I tried to run jobs using the cdh4 branch I got the rather strange behavior of having the job run without any error, creating the output directories but not creating any data in it. Rather puzzling.

It is possible that your test project didn't work with Hadoop2 because some constants have changed (for example mapred.cache.files), but it might be for an entirely different reason.
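
To illustrate the kind of constant that changed: Hadoop 2 lists `mapred.cache.files` as deprecated in favour of `mapreduce.job.cache.files`. The sketch below is only an illustration, not a diagnosis of the empty output. It assumes the configuration is modelled as a plain `Map`, and `CacheFilesKey` and `cacheFiles` are hypothetical names that exist in neither Scoobi nor Hadoop.

```scala
object CacheFilesKey {
  // Key used by Hadoop 1 / CDH3-era code.
  val OldKey = "mapred.cache.files"
  // Replacement key in Hadoop 2, where the old name is deprecated.
  val NewKey = "mapreduce.job.cache.files"

  // Hypothetical helper: look the value up under the new name first,
  // then fall back to the old one. `conf` stands in for a real
  // Hadoop Configuration object.
  def cacheFiles(conf: Map[String, String]): Option[String] =
    conf.get(NewKey).orElse(conf.get(OldKey))
}
```

Code that reads only one of the two names directly would see nothing when the value was written under the other.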


> cdh3 version fails throwing an exception about a class being replaced by an interface.

This means that you must still have the cdh4 jars on your classpath.
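
One way to confirm which flavour is actually on the classpath is to check, by reflection, whether `org.apache.hadoop.mapreduce.JobContext` is a class (cdh3) or an interface (cdh4 and Hadoop 2). A minimal sketch; `ClasspathCheck` and `isInterface` are hypothetical names, not part of Scoobi:

```scala
import scala.util.Try

object ClasspathCheck {
  // Some(true) if the named type is an interface, Some(false) if it is a
  // class, None if it cannot be found on the classpath.
  def isInterface(className: String): Option[Boolean] =
    Try(Class.forName(className)).toOption.map(_.isInterface)

  def main(args: Array[String]): Unit =
    println(isInterface("org.apache.hadoop.mapreduce.JobContext"))
}
```

Run with the same classpath as the failing job: `Some(true)` when the cdh3 artifact was requested would suggest that cdh4 or Hadoop 2 jars are still being picked up.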

Unfortunately I don't have much time to devote to testing Hadoop V2, but I'd be happy if you could pursue that direction, so I incorporated your changes into master and pushed it so that:

 - setting the version to 0.8.0-cdh3 should pull in the cdh3 dependencies and use the cdh3 classes for JobContext, etc. (not interfaces)
 - setting the version to 0.8.0-cdh4 should pull in the cdh4 dependencies and use the cdh4 interfaces (for JobContext, etc.)
 - setting the version to 0.8.0-hadoop2 should pull in the hadoop 2.1 dependencies (from HortonWorks) and use the cdh4 interfaces
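
The mapping in the list above can be sketched as follows, assuming the flavour is picked by a substring test on the version string. `VersionFlavour`, `flavourOf` and `usesInterfaces` are hypothetical names for illustration, not the actual Compatibility object:

```scala
object VersionFlavour {
  sealed trait Flavour
  case object Cdh3 extends Flavour    // JobContext etc. are classes
  case object Cdh4 extends Flavour    // JobContext etc. are interfaces
  case object Hadoop2 extends Flavour // hadoop 2.1 dependencies, cdh4-style interfaces

  // Versions without a recognised suffix yield None; how the real build
  // treats them is not covered by this sketch.
  def flavourOf(version: String): Option[Flavour] =
    if (version.contains("hadoop2")) Some(Hadoop2)
    else if (version.contains("cdh3")) Some(Cdh3)
    else if (version.contains("cdh4")) Some(Cdh4)
    else None

  // Only cdh3 still exposes JobContext and friends as classes.
  def usesInterfaces(flavour: Flavour): Boolean = flavour match {
    case Cdh3           => false
    case Cdh4 | Hadoop2 => true
  }
}
```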

This is all a bit messy, but if we can keep a single, not-too-big Compatibility class that makes Scoobi work across CDH3, CDH4 and HortonWorks' Hadoop 2.1, that will be great.

E.

Alex Cozzi

Oct 22, 2013, 4:01:24 PM
to scoob...@googlegroups.com
awesome!

Eric Torreborre

Oct 22, 2013, 8:00:22 PM
to scoob...@googlegroups.com
It is published now. Thanks in advance for trying it out and sending me back the logs of whatever doesn't work; I'd be surprised if everything worked out of the box :-).

Alex Cozzi

Oct 23, 2013, 2:44:45 PM
to scoob...@googlegroups.com
Sorry, I see only cdh3 and cdh4 versions there:

0.8.0-SNAPSHOT/Wed Oct 23 06:09:40 CDT 2013  
0.8.0-cdh3-SNAPSHOT/Wed Oct 23 06:11:04 CDT 2013  
0.8.0-cdh4-SNAPSHOT/Sat Oct 19 06:08:30 CDT 2013

Eric Torreborre

Oct 23, 2013, 7:12:27 PM
to scoob...@googlegroups.com

Alex Cozzi

Oct 30, 2013, 1:22:37 AM
to scoob...@googlegroups.com

I tested it on our cluster and it works! Thanks.

Eric Torreborre

Oct 30, 2013, 2:28:09 AM
to scoob...@googlegroups.com
That's good news. I have also now enabled our Jenkins server to publish `hadoop2` versions for 0.8.0-SNAPSHOT automatically.

E.

Alex Cozzi

Nov 13, 2013, 6:38:12 PM
to scoob...@googlegroups.com

Now that hadoop2 has gone final, I found another problem:


Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.mapreduce.Job.getJobClient()
        at java.lang.Class.getDeclaredMethod(Class.java:1937)
        at com.nicta.scoobi.impl.reflect.Classes$class.invokeProtected(Classes.scala:135)
        at com.nicta.scoobi.impl.reflect.Classes$.invokeProtected(Classes.scala:165)
        at com.nicta.scoobi.impl.exec.TaskDetailsLogger.getJobClient$lzycompute(MapReduceJob.scala:314)
        at com.nicta.scoobi.impl.exec.TaskDetailsLogger.getJobClient(MapReduceJob.scala:314)
        at com.nicta.scoobi.impl.exec.TaskDetailsLogger.com$nicta$scoobi$impl$exec$TaskDetailsLogger$$getTaskCompletionEvents(MapReduceJob.scala:308)
        at com.nicta.scoobi.impl.exec.TaskDetailsLogger$$anonfun$logTaskCompletionDetails$1.apply(MapReduceJob.scala:285)
        at com.nicta.scoobi.impl.exec.TaskDetailsLogger$$anonfun$logTaskCompletionDetails$1.apply(MapReduceJob.scala:285)



Essentially what happened is that they got rid of getJobClient :-(
I found this issue:
I am looking into a workaround and will keep you posted, but I am open to suggestions. 
I also have a patch to bring the scoobi build to the latest version of hadoop2, but it will not work without fixing the getJobClient problem.

--- a/project/dependencies.scala
+++ b/project/dependencies.scala
@@ -38,13 +38,13 @@ object dependencies {
     "org.apache.commons"                %  "commons-compress"          % "1.0"              % "test")
 
   def hadoop(version: String) =
-    if (version.contains("hadoop2")) Seq("org.apache.hadoop" % "hadoop-common"                     % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-hdfs"                       % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-mapreduce-client-app"       % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-mapreduce-client-core"      % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-mapreduce-client-core"      % "2.1.0.2.0.5.0-67",
-                                         "org.apache.hadoop" % "hadoop-annotations"                % "2.1.0.2.0.5.0-67",
+    if (version.contains("hadoop2")) Seq("org.apache.hadoop" % "hadoop-common"                     % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-hdfs"                       % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-mapreduce-client-app"       % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-mapreduce-client-core"      % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-mapreduce-client-core"      % "2.2.0.2.0.6.0-76",
+                                         "org.apache.hadoop" % "hadoop-annotations"                % "2.2.0.2.0.6.0-76",
                                          "org.apache.avro"   % "avro-mapred"                       % "1.7.4")
     else if (version.contains("cdh3")) Seq("org.apache.hadoop" % "hadoop-core"   % "0.20.2-cdh3u1",
                                            "org.apache.avro"   % "avro-mapred"   % "1.7.4")
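
The NoSuchMethodException above comes from a reflective `getDeclaredMethod` lookup that assumes `getJobClient` exists. A minimal sketch of guarding such a lookup, so that a missing method yields `None` instead of an exception, is shown here. `OldJob`, `NewJob` and `invokeIfPresent` are hypothetical stand-ins, not Scoobi's `Classes.invokeProtected` or Hadoop's `Job`:

```scala
object ReflectiveFallback {
  // Stand-ins for two library versions: one still has the method, one dropped it.
  class OldJob { protected def getJobClient(): String = "client" }
  class NewJob

  // Call a zero-argument method declared on the target's class by name.
  // Returns None when that class does not declare the method.
  def invokeIfPresent(target: AnyRef, name: String): Option[AnyRef] =
    try {
      val method = target.getClass.getDeclaredMethod(name)
      method.setAccessible(true)
      Some(method.invoke(target))
    } catch {
      case _: NoSuchMethodException => None
    }
}
```

A caller could then choose a different code path when the result is `None`, rather than failing in the main thread.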

Alex Cozzi

Nov 14, 2013, 1:51:41 PM
to scoob...@googlegroups.com

I found a workaround by changing line 307 in MapReduceJob.scala:

private def getTaskCompletionEvents(index: Int) = job.getTaskCompletionEvents(index)
 

I am actually wondering why this straightforward implementation is not used in hadoop 1 as well?
Alex

Eric Torreborre

Nov 19, 2013, 5:13:18 PM
to scoob...@googlegroups.com
Hi Alex, 

Thanks for working on this. We are very busy at the moment on non-Scoobi stuff. 

I hope to get some time to work on this next week.

E.