hadoop2 support

Alex Cozzi

Oct 18, 2013, 5:35:51 PM

to scoob...@googlegroups.com

We got a new cluster running HortonWorks' Hadoop 2 distribution, and I needed to make a few simple changes to get Scoobi to run correctly on it. I am not sure whether you want to support another branch, but the differences are quite small (see the attached patch).

When I tried to run jobs using the cdh4 branch I got the rather strange behavior of having the job run without any error, creating the output directories but not writing any data to them. Rather puzzling.

cdh3 version fails throwing an exception about a class being replaced by an interface.

Alex

hadoop2.patch

Eric Torreborre

Oct 20, 2013, 11:24:57 PM

to scoob...@googlegroups.com

Hi Alex,

I've taken your patch and incorporated it into our master branch by extending our "Compatibility" object.

This object tries to make the API uniform across cdh3, cdh4 and Hadoop V2.

> When I tried to run jobs using the cdh4 branch I got the rather strange behavior of having the job run without any error, creating the output directories but not writing any data to them. Rather puzzling.

It is possible that your test project didn't work with Hadoop2 because some constants have changed (for example mapred.cache.files), but it might be for an entirely different reason.
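As an illustration of the kind of constant change mentioned above: Hadoop 2 deprecates the `mapred.cache.files` key in favour of `mapreduce.job.cache.files`. The following is only a minimal sketch of a lookup that tolerates either name. It uses a plain `Map` as a stand-in for Hadoop's `Configuration`, and the `ConfKeys` object and its members are hypothetical names, not part of Scoobi's Compatibility object.

```scala
// Hypothetical helper for illustration only; not Scoobi's actual code.
object ConfKeys {
  // Old (Hadoop 1 / cdh3) and new (Hadoop 2) names for the distributed-cache file list.
  val cacheFilesOld = "mapred.cache.files"
  val cacheFilesNew = "mapreduce.job.cache.files"

  // Look the value up under the new key first, then fall back to the old key.
  // A Map[String, String] stands in for org.apache.hadoop.conf.Configuration.
  def cacheFiles(conf: Map[String, String]): Option[String] =
    conf.get(cacheFilesNew).orElse(conf.get(cacheFilesOld))
}
```

Reading through one helper like this keeps the key names in a single place, so a job does not silently miss a value that was written under the other name.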

> cdh3 version fails throwing an exception about a class being replaced by an interface.

This means that you must still have the cdh4 jars on your classpath.
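For background, `JobContext` is a class in Hadoop 1 / cdh3 and an interface in cdh4 and Hadoop 2, so bytecode compiled against one shape fails to link against the other. One common way to keep a single call site working with both shapes is to go through reflection. The sketch below shows the idea with stand-in types; `Reflective`, `ContextAsClass`, `ContextAsInterface` and `ContextImpl` are hypothetical names, and this is not necessarily how Scoobi's Compatibility object is written.

```scala
// Hypothetical helper: call a zero-argument public method by name, so the call
// site does not depend on whether the declaring type is a class or an interface.
object Reflective {
  def call(target: AnyRef, methodName: String): AnyRef =
    target.getClass.getMethod(methodName).invoke(target)
}

// Stand-ins for the two shapes of the same API (these are not Hadoop types).
class ContextAsClass { def getJobName: String = "from-class" }
trait ContextAsInterface { def getJobName: String }
class ContextImpl extends ContextAsInterface { def getJobName: String = "from-interface" }
```

Both `Reflective.call(new ContextAsClass, "getJobName")` and `Reflective.call(new ContextImpl, "getJobName")` resolve the method at run time, at the cost of losing compile-time checking of the method name.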

I unfortunately don't have much time to devote to testing Hadoop V2, but I'd be happy if you could continue in that direction, so I incorporated your changes into master and pushed it so that:

- setting the version to 0.8.0-cdh3 should have the cdh3 dependencies and use the cdh3 classes for JobContext, etc... (not interfaces)

- setting the version to 0.8.0-cdh4 should have the cdh4 dependencies and use the cdh4 interfaces (for JobContext etc...)

- setting the version to 0.8.0-hadoop2 should have the hadoop 2.1 dependencies (from HortonWorks) and use the cdh4 interfaces
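The mapping in the list above can be sketched as a dispatch on the version suffix. This is a hedged illustration, not Scoobi's build code: the `HadoopFlavour` types and the `Flavours` object are hypothetical names, and the fallback to cdh4 for a version with no recognised suffix is an assumption.

```scala
// Hypothetical sketch of version-suffix dispatch; not Scoobi's build code.
sealed trait HadoopFlavour
case object Cdh3 extends HadoopFlavour    // JobContext etc. are classes
case object Cdh4 extends HadoopFlavour    // JobContext etc. are interfaces
case object Hadoop2 extends HadoopFlavour // Hadoop 2.1 (HortonWorks), uses the cdh4 interfaces

object Flavours {
  // Assumption: a version without a recognised suffix falls back to cdh4.
  def fromVersion(version: String): HadoopFlavour =
    if (version.contains("hadoop2")) Hadoop2
    else if (version.contains("cdh3")) Cdh3
    else Cdh4
}
```

For example, `Flavours.fromVersion("0.8.0-hadoop2")` selects `Hadoop2`, while `Flavours.fromVersion("0.8.0-cdh3")` selects `Cdh3`.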

This is all a bit messy, but if we can keep a single, not too big Compatibility class that makes Scoobi work across CDH3, CDH4 and Horton 2.1, that will be great.

E.

Alex Cozzi

Oct 22, 2013, 4:01:24 PM

to scoob...@googlegroups.com

Awesome!

I will try it out as soon as it gets published at https://oss.sonatype.org/content/repositories/snapshots/com/nicta/scoobi_2.10/

Eric Torreborre

Oct 22, 2013, 8:00:22 PM

to scoob...@googlegroups.com

It is published now. Thanks for trying it out, and please send me the logs of whatever doesn't work; I'd be surprised if everything worked out of the box :-).

Alex Cozzi

Oct 23, 2013, 2:44:45 PM

to scoob...@googlegroups.com

Sorry, I see only cdh3 and cdh4 versions there:

0.8.0-SNAPSHOT/	Wed Oct 23 06:09:40 CDT 2013
0.8.0-cdh3-SNAPSHOT/	Wed Oct 23 06:11:04 CDT 2013
0.8.0-cdh4-SNAPSHOT/	Sat Oct 19 06:08:30 CDT 2013

Eric Torreborre

Oct 23, 2013, 7:12:27 PM

to scoob...@googlegroups.com

Sorry, my bad, it's up now: https://oss.sonatype.org/service/local/staging/deploy/maven2/com/nicta/scoobi_2.10/0.8.0-hadoop2/scoobi_2.10-0.8.0-hadoop2.jar

E.

Alex Cozzi

Oct 30, 2013, 1:22:37 AM

to scoob...@googlegroups.com

I tested it on our cluster and it works! Thanks.

Eric Torreborre

Oct 30, 2013, 2:28:09 AM

to scoob...@googlegroups.com

That's good news. I have also now enabled our Jenkins server to publish `hadoop2` versions for 0.8.0-SNAPSHOT automatically.

E.

Alex Cozzi

Nov 13, 2013, 6:38:12 PM

to scoob...@googlegroups.com

Now that Hadoop 2 has gone final, I found another problem:

Exception in thread "main" java.lang.NoSuchMethodException: org.apache.hadoop.mapreduce.Job.getJobClient()
    at java.lang.Class.getDeclaredMethod(Class.java:1937)
    at com.nicta.scoobi.impl.reflect.Classes$class.invokeProtected(Classes.scala:135)
    at com.nicta.scoobi.impl.reflect.Classes$.invokeProtected(Classes.scala:165)
    at com.nicta.scoobi.impl.exec.TaskDetailsLogger.getJobClient$lzycompute(MapReduceJob.scala:314)
    at com.nicta.scoobi.impl.exec.TaskDetailsLogger.getJobClient(MapReduceJob.scala:314)
    at com.nicta.scoobi.impl.exec.TaskDetailsLogger.com$nicta$scoobi$impl$exec$TaskDetailsLogger$$getTaskCompletionEvents(MapReduceJob.scala:308)
    at com.nicta.scoobi.impl.exec.TaskDetailsLogger$$anonfun$logTaskCompletionDetails$1.apply(MapReduceJob.scala:285)

Essentially what happened is that they got rid of getJobClient :-(

I found this issue:

https://issues.apache.org/jira/browse/MAPREDUCE-5215

I am looking into a workaround and will keep you posted, but I am open to suggestions.
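The stack trace shows the failure coming from `Class.getDeclaredMethod`, which throws when the method no longer exists. One possible stopgap, sketched here under assumptions, is to make the reflective lookup return an `Option` so callers can degrade gracefully when a method has been removed. `SafeReflect`, `OldJob` and `NewJob` are hypothetical stand-ins and not Scoobi or Hadoop types.

```scala
import scala.util.Try

// Hypothetical helper for illustration only.
object SafeReflect {
  // Invoke a zero-argument method declared on the target's own class, if it exists.
  // Returns None when the lookup fails (e.g. NoSuchMethodException);
  // exceptions thrown by the invoked method itself are not caught here.
  def invokeIfPresent(target: AnyRef, name: String): Option[AnyRef] =
    Try(target.getClass.getDeclaredMethod(name)).toOption.map { m =>
      m.setAccessible(true)
      m.invoke(target)
    }
}

// Stand-ins: one class still has the method, the other has lost it.
class OldJob { protected def getJobClient: String = "client" }
class NewJob
```

With this shape, `SafeReflect.invokeIfPresent(new NewJob, "getJobClient")` yields `None` instead of aborting the whole job with an exception.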

I also have a patch to bring the Scoobi build up to the latest version of Hadoop 2, but it will not work without fixing the getJobClient problem.

--- a/project/dependencies.scala
+++ b/project/dependencies.scala
@@ -38,13 +38,13 @@ object dependencies {
     "org.apache.commons" % "commons-compress" % "1.0" % "test")
   def hadoop(version: String) =
-    if (version.contains("hadoop2")) Seq("org.apache.hadoop" % "hadoop-common" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-hdfs" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-mapreduce-client-app" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.1.0.2.0.5.0-67",
-      "org.apache.hadoop" % "hadoop-annotations" % "2.1.0.2.0.5.0-67",
+    if (version.contains("hadoop2")) Seq("org.apache.hadoop" % "hadoop-common" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-mapreduce-client-app" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.2.0.2.0.6.0-76",
+      "org.apache.hadoop" % "hadoop-annotations" % "2.2.0.2.0.6.0-76",
       "org.apache.avro" % "avro-mapred" % "1.7.4")
     else if (version.contains("cdh3")) Seq("org.apache.hadoop" % "hadoop-core" % "0.20.2-cdh3u1",
       "org.apache.avro" % "avro-mapred" % "1.7.4")

Alex Cozzi

Nov 14, 2013, 1:51:41 PM

to scoob...@googlegroups.com

I found a workaround: change line 307 in MapReduceJob.scala to:

private def getTaskCompletionEvents(index: Int) = job.getTaskCompletionEvents(index)

I am actually wondering why this straightforward implementation is not used for Hadoop 1 as well.

Alex

Eric Torreborre

Nov 19, 2013, 5:13:18 PM

to scoob...@googlegroups.com

Hi Alex,

Thanks for working on this. We are very busy at the moment on non-Scoobi stuff.

I hope to get some time to work on this next week.

E.
