CDH4.2 Support


Gary Malouf

May 6, 2013, 6:19:32 PM
to scoobi...@googlegroups.com
I keep getting the following when trying to load data using the latest Scoobi:

Exception in thread "main" java.io.IOException: Input path hdfs://localhost:8020/valid/path/to/file does not exist.
    at com.nicta.scoobi.io.sequence.SequenceInput$SeqSource$$anonfun$inputCheck$1.apply(SequenceInput.scala:142)
    at com.nicta.scoobi.io.sequence.SequenceInput$SeqSource$$anonfun$inputCheck$1.apply(SequenceInput.scala:138)

This is running from the Hadoop CLI on the NameNode. Is this a compatibility issue between the CDH4.0.x jars I compile against and the CDH4.2 runtime?


Eric Torreborre

May 6, 2013, 6:48:00 PM
to scoobi...@googlegroups.com
Hi Gary,

I don't think that this is a compatibility issue. This check is just doing:

getFileStatus(path, pathFilter).size > 0

where pathFilter is:

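  import org.apache.hadoop.fs.{Path, PathFilter}

  new PathFilter {
    // skip bookkeeping files such as _SUCCESS, _logs or .crc checksums
    def accept(p: Path): Boolean = !p.getName.startsWith("_") && !p.getName.startsWith(".")
  }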
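To make the filter's effect concrete, here is a stand-alone Scala sketch that applies the same name test to plain strings instead of Hadoop Path objects. The object name and the sample file names are hypothetical; only the two startsWith checks come from the filter above.

```scala
// Hypothetical stand-alone sketch: the same name test as the PathFilter above,
// applied to bare file names instead of org.apache.hadoop.fs.Path objects.
object HiddenFileFilterSketch {
  // Rejects names starting with "_" or ".", e.g. _SUCCESS, _logs, .part-00000.crc
  def accept(name: String): Boolean =
    !name.startsWith("_") && !name.startsWith(".")

  def main(args: Array[String]): Unit = {
    // Sample names are illustrative only
    val names = List("part-00000", "_SUCCESS", "_logs", ".part-00000.crc", "events.seq")
    names.foreach(n => println(s"$n -> ${accept(n)}"))
  }
}
```

Only part-00000 and events.seq pass, so an input location whose only entries are such bookkeeping files would also make the size > 0 check fail.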

So the question is really what Hadoop's FileStatus is for your input file on this node. Is the path really correct?

E.

Gary Malouf

May 6, 2013, 6:58:58 PM
to scoobi...@googlegroups.com
Hi Eric,

As for the path itself, 'hadoop -ls' lists it. My only thought is that the hdfs://localhost:8020 portion is somehow wrong. I guess I will have to keep trying different combinations.

-Gary


--
You received this message because you are subscribed to a topic in the Google Groups "scoobi-users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scoobi-users/ampuS3cibOk/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to scoobi-users...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Gary Malouf

May 6, 2013, 7:21:00 PM
to scoobi...@googlegroups.com
Hi Eric,

So even with the fully qualified name, the file shows up with the 'hadoop -l' command. Could Scoobi be picky about the file being owned by the flume user rather than the hadoop user?

-Gary

Gary Malouf

May 6, 2013, 8:26:22 PM
to scoobi...@googlegroups.com
Correction: I meant 'hadoop fs -ls'. With that command the path exists, but Scoobi cannot seem to find it. We see nothing in the Hadoop/HDFS logs because the job is cut off before it starts.

Eric Torreborre

May 7, 2013, 4:54:51 AM
to scoobi...@googlegroups.com
Can you try to write some Java code to narrow the issue down?

The "checking" code that is executed is:

org.apache.hadoop.fs.FileSystem.get(path.toUri, configuration).globStatus(new Path(path, "*"), pathFilter)

Can you please check if you get anything back when you do that?

Also, the problem might be in your Configuration object. What is the value of fs.defaultFS? Should it be something else?

E.

Gary Malouf

May 7, 2013, 9:57:00 AM
to scoobi...@googlegroups.com
Hi Eric:

When I run that globStatus call, I get:

Global status: ()


The fs.defaultFS setting seems to point to our NameNode correctly:

hdfs://nn-01:8020


Pretty mysterious, as the file clearly exists.
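The suspicion about the hdfs://localhost:8020 prefix can at least be compared against fs.defaultFS without a cluster. Hadoop's FileSystem.get picks the file system from the path URI's scheme and authority and falls back to fs.defaultFS when they are missing, so a path naming a different authority is sent to whatever address that authority names. The sketch below (object and method names are hypothetical) uses only java.net.URI; it compares strings and does not resolve host names, so localhost and nn-01 may still be the same machine when the job is launched on the NameNode.

```scala
import java.net.URI

// Hypothetical helper: does `path` resolve to the same file system as `defaultFs`?
// Roughly mirrors how FileSystem.get chooses a file system from a URI; it only
// compares strings and does not resolve host names (localhost vs nn-01).
object FsAuthorityCheck {
  def sameFileSystem(path: String, defaultFs: String): Boolean = {
    val p = new URI(path)
    val d = new URI(defaultFs)
    if (p.getScheme == null) true // no scheme: resolved against fs.defaultFS
    else p.getScheme.equalsIgnoreCase(d.getScheme) &&
      // scheme but no authority (e.g. hdfs:///x): the default authority is used
      (p.getAuthority == null || p.getAuthority.equalsIgnoreCase(d.getAuthority))
  }

  def main(args: Array[String]): Unit = {
    val defaultFs = "hdfs://nn-01:8020"                       // fs.defaultFS reported above
    val input = "hdfs://localhost:8020/valid/path/to/file"    // path from the exception
    println(s"same file system: ${sameFileSystem(input, defaultFs)}")
  }
}
```

With the values from this thread it reports false, which only flags that the two authorities are written differently; it does not show which one is wrong.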

Gary Malouf

May 7, 2013, 11:14:35 AM
to scoobi...@googlegroups.com
Update:

Digging through the job tracker logs, we noticed the following:

2013-05-07 14:08:53,105 WARN org.apache.hadoop.ipc.Server: Incorrect header or version mismatch from {Our IP}:36893 got version 6 expected version 7

This hints at a compatibility issue between CDH4.0.1 and CDH4.2.

Eric Torreborre

May 13, 2013, 2:02:16 AM
to scoobi...@googlegroups.com
Hi Gary,

Do you mean that Scoobi should be recompiled against CDH4.2 or that you might have your client library on 4.0.1 while the cluster is on 4.2?

I'm going to discuss with Ben to see what it would mean for us to officially upgrade to CDH4.2.

E.

Gary Malouf

May 13, 2013, 7:56:25 AM
to scoobi...@googlegroups.com
I had to drop Scoobi and use the plain Java API to get my app working in the short term. At that point I was providing the CDH4.2 jars (and excluding the 4.0.1 ones) at runtime. The next thing I meant to try was compiling against 4.2, but that is currently on hold.

-Gary

Lionel Herbet

Jun 4, 2013, 8:11:06 AM
to scoobi...@googlegroups.com
It won't really help you, but for information: I ran a job based on scoobi-7.0-RC1 and compiled against CDH4.2.3 jars on a CDH4.3 cluster, and it works so far.

Lionel

Lionel Herbet

Jun 4, 2013, 8:18:00 AM
to scoobi...@googlegroups.com
Sorry, it was CDH4.2.1 jars, and I did not recompile Scoobi against the CDH4.2.1 jars, just my job.

Ross MacLeod

Jun 4, 2013, 9:30:38 AM
to scoobi...@googlegroups.com
I have a branch of Scoobi 0.6.2 that I used successfully against CDH4.2.1. I've since downgraded to CDH4.1.1 for unrelated reasons, but as far as I know the branch works:


The top commit was all I needed to do to make CDH4.2.1 work.

-Ross


Gary Malouf

Jun 6, 2013, 9:23:50 AM
to scoobi...@googlegroups.com
Thanks Ross. Hopefully some of this gets merged into the main Scoobi project soon as well. I will give it a shot.

Gary Malouf

Jun 6, 2013, 11:06:55 AM
to scoobi...@googlegroups.com
Question: we are still using MapReduce 1 (MRv1) on CDH 4.2.1. It appears the jars you are including are for YARN. What changes are needed to use MRv1?

Ross MacLeod

Jun 6, 2013, 1:43:55 PM
to scoobi...@googlegroups.com
I'm not totally sure, but I'd guess it's something along the lines of replacing these:

"org.apache.hadoop" % "hadoop-common" % "2.0.0-cdh4.2.0" exclude("commons-daemon", "commons-daemon"),
"org.apache.hadoop" % "hadoop-hdfs" % "2.0.0-cdh4.2.0" exclude("commons-daemon", "commons-daemon"),
"org.apache.hadoop" % "hadoop-mapreduce-client-jobclient" % "2.0.0-cdh4.2.0",

with:

"org.apache.hadoop" % "hadoop-common" % "2.0.0-cdh4.2.0" exclude("commons-daemon", "commons-daemon"),
"org.apache.hadoop" % "hadoop-core" % "2.0.0-mr1-cdh4.2.0" exclude("commons-daemon", "commons-daemon"),

I think hadoop-core contains the MRv1 MapReduce classes, and hadoop-mapreduce-client-jobclient is the MRv2 (YARN) client.
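For reference, the MRv1 variant might sit in a build.sbt roughly as follows. This is an untested sketch: it assumes sbt 0.13 or later (for the val and the string interpolation), that the Cloudera repository URL below is still current, and that the hypothetical cdhVersion value is set to the cluster's exact CDH release, since a client/cluster release mismatch is what the "got version 6 expected version 7" warning pointed to.

```scala
// Hypothetical build.sbt fragment for a CDH 4.2.x cluster still on MapReduce v1.
// Assumption: the Cloudera repository URL is current.
resolvers += "cloudera-repos" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

// Assumption: change to the cluster's exact CDH release (e.g. a 4.2.1 cluster)
val cdhVersion = "cdh4.2.0"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-common" % s"2.0.0-$cdhVersion" exclude("commons-daemon", "commons-daemon"),
  // MRv1 artifacts carry an extra "mr1" qualifier in their version
  "org.apache.hadoop" % "hadoop-core" % s"2.0.0-mr1-$cdhVersion" exclude("commons-daemon", "commons-daemon")
)
```

Deriving both versions from one value is a small safeguard against compiling against one CDH release and running against another.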

-Ross

