Cascading 2.6.3 on Cloudera Hadoop 2.5.0-cdh5.3.0


dubo...@avast.com

Mar 23, 2015, 8:56:36 AM
to cascadi...@googlegroups.com
Hi cascading users,

  I am trying to run part 1 of Cascading for the Impatient on our cluster, which runs Cloudera Hadoop 2.5.0-cdh5.3.0. I changed the Cascading and Hadoop versions in build.gradle to

ext.cascadingVersion = '2.6.3'
ext.hadoopVersion = '2.5.0-cdh5.3.0'

  then compiled the jar, uploaded it, and ran it on the cluster with

hadoop jar impatient.jar /input/path /output/path

  which gave me this error:

15/03/23 13:32:35 ERROR flow.FlowStep: [] unable to load platform specific class, please verify Hadoop cluster version: 'Hadoop:2.5.0-mr1-cdh5.3.0:Apache', matches the Hadoop platform build dependency and associated FlowConnector, cascading-hadoop or cascading-hadoop2-mr1
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/TaskCompletionEvent
...

  TaskCompletionEvent lives in the hadoop-mapreduce-client-core artifact, which is marked as provided in build.gradle. But it is not on the classpath of the hadoop executable, only on the classpath of the distributed job that runs afterwards. So I tried changing the dependency from provided to compile. That solved the NoClassDefFoundError but produced a VerifyError:

Exception in thread "main" java.lang.VerifyError: Inconsistent stackmap frames at branch target 34
Exception Details:
  Location:
    cascading/stats/hadoop/HadoopStepStats.addAttemptsToTaskStats(Ljava/util/Map;Z)V @34: aload
  Reason:
    Type '[Lorg/apache/hadoop/mapred/TaskCompletionEvent;' (current frame, locals[5]) is not assignable to '[Lorg/apache/hadoop/mapreduce/TaskCompletionEvent;' (stack map, locals[5])

  ...
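The VerifyError reports that an array of org.apache.hadoop.mapred.TaskCompletionEvent is not assignable to an array of org.apache.hadoop.mapreduce.TaskCompletionEvent. One situation where that can happen is when the two classes are loaded from jars built for different Hadoop lines. A quick way to check is to print where each class is actually loaded from on the client side. The following is only a sketch: the class name `ClassOrigin` and its `locate` method are hypothetical helpers, not part of Cascading or Hadoop.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Cascading or Hadoop.
public class ClassOrigin {

    /** Returns the location a class is loaded from, without initializing the class. */
    public static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassOrigin.class.getClassLoader();
        }
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // JDK classes loaded by the bootstrap loader have no code source.
                return "bootstrap or unknown source";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults are the classes named in the stack traces in this thread.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.mapred.TaskCompletionEvent",
            "org.apache.hadoop.mapreduce.TaskCompletionEvent",
            "cascading.stats.hadoop.HadoopStepStats"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Packaged into the application jar and started the same way as the job, it would show which jar each class is loaded from on the client side, or that the class cannot be loaded at all.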

  Could you please point me in the right direction? Is cdh5.3.0 supported by Cascading?

  Thank you in advance

  Jakub Dubovsky

Andre Kelpe

Mar 23, 2015, 12:04:10 PM
to cascadi...@googlegroups.com
Hi,

CDH 5 has been certified with Cascading:
http://www.cascading.org/support/compatibility/

Do you get the same result when running the application with "yarn jar
...." instead of hadoop jar?

- Andre
> --
> You received this message because you are subscribed to the Google Groups
> "cascading-user" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to cascading-use...@googlegroups.com.
> To post to this group, send email to cascadi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/cascading-user.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/cascading-user/a9964151-a769-40cf-a921-9b3260a8eac4%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Dubovský, Jakub

Mar 23, 2015, 1:04:27 PM
to cascadi...@googlegroups.com
I get the same result when I run it with yarn instead of hadoop.

I added code that extracts the URLs from the ClassLoader to see which jars it is looking into. In both the hadoop and the yarn case I see this list:

file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/
file:/home/dubovsky/impatient.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/classes
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/guava-14.0.1.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/cascading-core-2.6.3.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/janino-2.7.5.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/cascading-hadoop2-mr1-2.6.3.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/cascading-local-2.6.3.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/commons-compiler-2.7.5.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/jgrapht-jdk1.6-0.8.1.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/riffle-0.1-dev.jar
file:/tmp/hadoop-dubovsky/hadoop-unjar7546198077987338391/lib/slf4j-api-1.7.2.jar

  Those are just the jars from the gradle build of the sample code. That means no other jars are provided by the hadoop runtime during this preparation phase. But the code in the sample project is compiled against the hadoop-mapreduce-client-core dependency, which is marked as providedCompile. So I wonder why all the Cascading examples say we should use "hadoop jar code.jar ..." to run the code...
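A listing like the one above can be produced with something along these lines. This is a minimal sketch, not the code actually used in this thread; the class `ClasspathDump` and its methods are made up for illustration. It walks the class loader chain and prints the URLs of every URLClassLoader it finds. On Java 9 and later the application class loader is no longer a URLClassLoader, so the sketch falls back to the java.class.path system property when nothing is found.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
public class ClasspathDump {

    /** Collects the URLs of every URLClassLoader in the chain starting at the given loader. */
    public static List<String> urlsOf(ClassLoader loader) {
        List<String> urls = new ArrayList<>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    urls.add(url.toString());
                }
            }
        }
        return urls;
    }

    /** Splits a classpath string on the platform path separator, skipping empty entries. */
    public static List<String> splitClassPath(String classpath) {
        List<String> entries = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        List<String> urls = urlsOf(Thread.currentThread().getContextClassLoader());
        if (urls.isEmpty()) {
            // No URLClassLoader in the chain: fall back to the JVM classpath property.
            urls = splitClassPath(System.getProperty("java.class.path", ""));
        }
        for (String url : urls) {
            System.out.println(url);
        }
    }
}
```

Calling `ClasspathDump.main` from the application's own main method, before the flow is built, prints what the submitting JVM can see.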

  If there is any info I can provide or any test I can run, I am happy to do it. Just ask...
  Maybe I will ask our dev-ops as well.

  thanks for response

  J.



Andre Kelpe

Mar 23, 2015, 1:09:02 PM
to cascadi...@googlegroups.com
That is exactly what it should be. Your hadoop runtime should provide
the rest of the classpath.

Could you put the output of hadoop classpath into a gist and share it
with us? All jars in there should be on the classpath on the client
side when the app launches.

- Andre

Dubovský, Jakub

Mar 23, 2015, 2:22:32 PM
to cascadi...@googlegroups.com
OK, so let's start from the beginning. My steps:

1) git clone https://github.com/Cascading/Impatient.git

2) change of versions in build.gradle to


ext.cascadingVersion = '2.6.3'
ext.hadoopVersion = '2.5.0-cdh5.3.0'

3) gradle clean jar

4) upload build/libs/impatient.jar  to cluster

5) hadoop jar impatient.jar /input/path /output/path

here is output of the command

6) hadoop classpath | tr : '\n'

here is output of classpath command

7) find `hadoop classpath | tr : ' '` -name '*.jar'

here is output of find command

hadoop-mapreduce-client-core-2.5.0-cdh5.3.0.jar is among them, and it is the one the Impatient jar was compiled against. But I guess it will be on the classpath of the mappers and reducers, not on the classpath of the code that submits the job...
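To see which entries of the `hadoop classpath` output actually contain a given class, the classpath string can be scanned directly. The sketch below is an assumption-laden illustration: `FindClassInClasspath` and `providersOf` are hypothetical names, and the handling of `dir/*` wildcard entries only picks up files ending in `.jar`.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only.
public class FindClassInClasspath {

    /** Expands directories, jar files and dir/* wildcard entries into a list of files. */
    static List<File> expand(String classpath) {
        List<File> result = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty()) {
                continue;
            }
            if (entry.endsWith("*")) {
                // Wildcard entry such as /some/lib/* : take every jar in that directory.
                File dir = new File(entry.substring(0, entry.length() - 1));
                File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
                if (jars != null) {
                    for (File jar : jars) {
                        result.add(jar);
                    }
                }
            } else {
                result.add(new File(entry));
            }
        }
        return result;
    }

    /** Returns every classpath element (jar or directory) that contains the given class. */
    public static List<String> providersOf(String className, String classpath) {
        String resource = className.replace('.', '/') + ".class";
        List<String> hits = new ArrayList<>();
        for (File file : expand(classpath)) {
            if (file.isDirectory()) {
                if (new File(file, resource).isFile()) {
                    hits.add(file.getPath());
                }
            } else if (file.isFile() && file.getName().endsWith(".jar")) {
                try (ZipFile zip = new ZipFile(file)) {
                    if (zip.getEntry(resource) != null) {
                        hits.add(file.getPath());
                    }
                } catch (IOException e) {
                    // Unreadable or corrupt archive: skip it.
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: FindClassInClasspath <class name> <classpath>");
            return;
        }
        for (String hit : providersOf(args[0], args[1])) {
            System.out.println(hit);
        }
    }
}
```

It could be run as `java FindClassInClasspath org.apache.hadoop.mapreduce.TaskCompletionEvent "$(hadoop classpath)"`. More than one hit, or a hit in an unexpected jar, would point at conflicting Hadoop artifacts on the client.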

  any thoughts?

  J.

Andre Kelpe

Mar 23, 2015, 4:46:30 PM
to cascadi...@googlegroups.com
Hi,

I just downloaded the binary distro from cloudera and gave it a try:

$ git diff build.gradle
diff --git a/part1/build.gradle b/part1/build.gradle
index 28c0a09..139b71e 100644
--- a/part1/build.gradle
+++ b/part1/build.gradle
@@ -30,11 +30,12 @@ repositories {
mavenLocal()
mavenCentral()
maven{ url 'http://conjars.org/repo/' }
+ maven{ url 'https://repository.cloudera.com/artifactory/cloudera-repos/' }
}

def fluidVersion = '1.0.0'
-def cascadingVersion = '2.6.1'
-def hadoopVersion = '2.4.1'
+def cascadingVersion = '2.6.3'
+def hadoopVersion = '2.5.0-cdh5.3.0'

dependencies {
compile( group: 'cascading', name: 'fluid-api', version: fluidVersion )


$ $HOME/tools/gradle-1.11/bin/gradle clean jar
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
:part1:clean
:part1:compileJava
:part1:processResources UP-TO-DATE
:part1:classes
:part1:jar

BUILD SUCCESSFUL

Total time: 7.205 secs


$ hadoop jar build/libs/impatient.jar data/rain.txt out
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
15/03/23 13:43:08 INFO util.HadoopUtil: resolving application jar from
found main method on: impatient.Main
15/03/23 13:43:08 INFO planner.HadoopPlanner: using application jar:
/Users/akelpe/code/Impatient/part1/build/libs/impatient.jar
15/03/23 13:43:08 INFO property.AppProps: using app.id:
C61B73E2C4F14F29BC0F8E7459187E98
15/03/23 13:43:09 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
15/03/23 13:43:09 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/23 13:43:09 INFO Configuration.deprecation:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
15/03/23 13:43:09 INFO Configuration.deprecation: mapred.job.tracker
is deprecated. Instead, use mapreduce.jobtracker.address
15/03/23 13:43:09 INFO Configuration.deprecation:
mapred.output.compress is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress
15/03/23 13:43:09 INFO util.Version: Concurrent, Inc - Cascading 2.6.3
15/03/23 13:43:09 INFO flow.Flow: [] starting
15/03/23 13:43:09 INFO flow.Flow: [] source:
Hfs["TextDelimited[['doc_id', 'text']]"]["data/rain.txt"]
15/03/23 13:43:09 INFO flow.Flow: [] sink:
Hfs["TextDelimited[['doc_id', 'text']]"]["out"]
15/03/23 13:43:09 INFO flow.Flow: [] parallel execution is enabled: false
15/03/23 13:43:09 INFO flow.Flow: [] starting jobs: 1
15/03/23 13:43:09 INFO flow.Flow: [] allocating threads: 1
15/03/23 13:43:09 INFO flow.FlowStep: [] starting step: (1/1) out
15/03/23 13:43:09 INFO Configuration.deprecation: session.id is
deprecated. Instead, use dfs.metrics.session-id
15/03/23 13:43:09 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
15/03/23 13:43:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
15/03/23 13:43:09 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/23 13:43:09 INFO mapreduce.JobSubmitter: number of splits:1
15/03/23 13:43:09 INFO mapreduce.JobSubmitter: Submitting tokens for
job: job_local1927426705_0001
15/03/23 13:43:09 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/staging/akelpe1927426705/.staging/job_local1927426705_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
15/03/23 13:43:09 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/staging/akelpe1927426705/.staging/job_local1927426705_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
15/03/23 13:43:10 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/local/localRunner/akelpe/job_local1927426705_0001/job_local1927426705_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
15/03/23 13:43:10 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/local/localRunner/akelpe/job_local1927426705_0001/job_local1927426705_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
15/03/23 13:43:10 INFO mapreduce.Job: The url to track the job:
http://localhost:8080/
15/03/23 13:43:10 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/03/23 13:43:10 INFO flow.FlowStep: [] submitted hadoop job:
job_local1927426705_0001
15/03/23 13:43:10 INFO flow.FlowStep: [] tracking url: http://localhost:8080/
15/03/23 13:43:10 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Starting task:
attempt_local1927426705_0001_m_000000_0
15/03/23 13:43:10 INFO util.ProcfsBasedProcessTree:
ProcfsBasedProcessTree currently is supported only on Linux.
15/03/23 13:43:10 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/03/23 13:43:10 INFO io.MultiInputSplit: current split input path:
file:/Users/akelpe/code/Impatient/part1/data/rain.txt
15/03/23 13:43:10 INFO mapred.MapTask: Processing split:
cascading.tap.hadoop.io.MultiInputSplit@14fdaa89
15/03/23 13:43:10 INFO mapred.MapTask: numReduceTasks: 0
15/03/23 13:43:10 INFO hadoop.FlowMapper: cascading version: 2.6.3
15/03/23 13:43:10 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
15/03/23 13:43:10 INFO Configuration.deprecation:
mapred.task.partition is deprecated. Instead, use
mapreduce.task.partition
15/03/23 13:43:10 INFO hadoop.FlowMapper: sourcing from:
Hfs["TextDelimited[['doc_id', 'text']]"]["data/rain.txt"]
15/03/23 13:43:10 INFO hadoop.FlowMapper: sinking to:
Hfs["TextDelimited[['doc_id', 'text']]"]["out"]
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
15/03/23 13:43:10 INFO mapred.Task:
Task:attempt_local1927426705_0001_m_000000_0 is done. And is in the
process of committing
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
15/03/23 13:43:10 INFO mapred.Task: Task
attempt_local1927426705_0001_m_000000_0 is allowed to commit now
15/03/23 13:43:10 INFO output.FileOutputCommitter: Saved output of
task 'attempt_local1927426705_0001_m_000000_0' to
file:/Users/akelpe/code/Impatient/part1/out/_temporary/0/task_local1927426705_0001_m_000000
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
file:/Users/akelpe/code/Impatient/part1/data/rain.txt:0+510
15/03/23 13:43:10 INFO mapred.Task: Task
'attempt_local1927426705_0001_m_000000_0' done.
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Finishing task:
attempt_local1927426705_0001_m_000000_0
15/03/23 13:43:10 INFO mapred.LocalJobRunner: map task executor complete.
15/03/23 13:43:15 INFO util.Hadoop18TapUtil: deleting temp path out/_temporary


$ cat out/*
doc_id text
doc01 A rain shadow is a dry area on the lee back side of a mountainous area.
doc02 This sinking, dry air produces a rain shadow, or area in the
lee of a mountain with less rain and cloudcover.
doc03 A rain shadow is an area of dry land that lies on the leeward
(or downwind) side of a mountain.
doc04 This is known as the rain shadow effect and is the primary
cause of leeward deserts of mountain ranges, such as California's
Death Valley.
doc05 Two Women. Secrets. A Broken Land. [DVD Australia]

No problem at all. Something must be wrong with your cluster install.
I am no expert on cloudera, but maybe some components have not been
installed. Note that we use the mapred compatibility in hadoop for
Cascading.

- Andre

Dubovský, Jakub

Mar 25, 2015, 4:17:58 AM
to cascadi...@googlegroups.com
Hi,

  Thank you for your time! I have asked our dev-ops guys. I'll come back and report how it went...

  J.



Nguyen Huu Nam Duong

Jun 18, 2015, 7:31:19 PM
to cascadi...@googlegroups.com, dubo...@avast.com
I have the same problem. @Jakub: Could you share how you solved it?

dubo...@avast.com

Jun 19, 2015, 6:08:13 AM
to cascadi...@googlegroups.com
Hey, it is nice to hear from somebody with the same problem. Unfortunately I have not solved it yet, because we have other priorities to work on. The issue is still on my list, and I will post an update when I know something new. But that might take weeks...

  Please share any news from your side as well!

  J.