Hi,
I just downloaded the binary distro from Cloudera and gave it a try:
$ git diff build.gradle
diff --git a/part1/build.gradle b/part1/build.gradle
index 28c0a09..139b71e 100644
--- a/part1/build.gradle
+++ b/part1/build.gradle
@@ -30,11 +30,12 @@ repositories {
mavenLocal()
mavenCentral()
maven{ url 'http://conjars.org/repo/' }
+ maven{ url 'https://repository.cloudera.com/artifactory/cloudera-repos/' }
}
def fluidVersion = '1.0.0'
-def cascadingVersion = '2.6.1'
-def hadoopVersion = '2.4.1'
+def cascadingVersion = '2.6.3'
+def hadoopVersion = '2.5.0-cdh5.3.0'
dependencies {
compile( group: 'cascading', name: 'fluid-api', version: fluidVersion )
$ $HOME/tools/gradle-1.11/bin/gradle clean jar
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
:part1:clean
:part1:compileJava
:part1:processResources UP-TO-DATE
:part1:classes
:part1:jar
BUILD SUCCESSFUL
Total time: 7.205 secs
$ hadoop jar build/libs/impatient.jar data/rain.txt out
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
15/03/23 13:43:08 INFO util.HadoopUtil: resolving application jar from
found main method on: impatient.Main
15/03/23 13:43:08 INFO planner.HadoopPlanner: using application jar:
/Users/akelpe/code/Impatient/part1/build/libs/impatient.jar
15/03/23 13:43:08 INFO property.AppProps: using app.id:
C61B73E2C4F14F29BC0F8E7459187E98
15/03/23 13:43:09 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
15/03/23 13:43:09 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/23 13:43:09 INFO Configuration.deprecation:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
15/03/23 13:43:09 INFO Configuration.deprecation: mapred.job.tracker
is deprecated. Instead, use mapreduce.jobtracker.address
15/03/23 13:43:09 INFO Configuration.deprecation:
mapred.output.compress is deprecated. Instead, use
mapreduce.output.fileoutputformat.compress
15/03/23 13:43:09 INFO util.Version: Concurrent, Inc - Cascading 2.6.3
15/03/23 13:43:09 INFO flow.Flow: [] starting
15/03/23 13:43:09 INFO flow.Flow: [] source:
Hfs["TextDelimited[['doc_id', 'text']]"]["data/rain.txt"]
15/03/23 13:43:09 INFO flow.Flow: [] sink:
Hfs["TextDelimited[['doc_id', 'text']]"]["out"]
15/03/23 13:43:09 INFO flow.Flow: [] parallel execution is enabled: false
15/03/23 13:43:09 INFO flow.Flow: [] starting jobs: 1
15/03/23 13:43:09 INFO flow.Flow: [] allocating threads: 1
15/03/23 13:43:09 INFO flow.FlowStep: [] starting step: (1/1) out
15/03/23 13:43:09 INFO Configuration.deprecation: session.id is
deprecated. Instead, use dfs.metrics.session-id
15/03/23 13:43:09 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
15/03/23 13:43:09 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics
with processName=JobTracker, sessionId= - already initialized
15/03/23 13:43:09 INFO mapred.FileInputFormat: Total input paths to process : 1
15/03/23 13:43:09 INFO mapreduce.JobSubmitter: number of splits:1
15/03/23 13:43:09 INFO mapreduce.JobSubmitter: Submitting tokens for
job: job_local1927426705_0001
15/03/23 13:43:09 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/staging/akelpe1927426705/.staging/job_local1927426705_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
15/03/23 13:43:09 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/staging/akelpe1927426705/.staging/job_local1927426705_0001/job.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
15/03/23 13:43:10 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/local/localRunner/akelpe/job_local1927426705_0001/job_local1927426705_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.retry.interval; Ignoring.
15/03/23 13:43:10 WARN conf.Configuration:
file:/tmp/hadoop-akelpe/mapred/local/localRunner/akelpe/job_local1927426705_0001/job_local1927426705_0001.xml:an
attempt to override final parameter:
mapreduce.job.end-notification.max.attempts; Ignoring.
15/03/23 13:43:10 INFO mapreduce.Job: The url to track the job:
http://localhost:8080/
15/03/23 13:43:10 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/03/23 13:43:10 INFO flow.FlowStep: [] submitted hadoop job:
job_local1927426705_0001
15/03/23 13:43:10 INFO flow.FlowStep: [] tracking url:
http://localhost:8080/
15/03/23 13:43:10 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Waiting for map tasks
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Starting task:
attempt_local1927426705_0001_m_000000_0
15/03/23 13:43:10 INFO util.ProcfsBasedProcessTree:
ProcfsBasedProcessTree currently is supported only on Linux.
15/03/23 13:43:10 INFO mapred.Task: Using ResourceCalculatorProcessTree : null
15/03/23 13:43:10 INFO io.MultiInputSplit: current split input path:
file:/Users/akelpe/code/Impatient/part1/data/rain.txt
15/03/23 13:43:10 INFO mapred.MapTask: Processing split:
cascading.tap.hadoop.io.MultiInputSplit@14fdaa89
15/03/23 13:43:10 INFO mapred.MapTask: numReduceTasks: 0
15/03/23 13:43:10 INFO hadoop.FlowMapper: cascading version: 2.6.3
15/03/23 13:43:10 INFO hadoop.FlowMapper: child jvm opts: -Xmx200m
15/03/23 13:43:10 INFO Configuration.deprecation:
mapred.task.partition is deprecated. Instead, use
mapreduce.task.partition
15/03/23 13:43:10 INFO hadoop.FlowMapper: sourcing from:
Hfs["TextDelimited[['doc_id', 'text']]"]["data/rain.txt"]
15/03/23 13:43:10 INFO hadoop.FlowMapper: sinking to:
Hfs["TextDelimited[['doc_id', 'text']]"]["out"]
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
15/03/23 13:43:10 INFO mapred.Task:
Task:attempt_local1927426705_0001_m_000000_0 is done. And is in the
process of committing
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
15/03/23 13:43:10 INFO mapred.Task: Task
attempt_local1927426705_0001_m_000000_0 is allowed to commit now
15/03/23 13:43:10 INFO output.FileOutputCommitter: Saved output of
task 'attempt_local1927426705_0001_m_000000_0' to
file:/Users/akelpe/code/Impatient/part1/out/_temporary/0/task_local1927426705_0001_m_000000
15/03/23 13:43:10 INFO mapred.LocalJobRunner:
file:/Users/akelpe/code/Impatient/part1/data/rain.txt:0+510
15/03/23 13:43:10 INFO mapred.Task: Task
'attempt_local1927426705_0001_m_000000_0' done.
15/03/23 13:43:10 INFO mapred.LocalJobRunner: Finishing task:
attempt_local1927426705_0001_m_000000_0
15/03/23 13:43:10 INFO mapred.LocalJobRunner: map task executor complete.
15/03/23 13:43:15 INFO util.Hadoop18TapUtil: deleting temp path out/_temporary
$ cat out/*
doc_id text
doc01 A rain shadow is a dry area on the lee back side of a mountainous area.
doc02 This sinking, dry air produces a rain shadow, or area in the
lee of a mountain with less rain and cloudcover.
doc03 A rain shadow is an area of dry land that lies on the leeward
(or downwind) side of a mountain.
doc04 This is known as the rain shadow effect and is the primary
cause of leeward deserts of mountain ranges, such as California's
Death Valley.
doc05 Two Women. Secrets. A Broken Land. [DVD Australia]
It runs without any problems here, so something must be wrong with your
cluster install. I am no expert on Cloudera, but maybe some components
have not been installed. Note that Cascading relies on the mapred
compatibility layer in Hadoop.
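If you want to check whether the mapred classes are visible on your
cluster's classpath, here is a minimal sketch. The class name MapredCheck
and the helper isOnClasspath are made up for illustration; the three
Hadoop class names are the usual org.apache.hadoop.mapred ones (two of
them show up in the log above). It only tests class loading, so it cannot
tell you why a component is missing.

```java
// Hypothetical diagnostic, not part of Cascading or the Impatient tutorial.
public class MapredCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' avoids running static initializers; we only care about presence.
            Class.forName(className, false, MapredCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Classes from the old "mapred" API that a Cascading 2.x job touches.
        String[] names = {
            "org.apache.hadoop.mapred.JobConf",
            "org.apache.hadoop.mapred.FileInputFormat",
            "org.apache.hadoop.mapred.FileOutputCommitter"
        };
        for (String name : names) {
            System.out.println((isOnClasspath(name) ? "found    " : "MISSING  ") + name);
        }
    }
}
```

Compile it with javac and run it against the cluster's own classpath, for
example: java -cp ".:$(hadoop classpath)" MapredCheck
Any line marked MISSING would point at an incomplete install.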
- Andre