Hello,
I'm building a Spring Boot application that will spawn Cascading jobs from within sub-threads. As a first step, I wanted to get Spring Boot and Cascading working together. I downloaded "Cascading for the Impatient" and built the first jar, part1, which ran successfully on our cluster. I then moved the code from its main method into a hello-world Spring Boot application. When I run this application, I get the following errors:
WARN 22:05:08.893 [main] c.f.h.Hadoop2MR1Planner - running Hadoop 1.x based flows on YARN may cause problems, please use the 'cascading-hadoop' dependencies
WARN 22:05:10.127 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
16/01/06 22:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
WARN 22:05:10.685 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN 22:05:10.688 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN 22:05:10.782 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN 22:05:10.870 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN 22:05:10.941 [flow] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.flow.FlowException: unhandled exception
at cascading.flow.BaseFlow.complete(BaseFlow.java:954)
at com.mst.cascadingtest.CascadingTest.main(CascadingTest.java:107)
... 6 more
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:105)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:265)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:184)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:146)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more
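To narrow this down, here is a small diagnostic sketch I could run from inside the Spring Boot application. As far as I can tell from the Hadoop source, Cluster.initialize finds its ClientProtocolProvider implementations through java.util.ServiceLoader, which reads META-INF/services files visible to the thread context class loader. If no provider is visible there, that could explain the "Cannot initialize Cluster" error, though I have not confirmed that this is the cause. The class and method names (ProviderCheck, parseProviders, visibleProviders) are my own illustrative names, not part of Hadoop, Cascading, or Spring Boot.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper: reports which ClientProtocolProvider service files a class loader can see.
final class ProviderCheck {
    // Service file that java.util.ServiceLoader consults for this Hadoop interface.
    static final String SERVICE_FILE =
        "META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider";

    // Extracts provider class names from service-file text, dropping '#' comments and blank lines.
    static List<String> parseProviders(String content) {
        List<String> names = new ArrayList<String>();
        for (String line : content.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                names.add(line);
            }
        }
        return names;
    }

    // Returns one "url -> provider" string per provider listed in each visible copy of the file.
    static List<String> visibleProviders(ClassLoader loader, String serviceFile) throws IOException {
        List<String> found = new ArrayList<String>();
        Enumeration<URL> urls = loader.getResources(serviceFile);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            StringBuilder text = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    text.append(line).append('\n');
                }
            }
            for (String name : parseProviders(text.toString())) {
                found.add(url + " -> " + name);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // ServiceLoader.load(Class) uses the thread context class loader, so inspect that one.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        System.out.println("class loader: " + loader);
        List<String> providers = visibleProviders(loader, SERVICE_FILE);
        System.out.println("providers visible: " + providers.size());
        for (String provider : providers) {
            System.out.println("  " + provider);
        }
    }
}
```

Calling ProviderCheck.main from the same thread that starts the flow, once in the Spring Boot jar and once in the shaded jar, should show whether the two packagings expose different providers.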
My pom.xml has the following Cascading and Hadoop dependencies (Hadoop version 2.6.0, Cascading version 3.0.2):
<!-- Hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
<!-- Cascading -->
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-hadoop2-mr1</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-core</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-local</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
This is the Spring Boot plugin configuration I use to generate the jar:
<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <version>1.2.7.RELEASE</version>
    <executions>
        <execution>
            <goals>
                <goal>repackage</goal>
            </goals>
            <configuration>
                <excludeArtifactIds>slf4j-log4j12</excludeArtifactIds>
                <excludeGroupIds>org.apache.hadoop</excludeGroupIds>
            </configuration>
        </execution>
    </executions>
</plugin>
I've also examined my jar file (generated using the Spring Boot Maven plugin). I don't see any of the Hadoop jars in the lib folder, and I do see the three Cascading jars: cascading-local-3.0.2.jar, cascading-core-3.0.2.jar, and cascading-hadoop2-mr1-3.0.2.jar.
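For anyone who wants to repeat that check programmatically, here is a minimal sketch of listing the nested jars with the JDK's java.util.jar API. It assumes the nested jars sit under a "lib/" prefix at the root of the repackaged jar, which is what I see in mine. The class name NestedJarLister and the jar path passed on the command line are illustrative only.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the jars nested inside a repackaged jar.
final class NestedJarLister {
    // Returns the sorted names of all entries under the given prefix that end in ".jar".
    static List<String> nestedJars(String jarPath, String prefix) throws IOException {
        List<String> names = new ArrayList<String>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.startsWith(prefix) && name.endsWith(".jar")) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NestedJarLister <path-to-repackaged-jar>");
            return;
        }
        // "lib/" is an assumption based on the layout of my jar; adjust it if yours differs.
        for (String name : nestedJars(args[0], "lib/")) {
            System.out.println(name);
        }
    }
}
```

Running it against both the Spring Boot jar and the shaded jar makes it easy to compare what each one actually bundles.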
Also, if I switch to a Maven shaded jar (aka uber jar), it works fine and successfully executes the Cascading flow on the cluster.
From what I can tell, something about the Spring Boot Maven plugin is giving the program fits. Has anyone used Spring Boot and Cascading together before, or does anyone have insight into where I should look next?