Simple Cascading application with Spring Boot


Sean Gottschalk

Jan 6, 2016, 5:16:13 PM
to cascading-user
Hello,

I'm creating an application using Spring Boot which will spawn Cascading jobs from within sub-threads. As a first step, I wanted to get Spring Boot and Cascading working together. I downloaded Cascading for the Impatient and built the first jar, part1, which ran successfully on our cluster. I then moved the code from its main method into a hello-world Spring Boot application. When I run this application, I get the following errors:

WARN  22:05:08.893 [main] c.f.h.Hadoop2MR1Planner - running Hadoop 1.x based flows on YARN may cause problems, please use the 'cascading-hadoop' dependencies
WARN  22:05:10.127 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
16/01/06 22:05:10 INFO mapred.FileInputFormat: Total input paths to process : 1
WARN  22:05:10.685 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN  22:05:10.688 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN  22:05:10.782 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN  22:05:10.870 [main] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
WARN  22:05:10.941 [flow] c.f.h.util.HadoopUtil - could not successfully test if Hadoop based platform is in standalone/local mode, no valid properties set, returning false - tests for: mapreduce.framework.name, tez.local.mode, and mapred.job.tracker
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:53)
at java.lang.Thread.run(Thread.java:744)
Caused by: cascading.flow.FlowException: unhandled exception
at cascading.flow.BaseFlow.complete(BaseFlow.java:954)
at com.mst.cascadingtest.CascadingTest.main(CascadingTest.java:107)
... 6 more
Caused by: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:120)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:82)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:75)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:470)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:449)
at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:105)
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:265)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:184)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:146)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:48)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
... 1 more

My pom.xml file has the following cascading and hadoop dependencies, hadoop version 2.6.0 and cascading version 3.0.2:

<!-- Hadoop -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
</dependency>

<!-- Cascading -->
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-hadoop2-mr1</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-core</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-local</artifactId>
    <version>${cascading.version}</version>
    <exclusions>
        <exclusion>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
        </exclusion>
    </exclusions>
</dependency>

This is the spring boot plugin I use to generate the jar:

<plugin>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-maven-plugin</artifactId>
    <version>1.2.7.RELEASE</version>
    <executions>
        <execution>
            <goals>
                <goal>repackage</goal>
            </goals>
            <configuration>
                <excludeArtifactIds>slf4j-log4j12</excludeArtifactIds>
                <excludeGroupIds>org.apache.hadoop</excludeGroupIds>
            </configuration>
        </execution>
    </executions>
</plugin>

I've also examined my jar file (generated using the Spring Boot Maven plugin). I don't see any of the Hadoop jars in the lib folder, and I do see the 3 Cascading jars: cascading-local-3.0.2.jar, cascading-core-3.0.2.jar, and cascading-hadoop2-mr1-3.0.2.jar.

Also, if I switch to using a Maven shaded jar (aka uber jar), it works fine and successfully executes the Cascading flow on the cluster.

From what I can tell, something in the Spring Boot Maven plugin is giving the program fits. Has anyone used Spring Boot and Cascading together before, or does anyone have any insight into where I should look next?

Andre Kelpe

Jan 7, 2016, 5:18:50 AM
to cascading-user
This looks like a classpath issue. First of all, you have to package
your application as one jar that contains everything. That can either be
a fat jar or a jar with an embedded lib folder for all your
dependencies. In addition, you have to make sure that all the
configuration settings and libraries of Hadoop are on the runtime
classpath. This is normally taken care of by the hadoop or yarn
script. An application is normally run like so:

hadoop jar </path/to/my.jar> or yarn jar </path/to/my.jar>

Now, if for whatever reason, you cannot use those scripts, you can ask
the yarn or hadoop script for the classpath and pass that to your jvm
invocation. Something like

MY_CLASSPATH=$(yarn classpath):/path/to/my.jar
java -cp $MY_CLASSPATH main.class.Name <args>

should do the trick, if you use a fatjar.
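To see what the runtime classpath actually provides, a small diagnostic along the following lines could be run with the same launcher as the failing application. This is only a sketch: the class name HadoopClasspathCheck is made up for illustration, and the list of resources is an assumption about what a Hadoop 2.x client typically finds when started through the hadoop or yarn script (the *-site.xml files from the configuration directory, plus the service file that Hadoop's Cluster class consults when it looks up a ClientProtocolProvider). It uses only the JDK, so it compiles without any Hadoop dependency.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Cascading or Hadoop.
public class HadoopClasspathCheck {

    // Assumption: resources a Hadoop 2.x client usually sees on its classpath.
    static final List<String> EXPECTED = Arrays.asList(
        "core-site.xml",
        "mapred-site.xml",
        "yarn-site.xml",
        "META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider");

    // Returns the names from 'resources' that the given class loader cannot resolve.
    static List<String> missing(ClassLoader loader, List<String> resources) {
        List<String> result = new ArrayList<>();
        for (String name : resources) {
            if (loader.getResource(name) == null) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prefer the context class loader, since a launcher may install its own.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        for (String name : EXPECTED) {
            URL url = loader.getResource(name);
            System.out.println(name + " -> " + (url == null ? "NOT FOUND" : url.toString()));
        }
    }
}
```

If the service file or the site XML files show up as NOT FOUND, the Hadoop libraries or configuration directory are probably not visible to the class loader that runs the flow.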

You might be tempted to put all the hadoop libs into the application
jar as well, but that is not a good idea, since it can cause all sorts
of strange errors later on.

HTH

- André



--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Sean Gottschalk

Jan 7, 2016, 1:00:57 PM
to cascading-user
Yeah, it does seem like a classpath issue. To address your other notes: it is being packaged as a runnable uber jar using the Maven Spring Boot plugin. When I examined the jar, it contained a lib sub-folder with the Cascading jars and not the Hadoop jars. I'm invoking it using "hadoop jar <jarname>". I'm kind of at a loss for how to debug this further. Do you have any suggestions?

Andre Kelpe

Jan 8, 2016, 6:06:43 AM
to cascading-user
Which cascading dependency do you use? If this is hadoop 2.x, you
should use cascading-hadoop2-mr1.

- André


Sean Gottschalk

Jan 8, 2016, 1:45:25 PM
to cascading-user
Yeah, the Hadoop version is 2.4.1. I double (triple, quadruple, etc.) checked, and my jar has cascading-hadoop2-mr1-3.0.2.jar. I have a suspicion that Spring Boot is causing the problems, since I can create the jar using the Shade plugin and it works.
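One way to compare the two packagings is to look at each jar's manifest and nested entries. A jar repackaged by the Spring Boot plugin names a Spring Boot loader class as Main-Class and records the application's own class under Start-Class (the MainMethodRunner frame in the stack trace above comes from that loader), while a shaded jar flattens all classes into one archive. The sketch below prints both attributes and any nested .jar entries; the class name JarLayoutReport is hypothetical, and whether this difference explains the failure is not established in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for comparing a Spring Boot repackaged jar with a shaded jar.
public class JarLayoutReport {

    // Summarizes the launcher-related manifest attributes; absent ones print as null.
    static String launcherSummary(Manifest manifest) {
        Attributes attrs = manifest.getMainAttributes();
        return "Main-Class=" + attrs.getValue("Main-Class")
            + ", Start-Class=" + attrs.getValue("Start-Class");
    }

    // Lists entries that are themselves jars (for example under lib/).
    static List<String> nestedJars(JarFile jar) {
        List<String> names = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            if (name.endsWith(".jar")) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarLayoutReport <path-to-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            if (manifest != null) {
                System.out.println(launcherSummary(manifest));
            }
            for (String name : nestedJars(jar)) {
                System.out.println("nested: " + name);
            }
        }
    }
}
```

Running it against both the Spring Boot jar and the shaded jar shows which class the JVM (or the hadoop script) will treat as the entry point in each case.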

eeps...@marketshare.com

Jan 8, 2016, 5:18:54 PM
to cascading-user
I do several things. First, in the pom I include

<excludeGroupIds>org.apache.hadoop</excludeGroupIds>

in the <configuration> of the <execution> of the spring-boot-maven-plugin.

Second, I have a property that indicates whether I'm in local mode and, if so, set a few properties when configuring Cascading / Hadoop.
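The local-mode switch could look roughly like the sketch below. The post does not say which properties are set, so the two keys here are assumptions: mapreduce.framework.name is one of the keys named in the warnings earlier in this thread, and fs.defaultFS is the standard Hadoop 2.x key for the default filesystem. The class name FlowProperties and the system property app.localMode are made up for illustration. The resulting Properties object would then be handed to whichever Cascading flow connector the application constructs.

```java
import java.util.Properties;

// Hypothetical helper; the property choices are assumptions, not taken from the post.
public class FlowProperties {

    // Builds the properties to pass on when configuring Cascading / Hadoop.
    static Properties build(boolean localMode) {
        Properties props = new Properties();
        if (localMode) {
            // Run MapReduce in-process instead of submitting to a cluster.
            props.setProperty("mapreduce.framework.name", "local");
            // Read and write the local filesystem rather than HDFS.
            props.setProperty("fs.defaultFS", "file:///");
        }
        // On a cluster, leave these unset so the site XML files on the classpath apply.
        return props;
    }

    public static void main(String[] args) {
        // "app.localMode" is an illustrative flag, e.g. -Dapp.localMode=true
        boolean localMode = Boolean.parseBoolean(System.getProperty("app.localMode", "false"));
        System.out.println(build(localMode));
    }
}
```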

And, finally, I'm using the cascading-hadoop artifact even though it produces a warning.

HTH.

eeps...@marketshare.com

Jan 8, 2016, 5:34:40 PM
to cascading-user
Are you getting an exception when running on the cluster or when running locally?

I found for a local (dev) installation of Hadoop I could edit the configuration XML files to sort out some issues.

Sean Gottschalk

Jan 14, 2016, 3:07:44 PM
to cascading-user
Just giving an update, I ended up switching from the Maven Spring Boot Plugin to the Maven Shade Plugin and it worked correctly. I still don't know the root cause for the Spring Boot plugin not working, but at least switching to Shade fixed it.

Sean Gottschalk

Jan 19, 2016, 3:56:42 PM
to cascading-user
This is the magic plugin snippet which did the trick:

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <dependencies>
                    <dependency>
                        <groupId>org.springframework.boot</groupId>
                        <artifactId>spring-boot-maven-plugin</artifactId>
                        <version>${spring-boot.version}</version>
                    </dependency>
                </dependencies>
                <configuration>
                    <keepDependenciesWithProvidedScope>false</keepDependenciesWithProvidedScope>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.handlers</resource>
                                </transformer>
                                <transformer implementation="org.springframework.boot.maven.PropertiesMergingResourceTransformer">
                                    <resource>META-INF/spring.factories</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/spring.schemas</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>com.mst.filemover.FilemoverApplication</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

Andre Kelpe

Jan 19, 2016, 4:10:29 PM
to cascading-user
Thanks for sharing!

- Andre