oozie cascading action - number of mapper/reducer tasks


Dilip K

Aug 24, 2015, 11:16:24 AM
to cascading-user
I am running a Cascading workflow using an Oozie Java action. It always runs with a single mapper/reducer when launched from the Oozie Java action; however, the same workflow runs with the intended number of mappers/reducers when launched from the CLI via hadoop jar.

The flow connector being used is Hadoop2MR1FlowConnector.

Has anyone run into the same issue? Please suggest any other way to run a Cascading workflow using Oozie.

Thanks
Dilip

Andre Kelpe

Aug 24, 2015, 11:26:12 AM
to cascading-user
The problem is Oozie, since it does everything it can to make classpath management more complicated than it needs to be. When you launch a Cascading app via hadoop jar <your>.jar or yarn jar <your>.jar, the wrapper scripts set up the classpath correctly. That classpath includes Hadoop's conf directory, and then everything works fine.

You can see the classpath like so:

yarn classpath | tr ":"  "\n" | sort -u
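For illustration, the same pipeline applied to a sample classpath string (the path entries below are made up; running yarn classpath itself requires a Hadoop install):

```shell
# Split a colon-separated classpath into sorted, de-duplicated entries.
# SAMPLE stands in for the output of `yarn classpath`; the entries are
# assumptions, not taken from a real cluster.
SAMPLE="/etc/hadoop/conf:/usr/lib/hadoop/lib/a.jar:/etc/hadoop/conf"
echo "$SAMPLE" | tr ":" "\n" | sort -u
```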


Oozie, however, does none of that, so you end up with the default settings: one mapper and one reducer in a local JVM.

You have to make sure that the conf directory is on the classpath when your app starts; then everything should work as expected.

- André



--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/0a66d65c-5f86-454a-92ab-5d035f304cbb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Dilip K

Aug 25, 2015, 11:43:39 AM
to cascading-user
Thanks Andre. 

Libraries can be added to the classpath using libpath or sharedlibs in Oozie. I'm wondering how to add the conf directory to the classpath of an Oozie Java action.

Thanks
Dilip 

Andre Kelpe

Aug 25, 2015, 12:17:30 PM
to cascading-user
TBH I don't know. Can't you use the shell action and simply run yarn jar <yourjar>?
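A minimal sketch of what such a shell action could look like (the element names follow the Oozie shell-action schema; the action name, script name, and jar name are made up):

```xml
<action name="run-cascading">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>run.sh</exec>
    <file>run.sh#run.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```

where run.sh would contain something along the lines of `yarn jar your-app.jar`, so the wrapper script sets up the classpath as described above.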

- André



Dilip K

Aug 25, 2015, 12:53:42 PM
to cascading-user
That's a great idea. I will try running it as a shell action, and hopefully it runs with the configured number of map and reduce tasks.

Andre Kelpe

Aug 25, 2015, 3:02:25 PM
to cascading-user
Please let us know if it works out. It is always good when we can share solutions with other Oozie users.

- André



Oleksii Iepishkin

Aug 26, 2015, 10:26:22 AM
to cascading-user
We at Tapad run Scalding jobs on CDH 5.4.2 clusters using the Java action from Oozie 4.1.0-cdh5.4.2. No problems there.

To configure the default number of reducers per Scalding job, we use this tag in the Java action:
<arg>-Dmapreduce.job.reduces=${numReduceTasks}</arg>

It is still possible to override that in a Scalding job by calling .reducers(X).

-Oleksii

Oleksii Iepishkin

Aug 26, 2015, 10:28:19 AM
to cascading-user
BTW, we use Twitter's Tool class to submit Scalding jobs, so in a Java action it looks like:
<main-class>com.twitter.scalding.Tool</main-class>

The Java action is just a launcher.
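Put together, such a Java action might look like the sketch below. Only com.twitter.scalding.Tool and the -Dmapreduce.job.reduces flag come from this thread; the action name, job class, and --hdfs mode flag are illustrative. Note the -D option is placed before the job class, since Hadoop's GenericOptionsParser expects generic options ahead of the remaining arguments:

```xml
<action name="scalding-job">
  <java>
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <main-class>com.twitter.scalding.Tool</main-class>
    <arg>-Dmapreduce.job.reduces=${numReduceTasks}</arg>
    <arg>com.example.MyScaldingJob</arg>
    <arg>--hdfs</arg>
  </java>
  <ok to="end"/>
  <error to="fail"/>
</action>
```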

Dilip K

Aug 26, 2015, 3:57:54 PM
to cascading-user
Thanks for the inputs Oleksii.

I tried as you suggested, but somehow the Java action is not running with the specified number of reducers.

>>> Invoking Main class now >>>

Fetching child yarn jobs
tag id : oozie-e4ed2fac48f99d4e94ce9d4150888d28
Child yarn jobs are found - 
Main class        : com.app.network.workflow.extractfeatures.ExtractFeatures
Arguments         :
                    cluster
                    hdfs://HDFSHA/user/dkari/BigDataEurekaExtract(17June2015).csv
                    hdfs://HDFSHA/tmp/hadoop-dkari/nd_output/feature-extraction-output.csv
                    -Dmapreduce.job.reduces=15

Heart beat
Heart beat
Heart beat
Heart beat

<<< Invocation of Main class completed <<<


Thanks
Dilip

Ryan Desmond

Aug 26, 2015, 4:03:48 PM
to cascadi...@googlegroups.com
You may find this thread useful, "oozie cascading action - number of mapper/reducer tasks".





--
Ryan Desmond
Sr. Solutions Architect
Concurrent Inc.

Dilip K

Aug 26, 2015, 4:38:53 PM
to cascading-user
Thanks Ryan.

I am finally able to run the Cascading Oozie job with the intended number of reducers. But the problem is that I can't customize any Hadoop properties from the Oozie workflow.

In a non-restricted Hadoop environment you will not have any issues running a Cascading job from the shell action.

What if it is a secured/restrictive environment? The user running the job (let's say user1) will have write access only to their home dir (/user/user1) and temp dir (/tmp/hadoop-user1). Without setting hadoop.tmp.dir in your Oozie config properties, the job will always try to use the /tmp/hadoop-yarn folder and fail with permission errors, since the actual user running the job is yarn.

From the link that Ryan shared, the Oozie sharedlib is the solution: it makes all the Hadoop config available to Oozie, but it is global to all Oozie workflows. You will still have an issue if you want to customize any config per workflow.
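For reference, a per-action override of a property like hadoop.tmp.dir would be a <configuration> block inside the action along these lines (a sketch, not something verified in this thread; the value is a placeholder, and wf:user() is Oozie's EL function for the workflow user):

```xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${wf:user()}</value>
  </property>
</configuration>
```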

Thanks
Dilip

Oleksii Iepishkin

Aug 27, 2015, 10:57:47 AM
to cascading-user
Dilip,

1. As I mentioned above, the main class must be com.twitter.scalding.Tool. If your ExtractFeatures extends it, then you are probably fine, but we don't do that.

com.twitter.scalding.Tool takes a scalding job class name as an argument.

2. The Java action's m/r job always has only a map phase with 1 map task. It works as a driver that submits MapReduce jobs to the cluster.

Let me know if this doesn't work in your case.