Scalding jobs run locally instead of against the Hadoop cluster on EMR



Helena


Oct 11, 2013, 6:25:53 PM

to cascadi...@googlegroups.com

Hi,

I'm trying to find out why my Scalding jobs run locally instead of against the Hadoop cluster on EMR.

We have Python Pig jobs running against the cluster that use the same Hadoop configuration and version on EMR as the Scalding jobs.

Perhaps there is some default setting I need to override, either in my base class that extends the Scalding Job or in the Hadoop Configuration passed to the job runner?

When run on EMR I see:

ERROR - 13/10/11 01:20:12 INFO s3native.NativeS3FileSystem: Opening 's3n://../global/pail/master/pail.meta' for reading

INFO - 10/11 01:20:24 INFO [pool-1-thread-1] c.f.FlowStep - [..] submitted hadoop job: job_local_0001

ERROR - 13/10/11 01:20:24 INFO mapred.MapTask: Host name: {the remote ip}

and eventually:

INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting

ERROR - 13/10/11 20:57:15 INFO mapred.LocalJobRunner:
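The job id job_local_0001 and the mapred.LocalJobRunner lines suggest the job was handed to Hadoop's in-process LocalJobRunner rather than to the cluster's JobTracker. As far as I know, Hadoop 1.x makes that choice from the mapred.job.tracker property and falls back to "local" when it is unset, which is what happens if the cluster's mapred-site.xml is not on the classpath of the JVM that launches the job. The sketch below is a dependency-free model of that rule, with a plain Map standing in for Configuration; RunnerCheck and its methods are hypothetical names, not Hadoop or Scalding APIs.

```scala
// Dependency-free model of how Hadoop 1.x chooses a job runner.
// RunnerCheck is a hypothetical name; a plain Map stands in for Configuration.
object RunnerCheck {
  // Hadoop 1.x falls back to "local" when mapred.job.tracker is unset.
  def jobTracker(conf: Map[String, String]): String =
    conf.getOrElse("mapred.job.tracker", "local")

  // "local" selects LocalJobRunner: tasks run inside the submitting JVM.
  def usesLocalRunner(conf: Map[String, String]): Boolean =
    jobTracker(conf) == "local"

  // Log-reading heuristic: LocalJobRunner ids look like job_local_0001,
  // while cluster job ids carry the JobTracker start time instead of "local".
  def looksLikeLocalJobId(jobId: String): Boolean =
    jobId.startsWith("job_local")
}
```

With no cluster settings, RunnerCheck.usesLocalRunner(Map.empty) is true, which matches the job_local_0001 id in the log.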

Scalding 0.8.11

Scala 2.10.2

Cascading 2.1.6

JDK 1.7

Hadoop (AWS EMR) 1.0.3

Thanks,

Helena

Oscar Boykin


Oct 11, 2013, 7:07:28 PM

to cascadi...@googlegroups.com

To see how jobs are launched, take a look at Tool:

https://github.com/twitter/scalding/blob/develop/scalding-core/src/main/scala/com/twitter/scalding/Tool.scala#L70
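Tool picks a mode from the --local or --hdfs flag. The mapred.MapTask and mapred.LocalJobRunner lines in Helena's log are Hadoop classes, which suggests the job already ran with the Hadoop planner (--hdfs); Cascading's local mode would not go through them. If so, the flag is not the culprit, and where the job actually runs is decided by the Hadoop Configuration. The sketch below is a hypothetical, simplified model of that flag handling (RunMode, CascadingLocal, HadoopPlanner and ModeFlags are invented names), not the real Tool code.

```scala
// Hypothetical, simplified model of the mode flags passed to Scalding's Tool.
// Not Scalding source; check Tool.scala for your version's exact rules.
sealed trait RunMode
// --local: Cascading's local planner, no Hadoop MapTask/JobRunner involved.
case object CascadingLocal extends RunMode
// --hdfs: Hadoop planner; where jobs run then depends on the Hadoop Configuration.
case object HadoopPlanner extends RunMode

object ModeFlags {
  // None when neither flag is given (assumed to be an error in the real Tool).
  // Giving --local precedence when both appear is an arbitrary choice of this sketch.
  def parse(args: Seq[String]): Option[RunMode] =
    if (args.contains("--local")) Some(CascadingLocal)
    else if (args.contains("--hdfs")) Some(HadoopPlanner)
    else None
}
```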

It picks up the default Hadoop configuration. With the hadoop jar command you can pass the path to the config you want to use; I don't know how to do that with EMR, sorry.
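Hadoop builds its configuration in layers: the bundled defaults (core-default.xml, mapred-default.xml, where mapred.job.tracker is "local"), then site files such as mapred-site.xml found on the classpath, then any extra resource passed in (for example with the generic -conf option that ToolRunner understands), and finally values set explicitly (which is what -D does). The sketch below models only that precedence with plain Maps; ConfLayers is a hypothetical name, "master:9001" in the usage is a placeholder, and final properties are ignored.

```scala
// Simplified model of Hadoop Configuration layering (not the real class):
// resources apply in the order they are added and later ones win, while explicit
// set() calls (what -D on the command line does) win over every resource.
// Properties marked <final> in the XML are ignored here.
object ConfLayers {
  def effective(resources: Seq[Map[String, String]],
                overrides: Map[String, String] = Map.empty): Map[String, String] =
    resources.foldLeft(Map.empty[String, String])(_ ++ _) ++ overrides
}
```

If the cluster's site file never makes it into the resource list, ConfLayers.effective(Seq(defaults)) still reports "local" for mapred.job.tracker, which would explain a job_local run on an EMR node.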

Hope this helps.

PS: scalding.Tool is just there to make things easier. If you need to, it should be easy enough to write an EMRTool or something similar that sets any special options. If that turned out to be general enough, we could merge it.
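One low-effort shape for such an EMRTool is a main method that prepends Hadoop's generic -conf option, pointing at the cluster's config file, to the normal Scalding arguments. The sketch below shows only that argument rewriting; EmrArgs, withClusterConf and the /home/hadoop/conf/mapred-site.xml path are assumptions, so check where your EMR AMI actually keeps its Hadoop config.

```scala
// Hypothetical helper for an "EMRTool": put a generic -conf option in front of the
// usual Scalding arguments (job class, --hdfs, job args) so the cluster config
// is loaded as an extra Configuration resource.
object EmrArgs {
  // ASSUMPTION: location of the cluster config on the EMR master; verify on your AMI.
  val DefaultClusterConf = "/home/hadoop/conf/mapred-site.xml"

  // Prepend rather than append: generic options must come before the job class name.
  def withClusterConf(args: Seq[String], confFile: String = DefaultClusterConf): Seq[String] =
    Seq("-conf", confFile) ++ args
}
```

In a real wrapper the rewritten arguments would go to Hadoop's ToolRunner.run(conf, new com.twitter.scalding.Tool, newArgs.toArray), which, as far as I know, consumes the leading generic options before Tool sees the job class and --hdfs flag. Because later resources win, a -conf the user passes after the prepended one would still take precedence.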

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
Visit this group at http://groups.google.com/group/cascading-user.