Hi,
I'm trying to find out why my Scalding jobs are running in local mode instead of against the Hadoop cluster on EMR.
We have Python Pig jobs that run against the cluster correctly, using the same Hadoop configuration/version on EMR as the Scalding jobs.
Is there perhaps a default setting I need to override, either in my base class that extends the Scalding Job or in the Hadoop Configuration passed to the job runner?
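To make the question concrete, here is a minimal sketch of the kind of check I have in mind. It assumes the Hadoop 1.x behaviour where the `mapred.job.tracker` property defaults to `local`, which selects the LocalJobRunner. A plain Map stands in for the Hadoop Configuration, and `LocalModeCheck`, `isLocalMode` and the job tracker address are hypothetical names used only for illustration:

```scala
// Hypothetical helper; a plain Map stands in for org.apache.hadoop.conf.Configuration.
object LocalModeCheck {
  // Hadoop 1.x property naming the job tracker; the value "local" means LocalJobRunner.
  val JobTrackerKey = "mapred.job.tracker"

  // True when the property is unset (falling back to the "local" default) or set to "local".
  def isLocalMode(conf: Map[String, String]): Boolean =
    conf.getOrElse(JobTrackerKey, "local") == "local"

  def main(args: Array[String]): Unit = {
    // No cluster settings picked up: the job would run locally.
    val defaults = Map.empty[String, String]
    // Placeholder address, not a real EMR value.
    val cluster = Map(JobTrackerKey -> "jobtracker.example:9001")

    println(s"defaults -> local mode: ${isLocalMode(defaults)}")
    println(s"cluster  -> local mode: ${isLocalMode(cluster)}")
  }
}
```

If the Configuration my jobs receive never picks up the cluster's value for this property, that would be consistent with the job_local IDs in the logs, but I have not confirmed that this is the cause.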
When run on EMR, the logs show the job being submitted as job_local_0001:
ERROR - 13/10/11 01:20:12 INFO s3native.NativeS3FileSystem: Opening 's3n://../global/pail/master/pail.meta' for reading
INFO - 10/11 01:20:24 INFO [pool-1-thread-1] c.f.FlowStep - [..] submitted hadoop job: job_local_0001
ERROR - 13/10/11 01:20:24 INFO mapred.MapTask: Host name: {the remote ip}
and eventually the task finishes under the LocalJobRunner:
INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
ERROR - 13/10/11 20:57:15 INFO mapred.LocalJobRunner:
Scalding 0.8.11
Scala 2.10.2
Cascading 2.1.6
JDK 1.7
Hadoop (AWS EMR) 1.0.3
Thanks,
Helena