Task works locally, but fails on EMR


Noah

Jan 11, 2016, 8:47:44 PM
to mrjob
Hi,

I'm still very new to mrjob.


I'm experimenting with the example code from the Common Crawl group. To start, I just want to run the example script, tag_counter.py.

It works great locally, but it gives weird errors and fails to finish when running on EMR.

Can anybody point me in the right direction to solve this?

Thank You.

--------------------------------------------------------------------------

Digging through the different log files, I believe these are the relevant portions.

2016-01-12 00:40:38,254 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): Submitting tokens for job: job_1452559128536_0001
2016-01-12 00:40:38,771 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (main): Submitted application application_1452559128536_0001 to ResourceManager at /172.31.12.166:9022
2016-01-12 00:40:38,841 INFO org.apache.hadoop.mapreduce.Job (main): The url to track the job: http://172.31.12.166:9046/proxy/application_1452559128536_0001/
2016-01-12 00:40:38,843 INFO org.apache.hadoop.mapreduce.Job (main): Running job: job_1452559128536_0001
2016-01-12 00:40:49,253 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1452559128536_0001 running in uber mode : false
2016-01-12 00:40:49,255 INFO org.apache.hadoop.mapreduce.Job (main):  map 0% reduce 0%
2016-01-12 00:41:29,665 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000004_0, Status : FAILED
2016-01-12 00:41:29,692 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000001_0, Status : FAILED
2016-01-12 00:41:30,738 INFO org.apache.hadoop.mapreduce.Job (main):  map 6% reduce 0%
2016-01-12 00:41:30,744 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000002_0, Status : FAILED


Then, at the bottom of that log file:

2016-01-12 00:43:10,762 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1452559128536_0001 failed with state FAILED due to: Task failed task_1452559128536_0001_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-01-12 00:43:10,886 INFO org.apache.hadoop.mapreduce.Job (main): Counters: 10
Job Counters
Failed map tasks=40
Killed map tasks=12
Launched map tasks=52
Other local map tasks=39
Rack-local map tasks=13
Total time spent by all maps in occupied slots (ms)=4924353
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0 
2016-01-12 00:43:10,886 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!


In the node-specific stderr log, it just gives a long stream of Java errors that I can't decipher:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Streaming Command Failed! 
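
For anyone who finds this thread later: "subprocess failed with code 1" only tells you that the Python process running the mapper exited with an error; the actual Python traceback is written to the individual task attempt's stderr log on the worker node, which is usually where the real clue is. A very common cause of "works locally, fails on EMR" is a Python package that is installed on your machine but not on the cluster nodes. A minimal sketch of an mrjob.conf that installs dependencies at bootstrap time (the package names here are assumptions based on what the Common Crawl example code typically imports; adjust them to whatever tag_counter.py actually needs):

```yaml
# mrjob.conf -- sketch, not a verified fix.
# mrjob's EMR runner supports a `bootstrap` list of shell
# commands that run on every node before the job starts.
runners:
  emr:
    bootstrap:
      # hypothetical dependency list -- check tag_counter.py's imports
      - sudo pip install boto mrjob warc
```

You can also often reproduce the failure without a cluster by running the mapper the way Hadoop Streaming does, e.g. `python tag_counter.py --mapper < some_input`, which will surface the Python traceback directly instead of burying it in YARN logs.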

Alexandra Faynburd

May 11, 2020, 11:11:58 AM
to mrjob
Hi,

Did you ever find a solution?
I think I have the same problem with Google Dataproc.

Thanks!