Task works locally, but fails on EMR


Noah

Jan 11, 2016, 8:47:44 PM
to mrjob
Hi,

I'm still very new to mrjob.


I'm experimenting with the example code from the Common Crawl group. To start, I just want to run the example script, tag_counter.py.

It works great locally, but it gives weird errors and fails to finish when running on EMR.

Can anybody point me in the right direction to solve this?

Thank You.

--------------------------------------------------------------------------

Digging through the different log files, I believe these are the relevant portions.

2016-01-12 00:40:38,254 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): Submitting tokens for job: job_1452559128536_0001
2016-01-12 00:40:38,771 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (main): Submitted application application_1452559128536_0001 to ResourceManager at /172.31.12.166:9022
2016-01-12 00:40:38,841 INFO org.apache.hadoop.mapreduce.Job (main): The url to track the job: http://172.31.12.166:9046/proxy/application_1452559128536_0001/
2016-01-12 00:40:38,843 INFO org.apache.hadoop.mapreduce.Job (main): Running job: job_1452559128536_0001
2016-01-12 00:40:49,253 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1452559128536_0001 running in uber mode : false
2016-01-12 00:40:49,255 INFO org.apache.hadoop.mapreduce.Job (main):  map 0% reduce 0%
2016-01-12 00:41:29,665 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000004_0, Status : FAILED
2016-01-12 00:41:29,692 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000001_0, Status : FAILED
2016-01-12 00:41:30,738 INFO org.apache.hadoop.mapreduce.Job (main):  map 6% reduce 0%
2016-01-12 00:41:30,744 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1452559128536_0001_m_000002_0, Status : FAILED


Then, at the bottom of that log file:

2016-01-12 00:43:10,762 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1452559128536_0001 failed with state FAILED due to: Task failed task_1452559128536_0001_m_000004
Job failed as tasks failed. failedMaps:1 failedReduces:0
2016-01-12 00:43:10,886 INFO org.apache.hadoop.mapreduce.Job (main): Counters: 10
Job Counters
Failed map tasks=40
Killed map tasks=12
Launched map tasks=52
Other local map tasks=39
Rack-local map tasks=13
Total time spent by all maps in occupied slots (ms)=4924353
Total time spent by all reduces in occupied slots (ms)=0
Map-Reduce Framework
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0 
2016-01-12 00:43:10,886 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not Successful!


In the node-specific stderr log, it just gives a long stream of Java errors that I can't decipher:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Streaming Command Failed! 
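
For anyone who finds this thread later: "subprocess failed with code 1" only tells you that the Python process running the mapper exited with an error; the actual Python traceback is written to the individual task attempt's stderr log on the worker node, which is usually where the real clue is. A very common cause of "works locally, fails on EMR" is a Python package that is installed on your machine but not on the cluster nodes. A minimal sketch of an mrjob.conf that installs dependencies at bootstrap time (the package names here are assumptions based on what the Common Crawl example code typically imports; adjust them to whatever tag_counter.py actually needs):

```yaml
# mrjob.conf -- sketch, not a verified fix.
# mrjob's EMR runner supports a `bootstrap` list of shell
# commands that run on every node before the job starts.
runners:
  emr:
    bootstrap:
      # hypothetical dependency list -- check tag_counter.py's imports
      - sudo pip install boto mrjob warc
```

You can also often reproduce the failure without a cluster by running the mapper the way Hadoop Streaming does, e.g. `python tag_counter.py --mapper < some_input`, which will surface the Python traceback directly instead of burying it in YARN logs.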

Alexandra Faynburd

May 11, 2020, 11:11:58 AM
to mrjob
Hi,

Did you ever find a solution?
I think I have the same problem with Google Dataproc.

Thanks!