I am getting an OutOfMemoryError during the map phase. I tried different memory settings, following the suggestions in hadoop.settings, but still no luck. Any help is greatly appreciated.
Config details:
CDH 5.4.0
Hadoop 2.6.0
RMR2 3.3.1
Streaming Jar (hadoop-streaming-2.6.0-cdh5.4.0.jar)
The following are the Memory settings for the job:
mapreduce.map.memory.mb=8192m
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=8192m
mapreduce.reduce.java.opts=-Xmx4096m
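To make the numbers above easier to reason about, here is a minimal Python sketch that works out how much of each container is left outside the JVM heap. With streaming, the R process runs outside the JVM, so it has to fit in that remainder. The helper names (container_mb, heap_mb, headroom) are made up for illustration, and stripping the trailing "m" from the *.memory.mb values is an assumption: as far as I know, Hadoop 2.x expects a plain integer number of megabytes there, so it may be worth checking how the value is actually passed to the job.

```python
import re

# Settings copied from the post. The trailing "m" on the *.memory.mb values
# is kept as written and stripped while parsing (an assumption, see above).
settings = {
    "mapreduce.map.memory.mb": "8192m",
    "mapreduce.map.java.opts": "-Xmx4096m",
    "mapreduce.reduce.memory.mb": "8192m",
    "mapreduce.reduce.java.opts": "-Xmx4096m",
}


def container_mb(value):
    """Parse a *.memory.mb value in MB, tolerating a trailing 'm'."""
    return int(value.rstrip("mM"))


def heap_mb(java_opts):
    """Extract the -Xmx heap size in MB from a java.opts string."""
    match = re.search(r"-Xmx(\d+)([mMgG])", java_opts)
    if match is None:
        raise ValueError("no -Xmx flag found in %r" % java_opts)
    size, unit = int(match.group(1)), match.group(2).lower()
    return size * 1024 if unit == "g" else size


def headroom(settings, phase):
    """Return (container MB, JVM heap MB, MB left outside the heap)."""
    container = container_mb(settings["mapreduce.%s.memory.mb" % phase])
    heap = heap_mb(settings["mapreduce.%s.java.opts" % phase])
    return container, heap, container - heap


for phase in ("map", "reduce"):
    container, heap, rest = headroom(settings, phase)
    print("%s: container=%d MB, heap=%d MB, outside heap=%d MB"
          % (phase, container, heap, rest))
```

With the values above, both phases have an 8192 MB container and a 4096 MB heap, which leaves 4096 MB for everything outside the heap. The error in the trace below is "Java heap space", so the limit being hit is the -Xmx value rather than the container size.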
The stack trace:
15/06/25 18:29:44 INFO mapreduce.Job: Running job: job_1435165845246_0880
15/06/25 18:29:50 INFO mapreduce.Job: Job job_1435165845246_0880 running in uber mode : false
15/06/25 18:29:50 INFO mapreduce.Job: map 0% reduce 0%
15/06/25 18:30:01 INFO mapreduce.Job: map 33% reduce 0%
15/06/25 18:30:02 INFO mapreduce.Job: map 67% reduce 0%
15/06/25 18:30:39 INFO mapreduce.Job: Task Id : attempt_1435165845246_0880_m_000001_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:336)
	at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:535)
	at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
	at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.typedbytes.TypedBytesInput.readRawBytes(TypedBytesInput.java:212)
	at org.apache.hadoop.typedbytes.TypedBytesInput.readRaw(TypedBytesInput.java:152)
	at org.apache.hadoop.streaming.io.TypedBytesOutputReader.readKeyValue(TypedBytesOutputReader.java:51)
	at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:378)
Garbage collection details:
Garbage collection 141 = 124+10+7 (level 2) ...
23.4 Mbytes of cons cells used (59%)
4.2 Mbytes of vectors used (47%)
Dotted pair list of 1
$ : language rmr.str(gc(verbose = TRUE, reset = FALSE))
gc(verbose = TRUE, reset = FALSE)
num [1:2, 1:6] 437558 543467 23.4 4.2 741108 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:2] "Ncells" "Vcells"
..$ : chr [1:6] "used" "(Mb)" "gc trigger" "(Mb)" ...
         used (Mb) gc trigger (Mb) max used (Mb)
Ncells 437558 23.4     741108 39.6   531268 28.4
Vcells 543467  4.2    1162592  8.9  1162592  8.9