Hi!
I'm running a Scalding job that aggregates a monoid over a few different passes. The monoid is a class with a handful of fields, some of which are Maps. Everything looks fine, but I keep hitting java.lang.OutOfMemoryError: GC overhead limit exceeded. The pipeline has quite a few steps, so I'm still trying to pin down exactly where it fails, but I don't think there should be that many keys with a huge number of events. In one earlier stage I'm fairly sure the join had at most 2 events per key; in this stage there can be more, but the data was already aggregated at a finer level than this phase uses, if I'm not totally mistaken. The only thing I can guess is that one or more of those Maps somehow grows extremely large, but I don't know.
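For context, the thing I'm summing looks roughly like this (field names here are made up and I've left out a few scalar fields, but the Maps are merged with Algebird's map monoid, which sums values per key):

import com.twitter.algebird.Monoid

// Rough shape of the class I'm aggregating (placeholder names).
case class Profile(events: Long,
                   countsByPage: Map[String, Long],
                   countsByCampaign: Map[String, Long])

object Profile {
  implicit val profileMonoid: Monoid[Profile] = new Monoid[Profile] {
    val zero = Profile(0L, Map.empty, Map.empty)
    def plus(a: Profile, b: Profile) = Profile(
      a.events + b.events,
      // map monoid: merges the two maps, summing the Long values per key
      Monoid.plus(a.countsByPage, b.countsByPage),
      Monoid.plus(a.countsByCampaign, b.countsByCampaign))
  }
}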
So the question is: what is the best way to start debugging where I'm running out of memory? Or what could be causing the GC overhead limit exceeded, when this is basically all I have in the tail of one of the failed task logs:
2013-10-15 18:22:13,753 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 70
2013-10-15 18:22:13,761 INFO org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2013-10-15 18:22:13,765 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 177
2013-10-15 18:22:13,829 INFO org.apache.hadoop.mapred.MapTask: data buffer = 141054448/176318064
2013-10-15 18:22:13,829 INFO org.apache.hadoop.mapred.MapTask: record buffer = 463994/579993
2013-10-15 18:22:13,831 INFO cascading.tuple.hadoop.TupleSerialization: using default comparator: com.twitter.scalding.IntegralComparator
2013-10-15 18:22:13,831 INFO cascading.tuple.hadoop.TupleSerialization: using hadoop serializations from the job conf: cascading.tuple.hadoop.TupleSerialization,org.apache.hadoop.io.serializer.WritableSerialization,com.twitter.scalding.serialization.KryoHadoop
2013-10-15 18:22:13,831 INFO cascading.tuple.hadoop.TupleSerialization: adding serialization token: 127, for classname: org.apache.hadoop.io.BytesWritable
2013-10-15 18:22:13,832 INFO cascading.tuple.hadoop.TupleSerialization: using default comparator: com.twitter.scalding.IntegralComparator
2013-10-15 18:22:13,836 INFO cascading.tuple.hadoop.TupleSerialization: using hadoop serializations from the job conf: cascading.tuple.hadoop.TupleSerialization,org.apache.hadoop.io.serializer.WritableSerialization,com.twitter.scalding.serialization.KryoHadoop
2013-10-15 18:22:13,836 INFO cascading.tuple.hadoop.TupleSerialization: adding serialization token: 127, for classname: org.apache.hadoop.io.BytesWritable
2013-10-15 18:22:13,848 INFO cascading.flow.hadoop.FlowMapper: cascading version: Concurrent, Inc - Cascading 2.1.6
2013-10-15 18:22:13,848 INFO cascading.flow.hadoop.FlowMapper: child jvm opts: -Xmx1073741824
2013-10-15 18:22:16,934 INFO cascading.flow.hadoop.FlowMapper: sourcing from: TempHfs["SequenceFile[['cookie', 'session', 'profile', 'shogunList', 'shogun']]"][283867434/_pipe_0__pipe_1/]
2013-10-15 18:22:16,934 INFO cascading.flow.hadoop.FlowMapper: sinking to: GroupBy(_pipe_0*_pipe_1)[by:[{1}:'cookie']]
2013-10-15 18:22:16,937 INFO cascading.pipe.assembly.AggregateBy: using threshold value: 100000
2013-10-15 18:22:16,942 INFO cascading.tuple.hadoop.TupleSerialization: using default comparator: com.twitter.scalding.IntegralComparator
2013-10-15 18:24:28,387 ERROR cascading.flow.stream.TrapHandler: caught Throwable, no trap available, rethrowing
2013-10-15 18:24:50,597 ERROR cascading.flow.stream.SourceStage: caught throwable
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at java.lang.StackTraceElement.toString(StackTraceElement.java:157)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at java.lang.Throwable.printStackTrace(Throwable.java:512)
at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)
at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.hadoop.mapred.TaskLogAppender.append(TaskLogAppender.java:83)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
at cascading.flow.stream.TrapHandler.handleReThrowableException(TrapHandler.java:103)
at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:121)
at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:115)
at cascading.flow.stream.ElementStage.handleException(ElementStage.java:134)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:107)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
2013-10-15 18:24:50,598 INFO org.apache.hadoop.mapred.Task: Communication exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:231)
at sun.nio.cs.StreamDecoder.<init>(StreamDecoder.java:211)
at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:50)
at java.io.InputStreamReader.<init>(InputStreamReader.java:57)
at java.io.FileReader.<init>(FileReader.java:55)
at org.apache.hadoop.util.ProcfsBasedProcessTree.constructProcessInfo(ProcfsBasedProcessTree.java:514)
at org.apache.hadoop.util.ProcfsBasedProcessTree.getProcessTree(ProcfsBasedProcessTree.java:189)
at org.apache.hadoop.util.LinuxResourceCalculatorPlugin.getProcResourceValues(LinuxResourceCalculatorPlugin.java:401)
at org.apache.hadoop.mapred.Task.updateResourceCounters(Task.java:821)
at org.apache.hadoop.mapred.Task.updateCounters(Task.java:843)
at org.apache.hadoop.mapred.Task.access$500(Task.java:72)
at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:670)
at java.lang.Thread.run(Thread.java:662)
2013-10-15 18:25:02,303 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-10-15 18:25:02,305 WARN org.apache.hadoop.mapred.Child: Error running child
cascading.flow.FlowException: internal error during mapper execution
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:138)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:45)
at java.lang.StringBuilder.<init>(StringBuilder.java:68)
at java.lang.StackTraceElement.toString(StackTraceElement.java:157)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at java.lang.Throwable.printStackTrace(Throwable.java:512)
at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)
at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
at org.apache.hadoop.mapred.TaskLogAppender.append(TaskLogAppender.java:83)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
at cascading.flow.stream.TrapHandler.handleReThrowableException(TrapHandler.java:103)
at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:121)
at cascading.flow.stream.TrapHandler.handleException(TrapHandler.java:115)
at cascading.flow.stream.ElementStage.handleException(ElementStage.java:134)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:107)
at cascading.flow.stream.FunctionEachStage.receive(FunctionEachStage.java:39)
at cascading.flow.stream.SourceStage.map(SourceStage.java:102)
at cascading.flow.stream.SourceStage.run(SourceStage.java:58)
at cascading.flow.hadoop.FlowMapper.run(FlowMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
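One thing I was considering, to test the "one of the Maps gets huge" theory, is wrapping the Map merges so the task log tells me when a merged map crosses a size threshold and I can just grep for it afterwards. Roughly like this (the threshold, logger name and helper are arbitrary, not something from Scalding):

import org.slf4j.LoggerFactory

// Idea for narrowing down which Map blows up: warn whenever a merged map
// crosses an arbitrary size threshold, then grep the task logs.
object MergeDebug {
  private val log = LoggerFactory.getLogger("MergeDebug")

  def checked[K, V](merged: Map[K, V], label: String, limit: Int = 100000): Map[K, V] = {
    if (merged.size > limit)
      log.warn("merged map '" + label + "' has " + merged.size + " entries")
    merged
  }
}

// e.g. inside plus (using the placeholder names from above):
//   checked(Monoid.plus(a.countsByPage, b.countsByPage), "countsByPage")

I could also try adding -XX:+HeapDumpOnOutOfMemoryError to mapred.child.java.opts and looking at the dump, but before going down that road I wanted to ask here first.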
Any suggestions on debugging?
--
HG