java.lang.OutOfMemoryError: GC overhead limit exceeded


Bradford Stephens

Sep 26, 2010, 3:55:15 AM
to cascadi...@googlegroups.com, core...@hadoop.apache.org
Greetings,

I'm running into a brain-numbing problem on Elastic MapReduce. I'm
running a decent-size task (22,000 mappers, a ton of GZipped input
blocks, ~1TB of data) on 40 c1.xlarge nodes (7 gb RAM, ~8 "cores").

The failures are random: sometimes at the end of my 6-step process,
sometimes in the first reducer phase, sometimes in the mappers. It
seems to fail in multiple areas, but mostly in the reducers. Any ideas?

Here are the settings I've changed:
-Xmx400m
6 max mappers
1 max reducer
1GB swap partition
mapred.job.reuse.jvm.num.tasks=50
mapred.reduce.parallel.copies=3
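For reference, those settings would look roughly like this if applied programmatically. The property names are the Hadoop 0.20-era ones; this sketch models the configuration with plain java.util.Properties rather than an actual Hadoop JobConf, so it stands alone:

```java
import java.util.Properties;

public class JobSettings {
    // A sketch of the job configuration described above, using the
    // Hadoop 0.20-era property names. Not the actual job setup.
    public static Properties configure() {
        Properties conf = new Properties();
        // child JVM heap -- the -Xmx400m under discussion
        conf.setProperty("mapred.child.java.opts", "-Xmx400m");
        // reuse each child JVM for up to 50 tasks
        conf.setProperty("mapred.job.reuse.jvm.num.tasks", "50");
        // parallel copier threads per reducer during the shuffle
        conf.setProperty("mapred.reduce.parallel.copies", "3");
        // per-tasktracker slot limits: 6 map slots, 1 reduce slot
        conf.setProperty("mapred.tasktracker.map.tasks.maximum", "6");
        conf.setProperty("mapred.tasktracker.reduce.tasks.maximum", "1");
        return conf;
    }
}
```

In a real job these would go through `JobConf.set(...)` or `mapred-site.xml` rather than a Properties object.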


java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.nio.CharBuffer.wrap(CharBuffer.java:350)
at java.nio.CharBuffer.wrap(CharBuffer.java:373)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.&lt;init&gt;(String.java:443)
at java.lang.String.&lt;init&gt;(String.java:515)
at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

Bradford Stephens

Sep 26, 2010, 4:00:29 AM
to cascadi...@googlegroups.com, core...@hadoop.apache.org
I'm going to try running it on high-RAM boxes with -Xmx4096m or so to
see if that helps.

Bradford Stephens

Sep 26, 2010, 6:30:10 AM
to cascadi...@googlegroups.com, core...@hadoop.apache.org
Nope, that didn't seem to help.

Aceeca

Sep 26, 2010, 10:15:03 AM
to cascading-user
Try to reduce the number of mappers and add GC parameters.




Chris K Wensel

Sep 26, 2010, 11:10:45 AM
to cascadi...@googlegroups.com, core...@hadoop.apache.org
fwiw

I run m2.xlarge slaves, using the default mappers/reducers (4/2, I think).

with swap:
--bootstrap-action s3://elasticmapreduce/bootstrap-actions/create-swap-file.rb --args "-E,/mnt/swap,1000"

historically I've run with this property with no issues, but I should probably re-research the GC setting (comments please):
"mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"

I haven't co-installed Ganglia to look at utilization lately, but more than 4 mappers or more than 2 reducers has always given me headaches.

ckw

> --
> You received this message because you are subscribed to the Google Groups "cascading-user" group.
> To post to this group, send email to cascadi...@googlegroups.com.
> To unsubscribe from this group, send email to cascading-use...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/cascading-user?hl=en.
>

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support, and licensing for Cascading

Ted Dunning

Sep 26, 2010, 12:35:44 PM
to cascadi...@googlegroups.com, core...@hadoop.apache.org
The old GC routinely gives me lower performance than modern GC.  The default is now quite good for batch programs.

Ted Dunning

Sep 26, 2010, 12:37:01 PM
to cascadi...@googlegroups.com, core...@hadoop.apache.org

My feeling is that you have some kind of leak going on in your mappers or reducers, and that reducing the number of times the JVM is re-used would improve matters.

GC overhead limit indicates that your (tiny) heap is full and collection is not reducing that.

On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <bradford...@gmail.com> wrote:
mapred.job.reuse.jvm.num.tasks=50
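Ted's point can be illustrated with a sketch: any state a task accumulates in a static field survives JVM reuse, so with `mapred.job.reuse.jvm.num.tasks=50` the heap fills a little more with each reused task until collection can no longer keep up. This is a hypothetical leak for illustration, not code from this job:

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseLeak {
    // Static state is NOT cleared between task attempts that share a
    // reused child JVM -- exactly the kind of leak Ted is describing.
    static final List<String> CACHE = new ArrayList<>();

    // One simulated task: leaks 1,000 strings into the static list.
    static void runTask(int taskId) {
        for (int i = 0; i < 1_000; i++) {
            CACHE.add("task-" + taskId + "-record-" + i);
        }
    }

    // Runs `reusedTasks` tasks in the same "JVM" and reports how much
    // leaked state has piled up: it grows linearly with JVM reuse.
    public static int leakedAfter(int reusedTasks) {
        CACHE.clear();
        for (int t = 0; t < reusedTasks; t++) {
            runTask(t);
        }
        return CACHE.size();
    }
}
```

With a 400m heap and 50 reuses, a per-task leak like this eventually leaves the collector running constantly for almost no reclaimed space, which is precisely the condition that triggers "GC overhead limit exceeded".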

Bradford Stephens

Sep 26, 2010, 7:46:20 PM
to commo...@hadoop.apache.org, cascadi...@googlegroups.com, core...@hadoop.apache.org
Sadly, making Chris's changes didn't help.

Here's the Cascading code; it's pretty simple, but uses the new
"combiner"-like functionality:

http://pastebin.com/ccvDmLSX


Chris K Wensel

Sep 26, 2010, 8:09:31 PM
to cascadi...@googlegroups.com, commo...@hadoop.apache.org, core...@hadoop.apache.org
Try using a lower threshold value (the number of values to cache in the LRU); this is the tradeoff of this approach.

ckw
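The cache Chris is referring to can be pictured as a bounded LRU map of partial aggregates: entries evicted when the threshold is exceeded get flushed downstream early, so a lower threshold trades map-side memory for more traffic to the reducers. A simplified sketch using java.util.LinkedHashMap, not Cascading's actual implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartialAggregateCache extends LinkedHashMap<String, Long> {
    private final int threshold;
    public int flushes = 0; // evictions, i.e. partial results emitted early

    public PartialAggregateCache(int threshold) {
        super(16, 0.75f, true); // access-order: least-recently-used goes first
        this.threshold = threshold;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        if (size() > threshold) {
            flushes++; // a real combiner would emit this entry downstream here
            return true;
        }
        return false;
    }

    // Fold a count into the running partial aggregate for this key.
    public void add(String key, long count) {
        merge(key, count, Long::sum);
    }
}
```

A lower threshold bounds the map-side heap (fewer live entries) at the cost of more flushes, which is the memory-vs-shuffle tradeoff Chris mentions.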

Bradford Stephens

Sep 26, 2010, 8:37:22 PM
to commo...@hadoop.apache.org, cascadi...@googlegroups.com, core...@hadoop.apache.org
Yup, I've turned it down to 1,000. Let's see if that helps!

Bradford Stephens

Sep 26, 2010, 9:01:46 PM
to commo...@hadoop.apache.org, cascadi...@googlegroups.com, core...@hadoop.apache.org
One of the problems with this data set is that I'm grouping by a
category that has only, say, 20 different values. Then I'm doing a
unique count of Facebook user IDs per group. I imagine that's not
pleasant for the reducers.

On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <ale...@cloudera.com> wrote:
> Hi Bradford,
>
> Sometimes the reducers do not handle merging large chunks of data too well:
> How many reducers do you have?  Try to increase the # of reducers (you can
> always merge the data later if you are worried about too many output files).
>
> --
> Alex Kozlov
> Solutions Architect
> Cloudera, Inc
> twitter: alexvk2009
>
> Hadoop World 2010, October 12, New York City - Register now:
> http://www.cloudera.com/company/press-center/hadoop-world-nyc/
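The shape of that workload (a handful of group keys, many distinct user IDs per key) is what combiner-style partial aggregation helps with: each mapper dedups locally, so the reducer unions small sets instead of seeing one value per input record. A simplified in-memory sketch of the idea, not the actual Cascading flow from the pastebin:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctCount {
    // Map/combine side: dedup user IDs per category locally, so each
    // mapper contributes at most its distinct IDs, not every record.
    public static Map<String, Set<String>> partial(List<String[]> records) {
        Map<String, Set<String>> byCategory = new HashMap<>();
        for (String[] rec : records) { // rec = {category, userId}
            byCategory.computeIfAbsent(rec[0], k -> new HashSet<>()).add(rec[1]);
        }
        return byCategory;
    }

    // Reduce side: union the per-mapper sets and take their sizes.
    @SafeVarargs
    public static Map<String, Integer> uniqueCounts(
            Map<String, Set<String>>... partials) {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> p : partials) {
            p.forEach((cat, ids) ->
                merged.computeIfAbsent(cat, k -> new HashSet<>()).addAll(ids));
        }
        Map<String, Integer> counts = new HashMap<>();
        merged.forEach((cat, ids) -> counts.put(cat, ids.size()));
        return counts;
    }
}
```

Note the reducer-side sets here are still one per category; with only ~20 categories and millions of distinct IDs, each reducer-side set can itself get large, which is why the few-keys case remains hard on the reducers.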




Ted Dunning

Sep 26, 2010, 10:00:46 PM
to cascadi...@googlegroups.com, commo...@hadoop.apache.org, core...@hadoop.apache.org

If there are combiners, the reducers shouldn't get any lists longer than a small multiple of the number of maps.


Bradford Stephens

Sep 27, 2010, 5:46:03 AM
to commo...@hadoop.apache.org, cascadi...@googlegroups.com, core...@hadoop.apache.org
It turned out to be a deployment issue: an old version had been
deployed. Ted and Chris's suggestions were spot-on.

I can't believe how BRILLIANT these combiners from Cascading are. It's
cut my processing time down from 20 hours to 50 minutes. AND I cut out
about 80% of my hand-crafted code.

Bravo. I look smart now. (Almost).

-B

