Faunus job memory requirement / recommendation

Yuriy

Jul 1, 2014, 2:28:52 PM
to aureliu...@googlegroups.com

Could you recommend the amount of memory for a Faunus (4.4.1 on HBase) job?

I tried setting a 4 GB heap (in 5 GB containers) via the following settings in the properties file:

mapreduce.map.memory.mb=5120
mapreduce.map.java.opts=-Xmx4096m
mapreduce.reduce.memory.mb=5120
mapreduce.reduce.java.opts=-Xmx4096m

but the job still throws an OutOfMemoryError (like the one below, among others):

Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.util.Arrays.copyOf(Arrays.java:2219)
        at java.util.ArrayList.grow(ArrayList.java:242)
        at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:216)
        at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:208)
        at java.util.ArrayList.add(ArrayList.java:440)
        at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:203)
        at com.google.common.collect.AbstractListMultimap.put(AbstractListMultimap.java:95)
        at com.google.common.collect.ArrayListMultimap.put(ArrayListMultimap.java:62)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$2.call(StandardTitanGraph.java:282)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$2.call(StandardTitanGraph.java:241)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:62)
        ... 6 more

 

Thanks,

Yuriy

Daniel Kuppitz

Jul 1, 2014, 2:31:50 PM
to aureliu...@googlegroups.com
The required memory depends on your graph, especially on the largest vertex (the one with the most incident edges).
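If you don't already know how heavy that vertex is, a degree-distribution job is a cheap way to find out before you start guessing at heap sizes. From memory it is something along these lines (a rough sketch; the exact step names may differ in your Faunus version, and 'faunus.properties' stands in for your own HBase-backed input config):

g = FaunusFactory.open('faunus.properties')
g.V.sideEffect('{it.degree = it.outE.count()}').degree.groupCount()

The largest key in the resulting distribution is roughly the number of edges Faunus has to hold in memory for a single vertex, which is what drives the per-task heap requirement.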

Cheers,
Daniel




Bob Briody

Jul 1, 2014, 2:42:27 PM
to aureliu...@googlegroups.com
Hi Yuriy,

I have had success w/ the following configuration on an m3.2xlarge machine:

mapred.task.timeout=5400000
mapred.max.split.size=5242880
mapred.map.child.java.opts=-Xmx4G
mapred.reduce.child.java.opts=-Xmx4G
mapred.map.tasks=4
mapred.reduce.tasks=2
mapred.job.reuse.jvm.num.tasks=-1

I did experiment w/ some more aggressive settings that failed, but I would not say that the process was particularly rigorous.

As Daniel mentioned, YMMV depending on the nature of your graph. You can also try reducing the number of map and reduce tasks until things work, and then experiment with increasing them.
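One caveat: those are the old mapred.* (MRv1) property names. Since you're setting mapreduce.* properties you're presumably on YARN / MRv2, where the rough equivalents would be the following (double-check the deprecated-properties table for your Hadoop version, and adjust the values to your own cluster):

mapreduce.task.timeout=5400000
mapreduce.input.fileinputformat.split.maxsize=5242880
mapreduce.map.java.opts=-Xmx4G
mapreduce.reduce.java.opts=-Xmx4G
mapreduce.job.maps=4
mapreduce.job.reduces=2

As far as I know there is no direct equivalent for JVM reuse under YARN.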

Thanks,
Bob

Tim Ludwinski

Jul 1, 2014, 4:15:32 PM
to aureliu...@googlegroups.com
> The required memory depends on your graph, especially on the largest vertex (the one with the most incident edges).

This isn't good. Basically, you need to run Faunus to determine the largest vertex before you can run your actual Faunus query, but you can't use Faunus to determine the largest vertex because you don't know how much memory you need for that either.

I hope this is fixed somehow in newer versions. Maybe being able to save the intermediate results when something fails, so you could restart with more memory, would be enough.

Stephen Mallette

Jul 1, 2014, 4:24:46 PM
to aureliu...@googlegroups.com
> This isn't good

i chuckled a bit at that. maybe it isn't "good" - i've gotten so used to just trying, failing, bumping the Xmx and repeat repeat repeat. Interesting what one learns to live with. Never even thought to question it. ;)



Yuriy

Jul 2, 2014, 3:56:04 PM
to aureliu...@googlegroups.com
Thanks everyone.
Yeah, it wouldn't hurt if Faunus were more reliable, i.e. took longer instead of failing outright.

But it is Apache 2 OSS, so we shouldn't complain; we should rather fix it ourselves :)