Titan hadoop cassandra GC overhead limit exceeded


Apoorva Gaurav

May 2, 2015, 10:55:50 AM
to aureliu...@googlegroups.com
Hello All,

I'm using Titan 0.5.4 with Cassandra 2.0.14 and trying to run MapReduce jobs on it using Cloudera CDH5. It's a user:user and user:interest graph with close to 16M users and close to 1000 interests; some interests can have as many as 2M users, and these numbers will grow. The total graph isn't huge: with a replication factor of 3, the 4-node Cassandra cluster has a combined disk size of close to 28GB.

I have three YARN node managers, on which I've set the following limits:
yarn-site.xml
         <property>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>16000</value>
        </property>
        <property>
                <name>yarn.scheduler.minimum-allocation-mb</name>
                <value>1000</value>
        </property>

mapred-site.xml
        <property>
                <name>mapreduce.map.memory.mb</name>
                <value>6000</value>
        </property>
        <property>
                <name>mapreduce.reduce.memory.mb</name>
                <value>6000</value>
        </property>
        <property>
                <name>mapreduce.map.java.opts</name>
                <value>-Xmx4096m</value>
        </property>
        <property>
                <name>mapreduce.reduce.java.opts</name>
                <value>-Xmx4096m</value>
        </property>

and on the client I've set the following limits:
mapreduce.map.memory.mb=6000
mapreduce.reduce.memory.mb=6000
mapred.map.child.java.opts=-Xmx4096m
mapred.reduce.child.java.opts=-Xmx4096m
mapred.max.split.size=5242880
mapred.job.reuse.jvm.num.tasks=-1

titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
titan.hadoop.input.conf.storage.backend=cassandrathrift
titan.hadoop.input.conf.storage.hostname=lp1,lp3
titan.hadoop.input.conf.storage.port=9160
titan.hadoop.input.conf.storage.cassandra.keyspace=lgpgelsgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.split.size=16384
cassandra.thrift.framed.size_mb=499
cassandra.thrift.message.max_size_mb=500

titan.hadoop.sideeffect.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.noop.NoOpOutputFormat


When running simple Gremlin queries I quite often get errors like this:
10:11:26 INFO  org.apache.hadoop.mapreduce.Job  - Task Id : attempt_1430217400643_0041_m_000179_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.mapreduce.lib.chain.Chain.joinAllThreads(Chain.java:526)
        at org.apache.hadoop.mapreduce.lib.chain.ChainMapper.run(ChainMapper.java:169)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
        at com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:194)
        at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:114)
        at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:49)
        at com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:156)
        at com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:214)
        at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:201)
        at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
        at com.google.common.collect.HashMultimap.put(HashMultimap.java:49)
        at com.thinkaurelius.titan.hadoop.FaunusSerializer.readEdges(FaunusSerializer.java:252)
        at com.thinkaurelius.titan.hadoop.FaunusSerializer.readElement(FaunusSerializer.java:143)
        at com.thinkaurelius.titan.hadoop.FaunusSerializer.readPathElement(FaunusSerializer.java:119)
        at com.thinkaurelius.titan.hadoop.FaunusSerializer.readEdges(FaunusSerializer.java:218)
        at com.thinkaurelius.titan.hadoop.FaunusSerializer.readVertex(FaunusSerializer.java:76)
        at com.thinkaurelius.titan.hadoop.FaunusVertex.readFields(FaunusVertex.java:336)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
        at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
        at org.apache.hadoop.util.ReflectionUtils.copy(ReflectionUtils.java:296)
        at org.apache.hadoop.mapreduce.lib.chain.Chain$ChainRecordWriter.writeToQueue(Chain.java:264)
        at org.apache.hadoop.mapreduce.lib.chain.Chain$ChainRecordWriter.write(Chain.java:252)
        at org.apache.hadoop.mapreduce.lib.chain.ChainMapContextImpl.write(ChainMapContextImpl.java:110)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
        at com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Map.map(VerticesMap.java:59)
        at com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Map.map(VerticesMap.java:36)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapreduce.lib.chain.Chain$MapRunner.run(Chain.java:321)

What could be wrong? A similar issue is discussed at https://groups.google.com/d/topic/aureliusgraphs/CWxF60DvoA0/discussion but with no concrete solution. Has the issue been addressed in Titan 0.5 (apparently not)? How should I approach this to reach a workable state?

I also think that running an identity mapper (g._) once to move the data to HDFS, and then running subsequent jobs directly from HDFS, is a good idea, but so far the identity mapper has never run successfully; it always dies with GC overhead exceptions.
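
What I'm attempting is roughly the following in the titan-hadoop gremlin shell (a minimal sketch; the two properties-file names are hypothetical, the first being the Cassandra input config shown above and the second pointing the input format at the HDFS copy instead of Cassandra):

g = HadoopFactory.open('titan-cassandra.properties')   // Cassandra-backed input, as configured above
g._()                                                   // identity step: streams every vertex out of Cassandra into the job's HDFS output
// subsequent jobs would then open a graph configured to read that HDFS output instead, e.g.
// h = HadoopFactory.open('titan-hdfs.properties')      // hypothetical follow-up config
// h.V.count()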

Sometimes I also get errors like this:
10:21:29 INFO  org.apache.hadoop.mapreduce.Job  - Task Id : attempt_1430217400643_0041_m_000554_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.hadoop.formats.util.input.current.TitanHadoopSetupImpl
        at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:55)
        at com.thinkaurelius.titan.hadoop.formats.util.TitanInputFormat.getGraphSetup(TitanInputFormat.java:49)
        at com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraRecordReader.initialize(TitanCassandraRecordReader.java:44)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:44)
        ... 10 more
Caused by: com.thinkaurelius.titan.core.TitanException: A Titan graph with the same instance id [ac14151c32106-nmc-lp31] is already open. Might required forced shutdown.
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:133)
        at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
        at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:83)
        at com.thinkaurelius.titan.hadoop.formats.util.input.current.TitanHadoopSetupImpl.<init>(TitanHadoopSetupImpl.java:39)
        ... 15 more
I've exited the Gremlin shell without calling g.shutdown() during some of the older runs; could that be the issue? Can I find out all registered graph instances and shut them down?
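
From the management API docs it looks like something along these lines should list and evict stale instances (a rough sketch, run from a regular gremlin shell against the same keyspace; the properties-file name is a placeholder and the instance id is just the one from the error above):

graph = TitanFactory.open('titan-cassandra-client.properties')  // hypothetical client config for the same keyspace
mgmt  = graph.getManagementSystem()
mgmt.getOpenInstances()                                         // list all instance ids registered against the graph
mgmt.forceCloseInstance('ac14151c32106-nmc-lp31')               // force-close the stale instance from the error message
mgmt.commit()
graph.shutdown()

Is that the recommended way to clean these up?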

Thanks & Regards,
Apoorva

CasperCLD

May 4, 2015, 7:40:30 AM
to aureliu...@googlegroups.com
I cannot help you with this issue. However, I would suggest using Spark instead of Hadoop, since it's much faster and can be run standalone (no need for ZooKeeper). A Spark connector is in development now and will be included in version 1.0.

Apoorva Gaurav

May 4, 2015, 8:41:21 AM
to aureliu...@googlegroups.com

Thanks Casper,
Any pointers on using Spark with Titan?


Apoorva Gaurav

May 5, 2015, 9:47:33 AM
to aureliu...@googlegroups.com
Any suggestions?

Stephen Mallette

May 5, 2015, 10:01:04 AM
to aureliu...@googlegroups.com
The usual fix is "more memory". As long as there are complaints about OutOfMemoryError, I would throw more at it - 4096m must be inadequate for your data for whatever reason. You could also try to reduce the split size to make the jobs smaller. Keep in mind that I have some reservations about this advice given your use of Cloudera - I'm not sure if there are other issues there, as I've not used that distribution.
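
Concretely, that would mean tweaking the knobs you already have in your job properties, roughly along these lines (a sketch only - the right numbers depend on your data and container sizes):

# smaller input splits => each map task deserializes fewer vertices at a time
cassandra.input.split.size=4096
# and/or more heap for the task JVMs, within the container size you've allotted
mapreduce.map.java.opts=-Xmx5120m
mapreduce.reduce.java.opts=-Xmx5120m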


CasperCLD

May 6, 2015, 3:54:38 AM
to aureliu...@googlegroups.com