Hello All,
I'm using Titan 0.5.4 with Cassandra 2.0.14 and trying to run MapReduce jobs on it using Cloudera CDH5. It's a user:user and user:interest graph with close to 16M users and about 1,000 interests; some interests can have as many as 2M users, and these numbers will grow. The graph itself isn't huge: with a replication factor of 3, the 4-node Cassandra cluster holds about 28GB on disk in total.
I have three YARN NodeManagers, configured with the following limits:
yarn-site.xml
     <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>16000</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1000</value>
    </property>
mapred-site.xml
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>6000</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>6000</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
and on the client I've set the following properties:
mapreduce.map.memory.mb=6000
mapreduce.reduce.memory.mb=6000
mapred.map.child.java.opts=-Xmx4096m
mapred.reduce.child.java.opts=-Xmx4096m
mapred.max.split.size=5242880
mapred.job.reuse.jvm.num.tasks=-1
titan.hadoop.input.format=com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraInputFormat
titan.hadoop.input.conf.storage.backend=cassandrathrift
titan.hadoop.input.conf.storage.hostname=lp1,lp3
titan.hadoop.input.conf.storage.port=9160
titan.hadoop.input.conf.storage.cassandra.keyspace=lgpgelsgraph
cassandra.input.partitioner.class=org.apache.cassandra.dht.Murmur3Partitioner
cassandra.input.split.size=16384
cassandra.thrift.framed.size_mb=499
cassandra.thrift.message.max_size_mb=500
titan.hadoop.sideeffect.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
titan.hadoop.output.format=com.thinkaurelius.titan.hadoop.formats.noop.NoOpOutputFormat
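For reference, this is how I load those properties and kick off a job from the Gremlin shell (the file name below is just where I keep them locally):

// load the titan-hadoop properties above into a HadoopGraph
g = HadoopFactory.open('conf/titan-cassandra-input.properties')
// a simple global traversal; even jobs like this hit the GC errors below
g.V.count()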
When running simple Gremlin queries I quite often get errors like this:
10:11:26 INFO  org.apache.hadoop.mapreduce.Job  - Task Id : attempt_1430217400643_0041_m_000179_2, Status : FAILED
Error: java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.hadoop.mapreduce.lib.chain.Chain.joinAllThreads(Chain.java:526)
    at org.apache.hadoop.mapreduce.lib.chain.ChainMapper.run(ChainMapper.java:169)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.google.common.collect.Sets.newHashSetWithExpectedSize(Sets.java:194)
    at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:114)
    at com.google.common.collect.HashMultimap.createCollection(HashMultimap.java:49)
    at com.google.common.collect.AbstractMultimap.createCollection(AbstractMultimap.java:156)
    at com.google.common.collect.AbstractMultimap.getOrCreateCollection(AbstractMultimap.java:214)
    at com.google.common.collect.AbstractMultimap.put(AbstractMultimap.java:201)
    at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:117)
    at com.google.common.collect.HashMultimap.put(HashMultimap.java:49)
    at com.thinkaurelius.titan.hadoop.FaunusSerializer.readEdges(FaunusSerializer.java:252)
    at com.thinkaurelius.titan.hadoop.FaunusSerializer.readElement(FaunusSerializer.java:143)
    at com.thinkaurelius.titan.hadoop.FaunusSerializer.readPathElement(FaunusSerializer.java:119)
    at com.thinkaurelius.titan.hadoop.FaunusSerializer.readEdges(FaunusSerializer.java:218)
    at com.thinkaurelius.titan.hadoop.FaunusSerializer.readVertex(FaunusSerializer.java:76)
    at com.thinkaurelius.titan.hadoop.FaunusVertex.readFields(FaunusVertex.java:336)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
    at org.apache.hadoop.util.ReflectionUtils.copy(ReflectionUtils.java:296)
    at org.apache.hadoop.mapreduce.lib.chain.Chain$ChainRecordWriter.writeToQueue(Chain.java:264)
    at org.apache.hadoop.mapreduce.lib.chain.Chain$ChainRecordWriter.write(Chain.java:252)
    at org.apache.hadoop.mapreduce.lib.chain.ChainMapContextImpl.write(ChainMapContextImpl.java:110)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Map.map(VerticesMap.java:59)
    at com.thinkaurelius.titan.hadoop.mapreduce.transform.VerticesMap$Map.map(VerticesMap.java:36)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapreduce.lib.chain.Chain$MapRunner.run(Chain.java:321)
I also think that running an identity mapper (g._) once to move the data to HDFS, and then running subsequent jobs directly from HDFS, would be a good idea, but so far the identity mapper has never run successfully; it always dies with GC overhead exceptions. What I was planning is sketched below.
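Roughly this (a sketch: the properties path is mine, and the SequenceFile re-read settings are my reading of the titan-hadoop docs, which I haven't been able to verify since the first step never completes):

// step 1: identity-map the whole graph out of Cassandra into the job's HDFS output
g = HadoopFactory.open('conf/titan-cassandra-input.properties')
g._()
// step 2: point later jobs at that output instead of Cassandra, e.g. by swapping
// the input settings in the properties file:
//   titan.hadoop.input.format=org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
//   titan.hadoop.input.location=output/job-0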
Sometimes I also get errors like:
10:21:29 INFO  org.apache.hadoop.mapreduce.Job  - Task Id : attempt_1430217400643_0041_m_000554_0, Status : FAILED
Error: java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.hadoop.formats.util.input.current.TitanHadoopSetupImpl
    at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:55)
    at com.thinkaurelius.titan.hadoop.formats.util.TitanInputFormat.getGraphSetup(TitanInputFormat.java:49)
    at com.thinkaurelius.titan.hadoop.formats.cassandra.TitanCassandraRecordReader.initialize(TitanCassandraRecordReader.java:44)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:44)
    ... 10 more
Caused by: com.thinkaurelius.titan.core.TitanException: A Titan graph with the same instance id [ac14151c32106-nmc-lp31] is already open. Might required forced shutdown.
    at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.<init>(StandardTitanGraph.java:133)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:83)
    at com.thinkaurelius.titan.hadoop.formats.util.input.current.TitanHadoopSetupImpl.<init>(TitanHadoopSetupImpl.java:39)
    ... 15 more
I've exited the Gremlin shell without calling g.shutdown() during some of the older runs; could that be the issue? Is there a way to find all open graph instances and shut them down? The sketch below is what I was planning to try.
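This assumes the management calls that, as far as I know, were added in the 0.5.x line; the properties path is just my local one, and the instance id is the one from the stack trace above:

// open a regular (non-Hadoop) handle to the same keyspace
g = TitanFactory.open('conf/titan-cassandra.properties')
mgmt = g.getManagementSystem()
// list every instance id Titan has registered; the current one is marked '(current)'
mgmt.getOpenInstances()
// evict the stale registration left behind by a shell that died without g.shutdown()
mgmt.forceCloseInstance('ac14151c32106-nmc-lp31')
mgmt.commit()
g.shutdown()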
Thanks & Regards,
Apoorva