Running out of Heap Space with BatchGraph and Titan+HBase


Raj Bandyopadhyay

May 7, 2013, 1:08:54 PM5/7/13
to aureliu...@googlegroups.com
Hi there,

I'm trying to load a graph with about 10M vertices into Titan+HBase using BatchGraph. Only a single process/thread loads the graph (the job runs under Map/Reduce, but with a single reducer doing the loading).

After loading about 7.5M vertices, the process starts throwing errors and runs out of Java heap space. I'm puzzled as to what could be causing this. My Java heap is set to 1.6 GB and I can't really go beyond that right now; however, I wouldn't expect BatchGraph or Titan to need all of that memory.


I have pasted the whole long error message below in the hopes that someone will be able to help.

My Titan configuration is as follows:

        int batchSize = 100000;
        BaseConfiguration titanConf = new BaseConfiguration();
        titanConf.setProperty("storage.backend","hbase");           // store data in HBase
        titanConf.setProperty("storage.hostname","res01");          // ZooKeeper quorum host
        titanConf.setProperty("storage.write-attempts",100);
        titanConf.setProperty("storage.tablename", dbName);
        titanConf.setProperty("storage.batch-loading",true);        // relax consistency checks for bulk loading
        titanConf.setProperty("ids.block-size", 4000000);           // reserve large id blocks up front
        titanConf.setProperty("persist-wait-time",300000);
        titanConf.setProperty("persist-attempts",100);
        titanConf.setProperty("storage.idauthority-block-size", 4000000);
        titanConf.setProperty("storage.buffer-size", batchSize);    // mutations buffered before each flush
        titanConf.setProperty("storage.lock-wait-time", 1000);
        titanConf.setProperty("storage.idauthority-wait-time", 1000);


My BatchGraph code starts like this:

        graph = TitanFactory.open(titanConf);
        graph.createKeyIndex("name",Vertex.class);
       
        bGraph = new BatchGraph<TitanGraph>(graph,VertexIDType.STRING,batchSize);
        bGraph.setVertexIdKey("name");
        bGraph.setLoadingFromScratch(true);


 
And here's the error I get (it takes about 30 minutes to reach this point):
     


2013-05-07 16:48:22,176 INFO main.TitanRRSetLoadReducer1: Added 7600000 vertices so far
2013-05-07 16:50:01,455 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=titan_arr_test_9,,1367943278891.c3a9d1db91b3ffadf5905759bcd2fa4e., hostname=res16.research.damballa, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to res16.research.damballa/172.24.20.116:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1536)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1388)
	at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
	at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.batch(HTablePool.java:357)
	at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:149)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.flushInternal(BufferTransaction.java:93)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.mutate(BufferTransaction.java:79)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.mutate(BufferedKeyColumnValueStore.java:53)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.CachedKeyColumnValueStore.mutate(CachedKeyColumnValueStore.java:107)
	at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.mutate(ConsistentKeyLockStore.java:103)
	at com.thinkaurelius.titan.diskstorage.BackendTransaction.mutateVertexIndex(BackendTransaction.java:107)
	at com.thinkaurelius.titan.graphdb.database.IndexSerializer.addProperty(IndexSerializer.java:77)
	at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.persist(StandardTitanGraph.java:313)
	at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.save(StandardTitanGraph.java:264)
	at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:796)
	at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:38)
	at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.stopTransaction(TitanBlueprintsGraph.java:63)
	at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.nextElement(BatchGraph.java:213)
	at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.addVertex(BatchGraph.java:338)
	at main.TitanGraphUtils.addVertex(TitanGraphUtils.java:24)
	at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:86)
	at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:28)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.net.SocketTimeoutException: Call to res16.research.damballa/172.24.20.116:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1052)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy13.multi(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1365)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1363)
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1372)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at java.io.FilterInputStream.read(FilterInputStream.java:116)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:399)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	at java.io.DataInputStream.readInt(DataInputStream.java:370)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:672)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
2013-05-07 16:50:36,475 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 33961ms for sessionid 0x13e6f51aa770765, closing socket connection and attempting reconnect
2013-05-07 16:50:48,487 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:50:48,488 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:50:48,515 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13e6f51aa770765 has expired, closing socket connection
2013-05-07 16:50:48,523 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-05-07 16:50:48,523 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2013-05-07 16:50:48,523 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-05-07 16:51:21,156 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=res01.research.damballa:2181 sessionTimeout=180000 watcher=hconnection
2013-05-07 16:51:21,157 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 15...@res06.research.damballa
2013-05-07 16:51:21,158 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:51:21,158 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:51:21,163 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server res01.research.damballa/172.24.20.101:2181, sessionid = 0x13e6f51aa770776, negotiated timeout = 40000
2013-05-07 16:51:55,832 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 34667ms for sessionid 0x13e6f51aa770776, closing socket connection and attempting reconnect
2013-05-07 16:52:18,171 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:52:18,172 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:52:18,173 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13e6f51aa770776 has expired, closing socket connection
2013-05-07 16:52:18,173 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-05-07 16:52:18,173 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2013-05-07 16:52:18,173 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-05-07 16:52:48,317 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=res01.research.damballa:2181 sessionTimeout=180000 watcher=hconnection
2013-05-07 16:52:48,317 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 15...@res06.research.damballa
2013-05-07 16:52:48,318 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:52:48,319 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:52:48,347 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server res01.research.damballa/172.24.20.101:2181, sessionid = 0x13e6f51aa770777, negotiated timeout = 40000
2013-05-07 16:53:44,683 INFO main.TitanRRSetLoadReducer1: Added 7700000 vertices so far
2013-05-07 16:56:07,573 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=titan_arr_test_9,,1367943278891.c3a9d1db91b3ffadf5905759bcd2fa4e., hostname=res16.research.damballa, port=60020
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
	at java.util.concurrent.FutureTask.get(FutureTask.java:83)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1536)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1388)
	at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
	at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.batch(HTablePool.java:357)
	at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:149)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.flushInternal(BufferTransaction.java:93)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.mutate(BufferTransaction.java:79)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.mutate(BufferedKeyColumnValueStore.java:53)
	at com.thinkaurelius.titan.diskstorage.keycolumnvalue.CachedKeyColumnValueStore.mutate(CachedKeyColumnValueStore.java:107)
	at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.mutate(ConsistentKeyLockStore.java:103)
	at com.thinkaurelius.titan.diskstorage.BackendTransaction.mutateVertexIndex(BackendTransaction.java:107)
	at com.thinkaurelius.titan.graphdb.database.IndexSerializer.addProperty(IndexSerializer.java:77)
	at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.persist(StandardTitanGraph.java:313)
	at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.save(StandardTitanGraph.java:264)
	at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:796)
	at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:38)
	at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.stopTransaction(TitanBlueprintsGraph.java:63)
	at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.nextElement(BatchGraph.java:213)
	at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.addVertex(BatchGraph.java:338)
	at main.TitanGraphUtils.addVertex(TitanGraphUtils.java:24)
	at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:86)
	at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:28)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:216)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1372)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Java heap space
	at java.util.Arrays.copyOf(Arrays.java:2786)
	at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
	at java.io.DataOutputStream.write(DataOutputStream.java:90)
	at org.apache.hadoop.hbase.util.Bytes.writeByteArray(Bytes.java:196)
	at org.apache.hadoop.hbase.util.Bytes.writeByteArray(Bytes.java:180)
	at org.apache.hadoop.hbase.client.Put.write(Put.java:400)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
	at org.apache.hadoop.hbase.client.Action.write(Action.java:94)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
	at org.apache.hadoop.hbase.client.MultiAction.write(MultiAction.java:102)
	at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
	at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:638)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1001)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
	at $Proxy13.multi(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1365)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1363)
	at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
	... 7 more

Matthias Broecheler

May 7, 2013, 1:31:20 PM5/7/13
to aureliu...@googlegroups.com
Hey Raj,

BatchGraph takes quite a bit of memory because it keeps a mapping of (external vertex id) -> (Titan internal vertex id), where the external vertex id is the one you provided. This is useful if you later add edges using the external vertex ids, because Titan then does not have to look them up.
However, if I remember your use case correctly, you are not actually adding any edges. Hence, from a memory perspective you might be better off not using BatchGraph and instead inserting the vertices directly, committing in batches of 1,000 or so.
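That commit-every-N pattern can be sketched in plain Java. The Titan-specific calls are abstracted behind callbacks here, since the sketch can't assume a live graph; `BatchedLoader` and its parameters are hypothetical names, and in real code the callbacks would be `graph.addVertex(...)` plus `graph.commit()` on a TitanGraph.

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of batched loading without BatchGraph: insert vertices and
// commit every batchSize additions so each transaction stays small.
public class BatchedLoader {
    public static int load(List<String> names,
                           Consumer<String> addVertex,  // e.g. creates a vertex and sets "name"
                           Runnable commit,             // e.g. commits the open transaction
                           int batchSize) {
        int sinceCommit = 0;
        int commits = 0;
        for (String name : names) {
            addVertex.accept(name);
            if (++sinceCommit >= batchSize) {
                commit.run();                // flush the transaction
                sinceCommit = 0;
                commits++;
            }
        }
        if (sinceCommit > 0) {               // commit the final partial batch
            commit.run();
            commits++;
        }
        return commits;
    }
}
```

The point of the small batch size is that only the uncommitted vertices are held on the heap, instead of a 10M-entry id map.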

Best,
Matthias



--
You received this message because you are subscribed to the Google Groups "Aurelius" group.
To unsubscribe from this group and stop receiving emails from it, send an email to aureliusgraph...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
--
Matthias Broecheler
http://www.matthiasb.com

Raj Bandyopadhyay

May 7, 2013, 2:58:07 PM5/7/13
to aureliu...@googlegroups.com
Hmm, interesting. I do intend to add several million edges later. I'm just trying to do the vertex-only addition first :)

Do you recommend using BatchGraph for the edge addition step?

Thanks
Raj

Matthias Broecheler

May 7, 2013, 3:35:07 PM5/7/13
to aureliu...@googlegroups.com
Yes, once you get to that phase it will be very fast if the ids are cached, but that does require sufficient RAM for the cache. Depending on what your external ids look like, you may want to compress them (if they are long strings, for instance).
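One way to compress long string ids, as a sketch: hash each external id down to a fixed-size digest before handing it to BatchGraph, so the in-memory (external id -> internal id) map holds short keys instead of arbitrarily long strings. `IdCompressor` is a hypothetical name, and accepting the (vanishingly small, but nonzero) chance of a digest collision at 10M keys is an assumption to weigh.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Sketch: replace a long external id with a fixed-size SHA-1 digest,
// Base64-encoded so it can still be used as a string vertex-id key.
public class IdCompressor {
    public static String compress(String externalId) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(externalId.getBytes(StandardCharsets.UTF_8));
            // 20-byte digest -> 28-character Base64 string, regardless of input length
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError("SHA-1 is required on every JVM", e);
        }
    }
}
```

The edge-loading phase would then call `compress(...)` on both endpoint ids so lookups hit the same compressed keys that were cached during vertex loading.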

