Hi there,
I'm trying to load a graph with about 10M vertices into Titan+HBase using BatchGraph. Only one process/thread is doing the loading (it runs as a Map/Reduce job, but with a single reducer performing the load).
After loading about 7.5M vertices, the process starts throwing errors and runs out of Java heap space. I'm puzzled as to what could be causing this. My Java heap is set to 1.6G and I can't really go beyond that right now, but I'd assume BatchGraph and Titan shouldn't need all of that memory.
I've pasted the full error output below in the hope that someone can spot the problem.
My Titan configuration is as follows:
int batchSize = 100000;
BaseConfiguration titanConf = new BaseConfiguration();
titanConf.setProperty("storage.backend","hbase");
titanConf.setProperty("storage.hostname","res01");
titanConf.setProperty("storage.write-attempts",100);
titanConf.setProperty("storage.tablename", dbName);
titanConf.setProperty("storage.batch-loading",true);
titanConf.setProperty("ids.block-size", 4000000);
titanConf.setProperty("persist-wait-time",300000);
titanConf.setProperty("persist-attempts",100);
titanConf.setProperty("storage.idauthority-block-size", 4000000);
titanConf.setProperty("storage.buffer-size", batchSize);
titanConf.setProperty("storage.lock-wait-time", 1000);
titanConf.setProperty("storage.idauthority-wait-time", 1000);

My BatchGraph code starts up like this:
graph = TitanFactory.open(titanConf);
graph.createKeyIndex("name",Vertex.class);
bGraph = new BatchGraph<TitanGraph>(graph,VertexIDType.STRING,batchSize);
bGraph.setVertexIdKey("name");
bGraph.setLoadingFromScratch(true);

And here's the error I get (it takes about 30 minutes to reach this point):
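One thing I wondered about: with storage.buffer-size set to 100000, the mutations buffered before each flush could themselves be fairly large. Here's a rough back-of-envelope I did (the average bytes per mutation is purely a guess on my part; I don't know the real figure):

```java
// Rough estimate of memory held by buffered mutations before a flush.
// avgMutationBytes is a GUESS (row key + column + value + object overhead),
// not a measured number.
public class BufferEstimate {
    public static void main(String[] args) {
        int bufferSize = 100000;     // my storage.buffer-size setting
        int avgMutationBytes = 2000; // guessed size of one buffered mutation
        long bufferedBytes = (long) bufferSize * avgMutationBytes;
        System.out.println("~" + bufferedBytes / (1024 * 1024)
                + " MB buffered before flush");
    }
}
```

If that guess is anywhere near right, a single flush could be holding on the order of a couple hundred MB out of my 1.6G heap, which might explain why the OOM happens inside the HBase client while it serializes the multi-put.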
2013-05-07 16:48:22,176 INFO main.TitanRRSetLoadReducer1: Added 7600000 vertices so far
2013-05-07 16:50:01,455 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=titan_arr_test_9,,1367943278891.c3a9d1db91b3ffadf5905759bcd2fa4e., hostname=res16.research.damballa, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Call to res16.research.damballa/172.24.20.116:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1536)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1388)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.batch(HTablePool.java:357)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:149)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.flushInternal(BufferTransaction.java:93)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.mutate(BufferTransaction.java:79)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.mutate(BufferedKeyColumnValueStore.java:53)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.CachedKeyColumnValueStore.mutate(CachedKeyColumnValueStore.java:107)
at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.mutate(ConsistentKeyLockStore.java:103)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.mutateVertexIndex(BackendTransaction.java:107)
at com.thinkaurelius.titan.graphdb.database.IndexSerializer.addProperty(IndexSerializer.java:77)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.persist(StandardTitanGraph.java:313)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.save(StandardTitanGraph.java:264)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:796)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:38)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.stopTransaction(TitanBlueprintsGraph.java:63)
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.nextElement(BatchGraph.java:213)
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.addVertex(BatchGraph.java:338)
at main.TitanGraphUtils.addVertex(TitanGraphUtils.java:24)
at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:86)
at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:28)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.net.SocketTimeoutException: Call to res16.research.damballa/172.24.20.116:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1052)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1025)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy13.multi(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1365)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1363)
at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1372)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.24.20.106:35256 remote=res16.research.damballa/172.24.20.116:60020]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:165)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:154)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:127)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at java.io.FilterInputStream.read(FilterInputStream.java:116)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:399)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
at java.io.DataInputStream.readInt(DataInputStream.java:370)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:672)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:606)
2013-05-07 16:50:36,475 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 33961ms for sessionid 0x13e6f51aa770765, closing socket connection and attempting reconnect
2013-05-07 16:50:48,487 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:50:48,488 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:50:48,515 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13e6f51aa770765 has expired, closing socket connection
2013-05-07 16:50:48,523 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-05-07 16:50:48,523 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2013-05-07 16:50:48,523 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-05-07 16:51:21,156 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=res01.research.damballa:2181 sessionTimeout=180000 watcher=hconnection
2013-05-07 16:51:21,157 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 15...@res06.research.damballa
2013-05-07 16:51:21,158 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:51:21,158 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:51:21,163 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server res01.research.damballa/172.24.20.101:2181, sessionid = 0x13e6f51aa770776, negotiated timeout = 40000
2013-05-07 16:51:55,832 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 34667ms for sessionid 0x13e6f51aa770776, closing socket connection and attempting reconnect
2013-05-07 16:52:18,171 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:52:18,172 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:52:18,173 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13e6f51aa770776 has expired, closing socket connection
2013-05-07 16:52:18,173 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, will automatically reconnect when needed.
2013-05-07 16:52:18,173 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: ZK session expired. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
2013-05-07 16:52:18,173 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-05-07 16:52:48,317 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=res01.research.damballa:2181 sessionTimeout=180000 watcher=hconnection
2013-05-07 16:52:48,317 INFO org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of this process is 15...@res06.research.damballa
2013-05-07 16:52:48,318 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server res01.research.damballa/172.24.20.101:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-05-07 16:52:48,319 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to res01.research.damballa/172.24.20.101:2181, initiating session
2013-05-07 16:52:48,347 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server res01.research.damballa/172.24.20.101:2181, sessionid = 0x13e6f51aa770777, negotiated timeout = 40000
2013-05-07 16:53:44,683 INFO main.TitanRRSetLoadReducer1: Added 7700000 vertices so far
2013-05-07 16:56:07,573 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Failed all from region=titan_arr_test_9,,1367943278891.c3a9d1db91b3ffadf5905759bcd2fa4e., hostname=res16.research.damballa, port=60020
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1536)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1388)
at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.batch(HTablePool.java:357)
at com.thinkaurelius.titan.diskstorage.hbase.HBaseStoreManager.mutateMany(HBaseStoreManager.java:149)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.flushInternal(BufferTransaction.java:93)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.mutate(BufferTransaction.java:79)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.mutate(BufferedKeyColumnValueStore.java:53)
at com.thinkaurelius.titan.diskstorage.keycolumnvalue.CachedKeyColumnValueStore.mutate(CachedKeyColumnValueStore.java:107)
at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.mutate(ConsistentKeyLockStore.java:103)
at com.thinkaurelius.titan.diskstorage.BackendTransaction.mutateVertexIndex(BackendTransaction.java:107)
at com.thinkaurelius.titan.graphdb.database.IndexSerializer.addProperty(IndexSerializer.java:77)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.persist(StandardTitanGraph.java:313)
at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.save(StandardTitanGraph.java:264)
at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:796)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:38)
at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.stopTransaction(TitanBlueprintsGraph.java:63)
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.nextElement(BatchGraph.java:213)
at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.addVertex(BatchGraph.java:338)
at main.TitanGraphUtils.addVertex(TitanGraphUtils.java:24)
at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:86)
at main.TitanRRSetLoadReducer1.reduce(TitanRRSetLoadReducer1.java:28)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:164)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:595)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:433)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1332)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:216)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1372)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1360)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
at java.io.DataOutputStream.write(DataOutputStream.java:90)
at org.apache.hadoop.hbase.util.Bytes.writeByteArray(Bytes.java:196)
at org.apache.hadoop.hbase.util.Bytes.writeByteArray(Bytes.java:180)
at org.apache.hadoop.hbase.client.Put.write(Put.java:400)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
at org.apache.hadoop.hbase.client.Action.write(Action.java:94)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
at org.apache.hadoop.hbase.client.MultiAction.write(MultiAction.java:102)
at org.apache.hadoop.hbase.io.HbaseObjectWritable.writeObject(HbaseObjectWritable.java:522)
at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:638)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1001)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
at $Proxy13.multi(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1365)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1363)
at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:210)
... 7 more