Titan batch insert problems


Dustin Spicuzza

Aug 22, 2013, 2:50:51 PM
to aureliu...@googlegroups.com


Hey,


We're evaluating various graph databases, and I'm having what seem like performance problems with Titan. So far I've only evaluated the batch ingest rate on a dataset of 2M nodes, and I haven't been able to insert more than part of it before a SocketTimeoutException occurs. Insertion runs at maybe 1000 vertices/second, but various sources I've found online suggest much higher throughput (for example, http://architects.dzone.com/articles/educating-planet-and-graph reports 1.2M edges/second). CPU and memory usage on the Titan nodes is not particularly high (maybe 50-80% CPU according to top, and 30% memory).
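To pin down a rough figure like 1000 vertices/second (at that rate the 2M nodes alone would take about 2000 seconds, roughly 33 minutes, before any edges are counted), a small throughput meter can be called once per inserted element. This is a minimal JDK-only sketch; `RateMeter` and its reporting window are hypothetical names, not part of Titan or Blueprints.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: counts inserted elements and reports throughput per time window. */
class RateMeter {
    private final long intervalNanos;
    private long windowStart;
    private long windowCount;
    private long total;

    RateMeter(long interval, TimeUnit unit, long startNanos) {
        this.intervalNanos = unit.toNanos(interval);
        this.windowStart = startNanos;
    }

    /**
     * Records one inserted element at time nowNanos (e.g. System.nanoTime()).
     * Returns the elements/second rate of the window that just closed,
     * or -1 while the current window is still open.
     */
    double tick(long nowNanos) {
        total++;
        windowCount++;
        long elapsed = nowNanos - windowStart;
        if (elapsed < intervalNanos) {
            return -1;
        }
        double rate = windowCount * 1e9 / elapsed;
        // Start a fresh window at the current timestamp.
        windowStart = nowNanos;
        windowCount = 0;
        return rate;
    }

    /** Total elements recorded since construction. */
    long total() {
        return total;
    }
}
```

In the import loop this would be something like `double r = meter.tick(System.nanoTime()); if (r >= 0) System.out.println(meter.total() + " elements, " + r + "/s");`. Passing the timestamp in, rather than reading the clock inside `tick`, keeps the meter deterministic and easy to test.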


Additionally, I keep getting exceptions during the insert process indicating socket timeouts with Cassandra (similar to https://github.com/thinkaurelius/titan/issues/250). I'm using BatchGraph, and I've tried various transaction sizes, but they all hit the same exceptions. Setting storage.buffer-size to 131072 helped, but the import still eventually dies. Enabling storage.batch-loading doesn't seem to make any difference in speed.
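Since several storage settings are being varied between runs, it can help to keep them in one place. This is a minimal sketch using `java.util.Properties`: the keys and values are the ones quoted in this post (the hostname stays elided), while `IngestConfig` is a hypothetical helper name. The import code in this post sets the first three of these keys on a `BaseConfiguration`; the other two would be set the same way.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

/** Hypothetical helper: the storage settings discussed in this post, collected in one place. */
class IngestConfig {
    static Properties build(String hostname, int bufferSize, boolean batchLoading) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "cassandrathrift");
        p.setProperty("storage.hostname", hostname);
        // Value copied from the import code.
        p.setProperty("storage.connection-timeout", "10000");
        // 131072 reduced the timeouts here but did not eliminate them.
        p.setProperty("storage.buffer-size", Integer.toString(bufferSize));
        // Made no noticeable difference in speed in this test.
        p.setProperty("storage.batch-loading", Boolean.toString(batchLoading));
        return p;
    }

    /** Renders the settings as key=value text, e.g. to log them alongside each run. */
    static String toText(Properties p) {
        StringWriter out = new StringWriter();
        try {
            p.store(out, "Titan ingest settings");
        } catch (IOException e) {
            // StringWriter performs no real I/O, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Logging the output of `toText` at the start of every run makes it easier to match each throughput or timeout observation to the exact settings that produced it.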


I'm currently running the cluster on 4 Rackspace nodes with 8GB RAM and 4 cores. I'm using Titan 0.3.2 on Java 6, with the embedded Cassandra configuration for Titan. I've noticed the timeout exceptions happen less often when I run the insertion program on one of the nodes instead of on an external machine, but they still happen.


I've followed advice from various places, which improved performance only slightly:

https://groups.google.com/forum/#!topic/aureliusgraphs/FOBy4VBQP44

https://groups.google.com/forum/#!topic/aureliusgraphs/n2M2SS-X_2M


My import code looks roughly like this:


    Configuration conf = new BaseConfiguration();
    conf.setProperty("storage.backend", "cassandrathrift");
    conf.setProperty("storage.hostname", "xx.xx.xx.xx");
    conf.setProperty("storage.connection-timeout", "10000");
    TitanGraph g = TitanFactory.open(conf);

    // good enough for now
    TitanType t = g.getType(V_INDEX);
    if (t == null) {
        g.makeType().name(V_INDEX).indexed(Vertex.class).unique(Direction.OUT).dataType(String.class).makePropertyKey();
        g.makeType().name("connects").makeEdgeLabel();
        g.commit();
    }

    // have tried various transaction sizes here
    BatchGraph<TransactionalGraph> tg = new BatchGraph<TransactionalGraph>(g, VertexIDType.STRING, 25000);

    String line = reader.readLine();

    String a, b, c;
    String[] split;
    Vertex aV, bV;
    Edge edge;
    String edgeID;
    HashSet<String> edgeIDs = new HashSet<String>();

    while (line != null) {
        split = line.split("\t");

        a = split[0];
        b = split[1];
        c = split[10];

        aV = tg.getVertex(a);
        if (aV == null) {
            // sometimes the exception happens here
            aV = tg.addVertex(a);
            aV.setProperty(V_INDEX, a);
        }

        bV = tg.getVertex(b);
        if (bV == null) {
            // sometimes the exception happens here
            bV = tg.addVertex(b);
            bV.setProperty(V_INDEX, b);
        }

        edgeID = a + "|" + b;

        if (!edgeIDs.contains(edgeID)) {
            // sometimes the exception happens here
            edge = tg.addEdge(edgeID, aV, bV, "connects");
            edge.setProperty(E_A_INDEX, a);
            edge.setProperty(E_B_INDEX, b);
            edge.setProperty(E_C_INDEX, c);
            edgeIDs.add(edgeID);
        }

        line = reader.readLine();
    }

    tg.shutdown();



    java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
        at java.lang.Thread.run(Thread.java:662)
    Caused by: com.thinkaurelius.titan.core.TitanException: Could not commit transaction due to exception during persistence
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:848)
        at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:64)
        at com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.stopTransaction(TitanBlueprintsGraph.java:91)
        at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.nextElement(BatchGraph.java:213)
        at com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph.addVertex(BatchGraph.java:338)
        at main.Importer.main(Importer.java:448)
        ... 6 more
    Caused by: com.thinkaurelius.titan.core.TitanException: Unexpected exception during backend operation
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:66)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.save(StandardTitanGraph.java:277)
        at com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:839)
        ... 12 more
    Caused by: com.thinkaurelius.titan.core.TitanException: Permanent exception during backend operation
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:64)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.flushInternal(BufferTransaction.java:96)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction.mutate(BufferTransaction.java:84)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferedKeyColumnValueStore.mutate(BufferedKeyColumnValueStore.java:47)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.CachedKeyColumnValueStore.mutate(CachedKeyColumnValueStore.java:97)
        at com.thinkaurelius.titan.diskstorage.locking.consistentkey.ConsistentKeyLockStore.mutate(ConsistentKeyLockStore.java:121)
        at com.thinkaurelius.titan.diskstorage.BackendTransaction.mutateEdges(BackendTransaction.java:99)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.persist(StandardTitanGraph.java:315)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.access$000(StandardTitanGraph.java:45)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$2.call(StandardTitanGraph.java:270)
        at com.thinkaurelius.titan.graphdb.database.StandardTitanGraph$2.call(StandardTitanGraph.java:203)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:61)
        ... 14 more
    Caused by: com.thinkaurelius.titan.diskstorage.PermanentStorageException: Permanent failure in storage backend
        at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftKeyColumnValueStore.convertException(CassandraThriftKeyColumnValueStore.java:270)
        at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.mutateMany(CassandraThriftStoreManager.java:162)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction$1.call(BufferTransaction.java:99)
        at com.thinkaurelius.titan.diskstorage.keycolumnvalue.BufferTransaction$1.call(BufferTransaction.java:96)
        at com.thinkaurelius.titan.diskstorage.util.BackendOperation.execute(BackendOperation.java:61)
        ... 25 more
    Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:964)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:950)
        at com.thinkaurelius.titan.diskstorage.cassandra.thrift.CassandraThriftStoreManager.mutateMany(CassandraThriftStoreManager.java:160)
        ... 28 more
    Caused by: java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 39 more
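The trace shows that the exception surfaces inside `addVertex` because `BatchGraph.nextElement` calls `stopTransaction`, which commits the underlying Titan transaction; the Cassandra `batch_mutate` call that times out belongs to that commit. The following is a simplified model of that "commit every N elements" behaviour, not BatchGraph's actual source; `CountingBatcher` and the `Committer` interface are hypothetical names.

```java
/** Hypothetical stand-in for the graph's commit call. */
interface Committer {
    void commit();
}

/**
 * Simplified model of committing every bufferSize elements: every add goes
 * through nextElement(), and the add that crosses the threshold also pays
 * for (and reports any failure of) the commit.
 */
class CountingBatcher {
    private final long bufferSize;
    private final Committer committer;
    private long inCurrentTx = 0;

    CountingBatcher(long bufferSize, Committer committer) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.bufferSize = bufferSize;
        this.committer = committer;
    }

    /** Call once per added vertex or edge, before creating it. */
    void nextElement() {
        if (inCurrentTx >= bufferSize) {
            // A storage timeout thrown by commit() would surface here, inside the add call.
            committer.commit();
            inCurrentTx = 0;
        }
        inCurrentTx++;
    }

    /** Commits whatever is left over, as a final shutdown would. */
    void finish() {
        if (inCurrentTx > 0) {
            committer.commit();
            inCurrentTx = 0;
        }
    }
}
```

Under this model, whichever `addVertex` or `addEdge` call happens to cross the threshold is the one that reports the failure, which is consistent with the exception appearing at all three places marked in the import code.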


Any thoughts/comments you have would be appreciated. Thanks!


Dustin


Matthias Broecheler

Sep 4, 2013, 8:15:16 PM
to aureliu...@googlegroups.com
Hey Dustin,

can you try running Titan and Cassandra in separate JVMs (i.e. not embedded)? In other words, set up your Cassandra cluster and run BatchGraph on a separate instance that connects to the cluster.
I suspect that the timeouts might be the result of GC hiccups. Just a hunch.

HTH,
Matthias
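One way to test the GC hunch is to log collector activity next to each commit. This is a minimal JDK-only sketch using the standard `java.lang.management` beans; `GcSnapshot` is a hypothetical helper name. The beans only describe the JVM they run in, so to check Cassandra this would have to run inside the process hosting Cassandra (or that JVM's GC logging could be enabled instead); inside a separate importer process it only shows the client's own collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Hypothetical helper: sums collection counts and accumulated collection
 * time across all garbage collectors of the current JVM.
 */
class GcSnapshot {
    final long collections;
    final long collectionMillis;

    private GcSnapshot(long collections, long collectionMillis) {
        this.collections = collections;
        this.collectionMillis = collectionMillis;
    }

    static GcSnapshot take() {
        long count = 0;
        long millis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector; skip those.
            long c = gc.getCollectionCount();
            long t = gc.getCollectionTime();
            if (c > 0) count += c;
            if (t > 0) millis += t;
        }
        return new GcSnapshot(count, millis);
    }

    /** Describes GC activity between an earlier snapshot and this one. */
    String since(GcSnapshot earlier) {
        return (collections - earlier.collections) + " collections, "
                + (collectionMillis - earlier.collectionMillis) + " ms in GC";
    }
}
```

Taking one snapshot before and one after each commit and logging `after.since(before)` would show whether long collections line up with the timeouts. A large jump in GC time right before a `SocketTimeoutException` would support the hunch, though it would not prove it.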





--
Matthias Broecheler
http://www.matthiasb.com

Annu Sharma

Sep 11, 2017, 10:43:20 PM
to Aurelius
Hi,

I have been trying to resolve the same issue. As Matthias suggested, I have them running in two separate JVMs, but that does not solve the issue. For me, though, the error happens intermittently, during graph server initialization.