TitanGraph with Apache Cassandra Embedded - Performance


Alex Punnen

Jul 15, 2014, 3:55:03 PM7/15/14
to aureliu...@googlegroups.com
Hi,
  I was checking the performance of Titan with Apache Cassandra in embedded mode. Adding around 18 million vertices takes about 1 hour on my laptop (disk write speed ~100 MB/sec). I had expected much faster writes with Apache Cassandra.

I tried standalone mode too, running on localhost, with the same performance. Is there some Cassandra-related setting I am missing, am I doing something wrong, or is this expected? (With JGraphT I am able to hold this many elements in around 2 GB of RAM, so I am wondering whether writes are too costly and whether I should instead design to hold the graph in memory and parallelize across nodes; a common DB is the easiest.)
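For scale, a rough back-of-envelope on the loops below (an estimate, assuming each addEdgeExpiriment call writes exactly what the posted code writes: 2 vertices, 1 edge and 7 properties):

```java
// Back-of-envelope only; the counts per call are read off the posted code.
public class LoadEstimate {
    static long calls()    { return 300_000L * 60; } // outer x inner loop
    static long elements() { return calls() * 3; }   // 2 vertices + 1 edge per call
    static long props()    { return calls() * 7; }   // 2x2 vertex props + 3 edge props

    public static void main(String[] args) {
        System.out.println(calls() + " calls -> " + elements()
                + " graph elements, " + props() + " properties");
        // Finishing in one hour implies roughly this many elements per second:
        System.out.println("~" + (elements() / 3600) + " elements/sec");
    }
}
```

So one hour is ~15,000 graph elements/sec on a single thread, which puts the measured numbers in context.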

long startime = System.currentTimeMillis();
System.out.println("Going to add 300,000 Edges! ");
cellgraph.addVertex(sourceCell);
for (int i = 0; i < 300000; i++) {
    CellLite_itf sourceCellTemp = new CellLite(10, 155, 27414529 + i);
    //cellgraph.addVertex(sourceCellTemp);
    for (int k = 0; k < 60; k++) {
        CellLite_itf targetCellTemp = new CellLite(10, 155, 27412484 + k);
        //cellgraph.addVertex(targetCellTemp);
        cellgraph.addEdgeExpiriment(sourceCellTemp, targetCellTemp);
    }
}
long timetaken = (System.currentTimeMillis() - startime) / 1000;


public void addEdgeExpiriment(CellLite_itf sourceCell, CellLite_itf adjcell) {
    //TransactionalGraph tx = graph.newTransaction(); // only needed for multi-threaded use; did not show much performance gain
    Kpi_itf kpitemp = new Kpi_Edge(1.0, null, 0, 0);
    Vertex sourceT = graph.addVertex(null);
    sourceT.setProperty("EutranCell", Integer.toString(sourceCell.hashCode()));
    sourceT.setProperty("EutranCellObj", sourceCell);
    Vertex targetT = graph.addVertex(null);
    targetT.setProperty("EutranCell", Integer.toString(adjcell.hashCode()));
    targetT.setProperty("EutranCellObj", adjcell);
    Edge edge = graph.addEdge(null, sourceT, targetT, "LNREL");
    edge.setProperty("Kpi_startdate", kpitemp.getStartDate());
    edge.setProperty("Kpi_enddate", kpitemp.getEndDate());
    edge.setProperty("Kpi_value", kpitemp.getValue());
    count++;
    if (count % 10000 == 0) { // I did this to batch writes to disk, though I know there is conf.setProperty("tx-cache-size", 60000)
        System.out.println("10,000 calls to addEdgeExpiriment completed. Invocation=" + count
                + " TimeTaken=" + (System.currentTimeMillis() - starttime));
        starttime = System.currentTimeMillis();
        //tx.commit();
        graph.commit();
    }
}
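A side note on the data model: as written, addEdgeExpiriment creates two brand-new vertices on every call, so each of the 60 inner-loop targets ends up stored 300,000 times over. A minimal, hypothetical sketch of de-duplicating vertices with an in-memory cache keyed by the "EutranCell" value (plain Java, with counters standing in for the Titan calls so it runs stand-alone):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not the code above: vertex/edge creation is stubbed with
// counters so the caching pattern itself can be demonstrated without Titan.
public class VertexCacheSketch {
    static final Map<String, Integer> vertexCache = new HashMap<>();
    static int vertexCount = 0;
    static int edgeCount = 0;

    // Look up the vertex id for a cell key, creating it only on first sight.
    static int getOrCreateVertex(String cellKey) {
        return vertexCache.computeIfAbsent(cellKey, k -> vertexCount++);
    }

    static void addEdge(String sourceKey, String targetKey) {
        getOrCreateVertex(sourceKey);
        getOrCreateVertex(targetKey);
        edgeCount++;
    }

    public static void main(String[] args) {
        // Same shape as the loops in the post, scaled down: 100 sources x 60 targets.
        for (int i = 0; i < 100; i++) {
            for (int k = 0; k < 60; k++) {
                addEdge("src-" + (27414529 + i), "tgt-" + (27412484 + k));
            }
        }
        System.out.println(edgeCount + " edges, " + vertexCache.size() + " distinct vertices");
    }
}
```

Here 6,000 edges touch only 160 distinct vertices instead of 12,000, so the same idea applied to the full run would cut the vertex writes dramatically.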


Here are the settings:

Configuration conf = new BaseConfiguration();
conf.setProperty("storage.backend", "embeddedcassandra");
conf.setProperty("storage.conf-file", "file:///d:/Program_Files/apache-cassandra-2.0.8/conf/cassandra.yaml");

conf.setProperty("ids.block-size", 1000000);
conf.setProperty("storage.buffer-size", 2024);
conf.setProperty("tx-cache-size", 60000);
conf.setProperty("autotype", "blueprints");
conf.setProperty("attributes.allow-all", true);
//conf.setProperty("storage.batch-loading", true);

TitanManagement mgmt = graph.getManagementSystem();
try {
    mgmt.getEdgeLabel("LNREL");
} catch (IllegalArgumentException e) {
    System.out.println("Adding edge label and property keys");
    graph.makeEdgeLabel("LNREL").make();
    graph.makePropertyKey("Kpi_startdate").dataType(Long.class).make();
    graph.makePropertyKey("Kpi_enddate").dataType(Long.class).make();
    graph.makePropertyKey("Kpi_value").dataType(Double.class).make();
    graph.makePropertyKey("EutranCellObj").dataType(CellLite.class).make();
    graph.makePropertyKey("EutranCell").dataType(String.class).make();
}

The Cassandra YAML settings are at their defaults.

Maybe some other setting is needed or misconfigured, leading to poor write speeds (GC is also kicking in heavily). Here is a snapshot:




Alex Punnen

Jul 15, 2014, 4:47:05 PM7/15/14
to aureliu...@googlegroups.com

Here are some snapshots for a run I started anew with conf.setProperty("storage.batch-loading", true);
Pretty slow: 10,000 edges in 50 seconds (10k source and 10k target vertices with 2 properties each, and an edge between them). Or I am using Titan wrongly.

Some info from Cassandra logs

1273666 [OptionalTasks:1] INFO  org.apache.cassandra.db.MeteredFlusher  - flushing high-traffic column family CFS(Keyspace='titan', ColumnFamily='edgestore') (estimated 187163160 bytes)
1273667 [OptionalTasks:1] INFO  org.apache.cassandra.db.ColumnFamilyStore  - Enqueuing flush of Memtable-edgestore@36206096(18716316/187163160 serialized/live bytes, 558696 ops)
1273690 [FlushWriter:9] INFO  org.apache.cassandra.db.Memtable  - Writing Memtable-edgestore@36206096(18716316/187163160 serialized/live bytes, 558696 ops)
1279831 [FlushWriter:9] INFO  org.apache.cassandra.db.Memtable  - Completed flushing d:\var\lib\cassandra\data\titan\edgestore\titan-edgestore-jb-10-Data.db (1033876 bytes) for commitlog position ReplayPosition(segmentId=1405455771097, position=27293243)

I will update the thread.
15-07-2014 23-36-13.jpg
15-07-2014 23-34-46.jpg
15-07-2014 22-52-40.jpg

Gehrig Kunz

Jul 16, 2014, 1:50:21 PM7/16/14
to aureliu...@googlegroups.com
Hey Alex,

I'm part of the PlanetCassandra.org community team, supporting Apache Cassandra. We know a ton of users in the community using Cassandra and Titan. I'd be happy to intro you for some help, if you'd like. Feel free to send over an email gk...@datastax.com, or to tweet me @gehrigds. 

Daniel Kuppitz

Jul 19, 2014, 8:30:49 PM7/19/14
to aureliu...@googlegroups.com
Hi Alex,

I modified your code slightly (basically replaced the type CellLite with HashMaps, enabled batch-loading and used BatchGraph) and it finished successfully after ~80 minutes. That's almost the same time you mentioned in your initial post (since my disk write speed is around 80 MB/sec and yours around 100 MB/sec, I expected a longer execution time). However, this time looks reasonable to me on a single machine with only a single thread. I'm surprised that you saw almost the same time with batch-loading disabled.

Regarding your 2nd post: I'm not quite sure why you see such a dramatic performance drop after enabling batch-loading. Find my Gist with a batch-loading-enabled script here: https://gist.github.com/dkuppitz/2519d61b25d0710131ab
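For reference, a minimal sketch of the BatchGraph pattern (class and method names from Blueprints 2.x; the graph construction and id scheme here are assumptions, not the exact code from the Gist):

```java
import com.tinkerpop.blueprints.TransactionalGraph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph;
import com.tinkerpop.blueprints.util.wrappers.batch.VertexIDType;

public class BatchLoadSketch {
    // graph should be opened with storage.batch-loading=true
    public static void load(TransactionalGraph graph) {
        // Commit automatically every 10,000 elements; cache vertices by id.
        BatchGraph<TransactionalGraph> bgraph =
                new BatchGraph<TransactionalGraph>(graph, VertexIDType.STRING, 10000);
        bgraph.setVertexIdKey("EutranCell"); // external ids stored under this property

        for (int i = 0; i < 300000; i++) {
            Vertex source = getOrCreate(bgraph, "src-" + (27414529 + i));
            for (int k = 0; k < 60; k++) {
                Vertex target = getOrCreate(bgraph, "tgt-" + (27412484 + k));
                bgraph.addEdge(null, source, target, "LNREL");
            }
        }
        bgraph.commit(); // flush the final partial buffer
    }

    // Reuse a vertex if BatchGraph has already seen this id, else create it.
    static Vertex getOrCreate(BatchGraph<?> bgraph, String id) {
        Vertex v = bgraph.getVertex(id);
        return v == null ? bgraph.addVertex(id) : v;
    }
}
```

The point of the wrapper is that it handles both the periodic commits and the vertex de-duplication, so the loading loop stays simple.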

Cheers,
Daniel