Dear folks,
We are using Titan (1.0.0) as the graph data store for our analytics product. The issue we are facing is that we need to transform customers' relational data into graph data and store it in Titan, and that process is very slow. I'm posting here in the hope of getting some suggestions or guidance.
We developed custom ETL code that is multi-threaded for both reading and writing. It queries data from the relational DB, maps the rows into graph vertices and edges according to the customer's configuration, and then stores them in Titan.
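To make the shape of the pipeline concrete, here is a minimal sketch in Python. It is not our actual code: `CONFIG`, `ROWS`, `InMemoryGraph`, `map_row` and `load` are hypothetical names, the graph is an in-memory stand-in rather than the Titan API, and deferring edges to a second pass is an assumption made only to keep the example short.

```python
import queue
import threading

# Hypothetical customer configuration: which column keys a vertex and
# which columns form an edge (source column, target column, edge label).
CONFIG = {"vertex_key": "id", "edge": ("id", "manager_id", "reports_to")}

# Stand-in for rows returned by the relational query.
ROWS = [
    {"id": 1, "name": "alice", "manager_id": None},
    {"id": 2, "name": "bob", "manager_id": 1},
    {"id": 3, "name": "carol", "manager_id": 1},
]


class InMemoryGraph:
    """In-memory stand-in for the graph store; this is NOT the Titan API."""

    def __init__(self):
        self.vertices = {}
        self.edges = []
        self._lock = threading.Lock()

    def add_vertex(self, key, props):
        with self._lock:
            self.vertices[key] = props

    def add_edge(self, src, dst, label):
        with self._lock:
            self.edges.append((src, dst, label))


def map_row(row, config):
    """Map one relational row to a vertex record and at most one edge record."""
    vertex = (row[config["vertex_key"]], dict(row))
    src_col, dst_col, label = config["edge"]
    edge = None
    if row[dst_col] is not None:
        edge = (row[src_col], row[dst_col], label)
    return vertex, edge


def writer(graph, work, pending_edges):
    """Consume rows, write vertices immediately, queue edges for later."""
    while True:
        row = work.get()
        if row is None:  # sentinel: no more rows for this writer
            break
        vertex, edge = map_row(row, CONFIG)
        graph.add_vertex(*vertex)
        if edge is not None:
            pending_edges.put(edge)


def load(rows, num_writers=2):
    graph = InMemoryGraph()
    work = queue.Queue()
    pending_edges = queue.Queue()
    threads = [
        threading.Thread(target=writer, args=(graph, work, pending_edges))
        for _ in range(num_writers)
    ]
    for t in threads:
        t.start()
    for row in rows:   # reader side: feed rows to the writer threads
        work.put(row)
    for _ in threads:  # one sentinel per writer thread
        work.put(None)
    for t in threads:
        t.join()
    # Second pass: add an edge only when both endpoints exist.
    while not pending_edges.empty():
        src, dst, label = pending_edges.get()
        if src in graph.vertices and dst in graph.vertices:
            graph.add_edge(src, dst, label)
    return graph


graph = load(ROWS)
print(len(graph.vertices), len(graph.edges))
```

In the real loader the writes go to Titan and are committed in transactions; the sketch only shows the read, map and write stages and where the threads sit.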
On our 3-node Cassandra cluster, we get an insert rate of around 1k/sec for vertices and around 600/sec for edges (we verify that both vertices exist before adding an edge). That is not very impressive: it took 8 hours to load a data set with 2.5M vertices and 12M edges.
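As a back-of-the-envelope check of those figures, the snippet below estimates the load time from the rates alone. It assumes the rates are sustained and that vertices and edges are loaded one after the other; the constant and function names are just for illustration.

```python
# Figures quoted above; everything else is a simplifying assumption.
VERTICES = 2_500_000
EDGES = 12_000_000
VERTEX_RATE = 1_000  # vertices per second
EDGE_RATE = 600      # edges per second


def load_seconds(count, rate):
    """Time in seconds to insert `count` elements at `rate` elements/sec."""
    return count / rate


vertex_s = load_seconds(VERTICES, VERTEX_RATE)
edge_s = load_seconds(EDGES, EDGE_RATE)
total_hours = (vertex_s + edge_s) / 3600

print(f"vertices: {vertex_s / 60:.0f} min, "
      f"edges: {edge_s / 3600:.1f} h, total: {total_hours:.2f} h")
# prints "vertices: 42 min, edges: 5.6 h, total: 6.25 h"
```

So edge insertion accounts for most of the run, and the gap between this estimate and the observed 8 hours is not explained by these insert rates alone.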
I'm wondering whether there is a better way to perform this kind of task, or at least any tips on where we could start tuning the process. Would titan-hadoop help here?
P.S.
The batch loading config is already turned on.
Thanks