How to ETL data from a relational DB and bulk load it into Titan efficiently


史晔翎

Sep 12, 2016, 2:19:25 AM
to Aurelius
Dear folks,
We are using Titan (1.0.0) as the graph data store of our analytics product. One issue we are facing is that we need to transform customers' relational data into graph data and store it in Titan, and the process is very slow. I'm posting this message to see if we can get some suggestions or guidance here.

We developed our own ETL code, with multi-threading for both reading and writing. The ETL code queries data from the relational DB, maps it to graph vertices and edges according to the customer's configuration, and then stores them in Titan.

On our 3-node Cassandra cluster, we get an insert rate of around 1k/sec for vertices and around 600/sec for edges (we verify whether the vertices exist before adding an edge). That is not very exciting: it took 8 hours to load a data set with 2.5M vertices and 12M edges.
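As a rough sanity check on these figures, the quoted rates can be turned into an expected load time. This is only a back-of-the-envelope sketch: it assumes the rates are sustained for the whole run and that vertices and edges are loaded one after the other.

```python
# Figures quoted in the post above.
vertices = 2_500_000
edges = 12_000_000
vertex_rate = 1_000  # vertices inserted per second
edge_rate = 600      # edges inserted per second

vertex_seconds = vertices / vertex_rate  # time spent on vertices
edge_seconds = edges / edge_rate         # time spent on edges
total_hours = (vertex_seconds + edge_seconds) / 3600

print(f"vertices: {vertex_seconds / 60:.0f} min")
print(f"edges:    {edge_seconds / 3600:.2f} h")
print(f"total:    {total_hours:.2f} h")
```

Under those assumptions the edges account for most of the run, and that is the phase in which the per-edge vertex existence checks happen.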

I'm wondering whether there is a better way to perform this task, or at least any tips on where we could start tuning the process. Would titan-hadoop help with this?

P.S. The batch loading config is already turned on.

Thanks

Stephen Mallette

Sep 15, 2016, 7:39:35 AM
to Aurelius
That's really slow, but I'm not sure what could be wrong. I wouldn't use titan-hadoop for just 12M edges; you just need to figure out what might be amiss with your current loading strategy. I would normally recommend a vertex cache of some sort for this amount of data, but your speed is slow enough that I'd wonder if something else is wrong. What is your commit size? Do you have good indices set up for vertex lookups?
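To make the vertex cache and commit size ideas concrete, here is a minimal sketch in Python. It is store-agnostic and is not the Titan API: `InMemoryGraph`, `find_vertex`, `add_vertex`, `add_edge`, `commit`, `load_edges` and the `commit_size` parameter are all hypothetical names that stand in for whatever the real loader uses. It also assumes a single writer thread and that vertex ids remain valid across commits; a cache shared between threads would need synchronization.

```python
class InMemoryGraph:
    """Hypothetical stand-in for a graph store; NOT the Titan API."""

    def __init__(self):
        self.vertex_ids = {}  # business key -> vertex id
        self.edges = []       # (source id, label, target id)
        self.lookups = 0      # number of store lookups performed
        self.commits = 0      # number of commits performed

    def find_vertex(self, key):
        # Stands in for an index-backed lookup by business key.
        self.lookups += 1
        return self.vertex_ids.get(key)

    def add_vertex(self, key):
        vid = len(self.vertex_ids) + 1
        self.vertex_ids[key] = vid
        return vid

    def add_edge(self, src_id, dst_id, label):
        self.edges.append((src_id, label, dst_id))

    def commit(self):
        self.commits += 1


def load_edges(graph, rows, commit_size=1000):
    """Load (src_key, label, dst_key) rows with a vertex cache and batched commits."""
    cache = {}   # business key -> vertex id, kept for the whole load
    pending = 0  # edges written since the last commit

    def vertex_id(key):
        vid = cache.get(key)
        if vid is None:
            # Only hit the store the first time a key is seen.
            vid = graph.find_vertex(key)
            if vid is None:
                vid = graph.add_vertex(key)
            cache[key] = vid
        return vid

    for src_key, label, dst_key in rows:
        graph.add_edge(vertex_id(src_key), vertex_id(dst_key), label)
        pending += 1
        if pending >= commit_size:
            graph.commit()
            pending = 0
    if pending:
        graph.commit()  # flush the final partial batch
    return cache


graph = InMemoryGraph()
rows = [("a", "knows", "b"), ("b", "knows", "c"), ("a", "knows", "c")]
load_edges(graph, rows, commit_size=2)
print(graph.lookups, graph.commits, len(graph.edges))
```

The point of the cache is that each distinct vertex key costs at most one store lookup, instead of two lookups per edge. The commit size is a tuning knob: committing every edge pays the commit overhead on each write, while very large batches hold more uncommitted state in memory.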
