Strategy for loading about 40 GB data to a 3 node OrientDB cluster

praveen....@tigeranalytics.com

Aug 26, 2016, 11:11:03 AM
to OrientDB
I'm working on a POC with OrientDB and have set it up across 3 servers. I've read the OrientDB documentation and want to
know the best way to load data that is in the form of CSV files. The schema has 3 vertex classes and 3 edge classes, which are
interconnected with one another.

Below are some of the questions I have:

1) Does it make sense, in terms of ETL performance, to create 3 clusters for each of the classes and assign each cluster to one of the servers? (Based on this link: http://orientdb.com/docs/2.2.x/Distributed-Sharding.html. I'm not worried about fault tolerance at this stage.)

2) Regarding the ETL storage process, I'm considering 3 options:
For the 2nd and 3rd methods, I'm required to provide Record IDs manually. My doubt is how to make sure duplicate vertices are not created. Will indexing help avoid this?
How do the above 3 methods compare in terms of performance?

3) Is it possible to store the data on one server of the OrientDB cluster, from within that machine, using the "plocal" option in the ETL tool?

4) Is it possible to use the "plocal" option for ETL even when OrientDB runs in distributed mode?


Luca Garulli

Aug 26, 2016, 2:42:35 PM
to OrientDB

Best Regards,

Luca Garulli
Founder & CEO
