Hi all,
Thanks for all the help!
I'm now following Ziolek's suggestion: use the latest version of Neo4j and its import tool.
Based on the import tool examples, I've also updated my scenario, as described below.
What I actually want to build is social network circles.
I have six months of data in an Oracle database, stored in six tables named TH_07, TH_08, TH_09, TH_10, TH_11 and TH_12 respectively.
Every table has the same schema:
id1 varchar, id2 varchar, relationship_property int, primary key is {id1, id2}
P.S. The same {id1, id2} pair may appear in several tables, with a different relationship_property in each. For example, there is exactly one record {ABC, XYZ, 10} in TH_07, and exactly one record {ABC, XYZ, <some int>} in other tables such as TH_09.
Each table has about 80 to 90 million rows!
By the way, Neo4j and Oracle run on the same machine, which has 256 GB of RAM and a 64-core CPU.
I want to build a graph database in which each id1 and id2 is a node, and for every record (id1, id2, relationship_property) in Oracle there is a relationship between id1 and id2 carrying that relationship_property.
The First Question:
I need the nodes to be UNIQUE by id, so I see three choices:
1. use DISTINCT in Oracle, but the ids are spread over six tables, so this is awkward;
2. use MERGE in Cypher, but it's far too slow! The performance is unacceptable for me;
3. use Python to connect to Oracle and preprocess the data there (with 256 GB of RAM, processing data this large seems feasible).
Is it possible to import 7.5 billion nodes in a single pass from a CSV file?
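On option 1: DISTINCT across six tables may be less awkward than it looks, because SQL's UNION (as opposed to UNION ALL) already removes duplicate rows. The Python sketch below only builds the query text for the six tables; the helper name distinct_ids_sql is hypothetical, and whether Oracle can sort or hash this many ids comfortably on this machine is untested.

```python
# Monthly table names from the post: TH_07 .. TH_12.
TABLES = [f"TH_{m:02d}" for m in range(7, 13)]


def distinct_ids_sql(tables: list[str]) -> str:
    """Hypothetical helper: one query returning every distinct id seen as id1 or id2.

    UNION (unlike UNION ALL) removes duplicates, so no explicit DISTINCT is needed.
    """
    parts = []
    for t in tables:
        parts.append(f"SELECT id1 AS id FROM {t}")
        parts.append(f"SELECT id2 AS id FROM {t}")
    return "\nUNION\n".join(parts)


if __name__ == "__main__":
    print(distinct_ids_sql(TABLES))
```

The result could then be exported to a single CSV of node ids.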
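For option 3 combined with the import tool, one possible sketch is to stream the rows of all six tables, keep a set of ids already seen, and write each id exactly once into a node file. Each table is assumed to be an iterable of (id1, id2, relationship_property) tuples, for example rows fetched from an Oracle cursor. The function names, the User label and the header ("id:ID", ":LABEL") are assumptions modelled on the neo4j-import header conventions, so check them against the documentation for your Neo4j version. Note also that a Python set holding billions of strings may not fit in 256 GB; if it does not, an external sort such as Unix `sort -u` over a file of ids is a safer way to deduplicate.

```python
import csv
from typing import Iterable, Iterator, Tuple

Row = Tuple[str, str, int]  # (id1, id2, relationship_property)


def unique_ids(tables: Iterable[Iterable[Row]]) -> Iterator[str]:
    """Yield each id the first time it is seen, across all tables."""
    seen = set()
    for rows in tables:
        for id1, id2, _prop in rows:
            for node_id in (id1, id2):
                if node_id not in seen:
                    seen.add(node_id)
                    yield node_id


def write_nodes_csv(path: str, tables: Iterable[Iterable[Row]]) -> int:
    """Write a node file for the import tool and return the number of nodes written."""
    count = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header format is an assumption; verify it for your Neo4j version.
        writer.writerow(["id:ID", ":LABEL"])
        for node_id in unique_ids(tables):
            writer.writerow([node_id, "User"])
            count += 1
    return count
```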
The Second Question:
How can I update the relationship_property? For example, I have {ABC, XYZ, 10} in table TH_07 and {ABC, XYZ, 20} in table TH_08, so I'd like the relationship between {ABC} and {XYZ} to end up with 10 + 20 = 30, for simplicity.
1. Should I process this in Python as well?
2. Can I do this in Cypher?
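Summing per pair is a group-by. In Oracle it could be done with SUM(relationship_property) ... GROUP BY id1, id2 over a UNION ALL of the six tables (UNION ALL rather than UNION, since UNION would drop a pair that happens to have the same value in two tables). In Cypher, MERGE with ON CREATE SET / ON MATCH SET can accumulate a value, but that brings back the MERGE performance problem. The Python sketch below does the aggregation in a dictionary and writes one relationship row per pair. The relationship type KNOWS, the function names and the header columns (":START_ID", ":END_ID", ":TYPE", "relationship_property:int") are assumptions; pairs are treated as directed, so {XYZ, ABC} stays separate from {ABC, XYZ}. As with the node set, a dictionary over hundreds of millions of pairs may need more memory than expected.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, int]  # (id1, id2, relationship_property)


def sum_properties(tables: Iterable[Iterable[Row]]) -> Dict[Tuple[str, str], int]:
    """Add up relationship_property for each directed (id1, id2) pair across all tables."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for rows in tables:
        for id1, id2, prop in rows:
            totals[(id1, id2)] += prop
    return dict(totals)


def write_relationships_csv(path: str, totals: Dict[Tuple[str, str], int]) -> None:
    """Write one relationship per pair; header and the KNOWS type are assumptions."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([":START_ID", ":END_ID", "relationship_property:int", ":TYPE"])
        for (id1, id2), total in totals.items():
            writer.writerow([id1, id2, total, "KNOWS"])
```

With the example from the question, {ABC, XYZ, 10} in TH_07 and {ABC, XYZ, 20} in TH_08 yield a single row ABC, XYZ, 30.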
The Third Question:
I've tried LOAD CSV in Neo4j 2.1.6 (Community edition).
The Cypher I used is shown below:
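USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM 'FILEPATH' AS row
// Note: CREATE does not look for existing nodes, so an id appearing in several rows creates several :User nodes
CREATE (n:User {id: row.id1})
CREATE (m:User {id: row.id2})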
However, during the import I encountered an error along the lines of "Kernel error, please restart or recover" (sorry, I did not record the exact message).
The Last Question:
How should I configure the Neo4j server? As you know, I have 7.5 billion nodes and about 100 billion relationships. After importing the data, I need to run computations such as degree centrality, betweenness centrality, closeness centrality and similar measures.
How can I make the most efficient use of this machine?
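Of the measures listed, degree centrality is the cheap one: it needs only a single pass over the relationships. The sketch below computes the normalized form, degree(v) / (n - 1), treating relationships as undirected; since every node here comes from some record, no node is isolated, so n can be taken from the edge list itself. Betweenness and closeness are a different matter: both require shortest paths from many source nodes (Brandes' algorithm, for instance, takes O(nm) time on an unweighted graph), which at this scale is likely to cost far more than the import. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_centrality(edges: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Normalized degree centrality, degree(v) / (n - 1), with edges treated as undirected.

    n is the number of distinct nodes that appear in the edge list.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    if n < 2:
        return {v: 0.0 for v in degree}
    return {v: d / (n - 1) for v, d in degree.items()}
```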
Thank you!
Yours, Jesse
On Tuesday, February 3, 2015 at 4:18:39 PM UTC+8, Jesse Liu wrote: