Hi! Has anyone here worked with the Paradise Papers CSV dataset (https://offshoreleaks.icij.org/pages/database)? The ICIJ used Neo4j for their graph database and offers the data as CSV files at that link. I was able to create the nodes for the graph, but I'm having a tough time creating the relationships from the edges CSV. It is importing as I write (~4 hours), but I'm hoping there is a better way than how I did it!
The difficulty for me, apart from being new to Neo4j, is that the edges CSV contains all the relationships (5 different types), with the node_id of the source and target node given on each row. Each node_id is unique and belongs to a node of one of 5 node types. So I figured I could write a statement (ignoring properties) that reads each CSV row as 'line' and then runs:
MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
CREATE (n1)-[:line.`rel_type`]->(n2);
The problem is that, as far as I can tell, you can't set the relationship type programmatically from a value like line.`rel_type`. So I came up with the following:
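Because the relationship type has to be written literally in the query, one workaround is to split the edges CSV by rel_type first and then run one LOAD CSV per file. A minimal Python sketch using only the standard library, assuming the edges file has a header row with node_1, node_2 and rel_type columns (the names used in the queries here); the edges_<rel_type>.csv output naming is hypothetical:

```python
import csv
from collections import defaultdict
from pathlib import Path


def _safe_name(rel_type):
    # Keep file names portable; assumes distinct rel_type values stay distinct.
    return "".join(c if c.isalnum() or c == "_" else "_" for c in rel_type)


def split_edges_by_type(edges_path, out_dir):
    """Write one CSV per rel_type value and return {rel_type: row_count}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    writers, handles = {}, []
    counts = defaultdict(int)
    with open(edges_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        try:
            for row in reader:
                rel_type = row["rel_type"]
                if rel_type not in writers:
                    # Hypothetical naming scheme: edges_<rel_type>.csv
                    path = out_dir / f"edges_{_safe_name(rel_type)}.csv"
                    fh = open(path, "w", newline="", encoding="utf-8")
                    handles.append(fh)
                    writers[rel_type] = csv.DictWriter(fh, fieldnames=reader.fieldnames)
                    writers[rel_type].writeheader()
                writers[rel_type].writerow(row)
                counts[rel_type] += 1
        finally:
            # Close every per-type file even if a row fails to parse.
            for fh in handles:
                fh.close()
    return dict(counts)


if __name__ == "__main__":
    import sys
    print(split_edges_by_type(sys.argv[1], sys.argv[2]))
```

Each output file then needs only a single, unconditional MERGE with the type spelled out, instead of one FOREACH per type.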
MATCH (n1 {node_id: line.`node_1`}), (n2 {node_id: line.`node_2`})
FOREACH(ignoreMe IN CASE WHEN line.`rel_type`='registered_address' THEN [1] ELSE [] END |
MERGE (n1)-[:REGISTERED_ADDRESS]->(n2)
)
// ... one FOREACH like this for each of the other relationship types
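A related point that may explain part of the slowness: a Neo4j index on node_id belongs to a specific label, so a MATCH like (n1 {node_id: ...}) with no label can't use it and has to scan every node for each row. The sketch below (Python, standard library only) builds one LOAD CSV statement per relationship type with the type and labels written literally into the text. The labels Entity and Address in the usage are hypothetical placeholders for whatever labels the nodes were created with; if node_id was stored as a number rather than a string, line.node_1 would need wrapping in toInteger(). If one type links several node types, one statement per label pair would be needed.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_rel_import(csv_url, rel_type, source_label, target_label):
    """Return one LOAD CSV statement for a single relationship type.

    The type and labels are spliced into the query text literally, so they
    are validated first to keep the generated Cypher well-formed.
    """
    for name in (rel_type, source_label, target_label):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"not a plain identifier: {name!r}")
    if "'" in csv_url:
        raise ValueError("csv_url must not contain a single quote")
    return (
        f"LOAD CSV WITH HEADERS FROM '{csv_url}' AS line\n"
        # Labelled MATCHes so a label-specific node_id index can be used.
        f"MATCH (n1:{source_label} {{node_id: line.node_1}})\n"
        f"MATCH (n2:{target_label} {{node_id: line.node_2}})\n"
        f"MERGE (n1)-[:{rel_type.upper()}]->(n2);"
    )


if __name__ == "__main__":
    # Entity and Address are hypothetical label names.
    print(build_rel_import("file:///edges_registered_address.csv",
                           "registered_address", "Entity", "Address"))
```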
That version works, but really slowly, even with an index on node_id for each node type. It was creating about 25 relationships every 10 seconds, which wasn't going to work for ~400,000 relationships (at that rate it would take roughly 44 hours).
What I ended up doing was loading the CSVs into a MySQL database, using a multi-join query to SELECT one CREATE statement per relationship, saving the output to a file, installing APOC, granting the required permissions, and then running the file with runFile. It is faster now (it will probably take 4-5 hours) but seems overly complicated. I'm hoping someone has a better way of doing it!
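The MySQL step is essentially a join from node_id to node type, which can also be done in memory with a dictionary. A hedged Python sketch, assuming each node type was exported to its own CSV with a node_id column (the file paths and label names passed in are hypothetical): it maps every node_id to its label, then groups the edges by (source label, rel_type, target label), so each group could be loaded with one LOAD CSV whose labels and type are literal, instead of one generated CREATE per relationship.

```python
import csv
from collections import defaultdict


def load_node_labels(node_files):
    """Map every node_id to its node type.

    node_files: {label: path_to_node_csv}; each CSV is assumed to have a
    'node_id' column. Raises if a node_id appears under two labels.
    """
    labels = {}
    for label, path in node_files.items():
        with open(path, newline="", encoding="utf-8") as fh:
            for row in csv.DictReader(fh):
                node_id = row["node_id"]
                if labels.setdefault(node_id, label) != label:
                    raise ValueError(f"node_id {node_id} has two labels")
    return labels


def group_edges(edges_path, labels):
    """Group edges by (source label, rel_type, target label).

    Edges whose endpoints are not in `labels` are returned separately
    in `missing` rather than silently dropped.
    """
    groups = defaultdict(list)
    missing = []
    with open(edges_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            src = labels.get(row["node_1"])
            tgt = labels.get(row["node_2"])
            if src is None or tgt is None:
                missing.append(row)
                continue
            groups[(src, row["rel_type"], tgt)].append((row["node_1"], row["node_2"]))
    return dict(groups), missing
```

At ~400,000 edges the groups should fit in memory on an ordinary machine, and the missing list makes it easy to spot edges that point at node_ids not present in any node file.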
Ideas? :)