Memory allocation when using the Neo4j Shell to import from CSV

Arielle Bonnici

Jan 27, 2016, 11:13:25 AM
to Neo4j
I'm currently running a test with Neo4j CE 2.3.1 on a Windows 7 machine with 4GB of memory, and I'm trying to understand how to manage memory allocation when importing from CSV using the Neo4j Shell.

I am running these two commands: the first creates the nodes and the second creates the edges (one edge per node).

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
CREATE (:EVENT { eventID: line.eventID, name: line.name, referrer: line.referrer, sessionID: toInt(line.sessionID), timestamp: toInt(line.timestamp), pID: toInt(line.pID)});

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line 
MATCH (f:Feature { name: line.name })
MATCH (e:EVENT) 
WHERE e.eventID = line.eventID
MERGE (e)-[:FOR]->(f);

I have the following related indexes and constraints:

Indexes                                                          
  ON :EVENT(eventID) ONLINE (for uniqueness constraint) 
  ON :Feature(name)  ONLINE (for uniqueness constraint) 

Constraints
  ON (feature:Feature) ASSERT feature.name IS UNIQUE
  ON (event:EVENT) ASSERT event.eventID IS UNIQUE
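For reference, uniqueness constraints like the two listed above can be declared with statements of the following form. This is a minimal sketch in Cypher 2.x syntax, not necessarily the exact statements that were run against this database.

```cypher
// Cypher 2.x syntax; each uniqueness constraint also creates its backing index,
// which is why the indexes above are listed as "for uniqueness constraint".
CREATE CONSTRAINT ON (feature:Feature) ASSERT feature.name IS UNIQUE;
CREATE CONSTRAINT ON (event:EVENT) ASSERT event.eventID IS UNIQUE;
```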

When I have 5 million nodes in the db and try to load a CSV that has another 5 million nodes, it takes about 15 minutes to complete and memory usage reaches ~1.5GB. If I immediately run the second command to create the edges, memory usage starts going up again and sometimes the import stalls. To make sure the second command works, I have to restart Neo4j.

I'm trying to understand whether I can improve this by optimizing the commands, or whether specifying memory settings in the properties file might help. If so, what is the best way to go about that?

Michael Hunger

Jan 27, 2016, 11:29:52 AM
to ne...@googlegroups.com
Can you try it on 2.3.2 too?
In general your code looks OK. Can you share your query plan?
To see the plan, prefix your query with EXPLAIN and remove the USING PERIODIC COMMIT.
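Applied to the second command, that could look roughly like the sketch below. The lookup of Feature by line.name is an assumption, chosen to be consistent with the :Feature(name) index lookup in the plan reported later in this thread.

```cypher
// EXPLAIN compiles the query and prints the plan without executing it,
// so no relationships are created.
// USING PERIODIC COMMIT has been removed, as suggested above.
// Assumption: Feature is looked up by line.name.
EXPLAIN
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
MATCH (f:Feature { name: line.name })
MATCH (e:EVENT)
WHERE e.eventID = line.eventID
MERGE (e)-[:FOR]->(f);
```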

How big is your neo4j store on disk?

Michael



Arielle Bonnici

Jan 28, 2016, 6:13:30 AM
to Neo4j
Hi Michael,

I tried with 2.3.2, starting with a fresh db that had 10 nodes in it. I then ran the first command to import 5 million nodes from CSV. This took 12 minutes, and when it finished it was using 1.6GB memory. Size on disk was 2.5GB.

I ran the second command and it created the 5 million edges in 8 minutes, after which it was using 1.8GB memory and size on disk was 3.32GB. A few minutes later memory usage went down to 1.3GB.

Next I ran the first command again on another CSV file which contained 5 million events too. It took 15 minutes to create the nodes, was using 2.2GB memory and size on disk was 5.9GB.

When I ran the second command on this file it completed in 8 minutes and was still using 2.2GB memory. Size on disk was at 6.8GB.

After that I ran another command similar to the second one, which creates another edge for each node. It completed in 8 minutes and memory was at 2.3GB.

So up to now it does seem to be a bit better in that it doesn't stall.

When I prefix the second command with EXPLAIN this is what I'm getting:

Compiler CYPHER 2.3

Planner RULE

Runtime INTERPRETED

+--------------+-----------------------+-------------------------------+
| Operator     | Identifiers           | Other                         |
+--------------+-----------------------+-------------------------------+
| +EmptyResult |                       |                               |
| |            +-----------------------+-------------------------------+
| +Merge(Into) | anon[167], e, f, line | (e)-[:FOR]->(f)               |
| |            +-----------------------+-------------------------------+
| +SchemaIndex | e, f, line            | line.eventID; :EVENT(eventID) |
| |            +-----------------------+-------------------------------+
| +SchemaIndex | f, line               | line.name; :Feature(name)     |
| |            +-----------------------+-------------------------------+
| +LoadCSV     | line                  |                               |
+--------------+-----------------------+-------------------------------+

Total database accesses: ?


Regards,

Arielle