Slow shell performance when piping commands


Andy Law

Jul 5, 2017, 2:59:51 AM
to Neo4j
I'm trying to load data representing two node types and one set of relationships between them.

The data is loaded from CSV files with headers.

Working in the browser interface, loading the two node data sets (32 and 17717 nodes respectively) and the relationships between them takes less than 6 seconds.

Working in cypher-shell and pasting the commands in one after the other (the same as in the browser interface), the data load takes about the same length of time.

Putting the commands into a text script file and piping it into cypher-shell takes 9 minutes and 55 seconds (nearly 10 minutes).

What is going on here? Anyone else seen this?

For the record, I'm on a Late 2012 27-inch iMac with 24GB RAM, running OS X 10.11.6 with Java 1.8.0_131-b11. Neo4j is 3.2.1 Community Edition.

Michael Hunger

Jul 5, 2017, 4:29:11 AM
to ne...@googlegroups.com
Can you share your commands and their count, and the size of the CSV files?

Also the output of :schema.


LAW Andy

Jul 5, 2017, 6:43:28 AM
to ne...@googlegroups.com
Michael,

As discussed on the Slack channel, the problem does seem to go away with a brand new clean instance of neo4j.

Commands are as follows…

load csv with headers from "file:///chromosomes.txt" AS line
create (:Chromosome { name: line.`Chromosome/scaffold name`});

create index on :Chromosome(name);

using periodic commit 1000
LOAD CSV with headers FROM "file:///genes.txt" AS line
create ( :Gene { id: line.`Gene stable ID`, name: line.`Gene name`, description: line.`Gene description`, type: line.`Gene type`});

create index on :Gene(id);

using periodic commit 1000
load csv with headers from "file:///genes-to-chromosomes.txt" as line
match (chromosome:Chromosome {name: line.`Chromosome/scaffold name`})
match (gene:Gene {id: line.`Gene stable ID`})
merge (gene)-[:MAPS_TO {start: toInteger(line.`Gene start (bp)`), end: toInteger(line.`Gene end (bp)`), strand: toInteger(line.Strand)}]->(chromosome);

It’s the last one that takes the time.

File parameters are:

chromosomes.txt - 33 lines, 112 bytes
genes.txt - 17718 lines, 1.4 MB
genes-to-chromosomes.txt - 17718 lines, 720 KB

So not a significant data set.


Output of :schema is…

Indexes
   ON :Chromosome(name) ONLINE 
   ON :Gene(id) ONLINE 
   ON :Transcript(id) ONLINE 

No constraints


Later,

Andy

LAW Andy

Jul 5, 2017, 2:08:59 PM
to ne...@googlegroups.com
OK. So this turns out to be a timing issue: when the script is piped in, the final load starts before the newly created indexes are available.

Inserting a call to db.awaitIndex() immediately after each index creation fixes the problem.
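
For anyone finding this later, a minimal sketch of what the index section of the script might look like with the waits added. It assumes the Neo4j 3.2 built-in procedure takes the index description as a string plus an optional timeout in seconds; the 300-second timeout is an illustrative value, not something taken from the original script.

```cypher
// Create the index, then block until it is ONLINE before continuing.
// Assumed signature: db.awaitIndex(index, timeOutSeconds)
create index on :Chromosome(name);
call db.awaitIndex(":Chromosome(name)", 300);

create index on :Gene(id);
call db.awaitIndex(":Gene(id)", 300);

// The relationship load then runs with both indexes available to the MATCH clauses.
using periodic commit 1000
load csv with headers from "file:///genes-to-chromosomes.txt" as line
match (chromosome:Chromosome {name: line.`Chromosome/scaffold name`})
match (gene:Gene {id: line.`Gene stable ID`})
merge (gene)-[:MAPS_TO {start: toInteger(line.`Gene start (bp)`), end: toInteger(line.`Gene end (bp)`), strand: toInteger(line.Strand)}]->(chromosome);
```

When the commands are pasted by hand, the pause between statements is probably long enough for the indexes to come online on their own, which would explain why the slowdown only showed up when piping.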

Later,

Andy
