Extremely slow query - Import relationships from CSV


Eugene C.

Mar 20, 2015, 4:12:11 AM
to ne...@googlegroups.com
Hey everyone, 

I have some issues importing a set of relationships from a CSV file. 

I'm running Neo4j 2.1.7 on Mac OS X (10.9.5) with 16GB RAM.

The file has the following schema: 

user_id, shop_id
1,230
1,458
1,783
2,942
2,123
etc.

It contains about 2M records (relationships). 

Here is the query I'm running using the browser UI (I was also trying to do the same with a REST call):
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file://path/to/my/file.csv" AS relation
MATCH (user:User {id: relation.user_id})
MATCH (shop:Shop {id: relation.shop_id})
MERGE (user)-[:LIKES]->(shop)

This query takes ages to run, about 800 seconds. I do have indexes on :User(id) and :Shop(id). Created them with: 
CREATE INDEX ON :User(id) 
CREATE INDEX ON :Shop(id)

Any ideas on how to increase the performance?

Thanks

Michael Hunger

Mar 20, 2015, 6:43:04 AM
to ne...@googlegroups.com
Remove the space before shop_id in the header line of the file (user_id, shop_id).

try to run:

LOAD CSV WITH HEADERS FROM "file:test.csv" AS r RETURN r.user_id, r.shop_id LIMIT 10;

to see if it is loaded correctly

Also make sure that you didn't store the ids as numeric values in the first place. If you did, you have to use toInt(r.shop_id), because LOAD CSV reads every field as a string.
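
To make the two pitfalls above concrete, here is a minimal sketch in Python rather than Cypher. Python's csv module is only a stand-in for a parser that keeps header whitespace; it is not Neo4j's parser, and the variable names are made up for illustration. The sample data is the header and first rows from the original post.

```python
import csv
import io

# Header and first rows copied from the post, including the space after
# the comma in the header line.
raw = "user_id, shop_id\n1,230\n1,458\n"

# DictReader keeps header whitespace by default (skipinitialspace=False).
rows = list(csv.DictReader(io.StringIO(raw)))
first = rows[0]

# Pitfall 1: the second column is keyed by " shop_id" (leading space),
# so a lookup by "shop_id" finds nothing.
header_keys = list(first.keys())
missing = first.get("shop_id")      # None
present = first[" shop_id"]         # "230"

# Pitfall 2: every field arrives as a string, so it never equals a numeric
# id until it is converted (the analogue of toInt in Cypher).
matches_as_string = first["user_id"] == 1
matches_as_int = int(first["user_id"]) == 1

print(header_keys, missing, present, matches_as_string, matches_as_int)
```

If either pitfall applies, the MATCH clauses find no nodes (or fall back to comparing against the wrong type), so it is worth ruling both out before looking at the relationship matching itself.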

M

Eugene C.

Mar 24, 2015, 10:58:25 AM
to ne...@googlegroups.com
Yes, it's loaded correctly. The values are also strings; I already checked that.
Here is one more query I tried:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file://path/to/file.csv" AS relation
MATCH (user:User {id: relation.user_id})-[likes:LIKES]->(shop:Shop {id: relation.shop_id})
return count(likes)

This query also takes about 800-900 seconds. I believe the previous query is slow for the same reason: matching the existing relationship is slow. Is it possible to improve the performance here, or is this just the way Neo4j works?

Michael Hunger

Mar 26, 2015, 5:14:14 PM
to ne...@googlegroups.com
If you have an index on :User(id) and/or :Shop(id), it should be fast.

How long does this take?

LOAD CSV WITH HEADERS FROM "file://path/to/file.csv" AS relation
RETURN count(*);

and this:

LOAD CSV WITH HEADERS FROM "file://path/to/file.csv" AS relation
MATCH (user:User {id: relation.user_id})
RETURN count(*);

and this

LOAD CSV WITH HEADERS FROM "file://path/to/file.csv" AS relation
MATCH (shop:Shop {id: relation.shop_id})
RETURN count(*);

and

LOAD CSV WITH HEADERS FROM "file://path/to/file.csv" AS relation
MATCH (shop:Shop {id: relation.shop_id})<-[likes:LIKES]-(user:User {id: relation.user_id})
RETURN count(*);

Michael
