Neo4j is pretty slow when trying to create relationships between existing nodes with a large amount of data


Praveen T

Jun 15, 2016, 8:41:49 PM
to Neo4j
Hi,
     I am working on the latest version of Neo4j. I have been trying to model a graph by parsing 13 million lines of syslog information from network switches. Some of the nodes in the graph have already been created, and I am trying to insert some more information into the graph. I am using the official Python driver to insert data into the server, which is hosted on a machine with a pretty good configuration. It has been running for at least 1.5 days but has only gotten through about 4 lakh (400,000) lines. I have been using the transaction framework for working on the graph. Can someone help me understand why the insertion is taking so long, and give me tips on how to make it faster?


Thanks
Praveen

Michael Hunger

Jun 15, 2016, 8:54:45 PM
to ne...@googlegroups.com
You have to share more detail of what you're doing.

Make sure to have an index / constraint on the label + property you are matching / merging.

It also makes sense, if you are importing larger amounts, to batch the updates into e.g. 1k to 10k blocks.
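
For context, a minimal sketch of what the index/constraint setup might look like from the Python side, assuming the 2016-era neo4j-driver 1.x and the label/property names that come up later in the thread (:Syslog(id), :User(id)) -- adjust to whatever you actually match/merge on:

# Minimal sketch, assuming neo4j-driver 1.x (the official Python driver at the time)
# and assumed labels/properties (:Syslog(id), :User(id)).
from neo4j.v1 import GraphDatabase, basic_auth

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=basic_auth("neo4j", "password"))
session = driver.session()

# Unique constraints also create an index, so MATCH/MERGE on these
# properties becomes an index lookup instead of a full label scan.
session.run("CREATE CONSTRAINT ON (s:Syslog) ASSERT s.id IS UNIQUE")
session.run("CREATE CONSTRAINT ON (u:User) ASSERT u.id IS UNIQUE")

session.close()
driver.close()

Without such an index or constraint, every MERGE has to scan all nodes with that label, which gets slower as the graph grows.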





Praveen T

Jun 20, 2016, 5:53:08 PM
to Neo4j
Hi,
    I have a number of syslog lines of the format <date> <pc> <Failed Password> <username> <ip address>. I am trying to insert them into a Neo4j 3.0.2 Community Edition graph. PC and IP may repeat, so I first create a node with a unique id for the particular syslog, then match to get the other nodes if they exist (otherwise I create them) and assign relationships between the nodes based on the information in the log line I parse. I am using Python-based transactions to insert the data into the graph. Can you point me to the doc where the batch insertion APIs are explained? I thought transactions would do that in the Python driver.

Michael Hunger

Jun 20, 2016, 6:26:44 PM
to ne...@googlegroups.com
Can you share your queries and the output of ":schema" ?

You'd do the tx batching in Python.

You should also use parameters.

You could also do it like this (send in a list of 10k dicts as the parameter "rows"):

MERGE (log:Syslog {id:{log_id}})
UNWIND {rows} AS row
CREATE (log)-[:CONTAINS]->(l:Line {logline: row.line_no, time: row.timestamp})
MERGE (u:User {id: row.user})
CREATE (l)-[:USER]->(u)
etc.

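For what it's worth, a minimal sketch of how that batched, parametrized query could be driven from the Python driver; the URI, credentials, batch size, and row field names (line_no, timestamp, user) are assumptions, and the {param} placeholder syntax matches Neo4j 3.0:

# Minimal sketch, assuming neo4j-driver 1.x; field names, batch size, and
# connection details are assumptions, not taken from the thread.
from neo4j.v1 import GraphDatabase, basic_auth

QUERY = """
MERGE (log:Syslog {id:{log_id}})
WITH log
UNWIND {rows} AS row
CREATE (log)-[:CONTAINS]->(l:Line {logline: row.line_no, time: row.timestamp})
MERGE (u:User {id: row.user})
CREATE (l)-[:USER]->(u)
"""

def insert_in_batches(session, log_id, parsed_lines, batch_size=10000):
    # Send the parsed rows to the server in blocks of batch_size dicts,
    # one round trip per block instead of one per log line.
    batch = []
    for row in parsed_lines:
        batch.append(row)
        if len(batch) >= batch_size:
            session.run(QUERY, {"log_id": log_id, "rows": batch})
            batch = []
    if batch:
        session.run(QUERY, {"log_id": log_id, "rows": batch})

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=basic_auth("neo4j", "password"))
session = driver.session()
parsed = [  # stand-in for the rows produced by the syslog parser
    {"line_no": 1, "timestamp": "Jun 15 20:41:49", "user": "root"},
    {"line_no": 2, "timestamp": "Jun 15 20:41:51", "user": "admin"},
]
insert_in_batches(session, "switch-42.syslog", parsed)
session.close()
driver.close()

Each session.run here commits one batch as its own auto-commit transaction; an explicit transaction could group several batches if needed.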
