Creating huge graphs from xml files


Pablo Ramirez

Oct 3, 2012, 6:25:48 PM
to ne...@googlegroups.com
Hi, I'm working on a project in which we have to create graphs by reading an XML file. On a Core 2 Duo with 6 GB of RAM the import takes 30 minutes on Windows and 16 minutes on Fedora,
but on other computers (Core i5 and Core i7) it takes about 3 minutes. What can I do to speed up the process? Why is it taking so much time?

I tried modifying the configuration file neo4j.properties, but it has no effect on the creation process. Any ideas about that?

Thanks.

Michael Hunger

Oct 3, 2012, 6:48:21 PM
to ne...@googlegroups.com
Can you share more details?

versions used,
code used,
configuration of both machines (hardware, OS, and neo4j config)

Probably a slow disk (non-SSD?).
How much memory does the other machine have?

Did you try to run a profiler (like VisualVM or YourKit)?

etc.

Michael

Pablo Ramirez

Oct 4, 2012, 12:07:13 AM
to ne...@googlegroups.com
Neo4j version: 1.9-SNAPSHOT
1st computer:
- Core 2 Duo 2.33 GHz
- 6GB RAM DDR2 666MHz
- HD 500 GB SATAII
- Windows 7 and Fedora Installed in the same computer.
- Windows 7 30 minutes, Fedora 16 minutes

2nd computer:
- Core i5 (4 cores)
- 4GB DDR3 1333MHz
- HD 500 GB SATAII
- Windows 7, takes about 3 minutes

I'm programming with the Eclipse IDE.
I attached the neo4j configuration file and the code of my application.
What are VisualVM and YourKit?

GraphDbInstance.java
GraphNodes.java
neo4j.properties

Michael Hunger

Oct 4, 2012, 2:02:53 AM
to ne...@googlegroups.com
Pablo,

Several things:

#0 Your properties file contains a lot of values that are unsuitable for your system size. If you add up the memory sizes you give for memory-mapped I/O, the total is several times the amount of RAM you have. I think for your import you can just leave out the properties file.
#1 You should only have one gdb instance for your import; otherwise you lose your caches in between.
#2 Your tx size is too small (just 1 node or rel). You should only commit (tx.success(); tx.finish();) every 10k nodes/rels and then create a new tx.
#3 Instead of the values and elements lists, why not use a map with key-value pairs?
#4 Don't create the rels with Cypher if you have an embedded db instance at hand. Use gdb.getNodeById() if you have stored the actual node id (which is internal) in your id mapping somewhere.
#5 I doubt that your id lookup works. What you want is either an auto-index on "id" or a manual index (gdb.index().forNodes("node"); index.add(node,"id",id);) and then use index.get("id",id).getSingle() for the lookup.
#6 Then use node.createRelationshipTo(node2,type) and rel.setProperty().
#7 Same problem with tx size: in your case it is one tx per relationship.
#8 If you want to use Cypher you MUST use parameters for your passed-in values (i.e. node properties or ids to look up); otherwise Cypher has to parse the query for each execution.

HTH

Michael
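
A minimal sketch of the batching advice in #2 and #7, assuming nothing about the attached code. The `Tx` and `TxSource` interfaces are hypothetical stand-ins that only mirror the tx.success() / tx.finish() calls named above, so the example runs without Neo4j on the classpath; `BatchedImport` and `importAll` are illustrative names as well.

```java
import java.util.function.Consumer;

// Hypothetical stand-ins, NOT the Neo4j API: they only mirror the
// tx.success() / tx.finish() calls mentioned in the post above.
interface Tx {
    void success();
    void finish();
}

interface TxSource {
    Tx beginTx();
}

final class BatchedImport {
    /**
     * Applies `write` to every item, committing after each `batchSize` items
     * and once more at the end for a partial last batch.
     * Returns the number of non-empty batches that were committed.
     */
    static <T> int importAll(TxSource db, Iterable<T> items, int batchSize,
                             Consumer<? super T> write) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inBatch = 0;
        Tx tx = db.beginTx();
        try {
            for (T item : items) {
                write.accept(item);
                if (++inBatch == batchSize) {
                    // Commit the full batch and open a fresh transaction.
                    tx.success();
                    tx.finish();
                    commits++;
                    inBatch = 0;
                    tx = db.beginTx();
                }
            }
            // Mark the last (possibly empty) transaction as successful.
            tx.success();
            if (inBatch > 0) {
                commits++;
            }
        } finally {
            // Without a preceding success(), this would discard the open batch.
            tx.finish();
        }
        return commits;
    }
}
```

With a batch size of 10,000 as suggested in #2, an import of 25,000 items would go through three commits instead of 25,000.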


Pablo Ramirez

Oct 4, 2012, 2:29:21 AM
to ne...@googlegroups.com
Thank you very much, Michael. I will try that.


Pablo Ramirez

Oct 5, 2012, 8:44:36 AM
to ne...@googlegroups.com
Hi,
I've put the creation of nodes and relationships inside one transaction tx and it took about 2 minutes.
How can I use the map with key-value pairs when I'm reading from an XML file and therefore don't know how many properties the node has?

The number of properties of a node can be dynamic: the first node could have 10 properties, the next 15, and so on.

Michael Hunger

Oct 5, 2012, 8:52:50 AM
to ne...@googlegroups.com
What for? Cypher?
Just fill a HashMap with the attribute key-value pairs and pass it as a named parameter.

In the core API you just call setProperty for each attribute.

Michael
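
A small sketch of the "fill a hashmap" step for nodes with a varying number of properties. It assumes the properties arrive as XML attributes on one element, which the thread does not confirm; `XmlProps`, `attributesOf` and `parseRoot` are illustrative names, and only the JDK's DOM parser is used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

final class XmlProps {
    /** Copies every attribute of an XML element into a map, however many there are. */
    static Map<String, Object> attributesOf(Element element) {
        Map<String, Object> props = new LinkedHashMap<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            props.put(attrs.item(i).getNodeName(), attrs.item(i).getNodeValue());
        }
        return props;
    }

    /** Parses a small XML string and returns its root element (helper for the example). */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```

The resulting map can be handed over whole as a Cypher parameter, or iterated with one setProperty call per entry when using the core API, so the code never needs to know the property count in advance.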


Pablo Ramirez

Oct 9, 2012, 2:01:52 AM
to ne...@googlegroups.com
Hi Michael,
What do you mean by only one gdb instance?
How can I use a map with key-value pairs?

Thanks Michael.

Michael Hunger

Oct 9, 2012, 2:15:35 AM
to ne...@googlegroups.com
On 09.10.2012 at 08:01, Pablo Ramirez <ramirezs...@gmail.com> wrote:

> Hi Michael,
> What do you mean by only one gdb instance?

You create two graph-database instances, one for nodes and one for relationships, and shut down in between. Use one for both and shut down after you're finished.

> How can I use a map with key-value?

In Cypher you pass it as a parameter; in the core API it is an individual setProperty call per property, so no map.
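
A rough sketch of the "one instance for both phases" idea, combined with the id mapping from the earlier list of suggestions. The `Graph` interface is a hypothetical stand-in, not the Neo4j API, and `TwoPhaseImport` is an illustrative name; the point is only that nodes and relationships are created on the same instance and shutdown happens once, at the end.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an embedded graph database; NOT the Neo4j API.
interface Graph {
    long createNode();                                          // returns the internal node id
    void createRelationship(long from, long to, String type);
    void shutdown();
}

final class TwoPhaseImport {
    /**
     * Creates all nodes, then all relationships, on the same instance.
     * `edges` holds {fromXmlId, toXmlId, type} triples.
     * Returns the mapping from XML id to internal node id.
     */
    static Map<String, Long> run(Graph graph, List<String> xmlIds, List<String[]> edges) {
        Map<String, Long> idMapping = new HashMap<>();
        try {
            // Phase 1: create nodes and remember their internal ids.
            for (String xmlId : xmlIds) {
                idMapping.put(xmlId, graph.createNode());
            }
            // Phase 2: resolve both ends through the mapping, no query needed.
            for (String[] edge : edges) {
                Long from = idMapping.get(edge[0]);
                Long to = idMapping.get(edge[1]);
                if (from == null || to == null) {
                    throw new IllegalStateException("unknown node id in edge " + edge[0] + " -> " + edge[1]);
                }
                graph.createRelationship(from, to, edge[2]);
            }
        } finally {
            // Single shutdown, only after both phases are done.
            graph.shutdown();
        }
        return idMapping;
    }
}
```

Keeping the instance open between the two phases is what preserves the caches mentioned earlier in the thread; the in-memory map replaces a per-relationship lookup query.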
