Your approach sounds sensible. Really good choices, and a good understanding of the possibilities.
Of course, using a Java (batch) importer would be faster (probably by one to two orders of magnitude).
What you can do with the REST API is use multi-threading (up to a sensible limit) to insert your data in parallel.
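A minimal sketch of the parallel-insert idea in Python, assuming a local server at the default address and the plain node-creation endpoint. The helper names (post_node, insert_parallel) and the worker count of 4 are made up for illustration; the thread pool is bounded so the server is not flooded.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed address of a local Neo4j server; adjust to your setup.
NODE_URL = "http://localhost:7474/db/data/node"


def post_node(properties, url=NODE_URL):
    """POST one node's properties as JSON and return the decoded response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(properties).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def insert_parallel(records, send=post_node, max_workers=4):
    """Send all records through a bounded thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, records))
```

Because pool.map returns results in input order, the responses can be zipped back against the input records, for example to build a lookup from your own IDs to the created nodes.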
During the insert you might set cache_type to weak in neo4j.properties.
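As a config fragment, that would look roughly like this (assuming the file lives under conf/ in the server installation; revert it after the import if you prefer the default cache):

```
# conf/neo4j.properties, for the duration of the bulk insert
cache_type=weak
```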
-Xmx4G should be OK. It might be interesting to try a bigger heap, but then increase the young generation size to, say, 2-3G so that you don't run into long GC pauses.
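For illustration, the JVM flags for such a setup could look like the line below. The 6G heap is only an example value for "bigger", not a recommendation; -Xmn is the HotSpot flag that fixes the young generation size. Where the flags go depends on how you start the server.

```
-Xmx6G -Xmn2G
```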
After the import, have a look at your data/graph.db directory and adjust the memory-mapping settings in neo4j.properties to match the file sizes on disk, so that Neo4j is able to memory-map all the files completely.
In the end, most of the time will be spent on JSON string parsing and formatting, and on building up the results of the REST batch importer in memory.
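A small Python sketch of that sizing step: it reads the store file sizes and prints matching settings, rounded up to whole megabytes. The file-to-setting mapping below is an assumption based on the Neo4j 1.x naming scheme, and suggest_mapped_memory is a hypothetical helper; check both against your version before pasting the output into neo4j.properties.

```python
import os

# Store files and the neo4j.properties keys assumed to correspond to them
# (Neo4j 1.x naming; verify against your version).
STORE_SETTINGS = {
    "neostore.nodestore.db": "neostore.nodestore.db.mapped_memory",
    "neostore.relationshipstore.db": "neostore.relationshipstore.db.mapped_memory",
    "neostore.propertystore.db": "neostore.propertystore.db.mapped_memory",
    "neostore.propertystore.db.strings": "neostore.propertystore.db.strings.mapped_memory",
    "neostore.propertystore.db.arrays": "neostore.propertystore.db.arrays.mapped_memory",
}


def suggest_mapped_memory(graph_dir):
    """Return 'key=<n>M' lines sized to cover each store file found on disk."""
    lines = []
    for filename, key in STORE_SETTINGS.items():
        path = os.path.join(graph_dir, filename)
        if not os.path.isfile(path):
            continue  # skip store files that do not exist in this database
        # Ceiling division: round the file size up to whole megabytes.
        size_mb = -(-os.path.getsize(path) // (1024 * 1024))
        lines.append("%s=%dM" % (key, max(size_mb, 1)))
    return lines
```

Calling suggest_mapped_memory("data/graph.db") and printing the returned lines gives a starting point; the total still has to fit in RAM next to the JVM heap.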
Otherwise your data seems like a really good fit.
Will the server run on the Mac in the end?
What language / library are you using for accessing the REST API?
HTH
Michael
Hi Michael,
Thanks for answering!
If a Java-based importer will give me a 10x bump, I'm definitely going to try that. As I understand it, you just create an embedded DB from Java and later copy it into the server installation, right? One more thing: can you create full-text automatic indexes through the Java API (instead of exact ones)?
The import script is Python. The graph is built in Hadoop. My hacked-together Python code takes the output files from Hadoop and talks to the REST API directly (no library). I get two files out of the Hadoop job, one with a list of nodes and one with a list of edges. Nodes are identified by a domain-specific ID. I create the nodes first and then keep an in-memory map of domain-specific ID -> node ID, so that I don't have to look up the node IDs through an index or anything again when creating the edges. It would be involved to turn this into something multi-threaded. Rewriting in Java is a lot less work...
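The two-pass scheme described above can be sketched as follows. This is an illustration, not the actual script: import_graph, create_node, and create_edge are hypothetical names, and the two callables stand in for whatever code performs the REST calls.

```python
def import_graph(nodes, edges, create_node, create_edge):
    """Create nodes first, remember their server-side IDs, then create edges.

    nodes: iterable of (domain_id, properties)
    edges: iterable of (source_domain_id, target_domain_id, rel_type)
    create_node / create_edge: callables that perform the actual REST calls
    (hypothetical helpers, injected so the logic can run without a server).
    """
    node_ids = {}
    for domain_id, properties in nodes:
        node_ids[domain_id] = create_node(properties)

    created = 0
    for source, target, rel_type in edges:
        # Both endpoints resolve from the in-memory map; no index lookup needed.
        create_edge(node_ids[source], node_ids[target], rel_type)
        created += 1
    return node_ids, created
```

An edge that refers to an unknown domain ID raises a KeyError here, which surfaces inconsistencies between the node file and the edge file early.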
For reading / querying, there is a simple HTML / JavaScript based UI on top of everything that talks directly to the REST API. I can enter a Cypher query and see the results, do some highlighting of paths, and look into node properties. It's very basic. I am working on ad hoc / prototype stuff, so this will never become a production setup (famous last words), which is why I keep it on my Mac. As long as everything fits in RAM, it should be fine (IO is terrible on the Mac, especially with full disk encryption, which I have).
In reply to Friso van Vollenhoven's message of 22.02.2012, 17:48:
Yes on both counts.