Performance while migrating data to Neo4j server


Samwillie

May 3, 2012, 8:33:16 AM
to Neo4j
Hi,

I am in the process of migrating data from tables in MySQL into nodes
in Neo4j. There are approximately 20,000-30,000 table values that I need
to convert to nodes. Thanks to Peter and Michael, I have configured a
remote Neo4j Server and can access it from my Java web application
running on my local machine. The application does the migration: it
creates nodes on the remote server, sets their properties, and indexes
them.

I notice a huge delay in doing this: on the web admin I see that only
around 3,000 nodes have been created and indexed in the last half
hour. The data retrieval from MySQL is fast. While creating each
node, I also index it. Is it the indexing that takes such a long
time? Or is there any other method to make this whole process faster?

Many thanks for your reply,

Greets

Peter Neubauer

May 3, 2012, 8:34:40 AM
to ne...@googlegroups.com
Wow,
that is too slow. How are you inserting these nodes?

Cheers,

/peter neubauer

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:   http://www.linkedin.com/in/neubauer
T:   @peterneubauer

If you can write, you can code - @coderdojomalmo
If you can sketch, you can use a graph database - @neo4j

Michael Hunger

May 3, 2012, 9:17:51 AM
to ne...@googlegroups.com
Can you try to use the batch rest api and/or mutating cypher?
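
For readers who have not used the batch REST API: a rough sketch of what one batch request body might look like, built with plain Java. It assumes the server accepts a JSON array of jobs POSTed to /db/data/batch, and that a later job can refer to a node created earlier in the same batch as {jobId}; check both against the REST documentation for your server version. The class name, the index name "dummies" and the key "name" are made up for illustration.

```java
import java.util.List;
import java.util.Map;

public class BatchPayloadSketch {

    // Minimal escaping (backslash and double quote only); a real JSON library is safer.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds one batch body: for each row, a "create node" job followed by an
    // "add to index" job that refers to the new node as {createId}.
    // Assumption: every row contains a value for indexKey.
    static String buildBatch(List<Map<String, String>> rows, String indexName, String indexKey) {
        StringBuilder json = new StringBuilder("[");
        int jobId = 0;
        for (Map<String, String> row : rows) {
            if (jobId > 0) json.append(',');
            int createId = jobId++;
            json.append("{\"method\":\"POST\",\"to\":\"/node\",\"id\":")
                .append(createId).append(",\"body\":{");
            boolean first = true;
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (!first) json.append(',');
                json.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
                first = false;
            }
            json.append("}},");
            json.append("{\"method\":\"POST\",\"to\":").append(quote("/index/node/" + indexName))
                .append(",\"id\":").append(jobId++)
                .append(",\"body\":{\"uri\":\"{").append(createId).append("}\"")
                .append(",\"key\":").append(quote(indexKey))
                .append(",\"value\":").append(quote(row.get(indexKey))).append("}}");
        }
        return json.append(']').toString();
    }

    public static void main(String[] args) {
        // Two hypothetical rows; each becomes a create job plus an index job.
        List<Map<String, String>> rows = List.of(Map.of("name", "a"), Map.of("name", "b"));
        System.out.println(buildBatch(rows, "dummies", "name"));
    }
}
```

The point of the design is that many nodes and their index entries travel in a single HTTP request instead of one request per operation.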


Samwillie

May 3, 2012, 10:41:14 AM
to Neo4j
Hi Peter,

This is what I am trying to do:

GraphDatabaseService graphDbService = new RestGraphDatabase(SERVER_PATH_URL);
index = graphDbService.index();
Iterator<DummyObject> iterator = myDao.getAllDummyObjects().iterator();
while (iterator.hasNext()) {
    // create one node on the server for each relational row
    Node dummyNode = graphDbService.createNode();
    // get the next dummy object from the relational database
    DummyObject relationalDBDummyObject = iterator.next();
    Long dummyId = relationalDBDummyObject.getId();
    .............
    // migrate all properties from relationalDBDummyObject to dummyNode here...
    .............
}

Here is what I do: get all dummyObjects from the MySQL table, iterate
through them, and for each object that is returned, create a node and
index one or two of its properties. I tried without indexing and even
that takes a long time.
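
A loop like the one described above pays at least one remote call per object. One possible way to reduce that, sketched here under assumptions, is to split the rows into fixed-size chunks so that each chunk can be sent to the server as one unit. The class name, the helper name `chunk` and the chunk size of 500 are all hypothetical and not taken from any Neo4j library.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {

    // Splits items into consecutive chunks of at most chunkSize elements.
    static <T> List<List<T>> chunk(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in for the ids of 30,000 relational rows.
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 30000; i++) {
            ids.add(i);
        }
        List<List<Integer>> chunks = chunk(ids, 500);
        // One request per chunk instead of one per row.
        System.out.println(chunks.size() + " chunks"); // prints "60 chunks"
    }
}
```

The right chunk size depends on the server and the payload; 500 is only a starting point to measure against.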

@Michael, I am not aware of the techniques you mentioned - and how a
batch rest API works. Let me check this in parallel...

Thanks,

Michael Hunger

May 3, 2012, 10:44:29 AM
to ne...@googlegroups.com
Sam,

I'm currently working on upgrading the java-rest-bindings to a better integration of the batch API.
I hope to be done by tonight. Would love if you could check it out then.

Mutating Cypher is in 1.8; see the blog post (blog.neo4j.org).

Right now it is restApi.executeBatch(new BatchCallback() ....)

but that will most probably change to what Daniel Cox suggested.

Cheers

Michael

Samwillie

May 4, 2012, 6:28:50 AM
to Neo4j
Hi Michael, Peter

While I wait for your information on updates to the java-rest-bindings,
I tried the following:

Use the locally hosted Neo4j server via
RestGraphDatabase(LOCALHOST_GRAPHDB_PATH_URL). It takes an hour for
approx. 20,000 nodes; I create the nodes and index them. It is much
faster with an EmbeddedGraphDatabase. The process of how I create
nodes is described above in this thread.

Is this the normal behavior?

Greets

Michael Hunger

May 4, 2012, 7:16:28 AM
to ne...@googlegroups.com
No, I actually meant using an embedded graph database to import your data and then copying the database directory over to the server.

I also answered the springsource forum post with:

http://forum.springsource.org/showthread.php?126059-Neo4j-Insert-performance

Spring Data Neo4j is not designed for massive data inserts. There are some approaches that use the Neo4j BatchInserter on a local database to insert the data; it is able to insert one million nodes per second.

I would like you to test with a local embedded database and measure the difference.

But using the BatchInserter (which is non-transactional, though) would be the fastest way
(see https://groups.google.com/d/topic/ne...2YA/discussion for some code, but read the whole thread).

I think it makes sense to offer this functionality in SDN itself. I created an issue to track it: https://jira.springsource.org/browse/DATAGRAPH-231

Cheers

Michael