Is there a good way to batch insert data into a standalone Neo4j server with Java?

阿风

Mar 1, 2012, 11:35:10 AM
to Neo4j
Is there a good way to batch insert data into a standalone Neo4j server with Java?
I want to import data into the standalone Neo4j server from a Hadoop MapReduce job. Would developing a plugin that uses RMI work, or is there a better approach?

Jim Webber

Mar 1, 2012, 11:40:10 AM
to ne...@googlegroups.com
Hi,

Yes you can insert into a Neo4j server database using the batch inserter with Java, if:

1. It's the first import (batch inserts aren't safe after that).
2. The server is shut down.
3. You have access to the filesystem where the Neo4j database is stored.

Then you can just use the batch inserter: http://docs.neo4j.org/chunked/stable/indexing-batchinsert.html

Jim

Shouer.Shen

Mar 1, 2012, 12:04:36 PM
to Neo4j
Hi Jim,

I have read the page you pointed me to, but I still have some questions:

1. Must the server be shut down while inserting data with the batch inserter?
2. I want to merge the incoming data when some of the properties already exist in the database. How can I do that?

Jim Webber

Mar 1, 2012, 12:17:34 PM
to ne...@googlegroups.com
Hello,

> 1. Must the server be shut down while inserting data with the batch inserter?

Yes.

> 2. I want to merge the incoming data when some of the properties already
> exist in the database. How can I do that?

The batch inserter is *not* safe to use on a populated database.

You're going to have to do something transactional - the output of your Map/Reduce work should bind to the Java or REST API and fill the database that way.

Using the Java API will have lower latency (no JSON overhead) and will give you better control over the scope of a transaction. You want your transactions to be large enough that the cost of committing them is tiny compared to the amount of data you're storing, but not so large that a failure is painful to recover from.
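
As a rough illustration of the transaction sizing described above, here is a minimal sketch in plain Java. `GraphStore`, `CountingStore`, and `importAll` are hypothetical names made up for this example and are not part of the Neo4j API; the interface only stands in for whatever begins and commits a transaction in your setup.

```java
/** Sketch: commit one transaction per fixed-size batch of records. */
class BatchedImport {

    /** Hypothetical stand-in for the database; not a Neo4j type. */
    interface GraphStore {
        void begin();
        void write(String record);
        void commit();
    }

    /**
     * Writes every record, committing after each txSize writes and once more
     * for any remainder. Returns the number of commits performed.
     */
    static int importAll(Iterable<String> records, GraphStore store, int txSize) {
        if (txSize <= 0) {
            throw new IllegalArgumentException("txSize must be positive");
        }
        int commits = 0;
        int inTx = 0; // writes in the currently open transaction
        for (String record : records) {
            if (inTx == 0) {
                store.begin();
            }
            store.write(record);
            inTx++;
            if (inTx == txSize) {
                store.commit();
                commits++;
                inTx = 0;
            }
        }
        if (inTx > 0) { // commit the final, partially filled batch
            store.commit();
            commits++;
        }
        return commits;
    }

    /** In-memory store that only counts calls, so the sketch runs without a database. */
    static final class CountingStore implements GraphStore {
        int begun;
        int written;
        int committed;

        public void begin() { begun++; }
        public void write(String record) { written++; }
        public void commit() { committed++; }
    }
}
```

With 25 records and a `txSize` of 10 this commits three times (10, 10, and 5 records). In the Neo4j 1.x Java API, `begin()` would roughly correspond to `GraphDatabaseService.beginTx()`, and `commit()` to `tx.success()` followed by `tx.finish()`.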

If you really want to do this through the REST API, I would suggest writing an unmanaged extension to do this work, see:

http://docs.neo4j.org/chunked/1.6/server-unmanaged-extensions.html

Jim

Shouer.Shen

Mar 1, 2012, 12:42:00 PM
to Neo4j
Hi, how do I use the Java API with the standalone server?

Isn't the Java API only usable with EmbeddedGraphDatabase?

Jim Webber

Mar 1, 2012, 12:49:15 PM
to ne...@googlegroups.com
Hi,

> Hi, how do I use the Java API with the standalone server?
>
> Isn't the Java API only usable with EmbeddedGraphDatabase?

You have 2 choices:

1. Shut down the server and run a standalone Java app against your store on disk.
2. Create an unmanaged extension to do this. Unmanaged extensions are hosted by the server, but have full access to the Java API.

Jim

Shouer.Shen

Mar 1, 2012, 12:58:54 PM
to Neo4j
Hi Jim,

Thanks for your help.

That is frustrating to hear, but I will try the second option.

It's late here, so I must go to bed now!

Michael Hunger

Mar 1, 2012, 1:59:39 PM
to ne...@googlegroups.com
If you want to continuously import data into the server, the unmanaged extension is the better way to go.

You can post with many threads in parallel to the server and the extension will run at full speed against the embedded graph database.
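
To make the "many threads in parallel" idea concrete, here is a minimal client-side sketch, assuming Java 8 or later. `ParallelPoster`, `postAll`, and the `post` callback are hypothetical names for this example; the callback stands in for whatever sends one batch to your extension (for instance an HTTP POST), which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Sketch: split records into batches and send the batches from a thread pool. */
class ParallelPoster {

    /**
     * Hands each batch of at most batchSize records to `post` on a fixed-size
     * thread pool and waits for all of them. Returns the number of batches sent.
     */
    static int postAll(List<String> records, int batchSize, int threads,
                       Consumer<List<String>> post) {
        if (batchSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("batchSize and threads must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < records.size(); from += batchSize) {
                int to = Math.min(from + batchSize, records.size());
                // subList is a read-only view here; the source list is not modified.
                final List<String> batch = records.subList(from, to);
                pending.add(pool.submit(() -> post.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any failure from the posting thread
            }
            return pending.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while posting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A natural pairing, though only an assumption about your design, is one transaction per posted batch on the server side, so the batch size doubles as the transaction size.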

See here: http://docs.neo4j.org/chunked/milestone/server-unmanaged-extensions.html

Michael

Shouer.Shen

Mar 1, 2012, 9:22:14 PM
to Neo4j
Hi Michael,

Can the unmanaged extension merge data during import?

I am not familiar with unmanaged extensions.

Where can I find a demo of one?

shen

On Mar 2, 2:59 AM, Michael Hunger <michael.hun...@neotechnology.com>
wrote:

Michael Hunger

Mar 2, 2012, 3:23:03 AM
to ne...@googlegroups.com
It's in the document I pointed to.

Yes, the unmanaged extension uses the normal Neo4j API and can work with already existing databases without problems.
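
For the merge question, the logic inside such an extension could look roughly like the sketch below. It is plain Java with a `HashMap` standing in for the graph and its index; `MergingImporter` and `upsert` are hypothetical names, and letting incoming values overwrite existing ones is an assumed policy, not something Neo4j prescribes.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: create a node if its key is new, otherwise merge properties into it. */
class MergingImporter {

    // Stand-in for the graph plus a key index: external key -> node properties.
    private final Map<String, Map<String, Object>> nodesByKey = new HashMap<>();

    /**
     * Returns true if a new node was created, false if the properties were
     * merged into an existing node.
     */
    boolean upsert(String key, Map<String, Object> incoming) {
        Map<String, Object> existing = nodesByKey.get(key);
        if (existing == null) {
            nodesByKey.put(key, new HashMap<>(incoming));
            return true;
        }
        // Assumed policy: incoming values win; untouched properties are kept.
        existing.putAll(incoming);
        return false;
    }

    /** Returns the stored properties for a key, or null if the key is unknown. */
    Map<String, Object> get(String key) {
        return nodesByKey.get(key);
    }

    int size() {
        return nodesByKey.size();
    }
}
```

In a real extension the lookup and the write would happen inside the same transaction, with the lookup going through whatever index you maintain for the key.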

Make sure that you use an appropriate transaction size (10k) when creating data in the graph.

Cheers

Michael

Shouer.Shen

Mar 2, 2012, 4:52:50 AM
to Neo4j
Hi Michael,

What does "an appropriate tx-size (10k)" mean?

Thanks,
shen


On Mar 2, 4:23 PM, Michael Hunger <michael.hun...@neotechnology.com>
wrote:

Shouer.Shen

Mar 2, 2012, 4:55:27 AM
to Neo4j
Hi Michael,

Can the Remote Graph Database be used with a standalone server?

shen

On Mar 2, 4:23 PM, Michael Hunger <michael.hun...@neotechnology.com>
wrote: