Neo4j store is not cleanly shut down; Recovering from inconsistent db state from interrupted batch insertion


Abhishek Gupta

Dec 15, 2013, 9:50:15 AM
to ne...@googlegroups.com
I was importing TTL ontologies to DBpedia following the blog post http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html. The post uses BatchInserters to speed up the task. It mentions:

Batch insertion is not transactional. If something goes wrong and you don't shutDown() your database properly, the database becomes inconsistent.

I had to interrupt one of the batch insertion tasks as it was taking much longer than expected, which left my database in an inconsistent state. I get the following message:

db_name store is not cleanly shut down

How can I recover my database from this state? Also, for future purposes, is there a way to commit after importing each file, so that reverting to the last good state would be trivial? I thought of git, but I am not sure it would help with a binary file like index.db.

Johannes Mockenhaupt

Dec 15, 2013, 12:07:46 PM
to ne...@googlegroups.com
On 12/15/2013 03:50 PM, Abhishek Gupta wrote:
> I was importing ttl ontologies to dbpedia following the blog
> post http://michaelbloggs.blogspot.de/2013/05/importing-ttl-turtle-ontologies-in-neo4j.html.
> The post uses BatchInserters to speed up the task. It mentions
>
> Batch insertion is not transactional. If something goes wrong and you
> don't shutDown() your database properly, the database becomes inconsistent.
>
> I had to interrupt one of the batch insertion tasks as it was taking
> time much longer than expected which left my database in an
> inconsistent state. I get the following message:
>
> db_name store is not cleanly shut down
>
> How can I recover my database from this state?

AFAIK this is not possible, as the data is not just inconsistent but
corrupt (see all the caveats in the manual:
http://docs.neo4j.org/chunked/stable/batchinsert.html).

> Also, for future purposes
> is there a way for committing after importing every file so that
> reverting back to the last state would be trivial. I thought of git, but
> I am not sure if it would help for a binary file like index.db.

I'd guess almost all db files will have changed, so git would just copy
everything while adding overhead of checking what has changed. Why not
just create copies of the graph.db folder?
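That per-file snapshot idea can be sketched as follows. The `graph.db` path is an assumption (adjust to wherever your store lives), and the `mkdir` only stands in for a real store so the sketch is self-contained:

```python
import shutil
import time
from pathlib import Path

store = Path("graph.db")     # assumed store location
store.mkdir(exist_ok=True)   # stand-in for a real store, for this sketch only

# After each file has imported cleanly (and the inserter has been
# shut down), snapshot the whole store directory so a later crash
# only costs the current file's work.
backup = store.with_name(f"{store.name}.backup.{int(time.time())}")
shutil.copytree(store, backup)
print(f"snapshot written to {backup}")
```

Note that the copy must be taken only after shutDown() has completed; copying a store mid-insert just reproduces the inconsistent-state problem in the backup.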

Abhishek Gupta

Dec 15, 2013, 12:17:45 PM
to ne...@googlegroups.com
I am importing DBpedia, so copying is quite an expensive operation. Also, almost every tenth file in the list leads to such errors, because of which I am not able to create a dump despite the powerful machines at work.

A follow-up question: I imported this data earlier into Virtuoso. Is there a way to migrate the Virtuoso database into Neo4j instead?

Regards
Abhishek



--
You received this message because you are subscribed to a topic in the Google Groups "Neo4j" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/neo4j/y7amc5GewrM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to neo4j+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Abhishek Gupta

Dec 15, 2013, 12:35:28 PM
to neo4j
Or, if two Neo4j databases can be merged, that could be an effective solution too: I could import each file separately and then merge the resulting databases.

Please advise.

Michael Hunger

Dec 15, 2013, 4:09:32 PM
to ne...@googlegroups.com
What is the actual issue you are running into?

What OS do you run it under?

Do you have enough heap assigned to your import process and configured the memory mapping settings appropriately?

Use a configuration with cache_type=none and memory-mapped I/O settings that give large values to the relationship store and large-enough values to the node store and property store.

E.g. 10% for the node and property stores and 80% for the relationship store.
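As a sketch, with a 4 GB mapped-memory budget those percentages might translate into a batch-inserter configuration fragment like the following (the absolute values are illustrative assumptions, not recommendations):

```properties
# Disable the object cache during batch insertion
cache_type=none

# ~10% each for the node and property stores
neostore.nodestore.db.mapped_memory=400M
neostore.propertystore.db.mapped_memory=400M

# ~80% for the relationship store
neostore.relationshipstore.db.mapped_memory=3200M
```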



Abhishek Gupta

Dec 15, 2013, 4:42:37 PM
to neo4j
Responses inline.

What is the actual issue you are running into?
What OS do you run it under?
 
I am running it on Ubuntu 13.04.

Do you have enough heap assigned to your import process and configured the memory mapping settings appropriately?
I assigned 3 GB. The file has about 10M relationships and 3M nodes, which translates to about 1 GB of heap space according to the link you sent me. Also, this is done on an existing graph; should I count the nodes already in the graph too?

Use a configuration that uses cache_type none and mmio settings with large values for the relationship-store and large-enough values for the node-store and property-store
 
Let me look into these settings. Also, I noticed that the insertion rate drops significantly as the database grows. In that case, does it help to insert the files with the larger number of triples first? Further, would it help to merge the files into a single file instead?


E.g. 10% for node-and property-store and 80% for the relationship-store.

Michael Hunger

Dec 15, 2013, 7:03:12 PM
to ne...@googlegroups.com
On 15.12.2013 at 22:42, Abhishek Gupta <abhi...@zumbl.com> wrote:

Responses inline.

What is the actual issue you are running into?
What OS do you run it under?
 
I am running it on Ubuntu 13.04.

Make sure to have a decent disk and the disk scheduler set to noop or deadline; see:


Don't set barrier=0 though, as this is not safe; it would only make sense for large imports to mount the disk that way during the import itself.


Do you have enough heap assigned to your import process and configured the memory mapping settings appropriately?
I assigned 3 GB. The file has about 10M relationships and 3M nodes, which translates to about 1 GB of heap space according to the link you sent me. Also, this is done on an existing graph; should I count the nodes already in the graph too?

That's tiny and should not take more than a minute.
How large is the existing db?
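A rough way to see why this dataset is small: the store files use fixed-size records, so the working-set size can be estimated by multiplying record counts by record sizes. The per-record byte counts below are approximations for the 2.0-era store format, and the one-property-per-relationship guess is an assumption:

```python
# Approximate fixed record sizes in bytes (2.0-era store format; assumptions)
NODE_RECORD = 14
REL_RECORD = 33
PROP_RECORD = 41

nodes = 3_000_000
rels = 10_000_000
props = 10_000_000   # guess: roughly one property per relationship

node_store = nodes * NODE_RECORD   # 42_000_000 bytes, ~42 MB
rel_store = rels * REL_RECORD      # 330_000_000 bytes, ~330 MB
prop_store = props * PROP_RECORD   # 410_000_000 bytes, ~410 MB

for name, size in [("node", node_store), ("rel", rel_store), ("prop", prop_store)]:
    print(f"{name} store: ~{size / 1e6:.0f} MB")
```

On that estimate the whole import fits comfortably inside a few hundred megabytes of mapped memory, which is why a 3 GB heap should be more than enough here.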

Abhishek Gupta

Dec 15, 2013, 7:09:19 PM
to neo4j

Make sure to have a decent disk and the disk scheduler set to noop or deadline; see:


Don't set barrier=0 though, as this is not safe; it would only make sense for large imports to mount the disk that way during the import itself.

Ok.

Do you have enough heap assigned to your import process and configured the memory mapping settings appropriately?
I assigned 3 GB. The file has about 10M relationships and 3M nodes, which translates to about 1 GB of heap space according to the link you sent me. Also, this is done on an existing graph; should I count the nodes already in the graph too?

That's tiny and should not take more than a minute.
How large is the existing db?

About 10 GB. It took almost 80 minutes now that I am trying again. Earlier, it slowed down for no apparent reason, took all the memory and CPU, and finally got killed.

Michael Hunger

Dec 15, 2013, 7:51:16 PM
to ne...@googlegroups.com
Probably assigning some more heap would make sense then.

Can you show the output from the import run?

Michael

Abhishek Gupta

Dec 17, 2013, 8:49:11 AM
to neo4j

On 16 December 2013 06:21, Michael Hunger <michael...@neopersistence.com> wrote:
Can you show the output from the import run?

I used the code available here: https://github.com/mybyte/tools/tree/master/Turtle%20loader. I run BatchExecutable, which uses Neo4jDBBatchHandler as the handler, and add the parameter -Xmx3200m to the JVM arguments. It takes about a minute and a half to add three million triples into an empty database at 30,000 triples per second, but more than 5 minutes to execute db.shutdown(). This shutdown becomes the bottleneck when I am not starting from an empty graph.
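For the record, those numbers are self-consistent; the insert phase alone is a quick calculation:

```python
triples = 3_000_000
rate = 30_000                 # triples per second, as reported above
insert_seconds = triples / rate
print(f"~{insert_seconds:.0f} s (~{insert_seconds / 60:.1f} min) of insertion")
# so a 5-minute shutdown really does dominate the ~100 s insert phase
```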

The other approach I found is available at https://github.com/oleiade/dbpedia4neo; it is much slower, as it doesn't use a BatchInserter.

Michael Hunger

Dec 17, 2013, 9:02:28 AM
to ne...@googlegroups.com
The batch-inserter slowdown is a regression in 2.0.0 and is currently being worked on.

How often do you have to run the import?

Michael

Abhishek Gupta

Dec 17, 2013, 9:25:08 AM
to neo4j
On 17 December 2013 19:32, Michael Hunger <michael...@neopersistence.com> wrote:
The batch-inserter slowdown is a regression in 2.0.0 and is currently being worked on.

How often do you have to run the import?

I have to import DBpedia, which has 20-30 files, but I am not able to complete this for the following reasons:

1. Some files are malformed. The script https://github.com/oleiade/dbpedia4neo/blob/master/cleanup.sh helps to clean up most of the files, but doesn't handle every case. Because of this, I am not able to use batch insert on some of the files. Figuring out which files aren't well formed is another lengthy job (it basically means importing them into an empty graph).

2. Some files end up stuck at the db.shutdown() command for hours. I tried splitting them into further sub-parts, but that still takes a lot of time. Currently, I am using an indexCache of 500,000 entries and a timeout of 60 seconds.
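The indexCache referred to here is the loader's in-memory lookup of already-created nodes. Generically, the idea is a bounded LRU map from resource URI to node id, something like this sketch (the class and method names are hypothetical, not the loader's actual API):

```python
from collections import OrderedDict

class IndexCache:
    """Bounded LRU map from resource URI to node id (illustrative sketch)."""

    def __init__(self, capacity=500_000):
        self.capacity = capacity
        self._map = OrderedDict()

    def get(self, uri):
        if uri not in self._map:
            return None              # caller falls back to the slow index lookup
        self._map.move_to_end(uri)   # mark as recently used
        return self._map[uri]

    def put(self, uri, node_id):
        self._map[uri] = node_id
        self._map.move_to_end(uri)
        if len(self._map) > self.capacity:
            self._map.popitem(last=False)  # evict the least recently used entry

cache = IndexCache(capacity=2)
cache.put("http://dbpedia.org/resource/Berlin", 1)
cache.put("http://dbpedia.org/resource/Paris", 2)
cache.get("http://dbpedia.org/resource/Berlin")   # touch Berlin
cache.put("http://dbpedia.org/resource/Rome", 3)  # evicts Paris
print(cache.get("http://dbpedia.org/resource/Paris"))  # None: fell out of the cache
```

How well such a cache performs depends less on its raw capacity than on how often the same subjects and objects recur within a file, which may explain why some files behave so much worse than others.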

A few possible solutions:

1. If there is a possibility of merging two databases, I could create a database from each file and then merge them.

2. If I could run the solution by Oleiade on the graph generated by the batch inserter, that might help too, as I could then switch between the methods at my convenience. On doing so, it says: "Store version [NeoStore v0.A.1]. Please make sure you are not running old Neo4j kernel on a store that has been created by newer version of Neo4j." I tried recompiling with the latest version, but it didn't work either.

3. If I could run multiple instances of Oleiade's importer on the same graph without significantly affecting the speed of each individual process/thread, that would work as well.

Thanks
Abhishek

