Severe performance problems that can only be resolved by deleting all data?


Ryan Sattler

Nov 11, 2014, 6:20:42 PM11/11/14
to ne...@googlegroups.com
Hi,

I've been developing an application using Neo4j (which will use an Enterprise install in the final version). As part of this we run a large number of integration tests against Neo. Each test deletes the existing data using a Cypher query, then reads and writes as needed. Normally this works fine. However, a few times performance has declined catastrophically: writing a single node to an empty database, which normally takes a few milliseconds, starts consistently taking e.g. 3 seconds. Restarting Neo does not make any difference - the only fix I've found is to delete graph.db, after which everything is back to normal.
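For context, the per-test cleanup is a plain delete-everything Cypher query, roughly along these lines (a sketch rather than the exact query we run; on 2.1 relationships have to be deleted together with their nodes):

    // remove every relationship and every node in the store
    MATCH (n)
    OPTIONAL MATCH (n)-[r]-()
    DELETE r, n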

Obviously this is a serious concern because in a production environment I can't just delete all our data. Any idea why this might be happening? And regardless, is there any way to recover from this without losing data? If not, this seems like a major risk.

We also had a similar issue in the past that seemed to be due to an accidentally non-indexed query. This caused the query's execution time to increase by about 2 seconds per attempt, even though there was the same amount of data in the DB each time (data being deleted and re-written between each test). Again, the only fix was deleting everything. Adding a proper index resolved it, but now a similar issue has occasionally popped up on indexed queries as well. And at any rate, even though I'd expect a non-indexed query to be slow, I wouldn't expect its performance to decay sharply over time when the total data size is not increasing.
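For illustration, the fix was roughly the following (the :Item label and externalId property here are placeholders rather than our actual schema):

    // schema index so the lookup no longer needs a full scan
    CREATE INDEX ON :Item(externalId)

    // the previously slow lookup, now served by the index
    MATCH (i:Item { externalId: "abc-123" })
    RETURN i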

Perhaps deletes may not be working correctly?

Context:

Neo4j 2.1.5 community edition
Linux
2GB heap
SSD
Cypher/REST

--
Ryan

Michael Hunger

Nov 11, 2014, 7:10:39 PM11/11/14
to ne...@googlegroups.com, Philip Rathle
Hi Ryan,

I think the issue you're running into is that node and relationship records freed during deletion are not reused during the same uptime of the database.
Freed records are only reused after a restart, so if you delete a lot of data, restarting the db to enable reuse of the freed records helps.

Reuse of freed records during the same uptime is a feature to be implemented in one of the next releases though.

These large chunks of unused records are also what make generic (non-label-based) scans of a store slower and less efficient (in terms of mapping files to memory that contain large free chunks).

HTH,

Michael


Ryan Sattler

Nov 11, 2014, 7:11:09 PM11/11/14
to ne...@googlegroups.com
Some further investigation suggests that *one* source of problems is non-indexed queries (e.g. "match (n) return count(n)") becoming very slow even on a near-empty database (e.g. taking 1000 milliseconds when there is only 1 node in it) after there has been a "performance meltdown" as described in my earlier message to this group. Again, this does not recover by restarting Neo, only by deleting the data. It seems that when the database is shut down while there are stuck threads, there is some sort of DB corruption. I do get the "Detected incorrectly shut down database, performing recovery.." message on restart in this case, but there doesn't seem to be any safe way to shut down? (I ctrl-c'd from console mode.)

(NB I think there are also other issues as I've seen indexed queries have problems too, but haven't been able to reproduce that one yet)

--
Ryan Sattler

Ryan Sattler

Nov 11, 2014, 7:15:45 PM11/11/14
to ne...@googlegroups.com, phi...@neotechnology.com
Unfortunately, at least in some cases restarting does not make any difference. Deleting the data always fixes the problem immediately, but isn't a viable solution for production.

Michael Hunger

Nov 11, 2014, 7:43:13 PM11/11/14
to ne...@googlegroups.com
The problem is:

You had 10M nodes in your db.
You deleted them all, so you have 10M empty records on disk.
You don't restart.
You create a node; it is put in record 10,000,001.
So you have 10M empty records followed by one used record.
Once that has happened, a restart won't relocate the node, it will only allow the ids of the 10M deleted nodes to be reused.
If you had restarted after the big delete, the node would have been created with record id 0.
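You can see this from the internal id of a freshly created node (the :Test label and property here are just placeholders):

    // run after the big delete, without a restart: the new node gets a high internal id
    CREATE (n:Test { name: "after-delete" })
    RETURN id(n)
    // returns an id near 10,000,000 instead of 0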

I wrote a tool that can take a store and copy it to compact it (currently it doesn't change node ids though), so right now it is only useful for compacting rels.
That's because if you change node ids you also have to recreate indexes etc.: https://github.com/jexp/store-utils/tree/21

For your query, this is an all-nodes scan, which goes over every record in the db (and loads and counts the ones that are in use).

For a real-world query you'd do that on a label, like :Product or :Person, which should come back instantly even if you have millions of empty records.
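For example (using :Person just as a stand-in for one of your labels):

    // label-based count: only touches nodes carrying the label,
    // so the millions of empty records are never read
    MATCH (p:Person)
    RETURN count(p)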

this: "Detected incorrectly shut down database, performing recovery.." is just recovery after a hard kill or crash, which is ok, as the transactions are written to and reapplied from the tx-log (WAL).


HTH, Michael



Ryan Sattler

Nov 11, 2014, 9:14:00 PM11/11/14
to ne...@googlegroups.com
Ok thanks. In that case this particular issue shouldn't happen in prod since we are only doing soft deletes there anyway. It's pretty inconvenient for testing though.
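For context, our soft delete just flags nodes instead of removing their records - roughly like this, with placeholder label and property names:

    // mark the node as deleted instead of freeing its record
    MATCH (o:Order { orderId: 42 })
    SET o.deleted = true

    // normal reads filter flagged nodes out
    MATCH (o:Order)
    WHERE NOT has(o.deleted)
    RETURN o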

--
Ryan