ID reuse workaround: never delete nodes (only their relationships and properties)


Aseem Kishore

Feb 2, 2013, 9:43:54 AM
to Neo4j Discussion
In the app we're building, we really like using Neo4j IDs as our own IDs directly:

  • Auto-increment is built in; it's not something we have to implement ourselves, which would create a new bottleneck and possibly introduce bugs;

  • Lookups (the most common operation) are fast: no index operations are needed, since IDs get translated to file offsets directly;

  • It's nice and easy to differentiate between (integer) IDs and (alphanumeric) aliases/usernames (and we don't allow numeric-only aliases/usernames for that reason).
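The third point can be sketched roughly as follows. This is only an illustration in Python, not our actual code; the function names `is_valid_alias` and `classify` are made up for the example, and the sketch assumes aliases are restricted to ASCII letters and digits.

```python
import re

_NUMERIC = re.compile(r"[0-9]+")
_ALNUM = re.compile(r"[A-Za-z0-9]+")


def is_valid_alias(alias):
    """An alias must be alphanumeric and must not be numeric-only."""
    return bool(_ALNUM.fullmatch(alias)) and not _NUMERIC.fullmatch(alias)


def classify(token):
    """Return ("id", int) for numeric tokens and ("alias", str) otherwise."""
    if _NUMERIC.fullmatch(token):
        return ("id", int(token))
    if is_valid_alias(token):
        return ("alias", token)
    raise ValueError("not a valid ID or alias: %r" % token)
```

Because numeric-only aliases are rejected at signup, a token such as `42` can only ever mean a node ID.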

We understand that IDs can get reused after deletions, and so far that's been simple enough to account for: we also generate our own UUIDs, and check those whenever we compare nodes for equality, etc.
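A minimal sketch of that equality check, in Python for illustration only. The `NodeRef` type and `same_node` helper are hypothetical names; the assumption is that every node carries a UUID property alongside its Neo4j ID.

```python
import uuid
from collections import namedtuple

# Hypothetical reference type: the Neo4j node ID plus our own UUID property.
NodeRef = namedtuple("NodeRef", ["node_id", "uuid"])


def same_node(a, b):
    """Two references denote the same node only if ID and UUID both match."""
    return a.node_id == b.node_id and a.uuid == b.uuid


old = NodeRef(7, str(uuid.uuid4()))
# Suppose node 7 was deleted and its ID was later given to a new node.
reused = NodeRef(7, str(uuid.uuid4()))
```

Here `same_node(old, reused)` is false even though both references carry ID 7, which is exactly the check an external client would otherwise have to perform itself.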

But we're working on an external REST API now, and we've hit the unavoidable: if *external* clients receive those IDs directly, they need to account for the possibility of ID reuse *themselves*. That's not ideal.

So we were thinking about our options, and one that intrigues us a lot is the idea that we never fully delete nodes -- we just clear their properties and delete their relationships. Their IDs will now never get reused, and they won't (shouldn't) affect the performance of the rest of the graph, since they're orphaned.
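To make the idea concrete, here is a toy in-memory model in Python. It is not Neo4j's actual allocator, just a sketch under the assumption that IDs freed by a real delete are handed out again; the class and method names are invented for the example.

```python
class ToyNodeStore:
    """Toy model of ID allocation; NOT how Neo4j is implemented."""

    def __init__(self):
        self.nodes = {}      # node_id -> properties dict
        self.rels = set()    # (start_id, end_id) pairs
        self.free_ids = []   # IDs released by hard deletes
        self.next_id = 0

    def create(self, **props):
        # Prefer a freed ID, which is what makes reuse possible.
        if self.free_ids:
            node_id = self.free_ids.pop()
        else:
            node_id = self.next_id
            self.next_id += 1
        self.nodes[node_id] = dict(props)
        return node_id

    def relate(self, a, b):
        self.rels.add((a, b))

    def _drop_rels(self, node_id):
        self.rels = {r for r in self.rels if node_id not in r}

    def hard_delete(self, node_id):
        self._drop_rels(node_id)
        del self.nodes[node_id]
        self.free_ids.append(node_id)

    def soft_delete(self, node_id):
        # Clear properties and relationships but keep the node record,
        # so its ID is never released for reuse.
        self._drop_rels(node_id)
        self.nodes[node_id].clear()


store = ToyNodeStore()
a = store.create(name="a")
b = store.create(name="b")
store.relate(a, b)
store.hard_delete(a)
c = store.create(name="c")   # takes over a's freed ID
store.soft_delete(b)
d = store.create(name="d")   # gets a fresh ID; b's ID stays occupied
```

In this model `c` ends up with the same ID as `a`, while `d` never collides with `b`, at the cost of one empty node record left behind.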

Obviously, disk usage will be higher with this approach, but presumably not significantly if our data isn't high-churn. Are there any other downsides or gotchas to taking this approach?

Aseem

P.S. It's also worth stating the obvious: it'd be great if it were simply configurable for Neo4j to not reuse deleted IDs. I know the team wants to move away from exposing IDs directly, but even then, it'd be nice if Neo4j exposed *some* built-in ability to get short/simple and never-reused identifiers of some sort. That's something that I imagine any app would find useful.

Michael Hunger

Feb 2, 2013, 10:09:27 AM
to ne...@googlegroups.com
I like the idea. You just have to keep the orphaned nodes somewhere, e.g. in an index.

If you disable reuse, the node file will grow without limit.


Aseem Kishore

Feb 2, 2013, 10:16:29 AM
to Neo4j Discussion
> I like the idea. You just have to keep the orphaned nodes somewhere, e.g. in an index.

Why is this, Michael? Does Neo4j "garbage-collect" nodes that are orphaned and not indexed?


> If you disable reuse, the node file will grow without limit.

Isn't that what'll happen with this workaround too? =)

Either way, are you implying that it's possible to disable reuse today?


Thanks Michael!

Aseem

Aseem Kishore

Feb 2, 2013, 9:16:15 PM
to Neo4j Discussion
FYI for the list, Michael's responses:


>> I like the idea. You just have to keep the orphaned nodes somewhere, e.g. in an index.
>
> Why is this, Michael? Does Neo4j "garbage-collect" nodes that are orphaned and not indexed?

He had mistakenly assumed we meant to reuse the orphaned nodes manually. Neo4j doesn't "garbage-collect" nodes, so they don't need to be indexed.

>> If you disable reuse, the node file will grow without limit.
>
> Isn't that what'll happen with this workaround too? =)
>
> Either way, are you implying that it's possible to disable reuse today?

He says it *is* actually possible to disable reuse in Java, so exposing a config for it shouldn't be difficult. I've filed a feature request then! =)

But to be precise: it'd be nice if reuse could be enabled or disabled separately for nodes and relationships (just as you can for, e.g., auto-indexing). In our case, we don't rely on relationship IDs, and our relationships get created and deleted frequently, so there's no reason to disable reuse there.


Thanks Michael!

Aseem

Mike Bryant

Feb 4, 2013, 12:38:45 PM
to ne...@googlegroups.com
This is an interesting idea. I follow these discussions closely because we're also generating our own UUIDs. The thing that mainly pushed us to do that was not primarily the re-use issue (though that was a concern) but rather the inability to reset the ID auto-increment counter without destroying the database. This was mainly a testing issue, since it's impractical to destroy the database between test runs if you're starting up a server instance to test on (which we are, via the WrappingNeoServerBootstrapper.)

~Mike

Aseem Kishore

Mar 16, 2013, 4:53:36 PM
to ne...@googlegroups.com
I've realized that this workaround of never deleting nodes ourselves won't be exactly the same as Neo4j implementing this internally.

The main difference is for Cypher queries: looking up a "deleted" node by ID will no longer throw an error, because the node still physically exists and will come back like any other. We would have to make sure *every* single one of our Cypher queries filters the start nodes to exclude "deleted" ones.
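One way to keep that filter in a single place is a small helper that builds the start clause. This is a rough Python sketch only: the `deleted` marker property and the `start_live_node` helper are hypothetical, and the Cypher shown uses Neo4j 1.x-era syntax (`START`, `{param}`, `has()`), which differs in other versions.

```python
# Hypothetical marker property set on a node when it is "deleted".
DELETED_PROP = "deleted"


def start_live_node(identifier="n", param="id"):
    """Build a START ... WHERE prefix that skips "deleted" nodes.

    Assumes Neo4j 1.x-era Cypher; adjust for the version in use.
    """
    return (
        "START {i}=node({{{p}}}) "
        "WHERE not(has({i}.{d}))"
    ).format(i=identifier, p=param, d=DELETED_PROP)


query = start_live_node("user", "userId") + " RETURN user"
print(query)
```

A lookup of a "deleted" node then returns no rows rather than an error, so the calling code still has to translate an empty result into "not found".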

Things would be so much easier if Neo4j just exposed this config. As a production-level app, we know what we're doing and would very much like to have it. (And the issue of "if you re-import the data, the IDs will change" will never come up for us in practice. We only ever restore from backups; we never manually re-import data.)

Aseem

Aseem Kishore

Mar 16, 2013, 5:11:55 PM
to Neo4j Discussion