In the app we're building, we really like using Neo4j IDs as our own IDs directly:
- Auto-increment is built in, so it's not something we have to implement ourselves (which would create a new bottleneck and possibly introduce bugs);
- Lookups (the most common operation) are fast: no index operations are needed, since IDs translate directly to file offsets;
- It's nice and easy to differentiate between (integer) IDs and (alphanumeric) aliases/usernames (and we don't allow numeric-only aliases/usernames for that reason).
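The ID-versus-alias rule in that last point could look something like the helper below. It's a hypothetical Python sketch (`parse_identifier` and `validate_alias` are made-up names), assuming identifiers arrive as strings, e.g. from a URL path segment:

```python
import re

# Hypothetical rule: a string made only of ASCII digits is a node ID;
# anything else is an alias/username.
_NUMERIC = re.compile(r"[0-9]+")


def parse_identifier(value: str):
    """Classify a path segment as ("id", int) or ("alias", str)."""
    if _NUMERIC.fullmatch(value):
        return ("id", int(value))
    return ("alias", value)


def validate_alias(alias: str) -> None:
    """Reject numeric-only aliases so they can never be mistaken for IDs."""
    if _NUMERIC.fullmatch(alias):
        raise ValueError(f"alias {alias!r} must contain at least one non-digit")
```

Matching only ASCII digits (rather than using `str.isdigit()`) avoids treating characters like superscript digits as IDs, since `int()` would reject those.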
We understand that IDs can get reused after deletions, and so far that's been simple enough to account for: we also generate our own UUIDs, and check those whenever we compare for equality, etc.
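Roughly, that check amounts to something like the following. It's a minimal Python sketch, assuming each node stores its UUID in a `uuid` property; `NodeRef`, `with_uuid`, `resolve`, and the dict standing in for an ID lookup are all hypothetical:

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeRef:
    """Hypothetical reference pairing a (reusable) node ID with our own UUID."""
    node_id: int
    node_uuid: str


def with_uuid(props: dict) -> dict:
    """Attach a freshly generated UUID, as we'd do at node-creation time."""
    return {**props, "uuid": str(uuid.uuid4())}


def resolve(store: Dict[int, dict], ref: NodeRef) -> Optional[dict]:
    """Return the node's properties only if the ID still points at the same node.

    `store` stands in for a lookup by node ID; the UUID comparison catches the
    case where the original node was deleted and its ID handed to a new node.
    """
    props = store.get(ref.node_id)
    if props is None or props.get("uuid") != ref.node_uuid:
        return None
    return props
```

The point is that the ID alone is only a fast pointer; the UUID is what actually decides whether the reference is still valid.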
But now that we're working on an external REST API, we've hit the unavoidable: if *external* clients receive those IDs directly, they need to *themselves* account for the possibility of ID reuse. That's not ideal.
So we were thinking about our options, and one that intrigues us a lot is the idea of never fully deleting nodes: we'd just clear their properties and delete their relationships. Their IDs would then never get reused, and they wouldn't (shouldn't) affect the performance of the rest of the graph, since they'd be orphaned.
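To make the idea concrete, here's a toy Python model contrasting a hard delete with this "clear and orphan" approach. It's only an illustration under an explicit assumption: the store simply recycles freed IDs, as described above. `ToyGraph` and its methods are hypothetical and don't reflect how Neo4j actually allocates IDs internally:

```python
from __future__ import annotations


class ToyGraph:
    """Toy in-memory store that recycles freed node IDs (assumed behaviour)."""

    def __init__(self) -> None:
        self.nodes: dict[int, dict] = {}
        self.rels: set[tuple[int, str, int]] = set()
        self._free_ids: list[int] = []
        self._next_id = 0

    def create_node(self, **props) -> int:
        # Reuse a freed ID if one is available, otherwise allocate a new one.
        if self._free_ids:
            node_id = self._free_ids.pop()
        else:
            node_id = self._next_id
            self._next_id += 1
        self.nodes[node_id] = dict(props)
        return node_id

    def relate(self, a: int, rel_type: str, b: int) -> None:
        self.rels.add((a, rel_type, b))

    def hard_delete(self, node_id: int) -> None:
        self._detach(node_id)
        del self.nodes[node_id]
        self._free_ids.append(node_id)  # ID becomes eligible for reuse

    def soft_delete(self, node_id: int) -> None:
        self._detach(node_id)
        self.nodes[node_id] = {}  # properties cleared; node kept, ID never freed

    def _detach(self, node_id: int) -> None:
        # Drop every relationship that touches this node.
        self.rels = {r for r in self.rels if node_id not in (r[0], r[2])}
```

In this model a hard-deleted node's ID is handed to the next new node, while a soft-deleted node keeps its ID as an empty, relationship-less placeholder.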
Obviously, disk usage will be higher with this approach, but presumably not significantly if our data isn't high-churn. Are there any other downsides or gotchas to taking this approach?
Aseem
P.S. It's also worth stating the obvious: it'd be great if it were simply configurable for Neo4j to not reuse deleted IDs. I know the team wants to move away from exposing IDs directly, but even then, it'd be nice if Neo4j exposed *some* built-in ability to get short/simple and never-reused identifiers of some sort. That's something that I imagine any app would find useful.