Hi,
We are building a contact management application. Each contact is a node. If two or more contacts are found to be duplicates, we want to be able to merge them into a single node. Additionally, we want to keep the pre-merge node states, so that we can undo the merge if required (*).
We propose to model this by creating a new node, linking each old node to it with a "merged_into" edge, and setting a status property on the old nodes to "removed".
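To make the proposal concrete, here is a minimal sketch in Python using a plain in-memory graph instead of the real database API. The helper names (add_node, merge), the contact properties and the "active" status value are all made up for illustration; only "merged_into" and "removed" come from our design.

```python
from itertools import count

nodes = {}   # node id -> property dict
edges = []   # (source id, edge label, target id)
_next_id = count(1)

def add_node(**props):
    """Hypothetical helper: store a node and return its id."""
    node_id = next(_next_id)
    nodes[node_id] = dict(props)
    return node_id

def merge(old_ids, **props):
    """Create the merged node, link every old node to it, flag the old nodes."""
    new_id = add_node(**{"status": "active", **props})
    for old_id in old_ids:
        edges.append((old_id, "merged_into", new_id))
        # The old node keeps all its other properties; only the status changes.
        nodes[old_id]["status"] = "removed"
    return new_id

a = add_node(name="J. Smith", email="js@example.com", status="active")
b = add_node(name="John Smith", phone="555-0100", status="active")
m = merge([a, b], name="John Smith", email="js@example.com", phone="555-0100")
```

After the merge, both old nodes still exist with their original properties, which is what would make an undo possible.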
Now we have two options:
1. We copy all the existing edges from the merged nodes to the new node.
2. We don't.
Option 2 gives a simpler data structure, but it makes all our queries much more complex, because we have to travel back through potentially multiple levels of merged nodes to fetch all the edges.
Option 1 would keep the queries the same, but would introduce a lot of extra edges.
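As a rough illustration of what Option 1 involves, here is a sketch in Python over a plain list of (source, label, target) tuples, not the real database API. The function name copy_edges is hypothetical, and two details are our own assumptions rather than settled design: identical copies are de-duplicated, and an edge between two of the merged nodes is dropped instead of becoming a self-loop.

```python
def copy_edges(edges, old_ids, new_id):
    """Option 1: give the new node a copy of every ordinary edge of the old nodes."""
    old_ids = set(old_ids)
    copies = set()
    for src, label, dst in edges:
        # Skip the merge bookkeeping edges and edges unrelated to the old nodes.
        if label == "merged_into" or not ({src, dst} & old_ids):
            continue
        new_src = new_id if src in old_ids else src
        new_dst = new_id if dst in old_ids else dst
        if new_src != new_dst:   # assumption: no self-loops on the merged node
            copies.add((new_src, label, new_dst))
    return sorted(copies)

edges = [
    ("a", "attended_class", "c1"),
    ("b", "attended_class", "c1"),
    ("b", "attended_class", "c2"),
    ("a", "merged_into", "m"),
    ("b", "merged_into", "m"),
]
copies = copy_edges(edges, ["a", "b"], "m")
# copies == [("m", "attended_class", "c1"), ("m", "attended_class", "c2")]
```

The originals stay on the old nodes so the merge can be undone, which means every merge level stores its edges again. That is the extra edge volume we are worried about.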
We are also considering a third option: creating a copy of the full database with all the merged nodes collapsed, i.e. just a view of the current contacts. This would need to be kept in sync with the main database.
We would appreciate any advice or suggestions on the best way to handle this.
I'd also like to suggest a new "collapse" query feature, which would enable Option 2 to work more easily. Something like this:
select out("attended_class") collapse("merged_into") from 10#12
which would collapse the specified edges until there are no further outbound "merged_into" edges, and thus retrieve all the edges attached to the previous (pre-merged) nodes.
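To spell out the semantics I have in mind, here is a small Python sketch over a list of (source, label, target) tuples. It is only my reading of the proposed feature, not an existing query function: starting from one node, gather every node that was merged into it, directly or through several levels, and then return the targets of the requested edge label for the whole group. The name collapsed_out and the sample ids are hypothetical.

```python
from collections import deque

def collapsed_out(edges, start, label, collapse_label="merged_into"):
    """Targets of `label` edges leaving `start` or any node merged into it."""
    # Index the merge edges: merged node -> the nodes that were merged into it.
    merged_from = {}
    for src, lbl, dst in edges:
        if lbl == collapse_label:
            merged_from.setdefault(dst, []).append(src)

    # Breadth-first walk back through all levels of pre-merge nodes.
    group, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for prev in merged_from.get(node, []):
            if prev not in group:
                group.add(prev)
                queue.append(prev)

    return sorted({dst for src, lbl, dst in edges if lbl == label and src in group})

edges = [
    ("a", "attended_class", "c1"),
    ("b", "attended_class", "c2"),
    ("a", "merged_into", "m"),
    ("b", "merged_into", "m"),
    ("d", "attended_class", "c3"),
    ("m", "merged_into", "n"),
    ("d", "merged_into", "n"),
    ("n", "attended_class", "c4"),
]
classes = collapsed_out(edges, "n", "attended_class")
# classes == ["c1", "c2", "c3", "c4"]
```

Here "n" is the result of two merge levels, and the query still sees the classes attached to "a", "b" and "d" without any edges having been copied.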
* To keep things *simple*, we won't allow the unmerge operation after any edges have been defined on the new node.
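A sketch of how that restriction could be enforced, again in Python on an in-memory graph and not the real database API. It assumes Option 2 (no copied edges on the new node), that old nodes go back to a hypothetical "active" status, and that a node which has itself been merged further also cannot be unmerged. The function name unmerge is made up.

```python
def unmerge(nodes, edges, new_id):
    """Undo a merge, but only while the merged node has no edges of its own."""
    for src, label, dst in edges:
        inbound_merge = label == "merged_into" and dst == new_id
        if new_id in (src, dst) and not inbound_merge:
            raise ValueError(f"cannot unmerge {new_id!r}: it already has edges")

    old_ids = [src for src, label, dst in edges
               if label == "merged_into" and dst == new_id]
    for old_id in old_ids:
        nodes[old_id]["status"] = "active"   # assumption: previous status was "active"
    # Remove the merge edges and the merged node; everything else is untouched.
    edges[:] = [e for e in edges
                if not (e[1] == "merged_into" and e[2] == new_id)]
    del nodes[new_id]
    return old_ids

nodes = {"a": {"status": "removed"}, "b": {"status": "removed"}, "m": {"status": "active"}}
edges = [("a", "attended_class", "c1"), ("a", "merged_into", "m"), ("b", "merged_into", "m")]
restored = unmerge(nodes, edges, "m")
# restored == ["a", "b"]; only ("a", "attended_class", "c1") is left in edges
```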
Kind Regards
Swami Kevala