Vitalije's new version of c.deletePositionsInList is likely correct. Vitalije thinks it is, and he has backed up his belief with a strong testing framework. That's good enough for me. I have closed
#1539, committed Vitalije's code to the 6.2 and devel branches. The new code will be part of Leo 6.2.
In this Engineering Notebook post, I shall prove that the new code is correct. Neither testing nor assertions that code is "obviously" correct are completely convincing.
What is to be provedVitalije and I both agree that all changes to Leo's outline data must ultimately change vnodes. There is nothing else to change! Indeed, positions are only views on the "underlying" data.
It is also clear (obvious :-) that the data to be changed are v.parents and v.children for v in some set of vnodes. Again, there is nothing else to change.
We must show that the particular set of changed vnodes in the new code is the correct set of vnodes, and that the changes made to that set are correct. This is far from obvious at first glance.
As a commentary on the proof, I'll want to explain how the code handles the following situations:
- The list of positions passed to c.deletePositionsInList contains duplicate positions.
- The list of positions contains positions that (depending on order) may already have been deleted.
An intuitive proof should aid the commentary.
The new codeHere is Vitalije's version of c.deletePositionsInList, with comments and other details omitted:
def deletePositionsInList(self, aList):
c = self
def p2link(p):
parent_v = p.stack[-1][0] if p.stack else c.hiddenRootNode
return p._childIndex, parent_v
links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0])
# The main loop.
for i, v in links_to_be_cut:
child_v = v.children.pop(i)
child_v.parents.remove(v)
AssumptionsWe may assume that aList is indeed a list of positions. p2link will throw an exception otherwise.
We may also assume that all positions in aList are valid. c.deletePositionsInList should check for this.
The proofTo begin the proof, note that last three lines, the main loop, have the expected "form". That is, when we delete a node, we will:
1. Delete v.children[i] for some v and i. This deleted vnode is child_v.
2. Delete child_v from its parents array.
v.parents is an unordered array, so child_v.parents.remove(v) is all that is needed. This line might crash if one of the positions is invalid. It will also crash if v is not present in child_v.parents. I'll discuss this later.
To complete the proof, we must show that the main loop deletes all and only those vnodes implied by the positions in aList. We can tentatively assume that the main loop will work properly if that data in links_to_be_cut is correct, but there some subtleties lurking here which I'll discuss later.
The following code creates links_to_be_cut:
def p2link(p):
parent_v = p.stack[-1][0] if p.stack else c.hiddenRootNode
return p._childIndex, parent_v
links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0])
This is elegant (compressed) code. Let's examine it carefully...
p2link produces tuples (childIndex, parent_v). These become bound to i, v in the main loop:
for i, v in links_to_be_cut:
child_v = v.children.pop(i)
child_v.parents.remove(v)
So parent_v must be the vnode whose i'th child should be deleted. This line does that:
parent_v = p.stack[-1][0] if p.stack else c.hiddenRootNode
That's correct. p.stack entries have the form (v, childIndex). p.stack[-1] is the position p's immediate parent vnode, if p has a parent. Otherwise, c.hiddenRootNode is the proper parent vnode.
p._childIndex is also the correct value for i.
In short,
deleting position p means executing the main loop for (i, v) as returned by p2link(p).
Ordering and filteringThe code does not use p2link directly. Instead, there is this line:
links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0]
Let's break this into parts, from "inside" to "outside"
- map(p2link, aList) applies p2link to every position in aList. The result is a list of tuples (i, v). This list may contain duplicates.
- set(map(p2link, aList) removes those duplicates. Let theSet be set(map(p2link, aList).
- Finally, sorted(theSet, key=lambda x:-x[0]) creates a generator equivalent to a sorted list.
The generator will deliver the tuples (i, v) in descending order of i, when several tuples (i, v) have the same vnode. The generator does not order tuples on v, and there appears to be no need to do so. I'll say a few words about this below.
It's easy to understand the intent of this sort order. The main loop contains this line:
child_v = v.children.pop(i)
The sort order means that pop(i) happens before pop(j) if i > j. This ensures that the precomputed values of i won't change in the main loop.
Completing the proofWe can see why Vitalije asserts that the code is sound. The tuples (i, v) computed from p2link(p) correctly describe the work to be done. Filtering and ordering ensure that the work is done once, and in such a way that the child indices i will not change during the main loop.
The (i, v) tuples are sorted on i, but not v. aList can contain positions in any order, so we know only that for any particular vnode v the main loop will handle tuples (i1, v), (i2, v) in the correct order.
There only remains one more question to answer. Does the order in which the main loop handles vnodes matter? That is, does it matter if a tuple (i, v) describes a position that the main loop has already deleted?
This was, and still is, the source of my uneasiness with this algorithm. However, I'm pretty sure there is no actual problem, for the following reasons:
1. Because of undo, vnodes never get deleted. They only get unlinked.
2. I can foresee no circumstance in which the order in which particular vnodes get unlinked can matter.
In particular, the main loop doesn't actually care whether a vnode has already been unlinked. The final question is whether these lines in the main loop could ever fail:
child_v = v.children.pop(i)
child_v.parents.remove(v)
Again, the answer is no. The sorting and filtering phase ensures that (i, v) are unique. The only cause for concern is the second line. Is v guaranteed to be in child_v.parents? Yes, because v.children[i] must exist.
This concludes the proof.
DiscussionThis is very clever code. You could say that the line:
links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0])
forms the basis of an inductive proof of correctness. The algorithm is sound because the tuples (i, v) contain no "harmful" duplicates, and that for any particular v, the set of tuples (i, v) appear in descending order of i.
It should be straightforward to make this code undoable. The undo code would reverse of the main loop, handling the tuples in ascending order of i. The redo code would be much like the original main loop.
SummaryI have provided a careful proof of correctness. This proof is by no means obvious. I welcome all comments and corrections.
The proof shows that vnodes can be handled in any order, provided only that for any particular v, the tuples (i, v) must be handled in ascending order of i. This resolves the question of whether deleting already deleted positions is valid.
It should be straightforward to make c.deletePositionsInList undoable. c.deletePositionsInList should ensure that all positions in aList are valid.
Vitalije has done us all a great service in fixing c.deletePositionsInList.
Edward