ENB: Proof that c.deletePositionsInList is sound

48 views
Skip to first unread message

Edward K. Ream

unread,
Mar 18, 2020, 8:13:58 AM3/18/20
to leo-e...@googlegroups.com
Vitalije's new version of c.deletePositionsInList is likely correct. Vitalije thinks it is, and he has backed up his belief with a strong testing framework. That's good enough for me. I have closed #1539, committed Vitalije's code to the 6.2 and devel branches. The new code will be part of Leo 6.2.

In this Engineering Notebook post, I shall prove that the new code is correct. Neither testing nor assertions that code is "obviously" correct are completely convincing.

What is to be proved

Vitalije and I both agree that all changes to Leo's outline data must ultimately change vnodes. There is nothing else to change! Indeed, positions are only views on the "underlying" data.

It is also clear (obvious :-) that the data to be changed are v.parents and v.children for v in some set of vnodes. Again, there is nothing else to change.

We must show that the particular set of changed vnodes in the new code is the correct set of vnodes, and that the changes made to that set are correct. This is far from obvious at first glance.

As a commentary on the proof, I'll want to explain how the code handles the following situations:

- The list of positions passed to c.deletePositionsInList contains duplicate positions.
- The list of positions contains positions that (depending on order) may already have been deleted.

An intuitive proof should aid the commentary.

The new code

Here is Vitalije's version of c.deletePositionsInList, with comments and other details omitted:

def deletePositionsInList(self, aList):
    c
= self

   
def p2link(p):
        parent_v
= p.stack[-1][0] if p.stack else c.hiddenRootNode
       
return p._childIndex, parent_v

    links_to_be_cut
= sorted(set(map(p2link, aList)), key=lambda x:-x[0])

   
# The main loop.
   
for i, v in links_to_be_cut:
        child_v
= v.children.pop(i)
        child_v
.parents.remove(v)

Assumptions

We may assume that aList is indeed a list of positions. p2link will throw an exception otherwise.

We may also assume that all positions in aList are valid. c.deletePositionsInList should check for this.

The proof

To begin the proof, note that last three lines, the main loop, have the expected "form". That is, when we delete a node, we will:

1. Delete v.children[i] for some v and i. This deleted vnode is child_v.

2. Delete child_v from its parents array.

v.parents is an unordered array, so child_v.parents.remove(v) is all that is needed. This line might crash if one of the positions is invalid. It will also crash if v is not present in child_v.parents. I'll discuss this later.

To complete the proof, we must show that the main loop deletes all and only those vnodes implied by the positions in aList. We can tentatively assume that the main loop will work properly if that data in links_to_be_cut is correct, but there some subtleties lurking here which I'll discuss later.

The following code creates links_to_be_cut:

def p2link(p):
    parent_v
= p.stack[-1][0] if p.stack else c.hiddenRootNode
   
return p._childIndex, parent_v

links_to_be_cut
= sorted(set(map(p2link, aList)), key=lambda x:-x[0])

This is elegant (compressed) code. Let's examine it carefully...

p2link produces tuples (childIndex, parent_v). These become bound to i, v in the main loop:

for i, v in links_to_be_cut:
    child_v
= v.children.pop(i)
    child_v
.parents.remove(v)

So parent_v must be the vnode whose i'th child should be deleted. This line does that:

parent_v = p.stack[-1][0] if p.stack else c.hiddenRootNode

That's correct. p.stack entries have the form (v, childIndex). p.stack[-1] is the position p's immediate parent vnode, if p has a parent. Otherwise, c.hiddenRootNode is the proper parent vnode.

p._childIndex is also the correct value for i.

In short, deleting position p means executing the main loop for (i, v) as returned by p2link(p).

Ordering and filtering

The code does not use p2link directly. Instead, there is this line:

links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0]

Let's break this into parts, from "inside" to "outside"

- map(p2link, aList) applies p2link to every position in aList. The result is a list of tuples (i, v). This list may contain duplicates.

- set(map(p2link, aList) removes those duplicates. Let theSet be set(map(p2link, aList).

- Finally, sorted(theSet, key=lambda x:-x[0]) creates a generator equivalent to a sorted list.

The generator will deliver the tuples (i, v) in descending order of i, when several tuples (i, v) have the same vnode. The generator does not order tuples on v, and there appears to be no need to do so. I'll say a few words about this below.

It's easy to understand the intent of this sort order. The main loop contains this line:

child_v = v.children.pop(i)

The sort order means that pop(i) happens before pop(j) if i > j. This ensures that the precomputed values of i won't change in the main loop.

Completing the proof

We can see why Vitalije asserts that the code is sound. The tuples (i, v) computed from p2link(p) correctly describe the work to be done. Filtering and ordering ensure that the work is done once, and in such a way that the child indices i will not change during the main loop.

The (i, v) tuples are sorted on i, but not v. aList can contain positions in any order, so we know only that for any particular vnode v the main loop will handle tuples (i1, v), (i2, v) in the correct order.

There only remains one more question to answer. Does the order in which the main loop handles vnodes matter? That is, does it matter if a tuple (i, v) describes a position that the main loop has already deleted?

This was, and still is, the source of my uneasiness with this algorithm. However, I'm pretty sure there is no actual problem, for the following reasons:

1. Because of undo, vnodes never get deleted. They only get unlinked.

2. I can foresee no circumstance in which the order in which particular vnodes get unlinked can matter.

In particular, the main loop doesn't actually care whether a vnode has already been unlinked. The final question is whether these lines in the main loop could ever fail:

child_v = v.children.pop(i)
child_v
.parents.remove(v)

Again, the answer is no. The sorting and filtering phase ensures that (i, v) are unique. The only cause for concern is the second line. Is v guaranteed to be in child_v.parents? Yes, because v.children[i] must exist.

This concludes the proof.

Discussion

This is very clever code. You could say that the line:

links_to_be_cut = sorted(set(map(p2link, aList)), key=lambda x:-x[0])

forms the basis of an inductive proof of correctness. The algorithm is sound because the tuples (i, v) contain no "harmful" duplicates, and that for any particular v, the set of tuples (i, v) appear in descending order of i.

It should be straightforward to make this code undoable. The undo code would reverse of the main loop, handling the tuples in ascending order of i. The redo code would be much like the original main loop.

Summary

I have provided a careful proof of correctness. This proof is by no means obvious. I welcome all comments and corrections.

The proof shows that vnodes can be handled in any order, provided only that for any particular v, the tuples (i, v) must be handled in ascending order of i. This resolves the question of whether deleting already deleted positions is valid.

It should be straightforward to make c.deletePositionsInList undoable. c.deletePositionsInList should ensure that all positions in aList are valid.

Vitalije has done us all a great service in fixing c.deletePositionsInList.

Edward

vitalije

unread,
Mar 18, 2020, 10:12:15 AM3/18/20
to leo-editor
Thanks for the detailed proof. There is one thing I would like to say about making tree operations undoable. This might be off the topic, but since you mentioned making this operation undoable, I wish to share some thoughts about undo/redo code.

At present Leo keeps instances of vnodes in its undo data records. This is not a very good idea (IMHO). Undo data should be immutable, but it is not. If we delete some clone, the node is recorded in the undo data record. Later we change the other clone in the outline and it is also changed in the recorded undo data. It would be much better to record (gnx, h, b) instead of vnode instances. The v.children and v.parents can be recorded as a str

 '|'.join(x.gnx for x in v.children) # or v.parents

That way undo data will be immutable and more resilient. It could be saved, transferred over the net, tested, examined, ... When redoing operations it is easy enough to turn those gnx values to real nodes. If it doesn't exists anymore, we can create new one with the same gnx. If it exists it can be reused. When using .db format, Leo stores complete outline data in sqlite3 database using this technique. All the information necessary to recreate the outline can be stored as a bunch of simple immutable values.

The history_tracer plugin does this. It creates text representation of the complete outline and sends it to the local server which stores snapshots in the database. Before storing a snapshot it calculates end encodes changes in a very compact text form which is stored in the database. This allows keeping thousands of versions of a very large outline in a database which is maybe just a few Mb larger than the original content. For example LeoPy.leo.history on my computer contains 10222 recorded versions of LeoPy.leo. One complete snapshot of LeoPy.leo is about 5Mb. The database file is 8Mb.

While the saving a snapshot of the whole outline each time user presses a key is not very practical, especially if the outline is large, saving snapshot of one part of the outline is certainly doable. On my new computer (which is very fast) creating the snapshot of the complete LeoPy.leo takes about 20-25ms. On the slower machines it might take even 100-150ms. But creating the snapshot of the subtree with 50 nodes should not take more than 10-15ms on any computer. It is not often the case that user changes outline by more than that. So, most of the operations can have snapshot of all necessary data to completely restore the selected part of the outline within 15ms. And snapshot (in this case) is a string, but can also be a tuple of immutable values or something similar immutable.

I know that the undo/redo code is spread all over the Leo's code base, and changing it is a huge and risky task. But if you are making new operations undoable, consider using only immutable data. In the case of deletePositionsInList, I would suggest keeping track of (index, gnx) pairs, not (index, v) pairs. The vnode won't disappear during the execution of the deletePositionsInList method, but it might disappear by the time undo is required.

The rust compiler would not allow storing the references of vnode instances in the undo records and allowing vnode to change. One can have either any number of references with just a read access (i.e. which can't mutate the instance), or just a one reference with the write access (which can mutate instance). This principle is found to be a great help in preventing the bugs. So, it would be wise to follow it in Python too.

Vitalije

Edward K. Ream

unread,
Mar 18, 2020, 10:21:01 AM3/18/20
to leo-editor
On Wed, Mar 18, 2020 at 9:12 AM vitalije <vita...@gmail.com> wrote:
Thanks for the detailed proof. There is one thing I would like to say about making tree operations undoable. This might be off the topic, but since you mentioned making this operation undoable, I wish to share some thoughts about undo/redo code.

It's completely on topic. I have just created #1541 for this. It should be done in Leo 6.3.

At present Leo keeps instances of vnodes in its undo data records.

It keeps references to vnodes in undo records. The references are immutable. The vnodes themselves are mutable.

I'm willing to consider saving (gnx, h, b) in undo data, but the present scheme preserves id(v) as well as v.gnx.

Edward

vitalije

unread,
Mar 18, 2020, 10:21:09 AM3/18/20
to leo-editor
I've just saw in the 6e31d4b that the implementation is changed, but the theory of operation remained. It can lead to some confusion later. Perhaps you might put there your proof as a new theory of operation.

Vitalije

Edward K. Ream

unread,
Mar 18, 2020, 10:24:13 AM3/18/20
to leo-editor
On Wed, Mar 18, 2020 at 9:21 AM vitalije <vita...@gmail.com> wrote:

I've just saw in the 6e31d4b that the implementation is changed, but the theory of operation remained. It can lead to some confusion later. Perhaps you might put there your proof as a new theory of operation.

Good idea. I'll also add a precheck that all positions are valid. It will happen later today.

Edward

Edward K. Ream

unread,
Mar 18, 2020, 10:33:01 AM3/18/20
to leo-editor
On Wednesday, March 18, 2020 at 9:21:01 AM UTC-5, Edward K. Ream wrote:

> I'm willing to consider saving (gnx, h, b) in undo data, but the present scheme preserves id(v) as well as v.gnx.

On second thought, I'm not willing to consider such a scheme. The present scheme preserves all aspects of v, including editing position, uA, and anything else that, say, a plugin might add.

The references ensure the that GC will never delete the vnode. This is exactly what I intend.

Edward

vitalije

unread,
Mar 18, 2020, 11:39:16 AM3/18/20
to leo-editor


On second thought, I'm not willing to consider such a scheme. The present scheme preserves all aspects of v, including editing position, uA, and anything else that, say, a plugin might add.

The references ensure the that GC will never delete the vnode. This is exactly what I intend.


Not that I insist on anything but both of this two arguments are irrelevant. The reference of vnode exists in the children and parents lists so there is no need to keep another one to prevent GC. And undo/redo machinery doesn't take into account uAs at all. So if a plugin wishes to make some uA attribute change undoable it has to implement its own undo/redo. Therefore nothing important is gained by keeping the vnode reference in the undo stack.

You may say nothing important will be gained by keeping (gnx, h, b, gnxchildren, gnxparents) instead of vnodes. It might even seem a little more work to do. But on the other hand the mutability of undo data is a potential source of hard to find bugs. On several places in Leo's code c.fileCommands.gnxDict is changed. If a vnode with the same gnx is ever created after one has been stored in the undo stack, we would have an invalid situation of two different instances with the same gnx.

By the way, #1541 is done at ba21c2f2. And I did it storing only gnx-es and ints. The deletePositionsInList now returns undo data, and c.undoableDeletePositions uses this data to make operation undoable.

Vitalije

Edward K. Ream

unread,
Mar 18, 2020, 4:30:28 PM3/18/20
to leo-editor
On Wed, Mar 18, 2020 at 10:39 AM vitalije <vita...@gmail.com> wrote:

> The reference of vnode exists in the children and parents lists so there is no need to keep another one to prevent GC.

Not if the vnode has been deleted.
> And undo/redo machinery doesn't take into account uAs at all.

undo/redo just restores what was changed.

Edward
Reply all
Reply to author
Forward
0 new messages