How expensive is p.setDirty and can it be cheaper

17 views
Skip to first unread message

vitalije

unread,
Feb 24, 2018, 7:21:13 AM2/24/18
to leo-editor
Yesterday I worked on fixing issue #722 (replace-all can not be saved with crtl + s), I have found that culprit was inside replace-all command caused by avoiding to use p.setDirty() because of its cost, and using rather p.v.setDirty().

The difference between the two is that p.setDirty, traverses parents and calls setDirty on them also, while p.v.setDirty just sets dirty bit in p.v

At first, it seemed to me that really this propagation of setting dirty bit to all parents and grandparents must be too expensive to be used inside replace-all loop. That is why I wrote just one loop after replace-all is finished to set dirty flags to all nodes that are affected by replace-all command.

Today I have looked in p.setDirty and I've found that it is overly complicated and as one might expect accordingly expensive. I don't know if this method has other duties to perform, but if it is just setting dirty bit to all nodes that have in their subtree p.v, then it strikes me as odd that this method should be so expensive.

I have made small experiment script that counts how many nodes should be affected if any given vnode is setDirty, and shows maximum, and average value of this count for all vnodes in my copy of LeoPyRef.leo.

Maximum was 35, and average is 7.1. Now, flipping one bit in 35 nodes can't be that expensive. Here is a function that propagates dirty flag to all ancestors in the outline of given vnode.
def mkDirty(v):
    v
.setDirty()
   
for v1 in v.parents:
        mkDirty
(v1)

Measuring this function on the node with most ancestor nodes   UNL:(#Notes-->@file ../doc/leoNotes.txt-->Test code: do not delete-->Unused unit tests-->@ignore mini tests-->Other tests-->Printing tests...-->Print iterations: do not delete-->b-->c-->c2-->c3), using timeit module gives the following results:

calls[n=100000]  avg:0.033ms


What would be the consequences of replacing Position.setDirty method with the above function? Does that method have some other responsibilities that I am not aware of? Perhaps those responsibilities can be divided in several methods and this one responsibility delegated to new method (for example: propagateDirty)?

Vitalije

Edward K. Ream

unread,
Feb 24, 2018, 7:48:37 AM2/24/18
to leo-editor
On Sat, Feb 24, 2018 at 6:21 AM, vitalije <vita...@gmail.com> wrote:

Today I have looked in p.setDirty and I've found that it is overly complicated and as one might expect accordingly expensive.

​Have you considered clones?

Edward

vitalije

unread,
Feb 24, 2018, 7:58:43 AM2/24/18
to leo-e...@googlegroups.com
More strange things...
here is the declaration of p.setAllAncestorsAtFileNodesDirty
# module leoNodes
# class Position
def setAllAncestorAtFileNodesDirty(self, setDescendentsDirty=False)

This method is called in about 30-50 places inside LeoPyRef.leo. In all those places but two, call is made without specifying setDescendentsDirty. In one place this argument is specified as False, and in one place it is specified as True.

The only place it is specified as var is in leoNodes.Position.setDirty which just propagates its own default argument with the same name but with the default set to True.

Looking through places where p.setDirty is called I have found this one:
# module leo.core.commanderOutlineCommands
# command 'insert-node-before'@g.commander_command('insert-node-before')

@g.commander_command('insert-node-before')
def insertHeadlineBefore(self, event=None):
   
'''Insert a node before the presently selected node.'''
    c
, current, u = self, self.p, self.undoer
    op_name
= 'Insert Node Before'
   
if not current: return
   
# Can not insert before the base of a hoist.
   
if c.hoistStack and current == c.hoistStack[-1].p:
        g
.warning('can not insert a node before the base of a hoist')
       
return
    c
.endEditing()
    undoData
= u.beforeInsertNode(current)
    p
= current.insertBefore()
    g
.doHook('create-node', c=c, p=p)
    p
.setDirty(setDescendentsDirty=False)
    dirtyVnodeList
= p.setAllAncestorAtFileNodesDirty()
    c
.setChanged(True)
    u
.afterInsertNode(p, op_name, undoData, dirtyVnodeList=dirtyVnodeList)
    c
.redrawAndEdit(p, selectAll=True)
   
return p

Pay attention on two consecutive lines:
p.setDirty(setDescendentsDirty=False)
dirtyVnodeList
= p.setAllAncestorAtFileNodesDirty()

p.setAllAncestorAtFileNodesDirty has just been called inside p.setDirty, and the very next call is to repeat what has already been done.

No wander that p.setDirty is expensive.

 

vitalije

unread,
Feb 24, 2018, 8:00:29 AM2/24/18
to leo-e...@googlegroups.com


​Have you considered clones?

Edward
Of course  I have. traversing v.parents reaches all the places where this vnode is. VNodes that are not clones have just one parent in their v.parents list.
In other words if I wished to disregard the clones I would use 
def mkDirty(v):
    v.setDirty()
    mkDirty(v.parents[0])


Vitalije

vitalije

unread,
Feb 24, 2018, 8:31:46 AM2/24/18
to leo-editor
It seems that p.setDirty propagates dirty status not only to the ancestors but also to the all descendant nodes as well, nodes that are roots of external files (v.isAnyAtFileNode()). At first I thought this was unnecessary, but then I realized this is probably to allow that change in the ancestor can change resulting path of the descendant file nodes which would require those nodes to be written on the next save operation.

If this is the reason to propagate dirty status down the tree, then perhaps it can be skipped if this node doesn't contain at-path or any other directive that might cause need for writing descendant file nodes. That would be almost always the case.

Changing for example top level node "Code" in LeoPyRef.leo causes more than 160 files to be set dirty on the next save command, which seems to me wrong.
Vitalije

Edward K. Ream

unread,
Feb 24, 2018, 9:16:37 AM2/24/18
to leo-editor
On Sat, Feb 24, 2018 at 7:31 AM, vitalije <vita...@gmail.com> wrote:

It seems that p.setDirty propagates dirty status not only to the ancestors but also to the all descendant nodes as well, nodes that are roots of external files (v.isAnyAtFileNode()). At first I thought this was unnecessary, but then I realized this is probably to allow that change in the ancestor can change resulting path of the descendant file nodes which would require those nodes to be written on the next save operation.

​Yes.​
 

If this is the reason to propagate dirty status down the tree, then perhaps it can be skipped if this node doesn't contain at-path or any other directive that might cause need for writing descendant file nodes. That would be almost always the case.

​I have forgotten all details of the code and its motivations.​
 

Changing for example top level node "Code" in LeoPyRef.leo causes more than 160 files to be set dirty on the next save command, which seems to me wrong.

​As you say, it's probably safe not to set descendant nodes dirty if the top node contains no directive that affect descendant @<file> nodes.  Since "Code" contains only @language python, you would think all is well.

Alas, it's not enough to look at the present contents of a node.  One must compare this with the previous contents, to ensure, say, that an @path directive has not just been deleted.

​This will be tricky, but there might be a clever way to do it. My concern is that the affected code will be far removed from p.setDirty and its helpers.  I would much rather try to write 160 files, without effect, than to get this wrong, now or in future.

Edward
Reply all
Reply to author
Forward
0 new messages