On Monday, May 15, 2017 at 10:28:49 AM UTC-5, Edward K. Ream wrote:
The way is now open to create this class.
This post describes a collapse in complexity in the LeoMLTree and LeoTreeData classes. This post also gives a very fast caching scheme for redrawing npyscreen outlines.
This post continues the original Engineering Notebook post. It will be of interest only to a few. It will be pre-writing for a post to the npyscreen list.
Two non issues
What I will say later might be taken as a criticism of the npyscreen code base. But any such complaints would be silly. It would be extremely dangerous and unwise to make any non-local changes to the
npyscreen sources. Doing so would almost surely break existing
npyscreen apps. I have felt free to make minor local simplifications to the sources in the leo/external/npyscreen folder, but that's all.
The quality of the npyscreen docs is also a non-issue. Sources, not words,
are the ultimate underlying truth. After brief perusal of the npyscreen
docs, I have only read the npyscreen sources.
Collapsing the LeoMLTree class
I am going to speak plainly. Elsewhere I have described the npyscreen sources as a maze. I have spent days wandering the maze trying to unravel the code. Much of the maze is irritating, useless, C-like, blah, blah blah.
In particular, the base npyscreen code seems deathly afraid of crashing. There is no reason for this fear in Python. During development, crashes help the app's developer find problems. In the end, it's up to the app's developer to eliminate crashes.
Happily, it's possible for subclasses of MLTree, such as LeoMLTree, to cut through this maze without touching MLTree at all. Now, LeoMLTree is dead simple. Here is how I did it:
1. Eliminated weak references. Weak references infect the npyscreen sources like a rash. In fact, there is not the slightest reason to use them in the tree code, and many good reasons not to do so. For starters, weak references complicate debugging. Imo, removing weak references was an essential first step toward navigating the maze.
2. Simplified the all-important values property. The actual drawing code uses what looks like a values array. It took a long time to see that the values property simulates an array whose values may be cached. Here are the over-ridden getter and setter methods, as defined in LeoMLTree:
def _setValues(self, tree):
self._myFullValues = tree or LeoTreeData()
def _getValues(self):
'''
Return the (possibly cached) list returned by self._myFullValues.get_tree_as_list().
Setting _cached_tree to None invalidates the cache.
'''
if getattr(self, '_cached_tree', None):
return self._cached_tree_as_list
else:
self._cached_tree = self._myFullValues
self._cached_tree_as_list = self._myFullValues.get_tree_as_list()
return self._cached_tree_as_list
This code is way simpler than the corresponding code in the base MLTree class. But even this simpler code is faux clever, for at least three, reasons:
1. Code in MLTree (or LeoMLTree) must explicitly disable the cache by setting the _cached_tree ivar to None. This makes the client code aware of a nerdy little helper ivar.
2. Instead of setting _cached_tree to None, the client code might as well refresh the cache directly, say by calling self.refresh_values(). So the values property has no real purpose. To anticipate a bit, Leo will need to refresh/invalidate the cache in exactly one place, namely in its c.frame.tree.redraw method.
3. The values property doesn't even do a very good job of caching. Refreshing the cache creates a TreeData node for every visible node in the tree. This could be a problem for large trees. A better caching scheme, given below, creates TreeData (LeoTreeData) nodes only when actually referenced.
Clever caching
At last I can describe how the big switcheroo will work:
- A values class will replace the values property.
- The class will have a __getitem__ method that will return only the requested LeoTreeData item. Something like this (untested):
def __getitem__(self, n):
return self.get_data(n)
def get_data(self, n)
'''Return a LeoTreeData for the n'th visible position of the outline.'''
c = self.c
# Look for n or n-1 in the caches.
data = self.data_cache.get(n)
if data:
return data
p = self.position_cache.get(max(0,n-1))
if p:
p = p.moveToNextVisible()
if p:
self.position_cache[n] = p
self.data_cache[n] = data = LeoTreeData(p)
return data
else:
return None
# Search the tree, caching the result.
i, p = 0, c.rootPosition()
while p:
if i == n:
self.position_cache[n] = p
self.data_cache[n] = data = LeoTreeData(p)
return data
else:
p.moveToNextVisible()
return None
This is clever code. In practice, it will have close to O(0) running time! Indeed, LeoMLTree._redraw (adapted from MultiLine.update) access lines in numeric order. As a result, a search of the Leo outline
will be needed for only the first displayed line. Thereafter, n-1 will
exist in the position cache, so only a single call to
p.moveToNextVisible() will needed.
Summary
Subclasses of MLTree (and TreeData) can be much simpler than the corresponding base classes. The actual LeoMLTree and LeoTreeData classes contain many more examples than shown here. I hope the actual code will help developers of npyscreen devs navigate the maze.
Significantly better caching of TreeData objects is possible. LeoMLTree.values[i] should be faster than MLTree.values[i], even (especially) for very large Leo outlines. The caching scheme shown above reduces the stress on the GC by greatly reducing the number of allocated LeoTreeData objects.
c.frame.tree.redraw will clear the two caches in a new LeoValues class and then call LeoMLTree.display().
Edward