ENB: acclerating g.findLanguageDirectives

21 views

Skip to first unread message

Edward K. Ream

unread,

Aug 9, 2020, 9:40:54 AM8/9/20

to leo-e...@googlegroups.com

This Engineering Notebook post discusses how to accelerate the speed of g.findLanguageDirectives. When I awoke this morning, I saw that the code I changed yesterday is unnecessarily slow. The new code can search a significant fraction of the entire outline when changing nodes! Note: I haven't noticed any real performance problems yet.

Here, I'll discuss how to avoid this search, using a new way of relating vnodes and positions. Afaik, this is something new in Leo's history, although Vitalije has explored similar ground.

Background

In this post I said: "#1625 expands the search, looking for ancestor @<file> nodes for the other clones." However, this is a brute force search. Here is the new code in g.findLanguageDirectives:

def find_language(p):
    for s in p.h, p.b:
        for m in g_language_pat.finditer(s):
            return m.group(1)
    return None
...
# #1625: Second, expand the search for cloned nodes.
# Similar to g.findRootsWithPredicate.
clones = []
for p in p.self_and_parents(copy=False):
    if p.isCloned() and p.v not in clones:
        clones.append(p.v)
if clones:
    for p in c.all_positions(copy=False):
        if p.isAnyAtFileNode():
            for p2 in p.self_and_subtree():
                if p2.v in clones:
                    language = find_language(p)
                    if language:
                        return language

Accelerating the search

We don't need to search all positions of an outline to find all the @<file> nodes. Instead, we can search using vnodes, as I'll now explain.

Terminology: Let v0 be the initial vnode, that is, c.p.v.

A cloned vnode v is a node with len(v.parents) > 1. This is exactly what v.isCloned() returns.

If v0 is a clone, v0.parents is a list of v0's parent vnodes. That is, v0 appears in parent.children for every parent in v0.parents.

My first thought was to translate vnodes into positions. Indeed, a Position p has three components: p.v, p._childIndex, and p.stack, which is a list of tuples (v, childIndex). As I was thinking about this, I realized:

Aha: for this task, the algorithm does not need child indices!

Instantly the algorithm collapses in complexity.

The search immediately succeeds if v.isAnyAtFileNode() for any v in v0.parents. Otherwise, we extend the search, in no particular order, to parent.parents for every parent vnode in v0.parents. We continue the search until we have found an ancestor @<file> node or until we have looked at all possible ancestors of v0. In practice, this search will be extremely fast.

Summary

Amazingly, after all these years, fundamentally important ideas keep appearing.

A new "parents" branch will contain much faster versions of g.findLanguageDirectives and g.findRootsWithPredicate.

All questions and comments are welcome.

Edward

Edward K. Ream

unread,

Aug 9, 2020, 12:05:20 PM8/9/20

to leo-editor

On Sunday, August 9, 2020 at 8:40:54 AM UTC-5, Edward K. Ream wrote:

> This Engineering Notebook post discusses how to accelerate the speed of g.findLanguageDirectives.

The general idea seems to work. The clones branch some new code. BUT...

Leo's base jEdit colorizer does not use g.findLanguageDirectives directly. See "bjc.scanLanguageDirectives & helpers". There is a substantial (and essential) complication: nodes containing more than one @language directive do not determine the language in effect for their descendants.

In short, the new code will likely involve some refactoring.

Edward

Reply all

Reply to author

Forward

0 new messages