ENB: reusable Leo functions and some experimental results (work on issue 1598)

vitalije

Jun 2, 2020, 11:58:45 AM

to leo-editor

I hope Google will accept the attachment containing the scripts.

I've started implementing some of the proposals discussed earlier in another thread.

First of all, I think Leo should have a single function that builds an outline from some iterable.

import leo.core.leoNodes as leoNodes

def build_tree(c, it):
    '''
    This function builds a tree of vnodes from the
    iterable generating tuples of the following form:

        (parent_gnx, gnx, childIndex, h, b, ua)

    The tuples must be in the outline order.

    Returns the vnode instance that is the root of this tree.
    '''
    gnxDict = c.fileCommands.gnxDict

    def getv(gnx, h, b, ua):
        v = gnxDict.get(gnx)
        if v is None:
            v = leoNodes.VNode(c, gnx)
            v._headString = h
            v._bodyString = b
            if ua:
                v.unknownAttributes = ua
            gnxDict[gnx] = v
        return v

    # accept any iterable (list, generator, ...), not only an iterator
    it = iter(it)

    # root is handled first, before the loop
    parent_gnx, gnx, childIndex, h, b, ua = next(it)
    vparent = gnxDict.get(parent_gnx)
    root = getv(gnx, h, b, ua)
    vparent.children.insert(childIndex, root)
    root.parents.append(vparent)

    # now rest of the tuples
    for parent_gnx, gnx, childIndex, h, b, ua in it:
        vparent = gnxDict.get(parent_gnx)
        v = getv(gnx, h, b, ua)
        vparent.children.insert(childIndex, v)
        v.parents.append(vparent)

    return root
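
To try the same linking logic without a running Leo, here is a minimal sketch. It assumes a hypothetical Node class standing in for Leo's VNode and a plain dict standing in for c.fileCommands.gnxDict; neither is Leo's real API, and build_tree_sketch is only an illustration of the algorithm above.

```python
class Node:
    """Hypothetical stand-in for Leo's VNode (not Leo's real class)."""
    def __init__(self, gnx, h='', b='', ua=None):
        self.gnx, self.h, self.b = gnx, h, b
        self.ua = ua or {}
        self.children = []
        self.parents = []

def build_tree_sketch(nodes, tuples):
    """
    Link nodes described by (parent_gnx, gnx, childIndex, h, b, ua)
    tuples, using `nodes` (a dict: gnx -> Node) as the registry.
    Returns the node built from the first tuple.
    """
    root = None
    for parent_gnx, gnx, child_index, h, b, ua in tuples:
        node = nodes.get(gnx)
        if node is None:
            # first time this gnx is seen: create and register the node
            node = nodes[gnx] = Node(gnx, h, b, ua)
        parent = nodes[parent_gnx]
        parent.children.insert(child_index, node)
        node.parents.append(parent)
        if root is None:
            root = node
    return root

nodes = {'hidden-root': Node('hidden-root')}
tuples = [
    ('hidden-root', 'g1', 0, 'A', 'body A', None),
    ('g1', 'g2', 0, 'B', 'body B', None),
    ('g1', 'g3', 1, 'C', 'body C', None),
    ('g3', 'g2', 0, 'B', 'body B', None),  # g2 again: a clone under g3
]
root = build_tree_sketch(nodes, tuples)
```

In this sketch a clone is simply a gnx that appears in more than one tuple: the node is created once and then linked under each parent.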

For any operation that needs to build a sub-tree, it would be easier to implement just an iterable that yields tuples and then call build_tree. build_tree itself should be tested and proven to work correctly. Then every other command only needs to test its generated tuples, and such a test cannot have any bad effect on Leo's outline.
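
As a rough illustration of testing tuples instead of the outline, the sketch below uses a made-up generator, tuples_from_nested, and a made-up nested input format. Both are assumptions for the example only; the point is that the check is a plain list comparison that never touches a commander.

```python
def tuples_from_nested(parent_gnx, children):
    """
    Hypothetical generator: walks nested (gnx, h, b, kids) records and
    yields (parent_gnx, gnx, childIndex, h, b, ua) in outline order.
    """
    for i, (gnx, h, b, kids) in enumerate(children):
        yield parent_gnx, gnx, i, h, b, None
        yield from tuples_from_nested(gnx, kids)

spec = [
    ('g1', 'A', 'body A', [
        ('g2', 'B', 'body B', []),
        ('g3', 'C', 'body C', []),
    ]),
]
expected = [
    ('root', 'g1', 0, 'A', 'body A', None),
    ('g1', 'g2', 0, 'B', 'body B', None),
    ('g1', 'g3', 1, 'C', 'body C', None),
]
result = list(tuples_from_nested('root', spec))
assert result == expected
```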

I have to go now. So, I'll write more later.

In the attached Leo document there are two python scripts:

makedemos.py - creates a zip file containing 1000 random Leo documents
testdemos.py - tests roundtrip of each of these 1000 random documents. Loads file, builds tree, then generates xml from the outline and compares it with the input.

testdemos.py contains three similar scripts. test_1 uses Leo's c.fileCommands.putLeoFile to generate the xml output. test_2 uses the new functions to generate the xml output, but creates a separate commander for each random file. Finally, test_3 reuses a single commander instance, which is reset for each random file.

Running testdemos.py on my computer gives the following output:

test_1 (  10 files) -  4.891s ........ average:489.07ms
test_2 (1000 files) - 12.833s ........ average:12.83ms
test_3 (1000 files) -  2.216s ........ average: 2.22ms
1/2 ---> 38.11 times faster
1/3 ---> 220.71 times faster

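Comparing the average time per file, a roundtrip xml->outline->xml using the new reusable functions is at least 38 times faster. Note that test_1 was run on only 10 files, while test_2 and test_3 were run on all 1000.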
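
The ratios can be rechecked from the printed totals. This small sketch only redoes the arithmetic; the file counts and total times are copied from the output above, and the variable names are made up for the example.

```python
# (number of files, total seconds) copied from the testdemos.py output
runs = {
    'test_1': (10, 4.891),
    'test_2': (1000, 12.833),
    'test_3': (1000, 2.216),
}

# average time per file, in milliseconds
avg_ms = {name: 1000 * total / n for name, (n, total) in runs.items()}

speedup_2 = avg_ms['test_1'] / avg_ms['test_2']
speedup_3 = avg_ms['test_1'] / avg_ms['test_3']

print(f'1/2 ---> {speedup_2:.2f} times faster')
print(f'1/3 ---> {speedup_3:.2f} times faster')
```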

Vitalije

issue-1598-experiments.leo

vitalije

Jun 2, 2020, 1:24:11 PM

to leo-editor

Here is a function that generates tuples from the xml content:

def nodes_from_leo_xml(contents):
    '''
    Parses contents as xml Leo document and returns
    a generator of the tuples
    
        (parent_gnx, gnx, childIndex, h, b, ua, descendentUas)

    suitable to be piped to the build_tree function.
    '''
    xroot = read_xml(contents)
    v_elements = xroot.find('vnodes')
    t_elements = xroot.find('tnodes')
    bodies, uas = get_bodies_and_uas(t_elements)
    heads = {}
    def viter(parent_gnx, i, xv):
        gnx = xv.attrib.get('t')
        d_uas = xv.attrib.get('descendentVnodeUnknownAttributes')
        d_uas = d_uas and resolve_ua('xxx', d_uas) # key is not important here
        if gnx not in heads:
            # first occurrence: read the headline and descend into children
            h = xv[0].text or ''
            heads[gnx] = h
            yield parent_gnx, gnx, i, h, bodies[gnx], uas[gnx], d_uas
            for j, ch in enumerate(xv[1:]):
                yield from viter(gnx, j, ch)
        else:
            # a clone seen before: its subtree has already been generated
            yield parent_gnx, gnx, i, heads[gnx], bodies[gnx], uas[gnx], d_uas

    for i, xv in enumerate(v_elements):
        yield from viter('hidden-root-vnode-gnx', i, xv)
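
The helpers read_xml, get_bodies_and_uas and resolve_ua live in the attached Leo file and are not shown here. As a self-contained approximation, the sketch below uses the standard xml.etree.ElementTree module on a tiny hand-written document. The sample xml is a simplified assumption about the .leo layout (v elements with a t attribute and a vh child, t elements with a tx attribute), and uA handling is left out, so ua and descendentUas are always None.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample; a real .leo file contains more elements.
LEO_XML = '''<leo_file>
<vnodes>
<v t="g1"><vh>A</vh>
  <v t="g2"><vh>B</vh></v>
  <v t="g3"><vh>C</vh>
    <v t="g2"></v>
  </v>
</v>
</vnodes>
<tnodes>
<t tx="g1">body A</t>
<t tx="g2">body B</t>
<t tx="g3"></t>
</tnodes>
</leo_file>'''

def tuples_from_xml(contents):
    """Yield (parent_gnx, gnx, childIndex, h, b, ua, descendentUas)."""
    xroot = ET.fromstring(contents)
    bodies = {t.get('tx'): t.text or '' for t in xroot.find('tnodes')}
    heads = {}

    def viter(parent_gnx, i, xv):
        gnx = xv.get('t')
        first_time = gnx not in heads
        if first_time:
            heads[gnx] = xv[0].text or ''
        yield parent_gnx, gnx, i, heads[gnx], bodies.get(gnx, ''), None, None
        if first_time:
            # children are generated only for the first occurrence
            for j, ch in enumerate(xv[1:]):
                yield from viter(gnx, j, ch)

    for i, xv in enumerate(xroot.find('vnodes')):
        yield from viter('hidden-root-vnode-gnx', i, xv)

rows = list(tuples_from_xml(LEO_XML))
for row in rows:
    print(row)
```

The last tuple repeats gnx g2 under a different parent, which is how a clone shows up in the tuple stream.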

Now all three paste commands (paste-node, paste-retaining-clones and paste-as-template) can reuse these functions. Each of them will use nodes_from_leo_xml to generate tuples: paste-node will reassign all indices on the fly, paste-retaining-clones will pass the generated tuples through unchanged, and paste-as-template will change some of the indices on the fly.
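
A possible shape for the paste-node case is a small generator that sits between nodes_from_leo_xml and build_tree and swaps every gnx for a fresh one. This is only a sketch: reassign_gnxs and the make_gnx callable are hypothetical names, and how Leo actually allocates new indices is not shown here.

```python
import itertools

def reassign_gnxs(tuples, make_gnx):
    """
    Hypothetical filter for paste-node: gives every node a fresh gnx.
    A gnx that repeats in the input maps to the same new gnx, so clones
    inside the pasted tree stay clones of each other.
    """
    mapping = {}
    for parent_gnx, gnx, i, h, b, ua, d_uas in tuples:
        if gnx not in mapping:
            mapping[gnx] = make_gnx()
        # Parents come earlier in outline order, so they are already mapped;
        # the parent of the pasted root is not in the mapping and is kept.
        yield mapping.get(parent_gnx, parent_gnx), mapping[gnx], i, h, b, ua, d_uas

counter = itertools.count(1)
src = [
    ('hidden-root-vnode-gnx', 'a.1', 0, 'A', '', None, None),
    ('a.1', 'a.2', 0, 'B', '', None, None),
    ('a.1', 'a.3', 1, 'C', '', None, None),
    ('a.3', 'a.2', 0, 'B', '', None, None),
]
out = list(reassign_gnxs(src, lambda: f'new.{next(counter)}'))
```

For paste-retaining-clones the same pipeline would simply skip this filter and pass the tuples through as they are.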

The next thing I am going to do is to write a function for generating tuples from the external @file files. I hope this function can be used inside nodes_from_leo_xml to include tuples from the @file files on the fly. That way we would have a function to fully load a Leo document along with all its external files. The same needs to be done with the importers: create a function which generates tuples from the @auto files. Handling the descendentVnodeUnknownAttributes will be the next goal after all kinds of external files can generate tuples.

This is far from being production ready, but it shows great potential so far.

Vitalije

Thomas Passin

Jun 2, 2020, 2:07:18 PM

to leo-editor

Vitalije, the code you have been showing is very nice indeed. It's short, understandable, and apparently very performant as well. What a pleasure!

I have a question - you say that the nodes_from_leo_xml() function will be used for various paste commands. This way, the xml bits have to be parsed each time the nodes are processed.

I would have expected that xml not be used for internal communications, but only for external interactions.  I realize that mostly the xml parts are short and quick to parse, but still I would lean to using plain text or even python objects.  Are you doing it this way mostly because Leo copies nodes as xml in the first place?

On Tuesday, June 2, 2020 at 1:24:11 PM UTC-4, vitalije wrote:

Here is a function that generates tuples from the xml content:

[snip]

vitalije

Jun 2, 2020, 4:18:16 PM

to leo-editor

I have a question - you say that the nodes_from_leo_xml() function will be used for various paste commands. This way, the xml bits have to be parsed each time the nodes are processed. I would have expected that xml not be used for internal communications, but only for external interactions. I realize that mostly the xml parts are short and quick to parse, but still I would lean to using plain text or even python objects. Are you doing it this way mostly because Leo copies nodes as xml in the first place?

Yes, that is the main reason for using xml. But even if we adopt a new format in the future, or several new formats, it would be enough to write one function per input format that generates tuples and then send those tuples to build_tree. We can treat files with Leo sentinels (@file external files) as just another format describing the outline's shape and content. Parsing the sentinels and decoding the outline's shape and content into a sequence of tuples is all that build_tree needs to create the outline. The reading function for any given format can be tested easily: we just compare the generated tuples with the expected ones. If the generated sequence of tuples is correct, then the outline will be correct too. Many tree operations are very difficult to test because they change a lot of things inside Leo, and creating a new commander for each test is slow. Testing sequences of tuples is fast and easy.
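
To make the "one function per input format" idea concrete, here is a sketch of a reader for an invented format: plain text with one headline per line and two spaces of indentation per level. The format, the function name tuples_from_indented and the generated n1, n2, ... gnxs are all assumptions for the example; the output has the same shape build_tree expects.

```python
import itertools

def tuples_from_indented(text, root_gnx, indent=2):
    """
    Hypothetical reader: one headline per line, `indent` spaces per level.
    Yields (parent_gnx, gnx, childIndex, h, b, ua) in outline order.
    """
    ids = itertools.count(1)
    # stack[level] = [gnx of the open node at that level, children seen so far]
    stack = [[root_gnx, 0]]
    for line in text.splitlines():
        if not line.strip():
            continue
        level = (len(line) - len(line.lstrip(' '))) // indent + 1
        del stack[level:]  # close nodes that are deeper than this line
        parent = stack[-1]
        gnx = f'n{next(ids)}'
        yield parent[0], gnx, parent[1], line.strip(), '', None
        parent[1] += 1
        stack.append([gnx, 0])

text = 'A\n  B\n  C\nD\n'
expected = [
    ('root', 'n1', 0, 'A', '', None),
    ('n1', 'n2', 0, 'B', '', None),
    ('n1', 'n3', 1, 'C', '', None),
    ('root', 'n4', 1, 'D', '', None),
]
result = list(tuples_from_indented(text, 'root'))
assert result == expected
```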

In the attached Leo file, I tried at first to test 1000 round trips xml->outline->xml, but it was too slow when using Leo's current xml writing code. The technique of reusing a single Leo commander instance, whose internals are not disturbed by the tests, makes testing roughly 220 times faster.

Vitalije

Edward K. Ream

Jun 3, 2020, 9:27:05 AM

to leo-editor

On Tue, Jun 2, 2020 at 12:24 PM vitalije <vita...@gmail.com> wrote:

Here is a function that generates tuples from the xml content:

I like what I am seeing so far. Also, I was able to download issue-1598-experiments.leo without problems.

Edward
