ENB: Studying the performance of Vitalije's code

Edward K. Ream

unread,

May 31, 2018, 6:02:47 AM5/31/18

to leo-editor

I spent most of yesterday studying the performance of Vitalije's prototype code.

The only truly important performance metric is how long it takes miniTkLeo.py to load a substantial .leo file. I changed this file so it loads my private leoPy.leo file if no file is given on the command line.

On my machine, it takes 0.6 to 0.7 seconds to load this file and all external files. This performance is why Vitalije and I are excited about the code.

It's not possible to use cProfile directly on miniTkLeo.py because it uses Python's threading and queue modules. Instead, I added profiling code to the loadex function, like this:

def loadex():
    '''The target of threading.Thread.'''
    if 0: # Profile the code.
        cProfile.runctx('loadex_helper()',
            globals(),
            locals(),
            'profile_stats', # 'profile-%s.out' % process_name
        )
        print('===== writing profile_stats')
        p = pstats.Stats('profile_stats')
        p.strip_dirs().sort_stats('tottime').print_stats(50)
            # .print_stats('leoDataModel.py', 50)
    else:
        loadex_helper()
    
def loadex_helper():
    ltm2 = LeoTreeModel.frombytes(ltmbytes)
    loaddir = os.path.dirname(fname)
    loadExternalFiles(ltm2, loaddir)
    G.q.put(ltm2)

With statistics enabled, the load time on my machine is 0.9 seconds, instead of 0.6 to 0.7 seconds.

This code produces the following statistics, edited to show only the highlights:

1. Limited to leoDataModel.py:

TotTime:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 8212    0.267    0.000    0.586    0.000  leoDataModel.py:1233(load_derived_file)
 8212    0.023    0.000    0.611    0.000  leoDataModel.py:1569(viter)
 8047    0.017    0.000    0.024    0.000  leoDataModel.py:1327(set_node)

Calls:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
16691    0.003    0.000    0.006    0.000  leoDataModel.py:37(parPosIter)
 8212    0.268    0.000    0.587    0.000  leoDataModel.py:1233(load_derived_file)
 8212    0.023    0.000    0.612    0.000  leoDataModel.py:1569(viter)
 8047    0.017    0.000    0.023    0.000  leoDataModel.py:1327(set_node)
  971    0.000    0.000    0.001    0.000  leoDataModel.py:293(parents)
806/165  0.001    0.000    0.001    0.000  leoDataModel.py:412(updateParentSize) (in replaceNode)

2. Including all methods:
    
TotTime:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 8212    0.272    0.000    0.594    0.000  leoDataModel.py:1233(load_derived_file)
626060   0.220    0.000    0.220    0.000  {method 'match' of '_sre.SRE_Pattern' objects}
232802   0.036    0.000    0.036    0.000  {method 'startswith' of 'str' objects}
 165     0.021    0.000    0.030    0.000  {method 'read' of '_io.TextIOWrapper' objects}
     
Calls:
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
626060   0.221    0.000    0.221    0.000  {method 'match' of '_sre.SRE_Pattern' objects}
232802   0.036    0.000    0.036    0.000  {method 'startswith' of 'str' objects}
167416   0.014    0.000    0.014    0.000  {method 'append' of 'list' objects}
110420   0.008    0.000    0.008    0.000  {built-in method builtins.len}
95453    0.007    0.000    0.007    0.000  {method 'isspace' of 'str' objects}
35554    0.006    0.000    0.006    0.000  {method 'group' of '_sre.SRE_Match' objects}
16357    0.001    0.000    0.001    0.000  {method 'random' of '_random.Random' objects}
 8906    0.002    0.000    0.002    0.000  {method 'pop' of 'list' objects}
 8562    0.006    0.000    0.006    0.000  {method 'join' of 'str' objects}
 2204    0.000    0.000    0.000    0.000  {built-in method builtins.isinstance}

This is remarkable. To the first approximation, only load_derived_file matters. None of the helper functions/generators contribute any substantial time to the overall load time.

Summary

load_derived_file is incredibly fast. When loading .leo files, only it's performance matters.

For this function only, the speed of attribute access may be crucial. Converting section references to functions in load_derived_file may slow the code by changing local refs to nonlocal refs.

Edward

Edward K. Ream

unread,

May 31, 2018, 6:53:13 AM5/31/18

to leo-editor

On Thursday, May 31, 2018 at 5:02:47 AM UTC-5, Edward K. Ream wrote:

Summary

load_derived_file is incredibly fast. When loading .leo files, only it's performance matters.

There is a dog that isn't barking: namely compatibility with Leo's existing vnode and position classes.

A final verdict on performance must wait until compatibility with existing code is assured.

Having said that, I am fairly confident that the prototype code can eventually be made to generate Leo's vnodes.

Edward

vitalije

unread,

May 31, 2018, 8:07:07 AM5/31/18

to leo-editor

Thanks for the thorough investigation of the prototype.

The only truly important performance metric is how long it takes miniTkLeo.py to load a substantial .leo file. I changed this file so it loads my private leoPy.leo file if no file is given on the command line.

Well I think this is just one. The other maybe even more important is how fast tree drawing is. For example if you open in miniTkLeo LeoPyRef.leo and then select the first top-level node and then press and hold left arrow on the keyboard it will traverse all nodes in the outline expanding if necessary. There is no noticeable slow down in traversing speed until it expands all nodes and comes to an end.

There is no such command in Leo at the present that can be connected to single key to compare Leo's behavior but here is a script to test it.

from leo.core.leoQt import QtCore
import datetime
def traverse_tree(t1):
    p = c.p
    def move_next():
        if p.hasChildren() and not p.isExpanded():
            p.expand()
        p.moveToThreadNext()
        if p:
            c.selectPosition(p)
            c.redraw(p)
            QtCore.QTimer.singleShot(0, move_next)
        else:
            t2 = datetime.datetime.utcnow()
            dt = t2 - t1
            t = dt.seconds + dt.microseconds / 1e6
            g.es('Finished after %fs'%t)
    move_next()
c.selectPosition(c.rootPosition())
c.contractAllHeadlines()
c.redraw()
t1 = datetime.datetime.utcnow()
traverse_tree(t1)

And I here is a similar test for miniTkLeo

def traverse_speed(x):
    import datetime
    ltm.selectedPosition = ltm.positions[1]
    ltm.expanded.clear()
    ltm.invalidate_visual()
    draw_tree(tree, ltm)
    def tf(i):
        alt_right(None)
        if i < len(ltm.positions):
            tree.after_idle(tf, i + 1)
        else:
            dt = datetime.datetime.utcnow() - t1
            t = dt.seconds + 1e-6*dt.microseconds
            G.log.insert('end -1 ch', 'Finished in %fs\n'%t)
    t1 = datetime.datetime.utcnow()
    tf(1)

G.app.bind_all('<Shift-F10>', traverse_speed)

Pressing once Shift-F10 in miniTkLeo, with the outline of 269 nodes, in miniTkLeo on my machine gives:

Finished in 1.067294s

Opening the same outline in Leo and executing previous script gives:

Finished after 41.404681s

Testing miiTkLeo with LeoPyRef.leo (which has more than 8000 nodes), give:

Finished in 98.063228s

I didn't dare to try it in Leo but I guess it would be huge. Just executing once expand-all took about 59s. Expanding one node at a time is even slower. So, traversing node by node, would take much more time.

For regular user I guess it is even more important to have speed tree browsing than to have fast read/write code which of course is also important.

Vitalije

Edward K. Ream

unread,

May 31, 2018, 11:48:37 AM5/31/18

to leo-editor

On Thu, May 31, 2018 at 7:07 AM, vitalije <vita...@gmail.com> wrote:

Thanks for the thorough investigation of the prototype.

You're welcome. This is a ton of fun.

The only truly important performance metric is how long it takes miniTkLeo.py to load a substantial .leo file. I changed this file so it loads my private leoPy.leo file if no file is given on the command line.

Well I think this is just one. The other maybe even more important is how fast tree drawing is.

Thanks for pointing this out. I'll take a closer look.

Edward

Edward K. Ream

unread,

May 31, 2018, 1:22:00 PM5/31/18

to leo-e...@googlegroups.com

On Thursday, May 31, 2018 at 5:02:47 AM UTC-5, Edward K. Ream wrote:

load_derived_file is incredibly fast.

The main loop, << iterate all other lines >> might be made significantly faster by short-circuiting most tests at the start of the loop.

if not special and not patterns.union.match(line):
    << handle indentation, and add the line >>

"special" would be a new flag that is true when the scanner is in a special state, like verbatim or in_doc. Updating the special flag should be easy.

patterns.union would be the union of all the existing regex patterns. Furthermore, << init the grand iteration >> should set x = patterns.x for all x to avoid extra references

Some patterns are compiled with re.DOT_ALL, and others aren't, so we actually need two "summary" patterns. The quick test would be:

if not(special or union.match(line) or union2.match(line):
    << handle indentation, and add the line >>

The quick test will avoid the sequential sequence of tests in most cases, which should significantly speed up the code. I'll investigate today.

Edward

Edward K. Ream

unread,

May 31, 2018, 2:23:07 PM5/31/18

to leo-editor

On Thursday, May 31, 2018 at 12:22:00 PM UTC-5, Edward K. Ream wrote:

On Thursday, May 31, 2018 at 5:02:47 AM UTC-5, Edward K. Ream wrote:

load_derived_file is incredibly fast.

The main loop, << iterate all other lines >> might be made significantly faster by short-circuiting most tests at the start of the loop.

Wow. The following short-circuit test suffices to drop the typical load time from 6-7 seconds to 4.5 seconds.

if not (
    verbatim or line == verbline or
    in_doc or line.strip().startswith(delim_st)
):
    # unindent the line if necessary
    if indent and line[:indent].isspace() and len(line) > indent:
        line = line[indent:]
    body.append(line)
    continue

What is this, something like a 20% speedup??

No summary regex's are required! In fact, using them slows down the short-circuit test!

Edward

vitalije

unread,

May 31, 2018, 2:39:59 PM5/31/18

to leo-editor

Are you sure it is still correct import?

Just looking at the code and I see that checks for verbatim and line == verbline have different outcomes in my code. In case line == verbatim line is not added to body. I haven't tested your code, but it seems a bit suspicious.

Vitalije

Edward K. Ream

unread,

May 31, 2018, 4:40:33 PM5/31/18

to leo-editor

On Thu, May 31, 2018 at 1:39 PM, vitalije <vita...@gmail.com> wrote:

Are you sure it is still correct import?

No. All I know is that it is fast :-) I'll check.

But the idea of short-circuiting a long list of tests surely is valid.

Edward

Edward K. Ream

unread,

May 31, 2018, 5:03:15 PM5/31/18

to leo-editor

On Thu, May 31, 2018 at 1:39 PM, vitalije <vita...@gmail.com> wrote:

Are you sure it is still correct import?

There was indeed a bug involving verbatim lines. The fix further simplifies the code. The start of the main loop is now:

for i, line in flines[start:]:
     # These three sections must be first.
     << handle verbatim lines >>


     << unindent the line if necessary >>


     << short-circuit later tests >>

where << short-circuit later tests >> is:

# This is valid because all following sections are either:
# 1. guarded by 'if in_doc' or
# 2. guarded by a pattern that matches delim_st + '@'   
if not in_doc and not line.strip().startswith(delim_st+'@'):
    body.append(line)
    continue

I have added the following comment to the start of << handle start of @doc parts >>:

if not in_doc:
    # This guard ensures that the short-circuit tests are valid.
    m = doc_pat.match(line)
    if m:

This seems like a proof of correctness. Do you agree?

Edward

Edward K. Ream

unread,

May 31, 2018, 5:43:47 PM5/31/18

to leo-editor

On Thursday, May 31, 2018 at 7:07:07 AM UTC-5, vitalije wrote:

The other maybe even more important [performance test] is how fast tree drawing is.

if you open in miniTkLeo LeoPyRef.leo and then select the first top-level node and then press and hold left arrow on the keyboard it will traverse all nodes in the outline expanding if necessary.

This is spectacularly fast. Leo must have something similar.

The script you give for miniTkLeo takes 120 sec. to run on my machine. For those who want to try it, note that traverse_speed must be a child of the connect_handlers node.

The test script I use limits the number of redraws to 1000, which should be plenty good enough. That script takes about 6 seconds on my machine.

It's tricky to get real stats for the entire run, because most of the script runs at Tk idle time. I'll report back when I find a solution.

Edward

Edward K. Ream

unread,

May 31, 2018, 9:06:47 PM5/31/18

to leo-editor

On Thu, May 31, 2018 at 4:43 PM, Edward K. Ream wrote:

The test script I use limits the number of redraws to 1000, which should be plenty good enough. That script takes about 6 seconds on my machine.

It's tricky to get real stats for the entire run, because most of the script runs at Tk idle time. I'll report back when I find a solution.

I changed the test script so it runs the code all at once, without waiting for idle time. So the script doesn't draw anything until the very end, but I don't care about the Tk stuff, only Vitalije's code:

def traverse_speed(x):
    '''Traverse the entire tree.'''
    g.cls()
    if profile_redraw:
        cProfile.runctx('traverse_speed_helper()',


            globals(),
            locals(),
            'profile_stats', # 'profile-%s.out' % process_name
        )


        print('===== starting profile_stats')


        p = pstats.Stats('profile_stats')


        p.sort_stats('ncalls').print_stats(100)
            # p.strip_dirs().


            # .print_stats('leoDataModel.py', 50)
    else:


        traverse_speed_helper()
        
def traverse_speed_helper():


        
    ltm.selectedPosition = ltm.positions[1]
    ltm.expanded.clear()
    ltm.invalidate_visual()
    draw_tree(tree, ltm)

    def tf(i):


        if profile_redraw:
            n_positions = len(ltm.positions)
            n = min(1000, n_positions)
            for i in range(n):
                # From alt_right:
                bw = G.body
                gnx = ltm.select_node_right()
                if gnx:
                    body = ltm.attrs[gnx][1]
                    bw.replace('1.0', 'end', body)
                draw_tree(tree, ltm)


            print('===== writing profile_stats')
            p = pstats.Stats('profile_stats')
            p.strip_dirs().sort_stats('tottime').print_stats(50)
                # .print_stats('leoDataModel.py', 50)
        else:


            alt_right(None)
            if i < 500 and i < n_positions:
                tree.after_idle(tf, i + 1)


        dt = datetime.datetime.utcnow() - t1
        t = dt.seconds + 1e-6*dt.

microseconds
        G.log.insert('end -1 ch', '%s iterations in %6.2f sec\n'%(n,t))



    t1 = datetime.datetime.utcnow()
    tf(1)

G.app.bind_all('<Shift-F10>', traverse_speed)

Here are the edited results:

TotTime:
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)


      2001    0.162    0.000    1.995    0.001 miniTkLeo.py:302(draw_loop)
     30005    0.074    0.000    0.091    0.000 leoDataModel.py:482(display_items)
----------
    159149    1.225    0.000    1.751    0.000 {method 'call' of '_tkinter.tkapp' objects}
     64080    0.118    0.000    0.174    0.000 tkinter\__init__.py:1309(_options)
     64062    0.108    0.000    1.065    0.000 tkinter\__init__.py:1460(_configure)
  
NCalls:


   ncalls  tottime  percall  cumtime  percall filename:lineno(function)


    30005    0.074    0.000    0.091    0.000 leoDataModel.py:482(display_items)
     4001    0.002    0.000    0.026    0.000 leoDataModel.py:312(selectedIndex)
     3000    0.004    0.000    1.187    0.001 miniTkLeo.py:452(proxycmd)
     2001    0.162    0.000    2.006    0.001 miniTkLeo.py:302(draw_loop)
     2001    0.008    0.000    2.045    0.001 miniTkLeo.py:290(draw_tree)
     2001    0.003    0.000    0.012    0.000 miniTkLeo.py:469(rows_count)
----------
   348928    0.054    0.000    0.054    0.000 {built-in method builtins.isinstance}
   159149    1.234    0.000    1.764    0.000 {method 'call' of '_tkinter.tkapp' objects}
   128160    0.026    0.000    0.026    0.000 {built-in method _tkinter._flatten}
   128142    0.099    0.000    0.155    0.000 tkinter\__init__.py:93(_cnfmerge)
    92088    0.008    0.000    0.008    0.000 {built-in method builtins.callable}
    86022    0.032    0.000    0.032    0.000 {method 'splitlist' of '_tkinter.tkapp' objects}
    84021    0.094    0.000    0.633    0.000 tkinter\__init__.py:2458(coords)
    84021    0.013    0.000    0.013    0.000 tkinter\__init__.py:2461(<listcomp>)
    65191    0.008    0.000    0.008    0.000 {method 'items' of 'dict' objects}
    64098    0.015    0.000    0.015    0.000 {method 'update' of 'dict' objects}
    64080    0.118    0.000    0.175    0.000 tkinter\__init__.py:1309(_options)
    64062    0.033    0.000    1.105    0.000 tkinter\__init__.py:2565(itemconfigure)
    64062    0.111    0.000    1.072    0.000 tkinter\__init__.py:1460(_configure)
    63854    0.007    0.000    0.007    0.000 {built-in method builtins.len}

Only the items "above the line" refer to Vitalije's code. It's remarkable how few functions/methods are involved, and how little time they take. The next step will be to study all these routines in detail.

Edward

Reply all

Reply to author

Forward