The git logs will show that I have been working night and day for the past month on the fstrings branch.
Yesterday I thought I had completed the next phase of the work. All but one of the files processed without complaint, which is significant because very strong checks are always present.
However, the one failure involved the most complicated code in the project. After several hours of work in the wee hours this morning I went back to bed. Lying in bed I had a momentous Aha which will eliminate all the hard parts of the code! Let me explain.
Background
The only truly difficult task is determining how many tokens correspond to ast.JoinedStr nodes. These nodes are quite a mishmash: a single JoinedStr represents at least one f-string together with all the strings concatenated to it, whether those are f-strings or plain strings.
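To illustrate (my example, not from the original), the parser folds an entire run of concatenated strings into one JoinedStr node whenever at least one piece is an f-string:

```python
import ast

# One f-string concatenated with a plain string: the parser produces a
# single JoinedStr node covering the whole run, not one node per piece.
tree = ast.parse('f"{x}" "y"')
node = tree.body[0].value
print(type(node).__name__)   # JoinedStr
print(len(node.values))      # 2: a FormattedValue and a Constant
```

So the number of tokens a JoinedStr consumes cannot be read off the node itself, which is what made the task hard.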
The scheme on which I have spent so much time attempts to determine, by looking at the JoinedStr node, which tokens correspond to the JoinedStr. This involves an extremely messy process that I call reconciliation, which munges the tree data to put it into exact correspondence with the next 'string' tokens. The following difficult methods are involved: advance_str, adjust_str_token, get_string_parts, scan_fstring, scan_string and, the most difficult of all, get_joined_tokens.
All of this is about to go away!
The Aha
We can determine which 'string' tokens are concatenated just by looking at the token list!!!
Indeed, 'string' tokens are concatenated if and only if there are no significant tokens (including parens) between them!
So none of the old correspondence/reconciliation machinery is needed. We can ignore the component ast nodes of the JoinedStr nodes completely and just use the token data.
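Here is a minimal sketch of the rule, written by me for illustration and not taken from the TOG itself. It groups consecutive STRING tokens separated only by insignificant tokens (NL, the newline inside brackets, and COMMENT). The example uses plain strings; before Python 3.12 an f-string is also a single STRING token, so the same walk covers it.

```python
import io
import tokenize

# Tokens that do NOT break a concatenation run: a newline inside
# brackets (NL) and a comment.  Everything else, including parens,
# is significant and ends the run.
INSIGNIFICANT = {tokenize.NL, tokenize.COMMENT}

def concatenated_string_runs(source):
    """Return the runs of implicitly concatenated string tokens in
    `source`, using only the token list (no ast nodes needed)."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    runs, run = [], []
    for tok in tokens:
        if tok.type == tokenize.STRING:
            run.append(tok.string)
        elif tok.type in INSIGNIFICANT:
            continue  # insignificant: the run continues
        else:
            if len(run) > 1:  # a single string is not a concatenation
                runs.append(run)
            run = []
    if len(run) > 1:
        runs.append(run)
    return runs

src = '("a"  # comment\n "b")\nx = "c"\n'
print(concatenated_string_runs(src))  # [['"a"', '"b"']]
```

Note that a statement-ending NEWLINE is significant, so strings in adjacent statements are correctly kept apart.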
Figures of merit
The code is already very fast. For example:
leoGlobals.py
len(sources): 286901
setup time: 0.61 sec.
link time: 0.44 sec.
The setup time is the time to tokenize the file and compile it to a parse tree. This involves two calls to python's standard library, so it is as fast as possible.
The link time is the time to execute all the code in the TokenOrderGenerator class! It is already way faster than other tools. It will get a tad faster.
Moreover, the TOG is both substantially simpler and more flexible than other tools. The Aha means that it will be very easy to debug and maintain.
Finally, the TOG makes no significant demands on the GC. There are no large data structures involved, aside from the token list and the parse tree. The only variable-length data is a token stack. This will typically only have a few hundred entries. Python's run-time stack will have only a few entries, because generators eliminate all significant recursion.
Summary
Today's Aha is a big deal. All of the difficult parts of the code are about to disappear! The TOG will be easy to understand and maintain. It can now be adapted easily to handle other kinds of parse trees, such as pgen2/lib2to3.
The TOG class is fast, simple, general and flexible. It promises to be an important tool in the python world. I'm proud of it.
The last month's work is as close as I have ever come to working on a significant mathematical theorem. I guess I'll have to stop thinking of myself as a failed mathematician :-)
Edward