ENB: Why didn't I, and black and fstring, use asttokens?

Edward K. Ream

unread,

Jan 23, 2020, 6:58:33 AM1/23/20

to leo-editor

I said that all the work on #1440 has been valuable, even though a simple script might use asttokens to do everything that the code in leoAst.py does.

This Engineering Notebook post explains why deep knowledge of the problem domain was needed to get to the surprising script. This post also explains some parts of the script in detail. As with all ENB posts, feel free to ignore it.

At no time was I upset by the surprise. I immediately treated it as good news. asttokens now provides a valuable point of comparison and context. The work I have done has given me deep insights into the subtle, behind-the-scenes, complications involved.

Why did I, and black, and fstringify miss this possibility?

In retrospect, it's clear why the Aha is easy to miss:

1. I didn't know until yesterday what data would be needed. It's impossible to know what would work until you know exactly what data will be needed. It's just all too confusing.

2. I have been assuming all along that exact traversal order would (ultimately) be required. But that not at all true. Indeed, in some cases random traversal suffices.

The Fstringify code in leoAst.py is an example. The ast.BinOp visitor would work if visited in any order, because potential f-strings are disjoint. However, we actually want the BinOp visitor to be visited in the approximate source-code order those ops appear in the sources, because Fstringify produces log messages, and we don't want those messages to be scrambled ;-)

3. [The big one]. I have been assuming that an exact, 1-to-1, correspondence between tokens and ast nodes is needed. Wrong, wrong, wrong! We can tolerate many-to-many links between tokens and nodes. That is, many nodes might point at a single token, and a single token might point at many nodes.

This is what I saw yesterday while discussing links with Rebecca. Iirc, I saw that crucial test in o.colon would work just fine with a many-to-many mapping between tokens and nodes. I've shown this crucial code before. Here it is again:

def colon(self, val):
    """Handle a colon."""
    node = self.token.node
    self.clean('blank')
    if not isinstance(node, ast.Slice):
        self.add_token('op', val)
        self.blank()
        return
    # A slice.
    [snip]

The Aha: yesterday I saw that the code:

if not isinstance(node, ast.Slice):

could be replaced by:

if not any(isinstance(z, ast.Slice) for z in self.token.node_list):

Let's see how token.node_list can be computed...

The asttokens script

First, we create a list of mutable Token objects. asttokens uses only the named tuples provided by tokenize.tokenize. Named tuples are immutable, so the script must create an auxiliary list. The Token class is simple. No need to show it here.

atok = asttokens.ASTTokens(source, parse=True)
tokens = [Token(atok_name(z), atok_value(z)) for z in atok.tokens]

Given this list of Token objects, it's a snap to create the token lists:

for node in asttokens.util.walk(atok.tree):
    for ast_token in atok.get_tokens(node, include_extra=True):
        i = ast_token.index
        token = tokens[i]
        token.node_list.append(node)

That's all there is to it. It's also straightforward to inject parent/child links into ast nodes. See the actual script for details.

Summary

It takes deep insight to realize that asttokens could replace the TOG and TOT classes. This is the reason I was happy to see this possibility.

In any event, the TOG and TOT classes are still valuable. They are faster and clearer (in most ways) than the asttokens code. Otoh, the asttokens code could be said to be more clever. The new insights promise new ways to simplify the code in leoAst.py using clever asttokens code.

Edward

Edward K. Ream

unread,

Jan 24, 2020, 2:31:10 AM1/24/20

to leo-editor

On Thursday, January 23, 2020 at 6:58:33 AM UTC-5, Edward K. Ream wrote:

This Engineering Notebook post explains why deep knowledge of the problem domain was needed to get to the surprising script. This post also explains some parts of the script in detail.

...

The Aha: yesterday I saw that the code:

if not isinstance(node, ast.Slice):

could be replaced by:

if not any(isinstance(z, ast.Slice) for z in self.token.node_list):

This might introduce subtle bugs. A cff on gen_token(':') shows 16 hits. Most of the hits are for nodes, like ast.ClassDef and ast.FunctionDef, that are not valid within slices. But nodes like ast.Dict, ast.DictComp, ast.Lambda and ast.Slice itself are valid within slices.

So the proper code might be:

if isinstance(self.token.node_list[-1]), ast.Slice)

This relies on a property that probably is true, namely that asttokens.util.walk yields nodes in a "good enough" order. It likely does most of the time ;-). However, the TOG class guarantees that self.token.node is correct.

Edward

Edward K. Ream

unread,

Jan 24, 2020, 2:59:39 AM1/24/20

to leo-editor

On Friday, January 24, 2020 at 2:31:10 AM UTC-5, Edward K. Ream wrote:

The Aha: yesterday I saw that the code:

if not isinstance(node, ast.Slice):

could be replaced by:

if not any(isinstance(z, ast.Slice) for z in self.token.node_list):

This might introduce subtle bugs. A cff on gen_token(':') shows 16 hits. Most of the hits are for nodes, like ast.ClassDef and ast.FunctionDef, that are not valid within slices. But nodes like ast.Dict, ast.DictComp, ast.Lambda and ast.Slice itself are valid within slices.

So the proper code might be:

if isinstance(self.token.node_list[-1]), ast.Slice)

This relies on a property that probably is true, namely that asttokens.util.walk yields nodes in a "good enough" order. It likely does most of the time ;-). However, the TOG class guarantees that self.token.node is correct.

Hmm. Perhaps the "simple script" might be updated to do exactly what is wanted. But...

The TOG class is provably correct: tog.sync_token will fail if each and every token presented to it is not in the correct order. A "simple script" might have this property, but it won't be so easy to prove it.

Edward

Reply all

Reply to author

Forward