I said that all the work on #1440 has been valuable, even though a simple script might use asttokens to do everything that the code in leoAst.py does.

This Engineering Notebook post explains why deep knowledge of the problem domain was needed to get to the surprising script. This post also explains some parts of the script in detail. As with all ENB posts, feel free to ignore it.

At no time was I upset by the surprise. I immediately treated it as good news. asttokens now provides a valuable point of comparison and context. The work I have done has given me deep insights into the subtle, behind-the-scenes complications involved.

Why did I, and black, and fstringify miss this possibility?

In retrospect, it's clear why the Aha is easy to miss:

1. I didn't know until yesterday what data would be needed. It's impossible to know what would work until you know exactly what data will be needed. It's just all too confusing.

2. I have been assuming all along that exact traversal order would (ultimately) be required. But that is not at all true. Indeed, in some cases random traversal suffices.

   The Fstringify code in leoAst.py is an example. The ast.BinOp visitor would work if nodes were visited in any order, because potential f-strings are disjoint. However, we actually want the BinOp visitor to be called in the approximate order in which those ops appear in the sources, because Fstringify produces log messages, and we don't want those messages to be scrambled ;-)

3. [The big one]. I have been assuming that an exact, 1-to-1 correspondence between tokens and ast nodes is needed. Wrong, wrong, wrong! We can tolerate many-to-many links between tokens and nodes. That is, many nodes might point at a single token, and a single token might point at many nodes.

This is what I saw yesterday while discussing links with Rebecca. Iirc, I saw that the crucial test in o.colon would work just fine with a many-to-many mapping between tokens and nodes. I've shown this crucial code before. Here it is again:

    def colon(self, val):
        """Handle a colon."""
        node = self.token.node
        self.clean('blank')
        if not isinstance(node, ast.Slice):
            self.add_token('op', val)
            self.blank()
            return
        # A slice.
        [snip]

The Aha: yesterday I saw that the code:

    if not isinstance(node, ast.Slice):

could be replaced by:

    if not any(isinstance(z, ast.Slice) for z in self.token.node_list):

Let's see how token.node_list can be computed...

The asttokens script

First, we create a list of mutable Token objects. asttokens represents tokens as named tuples (its own Token type, which adds fields such as index to those that tokenize provides). Named tuples are immutable, so the script must create an auxiliary list. The Token class is simple, so there is no need to show it here.

    atok = asttokens.ASTTokens(source, parse=True)
    tokens = [Token(atok_name(z), atok_value(z)) for z in atok.tokens]

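The Token class and the atok_name and atok_value helpers are not shown in this post. A minimal sketch of what they might look like follows. Everything in it is an assumption made for illustration: the field names kind, value and node_list, and the bodies of both helpers. To stay self-contained it feeds the helpers named tuples from the standard tokenize module, which expose the same type and string fields that the helpers read.

```python
import io
import tokenize


class Token:
    """Hypothetical mutable token; the script's real Token class is not shown."""

    def __init__(self, kind, value):
        self.kind = kind       # e.g. 'name', 'op', 'number'
        self.value = value     # the token's source text
        self.node_list = []    # ast nodes linked to this token, filled in later

    def __repr__(self):
        return f"Token({self.kind!r}, {self.value!r})"


def atok_name(tok):
    """Assumed helper: a lowercase kind name for a tokenize-style named tuple."""
    return tokenize.tok_name[tok.type].lower()


def atok_value(tok):
    """Assumed helper: the token's text."""
    return tok.string


source = "a = b[1:2]\n"
raw_tokens = tokenize.generate_tokens(io.StringIO(source).readline)
tokens = [Token(atok_name(z), atok_value(z)) for z in raw_tokens]
```

Each Token owns its own node_list, so appending a node to one token leaves the others untouched. That per-token mutable list is what the immutable named tuples cannot provide.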
Given this list of Token objects, it's a snap to create the token lists:

    for node in asttokens.util.walk(atok.tree):
        for ast_token in atok.get_tokens(node, include_extra=True):
            i = ast_token.index
            token = tokens[i]
            token.node_list.append(node)

That's all there is to it. It's also straightforward to inject parent/child links into ast nodes. See the actual script for details.
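To make the many-to-many idea concrete without asttokens installed, here is a rough stand-in that uses only the standard library. It is not the actual script and not asttokens' algorithm: it links a token to every node whose source span contains it, using the position attributes that ast supplies. The names Token, link_tokens and in_slice are hypothetical. The sketch assumes Python 3.9 or later (where ast.Slice carries positions) and ASCII source (ast column offsets are UTF-8 byte offsets, while tokenize reports character columns).

```python
import ast
import io
import tokenize
from dataclasses import dataclass, field


@dataclass
class Token:
    """Hypothetical mutable token with a many-to-many node_list."""
    kind: str
    value: str
    start: tuple   # (row, column) as reported by tokenize
    end: tuple
    node_list: list = field(default_factory=list)


def link_tokens(source):
    """Return (tree, tokens), linking each token to all enclosing nodes."""
    tree = ast.parse(source)
    readline = io.StringIO(source).readline
    tokens = [
        Token(tokenize.tok_name[t.type].lower(), t.string, t.start, t.end)
        for t in tokenize.generate_tokens(readline)
    ]
    for node in ast.walk(tree):
        # Inject parent links while walking.
        for child in ast.iter_child_nodes(node):
            child.parent = node
        # Nodes such as Module and Load have no source positions.
        if getattr(node, "end_lineno", None) is None:
            continue
        start = (node.lineno, node.col_offset)
        end = (node.end_lineno, node.end_col_offset)
        # Quadratic in the worst case; good enough for a sketch.
        for token in tokens:
            if token.value and start <= token.start and token.end <= end:
                token.node_list.append(node)
    return tree, tokens


def in_slice(token):
    """The many-to-many version of the test in o.colon."""
    return any(isinstance(z, ast.Slice) for z in token.node_list)


_, tokens = link_tokens("a[1:2]\n")
colon = next(t for t in tokens if t.value == ":")
print(in_slice(colon))  # True: the colon lies inside Expr, Subscript and Slice
```

Here the colon token points at three nodes at once, and each of those nodes covers several tokens. The colon in "if x: pass" links only to the If node, so the same test correctly rejects it.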

Summary

It takes deep insight to realize that asttokens could replace the TOG and TOT classes. This is the reason I was happy to see this possibility.

In any event, the TOG and TOT classes are still valuable. They are faster and clearer (in most ways) than the asttokens code. Otoh, the asttokens code could be said to be more clever. The new insights promise new ways to simplify the code in leoAst.py using clever asttokens code.

Edward