Yesterday I discovered a dead simple design pattern for finding code patterns in python parse trees.
This pattern is a milestone.
It changes forever how I'll use python's
ast module. Old techniques now seem like trying to swat flies with a sledgehammer!
The Aha is as surprising as the
Aha that created @clean. I'll never forget it.
tl;dr: Convert parts of parse trees to strings using ast.unparse.
Background
I have been revisiting an old problem: how to analyze Leo's code. The previous state of the art was unsatisfactory. There were two approaches, each flawed:
1. Use regex patterns. This approach works for prototyping, but there is no way to ignore false matches in strings, docstrings, and comments.
2. Use parse trees. This approach is sound, but finding patterns in ast nodes was cumbersome.
How to get the best of both approaches?
Finally, my work with the rope project made me familiar with a new design pattern: small, special-purpose tree traversers based on
ast.NodeVisitor.
The Problem
I wanted a program to find all chains of ast Attribute nodes within Leo. Examples are g.app.gui and c.frame.body.wrapper.widget.
I packaged this program as the TestChains.test_all_paths unit test in leo/unittests/test_design.py. This is a new file: see PR
#3319.
Early attempts were clumsy, as before. ast.Attribute trees can contain much more than ast.Name and ast.Attribute nodes.
The Aha: use ast.unparse
I had vaguely planned to experiment with
ast.unparse. Given an ast node/tree, ast.unparse returns a string with code
that would produce an equivalent node/tree.
Aha: This string is exactly what we need to discover patterns in parse trees!!!
Here is the complete traverser class that discovers all chains:
class ChainsTraverser(NodeVisitor):
chains_set = set()
def visit_Attribute(self, node):
"""
Add only top-level Attribute chains to chains_set.
Do *not* call generic_visit!
"""
chain = ast.unparse(node)
self.chains_set.add(chain)
OMG! This is amazing:
- unparse returns the entire chain, no matter how complex.
- visit_Attribute handles only top-level attributes because it
doesn't call
ast.generic_visit.
A straightforward unit test (TestChains.test_all_paths) instantiates ChainsTraverser once, calls the traverser for most of Leo's core files, and prints/tests
ChainsTraverser.chains_set. Nothing could be simpler.
Summary
Finding patterns in ast trees is straightforward:
- Use ast visitors to find the roots of ast trees containing potential patterns.
- Convert those roots to strings with ast.unparse.
- Use regexs to filter the results. See the PostScript.
This Aha changes my whole approach to analyzing ast trees.
Edward
P.S. TestChains.test_all_paths filters the resulting chains as follows:
array_pat = re.compile(r'(\[).*?(\])')
call_pat = re.compile(r'(\().*?(\))')
string_pat1 = re.compile(r"(\').*?(\')")
string_pat2 = re.compile(r'(\").*?(\")')
patterns = (array_pat, call_pat, string_pat1, string_pat2)
def filter_chain(s: str) -> str:
def repl(m):
return m.group(1) + m.group(2)
for pattern in patterns:
s = re.sub(pattern, repl, s)
return s
This kind of post-processing would be impossible if the operands were ast trees.
Edward