Two days ago, all unit tests passed for Leo's new beautifier in leoTokens.leo. My celebrations were premature. Beautifying Leo's sources revealed unexpected (and unwelcome!) changes.
This Engineering Notebook post summarizes the remaining issues and suggests possible fixes.
Background
Leo's legacy beautifier (in leoAst.py) uses data from parse trees to discover proper spacing around the colon, minus sign, star, and 'import' tokens. leoTokens.py works only with tokens and does not have that data, so it must compute its own context data to resolve these ambiguities.
Both beautifiers contain visitors and generators. Visitors handle input tokens; generators create output tokens. The new generator adds scanners that discover context.
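To make the minus-sign ambiguity concrete: a token-based beautifier can often decide between unary and binary minus by looking at the previous significant token. The following is a minimal sketch using Python's standard tokenize module, not code from leoTokens.py; `classify_minus` and `_is_operand` are hypothetical names. It ignores rare cases such as soft keywords.

```python
import io
import keyword
import tokenize

_LITERAL_KEYWORDS = {"True", "False", "None"}


def _is_operand(tok: tokenize.TokenInfo) -> bool:
    """Return True if tok can end an operand (so a following '-' is binary)."""
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return True
    if tok.type == tokenize.NAME:
        # 'return', 'not', 'and', ... are NAME tokens but are not operands.
        return not keyword.iskeyword(tok.string) or tok.string in _LITERAL_KEYWORDS
    if tok.type == tokenize.OP:
        return tok.string in (")", "]", "}")
    # Python 3.12+ splits f-strings into several tokens; FSTRING_END ends one.
    return tok.type == getattr(tokenize, "FSTRING_END", -1)


def classify_minus(source: str) -> list[str]:
    """Return 'unary' or 'binary' for each '-' operator, in source order."""
    results = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue  # Line breaks inside brackets and comments don't matter.
        if tok.type == tokenize.OP and tok.string == "-":
            binary = prev is not None and _is_operand(prev)
            results.append("binary" if binary else "unary")
        prev = tok
    return results


print(classify_minus("a = b - -c\nreturn -x\n"))  # ['binary', 'unary', 'unary']
```

A single look-behind suffices here. Other cases, such as the colon, need to look further ahead, which is where scanners come in.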
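The visitor/generator split can be illustrated with a toy, single-line beautifier. This is a hypothetical sketch, not Leo's code: `ToyBeautifier`, `do_word`, `do_op`, and the `gen_*` methods are invented names. Visitors look at each input token and decide what to emit; generators are the only methods that touch the output list.

```python
import io
import tokenize


class ToyBeautifier:
    """Hypothetical sketch: visitors (do_*) read tokens, generators (gen_*) write output."""

    # Simplification: every '=' is spaced, even in keyword arguments.
    # Doing better requires the kind of context discussed in this post.
    SPACED_OPS = {"=", "==", "+", "<", ">"}

    def __init__(self) -> None:
        self.out: list[str] = []
        self.after_word = False  # True if the previous input token was a word.

    # Generators: the only methods that change self.out.
    def gen_token(self, s: str) -> None:
        self.out.append(s)

    def gen_blank(self) -> None:
        if self.out and self.out[-1] != " ":  # No leading or doubled blanks.
            self.out.append(" ")

    def gen_clean_blank(self) -> None:
        if self.out and self.out[-1] == " ":  # Remove one trailing blank.
            self.out.pop()

    # Visitors: decide which generators to call for each input token.
    def do_word(self, tok: tokenize.TokenInfo) -> None:
        if self.after_word:
            self.gen_blank()  # e.g. 'return x'
        self.gen_token(tok.string)
        self.after_word = True

    def do_op(self, tok: tokenize.TokenInfo) -> None:
        s = tok.string
        if s in self.SPACED_OPS:
            self.gen_blank()
            self.gen_token(s)
            self.gen_blank()
        elif s == ",":
            self.gen_clean_blank()
            self.gen_token(s)
            self.gen_blank()
        elif s in (")", "]", "}"):
            self.gen_clean_blank()
            self.gen_token(s)
        else:  # '(', '[', '.', ':' and the rest are emitted as-is.
            self.gen_token(s)
        self.after_word = False

    def beautify_line(self, line: str) -> str:
        self.out, self.after_word = [], False
        for tok in tokenize.generate_tokens(io.StringIO(line).readline):
            if tok.type in (tokenize.NAME, tokenize.NUMBER, tokenize.STRING):
                self.do_word(tok)
            elif tok.type == tokenize.OP:
                self.do_op(tok)
        return "".join(self.out).rstrip()


print(ToyBeautifier().beautify_line("x=f(a,b)+1"))  # x = f(a, b) + 1
```

Keeping all output in the generators means a visitor that needs more context can call a scanner first and then choose a generator, without any scanner writing output itself.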
Unexpected scanning problems
The new beautifier fails because several ad-hoc scanners can move beyond statement boundaries. The unit tests didn't catch such situations because they focused on GvR's single-line pet peeves.
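Here is a deliberately naive sketch of how an ad-hoc scanner can run past a statement boundary. The question "does this '[' start a slice?" is answered by searching forward for any ':'. `naive_is_slice` and `tokens_of` are hypothetical names, not functions in leoTokens.py.

```python
import io
import tokenize


def tokens_of(source: str) -> list[tokenize.TokenInfo]:
    """Tokenize source with the standard library tokenizer."""
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def naive_is_slice(tokens: list[tokenize.TokenInfo], i: int) -> bool:
    """Deliberately buggy: does the '[' at tokens[i] start a slice?

    The scan looks for any later ':' and never stops at the matching ']'
    or at the end of the statement.
    """
    for tok in tokens[i + 1:]:
        if tok.type == tokenize.OP and tok.string == ":":
            return True
    return False


tokens = tokens_of("x = a[1]\nif b:\n    pass\n")
i = next(k for k, t in enumerate(tokens) if t.string == "[")
# Wrongly reports a slice: the ':' it found belongs to the 'if' statement.
print(naive_is_slice(tokens, i))  # True
```

A single-line test such as `a[1:2]` gives the right answer, which is why tests focused on single-line cases can miss this kind of bug.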
Solutions
I spent yesterday noodling on potential solutions. As I went to bed, I realized that the existing scanners are parts of a recursive-descent parser. The new beautifier needs a good enough parser. This morning, the details became clear:
- New scan_statements and scan_statement methods will discover statement boundaries.
They are the top levels of the parser.
- The problematic scanners will use these boundaries to avoid mistakes.
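One way scan_statements and scan_statement might look is sketched below. It assumes a simple statement is a run of tokens ending at a NEWLINE token or a ';'. Python's tokenize module already emits NL rather than NEWLINE inside brackets, so no bracket counting is needed to find these boundaries. The two names mirror this post, but the code is not leoTokens.py; `enclosing_statement` and `is_slice` are hypothetical helpers showing how a problematic scanner can be confined to one statement.

```python
import io
import tokenize

# Tokens that never start a statement: layout, comments, and leftover
# NEWLINE tokens (e.g. the one following a trailing ';').
_SKIP = {tokenize.INDENT, tokenize.DEDENT, tokenize.NL, tokenize.COMMENT,
         tokenize.NEWLINE, tokenize.ENDMARKER}


def tokens_of(source: str) -> list[tokenize.TokenInfo]:
    return list(tokenize.generate_tokens(io.StringIO(source).readline))


def scan_statement(tokens: list[tokenize.TokenInfo], i: int) -> int:
    """Return the index just past the statement that starts at tokens[i]."""
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok.type in (tokenize.NEWLINE, tokenize.ENDMARKER):
            return i
        if tok.type == tokenize.OP and tok.string == ";":
            return i
    return i


def scan_statements(tokens: list[tokenize.TokenInfo]) -> list[tuple[int, int]]:
    """Return (start, end) index pairs, one per simple statement.

    Simplification: 'if b: pass' on one line counts as one statement.
    """
    bounds = []
    i = 0
    while i < len(tokens):
        if tokens[i].type in _SKIP:
            i += 1
        else:
            end = scan_statement(tokens, i)
            bounds.append((i, end))
            i = end
    return bounds


def enclosing_statement(bounds: list[tuple[int, int]], k: int) -> tuple[int, int]:
    """Return the (start, end) pair containing token index k."""
    for start, end in bounds:
        if start <= k < end:
            return start, end
    raise ValueError(f"token {k} is not inside any statement")


def is_slice(tokens: list[tokenize.TokenInfo], i: int, end: int) -> bool:
    """True if the '[' at tokens[i] holds a top-level ':' before tokens[end].

    Approximate: a lambda's ':' inside the brackets would also count.
    """
    level = 0
    for tok in tokens[i + 1:end]:  # Never look past the statement.
        if tok.type != tokenize.OP:
            continue
        if tok.string in ("(", "[", "{"):
            level += 1
        elif tok.string in (")", "]", "}"):
            if level == 0:
                return False  # Reached the matching ']'.
            level -= 1
        elif tok.string == ":" and level == 0:
            return True
    return False


tokens = tokens_of("x = a[1]\nif b:\n    pass\ny = a[1:2]; z = 0\n")
bounds = scan_statements(tokens)
print([tokens[s].string for s, _ in bounds])  # ['x', 'if', 'pass', 'y', 'z']
for k, tok in enumerate(tokens):
    if tok.string == "[":
        print(is_slice(tokens, k, enclosing_statement(bounds, k)[1]))  # False, then True
```

Computing the boundaries once, up front, means each problematic scanner only needs an extra `end` argument rather than its own ad-hoc stopping rules.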
Summary
New scanners will provide the necessary context to problematic code. Various questions remain, but the new scanners are unlikely to impact performance significantly.
The new scanners complete a good enough parser. The final code will look as though the parser was an obvious choice :-)
Edward