This Engineering Notebook post describes a spectacular collapse in complexity in leoTokens.py, Leo's new beautifier.
Last Sunday, January 28, I rewrote leoTokens.py. I had struggled with this project for four weeks, but within 24 hours, the project was essentially complete! See the PostScript for the log.
A dead-simple token-based scanner replaces a horribly complex token-based recursive-descent parser. Computing token ranges bedeviled the parser; the scanner avoids those complications entirely. The new scanner knows almost nothing about Python syntax!
The pre_scan method and its helpers replace the entire parser. The pre_scan method calls three finishers: finish_arg, finish_dict, and finish_slice. The finishers salvage the semantics of the old parser. These methods practically wrote themselves.
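To make the idea concrete, here is a minimal sketch, not Leo's actual code, of a scan loop that walks a flat token stream and dispatches to finisher-style helpers on a few significant tokens. The trigger tokens and the recorded events are my own illustrative assumptions; only the method names come from the post:

```python
# Hypothetical sketch of a dead-simple token-based scan.
# This is NOT leoTokens.py: the trigger tokens chosen below
# are illustrative assumptions, not Leo's actual logic.
import io
import tokenize

def pre_scan(source: str) -> list:
    """Scan tokens, recording which finisher-style helper would fire."""
    events = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        # The scanner sees only token types and strings,
        # knowing almost nothing about Python syntax.
        if tok.type == tokenize.OP:
            if tok.string == '=':
                events.append('finish_arg')    # e.g., keyword args, defaults
            elif tok.string == '{':
                events.append('finish_dict')   # e.g., dict/set displays
            elif tok.string == ':':
                events.append('finish_slice')  # e.g., slices, annotations
    return events

print(pre_scan("d = {1: 2}"))  # → ['finish_arg', 'finish_dict', 'finish_slice']
```

The point of the sketch is the shape of the code: one flat loop over tokens, with all remaining semantics pushed into a handful of helpers.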
Origin of the Aha
Big Ahas change the mental landscape so thoroughly that reconstructing their genesis becomes impossible. My best guess: weeks of immersion in the old code subconsciously showed me that only the precursors to the finishers were worth saving. That gave me the courage to start again.
For the last week, my subconscious has been screaming at me. Its criticisms were varied, personal, and insulting. Those criticisms were off the mark, but the message was valid: do something different!
Using ChatGPT might well have prevented the Aha. I had to struggle with the doomed code first! Otherwise, I would not have gained the deep knowledge required to see the way forward.
Comparison with other breakthroughs
Most code collapses arise from a long sequence of methodical, incremental simplifications. This Aha was different. I suddenly "just knew" that parsing was the wrong approach.
I can think of only two comparable flashes in Leo's history:
@clean: Aha! Leo can use the outline instead of shadow files. This insight happened when I was working on another project!
Leo's importers: Aha! Guide lines eliminate all difficulties in handling comments and strings.
Feeble unit tests
PR #3773 removes recent unit tests. Those tests exercised the parser itself rather than the intended results. The PR reaches 100% coverage without these feeble tests.
A coding one-off
The pre_scan method is a one-time trick. Tools such as mypy and pylint must use a parse tree. Even the super-fast pyflakes tool uses a parse tree. pyflakes is so fast because Python's ast.parse runs as compiled C code.
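For contrast, here is a quick look at what a parse tree provides that a flat token stream does not, using Python's standard ast module (this is ordinary stdlib usage, not Leo code):

```python
# ast.parse (backed by CPython's C parser) returns a full syntax
# tree, with every construct identified by node type. A token
# scanner, by contrast, sees only a flat stream of tokens.
import ast

tree = ast.parse("d[k] = {1: 2}")

# Collect the node types present in the tree.
kinds = {type(n).__name__ for n in ast.walk(tree)}
print(sorted(kinds & {"Assign", "Dict", "Subscript"}))
# → ['Assign', 'Dict', 'Subscript']
```

Type checkers and linters need exactly this structural information, which is why they cannot use the scanner trick.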
Summary
I feel like a mathematician who has discovered an unexpectedly elementary proof of a complex theorem.
The code pattern seems limited to beautification. Other language tools must use parse trees. Still, the Aha is a metaphor for possibilities hiding in plain sight. That's something!
Edward
P.S. Here is the log of the first 24 hours of work on the scanner:
I saw the way forward Sunday afternoon. The first commit of the rewrite was rev e8a4224 (first draft of a simple scanner) at 13:13:01 on Sunday afternoon.
Just 24 hours later, at 13:46:37 on Monday, rev 3662d55 completed the project. Only a few packaging details remain. It was quite a day.
EKR