ENB: A token-only beautifier

Edward K. Ream

Jan 3, 2024, 12:40:33 PM

to leo-editor

This Engineering Notebook post explains how Leo's beautifier can avoid using Python's parser. Leo issue #3744 provides the background. To summarize:

- The new Orange class (in leoAst.py) will use neither an ast (parse tree) nor Leo's TokenOrderGenerator class.

- The new version will be a prototype for transliterating Orange from Python to Nim.

Writing Python's parser in Nim is out of the question, but rewriting Python's tokenize module in Nim should be straightforward: tokenize.py contains about 500 lines of code. Eliminating the (complex!) TokenOrderGenerator class will also save a lot of work.

Discovering context

Four kinds of tokens require context that is not immediately available from nearby tokens: colons, minus signs, equal signs, and stars ('*' and '**').
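To make the problem concrete, here is a minimal sketch using Python's standard tokenize module. The helper name op_kinds is hypothetical and is not part of Leo. It shows that the tokenizer reports the same generic OP token for an operator regardless of its role, so the beautifier must recover the role from context.

```python
import io
import tokenize

def op_kinds(source):
    """Return (token-type name, text) for each operator token in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP
    ]

# Assignment and binary minus: conventionally written with surrounding spaces.
spaced = op_kinds("a = b - c\n")

# Keyword argument and unary minus: conventionally written without spaces.
tight = op_kinds("f(a=-c)\n")

print(spaced)
print(tight)
```

Both lists contain identical ('OP', '=') and ('OP', '-') entries, although the two statements call for different spacing.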

The legacy version of the Orange class gets the context from the parse stack: token.node is an ast node. The new version will instead scan tokens to discover the larger context.

Function definitions will require a (token-based) scan of the entire statement. This look-ahead scan will start at the "def" token and continue forward. It will patch context data into later colon, equal sign, and star tokens. The look-ahead scan will work like a recursive descent parser. It's no big deal because the scanner will understand only a fraction of Python.
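The look-ahead scan might look something like the sketch below. This is not Leo's code: the function name scan_def and the context labels are hypothetical, and a bracket-depth counter stands in for the recursive descent scanner described above. It assumes the source contains a single, complete def statement and does not handle cases such as a lambda used as a default value.

```python
import io
import tokenize

OPEN, CLOSE = ('(', '[', '{'), (')', ']', '}')

def scan_def(source):
    """Return (text, context) for each token the scan labels (hypothetical labels)."""
    readline = io.StringIO(source).readline
    tokens = [
        t for t in tokenize.generate_tokens(readline)
        if t.type not in (tokenize.NL, tokenize.COMMENT)
    ]
    # Start the scan at the 'def' token.
    start = next(
        i for i, t in enumerate(tokens)
        if t.type == tokenize.NAME and t.string == 'def'
    )
    labeled = []
    depth = 0          # Bracket nesting depth.
    annotated = False  # True if the current argument has an annotation.
    prev = None        # Text of the previous significant token.
    for tok in tokens[start + 1:]:
        s = tok.string
        if tok.type == tokenize.OP:
            if s in OPEN:
                depth += 1
            elif s in CLOSE:
                depth -= 1
            elif depth == 0 and s == ':':
                labeled.append((s, 'end-of-def'))
                break  # The statement's header ends here.
            elif depth == 1:  # Directly inside the argument list.
                if s == ',':
                    annotated = False
                elif s == ':':
                    labeled.append((s, 'annotation'))
                    annotated = True
                elif s == '=':
                    kind = 'annotated-default' if annotated else 'default'
                    labeled.append((s, kind))
                elif s in ('*', '**') and prev in ('(', ','):
                    labeled.append((s, 'star-arg'))
        prev = s
    return labeled

result = scan_def("def f(a, b=1, *args, c: int = 2, **kw) -> None:\n    pass\n")
for text, context in result:
    print(text, context)
```

A real scanner would patch the context into the token objects themselves rather than return a list. The list form is used here only to keep the sketch short.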

Otherwise, a simple backward scan will find the required context.
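As an illustration of the backward scan, the following sketch classifies minus signs as unary or binary by looking at the previous significant token. The name classify_minus and the rule itself are assumptions made for illustration, not Leo's actual logic. The rule is: a minus sign is binary if the token before it can end an operand (a name, number, string, or closing bracket).

```python
import io
import keyword
import tokenize

def ends_operand(tok):
    """Return True if tok can be the last token of an operand."""
    if tok.type in (tokenize.NUMBER, tokenize.STRING):
        return True
    if tok.type == tokenize.NAME:
        # Keywords such as 'return' or 'in' do not end an operand,
        # but the keyword constants do.
        return not keyword.iskeyword(tok.string) or tok.string in ('None', 'True', 'False')
    return tok.type == tokenize.OP and tok.string in (')', ']', '}')

def classify_minus(source):
    """Return 'unary' or 'binary' for each minus sign in source, in order."""
    readline = io.StringIO(source).readline
    tokens = [
        t for t in tokenize.generate_tokens(readline)
        if t.type not in (tokenize.NL, tokenize.COMMENT)
    ]
    results = []
    for i, tok in enumerate(tokens):
        if tok.type == tokenize.OP and tok.string == '-':
            # Look back one significant token.
            is_binary = i > 0 and ends_operand(tokens[i - 1])
            results.append('binary' if is_binary else 'unary')
    return results

kinds = classify_minus("x = a - b\ny = -a\nz = f(-1) - (-2)\n")
print(kinds)
```

Other tokens may need a longer backward scan. For example, deciding whether a colon belongs to a slice or a dictionary would mean scanning back to the nearest unmatched opening bracket.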

Summary

Leo's legacy beautifier required Python's ast module and Leo's TokenOrderGenerator class. This approach was elegant but required complex machinery behind the scenes. It would be infeasible to transliterate the legacy code into the Nim language.

The new (token-based) beautifier will be slightly less elegant but will require no additional support code. The new code will be the basis for a super-fast beautifier written in Nim.