Five days ago, on April 11, I started work on leoTokens.rs, a prototype transliteration of leoTokens.py, Leo's token-based beautifier. This work was my first significant Rust project.
This Engineering notebook post discusses my experiences. I'll also discuss an idea for improving leoTokens.py.
RustPython-Parser
The prototype uses lexer.rs, a Python tokenizer written in Rust. This file is part of the RustPython-Parser project. There are a few problems with lexer.rs, but they did not interfere with the prototype.
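For the curious, here is a minimal sketch of how a prototype might drive that lexer. The crate name, version, and error handling are my assumptions from memory of the RustPython-Parser project's published crate, not the prototype's actual code:

// Cargo.toml (assumed): rustpython-parser = "0.3"
use rustpython_parser::{lexer::lex, Mode};

fn main() {
    // Read the file to beautify, then run the RustPython lexer over it.
    let source = std::fs::read_to_string("leo/core/leoTokens.py").expect("read failed");
    // lex() yields Result items: a token plus its text range, or a lexical error.
    let tokens: Vec<_> = lex(&source, Mode::Module)
        .collect::<Result<Vec<_>, _>>()
        .expect("tokenize failed");
    println!("tokens: {}", tokens.len());
}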
Performance
Last night, the prototype reached a milestone: its timings now give a realistic picture of the expected performance:
file name: c:/Repos/leo-editor/leo/core/leoTokens.py
read: 1.08ms
tokenize: 12.23ms
tokens: 10159
Leo's beautifier takes roughly 100ms to do the same, so the Rust prototype is about 8x faster. A production version, which must do more than read and tokenize the file, might be only 5x faster.
Learning Rust
The good news and the bad news about Rust are the same: Rust is a very picky language :-) Rust programs must specify much more than Python requires. On the other hand, the Rust compiler usually offers superb hints for correcting errors.
I enjoyed being a newbie Rustacean: there were so many newbie-level puzzles to solve. On the other hand, the effort nearly drove me crazy!
Last night, I finally realized that calling .clone() on a list keeps the borrow checker happy when iterating over it while calling methods that mutate self. For example:
for input_token in &self.input_list.clone() {
    self.make_output_token(input_token);
}
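To show why the clone helps, here is a self-contained sketch. The InputToken, Beautifier, and make_output_token names are stand-ins for illustration, not the prototype's real code. Without the clone, the loop would hold an immutable borrow of self while make_output_token needs a mutable one:

#[derive(Clone)]
struct InputToken {
    value: String,
}

struct Beautifier {
    input_list: Vec<InputToken>,
    output_list: Vec<String>,
}

impl Beautifier {
    fn make_output_token(&mut self, input_token: &InputToken) {
        // Mutably borrows self, which would conflict with iterating &self.input_list directly.
        self.output_list.push(input_token.value.clone());
    }

    fn beautify(&mut self) {
        // Cloning produces a temporary Vec, so the loop borrows the clone, not self.
        for input_token in &self.input_list.clone() {
            self.make_output_token(input_token);
        }
    }
}

fn main() {
    let mut b = Beautifier {
        input_list: vec![InputToken { value: "def".into() }],
        output_list: Vec::new(),
    };
    b.beautify();
    println!("{:?}", b.output_list);
}

The clone costs an extra copy of the token list; splitting the borrows (say, by building the output in a local Vec and assigning it afterward) would avoid the copy, but the clone is the simplest fix.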
There were many other Ahas, but this one was a milestone.
Improving Leo's beautifier
Yesterday's work created a list of output tokens from the corresponding list of input tokens. Leo's beautifier does the same. But Aha! A simpler architecture might work:
- Don't generate whitespace input tokens!
  Skipping whitespace would simplify the token-based parse.
- Lazily generate the whitespace between tokens.
  The output list could then be a list of simple strings, as sketched below.
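Here is a minimal sketch of that architecture, written in Rust to match the prototype. The Kind, Token, and ws_between names are hypothetical, and real whitespace rules would have to handle indentation, operators in context, comments, and much more:

// Assumed token kinds; a real lexer distinguishes many more.
enum Kind {
    Name,
    Op,
    Newline,
}

struct Token {
    kind: Kind,
    value: String,
}

// Lazily compute the whitespace between two adjacent non-whitespace tokens.
fn ws_between(prev: Option<&Token>, next: &Token) -> &'static str {
    match (prev.map(|t| &t.kind), &next.kind) {
        (None, _) | (_, &Kind::Newline) | (Some(&Kind::Newline), _) => "",
        (Some(&Kind::Op), _) | (_, &Kind::Op) => "",
        _ => " ",
    }
}

// The output is just a list of strings: generated whitespace plus token values.
fn beautify(tokens: &[Token]) -> Vec<String> {
    let mut output: Vec<String> = Vec::new();
    let mut prev: Option<&Token> = None;
    for token in tokens {
        output.push(ws_between(prev, token).to_string());
        output.push(token.value.clone());
        prev = Some(token);
    }
    output
}

fn main() {
    let tokens = vec![
        Token { kind: Kind::Name, value: "print".into() },
        Token { kind: Kind::Op, value: "(".into() },
        Token { kind: Kind::Name, value: "x".into() },
        Token { kind: Kind::Op, value: ")".into() },
        Token { kind: Kind::Newline, value: "\n".into() },
    ];
    println!("{}", beautify(&tokens).join(""));
}

Joining the output list yields the beautified text, and no whitespace token is ever created.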
Summary
Learning Rust has been an all-consuming experience.
The prototype is now in the "devel" branch. Look for the node "prototype: leoTokens.rs" in the attic of LeoPyRef.leo.
The Rust code is surprisingly simple. It needs neither lifetime annotations nor generic types.
The final Rust beautifier might be 5x to 8x faster than Leo's beautifier. I'm not sure the speedup is worth the maintenance burden on Leo's (future) devs. In any case, the work has been worthwhile.
The perspective gained suggests a significant improvement to Leo's Python beautifier. I'll be exploring that possibility next, before continuing work on leoTokens.rs. Stay tuned.
Edward