ENB: leoTokens.rs: Leo's beautifier in Rust

Edward K. Ream

Apr 17, 2024, 8:40:09 AM

to leo-editor

Five days ago, on April 11, I started work on leoTokens.rs, a prototype transliteration of leoTokens.py, Leo's token-based beautifier. This work was my first significant Rust project.

This Engineering notebook post discusses my experiences. I'll also discuss an idea for improving leoTokens.py.

RustPython-Parser

The prototype uses lexer.rs, a Python tokenizer written in Rust. This file is part of the RustPython-Parser project. There are a few problems with lexer.rs, but they did not interfere with the prototype.

Performance

Last night, this prototype reached a milestone by realistically modeling the expected performance:

    file name: c:/Repos/leo-editor/leo/core/leoTokens.py
    read:      1.08ms
    tokenize:  12.23ms
    tokens:    10159

Leo's beautifier takes roughly 100ms to do the same, so the Rust prototype is about 8x faster. A production version might only be 5x faster.
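For readers who want to reproduce numbers like these, here is a minimal sketch of how per-phase timings could be collected with std::time::Instant. The helper names (timed, format_ms, speedup) are hypothetical and are not taken from leoTokens.rs. The speedup figure is plain arithmetic on the numbers above: 100 / 12.23 is roughly 8.2 for tokenizing alone.

```rust
use std::time::{Duration, Instant};

// Hypothetical helper: run a closure, returning its result and the
// elapsed wall-clock time.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

// Format a duration in the style of the report above, e.g. "12.23ms".
fn format_ms(d: Duration) -> String {
    format!("{:.2}ms", d.as_secs_f64() * 1000.0)
}

// Ratio of a baseline time to a candidate time, both in milliseconds.
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

A call such as `let (contents, t) = timed(|| std::fs::read_to_string(path));` would time the read phase. The tokenize phase could be wrapped the same way around whatever tokenizer is in use.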

Learning Rust

The good news and bad news about Rust are the same: Rust is a very picky language :-) Rust programs must specify much more than Python requires. Otoh, the Rust compiler usually offers superb hints for correcting errors.

I enjoyed being a newbie Rustacean. There were so many newbie-level puzzles to solve. Otoh, I nearly became crazed by the effort!

Last night, I finally realized that cloning a list keeps the borrow checker happy when the loop body must also mutate self while iterating over that list. For example:

    for input_token in &self.input_list.clone() {
        self.make_output_token(input_token);
    }

There were many other Ahas, but this one was a milestone.
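To show why the clone matters, here is a minimal self-contained sketch. Only input_list and make_output_token are names from the snippet above. The InputToken and Beautifier types, their fields, and make_output_list are assumptions made up for illustration and are not the prototype's actual definitions.

```rust
// Hypothetical stand-in for the prototype's input token type.
#[derive(Clone, Debug, PartialEq)]
struct InputToken {
    value: String,
}

// Hypothetical stand-in for the beautifier's state.
struct Beautifier {
    input_list: Vec<InputToken>,
    output_list: Vec<String>,
}

impl Beautifier {
    // Takes `&mut self` because it appends to `output_list`.
    fn make_output_token(&mut self, input_token: &InputToken) {
        self.output_list.push(input_token.value.clone());
    }

    fn make_output_list(&mut self) {
        // Iterating over `&self.input_list` directly would keep a shared
        // borrow of `self` alive for the whole loop, so the `&mut self`
        // call below would be rejected (error E0502).
        // Cloning produces an independent temporary list to iterate over.
        for input_token in &self.input_list.clone() {
            self.make_output_token(input_token);
        }
    }
}
```

The clone copies the whole list once per call, which is usually fine for a prototype. If the copy ever shows up in profiles, iterating by index or temporarily moving the list out with std::mem::take are common alternatives.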

Improving Leo's beautifier

Yesterday's work created a list of output tokens from the corresponding list of input tokens. Leo's beautifier does the same. But Aha! A simpler architecture might work:

- Don't generate whitespace input tokens!

Skipping whitespace would simplify the token-based parse.

- Lazily generate the whitespace between tokens.

The output list could be a list of simple strings.
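The two ideas above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Kind enum, the Token struct, and the spacing rules in needs_space are all hypothetical, and none of it is the code in leoTokens.py or leoTokens.rs. The input contains no whitespace tokens. The whitespace is computed lazily from each adjacent pair of tokens, and the output is a plain list of strings.

```rust
// Hypothetical token kinds; a real beautifier would need many more.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Name,
    Number,
    Op, // binary operators and `=`
    Comma,
    OpenParen,
    CloseParen,
}

// Hypothetical significant (non-whitespace) input token.
struct Token {
    kind: Kind,
    value: String,
}

impl Token {
    fn new(kind: Kind, value: &str) -> Token {
        Token { kind, value: value.to_string() }
    }
}

// Decide lazily whether a blank belongs between two adjacent tokens.
// These rules are illustrative only.
fn needs_space(prev: Kind, next: Kind) -> bool {
    match (prev, next) {
        (Kind::OpenParen, _) => false,          // no blank after `(`
        (_, Kind::CloseParen) => false,         // no blank before `)`
        (_, Kind::Comma) => false,              // no blank before `,`
        (Kind::Name, Kind::OpenParen) => false, // call syntax: f(x)
        _ => true,
    }
}

// Build the output as a list of simple strings.
fn beautify_line(tokens: &[Token]) -> Vec<String> {
    let mut output: Vec<String> = Vec::new();
    let mut prev: Option<Kind> = None;
    for token in tokens {
        if let Some(prev_kind) = prev {
            if needs_space(prev_kind, token.kind) {
                output.push(" ".to_string());
            }
        }
        output.push(token.value.clone());
        prev = Some(token.kind);
    }
    output
}
```

Given the tokens for `x=f(a,1)`, joining the result with `beautify_line(&tokens).concat()` yields `x = f(a, 1)`. Because the whitespace is derived from neighboring token kinds, the original spacing never has to be tokenized at all.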

Summary

Learning Rust has been an all-consuming experience.

The prototype is now in the "devel" branch. Look for the node "prototype: leoTokens.rs" in LeoPyRef.leo. It's in the attic.

The Rust code is surprisingly simple. It needs neither lifetime annotations nor generic types.

The final Rust beautifier might be 5x to 8x faster than Leo's beautifier. I'm not sure the speedup is worth the maintenance burden on Leo's (future) devs. In any case, the work has been worthwhile.

The perspective gained suggests a significant improvement to Leo's Python beautifier. I'll be exploring that possibility next before continuing work on leoTokens.py. Stay tuned.

Edward

Edward K. Ream

Apr 17, 2024, 11:05:28 AM

to leo-editor

On Wednesday, April 17, 2024 at 7:40:09 AM UTC-5 Edward K. Ream wrote:

Improving Leo's beautifier

Yesterday's work created a list of output tokens from the corresponding list of input tokens. Leo's beautifier does the same. But Aha! A simpler architecture might work:

- Don't generate whitespace input tokens!

- Lazily generate the whitespace between tokens. The output list could be a list of [Python] strings.

A quick review of leoTokens.py shows that this approach might work. See #3869.

Edward

Edward K. Ream

Apr 17, 2024, 2:50:11 PM

to leo-editor

On Wednesday, April 17, 2024 at 7:40:09 AM UTC-5 Edward K. Ream wrote:

The perspective gained suggests a significant improvement to Leo's Python beautifier.

It may be possible to speed up leoTokens.py significantly.

The speedups would likely apply to leoTokens.rs as well, but the faster Leo's Python beautifier becomes, the less important a Rust version is.