ENB: Improving Leo's importers

Edward K. Ream

Aug 12, 2025, 8:06:44 AM
to leo-editor
This ENB discusses possible fixes to #4419. It is mostly notes to myself and discusses experimental ideas. Feel free to ignore it!

Executive Summary

x.gen_blocks generates a tree of nested blocks. Instead, it might create a list of sequential blocks.

This new scheme greatly simplifies complex code, but several new and old problems must be resolved.

Background

This thread started the train of thought. i.preprocess_blocks moves lines between blocks just by incrementing two ints! As a result, my attention rested on the i.Block class.
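
To make the "two ints" point concrete, here is a minimal sketch. The Block class below is a hypothetical stand-in, not Leo's actual i.Block: it assumes each block is a half-open range [start, end) of indices into one shared list of lines. The name shift_boundary is also invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for i.Block: the half-open line range [start, end)."""
    name: str
    start: int
    end: int


def shift_boundary(prev: Block, nxt: Block, delta: int) -> None:
    """Move the boundary between two adjacent blocks by delta lines.

    A positive delta moves the first delta lines of nxt to the end of prev.
    A negative delta moves lines the other way. No line is ever copied.
    """
    if prev.end != nxt.start:
        raise ValueError("blocks must be adjacent")
    boundary = prev.end + delta
    if not prev.start <= boundary <= nxt.end:
        raise ValueError("delta would move the boundary outside the two blocks")
    # The whole operation is an update of two integers.
    prev.end += delta
    nxt.start += delta
```

Because the blocks only hold indices, the lines themselves never move, and adjacent blocks stay adjacent after the call.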

I had forgotten that x.gen_blocks generates a tree of nested blocks. This seemed unnecessarily clumsy. After all, the key invariant is that blocks must cover the source code without gaps.
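
The invariant is easy to state as a check. A minimal sketch, assuming blocks are given as half-open (start, end) pairs of line indices in source order; the function name is hypothetical and not part of Leo:

```python
def covers_without_gaps(ranges, n_lines):
    """Return True if the half-open ranges tile lines 0..n_lines exactly.

    Each range must start where the previous one ended, so there are
    neither gaps nor overlaps, and the last range must end at n_lines.
    """
    expected_start = 0
    for start, end in ranges:
        if start != expected_start or end < start:
            return False
        expected_start = end
    return expected_start == n_lines
```

A flat list of blocks can be checked this way in one pass. A nested tree would need the same check applied at each level.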

What was going on? Well, I was focused on accuracy above simplicity. But the resultant complexity obscured possible simplifications.

The Aha!

Generating nested blocks is tricky, as you can see from the base i.gen_block method. Furthermore, i.gen_block relies on i.find_blocks, another complex method! Worse, the overridden python_i.find_blocks is truly hairy.

Why did I choose this approach? Because it allows all importers to know the end of each block. You would think this would be essential information!

But Aha! Suppose each block ends with the start of the next block? At first glance, this would be a perfect solution!

Finding blocks becomes much more straightforward. See the experimental(!) version of python_i.gen_block in PR #4422. This version replaces the (complex!) legacy helpers (python_i.find_blocks and python_i.find_end_of_block) with the much simpler python_i.find_start_of_body.
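
To illustrate the idea only (this is not the code in the PR), here is a minimal sketch of "each block ends with the start of the next block". It assumes, as a simplification, that a block starts at any line beginning with class, def, or async def. It ignores decorators, strings, and comments. The names START and sequential_blocks are hypothetical.

```python
import re

# Simplifying assumption: a block starts at any line that begins a class or def.
START = re.compile(r'^[ \t]*(?:async[ \t]+def|def|class)[ \t]+(\w+)')


def sequential_blocks(lines):
    """Return (name, start, end) triples covering all lines without gaps.

    Each block ends where the next block starts. The last block ends at
    the end of the source. Lines before the first class or def form a
    block named '<preamble>'.
    """
    if not lines:
        return []
    starts = [
        (m.group(1), i)
        for i, line in enumerate(lines)
        if (m := START.match(line))
    ]
    if not starts or starts[0][1] != 0:
        starts.insert(0, ('<preamble>', 0))
    ends = [i for _, i in starts[1:]] + [len(lines)]
    return [(name, start, end) for (name, start), end in zip(starts, ends)]
```

Note that no end-of-block search is needed at all. The cost is that, in this sketch, a class block stops at its first method and the last method absorbs anything that follows it.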

But the new scheme won't work immediately. Blocks representing classes will end at the start of their first method! Such blocks will be equivalent (in some ill-defined sense) to:

class MyClass...:
    <preamble>  # Everything before the first method.
    @others

This won't do. We must generate something like this:

class MyClass...:
    <preamble>
    @others
    <postamble> # Anything following the last method.
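
One way to picture the split between the last method and the postamble is the following sketch. The rule is a hypothetical one, not taken from the PR: the last method ends at the first later non-blank line that is indented no deeper than its def line. The helper names are invented, indentation is assumed to use spaces only, and an underindented comment or multi-line string inside the method would fool this rule.

```python
def indent_width(line):
    """Return the number of leading whitespace characters of a line."""
    return len(line) - len(line.lstrip())


def find_postamble_start(lines, def_index):
    """Return the index of the first line after the method at def_index.

    Assumed rule: the method ends at the first non-blank line whose
    indentation is no deeper than that of the def line itself.
    Blank lines are skipped, so they stay with the method.
    Returns len(lines) if the method runs to the end of the source.
    """
    limit = indent_width(lines[def_index])
    for i in range(def_index + 1, len(lines)):
        line = lines[i]
        if line.strip() and indent_width(line) <= limit:
            return i
    return len(lines)
```

With such a helper, the lines from the returned index to the end of the class would form the postamble that follows @others.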

Finally, let's discuss underindented lines. This is an existing problem that can't be solved in a truly satisfactory way because Leo no longer has a way of representing underindented lines. Leo's legacy importers handle such lines by generating code like this:

class MyClass...:
    <preamble>
@others
<postamble>

The relevant code is buried deep in the weeds.

Imo, the new importers must handle underindented lines in the same way.

Summary

Using a list of sequential (non-nested) blocks promises to simplify all importers, not just the Python importer.

However, the PR is highly experimental. Complications may appear. We'll see how this scheme pans out.

Edward

Thomas Passin

Aug 12, 2025, 9:26:23 AM
to leo-editor
I suppose that the main trick is to understand what is in a postamble and how to know that it isn't part of the last method of a class. Aside from that, the indentation of an "@others" line (and a "<< section >>" line) would equal the indentation of the first line of their text. That leaves the right location uncertain for a comment that the author underindented by mistake. One would not want to end a block too soon because of a minor typo like that.

Edward K. Ream

Aug 12, 2025, 1:04:35 PM
to leo-e...@googlegroups.com
On Tue, Aug 12, 2025 at 8:26 AM Thomas Passin wrote:

I suppose that the main trick is to understand what is in a postamble and how to know that it isn't part of the last method of a class.

Python is unusual in this regard: indentation determines structure. For many other languages, the nesting of curly braces determines structure.

Aside from that, the indentation of an "@others" line (and a "<< section >>" line) would equal the indentation of the first line of their text.

Not quite. The indentation of @others must be the maximum common indentation of all methods (child nodes). An underindented line typically means that @others must have no indentation.
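
As a sketch of this rule (not Leo's code; common_indent is a hypothetical helper), the indentation for @others can be computed as the longest leading whitespace shared by every non-blank line of the child nodes:

```python
import os.path


def common_indent(lines):
    """Return the longest leading whitespace shared by all non-blank lines.

    Blank lines are ignored. A single underindented line shortens the
    result, possibly to the empty string.
    """
    indents = [
        line[:len(line) - len(line.lstrip())]
        for line in lines
        if line.strip()
    ]
    # os.path.commonprefix compares strings character by character.
    return os.path.commonprefix(indents) if indents else ''
```

For two methods indented by four spaces, the result is four spaces. Adding one comment at column zero reduces it to the empty string, which is the "no indentation" case.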

Edward

Edward K. Ream

Aug 12, 2025, 1:15:45 PM
to leo-editor
On Tuesday, August 12, 2025 at 7:06:44 AM UTC-5 Edward K. Ream wrote:

This ENB discusses possible fixes to #4419. It is mostly notes to myself and discusses experimental ideas.

A few other notes:

I usually try to fix issues using the smallest amount of code. In this case, however, the chance to significantly improve the code outweighs my usual practice.

The Python importer never generates nested defs (functions or methods). So unless a def is the last method of a class, it always continues to the next class or def at the same or lower indentation. The legacy importer ignored this invariant.
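
This invariant can be sketched as a single forward scan. The code below is an illustration under stated assumptions, not the importer's code: the names are hypothetical, indentation is assumed to use spaces only, and class or def keywords inside strings are not handled.

```python
import re

# Assumption: only lines that begin with class, def, or async def count.
START = re.compile(r'^([ \t]*)(?:async[ \t]+def|def|class)\b')


def end_of_def(lines, def_index):
    """Return the index of the line that ends the def at def_index.

    The def continues to the next class or def line indented no deeper
    than the def itself. Nested defs are deeper, so they stay inside.
    Returns len(lines) if no such line exists.
    """
    header = START.match(lines[def_index])
    if header is None:
        raise ValueError("lines[def_index] is not a class or def line")
    limit = len(header.group(1))
    for i in range(def_index + 1, len(lines)):
        m = START.match(lines[i])
        if m and len(m.group(1)) <= limit:
            return i
    return len(lines)
```

The last method of a class is the exception noted above: this scan would run on to the next top-level class or def, or to the end of the file, so any postamble needs separate handling.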

The Rust importer already handles underindented lines fairly well because of Rust's weird multi-line string literals.

<rant>Something like textwrap.dedent isn't good enough for Rust cowboys. Oh no, let's complicate the syntax!</rant>

Edward

Edward K. Ream

Aug 13, 2025, 12:18:17 PM
to leo-editor
On Tuesday, August 12, 2025 at 7:06:44 AM UTC-5 Edward K. Ream wrote:

This ENB discusses possible fixes to #4419. It is mostly notes to myself and discusses experimental ideas.
 ...
x.gen_blocks generates a tree of nested blocks. Instead, it might create a list of sequential blocks.

I have abandoned this project for the following reasons:

- i.preprocess_blocks is a misleading special case that works because the new parse-body generates a flat list of blocks.
- A flat list of blocks creates more problems than it solves. The existing importers are complicated for good reasons.
- There are only two relatively minor bugs to fix.

Summary

I have closed PR #4422 in favor of PR #4423.

Edward