ENB: @clean: the way forward


Edward K. Ream

Aug 6, 2025, 4:09:41 PM
to leo-editor

This Engineering Notebook post discusses how to handle large numbers of updated nodes within @clean trees. This post proposes several new commands. It may be of general interest to many Leonistas.


Executive summary


Yesterday, I started work on what I thought would be an automatic way of cleaning up changed @clean nodes. I soon realized that this goal was misguided. To state my conclusions first:


- Automatic cleanups have no chance of working, regardless of new settings or directives.

- Instead, we need several new or improved commands that will make it easier to put code exactly where we want it, when we want it!


Background


I have been wondering how to handle large numbers of updated nodes within @clean trees.


The update algorithm guarantees that Leo will write all updated files without changes, regardless of whether new methods get their own nodes. Accuracy is not the question! Rather, the question is: do we care if updated nodes differ from what we would have produced by hand?


In many cases, we won't care. Even then, "imperfections" in allocating lines to nodes might not bother us.


Otoh, sometimes the nodes that we create by hand wouldn't match what Leo's importers might produce! Oops. Automatically re-importing nodes might be harmful!


I considered creating a new directive, say @norescan, but that would prevent later changes that we want. Such a directive is a non-starter.


The conclusion is inescapable. Automatic splitting of changed @clean nodes is a bad idea. Neither settings nor new directives will magically update some changed nodes but not others. Manual intervention is the only solution.


New commands


The following new commands will split a single node into one or more sibling nodes:


- split-node-at-importer-blocks & split-selection-at-importer-blocks

- split-node-at-cursor


The split-x-at-importer-blocks commands will be simplified versions of Leo's parse-body command. Experiments indicate that these commands should be reliable and useful.
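
As a rough illustration of what splitting at importer blocks might mean for Python, here is a minimal standalone sketch. It does not use Leo's importer code: the pattern BLOCK_START and the function split_at_top_level_blocks are hypothetical names invented for this example. Each returned pair stands for one proposed sibling node, and joining the texts reproduces the original body.

```python
import re

# Hypothetical pattern: an unindented Python def, async def, or class line.
BLOCK_START = re.compile(r'^(?:async\s+def|def|class)\s+(\w+)')


def split_at_top_level_blocks(body: str) -> list[tuple[str, str]]:
    """Return (headline, text) pairs, one per proposed sibling node."""
    nodes: list[tuple[str, list[str]]] = []
    for line in body.splitlines(keepends=True):
        m = BLOCK_START.match(line)
        if m or not nodes:
            # Start a new node at each top-level block, or at the very first line.
            headline = m.group(1) if m else 'prefix'
            nodes.append((headline, []))
        nodes[-1][1].append(line)
    return [(headline, ''.join(lines)) for headline, lines in nodes]


body = (
    "import os\n"
    "\n"
    "def a():\n"
    "    pass\n"
    "\n"
    "class B:\n"
    "    def m(self):\n"
    "        pass\n"
)
pairs = split_at_top_level_blocks(body)
for headline, text in pairs:
    print(headline, len(text.splitlines()))
```

In this sketch, indented defs stay inside their enclosing block, and decorators or comments that precede a def stay with the previous node. A real command would have to decide such cases per language.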


In contrast, these commands allow us to handle groups of nodes:


- mark-all-changed-nodes & mark-all-changed-nodes-in-tree

- split-marked-nodes-at-importer-blocks
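
To make the idea of marking groups of changed nodes concrete, here is a small sketch on a toy outline. The Node class, its changed and marked fields, and the function names are all hypothetical stand-ins for this example. They are not Leo's position or vnode API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """A toy outline node (hypothetical; not Leo's data model)."""
    headline: str
    changed: bool = False
    marked: bool = False
    children: list[Node] = field(default_factory=list)


def walk(root: Node) -> Iterator[Node]:
    """Yield root and all its descendants in outline order."""
    yield root
    for child in root.children:
        yield from walk(child)


def mark_changed_nodes(root: Node) -> int:
    """Mark every changed node in root's tree. Return the number newly marked."""
    count = 0
    for node in walk(root):
        if node.changed and not node.marked:
            node.marked = True
            count += 1
    return count


tree = Node('root', children=[
    Node('a', changed=True),
    Node('b', children=[Node('c', changed=True)]),
])
newly_marked = mark_changed_nodes(tree)
marked_headlines = [n.headline for n in walk(tree) if n.marked]
```

A batch split command could then iterate over the marked nodes and apply a per-node splitter to each one.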


Summary


We have been using @clean nodes since 2015 (Leo 5.1). In that time, nobody has ever complained about the update algorithm. This fact shows that no heroic measures are necessary.


There is no way to update many @clean nodes automagically. Neither new settings nor new directives will allow Leonistas to get exactly the nodes they want.


Instead of automatic updates, a set of new and improved commands is the way forward. These commands will allow us to:


- Split a single node as we wish.

- Split groups of nodes as we wish.


These commands will use simplified versions of code in Leo's importers. All splitting commands will create new nodes as siblings of the original node.


The number and design of these new commands are up for discussion.  Your comments are welcome!


Edward


P.S.  Leo's parse-body command is similar to recent efforts. We have tolerated this feeble command only because we seldom use it! An improved version of the parse-body command will resemble the split-node-at-importer-blocks command.


P.P.S. The split-node-at-importer-blocks & split-selection-at-importer-blocks commands will not apply to xml or html. These commands will probably only apply to Python, Rust, and perhaps one or two other languages.


EKR

Thomas Passin

Aug 6, 2025, 5:23:40 PM
to leo-editor
I agree with not always splitting nodes for changes.  Someone may have worked hard to get a node tree organized the way they like, and may well want to have a new function or method in the same node as a pre-existing one. Reading a slightly changed @clean file should not change the existing organization.  But a command to help splitting up nodes that the user wants to split, that would be valuable.

Edward K. Ream

Aug 6, 2025, 7:55:45 PM
to leo-e...@googlegroups.com
On Wed, Aug 6, 2025 at 4:23 PM Thomas Passin <tbp1...@gmail.com> wrote:
> I agree with not always splitting nodes for changes.  Someone may have worked hard to get a node tree organized the way they like, and may well want to have a new function or method in the same node as a pre-existing one. Reading a slightly changed @clean file should not change the existing organization.  But a command to help splitting up nodes that the user wants to split, that would be valuable.

Excellent. I'm glad we agree.

Edward

Edward K. Ream

Aug 7, 2025, 6:59:47 AM
to leo-editor
On Wednesday, August 6, 2025 at 3:09:41 PM UTC-5 Edward K. Ream wrote:

> This Engineering Notebook post discusses how to handle large numbers of updated nodes within @clean trees.

> ...

> The following new commands will split a single node into one or more sibling nodes:

> - split-node-at-importer-blocks & split-selection-at-importer-blocks

> - split-node-at-cursor

> ...

> - mark-all-changed-nodes & mark-all-changed-nodes-in-tree

> - split-marked-nodes-at-importer-blocks


On second thought,  PR #4412 will just rewrite Leo's existing parse-body command.

Leo already has a mark-changed-items command. The PR will add a mark-changed-nodes alias.

Additional commands can wait until (much) later.

Edward


Edward K. Ream

Aug 8, 2025, 5:59:42 AM
to leo-editor
On Thursday, August 7, 2025 at 5:59:47 AM UTC-5 Edward K. Ream wrote:
 
> On second thought,  PR #4412 will just rewrite Leo's existing parse-body command.

Yesterday was one of those rare golden days of programming. Everything I did seemed to have second sight.

I started with a vague direction and a limited purpose: to get parse-body working for Python. At day's end, I had a general framework in which supporting a new language required two statements: an import statement and a new dictionary entry.

The result is a thorough re-imagining of Leo's importers. The new parse-body command uses only two importer methods: x.delete_comments_and_strings and x.find_blocks. The base Importer class defines both methods, so every importer inherits them. The only other requirement is that each importer define x.block_patterns, a list of tuples. The PR adds this list for the Rust importer.
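
The following standalone sketch suggests why blanking comments and strings before matching block patterns is useful. The names mirror the ones above, but the signatures, the (kind, regex) shape of block_patterns, and the return values are assumptions made for this example. This is not Leo's implementation.

```python
import re

# Assumed shape: a list of (kind, compiled regex) tuples.
block_patterns = [
    ('class', re.compile(r'^class\s+(\w+)')),
    ('def', re.compile(r'^def\s+(\w+)')),
]


def delete_comments_and_strings(text: str) -> str:
    """Replace comment and string contents with spaces, preserving newlines."""
    out = []
    i, n = 0, len(text)
    delim = None  # The active string delimiter, or None outside strings.
    while i < n:
        ch = text[i]
        if delim:
            if ch == '\\' and i + 1 < n:
                # Blank an escaped character, but keep an escaped newline.
                out.append(' \n' if text[i + 1] == '\n' else '  ')
                i += 2
            elif text.startswith(delim, i):
                out.append(' ' * len(delim))
                i += len(delim)
                delim = None
            else:
                out.append('\n' if ch == '\n' else ' ')
                i += 1
        elif ch == '#':
            # Blank the comment up to, but not including, the newline.
            j = text.find('\n', i)
            j = n if j == -1 else j
            out.append(' ' * (j - i))
            i = j
        elif ch in '"\'':
            delim = ch * 3 if text.startswith(ch * 3, i) else ch
            out.append(' ' * len(delim))
            i += len(delim)
        else:
            out.append(ch)
            i += 1
    return ''.join(out)


def find_blocks(text: str) -> list[tuple[str, str, int]]:
    """Return (kind, name, line_index) for each top-level block start."""
    clean = delete_comments_and_strings(text)
    blocks = []
    for line_index, line in enumerate(clean.splitlines()):
        for kind, pattern in block_patterns:
            m = pattern.match(line)
            if m:
                blocks.append((kind, m.group(1), line_index))
                break
    return blocks
```

Because the cleaned text has the same length and line count as the original, line indices found in the cleaned text apply directly to the original body. In this sketch, supporting another language would mean supplying a different block_patterns list.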

Along the way, I discovered a spectacularly easy way of moving blank lines from the start of a node (where they are always annoying) to the end of the previous (sibling) node, where they are practically invisible. See ic.preprocess_blocks if you are interested. This is the way it is written in The Book.
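
For readers who want the gist without opening ic.preprocess_blocks, here is a hedged sketch of the idea. It is not the code in Leo. The function name and the representation of blocks as lists of lines are assumptions for this example.

```python
def move_leading_blank_lines(blocks: list[list[str]]) -> list[list[str]]:
    """Move blank lines at the start of each block to the end of the previous block.

    Each block is a list of lines. The input is not modified.
    """
    result = [list(block) for block in blocks]
    for prev, cur in zip(result, result[1:]):
        # Whitespace-only lines count as blank.
        while cur and not cur[0].strip():
            prev.append(cur.pop(0))
    return result


blocks = [
    ['def a():\n', '    pass\n'],
    ['\n', '\n', 'def b():\n', '    pass\n'],
]
cleaned = move_leading_blank_lines(blocks)
```

The first block keeps any leading blank lines because it has no previous sibling to receive them. The total text is unchanged, so the written file stays the same.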

I won't generalize the code further. Please let me know if you would like parse-body to support other languages.

Edward

Thomas Passin

Aug 8, 2025, 8:45:46 AM
to leo-editor
On Friday, August 8, 2025 at 5:59:42 AM UTC-4 Edward K. Ream wrote:
> I started with a vague direction and a limited purpose: to get parse-body working for Python. At day's end, I had a general framework in which supporting a new language required two statements: an import statement and a new dictionary entry.

These two could be specified in an @data node in the settings.  Then new importers could be added without revising any of Leo's files. 

Edward K. Ream

Aug 8, 2025, 8:47:22 AM
to leo-e...@googlegroups.com
On Fri, Aug 8, 2025 at 4:59 AM Edward K. Ream <edre...@gmail.com> wrote:

> I won't generalize the code further [except maybe to support other languages].

I want to mention an important design issue. The new parse-body command intentionally does as little as possible.

Why? Because perfection is impossible. In this context, perfection isn't even well defined!

For example, parse-body doesn't split classes into methods. Sure, the code could do that, but is that what the user wants? The correct answer is: "it depends".

Félix understood this earlier than I did. It's why he objected to having at.readOneCleanNode "optimize" changed nodes. The same principle applies here.

Summary

Leo's new parse-body command completes @clean. No further features are needed to allow Leonistas to handle huge repos.

Edward

Edward K. Ream

Aug 8, 2025, 8:50:59 AM
to leo-e...@googlegroups.com
On Fri, Aug 8, 2025 at 7:45 AM Thomas Passin wrote:

> These two [lines] could be specified in an @data node in the settings.  Then new importers could be added without revising any of Leo's files.

I doubt it. Anyway, there are better ways to generalize the code.

To repeat, I'll happily support other languages, but only if someone intends to use parse-body for those languages.

Edward

Edward K. Ream

Aug 8, 2025, 10:10:40 AM
to leo-editor
On Friday, August 8, 2025 at 7:50:59 AM UTC-5 Edward K. Ream wrote:
> On Fri, Aug 8, 2025 at 7:45 AM Thomas Passin wrote:

> > These two [lines] could be specified in an @data node in the settings.  Then new importers could be added without revising any of Leo's files.

> I doubt it. Anyway, there are better ways to generalize the code.

Hehe. Once again it's easier to do something than to try not to think about it.

A straightforward addition to LM.createImporterData creates the new g.app.importerModulesDict. Keys are language names, as in the @language directive. These are the same keys that ic.parse_body uses.

But the values of the new dict are modules, not importer classes, so more work will be required. I'm on it.
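
The gap between a dict of modules and a dict of classes can be sketched as follows. Everything here is hypothetical: importer_modules stands in for g.app.importerModulesDict, the in-memory modules are fakes built for the example, and the assumption that importer class names end in _Importer is a guess about naming, not a statement about Leo's code.

```python
import types
from typing import Optional


def make_fake_module(language: str, class_name: str) -> types.ModuleType:
    """Build an in-memory module containing one importer-like class."""
    module = types.ModuleType(language)
    cls = type(class_name, (), {'language': language})
    setattr(module, class_name, cls)
    return module


# Keys are language names, as in the @language directive.
importer_modules = {
    'python': make_fake_module('python', 'Python_Importer'),
    'rust': make_fake_module('rust', 'Rust_Importer'),
}


def find_importer_class(language: str) -> Optional[type]:
    """Return the importer class for a language, or None if there is none."""
    module = importer_modules.get(language)
    if module is None:
        return None
    for name, value in vars(module).items():
        # Assumed naming convention: importer classes end with '_Importer'.
        if isinstance(value, type) and name.endswith('_Importer'):
            return value
    return None


python_class = find_importer_class('python')
missing_class = find_importer_class('cobol')
```

With a lookup like this, a command keyed on the @language name would need no static import per language.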

Edward

Edward K. Ream

Aug 8, 2025, 10:59:15 AM
to leo-editor
On Friday, August 8, 2025 at 9:10:40 AM UTC-5 Edward K. Ream wrote:

> Hehe. Once again it's easier to do something than to try not to think about it.

Thanks, Thomas, for suggesting the generalization! It's quite a collaboration we are having.

Rev 895bdcd adds two new g.app dicts. It's not pretty, but it works.
Rev 8b18e9b generalizes parse-body. It's gorgeous :-) No more restrictions. No more static imports!

Edward