ENB: Thoughts on the @clean bug

22 views

Skip to first unread message

Edward K. Ream

unread,

Aug 5, 2025, 8:27:25 AMAug 5

to leo-editor

Leo 6.8.6.1 backed out of buggy code (see #4404) related to @clean. This Engineering Notebook post discusses what went wrong and possible next steps.

Background

The idea "seemed like a good idea at the time"™. The @clean update algorithm knows nothing about computer languages! In particular, the update algorithm will place each new method in an already existing node. The idea was to use Leo's importers to split the node so that each node contains a separate method.

The offending code

The buggy code appeared at the end of at.readOneAtCleanNode:

# Existing code.

if changed_vnodes:
c.setChanged(force=True)
root.v.setDirty()
at.changed_roots.append(root.copy())

# New code part 1: call the correct importer.

for v in changed_vnodes:
at.do_changed_vnode(fileName, root, v)

# New code part 2: Clean up the importers' mess

at.delete_empty_changed_organizers(root)
at.move_leading_blank_lines(root)

I've shown the existing code for context. There are two parts to the new code. The first calls do_changed_vnode for each changed node. do_changed_vnode calls the appropriate importer to try to "improve" the node. The second part attempts to clean up changes that the importer may have made.

Both parts are severely misguided. However, the second part was probably responsible for the data loss reported by #4404.

Possible solutions

A little thought shows that this scheme must never be used with html or xml files. Why? Because neither language provides functions or methods. So neither language defines natural node boundaries. Instead, the user can define elements (in @data settings) that the importer is supposed to assign to separate nodes. This is a pure hack. The importers create nodes that are likely to bear no relationship to the nodes that Leonistas create by hand. End of story!

Next, let's banish the thought of cleaning up after splitting nodes. What we want is a scheme that will split nodes simply and directly. In retrospect, using importers to split the nodes was a wretched idea. It's overkill and it doesn't work.

Instead, we can imagine a set of language-dependent node splitters that would scan changed nodes and split the node if it contains more than one (top-level) function or method. That task doesn't seem challenging!

Summary

Leo's importers are the wrong tools to split nodes. They won't work at all for html and xml files, and they create cruft that must be cleaned up later.

Instead, Leo could implement node splitters that would split nodes directly. Leo's importer files are the natural place for these splitters, but these splitters would be entirely separate from the existing importers.

I might experiment with node splitters, but such a project isn't urgent.

All your comments are welcome.

Edward

Edward K. Ream

unread,

Aug 5, 2025, 2:30:47 PMAug 5

to leo-editor

On Tuesday, August 5, 2025 at 7:27:25 AM UTC-5 Edward K. Ream wrote:

Leo's importers are the wrong tools to split nodes. They won't work at all for html and xml files, and they create cruft that must be cleaned up later.

Instead, Leo could implement node splitters that would split nodes directly. Leo's importer files are the natural place for these splitters, but these splitters would be entirely separate from the existing importers.

I might experiment with node splitters, but such a project isn't urgent.

Heh. It's often easier to do something than to try not to think about it :-) I have just create #4411.