ENB: Could @leo nodes organize other outlines and help the GC?

Edward K. Ream

Jul 16, 2025, 11:15:01 AM
to leo-editor
Recent improvements to @clean allow Leo to update outlines containing thousands of @clean nodes. For the first time, it is feasible to use Leo to work on huge repos such as Rust's compiler.

Alas, Leo's performance degrades substantially when using huge outlines. Python's GC (Garbage Collector) probably gets overly stressed by all the temporary data Leo generates.

This Engineering Notebook post explores a possible solution. As always, please feel free to ignore it. However, this ENB presents an exciting new direction for Leo.

@leo nodes would create a hierarchy of Leo outlines

The idea is to let @leo nodes in a top-level outline coordinate operations in linked sub-outlines. For example: rust_compiler.leo (in the rust/compiler directory) would have the following @leo nodes:

  @leo rustc/rustc.leo
  @leo rustc_abi/rustc_abi.leo
  @leo rustc_arena/rustc_arena.leo

And dozens of others. So the top-level outline will be tiny, and each sub-outline will be much smaller than a single all-inclusive outline. As discussed below, the performance might not improve enough. But let's discuss some exciting ideas first.

Cross-file searches and (maybe??) cross-file clones

Straightforward extensions to Leo's file commands will allow Leonistas to search all subsidiary outlines from the top-level outline! Cross-tab (or inter-process) communication will transfer results from the sub-outlines to the top-level outline. All details are unclear for now.

The details of cross-file cff commands are more complex. Initially, the sub-outlines could communicate the cross-file unls back to the top-level outline. The cff becomes a set of unls. Recall that Leo already supports cross-file unls.

Later, we might consider true cross-file clones. Changing such a clone in the top-level outline would change the corresponding clone in the sub-outline. And vice versa!

But this is not the time to consider how to do this magic. For now, the conclusion is that cross-file clones might make sense, contrary to my decades-old opinion!

Helping the GC?

Now let's turn our attention back to performance issues.

First, let's suppose Leo handles @leo nodes by loading sub-outlines in separate tabs. Does this help the GC? The answer is "yes and no" :-)

Yes, each tab contains less data, so operations on the tree and body might become more efficient. But no, the GC has the same amount (and a bit more) to handle. My guess is that putting smaller outlines into separate tabs will have a small (negligible?) effect on performance.
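
The guess about GC load can be checked by measurement rather than intuition. Here is a minimal sketch, using only Python's standard gc.callbacks hook, that accumulates the wall-clock time spent in collections. GcTimer and make_garbage are hypothetical names invented for this illustration; they are not part of Leo, and the workload is only a stand-in for whatever Leo operation is being profiled.

```python
import gc
import time


class GcTimer:
    """Accumulate wall-clock time spent in Python's cyclic garbage collector."""

    def __init__(self) -> None:
        self.total = 0.0  # Seconds spent inside collections.
        self.collections = 0  # Number of completed collections.
        self._start = 0.0

    def __call__(self, phase: str, info: dict) -> None:
        # The gc module calls each callback with phase "start" or "stop".
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop":
            self.total += time.perf_counter() - self._start
            self.collections += 1

    def __enter__(self) -> "GcTimer":
        gc.callbacks.append(self)
        return self

    def __exit__(self, *exc) -> None:
        gc.callbacks.remove(self)


def make_garbage(n: int) -> None:
    """Stand-in workload: allocate many short-lived reference cycles."""
    for _ in range(n):
        a: list = []
        b = [a]
        a.append(b)  # a and b form a cycle that only the GC can reclaim.


with GcTimer() as timer:
    make_garbage(100_000)
    gc.collect()  # Force at least one full collection.

print(f"{timer.collections} collections, {timer.total:.4f} sec in GC")
```

Wrapping an outline load or a redraw in the same "with" block, once for a huge outline and once for a set of smaller ones, would show whether the time spent in GC actually changes.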

The first prototype

Happily, it will be easy to prototype this initial idea. I'll write a script that:

- Creates @leo nodes for all sub-directories of the rust/compiler directory.
- Creates the corresponding .leo file in each subdirectory.
- Loads (details unclear) each created (subsidiary) .leo file with the desired @clean nodes.
- Creates a list (suitable for the command-line) of files to be loaded.

So a command line like:

leo rust_compiler.leo <list of sub-outlines>

will load all the desired outlines, placing each sub-outline in its own tab. It will then be easy to see how much this scheme improves Leo's performance.
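
The directory-scanning part of such a script might look like the sketch below. It uses only pathlib and does not touch Leo's API: it computes the @leo headlines and the command line, but it neither creates the .leo files nor populates them with @clean nodes. The function names plan_sub_outlines and command_line are hypothetical, and the temporary directory merely stands in for rust/compiler.

```python
import tempfile
from pathlib import Path


def plan_sub_outlines(root: Path) -> tuple[list[str], list[str]]:
    """
    Return (headlines, paths) for the immediate sub-directories of root.

    headlines: '@leo <dir>/<dir>.leo' strings for the top-level outline.
    paths: the same relative paths, suitable for a command line.
    """
    headlines: list[str] = []
    paths: list[str] = []
    subdirs = sorted((p for p in root.iterdir() if p.is_dir()), key=lambda p: p.name)
    for sub in subdirs:
        rel = f"{sub.name}/{sub.name}.leo"
        headlines.append(f"@leo {rel}")
        paths.append(rel)
    return headlines, paths


def command_line(top_level: str, paths: list[str]) -> str:
    """Return a command line that opens the top-level outline and all sub-outlines."""
    return " ".join(["leo", top_level, *paths])


# Demo: a temporary directory stands in for the rust/compiler directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rustc_abi", "rustc", "rustc_arena"):
        (root / name).mkdir()
    (root / "README.md").write_text("not a directory")  # Plain files are skipped.
    headlines, paths = plan_sub_outlines(root)

cmd = command_line("rust_compiler.leo", paths)
print("\n".join(headlines))
print(cmd)
```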

Separate processes instead of separate tabs

Separate tabs might not help enough. In that case, @leo nodes could load sub-outlines in separate processes instead of separate tabs. This approach will almost surely solve the performance problems. Operating systems are very good at running separate processes! Each process will run a separate copy of Python with its own GC.

The same general ideas still apply, but now the top-level outline and all the sub-outlines must communicate via Leo's servers. There will probably be one server per process. Leo's server architecture will almost surely need to be extended. Surely such a scheme is feasible, but I have no intuition about the details.

Happily, we can ignore inter-process complications for now. I'll do all my initial experiments using sub-outlines in separate Leo tabs. It should be straightforward to extend Leo's find command using inter-tab communication. Who knows, maybe cross-file clones do make sense!

Summary

@leo nodes will create a hierarchical relationship between a (single) top-level outline and several sub-outlines. For now, we can assume that @leo nodes appear only in the top-level outline. We'll reexamine this question later.

Extensions to Leo's file commands (including the clone-find commands) will allow sub-outlines to send results back to the top-level outline. Initially, results will be unls. Eventually, these unls might morph into cross-file clones. Changing a cross-file clone in the top-level outline would change the corresponding clone in the sub-outline and vice versa.

Communication between tabs is straightforward, but putting sub-outlines in separate Leo tabs is unlikely to improve Leo's performance enough.

Ultimately, Leo could run each sub-outline in a separate process. This scheme would require substantial updates to Leo's server. For now, I'll extend Leo's find commands using inter-tab communication. Maybe cross-file clones do make sense!

I welcome all your comments, questions, and suggestions. I am excited by this project, and I hope you are too.

Edward

Thomas Passin

Jul 16, 2025, 1:02:18 PM
to leo-editor
You can come close to simulating some of this because any node can contain a list of UNLs.  CTRL-Clicking any of them will open that outline and it should be easy to write a script to open them all.

I foresee some user interface matters to figure out. For example, all outlines that are open because their parent outline with an @leo node was opened need to visually show that they are part of a group linked to that parent outline. And it has to be decided what to do if two Leo outlines get opened that contain references to different but overlapping sets of outlines.

Edward K. Ream

Jul 16, 2025, 2:31:11 PM
to leo-e...@googlegroups.com
On Wed, Jul 16, 2025 at 12:02 PM Thomas Passin wrote:

You can come close to simulating some of this because any node can contain a list of UNLs.  CTRL-Clicking any of them will open that outline and it should be easy to write a script to open them all.

Yes. But it gets better. See below.
 
I foresee some user interface matters to figure out. For example, all outlines that are open because their parent outline with an @leo node was opened need to visually show that they are part of a group linked to that parent outline. And it has to be decided what to do if two Leo outlines get opened that contain references to different but overlapping sets of outlines.

This is probably not necessary because Aha! (In the shower):

From any Leo outline O, it's dead easy to compute the transitive closure of all outlines reached from O via @leo nodes, regardless of the @leo links in any outline. Arbitrary links are just fine!

Here is the algorithm (untested!):

done: list[str] = [c.fileName()]  # List of paths already scanned.
todo: list[str] = []  # List of paths to be scanned.
result: list[Commands] = []  # List of all commanders reachable via @leo nodes from c.

def scan(fileName):
    """
    Add all nodes not already seen to the todo list.
    Add all to-be-scanned Commanders to the result list.
    """
    for p in c.all_unique_positions():
        if p.isAnyAtLeoNode():  # this method doesn't exist yet.
            fileName = p.isAtLeoFileName()  # this method doesn't exist yet.
            if fileName not in todo and fileName not in done:
                c2 = g.openWithFileName(fileName)
                todo.append(fileName)
                assert c2 not in result
                result.append(c2)
               
# Create the initial to-do list.
scan(c.fileName())
       
# Rescan until the to-do list is empty.
while todo:
    fileName = todo.pop()
    if fileName not in done:
        done.append(fileName)
        scan(fileName)
       
# We probably want to remove c from the result.
if c in result:
    result.remove(c)

# And we might want (eventually) to close all the commanders in the result list!

Conclusion

@leo nodes can create arbitrary graphs without any complications!!

A straightforward algorithm (prototyped above) creates the transitive closure of all commanders reached from any outline O via @leo nodes. This algorithm will be essentially the same even if commanders are in different processes.


Edward K. Ream

Jul 16, 2025, 2:43:27 PM
to leo-editor
On Wednesday, July 16, 2025 at 1:31:11 PM UTC-5 Edward K. Ream wrote:

From any Leo outline O, it's dead easy to compute the transitive closure of all outlines reached from O via @leo nodes, regardless of the @leo links in any outline. Arbitrary links are just fine!

Here is the algorithm (untested!):

The algorithm is just a bit tricky.  Oops, the binding of 'c' in the scan function is wrong.

I'll debug the script before saying more. Nevertheless, I'm sure that computing the transitive closure is straightforward.
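
For reference, here is a minimal sketch of the closure computation that is independent of Leo's API. It avoids the binding problem by asking for the links of the outline currently being scanned rather than always rescanning the starting outline. The links_of callback is hypothetical: in Leo it would open the outline at the given path and return the paths named by that outline's own @leo nodes. The demo graph is invented for illustration.

```python
from collections.abc import Callable, Iterable


def reachable_outlines(
    start: str,
    links_of: Callable[[str], Iterable[str]],
) -> list[str]:
    """
    Return every outline path reachable from start via @leo links,
    excluding start itself, in the order first discovered.

    links_of(path) is a hypothetical callback that returns the paths
    named by the @leo nodes in the outline at path.
    """
    seen: set[str] = {start}  # Paths already queued or scanned.
    todo: list[str] = [start]  # Paths whose links have not been scanned yet.
    result: list[str] = []
    while todo:
        path = todo.pop()
        for target in links_of(path):
            if target not in seen:
                seen.add(target)
                todo.append(target)
                result.append(target)
    return result


# Demo graph with a cycle (b -> top) and a shared link (a and b -> c).
links = {
    "top.leo": ["a.leo", "b.leo"],
    "a.leo": ["c.leo"],
    "b.leo": ["c.leo", "top.leo"],
    "c.leo": [],
}
outlines = reachable_outlines("top.leo", lambda path: links.get(path, []))
print(outlines)
```

Because every path enters the seen set before it is queued, cycles and overlapping sets of links terminate without special handling.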

Edward

Edward K. Ream

Jul 17, 2025, 9:04:21 AM
to leo-editor
On Wednesday, July 16, 2025 at 10:15:01 AM UTC-5 Edward K. Ream wrote:

Recent improvements to @clean allow Leo to update outlines containing thousands of @clean nodes. For the first time, it is feasible to use Leo to work on huge repos such as Rust's compiler.

Leo issue #4396 continues my thinking on this subject. Please read this issue if you are interested in this project.

PR #4397 contains basic support for @leo nodes. That support might be removed. See below.

Initial work is promising. The PR contains a prototype script for creating @leo links between a top-level outline and sub-outlines. This prototype is a good starting point regardless of what happens next.

All ideas and conclusions are preliminary. Explicit @leo nodes (and their implied links) might not be necessary. Perhaps scripts (commands) can calculate those links as needed.  We shall see.

Summary

#4396 and PR #4397 contain the bulk of my work and thinking for now.  Please feel free to comment on either the issue or PR.

I'll create a new post here only if something spectacular arises.

Edward

Edward K. Ream

Jul 18, 2025, 9:19:04 AM
to leo-editor
On Wednesday, July 16, 2025 at 10:15:01 AM UTC-5 Edward K. Ream wrote:
Recent improvements to @clean allow Leo to update outlines containing thousands of @clean nodes. For the first time, it is feasible to use Leo to work on huge repos such as Rust's compiler.

Alas, Leo's performance degrades substantially when using huge outlines. Python's GC (Garbage Collector) probably gets overly stressed by all the temporary data Leo generates.

This Engineering Notebook post explores a possible solution.

I now have enough experience with PR #4397 to know that the features it offers are worthwhile.

However, @leo and its infrastructure will not greatly change the Leonine world.

Splitting outlines is the only way to improve performance

Leo chokes when loading more than about half a dozen outlines into tabs. Indeed, the performance is much worse than when using a single larger file. So much for that idea!

The PR lets user scripts split a large repo into a top-level outline and several sub-outlines, one per sub-directory. A helper, c.makeLinkLeoFiles, does all the heavy lifting. The PR shows the script I have been using for testing.

Sub-outlines help performance because (and only because) they are much smaller than an all-inclusive outline. That's not nothing!

Furthermore, sub-outlines limit the scope of clone find commands to a sub-directory (and all the files in descendant directories). That's a feature or bug, depending on your point of view. But for study purposes, limiting the scope of searches should work just fine.

Conclusions

- @leo is worth doing because splitting a huge outline into sub-outlines (c.makeLinkLeoFiles) makes sense in some situations.

- The transitive closure algorithm (c.openAllLinkedFiles) is worth having!

- Imo, there is no reason to consider running Leo in separate processes. The benefits are likely to be negligible.

- Cross-file searches and cross-file clones aren't going to happen.

- I'll merge PR #4397 as soon as Félix approves it.

Edward