Recent improvements to @clean allow Leo to update outlines containing thousands of @clean nodes. For the first time, it is feasible to use Leo to work on huge repos such as Rust's compiler.
Alas, Leo's performance degrades substantially when using huge outlines. Python's GC (Garbage Collector) probably gets overly stressed by all the temporary data Leo generates.
This Engineering Notebook post explores a possible solution. As always, please feel free to ignore it. However, this ENB presents an exciting new direction for Leo.
@leo nodes would create a hierarchy of Leo outlines
The idea is to let @leo nodes in a top-level outline coordinate operations in linked sub-outlines. For example, rust_compiler.leo (in the rust/compiler directory) would have the following @leo nodes:
@leo rustc/rustc.leo
@leo rustc_abi/rustc_abi.leo
@leo rustc_arena/rustc_arena.leo
And dozens of others. So the top-level outline will be tiny, and each sub-outline will be much smaller than a single outline containing everything. As discussed below, the performance might not improve enough. But let's discuss some exciting ideas first.
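As an aside, here is a minimal sketch of how the @leo headlines listed above might be turned into sub-outline paths. It uses no Leo APIs. The function name `sub_outline_paths` is hypothetical, and the sketch assumes that the path in an @leo headline is relative to the directory containing the top-level outline.

```python
from pathlib import Path

LEO_PREFIX = "@leo "  # The headline prefix proposed in this post.

def sub_outline_paths(headlines, top_level_dir):
    """Return the sub-outline paths named by @leo headlines.

    Headlines that do not start with '@leo ' are ignored.
    Assumption: paths are relative to the top-level outline's directory.
    """
    base = Path(top_level_dir)
    paths = []
    for headline in headlines:
        if headline.startswith(LEO_PREFIX):
            relative = headline[len(LEO_PREFIX):].strip()
            if relative:
                paths.append(base / relative)
    return paths

headlines = [
    "@leo rustc/rustc.leo",
    "@leo rustc_abi/rustc_abi.leo",
    "@leo rustc_arena/rustc_arena.leo",
    "Notes",  # Not an @leo node, so it is skipped.
]
paths = sub_outline_paths(headlines, "rust/compiler")
for path in paths:
    print(path.as_posix())
```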
Cross-file searches and (maybe??) cross-file clones
Straightforward extensions to Leo's find commands will allow Leonistas to search all subsidiary outlines from the top-level outline! Cross-tab (or inter-process) communication will transfer results from the sub-outlines to the top-level outline. All details are unclear for now.
The details of cross-file cff (clone-find-all-flattened) commands are more complex. Initially, the sub-outlines could communicate the cross-file unls back to the top-level outline. The result of a cross-file cff then becomes a set of unls. Recall that Leo already supports cross-file unls.
Later, we might consider true cross-file clones. Changing such a clone in the top-level outline would change the corresponding clone in the sub-outline. And vice versa!
But this is not the time to consider how to do this magic. For now, the conclusion is that cross-file clones might make sense, contrary to my decades-old opinion!
Helping the GC?
Now let's turn our attention back to performance issues.
First, let's suppose Leo handles @leo nodes by loading sub-outlines in separate tabs. Does this help the GC? The answer is "yes and no" :-)
Yes, each tab contains less data, so operations on the tree and body might become more efficient. But no, the GC has the same amount of data (and a bit more) to handle, because all tabs share one Python process. My guess is that putting smaller outlines into separate tabs will have a small (negligible?) effect on performance.
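Before guessing, it might be worth measuring how much time the GC actually takes. The sketch below uses Python's standard `gc.callbacks` hook to total the wall-clock time spent in collections. The `GCTimer` class and the `make_temporary_data` workload are hypothetical stand-ins, not part of Leo; a real experiment would wrap a slow Leo operation instead.

```python
import gc
import time

class GCTimer:
    """Accumulate the wall-clock time spent in garbage collections."""

    def __init__(self):
        self.collections = 0
        self.total_seconds = 0.0
        self._start = None

    def _callback(self, phase, info):
        # The gc module calls this with phase "start" or "stop".
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.total_seconds += time.perf_counter() - self._start
            self.collections += 1
            self._start = None

    def __enter__(self):
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc_info):
        gc.callbacks.remove(self._callback)
        return False

def make_temporary_data(n):
    """Stand-in workload: build many small reference cycles."""
    for _ in range(n):
        a = {}
        b = {"other": a}
        a["other"] = b  # a and b now form a cycle that only the GC can free.

with GCTimer() as timer:
    make_temporary_data(100_000)
    gc.collect()  # Force at least one collection so the timer records something.

print(f"{timer.collections} collections, {timer.total_seconds:.4f} s in the GC")
```

Comparing these totals with the elapsed time of the whole operation would show whether the GC is really the bottleneck.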
The first prototype
Happily, it will be easy to prototype this initial idea. I'll write a script that:
- Creates @leo nodes for all sub-directories of the rust/compiler directory.
- Creates the corresponding .leo file in each subdirectory.
- Loads (details unclear) each created (subsidiary) .leo file with the desired @clean nodes.
- Creates a list (suitable for the command-line) of files to be loaded.
So a command line like:
leo rust_compiler.leo <list of sub-outlines>
will load all the desired outlines, placing each sub-outline in its own tab. It will then be easy to see how much this scheme improves Leo's performance.
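The planning part of the script described above might look roughly like this. It is only a sketch: it uses no Leo APIs, creates no nodes, and writes no .leo files. The names `plan_sub_outlines` and `command_line` are hypothetical, and the demo runs on a throwaway directory tree standing in for rust/compiler.

```python
import shlex
import tempfile
from pathlib import Path

def plan_sub_outlines(compiler_dir):
    """Return (headline, leo_path) pairs, one per sub-directory.

    This only plans the work: it creates no nodes and writes no files.
    """
    compiler_dir = Path(compiler_dir)
    plans = []
    for sub_dir in sorted(p for p in compiler_dir.iterdir() if p.is_dir()):
        leo_path = sub_dir / f"{sub_dir.name}.leo"
        headline = f"@leo {sub_dir.name}/{sub_dir.name}.leo"
        plans.append((headline, leo_path))
    return plans

def command_line(top_level_outline, plans):
    """Return the argument list: leo <top-level outline> <sub-outlines>."""
    return ["leo", str(top_level_outline)] + [str(path) for _, path in plans]

# Demo on a temporary directory tree standing in for rust/compiler.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rustc", "rustc_abi", "rustc_arena"):
        (root / name).mkdir()
    plans = plan_sub_outlines(root)
    headlines = [headline for headline, _ in plans]
    args = command_line(root / "rust_compiler.leo", plans)
    print("\n".join(headlines))
    print(shlex.join(args))
```

Creating the @leo nodes and populating each sub-outline with @clean nodes would still need Leo's own scripting API; those steps are deliberately left out here.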
Separate processes instead of separate tabs
Separate tabs might not help enough. In that case, @leo nodes could load sub-outlines in separate processes instead of separate tabs. This approach will almost surely solve the performance problems. Operating systems are very good at running separate processes! Each process will run a separate copy of Python with its own GC.
The same general ideas still apply, but now the top-level outline and all the sub-outlines must communicate via Leo's servers. There will probably be one server per process. Leo's server architecture will almost surely need to be extended. Surely such a scheme is feasible, but I have no intuition about the details.
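To make the shape of this idea concrete, here is a toy sketch that has nothing to do with Leo's actual server. Each directory is searched by its own Python process (and so by its own GC), and the parent gathers the results. Plain stdout pipes and JSON stand in for whatever server protocol Leo would really use. The names `CHILD_CODE` and `search_in_processes` are hypothetical.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Code run by each child process: search every file under one directory.
CHILD_CODE = r"""
import json, sys
from pathlib import Path
root, pattern = Path(sys.argv[1]), sys.argv[2]
hits = []
for path in sorted(root.rglob("*")):
    if path.is_file():
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern in line:
                hits.append([path.name, number])
json.dump(hits, sys.stdout)
"""

def search_in_processes(directories, pattern):
    """Search each directory in its own Python process.

    Returns a dict mapping each directory name to a list of
    [file name, line number] hits. A failed child yields an empty list.
    """
    # Start all children first so that they run concurrently.
    children = [
        (Path(d).name, subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, str(d), pattern],
            stdout=subprocess.PIPE, text=True))
        for d in directories
    ]
    results = {}
    for name, child in children:
        output, _ = child.communicate()
        results[name] = json.loads(output) if child.returncode == 0 else []
    return results

# Demo on a temporary tree standing in for two sub-outline directories.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("rustc", "rustc_abi"):
        (root / name).mkdir()
    (root / "rustc" / "lib.rs").write_text(
        "fn main() {}\nfn find_me() {}\n", encoding="utf-8")
    (root / "rustc_abi" / "lib.rs").write_text(
        "fn other() {}\n", encoding="utf-8")
    results = search_in_processes(
        [root / "rustc", root / "rustc_abi"], "find_me")
    print(results)
```

In a real design the child processes would be long-lived and would hold loaded outlines, so the cost of starting Python would be paid only once per sub-outline.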
Happily, we can ignore inter-process complications for now. I'll do all my initial experiments using sub-outlines in separate Leo tabs. It should be straightforward to extend Leo's find command using inter-tab communication. Who knows, maybe cross-file clones do make sense!
Summary
@leo nodes will create a hierarchical relationship between a (single) top-level outline and several sub-outlines. For now, we can assume that @leo nodes appear only in the top-level outline. We'll reexamine this question later.
Extensions to Leo's find commands (including the clone-find commands) will allow sub-outlines to send results back to the top-level outline. Initially, results will be unls. Eventually, these unls might morph into cross-file clones. Changing a cross-file clone in the top-level outline would change the corresponding clone in the sub-outline and vice versa.
Communication between tabs is straightforward, but putting sub-outlines in separate Leo tabs is unlikely to improve Leo's performance enough.
Ultimately, Leo could run each sub-outline in a separate process. This scheme would require substantial updates to Leo's server. For now, I'll extend Leo's find commands using inter-tab communication. Maybe cross-file clones do make sense!
I welcome all your comments, questions, and suggestions. I am excited by this project, and I hope you are too.
Edward