Idea:
use Python's tokenize module to find where every function, method, and class definition starts,
and then use this data to find the lines where top-level children should start. After the top-level children are created, the process can be repeated for every node longer than a certain threshold number of lines, generating the second-level children.
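To make the idea concrete, here is a minimal sketch of the first step. It is not the attached script: the helper name `find_definitions` and the sample source are made up for illustration, and it assumes a recent Python 3 where `async` is tokenized as an ordinary NAME token. Because tokenize reports strings and comments as separate token types, a `def` that appears inside a string is not mistaken for a definition.

```python
import io
import tokenize

def find_definitions(source):
    """Return (line, column, keyword, name) for every def/class in source.

    For 'async def' the position of the 'async' keyword is reported,
    so that a column of 0 always means a top-level definition.
    """
    found = []
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for i, tok in enumerate(tokens):
        if tok.type == tokenize.NAME and tok.string in ('def', 'class'):
            prev = tokens[i - 1] if i > 0 else None
            first = prev if prev is not None and prev.string == 'async' else tok
            name = tokens[i + 1].string  # the identifier that follows the keyword
            found.append((first.start[0], first.start[1], tok.string, name))
    return found

# Hypothetical sample input, only for demonstration.
SAMPLE = '''\
import os

def first():
    text = "def not_a_definition(): pass"
    return text

class Widget:
    def method(self):
        pass

async def fetch():
    pass
'''

definitions = find_definitions(SAMPLE)
# Top-level children start at the definitions found in column 0.
top_level_lines = [line for line, col, _, _ in definitions if col == 0]
print(top_level_lines)  # [3, 7, 11]
```

The same function could be applied again to the text of any child that is longer than the chosen threshold, looking for the next indentation level instead of column 0.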
A little bit of background story (feel free to skip if you just want to see the example code):
A long time ago I tried to solve this problem more efficiently for importing JavaScript files. I remember looking at the Importer class and the way Leo did imports at the time and feeling that it was far more complicated than necessary. I can't say that I solved the problem in general, but for a very specific case it worked pretty well.
Recent posts about improving Leo in this area, especially regarding Python, made me think again about this problem.
I strongly feel that the main problem with the current implementation is its insistence on using scan_line. That may be suitable for unifying the handling of all the other source languages, but it is far from optimal for Python source files.
The way I see it, the problem is to find the lines where a new node should start. Whether a node should be indented or not I would rather leave for the next phase. First of all, the outline of a Python file that I start from scratch in Leo usually has a few lines in the top-level node, then at-others, and usually after at-others comes the block with `if __name__ == '__main__':`. If I have a lot of imports, I usually put them all in one section node `<<imports>>`.
The first line where I would introduce a child node is the first function definition or the first class definition. Everything before it should go directly into the root node. Imports can be extracted later.
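The split described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the attached script: the names `top_level_starts` and `split_top_level` are hypothetical, decorators and comments directly above a definition stay with the preceding chunk, and the trailing `if __name__ == '__main__':` block is left inside the last child instead of being moved back to the root.

```python
import io
import tokenize

def top_level_starts(source):
    """Return 1-based line numbers of def/class/async def statements in column 0."""
    starts = []
    prev = None
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NAME and tok.string in ('def', 'class'):
            first = prev if prev is not None and prev.string == 'async' else tok
            if first.start[1] == 0:
                starts.append(first.start[0])
        prev = tok
    return starts

def split_top_level(source):
    """Return (head, children): root text before the first definition,
    then one chunk of text per top-level definition."""
    # Use the same line splitting as tokenize so line numbers agree.
    lines = io.StringIO(source).readlines()
    starts = top_level_starts(source)
    if not starts:
        return source, []
    head = ''.join(lines[:starts[0] - 1])
    bounds = starts + [len(lines) + 1]
    children = [''.join(lines[a - 1:b - 1]) for a, b in zip(bounds, bounds[1:])]
    return head, children

# Hypothetical sample input, only for demonstration.
SAMPLE = '''\
"""Module docstring."""
import sys

def main():
    print(sys.argv)

class Widget:
    def method(self):
        pass

if __name__ == '__main__':
    main()
'''

head, children = split_top_level(SAMPLE)
# Nothing is lost: the pieces put back together give the original text.
assert head + ''.join(children) == SAMPLE
```

Because the chunks are plain slices of the original lines, joining them always reproduces the input; the interesting part in Leo is what happens after the chunks become nodes and are written back through at-others.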
Attached to this message is a Python script which can be executed inside Leo.
It imports any Python module from the standard library and checks whether the import is perfect. At the moment it just extracts the top-level definitions into separate nodes, the direct children of the root.
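For readers who want to try this kind of check outside Leo, here is a rough stand-alone sketch. It is not the attached script and does not use Leo's API: `module_source`, `first_difference`, `check_import`, and `naive_splitter` are hypothetical names, and "perfect" here only means that the pieces joined together reproduce the original text. It assumes the module is pure Python and that its source file is installed.

```python
import importlib
import inspect

def module_source(module_name):
    """Return the full source text of a pure-Python module (imports it first)."""
    return inspect.getsource(importlib.import_module(module_name))

def first_difference(original, rebuilt):
    """Return the 1-based number of the first differing line, or None if equal."""
    if original == rebuilt:
        return None
    a = original.splitlines(keepends=True)
    b = rebuilt.splitlines(keepends=True)
    for number, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return number
    # One text is a prefix of the other.
    return min(len(a), len(b)) + 1

def check_import(module_name, splitter):
    """Split a module with splitter(source) -> (head, children) and
    report where the rebuilt text first differs (None means perfect)."""
    source = module_source(module_name)
    head, children = splitter(source)
    return first_difference(source, head + ''.join(children))

def naive_splitter(source):
    """Stand-in splitter: cut before each line starting with 'def ' or 'class '."""
    head, children = [], []
    for line in source.splitlines(keepends=True):
        if line.startswith(('def ', 'class ')):
            children.append(line)
        elif children:
            children[-1] += line
        else:
            head.append(line)
    return ''.join(head), children

print(check_import('textwrap', naive_splitter))
```

Any function with the same `(head, children)` return shape can be passed in place of `naive_splitter`, so different splitting strategies can be compared on the same set of modules.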