ENB: The python importer

Edward K. Ream

Nov 14, 2021, 8:17:02 AM
to leo-editor
#2327 suggests improving the python importer. This issue is scheduled for Leo 6.7, but it may make it into 6.6.

This Engineering Notebook post consists of notes to myself.  Feel free to ignore.

Defects

I've put up with the following defects of the python importer for way too long:

1. Perfect import checks can fail due to underindented lines, especially comment lines.
2. Perfect import checks can fail due to misplaced (or missing?) decorator lines.
3. The python post-pass code is a bit of a mess.
4. The python importer's main line, py_i.gen_lines, may be more complex than necessary.

The causes of complexity

The python importer is far more complex than the python-to-typescript command:

- The importer must, without fail, handle strings and docstrings properly. This requirement entails the entire token-state logic.
- Decorators complicate how the code recognizes the start and end of classes and functions.
- Allocating "in-between" lines (lines between functions or methods) is always going to be tricky.  Presumably, the post pass will play a part.

The goals

The python importer must import all python files exactly, except for underindented comment lines.

The python importer should be allowed to adjust underindented comment lines by adding needed indentation.

The python perfect import checks should warn about adjusted comment lines, but the python importer should not insert @ignore directives merely due to adjusted comment lines.
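
A rough sketch of the adjustment these goals describe, under stated assumptions: the function name `fix_underindented_comments` is hypothetical, indentation uses spaces only, and none of the lines lie inside a multi-line string. It is not the importer's real code.

```python
def fix_underindented_comments(lines, min_indent):
    """Return (new_lines, warnings).

    Comment lines indented less than min_indent spaces are re-indented.
    All other lines are returned unchanged.
    """
    result, warnings = [], []
    for n, line in enumerate(lines, start=1):
        stripped = line.lstrip(' ')
        lws = len(line) - len(stripped)  # Width of the leading whitespace.
        if stripped.startswith('#') and lws < min_indent:
            result.append(' ' * min_indent + stripped)
            warnings.append(
                f"line {n}: comment re-indented from {lws} to {min_indent}")
        else:
            result.append(line)
    return result, warnings
```

A perfect import check could then report the warnings while still treating the import as successful, so no @ignore directive is needed.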

Open questions

1. Which code should handle decorators: the main line or the post pass?

2. Which code should handle in-between lines?

Summary

The python importer is shockingly inadequate:

- The importer inserts @ignore directives for way too many files.
- The post pass does not account for the indentation level implied by @others directives. Indented @others directives increase this level.
- The code that handles decorators is complex and buggy.
- The post-pass needs a redesign.
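
To make the @others point concrete, here is a simplified model (not Leo's write code) of how an indented @others line adds its leading whitespace to every line of the child nodes. The function `expand` and the `(body, children)` tuples are assumptions made for this sketch.

```python
def expand(body, children):
    """Expand a node body into output lines.

    body is a list of lines. children is a list of (body, children) tuples.
    An @others line is replaced by the expansion of all children, each
    non-blank line indented by the whitespace that precedes @others.
    """
    out = []
    for line in body:
        if line.strip() == '@others':
            indent = line[:len(line) - len(line.lstrip())]
            for child_body, grandchildren in children:
                for s in expand(child_body, grandchildren):
                    out.append(indent + s if s.strip() else s)
        else:
            out.append(line)
    return out
```

Any code that compares indentation inside a child node must therefore add the indentation of every enclosing @others line.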

The overall design and organization of Leo's importers will remain completely unchanged. Only the Py_Importer class will change.

I'll fix #2327 for Leo 6.6 if possible, but I shall not hurry this project.

Edward

jkn

Nov 15, 2021, 9:17:50 AM
to leo-editor
I'll watch this with interest, if only because one project for my 'copious free time' is adding an importer for a QnD note-taking format I use myself.

I started on this a while ago but got a bit bogged down in the details of the current mechanism, and have never got back to it (typical!)

J^n

Edward K. Ream

Nov 15, 2021, 11:20:53 AM
to leo-editor
On Mon, Nov 15, 2021 at 8:17 AM jkn <jkn...@nicorp.f9.co.uk> wrote:
> I'll watch this with interest, if only because one project for my 'copious free time' is adding an importer for a QnD note-taking format I use myself.
>
> I started on this a while ago but got a bit bogged down in the details of the current mechanism, and have never got back to it (typical!)

Thanks for this note. There was a serious bug in the base Importer class, in the undent method.  Other than that, all changes should be to the Py_Importer subclass.

All importers use the same basic strategy. Each importer handles the input line-by-line. The tokenizer subclasses know all about tokens such as strings and multi-line comments. Tokenizing each line is essential to avoid mistaking strings for syntactic entities.

The main line of each importer is the gen_lines method. Many importers use the base Importer.gen_lines method. The python importer does not, because indentation matters so much.

I am starting to get an inkling that the python version of the gen_lines method might be simplified. I am also wondering how much processing to do in gen_lines and how much in the post pass.  We shall see.

Feel free to ask questions about your importer. Leo's importers are part of its crown jewels.

Edward

jkn

Nov 15, 2021, 1:50:48 PM
to leo-editor
Hi Edward
    IIRC my main stumbling block was understanding the gap between 'concrete' uses of the importer subclasses (sorry, haven't got the exact name to hand) and the possibilities given by the more abstract superclass. I was able to make a start by basing my work on a similar subclass, but when that was not quite right it was a bit tricky to work out where to look, whether there was an as-yet unused facility provided by the superclass, etc.

I'm not complaining, I didn't spend long on it and in part I was reminded that I need to be clear in my own mind about what I wanted the output to be!...

Regards, J^n

tbp1...@gmail.com

Nov 15, 2021, 2:14:34 PM
to leo-editor
What is QnD note-taking format?  I couldn't find it with the obvious online searches.

Edward K. Ream

Nov 15, 2021, 4:22:57 PM
to leo-editor
On Mon, Nov 15, 2021 at 12:50 PM jkn <jkn...@nicorp.f9.co.uk> wrote:

> IIRC my main stumbling block was understanding the gap between 'concrete' uses of the importer subclasses

For sure, the importers will never be very easy to understand.  All I can say is that the new line-oriented importers are much simpler than the old ones.

Again, feel free to ask specific questions. I'm deep in the woods with the python importer, so the details are starting to come back ;-)

Edward

jkn

Nov 15, 2021, 5:27:04 PM
to leo-editor
Heh Heh - QnD means 'Quick and Dirty' ;-), i.e. it's a format of my own making. It's just an informal format I have used for my 'daily development journal' entries for a long time. It's based on the '{{{' and '}}}' markings that the TDS(*) folding editor Origami used to use.

(*) TDS == Transputer Development System
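
As a purely hypothetical sketch of what importing such a format could involve: the thread does not specify the format, so this assumes that a line starting with '{{{' opens a fold, that the rest of that line is its title, and that a line starting with '}}}' closes the innermost fold. The name `parse_folds` is invented for the example.

```python
def parse_folds(lines):
    """Parse fold markers into a tree of dicts.

    Each node has the keys 'title', 'body' (a list of lines) and 'children'.
    """
    root = {'title': 'root', 'body': [], 'children': []}
    stack = [root]
    for line in lines:
        s = line.strip()
        if s.startswith('{{{'):
            # Open a new fold as a child of the current fold.
            node = {'title': s[3:].strip(), 'body': [], 'children': []}
            stack[-1]['children'].append(node)
            stack.append(node)
        elif s.startswith('}}}'):
            # Close the innermost fold. Ignore unmatched closing markers.
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1]['body'].append(line)
    return root
```

Each dict maps naturally onto an outline node: the title becomes the headline and the body lines become the body text.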

Edward K. Ream

Nov 17, 2021, 6:36:30 AM
to leo-editor
On Sunday, November 14, 2021 at 7:17:02 AM UTC-6 Edward K. Ream wrote:

> #2327 suggests improving the python importer.

I have spent the last several days revising the python importer. See PR #2331.

Here I'll summarize what I have done and the experiences that have shaped the new code.

Seeing with new eyes

We often say that it's valuable to look at a project "with fresh eyes". But what does that mean?  At least two factors are involved:

1. We have forgotten details about the original code. This gives us a clearer sense of the big picture, and may suggest substantial revisions.

2. We have had new experiences, seemingly unrelated to the project at hand, that suggest improvements.  For this project, the "new experiences" also include an appreciation of how inadequate the python importer is :-)

Applying new techniques

The following "recent" principles have helped improve the python importer, especially python_i.gen_lines, the main line of the importer.

- Eliminate faux helpers, even if it means duplicating code.  This is an amazingly potent guide. In particular, the evil cut_stack helper obscures and complicates the logic.

- Make "if" statements explicit in the main loop, even if it complicates the visual appearance slightly.  The value of this became apparent in revising FastAtRead.scan_lines.

- Use test-driven development.  This is possible now for the first time!

Aha: Use two or more passes in gen_lines

The original version tried to do everything in one go.  Now, the first pass is concerned primarily with splitting lines into nodes so as to pass perfect import checks. An as-yet-unwritten second pass will split/merge nodes and reassign lines.

Yes, the base Importer class supports a so-called post-pass, but doing the second pass directly in gen_lines is clearer and more flexible.
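
The two-pass idea can be sketched as follows. This is an illustration only, not the code in PR #2331: `pass1`, `pass2`, `flatten` and the `min_lines` threshold are hypothetical, and the sketch ignores strings, decorators and nested definitions.

```python
def pass1(lines):
    """Split lines into [headline, body] blocks at top-level def/class lines."""
    blocks = [['prefix', []]]
    for line in lines:
        if line.startswith(('def ', 'class ')):
            headline = line.split('(')[0].rstrip(':').strip()
            blocks.append([headline, []])
        blocks[-1][1].append(line)
    return blocks


def pass2(blocks, min_lines=2):
    """Merge any block shorter than min_lines into the preceding block."""
    result = []
    for headline, body in blocks:
        if result and len(body) < min_lines:
            result[-1][1].extend(body)
        else:
            result.append([headline, list(body)])
    return result


def flatten(blocks):
    """Return all lines of all blocks, in order."""
    return [line for _, body in blocks for line in body]
```

Because neither pass adds, drops or reorders lines, `flatten` returns the original lines after each pass, which is the invariant a perfect import check verifies.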

Other improvements

The existing unit tests cover unusual (valid!) python code. The first pass no longer creates separate nodes for nested defs or for functions (non-methods).  Instead, the lines within such defs are "allocated" to the (existing) node presently being accumulated. This removes several sticky special cases.

Similarly, the first pass no longer puts "prefix lines" (lines appearing before the first class line) in a separate node.  The plan is to create a section for prefix lines if their length exceeds a threshold of 10 lines or so.

I've improved the three relevant unit testing classes and the related tooling:

- Improved LeoUnitTest.dump_tree.
- Improved BaseTestImporter.run_test.
- Added the TestPython.check_tree switch.
- Added @command test-one, for running just one unit test.
  This command makes it convenient to single-step through failing code.
- Coming soon: easier tests that the generated outline has the expected structure and contents.
- Maybe: separate perfect-import tests following pass 1 and pass 2.

Summary

The rewritten python importer will likely be part of Leo 6.6. This should be safe: the existing unit tests cover all the likely failures.

I might remove the evil cut_stack helper from all importers, but this might not happen for 6.6.

Edward

Edward K. Ream

Nov 18, 2021, 4:08:15 AM
to leo-editor
On Wednesday, November 17, 2021 at 5:36:30 AM UTC-6 Edward K. Ream wrote:

> - Added @command test-one, for running just one unit test.

The body is just:

g.run_unit_tests('leo.unittests.core.test_leoImport.TestPython.<<name of test>>')

That is,

g.run_unit_tests('leo.unittests.<<directory>>.<<test module>>.<<test class>>.<<name of test>>')
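
For readers outside Leo, a rough analogue using only the standard unittest module is sketched below. It does not show how g.run_unit_tests is implemented. `TestDemo` and `run_one` are hypothetical names for this example.

```python
import io
import unittest


class TestDemo(unittest.TestCase):
    """A stand-in test class with two tests."""

    def test_one(self):
        self.assertEqual(1 + 1, 2)

    def test_two(self):
        self.assertTrue(True)


def run_one(test_class, test_name):
    """Run a single test method and return the unittest result object."""
    suite = unittest.TestSuite([test_class(test_name)])
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)


result = run_one(TestDemo, 'test_one')
```

Running exactly one test keeps a debugger session short when single-stepping through failing code.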

Edward