This Engineering Notebook post discusses the difficulties that any Python importer must face. To state my conclusions first:
1. Generating the correct whitespace before @others in all cases requires:
A: Some form of look-ahead, or equivalently, delayed code generation.
B: What amounts to a full parse of def and class lines.
2. I am willing to let the importer assume 4-space indentation for @others in class nodes. In effect, this is what the legacy Py_Importer class does!
Background
Vitalije's new importer has trouble importing mypy/test-data/stdlib-samples/3.2/test/test_textwrap.py. The text of the file is imported perfectly, but many nodes are over-indented because the `@others` directives in the class nodes are missing their indentation.
The relevant code in the mknode function is:
```python
o = indent('@others\n', ind - l_ind)
...
p.b = f'{b1}{o}{b2}'
```
Alas, the value `ind - l_ind` won't work in all cases! Instead, I suggest using 4 for all classes :-) That's exactly what the legacy importer does!
Yes, this would break the unit tests for strangely indented code, but I'm willing to live with that.
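To see why the indentation of @others matters, here is a simplified, hypothetical model of how Leo writes an @others directive: each child's text is indented by the leading whitespace of the @others line. `expand_others` is an illustrative name, not part of Leo, and the model ignores sentinels and everything else Leo's write code does.

```python
import textwrap
from typing import List


def expand_others(body: str, children: List[str]) -> str:
    """Hypothetical, simplified model of writing @others: replace the @others
    line with the children's text, indented by that line's leading whitespace."""
    out = []
    for line in body.splitlines(keepends=True):
        if line.strip() == '@others':
            ws = line[:len(line) - len(line.lstrip())]
            out.extend(textwrap.indent(child, ws) for child in children)
        else:
            out.append(line)
    return ''.join(out)


child = 'def f(self):\n    pass\n'
print(expand_others('class A:\n  @others\n', [child]))    # Matches 2-space source.
print(expand_others('class A:\n    @others\n', [child]))  # The 4-space assumption.
```

If the original class used 2-space indentation, the second call writes the methods back at 4 spaces, so the file no longer round-trips. That is how the strangely indented tests break under the 4-space assumption.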
The heroic alternative
Generating the correct indentation for @others in all cases is much more difficult. The indentation of the @others line must match the indentation of the first significant line following the class or def line. The first significant line is the first line that is not:
- A blank line or a comment.
- Inside a string.
- Inside open brackets or a backslash continuation.
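Python's standard `tokenize` module already tracks strings, brackets and backslash continuations, and it emits an `INDENT` token whose text is the leading whitespace of the first significant line of a new block. The sketch below uses `tokenize` in place of Leo's own scanner. It assumes the importer has the complete (tokenizable) source text and the 1-based line number of the class or def line; `body_indent` is a hypothetical helper, not Leo code.

```python
import io
import tokenize
from typing import Optional


def body_indent(source: str, header_line: int) -> Optional[str]:
    """Leading whitespace of the first significant line after the class or
    def statement that starts on header_line (1-based), or None if the
    statement has no indented body."""
    seen_newline = False
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.start[0] < header_line:
            continue
        if not seen_newline:
            # NEWLINE (unlike NL) ends a logical line, so a header that spans
            # several physical lines inside brackets is handled correctly.
            seen_newline = tok.type == tokenize.NEWLINE
            continue
        if tok.type == tokenize.INDENT:
            # Blank and comment-only lines yield only NL/COMMENT tokens, so
            # the first INDENT comes from the first significant line.
            return tok.string
        if tok.type not in (tokenize.NL, tokenize.COMMENT):
            return None  # E.g. a one-liner such as "class B: pass".
    return None


src = 'class A:\n\n  # comment\n  def f(self):\n      pass\n'
print(repr(body_indent(src, 1)), repr(body_indent(src, 4)))
```

Note that `tokenize` raises an error if the file ends inside an open bracket or string, so an importer relying on it must handle incomplete or invalid input separately.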
The legacy Py_Importer class detects such lines fairly easily. The first significant line is the first non-blank, non-comment line for which Python_ScanState.in_context returns False:
```python
def in_context(self):
    """True if in a special context."""
    return (
        self.context or
        self.curlies > 0 or  # Open curly brackets.
        self.parens > 0 or  # Open parentheses.
        self.squares > 0 or  # Open square brackets.
        self.bs_nl  # In backslash/newline.
    )
```
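For comparison, here is a much-simplified, hypothetical stand-in for Python_ScanState, showing how in_context can drive the search for the first significant line. `ScanState`, `scan_line`, `leading_ws` and `first_significant_line` are illustrative names. This is not the legacy importer's code, and the scanner ignores many details (unterminated strings, for example).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScanState:
    """Hypothetical, much-simplified stand-in for Python_ScanState."""
    context: str = ''    # Open string delimiter, or '' outside strings.
    curlies: int = 0
    parens: int = 0
    squares: int = 0
    bs_nl: bool = False  # The last line scanned ended with a backslash.

    def in_context(self) -> bool:
        """True if in a special context."""
        return bool(self.context or self.curlies > 0 or self.parens > 0
                    or self.squares > 0 or self.bs_nl)

    def scan_line(self, line: str) -> None:
        """Update the state with one physical line."""
        self.bs_nl = False
        i = 0
        while i < len(line):
            ch = line[i]
            if self.context:
                if ch == '\\':
                    i += 2  # Skip the escaped character.
                elif line.startswith(self.context, i):
                    i += len(self.context)
                    self.context = ''
                else:
                    i += 1
                continue
            if ch == '#':
                break  # The rest of the line is a comment.
            if line.startswith(('"""', "'''"), i):
                self.context = line[i:i + 3]
                i += 3
                continue
            if ch in '"\'':
                self.context = ch
            elif ch == '\\' and not line[i + 1:].strip():
                self.bs_nl = True
            elif ch in '()':
                self.parens += 1 if ch == '(' else -1
            elif ch in '[]':
                self.squares += 1 if ch == '[' else -1
            elif ch in '{}':
                self.curlies += 1 if ch == '{' else -1
            i += 1


def leading_ws(line: str) -> str:
    return line[:len(line) - len(line.lstrip())]


def first_significant_line(lines: List[str], start: int) -> Optional[int]:
    """Index of the first significant line of the class or def at lines[start]."""
    state = ScanState()
    i = start
    # Scan the (possibly multi-line) class or def line until no context is open.
    while i < len(lines):
        state.scan_line(lines[i])
        i += 1
        if not state.in_context():
            break
    header_ws = leading_ws(lines[start])
    for j in range(i, len(lines)):
        stripped = lines[j].strip()
        if not stripped or stripped.startswith('#'):
            continue  # Blank and comment lines are never significant.
        # A line no deeper than the header means there is no indented body.
        return j if len(leading_ws(lines[j])) > len(header_ws) else None
    return None
```

The header loop is the part that amounts to parsing the class or def line: it keeps scanning until all brackets are closed and no string or backslash continuation is open.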
Ironically, having gone through all this trouble, my legacy importer still assumes 4-space indentation! In theory, the importer could get the indentation right. In practice, it's dashed difficult to do so!
The split_root function (or its helpers) would also have to find the first significant line of each class! In effect, the new importer would have to do a full parse of the entire class or def line, which may span several physical lines.
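When the whole file is syntactically valid, Python's standard `ast` module performs exactly that full parse. In the sketch below, `at_others_indents` (a hypothetical name, not Leo code) maps the line number of each class or def to the leading whitespace of its first body statement, which is the whitespace its @others line would need. It relies only on the `lineno` and `col_offset` attributes that `ast` records for every statement.

```python
import ast
from typing import Dict


def at_others_indents(source: str) -> Dict[int, str]:
    """Map the line number of each class/def to the leading whitespace of its
    first body statement, skipping one-liners such as "class A: pass"."""
    lines = source.split('\n')
    result: Dict[int, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            first = node.body[0]
            prefix = lines[first.lineno - 1][:first.col_offset]
            # The first statement starts its own line only if nothing but
            # whitespace precedes it on that line.
            if not prefix.strip():
                result[node.lineno] = prefix
    return result


src = ('class A:\n\n  # c\n  def f(self): return 1\n'
       '  def g(self,\n        x):\n     return x\n')
print(at_others_indents(src))
```

An importer that must also handle files that do not parse cannot rely on this approach.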
Summary
The Python importer contains analogs of all the phases of an optimizing compiler. The incoming code must be tokenized and maybe even parsed. Code generation will never be easy.
In class or def nodes, the leading whitespace of the @others directive should be the leading whitespace of the first significant line of the class or def. Finding the first significant line of a class or def requires a full parse.
Importers can avoid the parse phase only if they assume 4-space indentation! I am willing to make this concession, and I am willing to abandon (parts of) the unit tests for strangely indented code.
Edward