According to PR #2331, I started work on the new python importer 9 days ago. This Engineering Notebook post discusses what I have done and the difficulties that remain.
vnode_info dictionary
All importers now use a vnode_info dict instead of injecting the _import_lines ivar into vnodes. Keys are vnodes; values are inner dictionaries.
The inner dictionary contains at least one key/value pair:
"lines": <list of lines for the vnode>.
VNodes use `__slots__`, so the vnode_info dict slightly reduces the descriptor memory required in all vnodes. More importantly, the vnode_info dict allows the python importer to store other key/value pairs in the inner dictionaries.
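To make the data structure concrete, here is a minimal sketch of a dict keyed by vnodes. The VNode class below is a hypothetical stand-in, not Leo's real class, and the gnx and headline values are invented for illustration.

```python
class VNode:
    """Hypothetical stand-in for Leo's VNode; only what this sketch needs."""
    __slots__ = ('gnx', 'h')  # Fixed attribute set: no per-instance __dict__.

    def __init__(self, gnx, h):
        self.gnx = gnx
        self.h = h

# Keys are vnodes; values are inner dicts with at least a 'lines' entry.
vnode_info = {}

root = VNode('gnx.1', '@file example.py')
child = VNode('gnx.2', 'class Example')

vnode_info[root] = {'lines': ['@others\n']}
vnode_info[child] = {'lines': ['class Example:\n', '    pass\n']}

# An importer can add more keys later without touching the VNode class.
vnode_info[child]['kind'] = 'class'
```

Because the extra data lives outside the vnode, each importer can choose its own keys without changing the VNode class.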
Stackless python importer
Previously, all importers, including the python importer, used a stack that mirrored the structure of the imported nodes that the importers created. Keeping the stack in sync with created nodes is tricky. Aha! Maybe the stack isn't needed! The vnode_info dict may suffice. The python importer uses an inner dict with these keys:
{
    '@others': <True: lines contains @others>,
    'indent': <The node's indentation, see below>,
    'kind': <one of 'outer', 'org', 'class', 'def'>,
    'lines': <list of lines for the vnode>,
}
Instead of getting these values from the stack, the importer will get these values from the generated nodes. For example, in the main importer loop the p var points at the node being generated. So info_dict[p.parent().v] contains the data for p's parent and info_dict[p.back().v] contains the data for p's previous sibling, if any.
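The lookup pattern described above can be sketched as follows. The Node class and the new_node helper are hypothetical simplifications: in Leo, p is a position and the dict is keyed by p.v, whereas here the nodes themselves serve as keys.

```python
class Node:
    """Hypothetical minimal tree node; Leo's positions and vnodes are richer."""

    def __init__(self, h, parent=None):
        self.h = h
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def back(self):
        """Return the previous sibling, or None if there is none."""
        if self.parent is None:
            return None
        i = self.parent.children.index(self)
        return self.parent.children[i - 1] if i > 0 else None

info_dict = {}

def new_node(h, parent, kind, indent):
    """Create a node and register its inner dict (hypothetical helper)."""
    node = Node(h, parent)
    info_dict[node] = {'@others': False, 'indent': indent, 'kind': kind, 'lines': []}
    return node

root = new_node('example.py', None, 'outer', 0)
cls = new_node('class A', root, 'class', 0)
m1 = new_node('A.m1', cls, 'def', 4)
m2 = new_node('A.m2', cls, 'def', 4)

# No stack needed: the tree itself says where to look.
parent_info = info_dict[m2.parent]
back_info = info_dict[m2.back()]
```

The tree is the single source of truth here, so there is no separate stack that could drift out of sync with the created nodes.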
I think this new organization will work, but there are no guarantees. If necessary, I'll revert to the old stack-based architecture, with all of its complexities.
The python importer is inherently complex
Aha! The python importer is intrinsically at least as complex as the javascript importer, and perhaps more so! This complexity has been quite a shock!
How can this be? Doesn't python impose strict standards for indentation and structure?
Strangely indented lines
Alas, the answer is "yes and no." :-) Most of the time python classes, methods, and functions follow a simple format. But not always! For example, the following is a valid python program!
Try it!
if 1:
 print('indent 1')
if 2:
  print('indent 2')
if 3:
   print('indent 3')
if 4:
    print('indent 4')
if 5:
     print('indent 5')
Who would do such a thing, you ask? Well, mypy unit tests, for one. Those unit tests contain other strange (valid!) constructions.
Furthermore, one could replace the "print" statements above with "class" or "def" statements, and one could imagine similar strange "if" statements within the range of a class definition!
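One way to check the claim without running the program is to hand it to compile() and to look at the INDENT tokens that the tokenize module reports. The sketch below assumes that each print is indented by one more space than the previous one, as the printed messages suggest.

```python
import io
import tokenize

source = (
    "if 1:\n"
    " print('indent 1')\n"
    "if 2:\n"
    "  print('indent 2')\n"
    "if 3:\n"
    "   print('indent 3')\n"
    "if 4:\n"
    "    print('indent 4')\n"
    "if 5:\n"
    "     print('indent 5')\n"
)

# compile() raises SyntaxError (or IndentationError) if the source is invalid.
code = compile(source, '<strange-indentation>', 'exec')

# The string of each INDENT token is the whitespace that opened the block.
indent_widths = [
    len(tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type == tokenize.INDENT
]
print(indent_widths)
```

Each block is free to pick its own indentation width, which is exactly why an importer cannot assume a fixed step such as four spaces.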
Important: strangely-indented lines can only happen within the range of compound statements such as "if", "for", "while", and "with". But "class" and "def" statements are also compound statements in this sense! It's quite a mess.
Keeping track of indentation
In short, the python importer cannot assume anything about what indentation may be in effect in the range of a class definition!
As noted above, the python importer assigns a vnode kind for each generated vnode. The valid (string) values are outer, org, class, and def. Hmm. As I write this, perhaps the importer should use "method" and "function" kinds instead of the generic "def" kind.
The "org" kind should allow the python importer to handle strangely-indented lines. Indeed, python does not allow complete chaos! For example, the following is a syntax error:
class Class1:
    def method1(): # 4-space indentation
        pass # 8-space indentation.
      def method2(): # 6-space indentation.
        pass
Python gives this error:
def method2(): # 6-space indentation.
^
IndentationError: unindent does not match any outer indentation level
That is, the first statement in the range of the class determines the allowed indentation for all other statements of the class, including compound statements. Presumably, the 'indent' value for "class" nodes will be the allowed indentation, but perhaps the vnode_info dict should contain two indent-related keys. See below.
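The error is easy to reproduce with compile(), which reports indentation problems without running anything. This is only a sketch: the source string uses the 4, 8, and 6-space indentation named in the comments, and the `fixed` variant is added for contrast.

```python
source = (
    "class Class1:\n"
    "    def method1(): # 4-space indentation\n"
    "        pass # 8-space indentation.\n"
    "      def method2(): # 6-space indentation.\n"
    "        pass\n"
)

try:
    compile(source, '<bad-indentation>', 'exec')
    error = None
except IndentationError as e:
    # 6 spaces matches neither open level (4 or 8), so Python rejects the dedent.
    error = e

# Putting method2 back at the class's 4-space level makes the source valid.
fixed = source.replace("      def method2", "    def method2")
fixed_code = compile(fixed, '<good-indentation>', 'exec')
```

A dedent must return to an indentation level that is already open, so the first statement in a block fixes the level for its siblings.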
Underindented lines
A further complication involves so-called underindented lines, that is, lines that Leo cannot represent properly using the natural node structure. Leo uses an ugly escape convention to represent such lines. Most Leonistas have probably never seen the escape convention, but Leo does support it.
At present, the python importer's perfect-import check allows leading whitespace to be added to otherwise underindented comment lines (only). Imo, adding this extra whitespace is preferable to using the underindented convention, but I might change my mind.
Removing common leading whitespace
Importer.undent removes leading whitespace from generated nodes. i.undent calculates the greatest common leading whitespace in the entire node and removes this whitespace from all lines of the node, inserting the underindented escape sequence as necessary!
The python importer will likely override i.undent (python_i.undent) so as to never insert the underindented escape sequence. Perhaps textwrap.dedent can be used, but that assumes that all strangely-indented nodes are under the range of an `@others` directive that is indented by exactly the amount that textwrap.dedent will (eventually) remove!
So there are a lot of constraints involved in generating nodes!
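The interaction between textwrap.dedent and an underindented comment line can be seen in a small experiment. The pad_comment_lines helper is hypothetical, not part of Leo. It only illustrates the idea, mentioned earlier, of adding leading whitespace to underindented comment lines, and it assumes indentation made of spaces only.

```python
import textwrap

body = (
    "    def method1(self):\n"
    "        pass\n"
    "  # an underindented comment\n"
    "    def method2(self):\n"
    "        pass\n"
)

# dedent removes only the whitespace common to all lines: 2 spaces here,
# because the comment line limits what can be removed.
dedented = textwrap.dedent(body)

def pad_comment_lines(text, width):
    """Hypothetical helper: indent underindented comment lines to `width` columns."""
    result = []
    for line in text.splitlines(keepends=True):
        stripped = line.lstrip(' ')
        lws = len(line) - len(stripped)
        if stripped.startswith('#') and lws < width:
            line = ' ' * width + stripped
        result.append(line)
    return ''.join(result)

# After padding, all lines share 4 leading spaces, so dedent removes all 4.
padded = textwrap.dedent(pad_comment_lines(body, 4))
```

In the first result the def lines keep two spaces of indentation. In the second they start in column 0, at the cost of changing the comment's original whitespace.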
Aha! The post pass can use the vnode_info dict
As I write this, I see that the vnode_info dict has another advantage over the stack-based architecture. The vnode_info dict is available to (the possibly overridden) undent method. Perhaps the vnode_info dict might have two indentation-related keys. We shall see.
Summary
Surprisingly, the python importer is inherently the most complex importer of all.
Organizer nodes will allow the importer to handle even the most bizarre strangely-indented nodes. However, generating the necessary organizer nodes has stumped me for several days. The task is far from easy.
The base Importer class defines the architecture of all importers. There is no need to improve this architecture! In particular, the line-by-line nature of the gen_lines method ensures that all importers, including the python importer, will be close to as fast as possible. There is no need to worry about the speed of the python importer!
To sum up: the task is to ensure the perfect import of all valid python programs, regardless of indentation quirks.
Edward
P.S. As I write this I see that the underindented escape convention seems not to be documented. Searching for "underindentEscapeString" in leoPy.leo will show the relevant code.
EKR