This Engineering Notebook post reviews the workings of Leo's importers so that I will be clear about the details as I revise the Python importers. The architecture of the importers is surprisingly clever, as I'll now explain.
Overview
gen_lines, the main importer loop, splits the incoming lines into nodes, allocating nodes as necessary. gen_lines calls add_line to add a line to a node. The post-pass calls undent on each node to adjust leading whitespace.
Adding lines to nodes
Importers only ever add entire lines to nodes.
In other words, add_line never removes leading whitespace! This clever policy ensures that gen_lines only needs to detect where nodes begin and end, a major simplification!
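Here is a minimal sketch of the whole-line policy. The node_lines dict and the lws helper are hypothetical stand-ins for illustration only, not Leo's actual data structures:

```python
from collections import defaultdict

# Hypothetical storage: maps a node key to the list of lines added to it.
node_lines = defaultdict(list)

def add_line(node, line):
    """Append the entire line, leading whitespace included, to the node."""
    node_lines[node].append(line)

def lws(line):
    """Return the leading blanks and tabs of a line."""
    return line[:len(line) - len(line.lstrip(' \t'))]

# The leading whitespace survives untouched until the post-pass.
add_line('root', "class Spam:\n")
add_line('root', "    def eggs(self):\n")
print([lws(z) for z in node_lines['root']])
```

Because add_line does no whitespace processing, all indentation decisions can be deferred to the post-pass.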
Removing leading whitespace
The undent method adjusts a node's leading whitespace (lws) independently of gen_lines. The Python importer overrides the base Importer.undent method. The base method, i.undent, is complex, possibly buggy, and clearly unsuitable for Python.
Py_Importer.undent removes the lws of the first non-blank line of the node. I shall soon change py_i.undent so that it never generates Leo's escape convention:
- It will "promote" underindented comment lines.
- It will cause unit tests to fail for any underindented non-comment line.
Note: neither i.undent nor py_i.undent is the same as textwrap.dedent!
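To illustrate the difference from textwrap.dedent, here is a rough sketch of "remove the lws of the first non-blank line". The name undent_sketch is hypothetical, and leaving underindented lines unchanged is my simplifying assumption. It is not what py_i.undent does with such lines:

```python
import textwrap

def lws_width(line):
    """Return the number of leading blanks and tabs of a line."""
    return len(line) - len(line.lstrip(' \t'))

def undent_sketch(lines):
    """Strip the lws of the first non-blank line from each line that has it."""
    first = next((z for z in lines if z.strip()), None)
    if first is None:
        return list(lines)
    prefix = first[:lws_width(first)]
    result = []
    for line in lines:
        if line.startswith(prefix):
            result.append(line[len(prefix):])
        else:
            # Underindented or empty line: left as-is in this sketch.
            result.append(line)
    return result

lines = [
    "    def spam(self):\n",
    "        pass\n",
    "  # underindented comment\n",
]
# The first non-blank line sets the amount removed.
print(undent_sketch(lines))
# dedent removes only the margin common to all lines (two blanks here).
print(textwrap.dedent(''.join(lines)).splitlines(True))
```

In this example, the sketch fully undents the def line, while textwrap.dedent leaves it indented by two blanks because the comment line limits the common margin.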
Splitting lines into nodes
gen_lines splits lines into nodes, generating nodes as necessary. Unlike in other importers, indentation drives py_i.gen_lines. Here, a node's indentation means vnode_info[p.v]['indent']. Similarly, a node's kind means vnode_info[p.v]['kind'].
Case 1: Organizer nodes: kind == 'org'
Organizer nodes contain lines outside of classes and defs. Organizer nodes also handle unusual indentation, including unusually indented class and def lines.
Rule 1: Organizer nodes never contain @others. Naturally, their ancestor nodes could contain @others. So add_line and py_i.undent should just work for org nodes!
Rule 2: Org nodes never contain children.
gen_lines sets the indentation of an org node to the indentation of its first non-blank line.
- A class or def line whose lws is less than or equal to the org node's indentation will end the org node.
- A class or def whose lws is greater than the indentation of the org node must reside completely within the org node. This rule is likely the only way of handling unusual indentation!
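The two bullets above amount to a single comparison. Here is a sketch, assuming that lws is measured in characters and that a simple regex suffices to recognize class and def lines (it ignores async def and decorators). The names class_or_def and ends_org_node are hypothetical:

```python
import re

# Hypothetical pattern: a class or def line, capturing its lws.
class_or_def = re.compile(r'^([ \t]*)(?:class|def)\b')

def ends_org_node(line, org_indent):
    """Return True if line is a class/def line whose lws <= org_indent."""
    m = class_or_def.match(line)
    if not m:
        return False  # Not a class or def line: it stays in the org node.
    return len(m.group(1)) <= org_indent

print(ends_org_node("def f():\n", 0))      # lws equal to the org indentation
print(ends_org_node("    def f():\n", 0))  # more deeply indented: stays inside
```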
Case 2: class nodes: kind == 'class'
Most class definitions will occur outside of org nodes. All class nodes will contain an @others directive. The first non-blank line within the class determines:
- the lws of the @others directive and
- the indentation of the class node!
Rule 3: An org node must contain the entire range of an indented class or def that appears outside the range of any class or def node.
As a consequence of Python's syntax rules, a parent org node must already exist. For example, an indented def or class line would be invalid syntax unless it were already contained in a (top-level) complex statement such as 'if', 'while', etc.
Case 3: def nodes: kind in ('function', 'method')
Rule 4: def lines without lws will generate function nodes.
Rule 5: Indented def lines appearing with the same indentation as a parent 'class' node will generate method nodes.
Rule 6: Indented def lines appearing at a greater indentation than a parent class node will be included within the containing method node. Imo, this is the only reasonable way of handling inner function definitions.
Rule 7: def lines appearing at a lesser indentation than a parent class node will terminate the class node. In most cases, the def line will then become a method of another parent class node. Per rule 3, if there is no such class node, the def line must be allocated to an already existing parent org node.
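Rules 4 through 7 can be read as a small decision procedure. The following is only a sketch under my own assumptions: class_indents is a hypothetical stack holding the indentation of each enclosing class node (outermost first), and the returned strings are illustrative labels, not Leo's actual kinds:

```python
def classify_def(def_lws, class_indents):
    """
    Classify a def line by the width of its lws.
    Return (label, remaining_stack), where remaining_stack holds the
    class nodes still open after applying Rule 7.
    """
    stack = list(class_indents)
    # Rule 7: a def at a lesser indentation terminates the class node.
    while stack and def_lws < stack[-1]:
        stack.pop()
    if def_lws == 0:
        return 'function', stack  # Rule 4: no lws.
    if not stack:
        return 'org', stack  # Rule 3: must lie within an existing org node.
    if def_lws == stack[-1]:
        return 'method', stack  # Rule 5: same indentation as the class node.
    return 'inner', stack  # Rule 6: part of the containing method node.

print(classify_def(4, [4]))     # a method of the current class
print(classify_def(8, [4]))     # an inner def
print(classify_def(4, [4, 8]))  # ends the nested class, method of the outer
```

The sketch does not check that a containing method node actually exists for the 'inner' case.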
Summary
add_line and undent should work for all nodes, regardless of the node's kind. Happily, the vnode_info dict is available to all methods of the post-pass, if special cases are necessary.
gen_lines assigns an indentation to all generated nodes:
- For org nodes, the indentation is the lws of the first non-blank line.
- For class and def nodes, the indentation is the lws of the first non-blank line following the class or def line.
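As a sketch of these two cases, assume that a class or def node's first non-blank line is the class or def line itself and that the signature fits on one line. The function name node_indent is hypothetical:

```python
def node_indent(lines, kind):
    """
    Return the width of the lws that defines the node's indentation,
    or None if the node has no suitable line.
    """
    non_blank = [z for z in lines if z.strip()]
    if kind != 'org':
        # Skip the class or def line: the body's first line decides.
        non_blank = non_blank[1:]
    if not non_blank:
        return None
    first = non_blank[0]
    return len(first) - len(first.lstrip(' \t'))

print(node_indent(["\n", "  x = 1\n"], 'org'))
print(node_indent(["class A:\n", "\n", "    x = 1\n"], 'class'))
```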
Unindented class or def lines always generate top-level class or function nodes.
Indented class lines generate class nodes if their lws matches the indentation of a parent class node. Otherwise, the class must appear within an already existing org node.
Indented def lines generate method nodes if their lws matches the indentation of a parent class node. Otherwise, the def line will appear in an enclosing function or method node. As a last resort, the def line must appear within an already existing org node.
These complex rules are likely buggy. I'll revise them as needed.
Edward