Yesterday's ENB provided background for the new Python importer. In this ENB post I'll discuss rules (criteria) for allocating lines to nodes.
Initial rules
Imo, there is no right way to allocate incoming lines to nodes. There are judgment calls involved. The spectrum of choices ranges from putting all lines into a single node, to putting each line in its own node :-)
Let's start with the following relatively straightforward rules:
Rule 1. Put every class line in its own node, unindented, like this:
class TheClass:
<lws>@others
Note: The leading whitespace (lws) of the @others line will be the lws of the first inner class or def contained in TheClass.
Rule 2. Similarly, put every def line in its own node, unindented, like this:
def TheMethodOrFunction(...):
<The rest of the method or function>
Unlike class nodes, def nodes do not contain an @others directive. This rule implies that inner (nested) defs and classes will be allocated to the def node. This is a debatable choice, but in practice it should cause few problems. The user can always create more nodes later.
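Here is a minimal sketch of what Rules 1 and 2 might look like in code. The helper names (`lws_of`, `class_node_body`, `def_node_body`) are hypothetical and are not part of Leo's importer. I am assuming that the lws of the @others line is measured relative to the class line, and that a simple regex is good enough to find class and def lines:

```python
import re
import textwrap

# Hypothetical helpers for illustration; these are not Leo's importer API.
# Simplification: the regex does not skip strings or comments that happen
# to start with "def" or "class".
DEF_OR_CLASS = re.compile(r'^([ \t]*)(?:async[ \t]+def|def|class)\b')

def lws_of(line):
    """Return the leading whitespace (lws) of a line."""
    return line[:len(line) - len(line.lstrip(' \t'))]

def class_node_body(block_lines):
    """Rule 1: the unindented class line, then @others carrying the lws of
    the first inner class or def, relative to the class line."""
    class_lws = lws_of(block_lines[0])
    for line in block_lines[1:]:
        m = DEF_OR_CLASS.match(line)
        if m:
            others_lws = m.group(1)[len(class_lws):]
            return block_lines[0][len(class_lws):] + others_lws + '@others\n'
    raise ValueError('no inner class or def: this sketch does not apply')

def def_node_body(block_lines):
    """Rule 2: the whole def block, nested defs and classes included,
    dedented so that the def line has no lws."""
    return textwrap.dedent(''.join(block_lines))

block = [
    "  class C1:\n",
    "     def method1():\n",
    "         pass\n",
]
print(class_node_body(block))
print(def_node_body(block[1:]))
```

Note that `textwrap.dedent` is only a stand-in here: it would mishandle a def whose body contains an under-indented multi-line string.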
Example
These rules will "just work" for typical Python programs. They also work for strangely-indented code. Consider the following (valid!) Python fragment:
if 2:
  class C1:
  # 2 space lws.
     def method1(): # 2+3 space lws.
         pass # 2+3+4 space lws.
if 4: # 4-space indentation everywhere
    def d1():
        pass
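To check that a fragment like this really is valid, one can hand it to Python's own parser. The sketch below assumes the leading whitespace described in the comments (2, 2+3, and 2+3+4 spaces in the first block, 4 and 8 in the second) and records the column at which each class or def starts:

```python
import ast

# The fragment, with the lws that its own comments describe.
fragment = (
    "if 2:\n"
    "  class C1:\n"
    "  # 2 space lws.\n"
    "     def method1(): # 2+3 space lws.\n"
    "         pass # 2+3+4 space lws.\n"
    "if 4: # 4-space indentation everywhere\n"
    "    def d1():\n"
    "        pass\n"
)

# ast.parse raises SyntaxError (or IndentationError) for invalid code.
tree = ast.parse(fragment)

# Map each class or def name to the column of its first character.
offsets = {
    node.name: node.col_offset
    for node in ast.walk(tree)
    if isinstance(node, (ast.ClassDef, ast.FunctionDef))
}
print(offsets)
```

Python only requires that indentation be consistent within each block, which is why mixing 2, 3, and 4 spaces is legal here.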
Rules 1 and 2 require the following node structure. I'll use the MORE format for representing nodes, in which lines starting with '-' denote headlines. Child nodes are indented under their parent, and the lws of body lines is shown relative to the node's headline:
- if 2:
if 2:
  @others (2-space lws comes from `class C1`.)
    - class C1
    class C1: (No lws for this line!)
       @others (3-space lws comes from `def method1`.)
        - def method1():
        def method1(): (No lws for this line!)
            pass (4-space lws in this node)
- if 4: # 4-space indentation everywhere
if 4: # 4-space indentation everywhere
    @others (4-space lws comes from the `def d1():` line.)
    - d1
    def d1(): (No lws for this line!)
        pass (4-space lws in this node)
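The reason the lws of each @others line matters is that it is what lets the outline reproduce the original file. Here is a small sketch of that round trip. The `Node` class and `expand` function are hypothetical stand-ins, not Leo's actual classes, and the comments in the fragment are omitted for brevity:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical stand-in for an outline node."""
    headline: str
    body: str
    children: list = field(default_factory=list)

def expand(node):
    """Return the node's text, replacing an @others line with the expanded
    text of the children, indented by the lws of the @others line."""
    result = []
    for line in node.body.splitlines(keepends=True):
        if line.strip() == '@others':
            lws = line[:len(line) - len(line.lstrip())]
            for child in node.children:
                for child_line in expand(child).splitlines(keepends=True):
                    # Indent everything except blank lines.
                    result.append(lws + child_line if child_line.strip() else child_line)
        else:
            result.append(line)
    return ''.join(result)

source = (
    "if 2:\n"
    "  class C1:\n"
    "     def method1():\n"
    "         pass\n"
    "if 4:\n"
    "    def d1():\n"
    "        pass\n"
)

# The node structure that Rules 1 and 2 require for this fragment.
method1 = Node("def method1():", "def method1():\n    pass\n")
c1 = Node("class C1", "class C1:\n   @others\n", [method1])
if2 = Node("if 2:", "if 2:\n  @others\n", [c1])
d1 = Node("d1", "def d1():\n    pass\n")
if4 = Node("if 4:", "if 4:\n    @others\n", [d1])

assert expand(if2) + expand(if4) == source
```

Every class and def line is stored unindented, yet the 2-, 3-, and 4-space @others lines together restore the 2, 2+3, and 2+3+4 space indentation of the original.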
Discussion
The rules constrain both the code generator and the post-pass when handling strangely-indented code, as follows:
1. The "if 2" and "if 4" organizer nodes must exist if class and def nodes are to start with unindented class or def lines.
2. The post pass must not "dedent away" the lws before @others lines in organizer nodes. Happily, the post-pass knows that 'org' nodes are organizer nodes.
I have glossed over some complications, especially:
- Prefix lines: lines that appear before class or def lines at the same indentation level as the class or def lines.
- Tail lines: lines that appear after class or def lines at the same indentation level as the class or def lines.
The code generator might generate organizer nodes for prefix and/or tail lines. The post-pass may then optimize some of those organizer nodes away. What I'll actually do remains to be seen. It's a sticky problem that can't be designed away here.
Summary
Two seemingly simple rules (criteria) for allocating lines to nodes constrain what nodes the code generator must produce. I am happy, for now, with these rules. We shall see how well the rules work.
Edward