ENB: Designing the code generator


Edward K. Ream

Nov 24, 2021, 9:06:21 AM
to leo-editor
Yesterday's ENB provided background for the new python importer.  In this ENB post I'll discuss rules (criteria) for allocating lines to nodes.

Initial rules

Imo, there is no right way to allocate incoming lines to nodes. There are judgment calls involved. The spectrum of choices ranges from putting all lines into a single node, to putting each line in its own node :-)

Let's start with the following relatively straightforward rules:

Rule 1. Put every class line in its own node, unindented, like this:

class TheClass:
<lws>@others

Note: The leading whitespace (lws) of the @others line will be the lws of the first inner class or def contained in TheClass.

Rule 2. Similarly, put every def line in its own node, unindented, like this:

def TheMethodOrFunction(...):
<The rest of the Method or function>

Unlike class lines, def nodes do not contain an @others directive. This rule implies that inner (nested) defs and classes will be allocated to the def node.  This is a debatable choice, but in practice it should cause few problems.  The user can always create more nodes later.
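A minimal sketch of Rule 1, assuming each line is a string without its trailing newline and that indentation uses spaces only. The names `others_lws` and `class_node_body` are hypothetical and are not taken from Leo's importer. The sketch ignores any lines that appear between the class line and the first inner class or def:

```python
import re

# Hypothetical pattern: a line that starts an inner class or def.
DEF_OR_CLASS = re.compile(r'^([ \t]*)(?:async\s+def|def|class)\b')

def leading_width(line):
    """Return the number of leading whitespace characters in line."""
    return len(line) - len(line.lstrip())

def others_lws(class_lines):
    """Return the lws of the first inner class or def, relative to
    the class line, or None if the class contains neither."""
    outer = leading_width(class_lines[0])
    for line in class_lines[1:]:
        m = DEF_OR_CLASS.match(line)
        if m:
            return m.group(1)[outer:]
    return None

def class_node_body(class_lines):
    """Return the body lines of a class node, following Rule 1."""
    outer = leading_width(class_lines[0])
    head = class_lines[0].lstrip()  # Rule 1: the class line is unindented.
    lws = others_lws(class_lines)
    if lws is None:
        # No inner class or def: keep every line in this node, dedented.
        return [head] + [line[outer:] for line in class_lines[1:]]
    return [head, lws + '@others']
```

Given a class line indented by two spaces whose first def is indented by five, `class_node_body` yields an unindented class line followed by an `@others` line with three spaces of lws.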

Example

These rules will "just work" for typical Python programs. They also work for strangely indented code. Consider the following (valid!) Python fragment:

if 2:
  class C1: # 2 space lws.
     def method1(): # 2+3 space lws.
         pass  # 2+3+4 space lws.
if 4:  # 4-space indentation everywhere
    def d1():
        pass
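The claim that this fragment is valid can be checked with the built-in `compile` and `exec`. This is only a quick sanity check, with the fragment copied into a string; it is not part of the importer:

```python
# The fragment above, copied verbatim into a string.
fragment = (
    "if 2:\n"
    "  class C1: # 2 space lws.\n"
    "     def method1(): # 2+3 space lws.\n"
    "         pass  # 2+3+4 space lws.\n"
    "if 4:  # 4-space indentation everywhere\n"
    "    def d1():\n"
    "        pass\n"
)

# compile raises SyntaxError (or IndentationError) for invalid source.
code = compile(fragment, "<fragment>", "exec")

# Both `if` conditions are truthy, so both definitions are executed.
namespace = {}
exec(code, namespace)
defined = sorted(name for name in namespace if not name.startswith("__"))
print(defined)  # prints ['C1', 'd1']
```

Python requires only that indentation be consistent within each block, which is why 2, 3 and 4 spaces can coexist here.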

Rules 1 and 2 require the following node structure  (I'll use the MORE format for representing nodes, in which lines starting with '-' denote headlines):

- if 2:
  if 2:
    @others  (2-space lws comes from `class C1`.)
  - class C1
    class C1:  (No lws for this line!)
       @others  (3-space lws comes from the `def method1`.)
     - def method1():
       def method1():  (No lws for this line!)
           pass  (4-space lws in this node)
- if 4:  # 4-space indentation everywhere
  if 4:  # 4-space indentation everywhere
      @others (4-space lws comes from the `def d1():` line.)
  - d1
    def d1(): (no lws for this line!)
        pass  (4-space lws in this node)
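One way to check a node structure like this is to expand it back into text and compare the result with the original lines. The sketch below is an assumption-laden model, not Leo's code: `Node` and `expand` are hypothetical names, a root node containing only `@others` is assumed, and the `if 4` node is assumed to start with its own `if 4:` line, just as the `if 2` node does.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    headline: str
    body: list                      # Body lines, without newlines.
    children: list = field(default_factory=list)

def expand(node, indent=""):
    """Return node's lines, replacing each @others line with the
    expansion of node's children, indented by the lws of @others."""
    result = []
    for line in node.body:
        if line.strip() == "@others":
            lws = line[: len(line) - len(line.lstrip())]
            for child in node.children:
                result.extend(expand(child, indent + lws))
        else:
            # Leave empty lines empty rather than adding trailing whitespace.
            result.append(indent + line if line else line)
    return result

method1 = Node("def method1():", [
    "def method1(): # 2+3 space lws.",
    "    pass  # 2+3+4 space lws.",
])
c1 = Node("class C1", ["class C1: # 2 space lws.", "   @others"], [method1])
if2 = Node("if 2:", ["if 2:", "  @others"], [c1])
d1 = Node("d1", ["def d1():", "    pass"])
if4 = Node("if 4:", ["if 4:  # 4-space indentation everywhere", "    @others"], [d1])
root = Node("root", ["@others"], [if2, if4])

original = [
    "if 2:",
    "  class C1: # 2 space lws.",
    "     def method1(): # 2+3 space lws.",
    "         pass  # 2+3+4 space lws.",
    "if 4:  # 4-space indentation everywhere",
    "    def d1():",
    "        pass",
]
round_trip_ok = expand(root) == original
```

The lws of each `@others` line accumulates down the tree (2, then 2+3, then 2+3+4 spaces), so the round trip depends on that lws being preserved in every organizer node.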

Discussion

For strangely indented code, the rules constrain both the code generator and the post-pass as follows:

1. The "if 2" and "if 4" organizer nodes must exist if class and def nodes are to start with unindented class or def lines.

2. The post-pass must not "dedent away" the lws before @others lines in organizer nodes. Happily, the post-pass knows that 'org' nodes are organizer nodes.

I have glossed over some complications, especially:

- Prefix lines: lines that appear before class or def lines at the same indentation level as the class or def lines.

- Tail lines: lines that appear after class or def lines at the same indentation level as the class or def lines.

The code generator might generate organizer nodes for prefix and/or tail lines. The post-pass may then optimize some of those organizer nodes away.  What I'll actually do remains to be seen.  It's a sticky problem that can't be designed away here.

Summary

Two seemingly simple rules (criteria) for allocating lines to nodes constrain what nodes the code generator must produce. I am happy, for now, with these rules. We shall see how well the rules work.

Edward

tbp1...@gmail.com

Nov 24, 2021, 10:20:55 AM
to leo-editor
On Wednesday, November 24, 2021 at 9:06:21 AM UTC-5 Edward K. Ream wrote:
Note: The leading whitespace (lws) of the @others line will be the lws of the first inner class or def contained in TheClass.

After a class line, two other kinds of non-comment, non-blank lines could appear before the first def line:

1. A docstring;
2. A class variable assignment

I think the code generator should not overlook these kinds of lines by only looking for a def line.

tbp1...@gmail.com

Nov 24, 2021, 10:22:14 AM
to leo-editor
I meant the first def or inner class line, of course.

Edward K. Ream

Nov 24, 2021, 10:51:14 AM
to leo-editor
I agree.  Decorators are another common construct that should determine indentation.

Your comment gives me the opportunity to point out several things that I omitted in this post.

1. The two rules will be applied recursively. All the constructs you mention will create an organizer node (a node with 'org' kind) without an @others directive. The @others in the class node would apply to the entire organizer node.

2. There are special cases that I haven't mentioned. In particular, each file will start with a prefix node (an organizer node without an @others directive) for all leading lines of the file.  This prefix will include the module's docstring, imports, and other initial lines. The post-pass will prepend the lines of the prefix node to the start of the root node, provided the size of the prefix node is less than some threshold.

But as I say, all rules (and special cases?) will be applied recursively, so class nodes and def nodes could also have prefix nodes.

3. The main code generator's primary focus must be to ensure that all files are imported perfectly, except possibly for "promoted" underindented comment lines. The post-pass will then optimize the nodes:

- Move decorators at the end of preceding organizer nodes to the start of the next class or def node.
- Move tail lines from one node to another.
- Delete nodes that become empty during the post-pass, or nodes that otherwise would have no effect, such as childless nodes containing only an @others directive.
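Two of these optimizations can be sketched as follows. This is only an illustration under simplifying assumptions: node bodies are lists of lines, decorators occupy a single line each, and the names `move_decorators` and `is_removable` are hypothetical rather than part of the actual post-pass.

```python
def is_decorator(line):
    """True if line looks like a Python decorator.
    Assumption: Leo's @others directive must not be mistaken for one."""
    s = line.strip()
    return s.startswith("@") and s != "@others"

def move_decorators(org_body, def_body):
    """Move trailing decorator lines from an organizer node's body to
    the start of the following class or def node's body.
    Returns the two new bodies; the inputs are not modified."""
    n = len(org_body)
    while n > 0 and is_decorator(org_body[n - 1]):
        n -= 1
    return org_body[:n], org_body[n:] + def_body

def is_removable(body, has_children):
    """True if deleting the node cannot change the generated file:
    the body has no lines at all, or the node is childless and
    contains nothing but @others directives."""
    if not body:
        return True
    return not has_children and all(line.strip() == "@others" for line in body)
```

For example, an organizer node containing only `@property` becomes empty once the decorator moves to the following def node, and `is_removable` then reports that the node can be deleted.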

4. At the code level, the challenge is to avoid an explosion of cases. There are at least two "interlocking" sets of tests: one set is a switch on the preceding node type; the other set is a switch on the relation between the indentation of the present line and the (cumulative) indentation of the (parent) node that contains the "ruling" @others directive. Like I said, the code is tricky.

5. I am using several fairly simple unit tests as development test beds. These tests allow me to try various ways of untangling the interactions between previously generated nodes and indentation levels.  As a side effect, the unit tests should ensure that I have not forgotten the kinds of special cases you mention.

Edward