ENB: About new python importer

vitalije

Dec 8, 2021, 3:16:44 PM

to leo-editor

The import has two distinct phases. In the first phase, the necessary data is
calculated, and in the second phase, the outline is built from that data.

First, we have a list of source code lines as an input argument, and it is
visible to all inner functions. These lines are tokenized using the tokenize
module, and a list of all tokens is created, which is also available to all
inner functions. Then the tokens are separated into groups according to the
line number on which each token originates.
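
A minimal sketch of this first step, assuming the standard tokenize module is
driven through generate_tokens. The helper name tokens_by_line is made up for
illustration and is not the name used in the importer.

```python
import io
import tokenize
from collections import defaultdict

def tokens_by_line(lines):
    """Tokenize source lines and group the tokens by their starting row.

    Hypothetical helper: the importer's own code may be organized differently.
    """
    source = ''.join(lines)
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    groups = defaultdict(list)
    for tok in tokens:
        # tok.start is a (row, column) pair; rows are 1-based.
        groups[tok.start[0]].append(tok)
    return tokens, groups

lines = ["def f(a,\n", "      b):\n", "    return a + b\n"]
tokens, groups = tokens_by_line(lines)
print([t.string for t in groups[2]])
```

Grouping by row makes it cheap to ask questions such as "does this line start
with def or class?" or "on which line does the colon of the declaration sit?".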

Then, using all these lists, a new list is created. For each definition in the
file (i.e. each function/method and each class), a tuple is created with
several key values.

The first value is the starting column of the definition. Next comes the number
of the first line that should go in the node with this definition. For example,
if the function has a decorator, or some comments written just above it, those
lines should be included in the node. In the given example it is line 163,
which contains a decorator.

Next comes the number of the line where the declaration ends. In the given
example, the argument list is long, so it is split across two lines: the
keyword def is on line 164, but the colon is at the end of line 165.
When we later build the body for this node, those lines should not be indented.
Their indentation in the file is the same as the indentation of the at-others
directive in the parent node.

Then we have two string values: the kind (def or class) and the name of the
function, class or method, which will be used to set the node headline. The name
is followed by two indentation numbers. The first is the indentation of the
function body. The other is the longest common leading whitespace for the entire
function body. This value is the indentation of the at-others that we would put
in this node. In the case of under-indented comments, this value is less than the
indentation of the function body. When there are no under-indented comments,
these two values are equal.
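
To make the difference between the two indentation numbers concrete, here is a
small sketch. It assumes the body is indented with spaces only and that blank
lines are ignored. The function name common_indent is hypothetical.

```python
def common_indent(body_lines):
    """Width of the leading whitespace shared by all non-blank lines.

    Assumes space-only indentation; blank lines do not count.
    """
    widths = [len(s) - len(s.lstrip()) for s in body_lines if s.strip()]
    return min(widths) if widths else 0

body = [
    "    x = 1\n",
    "  # an under-indented comment\n",
    "    return x\n",
]
body_indent = len(body[0]) - len(body[0].lstrip())  # indentation of the first statement
print(body_indent, common_indent(body))
```

Here the body is indented by 4 columns, but the under-indented comment pulls
the common leading whitespace down to 2.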

The last element of this tuple is the line number of the first line that comes
after this node. In the example it is 185.

The getdefn function calculates these tuples for each definition in the file.
All these tuples are gathered in the list `definitions`.
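
As an illustration, one such tuple can be written down with the field names
and values from the example posted in this thread. Using a namedtuple is my
assumption, made only for readability. The description above speaks of plain
tuples.

```python
from collections import namedtuple

# Field names are taken from the example in this thread; the namedtuple
# wrapper itself is an assumption, not necessarily what the importer uses.
Definition = namedtuple(
    'Definition', 'col h1 h2 start_b kind name c_ind b_ind end_b')

d = Definition(
    col=0, h1=163, h2=165, start_b=166,
    kind='def', name='generate_guarded',
    c_ind=4, b_ind=4, end_b=185,
)

header_lines = range(d.h1, d.h2 + 1)   # decorator plus the two declaration lines
body_lines = range(d.start_b, d.end_b) # end_b is the first line after the node
print(list(header_lines), len(body_lines))
```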

Finally, after the `definitions` list has been calculated, we can start to build
the outline. The `mknode` function handles this task. It starts by filtering
the definitions list: only the definitions that start at a given column
are used to generate the first-level children. For each child, the function
makes another selection of definitions that start at the same column as the
function body of this node. Then it calls mknode recursively if there are
inner definitions and if the total number of lines is above a certain
threshold. If there are no inner definitions, or if the total number of lines
is small, the node body is set immediately and the process continues with the
next sibling definition.

If there is a gap between two consecutive definitions, a declarations
node is inserted containing the skipped lines.
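
The recursion and the gap handling can be sketched as follows. This is a
simplified illustration, not the importer's code. The record Def has fewer
fields than the tuple described above, body_col and THRESHOLD are made-up
names, the result is a nested list instead of Leo nodes, and child bodies are
not dedented. The definitions are assumed to be sorted by their first line.

```python
from collections import namedtuple

# Simplified, hypothetical record; line numbers are 1-based.
Def = namedtuple('Def', 'col h1 h2 name body_col end_b')

THRESHOLD = 4  # hypothetical minimum size (in lines) for splitting a node

def mknode(defs, col, first, last, lines):
    """Return (headline, body, children) nodes for lines first..last-1."""
    nodes = []
    pos = first
    for d in defs:
        # Keep only definitions at this column that lie inside the range.
        if d.col != col or d.h1 < first or d.end_b > last:
            continue
        if d.h1 > pos:
            # Lines between two definitions go into a declarations node.
            nodes.append(('declarations', lines[pos - 1:d.h1 - 1], []))
        inner = [x for x in defs
                 if x.col == d.body_col and d.h2 < x.h1 and x.end_b <= d.end_b]
        if inner and d.end_b - d.h1 > THRESHOLD:
            # Large node with inner definitions: header, @others, children.
            body = lines[d.h1 - 1:d.h2] + [' ' * d.body_col + '@others\n']
            children = mknode(defs, d.body_col, d.h2 + 1, d.end_b, lines)
        else:
            body, children = lines[d.h1 - 1:d.end_b - 1], []
        nodes.append((d.name, body, children))
        pos = d.end_b
    if pos < last:
        nodes.append(('declarations', lines[pos - 1:last - 1], []))
    return nodes

lines = [
    "import os\n",          # 1
    "\n",                   # 2
    "class A:\n",           # 3
    "    x = 1\n",          # 4
    "    def f(self):\n",   # 5
    "        return 1\n",   # 6
    "    def g(self):\n",   # 7
    "        return 2\n",   # 8
    "def h():\n",           # 9
    "    pass\n",           # 10
]
defs = [
    Def(0, 3, 3, 'A', 4, 9),
    Def(4, 5, 5, 'f', 8, 7),
    Def(4, 7, 7, 'g', 8, 9),
    Def(0, 9, 9, 'h', 4, 11),
]
tree = mknode(defs, 0, 1, len(lines) + 1, lines)
print([headline for headline, body, children in tree])
```

In this toy input the two lines before class A become a declarations node, and
class A is split into children because it contains inner definitions and is
longer than the threshold.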

I hope this explains well how the function achieves its goals. As an aid for
hacking, the function returns the calculated list of definitions. I've dumped
this list many times while tweaking and fixing issues with the import.

vitalije

Dec 8, 2021, 3:18:43 PM

to leo-editor

Example:

163|@contextmanager
164|def generate_guarded(mod: str, target: str,
165|                     ignore_errors: bool = True) -> Iterator[None]:
166|    """Ignore or report errors during stub generation.
167|
168|    Optionally report success.
169|    """
170|    if verbose:
171|        print('Processing %s' % mod)
172|    try:
173|        yield
174|    except Exception as e:
175|        if not ignore_errors:
176|            raise e
177|        else:
178|            # --ignore-errors was passed
179|            print("Stub generation failed for", mod, file=sys.stderr)
180|    else:
181|        if verbose:
182|            print('Created %s' % target)
183|
184|
185|PY2_MODULES = {'cStringIO', 'urlparse', 'collections.UserDict'}

( col=0
, h1=163
, h2=165
, start_b=166
, kind='def'
, name='generate_guarded'
, c_ind=4
, b_ind=4
, end_b=185
)

Edward K. Ream

Dec 9, 2021, 9:29:24 AM

to leo-editor

On Wed, Dec 8, 2021 at 2:16 PM vitalije <vita...@gmail.com> wrote:

The import has two distinct phases. In the first phase, necessary data is being calculated
and in the second phase, outline is built using calculated data.