New importers: status report

Edward K. Ream

May 17, 2023, 6:52:40 AM
to leo-editor
Two recent PRs are now part of devel. Work continues on PR #3347, optimistically called "finish all importers" :-) The mods are picky and tedious, but so far there have been only happy surprises!

The new design

The new design is a great success. Most importers contain only a few class-level constants:

- The name of the importer: used in @language directives.
- A tuple of compiled patterns: used to discover the start of blocks.

For example, here are the top-level constants for the C language:

language = 'c'
string_list = ['"']  # Not single quotes.

block_patterns = (
    ('class', re.compile(r'.*?\bclass\s+(\w+)\s*\{')),
    ('func', re.compile(r'.*?\b(\w+)\s*\(.*?\)\s*(const)?\s*{')),
    ('namespace', re.compile(r'.*?\bnamespace\s*(\w+)?\s*\{')),
    ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
)

In short, regex patterns encapsulate most differences between languages!
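To make this concrete, here is a minimal sketch of how such a tuple of patterns could drive block detection. The patterns are copied from the constants above; the helper find_block_starts and the sample C source are hypothetical illustrations, not code from the importers themselves.

```python
import re

# Patterns copied from the C importer constants shown above.
block_patterns = (
    ('class', re.compile(r'.*?\bclass\s+(\w+)\s*\{')),
    ('func', re.compile(r'.*?\b(\w+)\s*\(.*?\)\s*(const)?\s*{')),
    ('namespace', re.compile(r'.*?\bnamespace\s*(\w+)?\s*\{')),
    ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
)

def find_block_starts(lines):
    """Hypothetical helper: return (line_index, kind, name) for each block start."""
    results = []
    for i, line in enumerate(lines):
        for kind, pattern in block_patterns:
            m = pattern.match(line)
            if m:
                results.append((i, kind, m.group(1)))
                break  # The first matching pattern wins.
    return results

# Hypothetical sample input: each block opens on a single line.
source = [
    'namespace util {',
    'struct Point {',
    '    int x;',
    '};',
    'int add(int a, int b) {',
    '    return a + b;',
    '}',
    '}',
]
starts = find_block_starts(source)
# starts == [(0, 'namespace', 'util'), (1, 'struct', 'Point'), (4, 'func', 'add')]
```

Supporting another brace-delimited language would, in this sketch, mean swapping in a different block_patterns tuple and nothing else.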

Finding blocks

The patterns above will fail if the parameter list spans multiple lines.

Aha! Have the patterns match less: only the start of a potential block. i.find_blocks will then validate each potential match by looking ahead (say) at most five lines for the opening curly bracket.
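A rough sketch of that lookahead idea follows. Everything here is an assumption made for illustration: the simplified pattern func_start, the helper confirm_block_start, and the rule that a semicolon outside parentheses rules out a block. The sketch also ignores strings and comments, which a real importer would have to handle.

```python
import re

# Hypothetical simpler pattern: match only up to the opening parenthesis.
func_start = re.compile(r'.*?\b(\w+)\s*\(')

def confirm_block_start(lines, i, max_lookahead=5):
    """
    Return the index of the line holding the opening '{' if lines[i] is a
    plausible block start, or None otherwise.

    Scans lines[i] plus at most max_lookahead following lines.
    """
    if not func_start.match(lines[i]):
        return None
    depth = 0  # Nesting depth of parentheses in the argument list.
    last = min(i + max_lookahead + 1, len(lines))
    for j in range(i, last):
        for ch in lines[j]:
            if ch == '(':
                depth += 1
            elif ch == ')':
                depth -= 1
            elif ch == ';' and depth == 0:
                return None  # A declaration or a call, not a block.
            elif ch == '{' and depth == 0:
                return j
    return None  # No opening brace within the lookahead window.

# A parameter list that spans two lines, with the brace on a third.
multi_line = [
    'int add(int a,',
    '        int b)',
    '{',
    '    return a + b;',
    '}',
]
brace_line = confirm_block_start(multi_line, 0)  # 2
```

The single-line func pattern from the C constants would not match 'int add(int a,' at all, whereas the looser pattern plus the lookahead accepts it and reports where the block body begins.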

Simple generality

Previous importer architectures were overly clever. Importers overrode too many methods of the base class. In this go-round, importers override just three:

- i.import_from_string: importers that override this method do all the work themselves.
- i.find_end_of_block: typical.
- i.find_block: for larger mods.

Several importers override no methods at all. The top-level class constants do all the work!
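As an illustration of "constants do all the work", here is a toy sketch. The class names, the 'toy' language, and the find_blocks signature and return value are all assumptions made for this example. They are not Leo's actual base class or API.

```python
import re

class SketchImporter:
    """Hypothetical stand-in for the importer base class."""
    language = None
    block_patterns = ()

    def find_blocks(self, lines):
        """Return (kind, name, line_index) for each line that starts a block."""
        blocks = []
        for i, line in enumerate(lines):
            for kind, pattern in self.block_patterns:
                m = pattern.match(line)
                if m:
                    blocks.append((kind, m.group(1), i))
                    break
        return blocks

class SketchCImporter(SketchImporter):
    # No overridden methods: only class-level constants.
    language = 'c'
    block_patterns = (
        ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
    )

class SketchToyImporter(SketchImporter):
    # A made-up language whose blocks start with the keyword 'proc'.
    language = 'toy'
    block_patterns = (
        ('proc', re.compile(r'\s*proc\s+(\w+)\s*\{')),
    )

c_blocks = SketchCImporter().find_blocks(['struct Point {', '    int x;', '};'])
toy_blocks = SketchToyImporter().find_blocks(['proc main {', '}'])
```

Both subclasses inherit all behavior from the base class. The only per-language code is the pair of constants.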

Summary

Most importers detect the start of blocks using regex patterns.

Aha! Use simpler regexes (for C-like languages) to detect potential starting points. i.find_blocks will look ahead (skipping the argument list) to find the opening curly brace.

Previous importer architectures were overly complicated. The new importers typically override just one or two methods. Several importers override no methods at all: top-level constants do all the work.

Edward

Edward K. Ream

May 17, 2023, 6:55:17 AM
to leo-editor
On Wednesday, May 17, 2023 at 5:52:40 AM UTC-5 Edward K. Ream wrote:

Previous importer architectures were overly clever. Importers overrode too many methods of the base class. In this go-round, importers override just three:

- i.import_from_string: importers that override this method do all the work themselves.
- i.find_end_of_block: typical.
- i.find_block: for larger mods.

Oops, I meant to say i.find_blocks, plural. There is no find_block method.

Edward