Two recent PRs are now part of devel. Work continues on PR #3347, optimistically called "finish all importers" :-) The mods are picky and tedious, but there have been only happy surprises!
The new design
The new design is a great success. Most importers contain only a few class-level constants:
- The name of the importer: used in @language directives.
- A tuple of compiled patterns: used to discover the start of blocks.
For example, here are the top-level constants for the C language:
language = 'c'
string_list = ['"']  # Not single quotes.
block_patterns = (
    ('class', re.compile(r'.*?\bclass\s+(\w+)\s*\{')),
    ('func', re.compile(r'.*?\b(\w+)\s*\(.*?\)\s*(const)?\s*\{')),
    ('namespace', re.compile(r'.*?\bnamespace\s*(\w+)?\s*\{')),
    ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
)
In short, regex patterns encapsulate most differences between languages!
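As a rough illustration of how such a tuple might be consumed, here is a minimal sketch. The helper classify_line is hypothetical (it is not part of the importer code); it simply tries each pattern in order and reports the first match.

```python
import re

# The C patterns from above, copied so the sketch is self-contained.
block_patterns = (
    ('class', re.compile(r'.*?\bclass\s+(\w+)\s*\{')),
    ('func', re.compile(r'.*?\b(\w+)\s*\(.*?\)\s*(const)?\s*\{')),
    ('namespace', re.compile(r'.*?\bnamespace\s*(\w+)?\s*\{')),
    ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
)

def classify_line(line):
    """
    Hypothetical helper: return (kind, name) for the first pattern that
    matches the line, or None if no pattern matches.
    The name is None for anonymous namespaces and structs.
    """
    for kind, pattern in block_patterns:
        m = pattern.match(line)
        if m:
            return kind, m.group(1)
    return None

print(classify_line('class Foo {'))
print(classify_line('int main(int argc, char **argv) {'))
print(classify_line('x = 1;'))
```

Supporting another language would then mostly be a matter of supplying a different tuple of patterns.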
Finding blocks
The patterns above fail when a parameter list spans multiple lines, because the func pattern must match the whole signature, through the opening curly bracket, on a single line.
Aha! Have the patterns match less. find_blocks will then validate each potential match by looking ahead at most (say) five lines for the opening curly bracket.
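The look-ahead idea might be sketched as follows. This is not the actual find_blocks code: the relaxed pattern func_start, the constant MAX_LOOKAHEAD, and the function find_block_start are all hypothetical names, and the sketch ignores strings and comments.

```python
import re

# Hypothetical relaxed pattern: match only a name followed by '('.
func_start = re.compile(r'.*?\b(\w+)\s*\(')

# Assumption: the window is the starting line plus five following lines.
MAX_LOOKAHEAD = 5

def find_block_start(lines, i):
    """
    If lines[i] looks like the start of a function, look ahead for the
    opening '{'. Return (name, j), where j is the index of the line
    containing '{', or None if the candidate is rejected.
    """
    m = func_start.match(lines[i])
    if not m:
        return None
    last = min(i + MAX_LOOKAHEAD + 1, len(lines))
    for j in range(i, last):
        line = lines[j]
        if '{' in line:
            return m.group(1), j
        if ';' in line:
            # A prototype or call ended before any '{': not a block.
            return None
    return None

lines = [
    'int add(int a,',
    '        int b)',
    '{',
    '    return a + b;',
    '}',
]
print(find_block_start(lines, 0))
```

The relaxed pattern accepts many false candidates (calls, prototypes), so the look-ahead step does the real filtering. A fuller version would also have to skip braces and semicolons inside strings and comments.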
Simple generality
Previous importer architectures were overly clever: importers overrode too many methods of the base class. In this go-round, importers override at most three:
- i.import_from_string: importers that override this method do all the work themselves.
- i.find_end_of_block: the typical override.
- i.find_blocks: for larger mods.
Several importers override no methods at all. The top-level class constants do all the work!
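To make the "constants do all the work" idea concrete, here is a minimal sketch. Both classes are hypothetical stand-ins, not the real importer classes, and this find_blocks is a simplified single-line scan rather than the real method.

```python
import re

class SketchImporter:
    """Hypothetical stand-in for the base importer class."""
    language = None
    block_patterns = ()

    def find_blocks(self, lines):
        """Return (kind, name, line_index) for each line that starts a block."""
        results = []
        for i, line in enumerate(lines):
            for kind, pattern in self.block_patterns:
                m = pattern.match(line)
                if m:
                    results.append((kind, m.group(1), i))
                    break  # First matching pattern wins.
        return results

class SketchCImporter(SketchImporter):
    """Overrides no methods: only the class-level constants differ."""
    language = 'c'
    block_patterns = (
        ('class', re.compile(r'.*?\bclass\s+(\w+)\s*\{')),
        ('struct', re.compile(r'.*?\bstruct\s*(\w+)?\s*\{')),
    )

source = ['struct point {', '  int x;', '};', 'class Foo {', '};']
print(SketchCImporter().find_blocks(source))
```

All language-specific knowledge lives in the subclass's data, so the base class's scanning logic is written once and shared.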
Summary
Most importers detect the start of blocks using regex patterns.
Aha! Use simpler regexes (for C-like languages) to detect potential starting points. i.find_blocks will look ahead (skipping the argument list) to find the opening curly brace.
Previous importer architectures were overly complicated. The new importers typically override just one or two methods. Several importers override no methods at all: top-level constants do all the work.
Edward