This Engineering Notebook post is a status report about the Ahas discussed here.
Executive summary
PR #3330 contains a c++ importer superior in all dimensions to any existing importer. Almost all of Leo's importers will use the new design pattern.
PR #3330 remains a draft because I have just realized I can channel one of Vitalije's ideas to allow multi-line regex (string) searches in a line-oriented context.
Brags
The new code is:
- reliable: The c++ importer flawlessly handles codon's c++ files.
- general: The base Importer class contains all the new methods. Subclasses need only override the find_blocks method!
- stateless: The recursive new_gen_block method generates subtrees naturally.
- simple: All previous infrastructure disappears! All methods are short and easily understandable.
Multi-line regex searches
I was about to declare that the c++ importer was perfect. However, C_Importer.find_blocks is a bit clumsy. I went to bed last night wondering whether I could improve it.
find_blocks applies regexes line-by-line. Instead, we want (say) to use Pattern.match on multi-line strings. Aha! One of Vitalije's techniques provides the answer.
Let self.guide_string = ''.join(self.guide_lines)
Now compute self.index_list, a list of tuples (starting_index, ending_index), one for each of the guide string's lines.
The lengths of self.lines, self.guide_lines, and self.index_list are the same!
find_blocks takes two arguments (i1: int, i2: int). Let:
start_i = self.index_list[i1][0]
end_i = self.index_list[i2][1]
It's not necessary to compute the slice self.guide_string[start_i : end_i]! Instead, find_blocks will call pat.match(self.guide_string, i, end_i), where the starting value of i is start_i.
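Here is a minimal sketch of the idea, not the actual code in PR #3330. The names compute_index_list, find_first_match, and func_pat are hypothetical, as is the sample input. The sketch assumes that i2 is the index of the last line in the range (inclusive), which is what index_list[i2][1] implies, and that i advances one line at a time.

```python
import re
from typing import List, Optional, Tuple

def compute_index_list(guide_lines: List[str]) -> List[Tuple[int, int]]:
    """Return (starting_index, ending_index) in the joined string for each line."""
    index_list = []
    start = 0
    for line in guide_lines:
        end = start + len(line)
        index_list.append((start, end))
        start = end
    return index_list

# Hypothetical guide lines: a c++ function whose signature spans two lines.
guide_lines = [
    "int f(int a,\n",
    "      int b)\n",
    "{\n",
    "    return a + b;\n",
    "}\n",
]
guide_string = ''.join(guide_lines)
index_list = compute_index_list(guide_lines)

# Hypothetical pattern: a function header up to and including its opening brace.
# [^)]* and \s* can both cross newlines, so the match may span several lines.
func_pat = re.compile(r'\w+\s+(\w+)\s*\([^)]*\)\s*\{')

def find_first_match(
    pat: "re.Pattern[str]", i1: int, i2: int,
) -> Optional[Tuple[int, "re.Match[str]"]]:
    """Try pat at the start of each line in i1..i2 (i2 inclusive: an assumption)."""
    end_i = index_list[i2][1]
    for line_number in range(i1, i2 + 1):
        i = index_list[line_number][0]
        # No slicing: pos and endpos bound the search within the full string.
        m = pat.match(guide_string, i, end_i)
        if m:
            return line_number, m
    return None

result = find_first_match(func_pat, 0, 4)
```

Passing pos and endpos to Pattern.match avoids copying the string, and all match offsets stay relative to the full guide string, so they can be mapped back to line numbers through index_list.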
Summary
The new importers are perfect! They are a milestone in Leo's history.
Each importer need only override i.find_blocks. Nothing could be simpler.
I'll soon improve C_Importer.find_blocks as described above. I thank Vitalije for showing me how.