ENB: Leo's new c++ importer: brags, improvements, and acknowledgments

Edward K. Ream

May 14, 2023, 9:32:36 AM
to leo-editor
This Engineering Notebook post is a status report about the Ahas discussed here.

Executive summary

PR #3330 contains a c++ importer superior in all dimensions to any existing importer. Almost all of Leo's importers will use the new design pattern.

PR #3330 remains a draft because I have just realized I can channel one of Vitalije's ideas to allow multi-line regex (string) searches in a line-oriented context.

Brags

The new code is:

- reliable: The c++ importer flawlessly handles codon's c++ files.
- general: The base Importer class contains all the new methods. Subclasses need only override the find_blocks method!
- stateless: The recursive new_gen_block method generates subtrees naturally.
- simple: All previous infrastructure disappears! All methods are short and easily understandable.

Multi-line regex searches

I was about to declare that the c++ importer was perfect. However, C_Importer.find_blocks is a bit clumsy. I went to bed last night wondering whether I could improve it.

find_blocks applies regexes line-by-line. Instead, we want (say) to use Pattern.match on multi-line strings. Aha! One of Vitalije's techniques provides the answer.

Let self.guide_string = ''.join(self.guide_lines)

Now compute self.index_list, a list of tuples (starting_index, ending_index), one for each of the guide string's lines. The lengths of self.lines, self.guide_lines, and self.index_list are the same!

find_blocks takes two arguments (i1: int, i2: int). Let:

start_i = self.index_list[i1][0]
end_i = self.index_list[i2][1]

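It's not necessary to compute the slice self.guide_string[start_i:end_i]! Instead, find_blocks will call pat.match(self.guide_string, i, end_i), where the starting value of i is start_i. Pattern.match accepts these optional pos and endpos arguments, so no substring is ever copied.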
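Here is a minimal, standalone sketch of the idea, not the code in PR #3330. The helper names make_index_list and match_in_lines, the sample lines, and the pattern func_pat are all hypothetical. The sketch assumes i2 is the index of the last line to search, which is what the formula for end_i above implies.

```python
import re

def make_index_list(guide_lines):
    """Return the (start, end) character offsets of each line in ''.join(guide_lines)."""
    index_list, start = [], 0
    for line in guide_lines:
        end = start + len(line)
        index_list.append((start, end))
        start = end
    return index_list

def match_in_lines(pat, guide_string, index_list, i1, i2):
    """Match pat at the start of line i1, looking no further than the end of line i2."""
    start_i = index_list[i1][0]
    end_i = index_list[i2][1]  # Assumption: i2 is the last line, inclusive.
    # pos and endpos bound the match without slicing guide_string.
    return pat.match(guide_string, start_i, end_i)

# Hypothetical guide lines. Leo's real guide lines have strings and comments blanked.
guide_lines = [
    "int add(int a, int b)\n",
    "{\n",
    "    return a + b;\n",
    "}\n",
]
guide_string = ''.join(guide_lines)
index_list = make_index_list(guide_lines)

# Hypothetical pattern: a function header whose opening brace is on the next line.
func_pat = re.compile(r'\w+\s+(\w+)\s*\([^)]*\)\s*\{')

m = match_in_lines(func_pat, guide_string, index_list, 0, 3)
print(m.group(1) if m else None)
```

Because the brace sits on the second line, a line-by-line search cannot see this header in one step, while the bounded multi-line match can. Restricting the same call to line 0 alone (i1 = i2 = 0) finds nothing, since endpos cuts the string off before the brace.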

Summary

The new importers are perfect! They are a milestone in Leo's history.

Each importer needs only to override i.find_blocks. Nothing could be simpler.

I'll soon improve C_Importer.find_blocks as described above. I thank Vitalije for showing me how.

Edward

Edward K. Ream

May 14, 2023, 2:05:24 PM
to leo-editor
On Sunday, May 14, 2023 at 8:32:36 AM UTC-5 Edward K. Ream wrote:

PR #3330 remains a draft because I have just realized I can channel one of Vitalije's ideas to allow multi-line regex (string) searches in a line-oriented context.

Experiments show that my idea is dubious for several reasons:

1. Basing searches on character indices instead of line indices creates the possibility of subtle bugs.
2. Multi-line regexes are trickier than single-line searches.
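As one illustration of both points (my own example, not taken from the experimental code), consider how Python's re module treats '^' when a match starts at a character offset. The sample text, the patterns, and the names line_starts and line_of are hypothetical.

```python
import re
from bisect import bisect_right

guide_lines = ["class A {\n", "};\n", "class B {\n", "};\n"]
guide_string = ''.join(guide_lines)

# Character offset at which each line starts.
line_starts, offset = [], 0
for line in guide_lines:
    line_starts.append(offset)
    offset += len(line)

pos = line_starts[2]  # Start of the line "class B {".

# Without re.MULTILINE, '^' matches only at the real start of the string,
# so this fails even though pos is the start of a line.
plain = re.compile(r'^class\s+(\w+)')
print(plain.match(guide_string, pos))

# With re.MULTILINE, '^' also matches just after a newline, so this succeeds.
multi = re.compile(r'^class\s+(\w+)', re.MULTILINE)
m = multi.match(guide_string, pos)
print(m.group(1))

def line_of(char_index):
    """Convert a character offset back to a line index."""
    return bisect_right(line_starts, char_index) - 1

print(line_of(m.start()))
```

A pattern that works line-by-line can therefore fail silently when reused on the joined string, and every result must be converted back from a character offset to a line index before the rest of a line-oriented importer can use it.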

Summary

I have moved today's experimental code to the attic (leoAttic.txt), where it will likely remain.

I'll merge PR #3330 soon. It's time to convert other importers!

Edward