ENB: Redesigning the c++ and js importers

Edward K. Ream

May 10, 2023, 6:53:05 AM
to leo-editor
This Engineering Notebook post discusses improving Leo's importers for difficult-to-parse languages such as c++ and javascript. Issue #3327 has become urgent now that I have begun to study codon!

tl;dr: Aha: use helper lines to guide analysis.

Background

Leo's importers have a long history. We are on something like the fifth iteration of their design. Each iteration has been a step forward, but Leo's c++ and javascript importers still need more work.

Definitions of c++ functions or methods may be arbitrarily complex. For example, processSource in codon/codon/app/main.cpp starts this way:

std::unique_ptr<codon::Compiler> processSource(
    const std::vector<const char *> &args, bool standalone,
    std::function<bool()> pyExtension = [] { return false; }) {

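Note how the lambda body { return false; } appears inside the parameter list! An importer that treats the first curly bracket as the start of the function body will get this wrong.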

Aside: I wonder whether codon generated this file! It's certainly difficult to read: everything is over-qualified.

The problem

The importer must split lines into nodes. Every line must appear in exactly one generated node. The bodies of the resulting nodes must tile the original file.

Handling the file line-by-line ensures that the generated nodes tile the file. However, a line-oriented approach complicates analysis. I'll omit most of the details.
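To make the tiling requirement concrete, here is a minimal Python sketch. The names tiles and split_at are hypothetical, made up for this post; they are not part of Leo. The sketch shows why splitting at line boundaries tiles the file automatically, and how a test could check the invariant.

```python
def split_at(lines, starts):
    """Split lines into consecutive bodies that begin at the given 0-based indices.

    Hypothetical helper: index 0 is always included so no leading line is lost.
    """
    bounds = sorted(set(starts) | {0}) + [len(lines)]
    return [lines[a:b] for a, b in zip(bounds, bounds[1:])]


def tiles(original_lines, node_bodies):
    """Return True if the bodies, concatenated in order, reproduce the file exactly."""
    joined = [line for body in node_bodies for line in body]
    return joined == original_lines
```

Because split_at only chooses where to cut, every line lands in exactly one body. All the hard work lies in choosing the cut points.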

Leo's importers tokenize the file so that strings and comments do not confuse the analysis. Alas, handling tokens creates other complications. What are we to do?

Aha! Let's use helper lines to simplify the analysis. We'll create the helper lines as follows:

- Start with the lines from the original file.
- Remove comments and strings.
- Remove curly brackets associated with 'if', 'for', and 'while' statements.
- Check the result to ensure that parens and brackets are properly nested.

The resulting lines will be much easier to analyze. The importer can assume that any remaining top-level curly brackets start the body of a class, function, or struct. The tiling problem remains challenging but tractable.
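Here is a rough Python sketch of two of the steps above: removing comments and strings, and checking nesting. It is an illustration only, not the code in Leo. The names make_helper_lines and is_properly_nested are hypothetical, and the sketch assumes that strings do not span lines and that there are no raw strings. It omits the step that removes the curly brackets of 'if', 'for', and 'while' statements.

```python
def make_helper_lines(lines):
    """Return one helper line per input line, with comments and strings removed.

    Assumptions: only //, /* */, "..." and '...' are handled; strings end on
    the line where they start.
    """
    result = []
    in_block_comment = False
    for line in lines:
        out, i, n = [], 0, len(line)
        while i < n:
            ch = line[i]
            if in_block_comment:
                if line.startswith('*/', i):
                    in_block_comment = False
                    i += 2
                else:
                    i += 1
            elif line.startswith('//', i):
                break  # The rest of the line is a comment.
            elif line.startswith('/*', i):
                in_block_comment = True
                i += 2
            elif ch in '"\'':
                # Skip to the matching quote, honoring backslash escapes.
                i += 1
                while i < n and line[i] != ch:
                    i += 2 if line[i] == '\\' else 1
                i += 1
            else:
                out.append(ch)
                i += 1
        result.append(''.join(out).rstrip())
    return result


def is_properly_nested(helper_lines):
    """Return True if parens, square brackets and curly brackets all match."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for line in helper_lines:
        for ch in line:
            if ch in '([{':
                stack.append(ch)
            elif ch in pairs:
                if not stack or stack.pop() != pairs[ch]:
                    return False
    return not stack
```

The helper lines correspond one-to-one with the original lines, so any line number found by analyzing the helper lines can be used directly to split the original file.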

Summary

I plan to rewrite the c++ importer as suggested above. Helper lines will likely eliminate the need for the usual tokenizer and state stack.

Edward

Edward K. Ream

May 10, 2023, 6:11:02 PM
to leo-editor
On Wednesday, May 10, 2023 at 5:53:05 AM UTC-5 Edward K. Ream wrote:

> Aha! Let's use helper lines to simplify the analysis.

Wow. This Aha has collapsed the complexity of all parts of the C++ importer.

Without line states, the code generators are dead simple! Take a look at i_c.gen_lines in PR #3330.

> We'll create the helper lines as follows:
>
> - Start with the lines from the original file.
> - Remove comments and strings.
> - Remove curly brackets associated with 'if', 'for', and 'while' statements.
> - Check the result to ensure that parens and brackets are properly nested.

The c_i.delete_comments_and_strings method handles all complexities involved in a sound, straightforward manner. The code generators see none of these complications!!!
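To suggest why the downstream code becomes simple, here is a small Python sketch that finds top-level blocks by scanning helper lines. This is not the gen_lines code in the PR. The name top_level_blocks is hypothetical, and the sketch assumes its input has already had comments and strings removed. It reports only the lines on which each block opens and closes; a real importer would also have to back up to the start of the declaration.

```python
def top_level_blocks(helper_lines):
    """Return (start, end) 0-based inclusive line ranges of top-level {...} blocks.

    Assumption: helper_lines contain no comments or strings, so every bracket
    character is real syntax.
    """
    blocks = []
    depth = 0    # Nesting level of curly brackets.
    parens = 0   # Nesting level of parentheses.
    start = None
    for i, line in enumerate(helper_lines):
        for ch in line:
            if ch == '(':
                parens += 1
            elif ch == ')':
                parens -= 1
            elif parens > 0:
                continue  # Ignore curly brackets inside parameter lists.
            elif ch == '{':
                if depth == 0:
                    start = i
                depth += 1
            elif ch == '}':
                depth -= 1
                if depth == 0:
                    blocks.append((start, i))
    return blocks
```

Two counters suffice here. The lambda inside the parameter list of processSource is skipped because the paren count is still positive when its curly brackets appear.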

As you can see, this method is completely general, so it will eventually migrate to the base Importer class.

Summary

I'll eventually rewrite almost all the importers to use this new design pattern.

This is the way importers are written in The Book.

Edward