For Vitalije: About unit tests for the python importer

70 views
Skip to first unread message

Edward K. Ream

unread,
Dec 6, 2021, 11:35:03 AM12/6/21
to leo-editor
In this thread I'll discuss issues related to unit tests for the python importer.

The unit tests will change

I've said that the new importer must pass all existing unit tests, but that's not exactly true, for the following reasons:

1. The new importer won't necessarily generate the same nodes.
2. I shall soon change some of the existing unit tests.

The real requirements:

1. The new importer must pass all perfect-import tests.
2. The new importer must generate reasonable nodes, preferably by using the existing python post pass.

The structure of typical unit tests

Let's use test_basic_nesting_1 as an example.  It is:

def test_basic_nesting_1(self):

    s = """
        import sys
        def f1():
            pass

        class Class1:
            def method11():
                pass
            def method12():
                pass
                
        a = 2
        
        def f2():
            pass
    
        # An outer comment
        @myClassDecorator
        class Class2:
            @myDecorator
            def method21():
                pass
            def method22():
                pass
                
        # About main.
        def main():
            pass
    
        if __name__ == '__main__':
            main()
    """
    p = self.run_test(s, verbose=False)
    self.check_headlines(p, (
        (1, 'Organizer: Declarations'),
        (1, 'f1'),
        (1, 'class Class1'),
        (2, 'method11'),
        (2, 'method12'),
        (1, 'Organizer: a = 2'),
        (1, 'f2'),
        (1, 'class Class2'),
        (2, 'method21'),
        (2, 'method22'),
        (1, 'main'),
    ))


This is an important test. It verifies that common python code will import correctly, and will have the expected structure. The line:

p = self.run_test(s, verbose=False)

contains the perfect import test.  That is, BaseTestImporter.run_test will fail if writing p does not generate (modulo blank lines) the string s.

The call to BaseTestImporter.check_headlines has the form:

self.check_headlines(p, (
  << table of tuples (node level, node headline) >>
))

Clearly, the table may have to change. During early development, I would be OK with disabling such calls altogether.

However, it's important to handle decorators correctly!  They should not reside in their own nodes.

Converting old-style structure tests

A cff reveals that 17 unit tests should be converted to call check_headlines. At present, they use clumsy, by-hand code that is roughly equivalent.  I'll do the conversion today or tomorrow and push the result to devel.

Summary

BaseTestImporter.run_test verifies perfect importer.

BaseTestImporter.check_headlines runs only if the perfect-import checks pass. That table that check_headlines uses may have to change. That's fine with me, provided that decorators are placed correctly.

I shall soon convert all tests so that they use check_headlines rather than the more verbose equivalent.

All questions and comments are welcome.

Edward

Edward K. Ream

unread,
Dec 6, 2021, 4:01:02 PM12/6/21
to leo-editor
On Monday, December 6, 2021 at 10:35:03 AM UTC-6 Edward K. Ream wrote:

> I shall soon convert all tests so that they use check_headlines rather than the more verbose equivalent.

Done in devel at rev c65459. In the tests I have marked headlines that should not be generated, or have poor headlines.  Vitalije, feel free to adjust the arguments so the call passes.

Edward

vitalije

unread,
Dec 6, 2021, 4:36:59 PM12/6/21
to leo-editor
Sorry, I've missed your two posts.

I've been working on new branch `importers` today.

While looking at the importer pipeline, I've noticed several conversion text to lines and backwards. First split is in checking tabs and spaces, if there are mixed tabs and spaces, again text is turned into lines, and back to text. Finally in gen_lines the same input text is once again divided in lines. So, I've changed all methods to accept lines as input argument instead of text. There were several places to change, in almost all importers that override gen_lines, but changes were minimal.

All unit tests passed except two related to leoAst that fail.

After that I've marked all test methods in test_leoImport.TestPython for skipping and I've started to add my own tests.
The old tests were checking only on headlines. Using new utility method for checking outline:

def check_outline(self, p, nodes):
    it = iter(nodes)
    zlev = p.level()
    for p1 in p.self_and_subtree():
        lev, h, b = next(it)
        assert p1.level()-zlev == lev, f'lev:{p1.level()-zlev} != {lev}'
        if lev > 0:
            assert p1.h == h, f'"{p1.h}" != "{h}"'
        assert p1.b == b, f'\n{repr(p1.b)} !=\n{repr(b)}'
    try:
        next(it)
        return False, 'extra nodes'
    except StopIteration:
        return True, 'ok'

Now, both structure and content are checked.
After I've added 6 new tests with some corner cases that I've found to be interesting, I've deleted all other old tests in TestPython class.

You can check the tests and suggest any new test case. If you just provide Leo outline for interesting cases, I can easily turn them into test methods.
From outline it is easy to get source text, and expected node structure. All new tests have same form:

def test_something(self):
    txt = '''sourcecode'''
    exp_nodes = [(lev, h, b),...]
    p = self.run_test(txt)
    ok, msg = self.check_outline(p, exp_nodes)
    assert ok, msg

Look at the branch `importers` and we can discus what I've accomplished so far. I don't think there will be any serious problems in the future. Function split_root, should have some more comments and perhaps better var names, before PR is ready, but all remaining tweaks should be easy.

Let me know what do you think about the results of new python importer.
Vitalije

tbp1...@gmail.com

unread,
Dec 6, 2021, 9:03:02 PM12/6/21
to leo-editor
I just imported a Python file using the current devel branch (so not Vitaije's new code).  The file has a if __name__ == '__main__': section.  This section is located at the end of the imported file.  I have no problem with that location even though the older importer put it near the top.  But it is not in its own node, it's just at the end of the last def node. I would like it to be in its own node tree, which might have the top node headline of  __main__ or similar.  Also, a __main__ section can sometimes be longish, and if so it should be decomposed into nodes like any other part of the file.

Edward K. Ream

unread,
Dec 7, 2021, 3:03:33 AM12/7/21
to leo-editor
On Mon, Dec 6, 2021 at 6:48 PM vitalije <vita...@gmail.com> wrote:

I've been working on new branch `importers` today.

I just checked out the importers branch. It looks great!

I've changed all methods to accept lines as input argument instead of text.

Good.

The old tests were checking only on headlines...
Now, both structure and content are checked.

I see that you represent expected outlines as a nested list structure.  That's simple and good.

As an alternative, BaseTestImporter.create_expected_outline creates an expected outline from the MORE representation (a string). I mention this only for reference. Your way is clear enough.

BaseTestImporter.compare_outlines compares the actual and expected outlines. Imo, it would be better to use this (or something like it) rather than the check_outline function.  The assert* methods of unittest.TestCase provide better reports than python's assert statement.

Leo's existing test framework works well with the assert* methods, so the conversion should be easy.

After I've added 6 new tests with some corner cases that I've found to be interesting, I've deleted all other old tests in TestPython class.

Good. The old unit tests were feeble and mostly redundant. Imo, only a few tests will likely be needed to test all the interesting cases.  A few more tests may be needed to get to 100% coverage.

Summary

Keep up the great work :-)

Edward

P.S. I see that the new python importer refers to linescanner.Target and defines a Python_ScanState class. Eventually, we would like both to go away. I'll experiment with making them optional.

EKR

vitalije

unread,
Dec 7, 2021, 7:30:39 AM12/7/21
to leo-editor
On Tuesday, December 7, 2021 at 9:03:33 AM UTC+1 Edward K. Ream wrote:

 The assert* methods of unittest.TestCase provide better reports than python's assert statement.
 
Leo's existing test framework works well with the assert* methods, so the conversion should be easy.


Have you tried using pytest as runner. The report it generates on a simple python assert statement with somewhat complex expressions is astonishing. Much better than any report TestCase.assert would produce. For example try this:
def test_dummy():
   a = [1, 2, ['a', 'b', 'f', 'd', 'e', 'c'], 4, 6]
   b = [1, 2, ['a', 'b', 'c', 'd', 'e', 'f'], 4, 5]

   assert a == b

Running pytest -v on this function I get the following report:


    def test_dummy():
        a = [1, 2, ['a', 'b', 'f', 'd', 'e', 'c'], 4, 6]
        b = [1, 2, ['a', 'b', 'c', 'd', 'e', 'f'], 4, 5]
    
>       assert a == b
E       AssertionError: assert [1, 2, ['a', ...', 'c'], 4, 6] == [1, 2, ['a', ...', 'f'], 4, 5]
E         At index 2 diff: ['a', 'b', 'f', 'd', 'e', 'c'] != ['a', 'b', 'c', 'd', 'e', 'f']
E         Full diff:
E         - [1, 2, ['a', 'b', 'c', 'd', 'e', 'f'], 4, 5]
E         ?                    ^              ^       ^
E         + [1, 2, ['a', 'b', 'f', 'd', 'e', 'c'], 4, 6]
E         ?                    ^              ^       ^

It makes it easy to spot where the difference exactly is without the need of writing special formatting code by hand. In another thread I've read that you don't like using pytest because it generates too much output. And yes I've tried to run all unit tests using pytest and there were huge amount of failures. But most of them were caused by an assertion inside Leo core, during the test setup phase. I think this is caused by executing `TestSyntax.test_that_leo_starts`. When I excluded this test with self.skipTest(), pytest finds exactly the same number of failures as unittest. In fact it has found one more. In test_leoImport.TestCython class, there was no at-others and the only method was not part of the class, so it never run.

Commit 2492a13f1 fixes both of this.

The methods compare_outlines and create_expected_outline rely on vnode_info which split_root doesn't populate.
 
P.S. I see that the new python importer refers to linescanner.Target and defines a Python_ScanState class. Eventually, we would like both to go away. I'll experiment with making them optional.


There is another thing that bothers me with the present code. All importers are forced to export a scanner class, even if they don't use one. Those exported classes are gathered in g.app.atAutoDict and g.app.classDispatchDict. But then, later methods app.body_parser_for_ext, app.scanner_for_ext and app.scanner_for_at_auto each of them create a wrapper function that uses this importer class and returns it as a simple function.

By generating those wrappers in the leoApp, unnecessary coupling exists between importer modules and LeoApp. LeoApp must now about internal affairs of importer and also importers must create class and even override some unused methods even if they don't need a class. It would be much cleaner if importers export a function and not a class. That would much simpler (in fact a minimal) API. Every importer must export just one function f(p, txt) which should create outline from given text in the given position. How each of the importers achieve this is entirely up to them. Of course I don't intend to rewrite all importers, just to change what they export and how LeoApp uses that what is exported.

######## in LeoApp #####
# instead of
def get_importer_for(key):
    aClass = mydict.get(key)
    def f(s, parent):
        return aClass().run(s, parent)
    return f
# we could simply use
def get_importer_for(key):
    return mydict.get(key)

##### in X_Importer ####
# instead of
importer_dict = {
    'class': X_Importer,
    'extensions': [...],
}
# we could use
importer_dict = { 'func': X_Importer.do_import, 'extensions':[...] }

### and in the base scanner class
    ...
    @staticmethod
    def do_import(aClass):
        def f(s, parent):
            return aClass().run(s, parent)
        return f




Edward K. Ream

unread,
Dec 7, 2021, 11:21:35 AM12/7/21
to leo-editor
On Tue, Dec 7, 2021 at 6:30 AM vitalije <vita...@gmail.com> wrote:

> Have you tried using pytest as a runner.

Yes.  g.run_coverage_tests uses pytest. But for "plain" testing (when finding out why tests fail) I prefer unittest. pytest appears to swallow traces, and I also prefer unittest's style of reporting.

Edward

vitalije

unread,
Dec 7, 2021, 1:09:40 PM12/7/21
to leo-editor
> A few more tests may be needed to get to 100% coverage.

Done at 6282e76a62 in importers branch. Tests for python importer cover 100% code, and all succeed.
Vitalije

vitalije

unread,
Dec 7, 2021, 3:10:20 PM12/7/21
to leo-editor
I've added more comments in the code and have created PR to merge importers into devel branch.

Vitalije

Edward K. Ream

unread,
Dec 8, 2021, 8:49:30 AM12/8/21
to leo-editor
On Tue, Dec 7, 2021 at 12:09 PM vitalije <vita...@gmail.com> wrote:
> A few more tests may be needed to get to 100% coverage.

Done at 6282e76a62 in importers branch. Tests for python importer cover 100% code, and all succeed.

Thank you!

Edward

tbp1...@gmail.com

unread,
Dec 8, 2021, 9:42:49 AM12/8/21
to leo-editor
I used last night's import branch to import a file that has a lot of decorators, and it seemed to work just as I would want.  In particular, the decorator statements were in the right places.

Edward K. Ream

unread,
Dec 8, 2021, 9:43:45 AM12/8/21
to leo-editor
On Tue, Dec 7, 2021 at 2:10 PM vitalije <vita...@gmail.com> wrote:
I've added more comments in the code and have created PR to merge importers into devel branch.

Thanks for this. This is PR #2353. I have assigned it to Leo 6.6.

In general, I like what I see:

- The simpler interface between Leo (app.scanner_for_at_auto & app. scanner_for_ext) and the do_import method in each scanner. This change accounts for most of the 33 changed files.

- Importers use a lines-based interface by default, rather than a string-based interface. This affects mostly linescanner.py, but there are simple related changes in some, but not all, of the importers.

Here are my remaining conditions of satisfaction for this PR:

- The new importer should use the vnode_dict interface. This will allow the importer to connect to the post-pass.

- The new importer should import all of mypy's .py files, including all test files. At rev 7b109931aa, the importer crashes for 81 mypy files.  There is no indication elsewhere that the crash has occurred. There probably should be.

The ekr-improve-old-importer branch will contain new post pass of the old importer. Unless I am mistaken, the new importer should benefit from this work. At any rate, I have questions about my own code that I want to resolve.

Edward

Edward K. Ream

unread,
Dec 8, 2021, 9:48:18 AM12/8/21
to leo-editor
On Wed, Dec 8, 2021 at 8:42 AM tbp1...@gmail.com <tbp1...@gmail.com> wrote:
I used last night's import branch to import a file that has a lot of decorators, and it seemed to work just as I would want.  In particular, the decorator statements were in the right places.

Good to know.

Edward

Edward K. Ream

unread,
Dec 8, 2021, 9:49:29 AM12/8/21
to leo-editor
On Mon, Dec 6, 2021 at 8:03 PM tbp1...@gmail.com <tbp1...@gmail.com> wrote:
I just imported a Python file using the current devel branch (so not Vitaije's new code).  The file has a if __name__ == '__main__': section.  This section is located at the end of the imported file.  I have no problem with that location even though the older importer put it near the top.  But it is not in its own node, it's just at the end of the last def node. I would like it to be in its own node tree, which might have the top node headline of  __main__ or similar.  Also, a __main__ section can sometimes be longish, and if so it should be decomposed into nodes like any other part of the file.

Thanks for this report. It's on my own list of things to do in the post-pass.

Edward

Edward K. Ream

unread,
Dec 8, 2021, 9:56:08 AM12/8/21
to leo-editor
On Wednesday, December 8, 2021 at 8:43:45 AM UTC-6 Edward K. Ream wrote:

At rev 7b109931aa, the importer crashes for 81 mypy files.  There is no indication elsewhere that the crash has occurred. There probably should be.

All 81 failures have the same underlying cause, namely calls to tokenenize.generate_tokens. There are two such calls in the new importer. Depending on the source code, the calls raise either IndentationError or tokenize.TokenError.

Edward

vitalije

unread,
Dec 8, 2021, 11:52:38 AM12/8/21
to leo-editor
Today, I've completely rewritten split_root function. This was necessary to allow importing files from mypy repository.
Rev ee7ff9c51f8a7 successfully imports all python files from mypy repository except one. One of the files (misc/upload-pypi.py) has a line in the comments that looks like section reference and Leo can't produce same output as is the input, because the section is missing.

Regarding your condition that new importer must use vnode_info, IMHO that would be just a make-work. There is no need for any post pass because new importer deals with decorators and comment lines as well as blank lines, and it adds at-language and at-tabwidth directives.

I plan only to clean up a little this new version and to add some comments and explanations. In case you find some other input that new importer can't import, I'll try my best to fix the issue. But I don't plan to use vnode_info. If you really wish it, you can add it yourself. It would be easy to reverse engineer vnode_info from the resulting outline, but I have zero interest doing that.

I'll write an ENB on the subject when I finish cleaning and documenting code.

Edward K. Ream

unread,
Dec 8, 2021, 1:47:09 PM12/8/21
to leo-editor
On Wed, Dec 8, 2021 at 10:52 AM vitalije <vita...@gmail.com> wrote:
Today, I've completely rewritten split_root function. This was necessary to allow importing files from mypy repository.
Rev ee7ff9c51f8a7 successfully imports all python files from mypy repository except one. One of the files (misc/upload-pypi.py) has a line in the comments that looks like section reference and Leo can't produce same output as is the input, because the section is missing.

Excellent.  Well done.

Regarding your condition that new importer must use vnode_info, IMHO that would be just a make-work. There is no need for any post pass because new importer deals with decorators and comment lines as well as blank lines, and it adds at-language and at-tabwidth directives.

I'll agree with you for now. We can revisit this question later if need be.
I plan only to clean up a little this new version and to add some comments and explanations. In case you find some other input that new importer can't import, I'll try my best to fix the issue.

Let me know when you want to merge the importers branch into devel.

But I don't plan to use vnode_info. If you really wish it, you can add it yourself. It would be easy to reverse engineer vnode_info from the resulting outline, but I have zero interest doing that.

No problem. I think the issue will remain moot.
I'll write an ENB on the subject when I finish cleaning and documenting code.

Thanks. I'll look forward to it.

Edward

Edward K. Ream

unread,
Dec 8, 2021, 2:10:48 PM12/8/21
to leo-editor
On Wednesday, December 8, 2021 at 10:52:38 AM UTC-6 vitalije wrote:
Today, I've completely rewritten split_root function. This was necessary to allow importing files from mypy repository.
Rev ee7ff9c51f8a7 successfully imports all python files from mypy repository except one.

I've just imported all the files. The code looks excellent!!

There is only one easy change I would like to see.  I would like headlines for the top-level class node to start with 'class '.

Everything else looks perfect.

Edward

vitalije

unread,
Dec 8, 2021, 3:14:11 PM12/8/21
to leo-editor
Done at 117a3bb52.

It adds support for new configuration value `@bool put-class-in-imported-headlines`
if this flag is True, then all imported class nodes have prepended `class ` to their
headline.

Edward K. Ream

unread,
Dec 8, 2021, 3:17:16 PM12/8/21
to leo-editor
On Wed, Dec 8, 2021 at 2:14 PM vitalije <vita...@gmail.com> wrote:
Done at 117a3bb52.

It adds support for new configuration value `@bool put-class-in-imported-headlines`
if this flag is True, then all imported class nodes have prepended `class ` to their
headline.

Many thanks for this. I have just approved the PR.

Edward

vitalije

unread,
Dec 8, 2021, 3:21:15 PM12/8/21
to leo-editor
I've just merged PR 2353 in the devel

vitalije

unread,
Dec 8, 2021, 3:21:47 PM12/8/21
to leo-editor
And the ENB is in another thread.

Edward K. Ream

unread,
Dec 8, 2021, 3:22:29 PM12/8/21
to leo-editor
On Wed, Dec 8, 2021 at 2:21 PM vitalije <vita...@gmail.com> wrote:
I've just merged PR 2353 in the devel

Excellent. 6.6 b2, cominging in a few days, will contain this work.

Edward
Reply all
Reply to author
Forward
0 new messages