Statement coverage is the weakest measure of code coverage. It can't tell you when an if statement is missing an else clause ("branch coverage"); when a condition is only tested in one direction ("condition coverage"); when a loop is always taken and never skipped ("loop coverage"); and so on. See [Kaner 2000-10-17] <URL:http://www.kaner.com/pnsqc.html> for a summary of test coverage measures.
So measuring "coverage of executed statements" wrongly reports complete coverage for an inline branch like 'foo if bar else baz', a 'while' statement, or a 'lambda' expression. Coverage is reported as complete if these statements are executed at all, but nothing checks the 'else' clause, the "no iterations" case, or the actual code inside the lambda expression.
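To make the problem concrete, here is a minimal sketch of line-level measurement using the standard `sys.settrace` hook. It is not coverage.py's own code, and `pick` and `executed_lines` are hypothetical names. Each call runs a different arm of the conditional expression, yet both record the same executed line, so a line-based report cannot tell the two cases apart.

```python
import sys

def pick(flag):
    return "yes" if flag else "no"  # one line holding two branches

def executed_lines(func, *args):
    """Run func(*args); return line offsets (relative to its `def`) that fired 'line' events."""
    code = func.__code__
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code is code and event == "line":
            lines.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep tracing inside the new frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return lines

taken = executed_lines(pick, True)       # only the "yes" arm runs
not_taken = executed_lines(pick, False)  # only the "no" arm runs
print(taken == not_taken)  # True: the line report is identical either way
```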
What approach could we take to improve 'coverage.py' such that it *can* instrument and report on all branches within the written code module, including those hidden inside multi-part statements?
-- 
"Technology is neither good nor bad; nor is it neutral."  -Melvin Kranzberg's First Law of Technology
Ben Finney
Well, having used it for Python FIT, I've looked at some of its deficiencies. Not enough to do anything about them (although I did submit a patch to a different coverage tool), but enough to come to a few conclusions.
There are two primary limitations: first, it runs off the debug or trace hooks in the Python interpreter, and second, it has lots of small problems caused by inconsistencies in the way the compiler tools generate parse trees.
It's not as if there are a huge number of ways to do coverage. At the low end, you just count the number of times each specific point is hit, and then analyze the counts.
At the high end, you write a trace to disk, and analyze that.
Likewise, at the low end you take advantage of existing hooks, like Python's debug and trace hooks; at the high end you instrument the program yourself, either by rewriting it to put trace or count statements everywhere, or by modifying the bytecode to do the same thing.
If I were going to do it, I'd start by recognizing that Python doesn't have hooks where I need them, and it doesn't have a bytecode dedicated to a debugging hook (I think). In other words, the current coverage.py tool is getting the most out of the available hooks: the ones we really need just aren't there.
I'd probably opt to rewrite the programs (automatically, of course) to add instrumentation statements. Then I could wallow in data to my heart's content.
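As a rough sketch of that rewriting approach, assuming a modern Python 3 and its standard `ast` module (not the 2007-era tooling discussed here), the transformer below puts a counting call in front of every statement. The `_hit` helper, the `HITS` table and `InsertProbes` are hypothetical names.

```python
import ast
from collections import Counter

HITS = Counter()  # hypothetical hit table: source line number -> times reached

def _hit(lineno):
    """Probe called by the rewritten code just before each original statement."""
    HITS[lineno] += 1

class InsertProbes(ast.NodeTransformer):
    """Put a `_hit(<lineno>)` call in front of every statement in every statement list."""

    def generic_visit(self, node):
        super().generic_visit(node)  # rewrite nested blocks first
        for field in ("body", "orelse", "finalbody"):
            stmts = getattr(node, field, None)
            # Lambda and IfExp also have `body`/`orelse`, but those are expressions: skip them.
            if not (isinstance(stmts, list) and stmts and isinstance(stmts[0], ast.stmt)):
                continue
            probed = []
            for stmt in stmts:
                call = ast.Call(ast.Name("_hit", ast.Load()), [ast.Constant(stmt.lineno)], [])
                probed.append(ast.copy_location(ast.Expr(call), stmt))
                probed.append(stmt)
            setattr(node, field, probed)
        return node

SOURCE = """
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

classify(5)
"""

tree = ast.fix_missing_locations(InsertProbes().visit(ast.parse(SOURCE)))
exec(compile(tree, "<instrumented>", "exec"), {"_hit": _hit})

all_lines = {2, 3, 4, 5, 7}  # statement lines in SOURCE, listed by hand
print(sorted(all_lines - set(HITS)))  # [4]: the "negative" branch never ran
```

A real tool would also have to leave docstrings and `from __future__` imports in first position, which this sketch does not do.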
One last little snark: how many of us keep our statement coverage above 95%? Statement coverage may be the weakest form of coverage, but it's also the simplest to handle.
John Roth <JohnRo...@jhrothjr.com> writes:
> On Oct 28, 4:56 pm, Ben Finney <b...@benfinney.id.au> wrote:
> > What approach could we take to improve 'coverage.py' such that it *can* instrument and report on all branches within the written code module, including those hidden inside multi-part statements?
> If I was going to do it, I'd start by recognizing that Python doesn't have hooks where I need them, and it doesn't have a byte code dedicated to a debugging hook (I think).
Is this something that Python could be improved by adding? Perhaps there's a PEP in this.
> One last little snark: how many of us keep our statement coverage above 95%? Statement coverage may be the weakest form of coverage, but it's also the simplest to handle.
Yes, I have several projects where statement coverage of unit tests is 98% or above. The initial shock of running 'coverage.py' is in seeing just how low one's coverage actually is; but it helpfully points out the exact line numbers of the statements that were not tested.
Once you're actually measuring coverage as part of the development process (e.g. set up a rule so 'make coverage' does it automatically), it's pretty easy to see the holes in coverage and either write the missing unit tests or (even better) refactor the code so the redundant statements aren't there at all.
-- 
"I'd like to see a nude opera, because when they hit those high notes, I bet you can really see it in those genitals."  -- Jack Handey
Ben Finney
I once wrote a coverage tool (maybe I can factor it out of my tool suite some time) that could possibly be made transformative. Currently it generates measurement code for statement coverage only, and I'm not sure it can do more than coverage.py: I was primarily interested in the code generation and monitoring process, so I didn't compare them.
Given its nature, it could also rewrite the code it measures. So a statement:
    if a and b:
        BLOCK

can be transformed into

    if a:
        if b:
            BLOCK

Also

    if a or b:
        BLOCK

might be transformed into

    if a:
        BLOCK
    elif b:
        BLOCK
So boolean predicates are turned into statements, and statement coverage keeps up. This is also close to the way the bytecode expresses "and" and "or" predicates, using jumps. I'm not sure about expressions yet, since I cared about traces rather than about how expressions are evaluated.
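Here is a minimal sketch of the `or` rewrite, assuming Python 3.9+ (for `ast.unparse`) and the standard `ast` module. `SplitOr` is a hypothetical name and this is not Kay's tool. Each operand gets its own `if`/`elif` arm holding a copy of the body, so a statement-level tool can see which operand made the condition true. Evaluation order and short-circuiting are preserved, because `b` is only tested when `a` was false.

```python
import ast
import copy

class SplitOr(ast.NodeTransformer):
    """Rewrite `if a or b ...: BODY [else: ELSE]` into an if/elif chain."""

    def visit_If(self, node):
        self.generic_visit(node)  # rewrite nested ifs first
        test = node.test
        if not (isinstance(test, ast.BoolOp) and isinstance(test.op, ast.Or)):
            return node
        # Build the chain from the last operand back to the first: the last arm keeps
        # the original else body, each earlier arm falls through to the next one.
        chain = node.orelse
        for operand in reversed(test.values):
            arm = ast.If(test=operand, body=copy.deepcopy(node.body), orelse=chain)
            chain = [ast.copy_location(arm, node)]
        return chain[0]

source = "if a or b:\n    hit()\n"
tree = ast.fix_missing_locations(SplitOr().visit(ast.parse(source)))
print(ast.unparse(tree))
# if a:
#     hit()
# elif b:
#     hit()
```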
The underlying monitoring technology needs to be improved. I used a similar approach for an even more interesting purpose: feeding runtime type information back into a cloned parse tree of the original, which could then be unparsed into type-annotated source code after the program has run. But that's another issue.
The basic idea behind all of these monitors is simple: implement an identity function with a side effect. I'm not sure how this monitoring code interferes with deeper reflection (stack-trace inspection etc.).
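The "identity function with a side effect" can be sketched in a few lines. `probe`, `OUTCOMES` and `check` are hypothetical names, and the probe calls are inserted by hand here where a source transformer would insert them automatically. Because `probe` returns its argument unchanged, the program computes the same result, while the table records which truth values each operand actually produced.

```python
from collections import defaultdict

OUTCOMES = defaultdict(set)  # hypothetical table: probe id -> truth values observed

def probe(probe_id, value):
    """Identity function with a side effect: record the truthiness, return value unchanged."""
    OUTCOMES[probe_id].add(bool(value))
    return value

def check(a, b):
    # Hand-instrumented version of `return a and b`.
    return probe("a", a) and probe("b", b)

check(True, True)
check(False, True)  # short-circuits: probe("b", ...) never runs
print(dict(OUTCOMES))  # {'a': {False, True}, 'b': {True}}
```

The missing `False` for `b` is the untested direction of the condition, which a line-based report cannot show.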
Kay Schluehr <kay.schlu...@gmx.net> writes:
> I used to write once a coverage tool ( maybe I can factor this out of my tool suite some time )
That'd be wonderful. I'd like to see comparisons between different test-coverage tools, just as we have the different but comparable 'pyflakes' and 'pylint' code inspection tools.
> Given it's nature it might act transformative. So a statement:
>
>     if a and b:
>         BLOCK
>
> can be transformed into
>
>     if a:
>         if b:
>             BLOCK
I don't see that this actually helps in the cases described in the original post. The missing check isn't "are both sides of an 'and' or 'or' expression evaluated", since that's the job of the language runtime, and is outside the scope of our unit tests.
What needs to be tested is "do the tests execute both the 'true' and 'false' branches of this 'if' statement", or "do the tests exercise the 'no iterations' case for this loop", et cetera. That is, whether all the functional branches are exercised by tests, not whether the language is parsed correctly.
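One way to catch the "no iterations" case, in the same identity-with-side-effect spirit, is to wrap a loop's iterable in a generator that passes items through unchanged and records how many it produced. `probe_iter`, `ITERATIONS` and `total` are hypothetical names; this is a sketch, not an existing tool's API.

```python
from collections import defaultdict

ITERATIONS = defaultdict(list)  # hypothetical table: loop id -> iteration count per run

def probe_iter(loop_id, iterable):
    """Yield items unchanged; when the loop finishes, record how many iterations ran."""
    count = 0
    for item in iterable:
        count += 1
        yield item
    ITERATIONS[loop_id].append(count)

def total(values):
    result = 0
    for v in probe_iter("total-loop", values):  # instrumented `for v in values:`
        result += v
    return result

total([1, 2, 3])
print(0 in ITERATIONS["total-loop"])  # False: the zero-iteration case is untested
total([])
print(0 in ITERATIONS["total-loop"])  # True
```

A loop that exits via `break` never exhausts the generator, so this sketch only records loops that run to completion.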
-- 
"Know what I hate most? Rhetorical questions."  -- Henry N. Camp
Ben Finney
You are right. I re-read my coverage tool documentation and also found the correct expansion for the statement

    if a and b:
        BLOCK

which is:

    if a:
        if b:
            BLOCK
        else:
            BLOCK
    else:
        BLOCK
This will cover all relevant traces. The general idea still holds.
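The expanded form can be produced mechanically with the standard `ast.NodeTransformer`. This is a sketch assuming Python 3.9+; `SplitAnd` is a hypothetical name. It assumes the `BLOCK` in each new `else` stands for the original statement's else body, falling back to `pass` when there is none, since running the if-body there would change the program's behavior.

```python
import ast
import copy

class SplitAnd(ast.NodeTransformer):
    """Rewrite `if a and b ...: BODY [else: ELSE]` into nested ifs, each with its own else."""

    def visit_If(self, node):
        self.generic_visit(node)  # rewrite nested ifs first
        test = node.test
        if not (isinstance(test, ast.BoolOp) and isinstance(test.op, ast.And)):
            return node
        # An empty else becomes `pass`, so every outcome gets a statement of its own.
        else_body = node.orelse or [ast.Pass()]
        inner = node.body
        for operand in reversed(test.values):
            branch = ast.If(test=operand, body=inner, orelse=copy.deepcopy(else_body))
            inner = [ast.copy_location(branch, node)]
        return inner[0]

source = "if a and b:\n    hit()\n"
tree = ast.fix_missing_locations(SplitAnd().visit(ast.parse(source)))
print(ast.unparse(tree))
# if a:
#     if b:
#         hit()
#     else:
#         pass
# else:
#     pass
```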
Note that I would like to see some kind of requirements specification (a PEP-style document) for the different purposes of coverage measurement, along with a test harness. I'm all for advancing Python and improving the code base deliberately, not just accidentally. In addition, something along the lines of an MVC framework would be nice, implementing the UI functions independently so that the basic coverage functionality can be factored out into components and improved separately. I don't think it's a good idea to have ten coverage tools that each handle presentation differently.
> I don't see that this actually helps in the cases described in the original post. The lack of coverage checking isn't "are both sides of an 'and' or 'or' expression evaluated", since that's the job of the language runtime, and is outside the scope of our unit test.
Since 'and' and 'or' are short-circuit evaluations, you do need something to determine if each piece was actually executed. Turning it into an if-else construct would do this nicely.
I don't know how to extend coverage.py to do more extensive checking, but I know it would be both difficult and fascinating. To help spur some thought, I've sketched out some problems with statement coverage: http://nedbatchelder.com/blog/20071030T084100.html
--Ned.