Testing Pythoscope on the Reverend project

Michał Kwiatkowski

Mar 13, 2010, 12:03:34 PM
to pytho...@googlegroups.com
Hi group,

For the last few days I've been working on fixes and enhancements to
Pythoscope, so that it can run on a sample project and report basic
statistics regarding the quality of the generated test suite, as described
in this blueprint:

https://blueprints.launchpad.net/pythoscope/+spec/open-source-project-coverage

The code currently resides in the reverend-fixes branch on Launchpad:

https://code.launchpad.net/~ruby/pythoscope/reverend-fixes

The blueprint has been implemented as a script (see
tools/gather-metrics.py). I've chosen Reverend
(http://divmod.org/trac/wiki/DivmodReverend) as the sample project.

The script takes a source tarball of Reverend, unpacks it, runs
pythoscope --init, adds two points of entry, and tells Pythoscope to
generate tests for the reverend/thomas.py module. After the generation
is done, nose is run on those test cases. The number of generated test
cases is reported, along with coverage info.
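
For reference, the sequence of steps looks roughly like this (just a
sketch, not the actual tools/gather-metrics.py; the tarball name,
directory layout and nose flags are assumptions, and the
statistics-gathering part is left out):

# Illustration only, not the real gather-metrics.py; the file names
# and flags below are assumptions.
import subprocess
import tarfile

def run(*command):
    # Run an external command and abort if it fails.
    subprocess.check_call(list(command))

# 1. Unpack the Reverend source tarball (file name made up).
tarfile.open("Reverend-0.4.tar.gz").extractall(".")
project = "Reverend-0.4"

# 2. Initialize Pythoscope inside the unpacked project.
run("pythoscope", "--init", project)

# 3. The points of entry are dropped into .pythoscope/points-of-entry/
#    (omitted here), then tests are generated for the thomas module.
run("pythoscope", project + "/reverend/thomas.py")

# 4. Run the generated tests under nose with coverage enabled.
run("nosetests", "--with-coverage", "--cover-package=reverend", project)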

Results for the reverend-fixes branch:

35 test cases:
  19 passing
  0 failing
  16 stubs
69% coverage

The points of entry have been taken verbatim from Reverend's README
and home page.
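
They boil down to the classic Bayes classifier snippet, dropped into
the .pythoscope/points-of-entry/ directory as small scripts, along
these lines (quoting from memory, so the file name and exact training
strings are only illustrative):

# .pythoscope/points-of-entry/train_and_guess.py  (file name made up)
# Train the Bayesian classifier on two languages, then ask for a guess.
# Pythoscope executes this file and records the calls it observes.
from reverend.thomas import Bayes

guesser = Bayes()
guesser.train('french', 'le la les du un une je il elle de en')
guesser.train('english', 'the it she he they them are were to')
guesser.guess('they went to the shop')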

I'm not sure how to go about tracking those reports over time.

First, the current Reverend test is pretty narrow, so some more work is
needed here. A more involved point of entry could be written, which
would touch more pieces of the project. There's also a Tk user
interface, which could be used to drive more exploratory test
generation. Another idea is to look at a different open source project
altogether.

Second, I don't think I need a fancy reporting app for now; I'm fine
with manually running this from time to time. Since everything is in a
repository anyway, we'll be able to generate historical data once we
need them.

Thoughts?

mk

Ryan Freckleton

Mar 13, 2010, 5:58:44 PM
to pytho...@googlegroups.com
2010/3/13 Michał Kwiatkowski <consta...@gmail.com>:

> Hi group,
>
> For the last few days I've been working on fixes and enhancements to
> Pythoscope, so that it can run on a sample project and report basic
> statistics regarding the quality of the generated test suite, as described
> in this blueprint:

Very cool!

>
> 35 test cases:
>  19 passing
>  0 failing
>  16 stubs
> 69% coverage
>

That seems pretty impressive for something done without any additional
analysis work. 69% coverage already!

> First, the current Reverend test is pretty narrow, so some more work is
> needed here. A more involved point of entry could be written, which
> would touch more pieces of the project. There's also a Tk user
> interface, which could be used to drive more exploratory test
> generation. Another idea is to look at a different open source project
> altogether.

I think doing some exploratory function testing with the GUI as
pythoscope is running would be really cool and illustrative. Only
after we see how successful that is would I elaborate on entry points.

> Second, I don't think I need a fancy reporting app for now; I'm fine
> with manually running this from time to time. Since everything is in a
> repository anyway, we'll be able to generate historical data once we
> need them.

Sounds good to me. We want to be careful to occasionally spot-check
the generated tests as well. Coverage metrics can provide only two
answers to "Is a test suite excellent?": "No" and "Maybe". As long as
we have the historical data we can go back and do this type of
analysis.
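
(To illustrate: a contrived test like the one below, which I just made
up, would bump Reverend's coverage numbers without verifying anything
at all.)

import unittest
from reverend.thomas import Bayes

class TestBayesCoverageOnly(unittest.TestCase):
    def test_train_runs(self):
        # Executes train(), so its lines count as covered, but asserts
        # nothing, so it can never catch a wrong answer.
        Bayes().train('english', 'the it she he they them are were to')

if __name__ == '__main__':
    unittest.main()

Coverage alone can't tell a test like that apart from a real one,
which is why the occasional spot check matters.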

> Thoughts?
>
> mk

This is fantastic, thanks Michał!

Ryan

Michał Kwiatkowski

Mar 14, 2010, 4:08:40 AM
to pytho...@googlegroups.com
On Sat, Mar 13, 2010 at 11:58 PM, Ryan Freckleton
<ryan.fr...@gmail.com> wrote:
>> 35 test cases:
>>  19 passing
>>  0 failing
>>  16 stubs
>> 69% coverage
>
> That seems pretty impressive for something done without any additional
> analysis work. 69% coverage already!

I should note, though, that it's only for the reverend/thomas.py module.
Since there are no entry points that use other modules, the overall
(project-wide) coverage would be much lower. That's where the
exploratory testing comes in.

> I think doing some exploratory function testing with the GUI as
> pythoscope is running would be really cool and illustrative. Only
> after we see how successful that is would I elaborate on entry points.

OK, then I think a screencast is in order. I'll have to tweak
Pythoscope some more for that to be possible, so it may take some
time.

>> Second, I don't think I need a fancy reporting app for now; I'm fine
>> with manually running this from time to time. Since everything is in a
>> repository anyway, we'll be able to generate historical data once we
>> need them.
>
> Sounds good to me. We want to be careful to occasionally spot-check
> the generated tests as well. Coverage metrics can provide only two
> answers to "Is a test suite excellent?": "No" and "Maybe". As long as
> we have the historical data we can go back and do this type of
> analysis.

You're right, what the tests look like is also very important. I can
post the detailed coverage info and the test module somewhere if you
want to check them out.

>> Thoughts?


>
> This is fantastic, thanks Michał!

:-)

Cheers,
mk

Paul Hildebrandt

Mar 14, 2010, 11:39:22 AM
to pytho...@googlegroups.com
Very nice. Since I went to Titus' talk about continuous integration at PyCon I've been playing with Hudson a lot. I set it up at work to run my unit tests, coverage.py, and pylint on all commits on all my projects. It keeps a nice history. I'm not sure if it can produce the fancy graphs you may want, but if you decide you want them it would be worth looking at.

Paul


Michał Kwiatkowski

Mar 14, 2010, 12:25:04 PM
to pytho...@googlegroups.com
On Sun, Mar 14, 2010 at 4:39 PM, Paul Hildebrandt <Paul.Hil...@disneyanimation.com> wrote:
> Very nice. Since I went to Titus' talk about continuous integration at PyCon I've been playing with Hudson a lot. I set it up at work to run my unit tests, coverage.py, and pylint on all commits on all my projects. It keeps a nice history. I'm not sure if it can produce the fancy graphs you may want, but if you decide you want them it would be worth looking at.

Thanks for the suggestion. Right now I'm more concerned with the general problem of how to assess the quality of the generated tests. Dynamic inspection is very project-specific and depends strongly on code that has been executed, i.e. good points of entry or a good exploratory testing session will yield good results. OTOH tests generated based on static inspection are not very interesting, although there are some improvements waiting to be done in this area as well.

I guess we're stuck with manual assessment for now. That means Pythoscope needs more users. :-) I have to write a blog post or two to encourage people to check it out.

Cheers,
mk

Paul Hildebrandt

Mar 14, 2010, 1:55:07 PM
to pytho...@googlegroups.com
On 3/14/2010 9:25 AM, Michał Kwiatkowski wrote:
> On Sun, Mar 14, 2010 at 4:39 PM, Paul Hildebrandt <Paul.Hil...@disneyanimation.com> wrote:
>> Very nice. Since I went to Titus' talk about continuous integration at PyCon I've been playing with Hudson a lot. I set it up at work to run my unit tests, coverage.py, and pylint on all commits on all my projects. It keeps a nice history. I'm not sure if it can produce the fancy graphs you may want, but if you decide you want them it would be worth looking at.
>
> Thanks for the suggestion. Right now I'm more concerned with the general problem of how to assess the quality of the generated tests. Dynamic inspection is very project-specific and depends strongly on code that has been executed, i.e. good points of entry or a good exploratory testing session will yield good results. OTOH tests generated based on static inspection are not very interesting, although there are some improvements waiting to be done in this area as well.

When you say "quality of generated tests", I want to make sure I understand what you mean. Are you talking about code style, clarity, coverage of source code, percent of side effects covered, or something else?

> I guess we're stuck with manual assessment for now. That means Pythoscope needs more users. :-) I have to write a blog post or two to encourage people to check it out.

Good idea. Well, to that end I presented Pythoscope at the Southern California Python Users Group on March 11th. There were about 20 people there, and several of them were interested in trying it. You might have noticed the spike in accesses on the web site on that date.


Cheers,
mk

Michał Kwiatkowski

Mar 14, 2010, 2:38:08 PM
to pytho...@googlegroups.com
On Sun, Mar 14, 2010 at 6:55 PM, Paul Hildebrandt <Paul.Hil...@disneyanimation.com> wrote:
> On 3/14/2010 9:25 AM, Michał Kwiatkowski wrote:
>> Thanks for the suggestion. Right now I'm more concerned with the general problem of how to assess the quality of the generated tests. Dynamic inspection is very project-specific and depends strongly on code that has been executed, i.e. good points of entry or a good exploratory testing session will yield good results. OTOH tests generated based on static inspection are not very interesting, although there are some improvements waiting to be done in this area as well.
>
> When you say "quality of generated tests", I want to make sure I understand what you mean. Are you talking about code style, clarity, coverage of source code, percent of side effects covered, or something else?

Yes, all of those, and more: the number of test cases generated, the length of those test cases, how helpful the stubs are with their template code in comments, whether the test cases are put in the right place, whether there are many duplicates, and so on. It's actually the heart of the problem I'm facing: how to assess the quality of a test suite, both in general and in the context of test cases generated by an automated tool. In other words, I'm trying to find quality metrics for unit test suites, so that eventually I can apply them to Pythoscope.

One design principle of Pythoscope is to generate test cases that are readable to a human and resemble manually written unit tests as much as possible, so the developer feels right at home when she wants to modify them. It also shows the right way to do things for people who want to start unit testing their projects. This principle is visible in such areas as the naming of test cases and the whole approach of using unit tests instead of something else, like data-driven tests. While I feel it's an important feature, it's really hard to measure.

That's why I think that, for the time being, I need community guidance on what Pythoscope does right and what it does wrong. Of course I'll continue improving the inspection and generation capabilities, but I could really use some help on the usability front. My recent experience with Reverend has been revealing, but I'm well aware of how diverse the Python ecosystem is, and there's simply no way for a single person to know even half of it. I don't expect to get most of this stuff right the first time, so the more feedback I get, the better. What I'm trying to say is that in order for Pythoscope to move forward, it needs both core development and people actually using (and breaking) it.

The feedback I've gotten so far from people like Paul, Pieter, or Ryan has been extremely helpful, and I'm very grateful for it. I think Paul's code snippet idea, which I implemented some time ago, can really make it easier for people to check out dynamic inspection and thus show the real strengths of Pythoscope. I need more ideas like this. :-) As the author of Pythoscope I'm at a disadvantage here: I know exactly how it works, so I'm blind to some of the issues.

>> I guess we're stuck with manual assessment for now. That means Pythoscope needs more users. :-) I have to write a blog post or two to encourage people to check it out.

> Good idea. Well, to that end I presented Pythoscope at the Southern California Python Users Group on March 11th. There were about 20 people there, and several of them were interested in trying it. You might have noticed the spike in accesses on the web site on that date.

This is great, thanks! Do you have any slides from the presentation that we could put on pythoscope.org?

Cheers,
mk