For the last few days I've been working on fixes and enhancements to
Pythoscope, so it is able to run a sample project and report basic
statistics regarding quality of the generated test suite, as described
in this blueprint:
https://blueprints.launchpad.net/pythoscope/+spec/open-source-project-coverage
Code currently resides in reverend-fixes branch on Launchpad:
https://code.launchpad.net/~ruby/pythoscope/reverend-fixes
The blueprint has been implemented as a script (see
tools/gather-metrics.py). I've chosen Reverend
(http://divmod.org/trac/wiki/DivmodReverend) as the sample project.
The script takes a source tar of Reverend, unpacks it, does pythoscope
--init, adds two points of entry, and tells Pythoscope to generate
tests for the reverend/thomas.py module. After the generation is done,
nose is run on those test cases. The number of generated test cases is
reported along with coverage info.
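For anyone who hasn't opened the script, the overall flow could be sketched roughly like this. This is a simplified sketch, not the actual tools/gather-metrics.py; the exact pythoscope invocations and the helper names are my assumptions:

```python
import re
import subprocess
import tarfile

def unpack(source_tar, dest="."):
    # Unpack the Reverend source tarball into the working directory.
    with tarfile.open(source_tar) as tar:
        tar.extractall(dest)

def generate_tests(project_dir):
    # Initialize Pythoscope in the project, then generate tests for one module.
    subprocess.check_call(["pythoscope", "--init"], cwd=project_dir)
    # The two points of entry would be dropped into
    # .pythoscope/points-of-entry/ at this step.
    subprocess.check_call(["pythoscope", "reverend/thomas.py"], cwd=project_dir)

def count_test_cases(nose_output):
    # Pull the test count out of nose's summary line,
    # e.g. "Ran 35 tests in 0.421s".
    match = re.search(r"Ran (\d+) tests?", nose_output)
    return int(match.group(1)) if match else 0
```

The coverage figure would come from running nose with its coverage plugin enabled and parsing that report in a similar way.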
Results for the reverend-fixes branch:
35 test cases:
19 passing
0 failing
16 stubs
69% coverage
The points of entry have been taken verbatim from Reverend's README
and home page.
I'm not sure how to go about tracking those reports over time.
First, current Reverend test is pretty narrow, so some more work is
needed here. A more involved point of entry could be written, which
would touch more pieces of the project. There's also a Tk user
interface, which could be used to drive more exploratory test
generation. Another idea is to look at a different open source project
altogether.
Second, I don't think I need a fancy reporting app for now; I'm fine
with manually running this from time to time. Since everything is in a
repository anyway, we'll be able to generate historical data once we
need it.
Thoughts?
mk
Very cool!
>
> 35 test cases:
> 19 passing
> 0 failing
> 16 stubs
> 69% coverage
>
That seems pretty impressive for something done without any additional
analysis work. 69% coverage already!
> First, current Reverend test is pretty narrow, so some more work is
> needed here. A more involved point of entry could be written, which
> would touch more pieces of the project. There's also a Tk user
> interface, which could be used to drive more exploratory test
> generation. Another idea is to look at a different open source project
> altogether.
I think doing some exploratory function testing with the GUI as
pythoscope is running would be really cool and illustrative. Only
after we see how successful that is would I elaborate on entry points.
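For what it's worth, the mechanism that makes GUI-driven exploratory generation possible is ordinary execution tracing: while the application runs, a trace hook records the calls that can later be turned into test cases. A minimal sketch of the idea, not Pythoscope's actual implementation:

```python
import sys

calls = []

def tracer(frame, event, arg):
    # Record the name of every Python function called while tracing is
    # active; a dynamic test generator records this kind of information
    # (plus arguments and return values) to build test cases from.
    if event == "call":
        calls.append(frame.f_code.co_name)
    return None

def greet(name):
    return "Hello, " + name

sys.settrace(tracer)
greet("world")      # this call gets recorded by the tracer
sys.settrace(None)  # stop tracing
```

Clicking through the Tk interface would simply exercise many more code paths while such a hook is installed.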
> Second, I don't think I need a fancy reporting app for now, I'm fine
> with manually running this from time to time. Since everything is in a
> repository anyway, we'll be able to generate historical data once we
> need them.
Sounds good to me. We want to be careful to occasionally spot-check
the generated tests as well. Coverage metrics can provide only two
answers to "Is a test suite excellent?": "No" and "Maybe". As long as
we have the historical data we can go back and do this type of
analysis.
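A toy illustration of why coverage alone can only answer "No" or "Maybe" (hypothetical code, nothing to do with Reverend): a test can reach 100% line coverage while asserting nothing, so high coverage can rule a suite out but never certify it.

```python
def absolute(x):
    # Deliberately buggy: negative inputs are returned unchanged
    # instead of negated.
    return x

def test_absolute():
    # Executes every line of absolute(), so coverage reports 100%.
    # But with no assertions, the bug for negative inputs goes unnoticed.
    absolute(5)
    absolute(-5)
```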
> Thoughts?
>
> mk
This is fantastic, thanks Michał!
Ryan
Although I must note it's only for the reverend/thomas.py module.
Since there are no entry points that use other modules, the overall
(project-wide) coverage would be much lower. That's where the
exploratory testing comes in.
> I think doing some exploratory function testing with the GUI as
> pythoscope is running would be really cool and illustrative. Only
> after we see how successful that is would I elaborate on entry points.
OK, then I think a screencast is in order. I'll have to tweak
Pythoscope some more for that to be possible, so it may take some
time.
>> Second, I don't think I need a fancy reporting app for now, I'm fine
>> with manually running this from time to time. Since everything is in a
>> repository anyway, we'll be able to generate historical data once we
>> need them.
>
> Sounds good to me. We want to be careful to occasionally spot check
> the tests generated as well. Coverage metrics can provide only two
> answers to "Is a test suite excellent?": "No" and "Maybe". As long as
> we have the historical data we can go back and do this type of
> analysis
You're right, what the tests look like is also very important. I can
post the detailed coverage info and the test module somewhere if you
want to check them out.
>> Thoughts?
>
> This is fantastic, thanks Michał!
:-)
Cheers,
mk
Very nice. Since I went to Titus' talk about continuous integration at PyCon I've been playing with Hudson a lot. I set it up at work to run my unit tests, coverage.py, and pylint on every commit on all my projects. It keeps a nice history. I'm not sure it can produce the fancy graphs you may want, but if you decide you want them it would be worth looking at.
On Sun, Mar 14, 2010 at 4:39 PM, Paul Hildebrandt <Paul.Hil...@disneyanimation.com> wrote:
> Very nice. Since I went to Titus' talk about continuous integration at
> PyCon I've been playing with Hudson a lot. I set it up at work to run
> my unit tests, coverage.py, and pylint on every commit on all my
> projects. It keeps a nice history. I'm not sure it can produce the
> fancy graphs you may want, but if you decide you want them it would be
> worth looking at.
Thanks for the suggestion. Right now I'm more concerned with the general problem of how to assess the quality of the generated tests. Dynamic inspection is very project-specific and depends strongly on code that has been executed, i.e. good points of entry or a good exploratory testing session will yield good results. OTOH tests generated based on static inspection are not very interesting, although there are some improvements waiting to be done in this area as well.
I guess we're stuck with manual assessment for now. That means Pythoscope needs more users. :-) I have to write a blog post or two to encourage people to check it out.
--
Cheers,
mk
On 3/14/2010 9:25 AM, Michał Kwiatkowski wrote:
> Thanks for the suggestion. Right now I'm more concerned with the
> general problem of how to assess the quality of the generated tests.
> Dynamic inspection is very project-specific and depends strongly on
> code that has been executed, i.e. good points of entry or a good
> exploratory testing session will yield good results. OTOH tests
> generated based on static inspection are not very interesting,
> although there are some improvements waiting to be done in this area
> as well.
When you say "quality of the generated tests" I want to make sure I understand what you mean. Are you talking about code style, clarity, coverage of source code, percent of side effects covered, or something else?
> I guess we're stuck with manual assessment for now. That means
> Pythoscope needs more users. :-) I have to write a blog post or two to
> encourage people to check it out.
Good idea. Well, to that end I presented Pythoscope at the Southern California Python Users group on March 11th. There were about 20 people there, and several were interested in trying it. You might have noticed the spike of access on the web site on that date.