[gsoc2009-testing] Windmill Runners Kicking it up a Notch

Kevin Kubasik

Jun 27, 2009, 2:22:17 AM

to djang...@googlegroups.com, django-developers

So, up until now, most of the work on windmill hasn't exactly been 'tester ready': if you didn't use my exact incantation of settings, luck, module versions and love, you didn't get very far. I still haven't gotten real docs done (my original plan for the week), but I have learned a valuable lesson about assumptions. I foolishly assumed it would just work to split the WSGI handler into its own thread, introduce another server (windmill), and do my fixture loading in the primary thread. Surprisingly, it almost does, except with the sqlite3 + :memory: database. Obviously, as this is how 99% of Django tests are run, I had hit a snag. I believe the problem is solved, but it did put me a bit behind, because I have had to totally reconsider how the loaders and databases are handled. Namely, the following questions arose:
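For anyone who hasn't hit this before: with SQLite, every connection to `:memory:` gets its own private, empty database. A WSGI thread that opens its own connection therefore cannot see tables or fixtures created through the main thread's connection. The sketch below uses only the standard `sqlite3` module to show the effect. It also shows one possible workaround, a named in-memory database with a shared cache opened via an SQLite URI. This is an illustration under those assumptions, not necessarily how the branch solves it, and the database name `windmill_demo` is made up.

```python
import sqlite3


def private_memory_dbs_are_isolated():
    """Two plain ':memory:' connections do not share any data."""
    loader = sqlite3.connect(":memory:")   # e.g. the thread loading fixtures
    server = sqlite3.connect(":memory:")   # e.g. the WSGI server thread
    loader.execute("CREATE TABLE profile (name TEXT)")
    loader.execute("INSERT INTO profile VALUES ('kevin')")
    loader.commit()
    try:
        server.execute("SELECT name FROM profile").fetchall()
        return False  # the table was visible, so the databases are shared
    except sqlite3.OperationalError:
        return True   # "no such table": each connection has its own database
    finally:
        loader.close()
        server.close()


def shared_memory_db_rows():
    """A named in-memory database with a shared cache is visible to every
    connection in the same process while at least one connection is open."""
    uri = "file:windmill_demo?mode=memory&cache=shared"  # hypothetical name
    loader = sqlite3.connect(uri, uri=True)
    # check_same_thread=False lets this connection be handed to another thread.
    server = sqlite3.connect(uri, uri=True, check_same_thread=False)
    try:
        loader.execute("CREATE TABLE IF NOT EXISTS profile (name TEXT)")
        loader.execute("DELETE FROM profile")
        loader.execute("INSERT INTO profile VALUES ('kevin')")
        loader.commit()
        return server.execute("SELECT name FROM profile").fetchall()
    finally:
        server.close()
        loader.close()
```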

1. Should windmill tests default to non-isolated/non-transactional DB behavior? Basically, we are providing the means for functional tests, and these should be tests like 'Register and Complete a Profile' followed by 'Edit Profile'. We don't want that as one massive test, and shared state seems like it would be the expected behavior most of the time. Defaulting to it while still allowing specific tests to run in isolation therefore seems like the best move. However, this could be confusing, or just bad practice, so I wanted to get some feedback.
2. Scratch my #2: I caught one of the Windmill guys on IRC and got some good direction on the detection stuff.
3. What is the general interest in test-only models as a public API? The mechanics of it have been worked out for the regression suite, but the debate comes down to one of the following systems:

   * A class used instead of db.Model (db.TestModel)
   * A module in the app (test_models.py)
   * Similar to fixtures (a property on tests)
   * A settings option

4. I am assuming that code coverage of windmill tests isn't a particularly useful number, given the specialized execution paths etc. But I wanted to double-check that people wouldn't be surprised by that.
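To make the first question concrete, here is a minimal sketch using plain `sqlite3` rather than Django's test framework. It runs two ordered functional "tests" in one of two modes: with a rollback after each test (roughly what transactional test cases give you), or with state carried over from one test to the next. The table, the test names and `run_suite` are made up for illustration.

```python
import sqlite3


def register_and_complete_profile(conn):
    conn.execute("INSERT INTO profile (username, bio) VALUES ('kevin', '')")
    conn.execute("UPDATE profile SET bio = 'GSoC student' WHERE username = 'kevin'")


def edit_profile(conn):
    # Relies on the row created by the previous test.
    cur = conn.execute("UPDATE profile SET bio = 'Windmill hacker' WHERE username = 'kevin'")
    if cur.rowcount != 1:
        raise AssertionError("no profile to edit")


def run_suite(tests, isolated):
    """Run tests in order; if isolated, roll back whatever each test wrote."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE profile (username TEXT PRIMARY KEY, bio TEXT)")
    conn.commit()
    outcomes = {}
    for test in tests:
        try:
            test(conn)
            outcomes[test.__name__] = "ok"
        except AssertionError as exc:
            outcomes[test.__name__] = f"FAIL: {exc}"
        if isolated:
            conn.rollback()  # undo everything this test wrote
        else:
            conn.commit()    # keep the state for the next test
    conn.close()
    return outcomes
```

With `isolated=True`, 'Edit Profile' fails because the profile it expects was rolled back. That is the tension between flow-style functional tests and per-test isolation.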

Overall, there have been marked improvements in the runner state, and I've added some more tests as well. However, I am holding off on the really JS-intensive tests until the framework is rock solid. I wanted to get moving on documentation sooner rather than later, so I expect a bit more cleanup next week. That should make sure the elaborate charade we play (conning windmill into playing with us) is reliable for third-party applications as well.

The branch isn't really ready for testing yet, but it has been known to work. And Eric has kindly thrown up a coverage report!

http://media.ericholscher.com/django_coverage/

--
Kevin Kubasik
http://kubasik.net/blog

Alex Gaynor

Jun 27, 2009, 2:36:21 AM

to django-d...@googlegroups.com

Just a small note, but there seems to be an issue with the coverage, in that any module level statements aren't reported as being executed, such as imports, function definitions, or class definitions. That might be an issue with whatever does the coverage report though.

Alex

--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Kevin Kubasik

Jun 27, 2009, 3:18:46 AM

to django-d...@googlegroups.com

It's statement coverage, so it largely depends on what is called, and how.

http://nedbatchelder.com/code/coverage/faq.html

The above explains why we get that effect. The problem is that we don't want to start coverage much earlier, since we are doing identification of apps. It's never perfect, but it's more to give us an idea of where we have totally missed a code path; imports are easy to identify.

http://nedbatchelder.com/blog/200710/flaws_in_coverage_measurement.html

That was just my decision-making process, and I am completely open to moving things around if this is a serious problem. But what's generally a bit easier is just expanding our exclude statements. I've added some import excludes, which eliminate most of them; I'll commit tomorrow!
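To illustrate the effect: module-level statements only count as executed if the tracer is already running when the module is imported. The minimal sketch below uses the standard `sys.settrace` hook, the same kind of hook line-coverage tools rely on, instead of coverage.py itself. The in-memory "module" and the helper names are made up for the demonstration.

```python
import sys

# A tiny stand-in module: lines 1-2 run at import time, line 3 only when called.
SOURCE = (
    "import math\n"
    "def area(r):\n"
    "    return math.pi * r * r\n"
)
CODE = compile(SOURCE, "<demo_module>", "exec")


def make_tracer(hits):
    """Return a sys.settrace hook recording line numbers executed in CODE."""
    def tracer(frame, event, arg):
        if frame.f_code.co_filename != "<demo_module>":
            return None  # don't trace unrelated frames
        if event == "line":
            hits.add(frame.f_lineno)
        return tracer
    return tracer


def measure(start_before_import):
    hits = set()
    namespace = {}
    if not start_before_import:
        exec(CODE, namespace)      # the "import" happens before tracing starts
    sys.settrace(make_tracer(hits))
    try:
        if start_before_import:
            exec(CODE, namespace)  # module-level statements are traced too
        namespace["area"](2.0)     # the test exercising the function
    finally:
        sys.settrace(None)
    return hits
```

With `measure(False)` only the function body (line 3) is recorded. With `measure(True)` the `import` and `def` lines appear as well. This is why starting measurement before the app modules are imported avoids the gap.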

Ben Ford

Jun 27, 2009, 5:22:13 AM

to django-d...@googlegroups.com

Hi Kevin,

I haven't had a chance to really read and digest this, but you might want to check out fixture (http://farmdev.com/projects/fixture/) and its Django support branch (http://bitbucket.org/kumar303/fixture/) for loading fixtures for tests. Fixture might give you more options for loading test data.

Cheers,
Ben

--
Regards,
Ben Ford
ben.f...@gmail.com
+447792598685

Russell Keith-Magee

Jun 29, 2009, 12:20:42 PM

to django-developers

On Sat, Jun 27, 2009 at 2:22 PM, Kevin Kubasik<ke...@kubasik.net> wrote:
> Should windmill tests default to non-isolated/non-transactional DB behavior?
> Basically, we are providing the means for functional tests, these should be
> tests like 'Register and Complete a Profile', then 'Edit Profile'. We don't
> want this as one massive test, and it seems like that would be the expected
> behavior most of the time, and still allowing for the option of specific
> tests being run in isolation seems like the best move. However, this could
> be confusing, or just bad practice, so I wanted to get some feedback.

You need to clarify your terms here - when you say "in isolation", do
you mean in the sense that the effects of test 1 shouldn't affect test
2 (i.e., the basic Unit Test premise), or are you referring to the
transactional testing framework that has been introduced for Django
v1.1? What are you trying to isolate from what?

> What is the general interest in test-only models as a public api? The
> mechanics of it have been worked out for the regression suite, but the
> debate falls to one of the following systems.
>
> A class used instead of db.Model (db.TestModel)
> A module in the app (test_models.py)
> Similar to fixtures (a property on tests)
> A settings option

It's not entirely obvious to me what these alternatives mean. You're
describing a relatively complex feature, yet your explanation of four
options doesn't dig much deeper than 4 words in a parenthetical
comment. That isn't much to base a design judgement upon.

Here are my expectations as a potential end user of this feature:

* I should be able to define a test model in exactly the same way as
any other model (i.e., subclassing models.Model)

* Test models shouldn't be defined in the same module as application
models. Putting the test models somewhere in the tests namespace
(e.g., myapp/tests/models.py) would make some sort of sense to me, but
I'm open to other suggestions.

* There must be sufficient flexibility so that I can have different
sets of models for different tests. For example, the admin app should
be testing behavior when there are no models defined. It should also
test behavior when there are N models defined. These two test
conditions cannot co-exist if there is a single models file and all
models in that file are automatically loaded as part of the test app.

* The test models should appear to be part of a separate test
application - i.e., if I have a test model called Foo in myapp, it
should be myapp_test.Foo (or something similar), not myapp.Foo.

* The appropriate housekeeping should be performed to ensure that app
caches are flushed/purged at the end of each test so that when the
second test runs, it can't accidentally find out about a model that
should only be present for the first test.

I'm open to almost any design suggestion that enables this use case.

> I am assuming that code coverage of windmill tests isn't that useful of a
> number, given the specialized execution paths etc. But I wanted to double
> check that people wouldn't be surprised by that.

I wouldn't rule out the proposition that someone might be interested
in this number.

I'm also a little confused as to why this decision is even required.
My understanding was that determining code coverage is one problem;
starting a Windmill test was a separate problem. As I understood it,
both features were being layered over the top of the standard UnitTest
framework, so if you wanted to determine code coverage of a Windmill
test, it would just be a matter of turning on the coverage flag on
your django.test.WindmillTest instance. Have I missed something
important here?

Yours,
Russ Magee %-)

Alex Gaynor

Jun 29, 2009, 12:28:37 PM

to django-d...@googlegroups.com

Another thought just occurred to me. As part of my multi-db work I've had to update the testing harness to sync more than one DB. That work itself is probably 100% orthogonal to what you're doing. However, Russ mentioned layering on top of the UnitTest framework (e.g. a django.test.WindmillTest), which reminded me of a problem I've already hit. Doing all of the setUp and tearDown work for multiple databases when they aren't needed is wasteful, and can kill some of the speed-ups we got from transactional test cases. What I'd like to do is introduce a MultipleDatabaseTestCase, and only sync all the DBs for that class. The issue is that we'd end up with 4 classes: TestCase, TransactionalTestCase, MultiDBTestCase, MultiDBTransactionalTestCase. This is Bad (tm). If you're introducing additional TestCase subclasses (like for Windmill), the situation would be further exacerbated. I'm not sure what the right solution is, but I figured I'd put it out there.
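One possible way around the class explosion is sketched below with plain `unittest` and hypothetical names; nobody in the thread proposed it. Keep one class per transaction strategy, and make multi-database syncing a class attribute that `setUp` consults. The multi-db and transactional axes then combine through attributes instead of through a separate subclass for every combination.

```python
import unittest

# Hypothetical database aliases, standing in for a multi-db settings dict.
DATABASE_ALIASES = ("default", "other")


class SketchTestCase(unittest.TestCase):
    """One base class; syncing every database is an opt-in class attribute."""

    multi_db = False  # hypothetical flag: only sync all databases when asked

    @classmethod
    def aliases_to_sync(cls):
        return list(DATABASE_ALIASES) if cls.multi_db else ["default"]

    def setUp(self):
        # Stand-in for the real per-alias sync/flush work.
        self.synced = self.aliases_to_sync()


class SketchTransactionTestCase(SketchTestCase):
    """Transactional variant: inherits the flag instead of needing a twin class."""


class SingleDBViewTests(SketchTestCase):
    def test_only_default_synced(self):
        self.assertEqual(self.synced, ["default"])


class MultiDBViewTests(SketchTransactionTestCase):
    multi_db = True

    def test_all_synced(self):
        self.assertEqual(self.synced, ["default", "other"])
```

A Windmill-specific class would then add one more class, not another doubling of the hierarchy.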

Ned Batchelder

Jun 29, 2009, 4:27:57 PM

to django-d...@googlegroups.com

I'm really pleased to see coverage.py being used this way, thanks for doing the work. But I would strongly recommend starting coverage before importing the modules to avoid this effect of module-level statements not being measured. You can add exclude regexes to mitigate the problem, but it's a losing game. You'll inevitably either under- or over-exclude statements.

One of the things I tried hard to do with coverage.py was remove noise from the workflow. Excludes let developers move known not-executed code out of their attention, and let coverage.py focus developers on potential problems. Using excludes to "fix" module-level statements feels like a misuse.
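A small illustration of why exclude patterns are a losing game for module-level statements. coverage.py accepts regular expressions for lines to exclude. This sketch does not call coverage.py; it only uses `re` to show what two plausible patterns would match against some made-up source lines. The narrow pattern under-excludes, and the broad one over-excludes. Note also that in coverage.py a matched line that opens a block excludes the whole block, so patterns aimed at `def` or `class` lines would hide entire function and class bodies.

```python
import re

# Made-up source lines of the kind that show up as "not executed".
SOURCE_LINES = [
    "import os",
    "from os import path",
    "    import json  # a deferred import inside a function",
    "def helper():",
    "class Widget(object):",
    'ERROR = "import failed, check settings"',
]


def excluded(lines, pattern):
    """Return the lines an exclude regex would match (searched anywhere in the line)."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]
```

`excluded(SOURCE_LINES, r"^import ")` catches only `import os`. It misses `from ... import` and says nothing about `def`/`class` lines. `excluded(SOURCE_LINES, r"import")` also swallows a deferred import that might genuinely never run, plus an ordinary assignment that happens to contain the word "import".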

I'd be glad to lend a hand if I can help.

--Ned.
http://nedbatchelder.com


Kevin Kubasik

Jul 1, 2009, 7:01:54 AM

to django-d...@googlegroups.com

On Mon, Jun 29, 2009 at 10:20 AM, Russell Keith-Magee <freakb...@gmail.com> wrote:

> On Sat, Jun 27, 2009 at 2:22 PM, Kevin Kubasik<ke...@kubasik.net> wrote:
>> Should windmill tests default to non-isolated/non-transactional DB behavior?
>> Basically, we are providing the means for functional tests, these should be
>> tests like 'Register and Complete a Profile', then 'Edit Profile'. We don't
>> want this as one massive test, and it seems like that would be the expected
>> behavior most of the time, and still allowing for the option of specific
>> tests being run in isolation seems like the best move. However, this could
>> be confusing, or just bad practice, so I wanted to get some feedback.
>
> You need to clarify your terms here - when you say "in isolation", do
> you mean in the sense that the effects of test 1 shouldn't affect test
> 2 (i.e., the basic Unit Test premise), or are you referring to the
> transactional testing framework that has been introduced for Django
> v1.1? What are you trying to isolate from what?

Sorry, yes I was referring to the transactional framework.

>> What is the general interest in test-only models as a public api? The
>> mechanics of it have been worked out for the regression suite, but the
>> debate falls to one of the following systems.
>>
>> A class used instead of db.Model (db.TestModel)
>> A module in the app (test_models.py)
>> Similar to fixtures (a property on tests)
>> A settings option
>
> It's not entirely obvious to me what these alternatives mean. You're
> describing a relatively complex feature, yet your explanation of four
> options doesn't dig much deeper than 4 words in a parenthetical
> comment. That isn't much to base a design judgement upon.
>
> Here are my expectations as a potential end user of this feature:
>
> * I should be able to define a test model in exactly the same way as
> any other model (i.e., subclassing models.Model)
>
> * Test models shouldn't be defined in the same module as application
> models. Putting the test models somewhere in the tests namespace
> (e.g., myapp/tests/models.py) would make some sort of sense to me, but
> I'm open to other suggestions.
>
> * There must be sufficient flexibility so that I can have different
> sets of models for different tests. For example, the admin app should
> be testing behavior when there are no models defined. It should also
> test behavior when there are N models defined. These two test
> conditions cannot co-exist if there is a single models file and all
> models in that file are automatically loaded as part of the test app.

My thought on how best to solve this is a property (similar to how we use 'fixtures') that, for each test, points to any model modules that should be explicitly loaded. Thoughts?

> * The test models should appear to be part of a separate test
> application - i.e., if I have a test model called Foo in myapp, it
> should be myapp_test.Foo (or something similar), not myapp.Foo.

I hadn't planned on doing this, do you have a use case in which myapp.Foo would cause problems? Not entirely sure I understand why this would be ideal.

> * The appropriate housekeeping should be performed to ensure that app
> caches are flushed/purged at the end of each test so that when the
> second test runs, it can't accidentally find out about a model that
> should only be present for the first test.

This is a tough(er) problem, since my initial approach (flush the entire app cache after each test and force a call to _populate()) made things unbearably slow. My main issue has been trying to determine behavior based on changes to the AppCache; are there any docs for the AppCache that I might be missing?

Otherwise, it looks like manual manipulation of the cache is the key. I have just checked in a super-alpha sample with me messing around a bit, but usage is pretty straightforward:

    from django.test import TransactionTestCase

    class TestMyViews(TransactionTestCase):
        # Proposed attribute: model modules to load for this test case.
        test_models = ['test_models']

        def testIndexPageView(self):
            # Here you'd test your view using ``Client``.
            pass

Here, ``test_models.py`` declares the models to use. There is no limit on the number of modules that can be loaded for one test; if a user wants different models for different tests, they just have to declare several modules.
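A minimal sketch of how a `test_models` attribute could be honoured without flushing and repopulating the whole app cache. Before each test, take a cheap snapshot of a registry, then register the classes defined in the listed modules under a separate label (following Russ's `myapp_test.Foo` suggestion). After the test, restore the snapshot. Everything here is hypothetical and is not Django's AppCache API: `APP_REGISTRY` is a plain dict standing in for the app cache, `TestModelsMixin` and the `myapp_test` label are made up, and `make_fake_module` exists only so the sketch runs without files on disk. The sketch does not create any tables.

```python
import importlib
import inspect
import sys
import types
import unittest

# Hypothetical stand-in for the app cache: app label -> {lowercased model name: class}.
APP_REGISTRY = {"myapp": {"profile": type("Profile", (), {})}}


class TestModelsMixin:
    """Load the modules listed in ``test_models`` before each test and put the
    registry back afterwards. A sketch only, not Django's app cache API."""

    test_models = []               # dotted module names, as in the example above
    test_app_label = "myapp_test"  # hypothetical separate label for test models

    def setUp(self):
        super().setUp()
        # Snapshot the registry instead of flushing and repopulating it.
        self._registry_snapshot = {
            label: dict(models) for label, models in APP_REGISTRY.items()
        }
        models = APP_REGISTRY.setdefault(self.test_app_label, {})
        for name in self.test_models:
            module = importlib.import_module(name)
            # Register only classes defined in that module, not ones it imported.
            for obj in vars(module).values():
                if inspect.isclass(obj) and obj.__module__ == module.__name__:
                    models[obj.__name__.lower()] = obj

    def tearDown(self):
        # Restore the snapshot so the next test cannot see these models.
        APP_REGISTRY.clear()
        APP_REGISTRY.update(self._registry_snapshot)
        super().tearDown()


def make_fake_module(name, source):
    """Build an in-memory module so the sketch runs without files on disk."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module
```

Copying a dict of dicts per test should be much cheaper than a full repopulation, though a real app cache holds more state than this.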

> I'm open to almost any design suggestion that enables this use case.

>> I am assuming that code coverage of windmill tests isn't that useful of a
>> number, given the specialized execution paths etc. But I wanted to double
>> check that people wouldn't be surprised by that.
>
> I wouldn't rule out the proposition that someone might be interested
> in this number.

Certainly, and it is easy to get; however, I was referring to what the --coverage flag does by default in the runtests.py script.

> I'm also a little confused as to why this decision is even required.
> My understanding was that determining code coverage is one problem;
> starting a Windmill test was a separate problem. As I understood it,
> both features were being layered over the top of the standard UnitTest
> framework, so if you wanted to determine code coverage of a Windmill
> test, it would just be a matter of turning on the coverage flag on
> your django.test.WindmillTest instance. Have I missed something
> important here?

Somewhat. WindmillTests are special cases, and do not extend unittest. This is for 2 main reasons:

1. Windmill is very tightly coupled with functests, and functests don't play well with unittest.
2. Running windmill tests from Django's unittest runner presents a few problems, including proper reporting of test success/failure and the ability to filter when windmill tests are run.

Since windmill tests are very slow, and represent a special case, I have instead opted for a special runner which only runs windmill tests. This also means that the specialized error-casing, threading environment and transaction behavior don't pollute the django TestCase.

While it's not ideal, it is more flexible, and more in line with how I imagine most people using windmill tests. I am planning on writing a twill runner as well, to extend and help me abstract the windmill runner.

I am open to a design discussion regarding how Windmill tests are integrated into Django. Basically, there are 2 routes:

1. A specialized runner locates windmilltests directories and runs them with functests.
2. A specialized subclass of TestCase starts the windmill runner and loads tests from windmilltests.

As mentioned above, I prefer option 1 because of the specialized hacks it takes to run windmill tests, and the performance hits we take when bringing the background server up and down.
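The discovery half of option 1 can be quite small. The hypothetical sketch below uses only `os.walk` to find every directory named `windmilltests` under a project root. Handing those directories to functests/windmill is left out, because that part depends on the branch's runner. The function names and the demo layout are made up.

```python
import os
import tempfile


def find_windmill_test_dirs(root, dirname="windmilltests"):
    """Walk `root` and return every directory named `dirname`, sorted."""
    found = []
    for current, subdirs, _files in os.walk(root):
        if dirname in subdirs:
            found.append(os.path.join(current, dirname))
        # Don't descend into the test dirs themselves or into hidden directories.
        subdirs[:] = [d for d in subdirs if d != dirname and not d.startswith(".")]
    return sorted(found)


def demo():
    """Build a throwaway project layout and report what the finder sees."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("app1/windmilltests", "app2", "pkg/app3/windmilltests"):
            os.makedirs(os.path.join(root, rel))
        return [os.path.relpath(p, root) for p in find_windmill_test_dirs(root)]
```

Because discovery is just a filesystem walk, the runner can skip the whole windmill/threading setup when nothing is found. That fits the goal of not paying the server start-up cost unnecessarily.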


Russell Keith-Magee

Jul 3, 2009, 9:46:46 AM

to django-d...@googlegroups.com

On Wed, Jul 1, 2009 at 7:01 PM, Kevin Kubasik<ke...@kubasik.net> wrote:
>
>
> On Mon, Jun 29, 2009 at 10:20 AM, Russell Keith-Magee
> <freakb...@gmail.com> wrote:
>>
>> On Sat, Jun 27, 2009 at 2:22 PM, Kevin Kubasik<ke...@kubasik.net> wrote:
>> > Should windmill tests default to non-isolated/non-transactional DB behavior?
>> > Basically, we are providing the means for functional tests, these should be
>> > tests like 'Register and Complete a Profile', then 'Edit Profile'. We don't
>> > want this as one massive test, and it seems like that would be the expected
>> > behavior most of the time, and still allowing for the option of specific
>> > tests being run in isolation seems like the best move. However, this could
>> > be confusing, or just bad practice, so I wanted to get some feedback.
>>
>> You need to clarify your terms here - when you say "in isolation", do
>> you mean in the sense that the effects of test 1 shouldn't affect test
>> 2 (i.e., the basic Unit Test premise), or are you referring to the
>> transactional testing framework that has been introduced for Django
>> v1.1? What are you trying to isolate from what?
>
> Sorry, yes I was referring to the transactional framework.

Ok - so you're referring to the transactional framework, but what
problem are you referring to? I don't see how this relates to your
original question/feedback request.

A Windmill test exists to test views. Views will often (but not
always) use transactions. A Windmill test suite will contain multiple
view tests. People will want to invoke individual tests, as well as
invoking the entire suite and getting a report. What exactly is it
that you need feedback on?

Again - I think you've given a one line response to a problem that
isn't that simple. You need to explain your thoughts. You need to
elaborate on your plans. I can imagine any number of ways that "a
property" could be used to solve this problem. What do you have in
mind?

>> * The test models should appear to be part of a separate test
>> application - i.e., if I have a test model called Foo in myapp, it
>> should be myapp_test.Foo (or something similar), not myapp.Foo.
>
> I hadn't planned on doing this, do you have a use case in which myapp.Foo
> would cause problems? Not entirely sure I understand why this would be
> ideal.

Think of what it is that we could be testing. For example:

* Admin needs to test layouts when there are multiple applications.
* Schema evolution projects need to test migrations when there are
cross-app dependencies.

These are just two examples - it shouldn't be too hard to think of
others. My point is that contrib.admin is a single application, and it
has models of its own. You can't do a comprehensive test of
contrib.admin by putting all the test models in the same namespace -
you need to be able to define multiple test app namespaces within the
contrib.admin test suite.
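Russ's namespace point in code form, as a hypothetical sketch with a plain dict standing in for the real app cache. If models are registered under an (app label, model name) key, two test apps inside one test suite can each define a `Foo` without colliding with each other or with the application's own models. A single shared namespace would force a clash.

```python
from collections import defaultdict


def make_registry():
    """Hypothetical registry: app label -> {lowercased model name: model}."""
    return defaultdict(dict)


def register(registry, app_label, model_name, model):
    """Register a model under (app_label, model_name); refuse duplicates."""
    models = registry[app_label]
    key = model_name.lower()
    if key in models:
        raise ValueError(f"{app_label}.{model_name} is already registered")
    models[key] = model
```

Registering `Foo` under `admin_tests_a` and again under `admin_tests_b` works. Registering a second `Foo` under the same label raises, which is what would happen if every test model for contrib.admin lived in one namespace.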

>> * The appropriate housekeeping should be performed to ensure that app
>> caches are flushed/purged at the end of each test so that when the
>> second test runs, it can't accidentally find out about a model that
>> should only be present for the first test.
>
> This is a tough(er) problem, since my initial approach (flush the entire app
> cache after test and force a call to _populate()) made things unbearably
> slow. My main issue has been trying to determine behavior based on changes
> to the AppCache, are there any docs for the AppCache that I might be
> missing?

Nope :-)

Unfortunately, this is one of those internal areas that hasn't been
fully documented. The closest we come to any form of documentation is
"documentation by test" - that is, the test suite expects that the app
cache behaves the way it does, so if tests start failing, you've
probably broken something important. Not very helpful, I know, but
that's the way it is.

> Otherwise, it looks like manual manipulation of the cache is the key. I have
> just checked in a super-alpha sample with me messing around a bit, but usage
> is pretty straightforward:
>    from django.test import TransactionTestCase
>
>    class TestMyViews(TransactionTestCase):
>        test_models = ['test_models']
>
>        def testIndexPageView(self):
>            # Here you'd test your view using ``Client``.
> Where ``test_models.py`` has models declared for use, there are no limits on
> the number of modules that can be loaded for one test. If a user wants
> different models for different tests, they just have to declare several
> modules.

Ok - so the idea here is that you have:
/myapp
    __init__.py
    models.py
    tests.py
    test_models.py
    test_models2.py

and then the TestMyViews test case can assume the existence of the
'test_models' app in the app cache? Some quick queries:

* What happens if I want to put in some namespace modules (e.g., a
tests directory)? How does test_models resolve in this case?

* How does this integrate with tests that depend on non-test
applications - e.g., admin requires auth, so the test case should be
able to specify contrib.auth as a test dependency?
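One hypothetical convention that would answer both queries is to borrow Python's relative-import syntax. A name with a leading dot resolves against the app's package, so `.tests.models` picks up a tests namespace. Anything else is taken as an absolute module path, so a dependency like `django.contrib.auth` can sit in the same list. The sketch relies on the standard-library `importlib.util.resolve_name`. The helper name is made up, and nothing like it exists in the branch as far as this thread shows.

```python
import importlib.util


def resolve_test_model_modules(app_package, names):
    """Leading dot: relative to the app package; otherwise an absolute module path."""
    return [importlib.util.resolve_name(name, app_package) for name in names]
```

For example, `resolve_test_model_modules("myapp", [".test_models", ".tests.models", "django.contrib.auth"])` gives `myapp.test_models`, `myapp.tests.models` and `django.contrib.auth`.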

Ok, that's a little clearer. I agree that the runner approach is
better. Is there any reason that the code for invoking and reporting
coverage couldn't be factored out so it could be shared between the
two runners?
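A minimal sketch of factoring coverage out so both runners share it. Wrap "run the tests" in one helper that optionally measures line execution, and have each runner be just a callable. The standard-library `trace` module stands in for coverage.py here purely so the sketch is self-contained; coverage.py's own start/stop calls could take its place. The runner functions are hypothetical stand-ins.

```python
import trace


def run_with_coverage(run_tests, enabled=True):
    """Run any test-runner callable, optionally counting executed lines.

    Returns (runner result, {(filename, lineno): hit count})."""
    if not enabled:
        return run_tests(), {}
    tracer = trace.Trace(count=1, trace=0)
    result = tracer.runfunc(run_tests)
    return result, dict(tracer.results().counts)


def unittest_runner():
    return "unit tests: ok"      # stand-in for the normal runner


def windmill_runner():
    return "windmill tests: ok"  # stand-in for the windmill runner
```

Both runners then get identical `--coverage` handling, and the decision about when measurement starts lives in one place.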

Russ %-)
