Creating data in TestCase.setUpClass() (proposed refactoring)

343 views
Skip to first unread message

Thomas

unread,
Nov 9, 2014, 7:19:20 PM11/9/14
to django-d...@googlegroups.com
Hi!

I'm opening this topic to discuss some TestCase refactoring I have been working on, as it might have a non-negligible impact.

The basic idea — which is not mine, see #20392 and that related discussion https://groups.google.com/forum/#!topic/django-developers/N0HEAD1ht8k — is to allow the creation of initial data once for a TestCase, within setUpClass. Currently, data created this way will persist between different test cases, thus breaking tests isolation. To maintain that isolation, the idea is to wrap the whole TestCase execution within an atomic block. Note that the isolation between individual test functions within a TestCase will be preserved as those functions are themselves called within an atomic block.

I implemented that change in https://github.com/django/django/pull/3464, not without some difficulties.
As a demonstration of how that change could speed up tests, I refactored the queries and select_related tests within django test suite to create initial data in setUpClass instead of setUp.

The result is an approximate -70% execution time — indeed, I chose those tests because a lot of data is created and the result would probably not be the same with other tests:

Using setUp():

$ ./tests/runtests.py select_related queries --settings=test_postgres
...
Ran 253 tests in 10.637s

Using setUpClass():

$ ./tests/runtests.py select_related queries --settings=test_postgres
...
Ran 253 tests in 2.291s

In order to make the whole Django test suite pass with that change, I had to adapt a bunch of tests. I described the technical issues in https://code.djangoproject.com/ticket/20392#comment:22 but the main problem is due to some tests in Django test suite relying on the fact that a new database connection is generated after each test, which is a behaviour that should change if we want to maintain a transaction at TestCase level. I don't know if a lot of TestCase out of Django test suite rely on that behavior (for instance, Django relies on that in a test which closes the connection explicitely) and those tests should work if turned into TransactionTestCase.

Another issue I can think of is that if people start refactoring their tests to create initial data in setUpClass, the refactored tests will generate data that will leak between TestCase when run on a database which does not support transactions (e.g. MySQL with MyISAM engine).

So, do you think it's worth implementing that despite the possible incompatibilities ? Do you see any other issue ?

Anssi Kääriäinen

unread,
Nov 10, 2014, 1:14:32 AM11/10/14
to django-d...@googlegroups.com
On Sun, 2014-11-09 at 16:19 -0800, Thomas wrote:

> In order to make the whole Django test suite pass with that change, I
> had to adapt a bunch of tests. I described the technical issues in
> https://code.djangoproject.com/ticket/20392#comment:22 but the main
> problem is due to some tests in Django test suite relying on the fact
> that a new database connection is generated after each test, which is
> a behaviour that should change if we want to maintain a transaction at
> TestCase level. I don't know if a lot of TestCase out of Django test
> suite rely on that behavior (for instance, Django relies on that in a
> test which closes the connection explicitely) and those tests should
> work if turned into TransactionTestCase.
>
> Another issue I can think of is that if people start refactoring their
> tests to create initial data in setUpClass, the refactored tests will
> generate data that will leak between TestCase when run on a database
> which does not support transactions (e.g. MySQL with MyISAM engine).

It might be better to go with setupTestData() method. On MySQL + MyISAM
this is ran before every test. Similar for TransactionTestCase. For
those databases that do fully support atomic() the setup is ran just
once for each test class.

By the way I believe there is an additional speedup to be achieved here
- we can also load fixture data once per test class, and that should
give a significant speedup for tests that rely on fixture data. You
could check this with aggregation, aggregation_regress and admin test
modules.

> So, do you think it's worth implementing that despite the possible
> incompatibilities ? Do you see any other issue ?

+1 from me.

- Anssi

Josh Smeaton

unread,
Nov 10, 2014, 4:43:58 AM11/10/14
to django-d...@googlegroups.com
Just a quick FYI - https://groups.google.com/forum/#!msg/django-developers/N0HEAD1ht8k/GQJ77RLUydsJ

I tried to implement fixture loading in setupClass() and ran into a few difficulties. I did a lot of profiling though. You get about a factor of 3 speedup for tests that use fixtures. It only represents about 10% of the entire test run though (which is still a great speedup). There may be some ideas in the patch linked to in the above thread that you could use.

+1 from me too. It would be great to speed up the test suite.

Thomas Chaumeny

unread,
Nov 12, 2014, 2:10:58 PM11/12/14
to django-d...@googlegroups.com
Thanks for your feedback.

I updated my pull request (https://github.com/django/django/pull/3464) following Anssi suggestion: subclasses of TestCase can define a class method "setupTestData" to create some data once for the whole TestCase. If the database backend does not support transaction (e.g. MySQL with MyISAM), setUpTestData will be called once before each test, like setUp. Also, I noticed that the build time of my PR was relatively low compared to other PR (~ -20%). I tend to believe this might be a consequence of not closing database connections after each test.

Regarding fixtures, I see no reason why it shouldn't be possible to load them in setUpClass to speedup the tests. I think this might be a next step, but it would require deeper changes in the structure of TransactionTestCase/TestCase.
Reply all
Reply to author
Forward
0 new messages