Loading fixtures once for each TestCase to improve running time


Josh Smeaton

Jun 7, 2014, 3:34:27 AM
to django-d...@googlegroups.com
I've been looking into the django testing infrastructure today to see if there was any particular reason why the test suite takes so long to run. Mainly, I didn't understand why fixtures are loaded for every single test method. To be honest, I didn't think fixtures were loaded for every test method; I assumed they were loaded once when the TestCase was created.

My mental model looked like this:

# for each test module:
call_command(flush)  # flush the DB ready for this test case
testcase = get_test_case()  # returns the next subclass of TestCase (or TransactionTestCase etc)
testcase.load_fixtures()  # populate the DB with data for the entire test case
for test_method in testcase.get_test_methods():
    with db.transaction():  # at the end of each test_method, rollback so that we're back to our fixtures
        testcase.setUp()
        getattr(testcase, test_method)()
        testcase.tearDown()

Unfortunately, that's not at all what happens. For each test method, you're given a brand new instance of the testcase, and the initial data is loaded again:

# for each test module:
call_command(flush)  # flush the DB ready for this test case
testcaseclass = get_test_case()
for test_method in testcaseclass.get_test_methods():
    with db.transaction():  # at the end of each test_method, rollback to an empty database
        testcase = testcaseclass(testMethod=test_method)
        testcase.load_fixtures()  # populate the DB with data for a single test_method
        testcase.setUp()
        testcase.run()  # runs the testMethod
        testcase.tearDown()

Note that the above isn't exactly right, but I think it demonstrates the problem. Each test_method is given its own TestCase (unnecessary python overhead), but more importantly, we're not using transactions to get back to the initial data. We're using transactions to get back to an empty database before loading fixtures again.
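
The one-instance-per-test-method behaviour described above comes from Python's unittest itself and can be observed without Django. The following is a minimal sketch using only the standard library; the class name `FreshInstanceTest` and the helper `collect_and_run` are hypothetical names chosen for illustration, and this is not Django's test runner.

```python
import unittest


class FreshInstanceTest(unittest.TestCase):
    """Each test sets an attribute that the other test would see
    if both methods ran on the same TestCase instance."""

    def test_first(self):
        self.assertFalse(hasattr(self, "touched"))
        self.touched = True

    def test_second(self):
        self.assertFalse(hasattr(self, "touched"))
        self.touched = True


def collect_and_run():
    # The loader builds one TestCase instance per test method.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FreshInstanceTest)
    instances = list(suite)  # keep references so the objects stay alive
    result = unittest.TestResult()
    suite.run(result)
    return instances, result


if __name__ == "__main__":
    instances, result = collect_and_run()
    for test in instances:
        print(test.id(), hex(id(test)))
    print("all passed:", result.wasSuccessful())
```

Both tests pass because state stored on `self` never carries over from one test method to the next; anything meant to be shared has to live on the class or in the database.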

I know lots of people have invested lots of time on the test suite, especially when it comes to run time. I doubt that I'm raising anything new for the people who have come before. But my question is why? Is there a reason that each test method has to have its own TestCase? Is there a reason that each test method has to load its own fixtures again and again, or is that just a symptom of how each test_method is collected by the test suite?

There are many kinds of tests that deal with the ORM that should be able to rely on fixtures being loaded once for the entire TestCase, and relying on transactions to get back to initial data. Is this theoretically possible, or am I missing something? I figure we could eliminate something like 1/3rd of all queries.

Regards,

Josh

Aymeric Augustin

Jun 7, 2014, 4:12:35 AM
to django-d...@googlegroups.com
Hi Josh,

Fixtures don’t receive a lot of attention because they’re hard to maintain and generally inferior to object generators such as factory_boy. Still, it would be good to optimize them.

On 7 June 2014, at 09:34, Josh Smeaton <josh.s...@gmail.com> wrote:

> I've been looking into the django testing infrastructure today to see if there was any particular reason why the test suite takes so long to run.

You aren’t the first one to notice the overhead of fixtures: https://code.djangoproject.com/ticket/9449.

That ticket focuses on the cost of parsing fixtures, not loading them.

> Note that the above isn't exactly right, but I think it demonstrates the problem. Each test_method is given its own TestCase (unnecessary python overhead)
> but more importantly, we're not using transactions to get back to the initial data. We're using transactions to get back to an empty database before loading fixtures again.

I have a theory for this. Until Django 1.6, Django didn’t have support for savepoints. All you could do is a full rollback.

So you had a choice between:

- Load fixtures
- For each test method
    - Start transaction
    - Run test
    - Roll back transaction
- Truncate tables (that’s awfully slow when you have lots of models, like Django’s test suite does)

or:

- For each test method
    - Start transaction
    - Load fixtures
    - Run test
    - Roll back transaction

The second solution is /probably/ faster for /some/ use cases, and certainly for Django’s own test suite.

It may also explain why Django rewraps each method in a test case, but I’m not sure about that part.

Now, if the test suite is running on a database that supports savepoints (there’s a database feature providing this information) you could do:

- Start transaction
- Load fixtures
- For each test method
    - Create savepoint
    - Run test
    - Roll back to savepoint
- Roll back transaction
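
The savepoint-based sequence above can be sketched with the standard library's sqlite3 module. This is only an illustration under stated assumptions, not Django's implementation: the `author` table, the savepoint name `per_test`, and the function `run_tests_with_savepoints` are all hypothetical, and each "test" is just a callable that receives the connection.

```python
import sqlite3


def run_tests_with_savepoints(tests):
    # isolation_level=None puts the connection in autocommit mode, so the
    # BEGIN / SAVEPOINT / ROLLBACK statements below are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE author (name TEXT)")
    observed = []

    conn.execute("BEGIN")  # start the outer transaction
    try:
        # Load the "fixtures" once for the whole test case.
        conn.executemany(
            "INSERT INTO author (name) VALUES (?)", [("alice",), ("bob",)]
        )
        for test in tests:
            conn.execute("SAVEPOINT per_test")
            try:
                test(conn)
            finally:
                # Undo whatever the test did, back to the fixture state.
                conn.execute("ROLLBACK TO SAVEPOINT per_test")
                conn.execute("RELEASE SAVEPOINT per_test")
            # Record the row count seen after each rollback.
            count = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
            observed.append(count)
    finally:
        conn.execute("ROLLBACK")  # discard the fixtures themselves

    remaining = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
    conn.close()
    return observed, remaining


def add_author(conn):
    conn.execute("INSERT INTO author (name) VALUES ('carol')")


def delete_everything(conn):
    conn.execute("DELETE FROM author")


if __name__ == "__main__":
    print(run_tests_with_savepoints([add_author, delete_everything]))
```

After each test the table is back to the two fixture rows, and after the final rollback it is empty again, without any truncate step.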

> I know lots of people have invested lots of time on the test suite, especially when it comes to run time. I doubt that I'm raising anything new for the people who have come before. But my question is why? Is there a reason that each test method has to have its own TestCase? Is there a reason that each test method has to load its own fixtures again and again, or is that just a symptom of how each test_method is collected by the test suite?

I don’t have all the answers, but hopefully the above sheds some light on the underlying issues.

> There are many kinds of tests that deal with the ORM that should be able to rely on fixtures being loaded once for the entire TestCase, and relying on transactions to get back to initial data. Is this theoretically possible, or am I missing something? I figure we could eliminate something like 1/3rd of all queries.

That would be pretty cool.

If you work on a patch, please keep multiple databases in mind — I don’t know how they’re handled transaction-wise.

--
Aymeric.

Aymeric Augustin

Jun 7, 2014, 4:29:12 AM
to django-d...@googlegroups.com
In fact, I just reinvented https://code.djangoproject.com/ticket/20392.

The patch on that ticket is pretty good, I’ll try to review it again.

--
Aymeric.

Josh Smeaton

Jun 10, 2014, 7:34:14 PM
to django-d...@googlegroups.com
I used the patch in the above ticket as a base, and implemented fixture loading in the setUpClass classmethod rather than the setUp method. I found that it improved the total running time of the entire test suite by about 10%, but it improved TestCases that use fixtures by a factor of 3.
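
To make the difference between per-method and per-class loading concrete, here is a rough sketch using plain unittest rather than the actual patch. The `load_fixtures` function is a hypothetical stand-in that only counts how often it is called; it does not touch a database, and the class names are likewise made up for illustration.

```python
import unittest

# Hypothetical counters standing in for real fixture loading.
LOAD_COUNTS = {"per_method": 0, "per_class": 0}


def load_fixtures(key):
    LOAD_COUNTS[key] += 1


class PerMethodFixtures(unittest.TestCase):
    def setUp(self):
        load_fixtures("per_method")  # runs before every test method

    def test_one(self):
        pass

    def test_two(self):
        pass

    def test_three(self):
        pass


class PerClassFixtures(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        load_fixtures("per_class")  # runs once for the whole class

    def test_one(self):
        pass

    def test_two(self):
        pass

    def test_three(self):
        pass


def run_both():
    # Reset the counters so repeated calls report the same numbers.
    for key in LOAD_COUNTS:
        LOAD_COUNTS[key] = 0
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(PerMethodFixtures),
        loader.loadTestsFromTestCase(PerClassFixtures),
    ])
    result = unittest.TestResult()
    suite.run(result)
    return dict(LOAD_COUNTS), result


if __name__ == "__main__":
    counts, result = run_both()
    print(counts, "ok" if result.wasSuccessful() else "failed")
```

With three test methods per class, the per-method variant loads three times and the per-class variant once. The saving only holds if each test leaves the loaded data untouched, or if its changes are rolled back afterwards.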


I'm unable to test this patch on Oracle or mssql though, which are known to be a lot slower than most of the other backends (for the test suite). The list of test modules that use fixtures is:

admin_changelist admin_custom_urls admin_docs admin_inlines admin_views admin_widgets aggregation aggregation_regress contenttypes_tests fixtures fixtures_model_package generic_inline_admin known_related_objects m2m_through_regress multiple_database proxy_models raw_query servers syndication_tests test_client test_client_regress test_utils timezones

If someone is able to test Oracle or mssql with that set of test modules and report back the difference in time taken between master and the above branch, that'd be extremely useful information.

As for profiling the entire test suite:

vagrant@djangocore:~$ PYTHONPATH=/django /home/vagrant/.virtualenvs/py2.7/bin/python -m cProfile -s cumulative /django/tests/runtests.py

This shows that for PostgreSQL, MySQL (InnoDB), and SQLite, the majority of (cumulative) time is spent inside the request/response phases of the test client, reversing URLs, and rendering templates. I don't think the choice of backend would massively influence these tests (unless transactions themselves are especially slow per backend), so I'm betting that the loaddata/rollback operations are the primary pain points. See my output (with loaddata optimisations) here: https://gist.github.com/jarshwah/42c4cd1f54c8fb3dd273
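
The same cumulative-time view can be produced from inside Python for a smaller piece of code. This is a generic sketch using the standard library's cProfile and pstats modules; the helper `profile_cumulative` and the toy `workload` function are hypothetical and unrelated to Django's runtests.py.

```python
import cProfile
import io
import pstats


def profile_cumulative(func, top=5):
    """Profile func() and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" is the same sort key as `-s cumulative` on the command line.
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def workload():
    # Toy workload standing in for a test run.
    return sum(i * i for i in range(10000))


if __name__ == "__main__":
    print(profile_cumulative(workload))
```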

Unfortunately, this patch has introduced some bad threading issues for sqlite, which segfaults for me and also for Aymeric under certain conditions. It's definitely not ok to use this code in production. If there isn't significant improvement for oracle or mssql, I'll probably abandon this patch myself.

Cheers

Josh Smeaton

Jun 11, 2014, 10:37:45 AM
to django-d...@googlegroups.com
manfre was nice enough to test this patch out on mssql for me. The subset of tests that use fixtures went from 385 seconds to 158 seconds (+2 failures). The improvement seems to be stable across backends, but isn't very significant when considering the entire test suite.

I'm going to abandon this patch and ticket myself - mainly due to the sqlite threading issues. If someone else would like to run with it, I'll be happy to share any information I have.

- Josh

Schmitt, Christian

Jun 12, 2014, 9:18:03 AM
to django-d...@googlegroups.com
Just a quick question regarding these posts / tickets.

In the last few weeks we fixed: https://code.djangoproject.com/ticket/22487

So I think that we no longer need to rely on fixtures and should just tell people to use that behavior in the first place.
I'm not a Django core developer, but since we updated our code for Django 1.7 and dropped fixtures, our test suite runs faster.

Also, we wrote something in the docs about that:
Deprecated since version 1.7: If an application uses migrations, there is no automatic loading of fixtures. Since migrations will be required for applications in Django 2.0, this behavior is considered deprecated. If you want to load initial data for an app, consider doing it in a data migration.

The question is what happens if you don't use migrations: will fixtures still be loaded in Django 2.0 and later, or not?
If they will be loaded, we should fix this; if not, we should make it clear that people SHOULD migrate away from fixtures to migrations.

Quick note: I think data migrations currently still use the old technique of serializing / deserializing on every test case, so maybe we should extend that logic somehow instead of putting work into fixtures.




--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/aeb670bd-8ef1-4b5b-9170-65489532dc3b%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Michael Manfre

Jun 12, 2014, 9:35:28 AM
to django-d...@googlegroups.com
That ticket seems to address issues with initial_data and not necessarily deal with fixtures that are loaded for a specific TestCase. I do agree that we should encourage people to not use fixtures and build their test data within the scope of the test or the TestCase.

Regards,
Michael Manfre


Christian Schmitt

Jun 12, 2014, 9:40:01 AM
to django-d...@googlegroups.com
Yeah, as I said, that ticket resolved the issue that you couldn't test data migrations, so from now on you can use data migrations and test them.

And I think one of the things we should definitely do is write a guide for migrating fixtures to data migrations, and work towards that path instead of fixing behavior that is getting deprecated anyway.
Also, there is still work to do, since this conversation also focuses on the data migrations part. As already said, their behavior is nearly the same, and maybe we could do something to make data migrations super fast, so that people have another reason to stay away or migrate away from fixtures.

Andrew Godwin

Jun 12, 2014, 12:45:45 PM
to django-d...@googlegroups.com
Data migrations are kind of separate to the per-test fixtures; one is "initial data" shared between testing and production, and one is test-specific code. #22487 was just to emulate database rollback on backends that don't have transactions, since we can no longer emulate it using "flush; loaddata initial_data".

The current rollback mechanism in tests is, as you say, rolling back to the starting state of the database each time and then re-installing fixtures per test. We need to keep that rollback all the way to the start so we can reset between test cases, but as Aymeric says, savepoints are probably the answer here, and the ticket he linked seems to be partway there.

Andrew


Daniel Moisset

Jun 12, 2014, 1:29:08 PM
to django-d...@googlegroups.com
A long time ago (Django 1.3 era) I wrote a library (no need to change Django) to do exactly this, using savepoints to roll back: https://github.com/machinalis/django-fasttest

It worked pretty well for a large project and provided a large speedup (I think it was about 5x) on a large test suite. But I haven't maintained it since; I started doing everything with factories (factory boy) and abandoned fixtures because they were hard to maintain. You might find it useful if you can update it to newer Django versions. Also note that the current code was developed with Postgres; it might work on some other DB, but I have no idea.

Regards,
    D.




