Speeding up tests


Anssi Kääriäinen

Jan 16, 2012, 11:46:02 AM
to Django developers
I have been investigating what takes time in Django's test runner and
if there is anything to do about it. The short answer is: yes, there
is a lot of room for improvement. I managed to reduce the running
time of the test suite (on sqlite3) from 1700 seconds to around 500
seconds. On postgresql I reduced the run time from 5000 to 2500
seconds.

The things I did are listed below. Note that I haven't implemented any
of these (except #4) in a way that would be acceptable for core
inclusion. I implemented them in the quickest way possible to see what
the effect is.
1. Use MD5 passwords (or SHA1 passwords) in testing. This is just a
change to the default test_sqlite.py file, and documentation. Result:
150 seconds from this alone. One password check takes 0.3 seconds on
default settings. There is no reason to run with the highest security
passwords when testing.
2. Do not validate the whole project for each test case. Django uses
call_command to flush the database, load fixtures etc. between test runs.
These commands run validation of the whole project. For every test.
That is not necessary. In the patch I just disabled validation, which
is not the correct approach. I think this is responsible for around
100-200 seconds.
3. Fix fixture loading. Fixtures are reloaded for every test case.
Now, the problem is that checking for the fixture file is implemented
in a way that results in around 2.5 million "file exists" checks
during the test suite. Basically, for every fixture directory, every
compression type and every fixture file type combination check if a
file with that combination exists. I hacked this into somewhat better
shape (from a performance point of view) but it could be much better,
both in code quality and speed. Around 200 seconds saved from here.
4. I applied the deepcopy removal patch from ticket #16759. I think
this is responsible for around 100-200 seconds. This would be
important to get into Django regardless of test runner speed.
5. Track "model data changed". This way, Django can skip flushing
most tables (most notably contenttypes and permissions) between each
transactional test case. This data is mostly static, and the test
suite has a great many models, and thus a lot of contenttypes and
permissions. This results in massive reloads between each test case.
In the patch I track changes to models. If there are none, no need to
flush & reload data. Now, the implementation is far from perfect. And
there is the question if this is too much magic. However, I think this
alone is responsible for 1000-1500 seconds when running under
postgresql. I bet for mysql & oracle there is still more gain from
this.
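
As a minimal sketch of point 1, assuming a Django version that has the PASSWORD_HASHERS setting (1.4 or later), a test-only settings file could list a fast hasher first. This is an illustration of the idea, not the change from the branch, and it only makes sense for test settings.

```python
# Test settings fragment (a sketch, not the patch from the branch).
# Assumes Django's PASSWORD_HASHERS setting; the first entry is the
# hasher used when a password is set, so test users get cheap MD5 hashes.
PASSWORD_HASHERS = (
    "django.contrib.auth.hashers.MD5PasswordHasher",
)
```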

After all of the above, there aren't any really easy gains left, or at
least I can't find them. Of the above, points 1-4 should be somewhat easy
to do. I would like 5 included also, but I can see some counter
arguments for that. It would also be nice for static data in testing.
I work in the medical sector, where ICD-10 has 10000+ codes. From the
application's perspective this is static data. I really do not wish to
reload 10000 rows for every test.

There is a github branch
(https://github.com/akaariai/django/compare/fast_tests) for the things
I did. Note that there really isn't
anything ready for 1, 2, 3 or 5 there. They are there just to show the
pain-points. 4 is ready and is tracked by ticket #16759.

There is also a profile file from after the above fixes.
https://github.com/akaariai/django/commit/fedc5c1d10960cefe845fa91f3692cab953253fd

Unfortunately I do not have a before-profile available any more, and
generating another one will take more than an hour.

- Anssi

Javier Guerra Giraldez

Jan 16, 2012, 12:00:28 PM
to django-d...@googlegroups.com
On Mon, Jan 16, 2012 at 11:46 AM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> I have been investigating what takes time in Django's test runner and
> if there is anything to do about it. The short answer is: yes, there
> is a lot of room for improvement. I managed to reduce the running
> speed of the test suite (on sqlite3) from 1700 seconds to around 500
> seconds. On postgresql I reduced the run time from 5000 to 2500
> seconds.

Don't 2 (model validation) and 3 (fixture loading) get skipped when
using SQLite?


--
Javier

Anssi Kääriäinen

Jan 16, 2012, 12:33:49 PM
to Django developers
On Jan 16, 7:00 pm, Javier Guerra Giraldez <jav...@guerrag.com> wrote:
> On Mon, Jan 16, 2012 at 11:46 AM, Anssi Kääriäinen
>
> <anssi.kaariai...@thl.fi> wrote:
> > I have been investigating what takes time in Django's test runner and
> > if there is anything to do about it. The short answer is: yes, there
> > is a lot of room for improvement. I managed to reduce the running
> > speed of the test suite (on sqlite3) from 1700 seconds to around 500
> > seconds. On postgresql I reduced the run time from 5000 to 2500
> > seconds.
>
> doesn't 2 (model validation) and 3 (fixture loading) get skipped when
> using SQLite?

I don't think those are skipped when using SQLite.

For example the admin_views tests do fixture loading for every test
method. Four fixtures. The fixtures are loaded using call_command. The
fixture loading command (if I am not mistaken) will do model
validation. Now, I am not 100% sure if fixture loading is causing
model validation. The fixtures do need reloading for every test case even
under sqlite3.

The model validation one is really easy to test. You can test this by
turning off model validation in core/management/base.py (set
requires_model_validation to False) and see what difference it makes.
That is what I did. Not that it is the right fix at all, but it shows the
time used for model validation. I tried this and got a runtime of 760
seconds. So, model validation alone is responsible for at least 200
seconds, more likely in the range of 250 seconds.
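
One possible alternative to switching validation off entirely is to validate once per process and skip the repeats. The sketch below only illustrates that idea: the Command class is a hypothetical stand-in, not Django's BaseCommand, and the counter stands in for the expensive project-wide validation.

```python
class Command:
    """Hypothetical stand-in for a management command (not Django's BaseCommand)."""

    requires_model_validation = True
    _already_validated = False  # shared by every command in this process
    validation_runs = 0         # counts how often the expensive step ran

    def validate(self):
        # Stand-in for validating every model in the project.
        Command.validation_runs += 1

    def execute(self):
        # Validate only the first time any command runs in this process.
        if self.requires_model_validation and not Command._already_validated:
            self.validate()
            Command._already_validated = True
        return self.handle()

    def handle(self):
        return "done"


# Simulate flush/loaddata being called for many tests.
for _ in range(3):
    Command().execute()
```

With this shape the cost is paid once per test run instead of once per call_command, at the price of not noticing models that change mid-run.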

Fixture loading is a problem only because the implementation will do a
"cross join" of app directories, available file types (json etc) and
compression types. And then check every file name combination, for
every test. That is expensive.
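
The cost of that "cross join" can be sketched with the standard library alone. The format and compression lists below are illustrative assumptions, not Django's actual lists, and find_naive / find_cached are hypothetical helper names. The cached variant assumes fixture directories do not change during a test run.

```python
import itertools
import os

FORMATS = ("json", "xml", "yaml")          # assumed serialization formats
COMPRESSIONS = (None, "gz", "zip", "bz2")  # assumed compression suffixes


def candidate_names(fixture_name):
    """Yield every file name the fixture could have."""
    for fmt, comp in itertools.product(FORMATS, COMPRESSIONS):
        name = "%s.%s" % (fixture_name, fmt)
        if comp:
            name += "." + comp
        yield name


def find_naive(fixture_name, fixture_dirs):
    """One os.path.exists() call per (dir, format, compression) combination."""
    found, checks = [], 0
    for directory in fixture_dirs:
        for name in candidate_names(fixture_name):
            checks += 1
            path = os.path.join(directory, name)
            if os.path.exists(path):
                found.append(path)
    return found, checks


_listing_cache = {}  # directory -> set of file names, filled on first use


def find_cached(fixture_name, fixture_dirs):
    """List each directory once, then answer lookups from memory."""
    wanted = set(candidate_names(fixture_name))
    found = []
    for directory in fixture_dirs:
        if directory not in _listing_cache:
            names = os.listdir(directory) if os.path.isdir(directory) else []
            _listing_cache[directory] = set(names)
        for name in sorted(wanted & _listing_cache[directory]):
            found.append(os.path.join(directory, name))
    return found
```

In the naive version the number of filesystem checks grows with directories x formats x compressions for every fixture of every test; in the cached version it grows only with the number of directories.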

- Anssi

Thomas Guettler

Jan 17, 2012, 6:04:12 AM
to django-d...@googlegroups.com
Hi,

same subject, but different content:

we have a lot of tests which are read only. They don't modify the database or
other files.

You don't need to flush the database for every test of read only test cases. These tests can be run on production
servers, too.

At the moment, this unittest code is non public. But if there is any interest, we
could make it public.

Is anyone interested?

Thomas Güttler

On 16.01.2012 17:46, Anssi Kääriäinen wrote:
> ....


--
Thomas Guettler, http://www.thomas-guettler.de/
E-Mail: guettli (*) thomas-guettler + de

Łukasz Rekucki

Jan 17, 2012, 6:35:20 AM
to django-d...@googlegroups.com
On 17 January 2012 12:04, Thomas Guettler <h...@tbz-pariv.de> wrote:
> Hi,
>
> same subject, but different content:
>
> we have a lot of tests which are read only. They don't modify the database
> or
> other files.

I assume your code has a way of checking that the tests are really
read only? If yes, I would be interested. If no, then (sorry to say
this) the "read-only" part is only wishful thinking.

--
Łukasz Rekucki

Anssi Kääriäinen

Jan 17, 2012, 7:35:59 AM
to django-d...@googlegroups.com, Thomas Guettler
On 01/17/2012 01:04 PM, Thomas Guettler wrote:
> Hi,
>
> same subject, but different content:
>
> we have a lot of tests which are read only. They don't modify the database or
> other files.
>
> You don't need to flush the database for every test of read only test cases. These tests can be run on production
> servers, too.
>
> At the moment, this unittest code is non public. But if there is any interest, we
> could make it public.
>
> Is anyone interested?
Me at least. Running Selenium tests with the current implementation
(based on TransactionTestCase) is somewhat slow, as the whole DB needs
to be flushed between each test. More so if you happen to have some
codes you need to have in the DB. A read-only test would be a big
performance bonus in this case.

Is it possible to write read-only tests in a way that no-modifications
are actually enforced? Maybe that is not wanted. For example, writes to
session table are OK assuming your test starts by new login every time.
It would be possible to implement an API: "read-only, except for these
models" but now this is beginning to get pretty complex...
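
For SQLite at least, one way such enforcement could look is the sqlite3 module's set_authorizer hook. This is only a sketch under stated assumptions: it works at the table level rather than the model level, make_read_only and the table names are hypothetical, and other backends would need a different mechanism.

```python
import sqlite3

# Authorizer action codes that modify table data.
WRITE_ACTIONS = {
    sqlite3.SQLITE_INSERT,
    sqlite3.SQLITE_UPDATE,
    sqlite3.SQLITE_DELETE,
}


def make_read_only(conn, writable_tables=()):
    """Deny data changes on this connection, except for the listed tables."""
    allowed = set(writable_tables)

    def authorizer(action, arg1, arg2, db_name, source):
        # For INSERT/UPDATE/DELETE, arg1 is the name of the affected table.
        if action in WRITE_ACTIONS and arg1 not in allowed:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (code TEXT)")     # static data
conn.execute("CREATE TABLE session (key TEXT)")    # writes allowed
conn.execute("INSERT INTO codes VALUES ('A00')")
conn.commit()

# Install the guard only after the schema and static data are in place.
make_read_only(conn, writable_tables=["session"])
conn.execute("INSERT INTO session VALUES ('abc')")  # permitted

try:
    conn.execute("DELETE FROM codes")               # rejected by the authorizer
    write_was_denied = False
except sqlite3.DatabaseError:
    write_was_denied = True
```

A test that writes to a protected table then fails immediately at the offending statement instead of leaking state into later tests.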

It is also notable that if "flush only modified" was implemented in
an as-fast-as-possible way there would not be much need for read-only
tests from a performance point of view. If you have changed nothing,
nothing will get flushed. The main problem here is fixtures: these modify
the database for each test, and thus they need to be reloaded for each
test. I have done some investigating to find a nice way to reload
fixtures only when the data has changed. I haven't found anything nice
yet, especially not for normal Django TestCases, which do the fixture
loading inside a transaction which will be rolled back after each test.

If there was a generic "model modified" signal, or something like that,
it would make it possible to easily implement both "flush only modified"
and enforce "read-only" for tests. Now, I don't know if we want yet
another signal for this. But something like that would be needed. The
implementation used for no. 5 upthread is not even close to acceptable
for inclusion.

Now, my biggest concern about "flush only modified" and read-only tests
is that they make the test runner much more complicated and especially
increase the risk of cross-test failures. These errors are generally
hard to debug. Points 1-4 do not have this problem.

Maybe the best approach here would be to introduce the needed hooks into
Django core, and then implement the read-only tests and flush only
modified tests as external projects. If they are successful, then
include them in Django core.

- Anssi

Adrian Holovaty

Jan 19, 2012, 12:47:50 PM
to django-d...@googlegroups.com
On Mon, Jan 16, 2012 at 10:46 AM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> I have been investigating what takes time in Django's test runner and
> if there is anything to do about it. The short answer is: yes, there
> is a lot of room for improvement. I managed to reduce the running
> speed of the test suite (on sqlite3) from 1700 seconds to around 500
> seconds. On postgresql I reduced the run time from 5000 to 2500
> seconds.

Wow! Just wanted to say thanks for doing all of this work and making
these optimizations. I'm going to take a look at #16759, along with
your Git branch.

Adrian

David Cramer

Jan 19, 2012, 10:54:17 PM
to Django developers
So a few things we've done to take our test suite from 45 minutes to
12:

1. Implement global fixtures

These get loaded after syncing just like initial data. Obviously this
is a massive speed up
as you only reload them in between transaction test cases.

2. Don't inherit from TestCase if you aren't using the db

Around 10% of our tests now inherit from unittest's TestCase instead. This
lets us only bootstrap the db if it's required as well as skip
any db flushing on those tests.

3. Speedup fixture loading

More or less what is proposed by the original poster

4. Stop writing integration tests

Biggest win :) learn to use mock
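
Points 2 and 4 together might look like the sketch below: a plain unittest.TestCase that replaces the database lookup with a mock. The greeting_for function and its load_user argument are hypothetical names for illustration. On current Python the mock library ships as unittest.mock; at the time of this thread it was a separate package.

```python
import unittest
from unittest import mock


def greeting_for(user_id, load_user):
    """Code under test: formats a greeting for whatever load_user returns."""
    user = load_user(user_id)
    return "Hello, %s!" % user["name"]


class GreetingTests(unittest.TestCase):
    # Plain unittest.TestCase: no database setup, no flush between tests.

    def test_greeting_uses_loaded_name(self):
        # The mock stands in for a function that would normally hit the db.
        load_user = mock.Mock(return_value={"name": "Ada"})
        self.assertEqual(greeting_for(7, load_user), "Hello, Ada!")
        load_user.assert_called_once_with(7)
```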



Anssi Kääriäinen

Jan 20, 2012, 3:53:10 AM
to Django developers
On Jan 20, 5:54 am, David Cramer <dcra...@gmail.com> wrote:
> So a few things we've done to take our test suite from 45 minutes to
> 12:
>
> 1. Implement global fixtures
>
> These get loaded after syncing just like initial data. Obviously this
> is a massive speed up
> as you only reload them in between transaction test cases.

I would really like to push global fixtures forward, and in a way that
would be usable for transactional test cases. The reason is that these
would be really, really useful for LiveServer tests (that is, selenium
tests). The tests which are read-only would be really fast in this
way, while tests that do alter data would be still safe to use.

What would be needed for this? First, of course, some API for loading
the global fixtures (maybe post_syncdb could be used for this?). And
then, some way of flushing only the changed models. For change
tracking, pre/post save + m2m_changed signals could be used. Except
they do not cover all the cases: bulk_create doesn't send any signals,
and I think .update() doesn't send any signals either. So, a new
signal, model_changed(), would be needed, maybe with an API like
model_changed(instances=iterable, action='update/save/bulk_create/delete/...').
This signal would also be useful for cache flushing and
reindexing (Haystack comes to mind here).

I would need to test what speed difference this would make for the
LiveServer test cases I have. I think for my use case, nearly an order
of magnitude speedup could be possible. This is because I load a lot
of "codes" data into the DB.

The sad thing about database state tracking is that it is complex and
prone to errors. Worse, the errors might be hard to track down. As said
before, this might be best done as an external project, with the
needed hooks implemented in Django core. The needed changes would be
flushing only changed database data and a "model_changed" signal or some
other way of tracking state changes. If this proves successful, then
include it in Django core.

- Anssi

Thomas Rega

Jan 17, 2012, 7:00:20 AM
to Thomas Guettler, django-d...@googlegroups.com

Hi,

I am interested in it. Could you be so kind as to make this available somewhere?

Thanks a lot in advance.

TR

