**tl;dr** I can run the full test suite in 85 seconds on SQLite, a 4.8x speedup.
Hello,
Since I last wrote about this project, I improved parallelization by:
- reworking the IPC to avoid exchanging tracebacks
- implementing database duplication for SQLite, PostgreSQL and MySQL
The code is still rough. Several options of runtests.py don't work with
parallelization.
Initially performance was disappointing. I couldn’t max out my cores during
the whole run. With some basic monitoring I noticed that the CPU load
plummeted when there were spikes of disk writes. This led me to believe I was
disk I/O bound even with an in-memory database and a SSD.
So I started optimizing disk I/O, which means doing less I/O and doing it only
in a RAM-mounted temporary directory. Writing to RAM instead of writing to
disk helps a lot. It’s as simple as creating a RAMdisk (that depends on your
OS) and pointing the TMPDIR environment variable to the RAMdisk.
Unfortunately, i18n and migrations management commands write in the
application directories. I haven't found (yet) a way to point them to a
temporary directory instead.
My 2012 MacBook Pro with a 2.3 GHz Intel Core i7 (4 cores, 8 threads) takes:
- 30 seconds for creating the two databases
- 240 seconds to run the actual tests in a single process
- 72 seconds in 4 processes (3.3x faster)
- 60 seconds in 6 processes (4x faster)
- 55 seconds in 8 processes (4.4x faster)
That looks quite close to what my hardware can do. Hyperthreading doesn't help
as much as multiple cores when it comes to running multiple processes and the
synchronization costs increase with the number of processes.
Creating the database accounts for more than a third of the total runtime. I'm
not sure how much time is spent in the migrations framework and how much doing
the table creations. --keepdb helps but it requires an on-disk database. An
on-RAMdisk database would most likely be the best option. I haven't tried it
yet. It should work with at least SQLite and PostgreSQL (using tablespaces).
If you want to help, I'd be interested in:
- reports of whether parallelization works for test suites other than Django's
own -- apply my pull request and run `django-admin test --parallel` or
`django-admin test --parallel-num=N`
- a patch implementing database duplication on Oracle
Let me know if you have questions or concerns.