In theory, at least when using an in-memory SQLite database, it should be
possible to parallelize the tests and make the run faster. There may be
other ways the tests can clobber each other (e.g. allocating the same
ports for LiveServerTestCases, using the same memcached key prefixes,
etc.), but none seem insurmountable at first thought.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => assigned
* needs_better_patch: => 1
* needs_tests: => 0
* needs_docs: => 0
Comment:
I've got a proof-of-concept runner that splits the tests into N groups and
runs each group independently. In completely unscientific tests on my
(dual core) laptop, I got ~3x speedups.
The runner is here:
https://github.com/senko/django/commit/fa6d7a5845ae7863e3bff0c571588a67b19419f0
It's not in a mergeable state yet, as it doesn't address any potential
HTTP port allocation or memcached prefix allocation problems. I also
couldn't find a reliable way to mimic runtests' behaviour in getting the
tests to run, so I think I actually run more tests (by default, if no
labels are set) than runtests does by default.
If this entire exercise makes sense, at some point the functionality
would probably need to be moved to runtests itself (which would solve the
"can't get exactly the test labels we need" problem).
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:1>
Comment (by akaariai):
Seems like a good idea to me (even if complete implementation might be
hard). If we want to do this for other databases than in-memory sqlite we
will need separate databases for each parallel process.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:2>
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:3>
Comment (by akaariai):
I discussed this on the #django-dev IRC channel, and it seems we do not
want this in the Django repo. However, it would be excellent if you could
write a standalone script (usable from $HOME/bin/ for example) with which
you could run the in-memory tests in parallel. The script doesn't need to
do more than what the current script does.
Having a fast way to run all tests, even if the output is ugly and it is
usable only for in-memory sqlite would be a valuable addition.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:4>
Comment (by anonymous):
Ahh, too bad, I just got the integration into runtests.py working (with an
extra --workers=N param), avoiding the test label discovery problem I had
earlier. Okay, I'll adapt it into the standalone script - actually that
way it's easier to specify multiple settings files (needed if you use a
database other than the in-memory sqlite) without breaking the runtests
usage.
I'll attach the updated script here.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:5>
Comment (by senko):
I've attached the script. It can be located anywhere as long as either the
current working directory is django/tests or the tests directory is passed
via the --testdir argument.
Script usage:
~/bin/paralleltests.py --testdir=/path/to/django/tests --runners=<N>
--settings=test_sqlite [test_labels ...]
All arguments except --testdir=<dir> and --runners=<N> are passed as-is to
runtests.py. A good heuristic for the optimal number of runners is 2 * the
number of CPU cores you have (on the assumption that some of the tests are
actually using the CPU and some are waiting for I/O at any given time). In
any case, it produces good results on my laptop :)
The script discovers all the tests that you want to run (or uses the test
labels provided manually), splits them into N chunks, and starts N
parallel runtests processes, one for each chunk of the labels. So it will
not speed up a single test label execution (these usually don't last very
long, anyways). Also, the test discovery in it sucks, as I try to mimic
runtests behaviour but don't implement the actual discovery logic (and
can't reuse it easily from runtests), so it actually runs *more* tests
than the default runtests (I've no idea why).
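The split-and-launch step described above can be sketched roughly as follows. This is not the attached script: `split_labels` and `build_commands` are hypothetical names, the round-robin split is an assumption (the comment only says "N chunks"), and the commands are built but not executed.

```python
import sys


def split_labels(labels, runners):
    """Distribute labels round-robin into at most `runners` non-empty chunks."""
    if runners < 1:
        raise ValueError("runners must be at least 1")
    chunks = [labels[i::runners] for i in range(runners)]
    # Drop empty chunks so no worker is started with nothing to run.
    return [chunk for chunk in chunks if chunk]


def build_commands(labels, runners, extra_args=()):
    """Build one runtests.py command line per chunk (nothing is run here)."""
    return [
        [sys.executable, "runtests.py"] + list(extra_args) + chunk
        for chunk in split_labels(labels, runners)
    ]
```

Each resulting command could then be handed to `subprocess.Popen` with the tests directory as `cwd`, and the parent would wait for all of them to finish.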
In case it matters to anyone: the script is licensed under the same terms
as Django itself.
Is there a good place to put the script so it's more visible/useful to
people, without them having to sift through Trac?
Also: I'm going to continue poking at runtests and trying to make it work
in parallel for non-sqlite databases as well, to satisfy my own curiosity.
I'd appreciate it if this ticket could stay open for a while so I have a
place to report my findings (if any).
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:6>
Comment (by senko):
I've updated my branch experimenting with runtests for parallel run. The
modified runtests can run sqlite, mysql and postgres tests in parallel
with no problems (these are all that I've tested). The only special thing
needed is to make sure the test database name is different in each worker
(which is easily done by using a randomized name in your test settings
file).
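A minimal sketch of what such a randomized name might look like in a test settings file. The helper `unique_test_name`, the prefix and the database names are illustrative assumptions, not taken from the branch. The `TEST` dictionary is the form used since Django 1.7; older versions used a `TEST_NAME` key instead.

```python
import uuid


def unique_test_name(prefix="test_django"):
    """Return a test database name with a random 8-character hex suffix."""
    return "%s_%s" % (prefix, uuid.uuid4().hex[:8])


DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "django",  # hypothetical database name
        # Evaluated once per process, so each worker gets its own test database.
        "TEST": {"NAME": unique_test_name()},
    }
}
```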
The major downside remaining that I can see is that the test output is not
nice (interspersed from N parallel workers), and could be unreadable if
you have a lot of errors. In the usual case where you want to quickly run
all the tests ("I'm done with this bit, let's check nothing else got
broken"), at least for me it's not so much of a problem.
Code (still) at: https://github.com/senko/django/tree/ticket_20461
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:7>
Comment (by aaugustin):
That last link is a 404 these days. senko, do you have the latest code
around?
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:8>
Comment (by aaugustin):
While Django is thread-safe in normal use, it isn't during tests because
`override_settings` has process-wide effects.
That's why parallelizing tests requires processes, not threads. (Knowing
this would have saved me some time.)
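To illustrate the point with a toy model (this is not Django's `override_settings`, just a stand-in with the same process-wide behaviour): an override applied in one thread is visible from every other thread while it is active.

```python
import threading
from contextlib import contextmanager

SETTINGS = {"DEBUG": False}  # stand-in for process-wide settings


@contextmanager
def override(**changes):
    """Toy analogue of override_settings: mutate shared state, restore on exit."""
    saved = dict(SETTINGS)
    SETTINGS.update(changes)
    try:
        yield
    finally:
        SETTINGS.clear()
        SETTINGS.update(saved)


def observe_from_other_thread():
    """Return the DEBUG value a second thread sees while an override is active."""
    seen = []
    with override(DEBUG=True):
        worker = threading.Thread(target=lambda: seen.append(SETTINGS["DEBUG"]))
        worker.start()
        worker.join()
    return seen[0]
```

Separate processes each hold their own copy of the module state, so an override in one worker cannot leak into a test running in another.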
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:9>
* owner: senko => aaugustin
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:10>
* cc: cmawebsite@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:11>
* needs_better_patch: 1 => 0
* has_patch: 0 => 1
Comment:
Pull request: https://github.com/django/django/pull/4063
Mailing list discussion:
https://groups.google.com/d/msg/django-developers/6locyZUxY8w/UbgusvfSz0IJ
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:12>
* needs_better_patch: 0 => 1
Comment:
Still a work in progress as far as I can tell.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:13>
* needs_better_patch: 1 => 0
Comment:
PR: https://github.com/django/django/pull/4761
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:14>
* needs_better_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:15>
* keywords: => 1.9
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:16>
* needs_better_patch: 1 => 0
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:17>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"b1a29541e5d37c03becde6c84e793766ef23395c" b1a29541]:
{{{
#!CommitTicketReference repository=""
revision="b1a29541e5d37c03becde6c84e793766ef23395c"
Merge pull request #4761 from aaugustin/parallelize-tests-attempt-1
Fixed #20461 -- Allowed running tests in parallel.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:18>
* status: closed => new
* resolution: fixed =>
Comment:
b1a2954 fails for me with Python 2.7 and 3.4 under Fedora 22 x64 using the
default sqlite configuration.
Stack traces are here:
https://gist.github.com/bak1an/434d884425f3354896b2
There are no old *.pyc files, {{{git status}}} is clean.
Tests finish successfully on the previous revision (acb8330).
Am I doing something wrong, or is something broken?
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:19>
* stage: Ready for checkin => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:20>
Comment (by bak1an):
Found it.
Looks like tblib is not optional here.
{{{ pip install tblib }}} helped.
{{{
Ran 10287 tests in 137.824s
OK (skipped=735, expected failures=7)
}}}
Perhaps this should be mentioned in docs and added to the requirements
file.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:21>
Comment (by collinanderson):
https://github.com/django/django/pull/5257
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:22>
Comment (by collinanderson):
adding tblib to requirements: https://github.com/django/django/pull/5258
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:23>
Comment (by bak1an):
Perhaps someone from the core team should close this ticket now.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:24>
Comment (by Tim Graham <timograham@…>):
In [changeset:"c97b755a1cdd7051270de6ac2605a8a1e87555fb" c97b755a]:
{{{
#!CommitTicketReference repository=""
revision="c97b755a1cdd7051270de6ac2605a8a1e87555fb"
Refs #20461 -- Fixed parallel test runner on Python 2.7.
textwrap.indent() is new in Python 3.3.
}}}
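For reference, one way to get the same behaviour on Python versions without `textwrap.indent()` is a small fallback like the one below. This is a sketch, not necessarily what the changeset does; `indent_compat` is a hypothetical name.

```python
import textwrap


def indent_compat(text, prefix):
    """Prefix every line that is not whitespace-only, like textwrap.indent()."""
    return "".join(
        prefix + line if line.strip() else line
        for line in text.splitlines(True)  # keep the line endings
    )


# Use the standard library version where it exists (Python 3.3+).
indent = getattr(textwrap, "indent", indent_compat)
```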
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:25>
* status: new => closed
* resolution: => fixed
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:26>
Comment (by Aymeric Augustin <aymeric.augustin@…>):
In [changeset:"968b02f8f0f5170d6ec69bca9d92b2c281417782" 968b02f]:
{{{
#!CommitTicketReference repository=""
revision="968b02f8f0f5170d6ec69bca9d92b2c281417782"
Refs #20461 -- Made tblib optional for a passing test run.
This was the original intent.
}}}
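The usual shape of an optional dependency is an import that falls back to `None`. The sketch below shows that general pattern only; `optional_import` is a hypothetical helper and this is not the code from the changeset.

```python
import importlib


def optional_import(name):
    """Return the named module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# None when tblib is not installed; callers check for that before using it.
tblib = optional_import("tblib")
```

Code that needs tblib can then check `tblib is None` and degrade gracefully instead of failing at import time.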
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:27>
Comment (by frankoid):
Is the naming strategy for extra test databases documented anywhere? I
tried to find this info in the docs but couldn't.
Based on get_test_db_clone_settings(self, number) in the code it looks
like the extra databases use the test database name as defined in the
DATABASES setting with _<number> appended.
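In other words, going by that reading of the code (a sketch only; `clone_name` is a hypothetical helper and the database name is made up):

```python
def clone_name(test_db_name, number):
    """Name of the clone used by worker `number`, per the reading above."""
    return "%s_%s" % (test_db_name, number)


# Names that four parallel workers would use for a test database
# called "test_mydjangodb".
clone_names = [clone_name("test_mydjangodb", n) for n in range(1, 5)]
```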
It might also be worth documenting how to set up permissions when using
MySQL. When I use Django with PostgreSQL, I grant the PostgreSQL user
used by Django permission to create databases (with any name), so I won't
need to change anything for parallel testing to work. However, when I use
MySQL, I usually grant the MySQL user used by Django permission on
specific databases, i.e. mydjangodb and mydjangodb_test, so I think I'll
need to grant more permissions for parallel testing to work. I'm not aware
of a way to grant permission on mydjangodb_test_* without granting
permission on all databases (including existing ones), but I haven't
researched this in depth.
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:28>
* cc: github.com@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/20461#comment:29>