- putting things in setUpClass rather than setUp
- treating the database as hot lava (at least, when you don't have ManyToMany relations)
- using sqlite3 in-memory
- using mock to emulate instances
- using nose to only run the tests you need
But I want to talk about another component of test performance: Disk I/O syscalls
My company (the startup accelerator MassChallenge) uses buildout to manage dependencies and the several apps modules within our appication. So I run tests from the root of the buildout by calling `bin/django test`, a script that looks like the following: (note the datetime printing)
```
#!/home/afarrell/projects/masschallenge/venv/bin/python
from datetime import datetime
print datetime.now().isoformat()
print "before test"
import sys
sys.path[0:0] = [
'/home/afarrell/projects/masschallenge/mc2013/eggs/MySQL_python-1.2.3-py2.7-linux-i686.egg',
... 38 other paths to various eggs...
'/home/afarrell/projects/masschallenge/venv/lib/python2.7/site-packages',
'/home/afarrell/projects/masschallenge/mc2013', #the root of our app
]
import djangorecipe.manage
if __name__ == '__main__':
sys.exit(djangorecipe.manage.main('mcproject.develop_settings'))
```
I am in the habit of writing a single test as I am writing a method. To do this, I use nose's attrib tag in a test file that looks like this:
```
from nose.plugins.attrib import attr
import nose.tools as nt
from django.test import TestCase
from datetime import datetime
class TestExample(Testcase):
@attr('now')
def test_trivial_case(self):
print datetime.now().isoformat()
print "test has begun"
nt.assert_equal(1 + 1, 2)
```
I then run tests with `bin/django test mentor_match -a'now' -s` and this prints out the following:
```
2014-03-05T13:46:02.774142
before test
nosetests --verbosity 1 mentor_match -anow -s
To reuse old database "test_02_06_18_29" for speed, set env var REUSE_DB=1.
Creating test database for alias 'default'...
2014-03-05T13:46:28.612895
test has begun
```
As you can see, there are about 26 seconds from the test command is issued and when the first test actually runs. This delay seems small but it is noticeable and kinda annoying when that is the only test. You could load, present, and fire a musket in that time. There are two possible causes that I can think of:
1) The test database takes a long time to set up
2) The test command takes a long time to import all required modules
However the if I `export REUSE_DB=1` and then run `bin/django test mentor_match -a'now' -s`, I see
2014-03-05T15:22:51.743338
before test
nosetests --verbosity 1 mc_allocate -s -anow
2014-03-05T15:23:15.813821
test has begun
which is still a difference of 24 seconds. Working on a linode, I often get notifications about the high disk i/o rate, so it makes sense that the cause is the latter. So, I turn to the sysadmin's microsope: strace.
With REUSE_DB not set, I run `strace -f bin/django test mentor_match -s -a'now' 2> tracelog`. I can now see all of the system calls made by the command, 67626. `cat tracelog | grep "open(" | wc -l` shows that 37635 of those are open commands.
total: 67626
open: 37635 56.7%
stat64: 15401 22.8%
read: 5362 7.9%
fstat64: 2619 3.9%
write: 2316 3.4%
poll: 2268 3.4%
close: 1754 2.6%
mmap2: 949 1.4%
munmap: 847 1.3%
Since there are far more open calls than there are read calls, and
there are 45847 lines containing "ENOENT (No such file or directory)" (86.4% of the open or stat64 calls), I think that django spends a lot of time looking for files in the wrong location.
I plan to write a blog post about this, but before I dig in, does anyone have any comments on the following approaches I'm considering:
- reordering the paths assigned to sys.path
- removing items from sys.path when I am running tests
- caching the locations where modules were previously found and looking there first
Has anyone looked at doing anything like this?
--
Andrew Farrell
Software Engineer | MassChallenge, Inc.
ONE Marina Park Drive | Boston, MA 02210
entorship, resources, coworking space. No equity taken.
Apply or refer great startups today!