Easier way to populate test databases for parallel tests (patch in github)

Marcos Diez

Apr 28, 2017, 8:45:55 PM

to Django developers (Contributions to Django itself)

Ticket: https://code.djangoproject.com/ticket/28153

Pull Request: https://github.com/django/django/pull/8437

Although Django makes it very easy to extend django.test.runner.DiscoverRunner, its setup_databases() method does too much.

Currently, it:

- creates all the test databases (for single thread unit tests)
- duplicates all the test databases (in case of parallel unit tests)

If I am not running tests in parallel, I can simply populate the DB after setup_databases() runs, without any issues.

But if I care about my time and want to run tests in parallel, I can either:

a) populate my data after setup_databases() is executed, once for each thread of the parallel tests, which is slow
b) get my hands dirty and reimplement setup_databases()

I propose (and I am sending the code to do so) a better solution. We just have to split setup_databases() into three methods:

DiscoverRunner.prepare_databases()
DiscoverRunner.populate_databases() # noop by default
DiscoverRunner.duplicate_databases_if_necessary()

The idea is quite simple: to stay backward compatible, setup_databases() will still exist, but it will only call the three methods above, in that order.

The first method will create all the test databases needed for non-parallel tests to run.

populate_databases(), which is a no-op by default, can be overridden by anyone who extends django.test.runner.DiscoverRunner so that their data gets populated.

Afterwards, DiscoverRunner.duplicate_databases_if_necessary() copies all the test DBs as many times as necessary when tests run in parallel.
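
As a rough illustration of the proposed split (a sketch only, not the code from the pull request): SketchRunner below is a hypothetical stand-in for DiscoverRunner that records what it would do instead of touching a real database. The method signatures, the database name and the clone names are all assumptions made for the example.

```python
class SketchRunner:
    """Hypothetical stand-in for DiscoverRunner; it only records its steps."""

    def __init__(self, parallel=1):
        self.parallel = parallel  # number of parallel test processes
        self.calls = []           # log of the steps taken, for illustration

    def prepare_databases(self):
        # Step 1: create the test databases needed for a non-parallel run.
        self.calls.append("prepare")
        return ["test_default"]  # assumed database name

    def populate_databases(self, databases):
        # Step 2: no-op by default; subclasses override it to load data once.
        pass

    def duplicate_databases_if_necessary(self, databases):
        # Step 3: clone each populated database once per parallel process.
        if self.parallel > 1:
            for index in range(1, self.parallel + 1):
                for name in databases:
                    self.calls.append(f"clone {name}_{index}")

    def setup_databases(self):
        # Backward-compatible entry point: runs the three steps in order.
        databases = self.prepare_databases()
        self.populate_databases(databases)
        self.duplicate_databases_if_necessary(databases)
        return databases


class FixtureRunner(SketchRunner):
    def populate_databases(self, databases):
        # Runs once, before cloning, so every clone starts with the same data.
        for name in databases:
            self.calls.append(f"populate {name}")


runner = FixtureRunner(parallel=2)
runner.setup_databases()
print(runner.calls)
```

With parallel=2 the log reads prepare, populate test_default, clone test_default_1, clone test_default_2: the data is loaded once and only then copied.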

I believe this change to Django has no downside, is backward compatible, and will help people who need to populate real data in the DB for their tests.

Thanks

Marcos Diez

Tim Graham

Apr 28, 2017, 8:50:17 PM

to Django developers (Contributions to Django itself)

I would expect test data population to happen using migrations rather than in the test runner. Can you elaborate on your use case and say if that method would be unsuitable?

Shai Berger

Apr 29, 2017, 4:39:50 AM

to django-d...@googlegroups.com

On Saturday 29 April 2017 03:50:16 Tim Graham wrote:
> I would expect test data population to happen using migrations rather than
> in the test runner. Can you elaborate on your use case and say if that
> method would be unsuitable?
>

Apparently, many people think that migrations are the wrong tool for this job.

See previous discussion, which didn't seem to go anywhere:

https://groups.google.com/d/msg/django-developers/Ln1-IqysEwE/DuyZl7QkEwAJ

Have fun,
Shai.

Adam Johnson

Apr 29, 2017, 5:07:35 AM

to django-d...@googlegroups.com

Avoiding migrations, one can populate test data with a post_migrate signal handler. django.contrib.contenttypes already does this to fill the DB with content types, see https://github.com/django/django/blob/c651331b34b7c3841c126959e6e52879bc6f0834/django/contrib/contenttypes/apps.py#L18 . To do it during tests only you could have a condition to register said handler.
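
A minimal sketch of that idea, under stated assumptions: the function names, the fixture name "test_data.json" and the sys.argv check for detecting a test run are all hypothetical choices, not anything Django prescribes. The Django imports are done lazily so that the module stays importable outside a Django project.

```python
import sys


def running_tests(argv=None):
    # Assumption: tests are started with "manage.py test"; adapt as needed.
    argv = sys.argv if argv is None else argv
    return len(argv) > 1 and argv[1] == "test"


def load_test_fixtures(sender, **kwargs):
    # post_migrate passes the database alias as the "using" keyword argument.
    from django.core.management import call_command

    call_command(
        "loaddata",
        "test_data.json",  # hypothetical fixture name
        database=kwargs.get("using"),
        verbosity=0,
    )


def register_fixture_loader(app_config, argv=None):
    # Connect the handler only when a test run is detected.
    if not running_tests(argv):
        return False
    from django.db.models.signals import post_migrate

    post_migrate.connect(load_test_fixtures, sender=app_config)
    return True
```

One would typically call register_fixture_loader(self) from the ready() method of an app's AppConfig, the same place where contenttypes connects its own handler.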

--

Adam

Marcos Diez

Apr 29, 2017, 8:36:11 AM

to Django developers (Contributions to Django itself)

I believe I was not clear.

I do use migrations to populate Enums and other data that should also be available in production.

The code I am sending is for loading fixtures into the database.

This way all tests can assume the same set of data, and all the fixtures are loaded in one place, which makes sense for my use case.

The advantage of the method I am proposing is that it is quite fast: data is loaded into the DB only once and then duplicated in bulk by the DBMS, as many times as necessary, when tests run in parallel.

Another, unexpected, convenience of my method is that a developer who uses Django code to generate fixtures in the database does not have to worry about whether that code has side effects when tests run in parallel, because the data generation code runs only once.

Actually, if I may ask, how else would one load bunches of fixtures into the DB and run tests in parallel without my PR?

Adam Johnson

Apr 29, 2017, 6:40:18 PM

to django-d...@googlegroups.com

> Actually, if I may ask, how else would one load bunches of fixtures in the DB
> and run tests in parallel without my PR?

As I said, register a post_migrate handler during testing that loads your data. It will run during the creation of the first database in connection.creation.create_test_db, as part of call_command('migrate'), before the test runner code clones the database for parallel execution. There's no need to change Django to support this.

Another option is to extend your database backend, override creation.create_test_db, and add your logic there.

By the way, I think it's the general opinion that tests are best without a large generic fixture available to them. It's certainly been my experience, as it makes it very hard to later understand what data a specific test does or does not rely upon, and if the data can be updated safely. The tool I prefer for test data generation is factory boy ( https://factoryboy.readthedocs.io/en/latest/ ) which can be used to create data per test method or class, without having to laboriously specify every field of every model.
