TestCase.setUpTestData in-memory data isolation.

456 views
Skip to first unread message

charettes

unread,
Nov 23, 2018, 10:29:33 PM11/23/18
to Django developers (Contributions to Django itself)
Dear developers,

Django 1.8 introduced the `TestCase.setUpTestData()` class method as a mean to
speed up test fixtures initialization as compared to using `setUp()`[0].

As I've come to use this feature and review changes from peers using it in
different projects the fact that test data assigned during its execution
couldn't be safely altered by test methods without compromising test isolation
has often be the source of confusion and frustration.

While the `setUpTestData` documentation mentions this limitation[1] and ways to
work around it by using `refresh_from_db()` in `setUp()` I believe it defeats
the whole purpose of the feature; avoiding unnecessary roundtrips to the
database to speed up execution. Given `TestCase` goes through great lengths to
ensure database level data isolation I believe it should do the same with class
level in-memory data assigned during `setUpTestData`.

In order to get rid of this caveat of the feature I'd like to propose an
adjustment to ensure such in-memory test data isolation.

What I suggest doing is wrapping all attributes assigned during `setUpTestData`
in descriptors that lazily return `copy.deepcopy()`ed values on instance
attribute accesses. By attaching the `deepcopy()`'s memo on test instances we
can ensure that the reference graph between objects is preserved and thus
backward compatible.

In other words, the following test would pass even if `self.book` is a deep
copy of `cls.book`.

class BookTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        cls.author = Author.objects.create()
        cls.book = cls.author.books.create()

    def test_relationship_preserved(self):
        self.assertIs(self.book.author, self.author)

Lazily returning `deepcopy'ies and caching returned values in `__dict__` à la
`cached_property` should also make sure the slight performance overhead this
incurs is minimized.

From a check against a few projects and Django's test suite[2] I have only
identified a single issue which is that attributes assigned during
`setUpTestData` would now have to be `deepcopy()`able but it shouldn't be
a blocker given `Model` instance are.

In order to allow other possible issues from being identified against existing
projects I packaged the proposed feature[3] and made it available on pypi[4]. It
requires decorating `setUpTestData` methods but it shouldn't be too hard to
apply to your projects if you want to give it a try.

Given this reaches consensus that this could be a great addition I'd file
a ticket and finalize what I have so far[2].

Thank your for your time,
Simon

[0] https://docs.djangoproject.com/en/1.8/releases/1.8/#testcase-data-setup
[1] https://docs.djangoproject.com/en/2.1/topics/testing/tools/#django.test.TestCase.setUpTestData
[2] https://github.com/charettes/django/compare/setuptestdata...charettes:testdata
[3] https://github.com/charettes/django-testdata
[4] https://pypi.org/project/django-testdata/

Adam Johnson

unread,
Nov 25, 2018, 3:53:48 AM11/25/18
to django-d...@googlegroups.com
I have run into this problem myself in the past. On a previous project we added a helper function to make deepcopy's of named attributes during setUp().

From a check against a few projects and Django's test suite[2] I have only
identified a single issue which is that attributes assigned during
`setUpTestData` would now have to be `deepcopy()`able but it shouldn't be
a blocker given `Model` instance are.

I think this is too much of an ask for backwards compatibility. Lots of things aren't deepcopy-able, as per its docs:

This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types. It does “copy” functions and classes (shallow and deeply), by returning the original object unchanged; this is compatible with the way these are treated by the pickle module.

How about adding a container object to TestCase, which deepcopy()'s its attributes on setUp. It could be called something short like "data', which would make your example:

class BookTests(TestCase):
    @classmethod
    def setUpTestData(cls):
        cls.data.author = Author.objects.create()
        cls.data.book = cls.author.books.create()

    def test_relationship_preserved(self):
        self.assertIs(self.data.book.author, self.data.author)

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7563b4b6-2708-4614-a74a-2b63ecad2f67%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Adam

charettes

unread,
Nov 27, 2018, 3:59:34 AM11/27/18
to Django developers (Contributions to Django itself)
Thanks for chiming in Adam.

I think this is too much of an ask for backwards compatibility. Lots of things aren't deepcopy-able, as per its docs:

Right. Are these the kind of objects you assign during setUpTestData though? They could be easily moved to setUpTestCase for example.

How about adding a container object to TestCase, which deepcopy()'s its attributes on setUp. It could be called something short like "data', which would make your example:

That seems like a viable alternative.

My only concerns are that it requires adjusting all existing usages of setUpTestData to assign to cls.data instead. It looks like it could be easy to forget doing so until you actually hit a data isolation issue. That makes setUpTestData unsafe to use unless you assign to data. That also creates two data "world" where objects assigned to cls and cls.data lose their references so altering self.data.book.author wouldn't affect self.author if the latter is not assigned to data.

What about going through a deprecation period where non-picklable assignment during setUpTestData raises a deprecation warning suggesting using setUpClass or direct class attributes assignment? Another alternative could be to silently ignore deepcopy failures and return the original objects in these cases.

Simon

Adam Johnson

unread,
Nov 28, 2018, 4:50:51 PM11/28/18
to django-d...@googlegroups.com
Are these the kind of objects you assign during setUpTestData though?

They're not the kind of things *I* assign but I bet somewhere out there, some project does extensively :)

What about going through a deprecation period where non-picklable assignment during setUpTestData raises a deprecation warning suggesting using setUpClass or direct class attributes assignment?

I like this idea, it's actually likely to be easy to implement too (on top of the descriptor).


For more options, visit https://groups.google.com/d/optout.


--
Adam

Josh Smeaton

unread,
Nov 28, 2018, 9:40:53 PM11/28/18
to Django developers (Contributions to Django itself)
Our project also suffers extensively with mutating objects assigned from setUp, preventing us from moving most of our tests to setUpTestData. I'll likely begin using your pypi package right away, thanks Simon!

Backward compat issues are probably likely - but they'd be in test cases exclusively, making them extremely easy to find during an upgrade. That said, a deprecation warning is probably the most sensible path forward to prevent the need for immediate action.

Is there anyway to determine the pickle-ability of something without just trying to pickle it? I wouldn't be keen on that overhead. Could you just capture any copy exceptions, raise a deprecation warning, and abandon the copy for that attribute?

charettes

unread,
Nov 28, 2018, 11:42:32 PM11/28/18
to Django developers (Contributions to Django itself)
Hey Josh, glad the package can help in the mean time!

> Is there anyway to determine the pickle-ability of something without just trying to pickle it? I wouldn't be keen on that overhead.

Not that I'm aware off but unfortunately. There really shouldn't be much overhead though because the deepcopy is only performed on instance attribute access. That means that tests that are only creating test data without accessing attributes assigned during setUpTestData() are not going to be affected by this change at all. In other words, I suggest only doing the deep copy if required for the requested attributes so if you define a complex data set in setUpTestData() then only the few attributes and their related objects would get deepcopied on instance attribute accesses.

> Could you just capture any copy exceptions, raise a deprecation warning, and abandon the copy for that attribute?

Yeah that's the exact plan for the deprecation phase; warn and return the actual attribute. From Django 3.1 this would result in an AttributeError.

I've updated the branch with the deprecation warnings[0] approach.

I'll give it one or two more weeks to gather a bit more feedback but it looks like this could be a viable option so far.

Cheers,
Simon

Tobias McNulty

unread,
Nov 30, 2018, 11:15:51 AM11/30/18
to django-d...@googlegroups.com
Hi Simon,

Intriguing proposal! Thanks for bringing it up. This is certainly something I've struggled with on older projects we've continued to maintain.

My first impression (and it may be only that) is that this seems a big magical and potentially confusing, especially in the event the copy operation fails. I would want some way to disable the behavior, or even have "disabled" as the default, at least for a couple releases (e.g., by keeping it as a decorator the way you have it now). Alternatively, is there a reason this couldn't continue to live as an external package?

I'm also not sure I understand the argument about how the current implementation doesn't help reduce round trips to the database. Is this just a case of the docs offering a less-than-optimal recommendation? When tests do need to modify a record in the database that was created in setUpTestData(), it's easy enough to replace self.obj.foo = bar; self.obj.save() with Model.objects.filter(pk=self.obj.pk).update(foo=bar), which will avoid modifying the Python object and be at least as efficient as the original code, without the need for any additional overhead. I'll admit tracking down the offending code can be a pain, but it can be done.

Could we, for example, include something along these lines as the preferred approach in the docs, and/or even find a way to raise an error or warning when a test does modify an object that it shouldn't?

Tobias

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.

hunar techie

unread,
Nov 30, 2018, 11:25:36 AM11/30/18
to django-d...@googlegroups.com
hi there , im from kurdistan and my mother language is kurdish but unfortunately django doesnt support kurdish language , sometimes i need to add multi language to my projects , how can i add multi language? 
thanks for advice 

Adam Johnson

unread,
Nov 30, 2018, 1:03:38 PM11/30/18
to django-d...@googlegroups.com
Hunar - it's best to start a separate thread for different topics, and even better to check a support channel first, like the IRC channels. I think you might be looking for how to contribute translations - start here: https://docs.djangoproject.com/en/dev/internals/contributing/ . Or if you just need support as a user, did you want https://docs.djangoproject.com/en/2.1/topics/i18n/ ?

Tobias - using database updates isn't always viable. Also the system under test might need to take in the model instance, mutate it and execute its own save(), and then there's no way of rolling that back if the object wasn't deepcopy'd.


For more options, visit https://groups.google.com/d/optout.


--
Adam

Tobias McNulty

unread,
Dec 2, 2018, 7:43:14 PM12/2/18
to django-d...@googlegroups.com
On Fri, Nov 30, 2018, 1:03 PM Adam Johnson <m...@adamj.eu wrote:
Tobias - using database updates isn't always viable. Also the system under test might need to take in the model instance, mutate it and execute its own save(), and then there's no way of rolling that back if the object wasn't deepcopy'd

Fair enough, but for those cases, one could just as easily deepcopy the object in the test itself. Again, it's more explicit, gives the developer control over exactly what they want to happen, and makes failures more obvious.

I suppose if the API allowed one to mix and match approaches for different tests and model objects (like Simon's testdata app does right now), that would work, too.

Tobias

charettes

unread,
Dec 3, 2018, 7:12:50 AM12/3/18
to Django developers (Contributions to Django itself)
Hey Tobias, thanks for chiming in.

The main motivation behind this package and this core inclusion proposal is that I've come across a non-negligible number users that were surprised to find out this was not working automatically and others that completely stopped using setUpTestData because of this limitation (we're even doing this in the Django test suite under certain circumstances). Now these cases could effectively be solved by an explicit decoration of the setUpTestData method but it does feel like an hack, at least to me, that users shouldn't have to worry about given how TestCase was branded as a bulletproof test data isolation tool.

With that said I wouldn't really mind forcing the developer to opt-in through the use of a decorator like in the external package does if that's what the consensus is here. I just wanted to mention that I was having a hard time siding with the _too much magic_ argument given all the isolation _magic_ TestCase already performs.

Cheers,
Simon

charettes

unread,
Mar 22, 2020, 6:38:26 PM3/22/20
to Django developers (Contributions to Django itself)
For anyone interested I finally submitted a ticket[0] to add this feature to core since the third-party app[1]
was proven to be quite functional in all the projects I've used it so far and no issues were opened
against the package in the two years it has been around.

Looking forwards to see this feature included in the next versions of Django.

Cheers,
Simon

Adam Johnson

unread,
Mar 22, 2020, 6:56:01 PM3/22/20
to django-d...@googlegroups.com
+1 from me - I was thinking of this the other day.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.


--
Adam
Reply all
Reply to author
Forward
0 new messages