Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

[Django] #35967: TransactionTestCase.serialized_rollback reads from real database rather than test when using read replica for a model instance created in a migration with a ManyToManyField

20 views
Skip to first unread message

Django

unread,
Dec 3, 2024, 5:59:21 AM12/3/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Type: Bug
Status: new | Component: Database
| layer (models, ORM)
Version: dev | Severity: Normal
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
(Yes, this is a rather specific bug...)

When:

1. Using a database router to create a read-replica database, configured
as a `MIRROR` in tests, and
2. Using `TransactionTestCase.serialized_rollback`, and
3. Having a model instance created in a migration which has a `ManyToMany`
field

The serializer for `serialized_rollback` tries to read from the non-test
database. If that database doesn't exist yet (for example, in CI), this
throws an error:

{{{
django.db.utils.OperationalError: no such table: auth_user
}}}

If migrations are run (`manage.py migrate`), thus creating the tables for
the non-test database, tests pass correctly. Prooving it's reading from
the wrong connection.

I've created a [https://github.com/RealOrangeOne/django-db-serialize-repro
minimal reproduction] of this issue, and confirmed it happens on SQLite,
PostgreSQL and Django 4.2, 5.0, 5.1 and `main`
--
Ticket URL: <https://code.djangoproject.com/ticket/35967>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Dec 3, 2024, 7:42:04 AM12/3/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ryan Cheley):

* cc: Ryan Cheley (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:1>

Django

unread,
Dec 5, 2024, 8:17:43 AM12/5/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: (none)
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* cc: Jacob Walls (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:2>

Django

unread,
Dec 7, 2024, 6:13:17 PM12/7/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: (none)
Type: Bug | Status: new
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* component: Database layer (models, ORM) => Core (Serialization)

Comment:

Thanks for the crystal clear reproduction. I haven't tested to be sure,
but it certainly looks a lot like #23979.

This hack passes your test case, but I'm not going to opine on whether
it's correct; I just put this together as a learning exercise and to
verify my hunch that this was more to do with the serializer and less to
do with the testing framework. If someone more knowledgable can opine on
correctness and backwards compatibility, I'm happy to push this forward.

{{{#!diff
diff --git a/django/core/serializers/base.py
b/django/core/serializers/base.py
index 1fbca9244b..f2f0d4d8d6 100644
--- a/django/core/serializers/base.py
+++ b/django/core/serializers/base.py
@@ -5,7 +5,7 @@ Module for abstract serializer/unserializer base classes.
from io import StringIO

from django.core.exceptions import ObjectDoesNotExist
-from django.db import models
+from django.db import models, DEFAULT_DB_ALIAS

DEFER_FIELD = object()

@@ -91,6 +91,7 @@ class Serializer:
use_natural_primary_keys=False,
progress_output=None,
object_count=0,
+ using=DEFAULT_DB_ALIAS,
**options,
):
"""
@@ -141,7 +142,7 @@ class Serializer:
self.selected_fields is None
or field.attname in self.selected_fields
):
- self.handle_m2m_field(obj, field)
+ self.handle_m2m_field(obj, field, using=using)
self.end_object(obj)
progress_bar.update(count)
self.first = self.first and False
@@ -192,7 +193,7 @@ class Serializer:
"subclasses of Serializer must provide a handle_fk_field()
method"
)

- def handle_m2m_field(self, obj, field):
+ def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
"""
Called to handle a ManyToManyField.
"""
diff --git a/django/core/serializers/python.py
b/django/core/serializers/python.py
index 57edebbb70..93898d3801 100644
--- a/django/core/serializers/python.py
+++ b/django/core/serializers/python.py
@@ -64,7 +64,7 @@ class Serializer(base.Serializer):
value = self._value_from_field(obj, field)
self._current[field.name] = value

- def handle_m2m_field(self, obj, field):
+ def handle_m2m_field(self, obj, field, *, using=DEFAULT_DB_ALIAS):
if field.remote_field.through._meta.auto_created:
if self.use_natural_foreign_keys and hasattr(
field.remote_field.model, "natural_key"
@@ -86,6 +86,7 @@ class Serializer(base.Serializer):
getattr(obj, field.name)
.select_related(None)
.only("pk")
+ .using(using)
.iterator()
)

diff --git a/django/db/backends/base/creation.py
b/django/db/backends/base/creation.py
index 6856fdb596..8adfa0d7ca 100644
--- a/django/db/backends/base/creation.py
+++ b/django/db/backends/base/creation.py
@@ -142,7 +142,9 @@ class BaseDatabaseCreation:

# Serialize to a string
out = StringIO()
- serializers.serialize("json", get_objects(), indent=None,
stream=out)
+ serializers.serialize(
+ "json", get_objects(), indent=None, stream=out,
using=self.connection.alias
+ )
return out.getvalue()

def deserialize_db_from_string(self, data):
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:3>

Django

unread,
Dec 8, 2024, 2:04:36 PM12/8/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Jacob
| Walls
Type: Bug | Status: assigned
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* owner: (none) => Jacob Walls
* stage: Unreviewed => Accepted
* status: new => assigned

Comment:

I'm looking into something a little more backwards compatible.
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:4>

Django

unread,
Dec 9, 2024, 12:01:37 AM12/9/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Jacob
| Walls
Type: Bug | Status: assigned
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> If someone more knowledgable can opine on correctness and backwards
compatibility, I'm happy to push this forward.

Hey Jacob I admittedly haven't looked into the issue in depth but I think
that forcing the usage of the current alias like you did here is a
certainly a key towards the solution.

I have a hunch that the actual problems lives in
`djang.test.utils.setup_databases` though and particularly how
`BaseDatabaseCreation.create_test_db` is implemented. `setup_databases`
currently follows this sequence of operations for each `DATABASES` entries

1. Create the test db replacement
2. Repoint `settings.DATABASES[alias][name]` to the test db replacement
3. Peform migrations
4. Serialize content

I think that instead what it should do is 1, 2, 3 for each `DATABASES`
entries (making sure that all the test databases are setup) and then do 4
for each of them. Something like

{{{#!diff
diff --git a/django/db/backends/base/creation.py
b/django/db/backends/base/creation.py
index 6856fdb596..7a0e2a0622 100644
--- a/django/db/backends/base/creation.py
+++ b/django/db/backends/base/creation.py
@@ -91,6 +91,7 @@ def create_test_db(
# who are testing on databases without transactions or who are
using
# a TransactionTestCase still get a clean database on every test
run.
if serialize:
+ # XXX: Emit a deprecation warnings when `serialize` is
provided.
self.connection._test_serialized_contents =
self.serialize_db_to_string()

call_command("createcachetable", database=self.connection.alias)
diff --git a/django/test/utils.py b/django/test/utils.py
index ddb85127dc..a2fe8b14cc 100644
--- a/django/test/utils.py
+++ b/django/test/utils.py
@@ -189,6 +189,7 @@ def setup_databases(
test_databases, mirrored_aliases =
get_unique_databases_and_mirrors(aliases)

old_names = []
+ serialize_connections = []

for db_name, aliases in test_databases.values():
first_alias = None
@@ -200,15 +201,13 @@ def setup_databases(
if first_alias is None:
first_alias = alias
with time_keeper.timed(" Creating '%s'" % alias):
- serialize_alias = (
- serialized_aliases is None or alias in
serialized_aliases
- )
connection.creation.create_test_db(
verbosity=verbosity,
autoclobber=not interactive,
keepdb=keepdb,
- serialize=serialize_alias,
)
+ if serialized_aliases is None or alias in
serialized_aliases:
+ serialize_connections.append(connection)
if parallel > 1:
for index in range(parallel):
with time_keeper.timed(" Cloning '%s'" % alias):
@@ -223,6 +222,13 @@ def setup_databases(
connections[first_alias].settings_dict
)

+ # Serialize content of test databases only once all of them are setup
+ # to account for database routing during serialization.
+ for serialize_connection in serialize_connections:
+ serialize_connection._test_serialized_contents = (
+ serialize_connection.serialize_db_to_string()
+ )
+
# Configure the test mirrors.
for alias, mirror_alias in mirrored_aliases.items():
connections[alias].creation.set_as_test_mirror(
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:5>

Django

unread,
Dec 9, 2024, 12:02:53 AM12/9/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Jacob
| Walls
Type: Bug | Status: assigned
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:6>

Django

unread,
Dec 10, 2024, 10:20:45 PM12/10/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Jacob
| Walls
Type: Bug | Status: assigned
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

Thanks for the idea, I had a look. Turns out minimally adjusting comment:5
to run (by adding `.creation` in one of a couple places) doesn't fix OP's
reproduction. It also doesn't pass the unit test I had written to go with
comment:3 below, but that is to be expected since the test has nothing to
do with `setup_databases()`. (It does pass with comment:3.)

{{{#!diff
diff --git a/tests/backends/base/test_creation.py
b/tests/backends/base/test_creation.py
index 7e760e8884..84eb3f4a5f 100644
--- a/tests/backends/base/test_creation.py
+++ b/tests/backends/base/test_creation.py
@@ -172,6 +172,7 @@ class TestDbCreationTests(SimpleTestCase):

class TestDeserializeDbFromString(TransactionTestCase):
available_apps = ["backends"]
+ databases = {"default", "other"}

def test_circular_reference(self):
# deserialize_db_from_string() handles circular references.
@@ -267,6 +268,24 @@ class
TestDeserializeDbFromString(TransactionTestCase):
self.assertIn('"model": "backends.schoolclass"', data)
self.assertIn(f'"schoolclasses": [{sclass.pk}]', data)

+ def test_serialize_db_to_string_with_m2m_field_and_router(self):
+ class OtherRouter:
+ def db_for_read(self, model, **hints):
+ return "other"
+
+ with override_settings(DATABASE_ROUTERS=[OtherRouter()]):
+ obj1 = Object.objects.create()
+ obj2 = Object.objects.create()
+ obj2.related_objects.set([obj1])
+ with
mock.patch("django.db.migrations.loader.MigrationLoader") as loader:
+ # serialize_db_to_string() serializes only migrated apps,
so mark
+ # the backends app as migrated.
+ loader_instance = loader.return_value
+ loader_instance.migrated_apps = {"backends"}
+ data = connection.creation.serialize_db_to_string()
+ self.assertIn(f'"related_objects": [{obj1.pk}]', data)
+ # Test serialize() directly, in all four cases (json/xml, natural
key/without)
+

class SkipTestClass:
def skip_function(self):
}}}

My plan was to repeat these tests with plain calls to `serialize()` in
tests/serializers and test all four cases (json vs. xml, natural key vs.
without). If this test is fair, wouldn't this be an issue with the
serializers? Maybe I'm missing a reason this test isn't realistic.
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:7>

Django

unread,
Dec 11, 2024, 12:23:59 AM12/11/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Jacob
| Walls
Type: Bug | Status: assigned
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

Sorry for sending on a wild goose chase Jacob, the patch above was meant
to be a draft that needs tweaking and not a final solution.

Once adjusted like the following it addresses the problem reported by Jake

{{{#!diff
diff --git a/django/test/utils.py b/django/test/utils.py
index ddb85127dc..7d66140efa 100644
--- a/django/test/utils.py
+++ b/django/test/utils.py
@@ -189,6 +189,7 @@ def setup_databases(
test_databases, mirrored_aliases =
get_unique_databases_and_mirrors(aliases)

old_names = []
+ serialize_connections = []

for db_name, aliases in test_databases.values():
first_alias = None
@@ -200,15 +201,14 @@ def setup_databases(
if first_alias is None:
first_alias = alias
with time_keeper.timed(" Creating '%s'" % alias):
- serialize_alias = (
- serialized_aliases is None or alias in
serialized_aliases
- )
connection.creation.create_test_db(
verbosity=verbosity,
autoclobber=not interactive,
keepdb=keepdb,
- serialize=serialize_alias,
+ serialize=False,
)
+ if serialized_aliases is None or alias in
serialized_aliases:
+ serialize_connections.append(connection)
if parallel > 1:
for index in range(parallel):
with time_keeper.timed(" Cloning '%s'" % alias):
@@ -229,6 +229,13 @@ def setup_databases(
connections[mirror_alias].settings_dict
)

+ # Serialize content of test databases only once all of them are setup
+ # to account for database mirroring and routing during serialization.
+ for serialize_connection in serialize_connections:
+ serialize_connection._test_serialized_contents = (
+ serialize_connection.creation.serialize_db_to_string()
+ )
+
if debug_sql:
for alias in connections:
connections[alias].force_debug_cursor = True
}}}

The above include three tweaks that were not included in comment:5

1. It explicitly pass `serialize=False` to `create_test_db` as we want to
defer this operation to a time when *all* connections are repointed to
test databases
2. Use `.creation.serialize_db_to_string` as you've noticed
3. Make sure to perform serialization only once mirrors have been
appropriately setup

I want to re-iterate that what appears to be the problem here, at least to
me, is less about the routing of queries during serialization and more
that we attempt to perform any form of reads against non-test databases
before `DATABASES` entries are **all** re-pointed to their test
equivalent.
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:8>

Django

unread,
Dec 11, 2024, 7:17:40 AM12/11/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-----------------------------------+---------------------------------------
Reporter: Jake Howard | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Testing framework | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+---------------------------------------
Changes (by Jacob Walls):

* component: Core (Serialization) => Testing framework

Comment:

Perfect, thanks for clarifying.
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:9>

Django

unread,
Dec 18, 2024, 12:02:26 AM12/18/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-----------------------------------+---------------------------------------
Reporter: Jake Howard | Owner: Jacob Walls
Type: Bug | Status: assigned
Component: Testing framework | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-----------------------------------+---------------------------------------
Changes (by Simon Charette):

* has_patch: 0 => 1

Comment:

Jacob & Jake, I submitted [https://github.com/django/django/pull/18949
this PR] for review and I'd like to have your thoughts on it. I didn't
manage to create an exact integration test as it would require creating a
''nested'' test database and ensure that no database queries can be
performed against the testing database but it seems the chosen approach in
there so far has been to rely heavily on mocking.
--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:10>

Django

unread,
Dec 18, 2024, 9:03:14 AM12/18/24
to django-...@googlegroups.com
#35967: TransactionTestCase.serialized_rollback reads from real database rather
than test when using read replica for a model instance created in a
migration with a ManyToManyField
-------------------------------------+-------------------------------------
Reporter: Jake Howard | Owner: Simon
| Charette
Type: Bug | Status: assigned
Component: Testing framework | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* owner: Jacob Walls => Simon Charette

--
Ticket URL: <https://code.djangoproject.com/ticket/35967#comment:11>
Reply all
Reply to author
Forward
0 new messages