[Django] #36750: dumpdata's output of m2m values is not deterministic

18 views
Skip to first unread message

Django

unread,
Nov 21, 2025, 12:59:02 PMNov 21
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Type: Bug
Status: new | Component: Core
| (Serialization)
Version: dev | Severity: Normal
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Recent failure of `main-random` CI job reveals that dumpdata's output of
m2m values is not deterministic:

{{{
./runtests.py fixtures --shuffle 7825542710 --settings=test_postgres -k
forward -v2
}}}

{{{
test_forward_reference_fk
(fixtures.tests.ForwardReferenceTests.test_forward_reference_fk) ... ok
test_forward_reference_m2m
(fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m) ... ok
test_forward_reference_fk_natural_key
(fixtures.tests.ForwardReferenceTests.test_forward_reference_fk_natural_key)
... ok
test_forward_reference_m2m_natural_key
(fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key)
... FAIL

======================================================================
FAIL: test_forward_reference_m2m_natural_key
(fixtures.tests.ForwardReferenceTests.test_forward_reference_m2m_natural_key)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/jwalls/django/tests/fixtures/tests.py", line 1308, in
test_forward_reference_m2m_natural_key
self._dumpdata_assert(
~~~~~~~~~~~~~~~~~~~~~^
["fixtures"],
^^^^^^^^^^^^^
...<8 lines>...
natural_foreign_keys=True,
^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "/Users/jwalls/django/tests/fixtures/tests.py", line 127, in
_dumpdata_assert
self.assertJSONEqual(command_output, output)
~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: Lists differ: [{'mo[94 chars] [['t3'], ['t2']]}},
{'model': 'fixtures.natur[179 chars][]}}] != [{'mo[94 chars] [['t2'],
['t3']]}}, {'model': 'fixtures.natur[179 chars][]}}]

First differing element 0:
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t3'],
['t2']]}}
{'mod[49 chars]: 't1', 'other_thing': None, 'other_things': [['t2'],
['t3']]}}

[{'fields': {'key': 't1',
'other_thing': None,
- 'other_things': [['t3'], ['t2']]},
? --------

+ 'other_things': [['t2'], ['t3']]},
? ++++++++

'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't2', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'},
{'fields': {'key': 't3', 'other_thing': None, 'other_things': []},
'model': 'fixtures.naturalkeything'}]

----------------------------------------------------------------------
Ran 4 tests in 0.071s

FAILED (failures=1)
Used shuffle seed: 7825542710 (given)
}}}

----
In the spirit of #24558 and #10381, I'm suggesting to make this
deterministic instead of fiddling with the test.

A naive diff just adding ordering by "pk" fixes the failure ...:

{{{#!py
diff --git a/django/core/serializers/python.py
b/django/core/serializers/python.py
index 2929874b01..496d3a5ba6 100644
--- a/django/core/serializers/python.py
+++ b/django/core/serializers/python.py
@@ -74,7 +74,7 @@ class Serializer(base.Serializer):
return value.natural_key()

def queryset_iterator(obj, field):
- attr = getattr(obj, field.name)
+ attr = getattr(obj, field.name).order_by("pk") # !!!
not right
chunk_size = (
2000 if getattr(attr, "prefetch_cache_name",
None) else None
)
}}}

... but to be mergeable, the patch would need to:
- get the default ordering from somewhere better (through model? target
model?), instead of hardcoding "pk"
- check for all locations in the python and xml serializers that might
need this change
--
Ticket URL: <https://code.djangoproject.com/ticket/36750>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Nov 22, 2025, 4:56:56 AM (14 days ago) Nov 22
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
-------------------------------------+-------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core | Version: dev
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by VIZZARD-X):

I tried to reproduce this issue on the current main branch using the same
shuffle seed and parameters from the report:
python tests/runtests.py fixtures --shuffle 7825542710
--settings=tests.test_sqlite -k forward -v2

All four forward_reference tests pass, including:
test_forward_reference_m2m_natural_key

Given this, I’m unable to reproduce the nondeterministic M2M natural key
ordering on the latest code. It appears the issue may already have been
resolved by recent changes in the serializers or related refactors.

Could the ticket be reviewed and possibly closed if others also cannot
reproduce it?
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:1>

Django

unread,
Nov 22, 2025, 9:57:10 AM (13 days ago) Nov 22
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Simon Charette):

* stage: Unreviewed => Accepted

Comment:

@Jacob

> get the default ordering from somewhere better (through model? target
model?), instead of hardcoding "pk"

Since the objects will be filtered by `from_model_id = ?` I think that
ordering by `to_model_id` (AKA ''target model'') is a good candidate.

@VIZZARD-X

> Given this, I’m unable to reproduce the nondeterministic M2M natural key
ordering on the latest code. It appears the issue may already have been
resolved by recent changes in the serializers or related refactors.

It might be finicky to reproduce as it entirely depends on the order the
database decides to returns objects which is backend dependent.

> `python tests/runtests.py fixtures --shuffle 7825542710
--settings=tests.test_sqlite -k forward -v2`

The report explicitly state that the issue was reproduced against Postgres
(see `--settings=test_postgres`) and you tried reproducing agaisnt SQLite
so this is highly likely why you can't reproduce. In most cases SQLite
will objects in their order of creation while in the case of Postgres more
factors come into play.

> Given this, I’m unable to reproduce the nondeterministic M2M natural key
ordering on the latest code. It appears the issue may already have been
resolved by recent changes in the serializers or related refactors.

Which recent changes are you referring to? The problem was observed less
than a day ago (see ticket creation date) and no change to serialization
ordering landed on `main` since then.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:2>

Django

unread,
Nov 22, 2025, 11:23:12 AM (13 days ago) Nov 22
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Comment (by Ahmed Asar):

Hi, I’d like to work on this issue and submit a patch. I’m still learning
Django core, but I want to try reproducing the test failure and
understanding the m2m serialization behavior.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:3>

Django

unread,
Nov 22, 2025, 12:41:02 PM (13 days ago) Nov 22
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Comment (by VIZZARD-X):

Thanks, Simon — that makes sense.

I realized my earlier attempt used tests.test_sqlite, which explains why
the issue didn’t reproduce.
Running the same test suite against Postgres: python tests/runtests.py
fixtures --shuffle 7825542710 --settings=tests.test_postgres -k forward
-v2

I can now reproduce the nondeterministic ordering exactly as reported by
@Jacob.

I’ll work on a patch that:

- orders the M2M related manager by the target model’s primary key (as you
suggested),
- updates both Python + XML serializers, and
- adds a test for deterministic natural key ordering.

I’ll submit a PR shortly.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:4>

Django

unread,
Nov 22, 2025, 1:09:36 PM (13 days ago) Nov 22
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Comment (by Simon Charette):

> orders the M2M related manager by the target model’s primary key (as you
suggested),

That's not what I suggested though, it should be the local field field
(from the many-to-many model's perspective) pointing at the target /
remote model.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:5>

Django

unread,
Nov 24, 2025, 4:19:50 AM (12 days ago) Nov 24
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Comment (by VIZZARD-X):

Updated patch and opened a new PR after refreshing my fork:

https://github.com/django/django/pull/20308

This version applies database-level primary key ordering for natural-key
many-to-many relations, per the guidance from Jacob and Simon. It avoids
Python-side sorting and only applies DB ordering when:

- natural foreign keys are enabled,
- the related model defines natural_key(), and
- the model has no explicit Meta.ordering.

All serializer tests, including PostgreSQL with --shuffle, now pass
consistently. The change is limited to Python and XML serializers.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:6>

Django

unread,
Nov 24, 2025, 4:26:25 AM (12 days ago) Nov 24
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by VIZZARD-X):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:7>

Django

unread,
Dec 3, 2025, 3:06:27 PM (2 days ago) Dec 3
to django-...@googlegroups.com
#36750: dumpdata's output of m2m values is not deterministic
--------------------------------------+------------------------------------
Reporter: Jacob Walls | Owner: (none)
Type: Bug | Status: new
Component: Core (Serialization) | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
--------------------------------------+------------------------------------
Changes (by Jacob Walls):

* needs_better_patch: 0 => 1

Comment:

Will look after the merge conflicts are fixed.
--
Ticket URL: <https://code.djangoproject.com/ticket/36750#comment:8>
Reply all
Reply to author
Forward
0 new messages