[Django] #31051: Serialization dependency sorting disallows circular references unneccesarily

22 views
Skip to first unread message

Django

unread,
Dec 1, 2019, 6:15:59 PM12/1/19
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
------------------------------------------------+------------------------
Reporter: Matthijs Kooijman | Owner: nobody
Type: Bug | Status: new
Component: Core (Serialization) | Version: master
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
------------------------------------------------+------------------------
The `core.serialization.sort_dependencies()` function takes a list of apps
and/or models, and resolves this into a sorted flat list of models, ready
to be serialized in that order. This function is intended to make natural
foreign keys work, by serializing models referenced by a natural key
before the referencing models. When deserializing, this guarantees that
natural key references can be resolved, because there are no "forward
references". Furthermore, when a circular reference using natural keys is
present, this function raises an exception (e.g. "Can't resolve
dependencies for some_app.SomeModel in serialized app list") and prevents
serialization from completing, since there is no way to guarantee a model
ordering that will have no forward references.

Note that this ordering is *only* needed when natural keys are involved,
since data is intended to be loaded in a transaction without constraint
checks, so numerical foreign keys can be added in the wrong order, as long
as all referenced data is present at the end of the transaction. This does
not work with natural keys, since those are resolved by Python code that
needs the referenced objects present in the database to resolve them.

However, this sorting is not actually strictly necessary in all cases
where it is applied. When circular references are involved, this then
actually prevents serialization for no good reason. In particular, this is
the case:
- When running `dumpdata` without natural keys enabled (which is the
default). Even though natural keys might be defined in the models (which
causes the sorting and exception), no natural keys will be present in the
dumped data, so no ordering is needed.
- When dumping data intended for loading with `loaddata` (which I think
is the primary usecase for `dumpdata`?). `loaddata` will (since 17 months
ago in v2.2, see #26291) automatically handle forward references by
deferring setting fields that reference natural keys that are not added
yet. In this case, sorting is still useful, to prevent forward references
where possible, but when there are circular references, it is acceptable
to ignore some dependencies rather than preventing serialization from
happening alltogether.
- When serializing data for tests for `serialized_rollback=True` (in
`django.db.backends.base.creation.create_test_db`). This is a
serialization that does not use natural keys, so no ordering is needed at
all. Note that this serialization happens always (unlike deserialization
only happens with `serialized_rollback=True`), so AFAIU this effectively
prevents *any* tests from working on a database with circular references
with natural keys defined.

The fix for these issues seems to be rather simple:
- For dumpdata without `use_natural_foreign_keys`, skip the ordering and
just serialize all models in arbitrary order. AFAICS
`use_natural_primary_keys` is not relevant here, since that only controls
omitting the numerical primary key.
- For dumpdata *with* `use_natural_foreign_keys`, do the ordering but do
not bail out when there are circular references (instead just ignore some
dependencies and produce a best-effort ordering).
- For test database serialization, also skip the ordering and serialize
in arbitrary order.

Note that this would remove two of the three calls to
`sort_dependencies()` and allow loops in the last remaining instance. This
means that `sort_dependencies` could be modified to allow loops
unconditionally, or we could add an argument and default to disallowing
loops in case any code outside of django is using this function?

Note that #26552 is a related, but different issue, concerning the
*deserialization* of data in testcases.

I've been working on fixing this and that related issue today and have a
basic version working, with testcases (which proved to be quite a
challenge, since testing the test subsystem is a bit tricky...). I'll do
some additional testing and cleanup and submit a PR soon.


Also note that the circular-reference exception was already disabled for
self-referencing models in #16317. The fix for that issue simply ignores
self-referencing models for sorting, without taking any additional
measures to sort instances to prevent problems in deserialization (this
code was added when the deferred deserialization did not exist yet), so I
wonder how much value this exception still has.

--
Ticket URL: <https://code.djangoproject.com/ticket/31051>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Dec 1, 2019, 6:22:40 PM12/1/19
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Matthijs Kooijman):

* status: new => assigned
* cc: Matthijs Kooijman (added)
* owner: nobody => Matthijs Kooijman


--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:1>

Django

unread,
Dec 2, 2019, 9:12:59 AM12/2/19
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

Hi Matthijs. Yes, OK, let's accept this on the basis of your description —
Happy to see a PR tackling it!

> ...with testcases (which proved to be quite a challenge, since testing


the test subsystem is a bit tricky...).

Yes. That's the rub. :)

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:2>

Django

unread,
Dec 2, 2019, 10:15:59 AM12/2/19
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Matthijs Kooijman):

* has_patch: 0 => 1


Comment:

A PR is available at https://github.com/django/django/pull/12166. It is
still marked as draft, since there are still some issues with running the
tests, but review of the code, additions to the discussion in the PR and
maybe ideas on how to fix the remaining testsuite issues are welcome
already.

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:3>

Django

unread,
Jan 13, 2020, 2:10:43 AM1/13/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* needs_better_patch: 0 => 1


Comment:

Marking as `Patch needs improvement`, because it's a draft PR.

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:4>

Django

unread,
Jan 19, 2020, 12:36:12 PM1/19/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Matthijs Kooijman):

* needs_better_patch: 1 => 0


Comment:

The issues with the testcases have been resolved, so the patch is ready
for review.

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:5>

Django

unread,
Feb 17, 2020, 2:43:32 AM2/17/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by felixxm):

[https://github.com/django/django/pull/12458 New PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:6>

Django

unread,
Apr 2, 2020, 6:03:11 AM4/2/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"289d0ec6fd4c7e3d68b8a8e4f833e719e3dbe205" 289d0ec6]:
{{{
#!CommitTicketReference repository=""
revision="289d0ec6fd4c7e3d68b8a8e4f833e719e3dbe205"
Refs #31051 -- Fixed reloading the database with circular related objects
and natural keys for tests.

Made deserialize_db_from_string() do not sort dependencies.

deserialize_db_from_string() doesn't use natural keys, so there is no
need to sort dependencies in serialize_db_to_string(). Moreover,
sorting models cause issues for circular dependencies.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:7>

Django

unread,
Apr 2, 2020, 6:03:12 AM4/2/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"75520e1767abb167633b1936d7171169d6300911" 75520e17]:
{{{
#!CommitTicketReference repository=""
revision="75520e1767abb167633b1936d7171169d6300911"
Refs #31051 -- Optimized serialize_db_to_string() by avoiding creation of
models list.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:8>

Django

unread,
Apr 7, 2020, 10:03:30 AM4/7/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"481d8fc324545cc35b11311707ccebca6102838a" 481d8fc3]:
{{{
#!CommitTicketReference repository=""
revision="481d8fc324545cc35b11311707ccebca6102838a"
Refs #31051 -- Added test for loaddata/dumpdata with circular references
without natural keys.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:9>

Django

unread,
Apr 7, 2020, 2:45:09 PM4/7/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily.

-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: assigned
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:10>

Django

unread,
Apr 7, 2020, 3:21:59 PM4/7/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily.
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: closed

Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Mariusz Felisiak <felisiak.mariusz@…>):

In [changeset:"2e67e80fbe0accd5f256415ac28af8bd82eeaced" 2e67e80]:
{{{
#!CommitTicketReference repository=""
revision="2e67e80fbe0accd5f256415ac28af8bd82eeaced"
Refs #31051 -- Made dumpdata do not sort dependencies if natural foreign
keys are not used.

There is no need to sort dependencies when natural foreign keys are not
used.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:12>

Django

unread,
Apr 7, 2020, 3:22:00 PM4/7/20
to django-...@googlegroups.com
#31051: Serialization dependency sorting disallows circular references
unneccesarily.
-------------------------------------+-------------------------------------
Reporter: Matthijs Kooijman | Owner: Matthijs
| Kooijman
Type: Bug | Status: closed
Component: Core | Version: master
(Serialization) |
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"4f216e4f8ea9bac61948cb63066cd4747f6d05fb" 4f216e4f]:
{{{
#!CommitTicketReference repository=""
revision="4f216e4f8ea9bac61948cb63066cd4747f6d05fb"
Fixed #31051 -- Allowed dumpdata to handle circular references in natural
keys.

Since #26291 forward references in natural keys are properly handled by
loaddata, so sorting depenencies in dumpdata doesn't need to break on
cycles. This patch allows circular references in natural keys by
breaking sort_depenencies() on loops.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/31051#comment:11>

Reply all
Reply to author
Forward
0 new messages