[Django] #24558: django-admin.py dumpdata should be deterministic for VCS and diff friendliness


Django

Mar 31, 2015, 3:05:30 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
--------------------------------------------+--------------------
Reporter: gfairchild | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 1.7
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------------------+--------------------
I have several projects in which I like to store fixtures that are shared
between project instances (e.g., development and production instances).
Since fixtures are just text files, they're easily stored in version
control; I use git for all my projects.

The problem I'm experiencing is that the `dumpdata` management command
outputs data differently every time it runs (I'm on Django 1.7.7, so it
may be different on other versions of Django). I understand why it does
this: a lot of the data are stored in dicts or sets, and neither dicts nor
sets make any promises about ordering. This is a problem because, for
large fixtures, the VCS perceives significant changes. Git is pretty
smart, but if a 10 MB fixture is completely reordered, it can't actually
show me what's changed when I do a diff. Additionally, if I re-dump all
the data in my database, even if the data haven't changed, git will detect
that the files are different because all the data are in a different order.

The feature I'd like is for the `dumpdata` command to be deterministic;
that is, every time it runs, it should produce the same output for the
same input. This could even be an option that's turned off by default.
This will reduce VCS thrashing and improve the ability for us to diff
fixtures in order to understand what's actually changed.

I imagine this could fairly easily be solved by tossing a few `sorted`
statements in the right places, but I'm not sure.
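
For illustration only, the `sorted` idea can be approximated outside
Django by normalizing key order when the data is serialized. The sketch
below uses just the standard library `json` module; the
`dump_deterministic` helper and the sample records are hypothetical and
are not part of `dumpdata`.

```python
import json


def dump_deterministic(records):
    """Serialize records with sorted keys so equal data yields equal text."""
    return json.dumps(records, sort_keys=True, indent=2)


# The same data, built with different key insertion orders.
first = [{"pk": 1, "model": "swap.term",
          "fields": {"term": "a", "definition": "b"}}]
second = [{"fields": {"definition": "b", "term": "a"},
           "model": "swap.term", "pk": 1}]

# Plain dumps follows each dict's own key order, so the text differs.
plain_differs = json.dumps(first) != json.dumps(second)

# With sorted keys, both serialize to the same text.
sorted_matches = dump_deterministic(first) == dump_deterministic(second)
```

Sorting keys this way gives a stable diff, at the cost of losing any
meaningful field order in the output.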

--
Ticket URL: <https://code.djangoproject.com/ticket/24558>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Apr 1, 2015, 3:07:56 AM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Agreed that deterministic ordering is a desirable goal. Can you tell us
what sort of ordering changes you are seeing in your data? It should not
be in the objects themselves, as they are sorted by their primary key.
Is it an issue with `serializers.sort_dependencies`, which determines the
model ordering?

--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:1>

Django

Apr 1, 2015, 1:08:02 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by gfairchild):

* Attachment "terms1.json" added.

terms dump 1

Django

Apr 1, 2015, 1:08:11 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by gfairchild):

* Attachment "terms2.json" added.

terms dump 2

Django

Apr 1, 2015, 1:09:06 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: nobody
Type: New feature | Status: new
Component: Core (Management commands) | Version: 1.7
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by gfairchild):

Sure, so here's an example model I have in my `swap` app:

{{{
from django.db import models


class Term(models.Model):
    term = models.CharField(max_length=255)
    definition = models.TextField()
    reference = models.TextField()
}}}

I've attached the output from two consecutive dumps (`./manage.py dumpdata
swap.Term`).

You can see that the two files contain identical data, and as you say,
the objects are indeed sorted by their primary key. The ordering issue is
with the key-value pairs printed for each object: each object is just a
dictionary of keys and values, so no ordering is maintained.
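
One way to keep the per-object key order stable is to build each record
as an ordered mapping with a fixed field order. This is only a sketch
under that assumption: `FIELD_ORDER` and `to_record` are hypothetical
names, and this is not Django's serializer code.

```python
import json
from collections import OrderedDict

# Hypothetical: field names in the order they are declared on Term.
FIELD_ORDER = ("term", "definition", "reference")


def to_record(pk, values):
    """Build one fixture-style record with keys in a fixed order."""
    fields = OrderedDict((name, values[name]) for name in FIELD_ORDER)
    return OrderedDict([("model", "swap.term"), ("pk", pk), ("fields", fields)])


# The input dict's own key order does not affect the output order.
record = to_record(1, {"reference": "r", "term": "t", "definition": "d"})
text = json.dumps(record)
```

Unlike sorting keys alphabetically, this keeps fields in declaration
order, which tends to read more naturally in a fixture.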

--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:2>

Django

Apr 1, 2015, 3:41:09 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: charettes
Type: New feature | Status: assigned
Component: Core (Management commands) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* owner: nobody => charettes
* status: new => assigned
* version: 1.7 => master
* stage: Unreviewed => Accepted


--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:3>

Django

Apr 1, 2015, 3:46:00 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: charettes
Type: New feature | Status: assigned
Component: Core (Management commands) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 1 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* has_patch: 0 => 1
* needs_tests: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:4>

Django

Apr 1, 2015, 4:20:28 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: charettes
Type: New feature | Status: assigned
Component: Core (Management commands) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by charettes):

* needs_tests: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:5>

Django

Apr 2, 2015, 2:58:01 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: charettes
Type: New feature | Status: assigned
Component: Core (Management commands) | Version: master
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by claudep):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:6>

Django

Apr 2, 2015, 3:23:19 PM
to django-...@googlegroups.com
#24558: django-admin.py dumpdata should be deterministic for VCS and diff
friendliness
-------------------------------------+-------------------------------------
Reporter: gfairchild | Owner: charettes
Type: New feature | Status: closed
Component: Core (Management commands) | Version: master
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for checkin
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette <charette.s@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"5bc3123479bd97dc9d8a36fa9a3421a71063d1da" 5bc31234]:
{{{
#!CommitTicketReference repository=""
revision="5bc3123479bd97dc9d8a36fa9a3421a71063d1da"
Fixed #24558 -- Made dumpdata mapping ordering deterministic.

Thanks to gfairchild for the report and Claude for the review.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:7>
