The problem I'm experiencing is that the `dumpdata` management command
outputs data differently every time it runs (I'm on Django 1.7.7, so it
may be different on other versions of Django). I understand why it does
this - a lot of the data are stored in dicts or sets, and dicts and sets
make no promises about ordering. This is a problem because, for large
fixtures, the VCS can perceive significant changes where there are none.
Git is pretty smart, but if a 10 MB fixture is
completely reordered, it can't actually show me what's changed when I do a
diff. Additionally, if I re-dump all the data in my database, even if the
data haven't changed, git will detect that the files are different because
all the data are in a different order.
The feature I'd like is for the `dumpdata` command to be deterministic;
that is, every time it runs, it should produce the same output for the
same input. This could even be an option that's turned off by default.
This will reduce VCS thrashing and improve the ability for us to diff
fixtures in order to understand what's actually changed.
I imagine this could fairly easily be solved by tossing a few `sorted`
calls in the right places, but I'm not sure.
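As a rough illustration of the sorting idea (this is not Django's serializer code), the sketch below uses the standard library's `json.dumps` with `sort_keys=True`. The `record_a` and `record_b` dicts are hypothetical stand-ins for one serialized object whose fields were emitted in different orders.

```python
import json

# Hypothetical stand-ins for the same serialized object, with keys
# inserted in different orders (as could happen between two dumps).
record_a = {"term": "fixture", "definition": "serialized data", "reference": "docs"}
record_b = {"reference": "docs", "term": "fixture", "definition": "serialized data"}

# Without sorting, the text output follows each dict's own key order.
plain_a = json.dumps(record_a)
plain_b = json.dumps(record_b)

# With sort_keys=True, equal mappings serialize to the same text.
stable_a = json.dumps(record_a, sort_keys=True)
stable_b = json.dumps(record_b, sort_keys=True)
```

The two mappings compare equal as Python objects, so only the sorted form gives a VCS a stable text representation to diff.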
--
Ticket URL: <https://code.djangoproject.com/ticket/24558>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Agreed that deterministic ordering is a desirable goal. Now, can you tell
us what sort of ordering changes you are seeing in your data? It should
not be with the objects themselves, as they are sorted by their primary key.
Is it an issue with `serializers.sort_dependencies` which gives the model
ordering?
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:1>
* Attachment "terms1.json" added.
terms dump 1
* Attachment "terms2.json" added.
terms dump 2
Comment (by gfairchild):
Sure, so here's an example model I have in my `swap` app:
{{{
from django.db import models

class Term(models.Model):
    term = models.CharField(max_length=255)
    definition = models.TextField()
    reference = models.TextField()
}}}
I've attached the output from two consecutive dumps (`./manage.py dumpdata
swap.Term`).
You can see that the content of each file is identical. And as you say,
the objects are indeed sorted by their primary key. The ordering issue is
with the key-value pairs printed for each object. Each object is just a
dictionary of keys and values, so there's no ordering maintained.
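One way to confirm that two dumps differ only in key order is to parse both and compare the resulting objects. The sketch below assumes the JSON fixture layout (`model`, `pk`, `fields`); the `same_fixture_content` helper and the inline sample strings are hypothetical and stand in for the attached files.

```python
import json


def same_fixture_content(dump_a, dump_b):
    """Return True if two JSON fixture strings hold equal data.

    Parsed dicts compare equal regardless of key order. The list of
    objects must still be in the same order, which holds here because
    the objects are sorted by primary key.
    """
    return json.loads(dump_a) == json.loads(dump_b)


# Hypothetical sample dumps: same data, different key order.
dump_1 = (
    '[{"model": "swap.term", "pk": 1, '
    '"fields": {"term": "a", "definition": "b", "reference": "c"}}]'
)
dump_2 = (
    '[{"pk": 1, "fields": {"reference": "c", "definition": "b", "term": "a"}, '
    '"model": "swap.term"}]'
)

# The raw text differs even though the parsed content matches.
texts_differ = dump_1 != dump_2
content_matches = same_fixture_content(dump_1, dump_2)
```

A text-based diff sees these as different files, which is why the key order matters for version control even when the data are unchanged.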
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:2>
* owner: nobody => charettes
* status: new => assigned
* version: 1.7 => master
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:3>
* has_patch: 0 => 1
* needs_tests: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:4>
* needs_tests: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:5>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:6>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"5bc3123479bd97dc9d8a36fa9a3421a71063d1da" 5bc31234]:
{{{
#!CommitTicketReference repository=""
revision="5bc3123479bd97dc9d8a36fa9a3421a71063d1da"
Fixed #24558 -- Made dumpdata mapping ordering deterministic.
Thanks to gfairchild for the report and Claude for the review.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/24558#comment:7>