Commitable json dumps

Brice PARENT

Mar 6, 2017, 7:25:23 AM
to django-d...@googlegroups.com
Hello,

I've had a few customers for whom I've had to create a repository with
all their almost-static data (page contents, etc.). To do that, when
they want a backup, a script calls "manage.py dumpdata
--indent=4 [app].[model] > app_model.json" several times, then commits
the whole thing. The customer may then update their version of the repo.
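
A minimal sketch of such a backup script, for illustration only. The helper names (dump_plan, backup), the directory arguments and the commit message are hypothetical; only the dumpdata invocation and the app_model.json naming come from the message above.

```python
import subprocess
import sys
from pathlib import Path


def dump_plan(labels):
    """Map each "app.model" label to (dumpdata argv, output file name)."""
    plan = []
    for label in labels:
        argv = [sys.executable, "manage.py", "dumpdata", "--indent=4", label]
        # "flatpages.flatpage" is written to "flatpages_flatpage.json".
        plan.append((argv, label.replace(".", "_") + ".json"))
    return plan


def backup(labels, project_dir, repo_dir, message="Data backup"):
    """Write one JSON dump per model into repo_dir, then commit them."""
    repo = Path(repo_dir)
    for argv, filename in dump_plan(labels):
        with open(repo / filename, "w", encoding="utf-8") as out:
            subprocess.run(argv, cwd=project_dir, stdout=out, check=True)
    subprocess.run(["git", "add", "--all"], cwd=repo, check=True)
    # `git commit` exits non-zero when nothing changed, so do not raise on it.
    subprocess.run(["git", "commit", "-m", message], cwd=repo, check=False)
```

Calling backup(["flatpages.flatpage"], "/path/to/project", "/path/to/data-repo") would run one dump and commit the result; both paths are placeholders.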

Whenever there are changes, I'd like to be able to see them
easily (that's the reason for the --indent). Right now, though, the field
order changes frequently, since the order has no meaning. In a diff,
however, the order changes everything: it's almost impossible to see the
real changes because every line has moved.

I have no idea whether this should be an argument to dumpdata or a special
behaviour on the serializer's side, but sorting the fields during
serialization doesn't change the validity of the data, and it makes
the diffs far more explicit.

Here is how it can be done for the JSON serializer
(django/core/serializers/json.py:60 for Django 1.8):
json.dump(self.get_dump_object(obj), self.stream, cls=DjangoJSONEncoder,
sort_keys=True, **self.json_kwargs)

(I added the sort_keys=True argument)
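
The effect of sort_keys can be checked with the standard library alone. This sketch calls json.dumps directly rather than Django's serializer, and the sample record is made up for illustration.

```python
import json

# The same record, built with two different key insertion orders.
first = {"title": "Multiline", "url": "/my/test/", "content": "<p>Hi</p>"}
second = {"content": "<p>Hi</p>", "title": "Multiline", "url": "/my/test/"}

# Without sort_keys, the output follows insertion order, so the texts differ.
unsorted_first = json.dumps(first, indent=4)
unsorted_second = json.dumps(second, indent=4)

# With sort_keys=True, both dumps are byte-for-byte identical.
sorted_first = json.dumps(first, indent=4, sort_keys=True)
sorted_second = json.dumps(second, indent=4, sort_keys=True)

print(unsorted_first == unsorted_second)  # False
print(sorted_first == sorted_second)      # True
```

Since the sorted output depends only on the data, a diff between two dumps shows only the values that really changed.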

I haven't checked whether there is an equivalent for the other serializers,
nor whether it would make any sense without the "indent" argument. For now
it's just an idea that feels good but probably requires more thinking
and advice before being investigated more deeply. I haven't run
any test suite yet, so I don't know whether there are any side effects.
I'm just validating the idea here.

Any thoughts?

Brice Parent



Adam Johnson

Mar 6, 2017, 8:29:08 AM
to django-d...@googlegroups.com
PyYAML sorts keys by default, so if you use the YAML serializer, that should work for your use case.

I think patching the JSON serializer to be deterministic by default is a good idea; the performance cost of sorting keys is pretty small compared to disk operations.



--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/818b56b8-716d-d80f-ade2-1f3424206b08%40brice.xyz.
For more options, visit https://groups.google.com/d/optout.



--
Adam

Brice PARENT

Mar 6, 2017, 9:45:02 AM
to django-d...@googlegroups.com
On 06/03/17 at 14:28, Adam Johnson wrote:
> PyYAML sorts keys by default, so if you use the YAML serializer that
> should work for your usecase.
I think it will become my new default!
Thanks for the info.

Brice

Brice PARENT

Mar 6, 2017, 9:47:32 AM
to django-d...@googlegroups.com

On 06/03/17 at 13:28, James Pic wrote:
>
> Django-dbdiff solved that serialization issue, specifically to create
> diff outputs, in earlier versions. Now it has its own diff engine
> built in though, definitely worth taking a look.
>
I will! But for my use case, it appears that YAML would be a better
idea, as it should already work with stock Django.
Thanks!

Brice

Brice PARENT

Mar 6, 2017, 11:53:51 AM
to django-d...@googlegroups.com

On 06/03/17 at 15:44, Brice PARENT wrote:
> On 06/03/17 at 14:28, Adam Johnson wrote:
>> PyYAML sorts keys by default, so if you use the YAML serializer that
>> should work for your usecase.
> I think it will become my new default !
It appears that the rendered format is not very consistent, or at least
that's what I've found. YAML seems to offer a short and a long syntax.
I tried with two models: one from stock Django (flatpages), whose output
matches my needs, and a custom one, whose output doesn't put each field
on its own line. (I edited the outputs to focus on the idea and remove
irrelevant content.)

./manage.py dumpdata --format yaml --indent 4 flatpages
- fields:
    content: '<p>First line.</p>
      <p>Second line</p>'
    enable_comments: false
    registration_required: false
    sites: [1]
    template_name: ''
    title: Multiline
    url: /my/test/
  model: flatpages.flatpage
  pk: 13

./manage.py dumpdata --format yaml --indent 4 myapp
- fields: {content: "<p>First line.</p>\r\n\r\n<h3>Second line</h3>\r\
    \n", module: 1, position: 1, summary: "<h3>First line.</h3>\r\
    \n\r\n<h3>Second line</h3>\r\", title: "My title"}
  model: myapp.mymodel
  pk: 1

So with the same command, I got two formats: one that is
git-friendly and one that isn't. I haven't yet looked at the source
code to see why it chose one syntax over the other, though.

Brice

Adam Johnson

Mar 7, 2017, 5:24:01 PM
to django-d...@googlegroups.com
Ah yes, PyYAML just does this. It can be disabled by passing a different option to yaml.dump (I think default_flow_style=False), but that would be similar to changing the JSON serializer.
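
A minimal sketch of that option, assuming PyYAML is installed. It calls yaml.dump directly on made-up records modelled on the output above; whether and how Django's YAML serializer would pass this option through is not checked here.

```python
import yaml  # PyYAML, a third-party package

# Illustrative records shaped like a Django fixture (values are made up).
records = [
    {
        "model": "myapp.mymodel",
        "pk": 1,
        "fields": {"title": "My title", "module": 1, "position": 1},
    }
]

# default_flow_style=False requests block style for every collection,
# so each field is written on its own line instead of inside {...}.
block_text = yaml.dump(records, default_flow_style=False)
print(block_text)
```

With block style throughout, a change to one field touches a single line of the dump, which is what makes the file friendly to git diffs.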




--
Adam

Adam Johnson

Mar 7, 2017, 5:30:33 PM
to django-d...@googlegroups.com
Wait, I just looked into this further, and discovered that the ordering of fields was made deterministic for all serializers in #24558 - this was released in Django 1.9! Enjoy👌
--
Adam

Brice PARENT

Mar 8, 2017, 3:39:15 AM
to django-d...@googlegroups.com

> Wait, I just looked into this further, and discovered that the ordering of fields was made deterministic for all serializers in #24558 - this was released in Django 1.9! Enjoy👌
Nice. Thanks for the info!
So I'll wait for v1.11 for that, no problem! (As a freelancer, I only deploy LTS versions for my customers.)

Brice