[Django] #34697: Migration serializer for sets results in indeterministic output due unstable iteration order / hash randomization

2 views
Skip to first unread message

Django

unread,
Jul 6, 2023, 12:09:13 PM7/6/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
-------------------------------------------+------------------------
Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 1
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------------+------------------------
We are using quite a lot of complex `index_together` / `unique_together`
constraints on our models, and the output in the generated migrations is
flip-flopping all the time like follows, causing spurious diffs in our
checkouts:

{{{
migrations.AlterUniqueTogether(
+ unique_together={("tenant", "dealer"), ("tenant", "order")},
- unique_together={("tenant", "order"), ("tenant", "dealer")},
}}}

This is happening because these constraints are normalized to sets
internally in the ModelState, which kind of makes sense, but unfortunately
set iteration order (unlike dicts!) is unstable in Python 3 due to hash
randomization. However, migrations serializer doesn't have any special
facilities for ensuring stable output for sets and this is what causes
annoying diffs for us all the time.

I suggest to add a trivial serializer specifically for unordered sequences
which ensures stable output no matter the iteration order. Stability can
be achieved by sorting elements in the set by their string representation.
This only affects the writer output, and doesn't interfere with the rest
of Django in any way, so this change only improves developer experience,
but has no effect on the performance and/or reliability.

I hope that even though it's apparently not a major problem for most users
you would still accept the fix to ensure stable migration writer output
for the rest of us.

--
Ticket URL: <https://code.djangoproject.com/ticket/34697>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Jul 6, 2023, 12:11:06 PM7/6/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
---------------------------------+--------------------------------------

Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Resolution:

Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------
Changes (by Yury V. Zaytsev):

* Attachment "0001-Fixed-34697-Added-serializer-for-unordered-
sequences.patch" added.

Django

unread,
Jul 7, 2023, 12:38:40 AM7/7/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
---------------------------------+--------------------------------------
Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------
Changes (by Mariusz Felisiak):

* cc: Simon Charette (added)
* has_patch: 1 => 0


Comment:

Thanks for the ticket, however I'm not sure it's worth changing as other
elements may still be non-deterministic for a different set, see #30029.
Setting a fixed `PYTHONHASHSEED` should make it deterministic for you.

--
Ticket URL: <https://code.djangoproject.com/ticket/34697#comment:1>

Django

unread,
Jul 7, 2023, 1:59:26 PM7/7/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
---------------------------------+--------------------------------------
Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------
Changes (by Yury V. Zaytsev):

* Attachment "0002-Fixed-30029-Sort-dependencies-in-MigrationWriter-
to-.patch" added.

Django

unread,
Jul 7, 2023, 2:25:45 PM7/7/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
---------------------------------+--------------------------------------
Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------

Comment (by Yury V. Zaytsev):

Hi, thanks for the feedback and the reference to an earlier ticket.

Actually, we've also hit the flip-flops with the dependencies, which are
annoying, but we do hit them so rarely, that I actually have forgotten
about them. However, we are hitting the set-caused issues every couple of
days and this finally motioned me to try to do something about it. I have
salvaged the old part of the patch concerning the dependencies, rebased
it, fixed and it seems to work. Very nice. The result is uploaded to the
ticket.

Yes, I know about the "workaround" of setting the `PYTHONHASHSEED`, but
this is not a practical solution for us. Maybe you can do that if you are
a sole developer working alone on some Django project. You can write a
wrapper script around `manage.py`, or set this variable globally, or use
the IDE configuration. But if you are a team of more than a dozen of
developers working on a large number of Django projects collaboratively,
things are getting very annoying and difficult.

We can't easily patch `manage.py` to set it for `makemigrations`, because
it has to be set before interpreter startup. Wrapper scripts are also
problematic, because each member would have to integrate it in his
workflow. Finally, forcing people to set `PYTHONHASHSEED` globally on
their machine is a questionable solution with security implications, which
is why it was introduced in the first place...

Making migrations completely deterministic is easy ;-) You just have to
sort in `BaseSequenceSerializer` for all sequence types, and not just for
set types like I did. Only this will of course have a huge number of
unwanted side-effects like force-sorted choices and so on, and it will
cause hugely complicated and unnecessary code mess to write opt-outs
explicitly.

Right now, all concerns raised by Simon in the past are addressed:

1. Sets are (now) globally deterministic
2. Dicts are serialized in an fixed insertion order
3. Dependencies are now sorted explicitly as an exception^*
4. Other stuff (lists, tuples) are ordered, because the order has a
semantic

^* Or maybe a better approach would be to switch dependencies to a set if
the order is not important, how would you like that?

So I think that fixing it for set types and dependencies will solve 99% of
the cases for everyone without any bad side effects and no effort on the
part of the users, and will not cause much maintenance & code complexity
burden. For now you've got 2 tickets in 5 years for this, and if these
patches are committed and we still missed a thing, then it's no problem to
fix another edge case 10 years later, in as far as I'm concerned... :-)

I would be curious to hear Simon's thoughts on that.

--
Ticket URL: <https://code.djangoproject.com/ticket/34697#comment:2>

Django

unread,
Jul 7, 2023, 3:01:22 PM7/7/23
to django-...@googlegroups.com
#34697: Migration serializer for sets results in indeterministic output due
unstable iteration order / hash randomization
---------------------------------+--------------------------------------
Reporter: Yury V. Zaytsev | Owner: nobody
Type: Bug | Status: new
Component: Migrations | Version: 4.2
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
---------------------------------+--------------------------------------

Comment (by Simon Charette):

Thanks for your linking the issue Mariusz and for the taking the read
through it and craft a detailed answer Yury.

TL;DR I think your assessment of the current situation makes sense and I'm
starting to believe we should proceed with this patch.

I'm curious of where you experience the flip-flop though. Are you
generating the same migrations over and over again? What development
process causing the ''diffing'' noise you are referring to given
`makemigrations` operates on the equality of objects and should not care
about how they are serialized as long as they are equal.

> Or maybe a better approach would be to switch dependencies to a set if
the order is not important, how would you like that?

I think this is something that should be considered in another ticket but
that we could ultimately do. The fact they are currently stored in a list
today provides a false sense that their ordering is meaningful while it's
not actually the case. That's a problem that is generalized to many parts
of Django unfortunately (e.g. `Model.Meta.unique_together` at the model
definition level in another example) so I'm not sure it's actually worth
the effort in fixing.

--
Ticket URL: <https://code.djangoproject.com/ticket/34697#comment:3>

Reply all
Reply to author
Forward
0 new messages