FK constraints are not checked at the end of nested atomic blocks

317 views
Skip to first unread message

Артём Клименко

unread,
Dec 21, 2015, 6:26:41 AM12/21/15
to Django developers (Contributions to Django itself)
I opened this ticket https://code.djangoproject.com/ticket/25955

The last proposal for the improvement https://code.djangoproject.com/ticket/25955#comment:5
add parameter in OPTIONS
DATABASES = {
   
'default': {
       
'ENGINE': 'django.db.backends.postgresql_psycopg2',
       
# ...
       
'OPTIONS': {
           
'savepoint_check_constraints': True
       
}
   
},
}
https://code.djangoproject.com/attachment/ticket/25955/django_master_v2.patch

Anssi Kääriäinen

unread,
Dec 21, 2015, 7:13:02 AM12/21/15
to django-d...@googlegroups.com
The check_constraints operation is extremely expensive, and not suitable for use at the end of savepoints.

The reason why we use deferred constraints by default is mostly historical. We do need deferred constraints for fixture loading, and applications might rely on current implementation, but we could perhaps get away with deferrable initially immediate constraints if the user chooses those. Allowing users to choose constraint style requires relatively large amount of work, so nobody has tackled that problem yet.

 - Anssi
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/f1d474af-1191-4110-b664-2561d9477631%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aymeric Augustin

unread,
Dec 21, 2015, 7:16:37 AM12/21/15
to django-d...@googlegroups.com
Hello,

When a question of "should Django do A or B here?" comes up, adding a setting to let the user choose is the last resort, because it's really a cop-out, not the default choice.

Django explicitly creates FKs on PostgreSQL as DEFERRABLE INITIALLY DEFERRED in order not to check constraints until the end of a transaction. Certainly it does that for a reason -- although I'm not sure which one, perhaps ensuring fixture data can be loaded in arbitrary order?

The thought process here should be:

1. For what reason does Django do that? Does this reason also apply, or still apply, to the situation where you disagree with Django's behavior?
2. What's going to break if we make this change? We aren't going to provide an option that's incompatible with other documented features.
3. Is there a good reason to let the user pick one behavior or the other? If so, how are we going to explain how to choose the adequate behavior for a given use case?
4. Given the above, what should be implemented?

I never thought about deferred constraints check when I added support for savepoints to all built-in backends. Previously one PostgreSQL had support for savepoints and it was minimal.

My gut feeling is that it's a decent idea to check integrity constraints in atomic(savepoint=True), assuming this doesn't have horrific consequences for performance. Note that savepoint=True is the default for users, but that Django uses savepoint=False internally whenever possible for performance.

-- 
Aymeric.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/f1d474af-1191-4110-b664-2561d9477631%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Aymeric.

Aymeric Augustin

unread,
Dec 21, 2015, 7:18:19 AM12/21/15
to django-d...@googlegroups.com
2015-12-21 13:12 GMT+01:00 Anssi Kääriäinen <akaa...@gmail.com>:
The check_constraints operation is extremely expensive, and not suitable for use at the end of savepoints.

Ah. Too bad!

--
Aymeric.

Shai Berger

unread,
Dec 21, 2015, 9:41:36 AM12/21/15
to django-d...@googlegroups.com
While expensive operations may be unsuitable for general consumption, this kind of validation may be useful in special cases and offline jobs, notably in "data" migrations. So I think we should consider supporting them via an argument to atomic().

Be that as it may, so long as we do not plan to change global behavior, we should fix the documentation. The current text and example code strongly imply that FKs are checked at the end of atomic blocks, even though they manage to avoid saying it explicitly.

Shai.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Anssi Kääriäinen

unread,
Dec 21, 2015, 12:42:42 PM12/21/15
to django-d...@googlegroups.com
The only generic way to check the constraints is to run through all
tables and their constraints. The constraint check itself is a check
that a matching row in a foreign table exists for each row of the
checked table. This means that the check could take hours on large
databases. In addition the check isn't safe against concurrency
issues. I'm afraid that if we have a flag for this for atomic(), then
users are going to think it is a good idea to run the check just in
case.

We could add a documented API to check wanted models for foreign key
constraint violations (.check_constraints(Model1, Model2) would be a
likely API). Then lets just document that depending on the database
this check can be slow. On some databases we can get away with SET
CONSTRAINTS IMMEDIATE
[http://www.postgresql.org/docs/9.1/static/sql-set-constraints.html],
which should be fast.

I do think DEFERRABLE INITIALLY IMMEDIATE constraints would make a lot
of sense for what Django is doing with the constraints, that is we
want to keep the constraints in check except in special cases.
Unfortunately I can't see an easy way to switch constraints to
INITIALLY IMMEDIATE mode due to backwards compat and migrations
issues.

- Anssi
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/F0A7E950-24FE-457C-877D-905EA3E9AA65%40platonix.com.

Shai Berger

unread,
Dec 21, 2015, 6:20:23 PM12/21/15
to django-d...@googlegroups.com
On Monday 21 December 2015 19:42:22 Anssi Kääriäinen wrote:
> The only generic way to check the constraints is to run through all
> tables and their constraints. The constraint check itself is a check
> that a matching row in a foreign table exists for each row of the
> checked table. This means that the check could take hours on large
> databases. In addition the check isn't safe against concurrency
> issues. I'm afraid that if we have a flag for this for atomic(), then
> users are going to think it is a good idea to run the check just in
> case.
>

We can handle that by naming the argument "lengthy_constraint_check" or some
such.

> We could add a documented API to check wanted models for foreign key
> constraint violations (.check_constraints(Model1, Model2) would be a
> likely API). Then lets just document that depending on the database
> this check can be slow. On some databases we can get away with SET
> CONSTRAINTS IMMEDIATE
> [http://www.postgresql.org/docs/9.1/static/sql-set-constraints.html],
> which should be fast.

Then why don't we implement check_constraints() as
"SET CONSTRAINTS IMMEDIATE; SET CONSTRAINTS DEFERRED" on PG? That is
documented to only check outstanding changes in the transaction.

I am not sure about the performance implications on Oracle, but the Oracle
docs suggest "Making constraints immediate at the end of a transaction is a
way of checking whether COMMIT can succeed."[1]

As far as I know, neither MySql nor Sqlite support deferrable constraints.

FWIW, MSSQL does not support deferring as such -- it only supports explicitly
turning the constraints off and on; when you turn them on, you can choose
between "with check", which checks every single row in your db, and "with no
check", which allows corrupt data into your DB; this makes loading fixtures
excruciatingly slow (I believe Anssi was involved in a recent to improve this
recently).

> I do think DEFERRABLE INITIALLY IMMEDIATE constraints would make a lot
> of sense for what Django is doing with the constraints, that is we
> want to keep the constraints in check except in special cases.
> Unfortunately I can't see an easy way to switch constraints to
> INITIALLY IMMEDIATE mode due to backwards compat and migrations
> issues.
>

We could handle the migrations if deferrability with an attribute of a FK, I
think; we could then handle backwards compatibility with a deprecation cycle
forcing deferrability to be explicit and then, setting a new default. However,
I don't think the benefit here is worth the amount of code churn.

Shai.

[1]
https://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_10003.htm

Anssi Kääriäinen

unread,
Dec 28, 2015, 2:02:03 AM12/28/15
to Django developers (Contributions to Django itself)
On Tuesday, December 22, 2015 at 1:20:23 AM UTC+2, Shai Berger wrote:
> We could add a documented API to check wanted models for foreign key
> constraint violations (.check_constraints(Model1, Model2) would be a
> likely API). Then lets just document that depending on the database
> this check can be slow. On some databases we can get away with SET
> CONSTRAINTS IMMEDIATE
> [http://www.postgresql.org/docs/9.1/static/sql-set-constraints.html],
> which should be fast.

Then why don't we implement check_constraints() as
"SET CONSTRAINTS IMMEDIATE; SET CONSTRAINTS DEFERRED" on PG? That is
documented to only check outstanding changes in the transaction.

I am not sure about the performance implications on Oracle, but the Oracle
docs suggest "Making constraints immediate at the end of a transaction is a
way of checking whether COMMIT can succeed."[1]

As far as I know, neither MySql nor Sqlite support deferrable constraints.

Hmmh, if we only need to check deferred constraints, this can be made to work. I don't believe SET CONSTRAINTS is expensive at all, it does the same job the DB needs to do at commit time in any case.

Using SET CONSTRAINTS ALL IMMEDIATE; SET CONSTRAINTS ALL DEFERRED; is a bit problematic, as any constraint that was DEFERRABLE INITIALLY IMMEDIATE would then be set to deferred mode after the commands. This isn't a big problem in practice, unless we at some point allow creating DEFERRABLE INITIALLY IMMEDIATE constraints. We could also explicitly name all the constraints we want to defer, but I'm not sure if we actually reliably know the names of the constraints.

So, we could add a flag "check_deferred_constraints" to atomic(), and use the above SET CONSTRAINTS commands to achieve the wanted results. We will likely need to document that DEFERRABLE INITIALLY IMMEDIATE constraints will be set to deferred mode.

 - Anssi

Gavin Wahl

unread,
Dec 29, 2015, 12:40:45 PM12/29/15
to Django developers (Contributions to Django itself)
What does it even mean for constraints to be checked at savepoint commit? I would expect this to not cause any errors:

    with atomic():
        # insert invalid fk
        with atomic(check_deferred_constraints=True):
            # do stuff that doesn't invalidate any contraints
        # delete the invalid fk inserted before

But with the proposed solution _all_ constraints are checked, not just the ones invalidated in that savepoint.

Shai Berger

unread,
Dec 30, 2015, 6:23:53 PM12/30/15
to django-d...@googlegroups.com
Hi,

On Tuesday 29 December 2015 19:40:44 Gavin Wahl wrote:
> What does it even mean for constraints to be checked at savepoint commit? I
> would expect this to not cause any errors:
>
> with atomic():
> # insert invalid fk
> with atomic(check_deferred_constraints=True):
> # do stuff that doesn't invalidate any contraints
> # delete the invalid fk inserted before
>

I think your expectation is wrong, for both technical and "ideological"
reasons.

Technically, you can easily do things in the inner block which, although they
do not violate any constraints in themselves, rely on the invalid FK record.
If that were allowed to pass silently, we would violate the expectation that
"if the inner block succeeded, then things in it are ok".

Philosophically, I don't think "checking the constraints only for things I
changed here" is a valid *logical* operation, as you seem to imply; a database
is valid or broken as a whole. The fact that we can get away with checking
only the changes at a transaction limit is an *implementation* shortcut, valid
only as far as it is logically equivalent to checking the whole database.

Shai.

Gavin Wahl

unread,
Dec 30, 2015, 11:32:25 PM12/30/15
to django-d...@googlegroups.com
I understand that a database is invalid or invalid as a whole, which is why you don't need to check constraints until transaction commit (when the transaction becomes visible to the world). Why do you want to bundle constraint checking with a savepoint release though? They seem logically unrelated. Wouldn't a standalone way to check deferred constraints be more useful?

Shai Berger

unread,
Dec 31, 2015, 1:54:17 AM12/31/15
to django-d...@googlegroups.com
On Thursday 31 December 2015 06:31:40 Gavin Wahl wrote:
> I understand that a database is invalid or invalid as a whole, which is why
> you don't need to check constraints until transaction commit (when the
> transaction becomes visible to the world). Why do you want to bundle
> constraint checking with a savepoint release though? They seem logically
> unrelated. Wouldn't a standalone way to check deferred constraints be more
> useful?
>
It makes some sense to check constraints at the end of a set of operations
which are defined as a unit of work.

At the database level, it is mandatory at a transaction's end, and makes more
sense at a savepoint release than at an arbitrary point in the transaction
(although it does make sense elsewhere).

At Python code level, we use the same construct (atomic() block) for
transactions and savepoints, so it would make sense to allow them to behave in
similar ways. It is also arguable that any place where you'd want to check
constraints signifies the end of a unit of work, and so we should encourage the
uise of atomic() to also specify the beginning of said unit.

And then, there's "there should be one obvious way to do it", which hints we
should avoid having both check-in-atomic and check-independently.

My 2 cents,
Shai.
Reply all
Reply to author
Forward
0 new messages