`Model.validate_unique` excluding partial unique constraint


Gaga Ro

Jun 1, 2021, 11:18:23 AM
to Django developers (Contributions to Django itself)
Hi,

I changed several models from fields using `unique=True` to using `UniqueConstraint` with a condition in the Meta.

As a side effect, uniqueness is no longer validated when a Form is cleaned, and an integrity error is raised instead. This is because partial unique constraints are excluded by `total_unique_constraints`, which `Model.validate_unique` relies on.

It seems that `total_unique_constraints` is also used to check for fields that should be unique (related fields and USERNAME_FIELD specifically).

I tried modifying `total_unique_constraints`, and the only tests that failed were related to the above concern and `test_total_ordering_optimization_meta_constraints`, which also uses `total_unique_constraints`. My application works fine and the validation errors are correctly raised in my forms.

The current behaviour of `Model.validate_unique` is also not what I expected, as my conditional `UniqueConstraint`s were not used (which caused the integrity error).

Am I missing something? Or should we use all constraints (including partial) in `Model.validate_unique`?

If this is indeed what should be done, adding an `all_unique_constraints` next to `total_unique_constraints` and using it in `Model.validate_unique` instead of `total_unique_constraints` would do the trick. I don't mind opening a ticket and doing the PR if needed.
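The proposal above can be sketched in plain Python. `UniqueConstraint` and `Options` below are minimal stand-ins rather than Django's classes, the string `condition` stands in for a `Q` object, and `all_unique_constraints` is the hypothetical property being proposed; `total_unique_constraints` mirrors the existing filter, which keeps only constraints without a condition.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Sequence


@dataclass
class UniqueConstraint:
    # Stand-in for django.db.models.UniqueConstraint: only the attributes
    # the two filters below look at.
    fields: Sequence[str]
    name: str
    condition: Optional[Any] = None  # None means "applies to every row"


@dataclass
class Options:
    # Stand-in for Model._meta; `constraints` mirrors Meta.constraints.
    constraints: list = field(default_factory=list)

    @property
    def total_unique_constraints(self):
        # Unique constraints that hold for *every* row (no condition).
        return [
            c for c in self.constraints
            if isinstance(c, UniqueConstraint) and c.condition is None
        ]

    @property
    def all_unique_constraints(self):
        # Hypothetical: every unique constraint, partial ones included.
        return [c for c in self.constraints if isinstance(c, UniqueConstraint)]


meta = Options(constraints=[
    UniqueConstraint(fields=['email'], name='unique_email'),
    UniqueConstraint(fields=['slug'], name='unique_slug', condition='deleted_at IS NULL'),
])
print([c.name for c in meta.total_unique_constraints])  # ['unique_email']
print([c.name for c in meta.all_unique_constraints])    # ['unique_email', 'unique_slug']
```

Switching the property alone only changes which constraints are looked at; nothing here evaluates the condition, which is the harder part of validating a partial constraint.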

Thanks.

charettes

Jun 1, 2021, 6:33:12 PM
Hello there,

Partial unique constraints are currently not supported during validation for reasons described in this ticket[0].

For example (inspired by this GitHub comment[1]), if you define the following model

from django.db import models
from django.db.models import Q, UniqueConstraint

class Article(models.Model):
    slug = models.CharField(max_length=100)
    deleted_at = models.DateTimeField(null=True)

    class Meta:
        constraints = [
            UniqueConstraint(fields=['slug'], condition=Q(deleted_at=None), name='unique_slug'),
        ]

Then validate_unique must perform the following query to determine whether the constraint is violated:

SELECT NOT (%(deleted_at)s IS NULL) OR NOT EXISTS(SELECT 1 FROM article WHERE NOT id = %(id)s AND slug = %(slug)s AND deleted_at IS NULL)

In other words, the validation of a partial unique constraint must check that at least one of these conditions is true:
1. The provided instance doesn't match the condition.
2. There are no existing rows matching the unique constraint (excluding the current instance if it already exists).
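To make the two conditions concrete, here is the query above run against SQLite with Python's built-in `sqlite3` module. The table layout is an assumption mirroring `Article`, and `passes_unique_slug` is a hypothetical helper. The one deliberate change is `id IS NOT :pk`, SQLite's NULL-safe comparison, so the same query also works for an unsaved instance whose pk is `None` (with `NOT id = :pk` the subquery would match no rows in that case, and the check would always pass).

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        slug TEXT NOT NULL,
        deleted_at TEXT NULL
    );
    -- The partial unique index the database itself enforces.
    CREATE UNIQUE INDEX unique_slug ON article (slug) WHERE deleted_at IS NULL;
    INSERT INTO article (id, slug, deleted_at) VALUES
        (1, 'hello', NULL),
        (2, 'bye', '2021-06-01');
""")

# Condition 1 OR condition 2 from the list above.
VALIDATE_SQL = """
    SELECT NOT (:deleted_at IS NULL) OR NOT EXISTS (
        SELECT 1 FROM article
        WHERE id IS NOT :pk AND slug = :slug AND deleted_at IS NULL
    )
"""


def passes_unique_slug(conn, pk, slug, deleted_at):
    """Return True if saving this row would not violate unique_slug."""
    (ok,) = conn.execute(
        VALIDATE_SQL, {'pk': pk, 'slug': slug, 'deleted_at': deleted_at}
    ).fetchone()
    return bool(ok)


# A new row (pk=None) reusing a live slug is caught before saving.
print(passes_unique_slug(conn, None, 'hello', None))  # False
# 'bye' is only used by a soft-deleted row, so it is free again.
print(passes_unique_slug(conn, None, 'bye', None))    # True
```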

This is not something Django supports right now.

In order to add proper support for this feature, I believe (personal opinion here, feedback is welcome) we should follow these steps:

1. Add support for Expression.check(using: str) -> bool, which would translate IsNull(deleted_at, True).check('alias') into a backend-compatible 'SELECT %(deleted_at)s IS NULL' query and return whether or not it passed. That would also allow constructing expressions like

(~Q(IsNull(deleted_at, True)) | ~Exists(Article.objects.exclude(pk=pk).filter(slug=slug, deleted_at=None))).check(using)
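A minimal sketch of the `Expression.check(using)` idea, using Python's built-in `sqlite3`: the compiled expression is wrapped in a bare `SELECT` with no `FROM` clause, and the single result is turned into a bool. The `check` helper is hypothetical, and the open connection stands in for the database alias.

```python
import sqlite3


def check(conn, expression_sql, params=()):
    """Evaluate a standalone boolean SQL expression and return a bool.

    Hypothetical helper: `expression_sql` is trusted, compiler-generated
    SQL; user values only ever travel through `params`. With no FROM
    clause the expression cannot reference any table column.
    """
    row = conn.execute(f"SELECT {expression_sql}", params).fetchone()
    return bool(row[0])


conn = sqlite3.connect(':memory:')
# Roughly IsNull(deleted_at, True).check(...) with deleted_at bound to a value:
print(check(conn, "? IS NULL", (None,)))          # True
print(check(conn, "? IS NULL", ('2021-06-01',)))  # False
```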

2. Add support for Constraint.validate(instance, excluded_fields), as described in [0]. It would build on top of Expression.check to implement proper UniqueConstraint, CheckConstraint, and ExclusionConstraint validation, and would allow third-party apps (e.g. django-rest-framework, which doesn't use model-level validation[2]) to take advantage of this feature. For example, the unique_for_(date|month|year) options of Date(Time)?Field could be deprecated in favour of Constraint subclasses that implement as_sql to enforce an SQL-level constraint when the current backend supports it, and implement .validate to replace the special-case logic we currently have in place for these options[3].
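The `Constraint.validate(instance, excluded_fields)` protocol could look roughly like this. Everything here is a hypothetical, database-free stand-in: `rows` (a list of dicts) replaces the table, `condition` is a Python predicate replacing a `Q` object, and the error class only mimics Django's `ValidationError`. The body follows the two conditions listed earlier and skips validation when an excluded field is involved.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


class BaseConstraint:
    def __init__(self, name):
        self.name = name

    def validate(self, instance, rows, exclude=()):
        # Each concrete constraint type decides how to validate itself.
        raise NotImplementedError('Subclasses must implement validate().')


class PartialUniqueConstraint(BaseConstraint):
    """Hypothetical analogue of UniqueConstraint(fields=..., condition=...)."""

    def __init__(self, name, fields, condition, condition_fields):
        super().__init__(name)
        self.fields = tuple(fields)
        self.condition = condition  # predicate on a row dict; stands in for Q
        self.condition_fields = tuple(condition_fields)

    def validate(self, instance, rows, exclude=()):
        # Skip when a referenced field was excluded (e.g. not on the form).
        if set(self.fields + self.condition_fields) & set(exclude):
            return
        # Condition 1: the instance doesn't match the condition.
        if not self.condition(instance):
            return
        # Condition 2: no other row matching the condition shares the values.
        for row in rows:
            if instance.get('id') is not None and row['id'] == instance['id']:
                continue  # the instance's own, already saved, row
            if self.condition(row) and all(row[f] == instance[f] for f in self.fields):
                raise ValidationError(f'Constraint "{self.name}" is violated.')


rows = [{'id': 1, 'slug': 'hello', 'deleted_at': None}]  # stands in for the table
unique_slug = PartialUniqueConstraint(
    'unique_slug',
    fields=['slug'],
    condition=lambda row: row['deleted_at'] is None,
    condition_fields=['deleted_at'],
)
unique_slug.validate({'id': 1, 'slug': 'hello', 'deleted_at': None}, rows)  # own row: OK
try:
    unique_slug.validate({'id': None, 'slug': 'hello', 'deleted_at': None}, rows)
except ValidationError as exc:
    print(exc)  # Constraint "unique_slug" is violated.
```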

I hope this clarifies the current situation.

Cheers,
Simon

Gaga Ro

Jun 2, 2021, 11:36:02 AM
Thanks for the thorough answer. I also realize now that it only worked in my app because of another side effect when my instance was saved.

I started looking at the part of the ORM where the check method would be implemented, as I'm not familiar with it. Shouldn't .check() be implemented on Q rather than on Expression? Or are you including Lookup / Q in it?

Then I'd guess it's just a matter of calling as_sql() on each part and assembling them. Everything we need seems to be done in Query, but we can't use it because it has to be linked to a model, so would we have to reimplement it? Then again, as_sql needs a compiler, which itself needs a query. I admit I'm a bit lost among all those classes; everything seems too tightly tied to models to do anything without one.

If you have any more hints as to where I should look, I'd appreciate them. Thanks again.

charettes

Jun 10, 2021, 12:00:17 AM
Alright, here are a few hints about how I believe things should be done.

First things first, Lookup must be made a subclass of Expression, which is being worked on[0].

Ideally Q would also be made a subclass of Expression, but that's likely a big can of worms, so I'd focus on implementing check() for Q only at first.

Now for the compiler part. Things are a bit tricky here, as these expressions are not going to be bound to a model/table, and most of the sql.Query and resolve_expression machinery revolves around the availability of a Query.model: models.Model property. I think there are two solutions here:

1. Adapt sql.Query so it can be *unbound*, meaning that its .model property type would change from models.Model to Optional[models.Model].
2. Follow the sql.RawQuery route and define a new sql.UnboundQuery class that *looks* like a Query but doesn't allow any form of column references or JOINs or filters (WHERE).

In both cases the Query-like object should prevent any form of column references and JOINs with comprehensible error messages (e.g. in resolve_ref and setup_join if we go with 1.). I have a feeling 2. is easier to implement, but 1. seems like it could be a much more rewarding experience for you and the community, as you'll have to special-case a couple of long-lived assumptions in django.db.models.sql.

Depending on whether you choose 1. or 2., you'll have to implement a way for database backends to specify how to *wrap* the provided expression in a SELECT statement. Most databases won't require any special casing, but I know that Oracle will require a trailing "FROM DUAL" clause (SELECT ... FROM DUAL)[1] and possibly some special casing of expressions such as Exists, for which logic already exists[2]. If you go with 1., this can be done by returning a backend-specific string in SQLCompiler.get_from_clause when self.query.alias_map is empty[3].
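How a backend might *wrap* a table-less expression can be sketched as below. The `from_clause` and `wrap_in_select` helpers are hypothetical; `vendor` mimics the `connection.vendor` strings and `tables` stands in for `query.alias_map`. Only the empty-`tables` branch corresponds to the unbound query discussed above.

```python
def from_clause(vendor, tables):
    """Hypothetical: FROM clause for a query that may reference no table."""
    if tables:
        return ' FROM ' + ', '.join(tables)
    # Oracle has no table-less SELECT; it selects from the DUAL dummy table.
    if vendor == 'oracle':
        return ' FROM DUAL'
    # SQLite, PostgreSQL and MySQL accept a bare SELECT.
    return ''


def wrap_in_select(expression_sql, vendor, tables=()):
    return 'SELECT ' + expression_sql + from_clause(vendor, tables)


print(wrap_in_select('1 = 1', 'postgresql'))  # SELECT 1 = 1
print(wrap_in_select('1', 'oracle'))          # SELECT 1 FROM DUAL
```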

In the end the call stack should be (assuming 1. is followed):

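from django.db.models.sql import Query
from django.db.models.sql.constants import SINGLE

# Sketch of the proposed Q.check() method.
def check(self, using):
    query = Query(None)  # requires 1.: an unbound query, no model/table
    query.add_annotation(self, '_check')
    query.set_values(['_check'])
    compiler = query.get_compiler(using=using)
    result = compiler.execute_sql(SINGLE)
    return bool(result[0])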

I hope this clears things up a bit!

Cheers,
Simon

Gaga Ro

Jun 14, 2021, 3:09:35 AM
Thanks, that clears things up a lot.

I'll try my hand at it when I have some more time available.

charettes

Jun 15, 2021, 8:02:28 PM
FWIW, I thought I'd give 2. a timeboxed shot to make sure I wasn't sending you down a deep rabbit hole, and it seems pretty straightforward!

charettes

Jun 15, 2021, 8:04:31 PM
I meant 1. in my previous email, i.e. the approach where sql.Query.model is allowed to be None. The tests happen to pass on SQLite, MySQL, and Postgres.

Gaga Ro

Jun 16, 2021, 4:44:08 AM
It looks like you went even further than that :D.

Should we still add Q.check() (implemented as you described before), then refactor BaseConstraint.validate() to use it?

charettes

Jun 16, 2021, 9:19:16 AM
> It looks like you went even further than that :D.

yeah didn't want to step on your toes but I got very excited about trying it out 😅

> Should we still add Q.check() (which will be as you said before), then refactor BaseConstraint.validate() to use it?

I think it would still be worth doing, to avoid the Query(None), add_annotation, except FieldError, get_compiler() boilerplate, but we might need to change the planned function signature.

What do you think of

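from django.core.exceptions import FieldError
from django.db import DEFAULT_DB_ALIAS
from django.db.models import BooleanField, ExpressionWrapper, Value
from django.db.models.sql import Query
from django.db.models.sql.constants import SINGLE

def check(self, instance, exclude=None, using=DEFAULT_DB_ALIAS):
    query = Query(None)
    # Expose the instance's field values as (non-selected) annotations.
    for field in instance._meta.local_concrete_fields:
        if exclude and field.name in exclude:
            continue
        value = getattr(instance, field.attname)
        query.add_annotation(Value(value, field), field.name, select=False)
    try:
        query.add_annotation(ExpressionWrapper(self, BooleanField()), '_check')
    except FieldError:
        # Check is referencing an excluded field
        return None
    compiler = query.get_compiler(using=using)
    return bool(compiler.execute_sql(SINGLE)[0])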

Looks like it would deal with most of the boilerplate while still being flexible enough for our use case. Maybe we don't want to push the model/exclude-field notion down this far, though, and should require literals to be passed directly instead:

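from django.core.exceptions import FieldError
from django.db import DEFAULT_DB_ALIAS
from django.db.models import BooleanField, ExpressionWrapper, Value
from django.db.models.sql import Query
from django.db.models.sql.constants import SINGLE

def check(self, against, using=DEFAULT_DB_ALIAS):
    query = Query(None)
    for name, value in against.items():
        # Wrap plain literals; pass expressions through untouched.
        if not hasattr(value, 'resolve_expression'):
            value = Value(value)
        query.add_annotation(value, name, select=False)
    try:
        query.add_annotation(ExpressionWrapper(self, BooleanField()), '_check')
    except FieldError:
        # Check is referencing a missing field
        return None
    compiler = query.get_compiler(using=using)
    return bool(compiler.execute_sql(SINGLE)[0])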

I have a slight preference for the second option, as it seems like it could be used in contexts other than constraints[0], but I'm curious about your thoughts. Looking at [0] in more detail, I also feel like matches() could be a better name than check().
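The second, model-free signature can be approximated with `sqlite3`: each `name: value` pair in `against` becomes a column of a one-row derived table (loosely the role of `add_annotation(Value(value), name, select=False)`), and the expression is evaluated against it. This is an analogy, not Django code; the helper name and the use of `sqlite3.OperationalError` in place of `FieldError` are assumptions.

```python
import sqlite3


def check(conn, expression_sql, against):
    """Evaluate expression_sql with the names in `against` in scope.

    Hypothetical analogue of check(against): names are assumed to be
    trusted identifiers; values are passed as query parameters.
    Returns None when the expression cannot be evaluated, e.g. it
    references a name missing from `against`. Note this is broader than
    FieldError: a SQL syntax error would also yield None.
    """
    names = list(against)
    columns = ', '.join(f'? AS "{name}"' for name in names) or '1 AS "_dummy"'
    sql = f'SELECT {expression_sql} FROM (SELECT {columns})'
    try:
        (result,) = conn.execute(sql, [against[n] for n in names]).fetchone()
    except sqlite3.OperationalError:
        return None  # e.g. "no such column"
    return bool(result)


conn = sqlite3.connect(':memory:')
print(check(conn, 'deleted_at IS NULL', {'deleted_at': None}))  # True
print(check(conn, "slug = 'hello'", {'slug': 'bye'}))           # False
print(check(conn, 'deleted_at IS NULL', {'slug': 'hello'}))     # None
```

Because nothing here knows about models, the same helper works for any mapping of values, which is the appeal of the second option.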

Simon

Gaga Ro

Jun 16, 2021, 11:08:24 AM
> yeah didn't want to step on your toes but I got very excited about trying it out 😅

Don't worry about that, it's a good thing this motivated you enough to advance on this topic.

> I have a slight preference for the second option as it seems like it could be used in other context than constraints[0] but I'm curious about your thoughts?

Yes, I agree that this should be independent of models. There is no reason to tie it to them, and if we want to anyway, it would be trivial to do on top of the new method. Or maybe we could support both, with differently named parameters, and change the behaviour depending on which ones are passed.

> Looking at [0] in more details I also feel like matches() could be a better name than check().

It should be obvious that the method returns a boolean; to me, matches sounds like it could return a list of the matches.

Gaga Ro

Jun 21, 2021, 8:54:59 AM
I tried my hand at implementing Q.check() (https://github.com/Gagaro/django/tree/ticket-30581).

A few things:

1/ Is the exclude parameter there because of the Model.validate_unique signature? A conditional UniqueConstraint might not be validated in those cases, for example if a field used in the condition is not in the form.
2/ Shouldn't we let Q.check() raise the FieldError instead of returning None? Or raise another (new?) exception?
3/ I'm not so sure anymore about the check name being better than matches. We might need more inputs on that one :).
4/ Should we raise NotImplementedError in BaseConstraint.validate?

Thanks for your inputs.

charettes

Jun 21, 2021, 12:00:03 PM
That's looking great :)

1. Yes, and that's expected. If a form/serializer doesn't provide some of the fields included in the constraint, the client-side (Python) part of the validation can't do much about it. It might result in an integrity error, but that's a misuse of the API. I guess a check/runtime warning could be emitted when creating model forms/serializers that don't fully cover the constraints defined on their models, but that's already an issue with the existing validate_unique logic.
2. I guess it could be surfaced in Q.check and expected to be caught in Constraint.validate. Whichever layer performs the field exclusion should be responsible for handling the FieldError.
3. Yep definitely something we can bring to this list once we're satisfied with the API.
4. I guess we could yes! That makes me think we'll also want to implement it for ExclusionConstraint if you're up for it!
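Item 2 can be sketched in plain Python: a hypothetical `check()` lets the missing-field error surface, and the constraint's `validate()`, which is the layer that applies `exclude`, catches it and skips. All classes here are stand-ins; `predicate` replaces a `Q` object and a `KeyError` plays the role of an unresolvable reference.

```python
class FieldError(Exception):
    """Stand-in for django.core.exceptions.FieldError."""


class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""


def check(predicate, against):
    # Hypothetical Q.check() analogue: a reference to a name missing from
    # `against` surfaces as FieldError instead of being turned into None.
    try:
        return bool(predicate(against))
    except KeyError as exc:
        raise FieldError(f'Cannot resolve {exc.args[0]!r}.') from exc


class CheckLikeConstraint:
    # Hypothetical constraint: the layer that applies `exclude` is also the
    # one that decides an unresolvable (excluded) field means "skip".
    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate

    def validate(self, values, exclude=()):
        against = {k: v for k, v in values.items() if k not in exclude}
        try:
            passed = check(self.predicate, against)
        except FieldError:
            return  # the predicate needs an excluded field; nothing to validate
        if not passed:
            raise ValidationError(f'Constraint "{self.name}" is violated.')


price_positive = CheckLikeConstraint('price_positive', lambda v: v['price'] > 0)
price_positive.validate({'price': 5})                      # passes
price_positive.validate({'price': -1}, exclude=['price'])  # skipped: field excluded
```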

Simon

Gaga Ro

Jun 21, 2021, 5:07:39 PM
So am I right that the example model with deleted_at will not be validated by ModelForm as deleted_at will never be included in it?

I tried implementing ExclusionConstraint.validate (https://github.com/Gagaro/django/commit/558f33f574838b21cc9bf58a825ef337e7b1d0b2) but I had to use RawSQL as I didn't find another way to use the operator. It works great when running the query alone but:

I don't like using raw SQL when there is a better way to do it (is there?).
It also doesn't work when used inside the Exists, as the table is aliased and the raw SQL is not.

Do you have any idea how to fix/improve that?

Thanks again!

charettes

Jun 22, 2021, 10:43:36 AM
> I don't like using raw SQL when there is a better way to do it (is there?).
> And it doesn't work when used in the Exists as the table is aliased and the raw SQL is not.

I think the issue is that you're resolving before annotation/aliasing. Once #27021 lands, you could create a Lookup instance containing both the lhs and rhs and just use that, but in the meantime you're kind of stuck writing your own Expression subclass. Maybe we should just rebase the branch on top of the current work on #27021[0], since it should land before this work anyway?

Simon

Gaga Ro

Jun 24, 2021, 8:25:21 AM
I took a bit of time to try with the new lookups and it looks much better! Also it actually works now :).

Is the code ready for a PR? Or should I add the documentation / more tests before?

charettes

Jul 7, 2021, 2:53:23 PM
Just a quick note that I haven't forgotten about this thread; I was waiting for lookup annotation support to land before focusing on it[0].

I guess you could go ahead and create a PR once it lands.

I assume we'll want to have Model.full_clean take advantage of this new Constraint.validate method and remove the special handling in _validate_unique as well.

I wonder if we'll want to add a Constraint(invalid_message) argument to allow for localized error messages to be raised on violation like we do with validators?
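A rough sketch of how a `full_clean()`-style pass could drive `Constraint.validate()` and report a per-constraint `invalid_message`. `Constraint`, `invalid_message` and `run_constraints` are hypothetical names modelled on this message, the predicate stands in for the real check, and the `%(name)s` placeholder is similar to the placeholders used in validator messages.

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError (list form)."""

    def __init__(self, messages):
        super().__init__(messages)
        self.messages = list(messages)


class Constraint:
    """Hypothetical constraint accepting an `invalid_message` argument."""

    default_message = 'Constraint "%(name)s" is violated.'

    def __init__(self, name, predicate, invalid_message=None):
        self.name = name
        self.predicate = predicate  # stands in for the real check
        self.invalid_message = invalid_message or self.default_message

    def validate(self, instance):
        if not self.predicate(instance):
            raise ValidationError([self.invalid_message % {'name': self.name}])


def run_constraints(instance, constraints):
    """full_clean()-style pass: run every constraint, report all failures."""
    errors = []
    for constraint in constraints:
        try:
            constraint.validate(instance)
        except ValidationError as exc:
            errors.extend(exc.messages)
    if errors:
        raise ValidationError(errors)


constraints = [
    Constraint('slug_not_empty', lambda a: a['slug'] != ''),
    Constraint('title_short', lambda a: len(a['title']) <= 10,
               invalid_message='Titles are limited to 10 characters.'),
]
try:
    run_constraints({'slug': '', 'title': 'A very long title'}, constraints)
except ValidationError as exc:
    print(exc.messages)
    # ['Constraint "slug_not_empty" is violated.', 'Titles are limited to 10 characters.']
```

Collecting every failure before raising, rather than stopping at the first, lets a form show all constraint messages at once.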

Simon

Gaga Ro

Jul 10, 2021, 2:41:05 PM
No problem. The lookup PR has been merged; I'll work on our PR on Monday.

I'm all for the invalid message; it will be shown to the end user, after all.

charettes

Jul 11, 2021, 8:31:02 PM
Awesome, thanks for your continued efforts on this!

Gaga Ro

Jul 12, 2021, 7:26:53 AM