Backwards compatibility and field validation

233 views
Skip to the first unread message

Cal Leeming [Simplicity Media Ltd]

unread,
15 Oct 2013, 11:43:5915/10/2013
to django-d...@googlegroups.com
Hello,

I am trying to understand why field validators (model.full_clean()) are not called for model.save()

I've spent about an hour reviewing all the discussions/replies on this subject;

From what I can tell this was rejected on the basis of breaking backwards compatibility.

It is unclear whether this would have been desired behavior had backwards compatibility not been an issue.

It also seems to me that it would be possible to maintain backwards compatibility by using settings flags to determine the default behavior, albeit at the cost of increased complexity.

So I have four questions;

1) Without taking backwards compatibility into consideration, is it reasonable to expect field validation automatically upon calling model.save()
2) At what point would Django devs be willing to break backwards compatibility? Would this be considered on 2.x, or is it life time?
3) Could backwards compatibility not be maintained by means of a flag in settings? For example, settings.MODEL_VALIDATE_ON_SAVE=True would cause the validation to be handled using the new approach?
4) Assuming this cannot/will not be implemented, what would be the most suitable workaround? Are there any risks in calling full_clean() without ever using ModelForm? Should ModelForm always be used instead of using a model directly? etc.

Any comments/feedback would be much appreciated, I am also happy to spend time updating docs to reflect responses in this thread.

Thanks

Cal

Cal Leeming [Simplicity Media Ltd]

unread,
15 Oct 2013, 12:12:3215/10/2013
to django-d...@googlegroups.com
I've been thinking about this a little bit more, and it seems that the design approach of validating on save() is not entirely correct.

For example, say you have a validator that ensures a field must be at least 4 characters long, and a few months later the validation is then changed to be 5 characters long. This means any save() call on that field (which may include a modification to a field which is unrelated to the change being made) will then stop that save from happening. By applying the validation at the model level, you are then potentially invalidating data inside your models. This is fine for forms, because it can be handled at the user side (by returning a form error saying X field is incorrect), but would cause problems at the model layer.

I think by design it would be very wrong to validate the model fields at the point of save(), despite seeming like it would be the correct thing to do.

Could someone please verify if I have come to the correct conclusion here?

Cal

Cal Leeming [Simplicity Media Ltd]

unread,
15 Oct 2013, 12:15:0415/10/2013
to django-d...@googlegroups.com
Sorry I should have made myself a little more clear.

When I said "invalidating data inside your models" I was referring to any previous data that was saved when the validator was set to a minimum length of 4 characters. By then changing the validator to be 5 characters, you are effectively breaking save() on any data that was saved before that point, and the ability to fix the model data may not be available in that specific save() context. For example, a cron job that updates a "last_checked" field on the user would not necessarily have data relevant to changes in the users phone number (which could have its min/max length changed at any point in the validators life time).

Hope that makes more sense

Cal

poswald

unread,
16 Oct 2013, 00:20:2716/10/2013
to django-d...@googlegroups.com
One problem with it as you've found is that you sometimes do want control over it on a per-model or even per-instance (per-save) basis. One way to deal with this is to create something like the following that you can extend in your models:

class ValidatedModel(models.Model):
    def save(self, validate=True, *args, **kwargs):
        if validate:
            self.full_clean()
        super(ValidatedModel, self).save(*args, **kwargs)

    class Meta:
        abstract = True

This helps you if you're trying it follow the 'fat model' philosophy and centralize the validation logic down out of the forms and into the model layer. It might end up validating twice if used in a model form though, I'm not sure. I personally wouldn't be against having a setting to change in the project-wide default behavior in a future Django version, but I think the ability to override it is important.

Cal Leeming [Simplicity Media Ltd]

unread,
16 Oct 2013, 09:03:4016/10/2013
to django-d...@googlegroups.com
I'd considered having a `validate` attribute on save(), but this feels unnecessary because we can already have model.full_clean(), having `validate` seems superfluous.

One idea I'd had was to only validate fields which had changed. The reason for this is because we cannot assume data in the model from previous save() will validate correctly on this save() (i.e. in the event of validation rules changing). However the concept of only validating changed fields seems confusing and likely to sting a lot of people.

Ultimately I think code based validation cannot be enforced at the model layer for these reasons, with the exception of IntegrityError raised from db constraints (which in my opinion is completely different to validations in code).

I'd like to hear what a core dev has to say on the matter, at the very least I think this is worthy of a docs patch to explain this more clearly to users.

Cal


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/fa2c2936-4e44-4132-863c-e93c3ee7da8a%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Russell Keith-Magee

unread,
16 Oct 2013, 19:03:0516/10/2013
to Django Developers
On Tue, Oct 15, 2013 at 11:43 PM, Cal Leeming [Simplicity Media Ltd] <cal.l...@simplicitymedialtd.co.uk> wrote:
Hello,

I am trying to understand why field validators (model.full_clean()) are not called for model.save()

I've spent about an hour reviewing all the discussions/replies on this subject;

From what I can tell this was rejected on the basis of breaking backwards compatibility.

It is unclear whether this would have been desired behavior had backwards compatibility not been an issue.

It also seems to me that it would be possible to maintain backwards compatibility by using settings flags to determine the default behavior, albeit at the cost of increased complexity.

So I have four questions;

1) Without taking backwards compatibility into consideration, is it reasonable to expect field validation automatically upon calling model.save()

Yes. Based on the original discussions (as I recall them), this wasn't in question -- the problem was entirely backwards compatibility.
 
2) At what point would Django devs be willing to break backwards compatibility? Would this be considered on 2.x, or is it life time?

Well… ideally never. :-)

Pragmatically, we will make backwards compatible changes as long as there's a clear migration path. The version number isn't something to get hung up on; I'm not expecting 2.0 to be a "massive change" release in the same way that Python 3.0 was. If we can get the solution right, we'll just do it, not wait for a particular version number.

The approach we've taken to backwards incompatibility issues is to introduce them slowly -- to provide lots of warnings that something is going to change, provide opt-in for those that want to migrate, and ensure that all new projects use the new behaviour. The change to the {% url %} tag, for example, has taken almost 4 years to complete, which has given everyone plenty of time to adapt. 

3) Could backwards compatibility not be maintained by means of a flag in settings? For example, settings.MODEL_VALIDATE_ON_SAVE=True would cause the validation to be handled using the new approach?

Historically, we're not wild on introducing new settings, especially for behaviour changes. However, we obviously need to have a way to flag 'use the new behaviour'. The only other option I can think of would be a Meta flag on individual models, but that obviously makes it difficult to turn on the feature project-wide.

4) Assuming this cannot/will not be implemented, what would be the most suitable workaround? Are there any risks in calling full_clean() without ever using ModelForm? Should ModelForm always be used instead of using a model directly? etc.

At the simplest level, rewriting Model.save() to invoke Model.full_clean() should be all the workaround you need.

As for risks - none that I can think of. Model validation is independent of the forms framework; you can use model validation without ever using ModelForms. 
 
Yours,
Russ Magee %-)

Russell Keith-Magee

unread,
16 Oct 2013, 19:08:4516/10/2013
to Django Developers
On Wed, Oct 16, 2013 at 12:15 AM, Cal Leeming [Simplicity Media Ltd] <cal.l...@simplicitymedialtd.co.uk> wrote:
Sorry I should have made myself a little more clear.

When I said "invalidating data inside your models" I was referring to any previous data that was saved when the validator was set to a minimum length of 4 characters. By then changing the validator to be 5 characters, you are effectively breaking save() on any data that was saved before that point, and the ability to fix the model data may not be available in that specific save() context. For example, a cron job that updates a "last_checked" field on the user would not necessarily have data relevant to changes in the users phone number (which could have its min/max length changed at any point in the validators life time).

Hope that makes more sense

Your analysis is correct, but you're possibly over thinking this a bit.

Consider a world in which model validation *was* on by default. You write some models with a char field, and you insert some data. All valid, all saved successfully. Now you change your models and add a validation condition. You've just exposed yourself to exactly the same situation, without feature-level backwards compatibility ever being part of the picture.

What this highlights is that migration needs to be part of the whole solution. At the very least, as part of adding default model validation, we'd need to document the fact that all existing data must be assumed to be potentially invalid, and provide an easy way to force validation. Unfortunately, there's not sure I see any easy way to do this other than a "for model in database: model.save()" loop.

Yours,
Russ Magee %-)

Cal Leeming [Simplicity Media Ltd]

unread,
16 Oct 2013, 19:47:3316/10/2013
to django-d...@googlegroups.com
Thank you for the detailed reply, much appreciated.

I think the problem isn't just storing model validations with migrations, because it's probably good practice for any developer to assume that field data may be invalid (i.e. manual field/table changes, unknown validation bugs from previous releases, buggy migration scripts etc). And a for/model.save() loop would only work if auto-repair was feasible, in which case the developer has to decide how to handle validation failures at certain points in the code. The typical scenarios you'd handle when encountering invalid data would be to ignore, repair or error - depending on where you are within the code.

The approaches that come to mind are;

1) save(validate=True)
This still feels superfluous because we can just call full_clean() before save().

2) full_clean() then save()
This works but you have to manually check the errors to see if it was old or new data that causes validation

3) save(validate_only_changes=True) OR full_clean(validate_only_changes=True)
Feels a bit better as it allows me to handle background jobs gracefully, say a cron that is failing on a record that it doesn't know/need at that point of execution.

4) CharField(validate_old=True)
This would not allow context specific handling (i.e. we want to enforce validate old if we have a user request because we can ask them for updated info, but do not enforce validate when inside a background job which does not have the necessary context to request repair)

From what I can tell, 3 would be a good approach based on the logic of making it easier for the developer to decide how the failed validation should be handled at certain points in the code.

Do you think that `save(validate_only_changes=True)` would be a good approach (worthy of a patch), or am I over-engineering this problem?

Cal


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Luciano Pacheco

unread,
17 Oct 2013, 18:03:4217/10/2013
to django-d...@googlegroups.com

A few cents...

First, a  project wide settings to control this behaviour would be complex with pluggable apps. Some apps might expect global valodation enabled, but some might expect it disabled.

We should have model.save validating by default and have a flag to turn it off. I think it's reasonable to assume that by default we want our data to be consistent.

Regards,

Luciano Pacheco

Karen Tracey

unread,
17 Oct 2013, 22:28:2117/10/2013
to django-d...@googlegroups.com

On Wed, Oct 16, 2013 at 7:03 PM, Russell Keith-Magee <rus...@keith-magee.com> wrote:

1) Without taking backwards compatibility into consideration, is it reasonable to expect field validation automatically upon calling model.save()

Yes. Based on the original discussions (as I recall them), this wasn't in question -- the problem was entirely backwards compatibility.
 


Wasn't there also concern for double validation performed during form clean and then model instance save?

Anssi Kääriäinen

unread,
18 Oct 2013, 02:20:1318/10/2013
to django-d...@googlegroups.com

Another problem is performance - some database validations aren't cheap. Uniqueness checks and foreign key checks do generate database queries, and both of these are already validated in the DB (well, in any real DB). In many realistic cases model.save() could get multiple times more expensive than what it is now.

In addition model validation on save() still doesn't give you any guarantee that your data in the DB is actually correct. Using .update() will still pass any validation. And, for any validation that checks data from referenced models, it is going to be very likely that concurrent transactions will lead to invalid data in the DB. Getting plain referential integrity right under concurrency is surprisingly hard, and getting more advanced constraints right is still harder.

If you want your data consistent you want your validations done in the DB. So, personally I am more interested in trying to have more constraints automatically created in the DB. Of course, not all types of validation are easy to do in SQL, for example converting random python validations to DB triggers is pretty much impossible. But there are also validations that would be straightforward to do (value in choices for example).

I guess I am -0 on automatic model validation on .save(). In any case I would make the default mode model._meta flag so that you could choose the default per model.

 - Anssi

Florian Apolloner

unread,
18 Oct 2013, 09:41:3918/10/2013
to django-d...@googlegroups.com


On Friday, October 18, 2013 12:03:42 AM UTC+2, lucmult wrote:

I think it's reasonable to assume that by default we want our data to be consistent.

I disagree, everything which isn't coming from user input should not need validation, since you really __should__ know what you are putting in there. User input should go through forms…

Regards,
Florian

Florian Apolloner

unread,
18 Oct 2013, 09:42:2118/10/2013
to django-d...@googlegroups.com


On Friday, October 18, 2013 4:28:21 AM UTC+2, Karen Tracey wrote:
Wasn't there also concern for double validation performed during form clean and then model instance save?

Yes, technically we would probably have to track the validation state per field and also track changes to it etc… :(

Cal Leeming [Simplicity Media Ltd]

unread,
18 Oct 2013, 10:09:0218/10/2013
to django-d...@googlegroups.com
This is yet another reason why I don't think it would be reasonable to expect field validation within the model.

I also think that introducing such a check would not only lure the user into a false sense of security

If the validations were done using DB constraints (as per Anssi's reply) then this would be a reasonable expectation, however as mentioned before changes in the field data can come from multiple places and it is not safe to assume that any data in the model is ever going to be valid.

Given the performance concerns and the minimal amount of benefit that validating on save() would give, 


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Cal Leeming [Simplicity Media Ltd]

unread,
18 Oct 2013, 10:18:3318/10/2013
to django-d...@googlegroups.com
Sorry please ignore my last email, my email client went a bit weird and sent the draft whilst I was still editing/thinking. Here is the proper version;

This is yet another reason why I don't think it would be reasonable to expect field validation within the model.

If the validations were done using DB constraints (as per Anssi's reply) then this would be a reasonable expectation, however as mentioned before changes in the field data can come from multiple places and it is not safe to assume that any data in the model is ever going to be valid.

I think if the developer understands why this is important, they would be adding a lot of code to handle these different scenarios anyway, so I don't think removing the need to call full_clean() manually would be of much benefit.

However that being said, I would be in favour of adding a call which allows you to validate changed and/or all fields, making it easier to handle these different validation scenarios. (i.e. being able to easily detect if it is old data that has caused validation error, or new data).

Cal


Marc Tamlyn

unread,
18 Oct 2013, 15:19:0818/10/2013
to django-d...@googlegroups.com

I think Anssi summed up my feelings on this very well, in particular with the issues with update(). Similarly create() would also bypass as it does not call save() AFAIK.

Given these commonly used approaches do not work, and that the name of the method - save() - implies only saving to the database - I'm -1 on save() calling clean().

That said, there is an interesting option to explore in ModelForms as to how to push more if the validation to the database where that's appropriate. This would increase the performance of write-heavy sites, and also enforce better data integrity. At the moment all cleaning is done before the instance is saved, and the assumption is that the instance will pass db validation. As far as I know, we don't currently have any code which detects problems there (dB integrity errors) and ties them back to the fields or instances which have caused the issue. This is obviously backend-specific code, and is far from straightforward. I think some of the pending ValidationError changes could make it possible.

Anssi and Andrew - do you think the dB can actually tell is enough information to get this right? I guess if not an option would be to fall back to the current query-creating code to work out why the save failed.

Most of this is throwing thoughts around, apologies if they're illogical…

Marc

Cal Leeming [Simplicity Media Ltd]

unread,
22 Oct 2013, 08:40:4622/10/2013
to django-d...@googlegroups.com
It seems there is a split reaction on whether or not models should assume data is clean, and comes down to design decision.

I think that a docs patch explaining this entire concept to new users would be sufficient for now (and I'm more than happy to spend time writing it).

Would any of the core devs object to such a patch?

Cal


Tim Graham

unread,
22 Oct 2013, 09:01:3622/10/2013
to django-d...@googlegroups.com

German Larrain

unread,
2 Nov 2013, 08:11:0802/11/2013
to django-d...@googlegroups.com
FWIW I was very surprised when I realized this problem. I couldn't understand why a model object with an email field (without `blank=True`) could be saved with an empty string as email address.

The way I dealt with this was creating a mixin (ValidateModelMixin) and adding it as left-most superclass in all the models where I needed 'full_clean' to be called before 'save'.

Reply all
Reply to author
Forward
0 new messages