Model-level validation

Aaron Smith

Sep 29, 2022, 12:29:30 AM
to Django developers (Contributions to Django itself)
Why doesn't Django validate Models on save()?

I am aware that full_clean() is called when using ModelForms. But most web apps these days, and every Django app I've ever worked with, are headless APIs. The default behavior is dangerous for the naive developer.

Bringing View-level concepts such as forms or serializers down into celery tasks and management commands breaks separation of concerns, and having multiple validation implementations at different layers in the app is fraught with divergence and unexpected behavior.

It's not right. The data store layer should protect the validity of the data.

Carlton Gibson

Sep 29, 2022, 4:04:17 AM
to Django developers (Contributions to Django itself)
Hi. 

I have to ask, did you search the history at all here? This has been discussed *several times* over the years. 

> Bringing View-level concepts such as forms down into celery tasks and management commands breaks separation of concerns...

I think it's an error to think of forms (please read "or serializers" throughout) as "view-level". 
A view's job is to turn requests into responses. 
It may use other layers of Django, such as the ORM to do that, but that doesn't make said layers part of the view. (The ORM is not "view-level".)

Forms are such another level. Their role is to validate (and transform) incoming data, and they provide an opportunity to present user-friendly validation errors back up to the client code. 
You should definitely be using them in all places that you process incoming data, and that includes celery tasks. 

אורי

Sep 29, 2022, 5:46:40 AM
to django-d...@googlegroups.com
I haven't received the original message but only the reply from Carlton Gibson.

In Speedy Net, all the models inherit from a BaseModel which inherits from ValidateModelMixin which runs self.full_clean() on save().

I was surprised that this is not a default in Django and I think it should be - models should validate their values before saving to the database.

I also disabled bulk actions such as bulk_create() to avoid saving to the database without calling save().

Personally, I think this is a must in Django - calling save() for each instance and calling self.full_clean() to validate data in save().

Uri Rodberg, Speedy Net.

 

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/566b4706-4075-485b-9122-eca24d3b67fcn%40googlegroups.com.

Aaron Smith

Sep 29, 2022, 11:54:41 AM
to Django developers (Contributions to Django itself)
Yes, I did search, and I did not find an answer to my question.

If one is always supposed to use a ModelForm, why isn't that ModelForm functionality part of the Model?

Aaron Smith

Sep 29, 2022, 12:01:08 PM
to Django developers (Contributions to Django itself)
All I was able to find was that it was for "performance reasons", and I refuse to believe that a mature web framework like Django would prioritize performance (let's face it - even my millions-of-requests-per-day API service doesn't care about a few extra milliseconds here and there) over the most basic data safety concerns by default. My expectation would be that validation would be opt-out, not opt-in. There must be another reason.

Curtis Maloney

Sep 29, 2022, 8:19:07 PM
to 'Mike Hansen' via Django developers (Contributions to Django itself)
On Thu, 29 Sep 2022, at 14:29, Aaron Smith wrote:
Why doesn't Django validate Models on save()?

The short answer is: backwards compatibility.

Model-level validation hasn't always been a thing, with Django initially depending primarily on Form validation.

Since it hadn't _always_ been there, it wasn't possible to introduce it, enforce it, and not break most apps out there.

There was so much code written that generally assumed it could call `save()` and not have to catch validation errors.

For what it's worth, I'm all in favor of making it run on `save()` ... updating the documentation and preparing the community is going to be a mammoth task, however. A safe course through this will take some very careful planning.

--
C

Carl Meyer

Sep 29, 2022, 8:36:29 PM
to django-d...@googlegroups.com
Another factor that should be considered is that the Django ORM gives you plenty of ways to update your database (e.g. bulk updates on a queryset) that clearly cannot materialize and validate every object in the queryset. So is it better to consistently say “the ORM doesn’t validate, forms do,” or better to say “the ORM will validate on Model.save() but not in various other cases, so you still can’t really rely on the ORM to enforce model validation invariants consistently”?

Carl

Aaron Smith

Sep 29, 2022, 9:30:32 PM
to Django developers (Contributions to Django itself)
How about a new class, `ValidatedModel`, that subclasses `Model` and does nothing more than call `full_clean()` on `save()`?

This would be completely backwards compatible, would clearly communicate what it does, and when documented right next to `Model` make it fairly obvious that Model is something other than validated, hopefully preventing many footguns.

Or, and I think this would be better, the current Model could be renamed `UnvalidatedModel` and the new validated implementation above could become `Model`. The upgrade path is a simple string replacement for those legacy codebases (Model->UnvalidatedModel), making it abundantly clear they are not validated, and new apps following the most naive path (Model) are as safe as possible. The new, validated `Model.save()` could accept the kwarg `validate=False` as an opt-out, which, as much as I hate to admit it, is an important option for some codebases.

Aaron Smith

Sep 29, 2022, 9:34:05 PM
to Django developers (Contributions to Django itself)
Carl,

All ORMs I have worked with allow you to bypass validations when necessary. Sometimes you have to. But the path of greatest naivety should be as safe as possible. I cannot even imagine how many lost hours and economic damages have occurred because the easy path is the dangerous path. I have been there for some of it - data consistency problems are horrible.

Aaron Smith

Sep 29, 2022, 9:39:20 PM
to Django developers (Contributions to Django itself)
I would also like everyone to know, my objective in starting this thread is to get the go-ahead to open a PR for this. I would like to contribute back.

Adrian Torres

Sep 30, 2022, 12:38:36 AM
to Django developers (Contributions to Django itself)
Hi,

Regardless of what you consider ModelForms to be, the fact that validation doesn't happen at the model level is very jarring if you've ever used any other MVC framework. It was, and still is, one of my major pet peeves with Django, to the point where we do something similar to what Uri does, and I'm sure many other people do too.

bulk_update is not an excuse: as Aaron mentioned, many other ORMs / frameworks come with features that forgo certain safeties in favor of performance / convenience. Heck, you can write SQL directly if you so desire, but you need to be ready to face the consequences.

I like the `UnvalidatedModel` vs `Model` idea proposed by Aaron.

Cheers,
Adrian

Jörg Breitbart

Sep 30, 2022, 4:12:18 AM
to django-d...@googlegroups.com
Hi there,

I don't quite understand where the sudden fuss about this comes from. I
want to point out a few things before going down the rabbit hole of
competing high-level interfaces that ValidatedModel/UnvalidatedModel
would introduce:

- Django offers all validation needs as building blocks. It is literally
a mixin away: you can create a validate_and_save method yourself,
pulling in any level of validation you need. That's the most flexible
approach and has always covered our needs (up to validating complex side
constraints that don't fit the default validators).
- The ORM is still quite a thin abstraction on top of the db engines,
which is good as it keeps things speedy. In terms of separation of
concerns this is again just a db building block, thus I'd expect
`Model.save` to do the db persistence work, not any other side
tracking work. Also `save` is already quite fat (a reason why we often
reshape things to use batch/bulk actions and do validation quite differently).
- Validation on `save` might not be wanted for different reasons (e.g.
the data was already validated by other means).
- That other frameworks always validate data before writing it to the
database is not an argument - it is clearly stated in the docs that
`save` does *not* do it (maybe that needs more prominent mention in
starter examples?). Or to put this non-argument more bluntly: "Why
doesn't Python use curly braces? I've seen them in other languages, imho
it is a must-have!" - Nope, it's not. If in doubt - read the docs.

Now regarding another model interface doing validation on `save` - imho
this just creates more ambiguity for starters. Clearly bulk actions
cannot be treated that way - so now we have a model type that does
validation on `save` but not on any other bulk action. Great - people get
lured into thinking that their data always gets validated, missing
the bulk catch.

Imho the opposite is easier to communicate even to starters - "Nope,
model actions don't do any data validation besides db-level integrity
checks. But the model type offers validation building blocks in the form of
`clean()`, `full_clean()` ... If you want explicit data validation do
this ... or that ..." (Note that's literally one Model.full_clean call
away if you are cool with Django's default validation ideas.)

This gets a -1 from me for:
- mixing db concerns with python-side data validation
- and putting more work on `save`


Regards,
Jörg

Danilov Maxim

Sep 30, 2022, 4:38:01 AM
to django-d...@googlegroups.com
I completely agree with Jörg.

We use Model.full_clean and Model.validate_unique to check conditional constraints.

It has been more than enough. I haven't seen any case where I needed something else.


Kind regards,
DI Mag. Maxim Danilov

+43(681)207 447 76
ma...@wpsoft.at

Aaron Smith

Sep 30, 2022, 11:52:31 AM
to Django developers (Contributions to Django itself)
Jorg,

My observations come from the real world and are not hypothetical guesses. Among the dozen Django applications I have worked on, at 3 companies, not a single one was actually running any kind of validation. It has always been a mistake, 100% of the time, never the desired behavior.

Django applications are usually worked on by people who are pressed into service to work on a web app because they know python. Most are not professional coders. They do the easiest possible thing and read a minimum of documentation. As a result, they wind up with horrible data consistency problems that they need to hire people like me to clean up.

The question is not whether you can compose validation into django models. The concern is that it should be done by default to protect the average naive newbie developer from mistakes.

Jörg Breitbart

Sep 30, 2022, 5:19:35 PM
to django-d...@googlegroups.com
@Aaron

Oh well, if anecdotal personal evidence counts for you - here is mine:

I have been working with Django since 2008 on tons of projects in different
positions. The project sizes ranged from small websites to big API-driven
SPA cluster installations (with and w/o DRF). Ofc it is not all rainbows
and ponies with Django, but I have never found python-side data validation
to be seriously flawed in Django. NOT EVEN ONCE. (I could list at least
5-7 other topics that are somewhat tedious to get done with Django, but
that's off-topic here.)

Plz don't jump from personal frustration about poor development processes
you have observed to sweeping conclusions that depict most Django users as
total noobs. (Still funny to read, as it reminded me of those flame
wars between Perl and Python folks ~18 years ago, which abruptly ended when
Perl 6 finally made its debut.)

> The question is not whether you /can/ compose validation into django
> models. The concern is that it should be done /by default/ to protect
> the average naive newbie developer from mistakes.

I'm sorry if I didn't answer that more directly for you - nope, imho it
should not be done by default on `Model.save`. It violates the separation
of concerns Django has chosen with form validation, hence the
-1 from my side.


Regards,
Jörg

Aaron Smith

Sep 30, 2022, 7:27:13 PM
to Django developers (Contributions to Django itself)
Jorg,

I do not believe it violates any separation of concerns. `full_clean()` is already a method on the Model class itself. The Model is already where all validation logic lives, except for the actual triggering of the validation.

What I believe violates separation of concerns is that models do not run something which is already internal to them, i.e. they are not actually fully functional as a data store layer unless an external thing (ModelForm) is implemented. That feels wrong to me.

Aaron Smith

Sep 30, 2022, 7:53:24 PM
to Django developers (Contributions to Django itself)
If `ModelForm` were truly where validation logic lived, Django would not even use foreign keys. All constraints would be handled at the Form layer. But we do use FKs, and also use other database-level consistency and validity features, because data objects should understand and enforce their own constraints. So what we have now is that some types of validation happen below Model, and some live above in the Form. This means that the data store is not a single thing that's implemented with a simple interface; it is a network of things, which is inherently more difficult to work with, understand, and maintain.

If Forms were truly the validation layer, why am I able to specify things like maximum length and allowed choices on the Model? Shouldn't those things be specified at the Form layer?

Tim Graham

Oct 1, 2022, 8:16:51 PM
to Django developers (Contributions to Django itself)
> Among the dozen of Django applications I have worked on, at 3 companies, not a single one was actually running any kind of validation. It has always been a mistake, 100% of the time, never the desired behavior.

Besides not taking time to understand how Django works, it seems they weren't doing any manual testing or writing tests for invalid data either, so for me, this doesn't add much weight to the argument.

> I was able to find was that it was for "performance reasons", and I refuse to believe that a mature web framework like Django would prioritize performance (let's face it - even my millions-of-requests-per-day API service doesn't care about a few extra milliseconds here and there) over the most basic data safety concerns by default.

I'm not sure it's correct to dismiss performance considerations, particularly when Model.full_clean() could add database queries for validating unique or other constraints. I believe doing validation redundantly (e.g. with form validation) or unnecessarily (e.g. bulk loading good data) would add undesired overhead. I think that forcing the entire Django ecosystem to opt out of automatic model validation as needed would be a heavy requirement to impose at this point. And apparently it was considered too heavy a requirement to impose when model validation was added in Django 1.2, released May 2010.

I would try to keep an open mind to a concrete proposal, but I'm skeptical and it's surely non-trivial.

David Sanders

Oct 1, 2022, 9:50:56 PM
to django-d...@googlegroups.com
I'm not really interested in debating whether the ORM validates or not but I thought it might be worth pointing out a few things that haven't been touched on yet:

> It's not right.

Design decisions are often neither outright right nor wrong but more tradeoffs of varying values.


> The data store layer should protect the validity of the data.

I disagree that the ORM is the data store layer - that's the database. I never put any guarantees in ORM validation because there's always a myriad of ways to get around it.

If you want guarantees I suggest you look into setting up constraints, they're quite easy with Django nowadays. Some examples aside from the usual unique constraint:
  • Validation of choices? Set up a check constraint to check the value exists in the TextChoices `values` attribute.
  • Validation of non-overlapping date ranges? Use range types with exclusion constraints.
  • Only 1 column from a set of columns should be set? Use a check constraint with an xor not null test.
  • There are plenty more of these :)
Only the database can protect the data.

--
David

Aaron Smith

Oct 1, 2022, 10:30:43 PM
to Django developers (Contributions to Django itself)
I'm glad you brought up tests, because automated tests for these issues are quite difficult and require tribal knowledge. A first-hand example from a while back: The application had over 90% test coverage, including all of the validation rules in the DRF serializer. One of those rules was that a `status` could only be certain values. At some point we decided to start ingesting this data from a CSV (polled/downloaded from S3) instead of POST to an endpoint. Nothing about the shape of the data changed, so no new tests were written around validity. A later bug in the CSV generation resulted in invalid values, which got ingested and saved directly with the Model. Cue a lost week of firefighting and data-fixing. Yes, this could have been unit tested. But requiring a full set of unit tests for object validity specific to every data source is an expanding set of tests that need to be maintained, if you have many sources and an evolving model.

Contrast to the other ORMs I've worked with: Your model tests validity, and the testing burden is far, far lower and less error prone as the application evolves.

I would be happy with any step in the direction of treating validation as a first-class feature of the ORM, even if it's not by default. A `validate` kwarg to `save()`, even if it's defaulted to False. Something plain, obvious, easy to communicate and review.

Aaron Smith

Oct 1, 2022, 10:50:29 PM
to Django developers (Contributions to Django itself)
David -

All of your points are accurate. A usable ORM will probably never be perfectly safe, and none of the Django workarounds are particularly difficult. But requiring extra steps to get the same level of data safety as other ORMs will, given human nature and scale, make Django a riskier choice as well as increase the cost and risk of maintaining it. I think that unnecessary risk damages Django's long-term viability as a project and a technology choice for an organization.

Aaron Smith

Oct 5, 2022, 11:11:40 PM
to Django developers (Contributions to Django itself)
It sounds like there is little support for this being the default. But I'd like to propose something that might satisfy the different concerns:

1) A `validate` kwarg for `save()`, defaulted to `False`. This maintains backwards compatibility and also provides the validation behavior that users coming to Django from other frameworks likely expect, in a more user-friendly way than overriding save to call `full_clean()`.

And/or...

2) An optional Django setting (`VALIDATE_MODELS_DEFAULT`?) to change the default behavior to `True`. The `validate` kwarg above would override this per call, allowing unvalidated saves when necessary.

These changes would be simple, backwards compatible, and give individual projects the choice to make Django behave like other ORMs with regard to validation. This being the Django developers mailing list I should not be surprised that most people here support the status quo, but in my personal experience, having had this conversation with dozens of coworkers over the years - 100% of them expressed a strong desire for Django to do this differently.

Curtis Maloney

Oct 5, 2022, 11:32:01 PM
to 'Mike Hansen' via Django developers (Contributions to Django itself)
FWIW +1 from me!

--
C

אורי

Oct 5, 2022, 11:35:36 PM
to django-d...@googlegroups.com
+1

I would suggest having a setting, "VALIDATE_MODELS_BY_DEFAULT" (true or false, true by default), that controls whether save() calls full_clean(), with an option to call save() with "validate=True" or "validate=False" to override this default. Maybe also allow changing the default for specific models.

This is similar to forms that have `def save(self, commit=True):`, and you can call them with "commit=True" or "commit=False" to save or not save the results to the database. I also suggest that VALIDATE_MODELS_BY_DEFAULT will be true by default from some specific future version of Django, so that if users don't want it, they will have to manually set it to false.

We should still remember that there are bulk actions such as bulk_create() or update(), that bypass save() completely, so we have to decide how to handle them if we want our data to be always validated.

Uri Rodberg, Speedy Net.

James Bennett

Oct 6, 2022, 3:47:19 AM
to django-d...@googlegroups.com
I see a lot of people mentioning that other ORMs do validation, but not picking up on a key difference:

Many ORMs are designed as standalone packages. For example, in Python SQLAlchemy is a standalone DB/ORM package, and other languages have similar popular ORMs.

But Django's ORM isn't standalone. It's tightly integrated into Django, and Django is a web framework. And once you focus *specifically* on the web framework use case, suddenly things start going differently.

For example: data on the web is "stringly-typed" (effectively, since HTTP doesn't really have data types) and comes in via HTML's form mechanism or other string-y formats like JSON or XML payloads. So you need not just data *validation*, but data *conversion* which works for the web use case.

And since the web use case inevitably involves supporting forms/payloads that don't persist to a relational data store -- think of, for example, a contact form that sends an email, or forms that store their results client-side for things like language or theme preferences -- you inevitably end up needing to do data conversion and validation *independently of the ORM*.

And at that point, you have to start asking tough questions about whether it's worth having *two* conversion and validation layers, just because "every other ORM has this, so we have to put one in the ORM".

Which basically is where Django is. Yes, there are utilities to do your data conversion and validation in the ORM layer if you want to. But Django is, first and foremost, a web framework, which needs to support the web use case I've described above, and so its primary conversion/validation layer can never be the ORM.

Personally, I wish model-level validation had never been added even as an option, because in a web framework like Django it's conceptually the wrong place to put the validation logic. Though that battle was lost many years ago, I'd be *strongly* against trying to expand it or start forcing the ORM to default to doing validation work that, in Django, properly belongs to the forms layer (or to serializers if you use DRF).

So: Django ships with ModelForm, which does the hard work of auto-deriving as much validation logic as possible from your model definition so you don't have to repeat it. DRF ships with ModelSerializer, which does the same thing for its validation/conversion layer. I would strongly urge people to use them. Trying to force all that validation back into the model layer misses the bigger picture of what Django is and how it works.

Aaron Smith

Oct 6, 2022, 11:33:41 AM
to Django developers (Contributions to Django itself)
Uri - that's a great upgrade path (or should I say, non-upgrade path). Agree with `VALIDATE_MODELS_BY_DEFAULT`.

Rails also skips validations for some operations, like `update_column`, but they are prominently marked to use with caution, and the other ORMs I've used follow a similar pattern. bulk_create sounds like it has a legitimate reason not to validate everything; it seems reasonable to exclude it so long as there's a prominent "use with caution" statement in the docs.

Aaron Smith

Oct 6, 2022, 12:00:50 PM
to Django developers (Contributions to Django itself)
James - The problem with moving validation up the stack, i.e. to logical branches from Model (Form, Serializer), is that you must duplicate validation logic if your data comes from multiple sources or domains (web forms and API endpoints and CSVs polled from S3). Duplication leads to divergence leads to horrible data integrity bugs, and no amount of test coverage can guarantee safety. Even if you consider Django to be "only a web framework" I would still argue that validation should be centralized in the data storage layer. Validity is a core property of data. Serialization and conversion change between sources and are a different concern than validation.

James Bennett

Oct 6, 2022, 3:03:28 PM
to django-d...@googlegroups.com
I would flip this around and point out that the duplication comes from seeing the existing data conversion/validation layer and deciding not to use it.

There's nothing that requires you to pass in an HttpRequest instance to use a form or a serializer -- you can throw a dict of data from any source into one and have it convert/validate for you.  Those APIs are also designed to be easy to check and easy to return useful error messages from on failed validation, while a model's save() has no option other than to throw an exception at you and demand you parse the details out of it (because it was designed as part of an overall web framework that already had the validation layer elsewhere).

So I would argue, once again, that the solution to your problem is to use the existing data conversion/validation utilities (forms or serializers) regardless of the source of the data. If you refuse to, I don't think that's Django's problem to solve.

Aaron Smith

Oct 6, 2022, 10:34:16 PM
to Django developers (Contributions to Django itself)
James - to clarify, the duplication I was referring to is having both Forms and Serializers do validation. I often work with web apps where data for the same model can arrive via user input, serializer, or created in some backend process e.g. Celery. If forms/serializers are your validation layer, you need to duplicate it and worry about how to keep them from diverging over time as there's no single source of truth. I also don't relish the thought of needing to use a Form or Serializer every time I alter a Model's data.

Perhaps we think about validation differently. I consider it to be critical to maintain complex systems with any kind of confidence, any time data is being created or changed, regardless of where that change comes from. Bugs can happen anywhere and validation is the best (only?) option to prevent data-related bugs.

Carlton Gibson

Oct 7, 2022, 3:01:30 AM
to django-d...@googlegroups.com
> ... the duplication I was referring to is having both Forms and Serializers do validation.

That's a separate issue. 

Can we merge various aspects of DRF into Django, so that it better handles building JSON APIs? Yes, clearly. One step of that is better content type handling, another is serializers. (There are others). 
On the serializer front, it would be a question of making django.forms better able to handle list-like (possibly do-able with FormSet) and nested data, and so on. 
Not a small project, but with things like django-readers, and Pydantic (and django-ninja), and attrs/cattrs showing new ideas, re-thinking about serialization in Django is about due. 

But the issue is here: 

> ... I also don't relish the thought of needing to use a Form or Serializer every time I alter a Model's data.

I'm like literally, "¿Qué? 😳" - Every single time you get data from an untrusted source you simply **must** validate it before use. ("Filter input, escape output", I was drilled.) That applies exactly the same to a CSV file as it does to HTTP request data. (That your CSV is malformed is axiomatic, no? :) 

If you want to enforce validation, with a single call, write a method (on a manager likely) that encapsulates your update logic (and runs the validation before save). Then always use that in your code. (That's long been a recommended pattern.) But don't skip the validation layer on your incoming data. 

I would be -1 to a `validate` kwarg on `save()`: that's every user ever wondering "should I use it?" every time. (Same for a setting.)
Rather (is this a docs issue?), we should re-emphasise the importance of the validation layer. 
Then if folks want a convenience API to do both tasks, they're free to write that for their models. (This is what Uri has done for Speedy Net. It's not a bad pattern.) 
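Here is a minimal sketch of the pattern Carlton suggests. It assumes only Django's `Manager.model`, `Model.full_clean()` and `Model.save()`, including the `force_insert` and `update_fields` arguments of `save()`. `ValidatingManagerMixin`, `create_validated` and `update_validated` are hypothetical names. You would combine the mixin with a manager, e.g. `class TicketManager(ValidatingManagerMixin, models.Manager)`, and the rest of the code would call these methods instead of `save()`.

```python
class ValidatingManagerMixin:
    """Hypothetical helpers for a models.Manager subclass: validate, then save.

    Call sites use these instead of calling save() directly, so every write
    goes through full_clean() (field validators, choices, Model.clean(),
    uniqueness checks).
    """

    def create_validated(self, **fields):
        obj = self.model(**fields)
        obj.full_clean()  # raises ValidationError; nothing is written
        obj.save(force_insert=True)
        return obj

    def update_validated(self, obj, **changes):
        for name, value in changes.items():
            setattr(obj, name, value)
        # On failure nothing is saved, but obj keeps the in-memory changes.
        obj.full_clean()
        # Write only the columns that were changed.
        obj.save(update_fields=list(changes))
        return obj
```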






--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

אורי

Oct 7, 2022, 3:11:13 AM10/7/22
to django-d...@googlegroups.com
On Fri, Oct 7, 2022 at 10:01 AM Carlton Gibson <carlton...@gmail.com> wrote:
Then if folks want a convenience API to do both tasks, they're free to write that for their models. (This is what Uri has done for Speedy Net. It's not a bad pattern.) 

Thank you! 🍑

You might want to include such a solution in the docs, in case Django users want to validate models.

My solution is taken from https://gist.github.com/glarrain/5448253

Barry Johnson

Oct 7, 2022, 10:53:39 AM10/7/22
to Django developers (Contributions to Django itself)
I agree with James in several ways.   Our large Django application does rather extensive validation of data -- but I would argue strongly against embedding that validation in the base instance.save() logic.

(I would not argue against Django including a "ValidatingModel", derived from Model, that automatically runs defined validations as part of save().  Then developers could choose which base they'd like to subclass when designing their objects.  Of course, anyone could simply create their own "ValidatingModel" class and derive everything from that class.)
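Barry's "roll your own" option can be sketched as a plain mixin. `ValidateOnSaveMixin` and the `validate_on_save` attribute are hypothetical names. The mixin relies only on `Model.full_clean()` and `Model.save()`, and it must come before `models.Model` in the bases so that its `save()` runs first. Barry notes below that `.update()` and raw SQL bypass instance-level validation. The same is true of `bulk_create()`: none of them call `save()`, so none of them would pass through this mixin.

```python
class ValidateOnSaveMixin:
    """Run full_clean() before every save() on models that inherit this.

    Usage (hypothetical model):
        class Article(ValidateOnSaveMixin, models.Model):
            ...
    """

    # Per-model opt-out switch (hypothetical), e.g. for expensive clean() logic.
    validate_on_save = True

    def save(self, *args, **kwargs):
        if self.validate_on_save:
            self.full_clean()  # raises ValidationError before any SQL runs
        return super().save(*args, **kwargs)
```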

Reason 1 is that business logic validation often requires access to multiple model instances -- for example, uniqueness across a set of objects about to be updated.  (e.g., "Only one person can be marked as the primary contact").  Or internal consistency:   "If this record is of this type, then it cannot have any children of that type".  Or even referential integrity in some cases:  "The incoming data has a code that serves as the primary key in some other table.  Make sure that primary key exists."

Yes, you can encode all of those cross-instance validations into an instance-level check, but then that brings us to the second point:  Performance.  There are a number of types of validations that are best served by operating on sets or lists of instances at a time.   Again, consider a referential integrity validation:  If I'm about to bulk_create 5000 instances, but need to confirm that the "xyz code" is valid for all of them, then I should run a query that selects the "xyz table" for all of the codes that are referenced within the 5000 items.... instead of doing 5000 individual lookups within that table.   Yes, one can maintain and access caches of known-valid things, but those are awkward to manage from within the Model layer.  
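Below is a sketch of the set-based check Barry describes. It collects every referenced code from the batch, resolves them all with one lookup, and reports the rows whose code is unknown. `find_rows_with_unknown_codes` and the `xyz_code` key are hypothetical. In a Django project, `lookup` might be something like `lambda codes: Xyz.objects.filter(code__in=codes).values_list("code", flat=True)`, where `Xyz` and `code` stand in for the real table and column. That is one query instead of 5000.

```python
from typing import Callable, Iterable, List, Mapping, Sequence, Set


def find_rows_with_unknown_codes(
    rows: Sequence[Mapping[str, str]],
    lookup: Callable[[Set[str]], Iterable[str]],
    key: str = "xyz_code",
) -> List[int]:
    """Return the indexes of rows whose `key` value does not exist upstream.

    `lookup` receives the set of distinct codes referenced by the whole
    batch and returns the ones that exist, so the check costs one round
    trip no matter how many rows there are.
    """
    referenced = {row[key] for row in rows}  # deduplicate before querying
    known = set(lookup(referenced))  # the single lookup
    return [i for i, row in enumerate(rows) if row[key] not in known]
```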

It's particularly difficult to write performant validations within the model when you're using .only() or .defer().   Unless the validation logic can detect that certain fields haven't been loaded from the database, it would trigger extra queries retrieving values from the database solely for the purpose of validating that they are still correct (even though you aren't changing them).
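One possible mitigation for the `.only()`/`.defer()` case is sketched below. It assumes that skipping validation of fields that were never loaded is acceptable. The idea is to ask the instance which fields are deferred and pass the matching field names to `full_clean(exclude=...)`. `Model.get_deferred_fields()` returns attribute names, and the model's concrete fields are used to map them to field names. The helper name is hypothetical. A custom `clean()` that reads a deferred field would still trigger a query.

```python
def full_clean_loaded_fields(instance):
    """Validate only the fields that were actually loaded from the database.

    get_deferred_fields() returns attnames (e.g. "author_id"), while
    full_clean(exclude=...) expects field names (e.g. "author"), so map one
    onto the other via the model's concrete fields. Returns the skipped
    field names, which can be handy for logging.
    """
    deferred = instance.get_deferred_fields()
    exclude = {
        field.name
        for field in instance._meta.concrete_fields
        if field.attname in deferred
    }
    instance.full_clean(exclude=exclude)
    return exclude
```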

Also on the performance front, there are times when removing the extra layer of validation is necessary and appropriate.  With well-tested code, once the incoming data has been validated and the transformation/operational logic is considered fully tested and accurate, avoiding a second validation on the outbound data can result in a significant performance improvement.  If you're dealing with millions or billions of records at a time (as we do during data conversions), then those performance improvements are worthwhile.

Finally, Django supports the queryset .update() method.  Again, validations that run within the model instance won't even HAVE instances when using .update() -- the queryset manager would need to figure out how to do the necessary validation (and if it's a multi-field validation, good luck!)   There are also cases where the use of raw SQL is appropriate, and one obviously cannot lean on instance-level validation in that case.
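For the `.update()` case, single-field rules can still be applied once per call rather than once per row, because every matched row receives the same value. The sketch below assumes plain values (not `F()` expressions and not related objects). It runs each model field's own `Field.clean(value, model_instance)`, here with `model_instance=None`, and only then issues the single `UPDATE`. Multi-field rules, `Model.clean()` and uniqueness are not covered, which is exactly Barry's point. `validated_update` is a hypothetical helper name.

```python
def validated_update(queryset, **changes):
    """Validate each new value once, then run one QuerySet.update().

    Field.clean() converts the value (to_python), checks null/blank/choices
    and runs the field's validators, raising ValidationError on failure,
    so nothing is written if any value is rejected.
    """
    opts = queryset.model._meta
    cleaned = {
        name: opts.get_field(name).clean(value, None)
        for name, value in changes.items()
    }
    return queryset.update(**cleaned)  # number of rows matched
```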

Validation is indeed important -- but testing the validity of data belongs in the business logic layer, not in the database model layer.  Agreed that some types of validations can easily be encoded into the database model, but then you find yourselves writing two layers of validation ("one simple, the other more sophisticated")... and that makes things even more complex.  We do indeed use the model-level validations for single-field validations... but we invoke those validations from our business logic at the proper time, not during the time we're saving the data to the database.

baj
------------
Barry Johnson
Epicor


Aaron Smith

Oct 7, 2022, 11:23:54 AM10/7/22
to Django developers (Contributions to Django itself)
Yes, every time you get data from an untrusted source you must validate it. As well as every time you change model attributes, ever. There seems to be a widespread frame of mind in Django that validation is something you only need to do with data from untrusted sources. As someone who has had to deal with the consequences of this pattern in mission-critical systems, I find this terrifying and consider it extremely harmful. Untrusted users are not the only place you can get bad data from. Bugs can happen anywhere, and no data source can be considered "safe". It happens all the time. Nothing is more dangerous than a developer who says "don't worry, I'll remember to do everything perfectly 100% of the time".  This is why model-level validation is the default in other ORMs. Django is not somehow immune to this fundamental property of software.

I am aware there are patterns to work around this in Django. My position is that skipping validation should be the rare edge case and not the easy naive path. Unless Django's stated purpose is to be a cute toy for making blogs, and robust infrastructure is off-label, but that's not what I see in the wild.

Mariusz Felisiak

Oct 7, 2022, 11:55:24 AM10/7/22
to Django developers (Contributions to Django itself)
> I am aware there are patterns to work around this in Django. My position is that skipping validation should be the rare edge case and not the easy naive path. Unless Django's stated purpose is to be a cute toy for making blogs, and robust infrastructure is off-label, but that's not what I see in the wild.

I think you're going a bit too far with your judgement and comparisons. It's already clear for everyone involved in this thread that you're firmly convinced that your way of doing things is the only right one. You don't need to emphasize it any more.

I can say that in the past 15+ years I made dozens of web apps (I've never written a blog) including critical workflows for international retailers, pharmaceutical companies, public sector etc. and I've never missed auto-validation in the ORM layer. Is this an argument in the discussion? Not really, IMO :) It's just the way it is, it's not something that can or should convince anyone that I'm right :)

Personally, I agree with James and I'm strongly against any auto-validation in the ORM. I'm also against extra settings and built-in subclasses of `Model` as `ValidatingModel`, because they would be confusing for newcomers and increase the barrier of entry for developers. Django has to make design decisions and that's one of them.

Best,
Mariusz

Aaron Smith

Oct 7, 2022, 9:21:54 PM10/7/22
to Django developers (Contributions to Django itself)
Mariusz - fair enough, I will consider my point made and apologies if it came off too strong. FWIW it's not just my opinion, it's shared by every developer (dozens) I've had this conversation with up until now. It's a stark contrast that makes me wonder how aware the core developers / old timers are of the broader user base's experience.

So you would object to a `VALIDATE_MODELS_BY_DEFAULT` setting, defaulted to False?

James Bennett

Oct 8, 2022, 2:28:58 AM10/8/22
to django-d...@googlegroups.com
On Fri, Oct 7, 2022 at 6:21 PM Aaron Smith <aa...@aaronsmith.co> wrote:
Mariusz - fair enough, I will consider my point made and apologies if it came off too strong. FWIW it's not just my opinion, it's shared by every developer (dozens) I've had this conversation with up until now. It's a stark contrast that makes me wonder how aware the core developers / old timers are of the broader user base's experience.

I would wonder how many of these developers you've talked to are used to working in Python.

The main standalone ORM package people use in Python is SQLAlchemy, which *also* does not do validation in the ORM layer. The last time I worked with Flask, the standard practice was to use Marshmallow to write serializers, and these days the popular async frameworks like Starlite and FastAPI have you write Pydantic models. In either case they fill the role of, say, a DRF serializer -- they do the validation and data type conversion at the application boundaries, so that the ORM doesn't have to.

It's true that when you start branching out into other languages you'll encounter ORMs which have validation built-in, like Entity Framework or Hibernate, but you'll also more often encounter that in statically-typed languages where the data conversion step has already been handled for you. It's also not always clear that the ORM is the right place for validation, since often the rules being enforced are ones that aren't actually enforced at the DB level by constraints.

Either way, I think I've made the case for why Django doesn't and shouldn't do this. You seem to have a strong reluctance to use either Django forms (in a "vanilla" Django project) or DRF serializers (in a more "API" project) to validate data from sources other than direct user-initiated HTTP request, but I don't really get that -- the validation utilities are there, and if you're not willing to use them that still is not Django's problem to solve -- after all, someone else might be equally set in their conviction that all the existing validation layers are the wrong way to do things, and demand we add yet another one, and I doubt you'd be supportive of that.

So I think Django should continue to be Django, and validation should continue to be a layer independent of the ORM (which, as I originally noted, it *has* to be in a web framework, since not every use case for validation will end up touching the database). For that reason I'd be very strongly against ever adding even an optional default enforcement of model-level data validation.

Aaron Smith

Oct 8, 2022, 11:44:38 AM10/8/22
to Django developers (Contributions to Django itself)
James - these developers come from all over. They represent the random sample of people who end up working on or inheriting legacy django projects. Probably more Rails than anything else, but a few who probably worked with Hibernate or Node frameworks. For better or worse, it's who's using django.

The reason I don't want to use serializers or forms in celery tasks is because validation should happen every time a model attribute is changed. This pattern would mean more imports and boilerplate scattered around my codebase any time I want to update a status attribute on a model. Pure boilerplate vs. just calling `full_clean()` on save. full_clean() is even a method already present on Model! Why import a third-party library just so a model instance can call its own method?!?

I think you (and Django at large) are conflating Validation and conversion/serialization/sanitization/input filtering; I think of them as separate concepts. Validation should happen everywhere, because bugs can happen anywhere. Input filtering is a concern of the external interfaces. Other frameworks treat them separately and I find it to be a far more robust and flexible pattern.

James Bennett

Oct 8, 2022, 5:50:16 PM10/8/22
to django-d...@googlegroups.com
On Sat, Oct 8, 2022 at 8:44 AM Aaron Smith <aa...@aaronsmith.co> wrote:
The reason I don't want to use serializers or forms in celery tasks is because validation should happen every time a model attribute is changed. This pattern would mean more imports and boilerplate scattered around my codebase any time I want to update a status attribute on a model.

This feels to me like an important symptom -- regardless of ORM design patterns (which I'll get to in a moment), data mutation should occur only in well-defined, controlled ways. Having it "scattered around [the] codebase" is somewhat worrying. In an Active Record ORM like Django, very commonly this is just methods on the model itself exposing the desired logical operations ("resolve this ticket", "publish this article", etc.), while in a lot of Data Mapper ORM setups it would be in a "business logic" or "domain" object or some sort of Repository pattern or CQRS setup or... well, lots of options, but either way it would be strongly discouraged to be directly mutating data from lots of places around the codebase.

I think you (and Django at large) are conflating Validation and conversion/serialization/sanitization/input filtering; I think of them as separate concepts.

I still think there's a static-versus-dynamic thing going on here where the type conversion is seen as an afterthought in statically-typed frameworks simply because they're statically typed (and a similar phenomenon occurs in Python with Pydantic).

But the basic point -- that a web framework will have to do validation that's completely independent of the persistence layer -- stands. There are simply too many use cases for forms/submissions/payloads/whatever-you-call-them that *don't* get persisted to the DB, as I mentioned in my original message. Once that's established, it's inevitable that a validation -- not just conversion -- layer is needed that isn't tied to the persistence layer. And once that exists, it's reasonable to ask whether having *another* one in the persistence layer is wasteful/redundant.

Anyway, I've explained this about as thoroughly as I can now. I think the real underlying problem here is an unsupported generalization from liking a particular pattern to deciding that pattern is objectively the only correct/acceptable one. There are lots of acceptable patterns for how to design and build and use ORMs and surrounding code. Django has settled on one in particular, and isn't the only Python framework or ORM to have settled on it. That doesn't mean it's wrong or bad, just that it's different from the pattern you want or are used to.

And so I am still very strongly against trying to push a model-layer-validation approach in Django, even optionally. Django's pattern is fine, and works well/makes sense for a web framework, for the reasons I've gone over multiple times now. I would suggest once again that you adapt to it, because fighting against a framework is never pleasant. Or, failing that, I'd suggest that you look into switching to something better suited to your preferences.

Aaron Smith

Oct 8, 2022, 11:32:44 PM10/8/22
to Django developers (Contributions to Django itself)
And so I am still very strongly against trying to push a model-layer-validation approach in Django, even optionally.

It already exists, though. `full_clean()` is a method on Model. CharFields on the model already have a notion of allowed choices. Validators are already an option on model fields. Models already do their own validation. But they require additional implementations of classes further up the stack to actually be triggered.

While validation not being a concern of the model layer is a valid design choice (not one that I would prefer, but one that I could live with) the root of the problem here is that the concept of validation is spread across multiple layers in a way that's extremely misleading for people coming to Django from other ORMs. If I can specify `validators` on a field definition, I should be able to expect that it's actually run without extra steps.

Surely we can agree that something should happen here? The status quo is confusing, a footgun and a gotcha. If it's not Model's concern, then get it out of Model.

James Bennett

Oct 9, 2022, 2:58:59 AM10/9/22
to django-d...@googlegroups.com
On Sat, Oct 8, 2022 at 8:32 PM Aaron Smith <aa...@aaronsmith.co> wrote:
Surely we can agree that something should happen here? The status quo is confusing, a footgun and a gotcha. If it's not Model's concern, then get it out of Model.

I've already said that I wish model-level validation hadn't been added to Django.

Unfortunately it's something that's been in long enough now that it would be very difficult to deprecate and remove. But that's not an argument, to my mind, for expanding it.

Shai Berger

Oct 10, 2022, 9:39:50 AM10/10/22
to django-d...@googlegroups.com
I see two separate concerns here:

1) Should Django present to users the option to do validate-on-save by
default? That is, should that option be visible -- in the form of a
documented setting or an optional argument to save()?

I tend to accept James' (and others') views and reasoning against that.

2) Can a user activate validation-on-save-by-default without resorting
to monkeypatching? Should it be possible?

I think applying such validation should be possible -- because, in many
places, you see less-than-disciplined teams creating large projects
containing heaps of code that is not of the finest quality. And I think
we should help these projects improve incrementally -- that is,
introduce means to improve their situation; promoting notions like

data mutation should occur only in well-defined, controlled ways

without making them prerequisite.

This notion is a nice ideal, but installing it on a project
after-the-fact is hard; in many places it is not realistically
attainable -- at least in the boundaries of the team's resources,
delivery requirements, and a reasonable timeframe.

Note that for such general validation, a project-wide-base-model as
suggested e.g. by Uri is, in general, not sufficient, because it may
not apply to models from 3rd-party apps, or even from django.contrib
apps. Most models in such apps are not swappable.

But there is a way, I think, using a pre_save signal. One can write a
small app to install pre-save validation on all models in the project,
merely by including it in INSTALLED_APPS.

Basically,

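from django.apps import AppConfig
from django.db.models import signals
...

class WhateverConfig(AppConfig):
    ...

    def ready(self):
        # Runs once the app registry is ready; connect the receiver here.
        def validate(sender, instance, raw, **kwargs):
            # raw is True when saving fixture data (loaddata); skip those.
            if not raw:
                instance.full_clean()

        # No sender filter, so this fires for every model in the project.
        # weak=False: validate is a local function, and a weak reference
        # would let it be garbage-collected once ready() returns.
        signals.pre_save.connect(validate, weak=False)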

This, of course, is just a POC -- it doesn't include things like
allowing a model to opt out, for example. Or one might want to apply it
only where settings.DEBUG is set. Or only log warnings. Or a few other
variations that don't jump to my mind immediately.
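The variations Shai lists could be folded into the receiver itself. The sketch below uses hypothetical names throughout: `make_pre_save_validator` and the `skip_save_validation` class attribute. The exception class is passed in so the factory imports nothing from Django; in a real project it would be `django.core.exceptions.ValidationError`. `ready()` might then connect the receiver with `signals.pre_save.connect(make_pre_save_validator(ValidationError, log_only=not settings.DEBUG), weak=False)`, which is one possible choice: fail loudly in development and only log in production. Like the POC, the receiver never sees `QuerySet.update()` or `bulk_create()`, because neither sends `pre_save`.

```python
import logging

logger = logging.getLogger(__name__)


def make_pre_save_validator(error_class, log_only=False,
                            opt_out_attr="skip_save_validation"):
    """Build a pre_save receiver that runs full_clean() on every saved instance.

    error_class:  the exception full_clean() raises (ValidationError in Django).
    log_only:     log failures as warnings instead of blocking the save.
    opt_out_attr: class attribute a model can set to True to opt out.
    """

    def validate(sender, instance, raw=False, **kwargs):
        # raw=True during fixture loading (loaddata): skip, as in the POC.
        # sender is the model class, so the opt-out is read from the class.
        if raw or getattr(sender, opt_out_attr, False):
            return
        try:
            instance.full_clean()
        except error_class as exc:
            if not log_only:
                raise
            logger.warning("pre_save validation failed for %s (pk=%s): %s",
                           sender.__name__, instance.pk, exc)

    return validate
```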

But it is a way for those of us involved with large teams and projects
to add this feature, without affecting the experience of newcomers or
the layer-separation sensibilities of the framework.

HTH,
Shai.

Jure Erznožnik

Oct 12, 2022, 4:19:56 AM10/12/22
to django-d...@googlegroups.com

I'd like to chime in with this:

There was a point in time when we ran into this issue and solved it with our own Model descendant.

IMHO, I'm completely with Aaron on this: all the guts are there, just not being used. It took me quite a while to wrap my brain around the idea that validation would be specified in models, but only used in forms. And then some more looking at why all the support functions are in models, but not being used. It just didn't compute why I would have validation when using Django one way, but not when I wanted something a little different.

While all the arguments against providing this function are valid and good, they neglect that an implementation satisfying n00bs like me would really not carry any penalties: it's just an if in the model.save checking against a global setting. No code creep, no additional tests to write (but I'm pretty sure some of the existing ones might fail, django or end-app). It would remove some confusion, but grandmasters like yourselves could still turn it off and do it the way you believe is "the correct one".

Also, just to be clear: the above solution (turning this on) yielded TONS of bugs when we did it for our projects. We vastly improved unit testing following this implementation, so it was a total win for us.

LP,
Jure


Aaron Smith

Oct 12, 2022, 12:57:42 PM10/12/22
to Django developers (Contributions to Django itself)
I think the core developers who are making assertions about what is "accessible" and "makes sense" to newcomers would be well served by taking into account the actual experiences of newcomers. The expectation does not appear to align with reality.