Except that very few validation requirements correspond to database
constraints (for example, any regexp-matching field, limits on an
integer, and so on). We aren't about to require database check
constraints for
all of those. So it's not really a one-to-one in practice. You've
misread the patch quite badly, it sounds like: only the unique
requirements have to check the database (and a pre-emptive check is
reasonable there, since it's early error detection and there's only a
small class of applications that are going to have such highly
concurrent overlapping write styles that they will pass that test and
fail at save time).
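To make that concrete, here's a rough sketch (mine, not the patch's
code) of what a pre-emptive unique check amounts to: one existence query
per unique field before save() is ever called. The check_unique helper
and its behaviour are purely illustrative:

    from django.core.exceptions import ValidationError

    def check_unique(instance, field_name):
        # Is another row already using this value? One SELECT, excluding
        # the row being updated.
        model = instance.__class__
        value = getattr(instance, field_name)
        qs = model._default_manager.filter(**{field_name: value})
        if instance.pk is not None:
            qs = qs.exclude(pk=instance.pk)
        if qs.exists():
            raise ValidationError("%s must be unique." % field_name)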
We might even be able to trim all the unique checks down to a single
database call. That's something that can be looked at further down the
line, since it depends on how the code falls out as well as a bunch of
profiling and some intelligent guesses about use-case likelihoods.
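If that consolidation turns out to be worth doing, it might look roughly
like this -- OR the unique lookups together with Q objects, make one
round trip, and work out in Python which fields clashed. This is
speculation on my part, not code from the patch:

    from django.db.models import Q

    def conflicting_rows(instance, field_names):
        condition = Q()
        for name in field_names:
            condition |= Q(**{name: getattr(instance, name)})
        qs = instance.__class__._default_manager.filter(condition)
        if instance.pk is not None:
            qs = qs.exclude(pk=instance.pk)
        # One database call; per-field conflicts are sorted out from the
        # returned values.
        return list(qs.values(*field_names))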
> The problem is conceptual: how aware should validation level be
> of storage level. Python is known for the "It's easier to ask
> forgiveness than permission" motto, the proposal is ideologically
> based on that.
>
> In short, db validation is shifted to `ModelForm.save()`. Outside of
> ModelForms, model validation has to be done manually as
> presently. The changes are less disruptive and fewer lines of
> code need to be updated.
This isn't a very good idea, and the reasons for the separation have
been discussed before. There needs to be a clear phase prior to saving when
you can detect validation errors so that they can be correctly presented
back to the user. You see this already with forms where we check for
validity and, if it's not valid, we present the errors. If it is valid,
we move onto doing whatever we want with the data (saving it or
whatever).
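For reference, that form idiom is the two-phase pattern below (a sketch
using a hypothetical ArticleForm ModelForm): the validation phase
produces errors we can show the user, and only valid data reaches the
save step:

    from django.shortcuts import redirect, render

    def edit_article(request):
        form = ArticleForm(request.POST or None)
        if form.is_valid():
            form.save()  # only reached with clean, validated data
            return redirect("article_list")
        # Not valid: re-render the form; form.errors carries the messages.
        return render(request, "article_form.html", {"form": form})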
The only time there's any kind of overlap is when there's a database
constraint, such as uniqueness, which we cannot guarantee will remain
true between the validation step and the saving step. So there's a
chance that save() will raise some kind of database integrity error. But
that's actually the edge case, and in a wide number of use-cases the
chance is practically zero, since you know that your application is the
only thing working with the database. So it's an acceptable trade-off.
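Handling that residual race is a small amount of code; something along
these lines (names illustrative, not taken from the patch) is the whole
of the save-time fallback:

    from django.core.exceptions import ValidationError
    from django.db import IntegrityError

    def save_checked(instance):
        try:
            instance.save()
        except IntegrityError:
            # Another writer took the value between our validation pass
            # and the INSERT. Rare for most applications, but handled.
            raise ValidationError("That value was just taken; please retry.")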
Validation for models will look a lot like validation for forms in the
end. Honza is working on a slightly different approach to the last one,
taking into account what we've learnt from that approach. So take a pew
for a bit and wait until he has some code ready. A wild redesign isn't
required, though, and what you're proposing would actually make a lot
of code harder to write, since custom save methods work much more easily
when you know that the data is in a consistent format and valid.
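To illustrate the custom save point with a hypothetical model: once
validation is guaranteed to have run first, a save() override only has
to fill in derived values rather than re-check its inputs:

    from django.db import models
    from django.utils.text import slugify

    class Article(models.Model):
        title = models.CharField(max_length=100)
        slug = models.SlugField(unique=True, blank=True)

        def save(self, *args, **kwargs):
            # The data has already been validated by the time we get
            # here, so this method only fills in derived fields.
            if not self.slug:
                self.slug = slugify(self.title)
            super(Article, self).save(*args, **kwargs)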
Then there's just one point (the final "talk to the database" save
line) that can raise an IntegrityError, depending on your data modelling
and problem domain, and that's it. Everything splits fairly nicely between
validation and saving with a space in the middle to operate on the
results of validation.
Regards,
Malcolm
Obviously not, since we're building a proper framework here. How did you
get from "we should do checking as part of the validation step" to "we
should not handle database errors in save()"? They're different steps
and the former is a good idea that isn't done at the expense of the latter.
You were proposing letting the database be the only place to catch
unique errors, which has flaws such as waiting until save time to raise
a validation problem that could have been caught earlier (at a point
when we can report it to the user), as well as not catching all the
problems (when an IntegrityError is raised, you won't find out that two
fields failed uniqueness constraints -- you'll only get the one database
constraint violation).
Doing uniqueness detection in the validation phase is still a necessary
component, as is handling errors at save time.
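A quick sketch of the difference (the helper is mine, not from any
patch): a validation-phase check can collect every failing field and
report them together, whereas the database stops at the first violated
constraint:

    def collect_unique_errors(instance, field_names):
        errors = {}
        for name in field_names:
            qs = instance.__class__._default_manager.filter(
                **{name: getattr(instance, name)})
            if instance.pk is not None:
                qs = qs.exclude(pk=instance.pk)
            if qs.exists():
                errors[name] = ["This value is already in use."]
        # e.g. {'slug': [...], 'email': [...]}: both problems reported,
        # where an IntegrityError would have surfaced only one.
        return errors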
> And I don't buy the "your application is the only thing working with
> the database" argument -- it is, but it happens to be multi-process or
> -thread. How am I supposed to avoid that thread or process A and B do
> a validate_unique() for some x concurrently, finding both out that it
> doesn't exist and proceed happily with saving, so that one of them
> chokes on an IntegrityError?
No, you've missed the point. Quite often one knows (which is why I said
a wide number of use-cases) that simultaneous updates to the same bit of
data, even within the application, are very low probability. That holds
even on sites that are largely read-write as opposed to mostly
read-only. I'm not making this stuff up out of thin air. I have
experience working on such systems with multiple-master database
replication and watching the traffic patterns to work out things like
the likelihood of needing conflict resolution on updates. Multiple
simultaneous writes to the exact same piece of data are only one corner
of the full domain space. We plan
for the worst, but often optimise for other cases. Again, I'm not
dismissing anything.
Regards,
Malcolm