(a) The link leads to an error page; it appears to be a 500 error
(it would be nice if GitHub didn't make it look like it could be either
a 404 or a 500).
(b) Please do write it out and post it here so that we can have the
discussion on the mailing list. That way, everybody is already
subscribed to the right place (django-developers automatically reaches
the appropriate audience). Wikis make it much harder to track history and
authorship, amongst other problems. We have the mailing list for design
discussions.
Thanks,
Malcolm
This is a pretty good summary of the situation, although I suspect
there's a fairly easy solution at hand, which I've outlined below.
On Mon, 2009-01-19 at 02:47 -0800, mrts wrote:
[...]
>
> The main focus is on fields.
>
> Form fields
> ===========
This all sounds like a description of the current situation, which is
fine.
> Model fields
> ============
>
> Ditto for model fields, omitting the first step: format validation,
> coercion to Python and value validation is generally required.
Agreed.
>
> If, however, we want to more radically depart from current
> behaviour, then it is possible and indeed reasonable to assume that
> the values assigned to model fields are already the expected Python
> types, as it is the application developer’s responsibility to assure
> that. This would get rid of the format validation and coercion step
> (these are already handled by the form layer). to_python() is
> currently used in model fields for controlling the conversion
As a practical matter, that ship has already sailed. Lots of code
assigns things like strings to datetime fields. I've always thought that
was an unfortunate decision, but it's hardly a showstopper. Model
validators will need to be able to convert from reasonable input
variations to the right Python object. It's not unreasonable to expect,
however, that after validation/cleaning has occurred on a model, all the
attributes contain things of the right type (assuming the model is
valid). That means that subsequent code working with, e.g., a datetime
attribute on a valid model doesn't have to worry that
mymodel.foo.strftime() will raise an exception because "foo" happens to
be a string, not a datetime.
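To illustrate the guarantee described above, here is a minimal sketch (the class name and accepted formats are illustrative, not Django's actual field implementation): a model-field-style to_python() that accepts reasonable input variations but always returns the right Python type, so code running after cleaning can rely on the attribute's type.

```python
from datetime import datetime

class DateTimeFieldSketch:
    """Illustrative stand-in for a model datetime field's coercion step."""

    def to_python(self, value):
        # Already the right type: pass it through untouched.
        if isinstance(value, datetime):
            return value
        # A reasonable input variation: coerce a string to a datetime.
        # (A real field would accept several formats; one suffices here.)
        if isinstance(value, str):
            return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
        raise ValueError("Cannot convert %r to a datetime" % (value,))

field = DateTimeFieldSketch()
dt = field.to_python("2009-01-19 02:47:00")
# After cleaning, strftime() is safe: the attribute is a real datetime.
dt.strftime("%Y")
```

The point is only the contract: assignment may be sloppy, but a validated model never hands back a string where a datetime is expected.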
> Interaction between form and model fields
> =========================================
>
> Forms and models should be orthogonal and oblivious of each other.
> Modelforms serves as the glue and mapping between the two, being
> aware of the internals of both.
>
> A modelform should have the following responsibilities in validation
> context:
>
> * clean the form (performs format validation, coercion to a Python
> type and value validation),
> * assign the form field values to the associated model fields,
>
> * inform the model fields that basic validation has already been done
> to
> avoid duplicated validation and call any additional validation
> methods.
That's one approach, but the alternative I've sketched below is another
one. Short version: when the form field validation would match what the
model field is going to do anyway, don't do anything at the form level.
The model field validation is about to be called anyway.
> Thus, the modelform should be able to invoke only the additional
> and custom model field validators, *skipping the default coercers
> and validators* that can be assumed to be the same in similar form
> and model field classes (e.g. an IntegerField in forms and
> IntegerField in models).
That's a slightly flawed assumption, as it won't necessarily be true.
Particularly in the case of custom form field overrides on a model form.
There could be any number of variations, from the range of data
permitted to the types.
[Aside: It's certainly reasonable to think about the duplicated effort,
but I wouldn't worry about it too much. It's not a huge amount of
duplicated computation, since the case when something is already of the
right type and passes normalisation and validation tends to be quite
quick (an if-check or two and then moving on). So if we don't have a
fantastically neat solution for this initially in the communication
between ModelForms and Models, the world won't stop spinning. That's
only one use-case in the much larger scheme of things (again, I'm not
dismissing it, but let's not get too hung up on this case). That being
said... ]
The solution here might not be too difficult and doesn't need the
requirement of communication from the form to the model about what it's
already done.
To wit, when a formfield is constructed automatically from the model
field, if the default formfield doesn't do any validation/normalisation
beyond what the normal model field does, the formfield() method on the
Field subclass returns a formfield with no validators. Thus, when the
form cleaning process is run, all the validation for that field is done
by the model. Thus form fields returned for model fields could well be
slightly modified (in that their validator list is different) from what
you get directly from using forms.fields.*.
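A toy sketch of that idea (all names here are hypothetical, not Django's real API): the model field's formfield() hands back a form field whose validator list is empty, because the model is going to run the equivalent checks anyway.

```python
class FormField:
    """Minimal form field: runs whatever validators it was given."""

    def __init__(self, validators=None):
        self.validators = list(validators or [])

    def clean(self, value):
        for validate in self.validators:
            validate(value)
        return value

def not_negative(value):
    if value < 0:
        raise ValueError("is negative")

class ModelIntegerField:
    """Minimal model field with its own default validators."""

    default_validators = [not_negative]

    def formfield(self):
        # The model field will run default_validators itself during
        # model validation, so the auto-generated form field carries
        # none -- no duplicated checks at the form level.
        return FormField(validators=[])

form_field = ModelIntegerField().formfield()
```

A hand-built forms.IntegerField, by contrast, would keep its validators, which is exactly the custom-override case where both layers should run.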
That dovetails nicely with the requirements of custom form fields. When
a custom form field is created, the form field validation won't be
skipped (unless the creator wants to remove any validation from the form
field), so any differences between the default model field validation
and the custom form field validation aren't overlooked: they both get
run (which is the correct behaviour).
> General validation principles
> =============================
>
> The extended conclusion of the sections above is as follows.
>
> Double validation should never happen.
It doesn't hurt if it does and sometimes both form and models will need
to run validation on the same piece of data (e.g. the custom form field
case).
> And now something completely different
> ======================================
>
> "Every problem in computer science can be solved by
> another level of indirection."
> --- source unknown
"... and now you (often) have two problems" -- Malcolm (précising a few
hundred other people). :-)
>
> Both form and model fields need similar functionality that is not
> well served by just duplicating validation and coercion functions
> in them.
The two data flows are similar, but not identical. I'd tend to shy away
from class-ifying this if we don't have to. That is, prefer direct
functions over yet another class, simply because there's not really a
class structure there -- no need to poke at class internals -- and there
aren't going to be enough functions to require namespacing.
Attempts to abstract away similar-but-not-the-same algorithms can often
lead to more difficult code than writing the same thing out directly in
the two places. It tends to show up in the number of parameters you pass
into the abstract algorithm and/or the complexity of the return types.
And, in this case, it's only two places, not fourteen or anything like
that. So whether this is worth it tends to be implementation specific.
> Action plan for 1.1
> ===================
>
> The mixin approach is not in scope for 1.1. We go with what we
> already have, factoring the bits in current model fields' to_python
> to a separate django.utils.typeconverters library
> (see http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a01785181a91a37971
> for the first steps in that direction) and using both
> django.core.validators and django.utils.typeconverters throughout
> model and form fields.
As a style issue, there's a bit of an arbitrary split going on there.
Type conversion is really part of what we're calling "validation" (since
the type coercion is part of the normalisation part of the process,
which we tend to lump into validation). I'd lump things a bit closer
together in the source. There's not really an external use-case scenario
for the converters and, even within Django, they don't have a lot of
use outside of the validators (we can't do a lot of sharing with the
database-specific backend converters, since they do tend to be
backend-specific).
Maybe (and looking at my notes, I might upgrade that to probably, given
the number of times I've mentioned it), we need to make a directory for
django.core.validators instead of a single file. That gives us room to
split out converters from things that work with standard types. But
definitely try to keep them close together in the source, rather than
putting them in django.utils, which can sometimes be a dumping ground
for stuff that better belongs elsewhere.
> I'm not sure about Honza's plans in regard to avoiding duplicate
> validation, hopefully he comments on that himself.
> Also, let me remind that model and form objects have not been
> discussed in this post (and it's already more than 150 lines long),
> only fields.
I don't think those two situations are particularly tricky, are they?
Validation on forms shouldn't change from what it is now. Partly because
it Just Works pretty logically, but primarily because we should do the
hard work so that users don't have to and don't need to rewrite their
code.
We do need a multi-field validation method on models, similar to
Form.clean(). There's currently a Model.clean() method in Honza's code,
which I've been reviewing, and I've got a note to probably change the
name a bit there so that the clean() name can be reserved for people
wanting to write multi-field validation, keeping things in parallel with
the way Forms work. That's a naming issue, not a substantive one,
however. As you've noted, the algorithm for forms and models isn't that
different.
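The multi-field model validation described above might look something like this minimal sketch, parallel to Form.clean() (the Event class, the ValidationError stand-in, and the field names are all illustrative, not the actual code under review):

```python
class ValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError."""

class Event:
    def __init__(self, start, end):
        self.start = start
        self.end = end

    def clean(self):
        # Multi-field check: no single-field validator can see both
        # attributes at once, so the cross-field rule belongs here,
        # just as it would in Form.clean().
        if self.end < self.start:
            raise ValidationError("end must not precede start")

Event(1, 2).clean()  # valid: no exception raised
```

Reserving the clean() name for exactly this kind of user-written cross-field check is what keeps models and forms parallel.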
Anyway, nice write-up. Be interesting to see what others think.
Regards,
Malcolm
" ... except the problem of too many levels of indirection"
- Kevlin Henney
http://en.wikipedia.org/wiki/Abstraction_layer
http://en.wikipedia.org/wiki/Butler_Lampson
Regards
Rajeev J Sebastian
For models, the "instance" will be the model's "self" attribute. So that one
can do checks based on other information available on the model
instance. It's kind of the whole point behind the "model-aware" portion
of model-aware validation.
Asking that everything like that gets pushed to clean() would be
awkward. clean() is for multi-field validation; the individual
validators are for single field validation. That doesn't mean the latter
cannot use other information available on the form.
I'll try and make time to look at this, along with other recent work in
this area, over the weekend.
Malcolm
On Fri, Jan 23, 2009 at 8:04 AM, mrts <mr...@mrts.pri.ee> wrote:
> works both for forms and models with the proposed approach (all_values
> is model_instance.__dict__.copy() in model field validation case).
One thing to consider is that not all data is always available via the
dictionary, particularly with regard to models. Anything that uses a
descriptor (such as relations to other models) only stores a bare
minimum of information in the instance dict. The rest is retrieved
when accessed *as an attribute*, a fact which is entirely lost in your
example. I understand you're trying not to bundle it too much with the
way models work, so maybe this was intentional, but I do think it's
worth noting, just in case it's not intentional.
> Handling this specially for model instances would be really ugly:
>
> def __call__(self, value, all_values={}, instance=None):
>     if instance is not None:
>         if (hasattr(instance, self.other_field)
>                 and getattr(instance, self.other_field) == self.other_value
>                 and not value):
>             raise ValidationError(self.error_message, REQUIRED)
>     elif (self.other_field in all_values
>             and all_values[self.other_field] == self.other_value
>             and not value):
>         raise ValidationError(self.error_message, REQUIRED)
>
> and entirely unnecessary under the all_values approach.
I agree that it's a bit ugly, but I'd hardly call it unnecessary,
given the possibility of using descriptors on models. Forms aren't
susceptible to the same problem, because their values are stored in a
true dict regardless. Offhand, I can only think of one way to get a
uniform syntax that works the way you seem to want it to, and that's
by bundling forms and models in some other objects with a unified
interface, designed solely for validation. Something like the
following (untested, coding off the top of my head):
class ValidatableModel(object):
    def __init__(self, instance):
        self._instance = instance

    def __getitem__(self, name):
        if hasattr(self._instance, name):
            return getattr(self._instance, name)
        raise KeyError(name)

class ValidatableForm(object):
    def __init__(self, instance):
        self._instance = instance

    def __getitem__(self, name):
        return self._instance.cleaned_data[name]
There could be more methods to be able to iterate over fields and
whatever other niceties you would want to include, but that way you
have a uniform dict interface to work with, regardless of which object
type gets passed in, and it still preserves the ability to use
descriptors on model attributes.
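A quick usage sketch of that wrapper idea, self-contained so it runs on its own (the FakeModel and the required_if validator are made up for illustration; the wrapper mirrors the untested class in the post): a validator only ever sees the dict-style interface, so it neither knows nor cares whether a model or a form sits underneath, and attribute access (hence descriptors) still works for models.

```python
class ValidatableModel(object):
    """Dict-style view over a model instance, per the sketch above."""

    def __init__(self, instance):
        self._instance = instance

    def __getitem__(self, name):
        if hasattr(self._instance, name):
            # Attribute access, so descriptors fire normally.
            return getattr(self._instance, name)
        raise KeyError(name)

class FakeModel:
    colour = "red"

def required_if(other_field, other_value, wrapped, value):
    # The validator uses only __getitem__ -- a form wrapper with the
    # same interface would work identically here.
    if wrapped[other_field] == other_value and not value:
        raise ValueError("required")

wrapped = ValidatableModel(FakeModel())
required_if("colour", "blue", wrapped, "")  # passes: colour != "blue"
```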
That said, I'm not particularly fond of that solution myself, because
I personally feel that models and forms *should* be treated a bit
differently; the field types for each are just too different in their
capabilities. I haven't been involved with this validation stuff
except for improving file support for it, but I'm not sure a unified
approach would be as big of a win as you claim.
-Gul
(and if I'm missing some obvious bit of history here, feel free to correct me)
The second bit is relatively minor. We're in complete control of that
and can write the validators however is needed. Getting the former bit
right is more important, since that's the public API.
Leading off with a slight summary of what follows: I suspect this is,
ultimately, a bit of a coin toss. I'm inclined to come down more in line
with Honza's feelings. I've debated a lot with myself whether this is
because a lot of that was originally the same as the ideas I had or
because I prefer it technically, and I'm pretty sure it's the latter.
In either case, there's some roughness in the API simply because models
and forms are not parallel sorts of objects. So some shoe-horning is
required somewhere. How many layers of cotton wool ... er.. abstraction
to wrap around things is the question and it's good that there are
multiple possibilities being discussed, since it means we can have some
confidence the problem has been looked at from multiple angles.
> In core validators, no special case handling of forms and models
> is needed even in multi-value validators. All what is needed is
> an abstraction of "the set of other values".
> AlwaysMatchesOtherField, RequiredIfOtherFieldDoesNotEqual etc
> will work with that abstraction, primary keys and other complex
> model fields also work for same-type comparison.
>
> My initial idea was to reduce the model instance to a dict (as
> discussed previously), however, this is really too restrictive.
> Passing the form or model instance and treating it in dict-like
> manner provides much more flexibility.
Indeed. For models, the whole self.__dict__.copy() stuff was looking a
bit ugly, when "self" is lying around begging to be used. I understand
that it's an attempt to unify data access in reusable validators, but
there has to be a nicer way.
In the standard set you have written so far. There will be in what
people write for custom stuff. I'm not quite sure how you define
"complex", though.
> If a
> need for them should arise, it's probably easiest to override
> validate() in the corresponding class, calling super().validate()
> and then perform any additional checks there.
>
> Custom validators can either use the get_value() abstraction to
> support both models and forms or be model/form-specific and use
> the instance in whatever way they like. Checking whether you deal
> with a model or form instance is generally not required -- if you
> accidentally pass a custom form validator to a model, it will
> visibly break and vice-versa.
>
> Honza doesn't like this, mainly because it's too easy to shoot
> yourself into foot with this (as outlined above, I disagree with
> that) and as of now we are using the following instead to handle
> model and form fields uniformly:
>
> def _get_value(name, all_values, instance, do_raise=True):
>     if instance is not None and hasattr(instance, name):
>         return getattr(instance, name)
>     if name in all_values:
>         return all_values[name]
>     if do_raise:
>         raise AttributeError(
>             '%s not found in form values or model instance' % name)
>     return False
The idea of a utility function looks appropriate to me. It feels a bit
more explicit and doesn't force policy. I'd like to tweak the
implementation (name, parameters, return types and error messages) but
the idea is correct.
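One possible tweak along those lines (the name, the sentinel, and the default parameter are my guesses at what such polishing could look like, not an agreed API): raise by default, and let callers supply an explicit fallback rather than overloading False as a return value, since False is itself a plausible field value.

```python
# Unique sentinel: lets callers pass None (or False) as a real default.
_MISSING = object()

def get_field_value(name, all_values, instance=None, default=_MISSING):
    """Look up a field value on a model instance or in a form's
    cleaned-data dict, whichever is available."""
    if instance is not None and hasattr(instance, name):
        return getattr(instance, name)
    if name in all_values:
        return all_values[name]
    if default is _MISSING:
        raise AttributeError(
            "%r not found in form values or on the model instance" % name)
    return default

get_field_value("a", {"a": 1})            # found in the values dict
get_field_value("b", {}, default=None)    # falls back to the default
```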
[...]
> Honza's comments:
> -----------------
>
> The problem is that I do not want Form class to be passed to
> validators for several reasons:
Agreed. For models, instance=self, similarly, has to be right, since we
have a more-or-less persistent instance that we are operating on. Forms
are a transient entity (they pop into existence to validate data and
then go away), so "instance" isn't such a useful concept there.
> Form shouldn't be anything except something that obtains data
> from request and can validate them, passing it validators could
> promote adding some methods and properties to be used in
> validators, instead of in the form's clean() method.
>
> Form and model treat data differently, that's just the way it is
> and as it should be - pretending it's not true will just create
> opportunity for nasty surprises.
>
> Validators wouldn't work outside forms and models (it's easy to
> pass in a dictionary of all_values, but fabricating a class with
> the same method is work).
I don't quite understand what this means, but I kind of see what Honza
is getting at. However, I suspect we might need a utility function of
some kind for getting the data value if we want to avoid passing in
self.__dict__ (no copy needed, surely) as the all_values for model
validators. But it can be just a utility function. A class shouldn't be
necessary.
>
> I would still have to check whether I am dealing with model or
> form when I would like to access some model's methods and
> properties that cannot be retrieved this way.
That's kind of a side-issue and not a reason not to pass around the
instance (i.e. not something to drive the API reasoning). But it does
seem to provide a good way to check things at the moment and we could
lock that in.
Again, I'm kind of leaning towards what you have at the moment (Honza's
preferred approach), but it's only a slight lean. Probably not worth
sweating over too much, I suspect, since if somebody really doesn't like
that API, they can use your wrapper class approach in the validators
they write for themselves.
Regards,
Malcolm
Well, it's not the last issue, but it's certainly a stumbling block.
I've been spending quite a lot of time lately trying to merge model
validation into trunk and, as you'll see in the 1.1 announcement, it's
not really there yet. There are a number of backwards incompatibilities
that have been introduced along the way and some of the API is fairly
hard to use.
> Let me present the
> approach that looks sane to me.
>
> The old Field.default_error_messages dict and corresponding
> logic is no longer required (and removed in my branch) as default
> error messages live in core/validators now.
It *is* still required. Firstly, it's a very useful way of specifying
holistic error messages. Secondly, removing it would be backwards
incompatible in a major way (many custom fields would no longer work
properly, for example).
[...]
> In reality, ValidationError contains only a single error message. The
> list is required for aggregating several errors from the validator
> collection associated with a field.
> However, as of now, the
> ValidationError class tries to be both a single value and a
> collection, resulting in quite a mess.
Definitely. I've managed to simplify it somewhat, but it requires some
conventions. Part of the problems come from trying to make model field
validation and form field validation so alike in an area where it isn't
necessarily so. It turns out to be easier to have
exceptions.ValidationError be more dictionary-like by default, although
I'm still playing with the API a bit there.
There are two possibilities here. One is that if a default error message
exists for the field, we only display that when a validation error
occurs. Otherwise we aggregate the individual validation error messages.
The other approach is to always aggregate but we have the convention
that the default (field-wide) error message is descriptive, in the sense
that it describes what is expected ("must be an integer between 1 and
101"), whilst the individual validator error messages are more
proscriptive about the actual problem that has been found ("is not an
integer"). The latter always state a problem, not a requirement, so they
read well when aggregated with the description.
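The second convention can be shown with plain strings (the messages are taken from the example above; the join format is just one way to aggregate): each validator states the actual problem found, and the field-wide message describes the requirement, so the two read naturally together.

```python
# Descriptive field-wide message: states what is expected.
field_message = "must be an integer between 1 and 101"

# Proscriptive per-validator messages: state the problem found.
validator_messages = ["is not an integer"]

# Aggregate problem(s) first, requirement last, per the convention.
aggregated = "; ".join(validator_messages + [field_message])
```

Because validators always phrase a problem rather than a requirement, no per-field special casing is needed to keep the combined message readable.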
>
> This is a classic case of the Composite pattern (http://
> en.wikipedia.org/wiki/Composite_pattern ), so I propose that
> * ValidationErrors remain single values
> * core/validators gets an additional ValidationErrorComposite class
> that derives from ValidationError and has the same interface
That's pretty good analysis. I've done the same thing in the merged code
I've been doing, except slightly differently.
django.core.exceptions.ValidationError is the composite collection of
error messages.
django.forms.utils.ValidationError, which you guys had nuked, remains
the way to make a single HTML presentation of form field errors (that
had to go back anyway for backwards-compatibility reasons, since
otherwise every single custom field cleaning method would potentially
break, at a minimum). There's a constructor for forms.ValidationError
that takes an exceptions.ValidationError to do the conversion.
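Roughly, the conversion described above might look like this (the class internals, the messages attribute, and the as_html() method are all guesses for illustration; only the two-class split and the converting constructor come from the post): the form-level error wraps the core-level composite and owns the HTML presentation.

```python
class CoreValidationError(Exception):
    """Stand-in for django.core.exceptions.ValidationError: a plain
    composite collection of error messages, with no presentation."""

    def __init__(self, messages):
        self.messages = list(messages)

class FormValidationError(Exception):
    """Stand-in for the forms-level ValidationError: constructed from
    the core error, responsible for HTML output."""

    def __init__(self, core_error):
        self.messages = core_error.messages

    def as_html(self):
        items = "".join("<li>%s</li>" % m for m in self.messages)
        return '<ul class="errorlist">%s</ul>' % items

err = FormValidationError(CoreValidationError(["is not an integer"]))
```

Model code raises only the core exception; only form rendering ever touches the HTML-aware class, which is what keeps model validation presentation-neutral.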
Presenting model field validation errors isn't something that we have a
default for, since they aren't tied to HTML output or anything like
that. If presenting them as HTML output, the forms.ValidationError class
could be used (that's how ModelForms would do it), but, for other cases,
pulling out the errors on an as-needed basis is all that's required.
When I get out from under some work stuff this week, I'm going to commit
what I've been merging (it's a merge of yours and Honza's git branches and
a lot of editing and review changes) into a subversion branch so that
more people can test it. I'm then happy to pull in and review changes
from git branches if you guys want to work on it there, but we need to
have this done in a more general way. We've all dropped the ball a bit
-- I thought the problems that were creeping in were going to be simple to
fix up in review; I was wrong. It's the broken-windows problem, where
one little variation from "great" leads to another and another, without
realising it.
Regards,
Malcolm