There are several problems in model validation as of now. As pouring
them out here would create a too long, ill-formatted post, I created a
separate page for it at
This is just "design gruntwork", a basic text body analyzing the
relations between components and problems and offering the most
evident but not necessarily most elegant solutions. I hope to spark a
proper discussion here that hopefully converges to a solid design.
On Sat, 2009-01-17 at 09:45 -0800, mrts wrote: > There are several problems in model validation as of now. As pouring > them out here would create a too long, ill-formatted post, I created a > separate page for it at
(a) The link leads to an error page, although it appears to be a 500 error (would be nice if github didn't make it look like it could be a 404 or a 500).
(b) Please do write it out and post it here so that we can have the discussion on the mailing list. That way, everybody is already subscribed to the right place (django-developers automatically reaches the appropriate audience) Wikis make it much harder to track history and authorship, amongst other problems. We have the mailing list for design discussions.
On Jan 18, 5:17 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> (b) Please do write it out and post it here so that we can have the
> discussion on the mailing list.
Let’s step back, distance ourselves from the current implementation
and look at how forms, models and modelforms should ideally
interact validation-wise at a somewhat abstract level.
The main focus is on fields.
Form fields
===========
In validation context, a form field has the following
responsibilities:
* get input value (a string) from user,
* validate that the value is in expected format
(*format validation*, preconditions),
* convert the value to a Python type,
* validate that the resulting value satisfies any conditions set
for the value (*value validation*, postconditions).
Model fields
============
Ditto for model fields, omitting the first step: format validation,
coercion to Python and value validation is generally required.
If, however, we want to more radically depart from current
behaviour, then it is possible and indeed reasonable to assume that
the values assigned to model fields are already the expected Python
types, as it is the application developer’s responsibility to assure
that. This would get rid of the format validation and coercion step
(these are already handled by the form layer). to_python() is
currently used in model fields for controlling the conversion.
Interaction between form and model fields
=========================================
Forms and models should be orthogonal and oblivious of each other.
Modelforms serves as the glue and mapping between the two, being
aware of the internals of both.
A modelform should have the following responsibilites in validation
context:
* clean the form (performs format validation, coercion to a Python
type and value validation),
* assign the form field values to the associated model fields,
* inform the model fields that basic validation has already been done
to
avoid duplicated validation and call any additional validation
methods.
Thus, the modelform should be able to invoke only the additional
and custom model field validators, *skipping the default coercers
and validators* that can be assumed to be the same in similar form
and model field classes (e.g. an IntegerField in forms and
IntegerField in models).
Let's look at an contrived example that illustrates the point.
Let IntegerForeignKeyField be an imaginary foreign key field in a
model that accepts only integer foreign keys. The corresponding
form field would be a PositiveIntegerField. Suppose that the
developer has provided a custom validator for the
IntegerForeignKeyField that allows foreign keys only to objects
where the is_active boolean field is True.
When a modelform uses them both, it is the form field's
responsibility to assure that the value is converted to a Python
int and that the int is positive. IntegerForeignKeyField should
not re-run neither the coercion nor basic validation routines.
However, IntegerForeignKeyField must
1) validate that the integer points to an existing row in the
related table (i.e. run a foreignkey-specific validator),
2) validate that the row has is_active == True (i.e. run a custom
validator).
In this case, coercion and basic validation is cheap, but that's
not always the case.
General validation principles
=============================
The extended conclusion of the sections above is as follows.
Double validation should never happen. There has to be a way to
convey that a value is already validated and no further validation
is necessary. No code duplication in type coercion or validation
should occur in models and forms. The following distinct
converters and validators participate in the coercion/validation
process:
* format validators that validate that a given string can be
converted to a Python type,
* converters that coerce a given string to a Python type,
* value validators that assure that the value returned by a
converter satisfies any conditions set for the value.
It should be easy to add custom validators and intercept or
override the process at any step.
And now something completely different
======================================
"Every problem in computer science can be solved by
another level of indirection."
--- source unknown
Both form and model fields need similar functionality that is not
well served by just duplicating validation and coercion functions
in them.
It would make sense to factor out the common functionality and
decouple it from both. A mixin class would serve that purpose well.
The following methods would live in the mixin:
* clean() that calls to_python() and then validate(),
* to_python() that takes care of format validation and coercion,
* validate() that takes care of value validation and running
custom validators.
It would have fields for storing the default and custom validators
and perhaps a state field for controlling which parts of the
validation process have already been run.
I've hacked up an incomplete rough draft at
http://dpaste.com/110671/ to illustrate this. Unfortunately it does
not illustrate all the points, but the general idea should
hopefully come through.
Action plan for 1.1
===================
The mixin approach is not in scope for 1.1. We go with what we
already have, factoring the bits in current model fields' to_python
to a separate django.utils.typeconverters library
(see http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a017... for the first steps in that direction) and using both
django.core.validators and django.utils.typeconverters throughout
model and form fields.
I'm not sure about Honza's plans in regard avoiding duplicate
validation, hopefully he comments on that himself.
Also, let me remind that model and form objects have not been
discussed in this post (and it's already more than 150 lines long),
only fields.
As I understand it, this is primarily about avoiding duplicate validation of some pieces of data when it's not necessary, right? So it's really only applicable to the ModelForm case?
This is a pretty good summary of the situation, although I suspect there's a fairly easy solution at hand, which I've outlined below.
On Mon, 2009-01-19 at 02:47 -0800, mrts wrote:
[...]
> The main focus is on fields.
> Form fields > ===========
This all sounds like a description of the current situation, which is fine.
> Model fields > ============
> Ditto for model fields, omitting the first step: format validation, > coercion to Python and value validation is generally required.
Agreed.
> If, however, we want to more radically depart from current > behaviour, then it is possible and indeed reasonable to assume that > the values assigned to model fields are already the expected Python > types, as it is the application developer’s responsibility to assure > that. This would get rid of the format validation and coercion step > (these are already handled by the form layer). to_python() is > currently used in model fields for controlling the conversion
As a practical matter, that ship has already sailed. Lots of code assigns things like strings to datetime fields. I've always thought that was an unfortunate decision, but it's hardly a showstopper. Model validators will need to be able to convert from reasonable input variations to the right Python object. It's not unreasonable to expect, however, that after validation/cleaning has occurred on a model, all the attributes contains things of the right type (assuming the model is valid). That means that subsequent code working with, e.g, a datetime attribute on a valid model doesn't have to worry that mymodel.foo.strftime() will raise an exception because "foo" happens to be a string, not a datetime.
> Interaction between form and model fields > =========================================
> Forms and models should be orthogonal and oblivious of each other. > Modelforms serves as the glue and mapping between the two, being > aware of the internals of both.
> A modelform should have the following responsibilites in validation > context:
> * clean the form (performs format validation, coercion to a Python > type and value validation), > * assign the form field values to the associated model fields,
> * inform the model fields that basic validation has already been done > to > avoid duplicated validation and call any additional validation > methods.
That's one approach, but the alternative I've sketched below is another one. Short version: when the form field validation would match what the model field is going to do anyway, don't do anything at the form level. The model field validation is about to be called anyway.
> Thus, the modelform should be able to invoke only the additional > and custom model field validators, *skipping the default coercers > and validators* that can be assumed to be the same in similar form > and model field classes (e.g. an IntegerField in forms and > IntegerField in models).
That's a slightly flawed assumption, as it won't necessarily be true. Particularly in the case of custom form field overrides on a model form. There could be any number of variations, from the range of data permitted to the types.
[Aside: It's certainly reasonable to think about the duplicated effort, but I wouldn't worry about it too much. It's not a huge amount of duplicated computation, since the case when something is already of the right type and passes normalisation and validation tends to be quite quick (an if-check or two and then moving on). So if we don't have a fantastically neat solution for this initially in the communcation between ModelForms and Models, the world won't stop spinning. That's only one use-case in the much larger scheme of things (again, I'm not dismissing it, but let's not get too hung up on this case). That being said... ]
The solution here might not be too difficult and doesn't need the requirement of communication from the form to the model about what it's already done.
To wit, when a formfield is constructed automatically from the model field, if the default formfield doesn't do any validation/normalisation beyond what the normal model field does, the formfield() method on the Field subclass returns a formfield with no validators. Thus, when the form cleaning process is run, all the validation for that field is done by the model. Thus form fields returned for model fields could well be slightly modified (in that their validator list is different) from what you get directly from using forms.fields.*.
That dovetails nicely with the reuqirements of custom form fields. When a custom form field is created, the form field validation won't be skipped (unless the creator wants to remove any validation from the form field), so any differences between the default model field validation and the custom form field validation aren't overlooked: they both get run (which is the correct behaviour).
> General validation principles > =============================
> The extended conclusion of the sections above is as follows.
> Double validation should never happen.
It doesn't hurt if it does and sometimes both form and models will need to run validation on the same piece of data (e.g. the custom form field case).
> And now something completely different > ======================================
> "Every problem in computer science can be solved by > another level of indirection." > --- source unknown
"... and now you (often) have two problems" -- Malcolm (précising a few hundred other people). :-)
> Both form and model fields need similar functionality that is not > well served by just duplicating validation and coercion functions > in them.
The two data flows are similar, but not identical. I'd tend to shy away from class-ifying this if we don't have to. That is, prefer direct functions over yet another class, simply because there's not really a class structure there -- a need to poke at class internals -- and it's not going to enough functions to require namespacing.
Attempts to abstract away similar-but-not-the-same algorithms can often lead to more difficult code than writing the same thing out directly in the two places. It tends to show up in the number of parameters you pass into the abstract algorithm and/or the complexity of the return types. And, in this case, it's only two places, not fourteen or anything like that. So whether this is worth it tends to be implementation specific.
> Action plan for 1.1 > ===================
> The mixin approach is not in scope for 1.1. We go with what we > already have, factoring the bits in current model fields' to_python > to a separate django.utils.typeconverters library > (see http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a017... > for the first steps in that direction) and using both > django.core.validators and django.utils.typeconverters throughout > model and form fields.
As a style issue, there's a bit of an arbitrary split going on there. Type conversion is really part of what we're calling "validation" (since the type coercion is part of the normalisation part of the process, which we tend to lump into validation). I'd lump things a bit closer together in the source. There's not really an external use-case scenario for the converters and, even within Django, they don't have a lot of user outside of the validators (we can't do a lot of sharing with the database-specific backend converters, since they do tend to be backend-specific).
Maybe (and looking at my notes, I might upgrade that to probably, given the number of times I've mentioned it), we need to make a directory for django.core.validators instead of a single file. That gives us room to split out converters from things that work with standard types. But definitely try to keep them close together in the source, rather than putting them in django.utils, which can sometimes be a dumping ground for stuff that better belongs elsewhere.
> I'm not sure about Honza's plans in regard avoiding duplicate > validation, hopefully he comments on that himself. > Also, let me remind that model and form objects have not been > discussed in this post (and it's already more than 150 lines long), > only fields.
I don't think those two situations are particularly tricky, are they?
Validation on forms shouldn't change from what it is now. Partly because it Just Works pretty logically, but primarily because we should do the hard work so that users don't have to and don't need to rewrite their code.
We do need a multi-field validation method on models, similar to Form.clean(). There's currently a Model.clean() method in Honza's code, which I've been reviewing, and I've got a note to probably change the name a bit there so that the clean() name can be reserved for people wanting to write multi-field validation, keeping things in parallel with the way Forms work. That's a naming issue, not a substantive one, however. As you've noted, the algorithm for forms and models isn't that difference.
Anyway, nice write-up. Be interesting to see what others think.
On Monday 19 Jan 2009 6:52:15 pm Rajeev J Sebastian wrote:
> On Mon, Jan 19, 2009 at 4:17 PM, mrts <m...@mrts.pri.ee> wrote: > > And now something completely different > > ======================================
> > "Every problem in computer science can be solved by > > another level of indirection."
> - David Wheeler
> " ... except the problem of too many levels of indirection" > - Kevlin Henney
On Jan 19, 1:43 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> one. Short version: when the form field validation would match what the
> model field is going to do anyway, don't do anything at the form level.
> The model field validation is about to be called anyway.
> The solution here might not be too difficult and doesn't need the
> requirement of communication from the form to the model about what it's
> already done.
> To wit, when a formfield is constructed automatically from the model
> field, if the default formfield doesn't do any validation/normalisation
> beyond what the normal model field does, the formfield() method on the
> Field subclass returns a formfield with no validators. Thus, when the
> form cleaning process is run, all the validation for that field is done
> by the model. Thus form fields returned for model fields could well be
> slightly modified (in that their validator list is different) from what
> you get directly from using forms.fields.*.
> That dovetails nicely with the reuqirements of custom form fields. When
> a custom form field is created, the form field validation won't be
> skipped (unless the creator wants to remove any validation from the form
> field), so any differences between the default model field validation
> and the custom form field validation aren't overlooked: they both get
> run (which is the correct behaviour).
Looks like a good action plan.
> Attempts to abstract away similar-but-not-the-same algorithms can often
> lead to more difficult code than writing the same thing out directly in
> the two places. It tends to show up in the number of parameters you pass
> into the abstract algorithm and/or the complexity of the return types.
> And, in this case, it's only two places, not fourteen or anything like
> that. So whether this is worth it tends to be implementation specific.
Yeah, I won't pursue this further, although I still think the result
would be cleaner, not uglier.
> As a style issue, there's a bit of an arbitrary split going on there.
> Type conversion is really part of what we're calling "validation" (since
> the type coercion is part of the normalisation part of the process,
> which we tend to lump into validation). I'd lump things a bit closer
> together in the source. There's not really an external use-case scenario
> for the converters and, even within Django, they don't have a lot of
> user outside of the validators (we can't do a lot of sharing with the
> database-specific backend converters, since they do tend to be
> backend-specific).
> Maybe (and looking at my notes, I might upgrade that to probably, given
> the number of times I've mentioned it), we need to make a directory for
> django.core.validators instead of a single file. That gives us room to
> split out converters from things that work with standard types. But
> definitely try to keep them close together in the source, rather than
> putting them in django.utils, which can sometimes be a dumping ground
> for stuff that better belongs elsewhere.
As seen in the commit,
http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a017... ,
my initial quick solution was to lump them into the validation
module (as there's lot's of intermingling between validators and
coercers,
exemplified by merging to_pyton() and validate() to clean() in forms).
The directory-based approach is best, I'll go with it -- but it's yet
uncertain
when as I have to handle pressing matters at work during daytime.
On Jan 19, 11:23 pm, mrts <m...@mrts.pri.ee> wrote:
> The directory-based approach is best, I'll go with it -- but it's yet
> uncertain
> when as I have to handle pressing matters at work during daytime.
I've implemented some fundamental changes that need review. The commit
is at http://github.com/mrts/honza-django/commit/482086df8d24b99f466152c51d... . The changes are not finished yet, but they should illustrate the
proposed approach. I may fail to see all the negative implications as
I've been mostly addressing fields, not the whole picture. Critique
most welcome.
Changes:
1. consolidate validators and coercers into django/core/validators/
{__init__,typeconverters}.py as suggested by Malcolm
2. make both forms and fields use similar logic in clean():
core.validators:
def clean(field_instance, value, all_values,
field_container_instance):
if value is not None:
value = field_instance.to_python(value)
field_instance.validate(value, all_values,
field_container_instance)
return value
models.fields:
class Field(object):
...
def clean(self, value, model_instance):
if model_instance is not None:
all_values = model_instance.__dict__.copy()
else:
all_values = {}
return validators.clean(self, value, all_values,
model_instance)
forms.fields:
class Field(object):
def clean(self, value, form_instance=None):
if form_instance is not None:
all_values = form_instance.cleaned_data
else:
all_values = {}
return validators.clean(self, value, all_values,
form_instance)
Rationale: make validators that need to use other fields (e.g.
RequiredIfOtherFieldEquals) work uniformly on model and form fields. A
side note: the `instance` attribute is not used in validator functions
and I can't see a clear use case for it, so it looks like it can be
removed -- prove me wrong please (I do see obscure corner cases where
it could be useful -- if one needs to distinguish between forms and
models in custom validators or do gettatr('something', instance), but
these should really be handled in clean() manually).
3. as seen from above, to_python(), validate() and validators are back
in form fields as this greatly simplifies code. clean() is still
overridable and mostly backwards-compatible. There are a few fields
that have been ported to the new logic in forms/fields.py. Form fields
accept custom validators in `validators` attribute.
4. typeconverters take additional params, mainly intended for custom/
localized date/time input handling, but also useful when overriding
to_python in custom fields:
def to_date(value, params=None):
# TODO: use params for date format
This is not yet fully implemented in the current changeset.
Unfortunately I've been unable to contribute as much time as I would
have liked (the fact that the commit is from 2.30 AM my time should
say it all), but I hope to drive field validation to completion soon
after receiving feedback on the decisions above.
> A > side note: the `instance` attribute is not used in validator functions > and I can't see a clear use case for it, so it looks like it can be > removed -- prove me wrong please (I do see obscure corner cases where > it could be useful -- if one needs to distinguish between forms and > models in custom validators or do gettatr('something', instance), but > these should really be handled in clean() manually).
For models, the "instance" will the models "self" attribute. So that one can do checks based on other information available on the model instance. It's kind of the whole point behind the "model-aware" portion of model-aware validation.
Asking that everything like that gets pushed to clean() is being awkward. clean() is for multi-field validation, the individual validators are for single field validation. That doesn't mean the latter cannot use other information available on the form.
I'll try and make time to look at this, along with other recent work in this area, over the weekend.
> > A
> > side note: the `instance` attribute is not used in validator functions
> > and I can't see a clear use case for it, so it looks like it can be
> > removed -- prove me wrong please (I do see obscure corner cases where
> > it could be useful -- if one needs to distinguish between forms and
> > models in custom validators or do gettatr('something', instance), but
> > these should really be handled in clean() manually).
> For models, the "instance" will the models "self" attribute. So that one
> can do checks based on other information available on the model
> instance. It's kind of the whole point behind the "model-aware" portion
> of model-aware validation.
> Asking that everything like that gets pushed to clean() is being
> awkward. clean() is for multi-field validation, the individual
> validators are for single field validation. That doesn't mean the latter
> cannot use other information available on the form.
As can be seen from the above, fields are already passed to validators
via all_values = model_instance.__dict__.copy() for multi-field
validation. But I agree that requiring to override clear() for
anything more advanced is too restrictive, so let the instance be part
of the signature.
> I'll try and make time to look at this, along with other recent work in
> this area, over the weekend.
That would be most welcome, perhaps you can pop by #django-dev for
more rapid idea exchange?
On Fri, Jan 23, 2009 at 2:56 AM, mrts <m...@mrts.pri.ee> wrote:
> On Jan 23, 3:40 am, Malcolm Tredinnick <malc...@pointy-stick.com>
> wrote:
>> On Thu, 2009-01-22 at 17:27 -0800, mrts wrote:
>> [....]
>> > A
>> > side note: the `instance` attribute is not used in validator functions
>> > and I can't see a clear use case for it, so it looks like it can be
>> > removed -- prove me wrong please (I do see obscure corner cases where
>> > it could be useful -- if one needs to distinguish between forms and
>> > models in custom validators or do gettatr('something', instance), but
>> > these should really be handled in clean() manually).
>> For models, the "instance" will the models "self" attribute. So that one
>> can do checks based on other information available on the model
>> instance. It's kind of the whole point behind the "model-aware" portion
>> of model-aware validation.
>> Asking that everything like that gets pushed to clean() is being
>> awkward. clean() is for multi-field validation, the individual
>> validators are for single field validation. That doesn't mean the latter
>> cannot use other information available on the form.
> As can be seen from the above, fields are already passed to validators
> via all_values = model_instance.__dict__.copy() for multi-field
> validation. But I agree that requiring to override clear() for
> anything more advanced is too restrictive, so let the instance be part
> of the signature.
>> I'll try and make time to look at this, along with other recent work in
>> this area, over the weekend.
> That would be most welcome, perhaps you can pop by #django-dev for
> more rapid idea exchange?
As the uniform all values approach has created a bit of confusion, let
me present a larger example:
Validators in core.validators should not be concerned with either
model or form internals if possible. This is currently straightforward
to achieve by passing all values always as a dict via
form.cleaned_data or model.__dict__.copy() (dict copying is cheap, the
original instance dict is unsafe to pass around).
This makes multi-field validators work uniformly for both models and
fields. Otherwise you have to always handle identical validation
differently for forms and models -- this is extremely smelly in my
opinion.
And now the example:
class RequiredIfOtherFieldEquals(object):
def __call__(self, value, all_values={}, instance=None):
if self.other_field in all_values and all_values
[self.other_field] == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)
works both for forms and models with the proposed approach (all_values
is model_instance.__dict__.copy() in model field validation case).
Handling this specially for model instances would be really ugly:
def __call__(self, value, all_values={}, instance=None):
if instance is not None:
if hasattr(self.other_field, instance) and getattr
(self.other_field, instance) == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)
elif self.other_field in all_values and all_values
[self.other_field] == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)
and entirely unnecessary under the all_values approach.
I haven't been following everything, but I do have a couple comments to make here.
On Fri, Jan 23, 2009 at 8:04 AM, mrts <m...@mrts.pri.ee> wrote: > works both for forms and models with the proposed approach (all_values > is model_instance.__dict__.copy() in model field validation case).
One thing to consider is that not all data is always available via the dictionary, particularly with regard to models. Anything that uses a descriptor (such as relations to other models) only stores a bare minimum of information in the instance dict. The rest is retrieved when accessed *as an attribute*, a fact which is entirely lost in your example. I understand you're trying not to bundle it too much with the way models work, so maybe this was intentional, but I do think it's worth noting, just in case it's not intentional.
> Handling this specially for model instances would be really ugly:
> def __call__(self, value, all_values={}, instance=None): > if instance is not None: > if hasattr(self.other_field, instance) and getattr > (self.other_field, instance) == self.other_value and not value: > raise ValidationError(self.error_message, REQUIRED) > elif self.other_field in all_values and all_values > [self.other_field] == self.other_value and not value: > raise ValidationError(self.error_message, REQUIRED)
> and entirely unnecessary under the all_values approach.
I agree that it's a bit ugly, but I'd hardly call it unnecessary, given the possibly of using descriptors on models. Forms aren't susceptible to the same problem, because their values are stored in a true dict regardless. Offhand, I can only think of one way to get a uniform syntax that works the way you seem to want it to, and that's by bundling forms and models in some other objects with a unified interface, designed solely for validation. Something like the following (untested, coding off the top of my head):
class ValidatableModel(object): def __init__(self, instance): self._instance = instance def __getitem__(self, name): if hasattr(self._instance, name): return getattr(self._instance, name) raise KeyError(name)
There could be more methods to be able to iterate over fields and whatever other niceties you would want to include, but that way you have a uniform dict interface to work with, regardless of which object type gets passed in, and it still preserves the ability to use descriptors on model attributes.
That said, I'm not particularly fond of that solution myself, because I personally feel that models and forms *should* be treated a bit differently; the field types for each are just too different in their capabilities. I haven't been involved with this validation stuff except for improving file support for it, but I'm not sure a unified approach would be as big of a win as you claim.
-Gul
(and if I'm missing some obvious bit of history here, feel free to correct me)
After discussing with Honza, we agreed that the dichotomy between
forms and models that was present before will remain, i.e. instance
will always be a model instance if given and all_values will always be
form.cleaned_data.
Honza's rationale was that it's common to have properties in models
and therefore model_instance.__dict__ (or forms.models.model_to_dict)
may be too constraining. Thus, a generic value getter will be used in
validators to provide uniform access to dict values and instance
fields as follows:
def _get_value(name, instance, all_values):
if instance is not None and hasattr(instance, name):
return getattr(instance, name)
if name in all_values:
return all_values[name]
raise AttributeError('%s not found in forms values or model
instance')
class RequiredIfOtherFieldEquals(object):
def __call__(self, value, all_values={}, instance=None):
if _get_value(self.other_field, instance, all_values) ==
self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)
After several discussions with Honza, we are still on somewhat
different positions what the validator function signature should
be and how core validators should access the fields of a form or
a model instance.
In core validators, no special case handling of forms and models
is needed even in multi-value validators. All what is needed is
an abstraction of "the set of other values".
AlwaysMatchesOtherField, RequiredIfOtherFieldDoesNotEqual etc
will work with that abstraction, primary keys and other complex
model fields also work for same-type comparison.
My initial idea was to reduce the model instance to a dict (as
discussed previously), however, this is really too restrictive.
Passing the form or model instance and treating it in dict-like
manner provides much more flexibility.
That can be achieved by the following:
class Model(...):
...
def get_value(self, field_name, *args):
"Default is specified with the first positional argument"
if args:
return getattr(self, field_name, args[0])
return getattr(self, field_name)
class BaseForm(...)
...
def get_value(self, field_name, *args):
"Default is specified with the first positional argument"
if args:
return self.cleaned_data.get(field_name, args[0])
return self.cleaned_data[field_name]
(It's not possible to implement the __getitem__ protocol as it
already works differently in forms.)
In this case the form/model dichotomy is not required in
validator signature, simple validators work as follows:
def is_slug(value, instance=None):
(access only value)
and multi-value validators as follows:
class AlwaysMatchesOtherField(object):
...
def __call__(self, value, instance=None):
if value != instance.get_value(self.other):
raise ValidationError(...)
There are currently no model-specific complex validators. If a
need for them should arise, it's probably easiest to override
validate() in the corresponding class, calling super().validate()
and then perform any additional checks there.
Custom validators can either use the get_value() abstraction to
support both models and forms or be model/form-specific and use
the instance in whatever way they like. Checking whether you deal
with a model or form instance is generally not required -- if you
accidentally pass a custom form validator to a model, it will
visibly break and vice-versa.
Honza doesn't like this, mainly because it's too easy to shoot
yourself into foot with this (as outlined above, I disagree with
that) and as of now we are using the following instead to handle
model and form fields uniformly:
def _get_value(name, all_values, instance, do_rise=True):
if instance is not None and hasattr(instance, name):
return getattr(instance, name)
if name in all_values:
return all_values[name]
if do_raise:
raise AttributeError('%s not found in form values or model
instance'
% name)
return False
class AlwaysMatchesOtherField(object):
...
def __call__(self, value, all_values={}, instance=None):
if value != _get_value(self.other, all_values, instance):
raise ValidationError(...)
Honza's comments:
-----------------
The problem is that I do not want Form class to be passed to
validators for several reasons:
Form shouldn't be anything except something that obtains data
from request and can validate them, passing it validators could
promote adding some methods and properties to be used in
validators, instead of in the form's clean() method.
Form and model treat data differently, that's just the way it is
and as it should be - pretending it's not true will just create
opportunity for nasty surprises.
Validators wouldn't work outside forms and models (it's easy to
pass in a dictionary of all_values, but fabricating a class with
the same method is work).
I would still have to check whether I am dealing with model or
form when I would like to access some model's methods and
properties that cannot be retrieved this way.
I have more reasons, but these are the major ones, if it's not
enough, contact me and we can talk ;).
On Sat, 2009-01-24 at 14:13 -0800, mrts wrote: > After several discussions with Honza, we are still on somewhat > different positions what the validator function signature should > be and how core validators should access the fields of a form or > a model instance.
The second bit is relatively minor. We're in complete control of that and can write the validators however is needed. Getting the former bit right is more important, since that's the public API.
Leading off with a slight summary of what follows: I suspect this is, ultimately, a bit of a coin toss. I'm inclined to come down more in line with Honza's feelings. I've debated a lot with myself whether this is because a lot of that was originally the same as the ideas I had or because I prefer it technically, and I'm pretty sure it's the latter.
In either case, there's some roughness in the API simply because models and forms are not parallel sorts of objects. So some shoe-horning is required somewhere. How many layers of cotton wool ... er.. abstraction to wrap around things is the question and it's good that there are multiple possibilities being discussed, since it means we can have some confidence the problem has been looked at from multiple angles.
> In core validators, no special case handling of forms and models > is needed even in multi-value validators. All what is needed is > an abstraction of "the set of other values". > AlwaysMatchesOtherField, RequiredIfOtherFieldDoesNotEqual etc > will work with that abstraction, primary keys and other complex > model fields also work for same-type comparison.
> My initial idea was to reduce the model instance to a dict (as > discussed previously), however, this is really too restrictive. > Passing the form or model instance and treating it in dict-like > manner provides much more flexibility.
Indeed. For models, the whole self.__dict__.copy() stuff was looking a bit ugly, when "self" is lying around begging to be used. I understand that it's an attempt to unify data access in reusable validators, but there has to be a nicer way.
> class Model(...): > ... > def get_value(self, field_name, *args): > "Default is specified with the first positional argument" > if args: > return getattr(self, field_name, args[0]) > return getattr(self, field_name)
> class BaseForm(...) > ... > def get_value(self, field_name, *args): > "Default is specified with the first positional argument" > if args: > return self.cleaned_data.get(field_name, args[0]) > return self.cleaned_data[field_name]
> (It's not possible to implement the __getitem__ protocol as it > already works differently in forms.)
> In this case the form/model dichotomy is not required in > validator signature, simple validators work as follows:
> def is_slug(value, instance=None): > (access only value)
> and multi-value validators as follows:
> class AlwaysMatchesOtherField(object): > ... > def __call__(self, value, instance=None): > if value != instance.get_value(self.other): > raise ValidationError(...)
> There are currently no model-specific complex validators.
In the standard set you have written so far. There will be in what people write for custom stuff. I'm not quite sure how you define "complex", though.
> If a > need for them should arise, it's probably easiest to override > validate() in the corresponding class, calling super().validate() > and then perform any additional checks there.
> Custom validators can either use the get_value() abstraction to > support both models and forms or be model/form-specific and use > the instance in whatever way they like. Checking whether you deal > with a model or form instance is generally not required -- if you > accidentally pass a custom form validator to a model, it will > visibly break and vice-versa.
> Honza doesn't like this, mainly because it's too easy to shoot > yourself into foot with this (as outlined above, I disagree with > that) and as of now we are using the following instead to handle > model and form fields uniformly:
> def _get_value(name, all_values, instance, do_rise=True): > if instance is not None and hasattr(instance, name): > return getattr(instance, name) > if name in all_values: > return all_values[name] > if do_raise: > raise AttributeError('%s not found in form values or model > instance' > % name) > return False
The idea of a utility function looks appropriate to me. It feels a bit more explicit and doesn't force policy. I'd like to tweak the implementation (name, parameters, return types and error messages) but the idea is correct.
[...]
> Honza's comments: > -----------------
> The problem is that I do not want Form class to be passed to > validators for several reasons:
Agreed. For models, instance=self, similarly, has to be right, since we have a more-or-less persistent instance that we are operating on. Forms are a transient entity (they pop into existence to validate data and then go away), so "instance" isn't such a useful concept there.
> Form shouldn't be anything except something that obtains data > from request and can validate them, passing it validators could > promote adding some methods and properties to be used in > validators, instead of in the form's clean() method.
> Form and model treat data differently, that's just the way it is > and as it should be - pretending it's not true will just create > opportunity for nasty surprises.
> Validators wouldn't work outside forms and models (it's easy to > pass in a dictionary of all_values, but fabricating a class with > the same method is work).
I don't quite understand what this means, but I kind of see what Honza is getting at. However, I suspect we might need a utility function of some kind for getting the data value if we want to avoid passing in self.__dict__ (no copy needed, surely) as the all_values for model validators. But it can be just a utility function. A class shouldn't be necessary.
> I would still have to check whether I am dealing with model or > form when I would like to access some model's methods and > properties that cannot be retrieved this way.
That's kind of a side-issue and not a reason not to pass around the instance (i.e. not something to drive the API reasoning). But it does seem to provide a good way to check things at the moment and we could lock that in.
Again, I'm kind of leaning towards what you have at the moment (Honza's preferred approach), but it's only a slight lean. Probably not worth sweating over too much, I suspect, since if somebody really doesn't like that API, they can use your wrapper class approach in the validators they write for themselves.
The last unsolved model-validation design issue is the error message
protocol (my work on fields is waiting on it). Let me present the
approach that looks sane to me.
The old Field.default_error_messages dict and corresponding
logic is no longer required (and removed in my branch) as default
error messages live in core/validators now.
In field construction, the custom error logic is plain:
class Form(...):
...
def full_clean(self):
...
for name, field in self.fields.items():
try:
...
# clean() calls to_python() and validate(),
# raising ValidationErrors
value = field.clean(value)
...
except ValidationError, e:
e.custom_messages = field.error_messages
self._errors[name] = self.error_class(e.messages)
The e.custom_messages = field.error_messages binds custom { code:
message } overrides to the ValidationError.
e.messages is a property that returns the list of error messages.
In reality, ValidationError contains only a single error message. The
list is required for aggregating several errors from the validator
collection associated with a field. However, as of now, the
ValidationError class tries to be both a single value and a
collection, resulting in quite a mess.
This is a classic case of the Composite pattern (http://
en.wikipedia.org/wiki/Composite_pattern ), so I propose that
* ValidationErrors remain single values
* core/validators gets an additional ValidationErrorComposite class
that derives from ValidationError and has the same interface
It will be used as follows (example for form field, model field
similar):
class Field(...):
...
def clean(self, value, all_values={}):
if value is not None:
value = self.to_python(value)
self.validate(value, all_values)
return value
def validate(self, value, all_values={}, form_instance=None):
value_is_empty = value in validators.EMPTY_VALUES
if self.required and value_is_empty:
raise ValidationError(_("This field is required."),
validators.REQUIRED)
error_collection = ValidationErrorComposite()
for validator in self.validators:
if value_is_empty and not getattr(validator, 'always_test',
False):
continue
try:
validator(value, all_values)
except ValidationError, e:
error_collection.add(e)
if error_collection:
raise error_collection
User code can remain oblivious of the composite, catching
ValidationErrors as before and using them as specified.
On Wed, Feb 18, 2009 at 1:28 PM, mrts <m...@mrts.pri.ee> wrote:
> The last unsolved model-validation design issue is the error message
> protocol (my work on fields is waiting on it). Let me present the
> approach that looks sane to me.
> The old Field.default_error_messages dict and corresponding
> logic is no longer required (and removed in my branch) as default
> error messages live in core/validators now.
What if I want to use the same error_messages on multiple instances of
a given field?
Now I would subclass the FormField and override it's default_messages,
with your approach it would require overriding __init__. I am not
saying that that is a bad thing, just that it still has a reason.
Besides not all validation will be done inside validators, some will
remain on the Field classes.
> class Form(...):
> ...
> def full_clean(self):
> ...
> for name, field in self.fields.items():
> try:
> ...
> # clean() calls to_python() and validate(),
> # raising ValidationErrors
> value = field.clean(value)
> ...
> except ValidationError, e:
> e.custom_messages = field.error_messages
> self._errors[name] = self.error_class(e.messages)
> The e.custom_messages = field.error_messages binds custom { code:
> message } overrides to the ValidationError.
> e.messages is a property that returns the list of error messages.
> In reality, ValidationError contains only a single error message. The
> list is required for aggregating several errors from the validator
> collection associated with a field. However, as of now, the
> ValidationError class tries to be both a single value and a
> collection, resulting in quite a mess.
> This is a classic case of the Composite pattern (http://
> en.wikipedia.org/wiki/Composite_pattern ), so I propose that
> * ValidationErrors remain single values
> * core/validators gets an additional ValidationErrorComposite class
> that derives from ValidationError and has the same interface
What if then validators want to raise multiple messages? Of FormFields
for that matter?
for example:
"Must be greated than 1000."
"If field Y is supplied, X must be greater than Y."
"I don't like your shoes!"
Would they then raise ValidationErrorComposite instead? So you just
pushed the check to a different layer. I agree that it makes more
sense in it's new location, but the distinction still has to be made.
And how would ValidationErrorComposite.add(exception) work then? It
would have to check the exception type to know whether to append or
extend it's message list.
I agree that handling of different datat types (string, list or dict)
in ValidationError isn't pretty, but it IMHO greatly simplifies it's
usage and handling.
> It will be used as follows (example for form field, model field
> similar):
> class Field(...):
> ...
> def clean(self, value, all_values={}):
> if value is not None:
> value = self.to_python(value)
> self.validate(value, all_values)
> return value
unrelated to custom messages - you cannot pass all_values to field for
validation, because at the time field validation is called, not all
fields have been cleaned. That's why validator handling is done in
(form/model).clean.
> value_is_empty = value in validators.EMPTY_VALUES
> if self.required and value_is_empty:
> raise ValidationError(_("This field is required."),
> validators.REQUIRED)
> error_collection = ValidationErrorComposite()
> for validator in self.validators:
> if value_is_empty and not getattr(validator, 'always_test',
> False):
> continue
> try:
> validator(value, all_values)
> except ValidationError, e:
> error_collection.add(e)
> if error_collection:
> raise error_collection
> User code can remain oblivious of the composite, catching
> ValidationErrors as before and using them as specified.
> On Wed, Feb 18, 2009 at 1:28 PM, mrts <m...@mrts.pri.ee> wrote:
> > The last unsolved model-validation design issue is the error message
> > protocol (my work on fields is waiting on it). Let me present the
> > approach that looks sane to me.
> > The old Field.default_error_messages dict and corresponding
> > logic is no longer required (and removed in my branch) as default
> > error messages live in core/validators now.
> What if I want to use the same error_messages on multiple instances of
> a given field?
> Now I would subclass the FormField and override it's default_messages,
> with your approach it would require overriding __init__. I am not
> saying that that is a bad thing, just that it still has a reason.
Yup, design decision needed:
is the need for class-wide overrides of error messages common enough
to justify leaving it as-is (considering that this can still be done
via __init__)?
> Besides not all validation will be done inside validators, some will
> remain on the Field classes.
As of now, I've managed to weed out all validation from fields into
validators. The result is not entirely stable yet and it brings in
minor backwards-compatibility issues, so it remains to be seen if this
is entirely viable.
> > The e.custom_messages = field.error_messages binds custom { code:
> > message } overrides to the ValidationError.
> > e.messages is a property that returns the list of error messages.
> > In reality, ValidationError contains only a single error message. The
> > list is required for aggregating several errors from the validator
> > collection associated with a field. However, as of now, the
> > ValidationError class tries to be both a single value and a
> > collection, resulting in quite a mess.
> > This is a classic case of the Composite pattern (http://
> > en.wikipedia.org/wiki/Composite_pattern ), so I propose that
> > * ValidationErrors remain single values
> > * core/validators gets an additional ValidationErrorComposite class
> > that derives from ValidationError and has the same interface
> What if then validators want to raise multiple messages? Of FormFields
> for that matter?
> for example:
> "Must be greated than 1000."
> "If field Y is supplied, X must be greater than Y."
> "I don't like your shoes!"
In core/validators there are none that need to raise multiple error
messages.
> Would they then raise ValidationErrorComposite instead? So you just
> pushed the check to a different layer. I agree that it makes more
> sense in it's new location, but the distinction still has to be made.
Yes, if that need arises, they can throw a ValidationErrorComposite.
> And how would ValidationErrorComposite.add(exception) work then? It
> would have to check the exception type to know whether to append or
> extend it's message list.
I see no problem there.
def add(e):
if isinstance(e, ValidationErrorComposite):
add the collection of ValidationErrors to self.errors
else:
add the single e to self.errors
> I agree that handling of different datat types (string, list or dict)
> in ValidationError isn't pretty, but it IMHO greatly simplifies it's
> usage and handling.
> > It will be used as follows (example for form field, model field
> > similar):
> > class Field(...):
> > ...
> > def clean(self, value, all_values={}):
> > if value is not None:
> > value = self.to_python(value)
> > self.validate(value, all_values)
> > return value
> unrelated to custom messages - you cannot pass all_values to field for
> validation, because at the time field validation is called, not all
> fields have been cleaned. That's why validator handling is done in
> (form/model).clean.
Ah, sorry, there was just an unintentional omission in my example. It
should have obviously been as follows:
class Form(...):
...
def full_clean(self):
...
for name, field in self.fields.items():
try:
...
# self.cleaned_data was missing
value = field.clean(value, self.cleaned_data)
...
except ValidationError, e:
...
But regarding the concern that not all fields have been cleaned --
AFAIK this has always been the constraint: you can only refer to
fields declared before the current one while using cleaned_data in
validation.
On Wed, 2009-02-18 at 04:28 -0800, mrts wrote: > The last unsolved model-validation design issue is the error message > protocol (my work on fields is waiting on it).
Well, it's the the last issue, but it's certainly a stumbling block. I've been spending quite a lot of time lately trying to merge model validation into trunk and, as you'll see in the 1.1. announcement, it's not really there yet. There are a number of backwards incompatibilities that have been introduced along the way and some of the API is fairly hard to use.
> Let me present the > approach that looks sane to me.
> The old Field.default_error_messages dict and corresponding > logic is no longer required (and removed in my branch) as default > error messages live in core/validators now.
It *is* still required. Firstly, it's a very useful way of specifying holistic error messages. Secondly, removing it would be backwards incompatible in a major way (many custom fields would no longer work properly, for example).
[...]
> In reality, ValidationError contains only a single error message. The > list is required for aggregating several errors from the validator > collection associated with a field. > However, as of now, the > ValidationError class tries to be both a single value and a > collection, resulting in quite a mess.
Definitely. I've managed to simplify it somewhat, but it requires some conventions. Part of the problems come from trying to make model field validation and form field validation so alike in an area where it isn't necessarily so. It turns out to be easier to have exceptions.ValidationError be more dictionary-like by default, although I'm still playing with the API a bit there.
There are two possibilities here. One is that if a default error message exists for the field, we only display that when a validation error occurs. Otherwise we aggregate the individual validation error messages.
The other approach is to always aggregate but we have the convention that the default (field-wide) error message is descriptive, in the sense that it describes what is expected ("must be an integer between 1 and 101"), whilst the individual validator error messages are more proscriptive about the actual problem that has been found ("is not an integer"). The latter always state a problem, not a requirement, so they read well when aggregated with the description.
> This is a classic case of the Composite pattern (http:// > en.wikipedia.org/wiki/Composite_pattern ), so I propose that > * ValidationErrors remain single values > * core/validators gets an additional ValidationErrorComposite class > that derives from ValidationError and has the same interface
That's pretty good analysis. I've done the same thing in the merged code I've been doing, except slightly differently. Django.core.exceptions.ValidationError is the composite collection of error messages.
Django.forms.utils.ValidationError, which you guys had nuked, remains the way to make a single HTML presentation of form field errors (that had to go back anyway for backwards-compatibility reasons, since otherwise every single custom field cleaning method would potentially break, at a minimum). There's a constructor for forms.ValidationError that takes an exceptions.ValidationError to do the conversion.
Presenting model field validation errors isn't something that we have a default for, since they aren't tied to HTML output or anything like that. If presenting them as HTML output, the forms.ValidationError class could be used (that's how ModelForms would do it), but, for other cases, pulling out the errors on an as-needs required
When I get out from under some work stuff this week, I'm going to commit what I've been merging (it a merge of yours and Honza's git branches and a lot of editing and review changes) into a subversion branch so that more people can test it. I'm then happy to pull in and review changes from git branches if you guys want to work on it there, but we need to have this done in a more general way. We've all dropped the ball a bit -- I thought the problems that were creeping in were going to simple to fix up in review; I was wrong. It's the broken-windows problem, where one little variation from "great" leads to another and another, without realising it.
On Feb 19, 12:49 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> On Wed, 2009-02-18 at 04:28 -0800, mrts wrote:
> > The old Field.default_error_messages dict and corresponding
> > logic is no longer required (and removed in my branch) as default
> > error messages live in core/validators now.
> It *is* still required. Firstly, it's a very useful way of specifying
> holistic error messages. Secondly, removing it would be backwards
> incompatible in a major way (many custom fields would no longer work
> properly, for example).
Agreed.
> When I get out from under some work stuff this week, I'm going to commit
> what I've been merging (it a merge of yours and Honza's git branches and
> a lot of editing and review changes) into a subversion branch so that
> more people can test it. I'm then happy to pull in and review changes
> from git branches if you guys want to work on it there, but we need to
> have this done in a more general way. We've all dropped the ball a bit
> -- I thought the problems that were creeping in were going to simple to
> fix up in review; I was wrong. It's the broken-windows problem, where
> one little variation from "great" leads to another and another, without
> realising it.
Sounds good. Keep us updated and let know if you need us to work on
something.