Model-validation: call for discussions

43 views
Skip to first unread message

mrts

unread,
Jan 17, 2009, 12:45:14 PM1/17/09
to Django developers
There are several problems in model validation as of now. As pouring
them out here would create a too long, ill-formatted post, I created a
separate page for it at

http://wiki.github.com/mrts/honza-django/form-and-model-validation

This is just "design gruntwork", a basic text body analyzing the
relations between components and problems and offering the most
evident but not necessarily most elegant solutions. I hope to spark a
proper discussion here that hopefully converges to a solid design.

Malcolm Tredinnick

unread,
Jan 17, 2009, 10:17:04 PM1/17/09
to django-d...@googlegroups.com
On Sat, 2009-01-17 at 09:45 -0800, mrts wrote:
> There are several problems in model validation as of now. As pouring
> them out here would create a too long, ill-formatted post, I created a
> separate page for it at
>
> http://wiki.github.com/mrts/honza-django/form-and-model-validation

(a) The link leads to an error page, although it appears to be a 500
error (would be nice if github didn't make it look like it could be a
404 or a 500).

(b) Please do write it out and post it here so that we can have the
discussion on the mailing list. That way, everybody is already
subscribed to the right place (django-developers automatically reaches
the appropriate audience) Wikis make it much harder to track history and
authorship, amongst other problems. We have the mailing list for design
discussions.

Thanks,
Malcolm


mrts

unread,
Jan 19, 2009, 5:47:40 AM1/19/09
to Django developers
On Jan 18, 5:17 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> (b) Please do write it out and post it here so that we can have the
> discussion on the mailing list.

Let’s step back, distance ourselves from the current implementation
and look at how forms, models and modelforms should ideally
interact validation-wise at a somewhat abstract level.

The main focus is on fields.

Form fields
===========

In validation context, a form field has the following
responsibilities:

* get input value (a string) from user,

* validate that the value is in expected format
(*format validation*, preconditions),

* convert the value to a Python type,

* validate that the resulting value satisfies any conditions set
for the value (*value validation*, postconditions).

Model fields
============

Ditto for model fields, omitting the first step: format validation,
coercion to Python and value validation is generally required.

If, however, we want to more radically depart from current
behaviour, then it is possible and indeed reasonable to assume that
the values assigned to model fields are already the expected Python
types, as it is the application developer’s responsibility to assure
that. This would get rid of the format validation and coercion step
(these are already handled by the form layer). to_python() is
currently used in model fields for controlling the conversion.

Interaction between form and model fields
=========================================

Forms and models should be orthogonal and oblivious of each other.
Modelforms serves as the glue and mapping between the two, being
aware of the internals of both.

A modelform should have the following responsibilites in validation
context:

* clean the form (performs format validation, coercion to a Python
type and value validation),

* assign the form field values to the associated model fields,

* inform the model fields that basic validation has already been done
to
avoid duplicated validation and call any additional validation
methods.

Thus, the modelform should be able to invoke only the additional
and custom model field validators, *skipping the default coercers
and validators* that can be assumed to be the same in similar form
and model field classes (e.g. an IntegerField in forms and
IntegerField in models).

Let's look at an contrived example that illustrates the point.

Let IntegerForeignKeyField be an imaginary foreign key field in a
model that accepts only integer foreign keys. The corresponding
form field would be a PositiveIntegerField. Suppose that the
developer has provided a custom validator for the
IntegerForeignKeyField that allows foreign keys only to objects
where the is_active boolean field is True.

When a modelform uses them both, it is the form field's
responsibility to assure that the value is converted to a Python
int and that the int is positive. IntegerForeignKeyField should
not re-run neither the coercion nor basic validation routines.
However, IntegerForeignKeyField must

1) validate that the integer points to an existing row in the
related table (i.e. run a foreignkey-specific validator),

2) validate that the row has is_active == True (i.e. run a custom
validator).

In this case, coercion and basic validation is cheap, but that's
not always the case.

General validation principles
=============================

The extended conclusion of the sections above is as follows.

Double validation should never happen. There has to be a way to
convey that a value is already validated and no further validation
is necessary. No code duplication in type coercion or validation
should occur in models and forms. The following distinct
converters and validators participate in the coercion/validation
process:

* format validators that validate that a given string can be
converted to a Python type,

* converters that coerce a given string to a Python type,

* value validators that assure that the value returned by a
converter satisfies any conditions set for the value.

It should be easy to add custom validators and intercept or
override the process at any step.

And now something completely different
======================================

"Every problem in computer science can be solved by
another level of indirection."
--- source unknown

Both form and model fields need similar functionality that is not
well served by just duplicating validation and coercion functions
in them.

It would make sense to factor out the common functionality and
decouple it from both. A mixin class would serve that purpose well.

The following methods would live in the mixin:
* clean() that calls to_python() and then validate(),
* to_python() that takes care of format validation and coercion,
* validate() that takes care of value validation and running
custom validators.

It would have fields for storing the default and custom validators
and perhaps a state field for controlling which parts of the
validation process have already been run.

I've hacked up an incomplete rough draft at
http://dpaste.com/110671/ to illustrate this. Unfortunately it does
not illustrate all the points, but the general idea should
hopefully come through.

Action plan for 1.1
===================

The mixin approach is not in scope for 1.1. We go with what we
already have, factoring the bits in current model fields' to_python
to a separate django.utils.typeconverters library
(see http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a01785181a91a37971
for the first steps in that direction) and using both
django.core.validators and django.utils.typeconverters throughout
model and form fields.

I'm not sure about Honza's plans in regard avoiding duplicate
validation, hopefully he comments on that himself.

Also, let me remind that model and form objects have not been
discussed in this post (and it's already more than 150 lines long),
only fields.

Malcolm Tredinnick

unread,
Jan 19, 2009, 6:43:07 AM1/19/09
to django-d...@googlegroups.com
As I understand it, this is primarily about avoiding duplicate
validation of some pieces of data when it's not necessary, right? So
it's really only applicable to the ModelForm case?

This is a pretty good summary of the situation, although I suspect
there's a fairly easy solution at hand, which I've outlined below.

On Mon, 2009-01-19 at 02:47 -0800, mrts wrote:

[...]


>
> The main focus is on fields.
>
> Form fields
> ===========

This all sounds like a description of the current situation, which is
fine.

> Model fields
> ============
>
> Ditto for model fields, omitting the first step: format validation,
> coercion to Python and value validation is generally required.

Agreed.

>
> If, however, we want to more radically depart from current
> behaviour, then it is possible and indeed reasonable to assume that
> the values assigned to model fields are already the expected Python
> types, as it is the application developer’s responsibility to assure
> that. This would get rid of the format validation and coercion step
> (these are already handled by the form layer). to_python() is
> currently used in model fields for controlling the conversion

As a practical matter, that ship has already sailed. Lots of code
assigns things like strings to datetime fields. I've always thought that
was an unfortunate decision, but it's hardly a showstopper. Model
validators will need to be able to convert from reasonable input
variations to the right Python object. It's not unreasonable to expect,
however, that after validation/cleaning has occurred on a model, all the
attributes contains things of the right type (assuming the model is
valid). That means that subsequent code working with, e.g, a datetime
attribute on a valid model doesn't have to worry that
mymodel.foo.strftime() will raise an exception because "foo" happens to
be a string, not a datetime.


> Interaction between form and model fields
> =========================================
>
> Forms and models should be orthogonal and oblivious of each other.
> Modelforms serves as the glue and mapping between the two, being
> aware of the internals of both.
>
> A modelform should have the following responsibilites in validation
> context:
>
> * clean the form (performs format validation, coercion to a Python
> type and value validation),

> * assign the form field values to the associated model fields,
>
> * inform the model fields that basic validation has already been done
> to
> avoid duplicated validation and call any additional validation
> methods.

That's one approach, but the alternative I've sketched below is another
one. Short version: when the form field validation would match what the
model field is going to do anyway, don't do anything at the form level.
The model field validation is about to be called anyway.

> Thus, the modelform should be able to invoke only the additional
> and custom model field validators, *skipping the default coercers
> and validators* that can be assumed to be the same in similar form
> and model field classes (e.g. an IntegerField in forms and
> IntegerField in models).

That's a slightly flawed assumption, as it won't necessarily be true.
Particularly in the case of custom form field overrides on a model form.
There could be any number of variations, from the range of data
permitted to the types.

[Aside: It's certainly reasonable to think about the duplicated effort,
but I wouldn't worry about it too much. It's not a huge amount of
duplicated computation, since the case when something is already of the
right type and passes normalisation and validation tends to be quite
quick (an if-check or two and then moving on). So if we don't have a
fantastically neat solution for this initially in the communcation
between ModelForms and Models, the world won't stop spinning. That's
only one use-case in the much larger scheme of things (again, I'm not
dismissing it, but let's not get too hung up on this case). That being
said... ]

The solution here might not be too difficult and doesn't need the
requirement of communication from the form to the model about what it's
already done.

To wit, when a formfield is constructed automatically from the model
field, if the default formfield doesn't do any validation/normalisation
beyond what the normal model field does, the formfield() method on the
Field subclass returns a formfield with no validators. Thus, when the
form cleaning process is run, all the validation for that field is done
by the model. Thus form fields returned for model fields could well be
slightly modified (in that their validator list is different) from what
you get directly from using forms.fields.*.

That dovetails nicely with the reuqirements of custom form fields. When
a custom form field is created, the form field validation won't be
skipped (unless the creator wants to remove any validation from the form
field), so any differences between the default model field validation
and the custom form field validation aren't overlooked: they both get
run (which is the correct behaviour).

> General validation principles
> =============================
>
> The extended conclusion of the sections above is as follows.
>
> Double validation should never happen.

It doesn't hurt if it does and sometimes both form and models will need
to run validation on the same piece of data (e.g. the custom form field
case).

> And now something completely different
> ======================================
>
> "Every problem in computer science can be solved by
> another level of indirection."
> --- source unknown

"... and now you (often) have two problems" -- Malcolm (précising a few
hundred other people). :-)

>
> Both form and model fields need similar functionality that is not
> well served by just duplicating validation and coercion functions
> in them.

The two data flows are similar, but not identical. I'd tend to shy away
from class-ifying this if we don't have to. That is, prefer direct
functions over yet another class, simply because there's not really a
class structure there -- a need to poke at class internals -- and it's
not going to enough functions to require namespacing.

Attempts to abstract away similar-but-not-the-same algorithms can often
lead to more difficult code than writing the same thing out directly in
the two places. It tends to show up in the number of parameters you pass
into the abstract algorithm and/or the complexity of the return types.
And, in this case, it's only two places, not fourteen or anything like
that. So whether this is worth it tends to be implementation specific.

> Action plan for 1.1
> ===================
>
> The mixin approach is not in scope for 1.1. We go with what we
> already have, factoring the bits in current model fields' to_python
> to a separate django.utils.typeconverters library
> (see http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a01785181a91a37971
> for the first steps in that direction) and using both
> django.core.validators and django.utils.typeconverters throughout
> model and form fields.

As a style issue, there's a bit of an arbitrary split going on there.
Type conversion is really part of what we're calling "validation" (since
the type coercion is part of the normalisation part of the process,
which we tend to lump into validation). I'd lump things a bit closer
together in the source. There's not really an external use-case scenario
for the converters and, even within Django, they don't have a lot of
user outside of the validators (we can't do a lot of sharing with the
database-specific backend converters, since they do tend to be
backend-specific).

Maybe (and looking at my notes, I might upgrade that to probably, given
the number of times I've mentioned it), we need to make a directory for
django.core.validators instead of a single file. That gives us room to
split out converters from things that work with standard types. But
definitely try to keep them close together in the source, rather than
putting them in django.utils, which can sometimes be a dumping ground
for stuff that better belongs elsewhere.

> I'm not sure about Honza's plans in regard avoiding duplicate
> validation, hopefully he comments on that himself.

> Also, let me remind that model and form objects have not been
> discussed in this post (and it's already more than 150 lines long),
> only fields.

I don't think those two situations are particularly tricky, are they?

Validation on forms shouldn't change from what it is now. Partly because
it Just Works pretty logically, but primarily because we should do the
hard work so that users don't have to and don't need to rewrite their
code.

We do need a multi-field validation method on models, similar to
Form.clean(). There's currently a Model.clean() method in Honza's code,
which I've been reviewing, and I've got a note to probably change the
name a bit there so that the clean() name can be reserved for people
wanting to write multi-field validation, keeping things in parallel with
the way Forms work. That's a naming issue, not a substantive one,
however. As you've noted, the algorithm for forms and models isn't that
difference.

Anyway, nice write-up. Be interesting to see what others think.

Regards,
Malcolm

Rajeev J Sebastian

unread,
Jan 19, 2009, 8:22:15 AM1/19/09
to django-d...@googlegroups.com
On Mon, Jan 19, 2009 at 4:17 PM, mrts <mr...@mrts.pri.ee> wrote:
> And now something completely different
> ======================================
>
> "Every problem in computer science can be solved by
> another level of indirection."
- David Wheeler

" ... except the problem of too many levels of indirection"
- Kevlin Henney

http://en.wikipedia.org/wiki/Abstraction_layer
http://en.wikipedia.org/wiki/Butler_Lampson

Regards
Rajeev J Sebastian

Kenneth Gonsalves

unread,
Jan 19, 2009, 8:28:20 AM1/19/09
to django-d...@googlegroups.com

wikipedia is *really* different

--
regards
KG
http://lawgon.livejournal.com

mrts

unread,
Jan 19, 2009, 4:23:28 PM1/19/09
to Django developers
On Jan 19, 1:43 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> one. Short version: when the form field validation would match what the
> model field is going to do anyway, don't do anything at the form level.
> The model field validation is about to be called anyway.

[snip]

> The solution here might not be too difficult and doesn't need the
> requirement of communication from the form to the model about what it's
> already done.
>
> To wit, when a formfield is constructed automatically from the model
> field, if the default formfield doesn't do any validation/normalisation
> beyond what the normal model field does, the formfield() method on the
> Field subclass returns a formfield with no validators. Thus, when the
> form cleaning process is run, all the validation for that field is done
> by the model. Thus form fields returned for model fields could well be
> slightly modified (in that their validator list is different) from what
> you get directly from using forms.fields.*.
>
> That dovetails nicely with the reuqirements of custom form fields. When
> a custom form field is created, the form field validation won't be
> skipped (unless the creator wants to remove any validation from the form
> field), so any differences between the default model field validation
> and the custom form field validation aren't overlooked: they both get
> run (which is the correct behaviour).

Looks like a good action plan.

> Attempts to abstract away similar-but-not-the-same algorithms can often
> lead to more difficult code than writing the same thing out directly in
> the two places. It tends to show up in the number of parameters you pass
> into the abstract algorithm and/or the complexity of the return types.
> And, in this case, it's only two places, not fourteen or anything like
> that. So whether this is worth it tends to be implementation specific.

Yeah, I won't pursue this further, although I still think the result
would be cleaner, not uglier.

> As a style issue, there's a bit of an arbitrary split going on there.
> Type conversion is really part of what we're calling "validation" (since
> the type coercion is part of the normalisation part of the process,
> which we tend to lump into validation). I'd lump things a bit closer
> together in the source. There's not really an external use-case scenario
> for the converters and, even within Django, they don't have a lot of
> user outside of the validators (we can't do a lot of sharing with the
> database-specific backend converters, since they do tend to be
> backend-specific).
>
> Maybe (and looking at my notes, I might upgrade that to probably, given
> the number of times I've mentioned it), we need to make a directory for
> django.core.validators instead of a single file. That gives us room to
> split out converters from things that work with standard types. But
> definitely try to keep them close together in the source, rather than
> putting them in django.utils, which can sometimes be a dumping ground
> for stuff that better belongs elsewhere.

As seen in the commit,
http://github.com/mrts/honza-django/commit/a8239b063591acc367add0a01785181a91a37971
,
my initial quick solution was to lump them into the validation
module (as there's lot's of intermingling between validators and
coercers,
exemplified by merging to_pyton() and validate() to clean() in forms).

The directory-based approach is best, I'll go with it -- but it's yet
uncertain
when as I have to handle pressing matters at work during daytime.

mrts

unread,
Jan 22, 2009, 8:27:13 PM1/22/09
to Django developers
On Jan 19, 11:23 pm, mrts <m...@mrts.pri.ee> wrote:
> The directory-based approach is best, I'll go with it -- but it's yet
> uncertain
> when as I have to handle pressing matters at work during daytime.

I've implemented some fundamental changes that need review. The commit
is at http://github.com/mrts/honza-django/commit/482086df8d24b99f466152c51d2badee6ee6147d
. The changes are not finished yet, but they should illustrate the
proposed approach. I may fail to see all the negative implications as
I've been mostly addressing fields, not the whole picture. Critique
most welcome.

Changes:

1. consolidate validators and coercers into django/core/validators/
{__init__,typeconverters}.py as suggested by Malcolm

2. make both forms and fields use similar logic in clean():

core.validators:

def clean(field_instance, value, all_values,
field_container_instance):
if value is not None:
value = field_instance.to_python(value)
field_instance.validate(value, all_values,
field_container_instance)
return value

models.fields:

class Field(object):
...
def clean(self, value, model_instance):
if model_instance is not None:
all_values = model_instance.__dict__.copy()
else:
all_values = {}
return validators.clean(self, value, all_values,
model_instance)

forms.fields:

class Field(object):
def clean(self, value, form_instance=None):
if form_instance is not None:
all_values = form_instance.cleaned_data
else:
all_values = {}
return validators.clean(self, value, all_values,
form_instance)

Rationale: make validators that need to use other fields (e.g.
RequiredIfOtherFieldEquals) work uniformly on model and form fields. A
side note: the `instance` attribute is not used in validator functions
and I can't see a clear use case for it, so it looks like it can be
removed -- prove me wrong please (I do see obscure corner cases where
it could be useful -- if one needs to distinguish between forms and
models in custom validators or do gettatr('something', instance), but
these should really be handled in clean() manually).

3. as seen from above, to_python(), validate() and validators are back
in form fields as this greatly simplifies code. clean() is still
overridable and mostly backwards-compatible. There are a few fields
that have been ported to the new logic in forms/fields.py. Form fields
accept custom validators in `validators` attribute.

4. typeconverters take additional params, mainly intended for custom/
localized date/time input handling, but also useful when overriding
to_python in custom fields:

DateField:

def to_python(self, value, params=None):
typeconverters.to_date(value, params)

def to_date(value, params=None):
# TODO: use params for date format

This is not yet fully implemented in the current changeset.

Unfortunately I've been unable to contribute as much time as I would
have liked (the fact that the commit is from 2.30 AM my time should
say it all), but I hope to drive field validation to completion soon
after receiving feedback on the decisions above.

Malcolm Tredinnick

unread,
Jan 22, 2009, 8:40:05 PM1/22/09
to django-d...@googlegroups.com
On Thu, 2009-01-22 at 17:27 -0800, mrts wrote:
[....]

> A
> side note: the `instance` attribute is not used in validator functions
> and I can't see a clear use case for it, so it looks like it can be
> removed -- prove me wrong please (I do see obscure corner cases where
> it could be useful -- if one needs to distinguish between forms and
> models in custom validators or do gettatr('something', instance), but
> these should really be handled in clean() manually).

For models, the "instance" will the models "self" attribute. So that one
can do checks based on other information available on the model
instance. It's kind of the whole point behind the "model-aware" portion
of model-aware validation.

Asking that everything like that gets pushed to clean() is being
awkward. clean() is for multi-field validation, the individual
validators are for single field validation. That doesn't mean the latter
cannot use other information available on the form.

I'll try and make time to look at this, along with other recent work in
this area, over the weekend.

Malcolm


mrts

unread,
Jan 22, 2009, 8:56:41 PM1/22/09
to Django developers
On Jan 23, 3:40 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> On Thu, 2009-01-22 at 17:27 -0800, mrts wrote:
>
> [....]
>
> >  A
> > side note: the `instance` attribute is not used in validator functions
> > and I can't see a clear use case for it, so it looks like it can be
> > removed -- prove me wrong please (I do see obscure corner cases where
> > it could be useful -- if one needs to distinguish between forms and
> > models in custom validators or do gettatr('something', instance), but
> > these should really be handled in clean() manually).
>
> For models, the "instance" will the models "self" attribute. So that one
> can do checks based on other information available on the model
> instance. It's kind of the whole point behind the "model-aware" portion
> of model-aware validation.
>
> Asking that everything like that gets pushed to clean() is being
> awkward. clean() is for multi-field validation, the individual
> validators are for single field validation. That doesn't mean the latter
> cannot use other information available on the form.

As can be seen from the above, fields are already passed to validators
via all_values = model_instance.__dict__.copy() for multi-field
validation. But I agree that requiring to override clear() for
anything more advanced is too restrictive, so let the instance be part
of the signature.

> I'll try and make time to look at this, along with other recent work in
> this area, over the weekend.

That would be most welcome, perhaps you can pop by #django-dev for
more rapid idea exchange?

Honza Král

unread,
Jan 23, 2009, 7:13:48 AM1/23/09
to django-d...@googlegroups.com
I should be around during the weekend so I will give it a try as well...

Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

mrts

unread,
Jan 23, 2009, 8:04:01 AM1/23/09
to Django developers
As the uniform all values approach has created a bit of confusion, let
me present a larger example:

Validators in core.validators should not be concerned with either
model or form internals if possible. This is currently straightforward
to achieve by passing all values always as a dict via
form.cleaned_data or model.__dict__.copy() (dict copying is cheap, the
original instance dict is unsafe to pass around).

This makes multi-field validators work uniformly for both models and
fields. Otherwise you have to always handle identical validation
differently for forms and models -- this is extremely smelly in my
opinion.

And now the example:

class RequiredIfOtherFieldEquals(object):
def __call__(self, value, all_values={}, instance=None):
if self.other_field in all_values and all_values
[self.other_field] == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)

works both for forms and models with the proposed approach (all_values
is model_instance.__dict__.copy() in model field validation case).

Handling this specially for model instances would be really ugly:

def __call__(self, value, all_values={}, instance=None):
if instance is not None:
if hasattr(self.other_field, instance) and getattr
(self.other_field, instance) == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)
elif self.other_field in all_values and all_values
[self.other_field] == self.other_value and not value:
raise ValidationError(self.error_message, REQUIRED)

and entirely unnecessary under the all_values approach.

Everybody is most welcome to prove me wrong :)

Marty Alchin

unread,
Jan 23, 2009, 9:38:25 AM1/23/09
to django-d...@googlegroups.com
I haven't been following everything, but I do have a couple comments
to make here.

On Fri, Jan 23, 2009 at 8:04 AM, mrts <mr...@mrts.pri.ee> wrote:
> works both for forms and models with the proposed approach (all_values
> is model_instance.__dict__.copy() in model field validation case).

One thing to consider is that not all data is always available via the
dictionary, particularly with regard to models. Anything that uses a
descriptor (such as relations to other models) only stores a bare
minimum of information in the instance dict. The rest is retrieved
when accessed *as an attribute*, a fact which is entirely lost in your
example. I understand you're trying not to bundle it too much with the
way models work, so maybe this was intentional, but I do think it's
worth noting, just in case it's not intentional.

> Handling this specially for model instances would be really ugly:
>
> def __call__(self, value, all_values={}, instance=None):
> if instance is not None:
> if hasattr(self.other_field, instance) and getattr
> (self.other_field, instance) == self.other_value and not value:
> raise ValidationError(self.error_message, REQUIRED)
> elif self.other_field in all_values and all_values
> [self.other_field] == self.other_value and not value:
> raise ValidationError(self.error_message, REQUIRED)
>
> and entirely unnecessary under the all_values approach.

I agree that it's a bit ugly, but I'd hardly call it unnecessary,
given the possibly of using descriptors on models. Forms aren't
susceptible to the same problem, because their values are stored in a
true dict regardless. Offhand, I can only think of one way to get a
uniform syntax that works the way you seem to want it to, and that's
by bundling forms and models in some other objects with a unified
interface, designed solely for validation. Something like the
following (untested, coding off the top of my head):

class ValidatableModel(object):
def __init__(self, instance):
self._instance = instance
def __getitem__(self, name):
if hasattr(self._instance, name):
return getattr(self._instance, name)
raise KeyError(name)

class ValidatableForm(object):
def __init__(self, instance):
self._instance = instance
def __getitem__(self, name):
return self._instance.cleaned_data[name]

There could be more methods to be able to iterate over fields and
whatever other niceties you would want to include, but that way you
have a uniform dict interface to work with, regardless of which object
type gets passed in, and it still preserves the ability to use
descriptors on model attributes.

That said, I'm not particularly fond of that solution myself, because
I personally feel that models and forms *should* be treated a bit
differently; the field types for each are just too different in their
capabilities. I haven't been involved with this validation stuff
except for improving file support for it, but I'm not sure a unified
approach would be as big of a win as you claim.

-Gul

(and if I'm missing some obvious bit of history here, feel free to correct me)

mrts

unread,
Jan 23, 2009, 9:54:03 AM1/23/09
to Django developers
After discussing with Honza, we agreed that the dichotomy between
forms and models that was present before will remain, i.e. instance
will always be a model instance if given and all_values will always be
form.cleaned_data.

Honza's rationale was that it's common to have properties in models
and therefore model_instance.__dict__ (or forms.models.model_to_dict)
may be too constraining. Thus, a generic value getter will be used in
validators to provide uniform access to dict values and instance
fields as follows:

def _get_value(name, instance, all_values):
if instance is not None and hasattr(instance, name):
return getattr(instance, name)
if name in all_values:
return all_values[name]
raise AttributeError('%s not found in forms values or model
instance')

class RequiredIfOtherFieldEquals(object):

def __call__(self, value, all_values={}, instance=None):
if _get_value(self.other_field, instance, all_values) ==

mrts

unread,
Jan 23, 2009, 9:59:50 AM1/23/09
to Django developers
On Jan 23, 4:38 pm, Marty Alchin <gulop...@gmail.com> wrote:
> I haven't been following everything, but I do have a couple comments
> to make here.

Thanks, interesting points. The get_value approach looks simpler
though, so unless you or anybody else disagrees I'll implement this.

mrts

unread,
Jan 24, 2009, 5:13:58 PM1/24/09
to Django developers
After several discussions with Honza, we are still on somewhat
different positions what the validator function signature should
be and how core validators should access the fields of a form or
a model instance.

In core validators, no special case handling of forms and models
is needed even in multi-value validators. All what is needed is
an abstraction of "the set of other values".
AlwaysMatchesOtherField, RequiredIfOtherFieldDoesNotEqual etc
will work with that abstraction, primary keys and other complex
model fields also work for same-type comparison.

My initial idea was to reduce the model instance to a dict (as
discussed previously), however, this is really too restrictive.
Passing the form or model instance and treating it in dict-like
manner provides much more flexibility.

That can be achieved by the following:

class Model(...):
...
def get_value(self, field_name, *args):
"Default is specified with the first positional argument"
if args:
return getattr(self, field_name, args[0])
return getattr(self, field_name)

class BaseForm(...)
...
def get_value(self, field_name, *args):
"Default is specified with the first positional argument"
if args:
return self.cleaned_data.get(field_name, args[0])
return self.cleaned_data[field_name]

(It's not possible to implement the __getitem__ protocol as it
already works differently in forms.)

In this case the form/model dichotomy is not required in
validator signature, simple validators work as follows:

def is_slug(value, instance=None):
(access only value)

and multi-value validators as follows:

class AlwaysMatchesOtherField(object):
...
def __call__(self, value, instance=None):
if value != instance.get_value(self.other):
raise ValidationError(...)

There are currently no model-specific complex validators. If a
need for them should arise, it's probably easiest to override
validate() in the corresponding class, calling super().validate()
and then perform any additional checks there.

Custom validators can either use the get_value() abstraction to
support both models and forms or be model/form-specific and use
the instance in whatever way they like. Checking whether you deal
with a model or form instance is generally not required -- if you
accidentally pass a custom form validator to a model, it will
visibly break and vice-versa.

Honza doesn't like this, mainly because it's too easy to shoot
yourself into foot with this (as outlined above, I disagree with
that) and as of now we are using the following instead to handle
model and form fields uniformly:

def _get_value(name, all_values, instance, do_rise=True):
if instance is not None and hasattr(instance, name):
return getattr(instance, name)
if name in all_values:
return all_values[name]
if do_raise:
raise AttributeError('%s not found in form values or model
instance'
% name)
return False

class AlwaysMatchesOtherField(object):
...
def __call__(self, value, all_values={}, instance=None):
if value != _get_value(self.other, all_values, instance):
raise ValidationError(...)

Honza's comments:
-----------------

The problem is that I do not want Form class to be passed to
validators for several reasons:

Form shouldn't be anything except something that obtains data
from request and can validate them, passing it validators could
promote adding some methods and properties to be used in
validators, instead of in the form's clean() method.

Form and model treat data differently, that's just the way it is
and as it should be - pretending it's not true will just create
opportunity for nasty surprises.

Validators wouldn't work outside forms and models (it's easy to
pass in a dictionary of all_values, but fabricating a class with
the same method is work).

I would still have to check whether I am dealing with model or
form when I would like to access some model's methods and
properties that cannot be retrieved this way.

I have more reasons, but these are the major ones, if it's not
enough, contact me and we can talk ;).

Malcolm Tredinnick

unread,
Jan 27, 2009, 2:41:46 AM1/27/09
to django-d...@googlegroups.com
On Sat, 2009-01-24 at 14:13 -0800, mrts wrote:
> After several discussions with Honza, we are still on somewhat
> different positions what the validator function signature should
> be and how core validators should access the fields of a form or
> a model instance.

The second bit is relatively minor. We're in complete control of that
and can write the validators however is needed. Getting the former bit
right is more important, since that's the public API.

Leading off with a slight summary of what follows: I suspect this is,
ultimately, a bit of a coin toss. I'm inclined to come down more in line
with Honza's feelings. I've debated a lot with myself whether this is
because a lot of that was originally the same as the ideas I had or
because I prefer it technically, and I'm pretty sure it's the latter.

In either case, there's some roughness in the API simply because models
and forms are not parallel sorts of objects. So some shoe-horning is
required somewhere. How many layers of cotton wool ... er.. abstraction
to wrap around things is the question and it's good that there are
multiple possibilities being discussed, since it means we can have some
confidence the problem has been looked at from multiple angles.

> In core validators, no special case handling of forms and models
> is needed even in multi-value validators. All what is needed is
> an abstraction of "the set of other values".
> AlwaysMatchesOtherField, RequiredIfOtherFieldDoesNotEqual etc
> will work with that abstraction, primary keys and other complex
> model fields also work for same-type comparison.
>
> My initial idea was to reduce the model instance to a dict (as
> discussed previously), however, this is really too restrictive.
> Passing the form or model instance and treating it in dict-like
> manner provides much more flexibility.

Indeed. For models, the whole self.__dict__.copy() stuff was looking a
bit ugly, when "self" is lying around begging to be used. I understand
that it's an attempt to unify data access in reusable validators, but
there has to be a nicer way.

In the standard set you have written so far. There will be in what
people write for custom stuff. I'm not quite sure how you define
"complex", though.

> If a
> need for them should arise, it's probably easiest to override
> validate() in the corresponding class, calling super().validate()
> and then perform any additional checks there.
>
> Custom validators can either use the get_value() abstraction to
> support both models and forms or be model/form-specific and use
> the instance in whatever way they like. Checking whether you deal
> with a model or form instance is generally not required -- if you
> accidentally pass a custom form validator to a model, it will
> visibly break and vice-versa.
>
> Honza doesn't like this, mainly because it's too easy to shoot
> yourself into foot with this (as outlined above, I disagree with
> that) and as of now we are using the following instead to handle
> model and form fields uniformly:
>
> def _get_value(name, all_values, instance, do_rise=True):
> if instance is not None and hasattr(instance, name):
> return getattr(instance, name)
> if name in all_values:
> return all_values[name]
> if do_raise:
> raise AttributeError('%s not found in form values or model
> instance'
> % name)
> return False

The idea of a utility function looks appropriate to me. It feels a bit
more explicit and doesn't force policy. I'd like to tweak the
implementation (name, parameters, return types and error messages) but
the idea is correct.

[...]


> Honza's comments:
> -----------------
>
> The problem is that I do not want Form class to be passed to
> validators for several reasons:

Agreed. For models, instance=self, similarly, has to be right, since we
have a more-or-less persistent instance that we are operating on. Forms
are a transient entity (they pop into existence to validate data and
then go away), so "instance" isn't such a useful concept there.

> Form shouldn't be anything except something that obtains data
> from request and can validate them, passing it validators could
> promote adding some methods and properties to be used in
> validators, instead of in the form's clean() method.
>
> Form and model treat data differently, that's just the way it is
> and as it should be - pretending it's not true will just create
> opportunity for nasty surprises.
>
> Validators wouldn't work outside forms and models (it's easy to
> pass in a dictionary of all_values, but fabricating a class with
> the same method is work).

I don't quite understand what this means, but I kind of see what Honza
is getting at. However, I suspect we might need a utility function of
some kind for getting the data value if we want to avoid passing in
self.__dict__ (no copy needed, surely) as the all_values for model
validators. But it can be just a utility function. A class shouldn't be
necessary.

>
> I would still have to check whether I am dealing with model or
> form when I would like to access some model's methods and
> properties that cannot be retrieved this way.

That's kind of a side-issue and not a reason not to pass around the
instance (i.e. not something to drive the API reasoning). But it does
seem to provide a good way to check things at the moment and we could
lock that in.

Again, I'm kind of leaning towards what you have at the moment (Honza's
preferred approach), but it's only a slight lean. Probably not worth
sweating over too much, I suspect, since if somebody really doesn't like
that API, they can use your wrapper class approach in the validators
they write for themselves.

Regards,
Malcolm

mrts

unread,
Feb 18, 2009, 7:28:11 AM2/18/09
to Django developers
The last unsolved model-validation design issue is the error message
protocol (my work on fields is waiting on it). Let me present the
approach that looks sane to me.

The old Field.default_error_messages dict and corresponding
logic is no longer required (and removed in my branch) as default
error messages live in core/validators now.

In field construction, the custom error logic is plain:

CharFied.__init__(self, ..., error_messages={}):
...
self.error_messages = error_messages

And it is used as follows:

class Form(...):
...
def full_clean(self):
...
for name, field in self.fields.items():
try:
...
# clean() calls to_python() and validate(),
# raising ValidationErrors
value = field.clean(value)
...
except ValidationError, e:
e.custom_messages = field.error_messages
self._errors[name] = self.error_class(e.messages)

The e.custom_messages = field.error_messages binds custom { code:
message } overrides to the ValidationError.
e.messages is a property that returns the list of error messages.

In reality, ValidationError contains only a single error message. The
list is required for aggregating several errors from the validator
collection associated with a field. However, as of now, the
ValidationError class tries to be both a single value and a
collection, resulting in quite a mess.

This is a classic case of the Composite pattern (http://
en.wikipedia.org/wiki/Composite_pattern ), so I propose that
* ValidationErrors remain single values
* core/validators gets an additional ValidationErrorComposite class
that derives from ValidationError and has the same interface

It will be used as follows (example for form field, model field
similar):

class Field(...):
...
def clean(self, value, all_values={}):
if value is not None:
value = self.to_python(value)
self.validate(value, all_values)
return value

def validate(self, value, all_values={}, form_instance=None):
value_is_empty = value in validators.EMPTY_VALUES
if self.required and value_is_empty:
raise ValidationError(_("This field is required."),
validators.REQUIRED)

error_collection = ValidationErrorComposite()

for validator in self.validators:
if value_is_empty and not getattr(validator, 'always_test',
False):
continue
try:
validator(value, all_values)
except ValidationError, e:
error_collection.add(e)

if error_collection:
raise error_collection

User code can remain oblivious of the composite, catching
ValidationErrors as before and using them as specified.

mrts

unread,
Feb 18, 2009, 7:31:19 AM2/18/09
to Django developers
On Feb 18, 2:28 pm, mrts <m...@mrts.pri.ee> wrote:
>    def validate(self, value, all_values={}, form_instance=None):

That should have been

def validate(self, value, all_values={}):

Honza Král

unread,
Feb 18, 2009, 1:03:43 PM2/18/09
to django-d...@googlegroups.com
Hi,
see inline text.

On Wed, Feb 18, 2009 at 1:28 PM, mrts <mr...@mrts.pri.ee> wrote:
>
> The last unsolved model-validation design issue is the error message
> protocol (my work on fields is waiting on it). Let me present the
> approach that looks sane to me.
>
> The old Field.default_error_messages dict and corresponding
> logic is no longer required (and removed in my branch) as default
> error messages live in core/validators now.

What if I want to use the same error_messages on multiple instances of
a given field?

Now I would subclass the FormField and override it's default_messages,
with your approach it would require overriding __init__. I am not
saying that that is a bad thing, just that it still has a reason.

Besides not all validation will be done inside validators, some will
remain on the Field classes.
What if then validators want to raise multiple messages? Of FormFields
for that matter?
for example:

"Must be greated than 1000."
"If field Y is supplied, X must be greater than Y."
"I don't like your shoes!"

Would they then raise ValidationErrorComposite instead? So you just
pushed the check to a different layer. I agree that it makes more
sense in it's new location, but the distinction still has to be made.

And how would ValidationErrorComposite.add(exception) work then? It
would have to check the exception type to know whether to append or
extend it's message list.

I agree that handling of different datat types (string, list or dict)
in ValidationError isn't pretty, but it IMHO greatly simplifies it's
usage and handling.

>
> It will be used as follows (example for form field, model field
> similar):
>
> class Field(...):
> ...
> def clean(self, value, all_values={}):
> if value is not None:
> value = self.to_python(value)
> self.validate(value, all_values)
> return value
>
> def validate(self, value, all_values={}, form_instance=None):

unrelated to custom messages - you cannot pass all_values to field for
validation, because at the time field validation is called, not all
fields have been cleaned. That's why validator handling is done in
(form/model).clean.

mrts

unread,
Feb 18, 2009, 2:05:23 PM2/18/09
to Django developers
On Feb 18, 8:03 pm, Honza Král <honza.k...@gmail.com> wrote:
> Hi,
> see inline text.
>
> On Wed, Feb 18, 2009 at 1:28 PM, mrts <m...@mrts.pri.ee> wrote:
>
> > The last unsolved model-validation design issue is the error message
> > protocol (my work on fields is waiting on it). Let me present the
> > approach that looks sane to me.
>
> > The old Field.default_error_messages dict and corresponding
> > logic is no longer required (and removed in my branch) as default
> > error messages live in core/validators now.
>
> What if I want to use the same error_messages on multiple instances of
> a given field?
>
> Now I would subclass the FormField and override it's default_messages,
> with your approach it would require overriding __init__. I am not
> saying that that is a bad thing, just that it still has a reason.

Yup, design decision needed:

is the need for class-wide overrides of error messages common enough
to justify leaving it as-is (considering that this can still be done
via __init__)?


> Besides not all validation will be done inside validators, some will
> remain on the Field classes.

As of now, I've managed to weed out all validation from fields into
validators. The result is not entirely stable yet and it brings in
minor backwards-compatibility issues, so it remains to be seen if this
is entirely viable.
In core/validators there are none that need to raise multiple error
messages.

> Would they then raise ValidationErrorComposite instead? So you just
> pushed the check to a different layer. I agree that it makes more
> sense in it's new location, but the distinction still has to be made.

Yes, if that need arises, they can throw a ValidationErrorComposite.

> And how would ValidationErrorComposite.add(exception) work then? It
> would have to check the exception type to know whether to append or
> extend it's message list.

I see no problem there.

def add(e):
if isinstance(e, ValidationErrorComposite):
add the collection of ValidationErrors to self.errors
else:
add the single e to self.errors

> I agree that handling of different datat types (string, list or dict)
> in ValidationError isn't pretty, but it IMHO greatly simplifies it's
> usage and handling.

Let's just leave it to core devs to decide.

> > It will be used as follows (example for form field, model field
> > similar):
>
> > class Field(...):
> >   ...
> >   def clean(self, value, all_values={}):
> >       if value is not None:
> >           value = self.to_python(value)
> >       self.validate(value, all_values)
> >       return value
>
> >   def validate(self, value, all_values={}, form_instance=None):
>
> unrelated to custom messages - you cannot pass all_values to field for
> validation, because at the time field validation is called, not all
> fields have been cleaned. That's why validator handling is done in
> (form/model).clean.

Ah, sorry, there was just an unintentional omission in my example. It
should have obviously been as follows:

class Form(...):
...
def full_clean(self):
...
for name, field in self.fields.items():
try:
...
# self.cleaned_data was missing
value = field.clean(value, self.cleaned_data)
...
except ValidationError, e:
...

But regarding the concern that not all fields have been cleaned --
AFAIK this has always been the constraint: you can only refer to
fields declared before the current one while using cleaned_data in
validation.

Malcolm Tredinnick

unread,
Feb 18, 2009, 5:49:34 PM2/18/09
to django-d...@googlegroups.com
On Wed, 2009-02-18 at 04:28 -0800, mrts wrote:
> The last unsolved model-validation design issue is the error message
> protocol (my work on fields is waiting on it).

Well, it's the the last issue, but it's certainly a stumbling block.
I've been spending quite a lot of time lately trying to merge model
validation into trunk and, as you'll see in the 1.1. announcement, it's
not really there yet. There are a number of backwards incompatibilities
that have been introduced along the way and some of the API is fairly
hard to use.

> Let me present the
> approach that looks sane to me.
>
> The old Field.default_error_messages dict and corresponding
> logic is no longer required (and removed in my branch) as default
> error messages live in core/validators now.

It *is* still required. Firstly, it's a very useful way of specifying
holistic error messages. Secondly, removing it would be backwards
incompatible in a major way (many custom fields would no longer work
properly, for example).


[...]


> In reality, ValidationError contains only a single error message. The
> list is required for aggregating several errors from the validator
> collection associated with a field.


> However, as of now, the
> ValidationError class tries to be both a single value and a
> collection, resulting in quite a mess.

Definitely. I've managed to simplify it somewhat, but it requires some
conventions. Part of the problems come from trying to make model field
validation and form field validation so alike in an area where it isn't
necessarily so. It turns out to be easier to have
exceptions.ValidationError be more dictionary-like by default, although
I'm still playing with the API a bit there.

There are two possibilities here. One is that if a default error message
exists for the field, we only display that when a validation error
occurs. Otherwise we aggregate the individual validation error messages.

The other approach is to always aggregate but we have the convention
that the default (field-wide) error message is descriptive, in the sense
that it describes what is expected ("must be an integer between 1 and
101"), whilst the individual validator error messages are more
proscriptive about the actual problem that has been found ("is not an
integer"). The latter always state a problem, not a requirement, so they
read well when aggregated with the description.

>
> This is a classic case of the Composite pattern (http://
> en.wikipedia.org/wiki/Composite_pattern ), so I propose that
> * ValidationErrors remain single values
> * core/validators gets an additional ValidationErrorComposite class
> that derives from ValidationError and has the same interface

That's pretty good analysis. I've done the same thing in the merged code
I've been doing, except slightly differently.
Django.core.exceptions.ValidationError is the composite collection of
error messages.

Django.forms.utils.ValidationError, which you guys had nuked, remains
the way to make a single HTML presentation of form field errors (that
had to go back anyway for backwards-compatibility reasons, since
otherwise every single custom field cleaning method would potentially
break, at a minimum). There's a constructor for forms.ValidationError
that takes an exceptions.ValidationError to do the conversion.

Presenting model field validation errors isn't something that we have a
default for, since they aren't tied to HTML output or anything like
that. If presenting them as HTML output, the forms.ValidationError class
could be used (that's how ModelForms would do it), but, for other cases,
pulling out the errors on an as-needs required

When I get out from under some work stuff this week, I'm going to commit
what I've been merging (it a merge of yours and Honza's git branches and
a lot of editing and review changes) into a subversion branch so that
more people can test it. I'm then happy to pull in and review changes
from git branches if you guys want to work on it there, but we need to
have this done in a more general way. We've all dropped the ball a bit
-- I thought the problems that were creeping in were going to simple to
fix up in review; I was wrong. It's the broken-windows problem, where
one little variation from "great" leads to another and another, without
realising it.


Regards,
Malcolm

mrts

unread,
Feb 19, 2009, 6:03:58 PM2/19/09
to Django developers
On Feb 19, 12:49 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> On Wed, 2009-02-18 at 04:28 -0800, mrts wrote:
> > The old Field.default_error_messages dict and corresponding
> > logic is no longer required (and removed in my branch) as default
> > error messages live in core/validators now.
>
> It *is* still required. Firstly, it's a very useful way of specifying
> holistic error messages. Secondly, removing it would be backwards
> incompatible in a major way (many custom fields would no longer work
> properly, for example).

Agreed.

> When I get out from under some work stuff this week, I'm going to commit
> what I've been merging (it a merge of yours and Honza's git branches and
> a lot of editing and review changes) into a subversion branch so that
> more people can test it. I'm then happy to pull in and review changes
> from git branches if you guys want to work on it there, but we need to
> have this done in a more general way. We've all dropped the ball a bit
> -- I thought the problems that were creeping in were going to simple to
> fix up in review; I was wrong. It's the broken-windows problem, where
> one little variation from "great" leads to another and another, without
> realising it.

Sounds good. Keep us updated and let know if you need us to work on
something.

Best,
Mart
Reply all
Reply to author
Forward
0 new messages