Model aware validation for newforms (#6845)

2 views
Skip to first unread message

Honza Král

unread,
Mar 20, 2008, 4:15:08 PM3/20/08
to django-d...@googlegroups.com
Hi all,
during a sprint at PyCON 2008 I wrote first simple version of model validation.

Main features:
* model fields can define list of validators (kwarg {{{validators}}})
- validator is a function that accepts one argument and throws
{{{django.core.validation.validationError}}} if anything doesn't look
right
* models now have a validate method that
* call the validation for fields
* can be overridden to allow for custom validation
* raises {{{django.core.validation.validationError}}} if anything goes amiss
* formfields' {{{clean}}} method has been deprecated and split into
two - {{{to_python}}} and {{{validate}}}
* {{{to_python}}} does the type coersion, throws
{{{django.core.validation.TypeCoersionError}}}
* {{{validate}}} doesnt return anything, just raises an exception
* new kwarg {{{validators}}} allows for passing validators to
formfields as well
* form method clean has been renamed to validate and also doesn't
return anything.
* hooks for validation individual fields on form should now be
named {{{validate_FIELD_NAME}}} and take the value to validate as only
parameter


Steps that remain to be done:
* verify how much code has been broken by this
* clean the code, update comments
* add more tests for the new behavior
* add documentation
* add validation to formsets to validate for unique and
unique_together across all the forms in the formset
* allow to turn some of the validation off ???
* unique(_together) for performance reasons

see patch at
http://code.djangoproject.com/ticket/6845


any feedback is welcome, please note that it is a first working draft,
many things are probably broken and much work remains to be done


--
Honza Král
E-Mail: Honza...@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

Rajeev J Sebastian

unread,
Mar 20, 2008, 4:31:31 PM3/20/08
to django-d...@googlegroups.com
Hi Honza,

Good to see you overtake me in this :) (and I can be assured that I
will get a better implementation soon!)

One idea is to have validators be in three states:

Validated, Invalid and Partial.

Partial is where the system can automatically "correct" or "filter"
some value for a field.

modelform.is_valid() would return True if form is either Validated or
Partial (this causes partial to act as a filter).

modelform.is_partially_valid() would return True if only partially
validated, which allows to show the form again to the user for
verification (if the programmer wants to do that).

E.g. partial validators:

* 1-2001 can be corrected to 1-1-2001, by a
automatically-add-day-for-incomplete-date validator
* shit turns to s*** or even empty string, by a curse-word-filter validator

Anyway, its just a suggestion.

Regards
Rajeev J Sebastian

Malcolm Tredinnick

unread,
Mar 20, 2008, 5:20:17 PM3/20/08
to django-d...@googlegroups.com

On Thu, 2008-03-20 at 15:15 -0500, Honza Král wrote:
> Hi all,
> during a sprint at PyCON 2008 I wrote first simple version of model validation.
>
> Main features:
> * model fields can define list of validators (kwarg {{{validators}}})
> - validator is a function that accepts one argument and throws
> {{{django.core.validation.validationError}}} if anything doesn't look
> right

What is this single argument? Pretty much by definition model-aware
means it's going to get "self", right? You also need the value to
validate, which makes two arguments.

> * models now have a validate method that
> * call the validation for fields
> * can be overridden to allow for custom validation
> * raises {{{django.core.validation.validationError}}} if anything goes amiss
> * formfields' {{{clean}}} method has been deprecated and split into
> two - {{{to_python}}} and {{{validate}}}

Since you're always going to call to_python and then validate in the
common case, why deprecate clean()? Can't it just do the natural thing
and call those two one after another?

> * {{{to_python}}} does the type coersion, throws
> {{{django.core.validation.TypeCoersionError}}}

You'll want to be checking the spelling of "coercion" before going too
much further. :-)


> * {{{validate}}} doesnt return anything, just raises an exception
> * new kwarg {{{validators}}} allows for passing validators to
> formfields as well
> * form method clean has been renamed to validate and also doesn't
> return anything.
> * hooks for validation individual fields on form should now be
> named {{{validate_FIELD_NAME}}} and take the value to validate as only
> parameter
>
>
> Steps that remain to be done:
> * verify how much code has been broken by this

Every single custom form in all applications for a start. :-(

I realise the to_python/validate split is so that validators can be
reused, but this is a bit sad.

> * clean the code, update comments
> * add more tests for the new behavior
> * add documentation
> * add validation to formsets to validate for unique and
> unique_together across all the forms in the formset
> * allow to turn some of the validation off ???
> * unique(_together) for performance reasons
>
> see patch at
> http://code.djangoproject.com/ticket/6845

One bug that immediately jumps out from reading the patch: if you're
going to inherit from StrAndUnicode, it should be the first class. The
whole idea is that it provides the base __str__ wrapper and if you do
something like "class Foo(list, StrAndUnicode):...", a call to
str(some_foo) will call list.__str__, which isn't what you want.

I'm a little bit concerned to see a parameter called "form_data"
creeping into models. Is it really tied to forms or just a normal data
dictionary of field -> value mappings? Is it reasonable to say that
model population must use the right types? So forms call to_python on
their arguments and normal code populating models should use the right
types instead of being lazy and introducing hard-to-find errors.

You've introduce a class called LegacyField. What is a non-legacy field?
I tend to dislike the name "legacy" as rule anyway, since it implies old
and busted and it's not clear from the code why that's so. Why is a
LegacyField not just a Field, I guess is what I'm asking.

I see all the existing validators have been booted into
oldforms/validators.py, but I don't any replacements for them. Since
we've advertised that oldforms is deprecated and will be removed in one
release, what's the replacement situation? We have to write everything
ourselves or are there common things coming in a future patch?

I realise a lot of this was no doubt hashed out at PyCon, but this is
for those of us who weren't there. It generally looks reasonable once
coming to grips with the wish to split to_python from validation, which
I think is a good goal, but I don't think models should have to touch
to_python -- Python has well-defined types for a reason. I feel
uncomfortable about some parts and want to think about those some more,
but it's nice to see this back on track again.

Regards,
Malcolm


--
What if there were no hypothetical questions?
http://www.pointy-stick.com/blog/

Honza Král

unread,
Mar 20, 2008, 6:22:18 PM3/20/08
to django-d...@googlegroups.com
On Thu, Mar 20, 2008 at 4:20 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:
>
>
> On Thu, 2008-03-20 at 15:15 -0500, Honza Král wrote:
> > Hi all,
> > during a sprint at PyCON 2008 I wrote first simple version of model validation.
> >
> > Main features:
> > * model fields can define list of validators (kwarg {{{validators}}})
> > - validator is a function that accepts one argument and throws
> > {{{django.core.validation.validationError}}} if anything doesn't look
> > right
>
> What is this single argument? Pretty much by definition model-aware
> means it's going to get "self", right? You also need the value to
> validate, which makes two arguments.

no, this is a function validating just one value (not a method
associated with the model), if you want to validate across more than
one field, you need to do this in validate_FIELD method or directly in
validate() method on the model. Similar to the way neforms work.

>
>
> > * models now have a validate method that
> > * call the validation for fields
> > * can be overridden to allow for custom validation
> > * raises {{{django.core.validation.validationError}}} if anything goes amiss
> > * formfields' {{{clean}}} method has been deprecated and split into
> > two - {{{to_python}}} and {{{validate}}}
>
> Since you're always going to call to_python and then validate in the
> common case, why deprecate clean()? Can't it just do the natural thing
> and call those two one after another?

that how its now implemented in LegacyField - clean does (pseudocode):

value = to_python(value)
validate(value)
return value

I just want to separate the logic and encourage people to do it as
well, I don't really feel that strong about removing clean, but I
strongly believe that to_python and validate should be separate and
behave differently (returning value VS just throwing exceptions etc.)

>
> > * {{{to_python}}} does the type coersion, throws
> > {{{django.core.validation.TypeCoersionError}}}
>
> You'll want to be checking the spelling of "coercion" before going too
> much further. :-)

oops, thanks

>
>
>
> > * {{{validate}}} doesnt return anything, just raises an exception
> > * new kwarg {{{validators}}} allows for passing validators to
> > formfields as well
> > * form method clean has been renamed to validate and also doesn't
> > return anything.
> > * hooks for validation individual fields on form should now be
> > named {{{validate_FIELD_NAME}}} and take the value to validate as only
> > parameter
> >
> >
> > Steps that remain to be done:
> > * verify how much code has been broken by this
>
> Every single custom form in all applications for a start. :-(

yeah, its probably the clean on fields that inherit from builtin Field
classes, I thought that might happen

>
> I realise the to_python/validate split is so that validators can be
> reused, but this is a bit sad.

it doesn't have to be like that, its just a temporary brain death on
my part - I currently call to_python and validate in form.full_clean
instead of relying on clean and implementing clean as mentioned above.
it can be corrected - if we call field.clean() in form.full_clean()
the old code should still work.

>
>
> > * clean the code, update comments
> > * add more tests for the new behavior
> > * add documentation
> > * add validation to formsets to validate for unique and
> > unique_together across all the forms in the formset
> > * allow to turn some of the validation off ???
> > * unique(_together) for performance reasons
> >
> > see patch at
> > http://code.djangoproject.com/ticket/6845
>
> One bug that immediately jumps out from reading the patch: if you're
> going to inherit from StrAndUnicode, it should be the first class. The
> whole idea is that it provides the base __str__ wrapper and if you do
> something like "class Foo(list, StrAndUnicode):...", a call to
> str(some_foo) will call list.__str__, which isn't what you want.
>
> I'm a little bit concerned to see a parameter called "form_data"
> creeping into models. Is it really tied to forms or just a normal data
> dictionary of field -> value mappings?

it relies on it being a dictionary indexed by field names, if we name
it differently, it should be OK (something like new_data or data_dict)


Is it reasonable to say that
> model population must use the right types?

there is no magic involved, it just calls setattr, you can use any
type that you could use directly. The only thing changed is that a
call to validate doesn't change the data in background, it just does
that for the validation and keep the original values.

So forms call to_python on
> their arguments and normal code populating models should use the right
> types instead of being lazy and introducing hard-to-find errors.
>
> You've introduce a class called LegacyField. What is a non-legacy field?

yeah, as with all the names - I am terrible at naming things, the
LegacyField is the thing that broke all custom forms. it needs to be
repaired not to do so and a new name should be found. If we decide to
wrap to_python and validate inside a clean, then it could disappear
completely.

> I tend to dislike the name "legacy" as rule anyway, since it implies old
> and busted and it's not clear from the code why that's so. Why is a
> LegacyField not just a Field, I guess is what I'm asking.

see above

> I see all the existing validators have been booted into
> oldforms/validators.py, but I don't any replacements for them.

I only did a few -

validate_email
validate_ip_address4
validate_phone_number
validate_slug
validate_existing_url

I changed the naming since I didn't like the CamelCase and name
starting with is -- isMethod should imho return True/False, not raise
an exception

Since
> we've advertised that oldforms is deprecated and will be removed in one
> release, what's the replacement situation? We have to write everything
> ourselves or are there common things coming in a future patch?

I intend to transform all the validators that make sense in the new
schema of things (=those validating only one value separately) and
maybe some helper functions for the other (haven't thought this one
through but essentially something that would allow people to validate
across more fields without tedious coding in model.validate())

>
> I realise a lot of this was no doubt hashed out at PyCon, but this is
> for those of us who weren't there. It generally looks reasonable once
> coming to grips with the wish to split to_python from validation, which
> I think is a good goal, but I don't think models should have to touch
> to_python -- Python has well-defined types for a reason.

just to be clear - the to_python used in models was there when I
started (and has nothing to do with to_python I introduced to form
fields), I didn't add that and I didn't like to begin with. I don't
see why IntegerField or DateField should accept strings as its values
when we have a form library that returns a correct type. But that
would break even more code.


I feel
> uncomfortable about some parts and want to think about those some more,
> but it's nice to see this back on track again.

i am not 100% happy about it myself but I feel it is a reasonable
starting point for both in-depth discussions and implementation.


thanks very much for the input, I hope I answered some of your
questions, feel free to poke some more holes in the patch :)

>
> Regards,
> Malcolm
>
>
> --
> What if there were no hypothetical questions?
> http://www.pointy-stick.com/blog/
>
>
>
>
> >
>

--

Malcolm Tredinnick

unread,
Mar 21, 2008, 5:40:28 AM3/21/08
to django-d...@googlegroups.com

On Thu, 2008-03-20 at 17:22 -0500, Honza Král wrote:
> On Thu, Mar 20, 2008 at 4:20 PM, Malcolm Tredinnick
> <mal...@pointy-stick.com> wrote:
[...]

> > Is it reasonable to say that
> > model population must use the right types?
>
> there is no magic involved, it just calls setattr, you can use any
> type that you could use directly. The only thing changed is that a
> call to validate doesn't change the data in background, it just does
> that for the validation and keep the original values.

Unsurprisingly, I don't like that at all. :-(

It's not a question of magic, since there's none in sight here. It's a
question of being Python code, rather than just a storage bucket for
whatever people throw at it.

'2008-07-16' is *not* a valid date in Python. It's a string. So if
somebody hands me a model that they claim passes validate() and it has a
date field on it, I can't actually use Python's very standard date
functions on it in my code, because the date field might actually be a
string. In fact, since any type can be converted to a string and back,
every single time I talk to an allegedly valid model, I have to put it
"if isinstance(thingie, basestring):...". Models should be a Python
representation of data, not a human version.

One of the nice things about newform's current clean() is it normalises
data to the correct type as well as running validation. Writing
validators is then much easier. Similarly, I'd prefer that a model isn't
valid unless the attributes are the right type. So either people should
do the Right Thing(tm) and assign the right type -- creating a model
from a form it's easy, because things will have already been
to_python-ised and creating a model from code should be easy, too, since
you have access to datetime, etc -- or else model "validation" needs to
also normalise (and be called wash_rinse_iron_and_validate()) or we need
a convert_to_the_proper_type_so_as_not_to_drive_people_insane() method.

Right now we're making it difficult to write clean code because model
instances can't be confirmed as being really valid Python data, as
opposed to this polymorphic stuff validate() is proposed to return. What
you're talking about means the validation function is only saying "one
day I could convert this to proper Python if I felt like it. Here I am,
brain the size of a planet and they ask me to keep things as strings.
Call that job satisfaction? 'Cause I don't."


> So forms call to_python on
> > their arguments and normal code populating models should use the right
> > types instead of being lazy and introducing hard-to-find errors.
> >
> > You've introduce a class called LegacyField. What is a non-legacy field?
>
> yeah, as with all the names - I am terrible at naming things, the
> LegacyField is the thing that broke all custom forms. it needs to be
> repaired not to do so and a new name should be found. If we decide to
> wrap to_python and validate inside a clean, then it could disappear
> completely.

You didn't answer my question, though: Why is LegacyField needed? Why
can't they be just Field? I'm not understanding what part of the API for
a current Field won't work here. You may not need to go to all the hard
of picking a new name. Well known hard problem, that one. So let's avoid
it if we can.

> > I see all the existing validators have been booted into
> > oldforms/validators.py, but I don't any replacements for them.
>
> I only did a few -
>
> validate_email
> validate_ip_address4
> validate_phone_number

Oh, dear. Since validate_phone_number actually doesn't do so for the
vast majority of the planet, this would be a good time to move these
sorts of quaint North Americanisms into localflavor/us or
localflavor/ca. :-)

Ditto for validating US state fields. People have to update their code
in any case.


Thanks for the other answers. It explains where the code is coming from.

Regards,
Malcolm

--
Why can't you be a non-conformist like everyone else?
http://www.pointy-stick.com/blog/

Honza Král

unread,
Mar 22, 2008, 6:12:57 PM3/22/08
to django-d...@googlegroups.com
Hi,
I uploaded new version of the patch.


I added clean method on Model that saves the coerced types and than
runs validate() by the same logic is used in form fields.

I would still like to see the type coercion on django models gone -
Model.integer_field = '123' shouldn't imho work at all. Until (if
ever) we decide to do that, .to_python will do the coercion and allow
validate to work without knowledge of any types (f.to_python is only
called for values from new_data, and that's not saved).

>
>
> > So forms call to_python on
> > > their arguments and normal code populating models should use the right
> > > types instead of being lazy and introducing hard-to-find errors.
> > >
> > > You've introduce a class called LegacyField. What is a non-legacy field?
> >
> > yeah, as with all the names - I am terrible at naming things, the
> > LegacyField is the thing that broke all custom forms. it needs to be
> > repaired not to do so and a new name should be found. If we decide to
> > wrap to_python and validate inside a clean, then it could disappear
> > completely.
>
> You didn't answer my question, though: Why is LegacyField needed? Why
> can't they be just Field? I'm not understanding what part of the API for
> a current Field won't work here. You may not need to go to all the hard
> of picking a new name. Well known hard problem, that one. So let's avoid
> it if we can.

is't gone know, I rebooted my brain and figured that I don't really
have to call to_python and validate from form.full_clean and settled
for calling clean, which in turns calls to_python and validate by
default. Nothing should change for custom fields (didn't validate that
yet though)

>
>
> > > I see all the existing validators have been booted into
> > > oldforms/validators.py, but I don't any replacements for them.
> >
> > I only did a few -
> >
> > validate_email
> > validate_ip_address4
> > validate_phone_number
>
> Oh, dear. Since validate_phone_number actually doesn't do so for the
> vast majority of the planet, this would be a good time to move these
> sorts of quaint North Americanisms into localflavor/us or
> localflavor/ca. :-)
>
> Ditto for validating US state fields. People have to update their code
> in any case.

couldn't agree more, will do so in the next patch

>
>
> Thanks for the other answers. It explains where the code is coming from.
>
> Regards,
> Malcolm
>
> --
> Why can't you be a non-conformist like everyone else?
>
>
> http://www.pointy-stick.com/blog/
>
>
> >
>

--

Reply all
Reply to author
Forward
0 new messages