Making models validation-aware

5 views
Skip to first unread message

Adrian Holovaty

unread,
Jan 22, 2006, 2:43:22 AM1/22/06
to django-d...@googlegroups.com
I've been messing with manipulator code in magic-removal, and it
struck me that we could avoid a lot of the boilerplate, cruft and
newbie confusion by making models validation-aware. Example:

class Poll(Model):
pub_date = DateTimeField()
question = CharField(maxlength=30)
num_votes = IntegerField()

>>> a = Poll(pub_date='2005-01-01', question='How are you?')

# a.errors is a property that returns a dictionary
# of field names to lists of errors. Maybe this could
# be named something else, so as not to clash with the
# field namespace -- perhaps "_errors" with a leading
# underscore.
>>> a.errors
{'num_votes': ['This field is required']}

>>> a.save()
Traceback (most recent call last):
...
CannotSave: Poll object has validation errors

# Set the attribute to fix the validation error.
>>> a.num_votes = 3

# No more errors.
>>> a.errors
{}

# Saves to database successfully.
>>> a.save()

This would mean a model would do validation each time save() is
called. We could offer a _no_validate=True override to bypass
validation for performance (if, say, you're bulk-loading a ton of data
and are confident it's in the correct format).

>>> p = Person(name='John')
>>> p._no_validate = True
>>> p.save() # Wouldn't run validation.

Form validation and redisplay would get a heckuvalot easier with this.

p = Poll(pub_date=request.POST.get('pub_date'),
question=request.POST.get('question'),
num_votes=request.POST.get('num_votes'))
if not p.errors:
p.save()
else:
# Display form, passing it the "p" object.

This would also mean the automatic manipulators would go away, because
they wouldn't be needed. (The manipulator framework would stay,
because it's the way to do form validation/redisplay that's decoupled
from models.)

Thoughts?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Tom Tobin

unread,
Jan 22, 2006, 3:35:48 AM1/22/06
to django-d...@googlegroups.com
On 1/22/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> I've been messing with manipulator code in magic-removal, and it
> struck me that we could avoid a lot of the boilerplate, cruft and
> newbie confusion by making models validation-aware.

+1; this would even be useful in non-standard scenarios. E.g., I have
a cronjob for my feed aggregator which is regularly pulling in
potentially bad data; being able to leverage the admin/form validation
code here would be great.

hugo

unread,
Jan 22, 2006, 6:11:11 AM1/22/06
to Django developers
># a.errors is a property that returns a dictionary
> # of field names to lists of errors. Maybe this could
> # be named something else, so as not to clash with the
> # field namespace -- perhaps "_errors" with a leading
> # underscore.

Make that a defined Exception instead of an attribute and I might like
it. Something along these lines:

try:
p = Poll(....)
except ValidationErrors, e:
print e.errors

So the ValidationErrors would be an exception that collects all
ValidationError exceptions that happen in the validation phase. We
really should transport exceptions and notifications by the mechanism
that is designed to transport those in Python ;-)

Oh, and we have to think about things like attribute assignement: do we
want validation happen on every __setattr__? And what about needed
format conversions - should every attribute now accept strings and do
silent conversion behind the scenes (sorry, but that smells like Perl
to me ;-) )?

bye, Georg

Adrian Holovaty

unread,
Jan 22, 2006, 5:12:28 PM1/22/06
to django-d...@googlegroups.com
On 1/22/06, hugo <g...@hugo.westfalen.de> wrote:
> Make that a defined Exception instead of an attribute and I might like
> it. Something along these lines:
>
> try:
> p = Poll(....)
> except ValidationErrors, e:
> print e.errors
>
> So the ValidationErrors would be an exception that collects all
> ValidationError exceptions that happen in the validation phase. We
> really should transport exceptions and notifications by the mechanism
> that is designed to transport those in Python ;-)

The reason I was thinking of using an attribute rather than raising
exceptions is for developer convenience. What if I wanted to do
something like this:

>>> p = Poll(pub_date='2005-01-01', question='How are you?')
>>> p.num_votes = 3
>>> p.save()

If the system raised validation exceptions every time data was set on
a model object, the above code wouldn't be possible. Here's what would
happen:

>>> p = Poll(pub_date='2005-01-01', question='How are you?')


Traceback (most recent call last):
...

ValidationError: {'num_votes': ['This field is required.']}

So that would mean one couldn't pass in *some* of the fields in a
model's __init__() and fill in the rest later, because the object
would be validated each time you added stuff to it. This seems quite
annoying.

What I'm suggesting is that validation only happens when you do
save(), or when you access the "errors" attribute. This means you can
set data on the model to your heart's content until you actually want
to validate it.

> Oh, and we have to think about things like attribute assignement: do we
> want validation happen on every __setattr__? And what about needed
> format conversions - should every attribute now accept strings and do
> silent conversion behind the scenes (sorry, but that smells like Perl
> to me ;-) )?

This is a good question. I don't see a *huge* problem with allowing
attributes to accept strings. They already do, essentially -- even
date fields accept them. We'd make it so that each Field knew how to
convert its input data into the requisite Python type (datetime or
whatever).

Maniac

unread,
Jan 22, 2006, 7:29:40 PM1/22/06
to django-d...@googlegroups.com
Adrian Holovaty wrote:

> p = Poll(pub_date=request.POST.get('pub_date'),
> question=request.POST.get('question'),
> num_votes=request.POST.get('num_votes'))
>
>

Some automation is needed for this. Currently automatic manipulators can
deal with copy of POST dict themselves. This saves from synchronizing
view code with model's fields.

Robert Wittams

unread,
Jan 22, 2006, 7:58:57 PM1/22/06
to django-d...@googlegroups.com
I don't have any time to do anything about this, but I can't really see
how this is going to work. Eg, edit inline. Is your plan now to totally
destroy edit inline for non-ajax uses? Also, the whole problem that the
follow argument to manipulators solves arises again ( how to deal with
partially updating an object).

> Form validation and redisplay would get a heckuvalot easier with this.
>
> p = Poll(pub_date=request.POST.get('pub_date'),
> question=request.POST.get('question'),
> num_votes=request.POST.get('num_votes'))

So here we run into the old "MultiValueDict is an idiotic leaky
abstraction" problem. Do users remember for themselves whether to use
.get or .getlist for each value? What do they do here for files and
many-to-manys? Do people really enjoy repeating request.POST.get, and
remembering all the constructor arguments to their models?

> if not p.errors:
> p.save()
> else:
> # Display form, passing it the "p" object.

Totally ignores the previous discussion about file fields and the need
to use multiple requests to make things work vaguely sanely.

> This would also mean the automatic manipulators would go away, because
> they wouldn't be needed.

But generic views and the admin would keep working by magic?

> (The manipulator framework would stay,
> because it's the way to do form validation/redisplay that's decoupled
> from models.)

And what is the "manipulator framework", if it is not the automatic
manipulators? I really don't get what would "go away", and what would stay.

Sorry I couldn't finish what I wanted to do with manipulators... though
I did think it was pretty obvious how to finish it off ( ie I had pretty
much spelled it out on the mailing list.) Slightly suprised that you
reverted to the (imo horrific) code from trunk...

Obviously, you can do it any way you like, I just really don't see that
this way is going to work well at all. Good luck.

Rob

hugo

unread,
Jan 23, 2006, 4:34:51 AM1/23/06
to Django developers
This:

>What I'm suggesting is that validation only happens when you do
>save(), or when you access the "errors" attribute. This means you can
>set data on the model to your heart's content until you actually want
>to validate it.

won't work with that:

>This is a good question. I don't see a *huge* problem with allowing
>attributes to accept strings. They already do, essentially -- even
>date fields accept them. We'd make it so that each Field knew how to
>convert its input data into the requisite Python type (datetime or
>whatever).

Reason: when using the above partial-create-and-later-assign-rest way
of creating objects, either the conversion will already happen in the
creation/assignement - and in that case it will throw validation errors
(for example if a string isn't a valid date format), or the object will
be invalid with regard to it's model definition until the .save()
happens. I don't think I would like the second version - where the
object might contain still string data, because it wasn't saved, yet.
In that case I would much prefer the validation happen as early as
possible - but that would require to at least provide all required
fields in the object creation, because otherwise the validation would
kick in on object creation.

bye, Georg

jacobian

unread,
Jan 23, 2006, 10:25:41 AM1/23/06
to Django developers
Adrian Holovaty wrote:
> I've been messing with manipulator code in magic-removal, and it
> struck me that we could avoid a lot of the boilerplate, cruft and
> newbie confusion by making models validation-aware.

In general, I think this is a very good idea. I can't say I'm 100%
sold on the implementation you propose; in particular, I'm not very
happy with ``obj.errors``. As you point out, this is a possible
conflict with field names, and I think "no field name restrictions" is
somewhere in Django's constitution, so...

I also very much don't want valdation happening on each __setattr__.
Consider the following::

o = MyObject()
for field in some_dict:
value = munge(some_dict[field])
setattr(o, field, value)

This is a pretty common pattern in data import scripts and the like;
the overhead of running validation multiple times (when you really only
need to run it once) is annoying and (possibly) slow.

Also, it raises the question about what do to about validators like
``RequiredIfOtherFieldGiven`` or whatever.

So, here's my proposal: make the validation step only happen upon
``save()`` unless you specifically ask for it. For example::

p = Poll(**some_data)
try:
p.save()
except ValidationErrors, e:
# e.errors is something like {'num_votes': ['This field is
required']}
do_something_with_errors(e.errors)

Or, as you might use in a view::

p = Poll(**some_data)
errors = p.validate()
if errors:
# errors is as above
do_something_with_errors(e)
else:
p.save()

How's that sound?

Jacob

Joseph Kocherhans

unread,
Jan 23, 2006, 12:02:01 PM1/23/06
to django-d...@googlegroups.com
On 1/22/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> I've been messing with manipulator code in magic-removal, and it
> struck me that we could avoid a lot of the boilerplate, cruft and
> newbie confusion by making models validation-aware.

Let me see if I can clarify the new use of manipulators here. You
either use a manipulator *or* you have to explicitly set every
attribute of your model. Only manipulators know which elements from
the POST data ought to be passed into a model's constructor via
something like **kwargs, so you'd have to use one if you didn't want
to go through the MyModel(myattr=request.POST.get('myattr') stuff.
This makes sense to me. One less concept for newbies to learn, even
though I think I'd use the manipulator 99% of the time. Manipulators
would still take care of the edit_inline stuff, and the admin system
and generic views would most likely still use manipulators.

I would add that I'd like the validator to be separate from the model.
In other words, the model should *use* a validator, not *be* a
validator. I think it should be possible to replace the validator just
like you can replace the default manager. (validator_list should
probably be passed into the validator's constructor)

As far as the p._no_validation syntax goes, can't we just do
p.save(validate=False)? That seems a lot cleaner to me than a "magic"
attribute. A 'validate_on_save' attribute for Model Meta classes would
be nice too, but maybe YAGNI.

Joseph

Adrian Holovaty

unread,
Jan 23, 2006, 6:06:46 PM1/23/06
to django-d...@googlegroups.com
On 1/23/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> Let me see if I can clarify the new use of manipulators here. You
> either use a manipulator *or* you have to explicitly set every
> attribute of your model. Only manipulators know which elements from
> the POST data ought to be passed into a model's constructor via
> something like **kwargs, so you'd have to use one if you didn't want
> to go through the MyModel(myattr=request.POST.get('myattr') stuff.

Almost. :) Under this proposal, the automatic manipulators would no
longer exist. You bring up a good point about dealing with the POST
data, though, so we could add a helper function that takes a model and
request.POST and returns a populated model instance.

> This makes sense to me. One less concept for newbies to learn, even
> though I think I'd use the manipulator 99% of the time. Manipulators
> would still take care of the edit_inline stuff, and the admin system
> and generic views would most likely still use manipulators.

I don't see a need for edit_inline to be used outside of the admin, so
that complexity would be removed. The admin itself would handle
edit_inline (through much simpler code).

> I would add that I'd like the validator to be separate from the model.
> In other words, the model should *use* a validator, not *be* a
> validator. I think it should be possible to replace the validator just
> like you can replace the default manager. (validator_list should
> probably be passed into the validator's constructor)

Right -- fields could still be passed validator_list, as they currently are.

> As far as the p._no_validation syntax goes, can't we just do
> p.save(validate=False)? That seems a lot cleaner to me than a "magic"
> attribute. A 'validate_on_save' attribute for Model Meta classes would
> be nice too, but maybe YAGNI.

Nice idea! That's a lot cleaner than the magic attribute.

Reply all
Reply to author
Forward
0 new messages