The model API and strong typing

180 views
Skip to first unread message

Martin Matusiak

unread,
Mar 1, 2014, 3:14:19 AM3/1/14
to django-d...@googlegroups.com
Hi django-dev,

I have often come across bugs in code where attributes of a model
instance were being assigned invalid values like None. Later on
someone tries to save the model which blows up with an IntegrityError.

Say you have a model like this:

class User(models.Model):
name = models.CharField(max_length=20)

I can assign anything I want to this attribute:

bill = User(name='bill')
bill.name = None
bill.name = 1
bill.name = {}
bill.name = User

At some point I'll try to do bill.save() and the database will reject
my incorrectly typed values. But in a large codebase this might happen
far away from the point where the bad assignment was made and it's
hard to track it down.

Now in the case of sqlite, which is very permissive, it will happily
accept values like 1 and {} and store them as "1" and "{}" (wtf)
respectively. So if I'm running my tests against sqlite I won't even
know that this will bite me in production.

The thing is that it would be pretty easy to provide a strongly typed
model layer using descriptors. So that every time I assign to
bill.name it will raise ValidationError if the value isn't a string,
if the string is longer than max_length etc. Is there a rationale for
why we don't do this?


Martin

Petite Abeille

unread,
Mar 1, 2014, 9:38:18 AM3/1/14
to django-d...@googlegroups.com

On Mar 1, 2014, at 9:14 AM, Martin Matusiak <nume...@gmail.com> wrote:

> Is there a rationale for why we don't do this?

Perhaps because handling data constraint is the db job?

In the case of sqlite, which is very free form, simply define the proper check constraints.

For example:

create table foo
(
bar text not null,
check( typeof( bar ) = 'text' and length( bar ) <= 10 )
);


sqlite> insert into foo values( 1 );
Error: CHECK constraint failed: foo

sqlite> insert into foo values( '12345678901' );
Error: CHECK constraint failed: foo

Aymeric Augustin

unread,
Mar 1, 2014, 12:07:08 PM3/1/14
to django-d...@googlegroups.com
On 1 mars 2014, at 09:14, Martin Matusiak <nume...@gmail.com> wrote:

> Now in the case of sqlite, which is very permissive, it will happily
> accept values like 1 and {} and store them as "1" and "{}" (wtf)
> respectively. So if I'm running my tests against sqlite I won't even
> know that this will bite me in production.

You should never, ever, for any reason, use a different database in
development, test and production. It’s a bad idea. It doesn’t work.

> The thing is that it would be pretty easy to provide a strongly typed
> model layer using descriptors. So that every time I assign to
> bill.name it will raise ValidationError if the value isn't a string,
> if the string is longer than max_length etc. Is there a rationale for
> why we don't do this?

An important reason is performance. If creating and saving an instance with
10 fields took 0.01 seconds, that would be poor. For the same reason,
Model.save() doesn’t call Model.full_clean() (unless you override it).

A structural reason is that you can’t tell “assigning an attribute to an
instance in order to save it to the database later” apart from “assigning an
attribute to an instance being loaded from the database”. So your proposal
doesn’t play nice with lazy loading of related instances, for example.

Another reason is the impossibility of running validations involving multiple
fields. If a validator depends on the value of two fields, one has to be set
before the other, and you can’t validate at that point.

To sum up, I think this would be very hard to implement and not provide a
consistent user experience in the end.

--
Aymeric.

Russell Keith-Magee

unread,
Mar 1, 2014, 6:29:43 PM3/1/14
to Django Developers
On Sun, Mar 2, 2014 at 1:07 AM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:
On 1 mars 2014, at 09:14, Martin Matusiak <nume...@gmail.com> wrote:

> Now in the case of sqlite, which is very permissive, it will happily
> accept values like 1 and {} and store them as "1" and "{}" (wtf)
> respectively. So if I'm running my tests against sqlite I won't even
> know that this will bite me in production.

You should never, ever, for any reason, use a different database in
development, test and production. It's a bad idea. It doesn't work.

> The thing is that it would be pretty easy to provide a strongly typed
> model layer using descriptors. So that every time I assign to
> bill.name it will raise ValidationError if the value isn't a string,
> if the string is longer than max_length etc. Is there a rationale for
> why we don't do this?

An important reason is performance. If creating and saving an instance with
10 fields took 0.01 seconds, that would be poor. For the same reason,
Model.save() doesn't call Model.full_clean() (unless you override it).

Correcting the record here - the reason Model.save() doesn't call Model.full_clean() is historical, not performance. Model validation wasn't added until v1.2, and we couldn't work out a way to introduce enforcement of model validation without potentially breaking existing code. 

There may be a performance reason as well, but that was a secondary concern at the time.
 
A structural reason is that you can't tell "assigning an attribute to an
instance in order to save it to the database later" apart from "assigning an
attribute to an instance being loaded from the database". So your proposal
doesn't play nice with lazy loading of related instances, for example.

Another reason is the impossibility of running validations involving multiple
fields. If a validator depends on the value of two fields, one has to be set
before the other, and you can't validate at that point.

For me, this is the bigger reason why validation at time of assignment isn't viable. 

In order to do multi-field validation at time of assignment, we'd need to introduce some sort of transactional behaviour. Just thinking about this makes my brain hurt.

Alternatively, we'd need to introduce validation for single fields, but not enforce cross-field validation until object save. This strikes me as fundamentally confusing behaviour.
 
To sum up, I think this would be very hard to implement and not provide a
consistent user experience in the end.

I concur.

Yours,
Russ Magee %-) 

Shai Berger

unread,
Mar 2, 2014, 2:49:35 AM3/2/14
to django-d...@googlegroups.com
On Saturday 01 March 2014 19:07:08 Aymeric Augustin wrote:
> On 1 mars 2014, at 09:14, Martin Matusiak <nume...@gmail.com> wrote:
> > Now in the case of sqlite, which is very permissive, it will happily
> > accept values like 1 and {} and store them as "1" and "{}" (wtf)
> > respectively. So if I'm running my tests against sqlite I won't even
> > know that this will bite me in production.
>
> You should never, ever, for any reason, use a different database in
> development, test and production. It's a bad idea. It doesn't work.
>
... except when you're developing for more than one database. Then you're
allowed to use just one in development, but you should have CI covering all
your targets for testing. Separating testing from production really has no
excuse, AFAICT.

Shai.

Aymeric Augustin

unread,
Mar 2, 2014, 3:24:13 AM3/2/14
to django-d...@googlegroups.com
On 2 mars 2014, at 08:49, Shai Berger <sh...@platonix.com> wrote:

> On Saturday 01 March 2014 19:07:08 Aymeric Augustin wrote:
>
>> You should never, ever, for any reason, use a different database in
>> development, test and production. It's a bad idea. It doesn't work.
>>
> ... except when you're developing for more than one database. Then you're
> allowed to use just one in development

Fair enough.

> but you should have CI covering all your targets for testing. Separating
> testing from production really has no excuse, AFAICT.

I’ve heard about a web framework that doesn’t have CI for all its targets… ;-)

--
Aymeric.

Shai Berger

unread,
Mar 2, 2014, 3:30:33 AM3/2/14
to django-d...@googlegroups.com
> targets... ;-)

We tried, and we'll keep trying. And actually, I do run CI for the missing
target, it's just not public.

But yes, as you said: Fair enough.

Shai.

Piper Merriam

unread,
Mar 3, 2014, 3:46:39 PM3/3/14
to django-d...@googlegroups.com


On Saturday, March 1, 2014 4:29:43 PM UTC-7, Russell Keith-Magee wrote:

On Sun, Mar 2, 2014 at 1:07 AM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:
On 1 mars 2014, at 09:14, Martin Matusiak <nume...@gmail.com> wrote:

A structural reason is that you can't tell "assigning an attribute to an
instance in order to save it to the database later" apart from "assigning an
attribute to an instance being loaded from the database". So your proposal
doesn't play nice with lazy loading of related instances, for example.

Another reason is the impossibility of running validations involving multiple
fields. If a validator depends on the value of two fields, one has to be set
before the other, and you can't validate at that point.

For me, this is the bigger reason why validation at time of assignment isn't viable. 

In order to do multi-field validation at time of assignment, we'd need to introduce some sort of transactional behaviour. Just thinking about this makes my brain hurt.

 
Russ

This got me thinking about what the API for doing multi-field validation could look like.  I came up with two ideas that may or may not hold water.

The simplest solution I could think of would be a setter method on the model that took **kwargs and set them on the model, calling validation after they've all been set.  Kind of a gross API, but it should work.

The maybe cleaner idea would be a context manager that deferred validation till the end of the context block.

with instance.validation_block():
    x = 4
    y = 5


Here's a gist that hopefully demonstrates the idea.  https://gist.github.com/pipermerriam/e858cd6f2c2535d8868d
 
I'm +0 on this idea overall, but I think it'd be nice to have a better system for model validation than the current state of things.  I'm just not sure what that looks like.

Russell Keith-Magee

unread,
Mar 3, 2014, 6:59:33 PM3/3/14
to Django Developers
On Tue, Mar 4, 2014 at 4:46 AM, Piper Merriam <aaronm...@gmail.com> wrote:


On Saturday, March 1, 2014 4:29:43 PM UTC-7, Russell Keith-Magee wrote:

On Sun, Mar 2, 2014 at 1:07 AM, Aymeric Augustin <aymeric....@polytechnique.org> wrote:

On 1 mars 2014, at 09:14, Martin Matusiak <nume...@gmail.com> wrote:

A structural reason is that you can't tell "assigning an attribute to an
instance in order to save it to the database later" apart from "assigning an
attribute to an instance being loaded from the database". So your proposal
doesn't play nice with lazy loading of related instances, for example.

Another reason is the impossibility of running validations involving multiple
fields. If a validator depends on the value of two fields, one has to be set
before the other, and you can't validate at that point.

For me, this is the bigger reason why validation at time of assignment isn't viable. 

In order to do multi-field validation at time of assignment, we'd need to introduce some sort of transactional behaviour. Just thinking about this makes my brain hurt.

 
Russ

This got me thinking about what the API for doing multi-field validation could look like.  I came up with two ideas that may or may not hold water.

The simplest solution I could think of would be a setter method on the model that took **kwargs and set them on the model, calling validation after they've all been set.  Kind of a gross API, but it should work.

So, for the low low price of preventing you from using = for assignment, you get data validation! 

Not sure I see this as an improvement :-)
 
The maybe cleaner idea would be a context manager that deferred validation till the end of the context block.

with instance.validation_block():
    x = 4
    y = 5

 
Yes - like I said, transactional behaviour. And it makes my brain hurt thinking about it :-)

Yours,
Russ Magee %-)
Reply all
Reply to author
Forward
0 new messages