Deprecate change pk + save behavior for 1.4

189 views
Skip to first unread message

Anssi Kääriäinen

unread,
Nov 30, 2011, 2:13:57 AM11/30/11
to Django developers
I am suggesting the following PK change behavior will be deprecated in
1.4. The problem is this:
class SomeModel:
name = models.CharField(max_length=20, primary_key=True)

s = SomeModel(name='foo')
s.save()
s.name='bar'
s.save()

At this point you end up with both 'foo' and 'bar' objects in the DB.
This is counter intuitive. However this isn't that bad currently, as
natural primary keys aren't too common. I do hope that multipart
primary keys do get into Django some day. If you have a model:

class User:
firstname = models.CharField
lastname = models.CharField
pk = (firstname, lastname)

then if trying to change an users name in the most obvious way:

u = User.objects.get(firstname='Anssi', lastname='Kääriäinen')
u.firstname='Matti'
u.save()

results in both Anssi and Matti users in the DB, it is simply a bad
API. So I suggest deprecating this behavior in 1.4. Django doesn't
need to be intelligent enough to update & cascade the PK change,
although that would be a nice feature later on. Just erroring out on
PK change would be good enough.

The reason for doing the deprecation now is that it would be nice that
this behavior is already removed when multicolumn primary keys are
introduced into Django.

There is a ticket related to this: #2259.

- Anssi

Luke Plant

unread,
Nov 30, 2011, 6:42:23 AM11/30/11
to django-d...@googlegroups.com
On 30/11/11 07:13, Anssi Kääriäinen wrote:

> The reason for doing the deprecation now is that it would be nice that
> this behavior is already removed when multicolumn primary keys are
> introduced into Django.
>
> There is a ticket related to this: #2259.

Here is another that could be helped by this change, depending on
implementation - #14615
The decisions on that ticket basically boils down to the question of how
we detect a new object (which is waiting for PK from the DB). The
current solution of comparing with None (used in various places) fails
for nullable primary keys.

Luke

--
Noise proves nothing. Often a hen who has merely laid an egg
cackles as if she laid an asteroid.
-- Mark Twain

Luke Plant || http://lukeplant.me.uk/

Kääriäinen Anssi

unread,
Nov 30, 2011, 7:26:24 AM11/30/11
to django-d...@googlegroups.com
"""
> The reason for doing the deprecation now is that it would be nice that
> this behavior is already removed when multicolumn primary keys are
> introduced into Django.
>
> There is a ticket related to this: #2259.

Here is another that could be helped by this change, depending on
implementation - #14615
The decisions on that ticket basically boils down to the question of how
we detect a new object (which is waiting for PK from the DB). The
current solution of comparing with None (used in various places) fails
for nullable primary keys.
"""

I can think of two basic approaches to this: define a __setattr__ for Models,
and check if the pk is set after fetch from DB. This has at least three
problems:
1. It is likely that users have custom __setattr__ methods that do not use
super().__setattr__ but change the dict directly.
2. This way it is somewhat hard to detect if the PK has actually changed
or not. You can (and many users likely currently do) set the value to the same
value it is already.
3. This will make model __init__ slower (although there are tricks to mitigate
this effect).

The other way is storing old_pk in model._state, and compare the PK to
that when saving. If changed, error. This would work best if there was a
NoneMarker object for the cases where there is no PK from DB, so you could
solve #14615 easily, too.

This could result in somewhat larger memory usage. Although normally you
could store the same string (or other object) in db_pk as you store in the
__dict__ of the model. This would mean minimal memory overhead unless
you change a lot of PKs in one go. Are there problematic (mutable object
based) model fields, where you would need to store a copy of the field's
value? We could possibly have an attribute "mutable object based field" for
the problematic fields...

One way to mitigate the speed effect is use of AST for model init. I have done
some experiments about this, see: https://github.com/akaariai/ast_model. That
does come with its own problems, but if templates are going to be using AST,
then we could use it in other places needing speedups, too.

- Anssi

Ian Clelland

unread,
Nov 30, 2011, 2:25:17 PM11/30/11
to django-d...@googlegroups.com
Is this referring exclusively to natural, or user-specified primary key columns? Despite Luke's reference to nullable primary keys (are these even allowed by SQL?), a common idiom for copying objects is this:

    obj.pk = None
    obj.save()

I have used use this pattern in more instances than I can remember; whether for duplicating objects, or for making new variants of existing objects. I would hate to see the behaviour deprecated, or worse, for the old object to simply get reassigned a new (or null) id.

For changing natural primary key fields, I would prefer to see a pattern like this:

class User:
   firstname = models.CharField
   lastname = models.CharField
   pk = (firstname, lastname)

u = User.objects.get(firstname='Anssi', lastname='Kääriäinen')
u.firstname='Matti'
u.save(force_update=True)

(specifically, with the force_update parameter being required for a PK change). Then, as long as we store the original PK values, the object can be updated in place. A bare save() would work just as currently changing the id field does -- create a new row if possible, otherwise, update the row whose PK matches the new values.


--
Regards,
Ian Clelland
<clel...@gmail.com>

Kääriäinen Anssi

unread,
Nov 30, 2011, 3:40:42 PM11/30/11
to django-d...@googlegroups.com

"""
Is this referring exclusively to natural, or user-specified primary key columns? Despite Luke's reference to nullable primary keys (are these even allowed by SQL?), a common idiom for copying objects is this:
"""

Not allowed by SQL specification, but many databases do allow them (source wikipedia).

"""
obj.pk = None
obj.save()

I have used use this pattern in more instances than I can remember; whether for duplicating objects, or for making new variants of existing objects. I would hate to see the behaviour deprecated, or worse, for the old object to simply get reassigned a new (or null) id.
"""

If nullable primary keys are going to be allowed, then the above can not work. You would need to use NoneMarker in there, or .save() would need a kwarg for backwards compatibility mode. obj.clone() is still another possibility. Maybe nullable primary keys should be forbidden?

"""
For changing natural primary key fields, I would prefer to see a pattern like this:

class User:
firstname = models.CharField
lastname = models.CharField
pk = (firstname, lastname)

u = User.objects.get(firstname='Anssi', lastname='Kääriäinen')
u.firstname='Matti'
u.save(force_update=True)
"""

That is a possibility, although currently that has well defined meaning: try to update the object with pk ('Matti', 'Kääriäinen'), error if it does not exist in the DB.

"""
(specifically, with the force_update parameter being required for a PK change). Then, as long as we store the original PK values, the object can be updated in place. A bare save() would work just as currently changing the id field does -- create a new row if possible, otherwise, update the row whose PK matches the new values.
"""

IMHO forbidding creation of a new object while leaving the old object in place when calling save() is needed. Current behavior is unintuitive. One clear indication of this being unintuitive is that even Django's admin does not get it right. If bare save() will be deprecated, then an upgrade path for current uses is needed. A new kwarg for save (old_pk=None would be a possibility) or obj.clone() would be needed.

Solving all these problems before 1.4 seems hard.

- Anssi

Ian Clelland

unread,
Nov 30, 2011, 4:15:46 PM11/30/11
to django-d...@googlegroups.com
On Wed, Nov 30, 2011 at 12:40 PM, Kääriäinen Anssi <anssi.ka...@thl.fi> wrote:

"""
Is this referring exclusively to natural, or user-specified primary key columns? Despite Luke's reference to nullable primary keys (are these even allowed by SQL?), a common idiom for copying objects is this:
"""

Not allowed by SQL specification, but many databases do allow them (source wikipedia).

/me runs off to go correct Wikipedia ;) 

I checked the Wikipedia article on Primary Key first, and didn't see that, but I did note this:

A table can have at most one primary key, but more than one unique key. A primary key is a combination of columns which uniquely specify a row. It is a special case of unique keys. One difference is that primary keys have an implicit NOT NULL constraint while unique keys do not.


 A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently).
 

Technically, a primary key constraint is simply a combination of a unique constraint and a not-null constraint.


A primary key constraint combines a NOT NULL constraint and a unique constraint in a single declaration. That is, it prohibits multiple rows from having the same value in the same column or combination of columns and prohibits values from being null. 
 


"""
   obj.pk = None
   obj.save()

I have used use this pattern in more instances than I can remember; whether for duplicating objects, or for making new variants of existing objects. I would hate to see the behaviour deprecated, or worse, for the old object to simply get reassigned a new (or null) id.
"""

If nullable primary keys are going to be allowed, then the above can not work.

That would be a huge backwards compatibility issue (from my perspective, at least).
 
You would need to use NoneMarker in there, or .save() would need a kwarg for backwards compatibility mode. obj.clone() is still another possibility. Maybe nullable primary keys should be forbidden?

(I like that last one best :) )
 

"""
For changing natural primary key fields, I would prefer to see a pattern like this:

class User:
  firstname = models.CharField
  lastname = models.CharField
  pk = (firstname, lastname)

u = User.objects.get(firstname='Anssi', lastname='Kääriäinen')
u.firstname='Matti'
u.save(force_update=True)
"""

That is a possibility, although currently that has well defined meaning: try to update the object with pk ('Matti', 'Kääriäinen'), error if it does not exist in the DB.

That is also true, and makes the whole situation more difficult to resolve.
The trouble stems from the fact that once the user changes any of the primary key fields, we currently lose the ability to refer uniquely to the row in the database that represents "the current object". (This may actually go against one of Codd's rules, but then, the ORM isn't actually a database itself).

Originally, Django ORM objects had essentially no metadata to tie them to the database, once retrieved. They could be only associated with a particular row by their implicit id field, which made the force_insert and force_update (and the mess of logic in Model.save()) necessary, and made idioms like "clear the pk and save to copy" possible.

With multi-db, Django has moved part-way to marrying an ORM object with the actual database storage location -- objects have a hidden record of the database from which they were retrieved, but id is still just a mutable field like any other. It sounds like what is being proposed is that model instances should not only know which database, but which table row their backing data is stored in -- and that should persist despite any changes made in code to any field in the instance.

(I'm still not sure whether I should support or oppose this, but my knee-jerk reaction is "Hey! That's going to break half of my management commands!")
 

"""
(specifically, with the force_update parameter being required for a PK change). Then, as long as we store the original PK values, the object can be updated in place. A bare save() would work just as currently changing the id field does -- create a new row if possible, otherwise, update the row whose PK matches the new values.
"""

IMHO forbidding creation of a new object while leaving the old object in place when calling save() is needed. Current behavior is unintuitive. One clear indication of this being unintuitive is that even Django's admin does not get it right. If bare save() will be deprecated, then an upgrade path for current uses is needed. A new kwarg for save (old_pk=None would be a possibility) or obj.clone() would be needed.

I'm not sure that I agree -- I don't know if there needs to be a fundamental distinction between a new model instance and one that was retrieved from the database. I do agree that there should be a way to specify "change the primary key on this object" vs "save a new object with this primary key".

(And I'm waiting for someone to come in and say "The current behaviour is documented, leave it alone. Use filter(pk=old_value).update(pk=new_value) if you want to change a primary key. It's all in code anyway, and you should know which fields are PK fields before you go messing with them." :) )

Solving all these problems before 1.4 seems hard.

Agreed.
 

 - Anssi

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Ian Clelland

unread,
Nov 30, 2011, 4:19:45 PM11/30/11
to django-d...@googlegroups.com
On Wed, Nov 30, 2011 at 1:15 PM, Ian Clelland <clel...@gmail.com> wrote:
On Wed, Nov 30, 2011 at 12:40 PM, Kääriäinen Anssi <anssi.ka...@thl.fi> wrote:
"""
Is this referring exclusively to natural, or user-specified primary key columns? Despite Luke's reference to nullable primary keys (are these even allowed by SQL?), a common idiom for copying objects is this:
"""

Not allowed by SQL specification, but many databases do allow them (source wikipedia).

/me runs off to go correct Wikipedia ;) 

I checked the Wikipedia article on Primary Key first, and didn't see that, but I did note this:

A table can have at most one primary key, but more than one unique key. A primary key is a combination of columns which uniquely specify a row. It is a special case of unique keys. One difference is that primary keys have an implicit NOT NULL constraint while unique keys do not.


 A PRIMARY KEY is a unique index where all key columns must be defined as NOT NULL. If they are not explicitly declared as NOT NULL, MySQL declares them so implicitly (and silently).
 

Technically, a primary key constraint is simply a combination of a unique constraint and a not-null constraint.


A primary key constraint combines a NOT NULL constraint and a unique constraint in a single declaration. That is, it prohibits multiple rows from having the same value in the same column or combination of columns and prohibits values from being null. 
 

SQLite, of course, is special (http://www.sqlite.org/lang_createtable.html):

According to the SQL standard, PRIMARY KEY should always imply NOT NULL. Unfortunately, due to a long-standing coding oversight, this is not the case in SQLite. Unless the column is an INTEGER PRIMARY KEY SQLite allows NULL values in a PRIMARY KEY column. We could change SQLite to conform to the standard (and we might do so in the future), but by the time the oversight was discovered, SQLite was in such wide use that we feared breaking legacy code if we fixed the problem. So for now we have chosen to continue allowing NULLs in PRIMARY KEY columns.

But,
 
Developers should be aware, however, that we may change SQLite to conform to the SQL standard in future and should design new programs accordingly.

I would consider Django 1.4+ to fall under the "new programs" umbrella :)

Kääriäinen Anssi

unread,
Nov 30, 2011, 4:42:10 PM11/30/11
to django-d...@googlegroups.com
"""
/me runs off to go correct Wikipedia ;)

I checked the Wikipedia article on Primary Key first, and didn't see that, but I did note this:

A table can have at most one primary key, but more than one unique key. A primary key is a combination of columns which uniquely specify a row. It is a special case of unique keys. One difference is that primary keys have an implicit NOT NULL constraint while unique keys do not.
"""

I was confused by this sentance in the wikipedia article:
Note that some DBMS require explicitly marking primary-key columns as NOT NULL.

"""
I'm not sure that I agree -- I don't know if there needs to be a fundamental distinction between a new model instance and one that was retrieved from the database. I do agree that there should be a way to specify "change the primary key on this object" vs "save a new object with this primary key".
"""

The problem, as I see it, is that it is all too easy to do .save() and end up duplicates in the DB while the user expects an update of the PK. Django admin has currently exactly this problem. Currently this is not that big of an problem, as natural primary keys aren't common. You can update the PK with some trickery, but that is not what I try to solve. I try to just forbid the "whoops, created an duplicate by accident" problem. This needs the information about the "old_pk".

One nice little problem more: Multitable inheritance allows object with multiple primary keys...

class A(models.Model):
f1 = models.IntegerField(primary_key=True)

class B(A):
f2 = models.IntegerField(primary_key=True)

# Now, B's primary key is f2, but when saving B, the underlying A instance needs to be saved too, and its primary key is f1. So, in save B has effectively 2 primary keys.

b = B(f1=1, f2=1)
b.save()
B.objects.all()
[B obj: f1 = 1, f2 = 1]

b.f2 = 2
b.save()
IntegrityError (tries to save new B: f1=1, f2=2, but f1 needs to be unique)
b.f2 = 1
b.f1 = 2
b.save()
B.objects.all()
[B obj: f1 = 2, f2 = 1]
A.objects.all()
[A obj: f1 = 1, A obj: f1 = 2] # We got a new A obj, but no new B obj.

Add in multitable multi-inheritance... Making this work reliably in all situations seems complex. So, no simple solution in sight, and final nail for 1.4 inclusion.

- Anssi

# sidenote:
# try b.save() using postgresql again after the integrity error in the above example:
# DatabaseError: current transaction is aborted, commands ignored until end of transaction block
# connection.rollback()
# TransactionManagementError: This code isn't under transaction management
# Luckily the transaction is still rolled back :)

Luke Plant

unread,
Dec 1, 2011, 9:00:37 AM12/1/11
to django-d...@googlegroups.com
On 30/11/11 21:19, Ian Clelland wrote:

> I would consider Django 1.4+ to fall under the "new programs" umbrella :)

Agreed. Thanks for all that research. Since we currently don't support
nullable PKs (implicit in our 'pk is None' idiom), there is no need to
do so going forward, and that allows a simpler solution for #14615,
eliminating that bug from this discussion.

Thanks,

Luke

--
The fashion wears out more apparel than the man.
-- William Shakespeare

Luke Plant || http://lukeplant.me.uk/

Tomek Paczkowski

unread,
Dec 1, 2011, 10:26:24 AM12/1/11
to django-d...@googlegroups.com
Settings pk as None is recommended method of model cloning, as per:

Anssi Kääriäinen

unread,
Dec 1, 2011, 9:38:04 PM12/1/11
to Django developers
On Nov 30, 2:26 pm, Kääriäinen Anssi <anssi.kaariai...@thl.fi> wrote:
> I can think of two basic approaches to this: define a __setattr__ for Models,
> and check if the pk is set after fetch from DB. This has at least three
> problems:
>  1. It is likely that users have custom __setattr__ methods that do not use
> super().__setattr__ but change the dict directly.
>  2. This way it is somewhat hard to detect if the PK has actually changed
> or not. You can (and many users likely currently do) set the value to the same
> value it is already.
>  3. This will make model __init__ slower (although there are tricks to mitigate
> this effect).
>
> The other way is storing old_pk in model._state, and compare the PK to
> that when saving. If changed, error. This would work best if there was a
> NoneMarker object for the cases where there is no PK from DB, so you could
> solve #14615 easily, too.
>
> This could result in somewhat larger memory usage. Although normally you
> could store the same string (or other object) in db_pk as you store in the
> __dict__ of the model. This would mean minimal memory overhead unless
> you change a lot of PKs in one go. Are there problematic (mutable object
> based) model fields, where you would need to store a copy of the field's
> value? We could possibly have an attribute "mutable object based field" for
> the problematic fields...

I tried the second approach, and the mutable object based fields makes
it basically unworkable. The problematic case turned out to be
FileField. Now, that could be hacked around, but other inplace-mutable
fields do exist, at least in user code.

Unfortunately, the __setattr__ based method would have exactly the
same problem if Django tries to detect actual changes, not just
setting the attribute. When __setattr__ is called, the value in
__dict__ might have changed because it is already mutated in-place.

So, the following ideas for a way forward come to mind:
- ban in-place mutation of field values. Backwards incompatible. If
backwards compatibility would not be a problem, this restriction would
not be that bad - most common data types are not mutable (strings,
ints, floats, time types, Decimal). The actual value of FileField
isn't mutable (it is just a string) but FileField does some magic
behind the scenes which results in the problem. Fixing pk-change API
by introducing an API where you must remember to clone mutable values
isn't a good idea..
- Store hash-value and detect changes based on that. Performance
killer, possibility of hash collisions.
- Have the "I'm mutable" flag for fields. I don't know if this is
workable from performance standpoint. In practice the implementation
could be descriptor based for in-place mutable fields.
- Detect only __setattr__ calls without actual changed value
tracking. I don't think this is workable at all, as setting the pk to
its own value is probably something not that rare.
- Do nothing.

The bad part about do-nothing is that:
- I really do not like the API of .save() not changing the PK, but
instead duplicating the object. Maybe I am alone here.
- If Django would know which fields are changed, it would be
possible to have .save(only_changed=True). This would turn a save of
non-changed instance into a no-op, and for example allows using HOT
updates in PostgreSQL. Other DBs are likely to benefit from this, too.
A common use case this would benefit is data syncing scripts.

So, any other ideas forward? Detecting PK change seemed simple enough
to implement. Devil in the details, as usual :) It seems SQLAlchemy
does have a knowledge of which attributes are inplace-mutable and
dodges this issue that way. Not sure though, their codebase is bit too
complex to get a good grip by just a glance-over...

- Anssi

The testing patch can be found from
https://github.com/akaariai/django/compare/master...save_change_pk
The patch does updates of only changed fields, which is how the
FileField issue was found. model_fields is the failing test.

Michael Hudson-Doyle

unread,
Dec 1, 2011, 9:39:14 PM12/1/11
to django-d...@googlegroups.com
On 2 December 2011 15:38, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
> The bad part about do-nothing is that:
>  - I really do not like the API of .save() not changing the PK, but
> instead duplicating the object. Maybe I am alone here.

Definitely not.

Cheers,
mwh

Anssi Kääriäinen

unread,
Dec 2, 2011, 6:57:32 AM12/2/11
to Django developers
On Dec 2, 4:38 am, Anssi Kääriäinen <anssi.kaariai...@thl.fi> wrote:
>   - Have the "I'm mutable" flag for fields. I don't know if this is
> workable from performance standpoint. In practice the implementation
> could be descriptor based for in-place mutable fields.

I implemented POC of this and __setattr__ based way of traking
variable changes. I think the solution given in the linked github
branch could work. It is currently ugly, especially the save
signature, which has new flags compat_mode and only_changed (came
nearly free as side-product of this). The interaction between the
different flags is the ugliest part.

Now for the funny part of this. I suspected that __setattr__ would
make init slower. But using a little trick, __init__ is now actually
almost 30% faster for a 10 field model. The trick is checking if
setattr needs to be called. It needs to be called if the user has
defined __setattr__ for some subclass of Model, otherwise we can call
directly object.__setattr__. For some reason, this is considerably
faster than calling setattr(self, f.attname, val). I hope the approach
is acceptable. The same trick could be employed directly to trunk
version of Django, resulting in the same speedup.

The biggest problems I see with the above mentioned hack are that 1)
this is a hack, 2) this does not support dynamically changing
__setattr__. I would not worry about 2), that seems fragile as hell
anyways. This could be fixed if need be. 3) I do not know Python well
enough to say if this hack will skip some crucial thing setattr would
do. And 4) how does this interact with other versions of Python (pypy
for example). On the other hand this is just the normal Python
performance optimization: set commonly used attributes to local scope
and use them from there to avoid lookup overhead.

Without the hack, model __init__ will be up to 100% slower than
currently, so that hack is a requirement for __setattr__ being
acceptable.

Apart of the above mentioned problems there are some TODOs:
- I haven't tested MutableFieldDescriptor almost at all.
- Currently save() after changing PK (to other value than None) will
raise Exception, DeprecationWarning was the goal.
- Rethink the logic in save_base.
- Are there common situation where this leads to performance losses?
setting object attributes will be considerably slower.
- Get some feedback: is the approach taken is too crazy?
- Verify that the approach really works...

All tests do pass on sqlite3 (with 2 UnicodeWarnings).

Code is here: https://github.com/akaariai/django/compare/master...save_change_pk2

Maybe I should open a ticket for this and move this discussion to
trac?

- Anssi

Kääriäinen Anssi

unread,
Dec 2, 2011, 11:56:57 AM12/2/11
to django-d...@googlegroups.com
"""
Now for the funny part of this. I suspected that __setattr__ would
make init slower. But using a little trick, __init__ is now actually
almost 30% faster for a 10 field model. The trick is checking if
setattr needs to be called. It needs to be called if the user has
defined __setattr__ for some subclass of Model, otherwise we can call
directly object.__setattr__. For some reason, this is considerably
faster than calling setattr(self, f.attname, val). I hope the approach
is acceptable. The same trick could be employed directly to trunk
version of Django, resulting in the same speedup.
"""

Now I can't reproduce this speedup. I get pretty much the same speed on
master vs the __setattr__ trick. I don't know what I was doing wrong before,
I tested this literally tens of times trying to understand what is happening.
It is not that surprising that this speedup isn't real, as the speedup seemed
too good to be true.

So, forget about the above optimization for current Django trunk. However
it is still needed if the __setattr__ way of tracking attribute changes is going
to be used, as otherwise model __init__ will be much slower than currently.

I do understand if this is not wanted, as this adds some complexity and if
the only benefit is preventing accidental duplicates due to PK change,
it is questionable if it is worth it. However saving only changed attrs (and
skipping the save completely if there are no changes) could be nice in some
situations.

Maybe I should sleep a little before hacking more... :) Anyways it is probably
good to let this all wait until 1.5. There are pretty big design decisions here.

- Anssi

Ian Clelland

unread,
Dec 2, 2011, 12:06:56 PM12/2/11
to django-d...@googlegroups.com
On Thu, Dec 1, 2011 at 6:38 PM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
On Nov 30, 2:26 pm, Kääriäinen Anssi <anssi.kaariai...@thl.fi> wrote:
\> The other way is storing old_pk in model._state, and compare the PK to

> that when saving. If changed, error. This would work best if there was a
> NoneMarker object for the cases where there is no PK from DB, so you could
> solve #14615 easily, too.
 
I tried the second approach, and the mutable object based fields makes
it basically unworkable. The problematic case turned out to be
FileField. Now, that could be hacked around, but other inplace-mutable
fields do exist, at least in user code.

That's really too bad; I was hoping that that approach would work. (Also, I really hope nobody is using a FileField for a primary key ;) )

Is the problem here that we can't reliably tell whether the data that is going out to the DB layer has changed? I would think that no matter how the data is modified (in-place vs. setattr), that the one thing we could rely on, and the one thing that actually matters in this situation, is the serialised representation of the data. For a FileField, that would be the filesystem path (editing the file in place without changing the path wouldn't give you the duplication problems that you are having); for an IntegerField, it's just the number itself.

It should be the case that, no matter what sort of python magic a particular developer has added, it is equivalence at the SQL level that is causing problems. Maybe it's because I haven't tried to hack at this myself, but I can't see why storing a copy of the PK fields DB-representation on load, and checking them on save, isn't sufficient. There is a memory cost, but it should be small, unless you have very large fields for primary keys in your database, in which case you are already suffering from them, certainly :)


Kääriäinen Anssi

unread,
Dec 2, 2011, 8:54:11 PM12/2/11
to django-d...@googlegroups.com
"""
That's really too bad; I was hoping that that approach would work. (Also, I really hope nobody is using a FileField for a primary key ;) )

Is the problem here that we can't reliably tell whether the data that is going out to the DB layer has changed? I would think that no matter how the data is modified (in-place vs. setattr), that the one thing we could rely on, and the one thing that actually matters in this situation, is the serialised representation of the data. For a FileField, that would be the filesystem path (editing the file in place without changing the path wouldn't give you the duplication problems that you are having); for an IntegerField, it's just the number itself.

It should be the case that, no matter what sort of python magic a particular developer has added, it is equivalence at the SQL level that is causing problems. Maybe it's because I haven't tried to hack at this myself, but I can't see why storing a copy of the PK fields DB-representation on load, and checking them on save, isn't sufficient. There is a memory cost, but it should be small, unless you have very large fields for primary keys in your database, in which case you are already suffering from them, certainly :)
"""

Good idea. Maybe the best possibility for change tracking is offered by Field.value_to_string (returns string suitable for serialization). Using value_to_string the following would work:
- add a flag "is_immutable" to fields. If set, just put the DB value directly to _state.old_pk upon model initialization. If not set, call value_to_string, and store that in old_pk. I don't think there will be many PK fields which are mutable, and even if there are, the value_to_string trick should work.
- upon save, do the same again, if is_immutable is set, track changes by the actual attribute value, if not set, check value_to_string.

Getting the raw SQL string representation will be hard, for example PostgreSQL ListField would get the value from psycopg as a list, and would send it back to psycopg as a list. Maybe copy.copy() for the DB value would work.

I don't know how likely it is that people use FileFields, ListFields or other problematic cases as PK values. The easy way out would be to define that PK fields must be immutable, or the field must support change tracking itself by providing a suitable descriptor (Django could provide a base class). After that everything should be relatively easy. Come to think of it, mutable PK fields are probably pretty rare currently, as saving an object back to the DB after PK change might have some problems...

I think I will pursuit the immutable PK approach, and see how it works (back to square one). BTW are there -1 calls on this approach, or the pk change tracking in general?

- Anssi

Carl Meyer

unread,
Dec 2, 2011, 9:18:48 PM12/2/11
to django-d...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Anssi,

On 12/02/2011 06:54 PM, K��ri�inen Anssi wrote:
> I think I will pursuit the immutable PK approach, and see how it
> works (back to square one). BTW are there -1 calls on this approach,
> or the pk change tracking in general?

I haven't been fully following this thread, but I will say that I'm not
yet convinced that the ORM behavior should be changed such that saving
an instance with a modified PK updates the row rather than saving a new
instance. The two foremost concerns:

1. Backwards compatibility. I think this was mentioned, but I didn't see
a clear plan put forward. We certainly can't just break existing code
that is using PK-modification or PK-clearing as a cloning technique.

2. Foreign keys. Do you plan to introduce a full update-cascade
framework similar to the delete-cascade one, to handle the case where
there are other models with FKs pointing to the model whose PK you are
about to modify? How will you handle databases that aren't able to
suspend constraint checking until the end of a transaction?

I could be convinced otherwise, but at this point my feeling is that the
current behavior is a reasonably good fit for the character of the
Django ORM: simple and (not quite anymore, but still almost) stateless.
For better or worse, a PK is what uniquely identifies a model instance
as corresponding to a particular table row. If you change the PK, your
instance no longer corresponds to the same row.

Carl
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7ZhwgACgkQ8W4rlRKtE2ejtwCgkkH0+hX92NPQPFfhpJjkxB3B
UWEAn3ae30eKxRdkqWmqYYs4/ka4FY1G
=7VB1
-----END PGP SIGNATURE-----

Kääriäinen Anssi

unread,
Dec 2, 2011, 9:29:46 PM12/2/11
to django-d...@googlegroups.com
"""
On 12/02/2011 06:54 PM, Kääriäinen Anssi wrote:
> I think I will pursuit the immutable PK approach, and see how it
> works (back to square one). BTW are there -1 calls on this approach,
> or the pk change tracking in general?

I haven't been fully following this thread, but I will say that I'm not
yet convinced that the ORM behavior should be changed such that saving
an instance with a modified PK updates the row rather than saving a new
instance.
"""

At this point this is not the idea. The idea is to just disallow this (assuming
multicolumn PK firstname, lastname):

user = User(firstname = 'Jack', lastname = 'Smith')
user.save()
user.firstname = 'John'
user.save()

Current behavior will leave Jack Smith in the DB, and save John Smith as
new object. I my opinion it is too easy to clone the object accidentally.

The idea would be to raise exception from the second save (deprecation
warning at this point). A way to get the current behavior is needed too, but
the user should explicitly request that.

Later on, maybe there should be some way to actually update the PK. But
that is not the current plan.

- Anssi

Carl Meyer

unread,
Dec 2, 2011, 10:03:30 PM12/2/11
to django-d...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/02/2011 07:29 PM, K��ri�inen Anssi wrote:
> At this point this is not the idea. The idea is to just disallow this (assuming
> multicolumn PK firstname, lastname):
>
> user = User(firstname = 'Jack', lastname = 'Smith')
> user.save()
> user.firstname = 'John'
> user.save()
>
> Current behavior will leave Jack Smith in the DB, and save John Smith as
> new object. I my opinion it is too easy to clone the object accidentally.

Did we gain support for multi-column PK recently and I missed it? Or is
this just a hypothetical example assuming such support is added in the
future?

> The idea would be to raise exception from the second save (deprecation
> warning at this point). A way to get the current behavior is needed too, but
> the user should explicitly request that.

Hmm, I'm not so sure this is really a problem, or enough of a problem to
warrant making people change currently-working code. If you use
non-sequence PKs (and usually you shouldn't), you should be aware of
what you're doing.

If the real problem is the way the admin exposes the possibility of
accidental cloning to end users, maybe we should be looking at a simpler
admin-level fix (like not making PK fields editable by default)?

Carl

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7ZkYIACgkQ8W4rlRKtE2fwGgCeOan3naNFeAddJHFCP0XGp7wh
LLgAn2g9MUPrafpE+h9yz4kWtQOnZht2
=ID4P
-----END PGP SIGNATURE-----

Anssi Kääriäinen

unread,
Dec 3, 2011, 4:13:12 AM12/3/11
to Django developers
On Dec 3, 5:03 am, Carl Meyer <c...@oddbird.net> wrote:
> Did we gain support for multi-column PK recently and I missed it? Or is
> this just a hypothetical example assuming such support is added in the
> future?

Yes, this is a hypothetical example, inspired by the GSoC work on
multicolumn primary keys [https://github.com/koniiiik/django].

This thread started with the multicolumn PK example, and the point of
the thread is that when Django gets multicolumn primary keys, the
current behavior is more likely to bite users. Doing the obvious thing
when trying to update the primary key will results in duplicates in
the DB. At the moment this is not that big of a problem, as natural
primary keys aren't that common.

> > The idea would be to raise exception from the second save (deprecation
> > warning at this point). A way to get the current behavior is needed too, but
> > the user should explicitly request that.
>
> Hmm, I'm not so sure this is really a problem, or enough of a problem to
> warrant making people change currently-working code. If you use
> non-sequence PKs (and usually you shouldn't), you should be aware of
> what you're doing.
>
> If the real problem is the way the admin exposes the possibility of
> accidental cloning to end users, maybe we should be looking at a simpler
> admin-level fix (like not making PK fields editable by default)?

Admin should be fixed [#2259]. Making PK fields non-editable in
ModelForms would be good, too. Is it OK to consider the current
ModelForm behavior a bug, or will it need a deprecation cycle?

In my opinion admin getting this wrong is a strong indication about
how non-intuitive the API is.

It is OK that Django doesn't support primary key updates. But if it
doesn't support update of PK, then lets error out on primary key
change. Current behavior of .save() is actually: "save the object to
DB, except when PK has changed, then do a clone". That is a bad API
for natural primary keys.

About breaking current code: my intention is to have a flag to .save()
which would allow current code to work. .save(clone=True) would get
current behavior back. Setting the PK to None and doing a save will
work for AutoField PKs.

I opened a ticket for this: https://code.djangoproject.com/ticket/17332

- Anssi

Carl Meyer

unread,
Dec 3, 2011, 12:18:53 PM12/3/11
to django-d...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/03/2011 02:13 AM, Anssi K��ri�inen wrote:
> Admin should be fixed [#2259]. Making PK fields non-editable in
> ModelForms would be good, too. Is it OK to consider the current
> ModelForm behavior a bug, or will it need a deprecation cycle?

I think it would need a deprecation cycle. People certainly could be
making use of the current behavior (and there would need to be a way to
get the PK field back into the form, e.g. by explicitly listing it in
"fields"). Excluding PK from ModelForm by default isn't clearly fixing a
bug, it's more protecting people from unintuitive behavior.

(Note that in admin we have the option to display it read-only by
default; ModelForm has no such feature, we'd have to exclude it entirely
by default).

> In my opinion admin getting this wrong is a strong indication about
> how non-intuitive the API is.

That's a fair point.

> It is OK that Django doesn't support primary key updates. But if it
> doesn't support update of PK, then lets error out on primary key
> change. Current behavior of .save() is actually: "save the object to
> DB, except when PK has changed, then do a clone". That is a bad API
> for natural primary keys.
>
> About breaking current code: my intention is to have a flag to .save()
> which would allow current code to work. .save(clone=True) would get
> current behavior back. Setting the PK to None and doing a save will
> work for AutoField PKs.

So what about admin users who are currently relying on this behavior as
a way to clone objects with natural PKs (given that the
save-and-add-another button is useless with natural PKs unless you can
explicitly give the value for the new row)?

Carl
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7aWf0ACgkQ8W4rlRKtE2dOowCfRvvpxE3dCAou2Ag/NGzl1t94
8PQAn33d9xdfpEQfcORUAkMER+PEvh1B
=Rhc7
-----END PGP SIGNATURE-----

Carl Meyer

unread,
Dec 3, 2011, 12:20:51 PM12/3/11
to django-d...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/03/2011 10:18 AM, Carl Meyer wrote:
> So what about admin users who are currently relying on this behavior as
> a way to clone objects with natural PKs (given that the
> save-and-add-another button is useless with natural PKs unless you can
> explicitly give the value for the new row)?

Er, I meant save-as-new, of course.

Carl
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk7aWnMACgkQ8W4rlRKtE2dI0QCgq3JDRwTGT6D2li3V9jw7IJuJ
F20AoJTKWP9vQuHNVzo7Us/Vw/mey5os
=om2k
-----END PGP SIGNATURE-----

Anssi Kääriäinen

unread,
Dec 3, 2011, 1:58:34 PM12/3/11
to Django developers
On Dec 3, 7:18 pm, Carl Meyer <c...@oddbird.net> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/03/2011 02:13 AM, Anssi K ri inen wrote:
>
> > Admin should be fixed [#2259]. Making PK fields non-editable in
> > ModelForms would be good, too. Is it OK to consider the current
> > ModelForm behavior a bug, or will it need a deprecation cycle?
>
> I think it would need a deprecation cycle. People certainly could be
> making use of the current behavior (and there would need to be a way to
> get the PK field back into the form, e.g. by explicitly listing it in
> "fields"). Excluding PK from ModelForm by default isn't clearly fixing a
> bug, it's more protecting people from unintuitive behavior.
>
> (Note that in admin we have the option to display it read-only by
> default; ModelForm has no such feature, we'd have to exclude it entirely
> by default).

This turns out to be a show-stopper. I think that in admin we could
just make the field non-editable when editing an object, but editable
when saving a new object. Save-as-new would be a problem, but probably
some workaround could be found.

But for ModelForms, what to do? You would want to have the same
behavior as in admin: editable when adding, but non-editable when
editing an existing object. But the problem is, we do not have the
ability to show the value as non-editable. Could this be implemented
by a html attribute editable="false"? Ugly, but should work for most
cases.

I do agree this needs a deprecation cycle.

> > It is OK that Django doesn't support primary key updates. But if it
> > doesn't support update of PK, then lets error out on primary key
> > change. Current behavior of .save() is actually: "save the object to
> > DB, except when PK has changed, then do a clone". That is a bad API
> > for natural primary keys.
>
> > About breaking current code: my intention is to have a flag to .save()
> > which would allow current code to work. .save(clone=True) would get
> > current behavior back. Setting the PK to None and doing a save will
> > work for AutoField PKs.
>
> So what about admin users who are currently relying on this behavior as
> a way to clone objects with natural PKs (given that the
> save-and-add-another button is useless with natural PKs unless you can
> explicitly give the value for the new row)?

Javascript magic? save-as-new disabled for natural PKs, instead you
get a "copy" button which opens another window? In other words, good
ideas welcome...

- Anssi

Stephen Burrows

unread,
Dec 5, 2011, 4:43:06 AM12/5/11
to Django developers
I've often longed for a "read-only" concept in ModelForms. For
example, if I want the "site" field to always have a certain value
(i.e. the current site) but also want to enforce multiple-column
uniqueness (i.e. slug/site). I can write it myself, but it always
feels like I shouldn't have to. Probably since the admin already
provides that functionality.

Point being, that's something that, if it is implemented, should
probably be implemented separately from the pk stuff which is
currently being discussed.

Personally, I haven't had much use for natural primary keys since I
started using Django. (Though probably that's because it's so easy in
django to use the AutoField pk and because so many things expect
integer pks.) Are there a lot of people who use them?

--Stephen

Anssi Kääriäinen

unread,
Dec 5, 2011, 5:50:13 AM12/5/11
to Django developers
On Dec 5, 11:43 am, Stephen Burrows <stephen.r.burr...@gmail.com>
wrote:

> I've often longed for a "read-only" concept in ModelForms. For
> example, if I want the "site" field to always have a certain value
> (i.e. the current site) but also want to enforce multiple-column
> uniqueness (i.e. slug/site). I can write it myself, but it always
> feels like I shouldn't have to. Probably since the admin already
> provides that functionality.

Me too. The good news is that it should be relatively easy to
implement this after template based form rendering is implemented. You
get the read-only widgets nearly for free. There is some added logic
in the forms itself to handle the case of "do not handle data for this
field", although that might boil down to "if field.read_only:
continue" in a couple of places.

The bad news is that template based rendering is blocked on faster
template system. A test I did some time ago showed nearly 10x slower
rendering in the worst case using template rendering compared to
current "render in Python" implementation. The funny thing was that
Jinja2 was actually _faster_ than current implementation, although
that might be because of different escaping and localization.

> Point being, that's something that, if it is implemented, should
> probably be implemented separately from the pk stuff which is
> currently being discussed.

Agreed.

> Personally, I haven't had much use for natural primary keys since I
> started using Django. (Though probably that's because it's so easy in
> django to use the AutoField pk and because so many things expect
> integer pks.) Are there a lot of people who use them?

There currently aren't that many users, and that is why the current
behaviour of save cloning the object isn't a big problem. This will
change if natural primary keys are included in Django. By definition
they are part of the data model, and in most cases subject to change.
Even if they are not subject to change, Django doesn't make it easy to
enforce that (no support for read-only form fields, and save doesn't
guard against accidental changes).

So, my opinion is:
- It is a good idea to block accidental PK changes in .save(). I
think I have a working solution in #17332
- It is not a good idea to do this before read-only form fields are
introduced.
- It is a good idea to have as little state as possible for model
instances. If first item is implemented, it unfortunately requires
tracking of PK state.
- It is a good idea to allow custom subclasses to track more state
if they wish to do so. The patch in 17332 should make this easier to
do (a little change needed, though: self._state = self._state_class()
instead of self._state = ModelState()).

In conclusion: I think the best thing to do is visit this subject
again during 1.5 development cycle.

- Anssi

Reply all
Reply to author
Forward
0 new messages