Why doesn't saving a related model update the _id field?

Malcolm Box
Jun 8, 2014, 10:34:21 AM
to django...@googlegroups.com
I'm confused by Django's behaviour when saving related models. Take for example:

from django.db import models

class X(models.Model):
    pass

class Y(models.Model):
    x = models.ForeignKey(X, on_delete=models.CASCADE)

Now if I create some objects (unsaved):

x = X()
y = Y(x=x)

All well so far. But now odd things happen when I save:

A) y.save() throws an integrity error because there's no PK for x

I kind of understand this, but it's not obvious to me why Django doesn't at least try to save the related object first.

B) y.x.save(); y.save() also throws an integrity error because y.x_id is None. 

However, y.x.id is not None, so I don't understand why it can't update y.x_id (and thus make the save succeed).

C) y.x.save(); y.x = y.x; y.save() - succeeds, but I don't see why the y.x = y.x is needed.

Is this a deliberate design decision, something I'm misunderstanding, or a bug/implementation artefact?

I'm running into this with serialization in Django Rest Framework - my API provides a facade over something that's actually stored across two models, so when creating the resource I want to deserialise the data into the two related models. DRF serializers by default return unsaved versions of the model, but this is broken by the above.

Any insight into what's going on and why would be much appreciated.

Cheers,

Malcolm


Russell Keith-Magee
Jun 8, 2014, 10:08:21 PM
to Django Users
On Sun, Jun 8, 2014 at 10:34 PM, Malcolm Box <mal...@tellybug.com> wrote:
> I'm confused by Django's behaviour when saving related models. Take for example:
>
> from django.db import models
>
> class X(models.Model):
>     pass
>
> class Y(models.Model):
>     x = models.ForeignKey(X, on_delete=models.CASCADE)
>
> Now if I create some objects (unsaved):
>
> x = X()
> y = Y(x=x)
>
> All well so far. But now odd things happen when I save:
>
> A) y.save() throws an integrity error because there's no PK for x
>
> I kind of understand this, but it's not obvious to me why Django doesn't at least try to save the related object first.

Ok - so how does Django decide that the related object needs to be saved? 

If it saves all related objects, then saving one object could result in a save call being invoked on every object in the database (since y points to x, which points to a, which points to b,…). I hope we can agree that a cascading save like this would be a bad idea.

If it's not *every* related object, then we need to make a decision - which ones get saved? Ok - so let's say we just save the newly created objects (i.e., objects with no primary keys).

That means that the following would work:

x = X(value=37)
y = Y(x=x)
y.save()

and on retrieval, y.x.value == 37. Sure - that makes sense. But what about:

x = X(value=37)
x.save()
x.value = 42
y = Y(x=x)
y.save()

and on retrieval, y.x.value == 37. Huh? Why? Oh - it's because in *that* case, x was already in existence, so it wasn't re-saved as a result of y being created. So now we've got inconsistent behaviour, depending on when save() has been called on an object.

The only way I can see to rectify *this* problem would be to keep a track of every value that has been modified, and save any "modified" objects. This is in the realm of the possible -- and it has been proposed in the past -- but it means carrying a lot of accounting baggage around on *every* attribute change. 
 
> B) y.x.save(); y.save() also throws an integrity error because y.x_id is None.
>
> However, y.x.id is not None, so I don't understand why it can't update y.x_id (and thus make the save succeed).
>
> C) y.x.save(); y.x = y.x; y.save() - succeeds, but I don't see why the y.x = y.x is needed.
>
> Is this a deliberate design decision, something I'm misunderstanding, or a bug/implementation artefact?

It's a deliberate design decision, for reasons that my example above hopefully makes clear. The reason the re-assignment is needed in your example is because y.x implies a query; if you directly save the original object (i.e., x.save(), not y.x.save()), you should find the reassignment isn't needed.

> I'm running into this with serialization in Django Rest Framework - my API provides a facade over something that's actually stored across two models, so when creating the resource I want to deserialise the data into the two related models. DRF serializers by default return unsaved versions of the model, but this is broken by the above.
>
> Any insight into what's going on and why would be much appreciated.

Unfortunately, I don't have enough experience with DRF to suggest a solution here.

Yours,
Russ Magee %-)

Malcolm Box
Jun 10, 2014, 7:25:19 AM
to django...@googlegroups.com
On Monday, 9 June 2014 03:08:21 UTC+1, Russell Keith-Magee wrote:
> On Sun, Jun 8, 2014 at 10:34 PM, Malcolm Box <mal...@tellybug.com> wrote:
>> I'm confused by Django's behaviour when saving related models. Take for example:
>> <snip details of classes and saving behaviour>
>> I kind of understand this, but it's not obvious to me why Django doesn't at least try to save the related object first.
>
> Ok - so how does Django decide that the related object needs to be saved?
>
> If it saves all related objects, then saving one object could result in a save call being invoked on every object in the database (since y points to x, which points to a, which points to b,…). I hope we can agree that a cascading save like this would be a bad idea.
>
> If it's not *every* related object, then we need to make a decision - which ones get saved? Ok - so let's say we just save the newly created objects (i.e., objects with no primary keys).
>
> That means that the following would work:
>
> x = X(value=37)
> y = Y(x=x)
> y.save()
>
> and on retrieval, y.x.value == 37. Sure - that makes sense. But what about:
>
> x = X(value=37)
> x.save()
> x.value = 42
> y = Y(x=x)
> y.save()
>
> and on retrieval, y.x.value == 37. Huh? Why? Oh - it's because in *that* case, x was already in existence, so it wasn't re-saved as a result of y being created. So now we've got inconsistent behaviour, depending on when save() has been called on an object.

Sure, that would be the side effect of assigning a pre-existing object. The use case I'm thinking of is creating an entire tree of un-saved objects in memory, and then having save() on the root Do The Right Thing. If that was inconsistent with assigning pre-existing objects, I could live with it.


> The only way I can see to rectify *this* problem would be to keep a track of every value that has been modified, and save any "modified" objects. This is in the realm of the possible -- and it has been proposed in the past -- but it means carrying a lot of accounting baggage around on *every* attribute change.

Surely the extra baggage is a modified flag, set on each object when any attribute is changed? Then save() can "simply" follow relationships and save any modified objects.

 
>> B) y.x.save(); y.save() also throws an integrity error because y.x_id is None.
>>
>> However, y.x.id is not None, so I don't understand why it can't update y.x_id (and thus make the save succeed).
>>
>> C) y.x.save(); y.x = y.x; y.save() - succeeds, but I don't see why the y.x = y.x is needed.
>>
>> Is this a deliberate design decision, something I'm misunderstanding, or a bug/implementation artefact?

> It's a deliberate design decision, for reasons that my example above hopefully makes clear. The reason the re-assignment is needed in your example is because y.x implies a query; if you directly save the original object (i.e., x.save(), not y.x.save()), you should find the reassignment isn't needed.

Ah, the issue I'm running into is that the point at which y.save() is called is separated from where y.x was assigned - so I no longer have a reference to the original object (except ... via y.x, so I guess I don't understand why that doesn't work the same as saving the original x...) 

Thanks for the explanation Russ, much appreciated.

Cheers,

Malcolm

Tom Evans
Jun 10, 2014, 11:49:51 AM
to django...@googlegroups.com
I almost replied to Russell's reply to make this point explicitly
clear - BDFL advice tends to be fully accurate.

It is an error to assign an unsaved object as a relation of another
object (the saved state of that object is not relevant) for the
reasons that Russell explained. I don't know why it does not raise a
runtime error at this point; it would be possible.

Once you have assigned an unsaved object, behaviour is undefined - the
behaviour is only defined when saved objects are assigned as a
relation - and therefore anything can happen after that point. When
you have undefined behaviour, DTRT is not possible (if you can do the
right thing, you can define the behaviour).

That makes trees of unsaved related objects right out!

Cheers

Tom