reconsider re-opening ticket 901

392 views
Skip to first unread message

Wim Feijen

unread,
May 11, 2013, 4:03:20 AM5/11/13
to django-d...@googlegroups.com
Hi,

Following up on the discussion on:

I'd like to start a clear discussion about re-opening ticket 901.

Ticket 901 was marked 7 years ago as won't fix and has been subject of debate since.

I must say that I for myself was looking for such a method until I found that foo = Foo.objects.get(id=foo.id) would work; which did not seem obvious to me then and now.

Brandon put forward an argument, which is: 
reload() would cover an important use case that did not get a hearing in the above discussion: that you might already have handed the object off to some other library, which is keeping a reference from somewhere deep in its data structures for use when you make a final call to it. If one or more data structures that are being built already hold the object, then it is not reasonable to have to track down the object and try to swap in a new copy everywhere. There should be a simple way to have the existing object's attributes update in-place.

and Anssi thinks it is a good idea to implement this.

My question is, can we please re-open this ticket and/or discuss here what are the benefits and drawbacks of having a method reload() ?

For clarity, this question is in no way meant to be offensive. And if you want to discuss personal views about this ticket or the behaviour of django community in general, I politely request you to do so at: https://groups.google.com/forum/?fromgroups=#!topic/django-developers/DUQtBrM2iTs

Best regards, Wim

Anssi Kääriäinen

unread,
May 11, 2013, 7:09:00 AM5/11/13
to Django developers
On 11 touko, 11:03, Wim Feijen <w...@go2people.nl> wrote:
> Hi,
>
> Following up on the discussion on:https://groups.google.com/forum/?fromgroups=#!topic/django-developers...
>
> I'd like to start a clear discussion about re-opening ticket 901.https://code.djangoproject.com/ticket/901
>
> Ticket 901 was marked 7 years ago as won't fix and has been subject of
> debate since.
>
> I must say that I for myself was looking for such a method until I found
> that foo = Foo.objects.get(id=foo.id) would work; which did not seem
> obvious to me then and now.
>
> Brandon put forward an argument, which is:
> reload() would cover an important use case that did not get a hearing in
> the above discussion: that you might already have handed the object off to
> some other library, which is keeping a reference from somewhere deep in its
> data structures for use when you make a final call to it. If one or more
> data structures that are being built already hold the object, then it is
> not reasonable to have to track down the object and try to swap in a new
> copy everywhere. There should be a simple way to have the existing object's
> attributes update in-place.
>
> and Anssi thinks it is a good idea to implement this.

The reasons why I think a reload() or refresh() method is a good idea:
- The above mentioned fact that if foo is already in data structure,
then you can't just reassign to foo. This is more and more an issue in
Django itself due to caching of related objects and
prefetch_related().
- If you are in some of the model's methods you can't do self =
qs.get(pk=self.pk), more generally reassignment in any method/function
will not refresh the object in the caller of the method/function.
- We could implement deferred loading using reload/refresh(), and
thus allow for a way to customize how deferred loading happens. An
example is the idea of loading all fields at a time instead of the
current default of loading one deferred field at a time.
- This could also work as a way for allowing related objects cache
invalidation. The related objects are cached on first access, and
after that you don't have any way to force reload of the cached
object. (Maybe separate issue to be dealt with later on. But refresh()
as a name, and then eventually adding cached object invalidation seems
like a good idea to me.)
- While it might seem trivial to implement reload()/refresh() when
needed, there are some bugs waiting to happen:
- Not going through setattr() - if you assign to self.__dict__
directly you might miss __setattr__ or a descriptor's __set__ (this is
a problem for custom fields defining to_python() for example).
- Reloading fields for deferred models with qs.values() doesn't
work correctly. Django uses descriptors for both to_python()
conversion and for deferred fields implementation, so if you use
values() to_python() will not be called for deferred fields as the
deferred descriptor overrides the normal descriptor. See SubfieldBase
or FileField for problematic cases.
- Not handling field.attname/field.name distinction correctly. For
example consider self.user.pk vs self.user_id. If you reload user_id,
but don't invalidate self.user the values will not necessarily match.

> My question is, can we please re-open this ticket and/or discuss here what
> are the benefits and drawbacks of having a method reload() ?

Benefits mentioned above, the only drawback I can see is the addition
of one more method to Model API. So, +1 from me to the feature.

- Anssi

Shai Berger

unread,
May 11, 2013, 5:46:00 PM5/11/13
to django-d...@googlegroups.com
On Saturday 11 May 2013, Anssi Kääriäinen wrote:
>
> - While it might seem trivial to implement reload()/refresh() when
> needed,

Indeed, it might; which is why I would assume many people have already done
it. For these people, a new reload()/refresh() in core may go unnoticed or
redundantly overridden. In the "test discovery" thread, Russel has mentioned
an "upgrade checks" proposal[0] -- if it comes to fruition, and this ticket is
reopened, I suggest that a check for pre-existing reoad/refresh methods on
models be included in the upgrade check.

Shai.

Shai Berger

unread,
May 11, 2013, 5:49:02 PM5/11/13
to django-d...@googlegroups.com
[resend, reference included; sorrt for the noise]
[0] https://groups.google.com/d/msg/django-developers/0xuznNH7aP0/CCqyDTpWSj4J

Russell Keith-Magee

unread,
May 11, 2013, 7:55:45 PM5/11/13
to django-d...@googlegroups.com
On Sat, May 11, 2013 at 4:03 PM, Wim Feijen <w...@go2people.nl> wrote:
Hi,

Following up on the discussion on:

I'd like to start a clear discussion about re-opening ticket 901.

Ticket 901 was marked 7 years ago as won't fix and has been subject of debate since.

I must say that I for myself was looking for such a method until I found that foo = Foo.objects.get(id=foo.id) would work; which did not seem obvious to me then and now.

The existence of this workaround is probably the single biggest technical reason that this ticket hasn't made any significant progress.
 
Brandon put forward an argument, which is: 
reload() would cover an important use case that did not get a hearing in the above discussion: that you might already have handed the object off to some other library, which is keeping a reference from somewhere deep in its data structures for use when you make a final call to it. If one or more data structures that are being built already hold the object, then it is not reasonable to have to track down the object and try to swap in a new copy everywhere. There should be a simple way to have the existing object's attributes update in-place.

I'm sure I understand this argument. Python objects are passed around by reference, not by value, so if you've passed in a Django object deep into another library, that library will be pointing at the same instance. If the instance is changed, everywhere holding a handle to that reference will be updated.

To that end - I want to make sure that we're clear about what we're talking about here. 

What is on the table is essentially adding a refresh() call on an object instance that is an API analog of ".get(id=self.id)" (although the implementation will need to do a bit more than that).

We are *not* talking about ticket #17, object instance caching. Calling refresh() *will not* update *every* instance of a given object. The following would be the effective API:

>>> a = MyObj.objects.get(id=42)
>>> a.message = 'hello'
>>> a.save()
>>> b = MyObj.objects.get(id=42)
>>> c = MyObj.objects.get(id=42)

>>> a.message = 'goodbye'
>>> a.save()
>>> print b.message
'hello'
>>> b.refresh()
>>> print b.message
'goodbye'
>>> print c.message
'hello'

The only place I've had use for this sort of functionality is in test suites, where the test setUp configures some objects, and then a test of a view modifies that object. As a result of calling the view, the database value is changed, but the object held by the test isn't modified in place. In this situation, a call to refresh() would be a nice convenience. However, it's a minor convenience; the "get by pk" trick works fine everywhere I've had this problem.

and Anssi thinks it is a good idea to implement this.

My question is, can we please re-open this ticket and/or discuss here what are the benefits and drawbacks of having a method reload() ?

For clarity, this question is in no way meant to be offensive. And if you want to discuss personal views about this ticket or the behaviour of django community in general, I politely request you to do so at: https://groups.google.com/forum/?fromgroups=#!topic/django-developers/DUQtBrM2iTs

Wim - You've done exactly what the core team has asked - you've started a discussion. We're not opposed to discussion, and we're not opposed to saying we're wrong if someone can make a strong case. Nothing you've done or said here is offensive. 

Yours,
Russ Magee %-)
 

Andrew Ingram

unread,
May 11, 2013, 8:10:48 PM5/11/13
to django-d...@googlegroups.com
On 12 May 2013, at 00:55, Russell Keith-Magee <rus...@keith-magee.com> wrote:
I'm sure I understand this argument. Python objects are passed around by reference, not by value, so if you've passed in a Django object deep into another library, that library will be pointing at the same instance. If the instance is changed, everywhere holding a handle to that reference will be updated.

To that end - I want to make sure that we're clear about what we're talking about here. 

What is on the table is essentially adding a refresh() call on an object instance that is an API analog of ".get(id=self.id)" (although the implementation will need to do a bit more than that).

We are *not* talking about ticket #17, object instance caching. Calling refresh() *will not* update *every* instance of a given object. The following would be the effective API:

>>> a = MyObj.objects.get(id=42)
>>> a.message = 'hello'
>>> a.save()
>>> b = MyObj.objects.get(id=42)
>>> c = MyObj.objects.get(id=42)

>>> a.message = 'goodbye'
>>> a.save()
>>> print b.message
'hello'
>>> b.refresh()
>>> print b.message
'goodbye'
>>> print c.message
'hello'

I was under the impression that the suggested use case was more like this:

>>> a = MyObj.objects.get(id=42)
>>> a.message = 'hello'
>>> b = {'obj': a}
>>> print b['obj'].message
'hello'
>>> c = MyObj.objects.get(id=42)
>>> c.message = 'goodbye'
>>> c.save()
>>> a = MyObj.objects.get(id=42)  # existing 'workaround', gives us a new instance
>>> print a.message
'goodbye'
>>> print b['obj'].message  # b['obj'] still points to the original instance `a`
'hello'

If a reload() method existed, `a` could have been reloaded with `a.reload()`, and then `b['obj']` would have been up-to-date as well. As it stands, every reference to `a` has to be explicitly re-fetched from the database.

Am I missing something obvious?


- Andy

Alex Ogier

unread,
May 11, 2013, 9:02:26 PM5/11/13
to django-d...@googlegroups.com
On Sat, May 11, 2013 at 7:55 PM, Russell Keith-Magee <rus...@keith-magee.com> wrote:
I'm sure I understand this argument. Python objects are passed around by reference, not by value, so if you've passed in a Django object deep into another library, that library will be pointing at the same instance. If the instance is changed, everywhere holding a handle to that reference will be updated.

Yes, but that is not what foo = MyObj.objects.get(id=foo.id) does. That assigns a new reference to a new object to the name foo. Other references, even references to the same instance as foo, are not refreshed. This is precisely why a .reload() method is useful -- what is wanted is to mutate the instance that is referred to, not to create a new reference and assign it to the old name.

To that end - I want to make sure that we're clear about what we're talking about here. 

What is on the table is essentially adding a refresh() call on an object instance that is an API analog of ".get(id=self.id)" (although the implementation will need to do a bit more than that).

It's not really the same at all, .reload() or .refresh() would the existing instance in place. It's more akin to foo.__dict__ = MyObj.objects.get(id=foo.id).__dict__

Best,
Alex Ogier

Anssi Kääriäinen

unread,
May 11, 2013, 9:36:00 PM5/11/13
to Django developers
On 12 touko, 02:55, Russell Keith-Magee <russ...@keith-magee.com>
wrote:
> To that end - I want to make sure that we're clear about what we're talking
> about here.
>
> What is on the table is essentially adding a refresh() call on an object
> instance that is an API analog of ".get(id=self.id)" (although the
> implementation will need to do a bit more than that).

Concrete API proposal: Model.refresh() reloads all non-deferred local
field values (that is, all fields in the current model which have a
database column). In addition refresh() will make sure that cached
values dependent of the reloaded values will be cleared. That is, if
you do foo.refresh(), then when foo.user_id is reloaded foo.user will
be cleared (assuming the reloaded user_id is different from the
original value). This is to ensure that next access to foo.user will
reload the related object, too.

The refresh() method makes sure descriptors are honored, for example
custom field's to_python() method will be called for refreshed values.

The refresh() method accepts *args, so that one can specify which
fields to reload. This is useful so that deferred attribute loading
can use refresh(), and by overriding refresh() it will be possible to
customize how deferred loading happens.

Deferred field loading is altered to use .refresh().

It might be useful to add model.deferred_fields property. The property
returns a set of currently deferred fields in the instance. If this
was available it would be easy to force reload of all currently
deferred fields, but no reload of other fields. This would also be
useful for loading all deferred fields when any deferred field is
accessed. But this can wait for later...

Does the proposal look acceptable?

- Anssi

Shai Berger

unread,
May 12, 2013, 4:24:53 AM5/12/13
to django-d...@googlegroups.com
Hi,

There's one minor issue that I'm not entirely clear of with this proposal:

On Sunday 12 May 2013, Anssi Kääriäinen wrote:
>
> Concrete API proposal: Model.refresh() reloads all non-deferred local
> field values (that is, all fields in the current model which have a
> database column). In addition refresh() will make sure that cached
> values dependent of the reloaded values will be cleared. That is, if
> you do foo.refresh(), then when foo.user_id is reloaded foo.user will
> be cleared (assuming the reloaded user_id is different from the
> original value). This is to ensure that next access to foo.user will
> reload the related object, too.
>

The limitation (assuming the reloaded user_id is different from the
original value) does make sense; removing it also makes sense -- then, after
foo.refresh(), foo.user will always get the updated user object (and always
incur a database hit).

Relatedly, we now cache back-references of 1-to-1 relations on the instance;
those should probably be updated (on remote objects!) too, or else some
serious inconsistencies may be created (when the _id field changes, that is).

Other than that, +1.

Shai.

Aymeric Augustin

unread,
May 12, 2013, 12:15:22 PM5/12/13
to django-d...@googlegroups.com
On 12 mai 2013, at 03:36, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

> Concrete API proposal: Model.refresh() reloads all non-deferred local
> field values (that is, all fields in the current model which have a
> database column). In addition refresh() will make sure that cached
> values dependent of the reloaded values will be cleared. That is, if
> you do foo.refresh(), then when foo.user_id is reloaded foo.user will
> be cleared (assuming the reloaded user_id is different from the
> original value). This is to ensure that next access to foo.user will
> reload the related object, too.
>
> The refresh() method makes sure descriptors are honored, for example
> custom field's to_python() method will be called for refreshed values.

That sounds good to me.

> The refresh() method accepts *args, so that one can specify which
> fields to reload. This is useful so that deferred attribute loading
> can use refresh(), and by overriding refresh() it will be possible to
> customize how deferred loading happens.
>
> Deferred field loading is altered to use .refresh().

I'm not totally following you here. I suppose it would make sense if I were
more familiar with the implementation.

> It might be useful to add model.deferred_fields property. The property
> returns a set of currently deferred fields in the instance. If this
> was available it would be easy to force reload of all currently
> deferred fields, but no reload of other fields. This would also be
> useful for loading all deferred fields when any deferred field is
> accessed. But this can wait for later…

Yes, that would be an internal API anyway.

> Does the proposal look acceptable?

Yes.

My only objection to this ticket was the lack of a concrete proposal!

--
Aymeric.

Aymeric Augustin

unread,
May 12, 2013, 12:27:37 PM5/12/13
to django-d...@googlegroups.com
On 12 mai 2013, at 10:24, Shai Berger <sh...@platonix.com> wrote:

> Relatedly, we now cache back-references of 1-to-1 relations on the instance

Django has cached them for a long time. It's just a bit more efficient and
deterministic as of Django 1.5. :)

> those should probably be updated (on remote objects!) too, or else some
> serious inconsistencies may be created (when the _id field changes, that is).

Let's say model A has a one-to-one relation to model B — ie. a has a b_id.
The reverse relation is b.a, and you're discussing b.refresh().

Since b will be refreshed in-place, a.b will point to an updated instance of
b, without any special handling.

For consistency, the cached value of b.a could be cleared as well. However, it
doesn't really depend on b itself. It only depends on the PK, and PKs are
immutable (changing the PK makes a copy). If it turns out to be easier not to
clear that value, I'll happily leave this decision to the person who
implements the patch!

--
Aymeric.



Shai Berger

unread,
May 12, 2013, 1:10:10 PM5/12/13
to django-d...@googlegroups.com
On Sunday 12 May 2013, Aymeric Augustin wrote:
> On 12 mai 2013, at 10:24, Shai Berger <sh...@platonix.com> wrote:
> > Relatedly, we now cache back-references of 1-to-1 relations on the
> > instance
>
> Django has cached them for a long time. It's just a bit more efficient and
> deterministic as of Django 1.5. :)
>
> > those should probably be updated (on remote objects!) too, or else some
> > serious inconsistencies may be created (when the _id field changes, that
> > is).
>
> Let's say model A has a one-to-one relation to model B — ie. a has a b_id.
> The reverse relation is b.a, and you're discussing b.refresh().
>

No, I mean:

aa= A.objects.get(...)
bb= aa.b
(some other operation, changing aa.b_id in the database)
aa.refresh()

Now -- unless we do something about it -- bb.a is still aa, but aa.b is not bb
and not even equal to it; thus, "bb.a.b == bb" is false, which currently
cannot happen without serious meddling if I'm not mistaken.

Shai.

Anssi Kääriäinen

unread,
May 12, 2013, 1:27:50 PM5/12/13
to Django developers
On 12 touko, 19:15, Aymeric Augustin
<aymeric.augus...@polytechnique.org> wrote:
> > The refresh() method accepts *args, so that one can specify which
> > fields to reload. This is useful so that deferred attribute loading
> > can use refresh(), and by overriding refresh() it will be possible to
> > customize how deferred loading happens.
>
> > Deferred field loading is altered to use .refresh().
>
> I'm not totally following you here. I suppose it would make sense if I were
> more familiar with the implementation.

The deferred attribute implementation is as follows: all deferred
attributes have a DeferredAttribute descriptor (found from
models.query_utils IIRC), so on access of foo.somefield the
descriptor's __get__() is called. The __get__() checks if the value is
already found from the instance, if not it ends up fetching the value
from DB. The fetching of the value is equivalent to refresh() of
somefield (but of course not equivalent to full refresh).

So, if you override refresh() to do this:

def refresh(self, *fields):
if set(fields).intersects(self.deferred_fields):
fields.extend(self.deferred_fields)
super().refresh(*fields)

Now, any time any deferred field is accessed (or refreshed manually),
then all of the deferred attributes are loaded in one go. This will
give a solution to Adrian's request of loading all deferred fields in
one go when one is accessed. Granted, there is no declarative way to
do this, nor is there per-queryset API. I think that cases where there
is need to customize deferred attribute loading are rare, so there is
no need for dedicated API.

In addition the DeferredAttribute implementation will be a little more
DRY.

- Anssi

Jacob Kaplan-Moss

unread,
May 12, 2013, 3:54:41 PM5/12/13
to django-developers
On Sat, May 11, 2013 at 6:36 PM, Anssi Kääriäinen
<anssi.ka...@thl.fi> wrote:
> Does the proposal look acceptable?

Yes. +1 from me.

Jacob

Tim Chase

unread,
May 12, 2013, 4:13:25 PM5/12/13
to django-d...@googlegroups.com, anssi.ka...@thl.fi
On 2013-05-11 18:36, Anssi Kääriäinen wrote:
> On 12 touko, 02:55, Russell Keith-Magee <russ...@keith-magee.com>
> > What is on the table is essentially adding a refresh() call on an
> > object instance that is an API analog of
> > ".get(id=self.id)"

I guess my minor quibble is about the name itself and possible
clashes with existing fields/methods:

class MyModelA(Model):
# ...
refresh = BooleanField(...)
# ...
class MyModelB(Model):
# ...
def refresh(...): do_something_refreshing()
# ...

a = MyModelA.objects.get(pk=some_id)
b = MyModelB.objects.get(pk=other_id)
# ...
if a.refresh: # legacy code expects a BooleanField
# whoops, what happens here under this proposal?
if b.refresh(): # legacy code expects local logic, not Django logic
# or what gets called here?

I wouldn't want to see any breakage when upgrading between versions.
I don't have a good proposal, other than perhaps _refresh() or
__refresh__(), or cramming it into Meta (I don't know if that would
even work).

-tkc



Luke Sneeringer

unread,
May 12, 2013, 4:35:51 PM5/12/13
to django-d...@googlegroups.com, django-d...@googlegroups.com, anssi.ka...@thl.fi
In MyModelB, at least, the subclass' refresh method would win, since it's the subclass.

I'm not sure about MyModelA, since I am not quite sure how the metaclass' processing would intersect.

That said, if there's demand for the feature, it's probably worth this cost. (Bias: I would use it if it existed, although my use case differs from the discussion here.)

Sent from my iPad
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Russell Keith-Magee

unread,
May 12, 2013, 10:54:32 PM5/12/13
to django-d...@googlegroups.com
On Sun, May 12, 2013 at 9:02 AM, Alex Ogier <alex....@gmail.com> wrote:
On Sat, May 11, 2013 at 7:55 PM, Russell Keith-Magee <rus...@keith-magee.com> wrote:
I'm sure I understand this argument. Python objects are passed around by reference, not by value, so if you've passed in a Django object deep into another library, that library will be pointing at the same instance. If the instance is changed, everywhere holding a handle to that reference will be updated.

Yes, but that is not what foo = MyObj.objects.get(id=foo.id) does. That assigns a new reference to a new object to the name foo. Other references, even references to the same instance as foo, are not refreshed. This is precisely why a .reload() method is useful -- what is wanted is to mutate the instance that is referred to, not to create a new reference and assign it to the old name.

Understood - I was just using ".get(id=self.id)" as a shorthand for what the underlying method will effectively be doing. I know the implementation will be a bit more subtle than that.
 
Yours,
Russ Magee %-)

Russell Keith-Magee

unread,
May 12, 2013, 11:00:47 PM5/12/13
to django-d...@googlegroups.com
+1 from me. 

The use case for .refresh(*args) being used to pull back multiple deferred fields is the real deal-maker for me, because it dovetails really nicely with something Adrian posted about a month ago:

 
It seems to me that ".refresh()" is remarkably close to the "get everything that was deferred" behaviour that Adrian was looking to add. The fact that it can also be used to update an instance to the current database state is a nice side effect in this case.

Russ %-)

Chris Wilson

unread,
May 12, 2013, 3:05:01 PM5/12/13
to django-d...@googlegroups.com
Hi all,

On Sun, 12 May 2013, Shai Berger wrote:

>>> those should probably be updated (on remote objects!) too, or else
>>> some serious inconsistencies may be created (when the _id field
>>> changes, that is).
>
> aa= A.objects.get(...)
> bb= aa.b
> (some other operation, changing aa.b_id in the database)
> aa.refresh()
>
> Now -- unless we do something about it -- bb.a is still aa, but aa.b is not bb
> and not even equal to it; thus, "bb.a.b == bb" is false, which currently
> cannot happen without serious meddling if I'm not mistaken.

Hibernate (as an implementation of the Java Persistence API, JPA) tries to
do things like this by implementing a global cache of objects loaded from
the database, so that there's only one copy of A and B in memory, which
all references point to.

This makes it possible for A, when its link to B changes to point to a
different object B', to find all the instances of B in memory (because
there's only one, and it's in the object cache) and update them to no
longer point back to A.

However it does this at enormous cost in flexibility, sanity and error
recovery. I strongly recommend against going down this route:
http://blog.old.aptivate.org/2010/11/26/hibernate-ejb-and-the-unique-constraint/

It's always possible to have an out-of-date model object in memory with
Django, with or without refresh() (which just allows us to update it
manually). I suggest we let that sleeping dog lie.

Cheers, Chris.
--
Aptivate | http://www.aptivate.org | Phone: +44 1223 967 838
Future Business, Cam City FC, Milton Rd, Cambridge, CB4 1UY, UK

Aptivate is a not-for-profit company registered in England and Wales
with company number 04980791.

Alex Ogier

unread,
May 13, 2013, 12:34:08 PM5/13/13
to django-d...@googlegroups.com
On Sun, May 12, 2013 at 3:05 PM, Chris Wilson <ch...@aptivate.org> wrote:
Hi all,


On Sun, 12 May 2013, Shai Berger wrote:

those should probably be updated (on remote objects!) too, or else some serious inconsistencies may be created (when the _id field changes, that is).

aa= A.objects.get(...)
bb= aa.b
(some other operation, changing aa.b_id in the database)
aa.refresh()

Now -- unless we do something about it -- bb.a is still aa, but aa.b is not bb
and not even equal to it; thus, "bb.a.b == bb" is false, which currently
cannot happen without serious meddling if I'm not mistaken.

Hibernate (as an implementation of the Java Persistence API, JPA) tries to do things like this by implementing a global cache of objects loaded from the database, so that there's only one copy of A and B in memory, which all references point to.

This makes it possible for A, when its link to B changes to point to a different object B', to find all the instances of B in memory (because there's only one, and it's in the object cache) and update them to no longer point back to A.

However it does this at enormous cost in flexibility, sanity and error recovery. I strongly recommend against going down this route:
http://blog.old.aptivate.org/2010/11/26/hibernate-ejb-and-the-unique-constraint/

It's always possible to have an out-of-date model object in memory with Django, with or without refresh() (which just allows us to update it manually). I suggest we let that sleeping dog lie.

It does make sense to do this though, in the particular case of one-to-one relationships (for the same reasons it makes sense to cache back-references in one-to-one relationships). The reason is that this maintains a python-only invariant that is independent of whether the relationship is persistent in the database. There's no attempt to update all model instances of the referred-to model, only the single one that is currently referred to by the Python instance. This is consistent with the behavior of setting a OneToOneField to None. For example consider the following:

bb = B()
bb.save()
aa = A(b=bb)
aa.save()
assert aa.b is bb
assert bb.a is aa   # back-reference is cached

aa.b = None   # back-reference is cleared
assert aa.b is None
assert bb.a is None

This should be consistent with the behavior when .refresh() is called -- if that involves clearing aa.b, then bb.a should also be cleared, it's that simple. Ideally this is done by just setting all fields via setattr(), which will trigger this code (and any arbitrary code for user-defined fields Django doesn't know about). In short the following should be equivalent to the previous example:

bb = B()
bb.save()
aa = A(b=bb)
aa.save()
assert aa.b is bb
assert bb.a is aa   # back-reference is cached

aaa = A.objects.get(id=aa.id)
aaa.b = None
aaa.save()

aa.refresh()   # back-reference is cleared
assert aa.b is None
assert bb.a is None

Hopefully that clears things up.

Best,
Alex Ogier

Russell Keith-Magee

unread,
May 14, 2013, 12:09:04 AM5/14/13
to django-d...@googlegroups.com
As far as I can make out, that's exactly the same use case -- you're just making the example a little bit more explicit regarding the fact that 'a' is an object referred to by reference, so if 'a' is updated, b['a'] will also be updated.

The point I was trying to make is that we're *not* talking about every instance of "object with PK 42" being transparently updated. It's an explicit API call on a specific object instance.

Russ %-)

Alex Ogier

unread,
May 14, 2013, 6:15:46 AM5/14/13
to django-d...@googlegroups.com
On Tue, May 14, 2013 at 12:09 AM, Russell Keith-Magee <rus...@keith-magee.com> wrote:
As far as I can make out, that's exactly the same use case -- you're just making the example a little bit more explicit regarding the fact that 'a' is an object referred to by reference, so if 'a' is updated, b['a'] will also be updated.

The point I was trying to make is that we're *not* talking about every instance of "object with PK 42" being transparently updated. It's an explicit API call on a specific object instance.

I think what he is saying is that this is not just shorthand for some other way of achieving the same behavior. It's a totally new behavior that has plenty of corner cases such as foreign keys, and especially OneToOneFields.

This is the shortest workaround I can come up with for the .refresh() method: "a.__dict__ = dict(a.__dict__).update(type(a).objects.get(**{a._meta.pk.name: a.pk}).__dict__)". Unfortunately, that clobbers any non-field attributes you've changed since you created the instance (though it preserves new ones). Alternatively, you could iterate over Meta.fields calling "setattr(old_a, field_name, getattr(new_a, field_name))" for each one, but that has the problem that field setters may expect to do some massaging of the data on the way in which can cause data corruption. Basically this is a Hard Problem(tm) with no obvious workaround.

Best,
Alex Ogier

Shai Berger

unread,
May 14, 2013, 7:38:33 AM5/14/13
to django-d...@googlegroups.com
On Tuesday 14 May 2013, Alex Ogier wrote:
>
> It's a totally new behavior that has
> plenty of corner cases such as foreign keys, and especially OneToOneFields.
>
Another one is initializers: get() or any other method of fetching an object
from the database will call __init__() with the fields as *args; this allows a
model to manipulate these fields or calculate derived fields or whatever.
refresh() may put such objects in an inconsistent state by default -- to
prevent this, we would need a rule along the lines of "every model that
overrides __init__() also needs to override refresh()".

Shai.

Alex Ogier

unread,
May 14, 2013, 10:43:23 AM5/14/13
to django-d...@googlegroups.com
There's also a systemic problem with any field that contains a python reference. What happens when the value mutates in the database? Do you create new instances of the value? What if it *didn't* mutate, how do you know that?

Imagine a hypothetical python dict field, that serializes a Python dict to JSON or a string or something and stores it in the database. What happens in the following case?

a = A(dict_field={"hello": "world"})
d = a.dict_field
a.save()
a.refresh()
d["hello"] = "planet"    # does this mutate a.dict_field? does the answer change if somebody changed the database in between saving and refreshing?

In the case of the OneToOneField, we need that the related Python instance not be changed if the value didn't change, or else we will break the one-to-one correspondence of the Python instances (two related one-to-one models point back to our model). There doesn't really seem to be any way to do that generally, so you probably have to create new instances every time even when the value doesn't change except in this special case. Are OneToOneFields special enough to warrant that? What about just your average many-to-one relationship ("b = a.b_set[0]; a.refresh(); assert b not in a.b_set")?

Basically, having a .refresh() necessitates having mutation semantics on fields, where previously fields (not including deferred fields) were always immutable. Either you are loading a new instance from the database, or you are mutating a python value and re-serializing it to the database format. You never have to reconcile python mutations with database mutations. Even deferred fields have only one allowed mutation - from uninitialized to initialized.

Best,
Alex Ogier

Tim Chase

unread,
May 14, 2013, 11:07:25 AM5/14/13
to django-d...@googlegroups.com, alex....@gmail.com
On 2013-05-14 10:43, Alex Ogier wrote:
> What happens in the following case?
>
> a = A(dict_field={"hello": "world"})
> d = a.dict_field
> a.save()
> a.refresh()
> d["hello"] = "planet" # does this mutate a.dict_field? does the
> answer change if somebody changed the database in between saving
> and refreshing?

I'd expect the same as the stdlib's pickle module: when you load an
object from a pickled stream (what I'd consider analogous to
the .refresh() topic at hand), you create a new object. So "d" would
still reference the original dict_field, while the a.dict_field would
contain a new dict object that doesn't contain "hello" as a key.

It is easy to explain, consistent with the stdlib, and unlikely to
harbor strange edge conditions once you understand what it's doing.
It also allows for doing things like

d = a.dict_field
a.save()
# semi-lengthy-process
a.refresh()
if a.dict_field != d:
somebody_changed_something_while_we_were_occupied("!!!")

Prudent? Maybe, maybe not. Believable? sure. But if "d"
auto-updates upon refresh, there's no easy way to make this test
(short of doing a deep-copy of "d" to preserve it).

I think DB-trumps-local is pretty sensible in both cases you describe
(above, and the OneToOne), that if you reload, you get what the DB
tells you and you're responsible for refreshing any internal
references you have reaching into the object. As our preschooler has
learned, "you get what you get, and you don't throw a fit." :-)

-tkc




Anssi Kääriäinen

unread,
May 14, 2013, 1:03:14 PM5/14/13
to Django developers
IMO we should always create new instances for mutable field values.
Else we are in for huge amount of complexity. An exception is cached
related instances. For those an useful optimization is to not throw
them away if the related object's local id hasn't changed. The user
can refresh() the related instances if need be, but there is no way to
keep them if they are automatically thrown out... And if the local id
has changed, then the local cached value + the remote cached value for
reverse o2o field should be cleared. Next access will fetch a fresh
object from DB, but refresh() itself wont do that.

As for how to implement the feature on technical level: I think we
will need to go through get() of new instance and then either direct
assignment to __dict__ for each fetched field, or use setattr() for
each fetched field. Getting a new instance is a good idea so that pre/
post_init signals are called (some fields rely on those). Direct
__dict__ assignment is what deferred field loading does currently so
that might be better idea than setattr(). And setattr() was already
called, but for different instance.

We should likely try to rely less on __init__ in general, and have a
different path for model instance creation when loading from DB.
Direct assignment to the object's __dict__ would make DB loading work
more like unpickle. See https://code.djangoproject.com/ticket/19501
for details.

- Anssi
Reply all
Reply to author
Forward
0 new messages