reconsider re-opening ticket 901 | Wim Feijen | 5/11/13 1:03 AM | Hi,

Following up on the earlier discussion, I'd like to start a clear discussion about re-opening ticket 901. Ticket 901 was marked wontfix seven years ago and has been a subject of debate ever since. I must say that I myself was looking for such a method until I found that foo = Foo.objects.get(id=foo.id) would work; that did not seem obvious to me then, and it still doesn't.

Brandon put forward an argument, which is: reload() would cover an important use case that did not get a hearing in the above discussion: you might already have handed the object off to some other library, which is keeping a reference from somewhere deep in its data structures for use when you make a final call to it. If one or more data structures that are being built already hold the object, then it is not reasonable to have to track down the object and try to swap in a new copy everywhere. There should be a simple way to have the existing object's attributes update in-place.

Anssi thinks it is a good idea to implement this.

My question is: can we please re-open this ticket, and/or discuss here what the benefits and drawbacks of having a reload() method are?

For clarity, this question is in no way meant to be offensive. If you want to discuss personal views about this ticket or the behaviour of the Django community in general, I politely request you to do so at: https://groups.google.com/forum/?fromgroups=#!topic/django-developers/DUQtBrM2iTs

Best regards, Wim |
Re: reconsider re-opening ticket 901 | Anssi Kääriäinen | 5/11/13 4:09 AM | On 11 touko, 11:03, Wim Feijen <w...@go2people.nl> wrote:
> Following up on the discussion on: https://groups.google.com/forum/?fromgroups=#!topic/django-developers...
>
> I'd like to start a clear discussion about re-opening ticket 901. https://code.djangoproject.com/ticket/901

The reasons why I think a reload() or refresh() method is a good idea:

- The above mentioned fact that if foo is already in a data structure, then you can't just reassign to foo. This is more and more an issue in Django itself due to caching of related objects and prefetch_related().
- If you are in one of the model's methods you can't do self = qs.get(pk=self.pk); more generally, reassignment in any method/function will not refresh the object in the caller of that method/function.
- We could implement deferred loading using reload()/refresh(), and thus allow a way to customize how deferred loading happens. An example is the idea of loading all deferred fields at once instead of the current default of loading one deferred field at a time.
- This could also work as a way to allow related object cache invalidation. The related objects are cached on first access, and after that you don't have any way to force a reload of the cached object. (Maybe a separate issue to be dealt with later on. But refresh() as a name, and then eventually adding cached object invalidation, seems like a good idea to me.)
- While it might seem trivial to implement reload()/refresh() when needed, there are some bugs waiting to happen:
  - Not going through setattr(): if you assign to self.__dict__ directly you might miss __setattr__ or a descriptor's __set__ (this is a problem for custom fields defining to_python(), for example).
  - Reloading fields for deferred models with qs.values() doesn't work correctly. Django uses descriptors both for to_python() conversion and for the deferred fields implementation, so if you use values(), to_python() will not be called for deferred fields as the deferred descriptor overrides the normal descriptor. See SubfieldBase or FileField for problematic cases.
  - Not handling the field.attname/field.name distinction correctly. For example, consider self.user.pk vs self.user_id. If you reload user_id but don't invalidate self.user, the values will not necessarily match.

The benefits are mentioned above; the only drawback I can see is the addition of one more method to the Model API. So, +1 from me for the feature.

- Anssi |
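For illustration, a hand-rolled reload along these lines runs into exactly those traps (a sketch only, not existing Django API; the helper name is made up):

    # Naive in-place reload -- roughly what people hand-roll today.
    def naive_reload(instance):
        attnames = [f.attname for f in instance._meta.fields]
        raw = type(instance)._default_manager.filter(pk=instance.pk).values(*attnames)[0]
        for attname, value in raw.items():
            # Trap 1: writing straight into __dict__ skips __setattr__ and any
            # field descriptor's __set__, so a custom field's to_python()
            # conversion never runs on these raw values.
            instance.__dict__[attname] = value
        # Trap 2: only field.attname (e.g. user_id) was refreshed; a cached
        # related object under field.name (e.g. instance.user) may now be
        # stale and silently disagree with instance.user_id.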
Re: reconsider re-opening ticket 901 | Shai Berger | 5/11/13 2:46 PM | On Saturday 11 May 2013, Anssi Kääriäinen wrote:
Indeed, it might; which is why I would assume many people have already done it. For these people, a new reload()/refresh() in core may go unnoticed or be redundantly overridden. In the "test discovery" thread, Russell has mentioned an "upgrade checks" proposal[0] -- if it comes to fruition, and this ticket is reopened, I suggest that a check for pre-existing reload()/refresh() methods on models be included in the upgrade check. Shai. |
Re: reconsider re-opening ticket 901 | Shai Berger | 5/11/13 2:49 PM | [resend, reference included; sorry for the noise]
[0] https://groups.google.com/d/msg/django-developers/0xuznNH7aP0/CCqyDTpWSj4J |
Re: reconsider re-opening ticket 901 | Russell Keith-Magee | 5/11/13 4:55 PM | On Sat, May 11, 2013 at 4:03 PM, Wim Feijen <w...@go2people.nl> wrote:

Hi,

The existence of this workaround is probably the single biggest technical reason that this ticket hasn't made any significant progress.
I'm not sure I understand this argument. Python objects are passed around by reference, not by value, so if you've passed a Django object deep into another library, that library will be pointing at the same instance. If the instance is changed, everything holding a reference to that instance will see the update.
To that end - I want to make sure that we're clear about what we're talking about here. What is on the table is essentially adding a refresh() call on an object instance that is an API analog of ".get(id=self.id)" (although the implementation will need to do a bit more than that).
We are *not* talking about ticket #17, object instance caching. Calling refresh() *will not* update *every* instance of a given object. The following would be the effective API:

>>> a = MyObj.objects.get(id=42)
>>> a.message = 'hello'
>>> a.save()
>>> b = MyObj.objects.get(id=42)
>>> c = MyObj.objects.get(id=42)
>>> a.message = 'goodbye'
>>> a.save()
>>> print b.message
'hello'
>>> b.refresh()
>>> print b.message
'goodbye'
>>> print c.message
'hello'

The only place I've had use for this sort of functionality is in test suites, where the test setUp configures some objects, and then a test of a view modifies that object. As a result of calling the view, the database value is changed, but the object held by the test isn't modified in place. In this situation, a call to refresh() would be a nice convenience. However, it's a minor convenience; the "get by pk" trick works fine everywhere I've had this problem.
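In test code that pattern reads roughly as follows (a sketch only: refresh() is the proposed method, and MyObj, its module and the URL are made-up names):

    from django.test import TestCase
    from myapp.models import MyObj  # hypothetical model

    class MyObjViewTests(TestCase):
        def test_view_updates_message(self):
            obj = MyObj.objects.create(message='hello')
            # The view under test changes the row in the database.
            self.client.post('/myobj/%d/edit/' % obj.pk, {'message': 'goodbye'})
            # Today's workaround: re-fetch by primary key.
            obj = MyObj.objects.get(pk=obj.pk)
            # With the proposal this would simply be: obj.refresh()
            self.assertEqual(obj.message, 'goodbye')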
Wim - You've done exactly what the core team has asked - you've started a discussion. We're not opposed to discussion, and we're not opposed to saying we're wrong if someone can make a strong case. Nothing you've done or said here is offensive.
Yours, Russ Magee %-) |
Re: reconsider re-opening ticket 901 | Andrew Ingram | 5/11/13 5:10 PM |
I was under the impression that the suggested use case was more like this:
>>> b = {'obj': a}
>>> print b['obj'].message
'hello'
>>> c = MyObj.objects.get(id=42)
>>> c.message = 'goodbye'
>>> c.save()
>>> a = MyObj.objects.get(id=42)  # existing 'workaround', gives us a new instance
>>> print a.message
'goodbye'
>>> print b['obj'].message  # b['obj'] still points to the original instance `a`
'hello'

If a reload() method existed, `a` could have been reloaded with `a.reload()`, and then `b['obj']` would have been up-to-date as well. As it stands, every reference to `a` has to be explicitly re-fetched from the database. Am I missing something obvious? - Andy |
Re: reconsider re-opening ticket 901 | Alex Ogier | 5/11/13 6:02 PM | On Sat, May 11, 2013 at 7:55 PM, Russell Keith-Magee <rus...@keith-magee.com> wrote:
Yes, but that is not what foo = MyObj.objects.get(id=foo.id) does. That assigns a new reference to a new object to the name foo. Other references, even references to the same instance as foo, are not refreshed. This is precisely why a .reload() method is useful -- what is wanted is to mutate the instance that is referred to, not to create a new reference and assign it to the old name.
It's not really the same at all; .reload() or .refresh() would update the existing instance in place. It's more akin to foo.__dict__ = MyObj.objects.get(id=foo.id).__dict__
Best, Alex Ogier |
Re: reconsider re-opening ticket 901 | Anssi Kääriäinen | 5/11/13 6:36 PM | On 12 touko, 02:55, Russell Keith-Magee <russ...@keith-magee.com>
wrote:
> To that end - I want to make sure that we're clear about what we're talking

Concrete API proposal:

Model.refresh() reloads all non-deferred local field values (that is, all fields on the current model which have a database column). In addition, refresh() will make sure that cached values dependent on the reloaded values are cleared. That is, if you do foo.refresh(), then when foo.user_id is reloaded foo.user will be cleared (assuming the reloaded user_id is different from the original value). This ensures that the next access to foo.user will reload the related object, too.

The refresh() method makes sure descriptors are honored; for example, a custom field's to_python() method will be called for refreshed values.

The refresh() method accepts *args, so that one can specify which fields to reload. This is useful so that deferred attribute loading can use refresh(), and by overriding refresh() it will be possible to customize how deferred loading happens. Deferred field loading is altered to use .refresh().

It might be useful to add a model.deferred_fields property. The property returns the set of currently deferred fields on the instance. If this were available it would be easy to force a reload of all currently deferred fields, but no reload of other fields. This would also be useful for loading all deferred fields when any deferred field is accessed. But this can wait for later...

Does the proposal look acceptable?

- Anssi |
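To make the semantics above concrete, usage would read roughly like this (a sketch of the proposal, not existing API; Foo, name and user are illustrative names):

    foo = Foo.objects.get(pk=1)

    foo.refresh()                    # reload all non-deferred local fields in place
    foo.refresh('name', 'user_id')   # reload only the named fields

    # If the reloaded user_id differs from the old value, the cached foo.user
    # is dropped, so the next access to foo.user fetches the new related
    # object from the database.
    foo.refresh('user_id')
    foo.user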
Re: reconsider re-opening ticket 901 | Shai Berger | 5/12/13 1:24 AM | Hi,
There's one minor issue that I'm not entirely clear about with this proposal: the limitation (assuming the reloaded user_id is different from the original value) does make sense; removing it also makes sense -- then, after foo.refresh(), foo.user will always get the updated user object (and always incur a database hit). Relatedly, we now cache back-references of 1-to-1 relations on the instance; those should probably be updated (on the remote objects!) too, or else some serious inconsistencies may be created (when the _id field changes, that is). Other than that, +1. Shai. |
Re: reconsider re-opening ticket 901 | Aymeric Augustin | 5/12/13 9:15 AM | On 12 mai 2013, at 03:36, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

That sounds good to me.

I'm not totally following you here. I suppose it would make sense if I were more familiar with the implementation.

> accessed. But this can wait for later…

Yes, that would be an internal API anyway.

Yes. My only objection to this ticket was the lack of a concrete proposal!

-- Aymeric. |
Re: reconsider re-opening ticket 901 | Aymeric Augustin | 5/12/13 9:27 AM | On 12 mai 2013, at 10:24, Shai Berger <sh...@platonix.com> wrote:

Django has cached them for a long time. It's just a bit more efficient and deterministic as of Django 1.5. :)

Let's say model A has a one-to-one relation to model B, i.e. a has a b_id. The reverse relation is b.a, and you're discussing b.refresh(). Since b will be refreshed in-place, a.b will point to an updated instance of b, without any special handling. For consistency, the cached value of b.a could be cleared as well. However, it doesn't really depend on b itself. It only depends on the PK, and PKs are immutable (changing the PK makes a copy). If it turns out to be easier not to clear that value, I'll happily leave this decision to the person who implements the patch!

-- Aymeric. |
Re: reconsider re-opening ticket 901 | Shai Berger | 5/12/13 10:10 AM | On Sunday 12 May 2013, Aymeric Augustin wrote:

No, I mean:

aa = A.objects.get(...)
bb = aa.b
(some other operation, changing aa.b_id in the database)
aa.refresh()

Now -- unless we do something about it -- bb.a is still aa, but aa.b is not bb and not even equal to it; thus, "bb.a.b == bb" is false, which currently cannot happen without serious meddling, if I'm not mistaken. Shai. |
Re: reconsider re-opening ticket 901 | Anssi Kääriäinen | 5/12/13 10:27 AM | On 12 touko, 19:15, Aymeric Augustin
<aymeric.augus...@polytechnique.org> wrote:

The deferred attribute implementation is as follows: all deferred attributes have a DeferredAttribute descriptor (found in models.query_utils IIRC), so on access of foo.somefield the descriptor's __get__() is called. The __get__() checks whether the value is already present on the instance; if not, it ends up fetching the value from the DB. Fetching the value is equivalent to a refresh() of somefield (but of course not equivalent to a full refresh).

So, you could override refresh() to do this:

    def refresh(self, *fields):
        fields = set(fields)
        if fields & self.deferred_fields:
            fields |= self.deferred_fields
        super().refresh(*fields)

Now, any time any deferred field is accessed (or refreshed manually), all of the deferred attributes are loaded in one go. This would give a solution to Adrian's request of loading all deferred fields in one go when one is accessed. Granted, there is no declarative way to do this, nor is there a per-queryset API. I think cases where there is a need to customize deferred attribute loading are rare, so there is no need for a dedicated API. In addition, the DeferredAttribute implementation will be a little more DRY.

- Anssi |
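With an override like that in place, the effect at the call site would be something along these lines (a sketch of the proposed behaviour; Article and its field names are just illustrative):

    qs = Article.objects.defer('body', 'summary')
    a = qs[0]
    a.body      # accessing one deferred field triggers refresh(), which with
                # the override above also loads 'summary', instead of issuing
                # one query per deferred field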
Re: reconsider re-opening ticket 901 | Jacob Kaplan-Moss | 5/12/13 12:54 PM | On Sat, May 11, 2013 at 6:36 PM, Anssi Kääriäinen wrote:

Yes. +1 from me.

Jacob |
Re: reconsider re-opening ticket 901 | Tim Chase | 5/12/13 1:13 PM | On 2013-05-11 18:36, Anssi Kääriäinen wrote:
> > What is on the table is essentially adding a refresh() call on an

I guess my minor quibble is about the name itself and possible clashes with existing fields/methods:

    class MyModelA(Model):
        # ...
        refresh = BooleanField(...)
        # ...

    class MyModelB(Model):
        # ...
        def refresh(...):
            do_something_refreshing()
        # ...

    a = MyModelA.objects.get(pk=some_id)
    b = MyModelB.objects.get(pk=other_id)
    # ...
    if a.refresh:    # legacy code expects a BooleanField
        ...          # whoops, what happens here under this proposal?
    if b.refresh():  # legacy code expects local logic, not Django logic
        ...          # or what gets called here?

I wouldn't want to see any breakage when upgrading between versions. I don't have a good proposal, other than perhaps _refresh() or __refresh__(), or cramming it into Meta (I don't know if that would even work).

-tkc |
Re: reconsider re-opening ticket 901 | Luke Sneeringer | 5/12/13 1:35 PM | In MyModelB, at least, the subclass' refresh method would win, since it's the subclass.
I'm not sure about MyModelA, since I am not quite sure how the metaclass's processing would interact with a field named refresh. That said, if there's demand for the feature, it's probably worth this cost. (Bias: I would use it if it existed, although my use case differs from the discussion here.)

Sent from my iPad |
Re: reconsider re-opening ticket 901 | Russell Keith-Magee | 5/12/13 7:54 PM | On Sun, May 12, 2013 at 9:02 AM, Alex Ogier <alex....@gmail.com> wrote: Understood - I was just using ".get(id=self.id)" as a shorthand for what the underlying method will effectively be doing. I know the implementation will be a bit more subtle than that.
Yours, Russ Magee %-) |
Re: reconsider re-opening ticket 901 | Russell Keith-Magee | 5/12/13 8:00 PM | +1 from me. The use case for .refresh(*args) being used to pull back multiple deferred fields is the real deal-maker for me, because it dovetails really nicely with something Adrian posted about a month ago:
It seems to me that ".refresh()" is remarkably close to the "get everything that was deferred" behaviour that Adrian was looking to add. The fact that it can also be used to update an instance to the current database state is a nice side effect in this case.
Russ %-) |
Re: reconsider re-opening ticket 901 | Chris Wilson | 5/12/13 12:05 PM | Hi all,
> aa = A.objects.get(...)

Hibernate (as an implementation of the Java Persistence API, JPA) tries to do things like this by implementing a global cache of objects loaded from the database, so that there's only one copy of A and B in memory, which all references point to. This makes it possible for A, when its link to B changes to point to a different object B', to find all the instances of B in memory (because there's only one, and it's in the object cache) and update them to no longer point back to A.

However, it does this at enormous cost in flexibility, sanity and error recovery. I strongly recommend against going down this route: http://blog.old.aptivate.org/2010/11/26/hibernate-ejb-and-the-unique-constraint/

It's always possible to have an out-of-date model object in memory with Django, with or without refresh() (which just allows us to update it manually). I suggest we let that sleeping dog lie.

Cheers, Chris.

-- Aptivate | http://www.aptivate.org | Phone: +44 1223 967 838
Future Business, Cam City FC, Milton Rd, Cambridge, CB4 1UY, UK
Aptivate is a not-for-profit company registered in England and Wales with company number 04980791. |
Re: reconsider re-opening ticket 901 | Alex Ogier | 5/13/13 9:34 AM | On Sun, May 12, 2013 at 3:05 PM, Chris Wilson <ch...@aptivate.org> wrote: Hi all, It does make sense to do this though, in the particular case of one-to-one relationships (for the same reasons it makes sense to cache back-references in one-to-one relationships). The reason is that this maintains a python-only invariant that is independent of whether the relationship is persistent in the database. There's no attempt to update all model instances of the referred-to model, only the single one that is currently referred to by the Python instance. This is consistent with the behavior of setting a OneToOneField to None. For example consider the following:
bb = B()
bb.save()
aa = A(b=bb)
aa.save()
assert aa.b is bb
assert bb.a is aa   # back-reference is cached

aa.b = None         # back-reference is cleared
assert aa.b is None
assert bb.a is None

This should be consistent with the behavior when .refresh() is called -- if that involves clearing aa.b, then bb.a should also be cleared, it's that simple. Ideally this is done by just setting all fields via setattr(), which will trigger this code (and any arbitrary code for user-defined fields Django doesn't know about). In short, the following should be equivalent to the previous example:

bb = B()
bb.save()
aa = A(b=bb)
aa.save()
assert aa.b is bb
assert bb.a is aa   # back-reference is cached

aaa = A.objects.get(id=aa.id)
aaa.b = None
aaa.save()
aa.refresh()        # back-reference is cleared
assert aa.b is None
assert bb.a is None

Hopefully that clears things up.

Best, Alex Ogier |
Re: reconsider re-opening ticket 901 | Russell Keith-Magee | 5/13/13 9:09 PM | As far as I can make out, that's exactly the same use case -- you're just making the example a little bit more explicit regarding the fact that 'a' is an object referred to by reference, so if 'a' is updated, b['a'] will also be updated.
The point I was trying to make is that we're *not* talking about every instance of "object with PK 42" being transparently updated. It's an explicit API call on a specific object instance.
Russ %-) |
Re: reconsider re-opening ticket 901 | Alex Ogier | 5/14/13 3:15 AM | On Tue, May 14, 2013 at 12:09 AM, Russell Keith-Magee <rus...@keith-magee.com> wrote:
I think what he is saying is that this is not just shorthand for some other way of achieving the same behavior. It's a totally new behavior that has plenty of corner cases such as foreign keys, and especially OneToOneFields.
This is the shortest workaround I can come up with for the .refresh() method: "a.__dict__.update(type(a).objects.get(**{a._meta.pk.name: a.pk}).__dict__)". Unfortunately, that clobbers any non-field attributes you've changed since you created the instance (though it preserves new ones). Alternatively, you could iterate over a._meta.fields calling "setattr(old_a, field_name, getattr(new_a, field_name))" for each one, but that has the problem that field setters may expect to do some massaging of the data on the way in, which can cause data corruption. Basically this is a Hard Problem(tm) with no obvious workaround.
Best, Alex Ogier |
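Spelled out, the setattr() variant would look something like the following hypothetical helper (a sketch only, subject to the "massaging" caveat above):

    def refresh_via_setattr(instance):
        fresh = type(instance)._default_manager.get(pk=instance.pk)
        for field in instance._meta.fields:
            # setattr() goes through __setattr__ and field descriptors, so a
            # custom field's __set__/to_python() runs -- which is also where
            # unwanted re-conversion of already-converted values can creep in.
            setattr(instance, field.attname, getattr(fresh, field.attname))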
Re: reconsider re-opening ticket 901 | Shai Berger | 5/14/13 4:38 AM | On Tuesday 14 May 2013, Alex Ogier wrote:

Another one is initializers: get(), or any other method of fetching an object from the database, will call __init__() with the fields as *args; this allows a model to manipulate these fields or calculate derived fields or whatever. refresh() may put such objects in an inconsistent state by default -- to prevent this, we would need a rule along the lines of "every model that overrides __init__() also needs to override refresh()". Shai. |
Re: reconsider re-opening ticket 901 | Alex Ogier | 5/14/13 7:43 AM | There's also a systemic problem with any field that contains a python reference. What happens when the value mutates in the database? Do you create new instances of the value? What if it *didn't* mutate, how do you know that?
Imagine a hypothetical python dict field, that serializes a Python dict to JSON or a string or something and stores it in the database. What happens in the following case?
a = A(dict_field={"hello": "world"})
d = a.dict_field
a.save()
a.refresh()
d["hello"] = "planet"   # does this mutate a.dict_field? does the answer change
                        # if somebody changed the database in between saving and refreshing?
In the case of the OneToOneField, we need the related Python instance not to be replaced if the value didn't change, or else we will break the one-to-one correspondence of the Python instances (two instances of the related model would point back at our model). There doesn't really seem to be any general way to do that, so you probably have to create new instances every time, even when the value doesn't change -- except in this special case. Are OneToOneFields special enough to warrant that? What about just your average many-to-one relationship ("b = a.b_set.all()[0]; a.refresh(); assert b not in a.b_set.all()")?
Basically, having a .refresh() necessitates having mutation semantics on fields, where previously fields (not including deferred fields) were always immutable. Either you are loading a new instance from the database, or you are mutating a python value and re-serializing it to the database format. You never have to reconcile python mutations with database mutations. Even deferred fields have only one allowed mutation - from uninitialized to initialized.
Best, Alex Ogier |
Re: reconsider re-opening ticket 901 | Tim Chase | 5/14/13 8:07 AM | On 2013-05-14 10:43, Alex Ogier wrote:

I'd expect the same as the stdlib's pickle module: when you load an object from a pickled stream (what I'd consider analogous to the .refresh() topic at hand), you create a new object. So "d" would still reference the original dict_field, while a.dict_field would contain a new dict object that doesn't contain "hello" as a key.

It is easy to explain, consistent with the stdlib, and unlikely to harbor strange edge conditions once you understand what it's doing. It also allows for doing things like

d = a.dict_field
a.save()
# semi-lengthy process
a.refresh()
if a.dict_field != d:
    somebody_changed_something_while_we_were_occupied("!!!")

Prudent? Maybe, maybe not. Believable? Sure. But if "d" auto-updates upon refresh, there's no easy way to make this test (short of doing a deep copy of "d" to preserve it).

I think DB-trumps-local is pretty sensible in both cases you describe (above, and the OneToOne): if you reload, you get what the DB tells you, and you're responsible for refreshing any internal references you have reaching into the object. As our preschooler has learned, "you get what you get, and you don't throw a fit." :-)

-tkc |
Re: reconsider re-opening ticket 901 | Anssi Kääriäinen | 5/14/13 10:03 AM | On 14 touko, 18:07, Tim Chase <django.us...@tim.thechases.com> wrote:
> I think DB-trumps-local is pretty sensible in both cases you describe
> (above, and the OneToOne), that if you reload, you get what the DB
> tells you and you're responsible for refreshing any internal
> references you have reaching into the object. As our preschooler has
> learned, "you get what you get, and you don't throw a fit." :-)

IMO we should always create new instances for mutable field values. Otherwise we are in for a huge amount of complexity.

An exception is cached related instances. For those, a useful optimization is not to throw them away if the related object's local id hasn't changed. The user can refresh() the related instances if need be, but there is no way to keep them if they are automatically thrown out... And if the local id has changed, then the local cached value and the remote cached value for a reverse o2o field should be cleared. The next access will fetch a fresh object from the DB, but refresh() itself won't do that.

As for how to implement the feature on a technical level: I think we will need to go through get() of a new instance and then either assign directly to __dict__ for each fetched field, or use setattr() for each fetched field. Getting a new instance is a good idea so that the pre/post_init signals are called (some fields rely on those). Direct __dict__ assignment is what deferred field loading does currently, so that might be a better idea than setattr() -- and setattr() was already called, just on a different instance. We should likely try to rely less on __init__ in general, and have a different path for model instance creation when loading from the DB. Direct assignment to the object's __dict__ would make DB loading work more like unpickle. See https://code.djangoproject.com/ticket/19501 for details.

- Anssi