I have a small improvement to suggest for one-to-one fields: Make them cache back-references on related objects. That is, assume
class A(model): pass
class B(model): a = OneToOneField(A)
a1 = A.objects.get(...) # assume a1 has a related B record b1 = B.objects.get(...)
Today, both (a1.b.a is a1) and (b1.a.b is b1) are false; each of the expressions generates two database hits. I propose to fix this.
I've run into the issue while doing some performance-improvement work on a large project; in this project, the painful instance of the problem was a user-profile model. The developers used the reverse of the 1-to-1 on the profile model instead of django.contrib.auth.models.User.get_profile(), and they often navigated between the user object and the profile object in both ways (sometimes even for good reasons), generating redundant database hits.
I wrote tests for the problem, and a fix, in the context of the project; I've now ported it to a fix in Django itself. It is a small fix, and I think beyond its general value, it will allow removing the special-case-caching from User.get_profile().
Do you see a reason why I should not post a ticket?
On Wed, Jan 18, 2012 at 1:07 PM, Shai Berger <s...@platonix.com> wrote: > I have a small improvement to suggest for one-to-one fields: Make them cache > back-references on related objects. That is, assume
Yes! Good improvement. And we should do the same thing for one-to-many fields (ForeignKeys):
""" class Author(models.Model): name = models.CharField(...)
class Book(models.Model): author = models.ForeignKey(Author)
john = Author.objects.get(name='john') books = list(john.book_set.all())
# Does a database query, but should be smart enough to simply return john. books[0].author """
I'm pretty sure there's a long-standing ticket for this, but I'm not sure which one it is. Shai, does your solution approach this in a way that can solve the issue for ForeignKeys as well?
On Thu, Jan 19, 2012 at 5:14 PM, Adrian Holovaty <adr...@holovaty.com> wrote: > On Wed, Jan 18, 2012 at 1:07 PM, Shai Berger <s...@platonix.com> wrote: >> I have a small improvement to suggest for one-to-one fields: Make them cache >> back-references on related objects. That is, assume
> Yes! Good improvement. And we should do the same thing for one-to-many > fields (ForeignKeys):
... > john = Author.objects.get(name='john') > books = list(john.book_set.all())
> # Does a database query, but should be smart enough to simply return john. > books[0].author > """
> I'm pretty sure there's a long-standing ticket for this, but I'm not > sure which one it is. Shai, does your solution approach this in a way > that can solve the issue for ForeignKeys as well?
This one seems much more likely to be a breaking change - it isn't that unusual to do something like:
special_book = Book.objects.get(...) special_book.fiddly_bit = True # note no save
for author in Author.objects.all(): for book in author.books.all(): if book.fiddly_bit: book.related_books.add(special_book)
The semantics of this change under a new caching layer.
If the intention is that it's really the same object (and not a new instance of the same db object retrieved from a caching layer) then this introduces actions at a distance.
> On Thu, Jan 19, 2012 at 5:14 PM, Adrian Holovaty <adr...@holovaty.com> wrote: >> john = Author.objects.get(name='john') >> books = list(john.book_set.all())
>> # Does a database query, but should be smart enough to simply return john. >> books[0].author >> """
>> I'm pretty sure there's a long-standing ticket for this, but I'm not >> sure which one it is. Shai, does your solution approach this in a way >> that can solve the issue for ForeignKeys as well?
> This one seems much more likely to be a breaking change - it isn't > that unusual to do something like:
> special_book = Book.objects.get(...) > special_book.fiddly_bit = True > # note no save
> for author in Author.objects.all(): > for book in author.books.all(): > if book.fiddly_bit: > book.related_books.add(special_book)
> The semantics of this change under a new caching layer.
> If the intention is that it's really the same object (and not a new > instance of the same db object retrieved from a caching layer) then > this introduces actions at a distance.
I don't think Adrian is proposing anything as extensive as #17. What he's proposing (IIUC) wouldn't change the semantics of your sample code at all. All it would do is prepopulate the FK field on the results of a reverse-FK query, so the innermost "book.author" here doesn't need to do a query:
for author in Author.objects.all(): for book in author.books.all(): book.author
There wouldn't be any action-at-a-distance with other Book instances.
Carl -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> john = Author.objects.get(name='john') > books = list(john.book_set.all())
> # Does a database query, but should be smart enough to simply return john. > books[0].author > """
> I'm pretty sure there's a long-standing ticket for this, but I'm not > sure which one it is. Shai, does your solution approach this in a way > that can solve the issue for ForeignKeys as well?
Only the tests...
My solution is a little modification on the descriptors used for the related objects. But for foreign keys, the relevant descriptor returns a manager which uses the QuerySet of the related model's default manager. To get the desired behavior, that manager should modify (or wrap) this QuerySet, to seed the cache on each object it returns.
I may have time to work on something like this, but probably only next week.
On Thu, Jan 19, 2012 at 12:15 PM, Carl Meyer <c...@oddbird.net> wrote: > I don't think Adrian is proposing anything as extensive as #17. What > he's proposing (IIUC) wouldn't change the semantics of your sample code > at all. All it would do is prepopulate the FK field on the results of a > reverse-FK query, so the innermost "book.author" here doesn't need to do > a query:
> for author in Author.objects.all(): > for book in author.books.all(): > book.author
> There wouldn't be any action-at-a-distance with other Book instances.
Right, what Carl said -- no action-at-a-distance, just mere cache population on each related object. I don't think there would be a risk of backwards-incompatible side effects.
> On Thu, Jan 19, 2012 at 12:15 PM, Carl Meyer <c...@oddbird.net> wrote: > > I don't think Adrian is proposing anything as extensive as #17. What > > he's proposing (IIUC) wouldn't change the semantics of your sample code > > at all. All it would do is prepopulate the FK field on the results of a > > reverse-FK query, so the innermost "book.author" here doesn't need to do > > a query:
> > for author in Author.objects.all(): > > for book in author.books.all(): > > book.author
> > There wouldn't be any action-at-a-distance with other Book instances.
> Right, what Carl said -- no action-at-a-distance, just mere cache > population on each related object. I don't think there would be a risk > of backwards-incompatible side effects.
It would certainly be worth at least a note in the release notes, as it is a change.
The only situation where I can imagine someone relying on the current behaviour is in regression tests. I can see code like this being in place inside a test suite:
def setUp(self): # We need a foo for every test in this test case self.foo = Foo.objects.get(id=1)
def test_that_munges_are_really_saved(self): # Munge the bar. This should also set foo.baz to 12 bar = self.foo.bar munge_bar(bar)
# check that the Foo really got updated. Before Yestermas, the Foo got updated # in memory, but the changes weren't being saved to the db self.assertEqual(bar.foo.baz, 12)
This code would have been written with the current behaviour in place, and then completely forgotten about, buried in a suite of tests. It's easy to argue that the test code is incorrect, and should be using a different method to check the state of the database, but a test like this could continue to pass even in the event of an actual regression, and the developers wouldn't even know that their test was wrong. At least with a mention in the release notes, [diligent] developers would have been warned, and could check their code for these sorts of assumptions.