Caching back-refernces on one-to-one fields

81 views
Skip to first unread message

Shai Berger

unread,
Jan 18, 2012, 2:07:26 PM1/18/12
to Django developers
Hi Django devs,

I have a small improvement to suggest for one-to-one fields: Make them cache
back-references on related objects. That is, assume

class A(model):
pass

class B(model):
a = OneToOneField(A)

a1 = A.objects.get(...) # assume a1 has a related B record
b1 = B.objects.get(...)

Today, both (a1.b.a is a1) and (b1.a.b is b1) are false; each of the
expressions generates two database hits. I propose to fix this.

I've run into the issue while doing some performance-improvement work on a
large project; in this project, the painful instance of the problem was a
user-profile model. The developers used the reverse of the 1-to-1 on the profile
model instead of django.contrib.auth.models.User.get_profile(), and they often
navigated between the user object and the profile object in both ways
(sometimes even for good reasons), generating redundant database hits.

I wrote tests for the problem, and a fix, in the context of the project; I've
now ported it to a fix in Django itself. It is a small fix, and I think beyond
its general value, it will allow removing the special-case-caching from
User.get_profile().

Do you see a reason why I should not post a ticket?

Thanks,
Shai.

Jeremy Dunck

unread,
Jan 18, 2012, 2:41:42 PM1/18/12
to django-d...@googlegroups.com
On Wed, Jan 18, 2012 at 11:07 AM, Shai Berger <sh...@platonix.com> wrote:
> Do you see a reason why I should not post a ticket?

+1

Florian Apolloner

unread,
Jan 19, 2012, 5:44:48 AM1/19/12
to django-d...@googlegroups.com
Hi,


On Wednesday, January 18, 2012 8:07:26 PM UTC+1, Shai Berger wrote:

Do you see a reason why I should not post a ticket?


No, please go ahead. Funny you mention that problem now, I ran over the same thing yesterday :)

Cheers,
Florian

Adrian Holovaty

unread,
Jan 19, 2012, 12:14:48 PM1/19/12
to django-d...@googlegroups.com
On Wed, Jan 18, 2012 at 1:07 PM, Shai Berger <sh...@platonix.com> wrote:
> I have a small improvement to suggest for one-to-one fields: Make them cache
> back-references on related objects. That is, assume

Yes! Good improvement. And we should do the same thing for one-to-many
fields (ForeignKeys):

"""
class Author(models.Model):
name = models.CharField(...)

class Book(models.Model):
author = models.ForeignKey(Author)

john = Author.objects.get(name='john')
books = list(john.book_set.all())

# Does a database query, but should be smart enough to simply return john.
books[0].author
"""

I'm pretty sure there's a long-standing ticket for this, but I'm not
sure which one it is. Shai, does your solution approach this in a way
that can solve the issue for ForeignKeys as well?

Adrian

Jeremy Dunck

unread,
Jan 19, 2012, 1:02:01 PM1/19/12
to django-d...@googlegroups.com
On Thu, Jan 19, 2012 at 5:14 PM, Adrian Holovaty <adr...@holovaty.com> wrote:
> On Wed, Jan 18, 2012 at 1:07 PM, Shai Berger <sh...@platonix.com> wrote:
>> I have a small improvement to suggest for one-to-one fields: Make them cache
>> back-references on related objects. That is, assume
>
> Yes! Good improvement. And we should do the same thing for one-to-many
> fields (ForeignKeys):
>
...

> john = Author.objects.get(name='john')
> books = list(john.book_set.all())
>
> # Does a database query, but should be smart enough to simply return john.
> books[0].author
> """
>
> I'm pretty sure there's a long-standing ticket for this, but I'm not
> sure which one it is. Shai, does your solution approach this in a way
> that can solve the issue for ForeignKeys as well?

This one seems much more likely to be a breaking change - it isn't
that unusual to do something like:

special_book = Book.objects.get(...)
special_book.fiddly_bit = True
# note no save

for author in Author.objects.all():
for book in author.books.all():
if book.fiddly_bit:
book.related_books.add(special_book)

The semantics of this change under a new caching layer.

If the intention is that it's really the same object (and not a new
instance of the same db object retrieved from a caching layer) then
this introduces actions at a distance.

This is ye olde Django ticket:
https://code.djangoproject.com/ticket/17

And for context: ;-)
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx

Carl Meyer

unread,
Jan 19, 2012, 1:15:21 PM1/19/12
to django-d...@googlegroups.com
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I don't think Adrian is proposing anything as extensive as #17. What
he's proposing (IIUC) wouldn't change the semantics of your sample code
at all. All it would do is prepopulate the FK field on the results of a
reverse-FK query, so the innermost "book.author" here doesn't need to do
a query:

for author in Author.objects.all():
for book in author.books.all():

book.author

There wouldn't be any action-at-a-distance with other Book instances.

Carl
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk8YXbgACgkQ8W4rlRKtE2fWPQCfXUp1NVmTaNvwZVPpyOsFVAtP
+UsAoJqJ5l0nz+98L+Z+dJ8rdPe5DeM8
=BTg6
-----END PGP SIGNATURE-----

Shai Berger

unread,
Jan 19, 2012, 1:13:33 PM1/19/12
to django-d...@googlegroups.com
Hi again,

>
> john = Author.objects.get(name='john')
> books = list(john.book_set.all())
>
> # Does a database query, but should be smart enough to simply return john.
> books[0].author
> """
>
> I'm pretty sure there's a long-standing ticket for this, but I'm not
> sure which one it is. Shai, does your solution approach this in a way
> that can solve the issue for ForeignKeys as well?
>

Only the tests...

My solution is a little modification on the descriptors used for the related
objects. But for foreign keys, the relevant descriptor returns a manager which
uses the QuerySet of the related model's default manager. To get the desired
behavior, that manager should modify (or wrap) this QuerySet, to seed the
cache on each object it returns.

I may have time to work on something like this, but probably only next week.

Shai.

Adrian Holovaty

unread,
Jan 19, 2012, 4:27:58 PM1/19/12
to django-d...@googlegroups.com
On Thu, Jan 19, 2012 at 12:15 PM, Carl Meyer <ca...@oddbird.net> wrote:
> I don't think Adrian is proposing anything as extensive as #17. What
> he's proposing (IIUC) wouldn't change the semantics of your sample code
> at all. All it would do is prepopulate the FK field on the results of a
> reverse-FK query, so the innermost "book.author" here doesn't need to do
> a query:
>
> for author in Author.objects.all():
>    for book in author.books.all():
>        book.author
>
> There wouldn't be any action-at-a-distance with other Book instances.

Right, what Carl said -- no action-at-a-distance, just mere cache
population on each related object. I don't think there would be a risk
of backwards-incompatible side effects.

Adrian

Ian Clelland

unread,
Jan 20, 2012, 1:10:07 PM1/20/12
to django-d...@googlegroups.com
It would certainly be worth at least a note in the release notes, as it is a change.

The only situation where I can imagine someone relying on the current behaviour is in regression tests. I can see code like this being in place inside a test suite:

def setUp(self):
    # We need a foo for every test in this test case
    self.foo = Foo.objects.get(id=1)

def test_that_munges_are_really_saved(self):
    # Munge the bar. This should also set foo.baz to 12
    bar = self.foo.bar
    munge_bar(bar)

    # check that the Foo really got updated. Before Yestermas, the Foo got updated
    # in memory, but the changes weren't being saved to the db
    self.assertEqual(bar.foo.baz, 12)

This code would have been written with the current behaviour in place, and then completely forgotten about, buried in a suite of tests.
It's easy to argue that the test code is incorrect, and should be using a different method to check the state of the database, but a test like this could continue to pass even in the event of an actual regression, and the developers wouldn't even know that their test was wrong. At least with a mention in the release notes, [diligent] developers would have been warned, and could check their code for these sorts of assumptions.



--
Regards,
Ian Clelland
<clel...@gmail.com>
Reply all
Reply to author
Forward
0 new messages