Understanding FKs vs JOIN

bradford

unread,

Jul 9, 2006, 12:22:22 AM7/9/06

to Django users

I have a Model Foo, with a FK to user.

if i do f = Foo.objects.get(id=1) and all i need from that is f.title
and j.user.username in my template... is it returning more info than
needed with User. I mean, is it getting the entire user object for
each returned record?

If it is returning the entire user object, would it then be best to use
just SQL and a JOIN?

Malcolm Tredinnick

unread,

Jul 9, 2006, 12:44:47 AM7/9/06

to django...@googlegroups.com

You typically won't see a lot of difference, I would expect. The
database still has to read the whole row from disk, unless you are
retrieving a field that it can get entirely out of an index (which is a
consideration sometimes when designing queries in large systems -- but
not something Django worries about, since it's really work out
automatically).

A general rule of thumb (not quite "urban legend" status, but sometimes
a little tricky to measure easily all the same) is that a database will
spend more time filtering down to the rowset you want than retrieving
the actual data. In other words, the "select" field list is not a big
factor in the overall time of the query. Of course, there are exceptions
to this: a select clause with a some complex aggregate functions in it
(and a lot of rows), or a very wide table (and if that is a problem,
moving the large columns into another table if you don't always need
them is the solution) are two exceptions, for example.

Other cases where you might the data retrieval (as opposed to filtering)
taking a more prominent role is when you have smallish data sets that
can be filtered very quickly. But in those cases, stop caring! It's so
small that it doesn't matter anyway.

I would also think the overhead of creating the Python object is not
likely to be significant (unless your total time is so small as to be
inside the "noise").

Still, the definitive answer is always "time it and see", if it really
matters (such as if you really are trying to squeeze every last
microsecond out of the processing time). I'm writing all this without
having actually benchmarked anything, but the above knowledge is based
on my own experience and stuff I've read over the years (and I have
worked with large enough data sets that serious query optimisations,
such as controlling table join orders, or running over indexes in the
right order, made a difference). It shouldn't be too hard to set up a
large data set and try out both approaches. The results might be
interesting.

Regards,
Malcolm

bradford

unread,

Jul 9, 2006, 12:47:56 AM7/9/06

to Django users

Thanks, Malcolm -- as always :)

Jeremy Dunck

unread,

Jul 9, 2006, 4:16:42 PM7/9/06

to django...@googlegroups.com

If, instead, you're fetching lots of foo objects and need to get users
for all of them, you might want to use select_related, which does the
joins on the original fetch of Foo's, rather than making a separate
request per .user

http://www.djangoproject.com/documentation/db_api/#select-related

Caution, though, since select_related follows as far as it can, which
might be further than you actually want (say, if user then has a FK to
something else). If that's the case, maybe you want extra(), which
would allow you to do a subquery for the user name.

http://www.djangoproject.com/documentation/db_api/#extra-select-none-where-none-params-none-tables-none

Something like:
Foo.objects.all().extra({'user_name':'select app_user.user_name from
app_user where app_user.id = app_foo.user'})

Reply all

Reply to author

Forward