Extracting results of a join

17 views
Skip to first unread message

Geoff Kuenning

unread,
Oct 10, 2016, 3:30:54 AM10/10/16
to django...@googlegroups.com
I have a table, Photos, which contains pretty much what you'd expect. A
second table, Albums, contains a field named "key", a title, and other
information. The two are related by a ManyToManyField through
AlbumMembers, which relates the photo ID to the album key ("album"). I
mention AlbumMembers for completeness; it doesn't enter into this
question.

Finally, I have a very sparse table, AlbumOrder, which contains a photo
(ID) and a numeric "order" field. That table has only 44 rows total, 22
for the album I'm testing with.

When I need to collect a list of photos and their order fields (when
specified) I first select the photos from an album into a query set
named "photos" and then use code somewhat like this (I've removed some
error checking that's not relevant here):

for photo in photos.filter(albumorder__album = album.key):
photo.ordering = photo.albumorder_set.filter(album = album.key)

I've timed this code, and for a large album with 65K photos it takes
several seconds to complete. I presume that the problem is that each
filtration inside the loop needs a separate interaction with the SQL
server.

On the other hand, I can pick up the same information in SQL with a join:

SELECT photos.id, `order`, album FROM photos LEFT JOIN album_order ON album_order.photo = photos.id WHERE album = '00000000' OR album IS NULL;

That takes only 0.27 seconds even including the time needed to output
65K lines to the screen.

So is there a way to get Django to issue this join and collect the
results into an aggregate set? I've read through the docs and
experimented with different filter and double-underscore notations,
without success. It seems silly to do the work with a loop when a
correctly written join can do it so much faster. (And I'll note that if
AlbumOrder had more entries, the loop would take far longer.)

(I've thought about issuing the join directly from my code and reading
from a database cursor, but that seems un-Djangoish.)
--
Geoff Kuenning ge...@cs.hmc.edu http://www.cs.hmc.edu/~geoff/

Software, like bridges, should be elegant and visually pleasing as
well as functional. Ugly constructs, designs, and languages should be
avoided like the plague.

Sanjay Bhangar

unread,
Oct 10, 2016, 4:10:59 AM10/10/16
to django...@googlegroups.com
On Mon, Oct 10, 2016 at 12:56 PM, Geoff Kuenning <ge...@cs.hmc.edu> wrote:


So is there a way to get Django to issue this join and collect the
results into an aggregate set?  I've read through the docs and
experimented with different filter and double-underscore notations,
without success.  It seems silly to do the work with a loop when a
correctly written join can do it so much faster.  (And I'll note that if
AlbumOrder had more entries, the loop would take far longer.)

 
Hi Geoff,

Have you looked at the docs for `select_related` and `prefetch_related`? It sounds like they maybe what you're looking for here:


So your example query:

>     for photo in photos.filter(albumorder__album = album.key):
        photo.ordering = photo.albumorder_set.filter(album = album.key)

Would be something like `for photo in photos.filter(albumorder__album = album.key).select_related('albumorder'): ...`

You need to tell Django which related tables to include in the initial SQL query, and select_related and prefetch_related accomplish this, in subtly different ways (will let you read the docs to figure that out since I haven't quite myself ;-) )

Hope this helps! select_related has gotten me out of many DB optimization issues in the past and now I'll never ship code without carefully examining where all using a `select_related` would significantly speed up queries.

-Sanjay
Reply all
Reply to author
Forward
0 new messages