Many to Many...so many queries

95 views
Skip to first unread message

TheIvIaxx

unread,
Mar 16, 2010, 11:24:53 PM3/16/10
to Django users
Hello all, i have a question about a certain query i have. Here is my
model setup:

class Term():
term = CharField()

class Image():
image = FileField()
terms = ForeignKey(Term)

These have been abbreviated for simiplicity, ut you get the gist of
it. Anyhow i have to query for a few hundred Image objects, then get
a list of Term objects for each of these. Really i just need the IDs
of the Terms. Currently i have my query like this:

images = Image.objects.all()

responseImages = []
for i in images:
terms = [term.id for term in n.terms.all()]
responseObjects.append({'image': n, 'terms': terms})

Am i losing some efficiency here? Seems like a fairly common
operation, but I think each of the list comprehensions is a db hit.
on ~300 objects, thats a lot of queries.

Any advice on this one?

Thanks

aditya

unread,
Mar 16, 2010, 11:32:15 PM3/16/10
to Django users
Is there any way you could reduce the # of images to return? Another
thing you could do is cache this info so you don't have to do it
multiple times.

Aditya

TheIvIaxx

unread,
Mar 16, 2010, 11:37:47 PM3/16/10
to Django users
i suppose i could, but that will be a last resort :) What about
dropping down to raw SQL just to get the IDs:

SELECT term_id FROM image_term WHERE image_id = %i

or is that discouraged? Can it be done with the ORM?

aditya

unread,
Mar 17, 2010, 12:00:03 AM3/17/10
to Django users
You can, and I'd be interested to know how much of a speedup that
gives. Here's the relevant page on writing raw SQL:
http://docs.djangoproject.com/en/dev/topics/db/sql/

If you know SQL (and it looks like you do) it should be familiar
territory. The interesting bits are deferring fields, and adding
annotations...maybe those could help speed up your query?


Aditya

Kenneth Gonsalves

unread,
Mar 17, 2010, 12:05:56 AM3/17/10
to django...@googlegroups.com
On Wednesday 17 Mar 2010 9:07:47 am TheIvIaxx wrote:
> i suppose i could, but that will be a last resort :) What about
> dropping down to raw SQL just to get the IDs:
>
> SELECT term_id FROM image_term WHERE image_id = %i
>
> or is that discouraged? Can it be done with the ORM?
>

look at 'get_values' and 'flat'
--
regards
Kenneth Gonsalves
Senior Associate
NRC-FOSS
http://certificate.nrcfoss.au-kbc.org.in

bruno desthuilliers

unread,
Mar 17, 2010, 4:38:21 AM3/17/10
to Django users
On Mar 17, 4:24 am, TheIvIaxx <theivi...@gmail.com> wrote:
> Hello all, i have a question about a certain query i have.  Here is my
> model setup:
>
> class Term():
>     term = CharField()
>
> class Image():
>     image = FileField()
>     terms = ForeignKey(Term)
>
> These have been abbreviated for simiplicity, ut you get the gist of
> it.  Anyhow i have to query for a few hundred Image objects, then get
> a list of Term objects for each of these.  Really i just need the IDs
> of the Terms.  Currently i have my query like this:
>
> images = Image.objects.all()

you can use 'select_related' here - it'll use a join to prefetch
related Term objects:

images = Image.objects.select_related('terms').all()

>
> responseImages = []
> for i in images:
>     terms = [term.id for term in n.terms.all()]
>     responseObjects.append({'image': n, 'terms': terms})

I guess this is not your real code !-)

I don't know what this 'responseObjects' is - , but if you use Django
templates, you just don't need all this above code. Just pass 'images'
in the template's context and you'll be fine:

<ul>
{% for image in images %}
<li>
<h3>{{ image.title }}</h3>
<ul class="terms">
{% for term in image.terms.all %}
<li>{{ term.id }}</li>
{% endfor %}
</ul>
</li>
{% endfor %}
</ul>


HTH

bruno desthuilliers

unread,
Mar 17, 2010, 4:41:52 AM3/17/10
to Django users
On Mar 17, 4:37 am, TheIvIaxx <theivi...@gmail.com> wrote:
> i suppose i could, but that will be a last resort :)  What about
> dropping down to raw SQL just to get the IDs:
>
> SELECT term_id FROM image_term WHERE image_id = %i
>
> or is that discouraged?  Can it be done with the ORM?

You'd have the exact same result using the ORM:

ids = image.terms.values_list('id', flat=True)

but this won't solve the real problem of doing images.count() queries
- which is what select_related is for.

TheIvIaxx

unread,
Mar 17, 2010, 1:55:08 PM3/17/10
to Django users
I made a mistake in my model definitions above. The Term field on
Image is a ManyToMany() not ForeignKey().

Anyhow I did look into value_list, however it didn't add much, if any,
performance gain. But the select_related did! That was exactly what
I needed. Thanks Bruno for the tip.


On Mar 17, 1:38 am, bruno desthuilliers

koenb

unread,
Mar 18, 2010, 3:42:07 AM3/18/10
to Django users
Take a look at something like django-selectreverse [1] for inspiration
on how to reduce your querycount for M2M relations.

Koen

[1] http://code.google.com/p/django-selectreverse/

bruno desthuilliers

unread,
Mar 18, 2010, 5:08:04 AM3/18/10
to Django users

On Mar 17, 6:55 pm, TheIvIaxx <theivi...@gmail.com> wrote:
> I made a mistake in my model definitions above.

And a couple errors in the "view" code snippet too FWIW !-)

>  The Term field on
> Image is a ManyToMany() not ForeignKey().
>
> Anyhow I did look into value_list, however it didn't add much, if any,
> performance gain.

value_list is mostly useful to avoid loading full model instances when
you only want a couple attributes. You won't probably notice the gain
on a 300 items set on your dev server, but when it comes to queries
returning thousand times more items on an production server, it can
really make a difference. But anyway - I mostly mentioned this because
it does exactly the same thing as your raw SQL query.

bruno desthuilliers

unread,
Mar 18, 2010, 5:12:07 AM3/18/10
to Django users

On Mar 18, 8:42 am, koenb <koen.bierm...@werk.belgie.be> wrote:
> Take a look at something like django-selectreverse [1] for inspiration
> on how to reduce your querycount for M2M relations.
>
> Koen
>
> [1]http://code.google.com/p/django-selectreverse/
>

I must be missing the point, but it looks seriously like reinventing
the wheel ? How is it better than using select_related ???

koenb

unread,
Mar 18, 2010, 2:15:52 PM3/18/10
to Django users
On 18 mrt, 10:12, bruno desthuilliers <bruno.desthuilli...@gmail.com>
wrote:

Select_related only populates forward one-to-many relationships
(foreignkeys).
It does not work for many-to-many relationships or reverse one-to-many
relationships (the XX_set descriptors on the other side of fk's).
You can not load those in one and the same query, but you can load all
those objects in one single query instead of N extra queries.
That is what selectreverse helps you to do. Look at the example on the
introduction page of the project to see what kind of queries I mean.

Koen

Sam Walters

unread,
Mar 19, 2010, 8:47:12 AM3/19/10
to django...@googlegroups.com
Yes, this conversation hits the nail on the head.

'select related' only works with forward facing keys.

http://code.google.com/p/django-selectreverse/ is a resonably good
piece of code to improve performance:

using a structure like:

class model1(model.Models) 1<----* class model2(model.Models)
model1
= models.foreignkey(model1)
1<----* class
model3(model.Models)
model1
= models.foreignkey(model1)
etc...

where your view/template code is:

for x in model1.objects.all()
#stuff
for y in x.model2_set.all()
#stuff
for y in x.model3_set.all()
#stuff

Gets awfully slow as you add modelX_set.all or scale it up to thousands of rows.

eg:
where model1 is 945 rows in the sql db:

time taken: 1.023
SQL queries: 1883

using select related:

time taken: 0.493
SQL queries: 21

I have not tried select related performance where there may be another
model with a reverse foreignkey facing model2.
I don't know how well it will face a multi level reverse foreign
key/many to many db structure.

Also don't forget to look at indexing for certain fields where
appropriate it can be a good performance improver:

http://docs.djangoproject.com/en/1.1/ref/models/fields/#db-index

cheers
sam_w

> --
> You received this message because you are subscribed to the Google Groups "Django users" group.
> To post to this group, send email to django...@googlegroups.com.
> To unsubscribe from this group, send email to django-users...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/django-users?hl=en.
>
>

TheIvIaxx

unread,
Mar 23, 2010, 1:33:45 PM3/23/10
to Django users
as far as I could tell, select_related did not follow m2m
relationships, only FK.

On Mar 18, 2:12 am, bruno desthuilliers

TheIvIaxx

unread,
Mar 23, 2010, 5:46:52 PM3/23/10
to Django users
yes, this util helped a lot! 300 down to 20 i think. I had a SQL
query ready, but then you lose all the ORM goodness.

I would think this should be part of the ORM, no?

In the SQL, I just had to add another join to the query generated by
the select_related() method. The problem is you get duplicates back,
in that the image object rows are the same except for a term id:

----------------------------------------
| ID | FILE | TERM ID |
----------------------------------------
| 1 | image.jpg | 1 |
| 1 | image.jpg | 2 |
----------------------------------------

Then you would just have to make a list out of the TERM IDs.

Anyhow, problem solved. Thanks!

Reply all
Reply to author
Forward
0 new messages