Optimal query for related names in onetomany or manytomany using Django Queryset

Web Architect

Feb 2, 2018, 8:47:45 AM
to Django users
Hi,

I am trying to optimise the Django queries on my ecommerce website. There is one fundamental query that I have no clue how to make efficient. It may be trivial and well known, but I am new to Django and would appreciate any help. It concerns one-to-many and many-to-many relations.

Following is an example scenario:
(Please pardon my syntax as I want to put across the concept and not the exact django code unless it's really needed):

Model A:

class A(models.Model):
    # Fields of model A go here
    pass

Model B (which is related to A with foreign key):

class B(models.Model):
    a = models.ForeignKey('A', related_name='bs', on_delete=models.CASCADE)

Now I would like to find all As for which there is at least one B. The only way I know is as follows:

A.objects.filter(bs__isnull=False)

But this isn't optimal: with a large number of records in A and B, it takes a lot of time. It gets even less efficient for a many-to-many relationship.
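For reference, here is a minimal sketch of what the two usual formulations look like at the SQL level. It uses Python's built-in sqlite3 module rather than Django so that it runs on its own; the table names a and b and the sample rows are made up for illustration. As far as the generated SQL goes, filter(bs__isnull=False) is typically translated into a join from A to B, which yields one row per matching B, whereas an EXISTS subquery yields each A at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id));
    CREATE INDEX b_a_id ON b (a_id);
    INSERT INTO a (id) VALUES (1), (2), (3);
    -- A#1 has two Bs, A#2 has one, A#3 has none.
    INSERT INTO b (a_id) VALUES (1), (1), (2);
""")

# Join form: one output row per matching (a, b) pair, so A#1 appears twice.
join_ids = [row[0] for row in conn.execute(
    "SELECT a.id FROM a JOIN b ON b.a_id = a.id ORDER BY a.id"
)]

# Semi-join form: each A is returned at most once.
exists_ids = [row[0] for row in conn.execute(
    "SELECT a.id FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id) "
    "ORDER BY a.id"
)]

print(join_ids)
print(exists_ids)
```

In Django 1.11 and later the EXISTS form can be written with Exists and OuterRef from django.db.models, for example A.objects.annotate(has_b=Exists(B.objects.filter(a=OuterRef('pk')))).filter(has_b=True). Whether it is actually faster depends on the database and the data, so it is worth measuring both.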

Could anyone please let me know the most efficient way to use django queryset for the above scenario?

Thanks.

Andy

Feb 2, 2018, 9:20:53 AM
to Django users
a) Maybe it's an option to put the foreign key on the other model? That way you don't need a join to find out whether there is a relation.

b) Save the existing relation status on model A.

c) Cache the A.objects.filter(bs__isnull=False) query.

But apart from that, I fear you cannot do much more, since this is a DB question rather than a Django ORM question.
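Option (b) could look roughly like the following sketch, again written against sqlite3 so it is runnable on its own. The has_bs column and the add_b / remove_b helpers are hypothetical names invented for this example; in a Django project the same bookkeeping would more likely live in the model's save/delete logic or in signal handlers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, has_bs INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id));
    INSERT INTO a (id) VALUES (1), (2), (3);
""")


def add_b(conn, a_id):
    """Insert a B row and mark its parent A as having at least one B."""
    with conn:  # one transaction, so the flag and the row cannot drift apart
        conn.execute("INSERT INTO b (a_id) VALUES (?)", (a_id,))
        conn.execute("UPDATE a SET has_bs = 1 WHERE id = ?", (a_id,))


def remove_b(conn, b_id):
    """Delete a B row and clear the parent's flag if it was the last one."""
    with conn:
        row = conn.execute("SELECT a_id FROM b WHERE id = ?", (b_id,)).fetchone()
        if row is None:
            return
        a_id = row[0]
        conn.execute("DELETE FROM b WHERE id = ?", (b_id,))
        conn.execute(
            "UPDATE a SET has_bs = EXISTS (SELECT 1 FROM b WHERE a_id = ?) "
            "WHERE id = ?",
            (a_id, a_id),
        )


add_b(conn, 1)
add_b(conn, 1)
add_b(conn, 2)
remove_b(conn, 3)  # B#3 was the only B of A#2

# The "As with at least one B" query no longer needs a join.
flagged = [r[0] for r in conn.execute("SELECT id FROM a WHERE has_bs = 1 ORDER BY id")]
print(flagged)
```

The trade-off is the usual one for denormalised data: reads get cheaper, but every code path that creates or deletes a B has to keep the flag in sync.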

Web Architect

Feb 2, 2018, 9:28:26 AM
to Django users
Hi Andy,

Thanks for your response. I was considering option (a) before posting, thinking there might be better ways in Django/SQL to handle this. I will probably go with (a).

Thanks.

Andy

Feb 2, 2018, 9:31:13 AM
to Django users
Not that I know of.

Vijay Khemlani

Feb 2, 2018, 1:58:38 PM
to django...@googlegroups.com
"with large of records in A and B, the above takes lot of time"

How long? At first glance it doesn't look like a complex query or something particularly inefficient for a DB.

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/73ed5ff7-d4db-4057-a812-01c82bf08cf3%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Furbee

Feb 2, 2018, 2:36:48 PM
to django...@googlegroups.com
There are a couple of options you could try to see which best fits your data. With DEBUG=True in settings.py you can check the actual queries and their processing time. It depends on the sizes of A and B. Another query you can run is:

A.objects.exclude(id__in=B.objects.all().values_list('a_id', flat=True))

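When I tried it, it was about the same speed on my test data as the join-based form, A.objects.filter(bs__isnull=True). (Both of these return the As with no B; for the As with at least one B, use filter instead of exclude in the first query and isnull=False in the second.)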

To see which queries are generated and how long they take with DEBUG=True, open your Django Python shell:
>>> A.objects.exclude(id__in=B.objects.all().values_list('a_id', flat=True))
>>> A.objects.filter(bs__isnull=True)
>>> from django.db import connection
>>> for q in connection.queries:
...     print("{0}: {1}".format(q['sql'], q['time']))

This will show you both queries generated and how long it took to get a response from your DB.
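The same kind of side-by-side comparison can be sketched outside Django with sqlite3 and a timer. Everything here is an assumption for illustration: the table names, the row counts, and the two SQL strings, which are only roughly what the exclude(id__in=...) and filter(bs__isnull=True) querysets translate to. Timings from an in-memory SQLite database say nothing about MySQL; the point is only the method of checking that both forms agree and then measuring them.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL REFERENCES a(id));
    CREATE INDEX b_a_id ON b (a_id);
""")
conn.executemany("INSERT INTO a (id) VALUES (?)", [(i,) for i in range(1, 5001)])
# Every third A gets two Bs; the rest get none.
conn.executemany(
    "INSERT INTO b (a_id) VALUES (?)",
    [(i,) for i in range(3, 5001, 3) for _ in range(2)],
)

QUERIES = {
    # As with no B, written as a subquery and as an outer join.
    "not_in": "SELECT id FROM a WHERE id NOT IN (SELECT a_id FROM b)",
    "left_join": "SELECT a.id FROM a LEFT JOIN b ON b.a_id = a.id WHERE b.id IS NULL",
}


def timed(sql):
    """Run a query and return (sorted ids, elapsed seconds)."""
    start = time.perf_counter()
    ids = sorted(row[0] for row in conn.execute(sql))
    return ids, time.perf_counter() - start


results = {name: timed(sql) for name, sql in QUERIES.items()}
for name, (ids, elapsed) in results.items():
    print(f"{name}: {len(ids)} rows in {elapsed * 1000:.2f} ms")
```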

You can also write raw SQL, if you can come up with a more efficient query.

Furbee

Web Architect

Feb 3, 2018, 1:10:38 AM
to Django users
Hi Vijay,

Thanks for your response.

In my scenario, there is also another model with a many-to-many relation to A:

class C(models.Model):
    a = models.ManyToManyField('A', related_name='cs', through='D')

So, for around 25K records in A, 45K records in D and 18K records in B, the following query takes more than 300 ms and sometimes more than 500 ms:

A.objects.filter(bs__isnull=False, cs__isnull=False)
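One thing that may be contributing here: when a filter spans two multi-valued relations, the join produces one row for every combination of matching B and D rows, so the database can end up producing far more rows than there are As. The sketch below illustrates this with sqlite3; the schema is a simplified guess at the tables behind A, B, C and D, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL);
    CREATE TABLE c (id INTEGER PRIMARY KEY);
    CREATE TABLE d (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL, c_id INTEGER NOT NULL);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO c (id) VALUES (1), (2), (3);
    -- A#1 has two Bs and three Cs; A#2 has only a B; A#3 has only a C.
    INSERT INTO b (a_id) VALUES (1), (1), (2);
    INSERT INTO d (a_id, c_id) VALUES (1, 1), (1, 2), (1, 3), (3, 1);
""")

# Double join: A#1 comes back 2 x 3 = 6 times.
join_rows = conn.execute(
    "SELECT a.id FROM a "
    "JOIN b ON b.a_id = a.id "
    "JOIN d ON d.a_id = a.id"
).fetchall()

# Two EXISTS checks: A#1 comes back once.
exists_ids = [row[0] for row in conn.execute(
    "SELECT a.id FROM a "
    "WHERE EXISTS (SELECT 1 FROM b WHERE b.a_id = a.id) "
    "AND EXISTS (SELECT 1 FROM d WHERE d.a_id = a.id)"
)]

print(len(join_rows), exists_ids)
```

With the join form, adding .distinct() to the queryset removes the duplicates but the database still has to build them first, which is why the EXISTS form is often worth trying.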

Thanks.

Web Architect

Feb 3, 2018, 1:14:37 AM
to Django users
Hi Furbee,

Thanks for your response. 

In my experience, a query within a query kills MySQL: mysqld CPU usage hits the ceiling. I will still check your alternative.

I have mentioned the size of A and B in response to Vijay's reply. 

Vijay Khemlani

Feb 3, 2018, 6:13:26 AM
to django...@googlegroups.com
Well, you should've said that in the first post.

First I would try with a saner DB (Postgres)

Also I don't think 300 ms is particularly bad, but in that case start looking into caching alternatives (e.g. memcached) or a search index (e.g. ElasticSearch) 
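As a rough idea of what the caching route means in practice, here is a tiny in-process cache with a time-to-live. It is only a stand-in for memcached: the TTLCache class, the key name and the fake clock are all made up for this sketch. In a real Django project the built-in cache framework (django.core.cache, with cache.get and cache.set) would normally play this role.

```python
import time


class TTLCache:
    """Tiny in-process cache; a stand-in for memcached in this sketch."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expiry time, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, recomputing it once it has expired."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def expensive_query():
    """Pretend this is the 300 ms queryset; it records each real execution."""
    calls.append(1)
    return [1, 2]


fake_now = [0.0]  # controllable clock so the example does not need to sleep
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])

first = cache.get_or_compute("as_with_bs_and_cs", expensive_query)   # runs the query
second = cache.get_or_compute("as_with_bs_and_cs", expensive_query)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_compute("as_with_bs_and_cs", expensive_query)   # expired, runs again

print(len(calls))
```

The TTL is the knob that trades freshness for load: results can be up to that many seconds stale unless the entry is also invalidated whenever a B or D row changes.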

Furbee

Feb 4, 2018, 12:02:20 AM
to django...@googlegroups.com
You can also set up an index on multiple fields, so if you're searching for As without a reference from B or C, you could use the index_together option in the class Meta for that model. I'm not completely sure, but I think this may speed up your query time.
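To check whether a multi-column index actually changes anything, one option is to compare the query plan before and after creating it. The sketch below does this on SQLite with EXPLAIN QUERY PLAN; the table d, its columns and the index name d_a_c are assumptions standing in for the through table, and MySQL's EXPLAIN output will look different. Note also that Django already creates a single-column index for each ForeignKey by default, so the gain from a composite index may be small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY);
    CREATE TABLE d (id INTEGER PRIMARY KEY, a_id INTEGER NOT NULL, c_id INTEGER NOT NULL);
    INSERT INTO a (id) VALUES (1), (2), (3);
    INSERT INTO d (a_id, c_id) VALUES (1, 10), (1, 11), (3, 10);
""")

SQL = (
    "SELECT a.id FROM a "
    "WHERE EXISTS (SELECT 1 FROM d WHERE d.a_id = a.id) "
    "ORDER BY a.id"
)


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


before_ids = [row[0] for row in conn.execute(SQL)]
plan_before = plan(SQL)

# Composite index over both columns of the through table.
conn.execute("CREATE INDEX d_a_c ON d (a_id, c_id)")

after_ids = [row[0] for row in conn.execute(SQL)]
plan_after = plan(SQL)

print(plan_before)
print(plan_after)
```

The exact plan text varies between SQLite versions, so the output is best read by eye: the thing to look for is whether the lookup on d mentions the new index.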

Thanks,

Furbee

Web Architect

Feb 7, 2018, 12:39:26 AM
to Django users
Hi Furbee,

Thanks for the suggestion. I will look into it.

Thanks. 