Query Optimization (Huge Data in the Database)

56 views
Skip to first unread message

karthik challa

unread,
Jul 18, 2020, 11:39:01 PM7/18/20
to Django users
Hi Experts,

I am trying to execute the below query and the query is taking more than 5 minutes.

Here are the details

Model.py

class Url(models.Model):
 subdomain
= models.ForeignKey(Subdomain, null=True, blank=True, related_name='url_subdomain', on_delete=models.SET_NULL,db_index=True)
 full_url
= models.CharField(max_length=1000, unique=True, db_index=True)
 
class Meta:
 ordering
= ['full_url']

 
def __str__(self):
 
return self.full_url

1>Django query ORM 

subdomains = Subdomain.objects.all().annotate(numItems=Count('url_subdomain')).order_by('name')

2>SQL Query 

SELECT "urls_subdomain"."id", "urls_subdomain"."created_at", "urls_subdomain"."name", COUNT("urls_url"."id") AS "numItems" FROM "urls_subdomain" LEFT OUTER JOIN "urls_url" ON ("urls_subdomain"."id" = "urls_url"."subdomain_id") GROUP BY "urls_subdomain"."id" ORDER BY "urls_subdomain"."name" ASC

3>Templates

{% if subdomains %}
     
<div id="step-filter-tag" class="mb4">
         
<div class="b mb3">Subdomain</div>
             <div class="overflow-auto" style="max-height: 16rem;">
                 {% for item in subdomains %}
                     item
                 {% endfor %}
             </
div>
     
</div>

 
{% endif %}

Please let me know how can i optimize the preformance Thanks & Regards,
Karthik

Damanjeet Singh

unread,
Jul 19, 2020, 3:32:02 AM7/19/20
to django...@googlegroups.com
Hello ,

Prefetch_related can be helpful, see doc 



Regards,
Damanjeet



--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/83c9f9a3-1a14-4729-b2cb-3abeb06a8b2fo%40googlegroups.com.

Roger Gammans

unread,
Jul 19, 2020, 7:01:28 AM7/19/20
to django...@googlegroups.com
Hi

Just a side issue I spotted here, but I'm probably missing something, so I'm seeking to understand.

Is this the SQL the django query builder made? If so which database backend and django version are you using?

On Sun, 2020-07-19 at 08:30 +0100, Damanjeet Singh wrote:
2>SQL Query 

SELECT "urls_subdomain"."id", "urls_subdomain"."created_at", "urls_subdomain"."name", COUNT("urls_url"."id") AS "numItems" FROM "urls_subdomain" LEFT OUTER JOIN "urls_url" ON ("urls_subdomain"."id" = "urls_url"."subdomain_id") GROUP BY "urls_subdomain"."id" ORDER BY "urls_subdomain"."name" ASC

I only ask because normally in SQL the rule is "However, each table or view column in any nonaggregate expression in the <select> list must be included in the GROUP BY list" (quote from MS TSQL docs which was just the first hit my google search found) . And the name and create_at columns are missing in the GROUP BY clause in the quoted SQL statement.

It seems unlikely that Django is generating invalid SQL as we would have seen more issues (although the intent of the SQL expression above is clear, and I have met engines which can execute it fine)

TTFN
-- 
Roger Gammans <rgam...@gammascience.co.uk>

karthik challa

unread,
Jul 19, 2020, 10:48:46 AM7/19/20
to Django users
Hi Roger,

I have fetched this query from the Django Debug tool bar. I am using POSTGRES database and Django 2.2.3 version.

I am able to fetch the records which i require but the issue is with the performance the django query set or the query which i am making.


Thanks & Regards,
karthik

karthik challa

unread,
Jul 19, 2020, 10:49:04 AM7/19/20
to Django users
Thank you Damanjeet
To unsubscribe from this group and stop receiving emails from it, send an email to django...@googlegroups.com.

Roger Gammans

unread,
Jul 19, 2020, 12:23:00 PM7/19/20
to django...@googlegroups.com
Hi

I do appreciate what I was asking about wasn't your problem, I was looking at learning for myself etc.

For those that are following this, it seems that Postgres (as does MySQL) loosens the SQL requirements. In Postgres ( https://www.postgresql.org/docs/11/sql-select.html#SQL-GROUPBY ) the rule is:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

Which is what we are seeing here. In django the postgres backend has a feature set so that the django SQL compiler knows that postgres can accepts simplified GROUP BY clauses. 

Actually I've simplified this a little as there are two seperate bits ( allows_group_by_pk, andself.allows_group_by_selected_pks) for different type of simplification, and a backend helper function.

Thank you for that fascinating excursion.
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages