Hi everyone,
Happy New Year!!
So I found that QuerySet.get() is very slow for large datasets when the query matches a very large number of objects. I made the following change in my local copy of the Django code, and it improved performance significantly for very large datasets (the call now returns almost instantly). It had no obvious effect on a table with around 10K records. I don't have proper stats to back the numbers up.
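In case anyone wants to reproduce the comparison, here is a rough timing sketch; the Book model and its filter are hypothetical stand-ins for any table where the filter matches millions of rows:

import time
from myapp.models import Book  # hypothetical model with millions of matching rows

qs = Book.objects.filter(published=True)  # deliberately matches many rows
start = time.perf_counter()
len(qs)  # old behaviour inside get(): fetch and instantiate every matching row
print('len():', time.perf_counter() - start)

qs = Book.objects.filter(published=True)  # fresh queryset, so the result cache is empty
start = time.perf_counter()
qs.count()  # proposed behaviour: a single SELECT COUNT(*) on the database side
print('count():', time.perf_counter() - start)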
So what was the issue?
QuerySet.get() raises two exceptions:
1. DoesNotExist
2. MultipleObjectsReturned
When multiple objects are found, QuerySet.get() raises an error that reports how many objects matched. To find that number it evaluated the queryset and iterated over it to get its length, which created a bottleneck. For small datasets this wasn't obvious, but for large datasets with more than 1 million matching records it was slow.
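You can see the difference in the SQL each approach sends. A minimal sketch (requires DEBUG=True so Django logs queries; Book is hypothetical again):

from django.db import connection, reset_queries

reset_queries()
len(Book.objects.filter(published=True))
print(connection.queries[-1]['sql'])  # SELECT "app_book"."id", ... FROM "app_book" WHERE ...

reset_queries()
Book.objects.filter(published=True).count()
print(connection.queries[-1]['sql'])  # SELECT COUNT(*) FROM "app_book" WHERE ...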
So instead I changed the counting to use QuerySet.count(). Only if count == 1 is the queryset actually evaluated, by calling QuerySet._fetch_all(). The results were much better than before.
So do you think this is the right way to do it? Should I open a PR for the patch?
diff --git a/django/db/models/query.py b/django/db/models/query.py
index 38c1358..e442384 100644
--- a/django/db/models/query.py
+++ b/django/db/models/query.py
@@ -420,8 +420,9 @@ class QuerySet:
if not clone.query.select_for_update or connections[clone.db].features.supports_select_for_update_with_limit:
limit = MAX_GET_RESULTS
clone.query.set_limits(high=limit)
- num = len(clone)
+ num = clone.count()
if num == 1:
+ clone._fetch_all()
return clone._result_cache[0]
if not num:
raise self.model.DoesNotExist(
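For completeness, the behaviour of get() as seen by the caller stays the same with this patch; only how the number of matches is computed changes. A quick illustration (Book is hypothetical once more):

try:
    book = Book.objects.get(published=True)
except Book.DoesNotExist:
    pass  # num == 0: same exception as before
except Book.MultipleObjectsReturned as e:
    # num > 1: the message still reports how many objects matched,
    # but the number now comes from COUNT(*) instead of fetching every row
    print(e)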
Thanks
Anudeep Samaiya