[Feature Request] Performant values_list query

120 views
Skip to first unread message

cristia...@gmail.com

unread,
Nov 15, 2015, 9:10:08 PM11/15/15
to django-d...@googlegroups.com
After some testing I have found out that even when using values_list to prevent massive object creation when fetching data, it is from 2 to 3 times slower tan using directly a cursor.execute query through the django cursor.
The issue started here http://stackoverflow.com/questions/33726004/django-queryset-vs-raw-query-performance when trying to improve some queries that were looking very slow on apache benchemark testing.
It seems like compiler.results_iter, called from ValuesListQuerySet is very, very slow, due to all the for loops in Python code compared to when using a raw query through a C connector (like mysqlclient), there's just too much boilerplate that might be posible to be removed.
As an example, a very ugly work around to those critical queries, was defining a function like this, which would convery a queryset into something usable by a cursor, which performs very, very efficiently, at least with mysqlclient connector.

    q = qs.query.get_compiler(qs.db).as_sql()
    with connection.cursor() as c:
        c.execute(q[0], q[1])
        for r in c:
            yield r
Would it be possible to have something similar to values_list, but that executes directly through a cursor, improving performance? I'm sure it will be less flexible than values_list, but the extra performance will be nice.



Cristiano Coelho

unread,
Nov 16, 2015, 8:02:42 AM11/16/15
to Django developers (Contributions to Django itself)
I would like to add, that minor change on a critical query meant an increase from 37 to 47 requests per second on a benchmark test with apache, that's a 22% improvement that will be bigger on larger results (this was just with 700 rows result set), compared to using some pagination with a limit of 100 that raises the requests per second value to almost 200, there's a clear bottleneck in there.

Looking at the code of values list queryset, maybe is it possible to replace the call of results_iter to directly use execute_sql(MULTI) avoiding all the unnecessary ifs and loops that results_iter does that is mainly useful for other query set objects, but not for values list. I did a test with the above and values list queries were performing as good as using directly a django cursor, but not as good as using as_sql() and then the cursor, but very close.

What would be the issues of adding such and optimization to values list? I'm even thinking of manually monkey patching it so all my values list queries are improved in my project, since I use values list most of the time due to performance reasons since building objects on large results is extremely slow, blame on python sadly.

Anssi Kääriäinen

unread,
Nov 16, 2015, 9:07:16 AM11/16/15
to django-d...@googlegroups.com
I don't think anybody has anything against making values() or
values_list() queries faster. The question is purely about
implementation. If you can offer something that is faster than the
current code without introducing backwards incompatibilities or making
the code a lot more complex, then such a change has a good possibility
to be merged.

You can likely find some conditions when you can fast-path the
results_iter() handling for values_list() querysets. It is certain
that you can't do this in all cases.

If you want to proceed with the implementation, the best way forward
is to just run the full Django test suite against your changes. It
will point out a couple of cases where you'll see why certain things
are done (from_db_value() for example).

Of course, you should work against Django's master branch.

- Anssi
> --
> You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/7729d524-1449-48a4-92b2-72b866c19e7c%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Cristiano Coelho

unread,
Nov 16, 2015, 10:13:41 AM11/16/15
to Django developers (Contributions to Django itself)
Hi, thanks for the response! I have never developed nor ran the django test suite, I can certainly try as you mentioned, I was hoping for anyone that actually implemented values_list to give me a solid reason to not do any change as I'm probably wrong and the current way it is implemented is the fastest approach.
I guess I can play with the tests for a while. However I believe the tests will need to be ran against all db backends? Installing all databases will be a little complicated.

Tim Graham

unread,
Nov 16, 2015, 11:29:33 AM11/16/15
to Django developers (Contributions to Django itself)
I don't think there is much database backend specific logic as far as values_list() goes, but if you get something working on SQLite and send a pull request, we can easily run it on all database backends.

For performance testing, you might find https://github.com/django/djangobench/ useful.

Marc Tamlyn

unread,
Nov 16, 2015, 11:37:14 AM11/16/15
to django-d...@googlegroups.com
To give some brief historical context: values_list has been around a LONG time, so I wouldn't say anyone has consciously designed it with performance in mind above "it's faster than complete model loading".

The most noticeable thing which your code does not do which an implementation of values list would need to do is interpreting everything as the correct data type. It's possible though that you can optimise cases where no conversions are needed (which is the majority case with postgres). The code which implements db converters for normal ORM queries attempts to do this, I'm not sure how similar the code paths for values_list are.

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Cristiano Coelho

unread,
Nov 16, 2015, 12:12:11 PM11/16/15
to Django developers (Contributions to Django itself)
Interesting, maybe because I'm using MySQL with mysqlclient connector, but running the straight query with the django cursor wrapper always returns the correct data types, even dates with its it time zone when time zone is enabled, was it all coincidence? Would using a different backend break with a cursor query returning invalid/not expected data? Seems like a lot to test

Cristiano Coelho

unread,
Nov 16, 2015, 7:23:24 PM11/16/15
to Django developers (Contributions to Django itself)
Hi, I have downloaded the actual source code, and I have probably forgot to mention that I'm using django 1.7.10.
It seems like the compiler.py module got a little bit improved since then, what used to be a big and highly inefficient loop with many if conditions inside was reduced to a small loop with a single if (it could be improved even more, taking the if out of the loop and doing two separate loops based on the if condition, avoiding checking the same if condition over and over when you already know its value, which is very slow on python since there's no JIT optimizing this kind of loops.).
On another side, it also seems like ValuesListQuerySet was changed to ValuesListIterable, but the functionallity remains the same.
So I should probably do the tests with this version to see if the call to results_iter is still a big deal for a values_list query.

I really wish that values_list (and values?) would be used as a high performance query option with optimizations in mind, rather than just the implicit improvements when returning a list of tuples rather than model instances which are very expensive for large queries.

Josh Smeaton

unread,
Nov 16, 2015, 7:30:35 PM11/16/15
to Django developers (Contributions to Django itself)
The version of Django you use is going to have a large (code) impact on what is actually happening when calling values_list. The Values[List]QuerySet classes are gone in 1.9. 1.8 implemented a different/better way of converting values from the database to python. from_db_value came about in 1.8 I think which should fast path a lot of conversions. The stackoverflow post you linked to mentions Django 1.7. Can you run exactly the same tests using Django Master/1.9 and report back your findings? 

I don't doubt there's room for performance improvements if you go looking. As Anssi said, we'd definitely welcome improvements to performance where they can be found. But you should make sure the kinds of changes you want to make will have an impact when using the latest version of Django, because some of the low hanging fruit may have already been patched.

Cheers

Cristiano Coelho

unread,
Nov 16, 2015, 7:36:37 PM11/16/15
to Django developers (Contributions to Django itself)
You beat me to it. Yes I have just tested under django 1.8.6, and the issue I started this with is gone, both values_list and a direct raw query perform as good, so this is definetly an issue only on django 1.7.10 or less.
I can not test django 1.9 since my project is not really compatible with it, I have some backwards compatibility issues from 1.7.10 related to app loading that I was very lazy to fix so I stood with 1.7, but it seems like this will be time to update, assuming 1.9 will be as good as 1.8.

Sorry about all the fuzz, I guess I should have tested all this directly to 1.8 before making any post, but I apreciate the fast responses!
Perhaps it would be a good idea to warn about values_list bottleneck on 1.7!

Tim Graham

unread,
Nov 16, 2015, 7:47:56 PM11/16/15
to Django developers (Contributions to Django itself)
We don't patch the docs for versions of Django older than the latest stable release except in critical cases. For one, this causes problems for the translated versions of the docs because there's no process for updating those older versions. Also, Django 1.7 will be unsupported in a couple weeks upon the release of Django 1.9.

Josh Smeaton

unread,
Nov 16, 2015, 7:51:04 PM11/16/15
to Django developers (Contributions to Django itself)
Hey,

Nice to see most of the performance differences are resolved. Most of the heavy lifting will actually be compiling the SQL that is going to be executed now I think. You might want to take a look at generating the SQL yourself and executing directly with connection.cursor if you're still hitting critical performance bottlenecks. If you do go down this route please make sure you're testing before and after, because the results could change significantly.

I'm also going to take your suggestion about the inner if condition and see how that impacts performance. I'll report back here for those curious, and open a PR if it makes any noticeable difference.

Thanks for looking into this and showing your results too, very helpful!

Josh Smeaton

unread,
Nov 16, 2015, 9:48:49 PM11/16/15
to Django developers (Contributions to Django itself)
I finally got djangobench running, but I find the results too unpredictable. I resorted to some simple timeit measurements which I've shown here: https://gist.github.com/jarshwah/f28d84987aadc7bcb2f5

Result.. moving the if outside of the loop has about a 15% decrease in time taken for that block of code when there are no converters. Approx 6% when there are converters (the additional conversions take up the majority of time). I don't think making this change is going to be worth doing, but it could be a nice thing to batch up into a general micro-optimisation patch if we see some good numbers alongside said patch.

I'll leave it there for now personally. Welcome further investigation if someone would like to run with it :)

cristia...@gmail.com

unread,
Nov 16, 2015, 9:58:18 PM11/16/15
to django-d...@googlegroups.com
Nice you tested it out! You are right about that kind of optimization is really a micro optimization and would only help if there were a bunch of those related.
--
You received this message because you are subscribed to a topic in the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-developers/z4uPP8-A2-g/unsubscribe.
To unsubscribe from this group and all its topics, send an email to django-develop...@googlegroups.com.

To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
Reply all
Reply to author
Forward
0 new messages