Hi All,
Background to this post is available at
http://code.djangoproject.com/ticket/15361
I've created a better benchmark in order to test where the change in
the above ticket causes a performance regression. These are the
results of those tests.
First of all, the results are based on query.get() on 20000 Book
objects, generated by the following code [1]
I ran the following 2 benchmarks query_get [2] and query_get_multiple
[3] on MySQL and sqlite. query_get is a simple get() using a unique
indexed column and will return one object only. query_get_multiple is
a get() on an non-indexed column, it will return 1284 objects.
The benchmark results, as reported by djangobench [4] are as follows:
sqlite:
-----------------------------------
Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)
Running 'query_get' benchmark ...
Min: 0.000000 -> 0.000000: incomparable (one result was zero)
Avg: 0.000745 -> 0.000979: 1.3141x slower
Significant (t=-5.900348)
Stddev: 0.00263 -> 0.00297: 1.1318x larger (N = 10000)
Running 'query_get_multiple' benchmark ...
Min: 0.020000 -> 0.000000: incomparable (one result was zero)
Avg: 0.029883 -> 0.001072: 27.8759x faster
Significant (t=482.259535)
Stddev: 0.00511 -> 0.00309: 1.6519x smaller (N = 10000)
-----------------------------------
mysql
-----------------------------------
Running all benchmarks
Control: Django 1.3 beta 1 (in django-control)
Experiment: Django 1.3 beta 1 (in django-experiment)
Running 'query_get' benchmark ...
Min: 0.000000 -> 0.000000: incomparable (one result was zero)
Avg: 0.000810 -> 0.001039: 1.2827x slower
Significant (t=-5.591014)
Stddev: 0.00273 -> 0.00305: 1.1169x larger (N = 10000)
Running 'query_get_multiple' benchmark ...
Min: 0.020000 -> 0.000000: incomparable (one result was zero)
Avg: 0.028856 -> 0.001152: 25.0486x faster
Significant (t=429.695948)
Stddev: 0.00560 -> 0.00319: 1.7544x smaller (N = 10000)
-----------------------------------
As you can see, with 10000 trials, the speed differences are
significant (and repeatable) as are roughly as follows:
query.get() on a unique indexed column runs 1.3x slower
query.get() on an non-unique, non-index column runs 25-27x FASTER
I've done some very quick tests to see how these gains/losses scale
for various values of n objects:
The speedup for the non-indexed columns is exponential to n.
The slowdown for the indexed columns is roughly constant for all n.
So there you have it, we have a small regression in performance for
the most common case use, and a huge potential gain for the less used
(and some would argue, badly designed) query.
What do you think, is the gain worth the hit? Is it possible to have 2
different code paths depended on what column(s) the query is filtering
on?
-- Martin
P.S Between each trial, djangobench will try and reload the
initial_data.json fixture, which for 20000 objects is very time
consuming. If you want to reproduce the results yourself, I'd suggest
creating a database with the objects already in it, instead of relying
on fixtures.
----
[1] object generation script --
http://pastebin.com/6JAJDA6f
[2] query_get benchmark --
http://pastebin.com/qZBdvSie
[3] query_get_multiple benchmark --
http://pastebin.com/iEYsfmd5
[4] djangobench project (Luke's fork) --
https://github.com/spookylukey/djangobench