I'm trying to work out if Ticket #5030 is good to go or not.
Basically, when save() is called on a model, django does a SELECT
COUNT(*)... query to check that the primary key is good.
However, SELECT 1... is apparently quite a bit faster in many
circumstances, and we should use it if we can. It looks like postgres,
mysql, and sqlite all handle this nicely, but we're not sure about SQL
Server and Oracle. If anyone knows, or can apply the patch and check,
please let me know.
Thanks,
Simon
On 8/12/07, Simon Greenhill <d...@simon.net.nz> wrote:
> I'm trying to work out if Ticket #5030 is good to go or not.
> Basically, when save() is called on a model, django does a SELECT
> COUNT(*)... query to check that the primary key is good.
>
> However, SELECT 1... is apparently quite a bit faster in many
> circumstances, and we should use it if we can. It looks like postgres,
> mysql, and sqlite all handle this nicely, but we're not sure about SQL
> Server and Oracle. If anyone knows, or can apply the patch and check,
> please let me know.
Oracle is OK with this. Here are the execution plans for the two methods:
CREATE TABLE ABC ( ID NUMBER(10) NOT NULL, BLAH VARCHAR2(30) );
ALTER TABLE ABC ADD CONSTRAINT PK_ABC PRIMARY KEY (ID);
INSERT INTO ABC VALUES (1,'ZZZ') ;
INSERT INTO ABC VALUES (2,'WWW') ;
SELECT COUNT(*) FROM ABC WHERE ID = 2;
SELECT STATEMENT Optimizer Mode=CHOOSE
SORT AGGREGATE
INDEX UNIQUE SCAN V71MIKEC.PK_ABC
SELECT 1 FROM ABC WHERE ID = 2;
SELECT STATEMENT Optimizer Mode=CHOOSE
INDEX UNIQUE SCAN V71MIKEC.PK_ABC
You'll see that the "SELECT 1" doesn't do the "SORT AGGREGATE", which
shouldn't be all that expensive on a single row, but I suppose any
weight off of the RDBMS is a good thing.
Michael
MSSQL is also okay with this. Using the same setup as Michael's:
CREATE TABLE ABC ( ID INT NOT NULL, BLAH VARCHAR(30) );
ALTER TABLE ABC ADD CONSTRAINT PK_ABC PRIMARY KEY (ID);
INSERT INTO ABC VALUES (1,'ZZZ') ;
INSERT INTO ABC VALUES (2,'WWW') ;
SELECT COUNT(*) FROM ABC WHERE ID = 2;
The estimated execution plan for this is:
SELECT -> Compute Scalar -> Stream Aggregate -> ABC.PK_ABC (Clustered
Index Seek)
SELECT 1 FROM ABC WHERE ID = 2;
The estimated execution plan for this is:
SELECT -> ABC.PK_ABC (Clustered Index Seek)
Gives the right results in both cases. SELECT 1 definitely looks like a
faster execution plan.
Thanks,
George
--Simon
-Mike
1: http://code.djangoproject.com/ticket/2430
2: http://code.djangoproject.com/ticket/4399
>
> If you're interested in performance, would it not also be wise to
> issue a LIMIT 1 at the end of the query if possible? I believe SELECT
> 1 will just return N 1s and the RDBMS will not stop searching simply
> because it found it. There are also some tickets to make checking the
> existence of a filter result faster [1],[2].
And there's still the EXISTS function in sql ;-)
Michael
--
noris network AG - Deutschherrnstraße 15-19 - D-90429 Nürnberg -
Tel +49-911-9352-0 - Fax +49-911-9352-100
http://www.noris.de - The IT-Outsourcing Company
Vorstand: Ingo Kraupa (Vorsitzender), Joachim Astel, Hansjochen Klenk -
Vorsitzender des Aufsichtsrats: Stefan Schnabel - AG Nürnberg HRB 17689
I've checked in that patch in
http://code.djangoproject.com/changeset/5882 -- thanks to Simon for
bringing this up, and thanks to all the folks who verified that the
patch worked on the various database backends.
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com
That won't work, we have to back it out.
The patch re-introduces "LIMIT 1". LIMIT is non-standard SQL and doesn't
exist on Oracle (which was the reason for removing it in the first
place). We can't do this.
Regards,
Malcolm
--
Always try to be modest and be proud of it!
http://www.pointy-stick.com/blog/
Looking more carefully at it, just dropping the "LIMIT 1" is probably
the right thing. We are matching a primary key column. By definition, it
is unique. The database server knows that and it will be using an index
match in any case, so we aren't doing an O(n) scan of the table rows or
anything like that, unless the server implementation is broken.
That will be neater than dropping an Oracle-specific branch of execution
in there (we've managed to avoid that in base.py so far).
I'll fix it later today.
Regards,
Malcolm
--
Despite the cost of living, have you noticed how popular it remains?
http://www.pointy-stick.com/blog/
Aha, Oracle -- I forgot about Oracle. I've backed out the "LIMIT 1" in
changeset [5883]. Thanks for pointing this out!