SUM/GROUP performance and the Primary Key

Heinrich Moser

unread,

Nov 9, 2009, 11:50:22 AM11/9/09

to

Hi!

I have a table "myTable" with "myGuid" (uniqueidentifier), "myValues"
(float), "myGroup" (integer) and a bunch of other fields which are not
important right now. I want to do something as simple as:

SELECT SUM(myValues)
FROM myTable
WHERE myGuid IN (SELECT * FROM ##test)
GROUP BY myGroup

##test is just a temporary table with a single field (guid_filter)
containing a bunch of uniqueidentifiers.

Now here's the strange thing:

- When I create myTable with myGuid as the Primary Key (which seems
like the obvious thing to do), the query is slow (~12s).

- When I create myTable with myAutoInc, an integer auto-increment
field, as the Primary Key, the query is fast (~2s), even though the
WHERE clause still filters by myGuid. (myGuid has just a "normal"
non-clustered index in this scenario.)

Is there any logical explanation to this? My (naive) assumption was
that the first option is faster, since SQL Server can use the guid to
look up myValues rather than having to go through guid -> myAutoInc ->
myValues. So, the result was very surprising for me.

I've attached the SHOWPLAN_TEXT output below.

Greetings,
Heinzi

----

Slow scenario:

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [globalagg1009]=(0) THEN NULL ELSE [globalagg1011] END))
|--Parallelism(Gather Streams)
|--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([globalagg1009]=SUM([partialagg1008]), [globalagg1011]=SUM([partialagg1010])))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([myDB].[dbo].[myTable].[myGroup]), ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC))
|--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([partialagg1008]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [partialagg1010]=SUM([myDB].[dbo].[myTable].[myValues])))
|--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myGuid] ASC))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([myDB].[dbo].[myTable].[myGroup], [myDB].[dbo].[myTable].[myGuid]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([myDB].[dbo].[myTable].[myGuid], [Expr1015]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1014]) OPTIMIZED WITH UNORDERED PREFETCH)
| |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
| |--Index Seek(OBJECT:([myDB].[dbo].[myTable].[myGuid]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[myDB].[dbo].[myTable].[myGuid]) LOOKUP ORDERED FORWARD)

Fast scenario:

|--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [globalagg1009]=(0) THEN NULL ELSE [globalagg1011] END))
|--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([globalagg1009]=SUM([partialagg1008]), [globalagg1011]=SUM([partialagg1010])))
|--Parallelism(Gather Streams, ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC))
|--Stream Aggregate(GROUP BY:([myDB].[dbo].[myTable].[myGroup]) DEFINE:([partialagg1008]=COUNT_BIG([myDB].[dbo].[myTable].[myValues]), [partialagg1010]=SUM([myDB].[dbo].[myTable].[myValues])))
|--Sort(DISTINCT ORDER BY:([myDB].[dbo].[myTable].[myGroup] ASC, [myDB].[dbo].[myTable].[myAutoInc] ASC))
|--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([myDB].[dbo].[myTable].[myGroup], [myDB].[dbo].[myTable].[myAutoInc]))
|--Nested Loops(Inner Join, OUTER REFERENCES:([myDB].[dbo].[myTable].[myAutoInc], [Expr1017]) OPTIMIZED WITH UNORDERED PREFETCH)
|--Nested Loops(Inner Join, OUTER REFERENCES:([tempdb].[dbo].[##test].[guid_filter], [Expr1016]) OPTIMIZED WITH UNORDERED PREFETCH)
| |--Table Scan(OBJECT:([tempdb].[dbo].[##test]))
| |--Index Seek(OBJECT:([myDB].[dbo].[myTable].[myGuid]), SEEK:([myDB].[dbo].[myTable].[myGuid]=[tempdb].[dbo].[##test].[guid_filter]) ORDERED FORWARD)
|--Clustered Index Seek(OBJECT:([myDB].[dbo].[myTable].[PK__myTable__2334397B]), SEEK:([myDB].[dbo].[myTable].[myAutoInc]=[myDB].[dbo].[myTable].[myAutoInc]) LOOKUP ORDERED FORWARD)

Uri Dimant

unread,

Nov 10, 2009, 5:26:42 AM11/10/09

to

Hi
Kimberly says
/*
A GUID that is not sequential generated by the newid() function (in SQL
Server) can be a horribly bad choice - primarily because of the
fragmentation that it creates in the base table but also because of its
size. It's unnecessarily wide (it's 4 times wider than an int-based
identity - which can give you 2 billion (really, 4 billion) unique rows).
*/

http://www.codinghorror.com/blog/archives/000817.html

"Heinrich Moser" <use...@heinzi.at> wrote in message
news:87ljife...@msgid.heinzi.at...

Balaji

unread,

Nov 10, 2009, 5:33:48 AM11/10/09

to

Hi, In addition to Uri's comments Another reason may be because the GUID
data type takes 16 byte storage where else the int takes only 4 bytes this
may lead to increased storage and IO- Regards, Balaji

"Uri Dimant" <ur...@iscar.co.il> wrote in message
news:eoKtnAfY...@TK2MSFTNGP02.phx.gbl...

Heinrich Moser

unread,

Nov 24, 2009, 7:01:53 AM11/24/09

to

Thanks to both of you!

Greetings,
Heinzi