How to make SQL Server choose a smarter query plan?

Heinrich Moser

unread,

Mar 17, 2009, 12:25:29 PM3/17/09

to

Hi!

We're having the general problem that -- in one customer's database --
the performance of the queries varies widely. I've found one example
which is easy to explain and which can be reproduced reliably:

SQL A: SELECT * FROM table WHERE a = 1234 AND b <> 0

returns one record (by coincidence, for this particular choice of
field "a"; it could be more) and is very fast (< 1s). On the other
hand

SQL B: SELECT TOP 1 * FROM table WHERE a = 1234 AND b <> 0
ORDER BY primaryKeyOfTable DESC

returns the same record, but is very slow (~ 17s).

Apparently, SQL Server is choosing a very bad query plan here: SQL
Server is doing "smart stuff", like Index Seek, Hash Match, etc. for
SQL A, but for SQL B it's just making a slow Clustered Index Scan (the
table has ~600.000 records, BTW). Changing "WHERE a = 1234" to "WHERE
a = 1235" causes it to use a much better (and much faster) query plan.
This lead me to believe that it's a problem with outdated statistics.
However, neither "UPDATE STATISTICS table" nor "UPDATE STATISTICS
table WITH FULLSCAN" improve the situation.

Of course, the obvious question is: What *can* I do to ensure that SQL
Server chooses a better query plan? (And please don't tell me to
hard-code query hints into hundreds of SQL queries or to do a manual
"TOP 1" in business logic on the client...)

Greetings,
Heinzi

PS: This is a simplified example. We don't use "SELECT *". It also
means that I'm more interested in gerneric ways to improve query
performance rather than in optimizing this simple toy example.

PPS: Of course, fields "a" and "b" are both indexed.

PPPS: Removing the "TOP 1" in SQL B also makes the query very fast (<
1s), obviously with the same result set.

Erland Sommarskog

unread,

Mar 17, 2009, 7:20:42 PM3/17/09

to

Heinrich Moser (use...@heinzi.at) writes:
> We're having the general problem that -- in one customer's database --
> the performance of the queries varies widely. I've found one example
> which is easy to explain and which can be reproduced reliably:
>
> SQL A: SELECT * FROM table WHERE a = 1234 AND b <> 0
>
> returns one record (by coincidence, for this particular choice of
> field "a"; it could be more) and is very fast (< 1s). On the other
> hand
>
> SQL B: SELECT TOP 1 * FROM table WHERE a = 1234 AND b <> 0
> ORDER BY primaryKeyOfTable DESC
>
> returns the same record, but is very slow (~ 17s).
>
> Apparently, SQL Server is choosing a very bad query plan here: SQL
> Server is doing "smart stuff", like Index Seek, Hash Match, etc. for
> SQL A, but for SQL B it's just making a slow Clustered Index Scan (the
> table has ~600.000 records, BTW). Changing "WHERE a = 1234" to "WHERE
> a = 1235" causes it to use a much better (and much faster) query plan.
> This lead me to believe that it's a problem with outdated statistics.
> However, neither "UPDATE STATISTICS table" nor "UPDATE STATISTICS
> table WITH FULLSCAN" improve the situation.

Did you look at the output from DBCC SHOW_STATIISTCS? Seems like 1234
and 1235 are in different buckets in the histogram.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

Uri Dimant

unread,

Mar 18, 2009, 2:43:34 AM3/18/09

to

Hi
Why do you think that these query should perform identical ? What if you
have a CI with DESC order, does it run faster?

"Heinrich Moser" <use...@heinzi.at> wrote in message
news:87ljr4u...@msgid.heinzi.at...

Heinrich Moser

unread,

Mar 20, 2009, 11:31:47 AM3/20/09

to

Hi!

Erland Sommarskog <esq...@sommarskog.se> writes:
> Heinrich Moser (use...@heinzi.at) writes:
>> We're having the general problem that -- in one customer's database --
>> the performance of the queries varies widely. I've found one example
>> which is easy to explain and which can be reproduced reliably:
>>
>> SQL A: SELECT * FROM table WHERE a = 1234 AND b <> 0
>>
>> returns one record (by coincidence, for this particular choice of
>> field "a"; it could be more) and is very fast (< 1s). On the other
>> hand
>>
>> SQL B: SELECT TOP 1 * FROM table WHERE a = 1234 AND b <> 0
>> ORDER BY primaryKeyOfTable DESC
>>
>> returns the same record, but is very slow (~ 17s).
>>
>> Apparently, SQL Server is choosing a very bad query plan here: SQL
>> Server is doing "smart stuff", like Index Seek, Hash Match, etc. for
>> SQL A, but for SQL B it's just making a slow Clustered Index Scan (the
>> table has ~600.000 records, BTW). Changing "WHERE a = 1234" to "WHERE
>> a = 1235" causes it to use a much better (and much faster) query plan.
>> This lead me to believe that it's a problem with outdated statistics.
>> However, neither "UPDATE STATISTICS table" nor "UPDATE STATISTICS
>> table WITH FULLSCAN" improve the situation.
>
> Did you look at the output from DBCC SHOW_STATIISTCS? Seems like 1234
> and 1235 are in different buckets in the histogram.

Indeed, they are.

Nevertheless, I'm still a bit lost here. How does this information
help me improve the performance of a = 1234? I've tried adding a
statistic containing both fields a and b (and running update
statistics fullscan afterwards), but this does not change anything --
the query plan is still terrible.

Greetings,
Heinzi

Heinrich Moser

unread,

Mar 20, 2009, 12:05:07 PM3/20/09

to

(changed quoting)

Hi!

"Uri Dimant" <ur...@iscar.co.il> writes:

> "Heinrich Moser" <use...@heinzi.at> wrote:
>> We're having the general problem that -- in one customer's database --
>> the performance of the queries varies widely. I've found one example
>> which is easy to explain and which can be reproduced reliably:
>>
>> SQL A: SELECT * FROM table WHERE a = 1234 AND b <> 0
>>
>> returns one record (by coincidence, for this particular choice of
>> field "a"; it could be more) and is very fast (< 1s). On the other
>> hand
>>
>> SQL B: SELECT TOP 1 * FROM table WHERE a = 1234 AND b <> 0
>> ORDER BY primaryKeyOfTable DESC
>>
>> returns the same record, but is very slow (~ 17s).
>>
>> Apparently, SQL Server is choosing a very bad query plan here: SQL
>> Server is doing "smart stuff", like Index Seek, Hash Match, etc. for
>> SQL A, but for SQL B it's just making a slow Clustered Index Scan (the

>> table has ~600.000 records, BTW). [...]

>
> Why do you think that these query should perform identical ?

I'm not saying they should perform identical. I'm saying that SQL B
should perform much faster. Clearly,

1. executing SQL A according to its query plan (< 1s),
2. sorting this one row (trivial),
3. doing TOP 1 on this one row (trivial)

would be a much better query plan for SQL B than what SQL Server is
doing. My question is: How can I improve SQL Server's query plan
guessing?

> What if you have a CI with DESC order, does it run faster?

Yes (~ 3s). [SQL Server still uses a CI scan, so this does not solve
the problem but just eases the symptoms.]

Interstingly, changing the CI back to ASC order causes SQL B to take
~6s (instead of ~17s). Argh... indeterminism makes things so much
harder to debug.* :-(

Greetings,
Heinzi

* Of course, I'm running DBCC DROPCLEANBUFFERS before every test run
to avoid caching effects.

Erland Sommarskog

unread,

Mar 20, 2009, 7:24:19 PM3/20/09

to

>> Did you look at the output from DBCC SHOW_STATIISTCS? Seems like 1234
>> and 1235 are in different buckets in the histogram.
>
> Indeed, they are.
>
> Nevertheless, I'm still a bit lost here. How does this information
> help me improve the performance of a = 1234? I've tried adding a
> statistic containing both fields a and b (and running update
> statistics fullscan afterwards), but this does not change anything --
> the query plan is still terrible.

I didn't say that it would help, but at least it explains the mystery.

If you are on SQL 2008, it is possible that a filtered index or a
filtered statistics would help.

For eariler versions, I don't think I have much better suggestions than
to use an index hint. Had I seend the real table definitions and indexes,
maybe I would had had more ideas.

Uri Dimant

unread,

Mar 22, 2009, 4:07:16 AM3/22/09

to

Hi
I think in that case you will be better of using index hint

"Heinrich Moser" <use...@heinzi.at> wrote in message

news:87hc1ox...@msgid.heinzi.at...

WOLO Laurent

unread,

Jun 14, 2009, 5:35:33 PM6/14/09

to

The order by is a very cost operation, avoid it if you can

"Heinrich Moser" <use...@heinzi.at> a �crit dans le message de
news:87ljr4u...@msgid.heinzi.at...