
Query takes 0 seconds w/1 parameter but 90 s with another. {with DDL & SP}


Eric Sabine

Apr 23, 2002, 2:06:39 PM

This is a long post with DDL and an SP. Sorry if it's too long to
read. Please consider the following. Nine databases all do an
inventory-valuation month-end dump into a mini data warehouse. What
is dumped is (a) the database the data is coming from, (b) the part
name, (c) that part's current weighted average value, and (d) that
part's quantity on hand. (c) and (d) are obviously snapshots of a
moment in time. Incidentally, the dump runs about 20 minutes before
midnight on the last day of the month.

Essentially, the stored procedure below calculates the value of a
part in one database's inventory and ties that back to the other
databases that supplied the part(s) to the subject database. For
instance, say database 9 buys its inventory from databases 2 and 4,
and 9 values part ABC at $1.00 while 2 values it at $0.90. I want to
know that database 9 overvalues that part by 10 cents. What makes it
more complicated is that databases 2 and 4 may both supply the part
to 9. In that case I consider 2 to be the primary supplier, so I
don't care what 4 values it at. Now that I've said 2 is the primary,
I may discover a part that 2 doesn't carry, in which case I want to
know what 4 values it at, and I will use 4's value for the part.
Also, these differences (primary's value - target's value) are
multiplied by the target (9) database's quantity on hand.
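
To make the arithmetic concrete (the quantity here is made up purely
for illustration): if database 9 had 500 units of part ABC on hand,
that part would contribute ($0.90 - $1.00) * 500 = -$50.00 to the
SProc's debit_or_credit_inventory figure, i.e. 9 is carrying $50 more
inventory value for ABC than its primary supplier's cost implies.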

While the code below runs fine (or so I think), when I introduce the
tertiary database into the mix, the query that took effectively no
time (it shows as 0 seconds) can take up to 90 seconds. BIG
difference... The two exec statements that produce the different
times are all the way at the bottom of the script, after the SProc.

Anyway, ideas, criticisms, etc. are appreciated.

Eric

select @@version
Microsoft SQL Server 2000 - 8.00.534 (Intel X86)
Nov 19 2001 13:23:50
Copyright (c) 1988-2000 Microsoft Corporation
Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 2)


-- The datepart fields are just to make querying easier. It's a data
-- warehouse. Not a production db.

create table inventory (
db varchar(6), -- the database that provided the information
part_number varchar(10), -- the part
wtd_avg decimal (15,6), -- the weighted average cost
row_stamp datetime constraint DEF_ROW_STAMP__1 default getdate(), -- a time stamp
inv_month as (month(row_stamp)),-- just a date part of the time stamp
inv_year as (year(row_stamp)) -- just a date part of the time stamp
)
go

create table invt_location (
db varchar(6), -- the database that provided the information
part_number varchar(10), -- the part
location_wh varchar(10), -- the location where the part is stored (like a warehouse #)
qty_on_hand decimal (12,3), -- the quantity on hand
row_stamp datetime constraint DEF_ROW_STAMP__2 default getdate(), -- a time stamp
inv_month as (month(row_stamp)),-- just a date part of the time stamp
inv_year as (year(row_stamp))-- just a date part of the time stamp
)
go

set quoted_identifier on
go
set ansi_nulls off
go


if exists (select name from sysobjects
where name = 'overvalued_sales_database_inventories' and type = 'P')
drop procedure overvalued_sales_database_inventories
go

create proc dbo.overvalued_sales_database_inventories (
@mon int = 0,
@year int = 0,
@target_db varchar(6) = null,
@primary_db varchar(6) = null,
@secondary_db varchar(6) = null,
@tertiary_db varchar(6) = null
)
as

if (@mon = 0) or (@year = 0) or (@target_db is null)
   or (@primary_db is null) or (@secondary_db is null)
begin
print 'Required data was not supplied.'
print 'Fail.'
return -1
end

select sum((c.wtd_avg - a.wtd_avg)* b.qty_on_hand)
as debit_or_credit_inventory, c.db
from inventory a
join invt_location b
on a.part_number = b.part_number
and a.db = b.db
and a.inv_month = b.inv_month
and a.inv_year = b.inv_year
left join
(
--select all parts from primary_db, grab their weighted average
select part_number, wtd_avg, db
from inventory
where db = @primary_db
and inv_month = @mon
and inv_year = @year
union all
--grab parts from secondary that are only in secondary, not primary
select t.part_number, t.wtd_avg, t.db
from inventory t
where t.db = @secondary_db
and t.inv_month = @mon
and t.inv_year = @year
and not exists
(
select *
from inventory
where db = @primary_db
and inv_month = @mon
and inv_year = @year
and t.part_number = part_number
)
union all
--grab parts from tertiary that are only in tertiary, not primary or secondary
select u.part_number, u.wtd_avg, u.db
from inventory u
where u.db = @tertiary_db
and u.inv_month = @mon
and u.inv_year = @year
and not exists
(
select *
from inventory
where (db = @primary_db or db = @secondary_db)
and inv_month = @mon
and inv_year = @year
and u.part_number = part_number
)
) c
on a.part_number = c.part_number
where a.db = @target_db
and a.inv_month = @mon
and a.inv_year = @year
group by c.db

print 'Success.'
return 0

go
set quoted_identifier off
go
set ansi_nulls on
go

grant exec on overvalued_sales_database_inventories to dbo

/*
exec overvalued_sales_database_inventories
@mon = 3,
@year = 2002,
@target_db = 'db9',
@primary_db = 'db2',
@secondary_db = 'db4'

exec overvalued_sales_database_inventories
@mon = 3,
@year = 2002,
@target_db = 'db5',
@primary_db = 'db1',
@secondary_db = 'db3',
@tertiary_db = 'db7'
*/

Andrew J. Kelly

Apr 23, 2002, 4:15:28 PM

Eric,

I didn't spend a lot of time going over the queries and such, but it
sounds like it may be related to the selectivity of the different
parameter values supplied. For instance, suppose you have a table
with 1 million rows and an index on a column with, say, 100 different
possible values. If you query on a value that happens to correspond
to 150,000 of the rows in the table, the best plan is usually a table
scan rather than an index seek; but if only 1,000 rows match the
value, the index is probably a better bet. So when you run the same
query twice with different parameter values, you shouldn't always
expect comparable execution times. If you look at the execution plans
of both queries you will most likely see a table scan (or two) in the
slow one and index seeks (or index scans) in the other. The post
quoted below is also a good source of info when dealing with
parameters. You can see the selectivity with DBCC SHOW_STATISTICS.
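
For instance, a minimal sketch against your inventory table (the DDL
you posted doesn't create any indexes, so the index and its name here
are made up purely for illustration):

create index IX_inventory_db on inventory (db)
go
dbcc show_statistics (inventory, IX_inventory_db)
go

The "All density" value in the second result set is roughly the
figure the optimizer falls back on when it doesn't have a specific
value to work with, and the histogram rows show how unevenly the
actual values are distributed.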


The reason for the performance difference stems from a feature called
"parameter sniffing". Consider a stored proc defined as follows:

CREATE PROC proc1 @p1 int AS
SELECT * FROM table1 WHERE c1 = @p1
GO

Keep in mind that the server has to compile a complete execution plan for
the proc before the proc begins to execute. In 6.5, at compile time SQL
didn't know what the value of @p1 was, so it had to make a lot of guesses
when compiling a plan. Suppose all of the actual parameter values for
"@p1 int" that a user ever passed into this stored proc were unique
integers that were greater than 0, but suppose 40% of the [c1] values in
[table1] were, in fact, 0. SQL would use the average density of the
column to estimate the number of rows that this predicate would return;
this would be an overestimate, and SQL might choose a table scan
over an index seek based on the rowcount estimates. A table scan would
be the best plan if the parameter value was 0, but unfortunately it
happens that users will never or rarely pass @p1=0, so performance of the
stored proc for more typical parameters suffers.

In SQL 7.0 or 2000, suppose you executed this proc for the first time
(when the sp plan is not in cache) with the command "EXEC proc1 @p1 =
10". Parameter sniffing allows SQL to insert the known value of
parameter @p1 into the query at compile time before a plan for the query
is generated. Because SQL knows that the value of @p1 is not 0, it can
compile a plan that is tailored to the class of parameters that is
actually passed into the proc, so for example it might select an index
seek instead of a table scan based on the smaller estimated rowcount --
this is a good thing if most of the time 0 is not the value passed as
@p1. Generally speaking, this feature allows more efficient stored proc
execution plans, but a key requirement for everything to work as expected
is that the parameter values used for compilation be "typical".

In your case, the problem is that you have default NULL values for your
parameters ("@Today DATETIME = NULL, ...") that are not typical because
the parameter values are changed inside the stored proc before they are
used -- as a result NULL will never actually be used to search the
column. If the first execution of this stored proc doesn't pass in an
explicit value for the @Today parameter, SQL believes that its value will
be NULL. When SQL compiles the plan for this sp it substitutes NULL for
each occurrence of @Today that is embedded within a query.
Unfortunately, after execution begins the first thing the stored proc
does is change @Today to a non-NULL value if it is found to be NULL, but
unfortunately SQL doesn't know about this at compile time. Because NULL
is a very atypical parameter value, the plan that SQL generates may not
be a good one for the new value of the parameter that is assigned at
execution time.
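
To make that concrete, here is a minimal sketch of the pattern being
described (the proc, table, and column names are made up; c2 stands
in for a datetime column):

CREATE PROC proc2 @Today datetime = NULL AS
IF @Today IS NULL
    SET @Today = GETDATE()  -- fixed up at execution time
SELECT * FROM table1
WHERE c2 >= @Today          -- but the plan was compiled as if @Today were NULL
GO

If the first call is simply "EXEC proc2", the plan for the SELECT is
built around @Today = NULL even though NULL is never the value
actually used to search the column.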

So, the bottom line is that if you assign defaults to your sp parameters
and later use those same parameters in a query, the defaults should be
"typical" because they will be used during plan generation. If you must
use defaults and business logic dictates that they be atypical (as may be
the case here if app modifications are not an option), there are two
possible solutions if you determine that the substitution of atypical
parameter values is causing bad plans:

1. "Disable" parameter sniffing by using local DECLARE'd variables that
you SET equal to the parameters inside the stored proc, and use the local
variables instead of the offending parameters in the queries. This is the
solution that you found yourself. SQL can't use parameter sniffing in
this case so it must make some guesses, but in this case the guess based
on average column density is better than the plan based on a specific but
"wrong" parameter value (NULL).

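A minimal sketch of that workaround, rewriting the proc1 example from
above with a local variable:

CREATE PROC proc1 @p1 int AS
DECLARE @local_p1 int
SET @local_p1 = @p1     -- copy the parameter into a local variable
SELECT * FROM table1
WHERE c1 = @local_p1    -- the optimizer can't sniff a local variable,
                        -- so it estimates from average column density
GO
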
2. Nest the affected queries somehow so that they run within a different
context that will require a distinct execution plan. There are several
possibilities here, for example:
a. Put the affected queries in a different "child" stored proc. If
you execute that stored proc within this one *after* the parameter @Today
has been changed to its final value, parameter sniffing will suddenly
become your friend because the value SQL uses to compile the queries
inside the child stored proc is the actual value that will be used in the
query.
b. Use sp_executesql to execute the affected queries (see the
sketch after this list). The plan won't be generated until the
sp_executesql statement actually runs, which is of course after the
parameter values have been changed.
c. Use dynamic SQL ("EXEC (@sql)") to execute the affected queries.
An equivalent approach would be to put the query in a child stored proc
just like 2.a, but execute it within the parent proc with EXEC WITH
RECOMPILE.
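
As a minimal sketch of option 2.b (again using the @Today parameter,
with made-up table and column names), the query is wrapped in a
string so that it isn't compiled until sp_executesql runs:

-- inside the parent proc, after @Today has been given its real value
DECLARE @sql nvarchar(4000)
SET @sql = N'SELECT * FROM table1 WHERE c2 >= @Today'
EXEC sp_executesql @sql, N'@Today datetime', @Today = @Today

Because the inner statement is compiled only when sp_executesql
actually executes, its plan is built with the real, non-NULL value of
@Today.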

Option #1 seems to have worked well for you in this case, although
sometimes one of the options in #2 is a preferable choice. Here are some
guidelines, although when you're dealing with something as
complicated as the query optimizer, experimentation is often the best
approach <g>:

- If you have only one "class" (defined as values that have similar
density in the table) of actual parameter value that is used within a
query (even if there are other classes of data in the base table that are
never or rarely searched on), 2.a. or 2.b is probably the best option.
This is because these options permit the actual parameter values to be
used during compilation which should result in the most efficient query
plan for that class of parameter.
- If you have multiple "classes" of parameter value (for example, for
the column being searched, half the table data is NULL, the other half
are unique integers, and you may do searches on either class), 2.c can be
effective. The downside is that a new plan for the query must be
compiled on each execution, but the upside is that the plan will always
be tailored to the parameter value being used for that particular
execution. This is best when there is no single execution plan that
provides acceptable execution time for all classes of parameters.

HTH -
Bart
------------
Bart Duncan
Microsoft SQL Server Support

Please reply to the newsgroup only - thanks.

This posting is provided "AS IS" with no warranties, and confers no
rights.

--
Andrew J. Kelly SQL MVP
Targitinteractive, Inc.



Dieter Nöth

Apr 23, 2002, 4:36:54 PM

Hi Eric,
I think it's possible to get rid of the UNIONs/NOT EXISTS in the
derived table c by using the following approach:
...
left join
(
select
    part_number,
    coalesce(max(case db when @primary_db   then wtd_avg end),
             max(case db when @secondary_db then wtd_avg end),
             max(case db when @tertiary_db  then wtd_avg end)) as wtd_avg,
    coalesce(max(case db when @primary_db   then db end),
             max(case db when @secondary_db then db end),
             max(case db when @tertiary_db  then db end)) as db
from inventory
where db in (@primary_db, @secondary_db, @tertiary_db)
  and inv_month = @mon
  and inv_year = @year
group by part_number
) c
...

It's untested, just keyed in, but it should work.
You could even have a 4th/5th table with almost no difference in speed ;-)

Dieter

"Eric Sabine" <mop...@hotmail.com> wrote in message
news:9eb76c53.02042...@posting.google.com...

Eric Sabine

Apr 23, 2002, 9:27:27 PM

Thanks Dieter,
I will try this tomorrow.
Eric

"Dieter Nöth" <dno...@gmx.de> wrote in message
news:OQzPaZw6BHA.2208@tkmsftngp07...

Eric Sabine

Apr 23, 2002, 9:27:04 PM

Thanks Bart. That was a very interesting post. I appreciate the help.

Eric


"Andrew J. Kelly" <ake...@targitinteractive.com> wrote in message
news:ut2U$Nw6BHA.2428@tkmsftngp07...
