Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Elementary management of many indexes, best practice?

19 views
Skip to first unread message

rja.ca...@gmail.com

unread,
Mar 12, 2013, 10:48:20 AM3/12/13
to
Hi, today I'm looking for advice on how to manage a large number of similar
table indexes. I may have asked before, but it didn't get done.

I have data to work on spread through many tables in many databases, with
names such as [BO-Stud18].[dbo].[Calendar_2012], where all "Calendar" tables
have the same structure, or more or less so. Sometimes these tables are
dropped and re-created. This isn't an ideal design, but I'm kind of stuck
with it.

Also, stored procedures that operate on the data tend to create an index
when the programmer has decided that it's needed. But the indexes are not
necessarily well designed, and not necessarily appropriate, e.g. with
a unique key specified where it actually may not be unique in data.

Well, presently I'm considering putting all of the index definitions into
a huge stored procedure which will be called with parameters 'Calendar' and
the other parts of the table and database name separated out, and optionally
a particular index name, and in which, something like this happens:

IF ( @tableName = N'Calendar' )
BEGIN
IF ( @indexName = N'PK' OR @indexName IS NULL ) -- AND (1=0)
BEGIN
SET @template =
N'ALTER TABLE @{table} ADD CONSTRAINT @{index} PRIMARY KEY CLUSTERED
(authcode, id) --X';
SET @workstring =
REPLACE(REPLACE(
@template
, N'@{table}', @tableName_fq)
, N'@{index}', QUOTENAME(@indexNamePK))
;
EXEC sp_executesql @workstring;
END

IF ( @indexName = N'IX02' OR @indexName IS NULL )
BEGIN
SET @template =
N'CREATE NONCLUSTERED INDEX IX02 ON @{table}
(id, expectedCurrentCostcentre) ON [INDEX] --X';
SET @workstring = REPLACE(@template, N'@{table}', @tableName_fq);
EXEC sp_executesql @workstring;
END

IF ( @indexName = N'IX03' OR @indexName IS NULL )
BEGIN
SET @template =
N'CREATE NONCLUSTERED INDEX IX03 ON @{table}
(expectedCurrentCostcentre) ON [INDEX] --X';
SET @workstring = REPLACE(@template, N'@{table}', @tableName_fq);
EXEC sp_executesql @workstring;
END
END

ELSE -- a similar block for another index

When I want to change the design of indexes, I would put the new index
declaration in the procedure, and execute it for all of the tables that
should have that index. And I go on executing it forever.

Since this will be a big, ugly program, one refinement that comes to mind
is to have the index definitions for each table held in a function named
after that table, instead. Or, there may be a completely different good
way to do the job.

I don't want to go down the wrong path, I don't want to re-invent the wheel -
so, what do I want to know, but I don't know it?

Thanks!

Robert Carnegie

Erland Sommarskog

unread,
Mar 12, 2013, 5:36:36 PM3/12/13
to
(rja.ca...@gmail.com) writes:
> Hi, today I'm looking for advice on how to manage a large number of
> similar table indexes. I may have asked before, but it didn't get
> done.
>
> I have data to work on spread through many tables in many databases,
> with names such as [BO-Stud18].[dbo].[Calendar_2012], where all
> "Calendar" tables have the same structure, or more or less so.
> Sometimes these tables are dropped and re-created. This isn't an ideal
> design, but I'm kind of stuck with it.

I would probably start there, as home-built partitioning tends to be
painful. Only if you have really good reasons, like the need to
quickly age out all data, you should do things like this. And you
should use partitioned views in this case.

> Well, presently I'm considering putting all of the index definitions
> into a huge stored procedure which will be called with parameters
> 'Calendar' and the other parts of the table and database name separated
> out, and optionally a particular index name, and in which, something
> like this happens:

My strong preference is to keep all database code under version
control. The we organise tables, is that the table definition itself is
in one file:


CREATE TABLE abasubsystems(
subsystem ap_subsystem NOT NULL,
ss_label ap_sslabel NULL,
inhid int NULL,
sortorder int NOT NULL
CONSTRAINT ckc_ap_sub_sortorder CHECK (sortorder > 0),
isincomplete bit NOT NULL,
deregdate datetime NULL,
CONSTRAINT pk_ap_sub PRIMARY KEY NONCLUSTERED (subsystem)
)

As you see, this table also includes PK constraints and CHECK constraints.
However, foreign keys are in a separate file:

ALTER TABLE abasubsystems ADD
CONSTRAINT fk1_ap_sub_ap_inh FOREIGN KEY (inhid)
REFERENCES abainstallhistory (inhid),
CONSTRAINT fk2_ap_sub_ap_sla FOREIGN KEY (subsystem, ss_label)
REFERENCES abasslabels (subsystem, ss_label)

This makes it a lot easier to build the database, since we don't have to
have a certain order.

By now you have guess that the indexes are also in a separate file:

CREATE CLUSTERED INDEX ap_sub_sortorder_ix ON
abasubsystems (sortorder)
go

We have tool that permits us to build an empty database from these
files in version control; the tool loads the files in a well-defined
order. We also have a tool that can generate an update script,
by comparing the contents in version-control at two different labels.
When you add an index to an index file, that file will be reloaded,
but the tool will recognize that the existing index is unchanged and
not touch it.

As it happens, this tool is available for the public on my web site:
http://www.sommarskog.se/AbaPerls/index.html

This is just one way to skin the cat. You can also use SQL Compare from
Red Gate or SQL Server Data Tools, a free download from Microsoft. But
I don't like putting index definitions in stored procedures like you
are planning, because all the T-SQL trees will hide the index forest
for you.


--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx

rja.ca...@gmail.com

unread,
Mar 15, 2013, 10:23:09 PM3/15/13
to
On Tuesday, 12 March 2013 21:36:36 UTC, Erland Sommarskog wrote:
> (rja.ca...@gmail.com) writes:
> > I have data to work on spread through many tables in many databases,
> > with names such as [BO-Stud18].[dbo].[Calendar_2012], where all
> > "Calendar" tables have the same structure, or more or less so.
> > Sometimes these tables are dropped and re-created. This isn't an ideal
> > design, but I'm kind of stuck with it.
>
> I would probably start there, as home-built partitioning tends to be
> painful. Only if you have really good reasons, like the need to
> quickly age out all data, you should do things like this. And you
> should use partitioned views in this case.

Well, I know it's wrong, but, as I said, stuck with it.
This is for EIS and is a ludicrous way to set things up,
but I don't have authority to change it. Up to now I
haven't even got the server (2005, Standard) brought
up to the latest service pack - but we had a particularly
bad Friday, so, if I get the chance, what version
should I ask for?

> My strong preference is to keep all database code under version
> control. The [way] we organise tables, is that the table
> definition itself is in one file [with its] PK constraints
> and CHECK constraints. However, foreign keys are in a
> separate file. The indexes are also in a separate file.
>
> I don't like putting index definitions in stored procedures
> like you are planning, because all the T-SQL trees will hide
> the index forest for you.

What I've got now is worse, though; any program that runs
slow is liable to have been changed to create table indexes
on the spot. Having /one/ procedure that creates all
table indexes - being called separately for each table -
will keep things more coherent, and maybe even create a
presumption that a table will /have/ indexes, which
doesn't cross some of our developers' minds when they (!)
take it into their head to create one. On the other
hand, my boss likes under pressure to use the tuning tool
on a query and then create all of the indexes that it
proposes, at once, which I /think/ is wrong?

Apparently the best rule is not to be us.

By the way, are there any tips for Statistics?
I hope I've got this straight: they reflect the
distribution of data values in the table, but are not
necessarily current. I suspect that some or all of
the statistics that spontaneously appear on tables
ought to be considered for creating an index instead
(containing statistics?), although I also suppose that
then it would happen automatically. And we have found
that some ugly queries perform far better after executing
"UPDATE STATISTICS ... WITH FULLSCAN" on every table
in one or more databases, but, since /my/ duties
tend to involve table that get emptied or dropped and
recreated every night, for the EIS, we'd need to do
that ideally just after loading all the data into a
table and - maybe? - before creating indexes on it.
Would it improve the indexes too, or just create
a useless additional workload?

I assume that "FULLSCAN" makes statistics better.
And larger.

I'm considering creating a design, similar to one that
I've got that just hits all tables, that could allow
tables to be pre-chosen to get this done every night or
every week or once or never, and do it either by
calling it on one table during murky overnight
processing, or on any nominated tables that it hasn't
been done for by the end - as logged in another table -
and promote tables to having it done earlier if the
overnight processing itself seems likely to benefit
from that.

Erland Sommarskog

unread,
Mar 16, 2013, 7:32:53 AM3/16/13
to
(rja.ca...@gmail.com) writes:
> Well, I know it's wrong, but, as I said, stuck with it.
> This is for EIS and is a ludicrous way to set things up,

What is EIS?

> but I don't have authority to change it. Up to now I
> haven't even got the server (2005, Standard) brought
> up to the latest service pack - but we had a particularly
> bad Friday, so, if I get the chance, what version
> should I ask for?

Service Pack 4 is the last service pack for SQL 2005.

> What I've got now is worse, though; any program that runs
> slow is liable to have been changed to create table indexes
> on the spot. Having /one/ procedure that creates all
> table indexes - being called separately for each table -
> will keep things more coherent, and maybe even create a
> presumption that a table will /have/ indexes, which
> doesn't cross some of our developers' minds when they (!)
> take it into their head to create one. On the other
> hand, my boss likes under pressure to use the tuning tool
> on a query and then create all of the indexes that it
> proposes, at once, which I /think/ is wrong?

The purpose of the Database Tuning Advisor is that you collect a
workload for a longer period, say a day, and from this add or drop
indexes. If you feed it a single query it will tune that workload,
and be oblvious to other effects it may have.

I still think a T-SQL procedure is the wrong way. It's still better
to have things on files which are easier to read, and then have a
program in the language of you choice to read this file and create
the indexes. But T-SQL is not that language - it's not good at string
processing.

> By the way, are there any tips for Statistics?
> I hope I've got this straight: they reflect the
> distribution of data values in the table, but are not
> necessarily current.

Correct.

> I suspect that some or all of the statistics that spontaneously appear
> on tables ought to be considered for creating an index instead

Not necessarily. SQL Server creates statistics on anything it sees more or
less.

> I assume that "FULLSCAN" makes statistics better.
> And larger.

FULLSCAN does indeed make the statistics more accurate, but the size does
not change. The histogramme will never have more than 200 steps. The default
is sampled statistics, and that is said to be sufficient. Personally, I tend
to prefer fullscan if that is possible.

When you create an index, the statistics for the new index will always be
the result of a fully scanned index, the same if you rebuild the index. This
is because in this case, SQL Server has to read all rows anyway.

This white paper has some good information on statistics:
http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

rja.ca...@gmail.com

unread,
Mar 16, 2013, 5:12:03 PM3/16/13
to
On Saturday, 16 March 2013 11:32:53 UTC, Erland Sommarskog wrote:
> (rja.ca...@gmail.com) writes:
> > Well, I know it's wrong, but, as I said, stuck with it.
> > This is for EIS and is a ludicrous way to set things up,
>
> What is EIS?

I meant "Enterprise Information System", which I suppose is an
out-of-date term: reports for bosses. /Interactive/ reports.

> > but I don't have authority to change it. Up to now I
> > haven't even got the server (2005, Standard) brought
> > up to the latest service pack - but we had a particularly
> > bad Friday, so, if I get the chance, what version
> > should I ask for?
>
> Service Pack 4 is the last service pack for SQL 2005.

OK... so there aren't any hotfixes or cumulative updates that
you'd install as standard? I know we're not /supposed/ to do
that.

<http://www.sqldbaguru.com/cu-builds> is a catalogue of these,
and <http://support.microsoft.com/kb/2494120> is particularly
interesting because it fixes a security hole - which I think
I remember, from looking at documentation previously, is only in
the server admin or client tools, not in the server itself.
However, it bumps you from Service Pack 4 (9.00.5000)
to 9.00.5057.

It was six months ago, when I was told that installing the
service pack wasn't going to be scheduled.

A problem that we have currently, that may be associated with a
particular query, seems to lead to tempdb suddenly inflating to
over 250 gigabytes - the size of the separate partition that
someone with unusual forethought placed it on. We think this
is a completely unreasonable effect.

> > What I've got now is worse, though; any program that runs
> > slow is liable to have been changed to create table indexes
> > on the spot. Having /one/ procedure that creates all
> > table indexes - being called separately for each table -
> > will keep things more coherent, and maybe even create a
> > presumption that a table will /have/ indexes, which
> > doesn't cross some of our developers' minds when they (!)
> > take it into their head to create one. On the other
> > hand, my boss likes under pressure to use the tuning tool
> > on a query and then create all of the indexes that it
> > proposes, at once, which I /think/ is wrong?
>
> The purpose of the Database Tuning Advisor is that you collect a
> workload for a longer period, say a day, and from this add or drop
> indexes. If you feed it a single query it will tune that workload,
> and be oblvious to other effects it may have.

The trouble that I see with that is that our EIS team only has
one SQL server - no, don't leave... The tuning process is liable to
block normal use of the server (uh, right?), particularly if you're
tuning for a query where performance stinks.

My impression has been that tuning recommendations should be treated
as alternatives to choose from, rather than a prescribed remedy;
I seem to be told to create several indexes and statistics that
are quite similar. But perhaps that's a result of asking the tool
about just one query - it calculates anything that will help that?

Thank you for the advice on statistics. If the white paper descibes
them adequately then I suspect it will be difficult to read, but I
should apply time to it.

Erland Sommarskog

unread,
Mar 16, 2013, 6:08:33 PM3/16/13
to
(rja.ca...@gmail.com) writes:
> OK... so there aren't any hotfixes or cumulative updates that
> you'd install as standard? I know we're not /supposed/ to do
> that.

There are certainly a couple of CU after SP4. Exactly how many I don't
know, nor whether how critical they are.

> A problem that we have currently, that may be associated with a
> particular query, seems to lead to tempdb suddenly inflating to
> over 250 gigabytes - the size of the separate partition that
> someone with unusual forethought placed it on. We think this
> is a completely unreasonable effect.

Looks like this query needs tuning.

> The trouble that I see with that is that our EIS team only has
> one SQL server - no, don't leave... The tuning process is liable to
> block normal use of the server (uh, right?), particularly if you're
> tuning for a query where performance stinks.

Not really. You a trace during the day, and then you feed that to DTA.
Since this is reports for bosses, maybe you should run it for a week
or so, as there might be a daily pattern which is strong enough.

> My impression has been that tuning recommendations should be treated
> as alternatives to choose from, rather than a prescribed remedy;
> I seem to be told to create several indexes and statistics that
> are quite similar. But perhaps that's a result of asking the tool
> about just one query - it calculates anything that will help that?

Exactly! You feed it a workload, and if you lie and say that this
single query is your workload, it will comply.

Mark D Powell

unread,
Mar 27, 2013, 2:20:23 PM3/27/13
to
On Friday, March 15, 2013 10:23:09 PM UTC-4, rja.ca...@gmail.com wrote:
> On Tuesday, 12 March 2013 21:36:36 UTC, Erland Sommarskog wrote: > (rja.ca...@gmail.com) writes: > > I have data to work on spread through many tables in many databases, > > with names such as [BO-Stud18].[dbo].[Calendar_2012], where all > > "Calendar" tables have the same structure, or more or less so. > > Sometimes these tables are dropped and re-created. This isn't an ideal > > design, but I'm kind of stuck with it. > > I would probably start there, as home-built partitioning tends to be > painful. Only if you have really good reasons, like the need to > quickly age out all data, you should do things like this. And you > should use partitioned views in this case. Well, I know it's wrong, but, as I said, stuck with it. This is for EIS and is a ludicrous way to set things up, but I don't have authority to change it. Up to now I haven't even got the server (2005, Standard) brought up to the latest service pack - but we had a particularly bad Friday, so, if I get the chance, what version should I ask for? > My strong preference is to keep all database code under version > control. The [way] we organise tables, is that the table > definition itself is in one file [with its] PK constraints > and CHECK constraints. However, foreign keys are in a > separate file. The indexes are also in a separate file. > > I don't like putting index definitions in stored procedures > like you are planning, because all the T-SQL trees will hide > the index forest for you. What I've got now is worse, though; any program that runs slow is liable to have been changed to create table indexes on the spot. Having /one/ procedure that creates all table indexes - being called separately for each table - will keep things more coherent, and maybe even create a presumption that a table will /have/ indexes, which doesn't cross some of our developers' minds when they (!) take it into their head to create one. On the other hand, my boss likes under pressure to use the tuning tool on a query and then create all of the indexes that it proposes, at once, which I /think/ is wrong? Apparently the best rule is not to be us. By the way, are there any tips for Statistics? I hope I've got this straight: they reflect the distribution of data values in the table, but are not necessarily current. I suspect that some or all of the statistics that spontaneously appear on tables ought to be considered for creating an index instead (containing statistics?), although I also suppose that then it would happen automatically. And we have found that some ugly queries perform far better after executing "UPDATE STATISTICS ... WITH FULLSCAN" on every table in one or more databases, but, since /my/ duties tend to involve table that get emptied or dropped and recreated every night, for the EIS, we'd need to do that ideally just after loading all the data into a table and - maybe? - before creating indexes on it. Would it improve the indexes too, or just create a useless additional workload? I assume that "FULLSCAN" makes statistics better. And larger. I'm considering creating a design, similar to one that I've got that just hits all tables, that could allow tables to be pre-chosen to get this done every night or every week or once or never, and do it either by calling it on one table during murky overnight processing, or on any nominated tables that it hasn't been done for by the end - as logged in another table - and promote tables to having it done earlier if the overnight processing itself seems likely to benefit from that.

In discussions with MS tuning specialists who were hired to work on a MS product one of my customers has installed the analyst confirmed that the index tuning wizard over recommends the addition of indexes. An recently in discussion with another outside product expert he stated the feature has a serious bug involving generating indexes that include every column in the table, which I have seen on 2008R2 RTM. I have not checked patched releases yet. It will actually recommend several such indexes for the same table. This will pretty much kill insert performance and consumes a lot of space.

I have always taken the tuning wizard's recommendations with a grain of salt.

HTH -- Mark D Powell --
0 new messages