Load, reorg, runstats

Troels Arvin

unread,

Jan 18, 2011, 10:23:32 AM1/18/11

to

Hello,

Let's say I have a large table which needs to be reloaded from scratch
with (lots of) new data. The table has a clustering index and a few
ordinary indexes. The table needs to be put in an optimal state for
subsequent work (which includes minor updates once in a while).

What's the proper way to go?

Sequence a:
0. (Leave the indexes: Don't drop indexes before load.)
1. Load data (in replace mode).
3. RUNSTATS ON TABLE ... AND INDEXES ALL to speed up
the subsequent table re-organization?
4. Reorg table to make the clustering index have its
influence. (Or will LOAD be able to cluster data
while loading, knowing that there is clustering index?)
5. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.
6. Reorg indexes.
7. RUNSTATS ON TABLE ... AND INDEXES ALL again to let
the database know about the new situation.

Regarding sequence a: Maybe some RUNSTATS steps aren't needed?

Sequence b (like a, but drop indexes before load):
0. Drop indexes.
1. Load data (in replace mode).
2. Create indexes.
3-7 as above.

Regarding sequence b:
- According to the documentation, by default, indexing mode is
AUTOSELECT. And given that replace mode is being used,
AUTOSELECT should be clever enough to choose complete
index rebuilds, right?
- One would think that index creation would provide a lot of
statistics information to the database; can the database
use this knowledge, such that the runstats in step 3 is
implied?

Or is there a completely other sequence that I'm overlooking?

--
Troels

Ian

unread,

Jan 18, 2011, 12:49:40 PM1/18/11

to

First of all, you don't need to do step 6. If you do a classic
(offline) reorg, it will automatically rebuild all indexes for you.

Second, you only need to do RUNSTATS once, at the end of the process.

LOAD will not sort data for you if you are writing to a normal
(non-MDC) table, even if it has a clustering index.

However, if you can sort your input data so that it's already in the
correct order, you can instruct LOAD not to mess up the ordering, so
that your data will be clustered upon completion. This would allow
you to avoid the (presumably expensive) REORG.

So, if it's practical to get your new data in proper clustered order
before you re-load the table in your database, then you can completely
avoid the REORG.

There is a debate about whether it's better to drop the indexes and
then load the table or keep the indexes in place (and let LOAD rebuild
them). I think that this really depends on your specific environment:
how big is the table (both number of rows and row width), and how many
indexes are there (and how wide is each index key)?

When you build an index on a table, DB2 must scan the table to find
all of the unique key values. DB2 then sorts the keys and writes the
index. So, if you build 5 indexes, DB2 will scan the table 5 times.

If indexes exist on the table when you do a LOAD REPLACE, DB2 will
build temporary tables for each index *in parallel* during the load
phase. When the load phase completes, DB2 uses these temp tables to
build the indexes, one at a time. This eliminates the repetitive
table scans, but there is certainly a cost to the LOAD that is
associated with building these temp tables as data is loaded.

If the table is relatively small (i.e. narrow rows & lots of rows per
page) then building the indexes after the load could be faster. But
if the rows are very wide, the table scans could be pretty expensive.

You should see which option (dropping indexes or not) works best in
your environment.

To answer your other questions:

> Regarding sequence b:
> - According to the documentation, by default, indexing mode is
> AUTOSELECT. And given that replace mode is being used,
> AUTOSELECT should be clever enough to choose complete
> index rebuilds, right?

The indexing mode only applies if you're doing a LOAD INSERT.
LOAD REPLACE necessitates a full rebuild.

> - One would think that index creation would provide a lot of
> statistics information to the database; can the database
> use this knowledge, such that the runstats in step 3 is
> implied?

This is true; and when you create an index you can specify COLLECT
[DETAILED] STATISTICS. The point of this is to allow you to avoid
doing a full RUNSTATS on the table and all indexes if you are just
creating a new index.

Bottom line:

If you can sort your input data and keeping indexes in place works
well, then you can do the entire operation with a single LOAD command
(LOAD can collect statistics).

Otherwise do the minimum amount of work:

1) Drop indexes (if necessary)
2) Load
3) Recreate indexes (if necessary)
4) Reorg (if necessary)
5) Runstats (if necessary)

Good luck,

Troels Arvin

unread,

Jan 18, 2011, 1:29:56 PM1/18/11

to

Hello,

Ian wrote:
> First of all, you don't need to do step 6. If you do a classic
> (offline) reorg, it will automatically rebuild all indexes for you.

That's new to me. By "offline" you mean stating "USE <some tmpspace>",
right?

> Second, you only need to do RUNSTATS once, at the end of the process.

But will REORG not run faster if there are statistics available to help
the re-organization?

> If indexes exist on the table when you do a LOAD REPLACE, DB2 will build
> temporary tables for each index *in parallel* during the load phase.

Does this mean that the table will be copied in full n times (n being the
number of indexes), i.e. even including columns which are not part of the
index?

>> - One would think that index creation would provide a lot of
>> statistics information to the database; can the database use this
>> knowledge, such that the runstats in step 3 is implied?
>
> This is true; and when you create an index you can specify COLLECT
> [DETAILED] STATISTICS.

Ah - great.

> Bottom line:
[...]

Thanks for your input!

--
Troels

Mark A

unread,

Jan 18, 2011, 2:07:14 PM1/18/11

to

"Troels Arvin" <tro...@arvin.dk> wrote in message
news:4d35dc24$0$23759$1472...@news.sunsite.dk...

> That's new to me. By "offline" you mean stating "USE <some tmpspace>",
> right?

No. he means offline (classic) instead of "inplace." See the Command Refence
Manual.

> But will REORG not run faster if there are statistics available to help
> the re-organization?

No, it will not run faster with statistics.

> Does this mean that the table will be copied in full n times (n being the
> number of indexes), i.e. even including columns which are not part of the
> index?

No, only the index columns will be "copied," one for each index.