Working as a DB2 UDB Support Analyst, I frequently hear the following
question from customers who are dealing with an optimizer or query
planning problem:
"How do I recreate the same query access plan on my test environment
that I am getting on my production environment?"
Many times, there is a need to replicate a production environment to a
test environment, including recreating the same access plan for query
analysis purposes.
For example, in production, you could experience performance problems
caused by a query that is using a poor access plan and want to
replicate the access plan on a test system in order to try some different
strategies, such as manipulating the statistics, changing the
optimization level, trying different settings for DB2 registry
variables, and so on, in order to improve performance.
In an ideal world, you would want to have the test environment as
closely matched to production as possible. That is, you would want to
use exactly the same hardware, operating system maintenance level and
configuration, DB2 level, and configuration in both environments, as
well as using the same data in test as in production. However, this
ideal cannot always be achieved. If the production environment has a
very large amount of data, you may simply not have the capacity to keep
a test copy of the production system.
The db2look utility can be used to achieve your goal, even though you
can't replicate all the details of production.
This article will explain how you can mimic a production system on a
test system without the need for actual data in order to recreate a
query planning problem. This ability will help you to debug queries and
understand access plan issues without interrupting work in the
production environment. Note, however, that if you want to test the
execution of the resulting access plan, you will still need to load
data from production onto test (as much data as possible). And there is
always the possibility that the differences between the test and
production systems are still enough that the execution characteristics
on test do not match those on production. This part of the analysis
(performance tuning) is as much an art as a science.
Other problems in the optimizer or query compiler area, such as
SQL0901N errors or instance crashes, can also be recreated using the
methods explained in this article. You can try various strategies, such
as testing the most recent fix pack (if the system is at an older fix
level), different optimization levels, different registry variables,
and so on, in order to see if these changes will correct the problem.
Let's look at the options to use with db2look in order to achieve this
goal.
The db2look command and its options
Here are the commands you use to capture the needed information from
your production system:
Listing 1. Commands to recreate an optimizer problem
db2look -d <dbname> -l -o storage.out
db2look -d <dbname> -f -fd -o config.out
db2look -d <dbname> -e -a -m -o db2look.out
db2look -d <dbname> -e -a -m -t table1 table2 .... tableX -o table.ddl
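The four commands above can be wrapped in a small convenience script so nothing is forgotten when capturing production. This is only a sketch (the function name and the DB2LOOK override variable are ours, not part of DB2); pointing DB2LOOK at echo previews the commands without needing a database.

```shell
# capture_prod: run the four db2look captures against one database.
# DB2LOOK defaults to the real utility; set DB2LOOK=echo for a dry run.
capture_prod() {
  dbname=$1; shift
  ${DB2LOOK:-db2look} -d "$dbname" -l -o storage.out        # storage layout
  ${DB2LOOK:-db2look} -d "$dbname" -f -fd -o config.out     # cfg, registry, db2fopt
  ${DB2LOOK:-db2look} -d "$dbname" -e -a -m -o db2look.out  # DDL plus statistics
  if [ $# -gt 0 ]; then
    # optional: statistics and DDL for just the listed tables
    ${DB2LOOK:-db2look} -d "$dbname" -e -a -m -t "$@" -o table.ddl
  fi
}
```

For example, capture_prod SAMPLE ORG SALES would gather the three database-wide files plus a table.ddl for the ORG and SALES tables.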
Now let's look at these db2look command options in more detail.
Generate buffer pool, tablespace, and database partition group
information
db2look -d <dbname> -l -o storage.out
Here is a description of the options used in the db2look command above:
-d: Database name -- This must be specified.
-l: Generates database layout. This is the layout for database
partition groups, buffer pools and tablespaces.
-o: Redirects the output to the given file name. If the -o option is
not specified, then output goes to standard output (stdout), generally
the screen.
The -l option is important for mimicking your production environment.
Ideally, you want to have the same buffer pools, database partition
groups (if you're in a multi-partition environment), and tablespace
information (including temporary tablespaces). However, if you are
constrained by memory and cannot allocate the large buffer pools that
you have in production, then use the db2fopt command. I'll discuss this
command in more detail later in this section.
It is not always possible to have the same tablespaces set up in test
that you have in production. For example, you may have devices set up
with large sizes, and you may not have the flexibility to create the
same device sizes in test. Or, you may not have a separate tablespace
device available at all in the test environment. In addition, you might
not have the same paths set up in test that you have in production. You
would need to alter the paths, devices, and files appropriately to fit
your test environment.
The following information for a tablespace is what the optimizer uses, so
you want to make sure it is the same on both test and production. (Note:
The numbers shown here are an example. You should use the same settings on
test as you do on production.)
PREFETCHSIZE 16
EXTENTSIZE 16
OVERHEAD 12.670000
TRANSFERRATE 0.180000
If a tablespace is "managed by database" on production, it should also
be "managed by database" on test. If it is "managed by system" on
production, it should also be that way on test.
Note: If this is a system with multiple physical partitions (MPP), the
number of partitions in the database partition group must be the same
on test. However, the number of physical machines does not have to be
the same. The number of logical partitions in the whole MPP environment
must be the same on both test and production.
Generate configuration parameters and registry variables
db2look -d <dbname> -f -fd -o config.out
Here, I've used the following parameters:
-f: Extracts configuration parameters and registry variables. If this
option is specified, -wrapper and -server options will be ignored.
-fd: Generates db2fopt statements for opt_buffpage and opt_sortheap,
along with other configuration and registry settings.
The output of the command looks like this:
Listing 2. Sample output from db2look command
$ db2look -d sample -f -fd
-- No userid was specified, db2look tries to use Environment variable
USER
-- USER is: SKAPOOR
-- This CLP file was created using DB2LOOK Version 8.2
-- Timestamp: Sat Mar 26 00:13:36 EST 2005
-- Database Name: SAMPLE
-- Database Manager Version: DB2/6000 Version 8.2.2
-- Database Codepage: 819
-- Database Collating Sequence is: UNIQUE
CONNECT TO SAMPLE;
--------------------------------------------------------
-- Database and Database Manager configuration parameters
--------------------------------------------------------
UPDATE DBM CFG USING cpuspeed 6.523521e-07;
UPDATE DBM CFG USING intra_parallel NO;
UPDATE DBM CFG USING federated NO;
UPDATE DBM CFG USING fed_noauth NO;
!db2fopt SAMPLE update opt_buffpage 50000;
!db2fopt SAMPLE update opt_sortheap 10000;
UPDATE DB CFG FOR SAMPLE USING locklist 1000;
UPDATE DB CFG FOR SAMPLE USING dft_degree 1;
UPDATE DB CFG FOR SAMPLE USING maxlocks 10;
UPDATE DB CFG FOR SAMPLE USING avg_appls 1;
UPDATE DB CFG FOR SAMPLE USING stmtheap 2048;
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5;
---------------------------------
-- Environment Variables settings
---------------------------------
!db2set DB2_ANTIJOIN=yes;
!db2set DB2_INLIST_TO_NLJN=yes;
COMMIT WORK;
CONNECT RESET;
TERMINATE;
The -f and -fd options are key options to use in order to extract
configuration parameters and environment variables, which the optimizer
uses during the access plan selection phase. In Listing 2, above, note the
following output that resulted from the -fd option:
!db2fopt SAMPLE update opt_buffpage 50000;
!db2fopt SAMPLE update opt_sortheap 10000;
The db2fopt command tells the optimizer to use the specified value for
"Buffer pool size," rather than adding up the pages of the buffer pools
available. (Buffer pool size in the db2exfmt output is discussed further in
the buffer pool size section below.) For example, say that you cannot
afford to have large buffer pools due to memory constraints on the test
system, but would like the optimizer to see the same size without actually
allocating it. Use the -fd option, which generates the db2fopt commands
required to tell the optimizer to use the specified size rather than
calculating it based on the buffer pools available for this database.
It works the same way for sort heap, as we'll see in the sort heap
section below.
Here is the usage for the db2fopt command. Note that the -fd option with
db2look picks the values for you, but you should be familiar with the
usage, including how to reset the values so that the optimizer goes back to
using syscat.bufferpools for buffer pool pages and the database
configuration for sortheap.
C:\>db2fopt
Usage: db2fopt <database-alias> update [opt_buffpage <value>]
[opt_sortheap <value>]
or db2fopt <database-alias> get [opt_buffpage] [opt_sortheap]
If you would like to set the values for opt_buffpage and opt_sortheap,
issue:
db2fopt <dbname> update opt_buffpage <value> opt_sortheap <value>
For example:
C:\>db2fopt sample update opt_buffpage 50000 opt_sortheap 10000
Update succeeded
Make sure to terminate and reconnect to the database.
If you would like to view the values, issue:
C:\>db2fopt sample get opt_buffpage opt_sortheap
opt_buffpage value is 50000
opt_sortheap value is 10000
And if you would like to reset the values so that we do not use these
two parameters, and go back to using syscat.bufferpools for bufferpool
pages estimate, and database configuration for sortheap size, issue:
C:\>db2fopt sample update opt_buffpage -1 opt_sortheap -1
Update succeeded
Make sure to terminate and reconnect to the database.
To make sure that they are reset, use the get option in db2fopt again:
C:\>db2fopt sample get opt_buffpage opt_sortheap
opt_buffpage value is -1
opt_sortheap value is -1
If you're a DBA, you will probably be using the DB2 SQL Explain Tool
(db2exfmt) to gain an understanding of your SQL access plan. The
db2exfmt tool is used to format the contents of the explain tables. If
you look at the output of one of the access plans using db2exfmt from
production, you will notice the following at the top of the plan.
(Note: For the most part, these parameters are picked up by the -f and
-fd option in the db2look output, with the exception of the dbheap
setting).
Listing 3. Sample output from db2exfmt
Database Context:
----------------
Parallelism: None
CPU Speed: 6.523521e-07
Comm Speed: 100
Buffer Pool size: 50000
Sort Heap size: 10000
Database Heap size: 5120
Lock List size: 1000
Maximum Lock List: 10
Average Applications: 1
Locks Available: 7849
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability
---------------- STATEMENT 1 SECTION 201 ----------------
QUERYNO: 1
QUERYTAG: CLP
Statement Type: Select
Updatable: No
Deletable: No
Query Degree: 1
If you go a bit further down in the db2exfmt output, right after the
access plan, you will see if you have any registry settings that affect
the optimizer plan.
Note: Again, unfortunately not all the relevant registry variables are
listed by db2look -f. You will need to add the ones that are missing.
In general, your registry variable settings on the test system should
be identical, or as close as possible, to the settings on production.
Listing 4. Registry settings that affect access plan
1) RETURN: (Return Result)
Cumulative Total Cost: 57.6764
Cumulative CPU Cost: 191909
Cumulative I/O Cost: 2
Cumulative Re-Total Cost: 5.37264
Cumulative Re-CPU Cost: 134316
Cumulative Re-I/O Cost: 0
Cumulative First Row Cost: 26.9726
Estimated Buffer pool Buffers: 2
Arguments:
---------
BLDLEVEL: (Build level)
DB2 v8.1.0.80 : s041221
ENVVAR : (Environment Variable)
DB2_ANTIJOIN=yes
DB2_INLIST_TO_NLJN = yes
STMTHEAP: (Statement heap size)
2048
Create data definition language (DDL)
The following db2look command creates the DDL to duplicate all database
objects, along with the configuration and statistical information.
db2look -d <dbname> -e -a -m -o db2look.out
Here we've used the following parameters:
-a: Generate statistics for all creators. If this option is specified,
then the -u option will be ignored.
-e: Extract DDL file needed to duplicate database. This option
generates a script containing DDL statements. The script can be run
against another database to recreate database objects.
-m: Run the db2look utility in mimic mode. This option generates a
script containing SQL UPDATE statements. These SQL UPDATE statements
capture all the statistics. This script can be run against another
database to replicate the original one. When the -m option is
specified, the -p, -g, and -s options are ignored.
Gather statistics and DDL for a database subset
To gather statistics and DDL for only certain tables and related
objects, use the following command:
db2look -d <dbname> -e -a -m -t <table1> <table2> .. <tableX> -o table.ddl
Here, I've used the following additional parameter:
-t: Generate statistics for the specified tables. The maximum number of
tables that can be specified is 30.
In addition, if you do not use the -a option, you could use the -z
option:
-z: Schema name. If -z and -a are both specified, then -z will be
ignored. Schema name is ignored for the federated section.
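Because -t accepts at most 30 tables, a larger table list has to be split across several db2look invocations. Here is one way to batch it (the helper function and its table_N.ddl output naming are ours, not part of db2look; DB2LOOK can be pointed at echo to preview the commands):

```shell
# db2look_batched: run db2look -e -a -m -t for up to 30 tables at a time,
# writing table_1.ddl, table_2.ddl, ... Set DB2LOOK=echo for a dry run.
db2look_batched() {
  dbname=$1; shift
  batch=""; n=0; i=1
  for t in "$@"; do
    batch="$batch $t"; n=$((n + 1))
    if [ "$n" -eq 30 ]; then
      ${DB2LOOK:-db2look} -d "$dbname" -e -a -m -t $batch -o "table_$i.ddl"
      batch=""; n=0; i=$((i + 1))
    fi
  done
  if [ -n "$batch" ]; then   # leftover tables (fewer than 30)
    ${DB2LOOK:-db2look} -d "$dbname" -e -a -m -t $batch -o "table_$i.ddl"
  fi
}
```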
Note: The -m option is very important. This option will collect all
the statistics from the system tables. The statistics must be the same
in test as they are in production and are key to your being able to
mimic the production environment in your test environment.
More details on db2exfmt output
Database manager level configuration parameters
Note: Use the command db2 "get dbm cfg" to look at these parameters and
db2 "update dbm cfg using <parameter> <value>" to update a database manager
configuration parameter.
Parallelism:
This parameter indicates whether inter- or intra-partition parallelism is
enabled. In a DPF environment with multiple partitions, you will see
Inter-Partition Parallelism. In a single-node SMP environment (with
intra_parallel enabled), you will see Intra-Partition Parallelism. If
intra_parallel is enabled in a multiple-partition environment, you will see
both inter- and intra-partition parallelism for this parameter. And
finally, if there is no inter- or intra-parallelism, this parameter will
show NONE.
CPU Speed (cpuspeed):
The CPU speed (in milliseconds per instruction) is used by the SQL
optimizer to estimate the cost of performing certain operations.
Communications speed: (comm_bandwidth)
The value specified for the communications bandwidth (in megabytes per
second) is used by the SQL optimizer to estimate the cost of performing
certain operations between partition servers of a partitioned database
system.
Database level configuration parameters
Note: Use the command db2 "get db cfg for <dbname>" to look at these
parameters and db2 "update db cfg for <dbname> using <parameter> <value>"
to update a database configuration parameter.
Buffer pool size:
The buffer pool size shown in db2exfmt output is determined by the
buffpage parameter, if using buffpage as default for one buffer pool,
or a calculation based on the contents of syscat.bufferpools. The
number shown is the total number of buffer pool pages that are
allocated for the database. For example, let's say we have the
following buffer pools:
Table 1. Buffer pool setup
BUFFERPOOLNAME SIZE
IBMDEFAULTBP 1000
BP1 1000
BP2 4000
BPIND1 1000
BPIND2 1000
BPLONG 1000
BPTEMP 1000
Total: 10,000
The db2exfmt output would show the total size as the sum of the number
of pages in all the bufferpools. In our example above, it is 10,000.
Note: Pagesize does not matter, just the number of pages.
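In other words, the optimizer's "Buffer Pool size" figure is simply the page-count sum over the buffer pools. A quick way to compute that sum from a page-count listing (the inline data reproduces Table 1; in practice the list could come from db2 "select bpname, npages from syscat.bufferpools"):

```shell
# Sum the page counts (second column) of a buffer pool listing.
total=$(awk '{sum += $2} END {print sum}' <<'EOF'
IBMDEFAULTBP 1000
BP1 1000
BP2 4000
BPIND1 1000
BPIND2 1000
BPLONG 1000
BPTEMP 1000
EOF
)
echo "Buffer Pool size: $total"   # prints 10000, matching Table 1
```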
You could use the -fd option in db2look to use the db2fopt alternative
if you cannot afford to have the same amount of buffer pool allocated
in test as in production.
In MPP, the opt_buffpage is calculated per node, as the optimizer uses
the total buffer pool information for the node on which the query is
running. Hence, this change will only apply to the node on which this
tool is being run.
Sort heap size (SORTHEAP)
This parameter defines the maximum number of private memory pages to be
used for private sorts or the maximum number of shared memory pages to
be used for shared sorts.
You should set this to the same value as in production. Again, using
the -fd option in db2look, you will notice the following:
!db2fopt SAMPLE update opt_sortheap 256;
This will override the sortheap configuration parameter, and this is
what the optimizer will use as the sortheap value. Again, in reality,
the actual sortheap allocated at runtime will be determined by the
sortheap setting in the database configuration. As with opt_buffpage,
you can use opt_sortheap if you cannot afford to allocate the same size
of sortheap on your test system as you can on production.
Database heap size: (DBHEAP)
There is one database heap per database, and the database manager uses
it on behalf of all applications connected to the database. It contains
control block information for tables, indexes, table spaces, and buffer
pools.
Lock list size: (LOCKLIST)
This parameter indicates the amount of storage that is allocated to the
lock list.
Maximum lock list: (MAXLOCKS)
This parameter defines a percentage of the lock list held by an
application that must be filled before the database manager performs
escalation.
The locklist and maxlocks would help determine the type of locks that
will be held during a certain scan (index scan or table scan), along
with the isolation level. For example, in the plan you will notice,
say, an index scan operation:
IXSCAN: (Index Scan)
TABLOCK : (Table Lock intent)
INTENT SHARE
Note: Do not be concerned if Locks Available differs in your db2exfmt
output from your test system, compared to production -- the difference
has no effect on query planning.
Average applications: (AVG_APPLS)
This parameter is used by the SQL optimizer to help estimate how much
buffer pool will be available at run-time for the access plan chosen
(since the buffer pool is shared by all active applications connected
to the database).
Optimization Level: (DFT_QUERYOPT)
The query optimization class is used to direct the optimizer to use
different degrees of optimization when compiling SQL queries.
Query Degree: (DFT_DEGREE)
The degree of intra-partition parallelism for an SQL statement. If set
to ANY, the optimizer is sensitive to the actual number of CPUs that
are online. If you use ANY, then the number of CPUs on test and
production should be configured the same, unless intra_parallel is
disabled.
In addition to the above changes, there are some others that you must
make sure are the same.
Number of frequent values retained: (NUM_FREQVALUES)
This parameter allows you to specify the number of "most frequent
values" that will be collected when the WITH DISTRIBUTION option is
specified on the RUNSTATS command.
Number of quantiles retained: (NUM_QUANTILES)
This parameter controls the number of quantiles that will be collected
when the WITH DISTRIBUTION option is specified on the RUNSTATS command.
The above two parameters, NUM_FREQVALUES and NUM_QUANTILES, must be the
same on the test system as in production so that RUNSTATS collects the same
number of frequent values and quantiles on both systems.
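A single UPDATE DB CFG statement can set both parameters on the test database. A sketch (the wrapper function is ours; the values 10 and 20 are just the example settings shown later in Example 1; DB2 can be stubbed with echo for a dry run):

```shell
# match_dist_cfg: set NUM_FREQVALUES and NUM_QUANTILES in one statement.
# DB2 defaults to the real CLP; set DB2=echo to preview the command.
match_dist_cfg() {
  ${DB2:-db2} "update db cfg for $1 using NUM_FREQVALUES $2 NUM_QUANTILES $3"
}
```

Remember to re-run RUNSTATS after changing these, since they only take effect on the next statistics collection.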
SQL statement heap (4KB): (STMTHEAP)
The statement heap is used as a workspace for the SQL compiler during
compilation of an SQL statement. This parameter specifies the size of
this workspace. If this parameter is smaller on test than in production,
you may start seeing the SQL0101N message due to lack of space in the
statement heap needed to compile the query. You may also see SQL0437W
RC=1, indicating a drop to greedy join enumeration, if there is not enough
statement heap for dynamic join enumeration.
Examples of recreating the optimizer/query plan issue on test system
Example 1:
OS: Windows 2000
DB2LEVEL: V8.2 Fixpack 8 ESE Single partition
TEST and PRODUCTION same OS and db2level.
Databases:
Production Database: SAMPLE
Test Database: DUMMYDB
Sample database created using: db2sampl
Dummy database created using:
db2 create db DUMMYDB
Note: Create the TEST database with the same codepage, territory, and
collating sequence as production.
PRODUCTION ENVIRONMENT:
--------------------------------------------------------
-- Database SAMPLE and Database Manager configuration parameters
--------------------------------------------------------
UPDATE DBM CFG USING cpuspeed 9.446886e-007;
UPDATE DBM CFG USING intra_parallel NO;
UPDATE DBM CFG USING federated NO;
UPDATE DBM CFG USING fed_noauth NO;
!db2fopt SAMPLE update opt_buffpage 250;
!db2fopt SAMPLE update opt_sortheap 256;
UPDATE DB CFG FOR SAMPLE USING locklist 50;
UPDATE DB CFG FOR SAMPLE USING dft_degree 1;
UPDATE DB CFG FOR SAMPLE USING maxlocks 22;
UPDATE DB CFG FOR SAMPLE USING avg_appls 1;
UPDATE DB CFG FOR SAMPLE USING stmtheap 2048;
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5;
---------------------------------
-- Environment Variables settings
---------------------------------
!db2set DB2_INLIST_TO_NLJN=yes;
!db2set DB2_HASH_JOIN=yes;
In addition to the above, the following configuration parameters should
also be noted in the database configuration:
db2 get db cfg for sample > dbcfg_sample.out
Database heap (4KB) (DBHEAP) = 600
SQL statement heap (4KB) (STMTHEAP) = 2048
Number of frequent values retained (NUM_FREQVALUES) = 10
Number of quantiles retained (NUM_QUANTILES) = 20
Make sure that you stop and start the instance after making the
database manager configuration (dbm cfg) changes. For the sample
database, run runstats as follows for the ORG and the SALES table:
db2 connect to sample
db2 runstats on table <schema>.org with distribution and indexes all
db2 runstats on table <schema>.sales with distribution and indexes all
db2 terminate
Now, generate the EXPLAIN tables by executing the EXPLAIN.DDL file, which
is under the <install directory>\sqllib\misc directory:
db2 connect to sample
db2 -tvf <install directory>\sqllib\misc\EXPLAIN.DDL
db2 terminate
Save the following commands in a file called query.sql:
connect to sample
set current explain mode explain
select * from org a, staff b where a.deptnumb=b.dept and b.dept=15
set current explain mode no
terminate
Now, execute the file as follows:
db2 -tvf query.sql
The above will only compile the query in explain mode. You should see
the following on the screen:
C:\>db2 -tvf query.sql
connect to sample
Database Connection Information
Database server = DB2/NT 8.2.1
SQL authorization ID = SKAPOOR
Local database alias = SAMPLE
set current explain mode explain
DB20000I The SQL command completed successfully.
select * from org a, staff b where a.deptnumb=b.dept and b.dept=15
SQL0217W The statement was not executed as only Explain information
requests
are being processed. SQLSTATE=01604
set current explain mode no
DB20000I The SQL command completed successfully.
C:\>db2 terminate
DB20000I The TERMINATE command completed successfully.
Generate the access plan using db2exfmt as follows:
db2exfmt -d SAMPLE -g TIC -w -1 -n % -s % -# 0 -o prod_sample_exfmt.txt
Examine the contents of prod_sample_exfmt.txt file. You should see the
following access plan generated:
Access Plan:
-----------
Total Cost: 25.8823
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
4
HSJOIN
( 2)
25.8823
2
/-----+-----\
4 1
TBSCAN TBSCAN
( 3) ( 4)
12.9682 12.913
1 1
| |
35 8
TABLE: SKAPOOR TABLE: SKAPOOR
STAFF ORG
Now, this is the plan you are getting on your production sample
database. You would like to mimic this plan on your test environment.
Gather the following information from the production sample database:
db2look -d SAMPLE -l -o storage.out
db2look -d SAMPLE -f -fd -o config.out
db2look -d SAMPLE -e -a -m -t ORG SALES -o table.ddl
TEST ENVIRONMENT:
Change the database name in each of the files gathered above where we
connect to the database from SAMPLE to DUMMYDB.
For example, if you view the contents of the 3 files, you will notice:
CONNECT TO SAMPLE;
Change this to:
CONNECT TO DUMMYDB;
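Rather than editing each file by hand, a sed pass can rewrite the CONNECT statements in storage.out, config.out, and table.ddl. Here is a sketch, demonstrated on an inline sample line so it is self-contained (the real files come from the db2look runs above):

```shell
# Rewrite CONNECT TO SAMPLE as CONNECT TO DUMMYDB in a CLP script.
printf 'CONNECT TO SAMPLE;\nUPDATE DBM CFG USING federated NO;\n' > config.demo
sed 's/CONNECT TO SAMPLE;/CONNECT TO DUMMYDB;/' config.demo > config.new
head -1 config.new   # prints: CONNECT TO DUMMYDB;
```

Looping the same substitution over storage.out, config.out, and table.ddl covers all three files.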
Take the files over to the test environment. In this example, all the
tables are created in the default tablespace, USERSPACE1. So they should
also be created in the same SMS tablespace on the test system, with the
same configuration as dumped out in storage.out, including PREFETCHSIZE,
EXTENTSIZE, and so on.
Make one little change in the config.out file. Change the following:
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 5;
to:
UPDATE DB CFG FOR SAMPLE USING dft_queryopt 3;
and now save the config.out file.
Now, execute storage.out, config.out, and table.ddl as follows:
db2 -tvf storage.out > storage_output.out
db2 -tvf config.out > config_output.out
db2 -tvf table.ddl > table.out
Examine the output files to make sure that all the commands ran
successfully. Also, change DBHEAP, STMTHEAP, NUM_FREQVALUES, and
NUM_QUANTILES for DUMMYDB so that they match the values for the SAMPLE
database shown in the production environment setup. Also, check that your
registry variable settings are as identical as possible.
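One quick way to check the redirected output files is to grep for error-class messages: DB2 message identifiers ending in N (error) or C (critical) signal failures, while W identifiers are only warnings. A small heuristic helper (ours, not a DB2 tool):

```shell
# scan_for_errors: print any SQLxxxxN/SQLxxxxC lines and fail if one is found.
scan_for_errors() {
  if grep -nE 'SQL[0-9]{4,5}[NC]' "$@"; then
    return 1   # at least one error message found
  fi
  return 0
}

# Two inline demo files: one clean run, and one containing the SQL1227N
# failure shown later in Example 2.
printf 'DB20000I  The SQL command completed successfully.\n' > clean.demo
printf 'SQL1227N  The catalog statistic is out of range.\n' > failed.demo
```

For example, scan_for_errors storage_output.out config_output.out table.out succeeds silently only when no error messages are present.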
Stop and start the instance, using db2stop and db2start. Create the
explain tables again for this DUMMYDB database:
db2 connect to dummydb
db2 -tvf <install directory>\sqllib\misc\EXPLAIN.DDL
db2 terminate
Now, run the query against the DUMMYDB database, modifying the database
name from SAMPLE to DUMMYDB in the query.sql file you generated earlier
when running against the SAMPLE database.
C:\>db2 -tvf query.sql
connect to dummydb
Database Connection Information
Database server = DB2/NT 8.2.1
SQL authorization ID = SKAPOOR
Local database alias = DUMMYDB
set current explain mode explain
DB20000I The SQL command completed successfully.
select * from org a, staff b where a.deptnumb=b.dept and b.dept=15
SQL0217W The statement was not executed as only Explain information
requests
are being processed. SQLSTATE=01604
set current explain mode no
DB20000I The SQL command completed successfully.
C:\>db2 terminate
DB20000I The TERMINATE command completed successfully.
Generate the db2exfmt output:
db2exfmt -d DUMMYDB -g TIC -w -1 -n % -s % -# 0 -o test_dummydb_exfmt.txt
Examine the contents of test_dummydb_exfmt.txt and look at the access
plan:
Access Plan:
-----------
Total Cost: 25.8843
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
4
MSJOIN
( 2)
25.8843
2
/-----+-----\
1 4
TBSCAN TBSCAN
( 3) ( 5)
12.913 12.9682
1 1
| |
8 35
TABLE: SKAPOOR TABLE: SKAPOOR
ORG STAFF
You get a different access plan in test than production. In this case,
it is obvious, as we changed the DFT_QUERYOPT (default query
optimization) on the test system from 5 to 3. So, you see a Merge Join
plan rather than the Hash Join plan, and the Total Cost differs slightly
as well.
Since the plans don't match (let's say you are not sure why), check the
configuration in the db2exfmt output. See Table 2.
As you can see, between the TEST and PRODUCTION, the only difference is
the Optimization Level, which we intentionally changed from 5 to 3 just
to show why one would not be successful in replicating the production
access plan on a testing environment.
In this case, you should update the DFT_QUERYOPT on the test database back
to 5, using the following UPDATE statement:
UPDATE DB CFG FOR DUMMYDB USING dft_queryopt 5
Then terminate and reconnect to the database. Issue query.sql again
against the DUMMYDB, and generate the access plan using the db2exfmt
command. This time you should see the same access plan. If not, further
make sure that all the optimizer-related parameters discussed in this
article are identical.
Example 2:
This example shows the importance of the -m option in the db2look
command. The statistics are gathered with the -m option as mentioned
earlier, and should be the same in both test and production. In this
example, we'll see how the plan changes when the statistics are not
updated properly.
The database manager configuration, database configuration and db2set
registry variables are the same as in Example 1 above. The schema name
here is SKAPOOR. Replace it with the schema for your table. The
databases are the same, SAMPLE and DUMMYDB, as in Example 1. The platform
and db2level used here were AIX 5.1 with DB2 UDB ESE V8.2, Fix pack 8,
single partition.
Perform the following commands on the sample database:
db2 "connect to sample"
db2 "create index name_ind on staff (name,id)"
db2 "runstats on table skapoor.staff with distribution and indexes all"
db2 "set current explain mode explain"
db2 "select name from staff where id=10 order by name"
db2 "set current explain mode no"
db2 "terminate"
Generate the access plan using db2exfmt. You will see the following
access plan:
Access Plan:
-----------
Total Cost: 0.111065
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
IXSCAN
( 2)
0.111065
0
|
35
INDEX: SKAPOOR
NAME_IND
Gather the db2look information from the sample database:
db2look -d sample -l -o storage.out
db2look -d sample -e -a -m -t STAFF -o db2look.out
db2look -d sample -f -fd -o config.out
Modify the files so that you are connecting to the dummy database
instead of sample as we did previously in Example 1 above.
Modify one of the statistics manually. Search for the following
statement in the db2look.out file (Note that the schema name, TABSCHEMA
and INDSCHEMA will differ in your case):
UPDATE SYSSTAT.INDEXES
SET NLEAF=1,
NLEVELS=1,
FIRSTKEYCARD=35,
FIRST2KEYCARD=35,
FIRST3KEYCARD=-1,
FIRST4KEYCARD=-1,
FULLKEYCARD=35,
CLUSTERFACTOR=-1.000000,
CLUSTERRATIO=100,
SEQUENTIAL_PAGES=0,
DENSITY=0,
AVERAGE_SEQUENCE_GAP=0.000000,
AVERAGE_SEQUENCE_FETCH_GAP=0.000000,
AVERAGE_SEQUENCE_PAGES=0.000000,
AVERAGE_SEQUENCE_FETCH_PAGES=0.000000,
AVERAGE_RANDOM_PAGES=1.000000,
AVERAGE_RANDOM_FETCH_PAGES=0.000000,
NUMRIDS=35,
NUMRIDS_DELETED=0,
NUM_EMPTY_LEAFS=0
WHERE INDNAME = 'NAME_IND' AND INDSCHEMA = 'SKAPOOR '
AND TABNAME = 'STAFF' AND TABSCHEMA = 'SKAPOOR ';
Now, change the FIRSTKEYCARD, FIRST2KEYCARD, FULLKEYCARD, and NUMRIDS
from 35 to 37. Save the db2look.out file and now run the 3 files:
db2 -tvf config.out > config_output.out
db2 -tvf storage.out > storage_output.out
db2 terminate
db2stop
db2start
db2 -tvf db2look.out > db2look_output.out
Examine the contents of the first two files, config_output.out and
storage_output.out, to make sure they ran successfully. Now, examine the
contents of the db2look_output.out file. You will see that the following
update statement failed:
UPDATE SYSSTAT.INDEXES SET NLEAF=1, NLEVELS=1, FIRSTKEYCARD=37,
FIRST2KEYCARD=37, FIRST3KEYCARD=-1, FIRST4KEYCARD=-1, FULLKEYCARD=37,
CLUSTERFACTOR=-1.000000, CLUSTERRATIO=100, SEQUENTIAL_PAGES=0, DENSITY=0,
AVERAGE_SEQUENCE_GAP=0.000000, AVERAGE_SEQUENCE_FETCH_GAP=0.000000,
AVERAGE_SEQUENCE_PAGES=0.000000, AVERAGE_SEQUENCE_FETCH_PAGES=0.000000,
AVERAGE_RANDOM_PAGES=1.000000, AVERAGE_RANDOM_FETCH_PAGES=0.000000,
NUMRIDS=37, NUMRIDS_DELETED=0, NUM_EMPTY_LEAFS=0
WHERE INDNAME = 'NAME_IND' AND INDSCHEMA = 'SKAPOOR '
AND TABNAME = 'STAFF' AND TABSCHEMA = 'SKAPOOR '
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL1227N The catalog statistic "37" for column "FULLKEYCARD" is out of
range for its target column, has an invalid format, or is inconsistent in
relation to some other statistic. Reason Code = "8". SQLSTATE=23521
As you can see, the above UPDATE statement for the index NAME_IND failed
because the FULLKEYCARD is greater than the cardinality (CARD) of the
table. The CARD is 35, as can be seen in the following update statement in
the db2look.out file:
UPDATE SYSSTAT.TABLES
SET CARD=35,
NPAGES=1,
FPAGES=1,
OVERFLOW=0,
ACTIVE_BLOCKS=0
WHERE TABNAME = 'STAFF' AND TABSCHEMA = 'SKAPOOR ';
Now again, run the same query:
db2 "select name from staff where id=10 order by name"
in explain mode and generate the access plan. You will see it is
different:
Access Plan:
-----------
Total Cost: 12.972
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
1
TBSCAN
( 2)
12.972
1
|
1
SORT
( 3)
12.9708
1
|
1
TBSCAN
( 4)
12.9682
1
|
35
TABLE: SKAPOOR
STAFF
This example shows that if RUNSTATS is run while there is WRITE
activity happening on the table, the statistics can be inconsistent, as
in this example, and the UPDATE statements used to replicate the
statistics can fail with the SQL1227N error message. It is very
important that all the UPDATE statements run successfully; if there are
any inconsistencies, they should be fixed and the statements re-run. In
this case, the solution would be to change the KEYCARD and NUMRIDS
values back from 37 to 35.
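The fix can be sketched as a simple consistency check. The following
Python snippet is a hypothetical helper (not part of DB2): it clamps
any index cardinality statistic that exceeds the table CARD, which is
the condition that triggers SQL1227N rc=8 here:

```python
# Hypothetical helper: clamp index cardinality statistics so they never
# exceed the table cardinality (CARD), avoiding SQL1227N rc=8 when the
# db2look -m UPDATE statements are replayed on test.

def fix_index_stats(index_stats, table_card):
    """Return a copy of the index statistics with any *KEYCARD or
    NUMRIDS value greater than table_card reduced to table_card."""
    fixed = dict(index_stats)
    for key in ("FIRSTKEYCARD", "FIRST2KEYCARD", "FIRST3KEYCARD",
                "FIRST4KEYCARD", "FULLKEYCARD", "NUMRIDS"):
        if fixed.get(key, -1) > table_card:
            fixed[key] = table_card
    return fixed

# The inconsistent values from the failed UPDATE in this example,
# clamped to the table CARD of 35:
print(fix_index_stats(
    {"FIRSTKEYCARD": 37, "FIRST2KEYCARD": 37, "FULLKEYCARD": 37,
     "NUMRIDS": 37},
    table_card=35))
```

You would then edit the corresponding UPDATE SYSSTAT.INDEXES statement
to use the clamped values before re-running it.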
Example 3:
You want to mimic the whole production database on test, in a
single-partition environment.
Note: You may need to change the database name in each output of
db2look, if the database names are different on test compared to
production.
Step 1: Collect the db2look output with the -l option to gather the
tablespace/buffer pool/database node group information:
db2look -d <dbname> -l -o storage.out
Modify the tablespace information to fit your test environment. Say,
for example, in production you have the following tablespace:
------------------------------------
-- DDL Statements for TABLESPACES --
------------------------------------
CREATE REGULAR TABLESPACE DMS1 IN DATABASE PARTITION GROUP
IBMDEFAULTGROUP
PAGESIZE 4096 MANAGED BY DATABASE
USING ( FILE '/data/dms1' 20000,
        FILE '/data/dms2' 20000,
        FILE '/data/dms3' 20000)
EXTENTSIZE 32
PREFETCHSIZE 32
BUFFERPOOL IBMDEFAULTBP
OVERHEAD 12.670000
TRANSFERRATE 0.180000
DROPPED TABLE RECOVERY ON;
If you do not have the same paths set up on test, then change the
locations above. If you are only planning to mimic the environment, not
copy the data over, then make the file sizes smaller, and use fewer
containers if needed. If you do not have the same buffer pools created,
then you can also change the buffer pool name, but the buffer pool must
have the same page size; do not change the tablespace page size. Once
you are done with this and have created the database, run the
storage.out file:
db2 -tvf storage.out
Redirect the output if needed to make sure it all went through
successfully. For example:
db2 -tvf storage.out > storage_results.out
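If there are many containers to adjust, a scripted rewrite of the FILE
container clauses can save hand-editing. The sketch below is a
hypothetical helper (the /testdata path and page counts are made-up
values); it re-roots each container path and shrinks its size in the
db2look -l output:

```python
import re

# Hypothetical helper: rewrite FILE container clauses in db2look -l
# output so they fit a smaller test environment.

def shrink_containers(ddl, new_root, new_pages):
    """Move each FILE container under new_root and set its size to
    new_pages pages, preserving the container file name."""
    def repl(match):
        filename = match.group(1).rsplit("/", 1)[-1]
        return "FILE '%s/%s' %d" % (new_root, filename, new_pages)
    # Matches e.g. FILE '/data/dms1' 20000 (the space may be missing
    # in some db2look output).
    return re.sub(r"FILE\s+'([^']+)'\s*(\d+)", repl, ddl)

ddl = ("USING ( FILE '/data/dms1' 20000,\n"
       "        FILE '/data/dms2' 20000)")
print(shrink_containers(ddl, "/testdata", 2000))
```

After rewriting, review the modified storage.out by hand before running
it, since tablespace types (DMS/SMS) and page sizes must stay the same.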
Step 2: Collect the configuration and environment variable information
from production and run it on test:
db2look -d sample -f -fd -o config.out
Remember, in MPP environments, this information will be collected for
the node where you ran the command. If your DB2 registry settings and
your database and database manager configurations differ across
database partitions, you will want to gather the information from each
node separately. However, if you cannot have the same partitions in
test as in production, then gather this information from the node where
you execute the query in production and use it on test. Note that if
you have a different number of partitions on test, your simulation will
be imperfect.
On test, run the config.out file as follows:
db2 -tvf config.out
The above takes into consideration that the optimizer will use the
db2fopt information to determine the total buffer pool allocation and
the sortheap setting, which will now be the values in your test
environment. Again, this is a technique used when, due to memory
constraints, you cannot have the same buffer pools and sortheap in test
as in production. The other configuration parameters and environment
variables discussed earlier in this article will also be updated.
Step 3: Collect the DDL information for all objects from production (as
we are mimicking the whole database), and run the db2look output on
test.
On production:
db2look -d sample -e -a -m -o db2look.out
On test:
db2 -tvf db2look.out
To save the results for review, redirect the output:
db2 -tvf db2look.out > db2look.results
Once the above steps are complete, ensure that the dbheap database
configuration parameter is set to the same value in test as in
production.
Step 4: Get the access plan using db2exfmt from both test and
production, and make sure that the following is the same as it is in
production:
Database Context:
----------------
        Parallelism:          None
        CPU Speed:            4.762804e-07
        Comm Speed:           100
        Buffer Pool size:     128500
        Sort Heap size:       128
        Database Heap size:   5120
        Lock List size:       12250
        Maximum Lock List:    10
        Average Applications: 4
        Locks Available:      78400

Package Context:
---------------
        SQL Type:           Dynamic
        Optimization Level: 3
        Blocking:           Block All Cursors
        Isolation Level:    Cursor Stability

---------------- STATEMENT 1  SECTION 201 ----------------
        QUERYNO:        1
        QUERYTAG:       CLP
        Statement Type: Select
        Updatable:      No
        Deletable:      No
        Query Degree:   1
Now, look at the access plans. If they are the same, then you have
successfully recreated the access plan. Note also that you should look
toward the bottom of the db2exfmt output to verify that the tablespace
configurations match.
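Comparing the two db2exfmt headers by eye is error-prone; a mechanical
diff of the "Name: value" lines can be sketched as below (a
hypothetical helper, with field names taken from the Database Context
section shown above):

```python
# Hypothetical helper: diff the "Name: value" lines of two db2exfmt
# Database Context sections to spot settings that differ between
# production and test.

def parse_context(text):
    """Parse 'Name: value' lines into a dict."""
    ctx = {}
    for line in text.splitlines():
        if ":" in line:
            name, _, value = line.partition(":")
            ctx[name.strip()] = value.strip()
    return ctx

def context_diff(prod, test):
    """Return {setting: (prod_value, test_value)} for every setting
    that differs (or is missing) on test."""
    p, t = parse_context(prod), parse_context(test)
    return {k: (p[k], t.get(k)) for k in p if t.get(k) != p[k]}

prod = "Parallelism: None\nCPU Speed: 4.762804e-07\nSort Heap size: 128"
test = "Parallelism: None\nCPU Speed: 4.762804e-07\nSort Heap size: 256"
print(context_diff(prod, test))  # only Sort Heap size differs
```

Any setting the diff reports should be corrected on test before you
compare the access plans themselves.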
Example 4:
Production: MPP with 16 physical machines and 4 logical partitions per
machine.
Test: MPP with only 4 physical machines available, with 4 logical
partitions per machine.
The query involves tables, views, and MQTs.
In this example, the simulation may not be accurate: you must have the
same number of partitions in both test and production. You can still
attempt the recreation without them, but the results will not be
reliable. Therefore, you must add 16*4=64 partitions to the test
environment in order to get a proper recreation. You do not need 16
physical machines in the test environment; that is, you can have four
physical machines with 16 logical partitions per machine. It's up to
you, but you must have 64 logical partitions overall, the same as
production.
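The arithmetic above can be sketched as a trivial check, using the
figures from this example (a hypothetical helper, not a DB2 tool):

```python
# Hypothetical helper: given the production partition layout and the
# number of machines available on test, compute how many logical
# partitions each test machine must host to match production.

def logical_partitions_needed(prod_physical, prod_logical_per_machine,
                              test_physical):
    """Return (total partitions, logical partitions per test machine)."""
    total = prod_physical * prod_logical_per_machine
    per_test_machine, remainder = divmod(total, test_physical)
    assert remainder == 0, "total partitions must divide evenly"
    return total, per_test_machine

print(logical_partitions_needed(16, 4, 4))  # (64, 16)
```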
So now, after making the changes to add the same number of logical
partitions to the test environment, the test should look like the
original production setup, shown in the table below.
Table 3. Production setup

DBPARTITIONS:
  ALLNODES (on nodes 1 to 64)
  NODE1 (db partition defined on node 1)
  NODE2 (db partition defined on node 5)

TABLESPACES:
  TABSPACE1 (DMS using devices, defined in db partition group ALLNODES)
  TABSPACE2 (SMS, defined in db partition group NODE1)
  TABSPACE3 (DMS, defined in db partition group NODE2)

TABLES:
  TAB1 in TABSPACE1
  TAB2 in TABSPACE2
  TAB3 in TABSPACE3

MQT:
  MQT defined on TAB3

VIEW:
  VIEW1 defined over the two tables TAB1 and TAB2
Make sure you gather the db2look output using the -f and -fd options
from the node where the query is issued, so that the buffer pool
information, as discussed earlier, is picked up from that node and the
registry settings, along with the db cfg and dbm cfg, are picked up
from the node where the query is run. In my experience, customers
usually have the same configuration on all the nodes except for the
buffer pools, which are a very important setting.
Steps to follow:
Step 1: Gather the storage information from production:
db2look -d <dbname> -l -o storage.out
Step 2: Modify the tablespace/buffer pool information to fit the test
environment. If you do not have devices available, then use DMS file
containers. Also, if you do not wish to use the same number of
containers in test as in production, shorten the list and use fewer
containers. But again, you will have to make sure that if a tablespace
is of type DMS or SMS on production, it remains the same type in test.
Step 3: Gather the configuration information using the following
command:
db2look -d <dbname> -f -fd -o config.out
Step 4: Now, gather the db2look information for just the objects that we
are interested in. In our example, we want all related information,
including table DDL, views, and MQTs associated with the tables:
db2look -d <dbname> -e -a -m -t TAB1 TAB2 TAB3 -o db2look.out
Once you have gathered all of this information and modified the
tablespace/buffer pool information, execute the db2look output files on
the test environment, and, again, get the db2exfmt output from both
production and test and compare them.
Example 5:
This is a classic example of an issue where you have collected RUNSTATS
information during activity on the table. You're getting SQL1227N error
messages, and you will not be able to recreate the issue unless you
modify the statistics manually.
Say, for example, the table has one million rows and there is a primary
key defined on an integer column. You ran RUNSTATS with the
distribution and indexes all options, allowing write access to the
table. By the time the table statistics were collected, 100,000
additional records had been inserted into the table, so the table
statistics show a CARD of 1,100,000. However, by the time collection of
the index statistics began (for example, for the primary key defined on
the integer column), another 10,000 records had been inserted, so the
table held 1,110,000 rows and the primary index FIRSTKEYCARD is
recorded as 1,110,000. So you can see the inconsistency: the CARD for
the table statistics says that there should be 1,100,000 records in the
table, whereas the FIRSTKEYCARD for the primary index statistics says
that there are 1,110,000. The update for the index statistics would
fail with an SQL1227N rc=8 error message (in this case), as the
FIRSTKEYCARD of the index is greater than the CARD of the table. You
will have to fix this inconsistency manually: for this example, either
make the FIRSTKEYCARD equal to CARD, which is 1,100,000, or the
opposite, that is, increase CARD to equal FIRSTKEYCARD, which is
1,110,000.
There are many other inconsistencies you could encounter. When you run
the db2look output generated with the -m option on test, make sure you
save the output to a file, review all the inconsistencies, and fix
them. I have given only one example of an inconsistency here; there are
many others that you can run into, and it is left to the user to fix
them all and then rerun the db2look output, redirecting the results to
a file, to make sure all the statistics updates ran without any
problems.
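Scanning a large redirected output file for failed updates by eye is
tedious; the sketch below is a hypothetical helper that reports the
line numbers of SQL1227N messages, assuming the CLP error format shown
in Example 2:

```python
# Hypothetical helper: scan a redirected db2look/CLP output file for
# SQL1227N statistics errors and report where they occur.

def find_failed_updates(output):
    """Return the 1-based line numbers of SQL1227N messages."""
    hits = []
    for lineno, line in enumerate(output.splitlines(), start=1):
        if "SQL1227N" in line:
            hits.append(lineno)
    return hits

sample = ('UPDATE SYSSTAT.INDEXES SET FIRSTKEYCARD=1110000 ...\n'
          'SQL1227N The catalog statistic "1110000" for column '
          '"FIRSTKEYCARD" is out of range ... Reason Code = "8".')
print(find_failed_updates(sample))  # [2]
```

Each reported line points at an UPDATE whose statistics need to be
reconciled by hand before rerunning the file.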
Example 6:
This is an example where in production you get an SQL0437W rc=1 warning
message and you are not seeing it in test. In this case, follow the
above examples to recreate the issue. Make sure that STMTHEAP is the
same; if it is different (for example, higher in test for some reason
than in production), then you may not see the same warning. Again, the
other parameters that we've already discussed matter as well.
SQL0437W rc=2 and other return codes can be recreated as well,
following the same method.
Other error messages, such as SQL0101N and SQL0901N, can also be
recreated using the same method. Even traps in the compiler/optimizer
area can be recreated. This is very useful when you are at an older fix
pack level and want to try the latest fix pack level to see if that
avoids the problem or when you want to play around with different
optimization levels to see if that would overcome the problem
temporarily.
Conclusion
The db2look utility is a powerful tool that you can use to recreate
access plan problems, along with compiler issues like those we have
talked about in this article. Once the problem is recreated, you can
test the many variables that can affect performance, such as changing
optimization levels, playing with registry variables, and updating the
statistics, without affecting production, as well as testing new fix
pack levels. You'll find it a handy utility for debugging problems and
improving query performance.