Create tablespace slow

Desmodromic

Sep 7, 2011, 7:58:52 AM

The following command to create a 14.6 GB tablespace took 19 minutes
to run. This strikes me as very slow. What are your experiences?

CREATE LARGE TABLESPACE "CREC_2011Q4_IDX"
  IN DATABASE PARTITION GROUP IBMDEFAULTGROUP
  PAGESIZE 8192
  MANAGED BY DATABASE USING (FILE '<LUN>/CREC_2011Q4_IDX.01' 1914624)
  EXTENTSIZE 24 PREFETCHSIZE AUTOMATIC
  BUFFERPOOL BP_IX_8K_CALL_REC
  AUTORESIZE YES INCREASESIZE 615168 K MAXSIZE NONE
  NO FILE SYSTEM CACHING
  DROPPED TABLE RECOVERY ON;

The system is DB2 LUW 9.5 FP7 on Red Hat ELS 5.5. The underlying disk
storage is a NetApp filer.

I thought I would then see how dd performs with a file of the same
size being written to the same LUN using the command below:

dd if=/dev/zero of=<LUN>/emptyfile bs=8k count=1914624

Performance was better than the CREATE TABLESPACE command but not
great, taking an average of about 6 minutes over 3 runs.
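
Note that the dd above goes through the Linux page cache, whereas the
tablespace was created with NO FILE SYSTEM CACHING (direct I/O), so the
two runs are not strictly comparable. A closer comparison might be a
direct-I/O dd, assuming the GNU dd shipped with RHEL 5 (which supports
oflag=direct):

dd if=/dev/zero of=<LUN>/emptyfile bs=8k count=1914624 oflag=direct
dd if=/dev/zero of=<LUN>/emptyfile2 bs=1M count=14958 oflag=direct

The first run mimics small 8 KB direct writes; the second uses a 1 MB
block size (same total of roughly 14.6 GB) to gauge raw sequential
throughput to the same LUN.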

I think I may have answered my own question here. The iostat output
below was taken during the three dd runs (one sample per run):

Device:   rrqm/s    wrqm/s    r/s     w/s   rsec/s     wsec/s  avgrq-sz  avgqu-sz    await  svctm   %util
sdb         0.00  18066.12  13.59  161.38   183.73  145819.98    834.43    148.21   834.62   5.96  104.20
sdb         0.00  30114.54  18.07  239.10   224.75  243627.50    948.21    296.96  1133.86   7.64  196.48
sdb         0.00  20727.59  14.90  164.26   168.57  167606.98    936.45    168.60   921.69   6.50  116.43

The await, svctm and %util values do not strike me as being great.

Interestingly, I ran a very similar CREATE TABLESPACE command on a
laptop running DB2 9.7 FP4 and Windows XP. The 14GB tablespace was
created in less than a second.

Can anyone shed any light on this?

I welcome your comments.

Thanks in advance.

Arthurccc

Sep 7, 2011, 10:48:39 AM

Could you take "NO FILE SYSTEM CACHING" out of the command and try
again?

Arthur

Helmut Tessarek

Sep 7, 2011, 11:50:08 AM

What type of filesystem are you using? ext3/ext4/xfs/jfs ?

On 07.09.11 7:58, Desmodromic wrote:
> The system is DB2 LUW 9.5 FP7 on Red Hat ELS 5.5. The underlying disk
> storage is a NetApp filer.
>

> Can anyone shed any light on this?
>
> I welcome your comments.
>
> Thanks in advance.


--
Helmut K. C. Tessarek
DB2 Performance and Development
IBM Toronto Lab

Desmodromic

Sep 7, 2011, 7:32:06 PM

The filesystem is ext3.

Desmodromic

Sep 8, 2011, 4:54:05 AM

I ran the same CREATE TABLESPACE command without the NO FILE SYSTEM
CACHING option. It took 13 minutes rather than 19 minutes. What is the
rationale behind not using that option? I normally use it to prevent
double buffering.
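
For what it's worth, the caching behaviour can also be switched on an
existing tablespace rather than only at creation time - a sketch (on 9.5
the change may only take effect at the next database activation):

ALTER TABLESPACE "CREC_2011Q4_IDX" FILE SYSTEM CACHING;
ALTER TABLESPACE "CREC_2011Q4_IDX" NO FILE SYSTEM CACHING;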

Desmodromic

Sep 8, 2011, 4:57:28 AM

This is not a good situation. During CREATE, DROP and RESIZE of
tablespaces, applications cannot connect to or disconnect from the
database. This makes any tablespace administration an offline task.
Auto-extend runs into the same issue.

Tonkuma

Sep 8, 2011, 8:14:54 AM

How about specifying "INITIALSIZE ... K" (48 K or more)?

I thought that DB2 might be formatting the tablespace up to the
INCREASESIZE of 615168 K.
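
INITIALSIZE applies to automatic storage tablespaces; for the DMS file
container above the analogous idea would be a small initial file that
AUTORESIZE then grows. Just a sketch:

CREATE LARGE TABLESPACE "CREC_2011Q4_IDX" PAGESIZE 8192
  MANAGED BY DATABASE USING (FILE '<LUN>/CREC_2011Q4_IDX.01' 6144)
  EXTENTSIZE 24 PREFETCHSIZE AUTOMATIC BUFFERPOOL BP_IX_8K_CALL_REC
  AUTORESIZE YES INCREASESIZE 615168 K MAXSIZE NONE
  NO FILE SYSTEM CACHING DROPPED TABLE RECOVERY ON;

(6144 pages x 8 KB = 48 MB initial allocation instead of the full
14.6 GB, so only that much needs formatting at CREATE time.)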

stefan.albert

Sep 8, 2011, 8:58:07 AM

Hi,

Since your dd command is also very slow, I assume there could be a
configuration problem between the Linux box and the NetApp filer -
something like buffer sizes for the communication. I'm not a specialist
in these things, but I would have a look at:
1) the Linux configuration for the connection to the NetApp
2) the NetApp configuration for the connection to Linux
3) the equipment for the communication itself (FC? Ethernet? firewall?
switches?)

"NO FILE SYSTEM CACHING" is OK for me - we use it as well, and the
duration of tablespace creation is not significantly faster...

Is this the first/only tablespace you use on this NetApp filer on this
box? Do you have other boxes connected to NetApp that get much better
speed?

Good luck - Stefan

Desmodromic

Sep 8, 2011, 10:52:21 PM

All tablespaces are affected and all of our DB2 systems - Dev, UAT and
Prod - seem to suffer from the same I/O issues. All run Red Hat ELS
with NetApp storage. So, I don't have any DB2 systems running well
with NetApp storage here.

We are thin on Linux/Storage experts as well, which doesn't help.

One thing I am not keen on is that all containers reside in a single
2TB LUN.

The noatime option for this LUN is not set either. Does anyone know
whether setting it will lead to a significant performance improvement?

Thanks.

Helmut Tessarek

Sep 8, 2011, 11:37:20 PM

On 08.09.11 22:52, Desmodromic wrote:
> One thing I am not keen on is that all containers reside in a single
> 2TB LUN.

This is definitely not optimal. The question now is how many disks there
are underneath. Is there only one LUN defined on this array, or are there
several LUNs used for different databases/apps?

In any case, please set DB2_PARALLEL_IO=*
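
For example (the registry change takes effect at the next instance
restart):

db2set DB2_PARALLEL_IO=*
db2stop
db2start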

I'm not familiar with NetApp storage, but the basics are the same.
The more disks for a LUN, the better. If you can spread tablespaces by placing
containers on different LUNs on different arrays (different set of disks),
even better.

> The noatime option for this LUN is not set either. Does anyone know
> whether setting it will lead to a significant performance improvement?

There is a performance impact, but it won't be significant. In benchmarks you
can get up to 0.1% - 0.8% improvement. For your workload, you will have to test
it yourself.
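
If you do want to try it, it is only a mount option change; the device and
mount point below are placeholders:

# /etc/fstab
/dev/sdb1   /db2/data   ext3   defaults,noatime   0 0

# apply without a reboot
mount -o remount,noatime /db2/data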

Desmodromic

Sep 9, 2011, 12:07:08 AM

On our production system I am told the LUN is comprised of 48 physical
disks. Consequently, I have set DB2_PARALLEL_IO=*:48

These physical disks are also dedicated solely to this one DB2
database.

Typical iostat output from this system is shown below.

Device:                       rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm   %util
data_new_SdDg-data_new_SdHv     0.00    0.00  467.80  67.00  30043.20  908.80     57.88     34.44  64.43   1.87  100.02

From this it looks like an I/O request spends roughly 97% of its time
queuing: await is 64.43 ms against a service time (svctm) of 1.87 ms,
and (64.43 - 1.87) / 64.43 is about 0.97.

Typical CPU usage is shown below. Again, %iowait looks high.

12:02:07 PM   CPU   %user  %nice  %system  %iowait  %steal   %idle
12:02:12 PM   all    7.83   0.00     2.12    60.52    0.00   29.53
12:02:12 PM     0   34.94   0.00    11.45    53.61    0.00    0.00
12:02:12 PM     1   10.40   0.00     1.40    84.80    0.00    3.40
12:02:12 PM     2    2.40   0.00     0.40    62.73    0.00   34.47
12:02:12 PM     3    2.20   0.00     0.40    37.60    0.00   59.80
12:02:12 PM     4    1.80   0.00     0.60    34.13    0.00   63.47
12:02:12 PM     5    5.58   0.00     0.80    59.56    0.00   34.06
12:02:12 PM     6    2.20   0.00     0.80    71.54    0.00   25.45
12:02:12 PM     7    3.60   0.00     1.20    80.20    0.00   15.00

Helmut Tessarek

Sep 9, 2011, 4:22:03 PM

Hi,

> On our production system I am told the LUN is comprised of 48 physical
> disks. Consequently, I have set DB2_PARALLEL_IO=*:48

Ok, I don't want to be bitchy here, but were you only told, or did you verify
this yourself?
What about your io_servers and io_cleaners?
What does a snapshot tell you? Is there too much wait because of too few
cleaners or servers?
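
Something along these lines, replacing <DBNAME> with your database name:

db2 get db cfg for <DBNAME> | grep -iE "IOSERVERS|IOCLEANERS"
db2 get snapshot for bufferpools on <DBNAME>

In the snapshot, look at the asynchronous read/write counters and times
to see whether the prefetchers and page cleaners are keeping up.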

> These physical disks are also dedicated solely to this one DB2
> database.

This is good.

> Typical iostat output from this system is shown below.
>
> Device:                       rrqm/s  wrqm/s     r/s    w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm   %util
> data_new_SdDg-data_new_SdHv     0.00    0.00  467.80  67.00  30043.20  908.80     57.88     34.44  64.43   1.87  100.02
>
> From this it looks like an I/O request is spending 97% of its time
> queuing.

This is not good. Not good at all. We need to have a look at your storage
driver and configuration. Any chance that a Linux OS person can have a look at
the adapter? How is the storage connected anyway? Fibre Channel? What is the
saturation like? But I guess you are way below that point.

> Typical CPU usage looks like that below. Again %iowait looks high.

Any chance we can get an oprofile? Where is the CPU spent? CPU usage is quite
low, but it might still indicate the root cause.
An oprofile output and a DB2 stack should tell you what is going on in your
system.
Does the slow-down occur only during tablespace creation or also during your
workload?
Some other questions that might point you in the right direction:
What workload are you running? OLTP / Warehouse? Is the slow behaviour there
all the time or only sometimes?
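
Collecting the oprofile and the DB2 stacks could look roughly like this
on RHEL 5, assuming oprofile is installed (check the exact oprofile and
db2pd options at your level):

opcontrol --no-vmlinux
opcontrol --start
# reproduce the slow CREATE TABLESPACE here
opcontrol --stop
opreport --symbols > oprofile.out
db2pd -stack all

db2pd -stack all writes stack trace files for the DB2 EDUs into the
diagnostic directory.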

I'm sorry, but I will be on vacation for the next two weeks, so I probably
won't be reading any newsgroups / mails or have access to the Internet...

I suggest that you involve the Red Hat performance team and, if you can,
open a PMR with IBM.

TheBoss

Sep 9, 2011, 5:27:00 PM

Desmodromic <davi...@yahoo.com.au> wrote in
news:565d9a88-41f4-41f5...@a10g2000prn.googlegroups.com:
In addition to the advice already given by Helmut, you may want to check
this thread on the IDUG forum, which seems related:
http://www.idug.org/p/fo/et/thread=34814

Also have a look at the recommendations in these NetApp Tech. Docs:

===
1. "DB2 9 for UNIX: Integrating with a NetApp Storage System"
http://media.netapp.com/documents/tr-3531.pdf

Chapter 9 has the following statement:
"Two of these registry variables, DB2_PARALLEL_IO and
DB2_STRIPED_CONTAINERS, should always be set when DB2 is used in
conjunction with a storage system"

===
2. "IBM DB2 on NetApp Storage: Deployment and Best Practices"
http://media.netapp.com/documents/tr-3272.pdf

Par. 11.4 discusses how to decide on some important storage-related
settings, such as EXTENTSIZE and PREFETCHSIZE.

===
3. "IBM DB2 9.5 Performance and Scalability on RHEL5 with NFS and FCP
Using NetApp FAS or IBM N series Storage System"
http://media.netapp.com/documents/tr-3775.pdf

Par. 3.3 suggests setting DB2_LOGGER_NON_BUFFERED_IO to enable direct I/O
for the transaction logs.

===

Note: the docs also contain recommendations for Linux.

HTH

Cheers!

--
Jeroen

Desmodromic

Sep 12, 2011, 7:45:30 AM

On Sep 10, 5:27 am, TheBoss <TheB...@invalid.nl> wrote:
Thanks for the links. The IDUG thread was actually posted by an ex-
colleague of mine and he was talking about exactly the same system
that I am!

I can confirm that our single data LUN is built using a single FlexVol
on a NetApp aggregate consisting of 3 RAID-DP arrays. Each RAID array
consists of 18 disks. Subtracting 2 parity disks from each array gives
us 3 x 16 disks for the LUN, hence the setting of DB2_PARALLEL_IO=*:48.

Our tablespace extent sizes are typically 192KB. This is based upon
(what we believed was) a 4KB strip size and a RAID stripe size of 48 x
4KB = 192KB.

From what I understand in the TR-3272 document, the tablespace extent
size should be a multiple of 256KB.

I am still slightly confused about this Tetris size of 256KB. Is that
the RAID strip or stripe size? If it's the strip size then our stripe
size would be 16 x 256KB = 4096KB. In fact, since the aggregate
consists of 3 RAID arrays, wouldn't our RAID stripe size be 3 x 4096KB
= 12288KB? That's 12MB, which sounds excessive to me.
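
Laying out the arithmetic with 8 KB pages (using the figures above):

EXTENTSIZE 24 -> 24 x 8 KB = 192 KB per extent (current setting, matches 48 x 4 KB)
EXTENTSIZE 32 -> 32 x 8 KB = 256 KB per extent (aligned to the 256 KB unit in TR-3272)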

Also, are there any recommendations regarding the number of disks per
RAID array and the number of RAID arrays per aggregate? Should RAID
arrays span multiple disk trays or be confined to disks within the
same tray?

Any advice will be much appreciated.

TheBoss

Sep 14, 2011, 6:01:31 PM

Desmodromic <davi...@yahoo.com.au> wrote in
news:37acbf57-e52e-45e7...@s2g2000prm.googlegroups.com:

> [earlier quoted text snipped]

My experience on this subject isn't enough to offer sound advice.
If you have a service contract, your best bet would be to open a service
request with either IBM or NetApp.
Alternatively, you could try to contact (via social media?) the well-known
DB2 Champion Roger E. Sanders, who used to work for NetApp...

Cheers!

--
Jeroen