
Performance question


Adrian

Sep 25, 2001, 2:13:05 AM
Hello,

I am a relatively new SCO user and have a few questions about
performance issues.
We currently run SCO OpenServer 5.0.5 on an application server
hosting a DataFlex database. We have about 100 users accessing the server
at any one time. Our system is a Pentium II/450 with 512 MB of RAM. The
system has a RAID 5 array (4 disks, 10,000 rpm) for /usr (the DataFlex
database and application) and the rest is on a RAID 1 array.

We are having some performance problems, mainly slow response times while
using the application that runs on the DataFlex database. I have used
'sar' to get some idea of the bottlenecks but would like some
guidance. Any help would be appreciated.
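
For reference, the figures below come from sar's daily data files. A minimal sketch of how
such a report can be pulled on OpenServer (assuming the stock sadc collection is already
running, as the "sar data collection enabled" lines suggest, and that the data files live in
the usual /usr/adm/sa/saDD location):

    # Report CPU, disk, swapping and run-queue activity from the day's data file
    sar -u -d -w -q -f /usr/adm/sa/sa25

    # Or sample the live system: every 20 seconds, 10 samples
    sar -u 20 10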

Here is a cut down version of the output, with comments and questions
throughout (prepended with =>):

=====================================

SCO_SV thor 3.2v5.0.5 PentII(D)ISA 09/25/2001

10:22:49 %usr %sys %wio %idle (-u)
10:22:49 sar data collection enabled
12:20:01 9 20 55 16
12:40:02 10 31 46 13
13:00:03 8 36 48 7
13:20:01 10 46 28 15
13:40:01 11 46 24 18
14:00:01 11 47 23 19
14:20:01 10 44 31 15
14:40:01 7 16 43 33
15:00:01 8 17 53 22
15:20:01 6 16 42 35

Average 9 28 45 18

=> 'man sar' remarks on %sys being much higher than %usr; is this %sys
much higher than %usr?
=> Also, coupled with other info (below) it seems we have a CPU
bottleneck, and maybe a disk bottleneck. Does this make sense?
--------------------------------------------------

10:22:49 device  %busy  avque  r+w/s  blks/s  avwait  avserv  (-d)
10:22:49 sar data collection enabled

Average  Sdsk-0   45.84  1.04   54.55  441.62  0.38    8.40
         Sdsk-2  100.00  1.05  122.42  554.50  0.54   10.28

=> Are 'avque' and 'avwait' low? I am not sure what the accepted range
is.

--------------------------------------------------

10:22:49 swpin/s bswin/s swpot/s bswot/s pswch/s (-w)
10:22:49 sar data collection enabled
12:20:01 0.00 0.0 0.00 0.0 390
12:40:02 0.00 0.0 0.00 0.0 950
13:00:03 0.00 0.0 0.00 0.0 748
13:20:01 0.00 0.0 0.00 0.0 1465
13:40:01 0.00 0.0 0.00 0.0 1622
14:00:01 0.00 0.0 0.00 0.0 1588
14:20:01 0.00 0.0 0.00 0.0 1386
14:40:01 0.00 0.0 0.00 0.0 265
15:00:01 0.00 0.0 0.00 0.0 300
15:20:01 0.00 0.0 0.00 0.0 212

Average 0.00 0.0 0.00 0.0 728

=> This would suggest that there is no swapping of memory to disk, hence
any performance problems are not related to insufficient memory (we have
512 MB). Does this make sense?

-------------------------------------------------------

10:22:49 runq-sz %runocc swpq-sz %swpocc (-q)
10:22:49 sar data collection enabled
12:20:01 1.7 14
12:40:02 1.6 100
13:00:03 1.7 100
13:20:01 1.8 100
13:40:01 1.7 100
14:00:01 1.6 100
14:20:01 1.6 100
14:40:01 1.5 71
15:00:01 1.6 84
15:20:01 1.4 51

Average 1.7 100

=> According to 'man sar', if runq-sz is >2 and %runocc is >90% then the
CPU is heavily loaded and response time will be degraded.
These results seem to concur with the CPU utilization above, suggesting
that the CPU is the bottleneck. Again, does this make sense?
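
As a quick way to apply that rule of thumb, a small sketch (it assumes the -q layout shown
above, with runq-sz in the second column and %runocc in the third, and the usual
/usr/adm/sa/saDD data file name) that prints only the heavily loaded intervals:

    # Show intervals where the run queue averaged >2 and was occupied >90% of the time
    sar -q -f /usr/adm/sa/sa25 | awk '($2 + 0) > 2 && ($3 + 0) > 90'

On the figures above nothing would be printed, since runq-sz never goes over 2, which fits
the view later in this thread that the disk, not the CPU, is the main bottleneck.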

-------------------------------------------------------

10:22:49 vflt/s pflt/s pgfil/s rclm/s (-p)
10:22:49 sar data collection enabled
12:20:01 13.67 96.75 0.00 0.00
12:40:02 10.23 77.50 0.00 0.00
13:00:03 9.55 68.83 0.01 0.00
13:20:01 6.26 49.77 0.00 0.00
13:40:01 11.06 76.01 0.03 0.00
14:00:01 16.03 104.45 0.00 0.00
14:20:01 9.11 74.16 0.00 0.00
14:40:01 7.76 60.44 0.00 0.00
15:00:01 9.46 75.15 0.00 0.00
15:20:01 8.36 60.71 0.00 0.00

Average 11.30 81.69 0.00 0.00

=> Are these results good or bad? I just don't know what the accepted
ranges are...

------------------------------------------------------

10:22:49 iget/s namei/s dirbk/s (-a)
10:22:49 sar data collection enabled
12:20:01 874 226 3177
12:40:02 740 197 2805
13:00:03 711 237 10053
13:20:01 618 228 4661
13:40:01 699 181 2477
14:00:01 895 228 3222
14:20:01 699 187 2680
14:40:01 642 175 2444
15:00:01 701 185 2666
15:20:01 599 159 2160

Average 769 209 3485


=> Are these results good or bad? I just don't know what the accepted
ranges are...

=====================================

I know this is a long post but any info would be appreciated.

Cheers,

adrian


Bill Vermillion

Sep 25, 2001, 9:01:01 AM
In article <3BB02070...@aot.com.au>, Adrian
<adr...@aot.com.au> wrote:

>We currently run SCO OpenServer 5.0.5, which is an application
>server hosting a DataFlex database. We have about 100 users accessing
>the server at any one time. Our system is a Pentium II/450, with
>512MB RAM. The system has a RAID 5 array (4 disks, 10,000rpm) for
>/usr (the dataflex database and application) and the rest is on a
>RAID 1 array.
>
>We are having some performance problems- mainly slow response times
>while using the application that runs on the DataFlex database. I
>have used 'sar' to get some idea about the bottlenecks but would
>like some guidance. Any help would be appreciated.

>Here is a cut down version of the output, with comments and questions
>throughout (prepended with =>):
>
>=====================================

>=> 'man sar' remarks on %sys being much higher than %usr; is this %sys
>much higher than %usr?
>=> Also, coupled with other info (below) it seems we have a CPU
>bottleneck, and maybe a disk bottleneck. Does this make sense?
>--------------------------------------------------
>
>10:22:49 device  %busy  avque  r+w/s  blks/s  avwait  avserv  (-d)
>10:22:49 sar data collection enabled
>
>Average  Sdsk-0   45.84  1.04   54.55  441.62  0.38    8.40
>         Sdsk-2  100.00  1.05  122.42  554.50  0.54   10.28

>=> Are 'avque' and 'avwait' low? I am not sure what the accepted range
>is.


If Sdsk-2 is your DataFlex note that it is 100% busy. That's a
problem right there.

>=> This would suggest that there is no swapping of memory to disk, hence
>any performance problems are not related to insufficient memory (we have
>512 MB). Does this make sense?

You didn't post the part of sar regarding memory.


--
Bill Vermillion - bv @ wjv . com

Adrian

Sep 25, 2001, 6:39:51 PM
Hi,

Thanks for your prompt response. I actually thought I had posted the memory
section of the sar output. I should just post all of the output. See below.

With regard to your comments on disk usage:


------
>>10:22:49 device  %busy  avque  r+w/s  blks/s  avwait  avserv  (-d)
>>10:22:49 sar data collection enabled
>>
>>Average  Sdsk-0   45.84  1.04   54.55  441.62  0.38    8.40
>>         Sdsk-2  100.00  1.05  122.42  554.50  0.54   10.28

>>=> Are 'avque' and 'avwait' low? I am not sure what the accepted range is.

>If Sdsk-2 is your DataFlex note that it is 100% busy. That's a
>problem right there.


When you refer to 'your DataFlex' do you mean the database itself or the flex
programs?
If at any one time there are 70 users using the database, all doing read and
write operations, would the disk not be active for 100% of the time?


Are 'avque' and 'avwait' low? I am not sure what the accepted range is.

Also, I forgot to mention that the RAID arrays are controlled by a DPT
SmartRAID controller.


Here is the whole sar output (with my comments/questions again):

===========================

SCO_SV thor 3.2v5.0.5 PentII(D)ISA 09/25/2001

10:22:49 %usr %sys %wio %idle (-u)

10:22:49 sar data collection enabled

12:20:01 9 20 55 16
12:40:02 10 31 46 13
13:00:03 8 36 48 7
13:20:01 10 46 28 15
13:40:01 11 46 24 18
14:00:01 11 47 23 19
14:20:01 10 44 31 15
14:40:01 7 16 43 33
15:00:01 8 17 53 22
15:20:01 6 16 42 35

Average 9 28 45 18

=> 'man sar' remarks on %sys being much higher than %usr; is this %sys
much higher than %usr?
=> Also, coupled with other info (below) it seems we have a CPU
bottleneck, and maybe a disk bottleneck. Does this make sense?


10:22:49 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)

10:22:49 sar data collection enabled

12:20:01 490 5774 92 82 121 32 0 0
12:40:02 482 5495 91 81 118 32 0 0
13:00:03 539 12843 96 83 115 28 0 0
13:20:01 374 7410 95 87 122 29 0 0
13:40:01 319 5249 94 85 123 31 0 0
14:00:01 291 5784 95 83 123 32 0 0
14:20:01 355 5116 93 83 119 31 0 0
14:40:01 294 4249 93 83 117 30 0 0
15:00:01 371 4660 92 83 118 29 0 0
15:20:01 279 3818 93 80 108 26 0 0

Average 416 5953 93 83 119 31 0 0

10:22:49 device  %busy  avque  r+w/s  blks/s  avwait  avserv  (-d)
10:22:49 sar data collection enabled

12:20:01 Sdsk-0   47.48  1.05   55.03  447.02  0.42    8.63
         Sdsk-2  100.00  1.07  149.94  696.74  0.79   11.08

12:40:02 Sdsk-0   46.36  1.04   55.11  460.78  0.38    8.41
         Sdsk-2  100.00  1.04  142.09  664.62  0.47   10.87

13:00:03 Sdsk-0   55.97  1.03   84.89  512.63  0.22    6.59
         Sdsk-2  100.00  1.09  152.07  729.75  0.95   11.15

13:20:01 Sdsk-0   57.98  1.04   69.88  482.45  0.32    8.30
         Sdsk-2   88.74  1.02   98.80  438.85  0.16    8.98

13:40:01 Sdsk-0   40.60  1.06   46.73  414.03  0.48    8.69
         Sdsk-2   81.16  1.02   91.10  394.71  0.16    8.91

14:00:01 Sdsk-0   38.52  1.04   42.53  410.18  0.32    9.06
         Sdsk-2   66.22  1.01   76.63  338.24  0.08    8.64

14:20:01 Sdsk-0   41.67  1.04   48.10  424.51  0.39    8.66
         Sdsk-2   89.96  1.01  101.61  449.89  0.09    8.85

14:40:01 Sdsk-0   41.22  1.06   47.83  418.47  0.51    8.62
         Sdsk-2   75.00  1.01   84.50  334.63  0.11    8.88

15:00:01 Sdsk-0   42.60  1.04   48.11  421.36  0.36    8.85
         Sdsk-2  100.00  1.01  117.42  486.96  0.11    8.94

15:20:01 Sdsk-0   38.08  1.04   44.95  398.49  0.31    8.47
         Sdsk-2   66.45  1.01   76.16  318.90  0.10    8.72

Average  Sdsk-0   45.84  1.04   54.55  441.62  0.38    8.40
         Sdsk-2  100.00  1.05  122.42  554.50  0.54   10.28

=> Are 'avque' and 'avwait' low? I am not sure what the accepted range is.


10:22:49 c_hits cmisses (hit %) (-n)


10:22:49 sar data collection enabled

12:20:01 6043456 1029598 (85%)
12:40:02 871243 155008 (84%)
13:00:03 761701 140448 (84%)
13:20:01 634352 132475 (82%)
13:40:01 821489 138681 (85%)
14:00:01 1057146 175629 (85%)
14:20:01 822192 146278 (84%)
14:40:01 755700 138826 (84%)
15:00:01 825044 143912 (85%)
15:20:01 704082 124589 (84%)

Average 1329640 232544 (85%)

10:22:49 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s (-y)


10:22:49 sar data collection enabled

12:20:01 37 0 22606 1 0 0
12:40:02 33 0 20213 2 1 0
13:00:03 30 0 17268 1 0 0
13:20:01 29 0 19630 1 0 0
13:40:01 38 0 20400 1 0 0
14:00:01 35 0 19476 1 0 0
14:20:01 34 0 20377 1 0 0
14:40:01 42 0 22589 1 0 0
15:00:01 39 0 26977 1 0 0
15:20:01 31 0 19368 1 0 0

Average 35 0 21451 1 0 0

10:22:49 scall/s sread/s swrit/s fork/s exec/s rchar/s wchar/s (-c)


10:22:49 sar data collection enabled

12:20:01 5598 1489 779 1.86 1.80 32029 104772
12:40:02 6271 1891 1019 1.35 1.33 519360 415283
13:00:03 4544 1252 777 1.22 1.18 1423606 232829
13:20:01 7711 2602 1351 0.87 0.85 718557 698401
13:40:01 9081 3071 1454 1.49 1.43 3156753 791765
14:00:01 8762 2833 1402 2.18 2.11 3328404 808558
14:20:01 7669 2460 1312 1.28 1.26 2184966 625751
14:40:01 3899 794 806 1.04 1.01 328412 102245
15:00:01 4487 942 1052 1.35 1.33 2644218 106793
15:20:01 4163 1101 708 1.12 1.09 2132749 94748

Average 6015 1727 972 1.53 1.49 155444 61289

10:22:49 swpin/s bswin/s swpot/s bswot/s pswch/s (-w)

10:22:49 sar data collection enabled

12:20:01 0.00 0.0 0.00 0.0 390
12:40:02 0.00 0.0 0.00 0.0 950
13:00:03 0.00 0.0 0.00 0.0 748
13:20:01 0.00 0.0 0.00 0.0 1465
13:40:01 0.00 0.0 0.00 0.0 1622
14:00:01 0.00 0.0 0.00 0.0 1588
14:20:01 0.00 0.0 0.00 0.0 1386
14:40:01 0.00 0.0 0.00 0.0 265
15:00:01 0.00 0.0 0.00 0.0 300
15:20:01 0.00 0.0 0.00 0.0 212

Average 0.00 0.0 0.00 0.0 728

=> This would suggest that there is no swapping of memory to disk, hence any
performance problems are not related to insufficient memory (we have 512 MB).
Does this make sense?

10:22:49 iget/s namei/s dirbk/s (-a)

10:22:49 sar data collection enabled

12:20:01 874 226 3177
12:40:02 740 197 2805
13:00:03 711 237 10053
13:20:01 618 228 4661
13:40:01 699 181 2477
14:00:01 895 228 3222
14:20:01 699 187 2680
14:40:01 642 175 2444
15:00:01 701 185 2666
15:20:01 599 159 2160

Average 769 209 3485

=> Are these results good or bad? I just don't know what the accepted
ranges are...

10:22:49 runq-sz %runocc swpq-sz %swpocc (-q)

10:22:49 sar data collection enabled

12:20:01 1.7 14
12:40:02 1.6 100
13:00:03 1.7 100
13:20:01 1.8 100
13:40:01 1.7 100
14:00:01 1.6 100
14:20:01 1.6 100
14:40:01 1.5 71
15:00:01 1.6 84
15:20:01 1.4 51

Average 1.7 100

=> According to 'man sar', if runq-sz is >2 and %runocc is >90% then the
CPU is heavily loaded and response time will be degraded.
These results seem to concur with the CPU utilization above, suggesting
that the CPU is the bottleneck. Again, does this make sense?

10:22:49 proc-sz ov inod-sz ov file-sz ov lock-sz (-v)


10:22:49 sar data collection enabled

12:20:01 364/ 381 0 1477/4864 0 5304/5461 0 1778/2048
12:40:02 375/ 392 0 1493/4915 0 5053/5802 0 1771/2048
13:00:03 384/ 392 0 1492/5120 0 5110/5802 0 1759/2048
13:20:01 367/ 392 0 1449/5120 0 5132/5802 0 1826/2048
13:40:01 372/ 392 0 1462/5120 0 5038/5802 0 1801/2048
14:00:01 369/ 392 0 1465/5120 0 4964/5802 0 1686/2048
14:20:01 369/ 392 0 1448/5120 0 4724/5802 0 1675/2048
14:40:01 369/ 392 0 1484/5120 0 4674/5802 0 1626/2048
15:00:01 376/ 392 0 1488/5120 0 5018/5802 0 1744/2048
15:20:01 373/ 392 0 1476/5120 0 5161/5802 0 1783/2048


10:22:49 msg/s sema/s (-m)


10:22:49 sar data collection enabled

12:20:01 0.01 0.00
12:40:02 0.01 0.00
13:00:03 0.01 0.00
13:20:01 0.01 0.00
13:40:01 0.02 0.00
14:00:01 0.02 0.00
14:20:01 0.01 0.00
14:40:01 0.01 0.00
15:00:01 0.02 0.00
15:20:01 0.02 0.00

Average 0.01 0.00

10:22:49 vflt/s pflt/s pgfil/s rclm/s (-p)

10:22:49 sar data collection enabled

12:20:01 13.67 96.75 0.00 0.00
12:40:02 10.23 77.50 0.00 0.00
13:00:03 9.55 68.83 0.01 0.00
13:20:01 6.26 49.77 0.00 0.00
13:40:01 11.06 76.01 0.03 0.00
14:00:01 16.03 104.45 0.00 0.00
14:20:01 9.11 74.16 0.00 0.00
14:40:01 7.76 60.44 0.00 0.00
15:00:01 9.46 75.15 0.00 0.00
15:20:01 8.36 60.71 0.00 0.00

Average 11.30 81.69 0.00 0.00


=> Are these results good or bad? I just don't know what the accepted
ranges are...


10:22:49 freemem freeswp availrmem availsmem (-r)


10:22:49 sar data collection enabled

12:20:01 60585 2097152 108093 316337
12:40:02 58553 2097152 108061 314633
13:00:03 57753 2097152 108020 313511
13:20:01 57570 2097152 108056 314552
13:40:01 57744 2097152 108043 314326
14:00:01 57707 2097152 108049 314806
14:20:01 58005 2097152 108053 314664
14:40:01 57861 2097152 108054 314990
15:00:01 58009 2097152 108004 313957
15:20:01 57315 2097152 108014 313647

Average 58992 2097152 108045 314542

10:22:49 cpybuf/s slpcpybuf/s (-B)


10:22:49 sar data collection enabled

12:20:01 0.00 0.00
12:40:02 0.00 0.00
13:00:03 0.00 0.00
13:20:01 0.00 0.00
13:40:01 0.00 0.00
14:00:01 0.00 0.00
14:20:01 0.00 0.00
14:40:01 0.00 0.00
15:00:01 0.00 0.00
15:20:01 0.00 0.00

Average 0.00 0.00

10:22:49 dptch/s idler/s swidle/s (-R)


10:22:49 sar data collection enabled

12:20:01 730.76 279.65 158.11
12:40:02 1916.43 691.61 372.59
13:00:03 1332.14 445.20 263.32
13:20:01 3035.98 1074.10 555.20
13:40:01 3415.70 1212.00 613.81
14:00:01 3381.37 1209.78 610.91
14:20:01 2898.57 1032.56 529.27
14:40:01 602.09 266.31 135.52
15:00:01 647.43 279.72 149.27
15:20:01 502.43 232.00 108.92

Average 1481.22 543.79 286.99

10:22:49 ovsiohw/s ovsiodma/s ovclist/s (-g)


10:22:49 sar data collection enabled

12:20:01 0.00 0.00 0.00
12:40:02 0.00 0.00 0.00
13:00:03 0.00 0.00 0.00
13:20:01 0.00 0.00 0.00
13:40:01 0.00 0.00 0.00
14:00:01 0.00 0.00 0.00
14:20:01 0.00 0.00 0.00
14:40:01 0.00 0.00 0.00
15:00:01 0.00 0.00 0.00
15:20:01 0.00 0.00 0.00

Average 0.00 0.00 0.00

10:22:49 mpbuf/s  ompb/s  mphbuf/s  omphbuf/s  pbuf/s  spbuf/s  dmabuf/s  sdmabuf/s  (-h)

10:22:49 sar data collection enabled

12:20:01   95.56    0.00    107.17       0.00    0.00     0.00      0.00       0.00
12:40:02  101.55    0.00    109.54       0.00    0.00     0.00      0.00       0.00
13:00:03  107.60    0.00    114.73       0.00    0.00     0.00      0.00       0.00
13:20:01   72.36    0.00     80.18       0.00    0.00     0.00      0.00       0.00
13:40:01   63.00    0.00     74.01       0.00    0.00     0.00      0.00       0.00
14:00:01   56.62    0.00     66.27       0.00    0.00     0.00      0.00       0.00
14:20:01   65.82    0.00     75.94       0.00    0.00     0.00      0.00       0.00
14:40:01   55.77    0.00     65.31       0.00    0.00     0.00      0.00       0.00
15:00:01   75.21    0.00     85.53       0.00    0.00     0.00      0.00       0.00
15:20:01   54.19    0.00     62.65       0.00    0.00     0.00      0.00       0.00

Average    81.57    0.00     91.67       0.00    0.00     0.00      0.00       0.00
===============================

Thanks again for any help,

adrian

Jean-Guy Charron

Sep 25, 2001, 8:31:14 PM
How many buffers do you have?
If it's the system default it's about 6000; try increasing it to 100000.
The freemem figures show that you have about 200 MB free.
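
A quick, reversible way to test a larger cache is a sketch like the following, using the
boot-time override that appears later in this thread (the sa file name is just an example):

    # At the Boot: prompt, try a bigger buffer cache for one boot only
    defbootstr nbuf=100000

    # Afterwards, watch the cache hit rates (%rcache / %wcache) in the daily data
    sar -b -f /usr/adm/sa/sa26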

Jean-Guy Charron
Logiciels Sys-Themes Inc.
Montreal, Quebec

"Adrian" <adr...@aot.com.au> a écrit dans le message news:
3BB107B7...@aot.com.au...

Bill Vermillion

Sep 25, 2001, 9:29:33 PM

James R. Sullivan

Sep 26, 2001, 1:14:46 PM
Trimmed and commented:

Adrian wrote:
>
> Hi,
>
> Thanks for your prompt response. I actually thought I had posted the memory
> section of the sar output. I should just post all of the output. See below.
>
>

> Here is the whole sar output (with my comments/questions again):
>
> ===========================
>
> SCO_SV thor 3.2v5.0.5 PentII(D)ISA 09/25/2001
>
> 10:22:49 %usr %sys %wio %idle (-u)

> Average 9 28 45 18
>

> 10:22:49 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)

> Average 416 5953 93 83 119 31 0 0
>
> 10:22:49 device %busy avque r+w/s blks/s avwait avserv (-d)

> Average Sdsk-0 45.84 1.04 54.55 441.62 0.38 8.40
> Sdsk-2 100.00 1.05 122.42 554.50 0.54 10.28

Here is a problem, without a doubt. WaitIO (the wio in the first line) indicates
a condition where a process is ready to run but is blocked waiting for some IO
event to clear. In all likelihood, they are waiting for Sdsk-2 to become free.

Some possibilities, drawn from old memories, would be:

increase buffer cache. You have spare memory and are not swapping, so increasing
the buffer cache would/could help. Your % of Read Cache is generally good. The
write % is low, but you're probably writing to different parts of the disk/database
so there's little you could do about that. I suspect that the program is writing
with a sync of some sort, which may cause the significant waitio number.

increase SDSKOUT. This used to be the number of SCSI transactions that the system
would queue. A higher number would queue more transactions and may improve the disk
performance.

Get a better disk subsystem :-)
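
For the SDSKOUT suggestion, a sketch of one way to inspect and raise it (the /etc/conf paths
are the usual OSR5 idtools locations and 128 is only an example value, so check your own
mtune limits first):

    # Show the current override (stune) and the default/min/max (mtune) for SDSKOUT
    grep SDSKOUT /etc/conf/cf.d/stune /etc/conf/cf.d/mtune

    # Raise it, relink the kernel, then reboot
    /etc/conf/bin/idtune SDSKOUT 128
    cd /etc/conf/cf.d && ./link_unix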

> 10:22:49 runq-sz %runocc swpq-sz %swpocc (-q)

> Average 1.7 100
>
> => according to 'man sar' if runq-sz is >2 and %runocc is >90% then the
> CPU is heavily loaded and response time will be degraded.
> These results seem to concur with the CPU utilization above, suggesting
> that the CPU is the bottleneck. Again, does this make sense?

Not with your Disk situation. They're ready to run, but the disk is holding
them back. Fix that first.

Any time the system is in WaitIO, nothing is happening. In all my performance
tuning over the years, I've always focused on reducing WaitIO when I see it.

my $0.02, from an old SCO SE.


--
Jim Sullivan
Director, North American System Engineers
Tarantella! http://www.tarantella.com
831 427 7384 - j...@tarantella.com

Adrian

Sep 26, 2001, 10:19:27 PM
Hi James,

Thanks for your reply.

Comments throughout.


>
> > 10:22:49 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s (-b)
> > Average 416 5953 93 83 119 31 0 0
> >
> > 10:22:49 device %busy avque r+w/s blks/s avwait avserv (-d)
> > Average Sdsk-0 45.84 1.04 54.55 441.62 0.38 8.40
> > Sdsk-2 100.00 1.05 122.42 554.50 0.54 10.28
>
> Here is a problem, without a doubt. WaitIO (the wio in the first line) indicates
> a condition where a process is ready to run but is blocked waiting for some IO
> event to clear. In all likelihood, they are waiting for Sdsk-2 to become free.
>
> Some possibilities, drawn from old memories, would be:
>
> increase buffer cache. You have spare memory and are not swapping, so increasing
> the buffer cache would/could help. Your % of Read Cache is generally good. The
> write % is low, but you're probably writing to different parts of the disk/database
> so there's little you could do about that. I suspect that the program is writing
> with a sync of some sort, which may cause the significant waitio number.

> increase SDSKOUT. This used to be the number of SCSI transactions that the system
> would queue. A higher number would queue more transactions and may improve the disk
> performance.
>

I was going to modify NBUF at boot up with the command:
defbootstr nbuf=100000

Currently, SDSKOUT = 4, and I am not sure what to increase it to.
Is there a way of determining a good first guess, similar to basing NBUF on the amount of
free memory?
I will probably set this at boot time, too, rather than rebuilding the kernel.

Is there a performance boost from building these settings into the kernel, or will
setting them at boot time behave much the same?
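
As a side note, a sketch of how a boot-time override can be made to stick across reboots
without a relink (assuming the stock OSR5 /etc/default/boot layout, so check your own file
first):

    # Back up the bootstring file, then append the override to the DEFBOOTSTR line,
    # e.g. "DEFBOOTSTR=<your existing boot string> nbuf=100000"
    cp /etc/default/boot /etc/default/boot.orig
    vi /etc/default/boot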

>
> Get a better disk subsystem :-)
>

We have a DPT Century SCSI 3-channel Controller, with on-board memory.
We have RAID1 and one of the RAID5 disks on one channel.
We have the other three disks of the RAID5 array on the second channel.
All CD-ROMs and tape drives are on their own channel.

Each disk is a Cheetah ST39103LW (10,000 rpm, etc.).

Will these settings (SDSKOUT and NBUF) interact detrimentally with the DPT's on-board
cache?

>
> > 10:22:49 runq-sz %runocc swpq-sz %swpocc (-q)
> > Average 1.7 100
> >
> > => according to 'man sar' if runq-sz is >2 and %runocc is >90% then the
> > CPU is heavily loaded and response time will be degraded.
> > These results seem to concur with the CPU utilization above, suggesting
> > that the CPU is the bottleneck. Again, does this make sense?
>
> Not with your Disk situation. They're ready to run, but the disk is holding
> them back. Fix that first.
>
> Any time that system is in WaitIO, nothing is happening. In all my performance
> tuning over the years, I've always focused on reducing WaitIO when I see it.
>

Thanks for this advice. The consensus is to fix the disk problem.

cheers,

adrian


Adrian

Sep 26, 2001, 10:32:06 PM
Hi again,

This post adds some info to my last post asking for recommended SDSKOUT values.

I used sar -S on one of the daily sar reports and this is the result:
00:00:00 reqblk/s oreqblk/s (-S)
10:40:01 239.32 0.00
11:00:01 403.88 0.00
11:20:01 458.45 0.00
11:40:01 370.09 0.00
12:00:01 250.10 0.00
12:20:01 138.56 0.00
12:40:01 221.37 0.00
13:00:01 301.58 0.00
13:20:01 360.17 0.00

These numbers are representative of the day's work.

According to the SCO documentation:
-----
"If oreqblk/s is greater than zero, increase the value of SDSKOUT by at least the maximum
value reported for the following quantity:

SDSKOUT = oreqblk/s / reqblk/s"
----

However, oreqblk/s is never greater than 0 here, so that formula does not suggest a new value.

cheers,

adrian

Adrian

Sep 26, 2001, 11:26:22 PM
Hi,

addendum 2:
The RAID controller is a DPT SmartRAID V controller.

woops.

adrian

Steve Fabac

Sep 27, 2001, 12:23:51 AM
Adrian wrote:
>
> Hi,
>
> addendum 2:
> The RAID controller is a DPT SmartRAID V controller.
>
> woops.
>
> adrian
>
>

On Mylex RAID controllers, Mylex documentation recommends SDSKOUT =
128 / number of system disks. (The number of system disks reported by
the RAID hardware during pre-boot disk scan, not the number of hard
disks that make up the RAID array.)
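
As a rough worked example of that rule of thumb (assuming the DPT presents two logical
system disks here, the RAID 1 volume and the RAID 5 volume, matching the Sdsk-0/Sdsk-2 pair
in the sar output):

    SDSKOUT = 128 / 2 = 64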

I have searched Adaptec (nee DPT) web documentation for SDSKOUT
recommendations and have come up empty.

--

Steve Fabac
S.M. Fabac & Associates
816/765-1670

Andrey Bondar

Sep 27, 2001, 10:11:26 AM
Adrian <adr...@aot.com.au> wrote in message news:<3BB107B7...@aot.com.au>...
> Hi,
>
[...]
>
> SCO_SV thor 3.2v5.0.5 PentII(D)ISA 09/25/2001
>
> 10:22:49 %usr %sys %wio %idle (-u)
> 10:22:49 sar data collection enabled
> 12:20:01 9 20 55 16
> 12:40:02 10 31 46 13
> 13:00:03 8 36 48 7
> 13:20:01 10 46 28 15
> 13:40:01 11 46 24 18
> 14:00:01 11 47 23 19
> 14:20:01 10 44 31 15
> 14:40:01 7 16 43 33
> 15:00:01 8 17 53 22
> 15:20:01 6 16 42 35
>
> Average 9 28 45 18
>
[...]

> 10:22:49 rawch/s canch/s outch/s rcvin/s xmtin/s mdmin/s (-y)
> 10:22:49 sar data collection enabled
> 12:20:01 37 0 22606 1 0 0
> 12:40:02 33 0 20213 2 1 0
> 13:00:03 30 0 17268 1 0 0
> 13:20:01 29 0 19630 1 0 0
> 13:40:01 38 0 20400 1 0 0
> 14:00:01 35 0 19476 1 0 0
> 14:20:01 34 0 20377 1 0 0
> 14:40:01 42 0 22589 1 0 0
> 15:00:01 39 0 26977 1 0 0
> 15:20:01 31 0 19368 1 0 0
>
> Average 35 0 21451 1 0 0
>
>

[...]

Besides disk-related things, you may notice that terminal device (both serial
and/or pseudo, I think) activity is high. I suppose your users spend much time
printing various reports. If DataFlex outputs one character per syscall (as
FoxPro does), then the problem mostly lies with the application, not the OS or
hardware.

Andrey Bondar, SysAdmin,
T.I.P.A.S. Ltd., Lithuania

James R. Sullivan

Sep 27, 2001, 11:52:06 AM

Adrian wrote:
>
> I was going to modify NBUF at boot up with the command:
> defbootstr nbuf=100000

I can't remember if the NHBUFS get automatically adjusted when
you change NBUF, but you should make sure that they are appropriately
sized. In the past, you wanted a 4:1 ratio between NBUF and NHBUFS,
with NHBUFS being a power of 2. This later changed to a 1:2 ratio
on MP systems. Either way, make sure that NHBUFS is the right size
for NBUF=100000, probably around 65536 or 32768.
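
A sketch of setting the pair together through the kernel tunables rather than the boot
string (assuming the usual /etc/conf locations; 32768 is simply the power of two nearest
NBUF/4):

    # Keep NHBUFS a power of two: 100000/4 = 25000, rounded up to 32768
    /etc/conf/bin/idtune NBUF 100000
    /etc/conf/bin/idtune NHBUFS 32768
    cd /etc/conf/cf.d && ./link_unix    # relink, then reboot for the change to take effect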


>
> Currently, SDSKOUT = 4, and I am not sure what to increase that to.
> Is there a way of determinign a good first guess, similar to basing NBUF of the amount of
> free memory?
> I will probably set this at boot time, too, rather than rebuilding the kernel.

I'd set it as high as I could, generally 256, based on the mtune entries. The higher
the number, the harder the SCSI bus will be working. I have seen instances where
increasing this number caused the system to crash, due to the quality of the
SCSI bus/termination. Go neutral, bump it to 128 and see what happens.


> Is there a performance boost building these settings into the kernel or will the setting
> at boot up be similar?

No idea.


> > Get a better disk subsystem :-)
> >
>
> We have a DPT Century SCSI 3-channel Controller, with on-board memory.
> We have RAID1 and one of the RAID5 disks on one channel.
> We have the other three disks of the RAID5 array on the second channel.
> All CD_ROMs and tape drives are on their own channel.
>
> Each disk is Cheetah ST39103LW (10,000rpm, blah blah blah)..
>
> Will these settings (SDSKOUT and NBUF) interact detrimentally with the DPT's on-board
> cache?

Got me beat. Haven't done OSR5 performance tuning for 3 years, at least :-)

>
> Thanks for this advise. The concensus is to fix the disk problem.

No problem. Since the disk system seems pretty beefy, I wonder if the program
is performing all its writes synchronously, which would cause these delays given
the number of writes that are happening. This may be a setting within the program
that can be changed. I suspect that by examining the file table for the program
you could determine if this was true. There's probably an easier way, but I can't
remember it :-)

Adrian

Sep 27, 2001, 7:05:06 PM
Hi,

Thanks for the info. I will probably change one setting at a time; well, two to start with:
NBUF/NHBUFS.

Then I will change the SDSKOUT setting, too.

I just found out that the DPT controller is set for write-through rather than write-back, which
means that writes to disk occur immediately rather than using the controller's cache. The
controller has 32 MB of memory, which is going to waste a bit. That will probably have a large
effect. However, not knowing which is more important/useful, the DPT setting or the OS buffer
cache, I will start with the buffer cache and work through performance tuning one step at a
time.
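
Since the plan is to tune one step at a time, a small sketch for comparing the daily disk
averages before and after each change (assumes the standard sa files under /usr/adm/sa; the
day numbers are only examples):

    # Print the per-disk daily averages from two days of sar data;
    # the last two 'Sdsk' lines of each report are the Average block
    for day in 27 28
    do
        echo "=== sa$day ==="
        sar -d -f /usr/adm/sa/sa$day | grep Sdsk | tail -2
    done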


"James R. Sullivan" wrote:

> Adrian wrote:
> >
> > I was going to modify NBUF at boot up with the command:
> > defbootstr nbuf=100000
>
> I can't remember if the NHBUFS get automatically adjusted when
> you change NBUF, but you should make sure that they are appropriately
> sized. In the past, you wanted a 4:1 ratio between NBUF and NHBUFS,
> with NHBUFS being a power of 2. This later changed to a 1:2 ratio
> on MP systems. Either way, make sure that NHBUFS is the right size
> for NBUF=100000, probably around 65536 or 32768.
>

Thanks for the tip.


> >
> > Currently, SDSKOUT = 4, and I am not sure what to increase that to.
> > Is there a way of determinign a good first guess, similar to basing NBUF of the amount of
> > free memory?
> > I will probably set this at boot time, too, rather than rebuilding the kernel.
>
> I'd set it as high as I could, generally 256, based on the mtune entries. The higher
> the number, the harder the SCSI bus will be working. I have seen instances where
> increasing this number caused the system to crash, due to the quality of the
> SCSI bus/termination. Go neutral, bump it to 128 and see what happens.
>

This is a large difference from our current setting; I like it!


>
>
> No problem. Since the disk system seems pretty beefy, I wonder if the program
> is performing all its writes synchronously, which would cause these delays given
> the number of writes that are happening. This may be a setting within the program
> that can be changed. I suspect that by examining the file table for the program
> you could determine if this was true. There's probably an easier way, but I can't
> remember it :-)
>

Another poster suggested the problem may be program-based. It may be, but I think I will try
changing the OS settings first. More fun, anyway.

thanks,

adrian



Adrian

Sep 27, 2001, 7:07:34 PM
Hi Steve,

Thanks for your post.

Steve Fabac wrote:

>
> >
> >
>
> On Mylex RAID controllers, Mylex documentation recommends SDSKOUT =
> 128 / number of system disks. (The number of system disks reported by
> the RAID hardware during pre-boot disk scan, not the number of hard
> disks that make up the RAID array.)
>

Another response suggested trying SDSKOUT = 128, which is twice the Mylex
setting, but both your suggestion and the other one are way above
the current setting, so it is worth trying.


>
> I have searched Adaptec (nee DPT) web documentation for SDSKOUT
> recommendations and have come up empty.
>

No surprises there. We are flying very blind now that Adaptec has
bought DPT, as there is no real help for the DPT controllers anymore.


thanks,

adrian


Adrian

Sep 27, 2001, 7:10:43 PM
Hi Andrey,


> Besides disk-related things, you may notice that terminal device
> (both serial
> and/or pseudo, I think) activity is high. I suppose, your users spend
> much time printing various reports. If DataFlex outputs one char via
> one syscall (like FoxPro does), then mostly problem lies with
> application, not OS or hardware.
>
> Andrey Bondar, SysAdmin,
> T.I.P.A.S. Ltd., Lithuania

Your suggestion is a good one. You are correct in mentioning that our users spend
time working with reports, and we have seen some correlation between the time that
the problem arose and a sudden increase in the generation of reports.

I will look into this after I try changing some of the system settings mentioned
earlier.

thanks for your reply,

adrian


Adrian

Sep 28, 2001, 6:52:47 PM
Hi,

Thanks for everyone's input.
I used the command

defbootstr nbuf=80000 at the boot: prompt to change the buffer size, and this is
the result in /usr/adm/messages:
----------
mem: total = 458296k, kernel = 119944k, user = 338352k
swapdev = 1/41, swplo = 0, nswap = 1048576, swapmem = 524288k
rootdev = 1/42, pipedev = 1/42, dumpdev = 1/41
kernel: Hz = 100, i/o bufs = 100000k (high bufs = 91316k)
CONFIG: Low buffers unavailable, converted to high buffers
----------

I have been trying to get onto the SCO site and Google but have not been
able to connect and find any answers.

Is this bad or good? Is there something I can do about it? I also tried
nbuf=50000 and got a similar result, but the value in parentheses was about
41000.

TIA,

adrian


Bela Lubkin

Sep 29, 2001, 5:36:08 PM
to sco...@xenitec.on.ca, Adrian
Adrian wrote:

> I used the command
>
> defbootstr nbuf=80000 at boot: prompt to change the buffer size and this is
> the result in /usr/adm/messages:
> ----------
> mem: total = 458296k, kernel = 119944k, user = 338352k
> swapdev = 1/41, swplo = 0, nswap = 1048576, swapmem = 524288k
> rootdev = 1/42, pipedev = 1/42, dumpdev = 1/41
> kernel: Hz = 100, i/o bufs = 100000k (high bufs = 91316k)
> CONFIG: Low buffers unavailable, converted to high buffers

Strange -- are you sure you didn't use "nbuf=100000"?

> I have been trying to get onto the SCO site and Google but have not been
> able to connect and find any answers.
>
> Is this bad or good? Is there something I can do about it? I also tried
> nbuf=50000 but got a similar result, but the value in parentheses was about
> 41000.

You're worrying about the "Low buffers unavailable" message? Those
buffers are only needed for old ISA host adapters and for accessing the
floppy drive. For other devices (like PCI host adapters or IDE drives),
low and high buffers are completely interchangeable. You could probably
get by with 40 low buffers for floppy access. The message is completely
ignorable.

>Bela<
