RAID Stripe Size

oracle _ man@yahoo.com

Feb 28, 2005, 3:46:23 PM

All,

I'm going to lay out a new 10g db on RHEL4/AMD64. I have (for data files
only) two ICP Vortex RAID controllers. Each controller is dual
channel. Each channel has 8 drives. We will hardware mirror channel
to channel on each controller, then stripe using 128K chunks per drive
across 7 drives, leaving the 8th as a hot spare. That gives 7 x 128K =
917,504 bytes per stripe. Oracle recommends a 1MB stripe, so we are
close. This is all at the hardware level. Now then, we have to build an
RHEL4 OS on these two arrays.

My questions remaining are:
1) Should I make one big logical volume across controllers?

2) What OS block size do I use, and how do I change it?

3) And lastly, if using an 8K Oracle db_block_size, what
db_file_multiblock_read_count should I use for the above hardware config?

Best regards,

Rich

DA Morgan

Feb 28, 2005, 6:35:11 PM

oracle _ m...@yahoo.com wrote:

The block size is the easy question ... 4K because that is the block
size of the O/S.

I am not familiar with the ICP Vortex RAID controllers but I am
concerned as to whether you have verified that they actually support
RAC. Why not just use ASM and lay it all down on top of raw?
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu
(replace 'x' with 'u' to respond)

oracle_man

Feb 28, 2005, 6:48:12 PM

We aren't RACing this one. Standalone only. Also, we looked at ASM.
I just don't think it's quite there yet.

Rich

DA Morgan

Feb 28, 2005, 7:32:57 PM

Why do you say that? I disagree but that's not the point of my inquiry.
What have you seen that makes you nervous?

oracle_man

Mar 1, 2005, 2:47:15 PM

Daniel,

I sent you a personal email, as I didn't want to clutter up the group
with my rambling about ASM.

DA Morgan

Mar 1, 2005, 3:51:29 PM

oracle_man wrote:

And I have responded off-line. Feel free to contact me as you wish.

bdb...@gmail.com

Mar 1, 2005, 4:31:29 PM

An 8-drive RAID 10 works very well.
Chances of losing a second disk in a mirrored pair are fairly remote.
With an 8KB block size, your stripe should be a multiple of 64KB, such
as 128 KB, 256 KB, 512 KB or ... 1 MB.
In this fashion the Oracle blocks will be aligned with the physical
disks and you won't have blocks split across multiple disks.
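
A rough way to see the alignment argument, as a Python sketch. It assumes the data starts at a known byte offset on the striped volume (in practice a partition table or file header can shift everything), and the helper name is made up for illustration:

```python
KB = 1024

def blocks_split_across_disks(block_size, stripe_unit, n_blocks=1024, start_offset=0):
    """Count blocks whose bytes land in more than one stripe unit (i.e. on more than one disk)."""
    split = 0
    for i in range(n_blocks):
        first_byte = start_offset + i * block_size
        last_byte = first_byte + block_size - 1
        # A block is split when its first and last byte sit in different stripe units.
        if first_byte // stripe_unit != last_byte // stripe_unit:
            split += 1
    return split

# Stripe unit is a multiple of the block size and the data starts on a boundary.
print(blocks_split_across_disks(8 * KB, 128 * KB))
# Same sizes, but the data starts 512 bytes into the volume (hypothetical offset).
print(blocks_split_across_disks(8 * KB, 128 * KB, start_offset=512))
```

With an aligned start no 8KB block straddles two disks. With the shifted start, one block per 128KB stripe unit does, which is the case the article is about.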

Please refer to the following article:
Aligning Oracle Blocks with Hardware Stripe Boundaries
http://hotsos.com/e-library/abstract.php?id=17

-bdbafh

bdb...@gmail.com

Mar 1, 2005, 4:34:51 PM

Daniel,

>Why not just use ASM and lay it all down on top of raw.

And get those little tiny 64 KB reads?
Sounds fine if it's all single-block I/O.
If he is performing multiblock reads, he's in for some disappointment.

A poster on oracle-l recently discussed his bruises from wrestling with
10.1 ASM on RHEL 3.0.
He advocated waiting for 10g R2.

-bdbafh

Noons

Mar 2, 2005, 4:56:54 AM

oracle _ m...@yahoo.com apparently said, on my timestamp of 1/03/2005 7:46 AM:

It all looks reasonably good, except for this:

> to channel on each controller. Then stripe using 128K chunks per drive
> across 7 drives, leaving the 8th for hot spare. This 7x128K =
> ~917,504byte stripe size. Oracle recommends 1MB stripe, so we are
> close. This is all hardware level.

The stripe size in your case is 128K. The stripe size is the
portion of space used on each drive for each "stripe", not the
sum of those portions across all drives.
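
To put numbers on the distinction, a quick Python sketch using the figures from the original post (the function and variable names are only illustrative):

```python
KB = 1024

def full_stripe_width(stripe_unit_bytes, data_drives):
    """Bytes covered by one pass across all data drives (not the per-drive stripe size)."""
    return stripe_unit_bytes * data_drives

stripe_unit = 128 * KB   # per-drive chunk: this is the "stripe size"
data_drives = 7          # the 8th drive is the hot spare

width = full_stripe_width(stripe_unit, data_drives)
print(stripe_unit)         # 131072
print(width)               # 917504, the figure quoted in the original post
print(1024 * KB - width)   # 131072 bytes short of 1MB
```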

> Now then, we have to build an
> RHEL4 OS on these two arrays.

This is where I get confused: you said at the very start this
setup was for *data only*. Now you plonk the OS in there?

> My questions remaining are:
> 1) Should I make one big logical volume across controllers?

You should make logical volumes a size that is convenient for the
unit of backup you intend to use. Pick what one tape (or backup disk
unit) can store; that is your optimal volume size. You can then easily
back up both file systems and raw devices.

>
> 2) What OS block size do I use, and how do I change it?

I'm sorry but there is no such thing as "OS block size".
There is such a thing as "default file system block size",
"default block size used by the OS for I/O" and also "default
block size used for paging and swap activity". But a single
"OS block size" does not exist: it's a common *nix myth. The most
common defaults in *nix are 4K and 8K.
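
If you want to see what a particular box actually reports, here is a minimal Python sketch for a Unix-like system. The path "/" is only a placeholder for wherever the data files would live, and the function name is made up:

```python
import os

def io_sizes(path="/"):
    """Report the file system block sizes for `path` and the VM page size."""
    st = os.statvfs(path)
    return {
        "fs_block_size": st.f_bsize,               # preferred file system block size
        "fs_fragment_size": st.f_frsize,           # fundamental file system block size
        "page_size": os.sysconf("SC_PAGE_SIZE"),   # size used for paging
    }

print(io_sizes("/"))
```

Note that these are separate numbers that need not agree with each other, which is rather the point.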

> 3) And lastly, if using an 8K Oracle db_block_size, what
> db_file_multiblock_read_count should I use for the above hardware config?
>

128K/8K = 16
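
Spelled out as a small Python sketch (the function name is made up for illustration; the Oracle parameter in question is db_file_multiblock_read_count). The idea is simply that one multiblock read should cover exactly one 128K stripe unit:

```python
KB = 1024

def multiblock_read_count(stripe_unit_bytes, db_block_bytes):
    """Number of Oracle blocks that fit exactly in one stripe unit."""
    if stripe_unit_bytes % db_block_bytes != 0:
        raise ValueError("stripe unit is not a multiple of the block size")
    return stripe_unit_bytes // db_block_bytes

print(multiblock_read_count(128 * KB, 8 * KB))   # 16
```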

--
Cheers
Nuno Souto
in sunny Sydney, Australia
wizo...@yahoo.com.au.nospam

oracle_man

Mar 2, 2005, 9:00:28 PM

Sorry, I was not going to put the OS files on the RAID array, merely
newfs/mkfs or lvcreate, whatever. NOT put the binaries on it.

Nuno, you're wrong about making the vol the same size as the tape. Sorry bro,
that's not necessary.

Noons

Mar 2, 2005, 9:37:02 PM

oracle_man wrote:
>
> Nuno, you're wrong about making the vol the same size as the tape. Sorry bro,
> that's not necessary.

I know it is not necessary. It is however darn convenient. ;)

Joel Garry

Mar 3, 2005, 6:58:54 PM

>I know it is not necessary. It is however darn convenient. ;)

Unless you have multiple tape devices!

jg
--
@home.com is bogus.
"It's worth the beating" -
http://www.signonsandiego.com/uniontrib/20050303/news_1n3surf.html

Joel Garry

Mar 3, 2005, 7:14:59 PM

>8 drive RAID 10 works very well.
>chances of losing a second disk on a mirrored pair are fairly remote.

I recently had a scary one (on hp-ux, but a raid issue).

13 drive RAID 5, one hot spare.

I go to startup db, get a corrupt block one on controlfile. Copy over
another, same issue... can't shutdown, have to abort... run orapw, get
interpreter "/usr/m/pa20^ÙL/dld.sl" not found
exec(2): could not load a.out

Hmmmm, start looking around and find things buried deep down directory
trees like
--wxrw---t+ 46477 814614889 9308519 6755399441062032 Oct 21 1921 SOL_C2R4.HTML

-rws---r-x+ 27748 1949254172 1953787657 5797666986988568015 Jun 14 2033 SOL_C2R8.HTML

-rw-r--r-- 1 oracle dba 7146 Jan 30 2002 SOL_C3E0.HTML

Turned out an impatient manager had replaced 2 disk drives, not waiting
for the parity to finish rebuilding between the 2.

The scary part was the resemblance between this and the old ripper
virus. It could just as easily have missed the control files, and then
how would I know what was corrupted in the db?

jg
--
@home.com is bogus.

"We are not our customers' accounting departments."
http://www.signonsandiego.com/uniontrib/20050303/news_1b3mkt.html

Noons

Mar 4, 2005, 12:04:16 AM

Joel Garry wrote:
> >I know it is not necessary. It is however darn convenient. ;)
>
> Unless you have multiple tape devices!
>

Not really. Even with multiple devices, the thing
you don't want in a recovery is to have to mount
one extra tape for the last few MB of data file!

It's not in the backup that this is convenient:
it's during a recovery. When you have the damager
breathing fire down your neck and you NEED all the
speed you can get. ;)
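
A quick Python sketch of the "one extra tape" problem. The tape and volume sizes are invented purely for illustration, as is the function name:

```python
MB = 1024 * 1024
GB = 1024 * MB

def tapes_needed(volume_bytes, tape_capacity_bytes):
    """Whole tapes required to hold one volume (ceiling division)."""
    return -(-volume_bytes // tape_capacity_bytes)

tape = 40 * GB                             # hypothetical tape capacity
print(tapes_needed(40 * GB, tape))         # volume sized to the tape: 1
print(tapes_needed(40 * GB + 5 * MB, tape))  # a few MB over: 2
```

Those last 5 MB cost a full extra tape mount during the restore.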

Joel Garry

Mar 4, 2005, 4:15:07 PM

Noons wrote:
> Joel Garry wrote:
> > >I know it is not necessary. It is however darn convenient. ;)
> >
> > Unless you have multiple tape devices!
> >
>
> Not really. Even with multiple devices, the thing
> you don't want in a recovery is to have to mount
> one extra tape for the last few MB of data file!

Well, you seem to be assuming the tape is going to be smaller than the
database. Over all the versions I've worked with, I find that rarely
to be the case. (And the same applies to tablespace, if that is the
unit of recovery you are making the volsize based on [did I say that
right? :-].)

Anyways, isn't a growing database going to need another tape or 12
eventually no matter what size? That's been my experience. The
experience I want to avoid is some stupid backup manager that has to
hunt all over the tape for each file even if you are restoring
everything.

jg
--
@home.com is bogus.

They _are_ in Kansas again!
http://www.signonsandiego.com/uniontrib/20050304/news_1n4flight.html

Noons

Mar 5, 2005, 7:26:09 AM

Joel Garry apparently said, on my timestamp of 5/03/2005 8:15 AM:

> Well, you seem to be assuming the tape is going to be smaller than the
> database. Over all the versions I've worked with, I find that rarely
> to be the case. (And the same applies to tablespace, if that is the

Hehehe! I need YOUR databases! Or your tape drives... :D

Seriously, my experience is exactly the opposite: the backup
device unit is almost always much smaller capacity than the database
itself. Hence my distorted view! ;)

> unit of recovery you are making the volsize based on [did I say that
> right? :-].)

Well, it's the smallest bit I can restore in Oracle and still
end up with something I can roll forward from, so I go with the
tablespace size. Mind you, with RMAN these things tend to take
a different shape and priority.

> Anyways, isn't a growing database going to need another tape or 12
> eventually no matter what size? That's been my experience. The
> experience I want to avoid is some stupid backup manager that has to
> hunt all over the tape for each file even if you are restoring
> everything.

Don't you just hate tape managers that restore by alphabetical
order? Asking for trouble...

hpuxrac

Mar 7, 2005, 9:07:22 AM

snip

oracle _ m...@yahoo.com wrote:

First, I would recommend using the 1M stripe size and not "close to
it".

> My questions remaining are:
> 1) Should I make one big logical volume across controllers?

Some people work hard to isolate online redo logs and control files
from the rest of the (usual) database files.

> 2) What OS block size do I use, and how do I change it?

Do you have a choice at the OS level? If so, make it the same as the
Oracle block size you will be using.

> 3) And lastly, if using an 8K Oracle db_block_size, what
> db_file_multiblock_read_count should I use for the above hardware config?

Make it match your stripe size.

>
> Best regards,
>
> Rich