DASD Disk Layout Advice

Pat

Nov 24, 2009, 10:18:04 PM

For the last few years, I've largely been laying out Oracle storage on
SANs, rather than on DASD, so I'm feeling a bit out of practise with
today's drives and drive arrays.

To make a long story short, I've got a new Oracle server in a remote
data center that I didn't order or configure; it comes like it comes.

It's a 32G 8-core Intel box with eight 300G 15k SAS drives in it and a
decent RAID controller. The database in question is somewhere between a
data warehouse and OLTP, i.e. it's read heavy, but there's still a very
significant amount of write activity.

I've got a colleague who wants to build out the box something like:

Disk0 .. RedoA
Disk1 .. RedoB
Disks 2..3 RAID 1 OS and Index tablespace
Disks 4..7 Raid 10 Data tablespace

My instinct is that buildout "wastes" too many spindles and ends up
starving the index and data volumes.

The counterproposal is to just make one big raid group like:

Disk 0..7 Raid 10

What are folks doing these days with these big ol' 300G disks? It
seems supremely wasteful to use an entire 300G drive to hold 20G worth of
redo.

Any recommendations, advice, etc would be appreciated.
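
To put rough numbers on the two proposals, here is a minimal Python sketch that tallies spindles and usable capacity per volume. It assumes the 300G drive size quoted above, treats RedoA/RedoB as two plain disks holding Oracle-multiplexed log members (the post does not actually say how they would be protected), and ignores formatting overhead. The `Volume` class and the layout names are made up for illustration.

```python
from dataclasses import dataclass

DISK_GB = 300  # drive size quoted in the post


@dataclass(frozen=True)
class Volume:
    name: str
    disks: int
    mirrored: bool  # True for RAID 1 / RAID 10, False for a bare disk

    @property
    def usable_gb(self) -> int:
        # A mirrored volume stores every block twice, so half the raw space is usable.
        raw = self.disks * DISK_GB
        return raw // 2 if self.mirrored else raw


# Assumption: RedoA/RedoB are plain disks holding Oracle-multiplexed log members.
SPLIT_LAYOUT = [
    Volume("RedoA", 1, mirrored=False),
    Volume("RedoB", 1, mirrored=False),
    Volume("OS + index", 2, mirrored=True),
    Volume("Data", 4, mirrored=True),
]
SINGLE_LAYOUT = [Volume("Everything", 8, mirrored=True)]


def describe(layout):
    """Return (name, spindles, usable GB) for each volume in a layout."""
    return [(v.name, v.disks, v.usable_gb) for v in layout]


if __name__ == "__main__":
    for label, layout in (("split", SPLIT_LAYOUT), ("single RAID 10", SINGLE_LAYOUT)):
        print(label)
        for name, spindles, gb in describe(layout):
            print(f"  {name:<12} {spindles} spindle(s), {gb} GB usable")
```

Under these assumptions the split layout leaves the data tablespace on 4 spindles, while the single group lets every file type draw on all 8.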

Vladimir M. Zakharychev

Nov 25, 2009, 12:01:24 PM

You might want to take a look at ORION here:
http://www.oracle.com/technology/software/tech/orion/index.html, and
set up a few experiments with different disk layouts and workloads to
get an idea of what to expect in different configurations and what
would work best for your particular implementation.

In theory, dedicating a couple of spindles (or better yet, SSDs)
purely to redo mirrors would minimize redo log sync waits, which
could be beneficial for OLTP. But with today's high-performance RAID
controllers and their large battery-backed write caches, this is
probably overkill and not worth the wasted disk space unless
you really need top performance (in which case you'd probably be looking
at Exadata v2 anyway). You could procure a couple of
smaller disks for this (in the 80-100GB range), but the price difference is
really negligible and you would still waste most of their capacity.

And then there's ASM, which you can use to build a single disk group
with external redundancy on top of your RAID10 volume and store
everything there (and let Oracle handle the different file types
automagically). ORION will help you test this approach as well as
native ASM striping (that is, no hardware RAID: treat the array as
JBOD and do redundancy and load balancing in Oracle software).

And the results of your experiments might make for a nice blog
entry. ;)

Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com

joel garry

Nov 25, 2009, 1:08:45 PM

The essential problem is the difference in write characteristics for
redo. Spreading everything among spindles is good, but the limiting
characteristic will be whether the controller buffering will be
saturated. If it saturates, performance will go downhill at
unpredictable times, and fast. The redo is critical. "A decent
controller," does that mean one path?

It is difficult to show that separating indices from data is good
for performance. Some people thought so in the past, but any
benefit was simply an artifact of splitting I/O among drives. A
much better split may come from seeing which tablespaces use the most
I/O. For my OLTP type systems (aside from redo), I see most I/O going
towards (drum roll, please)... undo. YMMV. The important point is to
test the configurations with your load.

Will someone please tell the manufacturers that size of disk is less
important than number of spindles for us db types?

jg
--
@home.com is bogus.
http://www.signonsandiego.com/news/2009/nov/25/qualcomm-catches-break-brussels/

Noons

Nov 25, 2009, 4:33:35 PM

On Nov 25, 2:18 pm, Pat <pat.ca...@service-now.com> wrote:

>
> The counterproposal is to just make one big raid group like:
>
> Disk 0..7 Raid 10
>
> What are folks doing these days with these big ol' 300G disks? It
> seems supremely wasteful to use an entire 300G drive to hold 20G worth of
> redo.

Would you believe RAID 5? For most *sequential* workloads, it works
perfectly and is as good as RAID1+0, besides being much more efficient
in disk usage. Contrary to what most folks believe. But you'll need
an odd number of disks so in your case 7 is the magic number. Leaves
you a spare drive for the odd bits.
I hope that controller has multiple paths? Otherwise you're in
serious I/O performance trouble...
So, what is "sequential" in an Oracle db server?
Redo, SYSTEM, OS, software itself, archived redos, FRA.
And unless you are trying to run the latest 10000000000 concurrent
user OLTP benchmark-winning system on such a hardware configuration,
I'd say your data and index areas will likely be mostly sequential
I/O.
Temp and Undo might not but then again it's highly contingent on the
workload. If you have big serial updates the Undo will be mostly
sequential in nature.
Temp is a different animal, it can be all kinds. But then again, you
can't keep everyone happy! ;)

Note that by "sequential" I don't mean ordered! An FTS (full table scan),
for example, is sequential. So is an index FFS (fast full scan), an
unqualified update, a long batch insert, etc.
What is random? Lots and lots of queries and updates to a single row
or a very small number of rows, in many tables.
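
The sequential-versus-random distinction matters for RAID 5 because of how parity gets updated. The sketch below uses the textbook I/O counts (a small RAID 5 write costs two reads plus two writes, while a full-stripe write needs no reads) and deliberately ignores controller caching, which can change the picture a lot. The function names are illustrative only.

```python
from fractions import Fraction


def ios_per_small_write(raid: str) -> int:
    """Physical disk I/Os triggered by one small (sub-stripe) logical write."""
    if raid == "raid10":
        return 2  # write the block to both sides of the mirror
    if raid == "raid5":
        return 4  # read old data + old parity, write new data + new parity
    raise ValueError(f"unknown RAID level: {raid}")


def full_stripe_efficiency(raid: str, disks: int) -> Fraction:
    """Fraction of physical writes that carry user data in a full-stripe write."""
    if raid == "raid10":
        return Fraction(1, 2)  # every chunk is written twice
    if raid == "raid5":
        return Fraction(disks - 1, disks)  # one parity chunk per stripe
    raise ValueError(f"unknown RAID level: {raid}")


if __name__ == "__main__":
    print("small write I/Os:",
          ios_per_small_write("raid5"), "vs", ios_per_small_write("raid10"))
    print("full-stripe efficiency, 7-disk RAID 5:",
          full_stripe_efficiency("raid5", 7))
    print("full-stripe efficiency, 8-disk RAID 10:",
          full_stripe_efficiency("raid10", 8))
```

In this simplified model RAID 5 comes out ahead on large sequential writes and behind on scattered single-row updates, which is the split described above.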

> Any recommendations, advice, etc would be appreciated.

You got them. Pity you're not in Sydney, we're talking about precisely
this at the Sydney Oracle Meetup this Friday evening:
http://dbasrus.blogspot.com , bottom half.

hpuxrac

Nov 25, 2009, 5:43:23 PM

On Nov 24, 10:18 pm, Pat <pat.ca...@service-now.com> wrote:

snip

You are kind of limited from the get go with 8 disks.

Is the application "commit happy"? Lots of single row updates
followed by commits? In that case you might really want 2 dedicated
disks RAID 1 for online logs to reduce the impact of log file sync.
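
For a sense of scale on why commit rate matters, here is a back-of-the-envelope sketch of rotational timing for a 15k drive like the ones in Pat's box. It is a crude model that assumes no write cache at all (a battery-backed cache exists precisely to hide this), and the one-write-per-revolution ceiling is a simplifying assumption, not a measured figure.

```python
def revolution_ms(rpm: int) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the target sector is half a revolution away."""
    return revolution_ms(rpm) / 2


def serial_writes_per_second(rpm: int) -> float:
    """Assumed ceiling: one uncached synchronous write per revolution."""
    return 1000 / revolution_ms(rpm)


if __name__ == "__main__":
    rpm = 15_000
    print("revolution (ms):", revolution_ms(rpm))
    print("avg rotational latency (ms):", avg_rotational_latency_ms(rpm))
    print("assumed serial write ceiling (per s):", serial_writes_per_second(rpm))
```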

Where is the operating system and swap space going to go?

Are you going to use hot spares? If not, how are you going to monitor
for when (not if) a disk fails?

Your idea of using all 8 disks in a RAID 10 is probably the way I
might lean unless you are aware that the app is real commit happy.

Matthias Hoys

Nov 25, 2009, 6:39:29 PM

"Pat" <pat....@service-now.com> wrote in message
news:8c721446-137b-48fb...@p35g2000yqh.googlegroups.com...

I would take 2 disks and put them in RAID-1 for the online redo logs
(although 300 GB is a lot of space for these alone) and make a RAID-10 array
out of the other ones for the rest. I would definitely not use separate
disks for indices and data.

Matthias

Pat

Nov 27, 2009, 10:45:39 AM

Thanks for the feedback guys, that's exactly what I was looking for.

Oracle 10.2.0.4, OS is Red-Hat ES 5 (64 bit). Controller does support
multiple paths, and we'll be running the database with asynchronous IO
enabled (although I'm not thinking that'll impact disk layout, but
I've been wrong before).

A couple of comments (in no particular order) based on some questions
folks asked:

1) Somebody mentioned aligning the multiblock read count with the RAID
stripe size ... we've gotten in the habit with 10.2 of unsetting
db_file_multiblock_read_count and just letting Oracle set it based on
how it observes the IO subsystem behaving. So far, that's seemed to
work out pretty well for us, since in my experience not everybody
gets it right if they have to hand tune it.

2) App isn't all that commit happy. It generates a fair amount of
redo, but it tends to commit reasonably large transactions at once so
the number of commits doesn't usually saturate the IO subsystem, at
least it doesn't on my netapp installations and they have much larger
transaction volume than I'm expecting to see here.

3) RAID 5 ... with today's controllers is that getting used more
often? It was drilled into my head years ago that you don't put OLTP
data on RAID 5, but then it was drilled into my head that you never
mount your tablespace over NFS too and that changed :)
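
For anyone who does want to hand tune point 1, the arithmetic is just the stripe size divided by the database block size. A minimal sketch, assuming a hypothetical 1 MB stripe and 8 KB block size (neither figure comes from this thread), and glossing over whether "stripe size" means the per-disk chunk or the full stripe width:

```python
def multiblock_read_count(stripe_bytes: int, db_block_bytes: int) -> int:
    """Blocks per multiblock read so that one read spans exactly one stripe."""
    if stripe_bytes <= 0 or db_block_bytes <= 0:
        raise ValueError("sizes must be positive")
    if stripe_bytes % db_block_bytes:
        raise ValueError("stripe size is not a whole number of database blocks")
    return stripe_bytes // db_block_bytes


if __name__ == "__main__":
    KB = 1024
    # Hypothetical figures: 1 MB stripe, 8 KB db_block_size.
    print(multiblock_read_count(1024 * KB, 8 * KB))
```

Any value computed this way would still need checking against the real workload, which is part of why leaving the parameter unset is attractive.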

In any event it sounds like pretty much everyone's agreed
that splitting "index" from "data" on a config like this is probably
counterproductive. The main open questions are just how to lay out
redo and whether or not RAID5 is an option.

joel garry

Nov 30, 2009, 1:00:47 PM

On Nov 27, 7:45 am, Pat <pat.ca...@service-now.com> wrote:

>
> 3) RAID 5 ... with today's controllers is that getting used more
> often? It was drilled into my head years ago that you don't put OLTP
> data on RAID 5, but then it was drilled into my head that you never
> mount your tablespace over NFS too and that changed :)

I'm a BAARFer by nature, but I have to admit, as long as it isn't in
degraded mode and nobody does anything stupid like yanking out two
drives and you aren't near saturating volumes of data movement, it
actually works pretty well on at least the specific configuration we
have, which is quite different than yours. Just get management to at
least agree to go away from it if some actual evidence of severe
performance degradation occurs. I see it during mass app upgrades,
which do things like add columns to every row in the largest tables,
but that just is a matter of waiting until things are done.
Performance issues during normal ops seem to skew towards cpu issues,
for my configuration. Generally they are due to DSS type operations
on my OLTP system. Which are decided upon by management, so are easy
to turn around into "hardware enhancement requirements."

jg
--
@home.com is bogus.

http://failmanifesto.org/

Frank van Bortel

Dec 10, 2009, 3:02:37 PM

Dear BAARF member # 127 (or should I say "0xFF"). One thing is still
forgotten: any disk failure on a degraded RAID-5 will make your data
go POOF!
Stop worrying, go SAME (Stripe And Mirror Everything, or RAID 0+1)

--

Regards,
Frank van Bortel (BAARF #287)

Noons

Dec 10, 2009, 9:06:00 PM

On Dec 11, 7:02 am, Frank van Bortel <frank.van.bor...@gmail.com>
wrote:

> Stop worrying, go SAME (Stripe And Mirror Everything, or RAID 0+1)

and pay twice the price for the same storage capacity...

joel garry

Dec 11, 2009, 12:18:18 PM

On Dec 10, 12:02 pm, Frank van Bortel <frank.van.bor...@gmail.com>
wrote:

LOL! I asked them specifically for that number, 'cause it's so k001.

Actually, it's potentially worse than POOF!
http://groups.google.com/group/comp.databases.oracle.server/msg/1cedb062e0071ac0?dmode=source

I wasn't clear in distinguishing performance degradation from degraded
mode. I meant, be sure and get some management sign-off on switching
to a better raid layout if normal ops run into I/O bottlenecks. In
addition to that, CYA for degraded mode+disk failure.

The next hot thing is snapping file changes instead of just using
standby. Sigh.

jg
--
@home.com is bogus.

http://bobsneed.wordpress.com/2009/11/05/oracle-io-supply-and-demand/

Frank van Bortel

Dec 11, 2009, 2:20:53 PM

Not quite a factor 2, but close.

And disks are cheap, duplicate cheap, and it's still cheap.
But a whole lot more certain!
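
The "not quite a factor 2" can be checked with a little arithmetic: with n equal disks, RAID 10 yields n/2 disks of usable space and RAID 5 yields n-1. A minimal sketch (function names are illustrative; hot spares and formatting overhead are ignored):

```python
from fractions import Fraction


def raid10_usable_disks(n: int) -> int:
    if n < 4 or n % 2:
        raise ValueError("RAID 10 needs an even number of disks, at least 4")
    return n // 2


def raid5_usable_disks(n: int) -> int:
    if n < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return n - 1


def raid10_cost_factor(n: int) -> Fraction:
    """How many times more RAID 10 costs per usable GB than RAID 5 on n disks."""
    return Fraction(raid5_usable_disks(n), raid10_usable_disks(n))


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "disks:", float(raid10_cost_factor(n)))
```

For an 8-disk box the factor is 7/4, and it creeps toward 2 as the array grows.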

joel garry

Dec 11, 2009, 4:27:25 PM

On Dec 11, 11:20 am, Frank van Bortel <frank.van.bor...@gmail.com>

wrote:
> Noons wrote:
> > On Dec 11, 7:02 am, Frank van Bortel <frank.van.bor...@gmail.com>
> > wrote:
>
> >> Stop worrying, go SAME (Stripe And Mirror Everything, or RAID 0+1)
>
> > and pay twice the price for the same storage capacity...
>
> Not quite a factor 2, but close.
>
> And disks are cheap, duplicate cheap, and it's still cheap.
> But a whole lot more certain!
>

Not so cheap when the SAN is already fully populated.

jg
--
@home.com is bogus.

http://lmgtfy.com/?q=%22This+is+not+right.+%20It+will+never+pickup+a+leading+PLUS%2C+MINUS%2C+or+DOT.+%22

Mladen Gogala

Dec 11, 2009, 5:34:16 PM

On Fri, 11 Dec 2009 20:20:53 +0100, Frank van Bortel wrote:

> And disks are cheap, duplicate cheap, and it's still cheap. But a whole
> lot more certain!

Disks are cheap, especially when somebody else is paying for them.

--
http://mgogala.byethost5.com

hpuxrac

Dec 11, 2009, 7:44:20 PM

On Dec 10, 9:06 pm, Noons <wizofo...@gmail.com> wrote:

snip

> > Stop worrying, go SAME (Stripe And Mirror Everything, or RAID 0+1)
>
> and pay twice the price for the same storage capacity...

I don't mind running test and dev systems on raid 5.

Not for prod systems please ... at least not mine.

Noons

Dec 12, 2009, 4:43:53 AM

Frank van Bortel wrote,on my timestamp of 12/12/2009 6:20 AM:

>>> Stop worrying, go SAME (Stripe And Mirror Everything, or RAID 0+1)
>>
>> and pay twice the price for the same storage capacity...
>
> Not quite a factor 2, but close.
>
> And disks are cheap, duplicate cheap, and it's still cheap.

Maybe in your neck of the woods. Over here they most certainly aren't...

> But a whole lot more certain!

Not really.

Noons

Dec 12, 2009, 4:44:14 AM

Mladen Gogala wrote,on my timestamp of 12/12/2009 9:34 AM:
> On Fri, 11 Dec 2009 20:20:53 +0100, Frank van Bortel wrote:
>
>
>> And disks are cheap, duplicate cheap, and it's still cheap. But a whole
>> lot more certain!
>
> Disks are cheap, especially when somebody else is paying for them.

Bingo!

Noons

Dec 12, 2009, 4:45:24 AM

I don't mind at all using raid 5 in prod systems, if they are not critical.
As for dev and test, they are all in raid 5.
Also: there is more than one raid 5, in the EMC world.

Noons

Dec 12, 2009, 4:51:49 AM

joel garry wrote,on my timestamp of 12/12/2009 4:18 AM:

> Actually, it's potentially worse than POOF!
> http://groups.google.com/group/comp.databases.oracle.server/msg/1cedb062e0071ac0?dmode=source

Well, ask that impatient manager to do the same to RAID10 and watch the results?
Nothing to do with raid5: idiocy is independent of raid types...

> The next hot thing is snapping file changes instead of just using
> standby. Sigh.

That one is straight out of the demented minds of disk managers! Spent some
time this year trying to make an EMC "guru" understand that if they can't switch
a standby SAN without corrupting one of my prod dbs (yes, it happened...),
there is no way in the world I'll let them have even more access to "snap" ANYTHING!

joel garry

Dec 14, 2009, 2:33:20 PM

I just got an email from Oracle corp about how 11g can lower storage
costs by 10x. Clicking on the link goes to a marketing page which
includes this bullet point:

Compress all types of data to minimize storage requirements

But the stupid page overwrites "storage requirements" with a button to
"FIND OUT MORE". So it tells me:

"Compress all types of data to minimize FIND OUT MORE"

I always thought we used databases to maximize finding out things. I
guess less is more these days.

jg
--
@home.com is bogus.

regular expression wizard: http://www.yocoya.com/pls/apex/f?p=40714:500:1140110927960207

Mladen Gogala

Dec 14, 2009, 2:46:11 PM

On Mon, 14 Dec 2009 11:33:20 -0800, joel garry wrote:

> On Dec 12, 1:44 am, Noons <wizofo...@yahoo.com.au> wrote:
>> Mladen Gogala wrote,on my timestamp of 12/12/2009 9:34 AM:
>>
>> > On Fri, 11 Dec 2009 20:20:53 +0100, Frank van Bortel wrote:
>>
>> >> And disks are cheap, duplicate cheap, and it's still cheap. But a
>> >> whole lot more certain!
>>
>> > Disks are cheap, especially when somebody else is paying for them.
>>
>> Bingo!
>
> I just got an email from Oracle corp about how 11g can lower storage
> costs by 10x. Clicking on the link goes to a marketing page which
> includes this bullet point:
>
> Compress all types of data to minimize storage requirements
>
> But the stupid page overwrites "storage requirements" with a button to
> "FIND OUT MORE". So it tells me:
>
> "Compress all types of data to minimize FIND OUT MORE"
>
> I always thought we used databases to maximize finding out things. I
> guess less is more these days.
>
> jg

Joel, you must be reading my mind. Look at the subject named
"Compression" that I just started. As for your idea about maximizing, I
do not subscribe to the "big is beautiful" principle. I don't subscribe
to "size doesn't matter", either. We're talking about compression, of
course.

--
http://mgogala.byethost5.com

Noons

Dec 15, 2009, 7:20:07 AM

Mladen Gogala wrote,on my timestamp of 15/12/2009 6:46 AM:

>> I just got an email from Oracle corp about how 11g can lower storage
>> costs by 10x. Clicking on the link goes to a marketing page which
>> includes this bullet point:
>>
>> Compress all types of data to minimize storage requirements
>>
>> But the stupid page overwrites "storage requirements" with a button to
>> "FIND OUT MORE". So it tells me:
>>
>> "Compress all types of data to minimize FIND OUT MORE"
>>
>> I always thought we used databases to maximize finding out things. I
>> guess less is more these days.
>>

LOL! flash?

>
> Joel, you must be reading my mind. Look at the subject named
> "Compression" that I just started. As for your idea about maximizing, I
> do not subscribe to the "big is beautiful" principle. I don't subscribe
> to "size doesn't matter", either. We're talking about compression, of
> course.
>

I'd settle for 10g compressing partitioned tables.
Unfortunately, it doesn't.
Or at least: it didn't the ones I threw at it.
Which BTW compressed fine while not partitioned...

Ah well: I'm sure 11gRsomething-or-other solves all that!...

Mladen Gogala

Dec 15, 2009, 10:09:57 AM

On Tue, 15 Dec 2009 23:20:07 +1100, Noons wrote:

> I'd settle for 10g compressing partitioned tables. Unfortunately, it
> doesn't.
> Or at least: it didn't the ones I threw at it. Which BTW compressed fine
> while not partitioned...
>
> Ah well: I'm sure 11gRsomething-or-other solves all that!...

For a small price, of course.

--
http://mgogala.byethost5.com