We have a constantly updated relational database with mostly random writes
and reads.
The database size will be 60 GB in total. It will run in a UNIX (IBM or SUN)
environment.
We want to split the database into a Read Only (RO) and a Read Write (RW)
database (probably running on 1 server). Clients will access (query) the RO
database. The RW and RO databases will be switched every hour, so the RW
becomes RO and vice versa. The new RO database then contains the newest data.
The new RW database will be brought up to date from the transaction log (which
will be fast).
Now we need to decide on purchasing disk storage.
1. The RO storage (60 GB) needs maximum read performance. It doesn't need to
be fail-safe: if the RO crashes we can switch to the RW, or vice versa, to
service the end-users. In the meantime we can recover the crashed disk set.
2. The RW storage (60 GB) needs maximum read/write performance. It doesn't
need to be fail-safe.
3. Then we need several fail-safe filesystems to store transaction logs.
Let's say 2 * 40 GB.
4. Also we need storage (200 GB) for other things.
For the RO/RW storage, do we buy 2 fast 60 GB disks, or should we consider,
for example, RAID 0 with 5 * 20 GB disks?
Or do we buy an integrated storage solution? What matters is that the RO and
RW storage is as fast as possible.
Please advise.
Regards,
Louis Banens
What you need depends on the number of users and transactions per unit of time.
A single 15000 rpm disk will give you approx. 170 - 200 random accesses
per second. IIRC the TPC-C benchmark transaction was 27 accesses.
That should then give you a maximum of some 6 to 7 transactions per second
from a single disk (provided it does not do something else as well).
If you know the number of disk accesses per transaction in your app
you will of course end up at a different result.
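
The estimate above is just accesses per second divided by accesses per transaction. A minimal sketch of that arithmetic, using the figures quoted in this post (the function name and the `disks` parameter are illustrative assumptions, and it ignores caching and any other load on the disk):

```python
def transactions_per_second(disk_iops, accesses_per_txn, disks=1):
    """Upper bound on transactions/s if random disk accesses are the bottleneck.

    disk_iops        -- random accesses per second one disk can sustain
    accesses_per_txn -- disk accesses one transaction needs
    disks            -- number of spindles, assuming load spreads evenly (assumption)
    """
    return disks * disk_iops / accesses_per_txn


# Figures from the post: 170 - 200 accesses/s, 27 accesses per transaction.
low = transactions_per_second(170, 27)
high = transactions_per_second(200, 27)
print(f"{low:.1f} to {high:.1f} transactions/s per disk")  # 6.3 to 7.4
```

Plug in your own accesses-per-transaction figure once you have measured it; the 27 is only the remembered TPC-C number.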
How large is your user community?
How many transactions per hour can they practically produce?
Are you doing batch runs? (basically producing transactions as fast as possible)
If you are going to use the RO part as a data warehouse it will be almost impossible
to size. If you allow someone to type " select *item from whatever "
and you have to read the complete database to satisfy the query, performance is gone.
Always buy more disks than you need! (You will grow!)
--
========================================================
Lars Tunkrans
smtp: lars dot tunkrans at bredband dot net
--------------------------------------------------------
The more spindles, the more "multiprocessing" you can get out of that,
so there's considerable value to having as many disks as possible.
That being said, small SCSI disks are probably older, slower
technology, so you may be better off getting a somewhat smaller number
of newer ones.
If speed is key, and you can afford recovery time if a disk breaks
down, then striping the whole thing across a set of 5 75GB SCSI disks
would give the highest speed. There's value to not having to think
hard about the configuration, and RAIDing the whole thing across the
bunch of disks would be pretty simple.
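
To make the "more spindles" point concrete, here is a rough sketch of how a plain striped (RAID 0 style) layout could map a logical block to a disk. The function name, the stripe unit of 16 blocks and the block range are assumptions for illustration only; a real controller or volume manager has its own layout.

```python
import random
from collections import Counter


def locate(logical_block, num_disks, blocks_per_stripe_unit):
    """Map a logical block number to (disk index, block offset on that disk)."""
    stripe_unit = logical_block // blocks_per_stripe_unit  # which stripe unit overall
    disk = stripe_unit % num_disks                         # units rotate round-robin over disks
    row = stripe_unit // num_disks                         # how many units precede it on that disk
    offset = row * blocks_per_stripe_unit + logical_block % blocks_per_stripe_unit
    return disk, offset


# Spread 10,000 random block reads over 5 disks and count how many each disk serves.
random.seed(0)
hits = Counter(locate(random.randrange(1_000_000), 5, 16)[0] for _ in range(10_000))
print(sorted(hits.items()))
```

With uniformly random accesses each disk ends up serving roughly a fifth of the requests, which is where the extra parallelism comes from.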
--
let name="cbbrowne" and tld="ntlug.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spreadsheets.html
Signs of a Klingon Programmer #7: "Klingon function calls do not have
'parameters' -- they have 'arguments' -- and they ALWAYS WIN THEM."
Damon.
"Louis" <nos...@nospam.nl> wrote in message
news:3f4cdfb2$0$49112$e4fe...@news.xs4all.nl...
How random is "random"? If it's going to be clustered such that requests
will typically be "close" to each other, then max out on RAM.
And if you can help queries by having indexes, again max out on RAM. The
aim is to minimise the need to access the disk, so disk speed won't
matter.
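
A back-of-the-envelope sketch of why the cache hit ratio dominates: the average access time is a weighted mix of RAM and disk times. The disk figure assumes roughly 180 random accesses per second (in the range quoted earlier in the thread); the RAM figure of 100 ns is purely an assumption, as are the names.

```python
DISK_MS = 1000 / 180   # ~5.6 ms per random access, from ~180 accesses/s (assumption)
RAM_MS = 0.0001        # 100 ns per cached access (assumption)


def effective_access_ms(hit_ratio, ram_ms=RAM_MS, disk_ms=DISK_MS):
    """Average access time when a fraction hit_ratio of requests is served from RAM."""
    return hit_ratio * ram_ms + (1.0 - hit_ratio) * disk_ms


for ratio in (0.0, 0.5, 0.9, 0.99):
    print(f"hit ratio {ratio:.2f}: {effective_access_ms(ratio):.3f} ms per access")
```
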
Cheers,
Wol
--
Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
Witches are curious by definition and inquisitive by nature. She moved in. "Let
me through. I'm a nosey person.", she said, employing both elbows.
Maskerade : (c) 1995 Terry Pratchett
If it is unrealistic to have 60GB of RAM to cache the data, then it
will surely be necessary to do _something_ on the disk side of things.
Things like FibreChannel disk arrays seem highly helpful to this end;
battery-backed cache on the array can provide _massive_ improvements
in per-transaction update times, as the array can treat transaction
updates as "committed" as soon as they are cached, instead of having
to wait until the disk actually rotates to the appropriate location.
A few hundred KB of cache on the controller could be very helpful;
move it to a few hundred MB and it's all the better :-).
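
For a sense of the rotational wait that a write-back cache hides, a small sketch: on average the platter has to turn half a revolution before the target sector arrives. The 15000 rpm figure is the disk speed mentioned earlier in the thread; seek time and transfer time are left out, and the function name is made up for illustration.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target sector: half of one full revolution, in ms."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


print(f"{avg_rotational_latency_ms(15000):.1f} ms")  # 2.0 ms
```
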
--
"aa454","@","freenet.carleton.ca"
http://cbbrowne.com/info/spreadsheets.html
Thank you for flying U.S.A.F. We hope that you will consider us again
when your travel plans next include bombing Tripoli.