>
>There has been some discussion recently in comp.sys.mac concerning the relative
>merits of two different methods of storing database files: the monolithic
>(all-in-one-file) way and the distributed (files-all-over-the-place) method.
>...
>
>The benefits of monolithic structure are few.
It's dangerous to lead with your conclusion. Your position is neither
100% true nor 100% false. Monolithic vs. separate-file performance is
very dependent on the underlying software platform (i.e. the OS) and
its implementation. I also object to the use of "distributed";
as used with DBMS's, the term refers to a database that is
physically fragmented across more than one machine but gives the
user transparent access to it, i.e. access that neither requires
knowledge of nor even reveals the physical dispersion.
I should mention that I write from the POV of a transaction-processing
relational DBMS vendor: many users, much traffic, much multi-user
contention. The issues are different from the PC/Mac DBMS ones.
>With the Monolithic structure, it is possible (or guaranteed, depending on
>your particular choice of DBMS) that you will corrupt some other portion
>of your database.
Again, you are assuming your conclusion. Properly designed databases
decouple their indexes sufficiently so that a corrupt one may be
deleted and a new one created. Now, an error can happen with a data
structure that the index and its table (or d.b.) share; that can lead
to serious corruption. But if such sharing is designed into
a multi-file system, then the same vulnerability exists. Also, the OS
could crash, leaving several files inconsistent.
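What "decoupled" means here can be sketched in a few lines (my toy
illustration, not any vendor's actual layout): the index holds only
derived data, so a corrupt index can simply be thrown away and rebuilt
from the table, with no loss of data.

```python
class Table:
    def __init__(self):
        self.rows = []    # heap of (key, value) rows: the source of truth
        self.index = {}   # key -> row position; purely derived data

    def insert(self, key, value):
        self.rows.append((key, value))
        self.index[key] = len(self.rows) - 1

    def rebuild_index(self):
        # Recovery path: discard the (possibly corrupt) index and
        # rederive it from the rows alone; the data is never touched.
        self.index = {k: i for i, (k, _) in enumerate(self.rows)}

    def lookup(self, key):
        return self.rows[self.index[key]][1]
```

The vulnerability I describe arises only when index and data share a
structure this sketch deliberately avoids; then a rebuild cannot save you.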
>Good luck
>finding out what got damaged- it may take you weeks to find it,
Unless your system is properly instrumented, with automatic checkpointing,
and properly managed with regular backups-
>by which time all your backups may have the damage as well.
>In rare cases you could trash your entire database....
>The danger, however, never goes away entirely. This reason alone should
>convince most people.
Our customers don't seem to be convinced of the danger.
Of course, most software is not bug-free, and that's where customer
support comes in. The idea is to make the catastrophe scenario very
unlikely, with good software, fault-tolerance features, regular
checkpoints and backups. Reducing the likelihood of catastrophe makes
the optimization gained from a single, specially-managed file worth
it.
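The checkpoint-and-recover idea reduces to this (a toy model, not our
implementation; the interval and in-memory snapshot are my simplifications):
take a consistent snapshot every N updates, and on detected corruption fall
back to the last known-good state instead of hunting through weeks of backups.

```python
import copy

class CheckpointedStore:
    """Toy checkpointing scheme: snapshot every `interval` updates so
    detected corruption means reverting to the last consistent state."""
    def __init__(self, interval=3):
        self.state = {}
        self.interval = interval
        self.updates = 0
        self.checkpoint = {}

    def put(self, key, value):
        self.state[key] = value
        self.updates += 1
        if self.updates % self.interval == 0:
            # A real DBMS writes this to stable storage, atomically.
            self.checkpoint = copy.deepcopy(self.state)

    def recover(self):
        # Roll back to the last known-good snapshot.
        self.state = copy.deepcopy(self.checkpoint)
```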
>
>The other overriding reason to use a distributed structure is performance. If
>your DBMS has to go through its own file-management code as well as the OS's,
>it will always be slower than if it only needed to go through the OS.
Unless you don't go through the OS file code at all. Know what a "raw disk"
(in UNIX) or a "foreign mounted" disk (in VMS) is? These are physical-block
interfaces to disks and disk partitions. They allow I/O directly to and from
the user's own buffer rather than the OS disk cache, with immediate
write-through instead of being subject to the OS's caching policies. Why use
OS services for a DBMS when they weren't written for a DBMS? I also think
you are exaggerating the trade-off. Why should 2 directory lookups
(to reach table T in database D) be faster than two file reads (one for the
data dictionary to get the table's location, the other for the table page)?
Also, with an OS approach, to get decent caching you'd use up file
descriptors very quickly by keeping previously-used tables open.
Not so if you have your own table handles within the DBMS, especially
if they are a configurable resource.
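A DBMS-side handle cache can be sketched like so (a hypothetical design,
assuming a fixed, configurable number of slots with LRU eviction, rather
than one OS file descriptor held open per table ever touched):

```python
from collections import OrderedDict

class HandleCache:
    """Hypothetical table-handle cache inside the DBMS: `capacity` is a
    configurable resource; least-recently-used handles are evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.handles = OrderedDict()   # table name -> handle object

    def get(self, table, open_fn):
        if table in self.handles:
            self.handles.move_to_end(table)   # mark most recently used
            return self.handles[table]
        if len(self.handles) >= self.capacity:
            _, lru = self.handles.popitem(last=False)  # evict the LRU handle
            lru.close()
        handle = open_fn(table)
        self.handles[table] = handle
        return handle
```

The point is that the eviction policy and the resource limit are the
DBMS's to tune, not a per-process ceiling imposed by the OS.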
>...With a distributed structure, the middle step doesn't
>exist. This time savings is not immense if the DBMS is very well-written, but
>the vast majority are not (at least when it comes to speed optimization).
Herewith, I toot my own horn: Sybase is very well-written (I wrote
about 8-10% of it myself and reviewed about 90%). The system was designed
with data integrity and performance as the primary goals.
>
>There is a much bigger performance gain for distributed structures in a
>multiple-machine or multiple-hard-disk environment. ...
> [ discussion of multiple-disk interleaving, RAM-disks,
> arm-waving around the problem of multi-user access management:
>...Of course there are
>important logistics to consider, such as how to lock out access to data which
>is temporarily invalid because it is being updated privately by that node, but
>often this is not a problem at all. Even when it is, it's better than not
>able to do the job at all. ]
Unless you don't mind inconsistent transactions and corrupt data.
As for the disk strategies, they're fine. I should mention that a
mono organization should not prohibit you from using other disk
devices (or files). If you have two disks, you want
to lay things out so that they carry about the same load at any
given time. A good mono organization allows this interleaving
without forcing dependency on OS mechanisms for implementing it.
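For instance, a mono organization can stripe its own logical page space
across devices under its own control (a sketch, assuming fixed-size pages
and simple round-robin placement; real layouts are tunable):

```python
def page_device(page_no, devices):
    """Map a logical page number onto (device, slot-on-device) so that
    consecutive pages alternate disks and the load is spread evenly."""
    dev = devices[page_no % len(devices)]
    slot = page_no // len(devices)     # position within that device
    return dev, slot
```

A sequential scan then alternates between the disks, so at any given time
each carries about half the load, with no OS mechanism involved.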
>
>There is one other very important reason to use a distributed structure that
>comes to mind. Any monolithic structure will impose arbitrary restrictions on
>the number of data files or fields (tables and columns) allowed in the
>database. Sometimes these restrictions are very severe. Furthermore, if you
>have a very large database, it may not fit on one physical disk, and with the
>monolithic structure you are limited (generally) to one device. With the
>distributed structure, these limitations just go away.
Continued simplistic tone. OS's are software. DBMS's are software.
Why can't a DBMS be written with the same liberal (or practically nonexistent)
limits that the magical OS managing the separate-file structure has?
>
>....For example, one of the big DBMSs
>that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write
>directly to the disk (let's skip the technical details...). While this is like
>the monolithic system in some ways, it still allows for some of the benefits of
>the distributed structure. UNIX wizards may have a lot to say about this...
That's us (at least). We went so far as to modify the good ol' OS (Sun UNIX)
to support async I/O to raw disks, so that our (UNIX) process can
compute while the hardware deals with our last request(s). This
feature is native to VMS, and we use it there too.
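What the overlap buys can be shown in miniature (my illustration, not our
code: a worker thread stands in for true kernel-level async I/O; the real
mechanisms are async raw-disk I/O on UNIX and native async I/O on VMS):

```python
import os
import tempfile
import threading

def write_async(fd, data):
    """Kick off the write in a worker thread and return immediately,
    so the caller can compute while the transfer is in flight."""
    t = threading.Thread(target=os.write, args=(fd, data))
    t.start()
    return t   # caller joins when it needs the write to have completed
```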
Really the issue here is OS independence, not how many files you use.
Our experience is that we know best how to deploy system resources
to run our software, not the writers of the host operating system,
fine as that OS may be for other purposes.
-TW
Disclaimer: I am responsible for these opinions.
{ihnp4!pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
..not an @ in the bunch...
And have you met those goals? How do you measure your success?
-- Jon
--
Jonathan Krueger uunet!daitc!jkrueger jkru...@daitc.arpa (703) 998-4777
Inspected by: No. 15