Not long ago, I had a quick chat with a logician, where we talked about
boolean domains (domain with two elements) in connection to logic and
databases. All started from an instance of a relational schema:
Employee(id, dept, mgr?)
where mgr? is boolean. A tuple (x,y,true) means that employee x belongs
to department y and x is a manager of y. A tuple (x,y,false) means that
employee x belongs to department y but x is not a manager of y.
I was arguing about how schemas like that are commonly found in
practice, and about how I feel uncomfortable with them, because, let
apart the obvious difficulties with enforcing some integrity constraints
(e.g., a department should have one and only one manager at a time) in
current DBMSs, truth of a predicate is coded as a value of a domain.
Suprisingly (for me), she did not dismiss the approach at once, but
pointed out that you are trading a predicate (Manager(id)) for a couple
of new constant symbols and the two things should be essentially
interchangeable (from a logical point of view).
Not long after that, I stumbled across one of those lengthy arguments by
Chris Date, in which he critiqued a schema like:
Loves(x,y) ("x loves y")
Hates(x,y) ("x hates y")
on the ground of a principle by which each tuple should be insertable
only in one schema of a database (in the example above, ('Romeo',
'Juliet') may be inserted in both), and he proposes to use a single
schema R(x, z, y) where z ranges over {'loves','hates'}.
I am still uncomfortable with boolean domains, but I can't pinpoint the
exact reasons. In the Loves/Hates example I don't see a particular
advantage in using a single schema, because in either case you must
impose similar kinds of constraints (if x loves y then x doesn't hate
y...); the only difference is that in the first case those are
interrelational constraints, and you might say that they are not as
"easy" to enforce as intrarelational constraints (is it true?).
In the Employee example, boolean values may lead to redundancy, but of a
kind that is not captured by the usual normal forms. For example, if
each department may only have one manager, then in this instance:
id dept mgr?
--------------
1 Math true
2 Math ?
3 Math ?
all the values denoted by ? must necessarily be false. This schema is in
5NF, however (it is not in DKNF because it has an insertion anomaly,
e.g, inserting (4, 'Math', true) leads to an inconsistent instance). Of
course, this argument doesn't hold if a department may have many
managers.
But maybe my intuition is wrong. What do you think? Are there compelling
reasons to avoid boolean domains in logical database design?
Nicola
--- news://
freenews.netfront.net/ - complaints:
ne...@netfront.net ---