(start quote)
'... Note. Just to remind you, let me summarize those principles here:
1. A relvar that's not in fifth normal form (5NF) should be decomposed
into a set of 5NF projections.
2. The original relvar should be reconstructable by joining those
projections back together again.
3. The decomposition process should preserve dependencies.
4. Every projection should be needed in the reconstruction process.
'On the basis of this example I claimed that normalization by itself
wasn't enough: we needed something else to tell us what's wrong - what's
formally wrong, I mean - with this decomposition. In fact, I claimed
that what we needed was another principle (and later in the chapter I
claimed that the principle in question was essentially The Principle of
Orthogonal Design). ADR commented:
(ADR): "How about a principle stating that no attribute of the relvar
being decomposed is to appear as a nonkey attribute in more than one
projection? But I would not elevate such a simple and obvious rule to
the status of a principle; I would just add it to the definition of the
decomposition procedure. (That said, I admit that at the moment I'm
still wondering if there are pathological cases that my suggested
wording doesn't cater for.)"
(Date): 'There are several things I want to say ...
(snip)
'But my major concern is that I think the proposed rule is too strong ...
'By way of example, suppose suppliers are partitioned into classes (C1,
C2, etc.), so that the supplier relvar has an additional attribute
CLASS. Suppose also that (a) each class has just one associated status,
and (b) each city has just one associated status as well, but (c)
classes and cities are otherwise quite independent of each other. Then
the relvar satisfies these two functional dependencies (FDs):
{ CLASS } -> { STATUS }
{ CITY } -> { STATUS }
...(snip)
'it should be apparent that the following is a valid nonloss
decomposition for this revised suppliers relvar (I ignore attribute
SNAME for simplicity):
SCC ( S#, CLASS, CITY}
KEY { S# }
CLS { CLASS, STATUS}
KEY { CLASS }
CTS { CITY, STATUS}
KEY { CITY }
'Observe in particular that attribute STATUS appears here as a nonkey
attribute in more than one projection, and the decomposition thus
violates ADR's proposed rule. Yet, to repeat, the decomposition is
surely valid, and the proposed rule is thus not quite right.
(end quote)
I think that Date might be using a strawman argument here, ie., he
shoots down an example which is a violation of ADR's suggestion.
If I'm not mistaken Armstrong's axioms can be applied to show that
{ CITY, CLASS } -> { STATUS },
so the CTS relation isn't needed if the CLS relation is replaced by
CLS { CITY, CLASS, STATUS }
KEY { CITY, CLASS }
Have I got this right? If so, I think ADR's suggested rule does apply
to Date's example.
Sorry, you've got it wrong. {CITY,CLASS,STATUS} satisfies the multi-valued
dependency,
STATUS ->-:> CITY | CLASS
which is not implied by the key
{CITY,CLASS}
{CITY,CLASS,STATUS} is therefore not in 4NF let alone 5NF.
The projections, {CITY,STATUS} and {CLASS,STAUTS}, are in 5NF, though.
While AB -> C can be inferred from A -> C and B -> C, that is,
a table that satisfies A -> C and B -> C also satisfies AB -> C, but
A -> C and B -> C cannot be inferred from AB -> C.
The table,
(Boston, C1, S1),
(Boston, C2, S2),
(Chicago, C1, S2),
(Chicago, C2, S1),
satisfies the functional dependency
{CITY,CLASS} -> STATUS
but it is easy to see that neither
CITY -> STATUS or CLASS -> STATUS
are satisfied.
Thanks, I shouldn't have dropped those two dependencies and I take back
the strawman crack.
But I don't get why { CITY, CLASS, STATUS} isn't 4NF, a similar MVD
could be claimed for any relation that has at least one key attribute
and one non-key attribute.
The claim is due to Date's "otherwise quite independent" criterion.
Whenever there is a functional dependency from A to B, for each B value
there is a disjoint subset of A values. The set of A values is effectively
"partitioned" by the set of B values. In {CITY,CLASS,STATUS}, there are two
distinct functional dependencies,
CITY -> STATUS and CLASS -> STATUS;
consequently, for each STATUS value there is a disjoint subset of CITY
values and a disjoint subset of CLASS values. In order for those subsets to
be "otherwise quite independent," the multtivalued dependency,
STATUS ->-> CITY | CLASS
must hold. In the case of a table that satisfies a functional dependency
like
K -> A,
the degenerate multivalued dependency,
A ->-> K | nil,
is due to the functional dependency K -> A. It is therefore safe to say
that it is implied by the key.
It is also safe to say that {CITY,CLASS,STATUS} isn't even in 3NF, since
there are functional dependencies that are not implied by the key.
{CITY,CLASS} -> STATUS
does not imply
CITY -> STATUS or CLASS -> STATUS.
Sorry, BCNF.
Okay, this is new to me, and given your further exposition, would
appear highly nonstandard to me. Can you point me towards a freely
available paper in which Date nails himself to the cross with this
interpretation?
--
Sampo
Since I started it let me jump in and say that while the second
objection about 'quite independent' was a bit pedantic, given the
context it wasn't wrong. The first objection was more to the point,
ie., I had dropped the two fd's that Date used as a basis. The second
was just a side-effect of the basic error.
I suggest you read Ronald Fagin's paper "Multivalued Dependencies and a New
Normal Form for Relational Databases." Date's "otherwise quite
independent" criterion exemplifies the kind of relationship defined in the
paper. For example, an employee's salary is quite independent from (Fagin
uses "orthogonal" to) an employee's set of children. In the same way the
set of cities associated with a particular status is orthogonal to the set
of classes that are associated with it. The appearance of being nonstandard
is probably due to the fact that multivalued dependencies are not ordinarily
considered in the normalization procedure unless the database scheme is
already in BCNF.
I must have been braindead at the moment: I know and love my MVD's,
but despite your having mentioned them by name, I just kept on
thinking about how Date goes around it in the quote.
I actually consider MVD's more fundamental to normalization than FD's,
because they give a condition for two-way nonloss decomposition that
is not only sufficient but also necessary. Of course join dependencies
would be better still, but in practice they're nowhere near as easy to
spot or wrap your head around than FD's (for each x we have precisely
one y) or MVD's (for each x we have a unique set Y). Just about the
only nasty thing about full MVD's is that you always have to consider
the context in which the attributes appear, so that the concept really
only works well when you're designing the schema from the top down,
and not from the bottom up. In the other direction you'd probably want
to use embedded MVD's, but then IIRC their implication problem is
open.
As a funky note about the higher normal forms and more exotic kinds of
dependencies... Just about all course notes and practical expositions
of normalization I've seen sooner or later tell you 3NF (or perhaps
BCNF) is all you need in practice; "most relations in the wild are
then in 4NF as well". For the load of crap that is, it's repeated
amazingly often. You don't need to have anything in your model beyond
firms with sets of subsidiaries, people with families or orders with
line items, and you're already well into (E)MVD land. As such it's not
a big surprise that one well-known study found about one in five
deployed databases to have MVD's which hadn't been factored out (and
that was before star schemas became all the craze).
But certainly you never have to consider irreducible join
dependencies, right? Wrong again: the very first schema I got to
design from the ground up at work in fact had a genuine, irreducible,
three-pronged join dependency that had to be explicitly broken down.
In that case we had projects with milestones and measurements taken at
those milestones. Each project defined which measurements it was
using, and the milestones were subsetted as well, from a formal
waterfall template shared across the organization. So far we could of
course make do with MVD's, but additionally each metric was only
defined for a subset of the milestones; you couldn't for example have
cost measurements for preproject planning checkpoints where the
project hadn't been given a go, because the appropriate cost accounts
hadn't in that case been assigned yet. Bang: suddenly you no longer
have a two way nonloss decomposition, although a three way one exists.
And of course there was lots of assorted wackiness going on besides
that, like measurements being optional within the above constraints
(the eventual solution had to resort to triggers to check the
inclusion dependency against {project, milestone, measure} since
Oracle doesn't let you point a foreign key to a view), temporal
normalization (measurements could change values after first being
entered, and so had to be versioned), partial temporal constraints on
the order the milestone data was filled out, measurement targets with
a separate life of their own except that they couldn't be set after
actuals started to pour in, and so on, and so forth.
I ended up constraining the hell out of the thing even after the basic
design was in picture perfect PJ/NF (perhaps even 6NF, I didn't
formally check that one), with the result that there hasn't been a
single integrity issue with said DB to date. Had I just normalized
away the final nulls (in validity end time columns) and done away with
surrogates, that baby would probably qualify as my masterpiece.
--
Sampo