Role of functional dependencies in database design

Nicola

unread,

Feb 12, 2015, 11:02:19 AM2/12/15

to

I hope you won't mind if I start a new thread for the subject. The other
thread has got quite long and somewhat off-topic.

So, the question is: what is the role of FDs (and possibly other types
of dependencies) in database design? My personal take:

In general, I don't think that it is possible for them to inform the
design activity since the early stages or that they should be used as
the main design tool. I do think, however, that when a system is
understood sufficiently well (e.g., by means of a conceptual design),
they offer an aid (are my schemas well-designed according to certain
criteria?) and insight (allowing us to possibly find alternative ways to
design a database, for example).

>> Nicola wrote:
>> FDs do not come out of void, they follow from your requirements. They
>> are just a formal version of a very special (and relatively simple)
>> type
>> of requirement. I don't arbitrarily decide to assume that {desert
>> island} -> {can opener}, unless you tell me that there cannot be two
>> can
>> openers in the same desert island, in the world of desert islands you
>> are interested in.

> Erwin wrote:
>There you have it. When people tell you "there can only be one can
>opener on any given desert island", then what do you see ?
>
>A table {desert island, can opener} with (a.o.) a key "desert island",
>or
>A relation schema of whatever set of attributes in which you must add
>the FD desert island -> can opener ? Where this then tells you "the
>identity of a desert island is a determinant factor to the identity of
>the can opener that is on it" or "the relation over {desert island, can
>opener} is a function from desert island to can opener" ?

At an early stage, you probably cannot say: it might also be that desert
island and can opener should be represented as two relation schemas (for
which we have identified no attributes yet) related through a one-one or
one-many relationship. My point here is simply that FDs are not
invented, they are discovered.

>> FDs are a special type of first-order formulas. I can imagine >>
defining
>> an English-like syntax for them.

(Not that it would necessarily be a good idea, I should add.)

>Hmmmmmmmm.
>
>"The projection on {desert island, can opener} of the join of all
>tables in the database, represents a function from desert island to can
>opener."

Well no, like this certainly no :) Maybe something along these lines
(paraphrasing the given-when-then constructs existing in some testing
tools):

given two lecturers L1 and L2 and a time T
when L1 is teaching at time T and L2 is teaching at time T
then L1 and L2 are in different rooms

(this is absolutely off the top of my head: don't take it as a serious
attempt).

>And besides, avoiding redundancy was only a relevant topic in database
>design as long as there were no feasible ways to control the
>redundancies, e.g. via ASSERTIONs. Those days are gone.

Could you provide a small example?

Nicola

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

Erwin

unread,

Feb 12, 2015, 1:58:19 PM2/12/15

to

Op donderdag 12 februari 2015 17:02:19 UTC+1 schreef Nicola:

> I hope you won't mind if I start a new thread for the subject. The other
> thread has got quite long and somewhat off-topic.
>
> So, the question is: what is the role of FDs (and possibly other types
> of dependencies) in database design? My personal take:
>
> In general, I don't think that it is possible for them to inform the
> design activity since the early stages or that they should be used as
> the main design tool. I do think, however, that when a system is
> understood sufficiently well (e.g., by means of a conceptual design),
> they offer an aid (are my schemas well-designed according to certain
> criteria?) and insight (allowing us to possibly find alternative ways to
> design a database, for example).

The problem is that formally speaking, you must first have that conceptual design (entities+relships) translated into a logical one (relvars). Only after that can you start thinking about FDs and normalization, and it requires that you somehow derived the FDs from, well what else other than the conceptual schema you have, perhaps incremented with a pack of tacit understandings of what it represents.

And what in a conceptual schema is it that you can get the FDs from ? NOT from any key specifications that are already there, because if you already have them, it's pointless deriving FDs from them only to find out that normalization theory gives you exactly the keys you started off with.

The normalization procedure/process as rooted in normalization theory, assumes a bottom-up methodology. First assume the database is a single gigantic table, then start decomposing. The trouble is that most people intuitively prefer top-down, in practice they also do it, and I have certainly known a time when they got things close to 99% right without any application of any kind of normalization theory.

>
> >> Nicola wrote:
> >> FDs do not come out of void, they follow from your requirements. They
> >> are just a formal version of a very special (and relatively simple)
> >> type
> >> of requirement. I don't arbitrarily decide to assume that {desert
> >> island} -> {can opener}, unless you tell me that there cannot be two
> >> can
> >> openers in the same desert island, in the world of desert islands you
> >> are interested in.
>
> > Erwin wrote:
> >There you have it. When people tell you "there can only be one can
> >opener on any given desert island", then what do you see ?
> >
> >A table {desert island, can opener} with (a.o.) a key "desert island",
> >or
> >A relation schema of whatever set of attributes in which you must add
> >the FD desert island -> can opener ? Where this then tells you "the
> >identity of a desert island is a determinant factor to the identity of
> >the can opener that is on it" or "the relation over {desert island, can
> >opener} is a function from desert island to can opener" ?
>
> At an early stage, you probably cannot say: it might also be that desert
> island and can opener should be represented as two relation schemas (for
> which we have identified no attributes yet) related through a one-one or
> one-many relationship. My point here is simply that FDs are not
> invented, they are discovered.

My question, and what I would like to understand is : what exactly is the available input (its semantics and in which possible syntactic forms does it appear, and where) that allows this "discovery" to be made ? I think JKL's post where he used the word "pesky" expressed the very same concern.

>
> >> FDs are a special type of first-order formulas. I can imagine >>
> defining
> >> an English-like syntax for them.
>
> (Not that it would necessarily be a good idea, I should add.)
>
> >Hmmmmmmmm.
> >
> >"The projection on {desert island, can opener} of the join of all
> >tables in the database, represents a function from desert island to can
> >opener."
>
> Well no, like this certainly no :) Maybe something along these lines
> (paraphrasing the given-when-then constructs existing in some testing
> tools):
>
> given two lecturers L1 and L2 and a time T
> when L1 is teaching at time T and L2 is teaching at time T
> then L1 and L2 are in different rooms

Well, there you have it. If you think that this sentence indicates an FD {L,T} -> {room}, then you are wrong ! This statement per se does not prohibit the same lecturer being in two distinct rooms at the same time. (Physics does, perhaps, but that really is another issue. My issue is correct logical inferences from given requirements as stated.)

The correct conclusion here is the FD {room,T} -> {L}. But before spotting that, I first had to twist the requirement into "only one lecturer can be in any given room at any time". Or "if two observations of a lecturer being in a room at time T, both involve the same room and the same time T, then they involve the same lecturer". But (knowing what the FD turned out to be and looking back on the natural language formulation) I find those alternatives only slightly less convoluted than your original.

(Your example being "temporal" is very tempting to go drifting off into the temporal arena. In fact, the temporal proposals by Darwen/Date/Lorentzos talk about keys almost all the time, but normalization theory isn't mentioned even once. My feeling is the current normalization theory is insufficient to underpin their approach soundly, and although I very much like their approach altogether, I'm amazed I never see any complaint about this.)

(...leaving the ASSERTIONs example for another post)

>
> (this is absolutely off the top of my head: don't take it as a serious
> attempt).
>
> >And besides, avoiding redundancy was only a relevant topic in database
> >design as long as there were no feasible ways to control the
> >redundancies, e.g. via ASSERTIONs. Those days are gone.
>
> Could you provide a small example?
>
> Nicola
>

> --- news://freenews.netfront.net/ - complaints: ---

Erwin

unread,

Feb 12, 2015, 2:25:43 PM2/12/15

to

Op donderdag 12 februari 2015 17:02:19 UTC+1 schreef Nicola:

> >And besides, avoiding redundancy was only a relevant topic in database
> >design as long as there were no feasible ways to control the
> >redundancies, e.g. via ASSERTIONs. Those days are gone.
>
> Could you provide a small example?

I overplayed my hand, slightly.

Take an example relvar DBR {A B C} subject to the FD {A} -> {B} (and its key thus being {AC}).

Rewind 40 years.

Enforcing this FD in that design involves extra checks, and in the time when all that theory originated, this meant : extra coding work, precisely in the update process that was exposed to the update "anomaly" involved. As far as I can tell, all that theory was developed so that the update "anomalies" implied by a given design, could be _identified_, so that after that an _informed_ decision could be made regarding how to deal with it : (a) eliminate it through redesign, or (b) live with it and keep the update processes that are exposed to the "anomaly" "guarded","monitored","under supervision",... But in that day and age, (b) was almost never a viable option, simply because the _means_ for doing so (which is "if a program gets it wrong, make sure it won't leave its devastating effects persisted in the database", or iow, "declare a constraint") simply weren't available.

Fast forward 40 years.

This day and age, I can simply do the (equivalent of the) following (hypothetical SQL) :

CREATE VIEW PROJ AS SELECT DISTINCT A,B FROM DBR CONSTRAINT UNIQUE (A) ;

That is, I have a system to my avail where I simply declare the view (projection on all the attributes mentioned anywhere in an FD), and declare a constraint to the effect that the LHS portion of the FD constitutes a key to that view. And if a program fails to obey the FD, it won't get any updates through (unless it is accidentally a good one).

Where I "overplayed my hand", is that introducing the constraint to enforce the FD to be obeyed, will not help in simplifying the potentially more complex proceedings a program has to follow to get the database updated. That is, while there is no possibility that the update "anomaly" will lead to incorrect data, the "anomaly" is still there and must still be dealt with by the programs that are exposed to it.

Erwin

unread,

Feb 12, 2015, 2:39:41 PM2/12/15

to

Op donderdag 12 februari 2015 19:58:19 UTC+1 schreef Erwin:

> Op donderdag 12 februari 2015 17:02:19 UTC+1 schreef Nicola:

> > Well no, like this certainly no :) Maybe something along these lines
> > (paraphrasing the given-when-then constructs existing in some testing
> > tools):
> >
> > given two lecturers L1 and L2 and a time T
> > when L1 is teaching at time T and L2 is teaching at time T
> > then L1 and L2 are in different rooms
>

> ... convoluted ...

I now see why I feel so uncomfortable with this formulation, and why it doesn't reveal the real FD right away.

First, yet another alternative formulation of the same.

Given two lecturers L1 and L2 and a room R
when L1 is teaching in room R and L2 is teaching in room R
then this is at different times.

The FD is {room, time} on the LHS. room and time are of equal "importance" here, they play "similar roles". But now look at the word usage for "room" and "time". Your original speaks of "different rooms" but _NOT_ of different times (quite the contrary) !

The _formulation_ in natural language "breaks" a certain kind of "symmetry" that is indeed present in the math formulation. That makes the discovery/analysis process harder !

Nicola

unread,

Feb 12, 2015, 4:43:24 PM2/12/15

to

In article <b99f007c-a8ab-4a23...@googlegroups.com>,

Erwin <e.s...@myonline.be> wrote:

> My question, and what I would like to understand is : what exactly is the
> available input (its semantics and in which possible syntactic forms does it
> appear, and where) that allows this "discovery" to be made ? I think JKL's
> post where he used the word "pesky" expressed the very same concern.

Formally, FDs are axioms. As such, *within* the formal system they are
given facts, but the discovery must happen *outside* the formal system.
Hence, the available input cannot be described formally (or, if it can,
it will be in a different formal system that will have its own axioms).

You don't have any difficulty with the process of "discovering" keys, do
you? I don't see the process of "discovering" other constraints as
intrinsically different.

> >
> > >> FDs are a special type of first-order formulas. I can imagine >>
> > defining
> > >> an English-like syntax for them.
> >
> > (Not that it would necessarily be a good idea, I should add.)
> >
> > >Hmmmmmmmm.
> > >
> > >"The projection on {desert island, can opener} of the join of all
> > >tables in the database, represents a function from desert island to can
> > >opener."
> >
> > Well no, like this certainly no :) Maybe something along these lines
> > (paraphrasing the given-when-then constructs existing in some testing
> > tools):
> >
> > given two lecturers L1 and L2 and a time T
> > when L1 is teaching at time T and L2 is teaching at time T
> > then L1 and L2 are in different rooms
>
> Well, there you have it. If you think that this sentence indicates an FD
> {L,T} -> {room}

Of course not.

> The correct conclusion here is the FD {room,T} -> {L}.

Yes.

> But before spotting
> that, I first had to twist the requirement into "only one lecturer can be in
> any given room at any time". Or "if two observations of a lecturer being in
> a room at time T, both involve the same room and the same time T, then they
> involve the same lecturer". But (knowing what the FD turned out to be and
> looking back on the natural language formulation) I find those alternatives
> only slightly less convoluted than your original.

I said not to take this example too seriously ;) The point of a
hypothetical specification in natural language is that you should not
bother with the mathematical formulation of the corresponding FD,
because the system would take care of it. My example may fail miserably
in being higher level compared to the corresponding FD, but that does
not mean much. If you had a program that takes as input the sentence "no
two teachers can teach in the same room at the same time" and outputs
the schema Teaches(room, time, teacher) with key {room, time} as
output, would you be satisfied? :)

Nicola

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

Erwin

unread,

Feb 12, 2015, 5:11:09 PM2/12/15

to

Op donderdag 12 februari 2015 22:43:24 UTC+1 schreef Nicola:
> In article <>,

> --- news://freenews.netfront.net/ - complaints: ---

I probably would.

For what it's worth : "no two teachers can ..." immediately makes me think of a certain RENAME/JOIN combination that then in turn immediately makes me conclude "aha, a key".

But my mental process in that regard is probably heavily influenced by the work I've done on constraint enforcement. Give me a database [design] and I immediately start looking for all possible [data] faults (and their defining relational expressions). But somehow for keys, that's not the process I follow. I just wonder "what makes this thing unique". It's probably just too deeply instilled, but it works without FDs.

com...@hotmail.com

unread,

Feb 20, 2015, 1:16:15 AM2/20/15

to

On Thursday, February 12, 2015 at 8:02:19 AM UTC-8, Nicola wrote:

> So, the question is: what is the role of FDs (and possibly other types of dependencies) in database design?

We determine predicates that describe application situations. Basic predicates correspond to base tables. A FD holds when a certain constraint predicate built from a table predicate holds. (A JD corresponds to being able to write a table predicate as a conjunction of others. A FD corresponds to a certain JD.)

> In general, I don't think that it is possible for them to inform the design activity since the early stages or that they should be used as the main design tool.

As soon as you have a predicate you can determine FDs.

> > Erwin wrote:
> >There you have it. When people tell you "there can only be one can
> >opener on any given desert island", then what do you see ?
> >
> >A table {desert island, can opener} with (a.o.) a key "desert island",
> >or
> >A relation schema of whatever set of attributes in which you must add
> >the FD desert island -> can opener ? Where this then tells you "the
> >identity of a desert island is a determinant factor to the identity of
> >the can opener that is on it" or "the relation over {desert island, can
> >opener} is a function from desert island to can opener" ?

You don't need keys or entities, you just need predicates and determinants.

"If desert island I has opener Oa and desert island I has opener Ob then Oa = Ob."

Ergo I -> O in "desert island I has opener O".

> At an early stage, you probably cannot say: it might also be that desert island and can opener should be represented as two relation schemas (for which we have identified no attributes yet) related through a one-one or one-many relationship. My point here is simply that FDs are not invented, they are discovered.

First *say something* then you can find the FDs. There is no "one-one or one-many relationship" until then. First comes *a particular* relationship ie predicate then come parameters/attributes and then arities and other constraints.

> >> FDs are a special type of first-order formulas. I can imagine defining an English-like syntax for them.
>
> (Not that it would necessarily be a good idea, I should add.)
>
> >Hmmmmmmmm.

FD F1,... -> T holds in predicate R(F1,...,T,...) when (for all situations):

FORALL F1,...,Ta,...,Tb,...
R(F1,...,Ta,...) AND R(F1,...,Tb,...)
=> Ta = Tb

Notice that the determinant variables are unchanged between atoms, that there are two determined variables for one value, and that the other variables are don't-cares.

We can describe this pattern. Eg an FD holds "when each satisfying determinant subtuple has just one determined value". But all we have to do is *use* it with each predicate.

> given two lecturers L1 and L2 and a time T
> when L1 is teaching at time T and L2 is teaching at time T
> then L1 and L2 are in different rooms

FORALL L1,L2,T,R1,R2
teaches(L1,T,R1) AND teaches(L2,T,R2) => R1 <> R2

> (this is absolutely off the top of my head: don't take it as a serious attempt).

No FD.

For "then it was the same room", T -> R. For "then it was the same lecturer", T -> L.

> >And besides, avoiding redundancy was only a relevant topic in database
> >design as long as there were no feasible ways to control the
> >redundancies, e.g. via ASSERTIONs. Those days are gone.

But predicates, queries and updates are simpler in 5NF, ie when the only JDs are implied by the CKs.

philip

com...@hotmail.com

unread,

Feb 20, 2015, 2:42:41 AM2/20/15

to

On Thursday, February 12, 2015 at 1:43:24 PM UTC-8, Nicola wrote:

> In article <b99f007c-a8ab-4a23...@googlegroups.com>, Erwin wrote:
>
> > My question, and what I would like to understand is : what exactly is the
> > available input (its semantics and in which possible syntactic forms does it
> > appear, and where) that allows this "discovery" to be made ? I think JKL's
> > post where he used the word "pesky" expressed the very same concern.
>
> Formally, FDs are axioms. As such, *within* the formal system they are
> given facts, but the discovery must happen *outside* the formal system.
> Hence, the available input cannot be described formally (or, if it can,
> it will be in a different formal system that will have its own axioms).

Each base table holds rows that make a corresponding designer-chosen predicate true of the current application situation. A user evaluates the predicate for every row that could fit in a table and puts the ones making true propositions in. They look at the tables and plug the present and absent rows into its predicate to learn how the world is. They query by mapping a predicate about the application situation to a hybrid relation/predicate expression that the DBMS evaluates to the rows making the predicate true.

A designer shows that particular predicates built from these base predicates are true in every application situation that can arise. These are the constraints. A constraint is simultaneously a truth about each database state and its corresponding application situation. So the designer describes the application but this simultaneously describes table values.

> > > >> FDs are a special type of first-order formulas.

From: com...
> F1,... -> T
> FORALL F1,...,Ta,...,Tb,...
> R(F1,...,Ta,...) AND R(F1,...,Tb,...)
> => Ta = Tb

> > > given two lecturers L1 and L2 and a time T
> > > when L1 is teaching at time T and L2 is teaching at time T
> > > then L1 and L2 are in different rooms

From: com...
> FORALL L1,L2,T,R1,R2
> teaches(L1,T,R1) AND teaches(L2,T,R2) => L1 <> L2

> > Well, there you have it. If you think that this sentence indicates an FD
> > {L,T} -> {room}
>
> Of course not.
>
> > The correct conclusion here is the FD {room,T} -> {L}.
>
> Yes.

No. That FD is:

FORALL L1,L2,T,R
teaches(L1,T,R) AND teaches(L2,T,R) => L1 = L2

> > But before spotting
> > that, I first had to twist the requirement into "only one lecturer can be in
> > any given room at any time". Or "if two observations of a lecturer being in
> > a room at time T, both involve the same room and the same time T, then they
> > involve the same lecturer". But (knowing what the FD turned out to be and
> > looking back on the natural language formulation) I find those alternatives
> > only slightly less convoluted than your original.

It's not the same as the original constraint. Which isn't an FD. (Not surprising.)

But the natural language for an FD is not hard:

// {room,T} -> {L}
Given lecturers L1 and L2 and time T and room R,
if L1 teaches at T in R and L2 teaches at T in R
then L1 = L2

philip

Nicola

unread,

Feb 20, 2015, 4:08:58 AM2/20/15

to

In article <80f34c94-e62e-4d02...@googlegroups.com>,

com...@hotmail.com wrote:

> > given two lecturers L1 and L2 and a time T
> > when L1 is teaching at time T and L2 is teaching at time T
> > then L1 and L2 are in different rooms
>
> FORALL L1,L2,T,R1,R2
> teaches(L1,T,R1) AND teaches(L2,T,R2) => R1 <> R2

I think that it should be:

FORALL L1,L2,T,R1,R2
L1 <> L2 AND teaches(L1,T,R1) AND teaches(L2,T,R2) => R1 <> R2

> > (this is absolutely off the top of my head: don't take it as a serious
> > attempt).
>
> No FD.

Correct. The FD is implied, though:

Let X be L1 <> L2
Let Y be teaches(L1,T,R1) AND teaches(L2,T,R2)
Let Z be R1 <> R2

Then the sentence above is the universal closure of

X AND Y -> Z,

If my coffee has worked as it should, the latter is equivalent to

(NOT Z) AND Y -> NOT X,

that is,

R1 = R2 AND teaches(L1,T,R1) AND teaches(L2,T,R2) -> L1 = L2,

or, simplifying,

teaches(L1,T,R) AND teaches(L2,T,R) -> L1 = L2.

The idea behind my example is that a hypothetical formalism based on
natural language might allow the user to write specifications that are
not a direct translation of the logical definition of a FD, but from
which the system might *infer* a FD, giving the user more freedom in the
way constraints may be expressed. But again, I haven't really thought
about it too much.

Erwin

unread,

Feb 20, 2015, 4:22:53 AM2/20/15

to

Op vrijdag 20 februari 2015 10:08:58 UTC+1 schreef Nicola:
> In article <>,

> --- news://freenews.netfront.net/ - complaints: ---

Well, even if you can manage to get that formalism fully specced (a tall order imo), the main problem will still remain that if you apply that formalism to any given input, the output produced will only be as reliable as the input it is applied to.

Which is by the way illustrated perfectly by the example used : I carelessly overlooked the possibility of L1=L2, thus neglected to explicitly specify L1<>L2 as a premisse (tacitly assuming it would be tacitly understood), and the rest is history ...

Nicola

unread,

Feb 23, 2015, 11:34:08 AM2/23/15

to

In article <64b9cd98-8a5e-4f95...@googlegroups.com>,

Erwin <e.s...@myonline.be> wrote:

> Well, even if you can manage to get that formalism fully specced (a tall
> order imo), the main problem will still remain that if you apply that
> formalism to any given input, the output produced will only be as reliable as
> the input it is applied to.

That is the unsolvable problem of every specification: you may be
guaranteed that the output is correct wrt the specs, but how do you
ensure that the specs are correct? In principle, they should be easier
to write and understand than the system you want to verify, but it
doesn't mean that they do not require a good deal of thought and
technical competence. Still, formal specifications are arguably useful:
safety-critical software would not be as reliable, for example.

Nicola

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---