
Sixth normal form


Sameeksha

Jul 30, 2007, 6:45:00 AM
to sameeks...@persistent.co.in
Googling for a definition and explanation of sixth normal form only
resulted in the following information - "6th normal form states that a
relation R should not contain any non-trivial join dependencies". Also,
everywhere it is stated that this normal form takes the temporal (time)
dimension of the relational model into account, and that current
implementations like SQL Server 2005 do not implement this normal
form.

Any more explanation and preferably an example would help in
understanding the concept behind this normal form.

Bob Badour

Jul 30, 2007, 7:27:17 AM

The join dependency definition is the definition for 5th normal form.
6th normal form was introduced in Date, Darwen and Lorentzos's
/Temporal Data and the Relational Model/.

Basically, 6th normal form puts each non-key attribute in a separate
relation.

Jan Hidders

Jul 30, 2007, 12:30:30 PM
On 30 jul, 12:45, Sameeksha <sameeksha.ch...@gmail.com> wrote:
> Googling out for definition and explanation for sixth normal form only
> resulted in the following information - "6th normal form states that a
> relation R should not contain any non-trivial join dependencies". Also
> everywhere it is stated that this normal form takes into account the
> temporal (time) dimension to the relational model, and that current
> implementations like SQL server 2005 do not implement this normal
> form.

It would help if you first explained what you already know, so we
don't spend time on explaining what you already know. Do you know what
a join dependency is? Do you know when it is trivial?

Btw. where and in what context did you read that SQL Server did not
support this normal form? That is a rather odd statement since the
normal form is just about how much to split your relations into
projections, so strictly speaking it needs no support at all from the
DBMS. But perhaps support for temporal features was meant?

> Any more explanation and preferably an example would help in
> understanding the concept behind this normal form.

Informally put, it says that every distinct fact gets its own relation
or "if you can split, then you should". So if you have a relation
Student(student_id, name, address) then the fact that the student with
a certain id has a certain name is split from the fact that this
student lives at a certain address. This is different from 5NF since
there you only split when there is a risk of redundancy or update
anomalies.
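
To make that concrete, here is a minimal Python sketch (the sample data
and the names student_name/student_address are just illustrative) of
the 6NF split and of how the natural join on student_id recovers the
original relation:

# Student(student_id, name, address) as a single relation
student = {
    (1, 'Alice', '12 Oak St'),     # sample data, purely illustrative
    (2, 'Bob',   '9 Elm Ave'),
}

# 6NF: one relation per distinct fact (key plus one non-key attribute)
student_name    = {(sid, name) for (sid, name, addr) in student}
student_address = {(sid, addr) for (sid, name, addr) in student}

# Joining the two projections on student_id gives back the original relation
rejoined = {(sid, name, addr)
            for (sid, name) in student_name
            for (sid2, addr) in student_address
            if sid == sid2}
assert rejoined == student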

-- Jan Hidders

Sameeksha

Jul 31, 2007, 12:33:51 AM

Thanks for replying. Some explanations regarding your questions :-
1. Just to explain my concept of join dependency - consider this
example - a table contains TeacherId, SkillId and CourseId as fields.
These are related by the rule that a teacher with certain skills can
teach certain courses; however, a teacher may possess the skills
required for a course but may not be teaching that course. Here there
are 3 join dependencies - (TeacherId, SkillId), (SkillId, CourseId) and
(TeacherId, CourseId) - which should be separate tables as per the 5th
normal form. Please verify whether this concept is nearly correct.

2. My concept of a trivial dependency is this - in case a table has
id and name and both are unique, the dependency of name on id is
trivial.

3. Regarding RDBMS support - what you say is correct. The statement is
there at http://en.wikipedia.org/wiki/Database_normalization, and it
is about temporal dimension, not about the sixth normal form.

Now another question after getting more idea about the sixth normal
form - If we consider the above example of a table teacherid, skillid
and courseid, how will we split it to fit in sixth normal form? If
this is not a suitable example for applying the sixth normal form,
please give another example which will make the concept clearer.

Thanks again for your help

Sameeksha

Jan Hidders

Jul 31, 2007, 4:41:21 AM

Your terminology is not entirely correct, although your intuition is
in the right direction. In the case you describe there is one join
dependency, namely {{TeacherId, SkillId}, {SkillId, CourseId},
{TeacherId, CourseId}}. Note that a join dependency is described by a
set of sets of attributes, {S1, ..., Sn}, and holds for a relation R if
R always equals R[S1] JOIN ... JOIN R[Sn], where JOIN is the natural
join and R[Si] denotes the projection of R on Si.
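
To make the definition concrete, here is a small Python sketch
(relations are modelled as sets of (attribute, value) tuples; the
helper names and sample data are just illustrative) that checks a JD by
literally comparing R with the join of its projections:

from itertools import product

def project(rel, attrs):
    # R[S]: the projection of R on the attribute set S
    out = set()
    for t in rel:
        d = dict(t)
        out.add(tuple(sorted((a, d[a]) for a in attrs)))
    return out

def natural_join(r1, r2):
    # natural join of two relations given as sets of (attribute, value) tuples
    out = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(tuple(sorted({**d1, **d2}.items())))
    return out

def satisfies_jd(rel, components):
    # True iff R equals R[S1] JOIN ... JOIN R[Sn]
    joined = project(rel, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rel, comp))
    return joined == {tuple(sorted(t)) for t in rel}

r = {(('TeacherId', 't1'), ('SkillId', 's1'), ('CourseId', 'c1')),
     (('TeacherId', 't1'), ('SkillId', 's2'), ('CourseId', 'c1'))}
print(satisfies_jd(r, [{'TeacherId', 'SkillId'},
                       {'SkillId', 'CourseId'},
                       {'TeacherId', 'CourseId'}]))   # True for this value of r

Of course, the JD as a constraint has to hold for every legal value of
R, not just the one particular value tested here.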

> 2. My concept of Trivial dependency is this - in case in a table we
> have id and name and both are unique the dependency of name on id is
> trivial.

Hmm, not really right. A JD (join dependency) {S1, ..., Sn} is trivial
if one of the Si contains all the attributes of R.

> Now another question after getting more idea about the sixth normal
> form - If we consider the above example of a table teacherid, skillid
> and courseid, how will we split it to fit in sixth normal form?

It already is in 6NF since the only JDs left are all trivial.

> If
> this is not a suitable example for applying the sixth normal form,
> please give another example which will make the concept clearer.

Actually you need a simpler example. Because I lack time I'll make it
abstract. Assume that the following JD holds: {{a,b}, {b,c}} and that
{b} is a candidate key. In that case the JD follows from the CK, so
you are in 5NF, but this is a nontrivial JD, so according to 6NF you
should still split.

-- Jan Hidders

Bob Badour

Jul 31, 2007, 8:24:00 AM

To offer a simple illustration, suppose one of the relations above were:
{TeacherID, TeacherName, CourseID} where the only candidate key in the
relation is TeacherID.

It would be in 5th normal form but not 6th normal form.

Sameeksha

Aug 1, 2007, 5:50:43 AM
> It would be in 5th normal form but not 6th normal form.

Thanks for the explanations.
The concepts of sixth normal form (along with join dependency and
trivial join dependency) are clearer now.

However, a next question (first of all sorry for taking up your time):
Will a table {a,b,c} with join dependencies {(a,b), (b,c)} be in
fourth / fifth normal form?

The fourth normal form states that there should not be more than one
independent multivalued dependency; i.e. the table should not
contain more than one independent many-to-many relationship, e.g.
{teachername, teacherid, courseid}.
The fifth normal form states that there should not be more than one
semantically related multi-valued dependency, e.g. {skillid,
teacherid, courseid}.

So in case (a,b) and (b,c) are independent from each other they will
be split out into tables {a,b} and {b,c} when one is evaluating
whether the table is in 4NF. In case these two are semantically
related they will still be split when one is evaluating whether the
table is in 5NF.

Jan Hidders

Aug 1, 2007, 7:36:29 AM
On 1 aug, 11:50, Sameeksha <sameeksha.ch...@gmail.com> wrote:
>
>
> However, a next question (first of all sorry for taking up your time):
> Will a table {a,b,c} with join dependencies {(a,b), (b,c)} be in
> fourth / fifth normal form?

That depends on the candidate keys. If {b} is a candidate key then it
is in 5NF if this is the only non-trivial JD that holds. If {b} is not
a candidate key then it is not in 5NF.

> The fourth normal form states there should not be more than 1
> independent multivalued dependencies; i.e. the table should not
> contain more than 1 independent many-to-many relationships. e.g.
> {teachername, teacherid, courseid}

That's a rather complicated and confusing way of formulating it and,
depending on how you define independence of multivalued dependencies,
probably wrong. I suggest we stick to the more standard definition:

A relation is said to be in 4NF if for every non-trivial MVD A->>B it
holds that A is a superset of a candidate key.

> The fifth normal form states that there should not be more than 1
> semantically related multi-valued dependencies. e.g. {skillid,
> teacherid, courseid}

How do you define "semantically related"? Also here it is really much
simpler if you just stick to the standard definition:

A relation is said to be in 5NF if all join dependencies are implied
by the candidate keys.

Any attempt to reformulate it to something easier or more intuitive in
my experience almost always ends up with something that is either
wrong or actually harder to understand.

The only somewhat mysterious part may be the "JD is implied by the
CKs" but this can be tested by the following simple procedure:

1. Let jd be the join dependency we want to test
2. While jd has two elements (being sets of attributes) Si and Sj such
that the intersection of Si and Sj contains a candidate key do:
2.1 replace Si and Sj with the union of Si and Sj
3. If jd contains the header of the relation (which is also a set of
attributes) then return "yes" else "no"

For some reason most textbooks don't give this algorithm.
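
A direct Python transcription of the procedure (the function and
variable names are just illustrative):

def jd_implied_by_cks(jd, candidate_keys, header):
    # jd: list of attribute sets (the components of the join dependency)
    # candidate_keys: list of attribute sets; header: set of all attributes
    comps = [set(c) for c in jd]
    merged = True
    while merged:                              # step 2: merge while possible
        merged = False
        for i in range(len(comps)):
            for j in range(i + 1, len(comps)):
                if any(ck <= comps[i] & comps[j] for ck in candidate_keys):
                    comps[i] |= comps[j]       # step 2.1: union of Si and Sj
                    del comps[j]
                    merged = True
                    break
            if merged:
                break
    return any(header <= c for c in comps)     # step 3

# The abstract example from earlier in the thread: JD {{a,b},{b,c}}, CK {b}
print(jd_implied_by_cks([{'a', 'b'}, {'b', 'c'}], [{'b'}], {'a', 'b', 'c'}))  # True

Calling it with [{'a'}] as the only candidate key instead returns
False, which matches the earlier remark that if {b} is not a candidate
key then the relation is not in 5NF.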

> So in case (a,b) and (b,c) are independent from each other they will
> be split out into tables {a,b} and {b,c} when one is evaluating
> whether the table is in 4NF. In case these two are semantically
> related they will still be split when one is evaluating whether the
> table is in 5NF.

Unless you tell me how you define "independent" and "semantically
related" the above might be complete nonsense. Again, your intuition
doesn't seem to be far off, but we need to be exact here.

-- Jan Hidders


Brian Selzer

Aug 1, 2007, 9:51:16 AM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1185813030.4...@k79g2000hse.googlegroups.com...

I think it is important to emphasize the fact that vertically splitting a
5NF relation into a set of 6NF relations has consequences, specifically the
need to enforce mutual foreign keys or a circular inclusion dependency,
depending upon the number of resulting 6NF relations. This is important
because support in commercial RDBMSs for enforcing such constraints is
severely limited, if not nonexistent, since it requires an implementation of
multiple assignment. Therefore splitting should only be done when
absolutely necessary, for example, to support a temporal dimension.

It should be easy to see that unless the referential constraints are
present, the 6NF relations are not equivalent to a 5NF relation. It would
be possible for a student to have an address without a name, for example,
something that is not possible in the 5NF relation.
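
A small Python sketch of the mutual inclusion dependencies such a split
needs, reusing the Student example from earlier in the thread (the
helper name and the sample data are illustrative only):

def inclusion_holds(r_from, r_to, key):
    # IND r_from[key] IN r_to[key]: every key value of r_from must appear in r_to
    return {t[key] for t in r_from} <= {t[key] for t in r_to}

student_name    = [{'student_id': 1, 'name': 'Alice'}]
student_address = [{'student_id': 1, 'address': '12 Oak St'},
                   {'student_id': 2, 'address': '9 Elm Ave'}]   # id 2: address, no name

# The circular (mutual) inclusion dependency between the two 6NF relations
print(inclusion_holds(student_name, student_address, 'student_id'))   # True
print(inclusion_holds(student_address, student_name, 'student_id'))   # False: violated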

It's somewhat unrelated, but important, nonetheless, to point out that if a
relation has a temporal dimension, then either every key that it references
in a foreign key constraint must be immune to updates, or the referenced
relation must also have the same temporal dimension. Otherwise, history
would have to be rewritten every time an update targets that referenced key.

> -- Jan Hidders
>


Neo

Aug 1, 2007, 1:39:49 PM
> example - a table contains TeacherId, SkillId and CourseId as fields.
> These are related by the rule that teacher with certain skills can
> teach certain courses, however a teacher may possess skills required
> for a course, but he may not be teaching that course. Here there are 3
> join dependencies - (TeacherId, SkillId), (SkillId, CourseId) and
> (TeacherId, CourseId) which should be separate tables as per the 5th
> normal form.

Below script models above using dbd.

(; Create skills)
(new 'child_psychology 'skill)
(new 'english 'skill)
(new 'algebra 'skill)
(new 'logic 'skill)
(new 'judo 'skill)
(new 'sing 'skill)

(; Create courses)
(new 'history_101 'course)
(new 'physics_101 'course)
(new 'chemistry_101 'course)

(; Set course skill requirements)
(set history_101 skill english)

(set physics_101 skill english)
(set physics_101 skill logic)

(set chemistry_101 skill english)
(set chemistry_101 skill algebra)


(; Create teachers)
(new 'john 'teacher)
(new 'mary 'teacher)
(new 'sue 'teacher)

(; Create verb teach)
(new 'teach 'verb)

(; Set john's skills)
(set john skill child_psychology)
(set john skill english)
(set john skill algebra)
(set john skill logic)
(set john skill judo)

(; Set courses that john teaches)
(set john teach history_101)
(set john teach physics_101)


(; Set mary's skills)
(set mary skill child_psychology)
(set mary skill english)
(set mary skill algebra)
(set mary skill sing)

(; Set courses that mary teaches)
(set mary teach history_101)
(set mary teach chemistry_101)


(; Set sue's skills)
(set sue skill child_psychology)
(set sue skill sing)


(; Get all teachers who have skills for chemistry_101)
(; Gets john and mary)
(and (get teacher instance *)
(getAll * skill (get chemistry_101 skill *)))

(; Get courses taught by teachers that can sing)
(; Gets history_101 and chemistry_101)
(and (get course instance *)
(get (get * skill sing) teach * ))

paul c

Aug 1, 2007, 4:19:41 PM
Brian Selzer wrote:
...
> It's somewhat unrelated, ...

I was with you up 'til the first comma in the last paragraph. It sure
doesn't have to do with redundancy. First there were rigid keys, then
contingent keys, now we have immune keys. Newcomers please note - don't
believe this mumbo-jumbo, it is all undefined mysticism, as is the term
"key update".

Also, I would like to know if anybody has ever invented a relational
algebra or calculus that supports "update". AFAIK, update is either a
system colloquial, environmental or a language, say SQL, term. Yeah, I
know people say "update" a tuple et cetera and I think that's okay if it
is agreed and understood just what exactly the system or dbms they have
in mind is going to do about the "update". In the recent threads, I
don't and I'm pretty sure we don't either.

p

Brian Selzer

Aug 1, 2007, 9:50:19 PM

"paul c" <toledob...@oohay.ac> wrote in message
news:xV5si.22613$rX4.14717@pd7urf2no...

> Brian Selzer wrote:
> ...
>> It's somewhat unrelated, ...
>
> I was with you up 'til the first comma in the last paragraph. It sure
> doesn't have to do with redundancy. First there were rigid keys, then
> contingent keys, now we have immune keys. Newcomers please note - don't
> believe this mumbo-jumbo, it is all undefined mysticism, as is the term
> "key update".
>

Are you saying that Codd's use of "key update" in his definitive 1970 paper
was also mumbo-jumbo?

Isn't it strange that people who can't come up with a solid argument resort
to misrepresentation, nitpicking over terminology, and outright derision?
Often they end up with their foot in their mouth.

> Also, I would like to know if anybody has ever invented a relational
> algebra or calculus that supports "update". AFAIK, update is either a
> system colloquial, environmental or a language, say SQL, term. Yeah, I
> know people say "update" a tuple et cetera and I think that's okay if it
> is agreed and understood just what exactly the system or dbms they have in
> mind is going to do about the "update". In the recent threads, I don't
> and I'm pretty sure we don't either.
>

It should be obvious that relational algebra and calculus are meant for
queries, not updates. Unlike the operators of relational algebra, neither
insert, update, nor delete returns a result. Nor does relational
assignment. More importantly, the algebra and the calculus involve a single
database value (set of relation values), a snapshot of reality if you
prefer. An update operator (as D&D call them) involves two snapshots of
reality, and truth in one does not imply truth in the other.

Before you make a complete fool of yourself, I suggest you read Codd's 1990
book. You should be able to find it online now. I did. I can't remember
where, though.

> p


paul c

Aug 1, 2007, 10:59:05 PM
Brian Selzer wrote:
> "paul c" <toledob...@oohay.ac> wrote in message
> news:xV5si.22613$rX4.14717@pd7urf2no...
>
>>Brian Selzer wrote:
>>...
>>
>>>It's somewhat unrelated, ...
>>
>>I was with you up 'til the first comma in the last paragraph. It sure
>>doesn't have to do with redundancy. First there were rigid keys, then
>>contingent keys, now we have immune keys. Newcomers please note - don't
>>believe this mumbo-jumbo, it is all undefined mysticism, as is the term
>>"key update".
>>
>
>
> Are you saying that Codd's use of "key update" in his definitive1970 paper
> was also mumbo-jumbo?
> ...

Yes.


> Isn't it strange that people who can't come up with a solid argument resort
> to misrepresentation, nitpicking over terminology, and outright derision.
> Often they end up with their foot in their mouth.

> ...

No, it isn't strange to me, I see it all the time.

>
>>Also, I would like to know if anybody has ever invented a relational
>>algebra or calculus that supports "update". AFAIK, update is either a
>>system colloquial, environmental or a language, say SQL, term. Yeah, I
>>know people say "update" a tuple et cetera and I think that's okay if it
>>is agreed and understood just what exactly the system or dbms they have in
>>mind is going to do about the "update". In the recent threads, I don't
>>and I'm pretty sure we don't either.
>>
>
>
> It should be obvious that relational algebra and calculus are meant for
> queries, not updates. Unlike the operators of relational algebra, neither
> insert, update, nor delete returns a result. Nor does relational
> assignment. More importantly, the algebra and the calculus involve a single
> database value (set of relation values), a snapshot of reality if you
> prefer. An update operator (as D&D call them) involves two snapshots of
> reality, and truth in one does not imply truth in the other.
>
> Before you make a complete fool of yourself, I suggest you read Codd's 1990
> book. You should be able to find it online now. I did. I can't remember
> where, though.

> ...

Let me know if you remember. A salesman stole mine.

p

Bruce C. Baker

Aug 1, 2007, 11:01:33 PM

"paul c" <toledob...@oohay.ac> wrote in message
news:ZLbsi.23504$fJ5.6912@pd7urf1no...

You can find the .pdf at the ACM Digital Library:

http://portal.acm.org/citation.cfm?id=77708

But it'd probably be cheaper in the long run to buy a copy from Amazon

http://www.amazon.com/gp/offer-listing/0201141922/ref=dp_olp_2/104-9019283-7102314

or Abebooks

http://www.abebooks.com/servlet/SearchResults?tn=relational&sts=t&an=codd&y=0&x=0


paul c

Aug 1, 2007, 11:17:13 PM
Bruce C. Baker wrote:
...
>>Let me know if you remember. A salesman stole mine.
>>
>>p
>
>
> You can find the .pdf at the ACM Digital Library;
>
> http://portal.acm.org/citation.cfm?id=77708
...

Bruce, thanks so much. There've been many times when I wanted to look
at it again. It's downloading as I write, slowly on my 'lite' connection.

p

Bruce C. Baker

Aug 2, 2007, 1:27:10 AM

"paul c" <toledob...@oohay.ac> wrote in message
news:Z0csi.23534$fJ5.8923@pd7urf1no...

"I endeavor to give satisfaction, sir!" ;-)


David Cressey

Aug 2, 2007, 4:21:03 AM

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:vLasi.384$3x...@newssvr25.news.prodigy.net...

>
> "paul c" <toledob...@oohay.ac> wrote in message
> news:xV5si.22613$rX4.14717@pd7urf2no...
> > Brian Selzer wrote:
> > ...

> Isn't it strange that people who can't come up with a solid argument resort
> to misrepresentation, nitpicking over terminology, and outright derision.
> Often they end up with their foot in their mouth.

[snip]

> Before you make a complete fool of yourself, I suggest you read Codd's 1990
> book. You should be able to find it online now. I did. I can't remember
> where, though.

It seems to me that "before you make a complete fool of yourself" is
outright derision.

Jan Hidders

Aug 2, 2007, 6:18:21 AM
On 1 aug, 15:51, "Brian Selzer" <br...@selzer-software.com> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message

Agreed. I would add that this is not specific to going from 5NF to
6NF but applies anywhere you decompose to go from a lower to a higher
normal form.

> It's somewhat unrelated, but important, nonetheless, to point out that if a
> relation has a temporal dimension, then either every key that it references
> in a foreign key constraint must be immune to updates, or the referenced
> relation must also have the same temporal dimension. Otherwise, history
> would have to be rewritten every time an update targets that referenced key.

Why would the DBMS not allow history to be rewritten? Of course it
should allow you to specify that in certain cases it cannot, but in
general I don't think that is a good idea, if only because you want to
be able to correct mistakes afterwards.

-- Jan Hidders

Brian Selzer

Aug 2, 2007, 7:07:20 AM

"David Cressey" <cres...@verizon.net> wrote in message
news:Ptgsi.14369$Q85.188@trndny02...

Just friendly advice.


Paul Mansour

Aug 2, 2007, 4:42:43 PM
On Aug 2, 6:18 am, Jan Hidders <hidd...@gmail.com> wrote:

>Why would the DBMS not allow history to be rewritten? Of course it
>should allow you to specify that in certain cases it cannot, but in
>general I don't think that is a good idea, if only because you want to
>be able to correct mistakes afterwards.

"Allowing history to be rewritten" and "correcting mistakes
afterwards" are, I think, two very different concepts, that are
commonly confused because most DBMSs use destructive updates.

You can have a DBMS that never lets you "rewrite history", that is
change the set of events that lead up the present state of a DB, but
that certainly lets you make corrections to past mistakes.

One good reason to not allow destructive updates is that they are a
large component of the complexity of the implementation of an RDBMS.
Another is fraud.

Brian Selzer

Aug 3, 2007, 11:55:53 AM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1186049901.2...@q75g2000hsh.googlegroups.com...

Not always. If two sets of attributes are independent, as is the case when
moving from 1NF to 2NF, then there is no need for a referential constraint;
if two projections are independent, as is the case when moving from BCNF to
4NF, then there is no need for a referential constraint. But I see what
you're saying. Decomposing a 2NF schema into a BCNF schema is not always
dependency preserving. There is also the rare cyclical referential
constraint that may be required when moving from 4NF to 5NF.

It is my understanding that the normalization process is used to choose a
database schema whose instances can contain /at least the same/ information
content as the former but avoids the update anomalies that exist in the less
normalized schema. The presence of a functional dependency in a schema
indicates that each tuple represents an atomic formula that contains an
implication. The presence of "mutual" functional dependencies, such as is
the case when there is more than one key, indicates that each tuple
represents an atomic formula that contains a biconditional. It is easy to
see that the absence of the second functional dependency indicates that a
projection that does not include the dependent is independent; whereas the
existence of a tuple in the projection that includes the dependent implies
the existence of a tuple in the other. Thus decomposing a 2NF relation
schema that has a transitive FD into two schemata requires only one
referential constraint in order for an instance to have /at least the same/
information content. In order to ensure that instances of the new schema
can contain /exactly the same/ information content /and no more/, a pair of
referential constraints is required ("mutual" foreign keys, if you will).

The difference between moving from 1NF to 5NF and moving from 5NF to 6NF is
that breaking up a set of two or more dependent attributes in a 5NF relation
/always/ requires a cyclical constraint in order for an instance of the
database schema to be able to contain /at least the same/ information
content. If a relation is already in 5NF, then there is no way to separate
the set of dependent attributes without losing information. Suppose that A,
B and C are dependent attributes in a 5NF relation schema. Then FD ABC -->
ABC is implied by the candidate key, as well as the FDs ABC --> A, ABC -->
B, and ABC --> C. Splitting the relation vertically so that each relation
schema is in 6NF causes those FDs to be lost. The only way for an instance
of the database schema to be able to contain /at least the same/ information
content as an instance of the 5NF schema is to introduce a cyclical
referential constraint.

>> It's somewhat unrelated, but important, nonetheless, to point out that if
>> a
>> relation has a temporal dimension, then either every key that it
>> references
>> in a foreign key constraint must be immune to updates, or the referenced
>> relation must also have the same temporal dimension. Otherwise, history
>> would have to be rewritten every time an update targets that referenced
>> key.
>
> Why would the DBMS not allow history to be rewritten? Of course it
> should allow you to specify that in certain cases it cannot, but in
> general I don't think that is a good idea, if only because you want to
> be able to correct mistakes afterwards.
>

I never meant that it shouldn't allow it; I meant that it shouldn't
necessitate it.

> -- Jan Hidders
>


Brian Selzer

Aug 3, 2007, 3:40:37 PM

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:deIsi.54745$5j1...@newssvr21.news.prodigy.net...

A little clarification. Suppose that K is the set of attributes that is the
key of the above 5NF relation schema. Then every valid relation value must
satisfy the FD K --> ABC. As long as A, B and C are in the same relation,
the closure of the set of FDs is

K --> A, K --> B, K --> C,
K --> AB, K --> AC, K --> BC, and
K --> ABC

Once A, B and C are split into 6NF relations, however, the closures of the
sets of FDs are

for {K, A} K --> A,
for {K, B} K --> B, and
for {K, C} K --> C.

Clearly, the following FDs are not part of the database schema:

K --> AB, K --> AC, K --> BC, and K --> ABC.

If a database schema is already in BCNF, then the loss of these FDs clearly
signals information loss, since by definition, every FD must be a logical
consequence of the candidate keys. That's why it's /always/ necessary to
introduce a cyclical set of referential constraints when decomposing a 5NF
relation schema into a set of 6NF schemata.

Jan Hidders

Aug 6, 2007, 5:16:53 AM

I have no idea what you mean here. In all those cases you need a
cyclic pair of inclusion dependencies if you want the new schema to be
equivalent to the old. And if you only want the new schema to
contain at least as much information as the old then you strictly
speaking don't need any inclusion dependency at all.

> The difference between moving from 1NF to 5NF and moving from 5NF to 6NF is
> that breaking up a set of two or more dependent attributes in a 5NF relation
> /always/ requires a cyclical constraint in order for an instance of the
> database schema to be able to contain /at least the same/ information
> content.

Also here you don't need any inclusion dependency to get a schema that
can contain at least as much information as the old.

> If a relation is already in 5NF, then there is no way to separate
> the set of dependent attributes without losing information. Suppose that A,
> B and C are dependent attributes in a 5NF relation schema. Then FD ABC -->
> ABC is implied by the candidate key, as well as the FDs ABC --> A, ABC -->
> B, and ABC --> C. Splitting the relation vertically so that each relation
> schema is in 6NF causes those FDs to be lost.

The dependencies that are left allow you to derive ABC --> ABC for the
result of joining them back together, which is what the old set of FDs
also told you. So no information is lost.

-- Jan Hidders

Brian Selzer

Aug 7, 2007, 12:38:35 AM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1186391813....@w3g2000hsg.googlegroups.com...

[big snip]

>>
>> > Agreed. I would add that this is not specific for going from 5NF to
>> > 6NF but anywhere you decompose to go from a lower to a higher normal
>> > form.
>>
>> Not always. If two sets of attributes are independent, as is the case
>> when
>> moving from 1NF to 2NF, then there is no need for a referential
>> constraint;
>> if two projections are independent, as is the case when moving from BCNF
>> to
>> 4NF, then there is no need for a referential constraint.
>
> I have no idea what you mean here. In all those cases you need a
> cyclic pair of nclusion dependencies if you want the new schema to be
> equivalent with the old. And if you only want the new schema to
> contain at least as much information as the old then you strictly
> speaking don't need any inclusion dependency at all.
>

If two sets of attributes are independent, then for each combination of
values for one of the sets of attributes, there is a copy of the projection
over all of the other attributes. What I mean is, the cardinality of the
1NF relation is equal to the product of the cardinalities of the projections
over each independent set of attributes. A cyclical constraint would only
prevent (pathologically, I should add) the case where the 1NF relation is
empty and where either but not both of the 2NF relations is empty: the whole
point of moving to 2NF is to avoid the update anomalies caused by lumping
two independent sets of attributes in the same relation schema. The same
can be said for independent projections of relations, as is the case when
moving from BCNF to 4NF.

I guess I need to clarify what I mean by /at least the same/ information
content. Consider the following example relation schema that is in 5NF:

(1) {Whse, Item, TranId, RcvdDate, QtyRcvd, QtySold, Cost}

such that

{Whse, Item, TranId} --> {RcvdDate, QtyRcvd, QtySold, Cost}

An instance of this schema enumerates a set of cost tiers for items in a
warehouse. Cost is the total cost for the quantity received. The unit cost
is therefore Cost / QtyRcvd, but is not calculated ahead of time to minimize
rounding errors. There can be several cost tiers for an item in a
warehouse, and the cost assigned to a sale is computed by pulling from the
oldest tier first, and continuing with successive tiers until the quantity
needed for the sale has been met. Now decomposing the above schema into 6NF
gives:

(2) {Whse, Item, TranId, RcvdDate}
(3) {Whse, Item, TranId, QtyRcvd}
(4) {Whse, Item, TranId, QtySold}
(5) {Whse, Item, TranId, Cost}

This schema allows a Cost without a QtyRcvd, thus preventing the calculation
of unit cost. It permits the existence of a QtySold without a QtyRcvd, thus
preventing the calculation of the remaining quantity. It also permits
QtyRcvd without RcvdDate, thus preventing the selection of the oldest tier.
Clearly the existence of QtySold and Cost depend upon the existence of
QtyRcvd, and the existence of QtyRcvd depends upon the existence of
RcvdDate. Consequently, the set of 6NF relation schemata can't contain /at
least the same/ information content as the 5NF schema, because some valid
instances of the 6NF schemata do not make sense.

Suppose that tuples exist for a particular {Whse, Item, TranId} in instances
of the 6NF schemata for (3) and (4), but not for (2) and (5). A query
seeking the total quantity on hand for the Item (SUM(QtyRcvd - QtySold))
could indicate that quantity required for a sale is available, but since the
tuples for (2) and (5) do not exist, the cost to be assigned to the sale
transaction cannot be computed. It certainly doesn't sound to me like a
good thing to carry inventory that you can't sell!

It should be obvious that if sets of dependent attributes are truly
independent of each other (which is the only case that an inclusion
dependency is not indicated), then they should have already been separated
into their own relation schemata as part of the normalization process. It
means that a multivalued dependency exists that is not implied by the
candidate keys. It means that the relation schema is not even in 4NF.
Therefore, when moving from 5NF to 6NF, a cyclical constraint is /always/
necessary to avoid information loss.

[snip]


Jan Hidders

Aug 7, 2007, 4:20:27 AM
On 7 aug, 06:38, "Brian Selzer" <br...@selzer-software.com> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message

>
> news:1186391813....@w3g2000hsg.googlegroups.com...
>
> [big snip]
>
>
>
>
>
> >> > Agreed. I would add that this is not specific for going from 5NF to
> >> > 6NF but anywhere you decompose to go from a lower to a higher normal
> >> > form.
>
> >> Not always. If two sets of attributes are independent, as is the case
> >> when
> >> moving from 1NF to 2NF, then there is no need for a referential
> >> constraint;
> >> if two projections are independent, as is the case when moving from BCNF
> >> to
> >> 4NF, then there is no need for a referential constraint.
>
> > I have no idea what you mean here. In all those cases you need a
> > cyclic pair of nclusion dependencies if you want the new schema to be
> > equivalent with the old. And if you only want the new schema to
> > contain at least as much information as the old then you strictly
> > speaking don't need any inclusion dependency at all.
>
> If two sets of attributes are independent, then for each combination of
> values for one of the sets of attributes, there is a copy of the projection
> over all of the other attributes.

Huh? I assume you meant that for each two tuples t1 and t2 there is a
tuple t3 such that t3[A] = t1[A] and t3[B] = t2[B] where t[A] denotes
the projection of t on A. In other words, there is an MVD X->>A|B for
some X.

> What I mean is, the cardinality of the
> 1NF relation is equal to the product of the cardinalities of the projections
> over each independent set of attributes.

I doubt that is what you mean. If for R(a,b,c) there is an MVD a ->> b
| c then it is certainly not true that | R | = | R[b] | x | R[c] |.
Of course this is true if R is restricted to the tuples with a particular
value in the 'a' column, so perhaps that is what you meant?

> A cyclical constraint would only
> prevent (pathologically, I should add) the case where the 1NF relation is

> empty and where either but not both of the 2NF relations is empty: [...]

This is simply not true. Also if the 1NF relation is nonempty the 2NF
relations might violate the two inclusion dependencies if they are
allowed to contain more than just the projections of the 1NF relation.

> The same
> can be said for independent projections of relations, as is the case when
> moving from BCNF to 4NF.

No, also there it isn't true.

You could, once the missing information has been added. Yes, there
might be circumstances under which you would want to disallow such
incomplete information, but it is also very conceivable that you would
want to explicitly allow it.

Btw., did you really have to type all this to explain the rather
trivial observation that if you move to a more liberal schema then
certain assumptions that are necessary for certain queries to make
sense may no longer hold? And again, this is not typical for the step
from 5NF to 6NF, but it can happen at all normalization steps where
you split and don't include inclusion dependencies in both directions.

Finally, your redefinition of the phrase "a schema with at least the
same information" is a bit confusing, to say the least, because the
problem you are describing is caused by the schema being too liberal
and thereby allowing information that does not make sense. It would be
more appropriate to say that the schema allows too much information.

> It should be obvious that if sets of dependent attributes are truly
> independent of each other (which is the only case that an inclusion
> dependency is not indicated), then they should have already been separated
> into their own relation schemata as part of the normalization process.

Assuming my guess of your definition of "independence" is correct this
is not true.

> It
> means that a multivalued dependency exists that is not implied by the
> candidate keys.

Nope. Also that is not true.

> It means that the relation schema is not even in 4NF.

Also false.

> Therefore, when moving from 5NF to 6NF, a cyclical constraint is /always/
> necessary to avoid information loss.

Which is also not true, unless of course you redefine "avoid
information loss" as "a schema that is not equivalent", which is how
you seem to interpret it, and in which case, as I already said, this
is true for any normalization split.

-- Jan Hidders

vldm10

Aug 7, 2007, 2:36:40 PM
On Aug 1, 7:36 am, Jan Hidders <hidd...@gmail.com> wrote:


> Any attempt to reformulate it to something easier or more intuitive in
> my experience almost always ends up with something that is either
> wrong or actually harder to understand.
>
> The only somewhat mysterious part may be the "JD is implied by the
> CKs" but this can be tested by the following simple procedure:
>
> 1. Let jd be the join dependency we want to test
> 2. While jd has two elements (being sets of attributes) Si and Sj such
> that the intersection of Si and Sj contains a candidate key do:
> 2.1 replace Si and Sj with the union of Si and Sj
> 3. If jd contains the header of the relation (which is also a set of
> attributes) then return "yes" else "false"

> -- Jan Hidders

You gave here a procedure which is more on the intuitive level than
based on some formal system.
The other thing here which maybe has a questionable meaning is
"The only somewhat mysterious part...". This makes it seem as if there
are some other parts in the definition of 5NF.
The sentence "JD is implied by the CKs" is the main and only part of
the 5NF definition (if we don't analyze trivial cases).
Now we can pose the question - why is the mentioned procedure for the
mysterious part a combination of the formal and the intuitive? Although
the mentioned procedure is useful, I believe it is good to be aware of
the following:

a) CKs are based on FDs, and for FDs there is a formal system.
b) For JDs there is no formal system in the sense of complete
inference rules.
c) In 5NF, JDs are based on CKs.

Of course, in my opinion, 5NF is an important result.

Vladimir Odrljin

Jan Hidders

Aug 7, 2007, 3:30:00 PM
On 7 aug, 20:36, vldm10 <vld...@yahoo.com> wrote:
> On Aug 1, 7:36 am, Jan Hidders <hidd...@gmail.com> wrote:
>
> > Any attempt to reformulate it to something easier or more intuitive in
> > my experience almost always ends up with something that is either
> > wrong or actually harder to understand.
>
> > The only somewhat mysterious part may be the "JD is implied by the
> > CKs" but this can be tested by the following simple procedure:
>
> > 1. Let jd be the join dependency we want to test
> > 2. While jd has two elements (being sets of attributes) Si and Sj such
> > that the intersection of Si and Sj contains a candidate key do:
> > 2.1 replace Si and Sj with the union of Si and Sj
> > 3. If jd contains the header of the relation (which is also a set of
> > attributes) then return "yes" else "false"
> > -- Jan Hidders
>
> You gave here the procedure which is more on intuitive level than
> based on some formal system.

Although somewhat informally described by me, it is a proper algorithm
and as such *is* a formal system that has been proven both sound and
complete.

> The other thing here which is maybe with a questionable meaning is
> "The only somewhat mysterious part...". This seems like there are
> some other parts in definition of 5NF.

You may want to check your irony-meter. It seems broken. :-)

> Now we can set the question - why mentioned procedure for misterios
> part is combination of formal and intuitive. Although, mentioned
> procedure is useful, I beleive it is good to be aware of the
> following:
>
> a) Cks are based on FDs and for FDs there is a formal system.
> b) For JDs there is no formal system in the sense of complete
> inference rules.

JDs have been successfully axiomatized. In fact, much larger classes
including MVDs, FDs and much more have been axiomatized. See the Alice
book for a wealth of information on that.

-- Jan Hidders

vldm10

Aug 7, 2007, 4:34:48 PM
> -- Jan Hidders

There is no irony in my post. I am sorry that you understood it as
irony. You used the term "mysterious part" for the essential part of
5NF. If you use this term loosely, in the everyday kind of user-group
communication, then it is OK for me. I can understand it. If you mean
something serious, then I need additional explanation.
I also clearly wrote, "Although, mentioned procedure is useful,...".
Regarding the formal level of the procedure, I gave my opinion and
signed it, just as you gave and signed yours.

Vladimir Odrljin

Jan Hidders

Aug 7, 2007, 5:00:46 PM

Don't be sorry. I didn't. I was only pointing out that there was irony
in *my* posting.

-- Jan Hidders

Brian Selzer

Aug 8, 2007, 10:15:43 PM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1186474827.2...@k79g2000hse.googlegroups.com...

Not exactly, unless perhaps "some X" were {}.

>> What I mean is, the cardinality of the
>> 1NF relation is equal to the product of the cardinalities of the
>> projections
>> over each independent set of attributes.
>
> I doubt that is what you mean. If for R(a,b,c) there is an MVD a ->> b
> | c then it is certainly not true that | R | = | R[b] | x | R[c] |.
> Of course this is true if R a restricted to the tuples with particular
> value in the 'a' column, so perhaps that is what you meant?
>

I was talking about 2NF, not 4NF.

>> A cyclical constraint would only
>> prevent (pathologically, I should add) the case where the 1NF relation is
>> empty and where either but not both of the 2NF relations is empty: [...]
>
> This is simply not true. Also if the 1NF relation is nonempty the 2NF
> relations might violate the two inclusion dependencies if the are
> allowed to contain more than just the projections of the 1NF relation.
>

What would the inclusion dependencies look like for 2NF projections of a 1NF
relation {A, B} where A and B are disjoint and independent sets of
attributes?

I could be wrong, but isn't the join of the 2NF projections therefore a
cross product, which just happens to be the 1NF relation?

>> The same
>> can be said for independent projections of relations, as is the case when
>> moving from BCNF to 4NF.
>
> No, also there it isn't true.
>

I agree, it's not exactly the same thing. There could be some overlap.

I don't think that it's about a schema being too liberal. I think it's
about changing a deterministic model into one that isn't. If a FD is lost
as a result of the decomposition of a relation, then that deterministic
relationship between the two sets of attributes is also lost. To recover
that deterministic relationship requires the addition of an inclusion
dependency.

>> It should be obvious that if sets of dependent attributes are truly
>> independent of each other (which is the only case that an inclusion
>> dependency is not indicated), then they should have already been
>> separated
>> into their own relation schemata as part of the normalization process.
>
> Assuming my guess of your definition of "independence" is correct this
> is not true.
>

My use of the term "independent" is not the same as that defined by
Rissanen. I consider a projection over a set of attributes A to be
independent if and only if every dependent set of attributes determined by
any subset of A is also a subset of A. In other words, a projection over a
set of attributes A is independent if and only if for each functional
dependency X --> Y in the closure of the set of functional dependencies for
the relation schema containing A, if X is a subset of A then Y must also be
a subset of A. If a subset of A is the determinant for a set of attributes
that is not a subset of A, then the existence of a tuple in the projection
over A is dependent upon the existence of a tuple in another projection.

Consider a simple relation schema, {A, B, C} such that A --> B and B --> C.
Rissanen would consider both of the projections {A, B} and {B, C}
independent. I don't. The closure of the set of functional dependencies
includes A --> C, which can only be preserved by the inclusion dependency,
{A,B}[B] IN {B,C}[B]. In this way there can't be a value for A unless there
is also a specific value for C. As a result, I would consider {B, C} to be
independent, but not {A, B}.

vldm10

Aug 9, 2007, 2:53:16 PM
Although your comments in this thread are knowledgeable and useful I
have a few remarks here.

On Aug 7, 3:30 pm, Jan Hidders <hidd...@gmail.com> wrote:
> On 7 aug, 20:36, vldm10 <vld...@yahoo.com> wrote:
>
>
>
>
>
> > On Aug 1, 7:36 am, Jan Hidders <hidd...@gmail.com> wrote:
>
> > > Any attempt to reformulate it to something easier or more intuitive in
> > > my experience almost always ends up with something that is either
> > > wrong or actually harder to understand.
>
> > > The only somewhat mysterious part may be the "JD is implied by the
> > > CKs" but this can be tested by the following simple procedure:
>
> > > 1. Let jd be the join dependency we want to test
> > > 2. While jd has two elements (being sets of attributes) Si and Sj such
> > > that the intersection of Si and Sj contains a candidate key do:
> > > 2.1 replace Si and Sj with the union of Si and Sj
> > > 3. If jd contains the header of the relation (which is also a set of
> > > attributes) then return "yes" else "false"
> > > -- Jan Hidders
>
> > You gave here the procedure which is more on intuitive level than
> > based on some formal system.
>
> Although somewhat informally described by me, it is a proper algorithm
> and as such *is* a formal system that has been proven both sound and
> complete.


I have a feeling that you didn't pay enough attention to the
distinction between a construction (construct) and a formal axiomatic
system (FAS). I mean here the constructions for 5NF and 6NF, those that
put "things" into 5NF or 6NF.
For example, your explanation of 6NF in this thread, "Informally put
it says that every distinct fact gets its own relation...", I am
afraid does not make sense and is probably naive, regarding the
constructs for 6NF.


> > The other thing here which is maybe with a questionable meaning is
> > "The only somewhat mysterious part...". This seems like there are
> > some other parts in definition of 5NF.
>
> You may want to check your irony-meter. It seems broken. :-)
>
> > Now we can set the question - why mentioned procedure for misterios
> > part is combination of formal and intuitive. Although, mentioned
> > procedure is useful, I beleive it is good to be aware of the
> > following:
>
> > a) Cks are based on FDs and for FDs there is a formal system.
> > b) For JDs there is no formal system in the sense of complete
> > inference rules.
>
> JDs have been succesfully axiomatized. In fact, much larger classes
> including MVDs, FDs and much more have been axiomatized. See the Alice
> book for a wealth of information on that.
>

> -- Jan Hidders


Vladimir Odrljin

Jan Hidders

Aug 10, 2007, 7:08:26 AM

Indeed, the algorithm is not a formal axiomatic system, but it is a
formal system in the broader sense of the word. And if you look
closely you will see that it is in fact based on a very simple
axiomatic system whose propositions are JDs and CKs and has one
inference rule that allows you to derive JDs. And that axiomatic
system has been proven to be sound and complete.

> For example your explanation in this thread that 6NF : "Informaly put
> it says that every distinct fact gets its own relation...", I am
> afraid this does not have sense and probably is naive, regarding the
> constructs for 6NF.

You are entitled to your opinion, of course.

-- Jan Hidders

Jan Hidders

Aug 10, 2007, 7:55:03 AM

In case anyone's interested:

For a relation with header H (a set of attributes) this formal system
starts with the following axioms:
- JD{H}
- any CKs that hold

Next to the trivial inference rule that says that all axioms are
derived there is the following inference rule:
(+ denotes set union, * denotes set intersection, <= denotes set
inclusion)

IF
1. we derive JD{C_1, ..., C_i, ..., C_m}
2. C_i = D_1 + D_2
3. we derive CK(E)
4. D_1 * D_2 <= E
THEN
we derive JD{C_1, ..., D_1, D_2, ..., C_m}
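
As a quick worked example, using the abstract case from earlier in the
thread: take H = {a,b,c} with CK({b}). The axioms give JD{ {a,b,c} }.
Now apply the rule with C_1 = {a,b,c}, D_1 = {a,b}, D_2 = {b,c}: we
have C_1 = D_1 + D_2 and D_1 * D_2 = {b} <= {b}, so the rule derives
JD{ {a,b}, {b,c} }, i.e. exactly the join dependency that was said
above to follow from the candidate key.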

-- Jan Hidders

Jan Hidders

Aug 10, 2007, 9:08:53 AM
On 9 aug, 04:15, "Brian Selzer" <br...@selzer-software.com> wrote:
>
>
> My use of the term "independent" is not the same as that defined by
> Rissanen. I consider a projection over a set of attributes A to be
> independent if and only if every dependent set of attributes determined by
> any subset of A is also a subset of A. In other words, a projection over a
> set of attributes A is independent if and only if for each functional
> dependency X --> Y in the closure of the set of functional dependencies for
> the relation schema containing A, if X is a subset of A then Y must also be
> a subset of A.

Very good, finally a proper definition. So you apparently meant it as a
unary predicate over sets of attributes and not as a binary
relationship between sets of attributes. Let me see, if we have a
relation R1(A,B,C) with an FD A->B then you call the following sets
independent: {}, {B}, {C}, {A,B}, {B,C}, {A,B,C}. Correct? And if we
have R2(A,B,C) with A->B and A->C then the following are independent:
{}, {B}, {C}, {B,C}, {A,B,C}.

> Consider a simple relation schema, {A, B, C} such that A --> B and B --> C.
> Rissanen would consider both of the projections {A, B} and {B, C}
> independent. I don't.

Indeed. You only call {B,C} independent.

> The closure of the set of functional dependencies
> includes A --> C, which can only be preserved by the inclusion dependency,
> {A,B}[B] IN {B,C}[B].

Not necessarily. That depends on your definition of FDs over
attributes in different relations. The usual definition in
normalization theory is that they hold for a schema if they hold for
the natural join of all relations in the schema. In that case the FD
is preserved also without the inclusion dependency.

-- Jan Hidders

Brian Selzer

Aug 10, 2007, 4:41:01 PM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1186751333.0...@j4g2000prf.googlegroups.com...

> On 9 aug, 04:15, "Brian Selzer" <br...@selzer-software.com> wrote:
>>
>>
>> My use of the term "independent" is not the same as that defined by
>> Rissanen. I consider a projection over a set of attributes A to be
>> independent if and only if every dependent set of attributes determined
>> by
>> any subset of A is also a subset of A. In other words, a projection over
>> a
>> set of attributes A is independent if and only if for each functional
>> dependency X --> Y in the closure of the set of functional dependencies
>> for
>> the relation schema containing A, if X is a subset of A then Y must also
>> be
>> a subset of A.
>
> Very good, finally a proper definition. So you apprently meant it as a
> unary predicate over sets of attributes and not as a binary
> relationship between sets of attributes. Let me see, if we have a
> relation R1(A,B,C) with an FD A->B then you call the following sets
> independent: {}, {B}, {C}, {A,B}, {B,C}, {A,B,C}. Correct?

Yes, provided A --> B covers the closure of the set of functional
dependencies for R1.

> And if we have R2(A,B,C) with A->B and A->C then the following are
> independent:
> {}, {B}, {C}, {B,C}, {A,B,C}.
>

Yes, provided A --> B and A --> C covers the closure of the set of
functional dependencies for R2.

>> Consider a simple relation schema, {A, B, C} such that A --> B and B -->
>> C.
>> Rissanen would consider both of the projections {A, B} and {B, C}
>> independent. I don't.
>
> Indeed. You only call {B,C} independent.
>
>> The closure of the set of functional dependencies
>> includes A --> C, which can only be preserved by the inclusion
>> dependency,
>> {A,B}[B] IN {B,C}[B].
>
> Not necessarily. That depends on your definition of FDs over
> attributes in different relations. The usual definition in
> normalization theory is that they hold for a schema if they hold for
> the natural join of all relations in the schema. In that case the FD
> is preserved also without the inclusion dependency.
>

I don't agree with the usual definition. It isn't strict enough, in my
opinion.

(1) A --> B /and/ B --> C; therefore A --> C.
(2) A --> B /or/ B --> C; therefore A -/-> C.

(1) is preserved by the IND {A, B}[B] IN {B, C}[B]; (2) is what holds without
the IND.
While it is true that A --> C in {A, B} JOIN {B, C}, without the IND there
can still exist values for A that do not determine a value for C.

On the other hand, requiring a cyclical referential constraint for every
decomposition is too strict.

> -- Jan Hidders
>


Jan Hidders

Aug 11, 2007, 6:59:54 AM
On Aug 10, 10:41 pm, "Brian Selzer" <br...@selzer-software.com> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message

Which sometimes is and sometimes isn't a problem. Just as not having
the inclusion dependencies in both directions may sometimes be a
problem but usually isn't. It's really quite simple. If you want to be
really equivalent you need both inclusion dependencies. Omitting one
or both gives you a more liberal schema, which might be a good thing
because it now lets you represent information that you couldn't
before, but it might also be a bad thing because, as your example
nicely demonstrated, you now allow the violation of constraints that
might be necessary for certain information to make sense.

With all respect, why you are so insistent on making this
complicated and so fixated on finding the one right way of
normalizing (which of course there isn't) is really beyond my
comprehension.

-- Jan Hidders

Brian Selzer

Aug 11, 2007, 10:54:56 AM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1186829994.3...@d55g2000hsg.googlegroups.com...

I think it is important. I think that a database schema that has each
cyclical dependency contained within a relation schema is "better" than one
that doesn't. There are several reasons, some theoretical but mostly
practical, but there isn't space in this post to go into them. I think that
6NF is a mistake, despite its usefulness for temporal databases. If a
database schema is already in 5NF, then a more "liberal" schema must contain
nonsense. If a dependent attribute cannot always have a value, then the
schema cannot be in 5NF. Decomposing a 5NF schema into several 6NF schemata
without the necessary cyclical referential constraints is akin to permitting
nulls (or more precisely, I-marks) for every dependent attribute in the 5NF
schema. Having spent uncountable hours cleaning up databases so that they
can be integrated or converted, I have a particularly strong aversion to
anything that permits nonsense in the database.

> -- Jan Hidders
>


Jan Hidders

Aug 16, 2007, 2:57:28 AM

That is not the main point of our disagreement. The question was
whether there is something special going on when going to 6NF.

> I think that
> 6NF is a mistake, despite its usefulness for temporal databases. If a
> database schema is already in 5NF, then a more "liberal" schema must contain
> nonsense. If a dependent attribute cannot always have a value, then the
> schema cannot be in 5NF. Decomposing a 5NF schema into several 6NF schemata
> without the necessary cyclical referential constraints is akin to permitting
> nulls (or more precisely, I-marks) for every dependent attribute in the 5NF
> schema.

Again. All this is also true for the other normal forms. In all cases
you need cyclic inclusion dependencies if you want to be exactly
equivalent. If you are not, then there is always the risk of allowing
data that does not make sense in the given business context. Whether
this risk exists or not is not something you can decide by a few quick
and simple rules.

-- Jan Hidders

Brian Selzer

Aug 17, 2007, 1:15:18 PM

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187247448.8...@b79g2000hse.googlegroups.com...

If the goal is a database schema that can represent exactly the same
information content, then the cyclical interrelational constraint is
required; if the goal is a schema that can represent additional information
without contradicting the closure of the set of FDs and INDs for all
schemata that are equivalent to the less normalized schema, then the
cyclical interrelational constraint is not always required, except, of
course, when moving from 5NF to 6NF.

> -- Jan Hidders
>


Jan Hidders

unread,
Aug 17, 2007, 6:54:59 PM8/17/07
to
On 17 aug, 19:15, "Brian Selzer" <br...@selzer-software.com> wrote:
>
> [... big snip ...]

>
> If the goal is a database schema that can represent exactly the same
> information content, then the cyclical interrelational constraint is
> required; if the goal is a schema that can represent additional information
> without contradicting the closure of the set of FDs and INDs for all
> schemata that are equivalent to the less normalized schema, then the
> cyclical interrelational constraint is not always required, except, of
> course, when moving from 5NF to 6NF.

*sigh* You already said this, and I already explained that under the
usual definitions of those terms they are *never* required, including
when going from 5NF to 6NF. You replied that you are using other
definitions but apart from some informal examples you never gave a good
definition nor a good motivation why that should be the definition. I
think the onus is on you here to show why you want to depart from
rather well-established terminology.

-- Jan Hidders

Brian Selzer

unread,
Aug 18, 2007, 3:26:37 AM8/18/07
to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187391299.3...@w3g2000hsg.googlegroups.com...

It all boils down to the domain closure assumption, which states that the
only individuals that exist are represented by values in the body of the
database, and the identity relation, =, which guarantees that no matter how
many times a value appears, there is only one individual represented by that
value. If you have a database schema consisting of a single relation schema
that satisfies the functional dependency A --> B, then due to the domain
closure assumption, the existence of an individual that is represented by a
value for A depends upon the existence of a specific individual that is
represented by a value for B. So if the values a1 and b1, for A and B
respectively, appear in the same tuple, then a denial of the existence of
the individual represented by b1 denies the existence of the individual
represented by a1, but a denial of the existence of the individual
represented by a1 does not necessarily deny the existence of the individual
represented by b1, since there could be another tuple that has the values a2
and b1.

Now suppose that the relation schema also satisfies the functional
dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
respectively, appear in the same tuple, then a denial of the existence of
the individual represented by c1 denies the existence the individual
represented by b1 and transitively the existence of the individual
represented by a1. When the relation schema is decomposed into a family of
relation schemata such that A and B appear in one relation schema and B and
C appear in another, then the denial of the existence of the individual
represented by c1 no longer denies the existence of the individuals
represented by b1 and a1. This is the problem. This is why I think that an
inclusion dependency is required.

This issue is not limited to database schemata with only one relation
schema, but if you limit the scope to the decomposition of a single relation
schema into a family of relation schemata, then it follows from the above
two paragraphs that if the relation schema satisfies the functional
dependency A --> B then the appearance of an individual that is represented
by a value for A depends upon the appearance of a specific individual that
is represented by a value for B. Thus it also follows that the denial of the
appearance in the family of relations of the individual represented by c1 no
longer denies the appearance in the family of relations of the individuals
represented by b1 and a1.

The issue is also not limited to decomposition due to transitive functional
dependencies. If a relation schema satisfies the functional dependency
A --> BC, then the appearance of an individual represented by a value for A
depends upon the existence of both the individual represented by a value for
B and the individual represented by a value for C. When the relation
schema is decomposed into a family of relation schemata such that A and B
appear in one relation schema and A and C appear in another, then if the
values a1, b1 and c1, for A, B and C respectively, appear in the same tuple
of the original relation, then in the family of relations the denial of the
appearance of the individual represented by c1 no longer denies the
appearance of the individual represented by a1 and the denial of the
appearance of the individual represented by b1 no longer denies the
appearance of the individual represented by a1. This is why I think that in
this case a circular inclusion dependency is required.
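
By way of illustration only, here is a rough Python sketch (toy relations
as sets of tuples, with invented values) of the A --> BC case just
described:

# Toy sketch: R(A,B,C) with A --> BC, decomposed into R1(A,B) and R2(A,C).
R = {("a1", "b1", "c1"), ("a2", "b1", "c2")}
R1 = {(a, b) for (a, b, c) in R}   # projection on {A, B}
R2 = {(a, c) for (a, b, c) in R}   # projection on {A, C}

# "Deny the existence" of c1 by deleting every R2 tuple containing it.
R2 = {(a, c) for (a, c) in R2 if c != "c1"}

# In R that delete would also have removed a1; in the decomposition a1
# survives in R1, which is the problem described above.
print(("a1", "b1") in R1)          # True

# The circular INDs R1[A] in R2[A] and R2[A] in R1[A] flag the state:
print({a for (a, _) in R1} <= {a for (a, _) in R2})   # False: violated
print({a for (a, _) in R2} <= {a for (a, _) in R1})   # True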

I would argue that the requirement for an inclusion dependency that is due
to the failure of a denial of the dependent to deny the determinant
differs from the other interrelational constraints that would be required
for a goal of equivalence in that the one describes a 1:1 relationship, and
the absence of the inclusion dependency changes the relationship to 1:0..1;
whereas the other interrelational constraints are similar to "whenever there
is a value for X, there must be at least one value for Y," which describes a
1:1..n relationship, and the absence of one of those other interrelational
constraints changes the relationship to 1:0..n.

I'm not sure if the last paragraph is coherent. I'm a little bit tired
right now.

> -- Jan Hidders
>


JOG

unread,
Aug 18, 2007, 8:58:56 PM8/18/07
to
On Aug 18, 8:26 am, "Brian Selzer" <br...@selzer-software.com> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message


Does anyone else understand any of this? Am I losing the plot -'
Individuals', 'denial of existence', etc, etc... what on earth does
any of this have to do with the price of fish? Sigh, I remember the
good ol' days of logical models. Propositions with roles and values.
Predicates describing which of these propositions were acceptable.
Looks like they've gone the way of betamax, walkmans, and those darned
talented kids from Fame. I wonder if they ever did learn how to fly.

Message has been deleted

Brian Selzer

unread,
Aug 19, 2007, 12:03:16 AM8/19/07
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1187485136....@w3g2000hsg.googlegroups.com...

To deny the existence of an individual represented by a particular value,
one need only issue a delete or a multiple delete that targets every tuple
in which that particular value appears. I would have thought this obvious,
but it appears I am mistaken.

It has nothing to do with the price of fish, but it has everything to do
with whether cyclical referential constraints are required when moving from
5NF to 6NF.

The balance of your post does not make any sense. In what way does what I
posted deny the import or applicability of propositions, predicates or
logical models?


Jan Hidders

unread,
Aug 19, 2007, 6:45:09 AM8/19/07
to
On 18 aug, 09:26, "Brian Selzer" <br...@selzer-software.com> wrote:
> "Jan Hidders" <hidd...@gmail.com> wrote in message

I am quite impressed. How can you make something so trivial sound so
incredibly complicated? :-) Of course, if during normalization you
split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
dependencies you can have C's without associated B's and vice versa.
If that is a problem, then add the INDs. That is basically all that
you have shown in the above. But your claim was that this is (almost?)
always a problem, and for that you have not provided any supporting
argumentation at all.

Note btw. that your argument is symmetric so if you indeed don't want
to change the nature of the relationships you need INDs in both
directions.

-- Jan Hidders

JOG

unread,
Aug 19, 2007, 7:08:41 AM8/19/07
to

Propositions, and the values they contain, are treated orthogonally to
the "individuals" they might refer to - those entities are a concern
of the conceptual model, not the logical one. I am unclear as to why
you keep trying to mix these two layers up.

Functional determination is nothing more than material implication
within FOL propositions. If we normalize surely we just have to make
sure that our new predicates are mathematically consistent with the
previous ones. Perhaps if you wrote down an example in FOL without
referring to anything outside the logical model such as 'individuals'?

>
> It has nothing to do with the price of fish, but it has everything to do
> with whether cyclical referential constraints are required when moving from
> 5NF to 6NF.

"*sigh* You already said this, and I already explained that under the


usual definitions of those terms they are *never* required, including

when going from 5NF to 6NF. " J.Hidders, 2007. I get the feeling that
Jan kind of knows his stuff.

>
> The balance of your post does not make any sense. In what way does what I
> posted deny the import or applicability of propositions, predicates or
> logical models?

More importantly whatever happened to Leroy, Bruno and Dr Shorofsky?


Brian Selzer

unread,
Aug 19, 2007, 8:43:30 AM8/19/07
to

"paul c" <toledob...@oohay.ac> wrote in message
news:%CNxi.70306$fJ5.3346@pd7urf1no...
> JOG wrote:
>> ...
>> Does anyone else understand any of this? ...
>
> It strikes me as absurdly technocratic, no apparent value, ie. for me the
> answer is no. I don't see any value in theory for its own sake unless you
> can say or guess at *some* point along the way what the "sake" is.

[snip]

No apparent value.... I gave an example before, but perhaps it was a bit
too complicated. Suppose that you have a 5NF relation schema,

employee {emp#, last, first, ssn, payrate} where emp# is the key, last is
the last name, and first is the first name, ssn is the social security
account number and payrate is the hourly pay rate.

Now split it into the family of 6NF schemata,

emplast {emp#, last}
empfirst {emp#, first}
empssn {emp#, ssn}
emppayrate {emp#, payrate}

Wouldn't it be a bit strange for an employee to have a first name but not a
last name? How about a pay rate without a social security account number?
Under the domain closure assumption, if there is a value for emp#, then
there is an employee with that employee number. For example, if there is a
tuple {emp#:152, first:Brian} in empfirst, then there is an employee with
employee number 152, and that employee has the first name, Brian. Under the
closed world assumption, the absence of a tuple with employee number 152 in
emplast indicates that the employee with employee number 152 has no last
name. So how can you determine how much to pay him without a pay rate? How
can you produce a check to pay him without a last name? How can you report
to the government how much he was paid without a social security account
number? If you can't pay him, then is he really an employee?
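
To make that concrete, a throwaway Python sketch (relation names as above,
data invented):

# 6NF-style split: one binary relation per non-key attribute of employee.
empfirst = {(152, "Brian")}
emplast = set()        # no last-name tuple for employee 152
empssn = set()
emppayrate = set()

def emp_ids(rel):
    return {t[0] for t in rel}

# Without cyclic inclusion dependencies between the projections, nothing
# stops an employee from having a first name but no last name, ssn or
# pay rate:
violated = [name for name, rel in (("emplast", emplast),
                                   ("empssn", empssn),
                                   ("emppayrate", emppayrate))
            if not emp_ids(empfirst) <= emp_ids(rel)]
print(violated)        # ['emplast', 'empssn', 'emppayrate']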

I now ask what is wrong with exploring why this can happen? Is this really
theory for theory's sake? In order to find a correct solution, isn't it
necessary to find the root cause?

I could be wrong, but I think I may have found the root cause. I offered it
up here in this forum--the database theory newsgroup. If you don't
understand what I'm trying to say, then please ask for clarification. If
you see a problem with my argument, or even better, if you can prove that
I'm wrong, then by all means do it: I'm not afraid to be wrong and would
prefer to be corrected so I don't waste any more time on it.

> When the foundation is nothing more than mysticism, arbitrary vocabulary
> and name dropping, any result no matter how ostensibly it appears to be
> reasoned is likely to be up for grabs.

Are you saying the foundation of the relational model is mysticism? The
whole notion of keys is semantic in nature: does that mean that the model is
based upon mysticism? The entirety of normalization theory is semantically
oriented, does that mean that it is all founded upon mysticism?

[snip]


JOG

unread,
Aug 19, 2007, 9:52:26 AM8/19/07
to
On Aug 19, 1:43 pm, "Brian Selzer" <br...@selzer-software.com> wrote:
> "paul c" <toledobythe...@oohay.ac> wrote in message

>
> news:%CNxi.70306$fJ5.3346@pd7urf1no...
>
> > JOG wrote:
> >> ...
> >> Does anyone else understand any of this? ...
>
> > It strikes me as absurdly technocratic, no apparent value, ie. for me the
> > answer is no. I don't see any value in theory for its own sake unless you
> > can say or guess at *some* point along the way what the "sake" is.
>
> [snip]
>
> No apparent value.... I gave an example before, but perhaps it was a bit
> too complicated. Suppose that you have a 5NF relation schema,
>
> employee {emp#, last, first, ssn, payrate} where emp# is the key, last is
> the last name, and first is the first name, ssn is the social security
> account number and payrate is the hourly pay rate.
>
> Now split it into the family of 6NF schemata,
>
> emplast {emp#, last}
> empfirst {emp#, first}
> empssn {emp#, ssn}
> emppayrate {emp#, payrate}
>
> Wouldn't it be a bit strange for an employee to have a first name but not a
> last name?

Yes, it would be strange if a person had no surname. Would it be strange
if there were no proposition containing a person's last name? No. In
the RealWorld(tm) it's always possible that we will be missing info.
See the difference?

> How about a pay rate without a social security account number?
> Under the domain closure assumption, if there is a value for emp#, then
> there is an employee with that employee number. For example, if there is a
> tuple {emp#:152, first:Brian} in empfirst, then there is an employee with
> employee number 152, and that employee has the first name, Brian. Under the
> closed world assumption, the absence of a tuple with employee number 152 in
> emplast indicates that the employee with employee number 152 has no last
> name.

I disagree, it does not mean at all that employee 152 has no last
name. Under CWA in a data model, it means there is no proposition
describing that information. The first order objects are propositions,
not people. A subtle but invaluable distinction.

> So how can you determine how much to pay him without a pay rate? How
> can you produce a check to pay him without a last name? How can you report
> to the government how much he was paid without a social security account
> number. If you can't pay him, then is he really an employee?
>
> I now ask what is wrong with exploring why this can happen? Is this really
> theory for theory's sake? In order to find a correct solution, isn't it
> necessary to find the root cause?
>
> I could be wrong, but I think I may have found the root cause. I offered it
> up here in this forum--the database theory newsgroup. If you don't
> understand what I'm trying to say, then please ask for clarification. If
> you see a problem with my argument, or even better, if you can prove that
> I'm wrong, then by all means do it: I'm not afraid to be wrong and would
> prefer to be corrected so I don't waste any more time on it.

I don't think it is ever a waste of time to explore these issues.

>
> > When the foundation is nothing more than mysticism, arbitrary vocabulary
> > and name dropping, any result no matter how ostensibly it appears to be
> > reasoned is likely to be up for grabs.
>
> Are you saying the foundation of the relational model is mysticism? The
> whole notion of keys is semantic in nature: does that mean that the model is
> based upon mysticism?

No, again I disagree 100%. Keys are not semantic in nature whatsoever.
They are the antecedents of any material implication in a statement of
FOL. Semantics or 'meaning' are added by the user at the conceptual
layer, as they are to any role or value.

Brian Selzer

unread,
Aug 20, 2007, 4:16:52 AM8/20/07
to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187520309.1...@57g2000hsv.googlegroups.com...

Perhaps it is because this is something that appears trivial since the
solution is so intuitively obvious that it is so incredibly complicated to
formulate an argument. :-) The above was an attempt (apparently inadequate)
to identify the root cause of the problem and to show that it can serve as a
logical basis for determining when there should be an inclusion dependency
(but not necessarily when there should not).

> Note btw. that your argument is symmetric so if you indeed don't want
> to change the nature of the relationships you need INDs in both
> directions.
>

I don't think it is, exactly. For a functional dependency X --> Y, there
can be more than one value for X that determines a particular value for Y.
So if x1 determines y1 then the statement, "x1 cannot appear in the relation
unless y1 also appears." is true, but the statement "y1 cannot appear in the
relation unless x1 also appears." is false. So since the first statement is
true, it should remain true for the family of relations; hence the need for
an IND in that direction. Since the second statement is already false
there's no need for an IND in the other direction in order to ensure that it
remains false.

I agree that if you want to preserve the fact the function from X to Y is
onto, then the IND in the other direction would be necessary.

> -- Jan Hidders
>


Jan Hidders

unread,
Aug 20, 2007, 5:59:07 AM8/20/07
to

We already have that logical basis. You simply have to check if it is
ok if there are any C's without B's and/or if there are any B's
without C's. Invoking high-sounding principles like the domain closure
assumption does not explain anything that was not clear already.

> > Note btw. that your argument is symmetric so if you indeed don't want
> > to change the nature of the relationships you need INDs in both
> > directions.
>
> I don't think it is, exactly. For a functional dependency X --> Y, there
> can be more than one value for X that determines a particular value for Y.
> So if x1 determines y1 then the statement, "x1 cannot appear in the relation
> unless y1 also appears." is true, but the statement "y1 cannot appear in the
> relation unless x1 also appears." is false. So since the first statement is
> true, it should remain true for the family of relations; hence the need for
> an IND in that direction. Since the second statement is already false
> there's no need for an IND in the other direction in order to ensure that it
> remains false.

You are looking at the wrong line of symmetry. If you split R(A,B,C)
into R1(A,B) and R2(A,C) then there are two lower bounds you need to
worry about:
- for every C there is at least one B
- for every B there is at least one C
There is no reason to consider only one of them just because there is
an FD associated with it. Especially since FDs don't really say
anything about lower bounds.


-- Jan Hidders

Brian Selzer

unread,
Aug 20, 2007, 7:47:40 AM8/20/07
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1187531546....@50g2000hsm.googlegroups.com...

I see what you're driving at. I don't agree with your argument. If the
last name could have been missing, then the relation wasn't in 5NF to begin
with. 6NF isn't a substitute for modeling missing information. It could be
a requirement that whenever one value is supplied, another value must also
be supplied. That can't be modeled through decomposition into 6NF.

Besides, due to the domain closure assumption, the absence of a proposition
containing a person's last name indicates that the person doesn't have a
last name. So if emplast only contains supplied last names, then the
meaning of last in emplast is different from the meaning of last in
employee, and thus the join of the relations emplast, empfirst, empssn,
emppayrate no longer means the same thing as the original relation.

>> How about a pay rate without a social security account number?
>> Under the domain closure assumption, if there is a value for emp#, then
>> there is an employee with that employee number. For example, if there is
>> a
>> tuple {emp#:152, first:Brian} in empfirst, then there is an employee with
>> employee number 152, and that employee has the first name, Brian. Under
>> the
>> closed world assumption, the absence of a tuple with employee number 152
>> in
>> emplast indicates that the employee with employee number 152 has no last
>> name.
>
> I disagree, it does not mean at all that employee 152 has no last
> name. Under CWA in a data model, it means there is no proposition
> describing that information. The first order objects are propositions,
> not people. A subtle but invaluable distinction.
>

But the domain closure assumption states that the only individuals that
exist are represented by values in the body of the database. Are you
denying the domain closure assumption?

What does that have to do with it? Are you saying that a functional
dependency is not a semantic notion? If you are, then I'm not alone in
disagreement.

JOG

unread,
Aug 20, 2007, 10:10:35 AM8/20/07
to

What is preventing the 6NF version from harnessing a database
constraint that demands that for every proposition of the form
(emp#:a, empfirst:x) there must also be a proposition (emp#:a,
emplast:y), if so desired?

>
> Besides, due to the domain closure assumption, the absence of a proposition
> containing a person's last name indicates that the person doesn't have a
> last name.

I only know of its use referring to domains, where if a value is not
contained within an enumerated domain, it is deemed not to exist. I
don't see what relevance this has to propositions - their values do
not make up a domain. In fact quite the opposite - they take values /
from/ domains.

> So if emplast only contains supplied last names, then the
> meaning of last in emplast is different from the meaning of last in
> employee, and thus the join of the relations emplast, empfirst, empssn,
> emppayrate no longer means the same thing as the original relation.

I don't see how:
1) emp#(a) -> empfirst(b) ^ emplast(c)
differs from:
2) emp#(a) -> empfirst(b)
3) emp#(a) -> emplast(c)

Are we not agreed these are mathematically equivalent? If empfirst and
emplast were nullable, then line 1 was an error. If they were
"not_nullable" (and 6NF was not necessary) then the 6NF database
requires a constraint to encode lines 2 and 3, but so what? No
problem there.
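
FWIW, the propositional skeleton of that claim, a -> (b ^ c) versus
(a -> b) ^ (a -> c), can be checked by brute force; a throwaway Python
sketch of mine (propositional only, ignoring the quantification over
tuples):

from itertools import product

def implies(p, q):
    return (not p) or q

# a -> (b and c)  is equivalent to  (a -> b) and (a -> c)
# for every truth assignment.
print(all(implies(a, b and c) == (implies(a, b) and implies(a, c))
          for a, b, c in product([False, True], repeat=3)))   # True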

>
>
>
> >> How about a pay rate without a social security account number?
> >> Under the domain closure assumption, if there is a value for emp#, then
> >> there is an employee with that employee number. For example, if there is
> >> a
> >> tuple {emp#:152, first:Brian} in empfirst, then there is an employee with
> >> employee number 152, and that employee has the first name, Brian. Under
> >> the
> >> closed world assumption, the absence of a tuple with employee number 152
> >> in
> >> emplast indicates that the employee with employee number 152 has no last
> >> name.
>
> > I disagree, it does not mean at all that employee 152 has no last
> > name. Under CWA in a data model, it means there is no proposition
> > describing that information. The first order objects are propositions,
> > not people. A subtle but invaluable distinction.
>
> But the domain closure assumption states that the only individuals that
> exist are represented by values in the body of the database. Are you
> denying the domain closure assumption?

I am questioning your use of it. If you are paraphrasing an academic
paper's use in reference to database theory, then link me to it and I
will check it out.

I don't see how material implication has "semantics" outside of IF/
THEN to a logical model. "If I have this (role/value), then I can
determine some other (role/value)".

Brian Selzer

unread,
Aug 20, 2007, 10:32:49 AM8/20/07
to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187603947....@g4g2000hsf.googlegroups.com...

But not in the general case. See below.

For this particular case, you're right, but it doesn't hurt to look at the
FDs, and in the general case, it is necessary. Here are some simple
decompositions.

(1) R(A,B,C) such that A --> B and A --> C into R1(A,B) and R2(A,C):
Since A --> C, there can't be a value for A without a value for C, so the
IND R1[A] in R2[A] is needed. Since A --> B, there can't be a value for A
without a value for B, so the IND R2[A] in R1[A] is needed.

(2) R(A,B,C) such that A --> B and B --> C into R1(A,B) and R2(B,C):
Since A --> C is implied by the cover for R, there can't be a value for A
without a value for C, so the IND R1[B] in R2[B] is needed.

(3) R(A,B,C) such that AB --> C and C --> B into R1(A,C) and R2(B,C):
Since C --> B, there can't be a value for C without a value for B, so the
IND R1[C] in R2[C] is needed.

(4) R(A,B,C) such that A --> B and B --> A into R1(A,B) and R2(A,C):
Since A --> B, there can't be a value for A without a value for B, so the
IND R2[A] in R1[A] is needed.
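
For anyone who wants to poke at these cases with toy data, here is a rough
Python helper (my own sketch, with invented relations) that checks an IND
over relations held as lists of dicts:

def ind_holds(r1, r2, attrs):
    """Does the inclusion dependency r1[attrs] in r2[attrs] hold?"""
    proj = lambda rel: {tuple(t[a] for a in attrs) for t in rel}
    return proj(r1) <= proj(r2)

# Case (2): R(A,B,C) with A --> B and B --> C, split into R1(A,B), R2(B,C).
R1 = [{"A": 1, "B": 10}, {"A": 2, "B": 10}]
R2 = [{"B": 10, "C": 100}]

print(ind_holds(R1, R2, ["B"]))   # True: every A value still reaches a C
print(ind_holds(R2, R1, ["B"]))   # True here; whether this direction is
                                  # also required is the point in dispute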

>
> -- Jan Hidders
>


Jan Hidders

unread,
Aug 20, 2007, 6:10:09 PM8/20/07
to
On 20 aug, 16:32, "Brian Selzer" <br...@selzer-software.com> wrote:
>
>
> For this particular case, you're right, but it doesn't hurt to look at the
> FDs, and in the general case, it is necessary. Here are some simple
> decompositions.
>
> (1) R(A,B,C) such that A --> B and A --> C into R1(A,B) and R2(A,C):
> Since A --> C, there can't be a value for A without a value for C, so the
> IND R1[A] in R2[A] is needed. Since A --> B, there can't be a value for A
> without a value for C so the IND R2[A] in R1[A] is needed.

The conclusion is correct, but your argumentation is false. It is not
because of A --> C that there cannot be an A value without a C value.
Even if that FD did not hold, there would still have to be at least one
C value for every A value. All that the FD adds is that there can be at
most one.

> (2) R(A,B,C) such that A --> B and B --> C into R1(A,B) and R2(B,C):
> Since A --> C is implied by the cover for R, there can't be a value for A
> without a value for C, so the IND R1[B] in R2[B] is needed.

But in R there cannot be a B value without a value for A, so the IND
R2[B] -> R1[B] is needed.

> (3) R(A,B,C) such that AB --> C and C --> B into R1(A,C) and R2(B,C):
> Since C --> B, there can't be a value for C without a value for B, so the
> IND R1[C] in R2[C] is needed.

In R there cannot be a C value without at least one associated A value,
so you also need IND R2[C] -> R1[C].

>
> (4) R(A,B,C) such that A --> B and B --> A into R1(A,B) and R2(A,C):
> Since A --> B, there can't be a value for A without a value for B, so the
> IND R2[A] in R1[A] is needed.

In R there cannot be a B value without at least one associated C
value, so you also need IND R1[A] -> R2[A].

So you see that in all cases both INDs are required. Which FDs exactly
hold is in fact completely immaterial.

-- Jan Hidders

Brian Selzer

unread,
Aug 20, 2007, 11:52:10 PM8/20/07
to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187647809.3...@57g2000hsv.googlegroups.com...

In each case above that you added an IND, the reason was something like,
"There cannot be a value for Y without at least one associated value for X."
In each case above where I added an IND, the reason was something like,
"There cannot be a value for X without one and only one associated value for
Y."

Did you notice that each of the INDs that I defined above restores the functional
relationship between sets of attributes, whereas each of the INDs that you added
restores only the surjective nature of those functions?

In (1) the FDs AB --> AC and AC --> AB appear in the closure of R.
In (2) the FD AB --> BC appears in the closure of R.
In (3) the FD AC --> BC appears in the closure of R.
In (4) the FD AC --> AB appears in the closure of R.

In (1) the functional relationship is bijective, so both INDs are required
to ensure that the relationship remains a function in both directions.

In (2) the IND R1[B] in R2[B] ensures that the relationship from AB to BC
remains a function, but the IND R2[B] in R1[B] only ensures that it remains
a surjection.

In (3) the IND R1[C] in R2[C] ensures that the relationship from AC to BC
remains a function, but the IND R2[C] in R1[C] only ensures that it remains
a surjection.

In (4) the IND R2[A] in R1[A] ensures that the relationship from AC to AB
remains a function, but the IND R1[A] in R2[A] only ensures that it remains
a surjection.

My argument is that if there is a functional relationship from one set of
attributes to another in a less normalized schema, then there should still
be a functional relationship after decomposition. I don't think it's always
necessary that that functional relationship also remains a surjection.
(This coincides with the definition for independence I supplied earlier.)
In fact, it is often undesirable: the update anomalies that BCNF is
supposed to eliminate can be linked to the fact that the relationships are
surjective, so adding an IND in the opposite direction would be
counterproductive. I would note, however, that when moving from 5NF to 6NF,
the relationships between the sets of attributes in each pair of projections
are bijective, and therefore a cyclical referential constraint is always
required.
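
The function-versus-surjection distinction is easy to see on a toy example
(a rough sketch of mine, reusing decomposition (2) with made-up values):

# R1(A,B) and R2(B,C) from decomposing R(A,B,C) with A --> B and B --> C.
R1 = {(1, 10), (2, 10)}        # (A, B)
R2 = {(10, 100), (20, 200)}    # (B, C)

B1 = {b for (_, b) in R1}
B2 = {b for (b, _) in R2}

# R1[B] in R2[B]: every AB pair reaches a BC pair, so the relationship
# from AB to BC stays a (total) function.
print(B1 <= B2)                # True

# R2[B] in R1[B]: every BC pair is reached by some AB pair, i.e. the
# relationship stays onto.  Here it fails: (20, 200) is unreachable.
print(B2 <= B1)                # False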

> -- Jan Hidders
>


Jan Hidders

unread,
Aug 21, 2007, 8:31:27 AM8/21/07
to
On 21 aug, 05:52, "Brian Selzer" <br...@selzer-software.com> wrote:
>
> My argument is that if there is a functional relationship from one set of
> attributes to another in a less normalized schema, then there should still
> be a functional relationship after decomposition.

This cannot be your argument because it is only a reformulation of
your claim. You have yet to supply any form of argumentation
that justifies that claim.

You seemed to come close to such a thing by invoking the domain
closure property and the principle that you don't want to change the
upper and lower bounds of the relationships between the objects in the
domains, but now you start special pleading because you only want to
apply those principles for the lower and upper bounds in the direction
of the FDs, i.e., you are only applying the principles where you
need them to support your claim and ignore them where they contradict
your claim. This way you are of course proving precisely nothing.

Since I don't see any progress, and suspect that there will be none in
the near future, I'm signing off from this discussion.

-- Jan Hidders

Brian Selzer

unread,
Aug 21, 2007, 9:18:32 AM8/21/07
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1187619035.8...@k79g2000hse.googlegroups.com...

This is what Jan and I have been discussing. I think the cyclical
referential constraint is always needed when moving from 5NF to 6NF.

>>
>> Besides, due to the domain closure assumption, the absence of a
>> proposition
>> containing a person's last name indicates that the person doesn't have a
>> last name.
>
> I only know of its use refererring to domains, where if a value is not
> contained within a enumerated domain, it is deigned not to exist. I
> don't see what relevance this has to propositions - their values do
> not make up a domain. In fact quite the opposite - they take values /
> from/ domains.
>
>> So if emplast only contains supplied last names, then the
>> meaning of last in emplast is different from the meaning of last in
>> employee, and thus the join of the relations emplast, empfirst, empssn,
>> emppayrate no longer means the same thing as the original relation.
>
> I don't see how:
> 1) emp#(a) -> empfirst(b) ^ emplast(c)
> differs from:
> 2) emp#(a) -> empfirst(b)
> 3) emp#(a) -> emplast(c)
>
> Are we not agreed these are mathematically equivalent? If empfirst and
> emplast were nullable, then line 1 was an error. If they were
> "not_nullable" (and 6NF was not necessary) then the 6NF database
> requires a constraint to encode lines 2 and 3, but so what? No
> problem there.
>

I'm not sure what you're trying to say.

There is "Equality and Domain Closure in First-Order Databases" by Raymond
Reiter which was published in the JACM in April 1980. An application of it
can be seen in "Logic and Databases: A Deductive Approach," by Gallaire,
Minker and Nicolas, also available in the ACM Digital Library. Another
application can be found in "Maintaining State Constraints in Relational
Databases: A Proof Theoretic Basis," by McCune and Henschen, again available
online in the ACM Digital Library.

Sorry for the delay in responding. Your response caused me to question my
own use of it and also my argument. The latter article, by McCune and
Henschen, employs different domain closure axioms for the database values
immediately preceding and succeeding an insert, which supports my usage.

JOG

unread,
Aug 21, 2007, 10:29:03 AM8/21/07
to

Ta. I will check the references out. It is not usage I have come
across before, and it seemed instinctively obtuse - but I will quell
that thought prior to further reading.

Brian Selzer

unread,
Aug 21, 2007, 11:46:13 AM8/21/07
to

"Jan Hidders" <hid...@gmail.com> wrote in message
news:1187699487.1...@g4g2000hsf.googlegroups.com...

Thank you for your attention. I enjoyed the discussion and learned several
things I hadn't put together before. I think that the reason I'm having
trouble convincing you is because I really didn't clearly state my claim to
begin with. I never really came up with a clear definition of what it means
for a schema to have /at least as much/ information, nor did I provide any
proof that if a relation is in 5NF, then the relationships between the
dependent attributes are due to the fact that the relation is in 5NF, or
that the presence of a pathological relationship between the dependent
attributes indicates that the relation isn't in 5NF. One interesting
consequence of the discussion that I think it should be possible to prove is
that every update anomaly can be linked to the presence of a pathological
many-to-many relationship between attributes. I'll leave that until a later
date. Again, thank you.


> -- Jan Hidders
>


David Cressey

unread,
Aug 21, 2007, 1:58:24 PM8/21/07
to

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:9NDyi.28722$RX....@newssvr11.news.prodigy.net...

> begin with. I never really came up with a clear definition of what it
means
> for a schema to have /at least as much/ information, nor did I provide any
> proof that if a relation is in 5NF, then the relationships between the
> dependent attributes are due to the fact that the relation is in 5NF, or
> that the presence of a pathological relationship between the dependent
> attributes indicates that the relation isn't in 5NF.

Part of the problem may be that the phrase "at least as much information"
suggests some sort of measure of information, but not the information
itself. For example, if you have "at least as much money in the bank as I
have", it doesn't mean that you have the same money as I do in the bank.

I think what you may have meant might be better conveyed by a phrase like
"at least all the same information as". But I'm not sure what you did mean,
so this is just a guess.


Brian Selzer

unread,
Aug 24, 2007, 9:15:02 PM8/24/07
to

"David Cressey" <cres...@verizon.net> wrote in message
news:4JFyi.3591$wr3.2580@trndny04...

What I was trying to convey by the phrase /at least as much/ information is
that the only additional information that should ever appear in an instance
of the more normalized database schema is exactly that information that
should be allowed but can't be due to the structure of the less normalized
database schema. For example, if the FDs A --> B and B --> C hold in a
relation schema {A, B, C}, then it is not possible to insert values for B
and C without also inserting a value for A. If it should be possible, then
there is a structural problem which is due to the fact that the MVD B ->-> A
| C that holds in {A, B, C} is pathological. The problem is that while the
decomposition into {A, B} and {B, C} makes it possible to insert values for
B and C without also requiring a value for A, it also permits values to be
inserted for A and B without also requiring a value for C. The inability to
insert a value for A without also inserting a value for C is due to the fact
that the FD A --> C holds in {A, B, C}. Even though it is true that the
FD A --> C implies the MVD B ->-> A | C, it is only due to the fact that B
appears in the relation schema that the MVD holds. Therefore, in order to
maintain the functional relationship from A to C, it is necessary to add the
IND {A,B}[B] in {B,C}[B].
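
To make the insert scenario concrete, here is a quick toy sketch (my own,
with invented values) of the two projections {A,B} and {B,C}:

AB = {("a1", "b1")}            # projection {A, B}
BC = {("b1", "c1")}            # projection {B, C}

def a_to_c_preserved(ab, bc):
    """The IND {A,B}[B] in {B,C}[B] that keeps A --> C functional."""
    return {b for (_, b) in ab} <= {b for (b, _) in bc}

# Inserting values for B and C without a value for A is now possible,
# which is exactly what the decomposition was meant to allow:
BC.add(("b2", "c2"))
print(a_to_c_preserved(AB, BC))   # True

# But inserting values for A and B without any value for C violates it:
AB.add(("a2", "b3"))
print(a_to_c_preserved(AB, BC))   # False: a2 no longer reaches any C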


Brian Selzer

unread,
Aug 27, 2007, 12:30:18 PM8/27/07
to

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:qoLzi.21288$eY....@newssvr13.news.prodigy.net...

>
> "David Cressey" <cres...@verizon.net> wrote in message
> news:4JFyi.3591$wr3.2580@trndny04...
>>
> What I was trying to convey by the phrase /at least as much/ information
> is that the only additional information that should ever appear in an
> instance of the more normalized database schema is exactly that
> information that should be allowed but can't be due to the structure of
> the less normalized database schema. For example, if the FDs A --> B and
> B --> C hold in a relation schema {A, B, C}, then it is not possible to
> insert values for B and C without also inserting a value for A. If it
> should be possible, then there is a structural problem which is due to the
> fact that the MVD B ->-> A | C that holds in {A, B, C} is pathological.
> The problem is that while the decomposition into {A, B} and {B, C} makes
> it possible to insert values for B and C without also requiring a value
> for A, it also permits values to be inserted for A and B without also
> requiring a value for C. The inability to insert a value for A without
> also inserting a value for C is due to the fact that the FD A --> C holds
> in {A, B, C}. Even though it is true that the
> FD A --> C implies the MVD B ->-> A | C, it is only due to the fact that B
> appears in the relation schema that the MVD holds.

Correction: A --> C does not imply B ->-> A | C; B --> C does. But the
MVD B ->-> A | C and the FD A --> C cannot both hold unless at least one
of the FDs B --> A or B --> C also holds.

jmlet...@gmail.com

unread,
Jun 5, 2013, 5:52:49 AM6/5/13
to
and there is only one

On Monday, July 30, 2007 6:45:00 AM UTC-4, Sameeksha wrote:
> Googling out for definition and explanation for sixth normal form only
> resulted in the following information - "6th normal form states that a
> relation R should not contain any non-trivial join dependencies". Also
> everywhere it is stated that this normal form takes into account the
> temporal (time) dimension to the relational model, and that current
> implementations like SQL server 2005 do not implement this normal
> form.
>
> Any more explanation and preferably an example would help in
> understanding the concept behind this normal form.

vldm10

unread,
Jun 12, 2013, 10:47:20 AM6/12/13
to
On Wednesday, June 5, 2013 at 11:52:49 AM UTC+2, jmlet...@gmail.com wrote:
> and there is only one On Monday, July 30, 2007 6:45:00 AM UTC-4, Sameeksha wrote:


I think it is good to understand that “6NF” has no theoretical significance. The definition of “6NF” does not provide a procedure that will bring a relvar into “6NF”.
The authors of “6NF” understand that 6NF is important, but they do not understand that “6NF” is not a matter of functional and join dependencies.
The decomposition of a structure into binary (atomic) structures is what is important here. Atomic structures imply many important consequences, not only for databases, but also for a range of other sciences.
For example, we can directly determine the truth value of atomic sentences. However, if a sentence is complex we need an additional tool for determining its truth value, that is, we need truth tables. Note that truth tables are not proved in mathematics, and mathematics is not an intuitive science.
There is also the issue of atomic structures in mathematics, semantics, linguistics, cognitive science and philosophy, as well as their construction. Note that relations have corresponding predicates.

At this user group, I gave an example of a relation whose attributes are mutually independent. In this case we have an “all key” relvar. This example shows that “6NF” is absurd, because there are relations that are in “6NF” but whose keys are composed of a large number of attributes; theoretically, the number of these attributes can be any finite number.
“Simple Form”, which I present below in this post, can decompose mutually independent attributes. In fact, entities with mutually independent attributes indicate a good construction of these entities. Any restriction on the mutually independent attributes, including a functional dependency, is an additional requirement which needs to be considered separately.

In my paper "Some ideas about a new data model" at http://www.dbdesign10.com it was shown the procedure of the decomposition of data structures into the corresponding binary structures. (Presented in2005).
In my paper, "Database design and data model founded on concept and knowledge constructs" at this http://www.dbdesign11.com I have introduced some other generalizations regarding the mentioned decomposition, see section 4.2. 9 (Presented in 2008).

In my paper, "Database design and data model founded on concept and knowledge constructs" at this http://www.dbdesign11.com I have introduced some other generalizations regarding the mentioned decomposition, see section 4.2. 9 (Presented in 2008).
In my papers from 2008 and 2012, I presented the decomposition of a file into the corresponding binary files, as well as the decomposition of concepts into the corresponding binary concepts at the ER level (see http://www.dbdesign11.com).

In 2006 I presented “Simple Form”; see section 4 at http://www.dbdesign10.com :
Relation schema R(K, A1, A2, …, An) is in Simple Form if R satisfies:
R(K, A1, A2, …, An) = R1(K, A1) join R2(K, A2) join … join Rn(K, An)
if and only if
1. the key K is simple, and
2. A1, A2, …, An are mutually independent.
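
A quick toy sketch (my own, in Python) of the join side of the definition;
it only checks that the binary projections join back to R when K is a
simple key, and makes no attempt to check mutual independence:

# R(K, A1, A2) with simple key K, split into R1(K, A1) and R2(K, A2).
R = {(1, "x", 10), (2, "y", 20)}
R1 = {(k, a1) for (k, a1, a2) in R}
R2 = {(k, a2) for (k, a1, a2) in R}

# Natural join of the binary projections on K.
rejoined = {(k, a1, a2) for (k, a1) in R1 for (k2, a2) in R2 if k == k2}
print(rejoined == R)   # True: R = R1 join R2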

In fact, in the current theory of database design it is not precisely defined what the first steps are. Simple Form defines a first step in database design. Simple Form is introduced for one reason only: it determines and constructs the attributes and the simple key for entities. Mutually independent attributes in the RM correspond to the intrinsic attributes at the ER level. Note that I treat any limitation on the mutually independent attributes as a constraint.
In contrast to so-called “6NF”, Simple Form completely determines the construction of entities. Note that “6NF” requires that we first work through the following NFs: 1NF, 2NF, 3NF, BCNF, 4NF, ETNF, RFNF, SKNF and 5NF.

Note that Simple Form enables the introduction of constraints only after the attributes and binary structures have been determined. So I separate the design of attributes from the design of constraints.
Therefore the simple key is represented in the abstract object, i.e. the key is represented in the database. This also demonstrates that the key is one thing, i.e. one wholeness.

Lately, papers have appeared that are based entirely on 6NF. For example, the following:

1.
“Anchor Modeling An agile Modeling Technique using the Sixth Normal Form for Structurally and Temporally Evolving Data”

On this user group I showed that this paper is wrong in many respects; see my thread "some information about anchor modeling." For example, in my post in that thread from April 1, 2013, I pointed out 11 major errors in the Anchor Modeling paper.

2.
“Translating data between XML schema and 6nf conceptual models”. This paper also has 6NF in the title.

This paper discusses Anchor Modeling in detail; I think it is a master's thesis.

3.
“Database design & relational theory Normal Forms & all that jazz”.

This book is an interesting case, because it was written by C. Date, who is a co-author of "6NF".
C. Date writes about "anchors" as a well-known fact in db theory. However, he wonders whether the anchor recalls Codd's RM/T paper: “…Does this state of affairs remind you of the RM/T discipline…” (see page 211). Note that “anchor” is the main construct in Anchor Modeling and is defined as the surrogate key; see reference [19].

4.
“Modern approaches to data warehouse design”, Tine Borovnik, master's thesis, University of Ljubljana, Slovenia, 03.09.2010.
Section 4.2 is devoted to Anchor Modeling.
---------------------
As I wrote above, Anchor Modeling has many errors and incorrect solutions.
So all of these papers are based on a theory that is wrong. In addition, these works use "6NF" as their foundation.

So, as a conclusion of my post, you can accept my claims, which are clearly outlined in my thread "some information about anchor modeling." Note that, so far, no one has refuted these claims. In that case, the above-mentioned four papers may cause some kind of chaos in the field of database theory. I guess there are many more papers based on "6NF"; here I have listed the ones that I saw.

On the other hand, you or someone else may claim on this user group that my claims are false. Of course, in that case the claims must be justified.




Vladimir Odrljin