
On Normalisation & the State of Normalisation

Derek Asirvadem

Feb 2, 2015, 9:11:05 PM
New Thread Normalisation

Dear People

Normalisation has come up in three threads recently, in an indirect but important manner (isn't Normalisation /always/ important?). Let's handle it as a stand-alone subject that the other threads can refer to.

Let's exercise the following context, pretend we have the following roles, so that we can progress through the discussion quickly:

- I am the nominated Database Administrator. The database exists and developers come to me, due to users going to them with user requirements, that cause extensions to the database. The database is 100% Relational, OLTP and DSS/OLAP from the one set of tables, etc. It is precious to the company and their bottom line depends on the quality of the data in the database. The company hired me because I am strictly Relational, strictly Codd, and the auditors signed off the database when I implemented it years ago. Therefore I am the policeman, and nothing goes into the database unless I approve it, and if the data quality is damaged, I will lose my job (not the developer who gave the extension).
- You guys are the nominated theoreticians in the nominally Relational Database space, as the entire Relational Database world knows it. (You may have different "definitions" and "relational" might mean different things to each of you, and that is part of what I expect to have teased out of this.)
- The above is important because the developers are very skilled, they read your books and articles (not your theoretical papers). They know a fair amount of Date, Darwen, Celko, etc. Some of them may have read Pascal. They are now excited about the Alice book, because it is so heavily marketed, but they have difficulty with it. They are excellent SQL coders within the scope that I have given them (and fairly poor outside that). Of course, it is 2015, and none of us actually write code, we all have different IDEs for our tasks.
- So when the developers come to me with a proposed extension, they are relying on your books and articles that they have read.
- And finally, since the marvellous internet exists, and google provides c.d.t, I have the opportunity, instead of hammering them for any mistake or substandard proposal, to go to you, the authors, directly, and ask, why do they have this concept, how do you construct or justify this concept in your books. Specifically, re Normalisation, why something does or does not satisfy some NF or other.

In this manner, I expect to (a) emulate the very real, real world problems that happen in thousands or hundreds of thousands of sites, and (b) minimise the discussion re different definitions, etc.

As far as I am concerned, Normalisation is a fixed science, not merely a bag of NF definitions, in fact way beyond that bag, that can be applied easily and directly by an undergrad after a one-semester course. But the reality is, anyone and everyone who attempts Normalisation as it is variously known and taught, struggles with it. And they need three semesters.

I laugh at the crackpot NFs that keep coming up. I deliver 1NF, 2NF, 3NF, and DKNF, in a form way, way, beyond the definitions, and I declare that my database will satisfy any NF definition that the theoreticians may come up with in the future. I do this by
- relying on the fullness of the RM,
- and Normalisation as a science (not the bag of NFs).
When 4NF and 5NF were published, my declaration was proved to be true, I earned $10,000 for signing a form that stated the db that was previously declared and proven to be 3NF, now "satisfies 5NF", without doing a single piece of work, without executing a single command on the SQL platform. Of course, I bought the entire team lunch, and the auditors dinner.

I also deliver *data quality* that is way, way beyond the fragments of constraints that you guys are aware of, also by relying on the fullness of the RM. But you guys have demonstrated that your notion of the RM is fractional, so there is no point in dealing with that, we have to deal with getting you up to the full RM first. Therefore full data quality, constraints beyond DKNF (5NF if you don't like DKNF), the full RM, are beyond the scope of this thread, let's limit this to:
____Normalisation____
____Normalisation wrt to the RM____
and get some clarity and agreement amongst you people, so that the confusion in the real world is reduced. We might end up performing a service to the Relational Database community.

In other threads, I have laid the responsibility, and therefore the blame, for the abject failure of Normalisation to be understood and used in the real world, at your feet, as the theoreticians in our field. That is to say, the developers are relying on the theory as you theoreticians have delivered it. (Sure, there is a lot of garbage on the internet, but I am limiting the scope of this to formal education, which uses your textbooks.)

I am not expecting this thread to be an exposition of my work, the purpose is to clarify and crystallise Normalisation (as a set of NFs or whatever) and reduce the madness that is going on in its absence. Forty-five years of confusion and differences in "definitions" is just about enough, don't you think ? That should be enough time for us to come up with a coherent set of definitions. I.e. I expect the content to be delivered by you. Let's try and complete this task this year.

I will say, I agree with most of what James expressed in his instructions to Ruben in the Needs Tutoring thread, but there are some key differences. I say this because it appears most of you disagree with most if not all of what James stated.

----

Ok. Here is the latest extension that a developer proposed to me, to be added to the RDb.

http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20A.pdf

We need a set of tables to rationalise and consolidate all Addresses. The company sells their gear on the internet as well as Australia, the addresses have to be international. This is what the developer says is the cluster of all address type data that will be used as reference, in order for the data quality in the single Address table to be maintained. The CountryCode and StateCode are ISO 3166-1 and 3166-2, and the CountyCode is ANSI/FIPS, or something meaningful outside America (I already have that loaded elsewhere).

Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.

1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF it breaks, any errors that it may have, etc. A few words will suffice. I expect minimal discussion, but don't let me stop you.

2. The developer is excited because he has read the C J Date and R Fagin paper *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys are "simple". He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.

3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.

Cheers
Derek

Derek Asirvadem

Feb 3, 2015, 1:24:43 AM
Clarification

> On Tuesday, 3 February 2015 13:11:05 UTC+11, Derek Asirvadem wrote:

> Therefore full data quality, constraints beyond DKNF (5NF if you don't like DKNF), the full RM, are beyond the scope of this thread, let's limit this to:
> ____Normalisation____
> ____Normalisation wrt to the RM____

That first line should be:
____Normalisation as a science____
I am so used to it being a science, not a bag of nuffs, I forgot to state it in every case.

> Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.

In case it needs to be said, I am not trying to control the way you think. If you perceive the order of relevance/dismissal in the question as different to mine, go right ahead with your sequence.

2.1 The developer was also relying on a paperweight (shorthand for weighty paper) named ETNF. I asked him to explain it in three sentences or less. After half an hour of listening patiently to his monologue, I had to ask him to vacate the office due to my next appointment standing outside. But I did gather that he had a novel method of guaranteeing uniqueness, which I found absurd, because uniqueness has never been a problem that needed to be solved.

Cheers
Derek

Erwin

Feb 3, 2015, 3:13:01 AM
"The developer declares that the proposed extension satisfies 5NF. Is that correct ?"

(a) as is usually the case when unskilled people present a drawing of boxes and ask whether this satisfies xNF, the question isn't answerable because the dependencies haven't been stated.

(b) Assuming certain external predicates for the relvars that would at the logical level correspond to the boxes in your conceptual model, and also assuming a logical structure for the database that is kinda one-to-one with your drawing of boxes and lines, and also assuming the dependencies are as real life has them, in Belgium, the answer is no.

Derek Asirvadem

Feb 3, 2015, 5:05:02 AM
???

a. There is a legend at the bottom of the page. Please read. If the dependencies are still not clear, please post again, and I will spell out all the little squiggles and notches.

b.1. As stated, and as you seem to understand, it is the developer's model, not mine. If I erect a model like that Heaven will strike me dead !!!

b.2. It is not a conceptual model.

b.3. It is a logical model, minus the datatypes, which are irrelevant to the purpose of this thread.

b.4. No idea why you think the boxes are "kinda" one-for-one. The point of a model, at the logical level is one-for-one, at the logical level. When moving to the physical, which, again, is not relevant to this thread "Normalisation" of data, please be advised that the only change to the one-for-one is that Associative Tables are added to resolve each n:m relationship. There are none such in this model, so it is one-for-one at the physical level as well.

b.5. "Relvars" scare, and confuse, developers. I had previously assured him that they are nothing to be concerned about, because it is a trick that theoreticians use to avoid definition in the physical universe, so he did not use it in our conversations. Neither he nor I have seen "relvars" mentioned in the texts defining the normal forms or Normalisation. Neither he nor I give a rats about them. If your point is relevant to Normalisation, please explain what you mean in those terms.

b.6. The predicates are in the dependencies, _read_ as a model is _read_. Refer my response (a). If you would like assistance reading the model, let me know. It is IDEF1X, the standard for modelling Relational Databases, that we have been using in the implementation universe, since 1985. A pleasant coincidence: the best product that provides it, is called ERwin (Entity Relation modelling for win-doze), your namesake! But of course, anyone with a drawing program can construct a good diagram, they will be understood as long as they use the standard notation. Only in the implementation universe, of course.

b.7. The existential predicates are usually not stated nowadays, I am happy to provide them for you at your request. I was hoping to focus on the Normalisation issues.

b.7. He apologises for the relation names instead of proper Verb Phrases, but I thought they were pretty obvious, and allowed him to get away with it. I shall demand a correction in his next iteration.

I would think, once you have been able to _read_ the model; to identify the dependencies conveyed therein; and thus notice the predicates are explicit, your categorical response might change. Therefore I will not address that at this stage.

Cheers
Derek

Derek Asirvadem

Feb 3, 2015, 6:33:32 AM
> On Tuesday, 3 February 2015 21:05:02 UTC+11, Derek Asirvadem wrote:
>
> b.7. The existential predicates are usually not stated nowadays, I am happy to provide them for you at your request. I was hoping to focus on the Normalisation issues.
>
> b.7. He apologises for the relation names instead of proper Verb Phrases, but I thought they were pretty obvious, and allowed him to get away with it. I shall demand a correction in his next iteration.

Done.

Cheers
Derek

Derek Asirvadem

Feb 3, 2015, 6:49:04 AM
> On Tuesday, 3 February 2015 21:05:02 UTC+11, Derek Asirvadem wrote:
> On Tuesday, 3 February 2015 19:13:01 UTC+11, Erwin wrote:
>
> b.7. The existential predicates are usually not stated nowadays, I am happy to provide them for you at your request. I was hoping to focus on the Normalisation issues.

b.9. Descriptive predicates are also not usually given these days, because they can be _read_ from the model:
____County is described by ( StateId, CountyCode, Name )
However, if that is proving difficult, please ask, and I will get him to provide. It only takes a few seconds, and I always say, do anything and everything in order to be understood.

b.10 We produce State Transition and Transaction Sequence diagrams, as standard practice. I would think those are out-of-scope re Normalisation. But if it helps, if it increases understanding of the database, of which the model is merely one rendition, eg. what operations are performed on those predicates, and under what conditions do those operations perform their function, please ask. Seconds away.

b.11 All updates to the database are performed via OLTP standard-compliant Transactions. Therefore View Updating and similar crimes against nature are a non-issue.

Cheers
Derek

Erwin

Feb 3, 2015, 7:16:26 AM
Op dinsdag 3 februari 2015 11:05:02 UTC+1 schreef Derek Asirvadem:
>
> ???
>
> b.1. As stated, and as you seem to understand, it is the developer's model, not mine.

Why are you saying this ? The subject of the thread is "normalization". Do you think whose model it is makes a difference as to "which normal form this is in" ? And if you are referring to my usage of the word "unskilled", I meant by that that anyone who understands what normalization really is, wouldn't be asking this question, because any such person would know that one of the following two is true :

- either all applicable dependencies are explicitly stated, and then this person would presumably know how to assess the NF level.
- or else it is assumed that the only applicable dependencies are those that are implied by the boxes and the key specifications they might hold, but in that case such a person would know that _every_ model of that nature satisfies 5NF trivially and by definition.

>
> b..2. It is not a conceptual model.

It is a conceptual model, and you even know the reason why. You stated it in b.3 immediately after.



>
> b.3. It is a logical model, minus the datatypes, which are irrelevant to the purpose of this thread.

a logical model is a logical model only if it gives a _full_ account of the details of every relvar involved. Meaning : anything that is "minus the datatypes" is not a logical model precisely because of that.

(I agree that the precise definition of the attributes datatype are orthogonal to NF questions/issues, but that brings me to my next point.)

A logical model is also a logical model only if it gives a _full_ account of all the constraints that apply to the relvars involved. Meaning in particular : a _full_ account of all the functional dependencies _including those_ that apply to the relation schema you get when, say, you join Street with Suburb and Town. For example, the constraint that corresponds to the FD TownID -> Postcode that might apply to [the schema of] that join.



> b.5. "Relvars" scare, and confuse, developers. I had previously assured him that they are nothing to be concerned about, because it is a trick that theoreticians use to avoid definition in the physical universe, so he did not use it in our conversations. Neither he nor I have seen "relvars" mentioned in the texts defining the normal forms or Normalisation. Neither he nor I give a rats about them. If your point is relevant to Normalisation, please explain what you mean in those terms.

A relvar, as far as normalization theory is concerned, is a [possibly named] (S,D) pair where S is the relation schema for that relvar and D is the set of dependencies that apply to it. The members of D can be functional or join dependencies, and they must be termed exclusively in attributes of the schema of the relvar. I'm not considering other kinds of dependencies because I'm not certain to which extent there even exists a [sufficiently agreed-upon] "normalization theory" that incorporates/embraces them.

Anyway, what I was getting to is that had the author of the schema started off with a single relation schema for the _entire_ model (which is how the normalization procedure really goes), he might have noticed that in that schema, there was an FD that applied to the effect that TownID -> PostCode. The decomposition he [implicitly] applied has lost the expressibility of that FD, because there no longer is a relvar [with corresponding relation schema] in which both attributes appear. But the fact that the expressibility of an FD has been lost, does not mean that the FD itself no longer holds.
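
To make the lost FD concrete, here is a minimal sketch in Python with SQLite. The table layout is an assumption that is merely consistent with the description above (PostCode on Street, TownId on Suburb); it is not the developer's actual model, and `fd_violations` is a hypothetical helper name. It shows that key and foreign key constraints alone accept rows that violate TownId -> PostCode, and that the violation only becomes visible on the join.

```python
import sqlite3

# Assumed layout (hypothetical): PostCode sits on Street and TownId on Suburb,
# so no single table holds both attributes of the FD TownId -> PostCode.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Town   (TownId   INTEGER PRIMARY KEY,
                         Name     TEXT NOT NULL);
    CREATE TABLE Suburb (SuburbId INTEGER PRIMARY KEY,
                         TownId   INTEGER NOT NULL REFERENCES Town(TownId),
                         Name     TEXT NOT NULL);
    CREATE TABLE Street (StreetId INTEGER PRIMARY KEY,
                         SuburbId INTEGER NOT NULL REFERENCES Suburb(SuburbId),
                         Name     TEXT NOT NULL,
                         PostCode TEXT NOT NULL);

    -- Every key and foreign key constraint is satisfied by these rows,
    -- yet one town ends up with two different postcodes.
    INSERT INTO Town   VALUES (1, 'Sampletown');
    INSERT INTO Suburb VALUES (10, 1, 'North');
    INSERT INTO Suburb VALUES (11, 1, 'South');
    INSERT INTO Street VALUES (100, 10, 'High', '1000');
    INSERT INTO Street VALUES (101, 11, 'Low',  '2000');
""")


def fd_violations(connection):
    """Return (TownId, count of distinct PostCodes) for towns breaking the FD."""
    return connection.execute("""
        SELECT su.TownId, COUNT(DISTINCT st.PostCode)
        FROM Street st
        JOIN Suburb su ON su.SuburbId = st.SuburbId
        GROUP BY su.TownId
        HAVING COUNT(DISTINCT st.PostCode) > 1
    """).fetchall()


violations = fd_violations(conn)
print(violations)
```

The inserts succeed because nothing declared on the individual tables mentions the FD; only a check phrased over the join can see it.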

It is extremely rare for there to be database schemas in which there is not a single constraint that isn't a key constraint (attribute, tuple and foreign key constraints notwithstanding, of course).

(if by "physical universe", you meant "all the stuff that comes to be important once you start out with the details of physical database implementation", then no, relvars are not a concept that belongs to the physical universe.)



> b.7. The existential predicates are usually not stated nowadays, I am happy to provide them for you at your request. I was hoping to focus on the Normalisation issues.

You might have meant "external predicates". That they are not usually stated is precisely the problem. They are often quite helpful in identifying dependencies, read constraints.



> I would think, once you have been able to _read_ the model; to identify the dependencies conveyed therein; and thus notice the predicates are explicit, your categorical response might change.

It stays exactly as is.

com...@hotmail.com

Feb 3, 2015, 7:44:52 AM
On Tuesday, February 3, 2015 at 4:16:26 AM UTC-8, Erwin wrote:
> (which is how the normalization procedure really goes)

Interestingly for this application: An envelope gets to a particular address. Ids do not appear on an envelope. So after joining a correct set of tables and projecting out ids there must be a superkey among the remaining columns. The FKs correspond to FDs in the join. So there is a superkey from among Suburb Name, StreetName Name, StreetType Name and the non-id attributes of Address.
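
A small sketch of that argument in Python, under stated assumptions: the column names (`SuburbName`, `StreetName`, `StreetTypeName`, `HouseNo`) and the sample rows are hypothetical, standing in for the join of the diagram's tables. The check can only refute a superkey on sample data, never prove one.

```python
def project(rows, columns):
    """Project a list of dict rows onto `columns`, as a set of tuples."""
    return {tuple(row[c] for c in columns) for row in rows}


def is_superkey(rows, columns):
    """True if no two rows of this sample agree on all of `columns`."""
    return len(project(rows, columns)) == len(rows)


# Hypothetical result of joining the address tables: two rows that differ
# only in their ids, which the id-based keys happily allow.
joined = [
    {"AddressId": 1, "StreetId": 100, "SuburbName": "North",
     "StreetName": "High", "StreetTypeName": "Street", "HouseNo": "12"},
    {"AddressId": 2, "StreetId": 101, "SuburbName": "North",
     "StreetName": "High", "StreetTypeName": "Street", "HouseNo": "12"},
]

envelope_columns = ["SuburbName", "StreetName", "StreetTypeName", "HouseNo"]

ids_identify = is_superkey(joined, ["AddressId"])
envelope_identifies = is_superkey(joined, envelope_columns)
print(ids_identify, envelope_identifies)
```

In this sample the ids are a superkey but the envelope columns are not, which is exactly the state of affairs the application cannot tolerate and the id-only keys do not rule out.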

philip

Erwin

Feb 3, 2015, 8:23:17 AM
Op dinsdag 3 februari 2015 13:44:52 UTC+1 schreef com...@hotmail.com:
Your point being ?

Knowing of the existence of that superkey will not help in identifying the original FD, which is where the real issue is.

Derek Asirvadem

Feb 3, 2015, 8:51:26 AM
> On Tuesday, 3 February 2015 23:16:26 UTC+11, Erwin wrote:
> Op dinsdag 3 februari 2015 11:05:02 UTC+1 schreef Derek Asirvadem:
> >
> > I would think, once you have been able to _read_ the model; ...

> Why are you saying this ?

I think, once you have been able to _read_ the thread ...

I might stop ignoring your posts

Cheers
Derek

com...@hotmail.com

Feb 3, 2015, 8:59:04 AM
On Tuesday, February 3, 2015 at 5:23:17 AM UTC-8, Erwin wrote:
> Knowing of the existence of that superkey will not help in identifying the original FD, which is where the real issue is.

I just find it amusing that the particular application plus no interpretation of tables tells us that there is a universal relation CK that this diagram fails to express.

philip

Derek Asirvadem

Feb 3, 2015, 9:09:03 AM
Thank you, Philip

> On Tuesday, 3 February 2015 23:44:52 UTC+11, com...@hotmail.com wrote:
> On Tuesday, February 3, 2015 at 4:16:26 AM UTC-8, Erwin wrote:
> > (which is how the normalization procedure really goes)
>
> Interestingly for this application: An envelope gets to a particular address. Ids do not appear on an envelope.

Yes.

> So after joining a correct set of tables and projecting out ids there must be a superkey among the remaining columns.

I understand where you are going with that. Both the auditors, and the metropolitan district superior of police prohibit "superkeys", on the basis that it causes human beings to think like apes, and thus is an evil against society, condemned, etc. I completely agree.

AFAIC, the first three NFs are rock-solid. If taken to the fullness of the RM, then we do not need BCNF, 4NF, 5NF, they are all "definitions" of 3NF, that patch up holes, that only saboteurs and terrorists find and exploit. This is precisely why, I state, how I deliver, full Normalisation that satisfies any and all NFs that have not been "defined" yet. That was one reason that I was hired.

There are perfectly ordinary methods, within Normalisation (the first three NFs, taken to the fullness of their meaning re the RM, or in your terms BCNF, 4NF, 5NF, DKNF, ETNF, NFNF, etc, and any human intuitive logic that you may have at your disposal) and Relational Modelling (the RM, the IDEF1X Methodology that flows from the RM) to provide what the users need, without the cancer of "superkeys", and the various beastly dances attached to it.

AFAIC, the model fails, for a number of reasons, not identified by the readership thus far.

There is no point in "making it work" as is, via "superkeys" or popsicle sticks. The faults must be corrected first, following that "superkeys", etc, will not be an option.

Cheers
Derek

com...@hotmail.com

Feb 3, 2015, 9:10:09 AM
On Tuesday, February 3, 2015 at 5:59:04 AM UTC-8, com...@hotmail.com wrote:
> I just find it amusing that the particular application plus no interpretation of tables tells us that there is a universal relation CK that this diagram fails to express.

But then that must be the case if a diagram's CKs only involve non-natural columns.

philip

Derek Asirvadem

Feb 3, 2015, 11:49:57 PM
> On Wednesday, 4 February 2015 01:09:03 UTC+11, Derek Asirvadem wrote:
>
> There is no point in "making it work" as is, via "superkeys" or popsicle sticks. The faults must be corrected first, following that "superkeys", etc, will not be an option.

What I find amazing is the kaleidoscope of alien patterns that people perceive when they don't have a clue what earthly patterns to look for. The task is to measure up the elephant and to determine that it is not deformed; not mutilated; if it is good enough to go into the circus. But people are finding one elephant tail hair here; a tree trunk there; a python in the front. Ok, that's pretty bad and I was going to leave. But to my absolute amazement, they have created, actually manifested, a very hirsute python with one hefty leg. Not only that, because they have a Complete Set of Tools for measuring such alien beasts, they can confirm that it is specifically the Venezuelan Eyeless variety and not the Mexican Hairless. Now the circus is much more interested in that, than an elephant that plays football.

Keep it coming, folks, this is good.

Please forgive me for providing drawings to the blind. It might seem like a sick prank, I really didn't mean to do it. I had no idea. I will see if I can get it printed in braille.

> AFAIC, the model fails, for a number of reasons, not identified by the readership thus far.

For those who are not distracted from their given task of measuring up an elephant, please keep the pressure up on that task.

Just to confirm, as per the detailed exchange with Jan, the company and I are using the established definition for 3NF, written in technical English, not the tiny fragmented non-definition written in gibberish or Swahili that they use to subvert it. We are well aware of the tricks that frauds use to represent deranged dwarves as able men, or a Venezuelan Eyeless as an elephant.

Cheers
Derek

Jan Hidders

Feb 4, 2015, 5:35:12 AM
Op dinsdag 3 februari 2015 03:11:05 UTC+1 schreef Derek Asirvadem:
> New Thread Normalisation
>
> [.. snip ..]
>
> ----
>
> Ok. Here is the latest extension that a developer proposed to me, to be added to the RDb.
>
> http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20A.pdf
>
> We need a set of tables to rationalise and consolidate all Addresses. The company sells their gear on the internet as well as Australia, the addresses have to be international. This is what the developer says is the cluster of all address type data that will be used as reference, in order for the data quality in the single Address table to be maintained. The CountryCode and StateCode are ISO 3166-1 and 3166-2, and the CountyCode is ANSI/FIPS, or something meaningful outside America (I already have that loaded elsewhere).
>
> Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.
>
> 1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF is breaks, any errors that it may have, etc. A few words will suffice. I expect minimal discussion, but don't let me stop you.

Yes, it is in 5NF. But it is not dependency preserving.
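
For readers who want to see how a design can be in a high normal form and still not be dependency preserving, here is a minimal sketch of the usual textbook test. The attributes, FDs and decomposition below are the classic Street/Town/PostCode example, chosen as an assumption for illustration; they are not taken from the developer's diagram, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, fds, schemas):
    """True if `fd` follows from the projections of `fds` onto `schemas`."""
    lhs, rhs = fd
    reachable = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What one table on its own lets us infer from what we already have.
            gained = closure(reachable & schema, fds) & schema
            if not gained <= reachable:
                reachable |= gained
                changed = True
    return rhs <= reachable


# Assumed FDs: (Town, Street) -> PostCode and PostCode -> Town.
FDS = [
    (frozenset({"Town", "Street"}), frozenset({"PostCode"})),
    (frozenset({"PostCode"}), frozenset({"Town"})),
]
# Assumed decomposition into two binary tables.
SCHEMAS = [
    frozenset({"Street", "PostCode"}),
    frozenset({"PostCode", "Town"}),
]

preserved = [is_preserved(fd, FDS, SCHEMAS) for fd in FDS]
print(preserved)
```

In this example the second FD survives the decomposition while the first does not: no single table contains Town, Street and PostCode together, so the constraint cannot be declared as a key of any one table.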

> 2. The developer is excited because he has read the C J Date and R Fagin paper *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys are "simple". He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.

His conclusion is correct, but he is basing it on wrong assumptions. Not all his candidate keys are simple.

> 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.

To me it is. But this might also have been the case if it was not in 5NF. Normalization theory does not tell you in which normal form you should be, just whether you are or not and what some of the consequences of that might be.

-- Jan Hidders

Derek Asirvadem

Feb 4, 2015, 10:15:51 PM
> On Wednesday, 4 February 2015 21:35:12 UTC+11, Jan Hidders wrote:

Thank you for breaking the ice.

> Op dinsdag 3 februari 2015 03:11:05 UTC+1 schreef Derek Asirvadem:
> >
> > Ok. Here is the latest extension that a developer proposed to me, to be added to the RDb.
> >
> > http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20A.pdf
> >
> > We need a set of tables to rationalise and consolidate all Addresses. The company sells their gear on the internet as well as Australia, the addresses have to be international. This is what the developer says is the cluster of all address type data that will be used as reference, in order for the data quality in the single Address table to be maintained. The CountryCode and StateCode are ISO 3166-1 and 3166-2, and the CountyCode is ANSI/FIPS, or something meaningful outside America (I already have that loaded elsewhere).
> >
> > Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.
> >
> > 1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF is breaks, any errors that it may have, etc. A few words will suffice. I expect minimal discussion, but don't let me stop you.
>
> Yes, it is in 5NF. But it is not dependency preserving.

I will take each of them separately

> Yes, it is in 5NF.

Ok. I think there is no disagreement that, if it "satisfies" 5NF, then it means it "satisfies" 4NF, 3NF, 2NF, 1NF. Yes ?

That is what the developer says, so you are confirming his declaration.

I don't know what to say.

First, from my position of
- Normalising data for 39 years, as a science
- following Normalisation after it was defined as NFs (Codd 1970, 1971), within that science, because those early declarations proved science and articulated it
- after seeing the putrid garbage that various blind people proposed, and dismissing them, as an attack on Codd, on science
- which act reinforced the science, the principle, and categorised the blind who populate certain parts of our field as gross incompetents
- I am left with The Three Normal Forms (not counting HNF and RNF, because you guys have not heard about that [which btw suggests that the blind are deaf as well] )
I declare:
____the proposed data model [A] fails 3NF____
and in general:
____the proposed data model [A] is not Normalised____

Further, since he intends this cluster to be implemented in a Relational Database:
____the proposed data model [A] fails Relational____

-- Fail --
The model fails for the following reasons (many instances of said reasons). The ordering of the issues is mine, in that if it isn't Relational, it is not worth bothering about the specifics of a Normalisation error. It fails *Relational* mandates on two counts, it is non-relational.

1 Definition for keys, from the RM: "a Key is made up from the data"
__ A surrogate (RecordId) is not made up from the data
__ There are no keys on the data

2 The RM demands unique rows (data)
__ A surrogate does not provide row (data) uniqueness

3 Re Normalisation, (if we consider that outside the Relational Model), it breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).
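
The uniqueness point in [1] and [2] can be sketched in a few lines of Python with SQLite. Both tables below are hypothetical illustrations (they are not copied from model [A]); `CountrySurrogate` has only a system-generated id as its key, while `CountryKeyed` places the key on the data itself, here the ISO 3166-1 code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate only: the sole key is an id that is not made up from the data.
    CREATE TABLE CountrySurrogate (
        CountryId   INTEGER PRIMARY KEY,
        CountryCode TEXT NOT NULL,
        Name        TEXT NOT NULL
    );
    -- Key on the data: the code identifies the row, the name is an alternate key.
    CREATE TABLE CountryKeyed (
        CountryCode TEXT NOT NULL PRIMARY KEY,
        Name        TEXT NOT NULL UNIQUE
    );
""")


def try_insert(sql, params):
    """Attempt an insert; report whether the declared constraints accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


row = ("AU", "Australia")
surrogate_results = [
    try_insert("INSERT INTO CountrySurrogate (CountryCode, Name) VALUES (?, ?)", row)
    for _ in range(2)
]
keyed_results = [
    try_insert("INSERT INTO CountryKeyed (CountryCode, Name) VALUES (?, ?)", row)
    for _ in range(2)
]
print(surrogate_results, keyed_results)
```

The surrogate-only table accepts the same data twice under two different ids, whereas the table keyed on the data rejects the second copy.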

This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.

None of you theoreticians identified any of that.

> But it is not dependency preserving.

Very good insight. Trusting that you do not have private definitions for "dependency" and "preserving". Give that man a purple cloak and a pointy hat!

But from where I sit, there you go again with self-contradiction. In the physical universe, where things are a bit more integrated than in the outer reaches of the galaxy, it is easy for me to Normalise, 3NF is part of, inseparable from Normalisation.

I can't see how any model can "satisfy" any NF, 5NF in this case, and *NOT* have the property of preserving dependencies. Hence the self-contradiction, that is of course unacceptable. Whatever you guys are drinking, it is causing you to:
- approve unnormalised data as normalised
- permit the absence of essential properties
- in the data that you say is "normalised"

Take the purple pointy hat back! Ok, keep the robe.

> > 2. The developer is excited because he has read the C J Date and R Fagin paper *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys are "simple". He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.
>
> His conclusion is correct, but he is basing it on wrong assumptions. Not all his candidate keys are simple.

That is very interesting. There are no "candidate keys" in the model. Big tick if you intuitively determined some, but they do not count. The only keys to deal with are those defined in his model. And you said you can read IDEF1X models, so I trust there is not going to be an argument there. All the keys he has identified are simple (unless you have changed the definition of "simple"). So he may well be right in that declaration.

But note that that paper is one of those that I consider an attack on science. Date and Fagin are eating pig poop, straight from the backside of a sow. Therefore anyone who relies on it, categorises himself as a monkey, setting you up, to pick your pocket. Gypsies, tramps, thieves. It is simply not possible to "guarantee" 5NF when he has broken 3NF.

Basically this paper elevates Record Filing systems, much like their other papers. ETNF is an assault on science as well as causing blindness.

-- Aside --

Let's stop this difference in terminology. You theoreticians cannot be serving an industry if you have different meanings for established definitions; you are serving something else. Now, since 1985, and as a standard since 1993, we do not have "candidate keys" in IDEF1X data models. In some early stage of modelling, when keys are being evaluated (but not decided), sure, they are Candidates. Then the election takes place, and one of those Candidates is chosen as Primary. After that, the candidates are no longer candidates, because the election is over; they are known as Alternate Keys.

You said you could /read/ IDEF1X. Alternate Keys are identified with [AK].

-- End Aside --

> > 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.
>
> To me it is. But this might also have been the case if it was not in 5NF. Normalization theory does not tell you in which normal form you should be, just whether you are or not and what some of the consequences of that might be.

I don't see how you would accept a model that fails your (not science's) 5NF, as normalised, but I won't argue, I will let that one pass.

To me it is completely unacceptable because it fails Normalisation and it does not comply with (fails) the Relational Model.

The real value in this thread is, how the hell you guys, allegedly theoreticians in the field of Relational Databases, pass off something as "normalised", when it isn't; how you pass off something as "relational" when it isn't. It stands as evidence that you guys are grossly ignorant of both Normalisation and the Relational Model.

The bottom line is, you pass off the unnormalised garbage in Record Filing Systems (anti-relation), as "normalised" and "relational". Through gross ignorance of the science that is established in our field. You justify and validate that 95% of the market that implements RFS, thinking it to be "relational". Due to the books that your mentors have written and marketed, which the implementers read as well.

Of course, I am not singling you out, Jan, you are the only one with enough courage to come out and exercise your theories with someone in the field. The others are too scared to leave the security of their isolation from the field that they allege to be serving, and to maintain their private definitions, and their convulsions (convoluted "logic") that justify them.

----

Ok, back to business. I informed the developer as to what was wrong, what was unacceptable with his model, and sent him off. Gratefully, it took far less time to communicate that to him, than it does to communicate that to you guys. He came back today, with his corrected version.

He still declares 5NF for sure, and Relational to the best of his understanding (which is based on books by the same authors that you read).

http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf

My corrections to his /previous/ model [A] are on page 2.

Please answer the questions in any order:

1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF it breaks, any errors that it may have, etc. A few words will suffice.

2. The developer refers to *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys /that apply/ are "simple", and he has added keys to correct his previous errors. He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.

3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.

Cheers
Derek

Jan Hidders

Feb 5, 2015, 3:55:26 AM
Op donderdag 5 februari 2015 04:15:51 UTC+1 schreef Derek Asirvadem:
> > On Wednesday, 4 February 2015 21:35:12 UTC+11, Jan Hidders wrote:
>
> Thank you for breaking the ice.
>
> > Op dinsdag 3 februari 2015 03:11:05 UTC+1 schreef Derek Asirvadem:
> > >
> > > Ok. Here is the latest extension that a developer proposed to me, to be added to the RDb.
> > >
> > > http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20A.pdf
> > >
> > > We need a set of tables to rationalise and consolidate all Addresses. The company sells their gear on the internet as well as Australia, the addresses have to be international. This is what the developer says is the cluster of all address type data that will be used as reference, in order for the data quality in the single Address table to be maintained. The CountryCode and StateCode are ISO 3166-1 and 3166-2, and the CountyCode is ANSI/FIPS, or something meaningful outside America (I already have that loaded elsewhere).
> > >
> > > Answer the questions in sequence, please. No point in going to [2] if you reject it at [1], etc.
> > >
> > > 1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF it breaks, any errors that it may have, etc. A few words will suffice. I expect minimal discussion, but don't let me stop you.
> >
> > Yes, it is in 5NF. But it is not dependency preserving.
>
> I will take each of them separately
>
> > Yes, it is in 5NF.
>
> Ok. I think there is no disagreement that, if it "satisfies" 5NF, then it means it "satisfies" 4NF, 3NF, 2NF, 1NF. Yes ?

As these normal forms are usually defined in normalization theory, yes, that is true by definition.

> That is what the developer says, so you are confirming his declaration.
>
> I don't know what to say.

Really? This is you not knowing what to say? :-)

> First, from my position of
> - Normalising data for 39 years, as a science

.. engineering

> - following Normalisation after it was defined as NFs (Codd 1970, 1971), within that science, because those early declarations proved science and articulated it

The math behind it is science, the rest engineering. Does not make it less valuable, but it is important to make the distinction.

> - after seeing the putrid garbage that various blind people proposed, and dismissing them, as an attack on Codd, on science
> - which act reinforced the science, the principle, and categorised the blind who populate certain parts of our field as gross incompetents
> - I am left with The Three Normal Forms (not counting HNF and RNF, because you guys have not heard about that [which btw suggests that the blind are deaf as well] )
> I declare:
> ____the proposed data model [A] fails 3NF____
> and in general:
> ____the proposed data model [A] is not Normalised____

For the private definitions of 3NF and Normalised that you seem to use and have not yet made explicit, this might all very well be true. Hard to say.

> The model fails for the following reasons (many instances of said reasons). The ordering of the issues is mine, in that if it isn't Relational, it is not worth bothering about the specifics of a Normalisation error. It fails *Relational* mandates on two counts, it is non-relational.
>
> 1 Definition for keys, from the RM: "a Key is made up from the data"
> __ A surrogate (RecordId) is not made up from the data
> __ There are no keys on the data
>
> 2 The RM demands unique rows (data)
> __ A surrogate does not provide row (data) uniqueness
>
> 3 Re Normalisation, (if we consider that outside the Relational Model), it breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).
>
> This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.
>
> None of you theoreticians identified any of that.

Because that is not what you asked. You asked if it was in 5NF, not if it was a well-designed relational schema. Being in 5NF is neither a necessary nor a sufficient condition for being well-designed, nor does normalization theory claim that.

> > But it is not dependency preserving.
>
> Very good insight. Trusting that you do not have private definitions for "dependency" and "preserving". Give that man a purple cloak and a pointy hat!

You don't know what "dependency preserving" means in the context of database normalization?

> I can't see how any model can "satisfy" any NF, 5NF in this case, and *NOT* have the property of preserving dependencies.

And you don't know in which NFs dependency preservation is achievable? This is text book stuff, Derek.
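
The textbook point can be illustrated with a small sketch. The relation below (Street, City, Zip) and its two functional dependencies are assumptions chosen for illustration, not taken from the developer's model. The relation is in 3NF but not in BCNF, and its lossless BCNF decomposition cannot enforce {Street, City} -> Zip without a join.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(fd, fds, decomposition):
    """True if `fd` follows from the projections of `fds` onto the schemas
    in `decomposition`, i.e. it can be checked without joining them."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

# Hypothetical dependencies: {Street, City} -> Zip and Zip -> City.
F = [
    (frozenset({"Street", "City"}), frozenset({"Zip"})),
    (frozenset({"Zip"}), frozenset({"City"})),
]

# BCNF decomposition of R(Street, City, Zip): R1(Zip, City), R2(Street, Zip).
bcnf = [{"Zip", "City"}, {"Street", "Zip"}]

# Dependencies that the decomposition can no longer enforce locally.
lost = [fd for fd in F if not preserved(fd, F, bcnf)]
```

Here `lost` contains only {Street, City} -> Zip, while Zip -> City survives in R1. Keeping the single 3NF relation preserves both dependencies at the cost of some redundancy.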

> > > 2. The developer is excited because he has read the C J Date and R Fagin paper *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys are "simple". He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.
> >
> > His conclusion is correct, but he is basing it on wrong assumptions. Not all his candidate keys are simple.
>
> That is very interesting. There are no "candidate keys" in the model.

Nope. That is a big omission and already sufficient to consider it badly designed.

> Big tick if you intuitively determined some, but they do not count. The only keys to deal with are those defined in his model.

Not if you're serious about data integrity and normalisation, and wondering if this is a well-designed schema.

> > > 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.
> >
> > To me it is. But this might also have been the case if it was not in 5NF. Normalization theory does not tell you in which normal form you should be, just whether you are or not and what some of the consequences of that might be.
>
> I don't see how you would accept a model that fails your (not science's) 5NF, as normalised, but I won't argue, I will let that one pass.

Please don't. It's at the core of the misunderstanding.

> ----
>
> Ok, back to business. I informed the developer as to what was wrong, what was unacceptable with his model, and sent him off. Gratefully, it took far less time to communicate that to him, than it does to communicate that to you guys. He came back today, with his corrected version.
>
> He still declares 5NF for sure, and Relational to the best of his understanding (which is based on books by the same authors that you read).
>
> http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf
>
> My corrections to his /previous/ model [A] are on page 2.
>
> Please answer the questions in any order:
>
> 1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF it breaks, any errors that it may have, etc. A few words will suffice.
>
> 2. The developer refers to *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys /that apply/ are "simple", and he has added keys to correct his previous errors. He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.
>
> 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.

He is still in 5NF. He still has surrogate identifiers, but I personally do not consider that a big problem as long as he has correctly represented all candidate keys, which at first sight I don't think he has done fully. The argumentation with which he tries to prove that he is in 5NF is still wrong. You have to consider all candidate keys, period. It's not clear why he would want to use that argument anyway, because there seem to be no MVDs and JDs other than those implied by the FDs, so if he can verify that the schema is in BCNF then he can infer that it is also in 5NF. Finally, it is also still not dependency preserving, which might be a reason to denormalise to 3NF. Whether that is a good idea or not depends on circumstances that were not specified.
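
To make the "all candidate keys" point concrete, here is a minimal sketch. The Date and Fagin condition (3NF plus every key simple implies 5NF) quantifies over every candidate key, not only the declared one. The State table, its attribute Name, and its dependencies below are hypothetical, chosen only to show a surrogate sitting next to a composite natural key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure covers the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys

# Hypothetical State table: surrogate StateId plus (CountryCode, StateCode).
schema = {"StateId", "CountryCode", "StateCode", "Name"}
fds = [
    (frozenset({"StateId"}), frozenset({"CountryCode", "StateCode", "Name"})),
    (frozenset({"CountryCode", "StateCode"}), frozenset({"StateId", "Name"})),
]

keys = candidate_keys(schema, fds)
# The "simple keys" premise holds only if every candidate key has one attribute.
all_simple = all(len(key) == 1 for key in keys)
```

Under these assumed dependencies the keys are {StateId} and {CountryCode, StateCode}, so `all_simple` is false and the simple-key argument does not apply, even though the declared surrogate key is simple.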

-- Jan Hidders

Erwin

Feb 5, 2015, 4:40:59 AM
Op donderdag 5 februari 2015 09:55:26 UTC+1 schreef Jan Hidders:
> Whether that is a good idea or not depends on circumstances that were not specified.

:-)

Why does that remind me of something ?

Derek Asirvadem

Feb 5, 2015, 6:24:55 AM
Jan

> On Thursday, 5 February 2015 19:55:26 UTC+11, Jan Hidders wrote:
> Op donderdag 5 februari 2015 04:15:51 UTC+1 schreef Derek Asirvadem:
> > > On Wednesday, 4 February 2015 21:35:12 UTC+11, Jan Hidders wrote:

Thanks for your response.

> > > Yes, it [A] is in 5NF.
> >
> > Ok. I think there is no disagreement that, if it "satisfies" 5NF, then it means it "satisfies" 4NF, 3NF, 2NF, 1NF. Yes ?
>
> As these normal forms are usually defined in normalization theory, yes, that is true by definition.

Good.

I think you know full well that one of the issues that will get exposed in this thread is the delta between two things. So let me enumerate them. Let me start by saying this delta should not exist; it doesn't exist in other industries, it exists only in the software industry (to a lesser degree) and in this field to an unacceptable degree. Theoreticians not being able to discuss the same science with the practitioners is an absurd situation.

So this:
> First, from my position of
> - Normalising data for 39 years, as a science
> - following Normalisation after it was defined as NFs (Codd 1970, 1971), within that science, because those early declarations proved science and articulated it
> - after seeing the putrid garbage that various blind people proposed, and dismissing them, as an attack on Codd, on science
> - which act reinforced the science, the principle, and categorised the blind who populate certain parts of our field as gross incompetents
> - I am left with The Three Normal Forms (not counting HNF and RNF, because you guys have not heard about that )

is expanded to (ie. without excluding the above):

I Normalisation
- Normalisation as a science, a principle
- Relational Model, to the fullness where Normalisation is concerned
- which includes 1NF, 2NF, 3NF,
___ as originally defined,
___ (i.e. the original 3NF, when taken to its full extent by someone who is not trying to subvert it, includes what you call BCNF, 4NF, 5NF)
___ (I view ETNF, NSNF, 6NF, SCGHNF, etc, as pig poop)
___ (I view NSNF as a gross error, that we have it in the RM)
- and to the fullness of the RM, which was the context of those definitions
- which therefore includes HNF and RNF
___ out of scope for theoreticians in this space

I.a But the most important thing is the result. Many implementers and I use this, with ease, and it prevents various errors, and produces good Relational tables. Eg model [A] fails, and specifics are given.
I.b Specifically, in addition to satisfying itself, this Normalisation "satisfies" any NF that you and your colleagues wrote, and will ever write in the future.
I.c It is not "private" it is defined everywhere, the RM, etc.

Then, while not denying the relevance of theory, science, and mathematics, to any field, but sadly noting the absence of producing anything of value in the implementations in this field ... you have ...

ii "Normalisation Theory"
- non-science
- 1NF and 2NF
- an assault on 1NF currently in progress
- a Fragment of 3NF, unfortunately you call it "3NF", which is fraud, since 3NF existed for decades before you came up with the fragment, and your fragment produces only that fraction of what 3NF produces
- BCNF, "4NF", "5NF", "ETNF", "NSNF", "6NF", "SCGHNF", etc
- all of which rely on "private definitions" which are illegal in most science
- (Please feel free to identify any fragments that I may have missed)

ii.a But the most important thing is the result. Record Filing Systems, which are non-relational or anti-relational, (such as [A]) are passed, as acceptable, as they "satisfy xxNF by definition".

So whatever it is you are anointing with your magic oil, your Normalisation Theory, (a) it stinks, (b) it is not Relational, and (c) it should not be accepted, or validated, or elevated in any way.

> .. engineering

Science

> > - following Normalisation after it was defined as NFs (Codd 1970, 1971), within that science, because those early declarations proved science and articulated it
>
> The math behind it is science, the rest engineering. Does not make it less valuable, but it is important to make the distinction.

Fine. For you. As long as you do not fragment it.

> > I declare:
> > ____the proposed data model [A] fails 3NF____
> > and in general:
> > ____the proposed data model [A] is not Normalised____
>
> For the private definitions of 3NF and Normalised that you seem to use and have not yet made explicit, this might all very well be true. Hard to say.

???
I do not have private Definitions.

???
3NF: I have always, and severally stated, Codd's definition 1970 and 1971. I quoted it the other day in the Theoretician Crippled thread, and you seem to have accepted it.

???
Normalisation: read up on it. Sure, I have extended or enhanced applications. Sure, it is deeper than some practitioners understand it. And sure it is a world apart from you guys.

But that does not matter, I am not selling that. I did not ask for validation (Don't worry about "hard to say") and I am not obliged to justify it or prove it.

I am only dealing with the context of this thread. That pretty much scopes it to the delta wrt 3NF (your "5NF") which causes a massive difference in the pass/fail status of models. Let's try and focus on why the model fails for me and passes for you. Why you accept Record Filing Systems (all surrogates) with no Relational integrity, power, speed, and I don't.

What the hell is the purpose of postulating over normalisation, devising a few fragmentary nuffs, that permit RFS in an RDB ?

I am not introducing anything, I am not adding to 3NF. I recognise you have three fragments, "3NF", "4NF", "5NF", that equate to the original 3NF.

> > The model fails for the following reasons (many instances of said reasons). The ordering of the issues is mine, in that if it isn't Relational, it is not worth bothering about the specifics of a Normalisation error. It fails *Relational* mandates on two counts, it is non-relational.
> >
> > 1 Definition for keys, from the RM: "a Key is made up from the data"
> > __ A surrogate (RecordId) is not made up from the data
> > __ There are no keys on the data
> >
> > 2 The RM demands unique rows (data)
> > __ A surrogate does not provide row (data) uniqueness
> >
> > 3 Re Normalisation, (if we consider that outside the Relational Model), it breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).
> >
> > This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.
> >
> > None of you theoreticians identified any of that.
>
> Because that is not what you asked ... not if it was a well-designed relational schema ...

Er. excuse me. Read the three questions. I asked for that, explicitly in [3].

> You asked if it was in 5NF, [] Being in 5NF is neither a necessary nor a sufficient condition for being well-designed, nor does normalization theory claim that.

Good.

So throw the "5NF" definition out, then.

> > > But it is not dependency preserving.
> >
> > Very good insight. Trusting that you do not have private definitions for "dependency" and "preserving". Give that man a purple cloak and a pointy hat!
>
> You don't know what "dependency preserving" means in the context of database normalization?

No. You misunderstood my statement. Read again. I am merely confirming that YOU don't have yet another private definition for the established terms.

> > I can't see how any model can "satisfy" any NF, 5NF in this case, and *NOT* have the property of preserving dependencies.
>
> And you don't know in which NFs dependency preservation is achievable? This is text book stuff, Derek.

I do know, silly boy. I am just not playing your "dependency preserving" game. Because yours fails miserably anyway. Because mine (Codd's) preserves dependencies beyond your "dependency preserving" definition. Way, way beyond. But we will limit that to the scope of the original 3NF.

Textbooks. I went to college long before textbooks on this subject were written. I was teaching Normalisation for DBMS vendors, using these principles, when Codd came onto the stage, and right through his main acts (the 80's, not the 70's). We knew the three NFs from our scientists, and from practice. And those were the days of real textbooks.

The textbooks these days are manuals in bestiality, scriptures for devil worship. The results of which are theoreticians as I have to deal with here. In the first few decades, Date mostly, tried to diminish the three NFs. Now it is free-floating cancer and everyone is doing it.

Point being, I do not accept "Normalisation Theory", that bag of fragments, that is nowhere near Normalisation. So there is no point in telling me that some fragment, and its smell, is "textbook stuff".

But don't let me stop you from expressing why a certain aspect of the given model has this smell or that colour, and therefore passes or fails.

> > > > 2. The developer is excited because he has read the C J Date and R Fagin paper *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys are "simple". He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.
> > >
> > > His conclusion is correct, but he is basing it on wrong assumptions. Not all his candidate keys are simple.
> >
> > That is very interesting. There are no "candidate keys" in the model.
>
> Nope. That is a big omission and already sufficient to consider it badly designed.

(I take your comment re "candidate keys" to mean ones that you noticed, that were not in the model)

Good. So why did you pass it [A] then ? I was pointing out your contradiction, declaring it "candidate keys incomplete" or not explicit, and passing it anyway.

> > Big tick if you intuitively determined some, but they do not count. The only keys to deal with are those defined in his model.
>
> Not if you're serious about data integrity and normalisation, and wondering if this is a well-designed schema.

Misunderstanding. They count very much. In the statement above, the candidate keys that you have perceived, that are not in the model, do not count, because he did not identify them in the model.

You might be rushing during your lunch break. Take your time.

> > > > 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.
> > >
> > > To me it is. But this might also have been the case if it was not in 5NF. Normalization theory does not tell you in which normal form you should be, just whether you are or not and what some of the consequences of that might be.
> >
> > I don't see how you would accept a model that fails your (not science's) 5NF, as normalised, but I won't argue, I will let that one pass.
>
> Please don't. It's at the core of the misunderstanding.

Fine. The argument remains, re why you pass some model that "satisfies" your fragments, why you find it acceptable [3], and I fail that model because it fails my integrated set.

> > Ok, back to business. I informed the developer as to what was wrong, ... He came back today, with his corrected version.
> >
> > He still declares 5NF for sure, and Relational to the best of his understanding (which is based on books by the same authors that you read).
> >
> > http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf
> >
> > My corrections to his /previous/ model [A] are on page 2.
> >
> > Please answer the questions in any order:
> >
> > 1. The developer declares that the proposed extension satisfies 5NF. Is that correct ? If not, please state why, which NF it breaks, any errors that it may have, etc. A few words will suffice.
> >
> > 2. The developer refers to *Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases*, he has complied with the requirements, and he is sure of his 5NF declaration. He asserts that all the keys /that apply/ are "simple", and he has added keys to correct his previous errors. He expects quick approval. If not, please state why, any errors that it may have, etc. Again, minimal discussion.
> >
> > 3. Is this acceptable to you, as a human being, as a scientific logical person, as a set of relational tables ? If not, why not, please name the problems, if any.
>
> He is still in 5NF. He still has surrogate identifiers, but I personally do not consider that a big problem as long as he has correctly represented all candidate keys, which at first sight I don't think he has done fully. The argumentation with which he tries to prove that he is in 5NF is still wrong. You have to consider all candidate keys, period. It's not clear why he would want to use that argument anyway, because there seem to be no MVDs and JDs other than those implied by the FDs, so if he can verify that the schema is in BCNF then he can infer that it is also in 5NF. Finally, it is also still not dependency preserving, which might be a reason to denormalise to 3NF. Whether that is a good idea or not depends on circumstances that were not specified.

a. Ok, so he is in your "5NF", but he fails the original unfragmented 3NF.

b. The candidate keys are incomplete. Unacceptable to me.

c. Surrogates are definitely not "identifiers", can you please explain. The term Identifier is carved in stone, same as 3NF (1971), since 1976. It is part of the IDEF1X standard (which you have said you know). By definition, surrogates are NOT Identifiers.

Please explain why you think:
c.1 surrogates are keys (worthy of your detailed evaluation)
c.2 surrogates are "identifiers" (RM, and IDEF1X)

You may have missed my Fail[3] point above.

d. All-surrogates are acceptable to you, as long as the candidate keys are handled correctly, correct ?

e. The argumentation (and the basis) for his "5NF" declaration is wrong because [b].

f. >> It's not clear why he would want to use that argument anyway, because there seem to be no MVDs and JDs other than those implied by the FDs, so if he can verify that the schema is in BCNF then he can infer that it is also in 5NF.

That is precisely the kind of insane conversation that is produced by your "Normalisation Theory", which does not occur in the implementation field, and which, if someone started spouting, we would ask him to take the rest of the day off, have a long hot bath, and get a good night's sleep. It is not that I don't understand it, it is that that is such a sloooooooow way of dealing with small technical issues. We can figure that out at "lower" levels of abstraction than you are using.

But I won't stop you from doing that, certainly. You are free to express your point in whatever way you like. Why you imagine MVDs and JDs where there are none might be worth understanding, but let's not get distracted. Oh, wait. You are treating the surrogates as "keys".

g. Not dependency preserving. I would say, that declaration is tending towards correct, but it can't be because [b], and the "keys" are not Keys, they are surrogates.

h. >> which might be a reason to denormalise to 3NF. Whether that is a good idea or not depends on circumstances that were not specified.

I don't accept de-normalisation. 39 years, over 20 large implementations, over 150 consulting assignments, and I have always Normalised up, to fix any and all problems related to Normalisation (and speed). I have never seen any data store that is honestly de-normalised. Unnormalised, yes. Often their failure to Normalise is declared as "de-normalised".

Second, it should never be dependent on circumstances (which I take to mean, how the data is used, process). Data should be analysed and Normalised, as data, and only as data. The OO/ORM crowd mix data with process, and yes, that hinders, if not cripples, the Normalisation, and thus the resultant database.

In any case, let me declare, this set of tables is to be normalised within the cluster given, there are no other considerations.

So I take it you fail his model [B].

Cheers
Derek

Nicola

Feb 5, 2015, 7:27:07 AM
Hi Derek,
I'd just like to make a remark on your statement in the document you
have linked:

"[The schema] breaks Third Normal Form, because in each case, there is
no Key for the data to be Functionally Dependent upon (it is dependent,
yes, but on a pork sausage, outside the data, not on a Key)."

I hope we all agree that the existence of a key does not depend on a
developer's choice, but it intrinsically depends on the meaning of the
data. In the Relational Model - the one you like, and I do, too - each
and every relational schema always has at least one key, and possibly
more than one. I hope we all agree on that, too. To find the keys, you
must determine which functional dependencies hold. That, again, depends
exclusively on the semantics associated to the attributes.

Now, let's take a look, for instance, at the Country schema. I think we
all share more or less the same idea about what a country code is
(you've pointed to the standard in a previous post), or what the name of
a country is. Once we understand the meaning of those attribute names,
it is not difficult (in this case) to make the functional dependencies
explicit and to determine, for example, that CountryCode is a key (not
necessarily the only key in this example, but it does not matter for my
argument). Once you have found all the keys, you may tell whether your
schema is in 3NF or not (in this case, it is - provided that I interpret
correctly, and I suppose I do, the meaning of the remaining attributes).

Now, the fact that you add a surrogate CountryId does not change
anything at all: unless we have different concepts of what a (surrogate)
key is, we should all conclude that CountryId functionally depends on
CountryCode, and vice versa. Even after adding CountryId to the schema,
CountryCode remains a key, because that is what our understanding of the
data dictates. I hope that this is crystal clear.
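
As a minimal sketch of that argument, the attribute closure below shows
that CountryCode determines every attribute both before and after the
surrogate is added. The attribute Name and the exact dependency sets
are assumptions about the Country schema, used only for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed dependency on the natural data: CountryCode -> Name.
natural_fds = [(frozenset({"CountryCode"}), frozenset({"Name"}))]
natural_schema = {"CountryCode", "Name"}
code_is_key_before = closure({"CountryCode"}, natural_fds) >= natural_schema

# Adding the surrogate adds CountryId <-> CountryCode and nothing else.
surrogate_fds = natural_fds + [
    (frozenset({"CountryId"}), frozenset({"CountryCode"})),
    (frozenset({"CountryCode"}), frozenset({"CountryId"})),
]
surrogate_schema = natural_schema | {"CountryId"}
code_is_key_after = closure({"CountryCode"}, surrogate_fds) >= surrogate_schema
id_is_key_after = closure({"CountryId"}, surrogate_fds) >= surrogate_schema
```

All three flags come out true: the surrogate becomes an additional
key, and CountryCode stays one.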

So, your statement above does not make any sense, if only because all
schemas have at least one key, and all the attributes of a schema
functionally depend on each key, by definition (Codd's definition).

I think I understand, however, what your point is, and I totally embrace
it (as I bet others would do): when your client (unnecessarily) defines
a schema with a surrogate key, he "forgets" about the other keys and
data integrity is thrown out of the window. Note, however, that a
statement like "there are no keys on the data" is still inappropriate.
What you should rather complain about is that that logical schema design
fails to capture the real constraints on the data. Technically, what
your client has designed is a schema with only one functional dependency
of all the attributes from the surrogate key. Given that, in a formal
sense the schema is in 5NF - wrt *that* set of dependencies (which is
*not* the set of "real" dependencies, that is the constraints that you
have in mind when you think about countries, country names, etc...).

I also agree that, by designing that way, the database becomes akin to a
"record filing system". I wouldn't go as far as saying that it is "not
relational", but certainly it is using the relational model in a bad way.

So, Jan is correct to point out that the normal form has nothing to do
with how good or bad a logical design is. You can get a highly
normalized schema from a completely wrong set of constraints. You should
complain about the latter in the first place.

Finally, talking about constraints and the meaning of the data, I have a
(genuine) question: does it really happen that two addresses differing
only in their Unit have different post codes (in my country, that is
never the case, I think)? Aren't different units assigned when the same
building has more than one entrance?

Nicola


Nicola

Feb 5, 2015, 9:05:08 AM
In article <a5e28036-efef-4cc8...@googlegroups.com>,
Jan Hidders <hid...@gmail.com> wrote:

> > 1. The developer declares that the proposed extension satisfies 5NF. Is
> > that correct ? If not, please state why, which NF it breaks, any errors
> > that it may have, etc. A few words will suffice. I expect minimal
> > discussion, but don't let me stop you.
>
> Yes, it is in 5NF. But it is not dependency preserving.

Which dependencies are not preserved? With respect to what schema?

Erwin

Feb 5, 2015, 9:47:10 AM
Op donderdag 5 februari 2015 15:05:08 UTC+1 schreef Nicola:
> In article <>,
> Jan Hidders <> wrote:
>
> > > 1. The developer declares that the proposed extension satisfies 5NF. Is
> > > that correct ? If not, please state why, which NF it breaks, any errors
> > > that it may have, etc. A few words will suffice. I expect minimal
> > > discussion, but don't let me stop you.
> >
> > Yes, it is in 5NF. But it is not dependency preserving.
>
> Which dependencies are not preserved? With respect to what schema?
>
> Nicola
>

If I tell him that, he tells me to learn to read.

Nicola

Feb 5, 2015, 11:44:06 AM
In article <a24a286a-4212-40b3...@googlegroups.com>,
Jan?

Jan Hidders

Feb 5, 2015, 12:07:52 PM
Op donderdag 5 februari 2015 17:44:06 UTC+1 schreef Nicola:
> In article <a24a286a-4212-40b3...@googlegroups.com>,
> Erwin wrote:
>
> > Op donderdag 5 februari 2015 15:05:08 UTC+1 schreef Nicola:
> > > In article <>,
> > > Jan Hidders wrote:
> > >
> > > > > > 1. The developer declares that the proposed extension satisfies 5NF.
> > > > > > Is that correct ? If not, please state why, which NF it breaks, any
> > > > > > errors that it may have, etc. A few words will suffice. I expect minimal
> > > > > > discussion, but don't let me stop you.
> > > >
> > > > Yes, it is in 5NF. But it is not dependency preserving.
> > >
> > > Which dependencies are not preserved? With respect to what schema?
> > >
> > > Nicola
> > >
> >
> > If I tell him that, he tells me to learn to read.
>
> Jan?

I'm afraid I have to back-pedal here. The dependency I had in mind is actually preserved after all.

What I was thinking of is the following. We were told that the CountryCode and StateCode are ISO 3166-1 and 3166-2. Now, ISO 3166-2 actually contains the country code from ISO 3166-1. So if you would join everything into a single table (Universal Relation, and all that) there would be an FD StateCode -> CountryCode. In the current schema that FD no longer lives in one of the relations, but I had missed that it actually follows from the local FDs / CKs.

Showing this is left to the reader as an exercise. :-)
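
For readers who want to try the exercise mechanically, here is a sketch of the standard attribute-closure computation in Python. The set of local FDs below is an assumption made for illustration (a Country table keyed by both CountryId and CountryCode, and a State table keyed by both StateId and an ISO 3166-2 style StateCode); it is not taken from the actual data model under discussion.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Assumed local FDs / CKs (hypothetical reading of the schema):
fds = [
    (frozenset({"CountryId"}), frozenset({"CountryCode"})),           # Country: surrogate determines code
    (frozenset({"CountryCode"}), frozenset({"CountryId"})),           # Country: CountryCode is also a key
    (frozenset({"StateId"}), frozenset({"CountryId", "StateCode"})),  # State: surrogate determines the rest
    (frozenset({"StateCode"}), frozenset({"StateId"})),               # State: StateCode is also a key
]

state_code_closure = closure({"StateCode"}, fds)
```

Under these assumptions CountryCode ends up in the closure of {StateCode}, so StateCode -> CountryCode follows from the local FDs even though no single relation carries it.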

-- Jan Hidders

Nicola

Feb 5, 2015, 1:04:16 PM
In article <feca8067-162d-4095...@googlegroups.com>,
Ok, I thought I was missing something.

Erwin

Feb 5, 2015, 2:13:58 PM
Op donderdag 5 februari 2015 17:44:06 UTC+1 schreef Nicola:
> In article <>,
> Erwin <> wrote:
>
> > Op donderdag 5 februari 2015 15:05:08 UTC+1 schreef Nicola:
> > > In article <>,
> > > Jan Hidders <> wrote:
> > >
> > > > > > 1. The developer declares that the proposed extension satisfies 5NF.
> > > > > > Is that correct ? If not, please state why, which NF it breaks, any
> > > > > > errors that it may have, etc. A few words will suffice. I expect minimal
> > > > > > discussion, but don't let me stop you.
> > > >
> > > > Yes, it is in 5NF. But it is not dependency preserving.
> > >
> > > Which dependencies are not preserved? With respect to what schema?
> > >
> > > Nicola
> > >
> >
> > If I tell him that, he tells me to learn to read.
>
> Jan?
>
> Nicola
>

Oops. "him" was Derek, of course.

com...@hotmail.com

Feb 5, 2015, 4:58:13 PM
On Tuesday, February 3, 2015 at 5:59:04 AM UTC-8, com...@hotmail.com wrote:
> I just find it amusing that the particular application plus no interpretation of tables tells us that there is a universal relation CK that this diagram fails to express.

Here's what I failed to express earlier:

An envelope gets to a particular address. Ids do not appear on an envelope. So joining an appropriate set of tables and projecting out id columns gives rows 1:1 with Address AddressIds. So there is a join CK among the non-id columns. But this set of tables gives no such CK. So they are not appropriate.

(The FKs correspond to FDs on the CK in the join. So there is a CK among the non-id columns of Street and Address.)

PS:

Of course, that claim does not arise from a particular (normalization or design) process. So it proves nothing about the adequacy of any allegedly adequate process. Nor would any claims about the design without giving a process and sound reasoning.

To show that an alleged process is adequate one need only give it and either a counterexample or sound justification. Of course, one would only soundly believe it inadequate if they had already done so. (Granted, after demonstrating enough examples of inadequate processes from a group one might reasonably claim that the group didn't have an adequate process.)

Of course, that's not showing that an allegedly adequate process actually is. That requires giving it and a sound justification.

philip

com...@hotmail.com

Feb 5, 2015, 5:01:10 PM
On Thursday, February 5, 2015 at 1:58:13 PM UTC-8, com...@hotmail.com wrote:

> To show that an alleged process is adequate one need only give it and either a counterexample or sound justification.

Typo. Inadequate.

philip

Jan Hidders

Feb 6, 2015, 4:54:20 AM
Op donderdag 5 februari 2015 12:24:55 UTC+1 schreef Derek Asirvadem:
>
> > > I declare:
> > > ____the proposed data model [A] fails 3NF____
> > > and in general:
> > > ____the proposed data model [A] is not Normalised____
> >
> > For the private definitions of 3NF and Normalised that you seem to use and have not yet made explicit, this might all very well be true. Hard to say.
>
> ???
> I do not have private Definitions.
>
> ???
> 3NF: I have always, and severally stated, Codd's definition 1970 and 1971. I quoted it the other day in the Theoretician Crippled thread, and you seem to have accepted it.

Let's focus a little on this. As you know I did not accept it as the full and exact description of Codd's definition of 3NF. Your claim is that it is different from the standard textbook definition. That can be easily established by comparing them. Would you mind quoting the full and precise definition of 3NF by Codd?

Btw. that reminds me: have you already found a concrete example of a statement that Gary Boetticher makes in the movie you referred to where he uses standard normalization terminology in a non-standard way? You accused him of doing that, and even mentioned it as a reason to call him a fraud. You seemed to be moving towards providing some evidence of that, and then stopped.

-- Jan Hidders

Derek Asirvadem

Feb 7, 2015, 12:04:24 AM
Nicola

> On Thursday, 5 February 2015 23:27:07 UTC+11, Nicola wrote:

First, thank you for joining us.

Second, please forgive the delay in responding, I was busy.

I don't know you yet, so I need to apologise in advance if you find my interjections untoward. I am marking critical points to indicate agreement thus far, such that we establish an agreed set of facts.

> Hi Derek,
> I'd just like to make a remark on your statement in the document you
> have linked:
>
> "[The schema] breaks Third Normal Form, because in each case, there is
> no Key for the data to be Functionally Dependent upon (it is dependent,
> yes, but on a pork sausage, outside the data, not on a Key)."
>
> I hope we all agree that the existence of a key does not depend on a
> developer's choice, but it intrinsically depends on the meaning of the
> data. In the Relational Model - the one you like, and I do, too -

The one that this data model has to conform to, yes, that the developer declares as conforming.

> each
> and every relational schema always has at least one key, and possibly
> more than one. I hope we all agree on that, too.

Yes.

> To find the keys, you
> must determine which functional dependencies hold.

That is a very slow method, but yes. The tables are dead simple, and the attributes should be well-known to anyone who has any experience at all.

> That, again, depends
> exclusively on the semantics associated to the attributes.
>
> Now, let's take a look, for instance, at the Country schema. I think we
> all share more or less the same idea about what a country code is
> (you've pointed to the standard in a previous post), or what the name of
> a country is. Once we understand the meaning of those attribute names,

Yes, the meaning of data is important.

> it is not difficult (in this case) to make the functional dependencies
> explicit and to determine, for example, that CountryCode is a key (not
> necessarily the only key in this example, but it does not matter for my
> argument).

Yes.

> Once you have found all the keys, you may tell whether your
> schema is in 3NF or not (in this case, it is - provided that I interpret
> correctly, and I suppose I do, the meaning of the remaining attributes).

No, I have stated that it FAILS 3NF.

(There is an interaction going on with Jan, where we have teased out that you people have a fragment of 3NF which you fraudulently call "3NF". Result being, as evidenced here, you accept a total failure, a non-relational Record Filing System, to be satisfactory.)

I think the pivotal difference is, Codd and I require a Key, before Functional Dependency can be worked with, and you people have non-FDs, and on non-keys. While that latter could possibly be relevant when dealing with a relational algebra problem when one is waiting for an act of God while sitting on the toilet, it is irrelevant when evaluating a data model that is intended for implementation.

> Now, the fact that you add a surrogate CountryId does not change
> anything at all: unless we have different concepts of what a (surrogate)
> key is, we should all conclude that CountryId functionally depends on
> CountryCode, and vice versa. Even after adding CountryId to the schema,
> CountryCode remains a key, because that is what our understanding of the
> data dictates. I hope that this is crystal clear.

It is mud.

Ok, the communication is crystal clear.

But the statements are totally false, incorrect, wrong. I can't read the rest of your post. Let us stop and deal with the numerous errors in your para.

Are you sure you know the Relational Model, by Codd, 1970, that you say you like ? (To be clear, I am excluding RM/T where he explored Date's surrogates, and later banned them. I am taking only the RM, which is widely known.) You are not using some bag of fragments that some freaked-out theoretician alleges is the "RM" ?

If you don't mind, if, and when, you achieve a position in which you reject non-Relational piles of crap that fail Relational and 3NF, rather than pass as "5NF, Relational", you may then be in a position to employ the teaching tone. Until then, your "shoulds" and "inappropriates" are more than a little presumptuous, and worse. My uncontaminated thinking results in rejecting his data model using six words; you use six hundred words to justify an acceptance of it. The corollary: if, and when I wish to learn how to use six hundred words to accept totally unacceptable models, I will seek your advice.

It is difficult to take each of your sentences and deal with them, it will consume a great amount of time, akin to explaining the crime to an axe murderer who insists they are following the Ten Commandments. So I won't take that route. I will identify the commandments you have broken, the logic we use in the physical universe to lock up axe murderers, and you can respond.

I trust you are aware that there are two models, Address A, and Address B. Both have been accepted by all of you, using various means to undermine and set aside the RM, and 3NF, and both have been rejected by me. And that the charge you make, re my statement "there are no keys on the data" applies to Address A.

1. What a Key is, is defined in the RM (that you allege you like, that this data model must comply with).
>>"
>> Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key.
>>"

(Codd uses the word Domain in several contexts, the context here is an attribute or column)

(Codd states rules for Alternate Keys, but I don't think we need to visit that point. But this is a good juncture to note the stupidity of the term "candidate keys", which does not exist in the RM, and is yet another trick that is used by you good people to circumvent the strictures of the RM, and to flagellate in the albumin without penetrating the cell wall of the ovum. The forty six years of impotence. Ie. we have the demanded PKs and AKs, and you have the non-relational CKs.)

We contract that, or paraphrase that, to "A key must be made up from the data", which is commonly used by Relational adherents, without losing its original meaning.

A surrogate is not made from the data. Typically, IDENTITY, AUTOINCREMENT, GID, etc, is completely unrelated to the data.

Therefore a surrogate is not a key.

2. The term "surrogate key" is totally false.

A Key has specific properties that a surrogate does not have.

Therefore, to use the term "surrogate key" is a misrepresentation, a fraud, because the user or developer will naturally expect some or all of the basic properties of a Key, and the surrogate has none. Zero.

3. The RM demands >>>row<<< uniqueness.

This is simple to implement (we are talking the logical level, since the data model is declared to be the logical model), by making the Key unique. At the physical level, a unique index is used.

A surrogate does not provide row uniqueness.

Using Address A, because it has surrogates only, no Keys:

3.a Placing the surrogate above the lines does not make it a Primary Key (it fails as per above). One who does so only fools himself.

3.b Using the SQL <PRIMARY KEY> clause does not make a surrogate a Primary Key (it fails as per above). One who does so only fools himself.

Likewise for placing a unique index on the surrogate.

3.c Taking the Country File (implemented in SQL for convenience, and having none of the properties of a Relational table), and let's say CountryId is an IDENTITY column, populated by the server, thus the first column here is CountryCode:
____INSERT Country VALUES ( "MM", "Mickey Mouse", ... )
____INSERT Country VALUES ( "MM", "Mickey Mouse", ... )
____INSERT Country VALUES ( "MM", "Mickey Mouse", ... )

will succeed.

Therefore the surrogate does not provide row uniqueness.

Any and all row uniqueness that is implemented, is implemented via a Key, and only via a Key (of which there may be more than one, in which case, one is the PK, and the others are AKs).

In data model Address A, there are no keys on the data.

4. In data model Address B, he has corrected that one item, the above INSERTS will now fail. (But the file, the data model remains rejected for other reasons not related to your para.)

5. There is one valid use for surrogates, but that scenario is not present here, and thus we don't need to discuss it. I am making the statement in order to identify that I am not black-or-white about surrogates, that they have a use, and that I have decades of experience with such properly-used surrogates, as well as over 120 assignments eliminating improperly-used surrogates.
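
The INSERT behaviour described in 3.c and 4 can be reproduced with a minimal sketch, here using Python's standard sqlite3 module rather than the server the thread has in mind. The table names CountryA and CountryB, the reduced column list, and the assumption that model B keeps the surrogate alongside a unique CountryCode are all hypothetical, made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Address A" style: a surrogate only, no key declared on the data.
conn.execute("""
    CREATE TABLE CountryA (
        CountryId   INTEGER PRIMARY KEY AUTOINCREMENT,
        CountryCode TEXT NOT NULL,
        Name        TEXT NOT NULL
    )""")

# "Address B" style (assumed): same surrogate, plus a key on CountryCode.
conn.execute("""
    CREATE TABLE CountryB (
        CountryId   INTEGER PRIMARY KEY AUTOINCREMENT,
        CountryCode TEXT NOT NULL UNIQUE,
        Name        TEXT NOT NULL
    )""")

def insert_three_times(table):
    """Attempt the same INSERT three times; return how many attempts succeeded."""
    accepted = 0
    for _ in range(3):
        try:
            conn.execute(
                f"INSERT INTO {table} (CountryCode, Name) VALUES (?, ?)",
                ("MM", "Mickey Mouse"),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # Raised when the UNIQUE constraint on CountryCode rejects a duplicate
            pass
    return accepted

accepted_a = insert_three_times("CountryA")
accepted_b = insert_three_times("CountryB")
```

With the surrogate alone, all three identical rows are accepted; once CountryCode is declared unique, only the first is.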

----

To now address your para:
> Now, the fact that you add a surrogate CountryId does not change
> anything at all: unless we have different concepts of what a (surrogate)
> key is,

It changes many things, each too substantial to ignore.

> we should all conclude that CountryId functionally depends on
> CountryCode, and vice versa.

I reject the notion entirely.

You are free to use that fragmented understanding of FDs, and note that by that use, you accept models that are non-relational Record Filing systems, that have none of the Relational Integrity, Power, or Speed of the RM, that fail 3NF (the real one not the fragmented bit), as "satisfying 5NF".

You have taken a surrogate (I have proved it is not a Key), and you are equating it to a Key. By using some silliness, that they are "functionally dependent" (again by some strange fragment of "definition", not the real FD definition) on a non-key (hence: donkey, monkey).

> Even after adding CountryId to the schema,
> CountryCode remains a key, because that is what our understanding of the
> data dictates.

There are no keys in data model Address A. No Key on CountryCode.

That has been corrected in the data model [B]. He has one Key on Country, CountryCode.

So my statement that you reference:
> > "[The [Address A] schema] breaks Third Normal Form, because in each case, there is
> > no Key for the data to be Functionally Dependent upon (it is dependent,
> > yes, but on a pork sausage, outside the data, not on a Key)."

stands.

It is actually [3], it lies in a continuous body of text that refers to its specific terms and reasoning (prior to that which you quote), including [1] and [2], I won't repeat. That prior text [1] refers to Codd's definition of Key, so there is no reason you should be thinking about a non-key in my [3].

If you have some definition of "key" other than Codd's then you are not Relational. As declared, the data model given has to be Relational. It fails Relational [1][2]. It also fails 3NF [3].

> > This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.

I determine that in six mins, six words. You (actually all of you) accept his model and argue with my six words, using six hundred words to present your "logic", but you miss the fact that it fails, so the six hundred words are totally irrelevant. The axe murderer using six hundred words to argue that he is keeping the Commandments, while denying the body with the axe still in it. And now you are arguing about the relevance of the six hundred words, without realising (denying ?) that the case has been heard, the axe murderer has been convicted, he is in jail.

----

A few more comments, without addressing each sentence, because I have dismissed the whole, as detailed above.

> So, your statement above does not make any sense, if only because all
> schemas have at least one key, and all the attributes of a schema
> functionally depend on each key, by definition (Codd's definition).

You are not using Codd's definition, you are using a false report of it from a porcine source.

I have used Codd's definition, above, and eliminated your points.

The evidence is, you don't know the RM, despite your references to it, strange since you say you like it.

> I think I understand, however, what your point is, and I totally embrace
> it (as I bet others would do): when your client (unnecessarily) defines
> a schema with a surrogate key, he "forgets" about the other keys and
> data integrity is thrown out of the window.

That is the result of you having 42 fragments, with no understanding or integration. If you used the two definitions (RM and 3NF, which contains FD), all of which are integrated, and easy to understand, if you did not have ID columns posed as "keys", and the flagellating that goes with it, you would likely not "forget" such things. And you would create valid models in a small fraction of the time.

> Note, however, that a
> statement like "there are no keys on the data" is still inappropriate.

As evidenced above, there are no keys on the data.
As evidenced, again, in your sentence above, you are ignorant of the RM.

> What you should rather complain about is that that logical schema design
> fails to capture the real constraints on the data.

Well, if you take it that my reasons (given, now detailed further) that I rejected the data model, are serious, and follow them to completion, the result would be, precisely, that the real relationships between the data will be exposed, and the constraints that are required therefore will also be exposed. In a direct manner. Instead of the backwards (starting from the back, trying to get to the front, as well as primitive, compared to Codd), long and winding journey that you people are taking.

> Technically, what
> your client has designed is a schema with only one functional dependency
> of all the attributes from the surrogate key. Given that, in a formal
> sense the schema is in 5NF - wrt *that* set of dependencies (which is
> *not* the set of "real" dependencies, that is the constraints that you
> have in mind when you think about countries, country names, etc...).
>
> I also agree that, by designing that way, the database becomes akin to a
> "record filing system". I wouldn't go as far as saying that it is "not
> relational",

I have given, both in the post that you quote, and above, the specific reasons why it is not Relational.

It is not Relational.

You do not know what Relational is.

> but certainly it is using the relational model in a bad way.

There is no "good" or "bad" way to use the RM. It is a set of rules, laws. There is only compliance.

There is a scientific way that we use to determine the precise extent of compliance, and thus how much a given model is (a) a pre-1970 Record Filing System at one end of the spectrum, and how much it (b) uses all Relational concepts, correctly, at the other end. But that is not relevant here. Here it fails Relational on basic issues, basic violations, full stop.

> So, Jan is correct

So far, in this thread, he is dead wrong.

I take it you mean the flagellating he is performing is familiar to you, and "correct" in terms of the performance.

> to point out that the normal form has nothing to do
> with how good or bad a logical design is.

Well, in the real world, which is integrated, it is one (not the only) measure, of exactly that, and we have just three (we have five but the last two are beyond your comprehension, out-of-scope for this thread). In the fractured universe, where everything is expressed in terms of isolated fragments (schizophrenia), sure, nothing has anything to do with anything else. And you have 42 NFs, which are tiny fragments of ours.

> You can get a highly
> normalized schema from a completely wrong set of constraints. You should
> complain about the latter in the first place.

I am complaining about both. But since it breaks simple essential rules, it is stupid for me to enumerate how the constraints on broken tables (which they reference) breaks further rules. It is more efficient to get the modeller to fix the essential breaches first, the tables and Keys, and then to deal with the constraints, etc, on tables that can be relied upon.

That emphasises, again, that there is no point in evaluating your "fds", because Functional Dependencies have to have a Key to be Functionally Dependent upon.

But you are free to approach the exercise backwards. And to write pages re "fds" that shoulda coulda woulda bin.

> Finally, talking about constraints and the meaning of the data, I have a
> (genuine) question: does it really happen that two addresses differing
> only in their Unit have different post codes (in my country, that is
> never the case, I think)? Aren't different units assigned when the same
> building has more than one entrance?

It is different in every country. The post code has different meaning in every country, and different granularity. In some countries it applies to an entire block (many apartment buildings); in others to a single entrance to a building, and there, it is a huge maintenance problem. That finer granularity is stupid, because they are trying to identify a specific geographic point, beyond the domain of the postal service, and we already have very good ways to do that (ie. GPS co-ordinates are not post codes). They should stick to their domain, of sorting mail for the letter carriers.

Cheers
Derek

Derek Asirvadem

Feb 7, 2015, 12:09:18 AM
> On Friday, 6 February 2015 01:47:10 UTC+11, Erwin wrote:
>
> If I tell him that, he tells me to learn to read.

Liar. I stated:

> > I would think, once you have been able to _read_ the model; to identify the dependencies conveyed therein; and thus notice the predicates are explicit, your categorical response might change.

Ok, so you are now providing evidence of something else: that you can't read English.

Accepted. Explains a lot.

Derek Asirvadem

Feb 7, 2015, 12:12:27 AM
I would be pleased to take up any arguments you (plural, the 1%) may have, as long as you reference the science that has been established in the industry, that is used by implementers, the 99%, that you allege that you serve.

Second, although I have entertained it in the past, I will no longer entertain the _justification_ of the long and winding methods that fail to reject data models that fail either 3NF or RM Compliance, because their result is failure. Dealing with your "definitions" and ridiculously long and winding methods, that fail anyway, is a huge waste of time.

You are of course, free to flagellate amongst yourselves, before you come up with something relevant to submit.

Let's keep the focus on the task. For the given data model:

1. Is it "5NF" ?

2. Is the reference to SCGHNFRDB valid ?

3. Is it Relational ?

Cheers
Derek

Derek Asirvadem

Feb 7, 2015, 3:09:53 AM

Jan

> On Friday, 6 February 2015 04:07:52 UTC+11, Jan Hidders wrote:

> I'm afraid I have to back-pedal here. The dependency I had in mind is actually preserved after all.
>
> What I was thinking of is the following. We were told that the CountryCode and StateCode are ISO 3166-1 and 3166-2. Now, ISO 3166-2 actually contains the country code from ISO 3166-1. So if you would join everything into a single table (Universal Relation, and all that) there would be an FD StateCode -> CountryCode. In the current schema that FD no longer lives in one of the relations, but I had missed that it actually follows from the local FDs / CKs.

1. I have no idea why you are contemplating the UR, the exercise is to accept or reject the tables as given, for the criteria given. The exercise is not to contemplate all the possibilities of all the data or what non-fds could exist on what non-keys.

If you people are struggling to determine the keys, using that particular long and winding method, you are lost.

The data is purposely simple. Everyone should know what a State is; what a County Key should be.

And there is no request to determine the Keys.

2. Separately.
If you do what you are saying re 3166-1 and 3166-2, you have more holes in your head than I realised. The StateCode you are describing breaks 1NF. I can't believe you did that, or that you thought that would be acceptable in a Relational Database that is Normalised (our 3NF; your 42 NFs). Again, it goes to show that you guys can't keep track of your abstractions, you accept a sub-standard model and the basis of such, whilst being hilariously ignorant of the fact that it breaks RM Compliance and 3NF. And now, Jesus Almighty, you yourself break 1NF.

The state that the theoreticians in the space are in, is very sad.

I suppose you don't know how you are breaking 1NF, if you did, you wouldn't break it, right ? So I had better give another discourse.

a. Every attribute shall be Atomic.

b. Your notion of StateCode since it contains CountryCode as well as StateCode, is not Atomic.

c. You are taking the 3166-1 and 3166-2 spec literally. Govt Departments, ISO, are not that technical. That is meant for the post office and stamp-lickers. Each spec stands on its own. Technical people do not do that. They understand what the specs mean, they esteem Atomicity, and they implement a correct set of tables:

CountryCode = "US", "NL"

StateCode = "AL", "NY", "WY" ... "ZE", "FR"

The notion of a state code such as "USAL" or "US-AL" etc, breaks 1NF.

d. Keep in mind that the users and developers mean "Alabama" when they use "AL", they do not want to, and should not have to, use "US-AL". Ie. a StateCode is an Atomic item, it would be preposterous if we had to mess with SUBSTR(StateCode, 3, 2).

e. Likewise for the CountyCode. It is not CHAR(7) ie. containing CountryCode and StateCode. It is CHAR(3) containing CountyCode, only CountyCode, and nothing but CountyCode, so help me Codd. FIPS is US only, numeric; most countries use a string.
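
One possible reading of points a. to e. is a State table keyed by the pair (CountryCode, StateCode), each column holding only its own code. The sketch below uses Python's standard sqlite3 module; the column list and the sample rows are assumptions for illustration, not the tables from the data model under review.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Country (CountryCode CHAR(2) PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE State (
        CountryCode CHAR(2) NOT NULL REFERENCES Country (CountryCode),
        StateCode   CHAR(2) NOT NULL,
        Name        TEXT NOT NULL,
        PRIMARY KEY (CountryCode, StateCode)
    )""")

conn.executemany("INSERT INTO Country VALUES (?, ?)",
                 [("US", "United States of America"), ("NL", "Netherlands")])
conn.executemany("INSERT INTO State VALUES (?, ?, ?)",
                 [("US", "AL", "Alabama"), ("US", "NY", "New York"), ("NL", "ZE", "Zeeland")])

# The state is addressed by its own atomic code; no SUBSTR on a combined string.
state_name = conn.execute(
    "SELECT Name FROM State WHERE CountryCode = ? AND StateCode = ?", ("US", "AL")
).fetchone()[0]

# The ISO 3166-2 style string can still be derived on demand by concatenation.
iso_codes = [row[0] for row in conn.execute(
    "SELECT CountryCode || '-' || StateCode FROM State ORDER BY 1")]
```

The design choice shown is that the combined "US-AL" form is a presentation matter derived from two atomic columns, rather than something stored and later picked apart.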

> Showing this is left to the reader as an exercise. :-)

I suppose that is a good distraction, if you can't execute the main exercise.

> > Which dependencies are not preserved? With respect to what schema?

Far be it from me, to get entangled in the flagella, so only to increase the pace a little:
- the correct FDs preserve all dependencies
- the tables are too immature, too non-relational at this stage for us to worry about the FDs yet, he has to fix the Keys so that they are correct first
- granted, you are working with non-FDs on non-keys


> On Friday, 6 February 2015 20:54:20 UTC+11, Jan Hidders wrote:
> Op donderdag 5 februari 2015 12:24:55 UTC+1 schreef Derek Asirvadem:
> >
> > > > I declare:
> > > > ____the proposed data model [A] fails 3NF____
> > > > and in general:
> > > > ____the proposed data model [A] is not Normalised____
> > >
> > > For the private definitions of 3NF and Normalised that you seem to use and have not yet made explicit, this might all very well be true. Hard to say.
> >
> > ???
> > I do not have private Definitions.
> >
> > ???
> > 3NF: I have always, and severally stated, Codd's definition 1970 and 1971. I quoted it the other day in the Theoretician Crippled thread, and you seem to have accepted it.
>
> Let's focus a little on this.

What, despite the fact that, two short paras later, I posted:

> > I am only dealing with the context of this thread. That pretty much scopes it to the delta wrt 3NF (your "5NF") which causes a massive difference in the pass/fail status of models. Let's try and focus on why the model fails for me and passes for you. Why you accept Record Filing Systems (all surrogates) with no Relational integrity, power, speed, and I don't.

So you want to address a distraction. Reluctantly, then ...

> As you know I did not accept it as the full and exact description of Codd's definition of 3NF.

That is news to me.

> Your claim is that it is different from the standard textbook definition. That can be easily established by comparing them.

Not my claim, I think that was yours, but yes, it is true.

And I have already identified that the "textbook definition" is what falls into the category that you call "Normalisation Theory", an algebraic expression that has lost the meaning of the 3NF Definition, a fragment. And again I identify the problem with it six paras above this one.

> Would you mind quoting the full and precise definition of 3NF by Codd?

I don't know why you don't know this. It is textbook stuff. I don't like using wiki, but in this simple case, it should suffice:

http://en.wikipedia.org/wiki/Third_normal_form

1. Of course, as usual, wiki posts pure excreta. The definition they attribute to Codd is false (proof: the term "superkey" was not known in 1971; I believe that was Zaniolo & Date's invention, circa 1981).

2. "A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982"

is totally incorrect because the definition attributed to Codd is false.

3. The "Nothing but the key" section is slightly better.

"A statement of Codd's definition of 3NF ... was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key."[7] A common variation supplements this definition with the oath: "so help me Codd"

Is half true and half fiction. wiki is famous for its efforts in re-writing history.

The Codd statement that he gave at the RM/T conference in Australia in ~1981 (?) was the one I gave, using "Functionally Dependent" instead of the wiki "provide a fact" quote. It was written up and discussed in many articles in those days. There were many "Codd Facts" and "Codd Rules" that we concerned ourselves with in those days, all well-known and eagerly discussed. Kent was known to have simplified it.

Of course a key point is that removing "Functionally Dependent", in the usual manner that abstractionists do, breaks it up into fragments, and thus loses its meaning. Codd gave us Functional Dependency and 3NF together. The RM is an integrated set of rules. The 42 fragments are *dis*integrated possibilities. Just note the flagellation required for the latter, none required for the former.

That statement (attributed by wiki to Kent, in fact a simplified form of Codd's) was also, and famously, made by Date, and he kept repeating it, in many shows and presentations. IIRC one in Australia in ~1992. He added the "so help me Codd", because it was his way of shutting down the argument (we wouldn't give FDs up), and of diminishing Codd subtly, at the same time. Typical liar, and he showed himself up many times, same as Einstein. For the very reason that he was fighting the widely accepted Codd/Kent 3NF definition, and wanted his algebraic fragment instead.

"Superkey" is Date and Darwen's baby, only required to make Record Filing Systems pass off as "Normalised". It is an abortion, requiring additional columns and indices, that are simply not required in a Relational Database. The Darwen groupies who rewrite history falsely state that Codd used the term. Pulp. Fiction.

There, I have responded to your distraction. Now can we please get back to some of the central issues. For this one, I think it is established that
- the 3NF Definition was established in 1970, crystallised in 1971, in technical English; no algebraic expression was given
- (I am not at home, I don't have all Codd's papers on me, so I can't quote from them directly)
- that Date & Zaniolo first gave an algebraic expression of the "normalisation theory 3nf" in 1982, which is a mere fragment of the original (ie. the relevance of the Key is missing, plus others)
- that has been modified (evidence that the first was incorrect) severally since then

So what we have to deal with here is, we implementers use the original definition and reject the data model as failing 3NF; you use the 42 fragments and accept the same data model as "satisfying 5NF". It is pointless to argue about definitions, especially the established ones; better to examine the difference, and to determine why you people are so crippled.

You are concentrating on justifying your fragments. If you like, I can draw a picture for you, why your fragments fail.

And don't worry too much about whether mine has a formal definition or not, it is widely understood, accepted, and used. Keep in mind that it works; it takes a tiny fraction of the time that yours takes (ie. I rejected the data model, and I can express the reasons in a few words). And yours, with 42 fragments that don't even approach mine, fails miserably. (ie. you accepted the data model that fails Relational and fails 3NF). If I were you, I would worry about the decrepit state and uselessness of your 42 formally defined fragments.

You can call mine (Codd's) a toy if you like. Just don't forget that my Tonka Truck wipes the floor with your 42 fragments, it beats your entire army. Two rounds ended with TKOs.

> Btw. that reminds me: have you already found a concrete example of a statement that Gary Boetticher makes in the movie you referred to where he uses standard normalization terminology in a non-standard way?

That is answered in a previous post in that thread, in great detail. You stopped at a certain point, it is past that point.

> You accused him of doing that,

No. I accused him of using the established term FD in a non-standard way, which has since been exposed as your "normalisation theory" way. So while some minor progress has been made re what each other is talking about when we use terms (confusion that does not exist in other industries), the bottom line remains: 99% of us know what an FD is, what 3NF is; you guys (the 1%) have abstractions of it, which are so abstracted that the meaning is lost (eg. the relevance of the Key), and you (plural) commit fraud and misrepresentation when you use labels that are already established for your novel fragments. Additionally, because you have many fragments (eg. BCNF, 4NF, 5NF, ETNF, SCGHNFRDB, etc, etc, etc) for what we consider a single logical concept, and those fragments still do not add up to the single logical concept, the fraud is serious.

If you (plural) were honest, you would use terms like "3NF.1 Zaniolo" and "3NF.42 Fagin", and you might be a little more useful to the community. Better still "Fragment 3NF.42", "Fragment FD.17".

Since it is now clear that when you (plural) use the term FD, you mean nothing of the sort, you mean some fragment of it, that does not have a Key to depend on, then if you were honest, you would use a different term.

Much like, it has been proved, and it will continue to be proved, that when you state the "rm says this or that", you are not referring to the RM. It is a pack of lies. Which gets exposed one lie at a time. See my response to Nicola. This doesn't happen in other industries, a theoretician in the car industry does not say he understands internal combustion engines, only to be proved wrong five days later. It is professionally embarrassing. You people have no shame.

> and even mentioned it as a reason to call him a fraud.

Yes. And I called him a fraud additionally (to the above, another count) because he was not even teaching your "normalisation theory fd theory" correctly. It is all in that post.

> You seemed to be moving towards providing some evidence of that, and then stopped.

No. Unlike you, who has left at least nine (I lost count) items hanging in three threads, with no response one way or the other, I have, as stated from the outset, continued each item to resolution.

I do admit, some days are busy, and sometimes, not usually, it takes a few days for me to respond. Eg. I posted on 7 Feb replying to yours of 3 Feb. That is unusual. My apologies.

You might not be getting my posts. Also, if you are using "show/hide quoted text", it might be hiding new, un-quoted text. I say this because quite often you stop at one point in my post, and the rest remains unaddressed. On that thread, I can see 32 posts as at 07 Feb 15 07:25 UTC.

Cheers
Derek

Derek Asirvadem

Feb 7, 2015, 3:53:41 AM
> On Friday, 6 February 2015 08:58:13 UTC+11, com...@hotmail.com wrote:
> On Tuesday, February 3, 2015 at 5:59:04 AM UTC-8, com...@hotmail.com wrote:
> > I just find it amusing that the particular application plus no interpretation of tables tells us that there is a universal relation CK that this diagram fails to express.

No idea why you might think that data models are supposed to "express" URs or CKs. Do you go to the butcher for bread ?

Therefore "fails to express" is false. Rather more telling of your position.

We don't have CKs in the physical universe, as already explained. They are a construct in the theoretical universe to avoid the election of a PK, and thus a subversion of the RM.

The developer has no CKs in the model anyway, he is not trying to subvert the RM (at least not intentionally).

> Here's what I failed to express earlier:
>
> An envelope gets to a particular address. Ids do not appear on an envelope. So joining an appropriate set of tables and projecting out id columns gives rows 1:1 with Address AddressIds. So there is a join CK among the non-id columns. But this set of tables gives no such CK. So they are not appropriate.

Result: agreed.
Method: far too backward and laborious. If we stick to the RM, the method is much simpler, the crime is simpler to identify.

> (The FKs correspond to FDs on the CK in the join. So there is a CK among the non-id columns of Street and Address.)
>
> PS:
>
> Of course, that claim does not arise from a particular (normalization or design) process. So it proves nothing about the adequacy of any allegedly adequate process. Nor would any claims about the design without giving a process and sound reasoning.
>
> To show that an alleged process is adequate one need only give it and either a counterexample or sound justification. Of course, one would only soundly believe it inadequate if they had already done so. (Granted, after demonstrating enough examples of inadequate processes from a group one might reasonably claim that the group didn't have an adequate process.)
>
> Of course, that's not showing that an allegedly adequate process actually is. That requires giving it and a sound justification.

Looks like a nice justification for incompetence.

This is the same as the six hundred words to explain that a decrepit non-Relational failed-3NF data model is somehow accepted as one. Side-stepping the issue that you don't know how it breaks 3NF, you don't know how it breaches the RM.

All we need here is for you to examine the model and accept/reject it against the criteria given. We do not need a write-up on the process you use, and that you think we should use, which is fine to ponder, in case you can't execute the exercise.

So, are you rejecting it or accepting it ? If rejected, on what basis (less than six hundred words) ?
1. Is it "5NF" ?
2. Is the reference to SCGHNFRDB valid ?
3. Is it Relational ?

I see that you perceive the non-existent CKs doing the polka with the UR behind the eyelids, and you notice the CK is tripping once or twice, but is that a rejection ? Is the UR knocked out, do we need an ambulance, or is she just dazed, will she live to polka again ?

Cheers
Derek

Derek Asirvadem

Feb 7, 2015, 4:10:43 AM
Dear people

So it appears that:
- all of you accept the declaration that the data model is "5NF"
- a few of you are uncomfortable with it, but not uncomfortable enough to reject it.
___ And that discomfort is around the smell of the keys, but nothing specific has been stated
___ If nothing is stated, that means you accept it as such
- nothing has been stated re the 3rd claim, that it is Relational. That means you accept it as such.

I will give it a few more days, perhaps to ponder the keys.

Cheers
Derek

Nicola

Feb 7, 2015, 1:39:26 PM
> > To find the keys, you
> > must determine which functional dependencies hold.
>
> That is a very slow method, but yes. The tables are dead simple, and the
> attributes should be well-known to anyone who has any experience at all.

Sure. If you find the keys directly, in fact you have defined some
functional dependencies (from the keys). But, in general, you must
ensure that no other relevant semantic constraints (not necessarily only
functional) are missed.

> > Once you have found all the keys, you may tell whether your
> > schema is in 3NF or not (in this case, it is - provided that I interpret
> > correctly, and I suppose I do, the meaning of the remaining attributes).
>
> No, I have stated that it FAILS 3NF.

Look, I overall accept your line of argument and its conclusions, with a
caveat. Let me see if we can converge.

You say a schema with a surrogate "key" does not enforce row uniqueness.
I write "key" to emphasize that it does not conform to Codd's definition
because it is a totally made up set of values without any tie to reality
- and that's fine with me to assume that. A logical consequence of the
previous statement is that a surrogate "key" cannot be part of a
relational schema. E.g., this representation:

CountryId CountryName
1 Australia
2 Australia

depicts two identical tuples. Then the instance above is not a relation,
hence the model is not a relational model. Fine.
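
To make the example concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names (country_a, country_b) are hypothetical, chosen only for illustration, and SQLite merely stands in for whatever platform is actually in use. Under those assumptions it shows that a table whose only declared key is the surrogate stores both rows, while a table that also declares a key on the data rejects the second one.

```python
import sqlite3

def insert_rows(ddl, table):
    """Create the table in a fresh in-memory database, try to insert the
    two rows from the example, and return how many rows were stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    for country_id, name in [(1, "Australia"), (2, "Australia")]:
        try:
            conn.execute(
                f"INSERT INTO {table} (CountryId, CountryName) VALUES (?, ?)",
                (country_id, name),
            )
        except sqlite3.IntegrityError:
            # A uniqueness constraint rejected the row.
            pass
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    conn.close()
    return count

# Hypothetical table: the surrogate is the only declared key.
SURROGATE_ONLY = """
CREATE TABLE country_a (
    CountryId   INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL
)"""

# Hypothetical table: same columns, plus a key on the data itself.
KEY_ON_DATA = """
CREATE TABLE country_b (
    CountryId   INTEGER PRIMARY KEY,
    CountryName TEXT NOT NULL UNIQUE
)"""

rows_a = insert_rows(SURROGATE_ONLY, "country_a")  # both rows are stored
rows_b = insert_rows(KEY_ON_DATA, "country_b")     # second row is rejected
print(rows_a, rows_b)
```

The sketch says nothing about which column ought to be the primary key; it only shows which declarations prevent the duplicate country name.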

But then, that diagram is ill-formed, as it does not represent a logical
relational model. Saying that it is not in 3NF is akin to saying that an
integer does not run at 100Mph. Not false, but not particularly
significant either.

Btw, your argument about surrogate "keys" contradicts the business rule
"Country is uniquely Identified by (CountryId)", stated in the document
you have shown us. Why doesn't that statement imply that CountryId is a
key? How would that be different from "Country is uniquely Identified by
(Name)" (which would imply that Name is a key)? The difference is in the
eyes of the beholder, it seems.

> (There is an interaction going on with Jan, where we have teased out that you
> people have a fragment of 3NF which you fraudulently call "3NF". Result
> being, as evidenced here, you accept a total failure, a non-relational Record
> Filing System, to be satisfactory.)

No. We are making a formal argument. In other words, we *interpret*

CountryId CountryName
1 Australia
2 Australia

as having *two* distinct tuples. And we are *assuming* that CountryId ->
CountryName (as per the stated business rule). Formally, this does not
look any different from

X Y
1 A
2 A
FDs: {X -> Y}

where I don't need to care what those attribute mean any longer to tell
that this is a relation, and its schema happens to be in 3NF (and 5NF,
too).
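
The formal reading above can be sketched in a few lines of Python. The function name fd_holds and the dictionary representation of tuples are assumptions made for illustration only. Note that an instance can refute a functional dependency but cannot establish it; whether X -> Y holds in the schema is a matter of declaration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no pair of rows in this instance violates lhs -> rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value and
        # report a violation if a later row disagrees with it.
        if seen.setdefault(left, right) != right:
            return False
    return True

instance = [{"X": 1, "Y": "A"}, {"X": 2, "Y": "A"}]

x_determines_y = fd_holds(instance, ["X"], ["Y"])
y_determines_x = fd_holds(instance, ["Y"], ["X"])

# Counting distinct tuples with and without the X column.
distinct_tuples = len({tuple(sorted(r.items())) for r in instance})
distinct_without_x = len({r["Y"] for r in instance})

print(x_determines_y, y_determines_x, distinct_tuples, distinct_without_x)
```

With X included the two tuples are distinct and X -> Y is not violated; once X is projected away, only one distinct value remains, which is the point of disagreement in this thread.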

Since we are discussing upon different *assumptions*, we are not going
anywhere with this game of "it is/it is not 3NF". If you want to
continue arguing on the irrationality or dumbness of the formal position
above, feel free to do that, but mathematics is mathematics, and
modeling reality is a totally different business (I have already
expressed my almost complete support for your point of view, see above).

> I think the pivotal difference is, Codd and I require a Key, before
> Functional Dependency can be worked with, and you people have non-FDs, and on
> non-keys.

The formal definition of a key depends on the notion of FD: FDs come
first, keys can be derived from them. I don't know what you mean by
"non-FDs", but FDs on "non-keys" exist, because the reality to be
modeled mandates it.
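
As a sketch of what "keys can be derived from FDs" means in the formal setting, the following Python computes attribute closures and, from them, the candidate keys of a small schema. The function names and the second set of FDs (CountryName -> CountryId) are assumptions for illustration; they are not taken from the model under discussion. The search is brute force and is only meant for schemas with a handful of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds.
    fds is a list of (lhs, rhs) pairs, each a set of attribute names."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole heading."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys

# The abstract example: heading {X, Y} with the single FD X -> Y.
keys_xy = candidate_keys({"X", "Y"}, [({"X"}, {"Y"})])

# Hypothetical variant: the name also determines the id.
keys_country = candidate_keys(
    {"CountryId", "CountryName"},
    [({"CountryId"}, {"CountryName"}), ({"CountryName"}, {"CountryId"})],
)

print(keys_xy, keys_country)
```

In the first case the only key is {X}; in the hypothetical variant both {CountryId} and {CountryName} come out as keys, so the result depends entirely on which dependencies are declared to hold.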

Finally, I have summarized in a few words the problem with that model:
"it does not capture the real constraints" (sorry if it's more than
six). I don't think my posts will ever be longer than yours :)

Btw, thanks for the clarification about the post codes.

Derek Asirvadem

Feb 7, 2015, 8:46:40 PM
> On Sunday, 8 February 2015 05:39:26 UTC+11, Nicola wrote:
>
> > > To find the keys, you
> > > must determine which functional dependencies hold.
> >
> > That is a very slow method, but yes. The tables are dead simple, and the
> > attributes should be well-known to anyone who has any experience at all.
>
> Sure. If you find the keys directly, in fact you have defined some
> functional dependencies (from the keys). But, in general, you must
> ensure that no other relevant semantic constraints (not necessarily only
> functional) are missed.

Do you have an example of such ?

> > > Once you have found all the keys, you may tell whether your
> > > schema is in 3NF or not (in this case, it is - provided that I interpret
> > > correctly, and I suppose I do, the meaning of the remaining attributes).
> >
> > No, I have stated that it FAILS 3NF.
>
> Look, I overall accept your line of argument and its conclusions, with a
> caveat. Let me see if we can converge.

Yes!

> You say a schema with a surrogate "key" does not enforce row uniqueness.
> I write "key" to emphasize that it does not conform to Codd's definition
> because it is a totally made up set of values without any tie to reality
> - and that's fine with me to assume that.

Ok, but that is despite my post, which details exactly why the word is false and confusing. So to acknowledge that; to acknowledge that it is not Relational; and then to continue using the term with double quotes as a modifier, removes it from the previous category of unconscious fraud, and places it in the category of conscious fraud.

Either it is a Key, key, "key", 'key', or it is not.

Or, you still think a surrogate has some properties of a key. In which case, I have not gotten through to you, and there is something we need to pursue.

At the least, since I have given you detailed reasoning, which you seem to accept, you have to stop using the term "surrogate" and the term "key" (in any form) together, because it is false, and you accept it is false. Otherwise, every sentence it is located in causes a strain and a response.

Do you want me to enumerate the properties of a Key; the properties of a surrogate; then identify why the latter has none of the former ? Keys are central to the RM. I mean, you say you know the RM, and you like it, but each time we discuss some aspect of the RM, it appears you are ignorant of that aspect.

I have already posted the definitive section from the RM that relates to Key (in my previous post, which you are responding to). If you do not accept that definition, then you are not Relational, stop pretending to be.

If you are purely theoretical, which as evidenced, results in a non-relational RFS and non-keys, just state that. For such people the pretence at Relational is fraud. It allows you to anoint the non-relational RFS as "relational", which is a serious fraud, because such RFS are devoid of the Integrity, Power, and Speed of the RM. Until someone like me comes along and takes the fraud apart.

> A logical consequence of the
> previous statement is that a surrogate "key" cannot be part of a
> relational schema.

I have specifically declared that it is not Relational, yes.

> E.g., this representation:
>
> CountryId CountryName
> 1 Australia
> 2 Australia
>
> depicts two identical tuples. Then the instance above is not a relation,
> hence the model is not a relational model. Fine.

Agreed.

> But then, that diagram is ill-formed, as it does not represent a logical
> relational model.

Agreed. Already stated by me.

If you have arrived at that conclusion now, then yes, you are correct.

> Saying that it is not in 3NF is akin to saying that an
> integer does not run at 100Mph. Not false, but not particularly
> significant either.

???

You may be running into problems by chopping up my statements, and then viewing each of them in isolation. Here is what I stated. Please do not break this up into fragments:
== Quote: Fails Relational, Fails Third Normal Form ==
-- Fail --
The model fails for the following reasons (many instances of said reasons). The ordering of the issues is mine, in that if it isn't Relational, it is not worth bothering about the specifics of a Normalisation error. It fails *Relational* mandates on two counts, it is non-relational.

1 Definition for keys, from the RM: "a Key is made up from the data"
__ A surrogate (RecordId) is not made up from the data
__ There are no keys on the data

2 The RM demands unique rows (data)
__ A surrogate does not provide row (data) uniqueness

3 Re Normalisation, (if we consider that outside the Relational Model), it breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).

[4] This is a classic Record Filing System, anti-relational. It is nowhere near ready for Normalisation, let alone Relational Normalisation.

None of you theoreticians identified any of that.
== End Quote ==

[1][2][3] I stated that it was not Relational AND that it breaks 3NF.
I stated that it was not Relational for several reasons,
[1] the no-keys-on-the-data issue was just one
[2] the non-uniqueness was another
[3] I stated that it breaks 3NF, and I gave the reason

Now, you have drawn my attention to [2], you have laid out yet another example (no idea why), and agreed with it. Good. But then you jump over to the conclusion of [3] and make a comment about it as if the reasons for [2] apply to the conclusion of [3]. I did not make that connexion, therefore I cannot defend it. It is you who is making that connexion (and I agree, yes, the connexion is absurd). To wit, the conclusion for [3] is for the reasons given under [3], the conclusion for [3] is not for the reasons given under [1] and [2].

(For the purpose of being complete, since we are examining this issue again, by "pork sausage", I am referring to your surrogate, the non-key that you are treating as "key". I have stated that such an act is not only wrong, it is self-confusing.)

That kind of mistake (fragmenting my statements; then mixing them up; then making an incorrect attribution) is typical of people who have gone to universities after 1983, the consequence of losing the war. They actually teach pharisaic argument as "logic".

Therefore this does not make any sense at all:
> Saying that it is not in 3NF is akin to saying that an
> integer does not run at 100Mph. Not false, but not particularly
> significant either.

> Btw, your argument about surrogate "keys"

I argued no such thing. Surrogates are not Keys by any stretch of the imagination. I argued surrogates.

> contradicts the business rule
> "Country is uniquely Identified by (CountryId)", stated in the document
> you have shown us. Why doesn't that statement imply that CountryId is a
> key? How would that be different from "Country is uniquely Identified by
> (Name)" (which would imply that Name is a key)? The difference is in the
> eyes of the beholder, it seems.

I will answer the whole para, then answer each sentence.

In that same post, mine, part of which I have quoted above,

a. I state:
> > The model fails for the following reasons (many instances of said reasons).
That means, I have not enumerated each instance.

b. I state that the developer has corrected the mistakes identified as per above conclusions; that he has made a second submission, Address [B]; to which I have given a link; and then I stated:

> > My corrections to his /previous/ model [A] are on page 2.

There, I have enumerated (not completely) and detailed some of his mistakes.

It appears you have not read Address [B] page 2.

I suggest you read it.

For your convenience, I will quote the relevant part (the corrective notes, in the form of a Post-It-style note, stuck on top of his Business Rules) here:
== Quote ==
* These are idiotic Business Rules, they merely declare the Records in your Record Filing System. Of course, every File in a RFS is independent, but not so in a Relational Database.
* I have told you one hundred times, if you start the design process by sticking RecordId on every box, you cripple yourself and the modelling exercise.
* IDEF1X is a Methodology, not merely a notation. Follow it.
* Go and discuss with the business, and determine the real Identifiers, the real Business Rules.
* Find out what the data is, what it means, how it relates to all other data in this cluster.
== End Quote ==

Now for each sentence.

> Btw, your argument about surrogate [] contradicts the business rule
> "Country is uniquely Identified by (CountryId)", stated in the document
> you have shown us. Why doesn't that statement imply that CountryId is a
> key?

Ok, that means you do not understand the world of implementation.

1. On one side, where the business gives us "business rules", they are not to be taken as implementation imperatives. If taken as such, we would be merely clerks, implementing their requirements, without using the skills that they hired us for. Eg. we would implement a "business transaction" that updated six million rows, that hung the users up for 15 minutes in the middle of the day, and we would take no responsibility, because the business "told us to do it".

1.a Obviously, we do not do that. We exercise the skills we were hired for. Part of which is to implement OLTP Standard-compliant transactions. We do not view the business requirements as imperatives, we view them as initial requirement statements. We work back and forth, such that the requirements are modified, then accepted, and then implemented, such that they do not crash the system; such that the database does not have circular references; etc; etc; etc.

1.b So the example "business transaction" would be converted into a batch job that runs in a loop and executes six million OLTP Standard-compliant single-row transactions. The batch job keeps track of its position; is restartable; etc. So the business gets the requirement they want, but not in the METHOD that they initially stated it. Ie. Just tell me what you want, don't tell me how to do it.

1.c On this one side, in no case is a business rule to be taken as an imperative.

2. On another side, I was merely answering a request from people here who evidently could not read an IDEF1X data model; who evidently could not understand that the predicates were in the diagram, and I asked the developer to fill them in.

2.a The way we do that is this:
___ All the predicates in the data model, *as well as predicates that are unknown to you people*, are covered under a formal documentary section "Business Rules". The predicates are included, precisely because we do not expect the users and auditors to fully understand the notation in the data model, so we spell it out for them.

2.b The developers had better know the predicates (a) from the business requirements given, that they have actually implemented, in the model, as well as (b) derived from the model, as a validation of it.

2.c In fact, the exercise of modelling, is to go back-and-forth with the users, using the data model as a communication tool, such that:
___ The model progresses incrementally
___ Understanding of the data, is progressively increased with every iteration
___ by *both* parties
___ The data model used for communication with the users, is incomplete without that "Business Rule" section
___ the discussions that are had are specifically to get them to agree that the stated Business Rules (which an IT person understands to be predicates) are correct. Ie. it is a formal method of validating the model, and obtaining a signature.
___ The primary purpose of the data modelling exercise (with the model used as a subject tool) is to *understand the data*. It is not to design a database. After the model has achieved some maturity; after the users have agreed to it; then yes, the secondary purpose may be commenced, it may be ready for designing a database. If you get those primary and secondary purposes mixed up, the resulting database will be a gross failure.

2.d This, btw, is exactly what is going on in this thread, with the model progressing incrementally A, B, and so on.
___ With the exception that you guys are more like users, who need everything spelled out, because you are not used to the models that we have been using since 1985, and you have little understanding of the modelling exercise
___ instead of data modellers, who can read models, forwards, backwards, side-ways, and determine the faults in the model, in short order.

2.e So the developer was just filling in the gap as requested by posters here, using the formal structures that are in place.

3. On the third side,
3.a It is plainly obvious to me (unfortunately not to you people), that he has:
- not understood the data
- therefore not understood how to Identify the data (the Keys required)
- therefore not correctly worked the FDs out
- erected a model starting with certain boxes in mind (a severe mistake, if one is trying to understand the data)
- stuck an ID column on every box that moves (which cripples his ability to understand the data, as data, and nothing but data; cripples his ability to perceive the data Relationally; cripples his ability to determine the required entities)
- which is the classic hallmark of a Record Filing System
- which has none of the Integrity, Power, and Speed of a Relational Database
- which you people use, each and every time

Therefore he is screwed before he starts.

Hence my annotations to him in my quote above.

Therefore:
> Btw, your argument about surrogate [] contradicts the business rule
> "Country is uniquely Identified by (CountryId)", stated in the document
> you have shown us.

is plainly incorrect. CountryId uniquely identifies precisely NOTHING. Instead of recognising that the stated BR is an error, a false and impossible statement, you are taking it as fact.

As detailed above, and in my annotations in Address [B] page 2, the idiot worked backwards, and merely wrote BRs from the model, which is an RFS. Since the model is incorrect, the BRs are therefore incorrect. Separate to it being false, for any human being.

> Why doesn't that statement imply that CountryId is a
> key?

That statement taken alone, ignoring the fact that it is a severe error, does carry that implication, yes.

Which is one of the reasons (here, exemplary) that viewing surrogates as "keys" is so dangerous. You mess with the meaning of Key, and you mess your own self up.

----

Screwed, the same as you people, crippled, same as you people, because you take the same RFS approach, draw rectangles without having a clue about the entities that are *actually*required; without understanding the data; without determining the Keys.

Actually, he is less screwed than you, by one recognisable increment, because he understands that surrogates are not Keys.

But that is ok, we can approach the issues in the sequence that you are following (no, Derek, surrogates are ok; we have our magical mysterious ways of ensuring the {keys|fds|dependencies|fks|relationship|etc} are correct), the result (for me) is the same, I will determine various faults such as:
- Non-compliance with the RM
- Breaks {1NF|2NF|3NF}
that you will fail to determine. And do so in five minutes.

I had better identify the reasons for that. The difference is, I am working with a deep understanding of:
- the RM
- 3NF
___ which contains your BCNF, 4NF, 5NF fragments
___ and any NF fragments that you might declare in the future
- data modelling, specifically the standard for it, IDEF1X
- your 17 NF fragments, carefully inspected and dismissed
- your level of abstraction, carefully examined and determined to be too abstract, due to loss of meaning, and dismissed
- knowledge of RFSs vs RDBs, and knowledge of the specific failures of RFSs (Loss of Integrity, Power, Speed that is in the RM)
- the specific issues concerning surrogates

Whereas you are working with:
- zero understanding of the RM
- a tiny understanding of 3NF
___ due to your 17 fragments taking precedence over 3NF
___ thus denying 3NF
- zero understanding of data modelling
- a small ability to read and understand a standard data model
- your 17 NF fragments
- your level of abstraction
- zero knowledge of RFSs vs RDBs, and zero knowledge of the specific failures of RFSs (Loss of Integrity, Power, Speed that is in the RM)
- thus zero knowledge that you are creating an RFS that is nowhere near an RDB, but declaring it to be an "RDB"
- which relies on surrogates as "keys"

The difference, in sum, is that I determine errors and failures in five minutes, in five words, whereas you take the long and winding trail through the forest of your mysteries; fail to determine those errors; and have six hundred words to justify your mysteries.

----

> > (There is an interaction going on with Jan, where we have teased out that you
> > people have a fragment of 3NF which you fraudulently call "3NF". Result
> > being, as evidenced here, you accept a total failure, a non-relational Record
> > Filing System, to be satisfactory.)
>
> No. We are making a formal argument. In other words, we *interpret*
>
> CountryId CountryName
> 1 Australia
> 2 Australia
>
> as having *two* distinct tuples. And we are *assuming* that CountryId ->
> CountryName (as per the stated business rule). Formally, this does not
> look any different from
>
> X Y
> 1 A
> 2 A
> FDs: {X -> Y}
>
> where I don't need to care what those attributes mean any longer to tell
> that this is a relation, and its schema happens to be in 3NF (and 5NF,
> too).

This could be an excellent vehicle to Identify and Determine why the vehicle that you people use is broken, bankrupt. Thank you for expressing it clearly, in technical English, so that the 99% can deal with it, and not in the gibberish that is so beloved of the 1%.

It remains an error, irrelevant to the physical universe, relevant only in the interaction amongst yourselves.

Your formalism simply means that you can declare a non-relational RFS that breaks 3NF, as "relational", and "satisfying 5NF" (and now "satisfying your "3NF" "), and rely on the fact that you have arrived at such declaration on the basis of formal argument. You are missing the point that the argument (with or without the formal basis) is wrong; the determinations that you made using that argument are wrong.

I have every respect for formalism. I have no respect for a formalism that is sooo isolated from reality (facts in the physical universe); sooo ignorant of other sciences (ie. established truths); sooo abstracted such as to lose the meaning of the very thing it is abstracting, such that it "proves" something that is patently false; devoid of meaning.

Eg. there is no problem at all to "prove" in an abstract, isolated, ignorant sense (formal argument), that pigs can fly. But that has to be very abstract, very isolated, very ignorant. And if you tell anyone who has not lost his mind that you have proved theoretically that pigs can fly, he will split his sides, because the extent of his laughter is physically damaging. But you make that statement with a straight face.

And if you make that statement to someone who is an authority, he will lock you up, in order to protect society from such insanity.

Eg. here you have "proved" that the heap of rubbish that the developer submitted "satisfies 5NF", with a straight face, despite the physically evidenced fact that it breaks 3NF for the reasons given.

I laugh in your face. ROTFLMAO, etc, etc.

I won't be addressing any of your formal argument, because that might give it some credibility, or suggest it has some value, where it has none right now. It remains an incredible mental abstraction that is wrong, wrong, wrong. Pigs can't fly. The data model breaks 3NF. Five minutes, not five days.

I am happy for you to continue this line of reasoning (I will note, and ignore, your formal arguments), if you are interested in:
- determining the delta (why it fails Codd's and my 3NF vs why it passes your "5NF")
- determining the delta (why it fails the RM vs why you think it is "relational").

But that the formal argument is devoid of credibility, has the value of toilet paper, is already proved.

> Since we are discussing upon different *assumptions*, we are not going
> anywhere with this game of "it is/it is not 3NF".

1. I have already proved it breaks 3NF, there is no argument to be had.

2. It is not a game, it is established science, which you are in denial of, a denial that you must maintain in order to maintain the relevance of your 17 "NF" fragments.

3. It does become silly, when you hold the yes-but-but-but position that 3NF is not valid (denying established science), that you will use only your 17 "NF" fragments, and your isolated, abstracted processes, whilst denying physical facts, to prove that it "satisfies "5NF" ". For you. Not for me, I have already shot it down, the pig is in flames.

4. You are maintaining your unreal universe (the validity thereof), but the task called for dealing with the model in the physical universe. Epic Fail.

> If you want to
> continue arguing on the irrationality or dumbness of the formal position
> above, feel free to do that, but mathematics is mathematics, and
> modeling reality is a totally different business (I have already
> expressed my almost complete support for your point of view, see above).

The request was to accept/reject the data model (modelling reality) on specific criteria (established science), in the physical universe, and you (plural) failed miserably, twice. Instead, you have given reasons (a second set of six hundred words) as to why your formal arguments might be valid in the unreal, abstract universe.

> > I think the pivotal difference is, Codd and I require a Key, before
> > Functional Dependency can be worked with, and you people have non-FDs, and on
> > non-keys.
>
> The formal definition of a key depends on the notion of FD: FDs come
> first, keys can be derived from them. I don't know what you mean by
> "non-FDs", but FDs on "non-keys" exist, because the reality to be
> modeled mandates it.

Totally rejected.

0. A non-FD is your theoretical notion of the FD, which is abstracted to the point where the FD has lost its meaning. Therefore, a fragment of the FD. Therefore it is a fraud to label that fragment "FD". Since I will not participate in your fraud, and in the confusion that fraudulent labels cause, I call it what it is: non-FD.

1. The established definition for Key is in the RM. The requirement was to comply with the RM, not a fractured notion of "key" from outer space, thus I couldn't care less about any other definition of Key, it does not apply. This again, proves you are not Relational.

2. The established definition for Key does not depend on the "notion of FD".

3. The Key comes first, the FD comes second, and it is used to validate the key. (Going back-and-forth during the modelling exercise.)

-- Aside --
4. I am quite aware of the abstract notion that non-FDs can be used to determine the key (or non-key because you assent to surrogates). But that is an exercise in solving a puzzle, much like solving Sudoku or a crossword on the train. That is relevant only when you have a Record Filing system, devoid of Keys with meaning, focusing on non-keys that have no meaning, to determine that such non-keys are somehow "valid". What I have labelled MMM, and again it is evidenced here.

All that [4] is totally irrelevant because it is not necessary for the requested task, because the Keys have not been correctly determined. Therefore to embark on evaluating the FDs (given, not your puzzle) is premature. And to evaluate non-FDs is hysterically funny. ROTFL, etc, etc.

Ok, I accept, MMM is addictive, once the long and winding trail through the mysterious forest is commenced, there is no stopping the train, it must reach its climax.
-- End Aside --

5. FDs do not come first. Your non-FDs may well come first. I couldn't care less, it has nothing to do with the task, it is a rumble in the jungle. After you return, if you have some statement of value (that relates to the task), sure, I will listen. I don't need the details of the latest rumble. In this case, no statement of value has been made.

6. Repeat [3]. Your non-FDs on your non-keys may well exist. Repeat [5].

7. The reality to be modelled scoffs at it (the notions in your para, destroyed by my points above). It takes a massive amount of time and energy, and comes up with totally incorrect determinations. Null and void in the reality (physical universe).

7. The reality to be modelled, due to the efforts of others (ie. not due to you people, who have delivered nothing of value since 1970), has tools and methods, that are unfortunately unknown to you, that can be used to determine the correctness and completeness (that is what a standard does) of a model, in simple scientific steps, and in a tiny fraction of the time it takes you.

8. Your notion of what the reality mandates is false. It is transparent that your mandates are totally self-serving (the abstract, unreal universe), and it has nothing at all to do with reality. It is yet another fraud to state that your self-serving, self-imposed mandates are "mandated by the reality".

Stated otherwise, that means, your model is bankrupt, isolated from the reality it alleges to model. And you are ignorant of models that do serve the reality, that reality does use [7].

> Finally, I have summarized in a few words the problem with that model:
> "it does not capture the real constraints"

Very good. That is a correct, although less-than-perfect, conclusion about the aspect of constraints (the 3NF and RM issues remain separate aspects, in which you failed). One sentence as to why "it does not capture the real constraints" would have made it a perfect conclusion.

Hint (re Address A): there are no keys on the data.
Hint (re Address B): the keys have not been determined correctly.

I am grateful, more than you can imagine, that you did not give me six hundred words, that detailed your ramble through the forest, which exercise gave you the reasoning to form that conclusion.

> (sorry if it's more than
> six). I don't think my posts will ever be longer than yours :)

I have no problem with length. I do have a problem with length that does not contain substance, yes. Or length that "proves" the opposite of an established fact, or that fails to determine faults that are determined by other science.

Cheers
Derek

Nicola

Feb 8, 2015, 5:39:01 AM
In article <abc8bebc-4e27-4af4...@googlegroups.com>,
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > On Sunday, 8 February 2015 05:39:26 UTC+11, Nicola wrote:
> >
> > > > To find the keys, you
> > > > must determine which functional dependencies hold.
> > >
> > > That is a very slow method, but yes. The tables are dead simple, and the
> > > attributes should be well-known to anyone who has any experience at all.
> >
> > Sure. If you find the keys directly, in fact you have defined some
> > functional dependencies (from the keys). But, in general, you must
> > ensure that no other relevant semantic constraints (not necessarily only
> > functional) are missed.
>
> Do you have an example of such ?

Do you mean, an example in which finding the keys directly is harder
than finding FDs first and deriving the keys from them? Well, it depends
on how clever you are at "seeing" keys, which in turn depends on how
much experience you have. For instance (from H. Koehler):

CourseSchedule(Course, Lecturer, Room, Time)

where a course has only one lecturer, each class has a fixed duration,
and the obvious constraints hold, such as teachers do not have the gift
of ubiquity. You may find the keys "directly" here, of course, but I
argue that, even if you do that, you are in fact (maybe unconsciously)
reasoning about the valid FDs.

Advantages of making the FDs explicit are that (1) they provide you with
a systematic (algorithmic) way to derive all the keys (not that this is
computationally inexpensive, but in many cases it is efficient enough);
(2) you have a formal documentation of your system's requirements.
Since FDs are a more general concept than keys, you may capture
constraints that are not captured by keys (I don't think that you were
asking me to show you an example of this, since you may find it in so
many books and papers, including Codd's).
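As a rough illustration of point (1), the sketch below derives candidate keys from a set of FDs by brute-force attribute closure. The three FDs are only an assumed reading of the informal CourseSchedule constraints above (one lecturer per course; a lecturer is in one room at a time; a room hosts one course at a time), and the helper names `closure` and `candidate_keys` are hypothetical, not taken from Koehler's work.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate the minimal attribute sets whose closure is the whole heading."""
    attributes = frozenset(attributes)
    keys = []
    # Smaller sets are tried first, so any set containing a known key is not minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            combo = frozenset(combo)
            if any(k <= combo for k in keys):
                continue
            if closure(combo, fds) == attributes:
                keys.append(combo)
    return keys

HEADING = {"Course", "Lecturer", "Room", "Time"}

# Assumed FDs: an interpretation of the informal constraints, not a quoted source.
FDS = [
    (frozenset({"Course"}), frozenset({"Lecturer"})),
    (frozenset({"Lecturer", "Time"}), frozenset({"Room"})),
    (frozenset({"Room", "Time"}), frozenset({"Course"})),
]

for key in candidate_keys(HEADING, FDS):
    print(sorted(key))
```

Under these assumed FDs the sketch reports three keys: {Course, Time}, {Lecturer, Time} and {Room, Time}. The enumeration is exponential in the number of attributes, which is consistent with the remark that the method is not computationally inexpensive but is often good enough for small headings.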

Of course, if you are clever enough to design all of your schemas to be
in Codd's 3NF to begin with (which, as far as I can tell, is implicit in
your arguments), then there's nothing else to discuss: of course all of
your FDs will be full dependencies from keys. You'd never come up with
the schema above in the first place.

> > You say a schema with a surrogate "key" does not enforce row uniqueness.
> > I write "key" to emphasize that it does not conform to Codd's definition
> > because it is a totally made up set of values without any tie to reality
> > - and that's fine with me to assume that.
>
> Ok, but that is despite my post, which details exactly why the word is false
> and confusing. So to acknowledge that; to acknowledge that it is not
> Relational; and then to continue using the term with double quotes as a
> modifier, removes it from the previous category of unconscious fraud, and
> places it in the category of conscious fraud.
>
> Either it is a Key, key, "key", 'key', or it is not.

Ok, let's call it just "surrogate".

> Or, you still think a surrogate has some properties of a key. In which case,
> I have not gotten through to you, and there is something we need to pursue.

We have agreed that a surrogate has no place in a relational schema. So,
there's no need to discuss surrogates any further at the logical level.

> > Saying that it is not in 3NF is akin to saying that an
> > integer does not run at 100Mph. Not false, but not particularly
> > significant either.
>
> ???

1. Is 3NF defined in the context of the RM?
2. Does it make sense to use a definition outside the context in which
it is given?

If your answers to these questions are not "yes" and "no", respectively,
then I think I don't follow you. You are saying that something that is
not a relational model lacks a property (3NF) that is defined for
relational models. Strictly speaking, that's not a wrong statement: also
integers lack the property of being able to run at 100Mph.

> The model fails for the following reasons (many instances of said reasons).
> The ordering of the issues is mine, in that if it isn't Relational, it is not
> worth bothering about the specifics of a Normalisation error.

Exactly. So why are you so keen to point out that it fails 3NF? Just say
that it is not relational.

> It fails
> *Relational* mandates on two counts, it is non-relational.
>
> 1 Definition for keys, from the RM: "a Key is made up from the data"
> __ A surrogate (RecordId) is not made up from the data
> __ There are no keys on the data

Ok.

> 2 The RM demands unique rows (data)
> __ A surrogate does not provide row (data) uniqueness

Ok.

> 3 Re Normalisation, (if we consider that outside the Relational Model), it
> breaks Third Normal Form, because in each case, there is no Key for the data
> to be Functionally Dependent upon (it is dependent, yes, but on a pork
> sausage, outside the data, not on a Key).

See above.

> [4] This is a classic Record Filing System, anti-relational. It is nowhere
> near ready for Normalisation, let alone Relational Normalisation.

Ok.
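A minimal sketch of points 1 and 2, assuming Python's built-in sqlite3 module as a stand-in for whatever SQL platform is actually in use. The Country table with CountryId and CountryName comes from the example quoted earlier in the thread; the two DDL strings and the `insert_twice` helper are illustrative assumptions, not something any poster supplied.

```python
import sqlite3

def insert_twice(ddl):
    """Create Country from ddl, try to insert 'Australia' twice, return the row count."""
    con = sqlite3.connect(":memory:")
    con.execute(ddl)
    for _ in range(2):
        try:
            con.execute("INSERT INTO Country (CountryName) VALUES ('Australia')")
        except sqlite3.IntegrityError:
            pass  # the second insert is rejected when a key exists on the data
    (count,) = con.execute("SELECT COUNT(*) FROM Country").fetchone()
    con.close()
    return count

# Surrogate only: the generated CountryId differs per row, so the data may repeat.
SURROGATE_ONLY = (
    "CREATE TABLE Country ("
    " CountryId INTEGER PRIMARY KEY,"
    " CountryName TEXT NOT NULL)"
)

# Key made up from the data: the duplicate is refused.
KEY_ON_DATA = (
    "CREATE TABLE Country ("
    " CountryName TEXT NOT NULL PRIMARY KEY)"
)

print(insert_twice(SURROGATE_ONLY))
print(insert_twice(KEY_ON_DATA))
```

With the surrogate-only declaration both 'Australia' rows are stored; with the key declared on CountryName only one survives. That is the sense in which a surrogate alone does not provide row (data) uniqueness.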

> I have every respect for formalism. I have no respect for a formalism that
> is sooo isolated from reality (facts in the physical universe); sooo ignorant
> of other sciences (ie. established truths); sooo abstracted such as to lose
> the meaning of the very thing it is abstracting, such that it "proves"
> something that is patently false; devoid of meaning.
>
> Eg. there is no problem at all to "prove" in an abstract, isolated, ignorant
> sense (formal argument), that pigs can fly. But that has to be very
> abstract, very isolated, very ignorant. And if you tell anyone who has not
> lost his mind that you have proved theoretically that pigs can fly, he will
> split his sides, because the extent of his laughter is physically damaging.
> But you make that statement with a straight face.
>
> And if you make that statement to someone who is an authority, he will lock
> you up, in order to protect society for such insanity.

Progress in science often happens exactly because the "established
truths" and the "facts in the physical universe" are subverted, and
someone comes and says: "what if pigs could fly?". You don't need
non-Euclidean geometry or String Theory to find your way home, and you
don't need to know number theory to buy things in your favorite online
shop. You (and I) can be happy with Riemann's definition of an integral
for all practical purposes (in fact, we might never need it in our
lives), and ignore the fact that there are several other definitions,
which give different results only in pathological cases. We can already
build spatial databases that solve our problems satisfactorily, so why
bother with a topological extension of the RM (see Norbert Paul's
thread)?

Yes, you don't need any of the many abstract definitions of normal forms
in the literature to design and implement a good database that makes
your customers happy. You don't need to care about the nested RM or
infinite relations. You don't have to worry about the fact that a
relational schema may have a factorial number of keys, or that a schema
with fourteen attributes and ten constraints may have tens of
millions of possible decompositions in 3NF. You will never ever find
such things in the real world. That does not mean that we don't have the
freedom to explore.

Welcome to c.d.t. (crazy.database.theory) ;)

Derek Asirvadem

Feb 8, 2015, 1:19:24 PM
Nicola

> On Sunday, 8 February 2015 21:39:01 UTC+11, Nicola wrote:
> In article <abc8bebc-4e27-4af4...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:

I have to say, I welcome your lucidity.

> > > On Sunday, 8 February 2015 05:39:26 UTC+11, Nicola wrote:
> > >
> > > > > To find the keys, you
> > > > > must determine which functional dependencies hold.
> > > >
> > > > That is a very slow method, but yes. The tables are dead simple, and the
> > > > attributes should be well-known to anyone who has any experience at all.
> > >
> > > Sure. If you find the keys directly, in fact you have defined some
> > > functional dependencies (from the keys). But, in general, you must
> > > ensure that no other relevant semantic constraints (not necessarily only
> > > functional) are missed.
> >
> > Do you have an example of such ?
>
> Do you mean, an example in which finding the keys directly is harder
> than finding FDs first and deriving the keys from them? Well, it depends
> on how clever

Capable.

The RM and 3NF are written in technical English. Finding the Keys doesn't require any special skills other than the normal aptitude for databases (as opposed to say, programming). IQ in the upper half of the average band, ie. 100 to 110. One just needs to follow a few simple steps.

But first, before attempting that, we need to establish the facts.

> you are at "seeing"

I don't have an Ouija board or a crystal ball. I am an Orthodox Catholic. The only candles I light are at the foot of the altar, below a crucifix. I do pray, but I ask for nothing for myself. And every four years I need a stronger prescription for my lenses.

> keys, which in turn depends on how
> much experience you have. For instance (from H. Koehler):
>
> CourseSchedule(Course, Lecturer, Room, Time)
>
> where a course has only one lecturer, each class has a fixed duration,
> and the obvious constraints hold, such as teachers do not have the gift
> of ubiquity.

I couldn't find that. From what I can /see/, it looks too simple anyway.

I found a very similar example in Foundations of Data Heaps, which has one less element, way too simple for our purpose. It is aimed at simple minds.

I found Köhler's DNF paper, it has an example that is very similar, with one more element ("non-simple" in Relational terms), and it is quite fine for me. If you are happy to go with that, I have one tiny question. Given:

>>>>
Domination Normal Form - Decomposing Relational Database Schemas (sic), Henning Köhler

5 An Example

A (sic) university has oral examinations at the end of each semester, and wants to manage related data using a relational database. The relevant attributes to be stored are

____R = {Student, Course, Chapter, Time, Room}

Here Chapter denotes a chapter from the course textbook the student will be examined about. Every student can get examined about multiple chapters, and chapters may vary for each student. Multiple students can get examined at the same time in the same room, but the course must be the same. Further constraints are that a student gets examined for a >course<chapter> only once, and can't be in multiple rooms at the same time. Those conditions can be expressed through functional dependencies as follows:
<<<<

1. Is the strike-out and substitution correct ? Otherwise the facts given are incoherent.

----

Of course, I didn't read past that point, ie. I did not peruse his non-FDs.

If the answer to [1] is "no", then please explain the contradicting requirements, and skip the rest of this post.

If the answer to [1] is "yes", then please look at this page. At this stage, before I dive into determining the Keys, I would like to make sure that I have gotten the facts right.
http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20A.pdf

Cheers
Derek

Derek Asirvadem

Feb 8, 2015, 3:56:42 PM
Nicola

> On Monday, 9 February 2015 05:19:24 UTC+11, Derek Asirvadem wrote:
>
> If the answer to [1] is "yes", then please look at this page. At this stage, before I dive into determining the Keys, I would like to make sure that I have gotten the facts right.
> http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20A.pdf

I missed one requirement, doc updated.

Also, in addition to my 'traditional' layout, I am experimenting (exploring?) a new one. I have given you both. Please indicate which one you prefer, not in terms of eye candy, but in terms of cognition; comprehension; logical structure of the data presented.

Cheers
Derek

Nicola

Feb 8, 2015, 4:47:10 PM
In article <7e4c6cec-6534-4187...@googlegroups.com>,
Derek Asirvadem <derek.a...@gmail.com> wrote:


> > For instance (from H. Koehler):
> >
> > CourseSchedule(Course, Lecturer, Room, Time)
> >
> > where a course has only one lecturer, each class has a fixed duration,
> > and the obvious constraints hold, such as teachers do not have the gift
> > of ubiquity.
>
> I couldn't find that.

I've borrowed it from his PhD thesis, but it's discussed in the paper
"Finding Faithful Boyce-Codd Normal Form Decompositions", too.

> From what I can /see/, it looks too simple anyway.

Good for you. So, let's move on to the next example, which is slightly
more interesting.

> I found Köhler's DNF paper, it has an example that is very similar, with one
> more element ("non-simple" in Relational terms), and it is quite fine for me.
> If you are happy to go with that, I have one tiny question. Given:
>
> >>>>
> Domination Normal Form - Decomposing Relational Database Schemas (sic),
> Henning Köhler
>
> 5 An Example
>
> A (sic) university has oral examinations at the end of each semester, and
> wants to manage related data using a relational database. The relevant
> attributes to be stored are
>
> ____R = {Student, Course, Chapter, Time, Room}
>
> Here Chapter denotes a chapter from the course textbook the student will be
> examined about. Every student can get examined about multiple chapters, and
> chapters may vary for each student. Multiple students can get examined at the
> same time in the same room, but the course must be the same. Further
> constraints are that a student gets examined for a >course<chapter> only
> once, and can't be in multiple rooms at the same time. Those conditions can
> be expressed through functional dependencies as follows:
> <<<<
>
> 1. Is the strike-out and substitution correct ? Otherwise the facts given
> are incoherent.

Usually, students take an exam for a course, not for a chapter, so
"course" intuitively makes more sense. Let me see if I can spot some
contradiction. I will go through the requirements (not in the same
order) and try to provide an example of sets of values compatible with
each requirement and the previous ones.

1. "Every student can get examined about multiple chapters" (for the
same course or for different courses):

Student  Course  Chapter  Time  Room
s1       c1      1        ...
s1       c1      2        ...
s1       c2      1        ...
s1       c2      2        ...

2. "chapters may vary for each student" (also for the same course):

Student  Course  Chapter  Time  Room
s1       c1      1        ...
s1       c1      2        ...
s1       c2      1        ...
s1       c2      2        ...
s2       c1      3        ...         <-- different from s1
s2       c1      4        ...         <-- different from s1
s2       c2      3        ...         <-- different from s1

3. "a student gets examined for a *course* only once and can't be in
multiple rooms at the same time": this says that given a student and a
course, there is only one exam, hence time and room are fixed (an exam
takes place at one time in one room):

Student  Course  Chapter  Time  Room
s1       c1      1        t1    r1
s1       c1      2        t1    r1    <-- cannot be t2 or r2

4. "Multiple students can get examined at the same time in the same
room, but the course must be the same". So:

Student  Course  Chapter  Time  Room
s1       c1      1        t1    r1
s1       c1      2        t1    r1
s1       c2      1        t2    r1    <-- cannot be t1
s1       c2      2        t2    r1
s2       c1      2        t1    r1
s2       c1      3        t1    r1
s2       c2      3        t3    r2    <-- cannot be t1

I don't see any contradictory facts (there is redundancy, of course).

For the theoreticians among us, let me express the same requirements
formally ( "->" means "functionally determines" and "-/->" means "does
not functionally determine"):

1. {Student,Course} -/-> {Chapter}
2. Well, this states that the multi-valued dependency
{Course} ->> {Chapter} does not hold.
3. {Student,Course} -> {Time,Room} and {Student,Time} -> {Room}
4. {Time,Course,Room} -/-> {Student}, but {Time,Room} -> {Course}

So, we have the following constraints:

{Student,Course} -> {Time,Room}
{Student,Time} -> {Room}
{Time,Room} -> {Course}

"Room" in the first dependency can be removed without loss of
information (SC -> R descends from SC -> T and ST -> R by transitivity).
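The reasoning above can be checked mechanically. The sketch below, with hypothetical helper names `satisfies` and `closure`, tests the seven sample rows from step 4 against the three dependencies, confirms that the sample violates {Student,Course} -> {Chapter} as requirement 1 demands, and confirms that Room in the first dependency is implied by the other two. It is an illustration of the argument in this post, not code from Köhler's paper.

```python
def satisfies(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

COLUMNS = ("Student", "Course", "Chapter", "Time", "Room")
ROWS = [dict(zip(COLUMNS, values)) for values in [
    ("s1", "c1", 1, "t1", "r1"),
    ("s1", "c1", 2, "t1", "r1"),
    ("s1", "c2", 1, "t2", "r1"),
    ("s1", "c2", 2, "t2", "r1"),
    ("s2", "c1", 2, "t1", "r1"),
    ("s2", "c1", 3, "t1", "r1"),
    ("s2", "c2", 3, "t3", "r2"),
]]

FDS = [
    (("Student", "Course"), ("Time", "Room")),
    (("Student", "Time"), ("Room",)),
    (("Time", "Room"), ("Course",)),
]

# Every stated dependency holds in the sample rows.
all_hold = all(satisfies(ROWS, lhs, rhs) for lhs, rhs in FDS)

# Requirement 1: a student may be examined on several chapters of one course.
chapter_fd_holds = satisfies(ROWS, ("Student", "Course"), ("Chapter",))

# Drop Room from the first dependency and see whether it is still derivable.
reduced = [(("Student", "Course"), ("Time",)), FDS[1], FDS[2]]
room_implied = "Room" in closure({"Student", "Course"}, reduced)

print(all_hold, chapter_fd_holds, room_implied)
```

Passing the sample rows only shows that this particular instance is consistent with the dependencies; it says nothing about whether the dependencies are the right reading of the requirements.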

Erwin

Feb 8, 2015, 5:47:06 PM
Op zondag 8 februari 2015 19:19:24 UTC+1 schreef Derek Asirvadem:
>
>
> The RM and 3NF are written in technical English. Finding the Keys doesn't require any special skills other than the normal aptitude for databases (as opposed to say, programming). IQ in the upper half of the average band, ie. 100 to 110.

Aha. Now I know why I can't understand.

James K. Lowden

Feb 8, 2015, 6:59:35 PM
On Sat, 7 Feb 2015 17:46:38 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > > I think the pivotal difference is, Codd and I require a Key,
> > > before Functional Dependency can be worked with, and you people
> > > have non-FDs, and on non-keys.
> >
> > The formal definition of a key depends on the notion of FD: FDs
> > come first, keys can be derived from them. I don't know what you
> > mean by "non-FDs", but FDs on "non-keys" exist, because the reality
> > to be modeled mandates it.
>
> Totally rejected.
...
> 4. I am quite aware of the abstract notion that non-FDs can be used
> to determine the key.... That is relevant only when you have a
> Record Filing system, devoid of Keys with meaning, focusing on
> non-keys that have no meaning, to determine that such non-keys are
> somehow "valid". [...]
>
> All that [4] is totally irrelevant because it is not necessary for
> the requested task, because the Keys have not been correctly
> determined.

I think this exchange illustrates a difference in tradition that you
feel is idiotic but is really just a question of what one assumes.

From your point of view, you have a customer and a system that
maintains addresses in a particular place and time. The columns have
meaning (exemplified in their names) and the keys more or less announce
themselves to you. Any ambiguity or error can be addressed by
discussing them with your customer. I.e., by agreeing on their
meaning. Any formalism with FDs is pointless.

From the academic point of view -- indeed from the point of view of
the DBMS, as you know -- no column has meaning. It has a type and
domain, and some relationship (perhaps functional dependency) on other
columns. Any statement about keys *must* be based on stated FDs.

Your corrective notes end with what really is all that need be said,

> * Find out what the data is, what it means, how it relates to
> all other data in this cluster.

That is an option open to your group and not generally to c.d.t..
Anyone here willing to make an assertion about the correctness of the
model must also be willing to make assumptions about the meaning of the
columns. However safe those assumptions might be, they are still only
assumptions.

Surely you agree that to be unwilling to make assumptions like that need
not be an exercise in stupidity or obfuscation.

An example of the difficulty arising from unclear meaning is the
discussion over the relationship of postal code to unit. I would never
have guessed there are jurisdictions in which they are 1:1, but you
said (IIUC) that there are. Other questions arise, too. When I looked
at
http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf
I found myself wondering about StreetName and StreetType. I couldn't
think of an application for which those tables would be useful. They're
not objectively wrong, but I assume they are.

I remember a different example in my work. We had two tables,
Countries, and CountryGroups. Each Country had an ISO code and was
the real deal. CountryGroups reflected various political designations
and business imperatives.

One fine day a developer wrestling with an "application problem" (his
term) asked for comment on his preferred solution: to add one row to
Countries named "all countries". (You can imagine my reaction and I
yours!) I would say the suggestion stemmed fundamentally from a failure
to understand the meaning of the Countries table. To the man in
question, the table had no meaning per se, and the "missing" row was a
deficiency. I suggested, colorfully, that the concept of "all
countries" belonged squarely in CountryGroups. Obvious as that may be
to you, it took quite a lot of persuasion to prevent corrupting a basic
domain table. Meaning is surprisingly hard to pin down.

Like you, I learned about 3NF from an informal description. I don't
know how many treatises I've read describing an algorithm based on FDs;
they all read to me like the How to Hunt Elephants
(e.g., "COMPUTER SCIENTISTS"
http://paws.kettering.edu//~jhuggins/humor/elephants.html): sure to
succeed if ever it finished, and unnecessary in my context.

Unlike you, I don't think the FD formalism is an exercise in navel
gazing. Would only that the described algorithm were implemented, and
we could pour our column definitions in and get a 3NF (say) logical
model out!

The problem as I see it isn't in the formalism per se, but in
describing the columns' meaning to the algorithm. You do that in your
head and depict the result with IDEF1X. I've done the same. Are you
prepared to say that's the last and best way? I'm not. I'm still
waiting for an FD language (loosely speaking) that will describe my
database better than SQL, from which I can generate an IDEF1X diagram
and matching SQL DDL. That would be a better way to work, and would be
fruit from the FD tree.
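For what it is worth, a toy version of "pour the columns and FDs in, get 3NF out" is short. The sketch below follows the textbook synthesis approach (minimal cover, one relation per left-hand side, plus a relation holding a key if none does). The function names are hypothetical, and the input is the exam example discussed elsewhere in this thread, used here only as an assumed test case. The hard part described above, stating the columns' meaning as FDs in the first place, is exactly what it does not solve.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    """Equivalent FD set: single-attribute right sides, nothing redundant."""
    split = [(frozenset(l), frozenset({a})) for l, r in fds for a in sorted(r)]
    reduced = []
    for lhs, rhs in split:
        lhs = set(lhs)
        for a in sorted(lhs):
            # Drop a left-hand attribute if the rest still determines rhs.
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, split):
                lhs.discard(a)
        reduced.append((frozenset(lhs), rhs))
    cover = list(dict.fromkeys(reduced))
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest  # fd is implied by the others
    return cover

def synthesize_3nf(attributes, fds):
    """Return a list of relation headings (frozensets) in 3NF."""
    attributes = set(attributes)
    cover = minimal_cover(fds)
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(lhs, set(lhs)).update(rhs)
    schemas = {frozenset(s) for s in groups.values()}
    schemas = [s for s in schemas if not any(s < t for t in schemas)]
    # Add a relation holding a key of the whole heading if none contains one.
    if not any(closure(s, cover) >= attributes for s in schemas):
        key = set(attributes)
        for a in sorted(attributes):
            if closure(key - {a}, cover) >= attributes:
                key.discard(a)
        schemas.append(frozenset(key))
    return schemas

HEADING = {"Student", "Course", "Chapter", "Time", "Room"}
FDS = [
    ({"Student", "Course"}, {"Time", "Room"}),
    ({"Student", "Time"}, {"Room"}),
    ({"Time", "Room"}, {"Course"}),
]

for heading in synthesize_3nf(HEADING, FDS):
    print(sorted(heading))
```

On this input the sketch yields four headings: {Student, Course, Time}, {Student, Time, Room}, {Time, Room, Course} and the key relation {Student, Chapter, Time}. Whether those are sensible tables is a question about meaning that the algorithm cannot answer.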

--jkl


Derek Asirvadem

Feb 8, 2015, 7:44:36 PM
Nicola

> On Monday, 9 February 2015 08:47:10 UTC+11, Nicola wrote:
> In article <7e4c6cec-6534-4187...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> > 1. Is the strike-out and substitution correct ? Otherwise the facts given
> > are incoherent.
>
> Usually, students take an exam for a course, not for a chapter,

Really ? It is quite different here, we have at least one exam per semester. (Chapter is a contrivance for the example, but I knew that.) No matter.

> so
> "course" intuitively make more sense. Let me see if I can spot some
> contradiction. I will go through the requirements (not in the same
> order) and try to provide an example of sets of values compatible with
> each requirement and the previous ones.

Whoa! I just needed a quick answer to a tiny question, re the contradiction in the requirement. I thought you understood (from your posts and mine) that I was going to try and find the Keys the "hard" way, without FDs. Sorry if I wasn't clear. Thanks for taking the time, but I won't look at the details, certainly not at the non-FDs.

< big snip of long and winding, "computationally expensive" road, thanks anyway >

In his paper, I found that this:
> > Every student can get examined about multiple chapters
contradicted with this:
> > a student gets examined for a course only once

Which, thanks to your additional info, I now understand to be:
- one exam per course per student
- multiple courses per student, therefore multiple exams
- therefore multiple chapters.

The poor guy expresses himself backwards ! Ok, so it wasn't a contradiction, it was missing requirements. "multiple chapters" implies:
- multiple courses
- one exam per course
- one chapter per exam

Give me five ...

... I am back.

Let's set the context for this, so as to avoid confusion. Here is the sequence:

> > Do you have an example of such ?
>
> Do you mean, an example in which finding the keys directly is harder
> than finding FDs first and deriving the keys from them?

Yes.

> > Well, it depends
> > on how clever you are at "seeing" keys, which in turn depends on how
> > much experience you have. For instance (from H. Koehler) ....

> I found Köhler's DNF paper, it has an example that is very similar, with one more element ("non-simple" in Relational terms), and it is quite fine for me ....

And following that, you have given me the missing requirements ...

So now I am going to try the "hard" way, to determine the Keys without using your non-FDs or his non-FDs.

The first thing is to translate all those text strings re the facts about the data (Köhler's section 5, first two paras only), into Relational. The easiest and most comprehensive way to do that is to erect a model. And then check that with the user (you !), "have I got the facts about the data right ?". At this stage, it is minus the Keys, because there is no point in /formally/ determining the keys if the facts re the data are incorrect.

(No keys were given in the paper, as usual for theoreticians. Being Relational on my side, as a natural result of modelling the stated facts, some keys are /informally/ determined, in order to support the facts. You might say, the obvious Keys, or the demanded Keys. But that is a natural product of the exercise, and it remains informal, until the formal exercise commences.)

After I get the facts verified, I will determine the Keys the "hard" way, without the "computationally expensive" method from the non-relational side, using the formal method on the Relational side.

And check that with you. If I fail, you can give me the computationally expensive method from the non-relational side.

Is that alright with you ?

Could you please look at this page and quickly see if I have gotten the facts (Köhler's section 5, first two paras only) right.
http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20A.pdf

I don't want you to spend any significant amount of time typing, etc. If you find issues, please throw it back at me, with a few words.

Cheers
Derek

Derek Asirvadem

Feb 9, 2015, 1:54:40 AM

Guys and Dolls

> On Monday, 9 February 2015 11:44:36 UTC+11, Derek Asirvadem wrote:
>
> In his paper, I found that this:
> > > Every student can get examined about multiple chapters
> contradicted with this:
> > > a student gets examined for a course only once
>
> Which, thanks to your additional info, I now understand to be:
> - one exam per course per student
> - multiple courses per student, therefore multiple exams
> - therefore multiple chapters.
>
> The poor guy expresses himself backwards ! Ok, so it wasn't a contradiction, it was missing requirements. "multiple chapters" implies:
> - multiple courses
> - one exam per course
> - one chapter per exam

> Could you please look at this page and quickly see if I have gotten the facts (Köhler's section 5, first two paras only) right.
> http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20A.pdf
>
> I don't want you to spend any significant amount of time typing, etc. If you find issues, please throw it back at me, with a few words.

While I was waiting, I looked at Köhler's paper a little bit further, squeezing my eyes shut tight every time I saw a non-FD. I found what appears to be an instance of the UR on page 6. That seems to show that the above correction was incorrect, the version three correction is:
The poor guy expresses himself backwards ! Ok, so it wasn't a contradiction, it was missing requirements. "multiple chapters" implies:
- multiple courses
- one exam per course
- multiple chapters per exam, in one sitting

So I changed the model to suit the revised revised requirement. Thirty seconds.

That is one (just one) of the great values of using Keys: they are reliable, and when the model changes, because the Keys are solid, the changes are easy as. Stated another way, reliable Keys are the pegs that you can hang something on, with confidence, it doesn't break off or disappear like an ID field in a Record Filing System. As you know, I am finding a lot of things that should be hanged, so I need as many good keys as I can get.

Went back to waiting for the phone to ring.

So I thought, well, that is the third iteration, the facts are progressing in terms of reliability, I might as well check the Keys (that had been informally derived through the simple act of placing the data, as described by the facts, two paras) are good, and the predicates hold.

I read them forwards, backwards, and sideways.

---------------------
The Keys are good
---------------------

There is no work, no change to make, to /formally/ determine the Keys. The simple act of placing the data in a Relational context exposed the keys, naturally, and now those Keys hold. Yes, yes, I use real FDs to verify the Keys. Minutes, not days.
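
As an aside for readers following the verification step: a dependency or a key can at least be sanity-checked against sample rows. The sketch below is a minimal illustration in Python, not the method used for the linked data model. The column names (Student, Course, ExamDate, Chapter) and the rows are hypothetical, a flat sample loosely following the three requirements listed above. Sample data can only refute a dependency, never prove it; the meaning of the columns still has to come from the facts.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in lhs)
        right = tuple(row[c] for c in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_superkey_in_sample(rows, columns, key):
    """A key candidate must determine every column in the sample."""
    return fd_holds(rows, key, columns)


# Hypothetical flat sample: one exam per course per student,
# several chapters examined in the one sitting.
COLUMNS = ["Student", "Course", "ExamDate", "Chapter"]
ROWS = [
    {"Student": "Anna", "Course": "Networks", "ExamDate": "2015-03-10", "Chapter": 1},
    {"Student": "Anna", "Course": "Networks", "ExamDate": "2015-03-10", "Chapter": 2},
    {"Student": "Anna", "Course": "Databases", "ExamDate": "2015-03-12", "Chapter": 1},
    {"Student": "Ben", "Course": "Networks", "ExamDate": "2015-03-10", "Chapter": 1},
]

one_exam_per_course = fd_holds(ROWS, ["Student", "Course"], ["ExamDate"])
short_key = is_superkey_in_sample(ROWS, COLUMNS, ["Student", "Course"])
full_key = is_superkey_in_sample(ROWS, COLUMNS, ["Student", "Course", "Chapter"])
print(one_exam_per_course, short_key, full_key)
```

In this sample the pair (Student, Course) determines the exam date but does not identify a row, because several chapters belong to the one exam; adding Chapter does.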

So I proceeded to check the predicates. I read them backwards, sideways, and forwards, in honour of Köhler.

------------------------
The Predicates hold
------------------------

> > Do you have an example of such ?
>
> Do you mean, an example in which finding the keys directly is harder
> than finding FDs first and deriving the keys from them? Well, it depends
> on how clever you are at "seeing" keys, which in turn depends on how
> much experience you have. For instance (from H. Koehler):

Example in DNF Paper.

Done.

---------------------------------------
"Hard" Key Determination is Easy.
---------------------------------------

And the "easy" method is ... well, we are still waiting.

So I rushed to type up a status post, with the good news, that I had accomplished the "hard" task of determining Keys directly from the data, without using the long and winding, computation-heavy, non-key, non-FD method. On data that I had never seen before. It is not that I was against learning the easy method, it is that I learned one thing during my decades of endurance horse riding: never ever use new gear on an official race, always use the tried-and-tested, worn-in gear. Splitting the skin where the sun doesn't shine, due to a seam in a new garment, when you are sixty kilometres from the car, is not something you forget easily.

No doubt you are doing the same.

But I thought, I might as well identify the next step, from the paper, and put that in the same post. You good people will never guess what I found out !!!

The problem that Köhler declares, the premise of the paper, DISAPPEARED. There is no problem for me to test the proposed solution ON.
- Normalisation is complete: there is simply no decomposition; reduction to be had (or, stated otherwise, the one critical table cannot be decomposed loss-less-ly).
- No redundancies; no Update Anomalies
- All Köhler's reports can be produced via simple natural joins
- storage (if it ever was a concern) is no concern, because the predicate cannot be decomposed further
- no multiple decompositions to choose from
- I thought I should check carefully, so I found his (what seems to be) universal relation and I worked backwards. All good. I added the sample data in the form that it would be, if he had used the Relational Model.

The simple act, of placing the data in the Relational context, translating his text into a Relational model, according to his facts (identifying facts; independent facts; dependent facts, and drawing a picture with dependency lines and squiggles) eliminated the problem that Köhler proposed to solve. The problem simply does not exist in the Relational context.

-----------------------------------------------
Relationalisation Eliminates Theory
-----------------------------------------------

Now, that is not to say, the paper is useless or that it has no value. Not at all. Outside the Relational context, such as in Record Filing systems, with no keys on the data, that you people are so glued to, of course it has value.

I suppose all my databases are suddenly "satisfying DNF". Through no fault or action of my own. I can't keep up with the changes. Real 3NF = BCNF+4NF+5NF+ETNF+NRNF+DKNF+DNF ...

And I am not suggesting that he erected a Straw Man argument either. He seems genuine: other than the fact that the proposal is null in the space where he declares the problem to exist, he has done what I perceive (as an unqualified, informal reviewer) to be a good job. It is just that he says this problem exists in Relational databases, and this proposal is going to solve it. He just has no clue what "Relational" means, that the problem simply does not exist in the "Relational" that he keeps talking about. If he changes all occurrences of "Relational" to "Record Filing System", he will be fine.

He has no idea that Keys are important, let alone how to use keys Relationally. He is playing the same pre-1970's, "non-FD can determine a key" tune you people play. Using that method, the meaning of each column, and the meaning of the constructed key is LOST. Thus the facts too, have no meaning.

A = 1
B = 1
Therefore A = B
Therefore anywhere you use A, you can substitute it with B

The meaning in life cannot be expressed in 1's and 2's, in a's and b's. When you abstract it out to that level, you have totally lost its meaning. And you make massive mistakes.

Horse = 1
Dolphin = 1
Therefore Dolphin = Horse
Horse = gallop
Dolphin = dive
Therefore Horse = dive
Dolphin = gallop

Hysterical.

That is how you end up accepting a model as "5NF", one I have rejected as breaking 3NF.

Accepting a non-relational model as "relational" is due to a different reason, ignorance of what "Relational" means, the set of integrated rules.

One page summary and revised data model:
http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20B.pdf

Back to work.

Cheers
Derek

Nicola

Feb 9, 2015, 4:08:25 AM
In article <20150208185933.5...@speakeasy.net>,
"James K. Lowden" <jklo...@speakeasy.net> wrote:


> Would only that the described algorithm were implemented, and
> we could pour our column definitions in and get a 3NF (say) logical
> model out!

A few years ago I implemented a few algorithms from Koehler's PhD thesis
in a Ruby script. Given a set of FDs, the script finds all the keys and
all the minimal covers (well, in most cases - these problems are
computationally hard, Ruby is not super-efficient, and I didn't care
about any optimizations). Then, I had a graduating student re-implement
it in Java and adding a dependency-preserving decomposition in BCNF
(when it exists) or in 3NF to the output. He also did an extensive
experimental analysis.
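
For readers who have not seen such a tool, the core of it can be sketched in a few lines. The following is a minimal Python illustration, not the Ruby or Java code described above and not Koehler's algorithms: it computes an attribute closure and one minimal cover using the textbook procedure (the script described above finds all of them, which is the expensive part). The function names and the abstract example FDs are made up for the illustration.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover of fds as a list of (lhs, rhs) frozensets."""
    # Step 1: split every right-hand side into single attributes.
    cover = [(frozenset(l), frozenset([a])) for l, r in fds for a in sorted(r)]
    cover = list(dict.fromkeys(cover))  # drop duplicates, keep order

    # Step 2: drop left-hand attributes that are not needed.
    reduced = []
    for lhs, rhs in cover:
        lhs = set(lhs)
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs.discard(attr)
        reduced.append((frozenset(lhs), rhs))
    reduced = list(dict.fromkeys(reduced))

    # Step 3: drop dependencies implied by the remaining ones.
    result = list(reduced)
    for fd in reduced:
        rest = [other for other in result if other != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


# Abstract example: A -> BC, B -> C, AB -> C, A -> B
fds = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"}), ({"A"}, {"B"})]
cover = minimal_cover(fds)
for lhs, rhs in cover:
    print(sorted(lhs), "->", sorted(rhs))
```

Different processing orders can yield different minimal covers for the same input, which is one reason enumerating all of them is costly.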

Both programs are rough prototypes. I have been toying with the idea of
an interactive command-line tool (à la Matlab or R) tailored to database
design for a while. I think it's a pity that the plethora of algorithms
that exist are not collected in a single tool.

If you're interested, I can make the code available.

Derek Asirvadem

Feb 9, 2015, 4:37:32 AM
Nicola

Apologies for getting distracted for a day, with the opportunity to Determine Keys Directly. I am back, to giving you a worthy response to this post.

> On Sunday, 8 February 2015 21:39:01 UTC+11, Nicola wrote:
> In article <abc8bebc-4e27-4af4...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> > > On Sunday, 8 February 2015 05:39:26 UTC+11, Nicola wrote:
> > >
> > > > > To find the keys, you
> > > > > must determine which functional dependencies hold.
> > > >
> > > > That is a very slow method, but yes. The tables are dead simple, and the
> > > > attributes should be well-known to anyone who has any experience at all.
> > >
> > > Sure. If you find the keys directly, in fact you have defined some
> > > functional dependencies (from the keys). But, in general, you must
> > > ensure that no other relevant semantic constraints (not necessarily only
> > > functional) are missed.
> >
> > Do you have an example of such ?
>
> Do you mean, an example in which finding the keys directly is harder
> than finding FDs first and deriving the keys from them? ...

Closed under separate cover.

> Well, it depends
> on how clever you are at "seeing" keys, which in turn depends on how
> much experience you have.

My previous comments were not mocking your words, I was trying to say, I am using a scientific method, a few simple steps. No cleverness or special powers required.

The meaning is there in the data, in the proposed column names. One just needs to ask questions and to progress.

One limit that people often have, is they are blind to certain things. Check the Hierarchical Model thread, and you see how it sometimes takes a long time, to get past their blindness. In the case of you people, it is invented blindness, and I lay the blame for that on your teachers, who have taught you falsities about the HM; the RM; RFS; etc. I can't help you with the blindness, as I can for the other aspects, because it is psychological, they have planted the falsities in your mind.

You guys have a different third problem, separate to not noticing the meaning in the data, the columns, separate to the blindness. You abstract yourself away from the data, such that whatever meaning there is, is lost. Subsequently, when you do the non-FD non-key thing, you are several levels of abstraction away from say me, who never left the data.

> For instance (from H. Koehler):
>
> CourseSchedule(Course, Lecturer, Room, Time)

And now the "harder" problem, with five elements in the UR. Done, Keys determined the "hard" way is actually easy, with very little effort. (Separate to the issue that the problem proposed in the paper disappears in the Relational context, that the paper is for Record Filing systems).

If you look at the data model, you will notice that there are only keys, there are no non-key columns, except for the two lowest-level, transactional tables, and even there the non-key columns are migrated foreign keys. So the real science, the real value, is not in the Determination of Keys, that part is easy, the real science, the real value, is in the ability to work with Relational Keys. And there, yes, experience matters. But everyone, including me, had to start somewhere, and with experience, travelled to somewhere else. You people have not started, because you are fixed, glued at the hip, to RFSs. You have no idea how limited, how pre-1970, that filing system is. You might see a bit of it in this thread, but you will only really see what the difference is, if and when you start working with the RM, and real Relational Keys.

I am not asking you to give your RFS up, I am asking you to ADDITIONALLY dip your feet into the RM. It took me two years to get the length and breadth of it.

> where a course has only one lecturer, each class has a fixed duration,
> and the obvious constraints hold, such as teachers do not have the gift
> of ubiquity.

That is pretty much a 4-element version of the 5-element version that I solved. And you see how I solved it, with keys, the whole keys and nothing but keys. Yes ?
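
For readers following along, the quoted CourseSchedule example can also be worked mechanically. The sketch below is a minimal Python illustration of the dependency-based route, under assumptions that are not in the quoted text: the three dependencies are one reading of "a course has only one lecturer" and "the obvious constraints" (a lecturer is in one room at a time, a room hosts one course at a time), and the fixed-duration remark is ignored. It enumerates attribute subsets by brute force, which is exponential in the number of attributes.

```python
from itertools import combinations

ATTRS = ("Course", "Lecturer", "Room", "Time")

# Assumed reading of the quoted constraints (hypothetical, see above).
FDS = [
    ({"Course"}, {"Lecturer"}),        # a course has only one lecturer
    ({"Lecturer", "Time"}, {"Room"}),  # no lecturer is in two rooms at once
    ({"Room", "Time"}, {"Course"}),    # a room hosts one course at a time
]


def closure(attrs, fds):
    """Attributes determined by attrs under fds (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Smallest attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


keys = candidate_keys(ATTRS, FDS)
print(sorted(sorted(key) for key in keys))
# [['Course', 'Time'], ['Lecturer', 'Time'], ['Room', 'Time']]
```

Under these assumptions Time appears in every key, paired with any one of the other three attributes.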

> You may find the keys "directly" here, of course, but I
> argue that, even if you do that, you are in fact (maybe unconsciously)
> reasoning about the valid FDs.

That isn't fair. I have always said, after Codd, the keys come first, the FDs second, and yes, the FDs are used to verify and validate the key. But if there is no key, there is nothing for the attribute to be Functionally Dependent upon.

Nothing unconscious, the thought process is fully conscious. Sure, I am practised, so I can do it very fast.

And there I am talking about a real FD. Not your non-FD (a tiny fragment of the real FD). So your statement is true if you mean the real FD, and false because you mean the non-FDs, because I do not consider them at all.

The fraudulent use of terms that are established for decades really places a strain on the communication. Only happens in this industry.

> Advantages of making the [non-]FDs explicit are that (1) they provide you with
> a systematic (algorithmic) way to derive all the keys (not that this is
> computationally inexpensive, but in many cases it is efficient enough);
> (2) you have a formal documentation of your system's requirements.

Well, if you don't have FDs, then sure, you need something. Your non-FDs, plus the algorithm, all stated in 1's and 2's and a's and b's. That no user or developer will understand. But it is something, and it justifies what you have done.

It also speaks volumes about what you have not done.

And all that is unnecessary when you have real FDs, because they are written up everywhere, in every data model, every Business Rule, etc. In plain technical English. No question about the essential nature of documentation. But it has to be useful. Otherwise it is a door-stop.

I mean, look at the gap we have in communication, with you guys having totally different definitions for almost every term in the Relational database business. Even the word Relational is misused, abused, defrauded. So, if you write any of that (your meaning) using words that we know and love, you will be grossly misunderstood, your documentation, separate to it being unreadable, is not useable. You don't supply the dictionary to go with it.

> Since [non-]FDs are a more general concept than keys, you may capture
> constraints that are not captured by keys (I don't think that you were
> asking me to show you an example of this, since you may find it in so
> many books and papers, including Codd's).

I haven't seen it in Codd's, because he uses the real FD.

I have gone through many of the theoretical books you guys use, and I have posted the exact nature of their wrongs. Quickly here, there is nothing of value that I have found in them.

Take the Köhler paper as an example. I get to section [5], and the problem disappears. There is nothing to read after that.

I said he is not erecting a Straw Man argument, but Date, Darwen, Abiteboul, Hull, Vianu, Fagin, all erect Straw Men, every single time. Propose a non-problem in the non-RM, and then propose how to solve it outside the RM. Putrid filth. And the sum total of it, over decades, diminishes the RM. Eg. why is it that people think there is nothing of the HM in the RM ? Why is it that people like James are surprised (good for him) to find out that that "truth" is in fact false ? It is planted; it is marketed; it is propagandised, at every opportunity.

Nevertheless, I would very much like an example of the thing that "capture constraints that are not captured by keys". Of course, if you know the RM, you will know that there is no such thing in the Relational context, because everything, constraints of any kind, not only FK constraints, are dependent on a Key. In my last project, I implemented probably 50 or so constraints of a type that you would not know about, way beyond all the constraints that you do know, partly because you think that cannot be done in the RM, in SQL, and done transactionally. That comes from mature use of the RM. But every single one of them is dependent on a key.

> Of course, if you are clever enough to design all of your schemas to be
> in Codd's 3NF to begin with (which, as far as I can tell, is implicit in
> your arguments),

Yes.

But not the "clever enough" part. Just if you are educated enough in the RM, and un-subverted enough. The RM is simpler than the various and sundry alternatives. And it is all integrated, woven into one whole, the rest is disintegrated fragments.

> then there's nothing else to discuss: of course all of
> your FDs will be full dependencies from keys.

That is the only kind of FD there is.

I have already stated, I am not saying that theoreticians should not have or use their non-FDs, I am saying it is fraudulent, mind-numbing, to label that thing "FD".

> You'd never come up with
> the schema above in the first place.

I didn't. A developer who has been reading the afore-mentioned pornography did. And you guys couldn't find anything wrong with his model because you have read and ingested the same books.

> > > You say a schema with a surrogate "key" does not enforce row uniqueness.
> > > I write "key" to emphasize that it does not conform to Codd's definition
> > > because it is a totally made up set of values without any tie to reality
> > > - and that's fine with me to assume that.
> >
> > Ok, but that is despite my post, which details exactly why the word is false
> > and confusing. So to acknowledge that; to acknowledge that it is not
> > Relational; and then to continue using the term with double quotes as a
> > modifier, removes it from the previous category of unconscious fraud, and
> > places it in the category of conscious fraud.
> >
> > Either it is a Key, key, "key", 'key', or it is not.
>
> Ok, let's call it just "surrogate".

Well, that is the technical name (unless your reference is wiki).

Can you name one property of a Key that the surrogate has ?

It is like the problem I had with one of my developers (read your books, brains are scrambled) who was arguing with another developer (stopped reading your books after I put the new database in, and he can see that 99% of the problems have disappeared, brain slowly getting back to normal). The first was trying to tell the second that he had a non-unique key (meaning a surrogate). The second wasn't buying it, he knew better, but he could not articulate it (brains not quite normal yet). So I had to intervene, and explain that (a) a key is unique (b) therefore it is not possible to have a non-unique unique thing, (c) and if he had one, each part of the term cancels the other part, so what he has is zero, so stop talking about it. I sent him off to read a couple of articles I wrote decades ago, and he came back later and said, "you are right, I have been taught to accept schizophrenia". He is sincere, that is why I have him, but the cancer, it is everywhere.

> > Or, you still think a surrogate has some properties of a key. In which case,
> > I have not gotten through to you, and there is something we need to pursue.
>
> We have agreed that a surrogate has no place in a relational schema. So,
> there's no need to discuss surrogates any further at the logical level.

They are not acceptable at the physical level either, so don't try anything freaky like that. It is pharisaic argument (ie. not science, not logic), to abstract surrogates out to the "physical only", and thus implement them. Second, it is a lie, because your non-FDs depend on them. So the fact is, if you do use them, you are tightly bound to the physical. Hence, a pre-1970 Record Filing System, with NONE of the integrity, power, or Speed of the Relational model.

That abstracted-to-the-physical-so-it-doesn't-matter-in-the-logical lie, is Darwen's pig poop. I remember it. All of them use it. It is all in the category of "the physical is orthogonal to the logical" lie. Pig poop, fresh, warm and soft.

The RM is not "logical only". You might want to read it.

> > > Saying that it is not in 3NF is akin to saying that an
> > > integer does not run at 100Mph. Not false, but not particularly
> > > significant either.
> >
> > ???
>
> 1. Is 3NF defined in the context of the RM?

Yes.

> 2. Does it make sense to use a definition outside the context in which
> it is given?

No.

But that is a well-known abstractionist's trick. I have already explained in detail, when you exclude the context, when you break something up into fragments and isolate it from its context, from the other fragments, you lose its meaning and value.

> If your answers to these questions are not "yes" and "no", respectively.
> then I think I don't follow you. You are saying that something that is
> not a relational model lacks a property (3NF) that is defined for
> relational models.

See what I mean. I said no such thing. You said it. Yeah, I agree, it is a stupid thing to say.

> Strictly speaking, that's not a wrong statement: also
> integers lack the property of being able to run at 100Mph.

Sure, in a fragmented, unreal universe.

Not in the real universe.

> > The model fails for the following reasons (many instances of said reasons).
> > The ordering of the issues is mine, in that if it isn't Relational, it is not
> > worth bothering about the specifics of a Normalisation error.
>
> Exactly. So why are you so keen to point out that it fails 3NF? Just say
> that it is not relational.

Because the people I deal with, in the real universe, are not as fragmented as you.

Because most people know (ok, not here) that while the RM and Normalisation are married, and inseparable, that is Normalisation post-RM, and they can also discuss Normalisation without reference to the RM. It happens dozens of times a day where I work, no one says, hey you can't say this, or that without something else attached to it. Try wiki, that study in mediocrity, look up the NFs, see which ones reference the RM.

Because no one will accept any work, being returned to them rejected, with just "pass/fail" stuck onto it. If they have half a brain, they will want to know why, exactly what was wrong, so that they can fix it. You can't fix "failed RM on 22 counts", you have to know what each count was, what the specific prohibition that they broke was, in order to fix it.

Evidently you have no such intent. Your intent is to change your ever-changing "definitions" so that the problem is a non-problem to you. There is not one hair on your head that can implement.

And when we do have those conversations, it is ordinary, common, to state "this breaks x NF", so that they have a clear understanding of what definition to look up, what discussion they need to have with their mentor, to fix it. Because if they keep making the same mistake, over and over, they will lose their job.

Second last, you missed the bit that I understood all this, and wrote my annotations for the benefit of both the fragmented and the integrated:

> > Re Normalisation, (if we consider that outside the Relational Model):
> > 3 It breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).

The "if we consider" covers all types. But you argue.

Last, why don't you try to take the meaning from the words, and deal with that, instead of dealing with the words, abstracted out of their meaning ?
Once I realised what he was trying to do, I told him exactly that. You can't "extend" the RM. The RM can be applied to any field. Applying it to topology is over twenty years old.

He, and I presume you, are talking about something else. Again, fraudulently labelled "RM". Something in your abstract space that keeps shifting and changing, that has a bunch of 42 or so different algebra for it. Sure, that thing, it can be extended.

> Yes, you don't need any of the many abstract definitions of normal forms
> in the literature to design and implement a good database that makes
> your customers happy.

Yes. The un-perverted 3NF, applied fully, is way beyond all the NF definitions that all of you put together can come up with in the next fifty years. The proof is, separate to what I do, that you have come up with nothing in the last forty five years.

They came up with BCNF, 4NF, 5NF, to Codd and me, they are all fragments of 3NF, that only idiots need to have spelled out.

Now they have ETNF, SCGHNFRD, which actively subvert the RM. 3NF remains unchanged.

When Darwen made a big deal about 6NF (a) we had been using that construct for twenty years, gee, we just didn't have a sexy label for it. (b) there is another use, when I wrote to him about it, with full documentation, he couldn't understand it. (I am not going to reveal that use here, sorry.) Years later, I found out that, gee, just like you people, he can't read models.

> You don't need to care about the nested RM or
> infinite relations.

Pig poop. Breaks 2NF. I do all that (nested relations) in the current RDBMS, the current SQL, without breaking anything. Now Date & Darwen are trying to change the definition of 1NF, to squeeze their abortion "a relation can contain a relation" through, without understanding that we do that all the time, we have had the capability since 1984, as a matter of course. We don't need to break any NF, let alone change definitions, to supply it.

Notice, they are doing the same thing you are doing, further up, changing definitions to make a pig pass off as an eagle.

> You don't have to worry about the fact that a
> relational schema may have a factorial number of keys, or that a schema
> with fourteen attributes and ten constraints may have tens of
> millions of possible decompositions in 3NF.

Not 3NF, no. Must be some private definition of yours.

> You will never ever find
> such things in the real world. That does not mean that we don't have the
> freedom to explore.

Of course you have the freedom to explore. I don't see how anything I have said takes away from that. In fact, I have said that we desperately need theoreticians, because this industry is very poorly served. I have my years in R&D, I understand and support that exploratory freedom. God knows I have paid large amounts of money for others to have that freedom.

But some day, one day, you have to produce something that is of value, in the physical universe. And you have produced nothing, after Codd.

So the truth is, it is not about your freedom to explore. It is about not exploring anything of value.

Second, if anyone takes an analytical approach, you have masses of impediments: teachers who teach filth, lies, non-science; who diminish the science that we do have; all of you using definitions for terms that are different to those that the 99% use; etc.

> Welcome to c.d.t. (crazy.database.theory) ;)

Thank you. I have been here before.

And thanks for the clarity in your other posts.

Cheers
Derek

Nicola

Feb 9, 2015, 5:18:17 AM
In article <bef08005-8a74-4b97...@googlegroups.com>,
Derek Asirvadem <derek.a...@gmail.com> wrote:

> Guys and Dolls

This prompts me to point out that Nicola is an Italian *male* name. I
usually get treated better than I am worth in public forums when I omit
that :)

> Ok, so it wasn't a contradiction,
> it was missing requirements. "multiple chapters" implies:
> - multiple courses
> - one exam per course
> - multiple chapters per exam, in one sitting

Yes.

> - No redundancies;

In StudentExamination, the fact that, say, Room 101 will host the
Networks exam on 3/10, 1pm, is repeated for each student taking that
exam. This is a form of redundancy.

> no Update Anomalies

Yes, that may well be the case, with the proper inter-relational
constraints in place. For example, the redundancy mentioned above does
not cause update anomalies by virtue of a referential integrity
constraint.

> Back to work.

Definitely.

Derek Asirvadem

unread,
Feb 9, 2015, 6:32:39 AM2/9/15
to
> On Monday, 9 February 2015 21:18:17 UTC+11, Nicola wrote:
> In article <bef08005-8a74-4b97...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> > Guys and Dolls
>
> This prompts me to point out that Nicola is an Italian *male* name. I
> usually get treated better than I am worth in public forums when I omit
> that :)

Thank you for the correction. I have a niece named Nicholla.

> > - No redundancies;
>
> In StudentExamination, the fact that, say, Room 101 will host the
> Networks exam on 3/10, 1pm, is repeated for each student taking that
> exam. This is a form of redundancy.

???

Welcome to the world of Relational Databases, Nicola. I see this is your first foray. There is lots to learn. It would be better if you asked a question about what you do not know, rather than making a declaration.

I have no idea what you mean. There is not enough specific info in that statement for me to make a specific answer. So let me take that step by step. Please answer (at least) yes or no to each part.

1. Are you sure you know what Redundancy means, whose "definition" are you using ?
(If you are using Date' or Darwen', I know that they are wrong, because they don't know what a Relational Database is, and they think migrated keys are "redundancies". Other pig poop eaters may well think the same.)
1.a That there are no "forms" of redundancy. Either you have it, or you don't ?

2. Are you saying that there is one row in StudentExamination for (your quoted values):
Student  Course    DateTime   Room
xxxxxx   Networks  3/10, 1pm  101   (Student not given)

Or many rows, one per Student, with those three values, repeated x no of students ?

3. Are you aware, that there are two (not one) foreign keys in StudentExamination ?
3.a PK of CourseExamination ( DateTime, Room, Course )
3.b PK of StudentEnrolment ( Student, Course )

4. Are you aware that [3.a] and [3.b] are migrated foreign keys ?
4.a That there is no choice about it, if integrity PK->>FK is required ?
4.b That ( Student, Course, DateTime, Room ) must be carried

5. Ok, now that you have gone through the questions, on what basis, and what columns exactly [ I understand ( Course, DateTime, Room ) ], is it Redundant ?
(I couldn't care less about "forms".)

6. And why exactly is ( Course, DateTime, Room ) included in the "forms of redundancy" list, and ( Student ) excluded ?
6.a Note that Student is "repeated" (by your wording) for every Course they take.

7. And what exactly, would you do, to fix this "form of redundancy" that you claim ? Either in the data model as is, with a correction, or feel free to supply the data model or text DDL of the Record Filing System that you would use. Of course, it too, must not have the "forms of redundancy" that you complain about, either Relational Key columns xor Record IDs.
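
For readers following questions [3] and [4] without the diagram, here is a minimal sketch of the two migrated keys, written as SQLite DDL driven from Python. Only the two parent keys in [3.a] and [3.b] come from this post; the column types, the choice of primary key for StudentExamination, and the function name `build_model` are assumptions made purely for illustration, not the published model.

```python
import sqlite3

def build_model():
    """Build an in-memory sketch of the three tables named in questions 3 and 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
    conn.executescript("""
        CREATE TABLE CourseExamination (
            DateTime TEXT NOT NULL,
            Room     TEXT NOT NULL,
            Course   TEXT NOT NULL,
            PRIMARY KEY (DateTime, Room, Course)      -- 3.a
        );
        CREATE TABLE StudentEnrolment (
            Student TEXT NOT NULL,
            Course  TEXT NOT NULL,
            PRIMARY KEY (Student, Course)             -- 3.b
        );
        CREATE TABLE StudentExamination (
            Student  TEXT NOT NULL,
            Course   TEXT NOT NULL,
            DateTime TEXT NOT NULL,
            Room     TEXT NOT NULL,
            -- Assumption: the post does not state this table's primary key.
            PRIMARY KEY (Student, Course, DateTime, Room),
            -- Both parent keys are carried whole; Course is shared by the two.
            FOREIGN KEY (DateTime, Room, Course)
                REFERENCES CourseExamination (DateTime, Room, Course),
            FOREIGN KEY (Student, Course)
                REFERENCES StudentEnrolment (Student, Course)
        );
    """)
    return conn
```

With this sketch, a StudentExamination row is accepted only when both the scheduled examination and the enrolment already exist, which is the integrity that [4.a] refers to.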

> > no Update Anomalies
>
> Yes, that may well be the case, with the proper inter-relational
> constraints in place.

I am saying that that is the case with the constraints that are in the model. You don't have to worry about anything that could be, should be, would be, just what is documented.

Either state that you agree, or state that my claim is false, with a reason, not could be would be's.

If you don't know then say nothing, instead of demeaning what you don't know with allegations that cannot be dealt with. That is dishonest.

> For example, the redundancy mentioned above does
> not cause update anomalies by virtue of a referential integrity
> constraint.

Gibberish.

Name the constraint you are talking about. Either one in the model (they are all named) or the one you identify as missing (eg. TableX::TableY)

Start of lesson.

Thank you for looking at the pictures. Use the blue link, and read the IDEF1X Intro document.

Cheers
Derek

Derek Asirvadem

unread,
Feb 9, 2015, 7:06:13 AM2/9/15
to
Nicola

> On Monday, 9 February 2015 22:32:39 UTC+11, Derek Asirvadem wrote:

I withdraw the courtesy that I extended, in that post, to lead you down the garden path, holding your hand, teaching you not to be afraid of the butterflies. Goose bumps are ok.

It is not my job to teach you, or to un-teach your false teaching. You are too argumentative and not reliable enough to engage re education.

If, and when you come to me with a question, I will answer it.

For your declarations and claims, I will just destroy it.

Cheers
Derek

Erwin

unread,
Feb 9, 2015, 7:23:50 AM2/9/15
to
Op maandag 9 februari 2015 12:32:39 UTC+1 schreef Derek Asirvadem:
> Welcome to the world of Relational Databases, Nicola. I see this is your first foray. There is lots to learn. It would be better if you asked a question about what you do not know, rather than making a declaration.


Boy, you really don't have even an itch of an inkling of a beginning of a clue, do you ?

Derek Asirvadem

unread,
Feb 9, 2015, 7:27:04 AM2/9/15
to
Nicola

> On Monday, 9 February 2015 21:18:17 UTC+11, Nicola wrote:
> In article <bef08005-8a74-4b97...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> > Ok, so it wasn't a contradiction,
> > it was missing requirements. "multiple chapters" implies:
> > - multiple courses
> > - one exam per course
> > - multiple chapters per exam, in one sitting
>
> Yes.

That's not what you said when you gave me the incorrect info. Good to know that you agree with the info that I found myself.

> > - No redundancies;
>
> In StudentExamination, the fact that, say, Room 101 will host the
> Networks exam on 3/10, 1pm, is repeated for each student taking that
> exam. This is a form of redundancy.
>
> > no Update Anomalies
>
> Yes, that may well be the case, with the proper inter-relational
> constraints in place. For example, the redundancy mentioned above does
> not cause update anomalies by virtue of a referential integrity
> constraint.
>
> > Back to work.
>
> Definitely.
>

Reminds me of the poor relations when they visit. They look up at the big house, the swimming pool, the horses, and they say, "yes, but they would be having marital problems". Sick, yes.

Either put up, or shut up. Name the missing constraint or tick mark or whatever it is that you see, or say nothing. Because if you do say something, you will be proving something about yourself. If something is missing, name it, use the table names with some character between them. Keep the pharisee to yourself.

The definition of Redundancy is that it causes an Update Anomaly. One, for each occurrence. Each proves the other. If there are 12 Redundancies, there are 12 Update Anomalies. If there are Update Anomalies, they are caused by a Redundancy.

So when you argue against yourself, as above, you destroy your own point.

> the redundancy mentioned above does
> not cause update anomalies

Then it is not a Redundancy, silly, give it another name.

> by virtue of a referential integrity
> constraint.

Really ? You don't know that there are many types of constraints, you know only one ?

I have six, not counting three that I publish only for customers. Obviously, you won't see the three, they are not documented, just look for the six minus one.

Because, if you don't know the others, the five, you will not recognise them in the model, and I will have to field your hilarious accusations. So, please, go off and do some reading. Assist me in reducing the typing I have to do for you.

Cheers
Derek

Nicola

unread,
Feb 9, 2015, 10:34:50 AM2/9/15
to
In article <694b8c03-9c5e-4dc1...@googlegroups.com>,
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > > - No redundancies;
> >
> > In StudentExamination, the fact that, say, Room 101 will host the
> > Networks exam on 3/10, 1pm, is repeated for each student taking that
> > exam. This is a form of redundancy.
>
> 1. Are you sure you know what Redundancy means, whose "definition" are you
> using ?
> (If you are using Date' or Darwen', I know that they are wrong, because they
> don't know what a Relational Database is, and they think migrated keys are
> "redundancies". Other pig poop eaters may well think the same.)

No, foreign keys have nothing to do with it. Someone (Tegiri, I think?)
has said it already in some other thread: the accepted definition of
redundancy is Shannon's. The relationship between that notion and
relational databases has been investigated thoroughly.

> 1.a That there are no "forms" of redundancy. Either you have it, or you
> don't ?

True. Then, read: "This is redundancy".

> 2. Are you saying that there is one row in StudentExamination for (your
> quoted values):
> Student  Course    DateTime   Room
> xxxxxx   Networks  3/10, 1pm  101   (Student not given)
>
> Or many rows, one per Student, with those three values, repeated x no of
> students ?

The latter.

> 3. Are you aware, that there are two (not one) foreign keys in
> StudentExamination ?
> 3.a PK of CourseExamination ( DateTime, Room, Course )
> 3.b PK of StudentEnrolment ( Student, Course )

Yes.

> 4. Are you aware that [3.a] and [3.b] are migrated foreign keys ?

Yes.

> 5. Ok, now that you have gone through the questions, on what basis, and what
> columns exactly [ I understand ( Course, DateTime, Room ) ], is it Redundant
> ?

Yes, it's those three columns that generate redundancy, in the sense
explained below.

> 6. And why exactly is ( Course, DateTime, Room ) included in the "forms of
> redundancy" list, and ( Student ) excluded ?

Note that if you show me *only* this:

StudentExamination
Student  Course    DateTime  Room
---------------------------------
Nicola   Networks  3/10,1pm  101
Ada      ?         3/10,1pm  101

given the requirements, I can infer the only possible value for ?. This
is not possible for Student.
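
To make the inference concrete, here is a small Python sketch. It assumes the requirement I am relying on is that a room hosts the exam of at most one course at any given time, i.e. {DateTime, Room} determines Course. The function name `infer` and the extra student Bob are hypothetical, added only so that the Student case has more than one candidate.

```python
def infer(rows, determinant, target, known):
    """Return the unique `target` value among rows that agree with `known`
    on every column in `determinant`, or None if there are zero or several."""
    candidates = {
        row[target]
        for row in rows
        if row[target] is not None
        and all(row[col] == known[col] for col in determinant)
    }
    return candidates.pop() if len(candidates) == 1 else None

# Rows that are fully known.
rows = [
    {"Student": "Nicola", "Course": "Networks", "DateTime": "3/10,1pm", "Room": "101"},
    {"Student": "Bob", "Course": "Networks", "DateTime": "3/10,1pm", "Room": "101"},
]

# A row with Course masked, and a row with Student masked.
masked_course = {"Student": "Ada", "DateTime": "3/10,1pm", "Room": "101"}
masked_student = {"Course": "Networks", "DateTime": "3/10,1pm", "Room": "101"}

print(infer(rows, ("DateTime", "Room"), "Course", masked_course))              # Networks
print(infer(rows, ("Course", "DateTime", "Room"), "Student", masked_student))  # None
```

The masked Course is recoverable from the other rows; the masked Student is not, because nothing determines it.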

> 6.a Note that Student is "repeated" (by your wording) for every Course they
> take.

You can't make the same kind of inference in that case.

> 7. And what exactly, would you do, to fix this "form of redundancy" that you
> claim ?

I wouldn't fix it. Your model is good enough. My point was just that it
is not true that it has "no redundancy".

Anyway, I'll show an alternative design, which I hope you will accept as
relational, normalized, etc... Note that I am not claiming that it is
better than (or equivalent to) yours in any sense nor that it is the
only possible alternative. Sorry for using just textual form:

Session(DateTime, Course)
Key: {DateTime, Course}
FK: {DateTime} refers to Time
FK: {Course} refers to Course

ExamLocation(DateTime, Room, Course)
Key: {DateTime, Room}
FK: {DateTime, Course} refers to Session
FK: {Room} refers to Room

ExamAppointment(Student, Course, DateTime)
Keys: {Course, Student}, {Student, DateTime}
FK: {Course, Student} refers to StudentEnrolment
FK: {DateTime, Course} refers to Session

StudentExamination(Student, DateTime, Room)
Key: {Student, DateTime}
FK: {DateTime, Room} refers to ExamLocation
FK: {Student, DateTime} refers to ExamAppointment

ExaminationChapter(Student, Course, Chapter)
Key: {Student, Course, Chapter}
FK: {Student, Course} refers to StudentEnrolment

This is a sample instance (I omit instances of Course, Student, etc...
for simplicity):

Session
DateTime  Course
------------------
10/6      Networks
7/7       Networks
10/6      Security
1/7       Security

ExamLocation
DateTime  Room  Course
------------------------
10/6      101   Networks
10/6      102   Networks
7/7       101   Networks

ExamAppointment
Student  Course    DateTime
---------------------------
Nicola   Networks  10/6
Nicola   Security  1/7

StudentExamination
Student  DateTime  Room
-----------------------
Nicola   10/6      102

ExaminationChapter
Student  Course    Chapter
--------------------------
Nicola   Analysis  1
Nicola   Analysis  2

Two remarks only:

1. I have deliberately associated ExaminationChapter to StudentEnrolment
to make the choice of chapters independent of the exams (imagining that
students are told which chapters they have to study during the course).
This is totally arbitrary of course: you could make {Student,Course} a
foreign key referring to ExamAppointment, for example. Or do something
else.

2. StudentExamination is needed because in general an exam may take
place in more than one room, and we want to know where each student must
go on the day of the exam.
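
For anyone who prefers something executable to the textual form, the design above can be sketched as SQLite DDL driven from Python. The keys and foreign keys are the ones listed above. The column types, the minimal Time, Course, Room, Student and StudentEnrolment tables, their rows, and the function names are assumptions, since I omitted those tables. The ExaminationChapter rows are left out for the same reason.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Time    (DateTime TEXT PRIMARY KEY);
CREATE TABLE Course  (Course   TEXT PRIMARY KEY);
CREATE TABLE Room    (Room     TEXT PRIMARY KEY);
CREATE TABLE Student (Student  TEXT PRIMARY KEY);
CREATE TABLE StudentEnrolment (
    Student TEXT NOT NULL REFERENCES Student (Student),
    Course  TEXT NOT NULL REFERENCES Course (Course),
    PRIMARY KEY (Student, Course)
);
CREATE TABLE Session (
    DateTime TEXT NOT NULL REFERENCES Time (DateTime),
    Course   TEXT NOT NULL REFERENCES Course (Course),
    PRIMARY KEY (DateTime, Course)
);
CREATE TABLE ExamLocation (
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL REFERENCES Room (Room),
    Course   TEXT NOT NULL,
    PRIMARY KEY (DateTime, Room),
    FOREIGN KEY (DateTime, Course) REFERENCES Session (DateTime, Course)
);
CREATE TABLE ExamAppointment (
    Student  TEXT NOT NULL,
    Course   TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    PRIMARY KEY (Course, Student),
    UNIQUE (Student, DateTime),               -- the second key
    FOREIGN KEY (Student, Course) REFERENCES StudentEnrolment (Student, Course),
    FOREIGN KEY (DateTime, Course) REFERENCES Session (DateTime, Course)
);
CREATE TABLE StudentExamination (
    Student  TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL,
    PRIMARY KEY (Student, DateTime),
    FOREIGN KEY (DateTime, Room) REFERENCES ExamLocation (DateTime, Room),
    FOREIGN KEY (Student, DateTime) REFERENCES ExamAppointment (Student, DateTime)
);
CREATE TABLE ExaminationChapter (
    Student TEXT NOT NULL,
    Course  TEXT NOT NULL,
    Chapter INTEGER NOT NULL,
    PRIMARY KEY (Student, Course, Chapter),
    FOREIGN KEY (Student, Course) REFERENCES StudentEnrolment (Student, Course)
);
"""

def build_alternative():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
    conn.executescript(SCHEMA)
    return conn

def load_sample(conn):
    # Parent rows below are assumed; the sample instance omits them.
    conn.executemany("INSERT INTO Time VALUES (?)", [("10/6",), ("7/7",), ("1/7",)])
    conn.executemany("INSERT INTO Course VALUES (?)", [("Networks",), ("Security",)])
    conn.executemany("INSERT INTO Room VALUES (?)", [("101",), ("102",)])
    conn.execute("INSERT INTO Student VALUES ('Nicola')")
    conn.executemany("INSERT INTO StudentEnrolment VALUES (?, ?)",
                     [("Nicola", "Networks"), ("Nicola", "Security")])
    # Rows below are the sample instance.
    conn.executemany("INSERT INTO Session VALUES (?, ?)",
                     [("10/6", "Networks"), ("7/7", "Networks"),
                      ("10/6", "Security"), ("1/7", "Security")])
    conn.executemany("INSERT INTO ExamLocation VALUES (?, ?, ?)",
                     [("10/6", "101", "Networks"), ("10/6", "102", "Networks"),
                      ("7/7", "101", "Networks")])
    conn.executemany("INSERT INTO ExamAppointment VALUES (?, ?, ?)",
                     [("Nicola", "Networks", "10/6"), ("Nicola", "Security", "1/7")])
    conn.execute("INSERT INTO StudentExamination VALUES ('Nicola', '10/6', '102')")
```

Calling `load_sample(build_alternative())` loads the sample instance without violating any of the declared constraints.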

> > > no Update Anomalies
> >
> > Yes, that may well be the case, with the proper inter-relational
> > constraints in place.
>
> I am saying that that is the case with the constraints that are in the model.

I was not denying that. Just I hadn't checked carefully enough as to be
sure. Indeed, that seems the case.

> > For example, the redundancy mentioned above does
> > not cause update anomalies by virtue of a referential integrity
> > constraint.
>
> Gibberish.
>
> Name the constraint you are talking about.

I mean, you cannot update this:

Student  Course    DateTime  Room
---------------------------------
Nicola   Networks  3/10,1pm  101
Ada      Networks  3/10,1pm  101

to this:

Student  Course    DateTime  Room
---------------------------------
Nicola   Networks  3/10,1pm  101
Ada      Security  3/10,1pm  101

which would create an inconsistency, because of the foreign key on
{Course, DateTime, Room}. If you remove the foreign key, no constraint
is violated.
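
The point can be checked with a short Python and SQLite sketch. It assumes only the two tables involved, that CourseExamination holds the single row (3/10,1pm, 101, Networks), and a four-column primary key on StudentExamination; the function name `try_update` is hypothetical.

```python
import sqlite3

def try_update(with_fk):
    """Attempt to change Ada's Course to Security; return True if accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
    fk = (
        ", FOREIGN KEY (DateTime, Room, Course)"
        " REFERENCES CourseExamination (DateTime, Room, Course)"
    ) if with_fk else ""
    conn.executescript(f"""
        CREATE TABLE CourseExamination (
            DateTime TEXT NOT NULL, Room TEXT NOT NULL, Course TEXT NOT NULL,
            PRIMARY KEY (DateTime, Room, Course)
        );
        CREATE TABLE StudentExamination (
            Student TEXT NOT NULL, Course TEXT NOT NULL,
            DateTime TEXT NOT NULL, Room TEXT NOT NULL,
            PRIMARY KEY (Student, Course, DateTime, Room){fk}
        );
        INSERT INTO CourseExamination VALUES ('3/10,1pm', '101', 'Networks');
        INSERT INTO StudentExamination VALUES ('Nicola', 'Networks', '3/10,1pm', '101');
        INSERT INTO StudentExamination VALUES ('Ada', 'Networks', '3/10,1pm', '101');
    """)
    try:
        conn.execute(
            "UPDATE StudentExamination SET Course = 'Security' WHERE Student = 'Ada'"
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_update(with_fk=True))   # False: the foreign key rejects the update
print(try_update(with_fk=False))  # True: nothing stops the inconsistent row
```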

Nicola

unread,
Feb 9, 2015, 10:38:51 AM2/9/15
to
In article <c0f47c8f-66bf-4009...@googlegroups.com>,
Good Lord, I shouldn't have changed sex!

Erwin

unread,
Feb 10, 2015, 2:28:56 AM2/10/15
to
Op maandag 9 februari 2015 16:38:51 UTC+1 schreef Nicola:
> In article <>,
> Derek Asirvadem <> wrote:
>
> > Nicola
> >
> > > On Monday, 9 February 2015 22:32:39 UTC+11, Derek Asirvadem wrote:
> >
> > I withdraw the courtesy that I extended, in that post, to lead you down the
> > garden path, holding your hand, teaching you not to be afraid of the
> > butterflies. Goose bumps are ok.
> >
> > It is not my job to teach you, or to un-teach your false teaching. You are
> > too argumentative and not reliable enough to engage re education.
> >
> > If, and when you come to me with a question, I will answer it.
> >
> > For your declarations and claims, I will just destroy it.
>
> Good Lord, I shouldn't have changed sex!
>
> Nicola
>

Nah.

It didn't make no difference. It would have ended up there anyway. It always does.

You can keep having whatever sex you like :-)

Derek Asirvadem

unread,
Feb 10, 2015, 5:39:50 PM2/10/15
to
Nicola

> On Tuesday, 10 February 2015 02:38:51 UTC+11, Nicola wrote:

> Good Lord, I shouldn't have changed sex!

That has nothing to do with it, I would have made the same decision (to withdraw the courtesy) because it was based on the /whole/ history (when he was a she, and she was not reasonably open to being taught; asking for it indirectly; etc). I just would have used sweeter words when I broke the bad news to her.

Two choices are open to you:

1.
> > If, and when you come to me with a question, I will answer it.

I answer questions from anyone, even sewer rats such as Erwin.

Remain in your current obstinate, argumentative, un-teachable position, and ask a question. That will be near-impossible because of your hollow pride, your pretence at knowing what you are evidently quite clueless about.

2.
> > You are
> > too argumentative and not reliable enough to engage re education.

Ask for the courtesy to be extended to you in future.

You don't have to change sex, yet again. You just have to stop pretending, and acquire a bit of humanity.
- Give up some of the schizophrenia that has been implanted in your brain, the self-contradiction.
- Think a little bit, and form your arguments properly (don't make idiotic claims about what you do not understand).
- Stop speaking out of both sides of your mouth (hypocrisy; argue one truth when convenient; ignore that truth and argue against it when it is convenient).
- Stop using private "definitions" of the world, at least the discussed world, when dealing with people from the world.
- Bring yourself to post gratitude and acknowledgement when you learn something, and acknowledge a point that you have conceded instead of just dropping it.

Either one, is a big task.

> On Tuesday, 10 February 2015 02:34:50 UTC+11, Nicola wrote:

Until then, posts like that one, which demonstrate both your eagerness to learn, and your argumentative sub-human impediments to learning, will go un-answered, your learning will not progress.

Cheers
Derek

Derek Asirvadem

unread,
Feb 10, 2015, 6:09:30 PM2/10/15
to

Nicola

> On Wednesday, 11 February 2015 09:39:50 UTC+11, Derek Asirvadem wrote:
>
> > On Tuesday, 10 February 2015 02:38:51 UTC+11, Nicola wrote:
>
> > Good Lord, I shouldn't have changed sex!
>
> Until then, posts like that one, which demonstrate both your eagerness to learn, and your argumentative sub-human impediments to learning, will go un-answered, your learning will not progress.


But the Grace of God gives me Charity, so I will give you a reminder of what you have lost.

> On Tuesday, 10 February 2015 02:34:50 UTC+11, Nicola wrote:
> In article <694b8c03-9c5e-4dc1...@googlegroups.com>,
> Derek Asirvadem <derek.a...@gmail.com> wrote:

What, no acknowledgement "Hey Derek, you did a good job of finding the Keys the "hard" way, without employing our obsessive-compulsive non-FDs, and the long and winding road that takes forty times as long" ?

What, no acknowledgement that Relationalising the data eliminates the theoretical paper; that the paper is NOT Relational, all such allegations are false; that there is no problem as proposed if the data was Relational ?

Ok, we ignore the purpose of the exercise, the unexpected finding, we won't even mention any of the big issues, or the data model, we zero in to the tiny ones.

> > > > - No redundancies;
> > >
> > > In StudentExamination, the fact that, say, Room 101 will host the
> > > Networks exam on 3/10, 1pm, is repeated for each student taking that
> > > exam. This is a form of redundancy.
> >
> > 1. Are you sure you know what Redundancy means, whose "definition" are you
> > using ?
> > (If you are using Date' or Darwen', I know that they are wrong, because they
> > don't know what a Relational Database is, and they think migrated keys are
> > "redundancies". Other pig poop eaters may well think the same.)
>
> No, foreign keys have nothing to do with it. Someone (Tegiri, I think?)
> has said it already in some other thread: the accepted definition of
> redundancy is Shannon's. The relationship between that notion and
> relational databases has been investigated thoroughly.

Imbecile.

The data model is an implementation, which is limited to that which is available for implementations. It is not a futuristic picture of what a database could be, in the future, when the theoreticians have implemented their theory into tools. You are applying something that is valid in one context, to a context in which it does not apply, it cannot apply, at least for another forty five years.

Why don't you apply the theory of flower arrangement to the data model ?

> > 1.a That there are no "forms" of redundancy. Either you have it, or you
> > don't ?
>
> True. Then, read: "This is redundancy".

Now that you have corrected your words, yes, I can read them.

It remains an unproven, unevidenced, claim. From a person who has proved that they have half an understanding of databases, and an incorrect understanding that hinders himself, unable to get past the obstacle, because he insists that when he is wrong, he is right. Unteachable. If you want to learn, you have to be open to the notion that what you think is wrong, is right. Hence ask a question.

> > 6. And why exactly is ( Course, DateTime, Room ) included in the "forms of
> > redundancy" list, and ( Student ) excluded ?
>
> Note that if you show me *only* this:
>
> StudentExamination
> Student  Course    DateTime  Room
> ---------------------------------
> Nicola   Networks  3/10,1pm  101
> Ada      ?         3/10,1pm  101
>
> given the requirements, I can infer the only possible value for ?. This
> is not possible for Student.

Imbecile.

This is not a Sudoku puzzle to be solved. Such inferences (which are valid elsewhere) are invalid here. This is a database, and you are looking at occurrences of data. You cannot even distinguish the difference between Keys and non-key data.

Whatever inferences you draw from the above, are not actually from the same table, they are Atomic facts in another table CourseExamination.

In an RDB, facts are Atomic. Atomic means, it cannot be broken up. The fact that a CourseExamination will occur is established by treating ( DateTime, Room, Course ) as an Atom. The combination of any TWO of those elements is not a fact (that concerns the requirement), they are sub-atomic elements, they are irrelevant.

Thus, when that Atomic fact CourseExamination is carried, migrated, to StudentExamination, for the purpose of ensuring that all \student examinations\ occur within the limits of the \course examinations\ that have been scheduled, that Atomic fact is carried, whole, not in bits and pieces.

> > 6.a Note that Student is "repeated" (by your wording) for every Course they
> > take.
>
> You can't make the same kind of inference in that case.

Aw shucks, the Sudoku rule doesn't work.

Good, it doesn't work in [6] as well.

If your argument for [6], breaks when you apply it to [6.a], it breaks when you apply it to [6]. The inference is the same. The inference is invalid in both cases.

There, the right side of your mouth proved that the left side of your mouth is false. You did it all on your own. Give the girl, I mean boy, a yellow star.

> > 7. And what exactly, would you do, to fix this "form of redundancy" that you
> > claim ?
>
> I wouldn't fix it. Your model is good enough. My point was just that it
> is not true that it has "no redundancy".

Imbecile.

It is not false either.

Just because an imbecile can't see that "no redundancy" is true, doesn't make it "not true". In the world of applied science, we need a fact, some evidence to support your insipid claim, otherwise the claim is unevidenced bleating, telling us about the level of your capability, it does not tell us anything (one way or the other) about the data model.

Just look at Erwin. Sniping from under a manhole cover. He can't provide any evidence, for any of his one-line remarks, and he disappears back into the depths of his putrid filth. His remarks cannot be proved false, because he gave no evidence that can be used to prove it true. You are doing the same thing (I am not saying that you live in the sewers like Erwin has been doing, for decades). You are making imbecilic, un-scientific comments, without evidence. Sniping. So put up the evidence, or shut up.

This neither-true-nor-false statement is an ancient trick that is used to demean something, without addressing it squarely. It is a form of attack that is used by the mentally crippled, God's Cursed Ones, who cannot attack something openly and honestly, as scientists do. Pig poop eaters.

Choose your friends carefully.

----
A more honest remark might be, that you refer to those tables in my model, which you are using as reference. They are not "omitted". But then, you live in a universe that is fragmented.
That (data content, not the table descriptions) is unnecessary. You guys have no idea that you are obsessed with data content, that that takes your focus off the real work: understanding the data; classifying it by type, meaning; determining facts; determining dependencies of said facts; determining Keys; etc; etc. Instead you play with puzzles concerning data content, using non-FDs.

> Two remarks only:
>
> 1. I have deliberately associated ExaminationChapter to StudentEnrolment
> to make the choice of chapters independent of the exams (imagining that
> students are told which chapters they have to study during the course).
> This is totally arbitrary of course: you could make {Student,Course} a
> foreign key referring to ExamAppointment, for example. Or do something
> else.
>
> 2. StudentExamination is needed because in general an exam may take
> place in more than one room, and we want to know where each student must
> go on the day of the exam.

1. Most of that is your mental masturbation, and I must say, you are using a good lubricant.

The requirement is the requirement. If you build sand castles and imagine twenty virgins waiting inside, behind your eyelids, over and above the requirement, we don't need to hear about them. Just get the requirement, and get it right. After that, feel free to extend it.

There is evidence here, and earlier when you gave me incorrect info, that you cannot correctly discern the requirement from Köhler's two paras. There is no point in going on about could-be's and might-be's and bumble bees.

You might want to ask a question:
Gee, Derek, how did you manage to discern all that, all the rules that you have implemented in the data model, from those two lousy paras ?

2. Now this is hilarious. You have several redundancies in that, real redundancies, not unproven claims, but you don't see it. Side-splitting. I have to go to the toilet.

> Note that I am not claiming that it is
> better than (or equivalent to) yours in any sense

Ok, you are right. It has no purpose. But you post it anyway. So there must be some hidden purpose.

> which I hope you will accept as
> relational, normalized,

Aha, the hidden purpose is revealed.

So what you are really saying is "Derek, please help me. Is this a correct progression, a valid start, in order to get to the complete model ?"

Since you are asking, since God's Grace is without limit, I will give you the service you indirectly request.

=======================
Requested Service
=======================

Acknowledgement
--------------------

> What, no acknowledgement "Hey Derek, you did a good job of finding the Keys the "hard" way, without employing our obsessive-compulsive non-FDs, and the long and winding road that takes forty times as long" ?

The very act of attempting a data model, stands as:
- an acknowledgement of that fact, and
- since you are attempting some form of my exercise, that it is of value to you.

> What, no acknowledgement that Relationalising the data eliminates the theoretical paper; that the paper is NOT Relational, all such allegations are false; that there is no problem as proposed if the data was Relational ?

The very act of attempting a data model, Relationalising Köhler's data heap, stands as:
- an acknowledgement of each of those facts, and
- since you are attempting some form of my exercise, that it is of value to you.

Congratulations, your understanding of the Relational Model exceeds Köhler !

Scope
-------

Your proposed text strings are half-right, half requirement-exceeded. This is limited to the half that is right, it excludes the requirement-exceeded half, which remain invalid.

I give summaries, and explicit direction, but I can't be bothered to enumerate the errors. An enumeration is not necessary for you to progress, to learn.

Re Requirement
---------------------

- it fails to meet the requirement

- it fails to make the obvious constraints
___ let alone all the constraints in the requirement
_____ (let alone all the constraints that I implemented, which is beyond the requirement).

Normalisation/Modelling
-------------------------------

- It is naïve (especially when you consider that you are looking at a complete data model).

> So what you are really saying is "Derek, please help me. Is this a correct progression, a valid start, in order to get to the complete model ?"

- Yes it is. It is an early stage, not yet intermediate level, of the modelling process. You have established some facts, correctly, and those facts are:
___ a. NOT yet validated, verified, tested (hence less-than-intermediate)
___ b. NOT Normalised (ditto)

- if and when you progress [a], and complete the modelling process, those facts will progress to being Atomic, verified, xor eliminated
- if and when you progress [a], and complete the modelling process, those verified Atomic facts will progress to being Normalised
- ie. you are used to working with binary relations, you have not graduated to ternary relations
- when that magical moment takes place, one ternary relation will replace two binary relations (normalisation), one higher-level Atomic fact states two or three other facts, loss-less-ly, and thus the two or three other facts can be Normalised OUT of the model.
___ Note I said ternary relation, that means three keys, not three columns (three keys may well be ten columns). In the case of my StudentExamination, two keys (one three columns and the other two columns), sit in four columns

- you do not yet understand, that in an RDB, each column does way more than one task
___ eg. my StudentExamination.Course does four separate and discrete tasks

- it is weird (normal for a stunted dwarf, weird for a human) that you can infer some logic, and make *invalid* inferences in the right place, or *valid* inferences in the wrong place, but you cannot use those same powers of logic and inference and identify valid inferences in the right place. Truly amazing.
- you have great gaping holes in your proposition, in the sense that you have lost integrity (or you are clueless about what integrity can be had in a Relational Database) that I have in the data model.
- Ie. a number of FK relationships are missing
- Ie. the ability to read (let alone implement) compound keys, and their meaning is only partial, incomplete
___ eg. your StudentExamination does not ensure that the Course is valid (that it is the Course for which the ExamAppointment is booked, or that it is a Course that the Student is actually taking)
___ eg. your ExamLocation and Session can be normalised into a single Atomic fact, loss-less-ly (note my comments re ternary facts, above)
___ eg. your StudentExamination and ExamAppointment can be normalised into a single Atomic fact, loss-less-ly
- again, the mistakes and missing bits are too many to enumerate.
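
The first of those examples can be reproduced in a few lines of Python and SQLite. This is a sketch under assumptions: it keeps only three of the proposed tables (Session and the parent tables are left out for brevity), the row placing a Security exam in Room 103 on 10/6 is hypothetical, and the function name `course_mismatches` is invented for illustration.

```python
import sqlite3

def course_mismatches():
    """Return StudentExamination rows whose room hosts a different Course
    from the one the student's appointment is booked for."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # FKs are off by default in SQLite
    conn.executescript("""
        CREATE TABLE ExamLocation (
            DateTime TEXT NOT NULL, Room TEXT NOT NULL, Course TEXT NOT NULL,
            PRIMARY KEY (DateTime, Room)
        );
        CREATE TABLE ExamAppointment (
            Student TEXT NOT NULL, Course TEXT NOT NULL, DateTime TEXT NOT NULL,
            PRIMARY KEY (Course, Student),
            UNIQUE (Student, DateTime)
        );
        CREATE TABLE StudentExamination (
            Student TEXT NOT NULL, DateTime TEXT NOT NULL, Room TEXT NOT NULL,
            PRIMARY KEY (Student, DateTime),
            FOREIGN KEY (DateTime, Room) REFERENCES ExamLocation (DateTime, Room),
            FOREIGN KEY (Student, DateTime) REFERENCES ExamAppointment (Student, DateTime)
        );
        INSERT INTO ExamLocation VALUES ('10/6', '102', 'Networks');
        INSERT INTO ExamLocation VALUES ('10/6', '103', 'Security');  -- hypothetical row
        INSERT INTO ExamAppointment VALUES ('Nicola', 'Networks', '10/6');
        -- Accepted by both foreign keys, although Room 103 hosts Security.
        INSERT INTO StudentExamination VALUES ('Nicola', '10/6', '103');
    """)
    return conn.execute("""
        SELECT se.Student, a.Course, l.Course
        FROM StudentExamination se
        JOIN ExamAppointment a
          ON a.Student = se.Student AND a.DateTime = se.DateTime
        JOIN ExamLocation l
          ON l.DateTime = se.DateTime AND l.Room = se.Room
        WHERE a.Course <> l.Course
    """).fetchall()

print(course_mismatches())  # [('Nicola', 'Networks', 'Security')]
```

Because Course is not carried into the proposed StudentExamination, neither foreign key can tie the room's Course to the appointment's Course.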

- There are four gross errors re Normalisation which I can't be bothered to enumerate. If you had any of the tools that we have been using since 1985, you would have a visual model (engages the right brain, 96% of the power), and those errors are plainly visible. But you are still, in 2015, using text strings (engaging 4% of the power) that we stopped using in 1985, so those errors will be much more difficult to detect, to 'see'.

But for God's Graces, which are boundless. Have a look at this:
http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Nicola%20A.pdf

Summary
It (your suggested tables) is incomplete, normalised in the sense of up-to-this-point, with issues noted, but unnormalised overall because there are many resolved and duplicated facts. It is incomplete. As you can see (if you want you can infer backwards from my tables), the normalised model has far fewer tables; no duplication; no redundancies.

In terms of a level of Normalisation, or progress of the model, if Köhler's is zero, and my finished model is a ten, yours is a four. Well past the beginner stage, well past the RFS mentality of your colleagues, but not quite intermediate yet. Some RFS mentality remains, and you are bogged down with irrelevant non-FDs.

Relational
--------------

a. On the face of it, it is Relational, but because it is incomplete (detailed above), the declaration cannot be made. Eg. entry-level is broken, there are masses of duplication.

b. It breaks the RM for second-level items, but I can't be bothered to enumerate them, fix the entry-level first.

c. It is not Relational along the same lines as for Normalisation above (not counting [a][b] of course): that part that is, is, and the part that is incomplete, is not.

Cause
-------

Well to someone who teaches this, the cause is obvious. Separate to the fact that you have little understanding of Relational Keys and their Integrity (which you are trying to learn, big tick); separate to the causes I have already commented about here and elsewhere, the pig poop eating teachers, and their poisonous books. You are focussing on the non-FDS, it is an obsession with you guys.

Now Köhler did the same thing, and I wiped his problem out. So you really should know that that means all the logic he used to construct his case is false. But you use it anyway, and you take his non-FDS, knowing them to be devoid of value (after my floor-wiping), knowing them to be erroneous, and you use them. To model the data. 42 what-ifs. You will spend your life contorting the model with what-if non-FDs, twenty to thirty models. But it is fun, an obsession.

Forget all that nonsense, all that mental masturbation, forget your obsession, and focus on understanding the data. Yes, you do need progressions, but with this amount of data, just three or four. Don't worry about the fact that I did it in one step, that I can 'see' the data more clearly.

You can also learn from examining my model, and comparing it with yours: the entire Köhler problem is solved with Relational Keys, only Keys, and nothing but Keys, so help me Codd. He does not mention keys (except in the Terminology section); he, like you guys, does not understand the RM and that Relational Keys are central. For you, although you are starting to see the light, you are trying to use the minimal set of columns in each Key, eg. a two-column key that does one thing, where I use a four-column Key that does ten things.

===========================
End Requested Service
===========================

> > > > no Update Anomalies
> > >
> > > Yes, that may well be the case, with the proper inter-relational
> > > constraints in place.
> >
> > I am saying that that is the case with the constraints that are in the model.
>
> I was not denying that.

Imbecile.

So the claim is false. You are not denying my declaration, but you are saying, without any proof, that there could be snow in the Seychelles at this time of year.

> Just I hadn't checked carefully enough as to be
> sure.

If you want to be treated as a credible person, you have to do your homework, first, before making unfounded claims, second, and be able to back up your claims when called for, third.

Otherwise, you prove, by your actions, that you are not credible. The devil's children work backwards. Scientists work forwards.

> Indeed, that seems the case.

Well, then, shut up. If you were a gentleman, you would retract your imbecilic claim.

"Ok, Derek, you are right, there are no redundancies" is too much to expect from a dishonest person, a snake, they have to say "Until such time as I can prove a redundancy exists, I can't say for sure that there isn't one". They miss the fact that they have just proved themselves to be a non-scientist, a nonsense-ist. Gobbledegook, like a turkey. Sub-human.

Put up or shut up.

I won't hold my breath.

> > > For example, the redundancy mentioned above does
> > > not cause update anomalies by virtue of a referential integrity
> > > constraint.
> >
> > Gibberish.
> >
> > Name the constraint you are talking about.
>
> I mean, you cannot update this:

Oh God, another question posed as a declaration.

StudentExamination

> Student Course DateTime Room
> --------------------------------
> Nicola Networks 3/10,1pm 101
> Ada Networks 3/10,1pm 101
>
> to this:
>
> Student Course DateTime Room
> --------------------------------
> Nicola Networks 3/10,1pm 101
> Ada Security 3/10,1pm 101
>
> which would create an inconsistency, because of the foreign key on
> {Course, DateTime, Room}.

Imbecile.

You can't "create" an inconsistency, precisely because of the very foreign key constraint that you refer to. You are saying two things in one sentence, each of which contradicts the other.

That imbecility, that "creation", that "inconsistency", is prevented, by the constraint.

So, the constraint is good, and the redundancy/update anomaly does not exist.

> If you remove the foreign key, no constraint
> is violated.

Imbecile.

That is akin to saying, as idiotic as, if you remove the condom, then no prophylactic will be violated if the woman falls pregnant. The express purpose of the condom is to prevent the pregnancy, to play hide-the-sausage with no consequences.

If you remove that foreign key, you would introduce SEVERAL inconsistencies (in an RDB, each Thing performs more than one Task). It is there for a purpose, it is not an accident. If you remove it, you ALLOW StudentExaminations for ( DateTime, Room, Course ) that are NOT in CourseExamination, which is a separate Atomic fact, which cannot be split up.

Part I Higher Order Answer
---------------------------------

a. There is no fact CourseExamination( Security 3/10,1pm 101 ).

b. In fact, it contradicts a known fact CourseExamination( Networks 3/10,1pm 101 ).

c. And we prevent more than one examination happening in the same room at the same time (as per requirements; via CourseExamination.AK, as per Notes).

Therefore your proposed change is invalid, it is prevented. By constraints alone.
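
A minimal sketch of points [a][b][c], in Python with the standard sqlite3 module. The table and column names follow the example in this thread; the column types, the choice of ( Course, DateTime, Room ) as the primary key of CourseExamination, and the use of SQLite are assumptions made for illustration, they are not taken from the linked model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE CourseExamination (
    Course   TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL,
    PRIMARY KEY (Course, DateTime, Room),   -- assumed PK
    UNIQUE (DateTime, Room)                 -- AK: one examination per room per time
);
CREATE TABLE StudentExamination (
    Student  TEXT NOT NULL,
    Course   TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL,
    PRIMARY KEY (Student, Course),
    FOREIGN KEY (Course, DateTime, Room)
        REFERENCES CourseExamination (Course, DateTime, Room)
);
INSERT INTO CourseExamination  VALUES ('Networks', '3/10,1pm', '101');
INSERT INTO StudentExamination VALUES ('Nicola', 'Networks', '3/10,1pm', '101');
INSERT INTO StudentExamination VALUES ('Ada',    'Networks', '3/10,1pm', '101');
""")


def attempt(sql):
    """Run one statement and report whether the constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


# [a] There is no CourseExamination ( Security | 3/10,1pm | 101 ), so the FK rejects this.
update_result = attempt(
    "UPDATE StudentExamination SET Course = 'Security' WHERE Student = 'Ada'"
)

# [b][c] That parent fact cannot be added either: the slot is taken by Networks (AK).
parent_result = attempt(
    "INSERT INTO CourseExamination VALUES ('Security', '3/10,1pm', '101')"
)

ada_course = conn.execute(
    "SELECT Course FROM StudentExamination WHERE Student = 'Ada'"
).fetchone()[0]

print(update_result)
print(parent_result)
print(ada_course)
```

Both attempts are refused by declared constraints alone, and Ada's row is left as it was.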

> I mean, you cannot update this:

Yes, by design, imbecile, not by accident.

Now if your brain operated at human capacity, you would understand that,
== AND == that that is all that needs to be understood,
== AND == you will go away and ponder the great things that RDBs can do, that the RFS cannot do.

Such as, here is an example of the higher level of Integrity that RDBs have, that RFSs do not have. Such as, here is an example of how DKNF is achieved (not the deranged Fagin definition, which I have exceeded, but the Codd intent).

But there is a fair amount of evidence that your teachers have ravaged your brain, so you cannot operate at human levels, you need the explanation that sub-humans need ...

Part II Lower Order Answer
---------------------------------

Second, you are treating StudentExamination.Course as non-key data. You are limited to 'seeing' the RDB through the myopic lens of RFS. Take your time, everyone who has been liberated from their cage, needs time to learn how to run. But it is stupid to refuse to run, or learn how to run, once you have been liberated.

In the RFS, in each record, you have one non-key record ID, and all the fields, are non-key data, that you can change. (And yes, there you can make your "inferences", there they are not idiotic).

In the RDB, which is made up of Keys, you have many columns which contain Keys or parts of a Key, as well as migrated keys, and all the Keys are Atomic. Plus columns of non-key data, in each table.

Sure, you can walk up to the RDB and change non-key data, some attribute in some row. But you can't walk up to the RDB and change Keys. Why ? Because the *structure* of any DB is based on the relationships, and in an RDB those are the Keys (in an RFS those are non-key IDs). Because each Key is a fact, that other tables depend upon, or are dependent upon (in the case of StudentExamination). Thus you can't change the Keys that are migrated to child tables, because you are in fact trying to change a fact, that is established higher up in the hierarchy, you are not trying to "change data", you are trying to change Keys, at a level lower than the Atomic fact that the Key represents.

If you understood that, then you would realise, the change you are attempting is invalid in the location that you are attempting. If you want to change the fact of Ada's StudentExamination, you can't change it there. Because it is not an isolated fact with no relationships. It is a fact,
a. based on an assumed parent fact, that there is a CourseExamination in ( Security | 3/10,1pm | 101 ), which had better exist first, which can't because it is prevented
== AND ==
b. based on an assumed parent fact, that there is a StudentEnrolment ( Ada | Security ), which had better exist first

c. So it is naïve, the kind of thing we have to explain to office juniors, to attempt to change either of those higher-level facts[a][b], in StudentExamination which is a lower-level that depends on the higher level.

d. The method is:
___ Delete the fact in StudentExamination ( Ada | Networks ). Notice, only the PK is required, but you are actually addressing the fact of ( Ada | Networks | 3/10,1pm | 101).
___ Add the fact in CourseExamination ( Networks | DateTime_X | Room_Y ), which cannot be ( 3/10,1pm | 101 ) because that slot is already taken by ( Networks )
_____ which may well cause changes further upstream, such as adding a Room ( Room_Y ) or DateTime ( DateTime_X )
___ Add the fact in StudentEnrolment ( Ada | Security ), because it does not exist yet
_____ which may well cause changes further upstream, Add Student ( Ada )
___ Now add the fact StudentExamination( Ada | Security | DateTime_X | Room_Y )
_____ which due to the previous steps, would now be valid, whereas StudentExamination( Ada | Security | 3/10,1pm | 101 ), is, was, and remains, invalid.
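
The method in [d] can be sketched as one transaction. This is an illustration in Python with the standard sqlite3 module, under stated assumptions: the column types, the PK of CourseExamination, and the concrete values standing in for DateTime_X and Room_Y are hypothetical. It uses Security as the Course in the CourseExamination step, since that is the examination Ada is being moved to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE StudentEnrolment (
    Student TEXT NOT NULL,
    Course  TEXT NOT NULL,
    PRIMARY KEY (Student, Course)
);
CREATE TABLE CourseExamination (
    Course   TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL,
    PRIMARY KEY (Course, DateTime, Room),   -- assumed PK
    UNIQUE (DateTime, Room)                 -- AK: one examination per room per time
);
CREATE TABLE StudentExamination (
    Student  TEXT NOT NULL,
    Course   TEXT NOT NULL,
    DateTime TEXT NOT NULL,
    Room     TEXT NOT NULL,
    PRIMARY KEY (Student, Course),
    FOREIGN KEY (Student, Course)
        REFERENCES StudentEnrolment (Student, Course),
    FOREIGN KEY (Course, DateTime, Room)
        REFERENCES CourseExamination (Course, DateTime, Room)
);
INSERT INTO StudentEnrolment   VALUES ('Nicola', 'Networks');
INSERT INTO StudentEnrolment   VALUES ('Ada',    'Networks');
INSERT INTO CourseExamination  VALUES ('Networks', '3/10,1pm', '101');
INSERT INTO StudentExamination VALUES ('Nicola', 'Networks', '3/10,1pm', '101');
INSERT INTO StudentExamination VALUES ('Ada',    'Networks', '3/10,1pm', '101');
""")

# Hypothetical stand-ins for DateTime_X and Room_Y.
DATETIME_X, ROOM_Y = "3/10,3pm", "102"

with conn:  # one transaction: committed on success, rolled back on any error
    # Delete the lower-level fact, addressed by its PK.
    conn.execute(
        "DELETE FROM StudentExamination WHERE Student = 'Ada' AND Course = 'Networks'"
    )
    # Establish the higher-level facts first.
    conn.execute(
        "INSERT INTO CourseExamination VALUES ('Security', ?, ?)", (DATETIME_X, ROOM_Y)
    )
    conn.execute("INSERT INTO StudentEnrolment VALUES ('Ada', 'Security')")
    # Only now is the dependent fact valid.
    conn.execute(
        "INSERT INTO StudentExamination VALUES ('Ada', 'Security', ?, ?)",
        (DATETIME_X, ROOM_Y),
    )

ada_rows = conn.execute(
    "SELECT Student, Course, DateTime, Room FROM StudentExamination WHERE Student = 'Ada'"
).fetchall()
print(ada_rows)
```

Reordering the steps so that the StudentExamination row is added before its parents makes the transaction fail and roll back, which is the point of the sequence.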

Imbecile.

End I & II
---------------

For the next time, keep your idiotic claims to yourself, and choose one of the two available options, explicitly:

Two choices are open to you:

1. Remain in your current sex, and ask a question.

2. Ask for the courtesy, the service of education, to be extended to you.

Cheers
Derek

Derek Asirvadem

Feb 10, 2015, 11:41:16 PM2/10/15
to
Nicola

On Wednesday, 11 February 2015 10:09:30 UTC+11, Derek Asirvadem wrote:

< big snip, OMG, what a big snip. I mean any and all points discussed therein >

I forgot to mention something. It is pedestrian to me because I use it all the time, but I forgot the notion is (a) denied (b) impossible (c) exclusive [HM xor RM], here amongst the theoreticians.

Congratulations, you have achieved Hierarchical Data Modelling within Relational Data Modelling !

Yes, quite unconsciously, without thinking about it.

Since the data is hierarchical, since you accepted parts of my model (which acknowledges the hierarchical nature of the data), since you dealt with your perception of the Köhler data, and you dealt with it hierarchically ... you are exercising (ie. not denying) the Hierarchical view over the data, and you are executing Hierarchical Data Modelling within the Relational Data Modelling context.

Yes, the two are inseparable.

Yes, every Key that you suggested is Hierarchical. Yes, some Keys maintain not one but TWO hierarchies.

Yes, by doing that, you are modelling more Relational Integrity than can be had in the RFSs of your colleagues (less than mine, no matter, more than RFS).

Certainly puts the lie to [a][b][c], doesn't it. So much for pig-poop-eaters. Keep them locked up in the barn. Or the asylum.

And that puts you two notches above your colleagues. One for not modelling an RFS, one for handling hierarchies naturally, within Relational.

I think, this deserves a silver star.

Congratulations again !

Wooohooo !

>>>>
(Slow)
I'd like to thank the guy
who wrote the song
that made my baby
fall in love with me

(Fast)
Who put the bomp in the bomp-bah-bomp-bah-bop ?
who put the ram in the ramma-lamma-ding-dong ?
who put the bop in the bop-she-bop-she-bop ?
who put the dip in the dip-de-dip-de-dip ?

who was that ma-an
I'd like to shake his hand
he made my ba-by
fall in love with me-ee

...
(Chorus, Backing vocals)
Who-oo-oo-oo oo-oo-oo-oo-oo-oo-oo ?
Who-oo-oo-oo oo-oo-oo-oo-oo-oo-oo ?
Who-oo-oo-oo oo-oo-oo-oo-oo-oo-oo ?
Who-oo-oo-oo oo-oo-oo-oo-oo-oo-oo ?
...

Lyrics: Barry Mann, Gerry Goffin
Music: Johnny Maestro
Artist: Barry Mann and the Halos
1961
http://youtu.be/lXmsLe8t_gg?list=RDiDcvmrHV9Jc
<<<<

Who ? Er, not me, the incomparable Dr Edgar F Codd, I am just a faithful, un-subverted disciple.

Small but Important Correction
-----------------------------------------------------

d. The method is:
___ Delete the fact in StudentExamination ( Ada | Networks ). Notice, only the PK is required, but you are actually addressing the fact of ( Ada | Networks | 3/10,1pm | 101).
___ Add the fact in CourseExamination ( Networks | DateTime_X | Room_Y ), which cannot be
( 3/10,1pm | 101 ) because that slot is already taken by ( Networks )

Should be:
___ Add the fact in CourseExamination ( Security | DateTime_X | Room_Y ), which cannot be

Cheers
Derek

Derek Asirvadem

Feb 11, 2015, 2:06:27 AM2/11/15
to
==== JKL

James

Thank you for your response.

It is a bit hard to respond to this new post here, as my head is stuck in the Hierarchical Model thread, waiting for the several loose ends to be tied up.

> On Monday, 9 February 2015 10:59:35 UTC+11, James K. Lowden wrote:
> On Sat, 7 Feb 2015 17:46:38 -0800 (PST)
> Derek Asirvadem <derek.a...@gmail.com> wrote:

> I think this exchange illustrates a difference in tradition that you
> feel is idiotic but is really just a question of what one assumes.
>
> From your point of view, you have a customer and a system that
> maintains addresses in a particular place and time. The columns have
> meaning (exemplified in their names) and the keys more or less announce
> themselves to you.

Yes, please! They jumped out at me.

> Any ambiguity or error can be addressed by
> discussing them with your customer. I.e., by agreeing on their
> meaning.

Sure. And from the outset, I have stated that I am the DBA, the policeman. I can answer any of your questions re meaning or anything else. But you have to ask them, I have no idea what you do or do not know, noting that they pretty much "announce themselves to you".

> Any formalism with [non-]FDs is pointless.

Yes.

Note my insertion, you guys do not use the FD definition, you have a fragment of it. I think you mean that fragment, not the real thing, and you use the fragment in a backwards or bottom-up process. The real thing employs a process of determining the Keys first, and testing them for validity using the FDs, second, which is an integral part of [cannot be divorced from] the Normalisation process. Ie. forwards, top-down.

> From the academic point of view -- indeed from the point of view of
> the DBMS, as you know -- no column has meaning.

Totally disagree. When you say "DBMS", you may be meaning "theoretical DBMS", in which case, I don't agree or disagree, you are welcome to entertain it as a theoretical concept, if such is valid. And sure, for your theoretical purposes, you have abstracted all meaning out of all the elements (not just columns! not just the column names !!). Which, as evidenced, is a serious impediment.

In the case of a DBMS, or a RDB, every column has meaning. And the Key columns have very important meaning (because the structure of any DB is defined by the relationships; and in an RDB, those relationships are Keys; thus the structure is defined by Keys). I have detailed most of that meaning in my post of 11 Feb 10:09 to Nicola, in response to his commendable attempt to model Köhler's non-relational data, Relationally.

If you are going to model data, Relationally, you cannot afford to dismiss the meaning; if you are going to determine Keys (the current task) it is doubly important.

Ok, I agree, the current task is not academic, it is not theoretical, it is a practical requirement. The only theory that applies, is Codd's. I think you acknowledged that elsewhere:
>>>>
One giant leap owed to Codd that I think was (and often still is)
underappreciated is his adoption of value semantics. Your helpful
citation illustrates that point quite well ...
<<<<

> It has a type and
> domain, and some relationship (perhaps functional dependency) on other
> columns. Any statement about keys *must* be based on stated [non-]FDs.

Fine, but that theory (which may be valid theoretically) does not apply to this practical task. Or, you can apply it, and it will fail miserably. Or, all your theories put together, sum up to a grand total of zero that can be applied to the practical task.

Therefore, feel free to apply them, but the evidence is, total failure to determine that the proposal Address [A][B] is non-relational; that it breaks 3NF; total failure to determine the Keys. I don't think you can expect a different result, unless you employ a different technique, a different set of principles and methods.

Of course, it is no secret, we implementers do not employ your theoretical methods (if we did, we would get the same results). We employ practical methods. Codd's Relational Model. Codd's 3NF and FD definition.

> Your corrective notes end with what really is all that need be said,
>
> > * Find out what the data is, what it means, how it relates to
> > all other data in this cluster.

Yes. And I detailed that further in my next response [B] to him.

> That is an option open to your group and not generally to c.d.t..
> Anyone here willing to make an assertion about the correctness of the
> model must also be willing to make assumptions about the meaning of the
> columns. However safe those assumptions might be, they are still only
> assumptions.
>
> Surely you agree that to be unwilling to make assumptions like that need
> not be an exercise in stupidity or obfuscation.

Ok, thanks for pointing that out. Apparently I was not clear enough when I set out the roles.

But no, it doesn't apply here. The task is a practical one. You are free to have assumptions; to test those assumptions by asking me questions, and pondering my answers. I would never suggest a task that you could not perform.

Here, I am your "customer", I am your "users", whom you people can "go to" to have any and all questions answered. I am at your beck and call. Note that I cannot turn around and say that some proposal from one of you is incorrect, if it fulfils the requirement that I have given. The task is not to fulfil the requirement of a customer who cannot be reached.

Eg. I had no idea that anyone (implementer or theoretician) would ever think that ISO 3166-2 should be taken literally, or that they would break 1NF without being conscious of doing so. When I saw that I addressed it. I couldn't have known that earlier, not in a million years.

I am not saying the theoreticians in the RDB space are stupid because they have assumptions and can't proceed, etc.
- I am saying the theoreticians in the RDB space are stupid because they are using a hammer for a task that calls for an axe.
==AND== they will not observe the evidence that the hammer is not working, that it is not suited to the job.
==AND== they are ignorant (or in denial) that axes exist.

> An example of the difficulty arising from unclear meaning is the
> discussion over the relationship of postal code to unit. I would never
> have guessed there are jurisdictions in which they are 1:1, but you
> said (IIUC) that there are.

I said, AFAIK, the finest granularity of post code was 1:1 with an entrance to a building, not 1:1 with a Unit (apartment, suite, townhouse).

> Other questions arise, too. When I looked
> at
> http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf
> I found myself wondering about StreetName and StreetType. I couldn't
> think of an application for which those tables would be useful. They're
> not objectively wrong, but I assume they are.

1. They are classic Reference tables, used to constrain StreetNames and StreetTypes in Street, to valid values.

2. We use Names in a way that is beyond the scope of this exercise here, so you can safely ignore further info re those two tables. Eg. from your days with Sybase 10 (1993 IIRC) you might know about our SOUNDEX() function, which is used to detect spelling errors, etc. We want to prevent "Warshinton", when "Washington" is in the database.

3. In addition to being Reference tables, they are used for Search purposes, to determine valid vectors or Dimensions, thus avoiding:
IF EXISTS (
____SELECT DISTINCT StreetName
________FROM Street
________)

Where (a) Street is a very large table that should not be scanned during production if it can be avoided, (b) certainly not for a Dimension, and (c) the Reference table exists.
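
A minimal sketch of points 1 and 3, in Python with the standard sqlite3 module. The Reference tables constrain Street to valid values, and a search for valid names reads the small Reference table instead of scanning Street. The column names, the Suburb column, and the sample rows are assumptions for illustration; SOUNDEX matching (point 2) is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE StreetName (StreetName     TEXT NOT NULL PRIMARY KEY);
CREATE TABLE StreetType (StreetTypeCode TEXT NOT NULL PRIMARY KEY);
CREATE TABLE Street (
    Suburb         TEXT NOT NULL,   -- hypothetical parent identifier
    StreetName     TEXT NOT NULL REFERENCES StreetName (StreetName),
    StreetTypeCode TEXT NOT NULL REFERENCES StreetType (StreetTypeCode),
    PRIMARY KEY (Suburb, StreetName, StreetTypeCode)
);
INSERT INTO StreetName VALUES ('Washington');
INSERT INTO StreetType VALUES ('St');
INSERT INTO Street     VALUES ('Newtown', 'Washington', 'St');
""")


def street_name_exists(name):
    """Check a name against the Reference table; Street itself is never read."""
    row = conn.execute(
        "SELECT 1 FROM StreetName WHERE StreetName = ?", (name,)
    ).fetchone()
    return row is not None


def insert_street(suburb, name, type_code):
    """Try to add a Street; the Reference tables decide whether the values are valid."""
    try:
        conn.execute("INSERT INTO Street VALUES (?, ?, ?)", (suburb, name, type_code))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


known = street_name_exists("Washington")
unknown = street_name_exists("Warshinton")
misspelt = insert_street("Newtown", "Warshinton", "St")
print(known, unknown, misspelt)
```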

> I remember a different example in my work. We had two tables,
> Countries, and CountryGroups. Each Country had and ISO code and was
> the real deal. CountryGroups reflected various political designations
> and business imperatives.
>
> One fine day a developer wrestling with an "application problem" (his
> term) asked for comment on his preferred solution: to add one row to
> Countries named "all countries". (You can imagine my reaction and I
> yours!) I would say the suggestion stemmed fundamentally from a failure
> to understand the meaning of the Countries table. To the man in
> question, the table had no meaning per se, and the "missing" row was a
> deficiency. I suggested, colorfully, that the concept of "all
> countries" belonged squarely in CountryGroups. Obvious as that may be
> to you, it took quite a lot of persuasion to prevent corrupting a basic
> domain table. Meaning is surprisingly hard to pin down.

Understood. And that story emphasises that meaning is very important, and that it must be documented. Eg. the purpose and content of Country vs CountryGroup tables.

> LIke you, I learned about 3NF from an informal description. I don't
> know how many treatises I've read describing an algorithm based on FDs;
> they all read to me like the How to Hunt Elephants
> (e.g., "COMPUTER SCIENTISTS"
> http://paws.kettering.edu//~jhuggins/humor/elephants.html): sure to
> succeed if ever it finished, and unnecessary in my context.

Glad to hear that. But it concerns me, that you use the term FD interchangeably, whilst knowing full-well that the real FD and the theoretical one are quite different, the latter being only a fragment of the former. Eg. they can't be used interchangeably, each has a different purpose.

> Unlike you, I don't think the FD formalism is an exercise in navel
> gazing.

I didn't say quite that, but I accept that you picked that up from my comments.

I said:
a. that it cannot be used for the given purpose (determine Keys when they are jumping out at you, as per this task), or for the normal determination of Keys during the exercise of data modelling, so there is no point in using it in those scenarios.
b. that for the given purpose, another method exists
c. that they are using it anyway, which is silly, given [a][b]
d. they still haven't produced anything by that method
e. but that it remains a valid method for determining keys when a human is not present to perform the analysis of data.
(eg. I am not saying it has no use; the use has not been defined to me; I leave that open)

Note that at Nicola's challenge, I took up a problem that was more complex (Köhler's, five key elements, more than the four key elements Nicola suggested), and I Determined the Keys on data I had never seen before, using Codd's method. In less than 30 mins.

Note that Nicola was stuck in the non-FD non-key rut, probably no less and no more than the rest of you. But when he started his model, he jettisoned it, just dealt with the data, and tried to form keys on his own. He didn't need to be told that non-FD non-key method is irrelevant, or that "the Key has meaning, retain it". He is busy with a serious level of modelling way, way past you guys.

> Would only that the described algorithm were implemented, and
> we could pour our column definitions in and get a 3NF (say) logical
> model out!

I think you mentioned pipe dreams.

To a practical person, that would be interesting, not desirable. But the sequence is entirely without merit, a scientific void.

Further, since 1985, we have had CASE tools such as (just one example) ERwin which implements the RM and IDEF1X. We work forwards and backwards from visual graphical tools, to/from the database [any platform] directly, or to/from DDL, with a single push of a button. So the non-FD determination of non-keys method, bottom-up (ours is top-down) is irrelevant to us. Same as the Tutorial D monolith.

Obviously, the CASE tools have expanded, they have gained a lot of maturity, in that thirty years.

Whoever suggested that to you, is well over thirty years out-of-date, behind the market. Oh yeah, they are still working with text strings, 1's and 2's.

> The problem as I see it isn't in the formalism per se, but in
> describing the columns' meaning to the algorithm. You do that in your
> head and depict the result with IDEF1X.

Sure, I do it in my head, but the method is scientific, defined, by Codd. Anyone can do it.

I can't comment on the formalism, or the value of it, because no one has defined that to me. What I have seen, as you can tell from this thread, is usage of it in a scenario that is most inappropriate. And of course, that is commonly done: Jan does it; Köhler does it. So I would say, they think it is THE method, they do it in every instance, including the inappropriate. They are unaware that it cannot be used in the normal scenario (human, analysing data). They are unaware that a simpler, faster method exists.

> I've done the same.

I am not saying you haven't, I just haven't seen any of it. From what I have seen of your work, it is RFS, not Relational.

> Are you
> prepared to say that's the last and best way? I'm not.

I am. And it is worse than that. I am saying it is the only method for implementers, for practical people, for humans. Comparatives and superlatives can't be used when there is only one choice, but if we overlook that grammatical rule, then sure, it is the best, simplest, and fastest method. Full stop.

I am open to the possibility that the non-FD non-key Determination method has a use, and only in the theoretical arena, if and when they define that use to me. Until then, noting that I am right in every case that has been put to me thus far, the Codd method destroys it, makes it irrelevant, even in the theoretical arena.

Look, if you can't look at the nutty professor's example on the board, and determine that the SSN is the key, if you can't look at Address B and determine that there is no key possible in the table State as given, then there is a serious problem. You have abstracted yourself out of the picture, so, so far out, that you have lost the ability to see the meaning that does exist in the words that are given.

So, once that is lost, then sure, play with 1's and 2's and a's and b's, and non-FD non-keys, and some day you might, just might, find a way.

But that does not address the problem, which is a compulsion to abstract the meaning in anything, out of it, and thus, cripple oneself, prevent oneself from performing the very task that one is supposed to be performing.

It is not about comparing the Codd Key-FD method vs the non-FD-non-key method, because they are not comparable.

> I'm still
> waiting for an FD language (loosely speaking) that will describe my
> database better than SQL, from which I can generate an IDEF1X diagram
> and matching SQL DDL. That would be a better way to work, and would be
> fruit from FD tree.

Ok.

Meanwhile, for the last thirty years, back at the farm, we have been doing:

> describe my
> database better than SQL,

You have that backwards. SQL is a data sub-language. It is implementation level. It is not a high-level or abstract language, or even a full language. It is not appropriate for describing or defining a database at any level that is higher than the context of an implementation in a specific platform.

Would you use BASIC-PLUS or awk to describe a database ? No ? Good. Then why would you use SQL to do it ?

If you use a hammer, for a job that requires an axe, you will fail. You cannot then turn around and say, the hammer is not working, it is broken. Use an axe.

Separately, there is a huge difference between commercial SQLs and the freeware/shareware/vapourware. What is impossible today on the freeware has been possible in the commercial ware for over thirty years. Recursion, deployed in the right manner. Full OLTP and full ACID Transactions and standards to go with it.

We describe the database in a data model, diagrammatically (ie. not text strings), using a CASE tool, and ==TELL== SQL what it is, by pushing the "publish data model to SQL database" button. Of course, it has version control, diff, transports between platforms, etc. Of course, the tool allows us to add notes to every element in the model, so when we print the reports, it has all that in documentary form.

I would never expect the SQL implementation to describe it back to me.

Ok, there is one exception. If I go into a customer site, and find that they have zero documentation, or I find that they have lied to me about something in the SQL implementation, then sure, I press the "reverse engineer data model from SQL database" button, and obtain it. That is an exceptional case, the once-off starting point for a new project. And even then, I cannot reasonably expect anything more than the limits of SQL, the platform.

I would never expect the awk script to describe itself back to me. What the awk script does, is described using SSADM and a IDEF1X data model. Same as any app, SQL or not.

> from which I can generate an IDEF1X diagram

If necessary, we can do that. One "reverse engineer" button press.

> and matching SQL DDL

One button press "publish data model to SQL DDL files". Radio buttons to choose platform, etc, etc.

Do get a trial copy of ERwin, and look into it. It is superior to the competition for many reasons. Just one reason is, it maintains ONE model, and conceptual/logical/physical/etc are renditions of that one model. The others use one model per conceptual/logical/physical/etc, same as the novices and clueless feature check-box-tickers, so they end up spending half their lives transporting from one to the other and fixing-up the diffs.

> that will describe my
> database better than SQL

I suspect you might mean that there is something you can define re RDBs, that is relevant to an RDB (in whatever language you imagine), that you think cannot be implemented in SQL. That is incorrect. Please give me an example, and I will show you how to implement it in SQL.

I am aware that the Dates and the Darwens of the world propagate a myth, that SQL is broken, and that you can't do this or that. Let me assure you that the myth is false, self-serving. There is nothing in it. In my three years at the TTM encampment, every so often, either one of the slaves or the slave master himself would post a "here is something SQL cannot do, here is proof that SQL is broken" article. In every instance, I posted a full solution using SQL (nothing but SQL), and proved such to be false.

I am quite willing to do that here. Please give me an example of what SQL cannot do. Re RDBs.

> That would be a better way to work, and would be
> fruit from FD tree.

Only if you refuse to visit the orchards that we have had for thirty years. Or you deny that they exist.

Thanks again for your post.

Cheers
Derek

Derek Asirvadem

Feb 11, 2015, 2:08:19 AM2/11/15
to
Dear people
The days are up.

The Address [B] proposition is rejected. I will cut-paste the text that I gave the developer. The main directive was to determine the keys:

>>>>
The issues are the same as the last time. This is a Record Filing System. It fails Relational mandates on two counts, it is non-relational.

1 Definition for keys, from the RM: a Key is made up from the data
___* A surrogate (RecordId) is not made up from the data
___* There are no Keys on the data
___* Yes, you did add Keys, but they are invalid, therefore still no Keys

2 The RM demands unique rows (data)
___* A surrogate does not provide row (data) uniqueness, only Keys supply row uniqueness

Re Normalisation, (if we consider that outside the Relational Model):

3 It breaks Third Normal Form, because in each case, there is no Key for the data to be Functionally Dependent upon (it is dependent, yes, but on a pork sausage, outside the data, not on a Key).

This RFS is nowhere near ready for Normalisation, let alone Relational Normalisation:
<<<<

Previous bullet points in annotation in Address [A] have been numbered [a] to [d].

>>>>
e Keys. You did not action my points from your first attempt. I will have to number them for you. Items [c][d] relate to Keys and Identifiers. Read my IDEF1X Intro again.

___* What is the Identifier for a State ? You have StateCode, which is the 2-char ISO code. Really ? You can walk up to the database, ask for a StateCode, and you will get one row ? Really ? There are thousands of States. Hint: we have the [N]orthern [T]erritory, Canada has the [N]orthwest [T]erritories.

___* What is the Identifier for a County ? When the user walks up to the database, with one County in mind, what does he have in mind, how does he Identify "Lee County" ?

___* County(CountyCode) is unique ? America has 50 States, they have a minimum of 1 County, they start with 001. The moment you attempt to insert the first County for the second State, it will blow you away.

___* Street(StreetName, StreetTypeCode) is unique ?
___ One street of one name+type in the whole Street table ?
___ Did you mean Street(SuburbId, StreetName, StreetTypeCode) is unique ?

___* Do you want two countries named Brazil ? I showed you how to prevent that three years ago, you have forgotten. America has 13 Counties named "Lee County".

___This applies to all seven tables, every one of them has a gross error on the Key. Talk through each point with Ken. If you come back with Keys that are less than 95% complete, if you fail to follow my directions again, I will burn your books.
<<<<
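The County objection above can be reproduced mechanically. The following is a minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical County table; the column list, the helper name key_accepts_all and the four sample rows are illustrative assumptions, not the developer's Address [B] model and not offered as the answer to the exercise. It only shows which declared keys survive the sample data.

```python
import sqlite3

# Illustrative rows only: (CountryCode, StateCode, CountyCode, Name)
SAMPLE_COUNTIES = [
    ("US", "AL", "001", "Autauga County"),
    ("US", "AL", "081", "Lee County"),
    ("US", "AZ", "001", "Apache County"),
    ("US", "FL", "071", "Lee County"),
]

def key_accepts_all(key_columns, rows=SAMPLE_COUNTIES):
    """Return True if every row can be inserted when key_columns is the declared key."""
    conn = sqlite3.connect(":memory:")
    # key_columns is trusted input here; this is a demonstration, not production code.
    conn.execute(f"""
        CREATE TABLE County (
            CountryCode TEXT NOT NULL,
            StateCode   TEXT NOT NULL,
            CountyCode  TEXT NOT NULL,
            Name        TEXT NOT NULL,
            PRIMARY KEY ({key_columns})
        )
    """)
    try:
        conn.executemany("INSERT INTO County VALUES (?, ?, ?, ?)", rows)
        return True
    except sqlite3.IntegrityError:
        # A second row collided with an existing key value.
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    for key in ("CountyCode", "Name", "CountryCode, StateCode, CountyCode"):
        print(key, "->", key_accepts_all(key))
```

With CountyCode alone as the key, the first County of the second State is rejected, which is the failure described in the bullet points. Surviving a sample does not prove a key is correct; it only shows the key has not yet been refuted.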

=========================================================================
None of you theoreticians identified any of that
The non-rejection, with or without specifics, is an acceptance
=========================================================================

The exercise now, is to determine the correct Keys. Let's see if you can produce them, before the developer can. Use the Address [B] proposal, because there is no change.
http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20B.pdf

If you want more detail re what I told the developer, visit this. The pages are in RC order. He has not returned with a proposal [C], so the first page is blank (it will be populated when he does):
http://www.softwaregems.com.au/Documents/Article/Normalisation/Relational%20Database%20101%20C.pdf

Now it has been established, for decades, that humans can determine Keys, directly, by working with the data directly, and by using FDs to validate the Keys (the FD Definition is in the 3NF Definition).
- That has recently been confirmed yet again, in this thread, when I Relationalised Köhler's DNF data and eliminated his problem, his proposal.
- It is further confirmed by Nicola, implicitly, because he is following those directions. The developer will be using that method.

In two iterations you have not been able to do that.

However, it has now been exposed (in this thread) that your teachers have taught you to determine keys via non-FDs; to make puzzles; which is great for RFSs, but not for RDBs; to avoid the direct method that is relevant to RDBs. Further, as evidenced in this thread, you people are obsessed with it. Therefore, I invite you to determine the Keys *using whatever method you like*. This is a great opportunity to demonstrate, to confirm, the non-FD method.

----

Key points from another post, that are worth keeping in mind:

> On Saturday, 7 February 2015 19:09:53 UTC+11, Derek Asirvadem wrote:
> > On Friday, 6 February 2015 04:07:52 UTC+11, Jan Hidders wrote:
>
> d. Keep in mind that the users and developers mean "Alabama" when they use "AL", they do not want to, and should not have to, use "US-AL". Ie. a StateCode is an Atomic item, it would be preposterous if we had to mess with SUBSTR(StateCode, 4, 2).
>
> e. Likewise for the CountyCode. It is not CHAR(7) ie. containing CountryCode and StateCode. It is CHAR(3) containing CountyCode, only CountyCode, and nothing but CountyCode, so help me Codd. FIPS is US only, numeric; most countries use a string.

> So what we have to deal with here is, we implementers use the original [3NF & FD] definition and reject the data model as failing 3NF; you use the 42 fragments and accept the same data model as "satisfying 5NF". It is pointless to argue about definitions, especially the established ones; better to examine the difference, and to determine why you people are so crippled.

----

Please keep my response of today to James in mind. Please ask me as "customer" and "user" any and all questions. I don't know what you do not know, unless you ask. I thought the data in that cluster was pretty much known to most IT people.

Here is more detail, to cover some of the issues that have been touched, what I think James may be getting at.

- All columns must be Atomic (1NF).

- CountryCode is CHAR(2) ISO-3166-1

- StateCode is CHAR(2) ISO-3166-2

- CountyCode is CHAR(3) FIPS numeric for America; char for the rest of the world

- The above three are generally well-known. Eg. the sales people know their ambit by heart (not the numerics!). We do not mess with clarity. All dialogues (windows, frames, whatever) that are used by users have three separate fields for the above (which may be set next to each other when relevant)

- All Name columns are CHAR(32)

- All FullName columns are CHAR(64)

- All ID columns are 32- or 64-bit Integers, depending on the max rows possible.
___ He is on notice that he can't have them as he has planned because it is an RFS, that he must use the minimum possible surrogates. But I am not pushing it in this iteration, the Keys are the thing for this one.

> It has a type

The type is generally not necessary in an exercise at this level, but having had that pointed out as a limit to assumptions, etc, I have given it.

> domain,

Yes. That is for you to determine, to work out.

> and some relationship (perhaps functional dependency) on other
> columns.

Yes. That is for you to determine, to work out. Which is infinitely more possible, and much easier, if you determine the Keys first.

> Any statement about keys *must* be based on stated FDs.

Fine, but it does not apply to this task.

And feel free to ignore my statement above, and go that route. The result we want is, desperately now, to have the Keys determined. If you think your way is better, go for it.

Hugs and kisses
Derek

Nicola
Feb 11, 2015, 11:07:19 AM

In article <467d17c9-0c2d-41b6...@googlegroups.com>,
Derek Asirvadem <derek.a...@gmail.com> wrote:

You've crossed the line, mister. Signal/noise ratio way too low. I don't
get trapped in provocations, sorry. You have entered my kill filter, so
I'm afraid I won't be able to see your posts any longer.

For the rest of the group: still hungry for interesting, on-topic,
fruitful, discussions with diverging opinions!

James K. Lowden
Feb 12, 2015, 1:23:44 AM

On Tue, 10 Feb 2015 23:06:25 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > From the academic point of view -- indeed from the point of view of
> > the DBMS, as you know -- no column has meaning.
>
> Totally disagree. When you say "DBMS", you may be meaning
> "theoretical DBMS", in which case, I don't agree or disagree, you are
> welcome to entertain it as a theoretical concept, if such is valid.

Actually, I'm sure you agree. By "DBMS" I mean "database management
system". It's a machine, and it has no concept of meaning. It
provides us with the illusion of semantic engagement by representing
its tuples with names with which we associate meaning. To the machine,
each column simply has a type and some defined relationship to other
columns. It enforces those relationships, thereby consistency, thus
supporting verifiable logical manipulation.

> I am not saying the theoreticians in the RDB space are stupid because
> they have assumptions and can't proceed, etc.
> - I am saying the theoreticians in the RDB space are stupid because
> they are using a hammer for a task that calls for an axe.
> ==AND== they will not observe the evidence that the hammer is not
> working, that it is not suited to the job.
> ==AND== they are ignorant (or in denial) that axes exist.

Because they insist on using an algorithm and FDs to determine
keys?

> We want to prevent "Warshinton", when "Washington" is in the database.

I see. I never worked on an application where that was warranted.

Re soundex, I guess I reached for it half a dozen times over the
years, ever hopeful, always disappointed.

> But it concerns me, that you use the term FD interchangeably, whilst
> knowing full-well that the real FD and the theoretical one are quite
> different, the latter being only a fragment of the former.

I am using the term conventionally; I have no tricks up my sleeve.
What part of the "real" definition does the "theoretical" one lack?

> a. that it cannot be used [to] determine Keys when they are jumping
> out at you (as per this task), or for the normal determination of
> Keys during the exercise of data modelling, so there is no point in
> using it in those scenarios.

OK.

> b. that for the given purpose, another method exists

Namely, intuition. I'm not being pejorative: The "jumping out" is the
practice of associating meanings with (column) names and deciding what
identifies what.

> c. that they are using it anyway, which is silly, given [a][b]
> d. they still haven't produced anything by that method
> e. but that it remains a valid method for determining keys when a
> human is not present to perform the analysis of data. (eg. I am not
> saying it has no use; the use has not been defined to me; I leave
> that open)

I think you mean that you've never seen a good tool for describing a
database that uses FDs as its vocabulary. Neither have I, but I
suggest to you that ERwin is basically that in disguise, more on which
in a moment.

For an undisguised version, consider Nicola's exercise:

Quoth Nicola on Thu, 05 Feb 2015:
> A few years ago I implemented a few algorithms from Koehler's PhD
> thesis in a Ruby script. Given a set of FDs, the script finds all the
> keys and all the minimal covers.... Then, I had a graduating student
> re-implement it in Java and adding a dependency-preserving
> decomposition in BCNF (when it exists) or in 3NF to the output.

My first reaction is a little unkind. I think this is what lawyers call
"assuming facts not in evidence". *Given* a set of FDs, the program
generated a 3NF database design. Hurray! Now, where to find those
pesky FDs for input?

It reminds me of a joke my Economics professor told the class about
three professors stuck on a desert island devising a way to open a can
of beans. The Economist's solution was simple, "First, assume a can
opener".

On second thought, though, it's cause for optimism. If a Ruby script
and a grad student can generate a correct design, then it is a
tractable problem. What remains is a convenient syntax for capturing
the "pesky FDs", something that is the purview of academia.

> Do get a trial copy of ERwin, and look into it.

The last time I used ERwin in a serious way was in the late 90's. It's
what printed out the size E paper charts. We used the "team" version
that kept the diagrams in a database, and relied heavily on the version
management and reconciliation function. I also reverse engineered
their diagram database and posted that on the wall, to help us fix
anomalies that crept in from time to time. I remember the "role" FK
dialog could rename a column (or associate two names, if you prefer).

I wrote macros to generate triggers to enforce relationships that
couldn't be declared, such as sub/supertype relationships that required
both parts of a two-part entity, where the first part was fixed and the
second part was in one of N mutually exclusive tables. (For example,
every security has a name and type, but bonds have coupons, equities
have common/preferred, and options have an underlying). Note that both
parts *must* exist: no security is described only by its name and type,
and every security (let us say) does have a name and type. ISTR you
said such relationships don't exist, but I think you must have meant you
never came across one.
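Half of that rule, the mutual exclusivity of the subtypes, can be sketched declaratively. What follows is a minimal illustration, not the ERwin macros or triggers described above: it assumes SQLite through Python's sqlite3 module, and every name in it (security, bond, equity, sec_type, coupon, share_class, insert_ok) is made up for the example. A discriminator column is repeated in each subtype table and tied back to the supertype with a composite foreign key, so a security typed as a bond cannot acquire an equity row. It does not enforce the other half (that every security must have its subtype row); that would still need a trigger or a deferred check.

```python
import sqlite3

def build_security_schema():
    """Create a supertype with two exclusive subtypes (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE security (
            name     TEXT NOT NULL PRIMARY KEY,
            sec_type TEXT NOT NULL CHECK (sec_type IN ('bond', 'equity')),
            UNIQUE (name, sec_type)            -- target of the composite FKs below
        );
        CREATE TABLE bond (
            name     TEXT NOT NULL PRIMARY KEY,
            sec_type TEXT NOT NULL CHECK (sec_type = 'bond'),
            coupon   REAL NOT NULL,
            FOREIGN KEY (name, sec_type) REFERENCES security (name, sec_type)
        );
        CREATE TABLE equity (
            name        TEXT NOT NULL PRIMARY KEY,
            sec_type    TEXT NOT NULL CHECK (sec_type = 'equity'),
            share_class TEXT NOT NULL CHECK (share_class IN ('common', 'preferred')),
            FOREIGN KEY (name, sec_type) REFERENCES security (name, sec_type)
        );
    """)
    return conn

def insert_ok(conn, sql, params):
    """Run one INSERT; report whether the declared constraints accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

if __name__ == "__main__":
    conn = build_security_schema()
    print(insert_ok(conn, "INSERT INTO security VALUES (?, ?)", ("ACME 2030", "bond")))
    print(insert_ok(conn, "INSERT INTO bond VALUES (?, ?, ?)", ("ACME 2030", "bond", 5.0)))
    # Rejected: the supertype row says this security is a bond.
    print(insert_ok(conn, "INSERT INTO equity VALUES (?, ?, ?)", ("ACME 2030", "equity", "common")))
```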

ERwin is a good tool, the best I ever saw. (We also used the CAST
workbench for stored procedure development, quite a boon.) I have
sometimes wished for a better version of their macro language
independent of the tool. It would be nice to define a
relationship-rule in a symbolic library, and be able to apply it to a
given set of tables.

> > Are you prepared to say that's the last and best way? I'm not.
>
> I am. And it is worse than that. I am saying it is the only method
> for implementers, for practical people, for humans.

Everything that can be invented has been invented?

(Cf.
http://patentlyo.com/patent/2011/01/tracing-the-quote-everything-that-can-be-invented-has-been-invented.html,
seriously)

At least on this point we're clear. You think ERwin is the best (kind
of) tool that can exist for the purpose. I can imagine better, but
doubt the market will demand or produce it.

Specifically, the IDEF1X diagram you construct is a tautology. You say,
here is a key, there is a foreign key, this is unique, that has
such-and-such domain. And, great, you can generate the SQL for the
DBMS. You are doing the designing, designating the FDs by way of those
keys, and reasoning about all the dependencies and potential
anomalies. It's right because you say it's right. The tool doesn't
know beans about normal forms. It can't *check* anything. All it can
do is convert *your* design, expressed as a diagram, into SQL. Sure,
the SQL will be right; that's about as automatic and brain-dead a thing
as can be imagined. Is it 3NF? In 2015, that's on you.

I spent many, many hours re-arranging my boxes and routing my
connections, and tediously moving names up and down to reflect what's
the key and what's not (and what order I wanted the physical columns
in). I worked my way through a few knotty 4-column keys and used such
diagrams to explain contradictory nebulous understandings, wherein
different departments had different ideas of what, say, a product is.

I am not at all convinced that's the best way.

I don't deal in 200-table databases anymore. The databases I deal with
nowadays have fewer tables and users, and lots more rows. I write the
SQL directly and rely on DRI for almost everything. When I want a
diagram for a database, I go the other way, and generate it with pic
(cf. groff) from the SQL, plus some ancillary information for box
placement and line drawing.

I suggest to you that pic is every bit as smart about diagrams as ERwin
is about databases. And both are equally smart about database design.

> Oh yeah, they are still working with text strings, 1's and 2's.

You're convinced that your way -- designing via diagram -- isn't just
the best way, but the only way and the last way that ever will exist.
I'm convinced that's not true. I'm sure a better language than SQL
could be invented that could be "compiled to" SQL and represented
diagrammatically. I don't see why it couldn't also capture meaning not
in the SQL catalog, such as ownership, provenance, and definition.
Rather than defining tables per se, we'd define columns and their
dependencies, and let a model-generator express them as tables in a
schema known to be, say, 3NF.

History is on my side. There have been countless attempts in the last
30 years to express logic graphically. We might start with Logo, say,
and include any number of CASE tools. (I don't suppose you remember
Software Through Pictures?) They. All. Fail. Visual Basic becomes
Manual Basic when you're done drawing the dialog boxes.

Meanwhile, we're living through an explosion of languages and are
making progress with previously intractable problems such as
correctness. If history is any guide, better database designs will
come from better languages, not better diagrams.

> Please give me an example of what SQL cannot do.

Tuple comparison,
select ... where R(a, b) = S(a, b)
select ... where (a,b) in (select a, b from S)

Column comparison,
check R.a = S.a where R.b = 'Y'

Universal quantification is similar. Needed for relational division.

Yes, you can accomplish those in SQL by writing it out more verbosely
or, as with universal quantification (find students who have taken
all required classes), you can employ De Morgan and use "not not
exists". If you have that filed under "can do", not only do we disagree
on the meaning of "can", but you will find yourself defending the use
of lower-level constructs to implement concepts defined by relational
algebra. That smell you smell is awk.
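For readers who have not seen the "not not exists" form, here is a minimal sketch of it, assuming SQLite through Python's sqlite3 module. The tables Taken and Required, their columns, and the function name are hypothetical, invented to match the students-and-classes example in the paragraph above.

```python
import sqlite3

def students_with_all_required(taken, required):
    """Relational division via double NOT EXISTS: students with no required class missing."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Taken (
            student TEXT NOT NULL,
            class   TEXT NOT NULL,
            PRIMARY KEY (student, class)
        );
        CREATE TABLE Required (class TEXT NOT NULL PRIMARY KEY);
    """)
    conn.executemany("INSERT INTO Taken VALUES (?, ?)", taken)
    conn.executemany("INSERT INTO Required VALUES (?)", [(c,) for c in required])
    rows = conn.execute("""
        SELECT DISTINCT t.student
        FROM Taken AS t
        WHERE NOT EXISTS (              -- there is no required class ...
            SELECT 1 FROM Required AS r
            WHERE NOT EXISTS (          -- ... that this student has not taken
                SELECT 1 FROM Taken AS t2
                WHERE t2.student = t.student
                  AND t2.class   = r.class
            )
        )
        ORDER BY t.student
    """).fetchall()
    conn.close()
    return [student for (student,) in rows]

if __name__ == "__main__":
    taken = [("ann", "db"), ("ann", "logic"), ("bob", "db")]
    print(students_with_all_required(taken, ["db", "logic"]))
```

The query says "for all required classes" only by negating "there exists a required class not taken", which is the verbosity being complained about.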

Table comparison,
where T = (select ... from S where ...)

SQL can't constrain views in any way, can't use views as a FK target.
I would like to be able to use a union as a domain; SQL cannot.

Why does UNION (and similar) require column order to match, but not
name? Why does SELECT return duplicate column names or permit unnamed
columns? Why SELECT DISTINCT but UNION ALL? Why must FROM appear only
between SELECT and WHERE? (Why even say "SELECT"?) What purpose does
HAVING serve anymore? Why do subqueries require aliases even when
unreferenced?

Just look at the butt-ugly porcine ungainliness of UPDATE. Updating
one table from another has to be the most cumbersome, verbose, and
redundant aspect of SQL, not that it lacks competition (except in any
other language). Why can we not say instead

R(a, b, c) = S(a, b, c) WHERE R.x = S.x

?
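For comparison, the long-hand form being objected to looks roughly like the sketch below. It assumes SQLite through Python's sqlite3 module; R, S and the columns x, a, b, c come from the one-liner above, while the column types and the function name are assumptions. Some dialects offer shorter vendor-specific forms, which this sketch deliberately avoids.

```python
import sqlite3

def update_r_from_s(r_rows, s_rows):
    """Copy a, b, c from S into R wherever R.x = S.x, using correlated subqueries."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE R (x INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT);
        CREATE TABLE S (x INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT);
    """)
    conn.executemany("INSERT INTO R VALUES (?, ?, ?, ?)", r_rows)
    conn.executemany("INSERT INTO S VALUES (?, ?, ?, ?)", s_rows)
    # The join condition is repeated once per column and once more in WHERE.
    conn.execute("""
        UPDATE R
        SET a = (SELECT S.a FROM S WHERE S.x = R.x),
            b = (SELECT S.b FROM S WHERE S.x = R.x),
            c = (SELECT S.c FROM S WHERE S.x = R.x)
        WHERE EXISTS (SELECT 1 FROM S WHERE S.x = R.x)
    """)
    result = conn.execute("SELECT x, a, b, c FROM R ORDER BY x").fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    r = [(1, "old", "old", "old"), (2, "keep", "keep", "keep")]
    s = [(1, "n1", "n2", "n3")]
    print(update_r_from_s(r, s))
```

Without the final WHERE EXISTS, rows of R with no match in S would have their columns set to NULL, which is one more thing the one-line form would not need to spell out.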

SQL is a relic of another age, the last man standing after RJE, COBOL,
Cullinet, PL/1 and all the rest have disappeared. To the economics of
those days we owe the fact that we use SQL and not Ingres's QUEL, a
much better language. The strange, ignorant time we currently live in
promises very little progress, if any, because users of databases
don't realize how much is being lost, never mind forgone.

If your assertion is that SQL can, after a fashion, express any
relational algebra function, I concede the point. When people say it's
"not relational", that's not what they mean. They mean it does not
express relational concepts directly or particularly well, and
sometimes -- bags, column order -- ignores it entirely.

SQL is indefensible. Your serve. ;-)

--jkl




Nicola
Feb 12, 2015, 4:52:57 AM

In article <20150212012341.d...@speakeasy.net>,
"James K. Lowden" <jklo...@speakeasy.net> wrote:

> Quoth Nicola on Thu, 05 Feb 2015:
> > A few years ago I implemented a few algorithms from Koehler's PhD
> > thesis in a Ruby script. Given a set of FDs, the script finds all the
> > keys and all the minimal covers.... Then, I had a graduating student
> > re-implement it in Java and adding a dependency-preserving
> > decomposition in BCNF (when it exists) or in 3NF to the output.
>
> My first reaction is a little unkind. I think this is what lawyers call
> "assuming facts not in evidence". *Given* a set of FDs, the program
> generated a 3NF database design. Hurray! Now, where to find those
> pesky FDs for input?

From the requirements? I'm not sure I grasp what your point is.
Do you mean that turning some requirements into FDs is difficult (you
would not be alone: Stonebraker says that "mere mortals do not
understand FDs")? Or that FDs (and even MVDs, JDs, etc) comprise only a
tiny fraction of a real system's set of requirements, so it's not worth
bothering?

> It reminds me of a joke my Economics professor told the class about
> three professors stuck on a desert island devising a way to open a can
> of beans. The Economist's solution was simple, "First, assume a can
> opener".

FDs do not come out of void, they follow from your requirements. They
are just a formal version of a very special (and relatively simple) type
of requirement. I don't arbitrarily decide to assume that {desert
island} -> {can opener}, unless you tell me that there cannot be two can
openers in the same desert island, in the world of desert islands you
are interested in.
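To make the can-opener example concrete, here is a minimal sketch of what the FD says about data. The function name fd_holds and the attribute names are assumptions for illustration. Note the direction of the argument: a sample of rows can refute an FD, but it cannot establish one; the FD itself has to come from the requirements, as stated above.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if some pair of rows agrees on lhs but differs on rhs."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

if __name__ == "__main__":
    one_opener_each = [
        {"island": "Tortuga", "can_opener": "CO-1"},
        {"island": "Skull", "can_opener": "CO-2"},
    ]
    two_on_tortuga = one_opener_each + [{"island": "Tortuga", "can_opener": "CO-3"}]
    print(fd_holds(one_opener_each, ["island"], ["can_opener"]))
    print(fd_holds(two_on_tortuga, ["island"], ["can_opener"]))
```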

> On second thought, though, it's cause for optimism. If a Ruby script
> and a grad student can generate a correct design, then it is a
> tractable problem.

Computationally, database design problems are (highly) intractable in
general. But in practice many problems can be solved. For example,
finding the keys of a schema (given the FDs) is an inherently hard
problem, but for schemas with up to a few tens of attributes or so, it is
only in a few cases that you cannot find them in a reasonable amount of
time. On the contrary, finding all the minimal covers is much less
practical, even when there are not many of them.
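As a rough illustration of "finding the keys of a schema (given the FDs)", here is a brute-force sketch. It is not the algorithm from Koehler's thesis nor the Ruby or Java implementations mentioned earlier; it simply computes attribute closures and tries attribute subsets in order of size, which is exponential in the number of attributes and only reasonable for small schemas. The function names and the sample FDs are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs whenever lhs is already covered and rhs adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema (brute force)."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = frozenset(combo)
            # Skip supersets of keys already found: they are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

if __name__ == "__main__":
    # Hypothetical schema R(A, B, C) with AB -> C and C -> A.
    fds = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
    for key in candidate_keys("ABC", fds):
        print(sorted(key))
```

Because subsets are tried smallest first and supersets of known keys are skipped, every set returned is minimal. The input FDs still have to come from somewhere, which is the point under discussion.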

> What remains is a convenient syntax for capturing
> the "pesky FDs", something that is the purview of academia.

FDs are a special type of first-order formulas. I can imagine defining
an English-like syntax for them.

Erwin
Feb 12, 2015, 6:57:51 AM

Op donderdag 12 februari 2015 10:52:57 UTC+1 schreef Nicola:
> From the requirements? I'm not sure I grasp what your point is.
> Do you mean that turning some requirements into FDs is difficult (you
> would not be alone: Stonebraker says that "mere mortals do not
> understand FDs")?

He's right.



> Or that FDs (and even MVDs, JDs, etc) comprise only a
> tiny fraction of a real system's set of requirements, so it's not worth
> bothering?
>
> > It reminds me of a joke my Economics professor told the class about
> > three professors stuck on a desert island devising a way to open a can
> > of beans. The Economist's solution was simple, "First, assume a can
> > opener".
>
> FDs do not come out of void, they follow from your requirements. They
> are just a formal version of a very special (and relatively simple) type
> of requirement. I don't arbitrarily decide to assume that {desert
> island} -> {can opener}, unless you tell me that there cannot be two can
> openers in the same desert island, in the world of desert islands you
> are interested in.

There you have it. When people tell you "there can only be one can opener on any given desert island", then what do you see ?

A table {desert island, can opener} with (a.o.) a key "desert island", or
A relation schema of whatever set of attributes in which you must add the FD desert island -> can opener ? Where this then tells you "the identity of a desert island is a determinant factor to the identity of the can opener that is on it" or "the relation over {desert island, can opener} is a function from desert island to can opener" ?

Some would call the former "jumping to conclusions". I doubt I'd join that crew.



>
> > On second thought, though, it's cause for optimism. If a Ruby script
> > and a grad student can generate a correct design, then it is a
> > tractable problem.
>
> Computationally, database design problems are (highly) intractable in
> general. But in practice many problems can be solved. For example,
> finding the keys of a schema (given the FDs) is an inherently hard
> problem, but for schemas with up to a few tens of attributes or so, it is
> only in a few cases that you cannot find them in a reasonable amount of
> time. On the contrary, finding all the minimal covers is much less
> practical, even when there are not many of them.
>
> > What remains is a convenient syntax for capturing
> > the "pesky FDs", something that is the purview of academia.
>
> FDs are a special type of first-order formulas. I can imagine defining
> an English-like syntax for them.

Hmmmmmmmm.

"The projection on {desert island, can opener} of the join of all tables in the database, represents a function from desert island to can opener."

I've never seen anything like that, certainly not in requirements, and certainly not coming from the average user population that is supposed to produce them ...

And besides, avoiding redundancy was only a relevant topic in database design as long as there were no feasible ways to control the redundancies, e.g. via ASSERTIONs. Those days are gone.

Derek Asirvadem
Feb 12, 2015, 9:55:40 AM

James

Great post

> On Thursday, 12 February 2015 17:23:44 UTC+11, James K. Lowden wrote:
> On Tue, 10 Feb 2015 23:06:25 -0800 (PST)
> Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> > > From the academic point of view -- indeed from the point of view of
> > > the DBMS, as you know -- no column has meaning.
> >
> > Totally disagree. When you say "DBMS", you may be meaning
> > "theoretical DBMS", in which case, I don't agree or disagree, you are
> > welcome to entertain it as a theoretical concept, if such is valid.
>
> Actually, I'm sure you agree. By "DBMS" I mean "database management
> system". It's a machine, and it has no concept of meaning. It
> provides us with the illusion of semantic engagement by representing
> its tuples with names with which we associate meaning. To the machine,
> each column simply has a type and some defined relationship to other
> columns. It enforces those relationships, thereby consistency, thus
> supporting verifiable logical manipulation.

Ok, so you don't mean DBMS.

You mean the theoretical concept of a DBMS. Abstracted to the point where those statements can be true. Ok, I agree.

> > I am not saying the theoreticians in the RDB space are stupid because
> > they have assumptions and can't proceed, etc.
> > - I am saying the theoreticians in the RDB space are stupid because
> > they are using a hammer for a task that calls for an axe.
> > ==AND== they will not observe the evidence that the hammer is not
> > working, that it is not suited to the job.
> > ==AND== they are ignorant (or in denial) that axes exist.
>
> Because they insist on using an algorithm and FDs to determine
> keys?

I smell bait. I take it, you've never kissed a fish.

I did once, it was a musky that I had been hunting for two years. When I finally caught it, it was much smaller than the fight that it had put up, and just above the limit, so I kissed it and let it go. The bastard went straight to the bottom, so I had to grab it again and flood its gills for half an hour, until it finally awakened and swam away.

Their algorithm is the hammer. Codd's algorithm is the axe. Read my paragraph above again, with that in mind.

Don't forget that theirs doesn't work, hasn't worked, weeks have gone by, and we are still waiting. That Codd's algorithm works in a few minutes.

> > We want to prevent "Warshinton", when "Washington" is in the database.
>
> I see. I never worked on an application where that was warranted.
>
> Re soundex, I guess I reached for it half a dozen times over the
> years, ever hopeful, always disappointed.

That means you didn't get the hang of it, and you didn't use the full four characters returned. It is not something for everyday use, but used properly, it works perfectly for detecting spelling errors and similar.

> > But it concerns me, that you use the term FD interchangeably, whilst
> > knowing full-well that the real FD and the theoretical one are quite
> > different, the latter being only a fragment of the former.
>
> I am using the term conventionally; I have no tricks up my sleeve.
> What part of the "real" definition does the "theoretical" one lack?

I have detailed that in my responses to Jan in this thread, I won't repeat, please read. It appears he accepts it (no further response).

Yours might be conventional within the 1%, the Codd FD is conventional in the 99%.

The issue loses meaning when we talk about it in "parts". Codd's FD is an integrated whole, 3NF and FD defined together, and in the context of Relational Normalisation. No algebraic definition. Yours is a single fragment of that, with no context, no Normalisation, in an algebraic definition. They cannot be compared, the difference is not a list of parts.

The difference is a horse vs the boiled femur of a horse. Yes, absolutely, if and when you find a horse, you can verify that it is a horse, by using your boiled femur. But you can't build another horse with that boiled femur of yours. The 99% can with theirs.

Since you seem to be somewhat aware of the relevance of meaning, and feeding that into the machine, etc, it appears that you take the opposite position here, and that remains a concern for me. A key difference is that in your algorithm, you strip the meaning out of the names (step 1 in the algorithm, if you will) and then you use the non-FD-fragment to determine the non-keys from the x's and y's. Whereas the Codd algorithm retains the meaning, finds the Key first, then uses the FD to validate the determined Key, then the attributes, etc. So yours is bottom-up, devoid of the meaning that you claim is relevant, and ours is top-down, with meaning, and the meaning gets clarified (eg. we improve the names, improving and discriminating the meaning) during the process.

The second (first?) "part" that ours has, that yours lacks, is that we take in the whole picture (hence the relevance of a diagrammatic model): we evaluate all the tables in a contemplated cluster, we look at all the keys and FDs (including your "MVDs") together. And we go through iterations of that. Whereas you take one table at a time, again removing it from the context of the tables in the cluster, and evaluate it using a non-FD-fragment in terms of x's and y's.

Here I am defining two discrete levels of meaning:
- one within the table itself, relative to all the keys and attributes in the table (at whatever level of progress; whatever iteration)
- a second, within the context of all the other tables in the cluster, that it could possibly relate to (ditto); reworking the keys overall; the hierarchy; etc

The third (first?) "part" that ours has, that yours lacks, is success. Two full weeks and zero runs on the board.

> Namely, intuition. ... "jumping out"

Nonsense. An algorithm is hardly an intuition. The "jumping out" is something that happens with experience, when the success of the algorithm straightens out the neuronal pathways.

In this example the "jumping out" I meant was the tables are so familiar to all, or should be familiar, and the meaning of the columns are really clear. If you are handling a trading inventory, you would be very clued up about countries.

Other than this example, in the normal case, when examining a new set of requirements ...

> I'm not being pejorative: The "jumping out" is the
> practice of associating meanings with (column) names and deciding what
> identifies what.

There you go with that word again. Yes. Agreed.

Which is why I say your algorithm is stupid because it strips the meaning out of the names that require the meaning to be maintained; rolled over the tongue; and determined; in the context of all the tables in the cluster; not alone. It is self-crippling. So you are left with a Sudoku puzzle, and it is "interesting" when three 9's appear in a line somewhere.

It is crazy, how you admit and claim to understand that meaning is very important, and then, first thing, you remove meaning. The technically correct word is schizophrenic. Second thing, you evaluate it in isolation from its context. Same word.

> > c. that they are using it anyway, which is silly, given [a][b]
> > d. they still haven't produced anything by that method
> > e. but that it remains a valid method for determining keys when a
> > human is not present to perform the analysis of data. (eg. I am not
> > saying it has no use; the use has not been defined to me; I leave
> > that open)
>
> I think you mean that you've never seen a good tool for describing a
> database that uses [non-]FD[fragment]s as its vocabulary.

(Insertion for clarity)

Nah. Any tool that is based on such a stupid algorithm, is unworthy of being written.

And then after you have done all that, you will want to pour the meaning (two levels) that you removed at the first step, back into the bag of bones. Wouldn't it be better to retain both levels of meaning, all the way through ?

> Neither have I, but I
> suggest to you that ERwin is basically that in disguise, more on which
> in a moment.

Ok, I will wait for the more.

> For an undisguised version, consider Nicola's exercise:
>
> Quoth Nicola on Thu, 05 Feb 2015:
> > A few years ago I implemented a few algorithms from Koehler's PhD
> > thesis in a Ruby script.

You may be feeding yourself to the lions here.

Taking up a challenge from Nicola, that Codd's and my way is the "hard" way to determine keys, I looked at Köhler's DNF paper. I understand that his thesis was the basis for his DNF. I looked at that paper, which he alleges is "relational", and it is a total unconscious Straw Man. When I placed his unnormalised data in the Relational context, per his stated requirements and rules, his problem disappeared, vaporised. So his proposal is null and void. It does not apply to the Relational context, or to Normalised data.

This is the trick that many theoretical papers in this space pull: they give a set of data which they allege is "relational", which they allege contains some devious, mysterious problem or other; then they give a proposal on how to sort out that alleged devious problem. Get this. The data given is not Relational, the first aggravating falsity, it is a bunch of unnormalised, non-relational garbage. Easily recognised by the Sudoku players as their compulsive puzzle and they go for it with their non-FD-fragments, like sex addicts who have found a new pornography channel. Köhler did the same.

Easily recognised by the implementers for what it is, unnormalised, non-relational garbage, and ONLY to understand his alleged problem (at that point, no understanding of the proposal, no intention of vaporising it), we simply put the data in a Relational context. And the problem disappears. Boom, whooshka, gone, what problem.

I have detailed all that re the Köhler paper, in a post '"Hard" Key Determination Method is Easy. DNF Paper is Done' on 9 Feb in this thread, please read. Here is a one page summary:
http://www.softwaregems.com.au/Documents/Article/Normalisation/DNF%20Data%20Model%20B.pdf

In that case there were no FDs or non-FD-fragments to worry about, it was a simple arrangement of data. That is, there are no attributes, the entire data set is keys only, I never had to check an FD, and of course I did not reference his non-FD-fragments. Granted, in his paper, he non-FD-fragmented himself to death over the non-relational data. Imagine, I got the meaning out of column names that he has been contriving (nothing wrong with that particular act, when contemplating an example for a paper), and mulling over, for years. The task took about 10 minutes, 30 mins to draw up the page.

The same thing happened when I examined Jan's DBPL paper. Boom, whooshka, gone.

Anyway, the point is, the Codd and Derek method works like a crucifix at black mass, instantaneous, all creatures and their howling vaporised upon entry. The Date & Zaniolo method works like a black mass without a crucifix, an orgy of creatures, howling without end.

Pssst. Wanna buy a crucifix ? Blessed and everything. Never used. For you, sir, special price.

So that paper, or his PhD thesis, is a non-problem, a favourite of the non-FD-fragmenters, totally without merit in the Relational context. Vaporised by the FD-ers. Score so far is about six /matches/ on our side, exactly zero /runs/ on your side.

But I will hold your context for the rest of the post ...

> Given a set of FDs, the script finds all the
> > keys and all the minimal covers.... Then, I had a graduating student
> > re-implement it in Java and adding a dependency-preserving
> > decomposition in BCNF (when it exists) or in 3NF to the output.
>
> My first reaction is a little unkind. I think this is what lawyers call
> "assuming facts not in evidence". *Given* a set of FDs, the program
> generated a 3NF database design. Hurray! Now, where to find those
> pesky FDs for input?

You mean the meaning ? That you stripped out at the outset ? It is still there, go back to the source docs.

Nah. Any tool that is based on such a stupid algorithm, is unworthy of being written.

> On second thought, though, it's cause for optimism. If a Ruby script
> and a grad student can generate a correct design, then it is a
> tractable problem. What remains is a convenient syntax for capturing
> the "pesky FDs", something that is the purview of academia.

Any problem is tractable. That is not the main consideration. Whether it is worth it, is the main consideration. Also, whether there is a better algorithm, before you spend a penny. Having just one algorithm, that is tractable, is not an economical position to be in.

Especially if it has never determined a key.

> > Do get a trial copy of ERwin, and look into it.
>
> The last time I used ERwin in a serious way was in the late 90's. It's
> what printed out the size E paper charts. We used the "team" version
> that kept the diagrams in a database, and relied heavily on the version
> management and reconciliation function. I also reverse engineered
> their diagram database and posted that on the wall, to help us fix
> anomalies that crept in from time to time. I remember the "role" FK
> dialog could rename a column (or associate two names, if you prefer).
>
> I wrote macros to generate triggers to enforce relationships that
> couldn't be declared, such as sub/supertype relationships that required
> both parts of a two-part entity, where the first part was fixed and the
> second part was in one of N mutually exclusive tables. (For example,
> every security has a name and type, but bonds have coupons, equities
> have common/preferred, and options have an underlying). Note that both
> parts *must* exist: no security is described only by its name and type,
> and every security (let us say) does have a name and type. ISTR you
> said such relationships don't exist, but I think you must have meant you
> never came across one.
>
> ERwin is a good tool, the best I ever saw. (We also used the CAST
> workbench for stored procedure development, quite a boon.) I have
> sometimes wished for a better version of their macro language
> independent of the tool. It would be nice to define a
> relationship-rule in a symbolic library, and be able to apply it to a
> given set of tables.

You must have been using a very old un-maintained version. By 1995, it had all that and more. The macro language is much more powerful. But I never use triggers, and I have never had the need for "relationship-rules". All my rules are declared constraints, only, resident in the db, only. One does have to make decisions re what elements to deploy, and at what layer.

> ISTR you
> said such relationships don't exist, but I think you must have meant you
> never came across one.

I wouldn't have said that, because I have some very "complex" relationships, base/subtypes, multiple levels, multiple tables, etc.

> I wrote macros to generate triggers to enforce relationships that
> couldn't be declared, such as sub/supertype relationships that required
> both parts of a two-part entity, where the first part was fixed and the
> second part was in one of N mutually exclusive tables. (For example,
> every security has a name and type, but bonds have coupons, equities
> have common/preferred, and options have an underlying). Note that both
> parts *must* exist: no security is described only by its name and type,
> and every security (let us say) does have a name and type.

First let me say that I am a tiny bit of an expert in the trading space, I have been working almost exclusively for Aussie banks for over twenty years. I can't give the shop away, so this is limited to concepts and elements that are over ten years old, and only where the Funds Under Management is greater than 100 billion. I.e. ultra-legal and compliant with legislation. Institutional banking, massive portfolios, and no dime trades.

My InstrumentType is a genuine hierarchy, eleven levels deep, handling about 150 elements (your security types) at the leaf level. Yes, of course we track both legs, regardless of who owns each leg (your underlying), and for derivatives, we track the nominal as well as the real exposure, or risk. All AssetClasses: Eq, FI, Property, commodity, unit trust, currency, etc. And all exchanges in the Pacific, as well as the bigger exchanges in America and Europe. We trade before we get burned, and we hold until the last moment. When we dump a security onto the market, we disguise it, so that the market is minimally affected, the tricks are just too many.

I use base/subtypes freely, but not "everywhere", none in the definition of InstrumentType. All my rules are declared constraints, only, resident in the db, only. Pure ACID high-concurrency transactions (my OLTP standards, but I don't think that matters here). I have never had occasion to use a trigger, and I have ripped out and replaced thousands (I generate most code). But then, I have no circular references, all the tables are Normalised into Relational Hierarchies, etc.

So, after reading your para four times, and I really want to understand the problem, I still have no idea what the problem is. Would you please give me a better description or draw a picture, so that I can help you, or at least so that I can understand the problem and discuss it with vigour. This position of "no idea", in my ambit, is too stupid for me to hold for long.

> > > Are you prepared to say that's the last and best way? I'm not.
> >
> > I am. And it is worse than that. I am saying it is the only method
> > for implementers, for practical people, for humans.
>
> Everything that can be invented has been invented?

Ok, as long as you credit Codd, not me.

There ain't nothing new under the Sun.

> (Cf.
> http://patentlyo.com/patent/2011/01/tracing-the-quote-everything-that-can-be-invented-has-been-invented.html,
> seriously)

Tomorrow.

> At least on this point we're clear. You think ERwin is the best (kind
> of) tool that can exist for the purpose.

Whoa. That sounds like you just switched barrels.

I said the Codd 3NF/FD method was the one and only tool for determining Keys, and as part of the modelling/Normalising task. That your non-FD-fragment method has no merit outside the theoretical context, and that in any case it was severely limited because it isolated itself from two levels of meaning.

Then, quite separately, I said that modelling with diagrams (IDEF1X with a strict Relational context vs circles in Visio or rectangles in UML), which engages 100% of the brain, is way more advanced than modelling using algebraic relation notation which engages 4% of the brain.

And that ERwin happens to be the best for the second task.

The brain, and only the brain, remains the vehicle for the first task.

If you think that I meant ERwin can do both, definitely not.

> I can imagine better, but
> doubt the market will demand or produce it.

For sure. Your tool is good for 1% of the market, irrelevant to the 99%, who do not perceive data in terms of x's and y's. They have other tools.

> Specifically, the IDEF1X diagram you construct is a tautology. You say,
> here is a key, there is a foreign key, this is unique, that has
> such-and-such domain. And, great, you can generate the SQL for the
> DBMS. You are doing the designing, designating the FDs by way of those
> keys, and reasoning about all the dependencies and potential
> anomalies. It's right because you say it's right. The tool doesn't
> know beans about normal forms.

Agreed. Not quite so silly, but I will let you have your fun.

> It can't *check* anything.

That is incorrect, it checks about 40%, before I hit the "generate SQL" button. But that is a result of using standards, a few sexy macros, etc.

For the purpose of this post, ok, it can't check anything.

> All it can
> do is convert *your* design, expressed as a diagram, into SQL.

That is unfair. Especially unreasonable because you have experience with (a) the process and (b) employing ERwin for the process. You should know that it is a model, with hundreds of types of elements (not instances, which is dependent on the db), it is not a mere diagram of a model. We create the model, using the tool, and then in many iterations, keep modelling, until we have a db definition that is sound.

The SQL generation bit is tiny, yes.

And sure, I can do the entire job without ERwin, but it is much faster with it. So at the end of the day, it is a productivity tool, that has modelling capabilities.

Sure, I can draw IDEF1X models in OmniGraffle, but it does even less checking and no change propagation, compared to ERwin.

> Sure,
> the SQL will be right; that's about as automatic and brain-dead a thing
> as can be imagined.

Agreed.

> Is it 3NF? In 2015, that's on you.

Yes.

What about correct ? What about Efficient, high performance, concurrency ? Totally on me ? Yes.

Same as if I wrote a contract using MS Word. Is it correct ? Legal ? Fair ? Totally on me.

Same as an IDEF1X model in OmniGraffle, or Visio. Is it 3NF ? Correct ? Efficient ? Totally on me.

So ?

> I spent many, many hours re-arranging my boxes and routing my
> connections,

I do that, only when ready to publish, once, for each db/app version release.

> and tediously moving names up and down to reflect what's
> the key and what's not (and what order I wanted the physical columns
> in).

I never do that.

> I worked my way through a few knotty 4-column keys and used such
> diagrams to explain contradictory nebulous understandings, wherein
> different departments had different ideas of what, say, a product is.

I do that on a whiteboard, once, draw it up on OmniGraffle, once, and publish and forget.

I would not dream of doing that in the model itself (ERwin). Fiddling with and changing keys is a serious matter, the changes have to be propagated down the line, and the whole line has to be checked again.

Oh, I forgot. You have an RFS, no Relational Keys, yes, of course, you do not have a propagation problem. But then you don't have a Relational Database for propagation to be a problem in. Ok, you can keep changing your "keys" and moving the columns around without the considerations that I have.

> I am not at all convinced that's the best way.

What, to model ? Nonsense, setting aside the differences between the way you and I use ERwin, there is no other way, and there hasn't been for thirty years. You have to go through many, many iterations, as you develop and improve the model. For us, who cut a new release of db+app every quarter, there is no other way for an additional number of reasons.

But I suspect you mean something else.

> I don't deal in 200-table databases anymore. The databases I deal with
> nowadays have fewer tables and users, and lots more rows. I write the
> SQL directly

No wonder you hate it. I stopped doing that in 1993. I don't think I will ever forget that first IDE, DBArtisan.

> and rely on DRI for almost everything.

I presume you don't use triggers any more, or was that a different project ?

I use constraints for everything, and DRI is one form of constraint.

> When I want a
> diagram for a database, I go the other way, and generate it with pic
> (cf. groff) from the SQL, plus some ancillary information for box
> placement and line drawing.

Ok, so you have no model, no iterations. The database is the "model".

> I suggest to you that pic is every bit as smart about diagrams as ERwin
> is about databases.

I wouldn't know, I stopped using troff when decent diagramming programs came out. Since about 2000, there is nothing out there that comes close to OmniGraffle. You have seen only my simplest drawings. Try this:
http://www.softwaregems.com.au/Documents/Article/Sybase%20Architecture/Sybase%20ASE%20Architecture.pdf
That is the free public version, about five years old, the current paid version is 40 pages, a fully cross-referenced (click an object and it takes you to the definition of it, etc) PDF.

> And both are equally smart about database design.

As long as you mean brain-dead, I agree.

> > Oh yeah, they are still working with text strings, 1's and 2's.
>
> You're convinced that your way -- designing via diagram --

See, I knew you switched barrels. To that point, my comments above apply.

First, it is not my way. I did not invent it, or implement it or provide the tools for it. It is Codd, Chen, Brown's way. LogicWorks provided the tool.

Second, the exercise is modelling, an iterative task, the object is a model, that increases in definition and quality, the drawing is a single rendition of the model, same as the SQL DDL produced is a single rendition of the model. The sales people will tell you it is a full-blown repository.

Third, the other barrel. The human brain only, for Normalisation; FD; Relationalisation; accuracy; efficiency. There is no alternative. That is not replaced by a diagram, or a diagramming method, or a model.

You are erecting a sly Straw Man. It only takes your proposed counter-argument down, it does not touch the real thing, the real world. There is no smoke over here, and all the flame is on your side. Make sure you have good-quality marshmallows, I hate the ones that fall off the stick.

> isn't just
> the best way, but the only way and the last way that ever will exist.

Counting the last thirty years and the present landscape, yes. Competition is non-existent.

I wouldn't bet on the future, but there are no contenders anywhere on the horizon, so I would say, at least for the next ten years, yes.

> I'm convinced that's not true.

Ok.

> I'm sure a better language than SQL
> could be invented that could be "compiled to" SQL and represented
> diagrammatically.

Why is that relevant ?

I'm sure a language better than awk could be invented, but I wouldn't give it up, or consider writing a replacement. When something substantially better than troff came along I switched.

SQL is not a language, I really don't understand the logic of faulting it for not being what it isn't.

If you like to write database commands in x's and y's and tick marks, go for it: write a pre-processor for SQL. Less than 1% will use it, but since you are obsessed with it, go for it.

The rest of us need the SQL, because that is what we have to look into when something in the hieroglyphics goes wrong, or has unintentional side-effects; that is what we have to work with when fiddling with the server. It cannot be displaced.

And for iterative modelling, a tool such as ERwin. The fact that it squirts SQL, and not TOTAL commands or COBOL WRITE commands, is irrelevant to this issue.

SQL is not the issue. Get over it. It is just the village that you get all the straw men from. Get an IDE.

> I don't see why it couldn't also capture meaning not
> in the SQL catalog, such as ownership, provenance, and definition.

Ownership, security, and definition are already there, at least in commercial SQLs.

Provenance is in the model, not in the catalogue, and it is transported to the catalogue in an indirect way, but it is there.

Oh, I forgot, you don't use hierarchies. Ok, no provenance for you.

Meaning. Well, just write the notes for the two levels at the beginning, just prior to excising it, and pour it back into the catalogue when your bones are ready.

I use ERwin, so my notes are stuck to the object, and go through all the transformations, all the iterations, and stick like glue, for eternity. When we do the SQL squirt bizo, I have a script (macro to you) that transfers all those notes to the SQL catalogue.

> Rather than defining tables per se, we'd define columns and their
> dependencies, and let a model-generator express them as tables in a
> schema known to be, say, 3NF.

(I don't design tables, I design a database in the full context of all its tables.)

That is a great idea.

But there are two problems. First, you don't have the 3NF/FDs, you have a tiny non-FD-fragment, so you are not going to get very far.

You have to get this right, and you are nowhere near it (pardon this, it is a clipping, I cannot give you the doc):
http://www.softwaregems.com.au/Documents/Article/Normalisation/NF%20Note%20FD.pdf

You guys are messing with a non-FD-fragment, plus 17 NF fragments that in toto deliver about 15% of 3NF/FD, devoid of two levels of meaning. We are using a 3NF/FD definition with full context, retaining two levels of meaning. You guys can't even discern an FD from an "MVD" properly. We have only FDs, since 1971, with two types of dependency, single and "multi-valued". To me, "MVDs/4NF" are a failure to make that discernment, it actually breaks 2NF, just like Jan broke 1NF without being conscious of it. It is also a failure to understand Atomicity of Facts (refer Köhler and Hidder papers). You are working with fragments of facts.

The diagram is from Course Notes, of course. It simply shows that the FD is dependent on the Key; without the key, there is not a THING for the FD to be functionally dependent on. And the two types of FDs, single and "MVD".

Until you get those very basic issues sorted, you have no chance at all, of getting a columns-and-dependencies model to produce anything except more of the same, as you guys have produced in this thread. Zero. ø.

And then the issue of columns-and-dependencies minus keys, is absurd.

Second, where exactly, does that "known to be 3NF" come in, how is that provided ? Nothing short of a full AI system will give you that. You cannot substitute the human brain with an algorithm.

> There have been countless attempts in the last
> 30 years to express logic graphically. We might start with Logo, say,
> and include any number of CASE tools. (I don't suppose you remember
> Software Through Pictures?)

Loved it, but only as a novelty. SSADM (DFDs) has stayed with us for forty years.

> They. All. Fail. Visual Basic becomes
> Manual Basic when you're done drawing the dialog boxes.

I agree, with each of those statements, but not the context or intent.

> History is on my side.

Now the context and intent.

To the extent that you draw an history of the Straw Man, sure, that is awful, just awful. But it doesn't apply, so it doesn't matter.

Concerning real history, that applies, on our side: Get a grip. History is on our side, the 99%. ERwin is over 30 years old, the others in that category are a little younger. Millions of databases.

Concerning real history, that applies, on your side: there is zero history on your side.

> Meanwhile, we're living through an explosion of languages and are
> making progress with previously intractable problems such as
> correctness. If history is any guide, better database designs will
> come from better languages, not better diagrams.

Well, that is just a recap.

Better database designs come from one source and one source only: better educated humans.

They do not come from diagrams, so I don't know why you go on about that.

Humans will use tools during the iterative modelling process, so the tools have to stay, and your idea hasn't touched the iterative process.

Languages: I am all for it. Of course, it has to be something that the 99% can use. That excludes relation commands in an algebra.

You have articulated the pipe dreams of the one percent. It was never properly thought out or architected (same as Ingres/QUEL, same as Oracle). You have left a number of large and important areas unaddressed. The treatment is very superficial, and very optimistic.

Further you have diminished the value of the tools that exist on an inapplicable (Straw Man) basis. Such diminution has no effect.

The TTM groupies have produced nothing in twenty years.

Nothing has changed.

----

Thank you for your excellent post. I will get to the rest on the weekend.

Cheers
Derek

James K. Lowden
Feb 15, 2015, 8:08:01 PM
On Thu, 12 Feb 2015 06:55:38 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

Having read over your post several times, ISTM we're more in vociferous
agreement than not. Your focus is on designing a database -- a
particular one, based (of course) on a known enterprise of interest.
The academics you malign are describing -- and still searching for --
a way to do that automatically. You have no particular beef
with them except that from your point of view they've produced "zero"
in twenty years, while all along you've been doing just fine, thanks,
without said algorithm.

I think there's a there there. We should be able to generate
normalized designs from a set of columns and constraints.

I also think there's essentially no academic interest in the hard part:
the problem of devising a convenient notation (never mind tool) for
capturing constraints. To the extent they consider it at all, language
people consider it a database problem, and database people likewise
consider it out of scope, either a trivial non-problem or one belonging
to language design, not database theory.

Absent academic results, industry produces the language poverty we
currently live with: SQL, with all its warts, and proprietary GUI tools
for design.

I doubt you find much to disagree with there, except that you'd give
SQL higher marks, and you have automatic database design generation
filed under "pending results".

Below I try to clarify a few points and offer some particulars on the
super/subtype design we used for securities. Finally, I put it to you
that the design both obeys all rules of normalization and illustrates
the need for deferred constraint enforcement.
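
To make the deferred-enforcement point concrete, here is a minimal sketch, assuming SQLite driven from Python's standard sqlite3 module. The table names, the columns, and the back-reference trick (bond_id, equity_id) are hypothetical and are not the schema described in this thread; they only show one way to declare "both parts must exist" when the second part lives in one of N mutually exclusive tables. Because each part references the other, neither row can be inserted first unless the check waits until COMMIT.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions only.
DDL = """
CREATE TABLE security (
    id        INTEGER PRIMARY KEY,
    type      TEXT NOT NULL CHECK (type IN ('bond', 'equity')),
    name      TEXT NOT NULL,
    -- Back-references: the one matching type must equal id, the other is NULL.
    bond_id   INTEGER REFERENCES bond (id)   DEFERRABLE INITIALLY DEFERRED,
    equity_id INTEGER REFERENCES equity (id) DEFERRABLE INITIALLY DEFERRED,
    UNIQUE (id, type),
    CHECK ((type = 'bond')   = (bond_id   IS NOT NULL AND bond_id   = id)),
    CHECK ((type = 'equity') = (equity_id IS NOT NULL AND equity_id = id))
);
CREATE TABLE bond (
    id     INTEGER PRIMARY KEY,
    type   TEXT NOT NULL DEFAULT 'bond' CHECK (type = 'bond'),
    coupon REAL NOT NULL,
    FOREIGN KEY (id, type) REFERENCES security (id, type)
        DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE equity (
    id          INTEGER PRIMARY KEY,
    type        TEXT NOT NULL DEFAULT 'equity' CHECK (type = 'equity'),
    share_class TEXT NOT NULL CHECK (share_class IN ('common', 'preferred')),
    FOREIGN KEY (id, type) REFERENCES security (id, type)
        DEFERRABLE INITIALLY DEFERRED
);
"""


def connect():
    # isolation_level=None: we issue BEGIN/COMMIT/ROLLBACK ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(DDL)
    return conn


def add_security(conn, sec_id, sec_type, name, detail=None):
    """Insert the fixed part and, if given, the type-specific part.

    Both inserts run in one transaction. Returns True if COMMIT succeeds,
    False if any declared constraint rejects the transaction.
    """
    detail_column = {"bond": "coupon", "equity": "share_class"}[sec_type]
    back_refs = (sec_id if sec_type == "bond" else None,
                 sec_id if sec_type == "equity" else None)
    conn.execute("BEGIN")
    try:
        conn.execute(
            "INSERT INTO security (id, type, name, bond_id, equity_id) "
            "VALUES (?, ?, ?, ?, ?)",
            (sec_id, sec_type, name, *back_refs))
        if detail is not None:
            conn.execute(
                f"INSERT INTO {sec_type} (id, {detail_column}) VALUES (?, ?)",
                (sec_id, detail))
        conn.execute("COMMIT")  # deferred foreign keys are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False
```

With this sketch, add_security(conn, 1, "bond", "ACME 5% 2030", 5.0) commits, while add_security(conn, 2, "equity", "ACME") is rejected at COMMIT because the equity half is missing. Whether the extra back-reference columns are an acceptable price for avoiding triggers is a design judgment the sketch does not settle.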

> > Actually, I'm sure you agree. By "DBMS" I mean "database management
> > system". It's a machine, and it has no concept of meaning.
>
> Ok, so you don't means DBMS.

I'm using the term "DBMS" literally, in the way I daresay everyone on
c.d.t. understands it. I wouldn't dream of using a private definition
any more than you would!

> You mean the theoretical concept of a DBMS. Abstracted to the point
> where those statements can be true. Ok, I agree.

It's not abstract in the least. I'm describing the very DBMS you're
using at your current place of employ.

Maybe you find it useful to think of the DBMS as "understanding"
things, of enforcing rules as you conceive them according to the
meaning you attach to the labels you attach to the columns. That's OK
with me. It's something else entirely to claim that fiction is somehow
more real, more concrete, than the actual reality: that the machine is
only a machine. It can no more appreciate the meaning of your database
than it can the beauty of the Mona Lisa.

> Here I am defining two discrete levels of meaning:
> - one within the table itself, relative to all the keys and
> attributes in the table (at whatever level of progress; whatever
> iteration)
> - a second, within the context of all the other tables in the
> cluster, that it could possibly relate to (ditto); reworking the keys
> overall; the hierarchy; etc

Thank you for explaining that, because I think it will help clear up a
misunderstanding.

I cannot accept these definitions of "meaning". I don't dispute they're
important. They're just not what "meaning" means in the context of
database design.

When I use the word "meaning", I'm talking about what real-world
objects the terms in the database represent. Remembering that the
machine is just a machine, recognize that every label is arbitrary; to
the machine, the column may as well be "x" as "price". But we human
beings associate meaning with "price". To us, it means something. In
fact, often it means several somethings.

The art of database design is to develop a single shared meaning for
all the relevant terms in the enterprise of interest.

(IMO that fact and its importance is generally under-appreciated. I've
never seen it discussed in any depth in any database-design or
project-management treatise, and I've read a bunch.)

Until and unless we know what e.g. "price" means, we don't know where it
belongs in the database schema. Probably it needs to be associated
with a time, or at least a date. It may be qualified to distinguish it
as the open/close/high/low/composite/bid/ask/trade/purchase price. Is
the price relative to an exchange? Does it apply to a class of bonds
rather than a particular one? And so on. That is the meaning we use
when we design our databases. That is how we know what's a key and
what's not, etc. Meaning doesn't derive from the database design;
meaning exists without the database (in both senses) and is expressed
in the design.

And that is what we mean when we speak of "meaningful results" from a
query: a logical conclusion derived from a set of facts stored in the
database. To the machine, a bunch of tuple-types, devoid of meaning. To
us, yesterday's returns, the very stuff the enterprise of interest is
interested in.

> > I'm not being pejorative: The "jumping out" is the practice of
> > associating meanings with (column) names and deciding what
> > identifies what.
>
> There you go with that word again. Yes. Agreed.
>
> Which is why I say your algorithm is stupid because it strips the
> meaning out of the names that require the meaning to be maintained;
> rolled over the tongue; and determined; in the context of all the
> tables in the cluster; not alone. It is self-crippling.

Count me out. I design databases much as you do. I determine which
columns are unique (PK, UNIQUE), which attributes are required and
which not (NULL), what cardinality (FK). Print, review, repeat.

Implicit in that process is a set of functional dependencies. I know
they're there; I could isolate them if need be. But they don't drive
my process. They're a formal model for my "is this a property of
that?" question. As you put it,

> Whereas the Codd algorithm retains the meaning, finds the Key first,
> then uses the FD to validate the determined Key, then the attributes,
> etc. So yours is bottom-up, devoid of the meaning that you claim is
> relevant, and ours is top-down down with meaning, and the meaning
> gets clarified (eg. we improve the names, improving and
> discriminating the meaning) during the process.

Everyone agrees that the problem of discovering the FDs is
non-trivial. Beyond the problem of knowing there is such a thing,
there's the trick of extracting them from the incoherent thinking that
passes for "business rules" in every organization. (That's not a
remark on anyone's intelligence or seriousness of purpose. These are
human beings; no system exists to ensure referential integrity within
or among our skulls.)

> > > - I am saying the theoreticians in the RDB space are stupid
> > > because they are using a hammer for a task that calls for an axe.

You would agree that if FDs were sprinkled down from the sky, that it
would be nice to apply an algorithm to them to generate your database
design. What you're mocking isn't the logical process they would
employ but that they are, as it seems to you, working on a process with
no input. The difficulty, in your experience, isn't in deriving a
design from a set of FDs, but in discovering them in the first place.
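
For readers who have not seen the mechanical half of that process, the following is a minimal sketch of it in Python, not a reconstruction of Nicola's Ruby script or of Koehler's algorithms. It computes attribute closure and then finds candidate keys by brute force over attribute subsets. The attribute names and the two FDs are hypothetical inputs, chosen by a human who already decided what "price" means; the code contributes nothing to that decision.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply X -> Y whenever X is already covered and Y adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Return all minimal attribute sets whose closure is all_attrs.

    Brute force, smallest subsets first, so it is exponential in the number
    of attributes and only suitable for small examples.
    """
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # a smaller key is contained in it, so not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# Hypothetical inputs: these FDs are assumed, not discovered by the code.
ATTRS = {"security_id", "name", "type", "exchange", "trade_date", "close_price"}
FDS = [
    (frozenset({"security_id"}), frozenset({"name", "type"})),
    (frozenset({"security_id", "exchange", "trade_date"}),
     frozenset({"close_price"})),
]

KEYS = candidate_keys(ATTRS, FDS)
```

Under these assumed FDs the only candidate key is {security_id, exchange, trade_date}. Change the meaning of close_price (say, make it independent of exchange) and the input FDs change, which is exactly the part the program cannot do for you.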

> Wouldn't it be better to retain both levels of meaning, all the way
> through ?

If you're designing a database, sure. If you're designing an algorithm
to design a database, no.

I think the theoretical problem with the "Codd and Derek method" is
that it's not exactly an algorithm. The process terminates when you're
satisfied with it, when you don't see any more problems. An algorithm
terminates when a specific condition is met. For a machine to evaluate
it, that condition has to be clearer than "looks good to me".
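
As an illustration of a termination condition a machine can evaluate, here is a minimal Python sketch of a BCNF test relative to a stated FD set: every non-trivial FD must have a superkey on its left side. The relation and FD names are hypothetical. The check is only as good as the FDs it is handed, so it replaces "looks good to me" with "satisfies the declared dependencies", nothing more.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != all_attrs]


def is_bcnf(all_attrs, fds):
    """True when no declared FD violates BCNF for this relation."""
    return not bcnf_violations(all_attrs, fds)


# Hypothetical example: one wide relation, then the same facts in two.
NAME_FD = (frozenset({"security_id"}), frozenset({"name", "type"}))
PRICE_FD = (frozenset({"security_id", "exchange", "trade_date"}),
            frozenset({"close_price"}))

WIDE = {"security_id", "name", "type", "exchange", "trade_date", "close_price"}
SECURITY = {"security_id", "name", "type"}
PRICE = {"security_id", "exchange", "trade_date", "close_price"}
```

Here is_bcnf(WIDE, [NAME_FD, PRICE_FD]) is false, because security_id alone determines name and type without being a key of the wide relation, while is_bcnf(SECURITY, [NAME_FD]) and is_bcnf(PRICE, [PRICE_FD]) are both true. Checking only the declared FDs is sufficient for a whole relation; testing a projection properly would require projecting the FD set first, which this sketch does not attempt.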

> Having just one algorithm , that is tractable is not an economical
> position to be in.

Perhaps. And if your method is an algorithm, that's the position
you're in, no? Because there's no two ways to do it, right? ;-)

> Any problem is tractable.

You are of course aware that some problems are not tractable. I assume
you mean something like "all databases are designable", which I'm sure
is true.

> > ERwin is a good tool, the best I ever saw. [...] It would be nice
> > to define a relationship-rule in a symbolic library, and be able to
> > apply it to a given set of tables.
>
> You must have been using a very old un-maintained version. By 1995,
> it had all that and more. The macro language is much more powerful.

We were using the current version. I guess I wasn't clear wrt the
macro language. Yes, the ERwin macro language was quite powerful (and
arcane). It was also trapped in the ERwin tool. I would have liked to
have had a database-design macro language independent of the tool,
something like m4 for databases.

[long ERwin discussion omitted]

I concede I was oversimplifying to some extent, and found ERwin guilty
of not solving a problem it wasn't designed to solve, namely checking
the model for normalization. To the extent it allows the designer to
work at a higher level of abstraction and avoid repeating things,
that's great. We remain a long way away from a tool that does
automatically that which could be automated, but which ERwin leaves to
the human brain.

> > I wrote macros to generate triggers to enforce relationships that
> > couldn't be declared, such as sub/supertype relationships that
> > required both parts of a two-part entity, where the first part was
> > fixed and the second part was in one of N mutually exclusive
> > tables.
>
> So, after reading your para four times, and I really want to
> understand the problem, I still have no idea what the problem is.
> Would you please give me a better description

Since you ask, I will indulge you. I don't say it's the only or best
way. I haven't thought of a better one, but I haven't addressed myself
to the problem in 15 years.

The Securities table has {ID, Type, Name, Active}. The ID is internally
assigned because as you know we have an entire industry whose central
product has no universally recognized identifier. Type is a
discriminator, one of {'equity', 'bond', 'swap', etc.}; they reflect
the enterprise of interest, not the whole securities market. Name is
obvious, and Active distinguishes between instruments currently traded
and those expired (e.g. an option past its exercise date). We called
this a supertype for ERwin purposes.

Naturally there are other attributes to capture, and they vary by
instrument type. For each Type in Securities there is a table to hold
that instrument's particulars: Equities, Bonds, Swaps, etc. We called
this a subtype in ERwin, although I'm not sure how useful it is to
call Equities a "subtype" of Securities.

The rule is that a security is represented by a pair of rows, one in
Securities and one in the subtype table. There *must* be two rows,
and only two rows, and the subtype table must be the one indicated by
Securities.Type.

Declare that, Kemosabe! ;-)

Of course, you could weaken the rule. You could eliminate the Type
column, and require each subtype have a FK relationship to Securities.
But that would not require a Securities row to have a related subtype,
and would not prevent one Securities row from being referred to by more
than one subtype table (e.g., "being" both Equity and Bond). And, as a
practical matter, you couldn't get the security's type without scanning
the union of subtypes looking for the appearance of the ID.

You could also put all securities in a single table with the union of
all security attributes, all NULL, and apply the mother of all CHECK
constraints to get them right. Although I'm pretty sure SQL Server 7.0
at the time didn't support CHECK. We were pretty spartan with declared
integrity constraint checking in any case, for feasibility reasons.
Bulk loads were vetted for correctness before and sanity-checked
afterward, and interactive updates were required to use stored
procedures (some of which used temporary tables as implicit
parameters). But I wouldn't use a single table, even today;
attributes with an optional relationship to the key usually should be in
separate tables.
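
For concreteness, here is a minimal sketch of the two-table design, using SQLite through Python only because it is easy to run. It is not the schema we actually had: the Ticker and Coupon columns, the (ID, Type) composite foreign key, and the helper names are assumptions made for illustration. It shows which half of the rule can be declared (a subtype row must match Securities.Type, and a security cannot "be" both Equity and Bond) and which half cannot (a Securities row with no subtype row is still accepted).

```python
import sqlite3

# Only ID, Type, Name and Active come from the description above;
# every other column and name here is a hypothetical illustration.
SCHEMA = """
CREATE TABLE Securities (
    ID     INTEGER PRIMARY KEY,
    Type   TEXT    NOT NULL CHECK (Type IN ('equity', 'bond', 'swap')),
    Name   TEXT    NOT NULL,
    Active INTEGER NOT NULL CHECK (Active IN (0, 1)),
    UNIQUE (ID, Type)              -- target for the composite foreign keys
);
CREATE TABLE Equities (
    ID     INTEGER PRIMARY KEY,
    Type   TEXT NOT NULL CHECK (Type = 'equity'),
    Ticker TEXT NOT NULL,
    FOREIGN KEY (ID, Type) REFERENCES Securities (ID, Type)
);
CREATE TABLE Bonds (
    ID     INTEGER PRIMARY KEY,
    Type   TEXT NOT NULL CHECK (Type = 'bond'),
    Coupon REAL NOT NULL,
    FOREIGN KEY (ID, Type) REFERENCES Securities (ID, Type)
);
"""

def make_db():
    # isolation_level=None puts the connection in autocommit mode.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    return conn

def try_sql(conn, sql):
    """Run one statement; True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
ok_super = try_sql(conn, "INSERT INTO Securities VALUES (1, 'equity', 'ACME', 1)")
ok_sub = try_sql(conn, "INSERT INTO Equities VALUES (1, 'equity', 'ACM')")
# Rejected: Securities says ID 1 is an equity, so there is no (1, 'bond') to reference.
wrong_sub = try_sql(conn, "INSERT INTO Bonds VALUES (1, 'bond', 5.0)")
# Accepted, although no Bonds row exists: the "must be two rows" half is not declared.
orphan = try_sql(conn, "INSERT INTO Securities VALUES (2, 'bond', 'T-Bill', 1)")
```

The orphan row is the part that had to be left to triggers and stored procedures.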

Aside: some years later, I looked into an incident of the kind that had
by then acquired a name: "SQL injection". Of course I soon discovered
they weren't using stored procedures, and had granted (possibly
administrative) access rights to little Bobby Tables via the
webserver. We have an entire industry built on application-developer
paranoia based on willful ignorance of How to Use the DBMS. I
understand the tech-politics that motivate the situation, but not why
management abdicates its stewardship of the firm's data. [end aside]

Now that I've described the design, I want to take you to task.

You've posted a few colorful rants about the nonexistence of
bona fide circular references in the real world, and therefore in
database designs. ISTM you overlook mutual dependence. There are
reasons to have tables with 1:1 cardinality with both components
mandatory. Sometimes the reason is physical, because one part is
updated or referenced or replicated more than the other. Others are
logical, as with this securities design. There's perfectly nothing
wrong with it relationally, and yet the semantics cannot be enforced
with standard SQL.

You have wickedly mocked the need for deferred constraint enforcement
-- constraints that apply to the transaction, not to the statement --
on the theory that all such mutuality is illusory. But you are in a
trap: if it's OK to have two non-null attributes on one row (1:1
mandatory cardinality), why is it not OK to have the same relationship
between two attributes on two rows? And if 2 tables can be related
that way, why not 3 or more?
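
To make "constraints that apply to the transaction, not to the statement" concrete, here is a small sketch using SQLite's deferred foreign keys from Python. The table names PartA and PartB and the helper insert_pair are hypothetical. Each table references the other, so neither row can be inserted first under statement-level checking; with DEFERRABLE INITIALLY DEFERRED the check moves to COMMIT.

```python
import sqlite3

# Two tables in a mandatory 1:1 relationship: each ID must exist in both.
# SQLite accepts a REFERENCES clause naming a table that is created later.
SCHEMA = """
CREATE TABLE PartA (
    ID INTEGER PRIMARY KEY,
    FOREIGN KEY (ID) REFERENCES PartB (ID) DEFERRABLE INITIALLY DEFERRED
);
CREATE TABLE PartB (
    ID INTEGER PRIMARY KEY,
    FOREIGN KEY (ID) REFERENCES PartA (ID) DEFERRABLE INITIALLY DEFERRED
);
"""

def make_db():
    # Autocommit mode, so BEGIN/COMMIT/ROLLBACK are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def insert_pair(conn, ident, include_b=True):
    """Insert both halves in one transaction; True if COMMIT succeeds."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO PartA (ID) VALUES (?)", (ident,))
        if include_b:
            conn.execute("INSERT INTO PartB (ID) VALUES (?)", (ident,))
        conn.execute("COMMIT")   # deferred foreign keys are checked here
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")
        return False

conn = make_db()
both_halves = insert_pair(conn, 1)                  # accepted
one_half = insert_pair(conn, 2, include_b=False)    # rejected at COMMIT
```

This covers only the two-table case. The securities rule, where the second row lives in one of N mutually exclusive tables, is not expressible as a pair of foreign keys.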

--jkl

James K. Lowden

Feb 15, 2015, 8:08:03 PM

On Thu, 12 Feb 2015 10:52:54 +0100
Nicola <nvitac...@gmail.com> wrote:

> In article <20150212012341.d...@speakeasy.net>,
> "James K. Lowden" <jklo...@speakeasy.net> wrote:
>
> > Quoth Nicola on Thu, 05 Feb 2015:
> > > A few years ago I implemented a few algorithms from Koehler's PhD
> > > thesis in a Ruby script. Given a set of FDs, the script finds all
> > > the keys and all the minimal covers....
> >
> > My first reaction is a little unkind. I think this is what lawyers
> > call "assuming facts not in evidence". *Given* a set of FDs, the
> > program generated a 3NF database design. Hurray! Now, where to
> > find those pesky FDs for input?
>
> From the requirements? I'm not sure I grasp what your point is.
> Do you mean that turning some requirements into FDs is difficult (you
> would not be alone: Stonebraker says that "mere mortals do not
> understand FDs")? Or that FDs (and even MVDs, JDs, etc) comprise only
> a tiny fraction of a real system's set of requirements, so it's not
> worth bothering?

Yes, "turning some requirements into FDs is difficult" and, no, I'm not
suggesting it's not worth bothering.

Part of Derek's argument is that an E-R diagram is the best way to
capture the requirements. He has a tool that lets him nominate primary
keys and designate FK relationships (and enforce some other domain
constraints). He doesn't see any point to FD formalism because that
process lets him create solid designs without it.

The problem with his process IMO is that the proof of the pudding is
only in the eating. Because there is no formal FD description, there
is no verification that the design reflects them.

> Computationally, database design problems are (highly) intractable in
> general. But in practice many problems can be solved. For example,
> finding the keys of a schema (given the FDs) is an inherently hard
> problem, but for schemas with up to a few tens of attributes or so,
> only in few cases you are not able to find them in a reasonable
> amount of time. On the contrary, finding all the minimal covers is
> much less practical, even when there are not many of them.

Even if the problem is NP hard in general, I suspect there are ways to
prune the problem space such that more than "tens" of attributes is
feasible. (Unfortunately, not many people interested in that kind of
work are addressing themselves to this particular problem.) If human
beings are able to design normalized databases with hundreds of
attributes (1000 is a lot, 10,000 I've never heard of), they must be
doing something the machine could also do. To me that's prima facie
evidence the algorithms we're using are naïve.

Granted, that's a hand-wave, and you'd be justified to answer that if
it's so easy, maybe I should just show everyone how it's done. I'm not
saying it's easy. I'm saying we're making progress on other
computationally hard problems by limiting the problem space, and that
there's evidence this field could profit from the same work.
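
As a small illustration of both the textbook approach and one cheap pruning step, here is a sketch of attribute closure and candidate-key enumeration. It is not Nicola's script or Koehler's algorithm; the function names and the example schema are assumptions. The pruning uses the fact that an attribute appearing on no right-hand side must be part of every key, so only the remaining attributes need to be enumerated.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Enumerate the minimal attribute sets whose closure is the whole schema."""
    attributes = frozenset(attributes)
    # Pruning: attributes never determined by anything belong to every key.
    on_rhs = set()
    for _, rhs in fds:
        on_rhs |= rhs
    core = attributes - on_rhs
    optional = sorted(attributes - core)
    keys = []
    # Smallest candidates first, so any superset of a found key is non-minimal.
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            candidate = core | frozenset(extra)
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

# Hypothetical schema R(A, B, C, D) with AB -> C, C -> D, D -> A.
fds = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("C"), frozenset("D")),
    (frozenset("D"), frozenset("A")),
]
keys = candidate_keys("ABCD", fds)
# B appears on no right-hand side, so every key contains it:
# the keys found are {A, B}, {B, C} and {B, D}.
```

The search is still exponential in the number of attributes left over after pruning, which is the part that would need the smarter work.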

> FDs are a special type of first-order formulas. I can imagine
> defining an English-like syntax for them.

I'm sure, but the problem is not as simple as you suggest.

Notational convenience is a big deal, and more art than science. If I
may say so, the field of databases hasn't produced much in the way of
compilable languages. We got SQL (not from a lab) and stopped.

I had to learn the value of notation the way I learn everything: the
hard way. Some years ago I took up an interest in dependency tracking
for software packages. To represent packages and their dependencies in
a relational database design is no big deal, maybe 10 tables and 50
attributes IIRC. But, how was the user of my database going to present
the dependencies to it? With a sequence of INSERT statements?
Hardly. Thus did I find myself in the world of language design when I
thought the problem was only one of satisfying dependencies.

I believe a language could be devised that requires less input
from the user than do current systems, and yet is complete in the sense
that it defines sufficient input from which to generate a design. I
have not seen that language yet. The tersest "language" yet devised
for the purpose is graphical. Again, though, hardly a hotbed of
research.

--jkl

Erwin

Feb 16, 2015, 10:41:35 AM

Op maandag 16 februari 2015 02:08:01 UTC+1 schreef James K. Lowden:
> and yet the semantics cannot be enforced
> with standard SQL.

Tsk tsk tsk.

Standard SQL is the only SQL flavour where it _CAN_ be done ...

Nicola

Feb 17, 2015, 5:38:02 AM

In article <20150215200802.3...@speakeasy.net>,
"James K. Lowden" <jklo...@speakeasy.net> wrote:

> Yes, "turning some requirements into FDs is difficult" and, no, I'm not
> suggesting it's not worth bothering.

You might replace "FDs" with any other formalism, and it will still be
true. Turning requirements into anything formal is always going to be
difficult. That's where intuition, experience, and communication skills
play a role, and where mathematics has little or nothing to say. If you
are designing a database for an application domain you are not familiar
with, just understanding the vocabulary and the relevant concepts may be
challenging. Some time ago, it took us several meetings to nail down
what a call center company meant by "services", "activities",
"contacts", "sessions", "events", and so on and so forth, and to tie
these things together into a coherent design. The toughest questions
were not of the kind: "What are the keys of this thing?", but of the
more primitive kind: "What is this thing?" :)

When you have a thorough understanding of the domain, determining which
requirements can be translated into a given formalism, and performing such
translation, is comparatively easy (if you know your formalism, I mean).
Here, a good balance between expressiveness and simplicity is key (pun
intended).

> Part of Derek's argument is that an E-R diagram is the best way to
> capture the requirements. He has a tool that lets him nominate primary
> keys and designate FK relationships (and enforce some other domain
> constraints). He doesn't see any point to FD formalism because that
> process lets him create solid designs without it.

E-R diagrams (we may include IDEF1X, although IDEF1X is a bit different
from a typical E-R diagram) have a better balance (in the sense above)
than FDs. They are a reasonable approach to design, possibly the best we
currently have.

> The problem with his process IMO is that the proof of the pudding is
> only in the eating. Because there is no formal FD description, there
> is no verification that the design reflects them.

His claim is that the only dependencies are those from the keys, and
those dependencies are in the diagram. So, they are reflected by design
and there's nothing to verify.

My point of view is that, given the current state of the theory, using a
formal approach (e.g., using FDs as a *starting* point) is not really a
superior way to tackle the design problem, and not for the lack of
simplicity, but for the lack of expressiveness. It makes sense, however,
to apply dependency theory after we have a design, as an enrichment to
the design (and as a verification that we haven't missed some important
constraint).
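
A rough sketch of what such an after-the-fact check could look like: given a design (tables with their declared keys) and a list of FDs gathered from the requirements, report the FDs that no declared key guarantees. The data layout, the function name, and the student/course example are assumptions for illustration. The test is deliberately conservative, since it looks only at declared keys and does not reason about FDs implied by other FDs.

```python
def unenforced_fds(tables, fds):
    """Return the FDs that no declared key guarantees.

    tables: {name: {"attrs": set_of_columns, "keys": [set_of_columns, ...]}}
    fds:    list of (lhs, rhs) pairs of column sets
    An FD lhs -> rhs counts as enforced when some table contains all of its
    columns and lhs includes one of that table's declared keys.
    """
    missing = []
    for lhs, rhs in fds:
        enforced = any(
            (lhs | rhs) <= table["attrs"]
            and any(key <= lhs for key in table["keys"])
            for table in tables.values()
        )
        if not enforced:
            missing.append((lhs, rhs))
    return missing

# Hypothetical design: course_title has been left in Enrolment.
tables = {
    "Student": {
        "attrs": {"student_id", "name"},
        "keys": [{"student_id"}],
    },
    "Enrolment": {
        "attrs": {"student_id", "course_id", "grade", "course_title"},
        "keys": [{"student_id", "course_id"}],
    },
}
fds = [
    ({"student_id"}, {"name"}),
    ({"student_id", "course_id"}, {"grade"}),
    ({"course_id"}, {"course_title"}),
]
problems = unenforced_fds(tables, fds)
# problems holds the single FD course_id -> course_title, which the
# Enrolment key does not enforce.
```

An FD reported here is either a constraint the design has missed or a sign that the table should be split.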

> > Computationally, database design problems are (highly) intractable in
> > general.
>
> Even if the problem is NP hard in general, I suspect there are ways to
> prune the problem space such that more than "tens" of attributes is
> feasible. (Unfortunately, not many people interested in that kind of
> work are addressing themselves to this particular problem.) If human
> beings are able to design normalized databases with hundreds of
> attributes (1000 is a lot, 10,000 I've never heard of), they must be
> doing something the machine could also do. To me that's prima facie
> evidence the algorithms we're using are naïve.

Well, typically humans do some kind of "pre-normalization" by
recognizing, say, that information about a student and about a course
must be in different schemas... This reduces the complexity a lot. If
you don't give that information to the machine, then it will have a
tough time. Once, just for fun, I fed my script with most of the
attributes and the related FDs from a popular web application, to see
whether it would output a better schema. No way (it didn't terminate) :)

And yes, I agree that there hasn't been so much algorithmic research,
probably because there isn't so much motivation (you may already build
solid designs with existing tools, so what?). But when I see certain
database schemas in the real world, I can't stop myself from thinking:
"If you had let the computer do that, it would have turned out much
better!" (Ehm, if it only stopped) :)

Btw, one field where algorithms might turn useful is in the so-called
Object-Relational mapping tools. Currently, those tools result in
horrible database designs, because they typically build a one-one
mapping from classes to schemas and from instance variables to
attributes. How about a procedure that infers FDs from a (possibly
annotated) object graph and derives a good, normalized, database schema
to back that structure?

> I believe a language could be devised that requires less input
> from the user than do current systems, and yet is complete in the sense
> that it defines sufficient input from which to generate a design. I
> have not seen that language yet. The tersest "language" yet devised
> for the purpose is graphical. Again, though, hardly a hotbed of
> research.

Agree.

Derek Asirvadem

Feb 17, 2015, 6:53:24 AM

James

> On Monday, 16 February 2015 12:08:01 UTC+11, James K. Lowden wrote:
> > On Thu, 12 Feb 2015 06:55:38 -0800 (PST) Derek Asirvadem <derek.a...@gmail.com> wrote:
>
> Having read over your post several times, ISTM we're more in vociferous
> agreement than not.

Yes, very well put.

And also, there are a few subjects, entire subjects, that we have little in common about.

> Your focus is on designing a database -- a
> particular one, based (of course) on a known enterprise of interest.
> The academics you malign

Excuse me. There are no academics, no theoreticians, no scientists, in this field. There haven't been, since Codd left.

Sure, there are a few pseudo-scientists, fragmented theoreticians, isolated academics, in this field. Yes, they have produced nothing. I don't normally malign or insult anyone. However, when these sub-humans start (a) their various and sundry frauds, (b) talking down to and insulting practitioners (who have produced something substantial; over twenty databases for me; millions if you count all of us), which is an insult to the mind, yes, that does get up my nose, and yes, I insult them in return. Never as a start, it would be a sin.

> are describing -- and still searching for --
> a way to do that automatically. You have no particular beef
> with them except that from your point of view they've produced "zero"
> in twenty years,

Er, 2015 minus 1970 equals forty five years.

> while all along you've been doing just fine, thanks,
> without said algorithm.

Yes, for those same forty five years. Without said algorithm which is stupid as detailed previously, and with Codd's algorithm, which is sound. Without their non-FD-fragments and with Codd's FD/3NF. Without their pipe dreams, and with existing modelling tools. Yes.

So, your statement is false. I and millions of other RM adherents have been doing fine, precisely because we have been served by other RM adherents, including vendors. We are doing fine in the absence of a stupid algorithm, yes.

Beef. I have many, and I have posted to that effect. Crippled. Frauds (two levels, first by using private definitions and therefore isolating themselves; second, by propagating falsities about the science that does exist). The dozen or so book-writers commit an additional, third fraud. Interfering with what they are clueless about. Damaging the industry, that exists solely due to the capabilities of others. Failing to provide an exposition of the RM, and instead demeaning it. Fiddling around with non-FD_fragments, which was supposed to produce some marvellous result that would save the planet, and not producing a result of any kind. Dreaming about things (such as automating the description/design of a db), which is completely outside their scope, as well as outside their demonstrated capability (it is twenty years for that one, yes). Not an exhaustive list, but a good start.

Other than that small herd, no, I have no beef.

> I think there's a there there. We should be able to generate
> normalized designs from a set of columns and constraints.

Pipe dreams. Outside their area of responsibility. Outside their demonstrated capability. There are just too many steps to think of, that you are hilariously reducing to a single step. Based on an absurd, and stupid, algorithm that has not produced a key. Three weeks, not one single key amongst the lot of you. Previously detailed. Please read.

> I also think there's essentially no academic interest in the hard part:

Exactly right. There never is. Academics who produce results, who work the hard part, work for vendors.

The other kind put sticky brown stuff in their pipe and smoke it. Then they contemplate the curling smoke. Then they write relational algebra to describe said curls.

They've never kissed a girl.

> the problem of devising a convenient notation (never mind tool) for
> capturing constraints. To the extent they consider it at all, language
> people consider it a database problem, and database people likewise
> consider it out of scope, either a trivial non-problem or one belonging
> to language design, not database theory.

Ok, but you are counting only the theoreticians who have produced nothing. Those who have produced something, do not take either of those positions. Eg. Codd didn't, he covered both sides.

Eg. My colleagues and I don't. I have no problem at all with the existing tools, precisely because they are tools, not a db design engine. It is no problem at all to draw up this or that diagram, using a recognisable notation, to cover some aspect that an IDEF1X Diagram doesn't convey (the IDEF1X Model having those aspects implemented).

> Absent academic results, industry produces the language poverty we
> currently live with: SQL, with all its warts, and proprietary GUI tools
> for design.

Poverty ???

You are living on another planet. There ain't no poverty here. Sure, I haven't answered your points re SQL in your previous posts, but that does not mean we are not served. (I am working on the response.)

Take the RM, Codd invented it, IBM and Sybase built it. Just like the RM, Codd invented the data sub-language for accessing the RDb, IBM built it, and Sybase built it better. And now, in case you haven't noticed, there are thirty or so SQLs and Non-SQLs and Pretend-SQLs.

There ain't no poverty here on this planet. We are well-served. We have the theory from forty-five years ago, and the implementation platforms are very good, and constantly improving. They have taken the RM to the end, and implemented all of it (the bits that the pig-poop-eaters say is "incomplete"; the natural extensions that are exposed only from faithful and devout use).

The only poverty that I know of is the staggeringly isolated theoreticians who allege that they serve this space, who [repeat above] are (a) ignorant of what we do have and (b) dream about what we should/could have. Waste of oxygen. Those creatures wail about SQL, as if it was their dream that doesn't exist, and moan about it not having the features that their dream that doesn't exist has. The word is schizophrenic. The argument is Straw Man, ok, a second generation form of Straw Man, from the same freaks who have forty five years experience with the Straw Man.

Meanwhile, back at the farm, we just take up the tool, and use it to do the job. We don't cry about the tool not being a twenty-something beauty queen. There ain't no poverty here on this farm. We are not interested in a tractor that would drive itself and plough the acreage, while we sit in front of the telly with a remote control.

Notice, by the Grace of God, I live in the lap of abundance. Notice, you are trying to tell me that I live in poverty. Epic fail. You have to sell that incoherent idea first, in order to justify a replacement.

> I doubt you find much to disagree with there,

Refer above.

> except that you'd give
> SQL higher marks,

From your previous post (the SQL part which remains unanswered), I believe that if you marked SQL against what it is declared to do, and not what it is not declared to do, which is your ongoing lament, you and I would give it the same marks.

> and you have automatic database design generation
> filed under "pending results".

Worse. "No results in forty five years, twenty five years since results were promised". Additionally stamped "No results expected, due absence of academics in the field". Moved to the archives, stamped "Pipe dream, required steps never thought out."

> Below I try to clarify a few points and offer some particulars on the
> super/subtype design we used for securities.

Hopefully some meat, in this boiled rice diet.

> Finally, I put it to you
> that the design both obeys all rules of normalization and illustrates
> the need for deferred constraint enforcement.
>
> > > Actually, I'm sure you agree. By "DBMS" I mean "database management
> > > system". It's a machine, and it has no concept of meaning.
> >
> > Ok, so you don't mean DBMS.
>
> I'm using the term "DBMS" literally, in the way I daresay everyone on
> c.d.t. understands it. I wouldn't dream of using a private definition
> any more than you would!

Sure. But it is a dare, and it is drawing a long bow. Given what you really mean (preceding para, not repeated), you are clearly treating it as the theoretical machine, and only the theoretical machine.

> > You mean the theoretical concept of a DBMS. Abstracted to the point
> > where those statements can be true. Ok, I agree.
>
> It's not abstract in the least. I'm describing the very DBMS you're
> using at your current place of employ.

Stop being silly. You are describing the theoretical machine, quite transparently to serve your theoretical purpose, and you have limited the description to machines that do exist. Fine with me, so far. But when you declare that the two are the same, you leave the norm, and enter the asylum.

My Sybase ASE is about one million, may be two million, TIMES different to your in-theory-we-should machine.

> Maybe you find it useful to think of the DBMS as "understanding"
> things, of enforcing rules as you conceive them according to the
> meaning you attach to the labels you attach to the columns. That's OK
> with me.

Ok, we are on the same page.

> It's something else entirely to claim that fiction is somehow
> more real, more concrete, than the actual reality: that the machine is
> only a machine.

Now you are running around in a circle. You stated, and I agreed, that the machine is only a machine. No idea what you are alluding to re the "fiction".

> It can no more appreciate the meaning of your database
> than it can the beauty of the Mona Lisa.

Ok, back to the same page.

> > Here I am defining two discrete levels of meaning:
> > - one within the table itself, relative to all the keys and
> > attributes in the table (at whatever level of progress; whatever
> > iteration)
> > - a second, within the context of all the other tables in the
> > cluster, that it could possibly relate to (ditto); reworking the keys
> > overall; the hierarchy; etc
>
> Thank you for explaining that, because I think it will help clear up a
> misunderstanding.
>
> I cannot accept these definitions of "meaning". I don't dispute they're
> important. They're just not what "meaning" means in the context of
> database design.
>
> When I use the word "meaning", I'm talking about what real-world
> objects the terms in the database represent. Remembering that the
> machine is just a machine,

(Good, we are three times on that one page)

> recognize that every label is arbitrary; to
> the machine, the column may as well be "x" as "price". But we human
> beings associate meaning with "price". To us, it means something. In
> fact, often it means several somethings.
>
> The art of database design is to develop a single shared meaning for
> all the relevant terms in the enterprise of interest.

Whoa. That is a massive new declaration, with no supporting arguments. I am not saying that I reject it, just that it is a nice idea, unformed at this stage.

> (IMO that fact and its importance is generally under-appreciated. I've
> never seen it discussed in any depth in any database-design or
> project-management treatise, and I've read a bunch.)
>
> Until and unless we know what e.g. "price" means, we don't know where it
> belongs in the database schema. Probably it needs to be associated
> with a time, or at least a date. It may be qualified to distinguish it
> as the open/close/high/low/composite/bid/ask/trade/purchase price. Is
> the price relative to an exchange? Does it apply to a class of bonds
> rather than a particular one? And so on. That is the meaning we use
> when we design our databases. That is how we know what's a key and
> what's not, etc. Meaning doesn't derive from the database design;
> meaning exists without the database (in both senses) and is expressed
> in the design.

Whoa. After that [six paras] lead-up, I was expecting you to define what /you/ mean by "meaning", given that you reject what we mean by "meaning", while acknowledging that our meaning is important. You didn't.

First, I agree with most of that.

If I take your para above as almost-definitive, then my first-level "meaning" includes all of it, every single bit. And anything else in that vein.

So I am at a loss to understand why you excised that out of my "meaning", or considered it different. My "meaning" includes all of your "meaning", PLUS:
a. differentiation at the intra-table level vs the extra-table level
b. determination of the Keys DURING the Normalisation/modelling exercise, which is limited by [a], hence the iterations; the back-and-forth
c. rejection of your non-FD_fragments, which would reverse the process and stand it on its head (not coincidentally, the way the devil is depicted in religious texts)

> That is the meaning we use
> when we design our databases.

Yes.

> That is how we know what's a key and
> what's not, etc.

Nonsense. That is how you do it, and you don't follow the RM, you have only RFSs.

That is not the way we do it. We follow the RM, and we work with IDENTIFIERS, PRIMARY KEYS, etc, that identify Atomic Facts. All three of which you have demonstrated, you (all of you) have no experience with.

So we are executing a form of Normalising/Modelling that is beyond you. The entities we arrive at in the first phase (Key Determination; Atomic Facts), will be completely different to the entities that you arrive at. The fact that I solved Köhler's and Hidders' problem in this first phase alone (no attributes, no second phase), stands as evidence that you (plural) have no understanding of it.

The second phase, now that we have correct Keys, now that we have solid pegs to hang our attributes on, is easy, and you have articulated that well, we are in agreement there. Yes, yes, all types of meaning, yours and mine combined. And notably, again, two levels, if we don't have good pegs to hang our saddles on, we have to build them first.

> And that is what we mean when we speak of "meaningful results" from a
> query: a logical conclusion derived from a set of facts stored in the
> database. To the machine, a bunch of tuple-types, devoid of meaning. To
> us, yesterday's returns, the very stuff the enterprise of interest is
> interested in.

Ok.

You still have to keep whatever meaning there is in the column names, etc, and improve them during the process, etc. But you stripped all that out at step 1 (for academic reasons, yes ?).

And now you want the machine that is devoid of meaning, to answer a query that has a lot of meaning.

Ok.

Just don't try to sell that machine outside, in the real world.

> > > I'm not being pejorative: The "jumping out" is the practice of
> > > associating meanings with (column) names and deciding what
> > > identifies what.
> >
> > There you go with that word again. Yes. Agreed.
> >
> > Which is why I say your algorithm is stupid because it strips the
> > meaning out of the names that require the meaning to be maintained;
> > rolled over the tongue; and determined; in the context of all the
> > tables in the cluster; not alone. It is self-crippling.
>
> Count me out. I design databases much as you do.

(The evidence is, no you don't, our databases could not be more different.)

> I determine which
> columns are unique (PK, UNIQUE), which attributes are required and
> which not (NULL), what cardinality (FK). Print, review, repeat.
>
> Implicit in that process is a set of functional dependencies. I know
> they're there; I could isolate them if need be. But they don't drive
> my process. They're a formal model for my "is this a property of
> that?" question. As you put it,
>
> > Whereas the Codd algorithm retains the meaning, finds the Key first,
> > then uses the FD to validate the determined Key, then the attributes,
> > etc. So yours is bottom-up, devoid of the meaning that you claim is
> > relevant, and ours is top-down with meaning, and the meaning
> > gets clarified (eg. we improve the names, improving and
> > discriminating the meaning) during the process.
>
> Everyone agrees that the problem of discovering the FDs is
> non-trivial.

You mean your non-FD-fragments ? Sure, for the fools that use such, who have the process inverted, sure, it is non-trivial. And hard. And they dream of a machine that would do it for them.

You mean the real FDs with der Codd algorithm ? Nonsense. The process is trivial. How do you think, in two evidenced instances (PDFs with summary results posted; details in several threads), I was able to solve the entire problem, and eliminate the entire proposal, in ten minutes flat ? Magic ? Intuition (codeword for not setting brains in the toilet while working) ? No, just the algorithm, the method.

We do not "discover" non-FD-fragments. We Determine Keys, and then use the FD Definition to validate the Keys.

The FDs (or the non-FD-fragments) do not drive our process. Fact determination, of which Key Determination is central, drives our process.
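
To make "use the FD Definition to validate the Keys" concrete, here is a minimal sketch in Python. The function name fd_holds, the sample rows and the column names are all hypothetical, chosen only for illustration. Note that a check against sample rows can only refute a proposed dependency; that it holds in general comes from the meaning of the data, not from the rows.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if, in these rows, equal determinant values always
    come with equal dependent values (the definition of X -> Y)."""
    seen = {}
    for row in rows:
        x = tuple(row[c] for c in determinant)
        y = tuple(row[c] for c in dependent)
        if seen.setdefault(x, y) != y:
            return False  # same X, different Y: the FD is refuted
    return True


# Hypothetical sample rows (assumed data, not from any real database).
securities = [
    {"ExchangeCode": "ASX", "Ticker": "BHP", "Name": "BHP Billiton", "Type": "equity"},
    {"ExchangeCode": "NYSE", "Ticker": "BHP", "Name": "BHP Billiton ADR", "Type": "equity"},
    {"ExchangeCode": "ASX", "Ticker": "CBA", "Name": "Commonwealth Bank", "Type": "equity"},
]

# The Key is determined first, from meaning; the FD definition then validates it.
key = ("ExchangeCode", "Ticker")
key_ok = fd_holds(securities, key, ("Name", "Type"))
ticker_alone_ok = fd_holds(securities, ("Ticker",), ("Name",))
```

In this sample the determined Key passes, while Ticker on its own is refuted because the same Ticker appears with two different Names.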

You guys have zero understanding of the process, you have a theoretical understanding of a few fragments of the process.

> Beyond the problem of knowing there is such a thing,
> there's the trick of extracting them from the incoherent thinking that
> passes for "business rules" in every organization. (That's not a
> remark on anyone's intelligence or seriousness of purpose. These are
> human beings; no system exists to ensure referential integrity within
> or among our skulls.)

Excuse me, such a system most certainly exists, and further, it works brilliantly. But you guys don't want to hear about it, you deny that system.

Without that system, which cannot be employed in the requirements analysis phase, although far fewer people there than here deny it, then:

We don't need tricks. We teach the users as we go, what we are doing, repeating what we think they said back to them, we show them the model (it is an official tool for communication), and we go back and forth. Many iterations, with meaning (all senses) increased with each iteration. Each side gets educated about the other side.

For me, it is not an extraction process, it is a relationship building process that starts at the beginning, and ends, not with the database delivery but at the sign-off three months later. All the meaning is built up during that period (six months to 24 months), and built up consistently, resolving issues, errors, personalities, communication styles.

> > > > - I am saying the theoreticians in the RDB space are stupid
> > > > because they are using a hammer for a task that calls for an axe.
>
> You would agree that if FDs were sprinkled down from the sky, that it
> would be nice to apply an algorithm to them to generate your database
> design.

If your non-FD-fragments were sprinkled down from the sky, I would stay inside, and make sure the dog was inside. The tree-huggers and the meth-heads can run around collecting them. I have stated severally, that I couldn't care less about the algorithm, because (a) it is stupid and (b) has produced zero results in three full weeks. And for the new purpose you present in this post, hilarious, because nothing less than a full-blown Artificial Intelligence system can "generate a database design". You cannot substitute the human mind with a stupid algorithm or three, you need full AI.

> What you're mocking isn't the logical process they would
> employ but that they are, as it seems to you, working on a process with
> no input.

I am mocking both, and the mocking only came later, after repeated attempts to shut the stupid creature down, for being too stupid to contemplate, failed. I also mocked the fact that you excised the meaning at the beginning, and now you are crying about it being absent (no input) at this stage, where meaning is, as you claim, important.

> The difficulty, in your experience, isn't in deriving a
> design from a set of FDs, but in discovering them in the first place.

Please stop putting words in my mouth. Just read the words I gave. I do not think that my powers of expressing the written word are lacking.

1. I didn't say "difficulty",
I said "stupid", "not thought through". I said "impossible", especially given the demonstrated prowess of the people concerned.

2. I didn't say "deriving a design from a set of Non-FD-fragments",
I said your whole non-FD-fragment method is stupid, stupid, stupid. And if and when their purpose is described to me (ie. other than what you have given, which is, "gee, golly, gosh, they are really important"), I will entertain the formation of another considered opinion. I also said you have the process backwards, reversed.

3. I didn't say "discovering non-FD-fragments".
I said, I have no problem at all, "discovering" FDs, the process is trivial. I said I DETERMINE the Keys first, and use the then TRIVIALLY identified FDs second, to validate the keys and attributes.

4. I said, there is nothing that Codd and Derek's FDs have in common with your Non-FD-fragments. I said, they cannot be compared. Do not apply anything that I said about FDs to your non-FD-fragments, it will elevate them, artificially and fraudulently.

> > Wouldn't it be better to retain both levels of meaning, all the way
> > through ?
>
> If you're designing a database, sure. If you're designing an algorithm
> to design a database, no.

That is too funny to contemplate. But I am sure you had a straight face when you wrote, and you still do when reading this. I am not sure which is funnier, the staggering statement, or the fact that you said it with a straight face.

> I think the theoretical problem with the "Codd and Derek method" is
> that it's not exactly an algorithm. The process terminates when you're
> satisfied with it, when you don't see any more problems. An algorithm
> terminates when a specific condition is met. For a machine to evaluate
> it, that condition has to be clearer than "looks good to me".

That is a very flimsy, and single-marginal-point, method for suggesting that it is not an algorithm.

You cannot write an algorithm that terminates when the <total_unresolved_conditions> equals zero ? After giving it a long list of specific quantifiable conditions to be resolved ?

But you are going to write an algorithm that designs a database. Without meaning. That answers queries that are heavily imbued with meaning.

It is an algorithm. Read all about it in the RM, and in books that describe Normalisation (NOT the NFs).

> > Having just one algorithm , that is tractable is not an economical
> > position to be in.
>
> Perhaps. And if your method is an algorithm, that's the position
> you're in, no? Because there's no two ways to do it, right? ;-)

Well, if the algorithm is good, sure. If you haven't canvassed the alternatives, no. But if the algorithm is bad (and yours is abysmal, zero results), don't waste your time, stop. And address the problem again, so as to have more clarity; more definition of the problem; and the algorithms will come from that.

> > Any problem is tractable.
>
> You are of course aware that some problems are not tractable. I assume
> you mean something like "all databases are designable", which I'm sure
> is true.

No. I meant all defined problems are tractable.

If there is a problem that you consider intractable, I say, flatly, you simply have not defined the problem. During that scientific process of defining the problem, you will identify measures, methods, such that the problem is tractable, and the progress made can be tangibly measured.

I agree, your non-FD_fragments, backwards approach, to the "problem" of identifying keys (let alone designing a database), is intractable. Precisely because it is ill-defined. Further, I repeat, there are at least 12 large steps (separate and discrete "problems") that you have collapsed into a single magical flute. Ill-defined squared, cubed.

> > > ERwin is a good tool, the best I ever saw. [...] It would be nice
> > > to define a relationship-rule in a symbolic library, and be able to
> > > apply it to a given set of tables.
> >
> > You must have been using a very old un-maintained version. By 1995,
> > it had all that and more. The macro language is much more powerful.
>
> We were using the current version. I guess I wasn't clear wrt the
> macro language. Yes, the ERwin macro language was quite powerful (and
> arcane). It was also trapped in the ERwin tool. I would have liked to
> have had a database-design macro language independent of the tool,
> something like m4 for databases.
>
> [long ERwin discussion omitted]
>
> I concede I was oversimplifying to some extent, and found ERwin guilty
> of not solving a problem it wasn't designed to solve,

Good that you see that. Now see if you notice, you are doing the same thing with SQL.

> namely checking
> the model for normalization. To the extent it allows the designer to
> work at a higher level of abstraction and avoid repeating things,
> that's great. We remain a long way away from a tool that does
> automatically that which could be automated, but which ERwin leaves to
> the human brain.

Sure. But some of us are not waiting for that magical mystical tool.

> > > I wrote macros to generate triggers to enforce relationships that
> > > couldn't be declared, such as sub/supertype relationships that
> > > required both parts of a two-part entity, where the first part was
> > > fixed and the second part was in one of N mutually exclusive
> > > tables.
> >
> > So, after reading your para four times, and I really want to
> > understand the problem, I still have no idea what the problem is.
> > Would you please give me a better description
>
> Since you ask, I will indulge you.

Oh come on. Sure, you are indulging me by answering my question. But that was predicated by you raising a "problem". So the fact is, I am indulging you.

> I don't say it's the only or best
> way. I haven't thought of a better one, but I haven't addressed myself
> to the problem in 15 years.
>
> The Securities table has {ID, Type, Name, Active}. The ID is internally
> assigned because as you know we have an entire industry whose central
> product has no universally recognized identifier.

Correct, but unfairly stated.

There is a good identifier, but it is limited to the exchange. If you trade globally, sure, you have no universal Identifier. If you understand the first sentence, the identifier is obvious ( ExchangeCode, {Ticker | ASXCode | Etc} ). But you don't have Relational Keys. Therefore you are limited to surrogates (not "surrogate keys", please, ain't no such thing.) Can't blame that on the lack of a universal identifier. There ain't no universal standards body for trades, either, which is a pre-requisite for a universal identifier.

Contrast that with ISBN.

> Type is a
> discriminator, one of {'equity', 'bond', 'swap', etc.}; they reflect
> the enterprise of interest, not the whole securities market. Name is
> obvious, and Active distinguishes between instruments currently traded
> and those expired (e.g. an option past its exercise date). We called
> this a supertype for ERwin purposes.

The question begs, what did you call it for human data modelling purposes; for SQL DDL purposes ?

Both humans and ERwin demand the type of basetype::subtype relationship. I presume they are Exclusive.

> Naturally there are other attributes to capture, and they vary by
> instrument type. For each Type in Securities there is a table to hold
> that instrument's particulars: Equities, Bonds, Swaps, etc. We called
> this a subtype in ERwin, although I'm not sure how useful it is to
> call Equities a "subtype" of Securities.

Well, for the model you have described, it IS a subtype of Security, and the useful purpose is to ensure that you do not confuse yourself.

> The rule is that a security is represented by a pair of rows, one in
> Securities and one in the subtype table. There *must* be two rows,
> and only two rows, and the subtype table must be the one indicated by
> Securities.Type.
>
> Declare that, Kemosabe! ;-)

Pfffft.

I thought you were going to give me a serious problem, arrange a loan for you. This is so, so, pedestrian.

So the "declare that" is limited to the scope of:
a. Standard SQL
b. Declarations

Correct ?

1. Tonto, please check this code out
http://www.softwaregems.com.au/Documents/Tutorial/Subtype%20ValidateExclusive_fn.sql
http://www.softwaregems.com.au/Documents/Tutorial/Subtype_CHECK.sql

2. If that is not descriptive enough, check the discussion at the link in either of those files.

3. If you still have a headache, post again, and I will write a prescription, further details, in reply.
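
For readers who cannot open the linked files, here is a minimal sketch of one purely declarative way to get the "subtype table must be the one indicated by Type" part of the rule. It is not the code in those files. It uses a different, well-known technique: the discriminator is carried into each subtype table and included in the foreign key. SQLite driven from Python stands in for a real platform, and the table and column names (Security, Equity, Bond, SecurityID) and the helper try_insert are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Security (
    SecurityID INTEGER NOT NULL PRIMARY KEY,
    Type       TEXT    NOT NULL CHECK (Type IN ('equity', 'bond')),
    Name       TEXT    NOT NULL,
    UNIQUE (SecurityID, Type)          -- target for the subtype FKs
);
CREATE TABLE Equity (
    SecurityID INTEGER NOT NULL PRIMARY KEY,
    Type       TEXT    NOT NULL CHECK (Type = 'equity'),
    FOREIGN KEY (SecurityID, Type) REFERENCES Security (SecurityID, Type)
);
CREATE TABLE Bond (
    SecurityID INTEGER NOT NULL PRIMARY KEY,
    Type       TEXT    NOT NULL CHECK (Type = 'bond'),
    FOREIGN KEY (SecurityID, Type) REFERENCES Security (SecurityID, Type)
);
""")


def try_insert(sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ok_base = try_insert("INSERT INTO Security VALUES (?, ?, ?)", (1, "equity", "Acme Ord"))
ok_equity = try_insert("INSERT INTO Equity VALUES (?, ?)", (1, "equity"))
# Rejected by the FK: there is no (1, 'bond') row in Security.
bad_bond = try_insert("INSERT INTO Bond VALUES (?, ?)", (1, "bond"))
# Rejected by the CHECK: a Bond row must carry Type 'bond'.
bad_type = try_insert("INSERT INTO Bond VALUES (?, ?)", (1, "equity"))
```

This sketch stops a Security from appearing in the wrong subtype table or in two subtype tables. It does not, by itself, require that a Security row has a subtype row at all; that half of the rule is outside what these declarations cover.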

> Of course, you could weaken the rule.

Unacceptable. Rules are made to be pivots for further rules, they are not made to be set aside.

> You could eliminate the Type
> column, and require each subtype have a FK relationship to Securities.

Hang on. You have to have that regardless, otherwise you have no RI between the basetype and the subtypes. You did say you use DRI, this is a mundane use of DRI.

> But that would not require a Securities row to have a related subtype,
> and would not prevent one Securities row from being referred to by more
> than one subtype table (e.g., "being" both Equity and Bond).

What ?

I read that four times (three times is my limit, and I extended myself for you), and I do not believe that I understand what you are trying to convey.

Possibilities ...

a. On the face of it, I change my presumption, they are Non-exclusive. Done. But that seems too simple for your numerous words.

b. Assuming that a single Security can "be" an "Equity and a Bond" at the same time (which is totally and violently illegal here, so my mind is fighting the notion, which I have to quell), then why can't the subtyping be Non-exclusive AND the FK be present ??? If they are the same Security, they would have the same SecurityID.

c. So that means that you MIGHT mean, the Equity-Security and the Bond-Security have different SecurityIDs. But that means, they can't "be" the same Security, that statement is false. In that case, you have gross Normalisation errors, which I think is unlikely.

Ok, I give up, please explain.

> And, as a
> practical matter, you couldn't get the security's type without scanning
> the union of subtypes looking for the appearance of the ID.

Definitely, for that as well as definitive reasons, do not drop Security.Type.
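
The practical point in the quoted paragraph can be seen in a small sketch (SQLite via Python; the tables, columns and sample rows are hypothetical). With the discriminator, the type is one keyed lookup on the basetype. Without it, every subtype table has to be probed, and the query grows with each new subtype.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Security (SecurityID INTEGER PRIMARY KEY, Type TEXT NOT NULL, Name TEXT NOT NULL);
CREATE TABLE Equity   (SecurityID INTEGER PRIMARY KEY REFERENCES Security);
CREATE TABLE Bond     (SecurityID INTEGER PRIMARY KEY REFERENCES Security);
INSERT INTO Security VALUES (1, 'equity', 'Acme Ord'), (2, 'bond', 'Acme 5% 2020');
INSERT INTO Equity VALUES (1);
INSERT INTO Bond   VALUES (2);
""")

# With the discriminator: one keyed lookup on the basetype.
with_type = conn.execute(
    "SELECT Type FROM Security WHERE SecurityID = ?", (2,)
).fetchone()[0]

# Without it: probe every subtype table, and extend the query for every new subtype.
without_type = conn.execute(
    """
    SELECT 'equity' FROM Equity WHERE SecurityID = :id
    UNION ALL
    SELECT 'bond'   FROM Bond   WHERE SecurityID = :id
    """,
    {"id": 2},
).fetchone()[0]
```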

> You could also put all securities in a single table with the union of
> all security attributes, all NULL, and apply the mother of all CHECK
> constraints to get them right.

Ridiculous. Unnormalised. Unacceptable.

> Although I'm pretty sure SQL Server 7.0
> at the time didn't support CHECK.

Nah, CHECK has been standard for two decades.

CHECK that allows a subquery is fairly recent.

> We were pretty spartan with declared
> integrity constraint checking in any case, for feasibility reasons.
> Bulk loads were vetted for correctness before and sanity-checked
> afterward, and interactive updates were required to use stored
> procedures (some of which used temporary tables as implicit
> parameters). But I wouldn't use a single table, even today;
> attributes with an optional relationship to the key usually should be in
> separate tables.

Yes. Optional or different relationship to the Key. Real FDs again.

> Aside: some years later, I looked into an incident of the kind that had
> by then acquired a name: "SQL injection". Of course I soon discovered
> they weren't using stored procedures, and had granted (possibly
> administrative) access rights to little Bobby Tables via the
> webserver. We have an entire industry built on application-developer
> paranoia based on willful ignorance of How to Use the DBMS. I
> understand the tech-politics that motivate the situation, but not why
> management abdicates its stewardship of the firm's data. [end aside]
>
> Now that I've described the design, I want to take you to task.

Be my guest.

> You've posted a few colorful rants about the nonexistence of
> bona fide circular references in the real world, and therefore in
> database designs.

Yes, and that should be propagated far and wide, if only to counter the pig-poop being propagated.

> ISTM you overlook mutual dependence.

What in God's name is a "mutual dependence" ??? Sounds like the thing homosexuals do, before one of them slaughters the other.

> There are
> reasons to have tables with 1:1 cardinality with both components
> mandatory. Sometimes the reason is physical, because one part is
> updated or referenced or replicated more than the other. Others are
> logical, as with this securities design. There's perfectly nothing
> wrong with it relationally, and yet the semantics cannot be enforced
> with standard SQL.

I have had the challenge (give me an example of a "very very necessary" circular reference and I will give you the Normalised, Relational solution without it) out for over a decade, and I have proved it at least twenty times in that decade. The issue is often tangled up with misunderstanding of what normalisation is; with implementing a set of rules "because the business said so". Which I didn't believe is the case with you, but now that we have evidence of many non-relational things that you do implement, it may well be.

One thing I will declare, that you yourself have provided evidence that you do understand, circular references are simply, flatly illegal in the RM.

As per the challenge, I need a real world example, that can be worked, not a theoretical discussion, which is endless. So please give me that, or confirm that (eg.) the Security model you have given is a valid example of a circular reference or "mutual dependence", and I will work that and close it.

> You have wickedly mocked the need for deferred constraint enforcement

(Henceforth DCC for short.)

Yes.

But whoa, wicked is the realm of the devil. The devil is the one who preaches filth, anti-RM, falsities, "deferred constraint checking is very very very necessary".

I don't preach. I don't go looking for it, I just walk around destroying the filth wherever I see it. Therefore it is not I who is "wicked", the one who does good cannot be "wicked". Except when it is the devil speaking.

I have no problem mocking the devil, and his minions, the curse of humanity. Destroy is probably more accurate.

> -- constraints that apply to the transaction, not to the statement --

Yeah, I know all that, I spent three years doing hard labour at the TTM slave labour camp.

There, in that one sentence (ok, two clauses), is the entire falsity stated. If you do not understand that, you are already seduced, a victim, of the pig-poop-eaters, and you are henceforth eating the pig-poop they feed their slaves.

For context, you understand that most of the argument you had against ERwin is dishonest, because as you stated, you cannot fault ERwin for NOT being something that it is NOT declared to be. Good. You understand that most of the argument you had against SQL is dishonest, because as you stated re ERwin, you cannot fault SQL for NOT being something that it is NOT declared to be. Good. Ok, so that is the context for this one.

Now for the specifics. See if you understand and agree with each of these statements, in the sequence given. That will result in a set of agreed facts, and leave a small number to be dealt with, resolved.

1. ACID Transactions are an imperative for OLTP. It has existed in software since 1960, CICS/TCP.
2. Database Consistency (the [C] in ACID) is statement level.
3. DBMS with full ACID transactions existed for a decade before the RM, for two decades before a RDBMS platform arrived
4. SQL Standardised the RDBMS market.
5. SQL in its entry-level compliance requirement, in its first edition (89?) demands ACID transactions. (The RDBMS vendors provided it long before the SQL Standard demanded it, from 1984 in my experience, but it may be earlier.)
5.1 Constraints are applied at the statement level, same as ACID.
6. [2] is now confirmed for sixty five years, in a variety of platforms.
7. Those who follow the RM and ACID, do not need either circular references, or DCC.
8. We do not need "convincing" of the need, the need is for pathetic non-technical people who are struggling in some technical position.
9. It is for the people who are "convinced" of their very very necessary need, to learn from the people who do not have the need, who have enjoyed the simplicity for sixty five years, what they do, and how they do it.
10. Imbeciles such as C J Date (aka TweedleDee) and Hugh Darwen (aka Andrew Warden aka TweedleDumb), who have consistently demonstrated that they have no technical ability whatsoever, suggest that there might be another way. After Dropping ACID.
11. After violating the RM so horribly, that they end up with circular references.
12. After misunderstanding SQL so terribly, that they end up with circular references.
13. Technical, scientific people [8] couldn't care less.
14. Non-technical or unscientific people, their slaves, follow their non-logic, without question.

So it is a gross contradiction, a disgusting lie, to state "constraints that apply to the transaction", coz there ain't no such thing, sonny boy. Never has been. You EITHER have transactions, which maintain and support constraints (the [C]), which are statement level only, XOR you have strings of SQL that run all over the neighbourhood without being able to find their home.

If those poop-eaters had half a teaspoon of honesty, they would define a new model, and it would be complete, "constraints" that apply to a "transaction, not the statement". As you know, they have produced none such, neither model, nor language, definitively, for twenty years, and they are still arguing about the meaning of TYPE, for the third time. It is all flimsy descriptions that keep changing.

But they are proud sons of the devil, their intent is not to create something, but to destroy something good. So they redefine the universe, for every definition, they have some pathetic exceptional case (like special needs kids who can't wipe their backsides), and for these special needs, instead of fixing their problem, where it exists, they demand the whole universe has to change, so that the whole universe has their problem. There, see, they are special needs retards any more.

Database design is for undamaged humans. Imbeciles have a hard time.

Learn how to normalise data, Relationally, such that you do not have circular references, so that you don't feel "deprived" when your vendor doesn't supply DCC. We live in that "deprivation" and love it.
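
As a small illustration of working without DCC, here is a sketch (SQLite via Python, standing in for a real platform; table names, column names and the function add_equity are hypothetical). The foreign key is checked immediately, at each statement. When the rows go in parent first, child second, inside one transaction, nothing needs to be deferred. When the order is reversed, the offending statement is rejected on the spot and the transaction is rolled back. The sketch shows only that ordering; it does not address the "there must be two rows" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Security (SecurityID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Equity (
    SecurityID    INTEGER PRIMARY KEY REFERENCES Security (SecurityID),  -- immediate, not DEFERRABLE
    SharesOnIssue INTEGER NOT NULL
);
""")


def add_equity(security_id, name, shares, parent_first=True):
    """Insert basetype and subtype rows as one transaction; True if committed."""
    steps = [
        ("INSERT INTO Security VALUES (?, ?)", (security_id, name)),
        ("INSERT INTO Equity VALUES (?, ?)", (security_id, shares)),
    ]
    if not parent_first:
        steps.reverse()
    try:
        with conn:  # commits on success, rolls back on any exception
            for sql, params in steps:
                conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


committed = add_equity(1, "Acme Ord", 1000)                    # parent, then child
rejected = add_equity(2, "Beta Ord", 500, parent_first=False)  # child first fails at that statement
rows_for_2 = conn.execute(
    "SELECT COUNT(*) FROM Security WHERE SecurityID = 2"
).fetchone()[0]
```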

Anyway, just give me a real world example, or confirm the Security one, and I will resolve it for you.

> on the theory that all such mutuality is illusory.

Not just illusory, simply, you have unfinished, incomplete definition. Don't code a single line (that includes DDL) until the definition has been completed.

> But you are in a
> trap: if it's OK to have two non-null attributes on one row (1:1
> mandatory cardinality), why is it not OK to have the same relationship
> between two attributes on two rows? And if 2 tables can be related
> that way, why not 3 or more?

You are talking nonsense. Since when does a column that is functionally dependent on a key have "cardinality" with the key or "mutual dependence" on it. The notion is ridiculous.

And then you take that ridiculous, false, special-needs notion, and you say you can't apply that "relationship" to another context. Good. The notion was absurd the first time, and it remains absurd the second time.

The trap is yours, of your own making.

And just in case that special-needs notion comes from some abstraction that you guys love, that we real-universe types have no knowledge of, then be advised that you have abstracted yourself into a state of ridicule, and into a trap of your own making.

Since I am ultra-legal, since I comply with the RM and SQL, I don't have the special needs; the circular references, I don't need DCC, and the trap doesn't work on me.

----

I note that you and Nicola and others are discussing what "Derek claims". Unfortunately, on this thread. I read your posts, but I ignore some others. Please be advised that I am quite able to express myself, by myself, I don't need help from proven imbeciles (I am not talking about you). What they say, and what you participate in when you reply them, has nothing to do with what I have said, let alone what I meant.

I say this in order to ensure that you don't get mixed up when you are replying to my posts, ie. you do not think I stated something that I did not, that some imbecile stated that I stated, etc. The confusion around FDs is bad enough as it is, we do not need more confusion based on "he saids" and "she saids".

Yes, imbeciles don't have the slightest notion of internet etiquette. And they squeal like piglets who have lost their sow when their imbecility is exposed. Special needs indeed.

Cheers
Derek

Derek Asirvadem

Feb 18, 2015, 3:23:59 AM2/18/15
to
James
First I will emphasise two points, and then I have one very very very important additional point that underlies the *entire set of problems* that you experience.

Trapped

It is interesting that you use that word. By virtue of the evidence, it is you who is trapped. In 2015 (or 1999 or whatever), you have a fairly straight-forward little database with subtypes that were given to us in 1970. But you have no integrity in those structures. It isn't a database at all, it is not Relational for other reasons such as circular references, and lack of complete Normalisation. But the integrity is by far the most important. Why are you in that situation ? Who placed you there ?

And you are perturbed, it always bothered you. When the chance comes up, you, a man of your stature, are reduced to asking a dumb implementer, how to implement the structure with integrity.

I have hundreds of the same. Full Integrity. Full Relational. Full Normalisation. Zero circular references. Why am I in this situation, who placed me here ? The incomparable, the great, the father of the RM, and the grandfather of everything we touch, Dr Edgar Frank Codd.

Pity you don't listen to him. Pity you listen to monkeys instead. They will trap you, and they do so with such trickery, such fraud, that you will believe you aren't trapped, that the entire world of implementers who don't have your problem, are trapped.

Why are Theoreticians in this Space Crippled, Trapped

One clear and present reason is that which is set out above. My whole post, as well as the emphasised point re Trapped.

You guys have apes for teachers.

Denial of the Hierarchy

This is a huge point, and a huge problem, the cause of an endless litany of problems affecting the entire human world. I do not mean to suggest that it started with the imbecile TweedleDee or his bedfellow TweedleDumb. It starts with the devil, and his minions, destroying the hierarchy that exists in nature, in God's Laws, in human laws. In the last century and a bit the devil's children have escalated their war on humanity, and the vehicle they use, no surprise, is the destruction of the hierarchy. Deicide; regicide; genocide; the destruction of the family. The new age: freedom; equality; fraternity. Liberalism, communism, darwinism (darwenism), modernism.

The point I am making with that history, is that it permeates everything, it is in every show that you watch on the propaganda machine in the living room. You have been trained to love it because it is FREEDOM, EQUALITY. Anyone against it must be mad. Er, and you have no idea that it is slavery, and slavery to the devil.

The point is that it is planted deep, and watered daily, in every aspect of life. It is psychological. So when I come along with the simplest of medicines, you rebel, and violently, because somewhere in your unconscious mind, you realise it is not just the circular reference that I am rejecting, correcting, that the whole deck of cards is threatened, vulnerable.

----

Everything that works, works *because* it honours the hierarchy.

Everything that breaks, breaks *because* it denies the hierarchy.

Everything. I will, however, limit the rest of the discussion to our field.

Deadlock

I have never written a deadlock in my entire life. Ever since 1979, as an employee for the DBMS vendor, through to now, I have cleared deadlocks that customers have written, millions of instances in the code (not occurrences, which would be billions or trillions). The method is Normalisation of Transactions, honouring the hierarchy, although it does not have that name. The fault, was always the same, ignorance of the hierarchy. I learned the method in 1976 from the CICS/TCP boys, and it can be applied anywhere, to any software.
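
The method is not spelled out here, so the following is only a sketch of one common reading, stated as an assumption: every transaction takes its resources in the same fixed order, parent before child, whatever order the caller asks for them in. Python threads and locks stand in for transactions and tables; the names HIERARCHY, LOCKS and run_transaction are hypothetical.

```python
import threading

# Hypothetical resources, ranked by position in the hierarchy (parent before child).
HIERARCHY = ["Security", "Equity", "Trade"]
LOCKS = {name: threading.Lock() for name in HIERARCHY}


def run_transaction(tables, work):
    """Acquire locks in hierarchy order, whatever order the caller listed them in."""
    ordered = sorted(set(tables), key=HIERARCHY.index)
    for name in ordered:
        LOCKS[name].acquire()
    try:
        work(ordered)
    finally:
        for name in reversed(ordered):
            LOCKS[name].release()


log = []
# The two callers name the same resources in opposite orders.
t1 = threading.Thread(target=run_transaction, args=(["Security", "Trade"], log.append))
t2 = threading.Thread(target=run_transaction, args=(["Trade", "Security"], log.append))
for t in (t1, t2):
    t.start()
for t in (t1, t2):
    t.join(timeout=5)
finished = not (t1.is_alive() or t2.is_alive())
```

Because both threads end up requesting Security before Trade, neither can hold one lock while waiting for the other in the opposite order, which is the circular wait a deadlock requires.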

>>>>
Normalisation

As in the science, the whole principle, applicable in every area of the project, specifically NOT the rat-bag collection of NF fragments that you guys are aware of, plus all such fragments that you may collect in the next 100 years, combined.

Normalisation is inseparably bound to the hierarchy. Normalisation minus the hierarchy is an empty shell (better than your isolated fragments, but a fragment nonetheless). As clearly stated by Codd in his RM, in several places. That last quote, specifically re resolving circular references, normalising the trees.

Normalisation is integration with the hierarchy. Normalisation whilst ignoring or rejecting the hierarchy is, well, disintegration.
<<<<

Circular Reference

Codd certainly understood the tree, the DAG, the Hierarchy. Which is why he nominates Normalisation as the method for resolving circular references. Codd honours the hierarchy, and gives specific instructions on how to deal with each aspect of it in his Relational Model. Separate to "the HM is fundamental to the RM", which is treated in the other thread. Since I understand Normalisation, and I understand Codd, I have never *needed* a circular reference.

Let's look at the corollary. What you call a "bona fide need for mutual dependence".

__ Mutual dependence:
____ occurs only if you reject the hierarchy
____ occurs only if you believe in Kant, equality, and all that buffoonery, most recently showing its face as homosexuality and destruction of the family
____ occurs only if you ignore Codd's laws re RM, re Hierarchies
______ (Functional Dependence is a one-way street; the attribute is not equal to the key)

__ Need:
____ occurs only if you reject the hierarchy
____ occurs only if you ignore Codd's laws re RM, re Hierarchies

__ Bona Fide:
____ occurs only if you rebel against authority, which is a form of rejecting the hierarchy
____ occurs only if you rebel against authority, Codd's laws re RM, re Hierarchies

__ DCC
____ Institutionalised rebellion against authority.

Rejected on all points.

The cure, the correction, the fix-up, complete and total release from the problem: honour the hierarchy. Follow Codd's laws re the RM, Normalise your references. Follow engineering principles re the unequal Data vs Process.

Cheers
Derek

Derek Asirvadem

Feb 23, 2015, 5:33:41 AM2/23/15
to
James

> On Tuesday, 17 February 2015 22:53:24 UTC+11, Derek Asirvadem wrote:
> > On Monday, 16 February 2015 12:08:01 UTC+11, James K. Lowden wrote:
>
> > The Securities table has {ID, Type, Name, Active} ...
> > Type is a discriminator, one of {'equity', 'bond', 'swap', etc.} ...
> > We called this a supertype for ERwin purposes ...
> > Equities, Bonds, Swaps, etc. We called this a subtype in ERwin ...

> > You could eliminate the Type
> > column, and require each subtype have a FK relationship to Securities.
> > But that would not require a Securities row to have a related subtype,
> > and would not prevent one Securities row from being referred to by more
> > than one subtype table (e.g., "being" both Equity and Bond).

What ?


<big snip>

Over the weekend, it became clear to me that your post re Securities was really confused, and my responses (directed at your points) did not address the real problem.

The real problem is two-fold. First, and here is the issue re the different "definitions" that theoreticians use, raising its filthy head, and biting us both on the bum. When you used the terms "supertype" and "subtype", I thought you meant the established industry terms supertype and subtype. Upon reflection, you do not know what those terms mean, precisely. (You may well have a skewed and fragmented understanding of them, from your teachers and their putrescent books.)

That understanding of Subtypes, whatever it is, it is wrong, wrong, wrong. Twisted, crippled, brain-damaged, like the ones who teach it.

More important, the result of that marvellous "teaching" is, that once again as evidenced here, it cripples a perfectly competent implementer such as you, and prevents you from
a. understanding the data, Relationally, and
b. implementing it Relationally.
Prevents you from implementing a simple Subtype cluster (which we have had since 1970) with full integrity and control in a Relational database (which we have had since 1984). The result is you have a non-relational Record Filing System, with no Referential Integrity. That is their success, to cripple humans in their normal thought processes.

So please throw all that broken, fragmented, information re Subtypes out, and let's start again.

Not knowing how to implement Subtypes, is very much the third issue.

==================
Security Cluster
==================

Before we discuss the Security cluster, we need to get clear on what Subtypes are, ala the real universe (Codd, the RM, IDEF1X, the implementers).

=========
Subtypes
=========

Before we discuss Subtypes, we need to get clear on what relationships are, ala the real universe (Codd, the RM, IDEF1X, the implementers). This is especially important because the maggot-ridden ones who transform humans into schizophrenics have destroyed the definition of this rather basic building block.

I have cut-pasted different sections from course notes for you (Relationship write-up, then Subtype write-up), sorry if it doesn't flow.

http://www.softwaregems.com.au/Documents/Article/Database/Relational%20Model/Subtype.pdf

There are links that will take you to example code segments.

When you are done with that, please ask any and all questions, re those definitive elements, before diving into ...

----------------------
Security Cluster
----------------------

The Security cluster you have is a classic Exclusive Subtype cluster
- for human understanding purposes
- for Relational Data Modelling purposes
--- Logically
--- Physically
- for "ERwin purposes"

Currently it has no Referential Integrity, and all the Files are independent.

Here is what it should be:
http://www.softwaregems.com.au/Documents/Article/Normalisation/Security%20DM.pdf

Fully Relational, full integrity.

No "superkeys", no "distwibuted keys", no double indices that do nothing.
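For readers who cannot open the linked data model, here is a minimal sketch of one way an Exclusive Subtype cluster can be declared. It is not taken from the PDF: the column names (SecurityId, Exchange, Coupon), the use of Python's sqlite3 module, and the choice of a trigger to check the discriminator are all assumptions made for illustration. Each subtype's Primary Key is also its Foreign Key to Security, and a row is accepted in a subtype only when Security.Type names that subtype.

```python
import sqlite3

# Hypothetical schema for illustration only; names are not from the linked model.
SCHEMA = """
CREATE TABLE Security (
    SecurityId INTEGER PRIMARY KEY,
    Type       TEXT NOT NULL CHECK (Type IN ('equity', 'bond')),  -- discriminator
    Name       TEXT NOT NULL
);
-- Each subtype's Primary Key is also its Foreign Key to the supertype.
CREATE TABLE Equity (
    SecurityId INTEGER PRIMARY KEY REFERENCES Security (SecurityId),
    Exchange   TEXT NOT NULL
);
CREATE TABLE Bond (
    SecurityId INTEGER PRIMARY KEY REFERENCES Security (SecurityId),
    Coupon     REAL NOT NULL
);
-- Exclusivity: reject a subtype row unless the discriminator names this subtype.
CREATE TRIGGER Equity_Type_ck BEFORE INSERT ON Equity
BEGIN
    SELECT RAISE(ABORT, 'Security is not an equity')
    WHERE (SELECT Type FROM Security WHERE SecurityId = NEW.SecurityId)
          IS NOT 'equity';
END;
CREATE TRIGGER Bond_Type_ck BEFORE INSERT ON Bond
BEGIN
    SELECT RAISE(ABORT, 'Security is not a bond')
    WHERE (SELECT Type FROM Security WHERE SecurityId = NEW.SecurityId)
          IS NOT 'bond';
END;
"""

def build_cluster():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params=()):
    """Run one INSERT; return True if accepted, False if the database rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False

conn = build_cluster()
conn.execute("INSERT INTO Security VALUES (1, 'equity', 'ACME Ord')")
print(try_insert(conn, "INSERT INTO Equity VALUES (?, ?)", (1, "NYSE")))  # True
print(try_insert(conn, "INSERT INTO Bond VALUES (?, ?)", (1, 5.0)))       # False
```

The sketch leaves two things out: it does not stop a later UPDATE of Security.Type, and it does not require every Security row to have its subtype row. The second would need the supertype and subtype inserts to be wrapped in one transaction.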

======
Cause
======

I hate to harp on it, but this one must not go without specific reference.

> the maggot-ridden ones who transform humans into schizophrenics have destroyed the definition of this rather basic building block.

I am not kidding. Hugh Darwen, TweedleDumb, the grand master of schizophrenics himself, teaches various falsities and absurdities at the asylum for indoctrination at Warwick (which used to be a place of higher education). In his CS253 class, in the How To Handle Missing Information Without Using Nulls document (available online, and at the TTM site), he actually:

- teaches that exclusive subtypes are ordinary fare, without ever using the technical term, and with stupefying ignorance

- without teaching how Exclusive Subtypes are modelled, or implemented in the RM, or in SQL

- teaches a non-relational, Record Filing System method for the cluster

- includes some (not all) constraints

- teaches, in addition to the first set of constraints, an implementation of those constraints backwards

- teaches circular references

- teaches the record-breaking "distributed keys" (p13), which he says is "loose", no doubt like his bowels, and

- does not provide the "constraint" that he says he is trying to implement, not to mention the additional indices, etc, etc

- teaches an insane method (schizophrenic, covering up for his schizophrenia by doubling up on the "constraints") to implement the rather simple Exclusive Subtypes

- then cries about the various problems of his abortion

- falsely claims the problem is SQL (Straw Man)

Now, in your Security implementation briefing, you did not mention that level of insanity, but you did provide me with some of it. And we know where you got it from. While TweedleDumb's document constitutes a veritable cesspool of knowledge, the Exclusive Subtype anti-implementation being a central part, I am not addressing the whole of it here, or even the central part, but I do want to point out one thing. As a matter of course, this cancer-causing agent teaches that a relationship is established by implementing TWO Foreign Keys, one each on both "sides". Which of course, causes a circular reference quite unnecessarily; contradicts normal human logic; as well as SQL; as well as breaching the Relational Model.

Each and every time. Teaches humans to be as retarded as he is. The state of "higher education" today.

He is setting the stage such that everyone who follows him has a circular reference in every relationship, and thus has a demand for "deferred constraint checking". Making sure everyone has cancer, and needs his "remedy".
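The mechanics behind that point can be sketched as follows. This is an illustration under stated assumptions, not code from any of the documents discussed: the Parent and Child tables are hypothetical, and Python's sqlite3 module is used only because it runs anywhere. With a Foreign Key on both sides, neither row can be inserted first under immediate checking, so the pair loads only if the constraints are deferred to COMMIT. With a single Foreign Key from Child to Parent, immediate checking is enough.

```python
import sqlite3

def mutual_fk_demo(deferred):
    """Two tables that each reference the other (a circular reference).

    Returns True if a Parent/Child pair can be inserted and committed.
    """
    clause = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE Parent (ParentId INTEGER PRIMARY KEY, "
        f"ChildId INTEGER NOT NULL REFERENCES Child (ChildId) {clause})"
    )
    conn.execute(
        "CREATE TABLE Child (ChildId INTEGER PRIMARY KEY, "
        f"ParentId INTEGER NOT NULL REFERENCES Parent (ParentId) {clause})"
    )
    try:
        conn.execute("BEGIN")
        conn.execute("INSERT INTO Parent VALUES (1, 10)")  # Child 10 does not exist yet
        conn.execute("INSERT INTO Child VALUES (10, 1)")
        conn.execute("COMMIT")  # deferred constraints are checked here
        return True
    except sqlite3.IntegrityError:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        return False

def one_way_fk_demo():
    """A single Foreign Key, Child to Parent, checked immediately."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE Parent (ParentId INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE Child (ChildId INTEGER PRIMARY KEY, "
        "ParentId INTEGER NOT NULL REFERENCES Parent (ParentId))"
    )
    try:
        conn.execute("INSERT INTO Parent VALUES (1)")
        conn.execute("INSERT INTO Child VALUES (10, 1)")
        return True
    except sqlite3.IntegrityError:
        return False

print(mutual_fk_demo(deferred=False))  # False: the first insert is rejected
print(mutual_fk_demo(deferred=True))   # True: accepted only with deferred checking
print(one_way_fk_demo())               # True: no deferral needed
```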

> > Wicked

Yes. Evil. Cursed by God. Crying out to heaven for vengeance.

I don't know what your religion is, but we are required to do justice where there is iniquity. And we are not required to be nice to maggots that are feeding on us.

Cheers
Derek

Nicola

Feb 23, 2015, 12:27:53 PM
In article <20150215200758.4...@speakeasy.net>,
"James K. Lowden" <jklo...@speakeasy.net> wrote:

> The rule is that a security is represented by a pair of rows, one in
> Securities and one in the subtype table. There *must* be two rows,
> and only two rows, and the subtype table must be the one indicated by
> Securities.Type.

I like to think of SQL's insert, update and delete as the (quite
powerful) assembly-level instructions of a data language. If you can
design a model in which a single insert, a single update or a single
delete never causes any integrity violation, well, go with them. But in
many (most?) cases, a well-designed database requires modifying more
than one fact at a time to preserve consistency. In practice, that means
that single inserts/updates/deletes are too low-level wrt the semantics
of the database. Hence, their execution must be forbidden (by revoking
the corresponding privileges in SQL from the users accessing the
database), and more complex "primitives" must be defined. These should
be given along with the logical schema and implemented as user-defined
functions (assuming that such functions are run atomically), whose
execution is granted to the above-mentioned users instead of the raw SQL
instructions.

If the *only* modification permitted on R(A,B) and S(A,C) is through a
function update(a,b,c) that inserts (a,b) in R and (a,c) in S, you don't
need any explicit foreign key clause in your SQL tables for the purpose
of preserving referential integrity.
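A minimal sketch of that idea, assuming Python's sqlite3 module and the R(A,B) and S(A,C) tables named above. The function name update comes from the paragraph; everything else is hypothetical. SQLite has no GRANT or REVOKE, so the "only modification permitted" condition is not enforced here and has to be taken as an assumption about how callers behave.

```python
import sqlite3

def make_db():
    """Create R(A,B) and S(A,C) with no Foreign Key between them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE R (A INTEGER PRIMARY KEY, B TEXT NOT NULL);
        CREATE TABLE S (A INTEGER PRIMARY KEY, C TEXT NOT NULL);
    """)
    return conn

def update(conn, a, b, c):
    """Insert (a, b) into R and (a, c) into S as one atomic unit.

    The connection's context manager commits if both inserts succeed
    and rolls back both if either one raises.
    """
    with conn:
        conn.execute("INSERT INTO R (A, B) VALUES (?, ?)", (a, b))
        conn.execute("INSERT INTO S (A, C) VALUES (?, ?)", (a, c))

conn = make_db()
update(conn, 1, "b1", "c1")
try:
    update(conn, 2, "b2", None)  # the S insert violates NOT NULL
except sqlite3.IntegrityError:
    pass
print(conn.execute("SELECT A FROM R ORDER BY A").fetchall())  # [(1,)]
print(conn.execute("SELECT A FROM S ORDER BY A").fetchall())  # [(1,)]
```

The failed call leaves no row for A = 2 in either table, so R and S stay in step as long as every write goes through update. Any write that bypasses the function is unchecked, which is the trade-off against a declared Foreign Key.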

Derek Asirvadem

Feb 23, 2015, 9:59:46 PM
Dear people

OK, for context we have:

-- 1 --------
> On Monday, 9 February 2015 10:59:35 UTC+11, James K. Lowden wrote:

In which James appeared to address the inability of the theoreticians in Normalising the given 38 attributes and 9 suggested tables (two of which are complete, so it is really 38 and 7), inability to produce anything of any kind.

And my reply:

-- 2 --------

> On Tue, 10 Feb 2015 23:06:25 -0800 (PST) Derek Asirvadem <derek.a...@gmail.com> wrote:

In which I answered James' points, and provided more detail re the 38 attributes. The floor has been open since then. I have received no further questions. It appears you need no further information.

-- 3 --------
Further, taking up Nicola's challenge, I Normalised a previously unknown set of data that was alleged to be "Hard", one that justified a theoretical paper, and solved the entire problem, using Codd's 3NF/FDs. In so doing, I proved that:
*** "Hard" Key Determination Method is Easy. DNF Paper is Done. ***
and
*** Relationalisation Eliminates Theory ***

> On Monday, 9 February 2015 17:54:40 UTC+11, Derek Asirvadem wrote:

-- 4 --------
Finally, I provided full and complete info (as per points raised by James), and set the context for the next iteration. You were free to use either the normal Key Determination method using Codd's 3NF/FD, or your theoretical non-FD-fragments plus the 17 NF fragments (that in total come up to a fraction of Codd's 3NF/FD).

Quoted here in full:
I have received no submissions. Twenty two days since the task was initially tabled, fifteen days since the current iteration commenced (with great detail and specific directions as above), and still no submissions.

This is proof that you guys cannot Normalise anything. Not by the normal Method. Not by your high-falutin theoretical method. Not by asking questions and getting them answered.

I was hoping that we could get to say, a set of semi-normalised tables, and we could inspect important issues, such as the difference between Relational databases and Record Filing Systems; why precisely, the former has more Integrity, power, and speed. But no. We didn't even get past square one.

=========================================================================
* Thus you have no right, no position, whatsoever, to tell people who can Normalise data, how to Normalise data.

* Thus you have no right, no position, whatsoever, to lift your nose, to feel superior in any way, to people who can Normalise.

* Regarding the very subject that you allege to be theoreticians in, you are not merely inferior, you are grossly incompetent, and completely impotent.
=========================================================================

If there is anything in the above that any of you can counter, with specific details, and evidence, I would like to hear it. Caterwauling, squealing, and diminishing comments should be kept to yourselves.

Cheers
Derek

Erwin

Feb 24, 2015, 3:24:38 AM
On Monday, 23 February 2015 18:27:53 UTC+1, Nicola wrote:

> In practice, that means
> that single inserts/updates/deletes are too low-level wrt the semantics
> of the database.

Unfortunately, it is the only level that certain people are capable of understanding.

Derek Asirvadem

Feb 25, 2015, 6:07:44 AM
I did say:
>>>> Squealing, and diminishing comments should be kept to yourselves.

Epic fail.

As Usual.

Umpteenth time.

I warned you, maggot, you post filth, there will be consequences. I will use your own words as evidence against you. Something you never learned to do: keep your snout shut, keep the pig poop inside, spilling it in public has consequences.