On Saturday, 28 December 2019 23:46:56 UTC+11, Lifepillar wrote:
> On 2019-12-24, Derek Ignatius Asirvadem <derek.a...@gmail.com
> > On Monday, 23 December 2019 05:10:14 UTC+11, Nicola wrote:
> >> Hi Derek,
> > Long time. Good to see you are active here.
> As you probably recall, we had a private exchange several months ago,
> which was abruptly interrupted when (from my point of view) you stopped
> replying to my messages. I have since tried to contact you privately
> a couple of times, but either you ignored my messages or they didn't
> reach you (spam filters?).
If you do not mind, let's keep private comms private.
I promise to get back to you on that channel in the first week of January.
> > In this thread (as opposed to the header thread (now linked!), I am
> > attempting to deal with the 1971 paper, because it is that one that
> > philipxy quoted from when he was sniping from under a rock.
> Can we please cite papers by title? I am not sure yet which one you are
> referring to.
Codd's 1971 [paper] "Further normalization of the data base relational model"
Let's call that, and nothing else, the //1971 paper//.
> Also, I have no idea who philipxy is and I have found no
> reference to anything he quoted in the post you linked to.
I have not given that yet. I was hoping for some introductory discussion (such as yours) before we launch into the specifics of his quote. I will give it in the next few days.
> As for surrogates, before arguing about their idiocy or foxiness, we
> should agree on the meaning of the term. For me, Codd's definition from
> his 1979's paper (Extending the Relational Model to Capture More
> Meaning) is good. In summary:
> A domain S is a *surrogate domain* for an entity set E iff:
> 1. there is a (time-independent) bijection between S and E;
> 2. the values of S are generated by the system;
> 3. the users may cause the system to generate or delete a surrogate, but
> they have no control over its value (in particular, they cannot
> update it);
> 4. two surrogates *across the entire database* are equal iff they denote
> the same entity in the perceived world of entities.
> Property 4 is the most important, and what distinguishes a surrogate
> from a pure record id. Incidentally, I seldom, if ever, see it mentioned
> in any discussion about surrogates.
> Since surrogates bear no meaning other than being "permanent unique
> identifiers" for entities, Codd suggests that a "coalescing command"
> must be introduced to enable a user to assert that two entities that are
> considered distinct by the system (because they have distinct
> surrogates) are, in fact, one and the same. This is one of the
> difficulties in dealing with surrogates. Codd also correctly points out
> that the introduction of surrogates "does not make user-controlled keys
> obsolete". Whether surrogates should be introduced as primary keys (as
> Codd proposes) or as additional keys may be debated.
(The Impaler is going to love that. That Codd plagiarised his idea forty years before he hatched it.)
First, that is an uncommon definition for surrogates. Second, Codd's definition in the 1971 is the common understanding of the term: table-scope.
To avoid confusion, let's call this one Universal Surrogate. And note that it is rarely implemented.
To differentiate, since the common surrogate (Record ID) is table-scope, which does not work for many reasons (separate to the fact that it is not Relational), and is forever being "enhanced" and fixed" and "oooh this'll do it-ed" , and such fix-ups have included UUIDs; GUIDs; druids; etc, such implementations (provided as platform level), are not Codd's Universal Surrogate, because it does not have that intent. Nor the knowledge upon which to have that intent.
Second, because the intent of the common surrogate it to provide a container for a record (not a logical row, made up of domains)
Third, because the Universal Surrogate (implied but direct) is not intended for the common intent, it maintains relational access to the domains, and thus the logical rows.
Nevertheless, it remains a Relational Breach, because, per the /RM/, it:
- is not of the Normal Form in /RM § 1. Normal Form, Fig 3(b) Normalised Set/
(the obvious name for which is Relational Normal Form)
- breaks the Access Path Independence Rule
- consequently all tables below the breach (descendant rows) are logically cut off from all tables above the breach (ancestor rows).
> My position is:
> - "record IDs", as found in many open-source databases out there where
> each table has an 'id' field with unique values within the table, make
> no sense in the Relational Model or in any database implementation and
> they don't need to be discussed further.
> - Surrogates, as defined above, should never be necessary in any logical
> model (but see my next post). They are useful in some cases to optimize
> a physical design. For example, the primary key of an Address in
> Italy, simplifying a bit, is (Region, Province, Town, Species,
> Toponym, Number, Exponent) , e.g., ('Lombardia', 'Milano', 'Segrate',
> 'piazza', 'Garibaldi', '23', 'A'). In this case, it may be convenient
> (although not necessary from a logical point of view) to introduce an
> artificial AddressID attribute and use that for foreign keys referring
> to the Address table.
But that act is a logical one. Intended for the physical. (Theory must have an implementation intent, otherwise it has not value, it is fantasy, the realm of the Date; Darwen; Fagin; et al Gulag). To say, as our darling "theoreticians do, that theory should have no concerns about the physical implementation is the tattoo of the asylum dweller.
Logical means "with a physical intent", anyting else if not logical. Pig poop.
As evidenced, Codd is both a pure theoretician and an applied theoretician. The motivation (intent) for him to come up with the Universal Surrogate was implementation concerns.
> - In SQL, I usually introduce surrogates as attributes under
> a `unique()` constraint, not as `primary key()`. So, my definition of
> Address would be something like this:
> create table Address (
> Region dom_region not null,
> Province dom_province not null,
> Town dom_town not null,
> Species dom_species not null,
> Toponym dom_toponym not null,
> Number dom_number not null,
> Exponent dom_exponent not null,
> AddressID dom_entity_id not null, -- System-generated
> primary key (Region, Province, Town, Species, Toponym, Number, Exponent),
> unique (AddressID)
You might have gotten an idea from something that has been floating around in the cesspool for a couple of decades ;p
Why the UNIQUE instead of PK ?
If the surrogate is to be migrated to child tables as an FK, then it must be the PK in the parent. It is a gross lie (to one's own intellect, due to the marketed pig poop) to name one domain (or set of domains) a PK, and then proceed to use some other domain as the PK.
PK is a Relational concept. If your db is Relational, use the Relational terms, and use them correctly.
Note that if you use a surrogate for that purpose (in substitution for a wide Key), it is not the same as a Record ID (pointer to a record that contains a non-Relational arbitrary collection of fields [not domains, coz they ain't Relational und der normalisation ist kaputt] ). Whereas yours is thoughtfully chosen, over domains. And of course, the usage (in child tables below the breach) is different. The cannibals are unaware that they have breached the /RM/. They declare the surrogate as the PK out of total ignorance.
A horse that has had its tendon cut, so as not to roam too far afield, is not the same as an earthworm, that can't roam more than 10 metres.
The freaks' use of surrogates was so bad, that eventually the high-end SQL providers allowed an FK to be defined [in the child] referencing a UNIQUE key in the parent. Eventually that was included in the Standard.
Separately, and this is a physical concern. If your SQL platform provider has a Clustered Index feature, that MUST be on the Logical, not physical, key. Because the Clustered Index was conceived, invented, and implemented for the composite, multi-domain Relational Key. Major advantages for data distribution; high concurrency; etc.
Oracle (not SQL compliant; no Server Architecture; no ACID Transactions) has a pathetic version called Index Organised Table.
Why system generated ?
The only thing worse that a surrogate (at the implementation level) is a surrogate that is system generated. I have spent the last thirty years replacing mickey mouse databases, and I simply do not allow anything system generated in TEST or PRODUCTION environments. Allowed only in DEVELOPMENT, and there only to pander to lazy developers. Twenty years ago, one Aussie bank made that a rule for PRODUCTION environments across the whole bank, excepting third-party packages, which suffer all the usual problems.
I am sure that I am not the only disciple of Codd who has had a few alternatives for a system generated value in the context of a large table and with OLTP concurrency concerns in his toolkit, but I am not permitted to discuss that here,
Beautiful use of Datatypes. If I may say so, I would not advise using logical-only terms in teh physical implementation. No user (including developers) would understand what Domain means, and therefore what /dom_/ intends. Once you understand that, you will understand that the prefix /dom_/ is not necessary.
I don't know what you intend by /dom_entity/.
I advise a private Datatype for each Key, including every component of a Key, and a small set of generic Datatypes for attributes. I would use:
Region Region not null,
Province Province not null,
Town Town not null,
Species Species not null,
Toponym Toponym not null,
Number AddressNumber not null,
Exponent Exponent not null,
AddressID AddressID not null, -- System-generated or not
AddressNumber because there would be a generic _Number; _Int; _IntTiny; etc. Whereas _Number may be NUMERIC(10), AddressNumber may be NUMERIC(6).