Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Question on Structuring Product Attributes

396 views
Skip to first unread message

Yadda

unread,
Oct 3, 2011, 8:54:52 AM10/3/11
to
Hello,

I have a question on how to efficiently implement a product database
table structure. I have products grouped into Categories which is the
highest level grouping by attribute. Products across Categories share
some attributes in some cases, e.g. Diameter and Color. I need to have
each attribute have a proper data type, e.g. Diameter is of type DECIMAL
and Color is of type CHARACTER. I will also have to have standard
stored queries to inquire on these products according to attribute(s).
Over time new product with new types of attributes may be added.

One way is to just define the attribute in the Product table and then
have a table to define what attributes are applicable to a Product
Category. How ever this table field could get quite large then.
However, I don't think breaking attributes out into separate attribute
tables is a good idea because of the stored query requirement and it
would require more structure changes over time.

I would be curious to read others' opinions on this design issue as it
must be very common.

Thank you
Alex

-CELKO-

unread,
Oct 4, 2011, 1:24:58 PM10/4/11
to
The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUVs
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.

Derek Asirvadem

unread,
Oct 10, 2011, 8:16:19 PM10/10/11
to
On Oct 5, 4:24 am, -CELKO- <jcelko...@earthlink.net> wrote:
> [snip]

Joe

If I may ask, is this method of implementing RI for Exclusive Subtypes
one of your original concepts (like the Nested Sets), or is it someone
else's or general practice or whatever ?

Regards
Derek

Roy Hann

unread,
Oct 11, 2011, 3:55:31 AM10/11/11
to
Surprisingly to me, Joe is advocating what I would have suggested. It
is entirely respectable, though it may not be all that common.

The common approach is to just cram everything into one table with lots
of nullable attributes to hide the conflation of fact types, then leave
the whole mess under the carpet so that the application code has to
figure out the real business model at run-time, every time.

Explicit support for distributed key constraints in SQL would make the
approach properly robust plus it would, in my opinion, have the
psychological benefit of pointing the way and comforting us that we're
doing it right.

--
Roy

Yadda

unread,
Oct 11, 2011, 9:35:33 AM10/11/11
to
on 10/4/11 12:24 PM -CELKO- said the following:


Much to contemplate here! Is there a reference on the web or book that
fleshes this concept out? Thanks much.

Derek Asirvadem

unread,
Oct 11, 2011, 6:44:45 PM10/11/11
to
Roy

To be clear, I am totally for:
- correct Normalisation, which results in Subtypes
- and certainly a hierarchy, if that is what the exercise produces in
any given situation.
- No Nullable columns, especially not FKs.
- simplicity, which is a classic result of the formal process.
- No "hiding", such that the model is plainly exposed in the catalogue

Credit for the concept or notion of Subtypes goes to Codd & Date; for
the theoretical modelling, to Codd and Chen; and for practical
modelling or implementation specifics, to R Brown.

I appreciate that SQL does not have formal support for Subtypes, as it
does for eg. Declarative Referential Integrity, Foreign Keys (and even
that was added years afterward). Which leads people to
implementations such as this.

This method does have a bit of processing, and of hiding that behind
Views, though.

My question was specifically regarding the method, of implementing RI
for Subtypes. Who deserves credit for this technique ? (I am not
suggesting plagiary here.) When I present this method to others, I
would like to give due credit the original author, as we do when we
discuss Joe's Nested Sets.


Yadda

> Is there a reference on the web or book that
> fleshes this concept out?  Thanks much.

(I assume you mean the concept of Subtypes, since Joe has provided the
subtype RI method in fair detail.)

Well, no. Anyone with a keyboard can "publish" these days. The web
is full of shallow information and misinformation; one has to dig deep
through that morass to find nuggets of truth, and to do that one has
to be able to discriminate between the two, which is not possible for
the seeker. Also, there are many related issues within the one
science; one aspect cannot be excised, held up, and inspected in
isolation from the other aspects (which is the common problem
afflicting any technical discussion on the web). All you need is a
good text book and some formal study. Or work with a mentor over a
period of time.

Subtypes have been with us since the 1980's, when the RM was first
available in product form, and I and many others have used them since
then.

Regards
Derek

-CELKO-

unread,
Oct 12, 2011, 11:20:51 AM10/12/11
to
While I love the flattery, I did not invent Nesteds Sets or this class
hierarchy model. I just make tehm popular and publish teh code to use
them.

Derek Asirvadem

unread,
Oct 12, 2011, 8:51:11 PM10/12/11
to

Joe

Thanks for the response.

What a relief.

The flattery is real, for your *other* work, and I had difficulty
accepting that the same person presented this work.

This "class hierarchy model" is one of the most idiotic things I have
seen in my 34 years in databases.

----------
1. The method of implementing RI for Subtypes. This:
----------

> CREATE TABLE Vehicles
>   UNIQUE (vin, vehicle_type),
>
> CREATE TABLE SUVs
>   UNIQUE (vin, vehicle_type),
>
> CREATE TABLE Sedans
>   UNIQUE (vin, vehicle_type),

a. basically doubles the number of indices in the relevant tables, and
that has a substantial overhead.

b. requires vehicle_type to be carried in the subtypes, which AFAIC is
completely redundant (in SUVs, we know all the rows are SUV, why carry
a column in which every value will be "SUV").

c. I am not entirely against concept of de-normalisation, but that
assumes prior normalisation, and has a specific purpose. Whereas this
is un-normalised: vehicle_type is 1::1 with vin, and belongs in
Vehicles, once.

d. it is indirect, and subverts the normal or direct method of
identifying the relation in the two subtypes, which would normally be:

  FOREIGN KEY (vin)
  REFERENCES Vehicles(vin)

e. introduces complexity and processing; views; instead-of triggers,
etc. I would rather the database remain open as possible, without
such complications for use. Particularly when such nonsense is
completely unnecessary.

> Now start hiding all this stuff in VIEWs immediately and add an
> INSTEAD OF trigger to those VIEWs.

f. Of course vin is unique, therefore adding anything to vin will be
unique. As I understand it, the three added UNIQUE constraints are
merely (or heavily) for the purpose of checking the Subtype is legal,
that it exists in the parent with the proper Discriminator value. In
that case, Why not use normal SQL capabilities (yes, we know there is
not DRI for this) which provide that particular form of RI, *without*:

- one additional redundant column in every subtype
- one additional redundant index in every table (subtypes plus
parent). Note the additional processing load
- complexity
- Views to "hide" the complexity
- Instead-Of triggers

/*
I saw this insanity (posted method) in a database last year, and
ripped it out, in part permitting the database to run about 20 times
(2000%) faster, with no loss of capability. I say "in part" because I
removed a couple of other tumours in the same exercise, therefore I
cannot say for certain exactly how much this tumour contributed to the
negative 2000%. The developer who implemented it (and fancied himself
"modeller" and "DBA") had been reading too much Ambler and Fowler;
(who are famously clueless re the RM or databases; perfect examples of
Maslow's Hammer). The company required a Relational Database, he
created a scratch pad for his app, not a database.

He is still coming to terms with the concept that in the Relational
Database world, we never need to "refactor". First the snake oil
salesmen sell you the cancer, then they sell you the relief. RDB
types do not buy either one.
*/


> I can continue to build a hierarchy like this ...

In which case the doubled indices; subversion; duplicated column; and
complexity will continue down the hierarchy. At each level of the
hierarchy, one more column needs to be duplicated, and it will always
contain one value: SUVs contain a column containing "SUV" only;
TwoDoors contain a column containing "SED" only, *and* a column
containing "2DR" only.

To be clear, I agree with:
- the concept of the hierarchy (the Subtypes may well have subtypes)
and
- the value of ensuring the integrity of Subtypes
I disagree with the implementation method.


----------
2. The case of Sedans and TwoDoor.
----------

(Staying with the posted method.) I do not believe it is a good idea
to mix vehicle_type {SED|SUV}, with sedan_type, they are two separate
Things. We already know it is a sedan because it is in the Sedans
table, likewise, every row in TwoDoor is a two-door sedan (not an
SUV). A hierarchy, which I agree with, would be:

CREATE TABLE Sedans (


vin CHAR(17) NOT NULL
PRIMARY KEY,

  sedan_type CHAR(3) DEFAULT '2DR' NOT NULL
      CHECK (sedan_type IN ('2DR', '4DR') ),
  FOREIGN KEY (vin)
  REFERENCES Vehicles(vin)
CONSTRAINT Sedan_Excl_ck
CHECK ...
)

CREATE TABLE TwoDoor (


vin CHAR(17) NOT NULL PRIMARY KEY,

  FOREIGN KEY (vin)
  REFERENCES Sedans(vin)
CONSTRAINT 2dr_Excl_ck
CHECK ...

As the the model progresses, VehicleType and SedanType will be
Normalised into separate Entities, and Vehicle and Sedan
(respectively) will have Foreign Key References to them.

----------
3. And finally:
----------

> If an entity doesn't have to be exclusively one subtype, you play with
> the root of the class hierarchy:

If the Subtypes are Non-Exclusive, simply exclude the CHECK
Constraint.

----------
4. Class Definitions
----------

Now that the db has first been modelled (separately, as data, using
rules for modelling data) the object definitions (which are separate
as process, and use the data/database) are pedestrian.

The posted method of implementing RI for Subtypes is completely and
utterly without merit in any "database" (this is c.d.t.). I would
respectfully suggest that you apply RDB fundamentals to the cute ideas
of others, such as "class hierarchy model", and test their validity,
before posting. Otherwise you run the risk of harming your
reputation. Due to your standing, your post, your making them
popular, lends a certain credibility to these otherwise in-credible
and absurd concepts. When they get punctured (any of the RDB types on
this forum can puncture them, I just happen to be the first in this
instance), your credibility suffers along with it, and that is not my
intent.

----------
5. Nested Sets
----------

> I did not invent Nesteds Sets

That is a relief as well. Now I can correct my colleagues, who
presented that hilarious concept to me as yours. Who do we credit for
providing us with such a source of laughter ?

Regards
Derek

-CELKO-

unread,
Oct 14, 2011, 9:52:16 PM10/14/11
to
>> The common approach is to just cram everything into one table with lots of nullable attributes to hide the conflation of fact types, then leave the whole mess under the carpet so that the application code has to figure out the real business model at run-time, every time <<

LOL! that is unfortunately true.

-CELKO-

unread,
Oct 14, 2011, 10:29:27 PM10/14/11
to
Now you are obligated to actually post better techniques and explain
why they are better..

I am not bothered by increasing the number of indexes. You have made
the assumption that references are done by more indexes. Sybase (nee
WATCOM) SQL implements the constraints like
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)

by building pointer chains under the covers, so each compound key
actually appears once in the schema. Teradata uses a hashing
algorithm. RDBMS companies have been working on PK-FK joins for a few
decades, so they are pretty good at it now.

What do you have against the Nested Sets? The only laugh I had with
them was teaching them to XML programmers. Traditional procedural
programmers still think in terms of fake pointer chains, recursive
traversals and other stiff they know from assembly language. But the
XML kids look at it and say "Oh, (lft, rgt) is the tag pairs!" then
wrote really great code by mimicking XQuery.

Derek Asirvadem

unread,
Oct 15, 2011, 2:03:37 AM10/15/11
to
On Oct 15, 1:29 pm, -CELKO- <jcelko...@earthlink.net> wrote:
> Now you are obligated to actually post better techniques and explain
> why they are better..

Ok. Happy to take up the challenge. I have already identified the
"feature" is already available in Relational databases; that it is a
CHECK Constraint, no need for duplicating columns and adding indices.

Do you want me to post the code ? Come on, you are the famous SQL guru
in the family, I am just a little old practitioner with a few happy
customers; your code will probably run circles around mine.

> I am not bothered by increasing the number of indexes.

Yes, that bothers me, especially adding *totally 100% redundant*
indices. That is probably partly why my customers are happy.

> Sybase (nee WATCOM)

Actually:
- Their baby DBMS is "SQL Anywhere", which is nee Watcom. It is an
excellent product for a limited market.

- Their flagship prodcut for any size box, is Adaptive Server
Enterprise. It is nee nothing, they designed and wrote it from
scratch. The *lineage* of the original (1984-1987) engineers is
easily traced back to Britton-Lee. And yes, they always had a few
Waterloo grads on staff. brings back memories.

> You have made
> the assumption that references are done by more indexes. SQL implements  the constraints like
>  FOREIGN KEY (vin, vehicle_type)
>   REFERENCES Vehicles(vin, vehicle_type)
>
> by building pointer chains under the covers, so each compound key
> actually appears once in the schema.

Excuse me, but:

a. You posted SQL. I answered SQL. In order to maintain focus, avoid
tangents, and close this subject like gentlemen, let's talk SQL
only.

b. SQL is a standard, anything that does not comply with the Standard
cannot make the claim to be an SQL. The SQL standard requires that
all references in the database are by *KEY*, and specifically *NOT* by
pointers.

b. I made no assumptions. The SQL code that *you provided*
specifically creates two indices:
>>CREATE TABLE SUVs
>>(vin CHAR(17) NOT NULL
>> PRIMARY KEY, -- Index on (vin) for SUV.PK
>> vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
>> CHECK(vehicle_type = 'SUV'),
>> UNIQUE (vin, vehicle_type), -- Dupe Index [1] on (vin, vehicle_type)
>> FOREIGN KEY (vin, vehicle_type) -- no suggestion that this FK is an additional index
>> REFERENCES Vehicles(vin, vehicle_type)
>> ON UPDATE CASCADE
>> ON DELETE CASCADE,
>> ..);

d. The SQL code that *you provided* specifically identifies data
columns, not pointers. Therefore, above the covers, there are two
indices, visible, with data columns, visible. No pointers. What a
product does under the covers is tangential.

e. The issue I have is above the covers:

e.1 Index[1] is a 110% duplicate of Index SUV.PK.

e.2 (vin) is unique, so (vin) plus anything will be unique. It has no
value (except to support this monstrous structure, which is not reqd
to ensure Exclusive Subtype CHECK).

e.3 There is also another, additional, 110% redundant index in the
parent, to provide the key for the REFERENCED key in the child/ Your
SQL code again:

>>CREATE TABLE Vehicles
>>(vin CHAR(17) NOT NULL
>> PRIMARY KEY, -- Index on (vin) for Vehicle.PK
>> vehicle_type CHAR(3) NOT NULL
>> CHECK(vehicle_type IN ('SUV', 'SED')),
>> UNIQUE (vin, vehicle_type), -- Dupe Index [2] on (vin, vehicle_type)
>> ..);

e.4

> so each compound key actually appears once in the schema.

No. It appears once for each type you declare a CONSTRAINT, not once
in the schema. SQL does not normalise the schema (wish that it
would!).

> Teradata uses a hashing algorithm.

Teradata is not SQL.

Sybase, MS, DB2, all create Standard SQL visible indices from data
columns, from the SQL command that you supplied.

Oracle is non-compliant, in that it twists the SQL definitions in
order to appear compliant. It famously uses ROWIDs, above and below
the covers, but the indices and data columns are visible, above the
covers.

> RDBMS companies have been working on PK-FK joins for a few
> decades, so they are pretty good at it now.

Agreed. But the Foreign *KEYS* that you declared in your SQL code are
still ordinary visible data columns, not pointers or pointer chains.

> What do you have against the Nested Sets?

There is a recent thread on that subject. I will take it up there.

Let's close this thread here.

Ambler and Fowler are famous for their "straw man" arguments,
suggesting that the RM or RDBMS can't do X, then proposing some
monstrous method of doing X. AFAIK, you do not do that. SQL, and any
RDBMS supplying SQL, already has everything required to ensure the
Referential Integrity of an Exclusive Subtype, with zero overhead. The
problem does not exist (we can ensure RI for Exclusive Subtypes),
therefore we do not need the "solution" of the additional structure.

Although you did not present the "straw man" intro to this article,
you did imply that Exclusive Subtypes cannot be verified in SQL (which
is not correct), and then promoted this structure. I do not
understand why, when there is a perfectly simple, zero-overhead
Exclusive Subtype CHECK Constraint, you Mr Celko, the famous SQL guru,
would be promoting this non-SQL "class hierarchy" structure, which is
massively inefficient, due to the number of totally redundant columns
and redundant indices it requires.

Response please.

Bob Badour

unread,
Oct 15, 2011, 3:37:52 AM10/15/11
to
Derek Asirvadem wrote:

> On Oct 15, 1:29 pm, -CELKO- <jcelko...@earthlink.net> wrote:
>
>>Now you are obligated to actually post better techniques and explain
>>why they are better..
>
> Ok. Happy to take up the challenge. I have already identified the
> "feature" is already available in Relational databases; that it is a
> CHECK Constraint, no need for duplicating columns and adding indices.
>
> Do you want me to post the code ? Come on, you are the famous SQL guru
> in the family, I am just a little old practitioner with a few happy
> customers; your code will probably run circles around mine.

Don't count on it. Joe's an idiot. He mostly got a name for himself by
copying and pasting large random blocks of text in response to his betters.

Derek Asirvadem

unread,
Oct 15, 2011, 5:03:10 AM10/15/11
to
Ok, on re-reading my post, I have come to realise the obvious, that if
Joe had known how to ensure RI for Exclusive Subtypes, he would not
have posted this "class hierarchy" monstrosity, with it 220% redundant
Indices; additional processing; Views; Instead-Of triggers; etc.
Additionally, he has issued a challenge, which is better answered
directly. So here goes.

------------------------------------------------
-- Referential Integrity of Exclusive Subtype --
------------------------------------------------

0. Attribution.

I am the original author, and you saw it here first, folks. As in,
tiny nugget of value that it is, it is my original conception and
work, I have been using this and deploying it for years, and I have
not seen it anywhere else. It is always a blinking surprise to the
development teams that I mentor in my travels. Even in this forum, it
seems to be news.

I am publishing it here for the benefit of the community.

----------------
1. No DRI in SQL
----------------

It is true that SQL has still not defined, let alone implemented, the
DECLARATIVE RI constraint that we need.

-------------
2. The Method
-------------

Any SQL that supports Instead-Of triggers supports this, so we are
using the same level playing field. Write these four SQL commands
(plus parms and commentary):

--
-- Validate Supertype (Vehicle AND vehicle_type) for Subtype
--
CREATE FUNCTION ValidateExclVehicle_check (
________@vin CHAR(17),
________@vehicle_type CHAR(3)
________)
____RETURNS TINYINT
AS
____-- Exists = 1
____-- Not Exists = Null, 0 substituted
____RETURN (
________SELECT COALESCE(1, 0)
____________FROM Vehicle
____________WHERE vin = @vin
____________AND vehicle_type = @vehicle_type
____)

2.1 Note that in Relational terminology, vehicle_type is called the
Discriminator, as in, it discriminates the Exclusive Subtype.

--------------
3. Changed DDL
--------------

The original or source or example or starting point we will use is
post #2. That can be referenced, so I won't copy it here. The
*changed* DDL required is as follows.

-- You need one of these per parent or supertype

CREATE TABLE Vehicle (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____vehicle_type CHAR(3) NOT NULL
________CHECK (vehicle_type IN ('SUV', 'SED') )
____-- no second duplicate index
____)

-- for each subtype:

CREATE TABLE SUV (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_SUV_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT SUV_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SUV" ) = 1 )
____)

CREATE TABLE Sedan (
____vin CHAR(17) NOT NULL PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_Sedan_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

3.1 All I am doing is implementing the hoped-for requirement of
Declarative RI, in code, there is nothing magical about it. If and
when SQL comes to the party, all I have to do is search for occs of
"%_Excl_check", and change the *syntax* of the content within the
brackets.

--------
4. Chain
--------

I repeat, Subtypes are ordinary, normal Relational constructs, they
existed as such, decades before OO or 8-p or classes or class
hierarchies or X< were invented. Subtypes are very important because
they eliminate Nulls; support so-called "optional" Foreign Keys
correctly; and service optional columns correctly, thus simplifying
code and keeping the database fast.

There is no problem with subtypes having Subtypes of their own or with
class hierarchies. The "chain" is clean, carrying the true PK, and
nothing else. A stuffed-up chain, or one that is 220% heavier than it
needs to be, is plain stupid, a resource drain.

This also corrects Joe's Normalisation error, of mixing vehicle_type
and sedan_type together, damaging the integrity of both:

CREATE TABLE Sedan (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____sedan_type CHAR(3) DEFAULT '2DR' NOT NULL -- not vehicle_type
________CHECK (sedan_type IN ('2DR', '4DR') ),
____CONSTRAINT Vehicle_Sedan_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

CREATE TABLE TwoDoor (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____CONSTRAINT Sedan_TwoDoor_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT TwoDoor_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclSedan_check ( vin, "2DR" ) = 1 )

------------------------
5. Non-Exclusive Subtype
------------------------

> If an entity doesn't have to be exclusively one subtype, you play with
> the root of the class hierarchy:

If the Subtype is Non-Exclusive, simply exclude the CHECK CONSTRAINT
and *Function*. The Discriminator is not required for these.

No playing around necessary. The chain is unchanged.

-----------------
6. Object Classes
-----------------

Now that the data has been modelled correctly, implementing OO classes
*on top of that* is pedestrian. Trying to model data, which is quite
different to processes and objects, using process or object modelling
methods such as OO, is idiotic. Further, the data exists
independently, and it existed long before the objects being
contemplated, so it has to be modelled first; not second; not as
embedded in an object. The reversal of the ancient chronology is for
idiots who only read the books that they themselves have written,
legends in their own mind. It is fair enough that unqualified
florists are now "developing applications", but it is not fair that:

a. the big names do not qualify and test the visions of these
legendary publishers or

b. that they do not know the traditional Relational construct for it.
(this technique is mine, but any technique, which remains within the
relational paradigm, and does not use masses of duplication, would be
acceptable.)

---------
7. Gotcha
---------

Please refrain from implementing stupid things such as two-way
CONSTRAINTS, otherwise you will tie yourself up in knots of your own
making. Electricity flows in one direction only (thatis an analogy,
please do not come back with that AC/DC nonsense unless you are
willing to have it destroyed). That is a subject in itself, which I
won't expand here, but at least, must be mentioned.

---------------------
8. Why it is better ?
------------------------

Well, "better" explicitly requires that the compared element is
"good". Here the compared element is a monstrosity, there is nothing
good about it. Since the Relational Subtype method existed before the
bloated class hierarchy *method*, that former should be the measure
against which the recent interloper is compared. Therefore, I will
provide a comparison as to why the *posted method* of implementing
class hierarchies is much worse than the pre-existing Relational
method, that I have posted here. To be clear, I fully support
Subtypes and class hierarchies and whatever new name isolated writers
choose to come up with, I took issue only with the implementation
*method*.

The pre-existing method is contained in (3) of this post, and used as
the measure. It is the ZERO in the benchmark. In comparison, the
"class hierarchy" method in post #2 uses the following *additional
resources*, to obtain the same result as above, which is quite
unnecessary:

a. One additional 110% redundant index in every supertype
- no matter what Joe believes, every index has a direct overhead, it
directly affects the speed of every INSERT and DELETE
- unless you are working with no concern for production databases,
which means your database is irrelevant, every additional resource
needs to be carefully considered, and every redundant resource needs
to be identified and eliminated
- such an approach eliminates the otherwise predictable post-mortem
gathering of mountains of "diagnostic" info; and the laborious working
through it; writing programs to filter and report; etc; etc

b. One additional column in every subtype
- this break 1NF, and increases table size, which affects overall
performance (less rows on every I/O; less rows in every memory page)

c. One additional 110% redundant index in every subtype
- ditto (a)

d. added complexity and administration
- this cost is substantial, and if one does not treat it with respect,
one's job migrates overseas

e. Hiding, obfuscation, and subversion, as prescribed
- odious in the extreme,
- makes (d) all the more difficult and costly

f. One additional View per supertype/subtype to "hide all this
complexity" as advised by Joe.
- no big deal, just a couple of superfluous views

g. One additional Instead-Of trigger per supertype/subtype, to be able
to INSERT/DELETE/UPDATE the otherwise simple tables (not views).
- that is a big deal

h. Development, testing, debugging, and maintenance of (d)(e)(f) and
(g)
- refer cost again

-------------
9. Conclusion
-------------

The total cost of the proposed class hierarchy *implementation method*
is substantial, both in time/dollars, and in machine resources; that
which I have quantified is at least 220% overhead plus development/
testing/maintenance. All of which can be eliminated, if the author of
such (yet unattributed) would only read a book on Relational
Databases, before making pronouncements of what it can and cannot do,
and inventing monstrosities such as this.

The proposed class hierarchy *implementation method* is devoid of any
value whatsoever.

Interested parties are referred instead to study ordinary Relational
Subtypes, and pre-existing Relational constructs and SQL CONSTRAINTS
as implementation methods.

"... there is nothing new under the Sun." Ecclesiastes 1:9. And those
who say there is, are frauds. Those who believe the charlatans, are
fools.

Regards
Derek

Derek Asirvadem

unread,
Oct 15, 2011, 5:41:24 AM10/15/11
to
Roy Hann

On Oct 11, 6:55 pm, Roy Hann <specia...@processed.almost.meat> wrote:
>
> Surprisingly to me, Joe is advocating what I would have suggested.  It
> is entirely respectable, though it may not be all that common.

Ok, so you are both fools.

It is not common, because 220% overhead and substantial cost, is just
plain stupid. People are not quite as stupid as you think.

> The common approach is to just cram everything into one table with lots
> of nullable attributes to hide the conflation of fact types, then leave
> the whole mess under the carpet so that the application code has to
> figure out the real business model at run-time, every time.

So that's where you have been spending your time. Now I understand
the questions you ask.

> Explicit support for distributed key constraints in SQL would make the
> approach properly robust plus it would, in my opinion, have the
> psychological benefit of pointing the way and comforting us that we're
> doing it right.

Reading an appropriate book will provide both comfort and the method
of doing things correctly. A bit of genuine coding and experience
will provide knowledge about which methods are better than others.
SQL is 20 years past its use-by date, so no use waiting for them. If
a book is too long, try IDEF1X, it has a good, short synopsis. If
that is too long, try this:
____http://www.softwaregems.com.au/Documents/Documentary%20Examples/
IDEF1X%20Notation.pdf

They are not "distributed keys". They are ordinary Foreign Keys. The
tables happen to be Subtypes.

If you find anything less than proper, robust, declarative (in the DDL
sense, not the syntax), and high performance, in my (2) and (3) above,
let me know. On second thought, don't. That will lead to me having
to write essays on the resources and operation of the computer. There
are perfectly book books on Amazon.com.

Regards
Derek

Derek Asirvadem

unread,
Oct 15, 2011, 5:51:55 AM10/15/11
to
Bob

On Oct 15, 6:37 pm, Bob Badour <b...@badour.net> wrote:
>
> Don't count on it. Joe's an idiot. He mostly got a name for himself by
> copying and pasting large random blocks of text in response to his betters.

I see. That explains a few things. Thank you. I stand corrected.

-CELKO-

unread,
Oct 15, 2011, 7:58:40 PM10/15/11
to
>> Response please. <<

>> traced back to Britton-Lee. And yes, they always had a few Waterloo grads on staff. brings back memories. <<

I never worked with the Britton-Lee machines , but wish I had. I have
a rule of thumb to look for software from University of Waterloo.
Remember the C compilers? It produced executable 1/3 the size of
Microsoft, conformed to standards and had a lot of nice stuff. But the
“Watrerloons” (is that still the slang?) could not write a human
interface. It was as if they only took hardcore comp sci, math and
optimizer courses.

>> [ by building pointer chains under the covers, so each compound key actually appears once in the schema.] Excuse me, but:

a. You posted SQL. I answered SQL. In order to maintain focus, avoid
tangents, and close this subject like gentlemen, let's talk SQL only.

b. SQL is a standard, anything that does not comply with the Standard
cannot make the claim to be an SQL. The SQL standard requires that
all references in the database are by *KEY*, and specifically *NOT* by
pointers. <<

Not so. The standards use (over-use?) the term “effectively” to
describe the results and not the method. You can use pointers,
indexing in all its flavors, hashing (lot of academic stuff on
Minimal Perfect Hashes for DB right now), or brute force table scans
(looking good for SSD and massive parallelism). Pointers and indexing
are simply the most popular because SQL products grew out of existign
storage engines and file systems.

Around the time of the SQL-89 standard one of vendors on the old ANSI-
X3H2 committee waned to propose a CREATE INDEX statement. The X\Open
consortium has such syntax in their portability guide and the X3H2
committee overlapped with them quie a bit. The proposal was voted
down on the grounds that a CREATE INDEX / CREATE HASH / CREATE
SURROGATE_KEY / CREATE MAGIC_GENIE were too close to a physical
implementation. The goal was to make the Standards as abstract as
possible so that ANY implementation would be conformant.

What Standards have and implementation do not is a generalize CHECK()
constraint and the CREATE ASSERTION statement that can reference any
table in the schema with aggregates. The CREATE ASSERTION is why the
<name> in “CONSTRAINT <name> CHECK(..)” has to be globally unique.
Originally, the CREATE ASSERTION was invented because all constraints
on a empty table are TRUE and therefore you cannot force a table to
contain data. Nice from a pure logical viewpoint (Existential Import
and all that jazz), but not so good for piratical programming with
mandatory relationships.

>> Teradata is not SQL. <<

It is as much ANSI/ISO SQL as any other commercial product, but they
tend to have Data Warehouse extensions because of their market and
optimize for huge amounts of data. The bad part of taching at a
Teradata class is that I am the only guy in the with only a Masters in
Math and everyone else is PhD in Stats.

>> Sybase, MS, DB2, all create Standard SQL visible indices from data columns, from the SQL command that you supplied. <<

Nope, index as NOT standard. DB2 is my favorite; lots of platforms,
good conformance, easy to use. I remember Sybase as very good but I
have not done any financial work in years where they shine. MS is
always a decade behind, but fine for its niche (the programmers are
the problem here; no really).

>> Oracle is non-compliant, in that it twists the SQL definitions in order to appear compliant. It famously uses ROWIDs, above and below the covers, but the indices and data columns are visible, above the covers. <<

I can write Oracle when I have to, and I can USUALLY read it in spite
of the dialect. But in addition to your remarks, let me add that to
keep it tuned is a full-time job with 300+ parameters to watch. If you
can figure it out, it runs like a charm; one parameter wrong and all
bets are off. My big gripe with Oracle programmers is the over-use of
cursors. They might as well be ina file system in COBOL. Rant,
Rant,Rant!

>> [What do you have against the Nested Sets?] There is a recent thread on that subject. I will take it up there. <<

I will cruise over to it when I get my head above water. I need to
spend Sunday and Monday catching up with an on-line free course and
cleaning the garage. Oh, you are being too polite; I like an adversal
dialogue:) My wife is an ordained Soto Zen Priest and she teaches by
beating students with a stick.

-CELKO-

unread,
Oct 15, 2011, 8:47:09 PM10/15/11
to
In this particular case, the way to do this in full, repeat FULL
Standard SQL, is with global CHECK() check constraints. Make and model
information is in the VIN so a regular expression can check it.

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,
REFERENCES Vehicles (vin),
CONSTRAINT vin_in_motor_pool
CHECK (EXISTS
(SELECT *
FROM Vehicles AS V1
WHERE V1.vin = SUV.vin),
CONSTRAINT suv_vin
CHECK (<< regular expression >>),
..
);

-CELKO-

unread,
Oct 15, 2011, 8:51:48 PM10/15/11
to
In this particular case, the way to do this in full, repeat FULL
Standard SQL, is with global CHECK() check constraints. Make and model
information is in the VIN so a regular expression can check it.

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY,

Derek Asirvadem

unread,
Oct 15, 2011, 10:27:01 PM10/15/11
to
Forget about the make-and-model-in-the-vin business, as that
introduces changes far to late in the progression. Stick with the
meaning of the original postings.

Your suggestion has the following errors:

1 Re PK (vin). My method places the code in One Place, Once.

Your method may work, but it places the code in every Subtype, which
results in *repeated code*, and without a constraint_name, which
hinders administration.

2 Syntax. I used syntax that most SQLs would accept.

Your method performs the existence check within the table DDL, which
most SQLs disallow.

The correlated subquery won't work, because the table does not exist
yet (it is being created).

Do you ever check your code before posting ?

3 The purpose of the xxx_Excl_check is to ensure that the Subtype
exists in the parent *as that subtype*. Therefore the specific
combination of (vin, vehicle_type) must be checked for existence.

You check for (vin) only.

4 The existence check is not a "regular_expression", it is a full
query.

5 When existence checking, one should *never* use "SELECT *", as that
drags the whole row into cache, and returns it to the thread, no
matter what columns are used to qualify the query. And the thread
doesn't use the data it fetched. For existence checking, one should
always use "SELECT 1". That allows the optimiser to avoid handling
and transporting the data altogether. Further, if the query can be
serviced by an index alone (covered query), it avoids accessing the
data structure.

> [your other post]

Does not have any substance worth responding to.

If, and when, you produce evidence (output of SELECT
<true_column_list> ... , which excludes ROWID() functions, etc), then
and only then, might we have some substance to discuss. Until then,
your posting that "SQL implements [FK] constraints by building pointer
chains under the covers" is complete bunkum.

One of the fundamental differences between Relational and pre-
relational models (I will stick to practical understanding that
everyone is familiar with, refer Codd's Twelve Rules, and please do
not introduce distractions such as arguments about the RM) is that in
the pre-relational model, the references, links, chains, etc, was
indeed implemented with *internal pointers*, and the RM demanded that
all such references, links, etc, was implemented with *logical keys
only*. You seem to have missed that.

All the strictly-SQL-compliant products use and display logical keys
only. The "edge" products make internal pointers visible, but even
that is *in addition to* the logical key, which cannot be subverted.

Regards
Derek

Derek Asirvadem

unread,
Oct 15, 2011, 10:39:20 PM10/15/11
to
Joe

I forgot to mention:

6 CONSTRAINT vin_in_motor_pool is *totally redundant* because you have
a constraint:
REFERENCES Vehicles (vin)
in the line immediately above it.

Regard
Derek

-CELKO-

unread,
Oct 16, 2011, 1:16:28 PM10/16/11
to
Again, the CHECK() constraint in Standard SQL does allow ANY valid
search conditon. No product I know has implemented it and the only
product with a CREATE ASSERTION statement was RDB, a product form DEC
that was bought by OrACLE.

You asked for a Standard SQL and that was one solution.

Actually, the regular expression should have been the SIMILAR TO
predicate. Again, this is ANSI/ISO Standard and based on POSIX from
the Federal Standards. Have you ever downloaded a copy of the
Standards?

The VIN is a special case because we can validate that the vehicle is
an SUV from the VIN. Other identifiers might not have this property.
If you want to see the power of regualr expressions, look up the
pattern

>> When existence checking, one should *never* use "SELECT *", as that
drags the whole row into cache, and returns it to the thread, no
matter what columns are used to qualify the query<<

LOL! Why does anyone still beleive that myth? There was an early
version of Oracle that did that. It was correct the next release. The
options were "SELECT * ", "SELECT <constant>", "SELECT <column>", and
"SELECT <expression>" and they are all treated as *. Here we get into
that "effectively" thing in Standard-speak. Since only the table
cardinaltiy matters, the optimizer throws out the SELECT list in every
SQL product I have used or consulted on.

The SELECT * is not required, but it is the preferred style because
<column name> or <expression> can change, so the subquery will be
invalid, <constant> tells you nothing about the nature of the query
while SELECT * tells you that this is a table level construct, not
dependent on the rows and columns. To stress this consider that
EXISTS (SELECT * FROM .. WHERE..)
is equivalent to
(SELECT COUNT(*) FROM .. WHERE..) > 0

And that is how SQL products actually inplement it under the covers.
The mechanism for counting rows can use the indexes and stop when it
gets the first row back.

Jonathan Leffler

unread,
Oct 16, 2011, 2:25:10 PM10/16/11
to
On 10/14/11 11:03 PM, Derek Asirvadem wrote:
> Sybase, MS, DB2, all create Standard SQL visible indices from data
> columns, from the SQL command that you supplied.

Actually, Standard SQL does not support indexes at all. They are an
implementation detail that actual SQL DBMS use, but they are completely
outside the scope of the SQL standard.

--
Jonathan Leffler <jlef...@earthlink.net> #include <disclaimer.h>

Derek Asirvadem

unread,
Oct 16, 2011, 10:17:20 PM10/16/11
to
On Oct 17, 5:25 am, Jonathan Leffler <jleff...@earthlink.net> wrote:
>
> Actually, Standard SQL does not support indexes at all.  They are an
> implementation detail that actual SQL DBMS use, but they are completely
> outside the scope of the SQL standard.

Sure. But let's not get side-tracked. (a) I am not writing a paper
with fully qualified statements, I am responding to a post. (b) all
SQLs [that are compliant with the standard and thus SQLs] have
indices. Whether it is an Extension or not is not relevant to this
thread, and I did not mean to imply that indices were standard, rather
that all the SQLs have them, and have them in a visible column, non-
pointer, form.

The point being argued is, Celko stated:
>> You have made the assumption that references are done by more
>> indexes. Sybase (nee WATCOM) SQL implements the constraints like
>> FOREIGN KEY (vin, vehicle_type)
>> REFERENCES Vehicles(vin, vehicle_type)
>> by building pointer chains under the covers, so each compound key
>> actually appears once in the schema. Teradata uses a hashing
>> algorithm. RDBMS companies have been working on PK-FK joins for a few
>> decades, so they are pretty good at it now.

Which I state is nonsense. Celko does not understand the effect of his
own SQL:
- each PRIMARY KEY constraint builds an index
- each UNIQUE constraint builds an index
- no SQL normalises the schema
- the nonsense about "pointer-chains" is just a red herring to avoid
dealing directly with my refutation of the nonsense he has posted.

In order to avoid further misunderstandings, yes of course the B-Tree
in any index on any platform that has implemented SQL has pointers at
the non-leaf levels, and may have either pointers or actual data
columns (the key) at the leaf level. The context here is, whether (a)
the index is real (b) visible and (c) consists of visible data
columns. Or not. What one platform or another does under the covers
is not relevant to the method posted by Celko, or the issue I have
raised, or the method posted by me.

Regards
Derek

Derek Asirvadem

unread,
Oct 16, 2011, 11:15:03 PM10/16/11
to
On Oct 17, 4:16 am, -CELKO- <jcelko...@earthlink.net> wrote:
> Again, the CHECK() constraint in Standard SQL does allow ANY valid
> search conditon. No product I know has implemented it and the only
> product with a CREATE ASSERTION statement was RDB, a product form DEC
> that was bought by OrACLE.
>
> You asked for a Standard SQL and that was one solution.
>
> Actually, the regular expression should have been the SIMILAR TO
> predicate. Again, this is ANSI/ISO Standard and based on POSIX from
> the Federal Standards. Have you ever downloaded a copy of the
> Standards?
>
> The VIN is a special case because we can validate that the vehicle is
> an SUV from the VIN. Other identifiers might not have this property.
> If you want to see the power of regualr expressions, look up the
> pattern

Sounds and looks like the volumes of text you are famous for posting.
Smells like bulldust. All red herrings and distractions, in order to
avoid dealing directly with any of the issues I have raised.

> >> When existence checking, one should *never* use "SELECT *", as that
>
> drags the whole row into cache, and returns it to the thread, no
> matter what columns are used to qualify the query<<
>
> LOL! Why does anyone still beleive that myth? There was an early
> version of Oracle that did that. It was correct the next release. The
> options were "SELECT * ", "SELECT <constant>", "SELECT <column>", and
> "SELECT <expression>" and they are all treated as *. Here we get into
> that "effectively" thing in Standard-speak. Since only the table
> cardinaltiy matters,

Tables have "cardinality", do they ? Is that in your new book
*Mangled Relations for Dummies*

> the optimizer throws out the SELECT list in every
> SQL product I have used or consulted on.

Well you should start checking the showplans or query plans and
testing the code before you provide consulting. Until you post one of
those, proving your assertion, it remains bulldust, and just another
distraction.

> The SELECT * is not required, but it is the preferred style

Now it is "preferred", is it.

> because
> <column name> or <expression> can change, so the subquery will be
> invalid, <constant> tells you nothing about the nature of the query
> while SELECT * tells you that this is a table level construct, not
> dependent on the rows and columns.

You know, every time you post, you prove you have no idea about what
you are posting. Every post destroys more of the little credibility
you have left.

1. A query is a query. There is no such thing as a "table level
construct" for what you have identified or for the SELECT you have
given.

2. Some of the SQLs are smart enough to *ignore* the column list and
others are not.

3. Read the freaking manual for your particular platform. Most of
them will categorically state:
____In an EXISTS, always use (SELECT 1 FROM ... WHERE ...)

4. Stop trying to bamboozle the readers with bulldust.

5. You cannot even tell the difference between column issues and
qualifiers in the WHERE clause. What you are stating simply does not
apply.

> To stress this consider that
> EXISTS (SELECT * FROM .. WHERE..)
> is equivalent to
> (SELECT COUNT(*) FROM .. WHERE..) > 0

6. The better (not all) SQLs differentiate an EXISTS from a normal
SELECT. For EXISTS, the better SQLs stop processing the suquery when
the first row that satisfies the WHERE clause is found.

7. In every SQL, SELECT COUNT(*) has to scan the *entire table*

8. The is no "equivalence" between the two statements given.

You need to take some formal education in SQL before you post further
nonsense. Let alone "consult".

> And that is how SQL products actually inplement it under the covers.
> The mechanism for counting rows can use the indexes and stop when it
> gets the first row back.

9. By definition, by standard requirement (God, you're either an idiot
with a capital I or a shameless liar), SELECT COUNT(*) has to count
all rows. There may be faster methods of obtaining a row_count, but
we are discussing the SQL you have posted.

10. The SELECT COUNT(*) is in brackets. That has to be resolved
before the comparison outside the brackets is resolved. You need to
read books written by others on the subject, not only the fairy tales
that you write about SQL.

11. Since you have demonstrated the exact level of you knowledge of
SQL, by your own posts, and that has been pointed out to you, you are
no longer in a position to *tell* anyone anything about how something
works. You are a fraud.

--

Stop hiding behind this "under the covers" nonsense. It is
transparent. Unlike you, I (a) test my SQL and (b) obtain showplans
and resource consumption stats at a variety of levels, *before I
post*. Everything "under the covers" is laid bare.

And stop taking he position of knowing how SQL is parsed, compiled and
executed. You have proved you can't even write SQL, there is no way
that you can possibly write a compiler or speculate about what one
would do.

Finally, I note you have not addressed either the substance of my
refutation of your proposal, or my alternative method. Just an awful
lot of black powder smoke. And both your feet are bleeding.

--------------------------------------
I take it that my posted method stands
--------------------------------------

> My wife is an ordained Soto Zen Priest and she teaches by
> beating students with a stick.

I can understand her taking out her frustration with you, on the
kids. But that is not acceptable, it does not resolve the problem,
the circle continues ad infinitum. She needs to beat the source of
the frustration with a big stick, not pass it down to the kids.

Regards
Derek

Eric

unread,
Oct 17, 2011, 7:05:14 AM10/17/11
to
On 2011-10-17, Derek Asirvadem <derek.a...@gmail.com> wrote:
> On Oct 17, 5:25?am, Jonathan Leffler <jleff...@earthlink.net> wrote:
>>
>> Actually, Standard SQL does not support indexes at all. ?They are an
>> implementation detail that actual SQL DBMS use, but they are completely
>> outside the scope of the SQL standard.
>
> Sure. But let's not get side-tracked. (a) I am not writing a paper
> with fully qualified statements, I am responding to a post. (b) all
> SQLs [that are compliant with the standard and thus SQLs] have
> indices. Whether it is an Extension or not is not relevant to this
> thread, and I did not mean to imply that indices were standard, rather
> that all the SQLs have them, and have them in a visible column, non-
> pointer, form.

It's sort-of fun to watch two people who don't know enough trying to
have an argument.

What on earth is "and have them in a visible column, non-pointer, form"
supposed to mean? All you can ever see of an index (apart from its
effect on query performance) is the list of columns indexed and some
vendor-dependent stuff usually about physical storage.

>
> The point being argued is, Celko stated:

Just to pick on one more thing:

> - each PRIMARY KEY constraint builds an index
> - each UNIQUE constraint builds an index

Not "builds", but "usually needs" :
- nowhere is there a requirement that primary and unique keys be
implemented with an index
- only one index is needed and there is nothing to prevent an SQL parser
being smart enough to realise this
- if the parser is not smart enough there may well be a way of
second-guessing it and still ending up with only one index.

And finally...

> ... under the covers is not relevant ...

True, but you don't always seem to understand where the covers are.

Eric


--
ms fnd in a lbry

-CELKO-

unread,
Oct 18, 2011, 12:36:10 PM10/18/11
to
>> Actually, Standard SQL does not support indexes at all.  They are an implementation detail that actual SQL DBMS use, but they are completely outside the scope of the SQL standard. <<

You already read my short history of X3H2 and CREATE INDEX, so I will
not repeat it. Back in the ealry days, we worried about how to do the
indexes. B-Tree in its mutations were favored. Oracle briefly had a
ersion that carried the prefix of CHAR(n) and VARCHAR(n) strings at
different level; it had too much ovehead on INSERT and UPDATE.

Today, I would look at hashing and brute force scans to enforce unique
constraints. In a large DB, I have to traverse several levels in an
index, but a good hash is one probe no matter how big the table is.

If I have a massively parallel procesor and SSD, I can scan the SSD at
processor speeds; everything is effectively in Main Storage. Why
bother with indexing and all the troulbe to maintain it?

-CELKO-

unread,
Oct 18, 2011, 1:16:26 PM10/18/11
to
On Oct 16, 10:15 pm, Derek Asirvadem <derek.asirva...@gmail.com>
wrote:
>> Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust.  All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised.<<

No, you went right to p[erosnal at

-CELKO-

unread,
Oct 18, 2011, 3:18:19 PM10/18/11
to
>> Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust. All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised. <<

Personal attacks are not issues. You have never worked with the SQL
Standards, probably have worked on less than a dozen SQL
implementations, have never been part of a team designing optimizers.
You also have a number of conceptual problems, which seem to be the
result of confusing the abstract standards and RM versus particular
implementations.


>> Tables have "cardinality", do they ? Is that in your new book “Mangled Relations for Dummies* <<

No actually it is in SQL:2005 Part II. I t is a bad idea to go up
against a man who keeps the ANSI/ISO Standards on his desk :) And
after as decade on X3H2, I have a good idea where to look.

================
4.14.1 Introduction to tables

A table is a collection of rows having one or more columns. A row is a
value of a row type. Every row of the same table has the same row
type. The value of the i-th field of every row in a table is the value
of the i-th column of that row in the table. The row is the smallest
unit of data that can be inserted into a table and deleted from a
table.

A table T2 is part of a column C of a table T1 if setting the value of
T1.C to a NULL value (ignoring any constraints or triggers defined on
T1 or T1.C) would cause T2 to disappear.

The most specific type of a row is a row type. All rows of a table are
of the same row type and this is called the row type of that table.

The degree of a table, and the degree of each of its rows, is the
number of columns of that table. The number of rows in a table is its
cardinality. A table whose cardinality is 0 (zero) is said to be
empty.

The term Cardinality is originally from Set Theory, where we are more
worried about finite and transfinite sets. And even more with kinds of
infinite sets.

>> Well you should start checking the showplans or query plans and testing the code before you provide consulting. Until you post one of those, proving your assertion, it remains bulldust, and just another distraction. <<

Run a EXISTS (SELECT *), EXISTS (SELECT 1), EXISTS (SELECT x), and
EXISTS (SELECT a+b), look for any difference in the SHOWPLAN, EXPLAIN,
or what each vendor is used for this system utility.

>> 1. A query is a query. There is no such thing as a "table level construct" for what you have identified or for the SELECT you have given. <<

No, sorry. Do not confuse a uniform syntax with uniform semantics. A
SELECT in a CURSOR body has limitations, a SELECT in a subquery has to
be in parentheses, a SELECT in an INSERT INTO has limitations, etc.

The model we used in SQL is a hierarchy. The outer level is the
schema; here the schema information tables and schema objects are
exposed to an ADMIN . An ADMIN has the DROP, CREATE and ALTER
privileges; a USER does not. Her is where DRI actions live, etc.

The “DROP <schema object>” is at the table level; we do not care
about the rows or columns inside the table. Likewise EXISTS (SELECT
* ) and COUNT(*) at the table level.

The next level is the rows. The INSERT INTO uses rows are its unit of
work, multi-column CHECK() constraints, etc.

Finally, we have the column level, where simple column CHECK(),
DEFAULT, NOT NULL, etc.

>> 2. Some of the SQLs are smart enough to *ignore* the column list and others are not. <<

I cannot think of any current SQL product that does not use this
optimization pattern:
1) Throw exception if table not in the schema information tables
2) Look for single table parameter without a WHERE clause and check
cardinality in schema information tables. This is short-circuit
evaluation at the schema level. Hey, it pays off!
3) Look for empty tables in the FROM clause. This is short-circuit
evaluation at the table level.
4) Perform the filters in the WHERE, then do the joins in the FROM
clause. This step is done with special code for the joins, since we do
not have to return the whole row.
5) Return FALSE when the result set has a row, This is short-circuit
evaluation at the row level.
6) Otherwise, return TRUE; there is no UNKNOWN in this predicate.

The “special code:” in step #4 is the fun part. Skewed distributions,
de-dupping in the join, hashing, throwing out values and using bit
flag indexes, etc.

>> 3. Read the freaking manual for your particular platform. Most of them will categorically state: ____In an EXISTS, always use (SELECT 1 FROM ... WHERE ...) <<

Not in any of the five SQLs I have on my desk. The examples in Oracle
tend to use a constant, but that is not recommended anywhere. Again,
the use of EXIST(SELECT *..) is favored because it is not clear if the
<select list> is invalid whether or not the SQL engine should return
a warning and continue to process the predicate or throw an
exception. The star is always safe, even if the base table columns
change.

>> 4. Stop trying to bamboozle the readers with bulldust. <<

I will make a note of that :)

>> 5. You cannot even tell the difference between column issues and ualifiers in the WHERE clause. What you are stating simply does not apply. <,

Again, it is not a good idea to go up against a guy with a copy of
the Standards on his machine :) I said

> EXISTS (SELECT * FROM .. WHERE..)
> is equivalent to
> (SELECT COUNT(*) FROM .. WHERE..) > 0

And SQL:2006 says:

8.10 <exists predicate>
Function
Specify a test for a non-empty set.

Format
<exists predicate> ::= EXISTS <table subquery>

Syntax Rules
None.

Access Rules
None.

General Rules
1) Let T be the result of the <table subquery>.

2) If the cardinality of T is greater than 0 (zero), then the result
of the <exists predicate> is TRUE; otherwise, the result of the
<exists predicate> is FALSE.

Conformance Rules

1) Without Feature T501, “Enhanced EXISTS predicate”, conforming SQL
language shall not contain an <exists predicate> that simply contains
a <table subquery> in which the <select list> of a <query
specification> directly contained in the <table subquery> does not
comprise either an <asterisk> or a single <derived column>.

==========================================

>> 7. In every SQL, SELECT COUNT(*) has to scan the *entire table* <<

No, the schema information tables have the cardinality of a table. If
the stats are out of date, tehn they go to the indexes. The table is
used only as a last resort.

>> 8. The is no "equivalence" between the two statements given. You need to take some formal education in SQL before you post further nonsense. Let alone "consult". <<

I remember that next time I see my name in an SQL text book :)

>> 10. The SELECT COUNT(*) is in brackets. That has to be resolved before the comparison outside the brackets is resolved. You need to read books written by others on the subject, not only the fairy tales
that you write about SQL. <<

The optimizer is free to implement an execution plan as it pleases, so
long as it “effectively” returns the same results. Do you also believe
that the FROM clause is executed before the WHERE clause? Since you
did not know the term Cardinality, you might have to look up
“Cartesian Product” to understand this point.

>> Finally, I note you have not addressed either the substance of my refutation of your proposal, or my alternative method. <<

I lost track of it in all your name-calling and screaming. Basically,
you want to use this pattern of REFERENCES

vehicles ← sedans ← two_door

The classic schema I gave uses:

vehicles ← sedans
vehicles ← two_door

Validate a two-door and you have to access three tables. The classic
schema accesses only two
Now drop the sedans and the two-doors are orphaned. The classic
schema does not have this problem.

To find data about a two-door sedan I have to use all three tables.
But to find the data for my four_door, I only need vehicles ← sedans.
Now add convertibles as another class. Do you subordinatre it to
two_doors or to sedans? Or directly to vehicles?

Pretty clearly, the paint job is in Vehicles. But where do the door
locking attributes fit? This is a version of the network /
hierarchical database design problems. How do I handle optional/
mandatory relationship? What level/node does an attribute belong?

Also, Codd did not have anything to do with class models; he was into
DOMAINs. And SQL does have DRI actions; that is section 11.8
<referential constraint definition>.





Hugo Kornelis

unread,
Oct 18, 2011, 3:29:37 PM10/18/11
to
On Wed, 12 Oct 2011 17:51:11 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

(...)
Hi Derek,

I've tried to follow your debate with Joe Celko with interest, though
it's sometimes hard to keep track of the arguments in between the
mud-slingin.

As Joe can confirm, I often disagree with him, on many subjects. But
when I first saw him post the subtype/supertype implementation model
you criticise, I liked it for its robust protection agains integrity
violations. You are correct that the redundant columns and the
extraneous keys require more disk space and hurt performance, but I
always think that integrity comes first, all the rest comes later.

But I have of course an open mind for better alternatives, and you
seem to have one. Unfortunately, the most relevant part for integrity
is left out of your post, or rather replaced by ellipsis. Given the
two tables above (plus the implied vehicles table), how will the
following (inconsistent) transaction be rejected:

BEGIN TRANSACTION;
INSERT INTO Vehicles (vin, vehicle_type)
VALUES ('12345678901234567', 'SUV');
INSERT INTO Sedans (vin, sedan_type)
VALUES ('12345678901234567', '4DR');
INSERT INTO TwoDoor (vin)
VALUES ('12345678901234567');
COMMIT TRANSACTION;

I think you intend the constraints Sedan_Excl_vk and 2dr_Excl_ck to
prevent this, but you left out their specification. In a complete ANSI
SQL implementation, I could easily create this constraint, as the ANSI
SQL standard permits the use of subqueries in a CHECK constraint
(SQL-92: full compliance; SQL-2003: Feature F671, "Subqueries in CHECK
constraints"). But the product I work with (Microsoft SQL Server) does
not implement this feature, and a quick Google search gives me the
impression that most (if not all) major vendors suffer from the same
limitation.

If your point is that in a fully ANSI-compliant RDBMS, the version Joe
posted suffers from needless redundancy (just like this sentence), I
agree. But if you have a way of coding the CHECK constraint that is
supported by some, or preferably all, major current RDBMS
implementations and that will disallow the inconsistent data I gave
above, please let me know. (Or, if you already gave it and I missed it
when wading through the discussion, please tell me where I can find
it).
--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Derek Asirvadem

unread,
Oct 18, 2011, 9:10:43 PM10/18/11
to
On Oct 17, 10:05 pm, Eric <e...@deptj.eu> wrote:
>
> It's sort-of fun to watch two people who don't know enough trying to
> have an argument.

It is even more fun to watch someone on the sidelines, who is not
party to the argument, picking up clauses and examining them in
isolation.

> What on earth is "and have them in a visible column, non-pointer, form"
> supposed to mean?

Let me spell it out for you:
- I was not posting a new thread (I would have used proper terms).
- As I detailed to Jonathan, I was not writing a fully qualified
thesis (if you are expecting that, you will find it lacking)
- I was posting a response to Celko's thread. That defines the
context.
- Celko was playing "slippery fish", to avoid dealing with specific
problems that I identified (check the whole thread)
- In an attempt to prevent the slipperiness, I used Celko's terms.
- As it progressed, I used further terms make distinctions, using his
used terms as the base (I would not have used his terms, or my-terms-
that-are-based-on-his-terms in a stand-alone article)

> All you can ever see of an index (apart from its
> effect on query performance) is the list of columns indexed and some
> vendor-dependent stuff usually about physical storage.

Excellent, you seem to understand that correctly. So your question is
either rhetorical or plain absurd.

If you have an honest argument with me, just state it; if there is
something you wish to clarify about one of my statements, ask a
question. But this picking and gnawing on the carcasses left, after
others have feasted, is for the birds.

> Just to pick on one more thing:
>
> > - each PRIMARY KEY constraint builds an index
> > - each UNIQUE constraint builds an index
>
> Not "builds", but "usually needs" :

Correcting my English is petty. If you fancy yourself an editor, send
me your CV, I will send you all my posts to edit, before I post.

> - nowhere is there a requirement that primary and unique keys be
>   implemented with an index

Try reading the entire statement in context. You might draw a
different conclusion.

If that is still confusing, try this: I did not say "there is a
requirement..." so there is no point arguing against what I did not
say. I am quite happy to address issues regarding what I have said
(the thread is ready evidence). But I do not engage with isolated
scraps. It might give the readers the idea that such isolated scraps
have some credibility, or that bird-like behaviour is appropriate for
functioning adults.

> - only one index is needed and there is nothing to prevent an SQL parser
>   being smart enough to realise this

Marvellous. Identify which SQL does that (post evidence).

Until then your suggestion is possibility in the future, not a
reality, which is what Celko posted (he did not say "sometime in the
future, some SQL could ...", he posted "Here is how to do this
[now] ..."), which I am arguing. I can simply state, no SQL does that
[now]. The implication is "now", on Earth, not "ten years from now",
not "if and when the vendors wake up and read Celko's post", not on
the fourth moon of Jupiter.

> - if the parser is not smart enough there may well be a way of
>   second-guessing it and still ending up with only one index.

Feel free to post the method. Otherwise your suggestion is a yet
another fantasy of could-bes and guesses. Just like Celko. Any
examination reveals that his method is based on facilities that are
not available now, as stated "do this", that the method might be
useful at some point in the future, if and when some SQL vendor
supplies the the underlying facility.

My method is based on the here and now, in any SQL (I have
specifically excluded the non-SQLs), on Earth.

> And finally...
>
> > ... under the covers is not relevant ...
>
> True, but you don't always seem to understand where the covers are.

Thanks for your opinion. I will make sure I give it the consideration
it deserves, keeping in mind your demonstrated level of capability, in
understanding technical matters, and your could-be, should-be,
attitude to addressing present day reality.

Should you wish to address any of the salient points in my post, a
piece of meat, rather than a scrap, I might give your avian opinion
more consideration.

Regards
Derek

-CELKO-

unread,
Oct 18, 2011, 10:35:12 PM10/18/11
to
>> As Joe can confirm, I often disagree with him, on many subjects. But when I first saw him post the subtype/supertype implementation model you criticise, I liked it for its robust protection against integrity violations. You are correct that the redundant columns and the extraneous keys require more disk space and hurt performance, but I always think that integrity comes first, all the rest comes later. <<

Hey, we do not disagree very often, where dis you get a silly idea
like that? Sorry, I had to do a Monty Python arguer skit :)

You are of course right about data integrity. New programmers who
never worked with file systems, do not know the first non-RDBMS do not
know that the original intent of a database was data integrity, not
performance. Removing redundancy was the first step in that goal.

>> In a complete ANSI SQL implementation, I could easily create this constraint, as the ANSI SQL standard permits the use of subqueries in a CHECK constraint (SQL-92: full compliance; SQL-2003: Feature F671, "Subqueries in CHECK constraints"). But the product I work with (Microsoft SQL Server) does not implement this feature, and a quick Google search gives me the impression that most (if not all) major vendors suffer from the same limitation.<<

I have to look at more products and I do not know of any either. The
expense could be awful, like a bad TRIGGER that fires with every
insert, update and delete

>> If your point is that in a fully ANSI-compliant RDBMS, the version Joe posted suffers from needless redundancy (just like this sentence), I agree. But if you have a way of coding the CHECK constraint that is supported by some, or preferably all, major current RDBMS implementations and that will disallow the inconsistent data I gave above, please let me know. <<

Wait until he sees Kuznetsov's pattern for preventing temporal gaps in
event sequences or my state change constraints :) Overlapping and
self-referencing uniqueness constraints are hard for an experienced
SQL programmer.

-CELKO-

unread,
Oct 18, 2011, 10:53:04 PM10/18/11
to
>> [only one index is needed and there is nothing to prevent an SQL parser being smart enough to realise this]
Marvelous. Identify which SQL does that (post evidence). <<

Mr. Hann can tell me if my memory is correct, but I think Ingres has
something like 17 kinds of indexes and it did index optimization,
picking the right one for a particular data distribution and merging
or removing redundant ones. MS SQL Server has the INCLUDE option,
which says to carry extra columns but not use them in the indexing
process. This lets you get a cheap covering index.

Hashing algorithms work better with more columns (few collisions), so
they will replace a shorter list of columns with a longer one in the
hash.

Informix lets you create an index on a search condition for a multi-
table join. This is not a VIEW, but pointers to the rows involved.

Bob Badour

unread,
Oct 18, 2011, 11:41:28 PM10/18/11
to
-CELKO- wrote:

>>>Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust. All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised. <<
>
> Personal attacks are not issues. You have never worked with the SQL
> Standards, probably have worked on less than a dozen SQL
> implementations, have never been part of a team designing optimizers.
> You also have a number of conceptual problems, which seem to be the
> result of confusing the abstract standards and RM versus particular
> implementations.

I had a real laugh when Melton thanked the people who actually
contributed to the standard.

Toodles!

Hugo Kornelis

unread,
Oct 19, 2011, 5:41:40 AM10/19/11
to
On Tue, 18 Oct 2011 19:35:12 -0700 (PDT), -CELKO-
<jcel...@earthlink.net> wrote:

>>> As Joe can confirm, I often disagree with him, on many subjects. But when I first saw him post the subtype/supertype implementation model you criticise, I liked it for its robust protection against integrity violations. You are correct that the redundant columns and the extraneous keys require more disk space and hurt performance, but I always think that integrity comes first, all the rest comes later. <<
>
>Hey, we do not disagree very often, where dis you get a silly idea
>like that? Sorry, I had to do a Monty Python arguer skit :)

Hi Joe,

Back in the days when the SQL Server usenet groups were still lively
(before Microsoft replaced them with something worse), I often
challenged you when you posted suggestions that included code that
does not work well on SQL Server, or code that would make SQL Server
grind to a halt because of its awful performance. But I will admit
that Tony Rogerson often beat me to the punch.

(...)


>Wait until he sees Kuznetsov's pattern for preventing temporal gaps in
>event sequences or my state change constraints :) Overlapping and
>self-referencing uniqueness constraints are hard for an experienced
>SQL programmer.

Have you read Kuznetsov's pattern in his Defensive Database
Programming book, or on the web? In the former case, you might want to
check out who tech edited it, just for laughs. ;)

-CELKO-

unread,
Oct 19, 2011, 11:06:44 AM10/19/11
to
>> (before Microsoft replaced them with something worse), .. <<

I agree. So many of the postings are Indian students trying to get
their homework done. The Cut & Paste scrambles the text so it is hard
to read..

>> Have you read Kuznetsov's pattern in his Defensive Database Programming book, or on the web? <<

I have seen it but not gotten a copy. Too much to do with my own
second edition of Trees & Hierarchies, fixing bad plumbing and gettign
ready for eye surgery.

>> In the former case, you might want to check out who tech edited it, just for laughs. ;) <<

Have you seen the 4-th edition of SMARTIES? They out-sourced the
typesetting to India. The compnay had ONE template for ALL books, so
all tabls are set as if they were a display graphic (shadings,
ornaments, etc) and not an SQL result set. Diagrams are misplaced or
repeated. The choice of fonts is awful. Code is set as if it were
text. I really hate it and we now have an agreement to go back to the
style I used in the other seven books in my series.


Hugo Kornelis

unread,
Oct 19, 2011, 5:54:09 PM10/19/11
to
On Wed, 19 Oct 2011 08:06:44 -0700 (PDT), -CELKO-
<jcel...@earthlink.net> wrote:

>Have you seen the 4-th edition of SMARTIES?

No, not really. I happened to see one a few weeks ago and I brielfy
skimmed the table of contents, but I did not have time for anything
more.

Based on what you write, I'm glad I didn't buy it. Such lousy
formatting would distract me so much from the code that the message
probably would not stick.
Is there a lot of new material that the earlier editions did not
cover?

-CELKO-

unread,
Oct 20, 2011, 12:09:20 PM10/20/11
to
[ the 4-th edition of SMARTIES?] Is there a lot of new material that
the earlier editions did not cover?<<

A lot of it is the same topics, but with newer constructs in the
solutions. ~785 page versus ~550 pages for the second edition; I don't
have the third edition in front of me.

Table of Contents

Introduction
Chapter 1: Databases versus File Systems
Chapter 2: Transactions and Concurrency Control
Chapter 3: Schema Level Objects
Chapter 4: Locating Data and Special Numbers
Chapter 5: Base Tables and Related Elements
Chapter 6: Procedural, Semi-Procedural and Declarative Programming
Chapter 7: Procedural Constructs
Chapter 8: Auxiliary Tables
Chapter 9: Normalization
Chapter 10: Numeric Data Types
Chapter 11: Temporal Data Types
Chapter 12: Character Data Types
Chapter 13: NULLs — Missing Data in SQL
Chapter 14: Multiple Column Data Elements
Chapter 15: Table Operations
Chapter 16: Comparison or Theta Operators
Chapter 17: Valued Predicates
Chapter 18: CASE Expressions
Chapter 19: LIKE and SIMILAR TO Predicates
Chapter 20: BETWEEN and OVERLAPS Predicates
Chapter 21: The [NOT] IN() Predicate
Chapter 22: EXISTS() Predicate
Chapter 23: Quantified Subquery Predicates
Chapter 24: The Simple SELECT Statement
Chapter 25: Advanced SELECT Statements
Chapter 26: Virtual Tables: VIEWs, Derived Tables, CTEs and MQTs
Chapter 27: Partitioning Data in Queries
Chapter 28: Grouping Operations
Chapter 29: Simple Aggregate Functions
Chapter 30: Advanced Grouping, Windowed Aggregation and OLAP in SQL
Chapter 31: Descriptive Statistics in SQL
Chapter 32: Sub-Sequences, Regions, Runs, Gaps, and Islands
Chapter 33: Matrices in SQL
Chapter 34: Set Operations
Chapter 35: Subsets
Chapter 36: Trees and Hierarchies in SQL
Chapter 37: Graphs in SQL
Chapter 38: Temporal Queries
Chapter 39: Optimizing SQL


Kevin Kirkpatrick

unread,
Oct 20, 2011, 5:07:41 PM10/20/11
to
This seemed the most appropriate "jumping in" point. In my SQL vendor
of choice (Oracle), while there is not support for ANSI CHECK
constraints, I have found a combination of features that allow for
cross-table constraints to be enforced in a purely declarative and
high-performance manner. The basic idea is to use instant-refresh
materialized-views that bring the data together, then put indexes on
the materialized views to enforce cross-row constraints. All such
objects (materialized view logs, materialized views, constraint
functions, and constraint indexes) used to enforce the constraints
should be considered physical implementation details like table
indexes, i.e. NOT exposed at the logical level and NOT accessible to
users/application code .

For instance, the vehicle sub-typing is easily accommodated with the
following script: (Note, this is a purely declarative solution - the 2
functions merely exist to return meaningful error messages for
constraint violations and play no role in enforcing said
constraints... calls to these functions could be replaced with the
expression "1/0", giving bizarre div-by-zero error messages, but
preventing constraint violations nonetheless).



SQL>
SQL> CREATE TABLE VEHICLES
2 (VIN NUMBER PRIMARY KEY,
3 VEHICLE_TYPE VARCHAR2(3) CHECK (VEHICLE_TYPE IN
('SUV','SED')));

Table created.

SQL>
SQL> CREATE TABLE SEDANS
2 (VIN NUMBER PRIMARY KEY,
3 SEDAN_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.

SQL>
SQL> CREATE TABLE SUVS
2 (VIN NUMBER PRIMARY KEY,
3 SUV_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.


/* And now the fun begins */

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON VEHICLES WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SEDANS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SUVS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW SEDAN_CONSTRAINT_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT V.VEHICLE_TYPE, V.ROWID V_ROW, S.ROWID S_ROW
4 FROM VEHICLES V, SEDANS S
5 WHERE V.VIN = S.VIN;

Materialized view created.

SQL>
SQL> CREATE MATERIALIZED VIEW SUV_CONSTRAINT_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT V.VEHICLE_TYPE, V.ROWID V_ROW, S.ROWID S_ROW
4 FROM VEHICLES V, SUVS S
5 WHERE V.VIN = S.VIN;

Materialized view created.

SQL>
SQL> CREATE OR REPLACE FUNCTION SEDAN_CONSTRAINT_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Only vehicles with type SED
permitted in SEDANS table');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX SEDAN_CONSTRAINT_IDX ON SEDAN_CONSTRAINT_MV
(DECODE(VEHICLE_TYPE,'SED',NULL,SEDAN_CONSTRAINT_FUN));

Index created.

SQL>
SQL>
SQL> CREATE OR REPLACE FUNCTION SUV_CONSTRAINT_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Only vehicles with type SUV
permitted in SUVS table');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX SUV_CONSTRAINT_IDX ON SUV_CONSTRAINT_MV
(DECODE(VEHICLE_TYPE,'SUV',NULL,SUV_CONSTRAINT_FUN));

Index created.


-------------------- PROOF OF CONCEPT --------------------------

SQL>
SQL> INSERT INTO VEHICLES VALUES (1, 'SED');

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (1, 'MY SEDAN');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO VEHICLES VALUES (2, 'SUV');

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (2, 'MY SUV');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL>
SQL> INSERT INTO VEHICLES VALUES (3, 'SED');

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (3, 'NOT AN SUV!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SUV_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SUV permitted in SUVS table
ORA-06512: at "APPS.SUV_CONSTRAINT_FUN", line 6


SQL>
SQL> INSERT INTO VEHICLES VALUES (4, 'SUV');

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (4, 'NOT A SEDAN!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SEDAN_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SED permitted in SEDANS table
ORA-06512: at "APPS.SEDAN_CONSTRAINT_FUN", line 6


SQL>
SQL> UPDATE VEHICLES SET VEHICLE_TYPE = 'SUV' WHERE VIN = 1;

1 row updated.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SEDAN_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SED permitted in SEDANS table
ORA-06512: at "APPS.SEDAN_CONSTRAINT_FUN", line 6


SQL>
SQL> UPDATE VEHICLES SET VEHICLE_TYPE = 'SED' WHERE VIN = 2;

1 row updated.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SUV_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SUV permitted in SUVS table
ORA-06512: at "APPS.SUV_CONSTRAINT_FUN", line 6


SQL>
SQL> SPOOL OFF


Erwin

unread,
Oct 21, 2011, 11:11:06 AM10/21/11
to
On 20 okt, 23:07, Kevin Kirkpatrick <kvnkrkpt...@gmail.com> wrote:
> This seemed the most appropriate "jumping in" point.

Well, now that we're on the track of working declarative solutions I
can't resist jumping in too ...

VAR VEHICLE RELATION {VIN:... , ...} KEY {VIN};
VAR SED RELATION {VIN:... , ...} KEY {VIN};
VAR SUV RELATION {VIN:... , ...} KEY {VIN};

CONSTRAINT ALL_SEDANS_ARE_VEHICLES ISEMPTY(SED NOT MATCHING VEHICLE);
CONSTRAINT ALL_SUVS_ARE_VEHICLES ISEMPTY(SUV NOT MATCHING VEHICLE);
CONSTRAINT NO_SEDANS_ARE_ALSO_SUVS ISEMPTY(SED {VIN} INTERSECT SUV
{VIN});
CONSTRAINT ALL_VEHICLES_ARE_SUV_OR_SEDAN ISEMPTY(VEHICLE NOT MATCHING
(SED {VIN} UNION SUV {VIN}));

Even without support for "distributed keys", or whatever one might
name the concept, a minute or so of writing and you're done.

Hugo Kornelis

unread,
Oct 21, 2011, 4:44:49 PM10/21/11
to
On Thu, 20 Oct 2011 14:07:41 -0700 (PDT), Kevin Kirkpatrick
<kvnkr...@gmail.com> wrote:

(snip)
>This seemed the most appropriate "jumping in" point. In my SQL vendor
>of choice (Oracle), while there is not support for ANSI CHECK
>constraints, I have found a combination of features that allow for
>cross-table constraints to be enforced in a purely declarative and
>high-performance manner. The basic idea is to use instant-refresh
>materialized-views that bring the data together, then put indexes on
>the materialized views to enforce cross-row constraints. All such
>objects (materialized view logs, materialized views, constraint
>functions, and constraint indexes) used to enforce the constraints
>should be considered physical implementation details like table
>indexes, i.e. NOT exposed at the logical level and NOT accessible to
>users/application code .
(snip)

Thanks for your reply. I have no experience with Oracle, but I think I
can parse the code sufficiently to understand what is going on. You
create a materialized view (probably very similar to the indexed view
feature SQL Server has) that joins data from subtype and supertype
table together. Then you add an extra index on that materialized view,
but not on one of the columns itself, but on a virtual computed column
that is computed using the DECODE function (not supported by SQL
Server, but from the looks of it it appears to be a shorthand for a
CASE expression), that will invoke a function if the type in the
supertable does not match what should be in the subtable; and that
function then invokes an error.

I *think* a similar approach would be possible in SQL Server, but
using a few different angles. I can't force an error from a function
(SQL Server functions allow no side effects), but I could add a column
to the view that is computed using a CASE expression with possible
results 1 or 1/0. That would not give a similar user-friendly error
message, but it would provide the functionality (I think - I have not
tried it, though!).

But I am a bit surprised, as your first critique on Joe Celko's
version was not about logical/physical level, but about performance
and overhead. I quote:

On Sat, 15 Oct 2011 02:41:24 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

(...)
>Ok, so you are both fools.
>
>It is not common, because 220% overhead and substantial cost, is just
>plain stupid. People are not quite as stupid as you think.

On Wed, 12 Oct 2011 17:51:11 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

>This "class hierarchy model" is one of the most idiotic things I have
>seen in my 34 years in databases.
(...)
>a. basically doubles the number of indices in the relevant tables, and
>that has a substantial overhead.
>
>b. requires vehicle_type to be carried in the subtypes, which AFAIC is
>completely redundant (in SUVs, we know all the rows are SUV, why carry
>a column in which every value will be "SUV").
(...)
>e. introduces complexity and processing; views; instead-of triggers,
>etc. I would rather the database remain open as possible, without
>such complications for use. Particularly when such nonsense is
>completely unnecessary.

I have no idea how Oracle implements materialized views, but the name
alone suggests that, like SQL Server, the content of the view is
stored on disk. So the storage overhead will be even MORE than the
single extra column that Joe Celko's version adds.
SQL Server also requires an index for each materialized view (hence
the name indexed view). Maybe Oracle doesn't, but you do create one
yourself. That should be about the same overhead as the index created
for the extra UNIQUE constraint in Joe Celko's version (for those
RDBMS's that use an index to implement a UNIQUE constraint). Maybe
even more because of the computed column.

I don't know where you get the 220% figure. I do agree that there is
substantial overhead and cost involved in Joe Celko's solution, but I
expect that your solution will require at least as much, maybe even
more overhead.


In your post to me, you introduce a different argument: that the
columns introduced by Joe do not belong in the logical level. I do
agree with that. But does that imply that they should not be in the
tables? There are several examples of columns that are not part of the
logical model but are added to the tables. One example is a surrogate
key column. Another example is a rowversion column (that's the name in
SQL Server; I don't know if Oracle has a similar feature) that
automatically changes value every time a row is updated, to facilitate
the implementation of optimistic concurrency checking.

There are various ways to not expose these columns to the database
user, such as building views on top of the tables that don't expose
these columns, or creating stored procedures to implement the data
access layer.

Kevin Kirkpatrick

unread,
Oct 21, 2011, 6:31:20 PM10/21/11
to
On Oct 21, 3:44 pm, Hugo Kornelis
<h...@perFact.REMOVETHIS.info.INVALID> wrote:
> On Thu, 20 Oct 2011 14:07:41 -0700 (PDT), Kevin Kirkpatrick
>
Correct.

> But I am a bit surprised, as your first critique on Joe Celko's...
<snipped>

I'm not Derek.

I do think Joe's solution suffers from the problem of exposing the
implementation of the constraint at the logical level. I was merely
proposing an SQL solution (one that admittedly is going to vary in
mileage with different SQL vendors) that does not suffer that
downfall. For the record, I'll sacrifice disk space usage 10 times
over in favor of proper constraint enforcement, and do not consider
220% or any similar linear overhead to be problematic. I bench-marked
my own solution and found it entailed a 75% overhead in disk space
(misleadingly large because it doesn't account for the fact that
VEHICLES, SEDANS, and SUVS are going to be much larger than 2 - 3
columns in practice) usage and 100% overhead in performance; meaning
that adding 1 million vehicles unconstrained took 30 seconds, and
adding 1 million vehicles, constrained, took 60 seconds. That, to me,
is a perfectly acceptable overhead (consider, if I could take the KEY
constraint off VEHICLES and load 1 million rows in 15 seconds... would
that justify removing the constraint?).

I should note that, upon reading Erwin's response (the lousy cheat,
using a TRDBMS) my solution /is/ overly complex and still exposes the
constraint at the logical level. There is no need whatsoever to have
a VEHICLE_TYPE in VEHICLES in the first place; VEHICLE_TYPE can be
inferred based on which table the vehicle is inserted, SUVS or
SEDANS. The following revision would be more appropriate (and
actually has significantly less overhead to boot).


SQL>
SQL> CREATE TABLE VEHICLES
2 (VIN NUMBER PRIMARY KEY);

Table created.

SQL>
SQL> CREATE TABLE SEDANS
2 (VIN NUMBER PRIMARY KEY,
3 SEDAN_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.

SQL>
SQL> CREATE TABLE SUVS
2 (VIN NUMBER PRIMARY KEY,
3 SUV_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.


SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SEDANS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SUVS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW DISJOINT_VEHICLE_TYPES_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT SUVS.ROWID V_ROW, SEDANS.ROWID S_ROW
4 FROM SUVS , SEDANS
5 WHERE SUVS.VIN = SEDANS.VIN;

Materialized view created.

SQL>
SQL>
SQL> CREATE OR REPLACE FUNCTION DISJOINT_VEHICLE_TYPES_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Vehicles must be entered as
exactly one type.');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX DISJOINT_VEHICLE_TYPES_IDX ON
DISJOINT_VEHICLE_TYPES_MV
(DECODE(V_ROW,V_ROW,DISJOINT_VEHICLE_TYPES_FUN));


-- note: Shame on Oracle, the following is not acceptable:
-- CREATE INDEX DISJOINT_VEHICLE_TYPES_IDX ON
DISJOINT_VEHICLE_TYPES_MV (DISJOINT_VEHICLE_TYPES_FUN)

Index created.

SQL>
SQL>
SQL> INSERT INTO VEHICLES VALUES (1);

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (1, 'MY SEDAN');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO VEHICLES VALUES (2);

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (2, 'MY SUV');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO SUVS VALUES (1, 'ALREADY A SEDAN!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-20001: Vehicles must be entered as exactly one type.
ORA-06512: at "APPS.DISJOINT_VEHICLE_TYPES_FUN", line 6


SQL>
SQL> INSERT INTO SEDANS VALUES (2, 'ALREADY AN SUV');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-20001: Vehicles must be entered as exactly one type.
ORA-06512: at "APPS.DISJOINT_VEHICLE_TYPES_FUN", line 6


SQL>
SQL>
SQL> SPOOL OFF



Apart from the superior design, this approach is nifty in that the
materialized view will always be empty (although it's not a free lunch
- there is disk usage for the MV Logs!)


> There are various ways to not expose these columns to the database
> user, such as building views on top of the tables that don't expose
> these columns, or creating stored procedures to implement the data
> access layer.

Right, and again, I'm attempting to offer a solution that avoids
this. The whole "expose as a view" is fine for end-user querying, but
I would never put application developers in a position to work with
anything but base tables (INSTEAD-OF triggers??? blech!)