
Question on Structuring Product Attributes


Yadda

Oct 3, 2011, 8:54:52 AM
Hello,

I have a question on how to efficiently implement a product database
table structure. I have products grouped into Categories, which is the
highest-level grouping by attribute. Products across Categories share
some attributes in some cases, e.g. Diameter and Color. I need each
attribute to have a proper data type, e.g. Diameter is of type DECIMAL
and Color is of type CHARACTER. I will also need standard stored
queries to look up these products by attribute(s). Over time, new
products with new types of attributes may be added.

One way is to just define the attributes in the Product table and then
have a table to define which attributes are applicable to a Product
Category. However, that table could grow quite large. I also don't
think breaking attributes out into separate attribute tables is a good
idea, because of the stored-query requirement, and because it would
require more structural changes over time.
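
To make that concrete, here is a rough sketch of the first option
(table and column names are illustrative only):

CREATE TABLE Products
(product_id INTEGER NOT NULL PRIMARY KEY,
category_id INTEGER NOT NULL
    REFERENCES Categories(category_id),
diameter DECIMAL(6,2),  -- NULL where not applicable
color CHARACTER(20),    -- NULL where not applicable
..);

CREATE TABLE CategoryAttributes
(category_id INTEGER NOT NULL
    REFERENCES Categories(category_id),
attribute_name CHARACTER(30) NOT NULL,
PRIMARY KEY (category_id, attribute_name));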

I would be curious to read others' opinions on this design issue as it
must be very common.

Thank you
Alex

-CELKO-

Oct 4, 2011, 1:24:58 PM
The classic scenario calls for a root class with all the common
attributes and then specialized sub-classes under it. As an example,
let's take the class of Vehicles and find an industry standard
identifier (VIN), and add two mutually exclusive sub-classes, Sport
utility vehicles and sedans ('SUV', 'SED').

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
UNIQUE (vin, vehicle_type),
..);

Notice the overlapping candidate keys. I then use a compound candidate
key (vin, vehicle_type) and a constraint in each sub-class table to
assure that the vehicle_type is locked and agrees with the Vehicles
table. Add some DRI actions and you are done:

CREATE TABLE SUVs
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
CHECK(vehicle_type = 'SUV'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type = 'SED'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

I can continue to build a hierarchy like this. For example, if I had
a Sedans table that broke down into two-door and four-door sedans, I
could use a schema like this:

CREATE TABLE Sedans
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT 'SED' NOT NULL
CHECK(vehicle_type IN ('2DR', '4DR', 'SED')),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE TwoDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '2DR' NOT NULL
CHECK(vehicle_type = '2DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans(vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

CREATE TABLE FourDoor
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) DEFAULT '4DR' NOT NULL
CHECK(vehicle_type = '4DR'),
UNIQUE (vin, vehicle_type),
FOREIGN KEY (vin, vehicle_type)
REFERENCES Sedans (vin, vehicle_type)
ON UPDATE CASCADE
ON DELETE CASCADE,
..);

The idea is to build a chain of identifiers and types in a UNIQUE()
constraint that go up the tree when you use a REFERENCES constraint.
Obviously, you can do variants of this trick to get different class
structures.
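
A stored query against one sub-class then simply joins on the shared
key (a sketch; any columns beyond vin are elided, as above):

SELECT V.vin, V.vehicle_type
  FROM Vehicles AS V
  JOIN SUVs AS S
    ON S.vin = V.vin;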

If an entity doesn't have to be exclusively one subtype, you play with
the root of the class hierarchy:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL,
vehicle_type CHAR(3) NOT NULL
CHECK(vehicle_type IN ('SUV', 'SED')),
PRIMARY KEY (vin, vehicle_type),
..);

Now start hiding all this stuff in VIEWs immediately and add an
INSTEAD OF trigger to those VIEWs.
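
A minimal sketch of what that might look like, assuming a T-SQL-style
dialect (the view and trigger names are illustrative, not prescribed):

CREATE VIEW SUV_Vehicles (vin, vehicle_type)
AS SELECT V.vin, V.vehicle_type
     FROM Vehicles AS V
     JOIN SUVs AS S
       ON S.vin = V.vin;

CREATE TRIGGER SUV_Vehicles_ins
ON SUV_Vehicles
INSTEAD OF INSERT
AS
BEGIN
-- maintain the root table and the sub-class table together
INSERT INTO Vehicles (vin, vehicle_type)
SELECT vin, 'SUV' FROM inserted;
INSERT INTO SUVs (vin, vehicle_type)
SELECT vin, 'SUV' FROM inserted;
END;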

Derek Asirvadem

Oct 10, 2011, 8:16:19 PM
On Oct 5, 4:24 am, -CELKO- <jcelko...@earthlink.net> wrote:
> [snip]

Joe

If I may ask, is this method of implementing RI for Exclusive Subtypes
one of your original concepts (like the Nested Sets), or is it someone
else's or general practice or whatever ?

Regards
Derek

Roy Hann

Oct 11, 2011, 3:55:31 AM
Surprisingly to me, Joe is advocating what I would have suggested. It
is entirely respectable, though it may not be all that common.

The common approach is to just cram everything into one table with lots
of nullable attributes to hide the conflation of fact types, then leave
the whole mess under the carpet so that the application code has to
figure out the real business model at run-time, every time.
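
A sketch of the sort of thing I mean, with illustrative column names:

CREATE TABLE Vehicles
(vin CHAR(17) NOT NULL PRIMARY KEY,
vehicle_type CHAR(3) NOT NULL,
suv_ground_clearance DECIMAL(5,2), -- NULL unless an SUV
sedan_door_count SMALLINT,         -- NULL unless a sedan
..);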

Explicit support for distributed key constraints in SQL would make the
approach properly robust plus it would, in my opinion, have the
psychological benefit of pointing the way and comforting us that we're
doing it right.

--
Roy

Yadda

Oct 11, 2011, 9:35:33 AM
on 10/4/11 12:24 PM -CELKO- said the following:


Much to contemplate here! Is there a reference on the web or book that
fleshes this concept out? Thanks much.

Derek Asirvadem

Oct 11, 2011, 6:44:45 PM
Roy

To be clear, I am totally for:
- correct Normalisation, which results in Subtypes
- certainly a hierarchy, if that is what the exercise produces in any
given situation
- no Nullable columns, especially not FKs
- simplicity, which is a classic result of the formal process
- no "hiding", such that the model is plainly exposed in the catalogue

Credit for the concept or notion of Subtypes goes to Codd & Date; for
the theoretical modelling, to Codd and Chen; and for practical
modelling or implementation specifics, to R Brown.

I appreciate that SQL does not have formal support for Subtypes, as it
does for e.g. Declarative Referential Integrity and Foreign Keys (and
even that was added years afterward), which leads people to
implementations such as this.

This method does involve a fair bit of processing, though, and of
hiding that behind Views.

My question was specifically regarding the method of implementing RI
for Subtypes. Who deserves credit for this technique ? (I am not
suggesting plagiary here.) When I present this method to others, I
would like to give due credit to the original author, as we do when we
discuss Joe's Nested Sets.


Yadda

> Is there a reference on the web or book that
> fleshes this concept out?  Thanks much.

(I assume you mean the concept of Subtypes, since Joe has provided the
subtype RI method in fair detail.)

Well, no. Anyone with a keyboard can "publish" these days. The web
is full of shallow information and misinformation; one has to dig deep
through that morass to find nuggets of truth, and to do that one has
to be able to discriminate between the two, which is not possible for
the seeker. Also, there are many related issues within the one
science; one aspect cannot be excised, held up, and inspected in
isolation from the other aspects (which is the common problem
afflicting any technical discussion on the web). All you need is a
good text book and some formal study. Or work with a mentor over a
period of time.

Subtypes have been with us since the 1980's, when the RM was first
available in product form, and I and many others have used them since
then.

Regards
Derek

-CELKO-

Oct 12, 2011, 11:20:51 AM
While I love the flattery, I did not invent Nested Sets or this class
hierarchy model. I just made them popular and published the code to
use them.

Derek Asirvadem

Oct 12, 2011, 8:51:11 PM

Joe

Thanks for the response.

What a relief.

The flattery is real, for your *other* work, and I had difficulty
accepting that the same person presented this work.

This "class hierarchy model" is one of the most idiotic things I have
seen in my 34 years in databases.

----------
1. The method of implementing RI for Subtypes. This:
----------

> CREATE TABLE Vehicles
>   UNIQUE (vin, vehicle_type),
>
> CREATE TABLE SUVs
>   UNIQUE (vin, vehicle_type),
>
> CREATE TABLE Sedans
>   UNIQUE (vin, vehicle_type),

a. basically doubles the number of indices in the relevant tables, and
that has a substantial overhead.

b. requires vehicle_type to be carried in the subtypes, which AFAIC is
completely redundant (in SUVs, we know all the rows are SUVs; why carry
a column in which every value will be "SUV" ?).

c. I am not entirely against the concept of de-normalisation, but that
assumes prior normalisation, and has a specific purpose. Whereas this
is un-normalised: vehicle_type is 1::1 with vin, and belongs in
Vehicles, once.

d. it is indirect, and subverts the normal or direct method of
identifying the relation in the two subtypes, which would normally be:

  FOREIGN KEY (vin)
  REFERENCES Vehicles(vin)

e. introduces complexity and processing; views; instead-of triggers,
etc. I would rather the database remain open as possible, without
such complications for use. Particularly when such nonsense is
completely unnecessary.

> Now start hiding all this stuff in VIEWs immediately and add an
> INSTEAD OF trigger to those VIEWs.

f. Of course vin is unique; therefore adding anything to vin will be
unique. As I understand it, the three added UNIQUE constraints are
merely (or mainly) for the purpose of checking that the Subtype is
legal, that it exists in the parent with the proper Discriminator
value. In that case, why not use normal SQL capabilities (yes, we know
there is no DRI for this), which provide that particular form of RI,
*without*:

- one additional redundant column in every subtype
- one additional redundant index in every table (subtypes plus
parent). Note the additional processing load
- complexity
- Views to "hide" the complexity
- Instead-Of triggers

/*
I saw this insanity (posted method) in a database last year, and
ripped it out, in part permitting the database to run about 20 times
(2000%) faster, with no loss of capability. I say "in part" because I
removed a couple of other tumours in the same exercise, therefore I
cannot say for certain exactly how much this tumour contributed to the
negative 2000%. The developer who implemented it (and fancied himself
"modeller" and "DBA") had been reading too much Ambler and Fowler;
(who are famously clueless re the RM or databases; perfect examples of
Maslow's Hammer). The company required a Relational Database, he
created a scratch pad for his app, not a database.

He is still coming to terms with the concept that in the Relational
Database world, we never need to "refactor". First the snake oil
salesmen sell you the cancer, then they sell you the relief. RDB
types do not buy either one.
*/


> I can continue to build a hierarchy like this ...

In which case the doubled indices; subversion; duplicated column; and
complexity will continue down the hierarchy. At each level of the
hierarchy, one more column needs to be duplicated, and it will always
contain one value: SUVs contain a column containing "SUV" only;
TwoDoors contain a column containing "SED" only, *and* a column
containing "2DR" only.

To be clear, I agree with:
- the concept of the hierarchy (the Subtypes may well have subtypes)
and
- the value of ensuring the integrity of Subtypes
I disagree with the implementation method.


----------
2. The case of Sedans and TwoDoor.
----------

(Staying with the posted method.) I do not believe it is a good idea
to mix vehicle_type {SED|SUV} with sedan_type; they are two separate
Things. We already know it is a sedan because it is in the Sedans
table; likewise, every row in TwoDoor is a two-door sedan (not an
SUV). A hierarchy, which I agree with, would be:

CREATE TABLE Sedans (
  vin CHAR(17) NOT NULL
      PRIMARY KEY,
  sedan_type CHAR(3) DEFAULT '2DR' NOT NULL
      CHECK (sedan_type IN ('2DR', '4DR') ),
  FOREIGN KEY (vin)
      REFERENCES Vehicles(vin),
  CONSTRAINT Sedan_Excl_ck
      CHECK ...
  )

CREATE TABLE TwoDoor (
  vin CHAR(17) NOT NULL PRIMARY KEY,
  FOREIGN KEY (vin)
      REFERENCES Sedans(vin),
  CONSTRAINT 2dr_Excl_ck
      CHECK ...
  )

As the model progresses, VehicleType and SedanType will be
Normalised into separate Entities, and Vehicle and Sedan
(respectively) will have Foreign Key References to them.

----------
3. And finally:
----------

> If an entity doesn't have to be exclusively one subtype, you play with
> the root of the class hierarchy:

If the Subtypes are Non-Exclusive, simply exclude the CHECK
Constraint.

----------
4. Class Definitions
----------

Now that the db has first been modelled (separately, as data, using
rules for modelling data), the object definitions (which are separate,
as process, and use the data/database) are pedestrian.

The posted method of implementing RI for Subtypes is completely and
utterly without merit in any "database" (this is c.d.t.). I would
respectfully suggest that you apply RDB fundamentals to the cute ideas
of others, such as "class hierarchy model", and test their validity,
before posting. Otherwise you run the risk of harming your
reputation. Due to your standing, your post, your making them
popular, lends a certain credibility to these otherwise in-credible
and absurd concepts. When they get punctured (any of the RDB types on
this forum can puncture them, I just happen to be the first in this
instance), your credibility suffers along with it, and that is not my
intent.

----------
5. Nested Sets
----------

> I did not invent Nesteds Sets

That is a relief as well. Now I can correct my colleagues, who
presented that hilarious concept to me as yours. Who do we credit for
providing us with such a source of laughter ?

Regards
Derek

-CELKO-

Oct 14, 2011, 9:52:16 PM
>> The common approach is to just cram everything into one table with lots of nullable attributes to hide the conflation of fact types, then leave the whole mess under the carpet so that the application code has to figure out the real business model at run-time, every time <<

LOL! That is unfortunately true.

-CELKO-

Oct 14, 2011, 10:29:27 PM
Now you are obligated to actually post better techniques and explain
why they are better.

I am not bothered by increasing the number of indexes. You have made
the assumption that references are done by more indexes. Sybase (nee
WATCOM) SQL implements the constraints like
FOREIGN KEY (vin, vehicle_type)
REFERENCES Vehicles(vin, vehicle_type)

by building pointer chains under the covers, so each compound key
actually appears once in the schema. Teradata uses a hashing
algorithm. RDBMS companies have been working on PK-FK joins for a few
decades, so they are pretty good at it now.

What do you have against the Nested Sets? The only laugh I had with
them was teaching them to XML programmers. Traditional procedural
programmers still think in terms of fake pointer chains, recursive
traversals and other stuff they know from assembly language. But the
XML kids look at it and say "Oh, (lft, rgt) is the tag pairs!" and then
write really great code by mimicking XQuery.
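
For readers who have not seen the model: each node carries a (lft,
rgt) pair that brackets its whole subtree, so a subtree is one BETWEEN
predicate. A minimal sketch, with illustrative names:

CREATE TABLE OrgChart
(node_name CHAR(10) NOT NULL PRIMARY KEY,
lft INTEGER NOT NULL,
rgt INTEGER NOT NULL,
CHECK (lft < rgt));

-- all subordinates of :boss_name, no recursion required
SELECT T2.node_name
  FROM OrgChart AS T1, OrgChart AS T2
 WHERE T2.lft BETWEEN T1.lft AND T1.rgt
   AND T1.node_name = :boss_name;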

Derek Asirvadem

Oct 15, 2011, 2:03:37 AM
On Oct 15, 1:29 pm, -CELKO- <jcelko...@earthlink.net> wrote:
> Now you are obligated to actually post better techniques and explain
> why they are better..

Ok. Happy to take up the challenge. I have already identified that the
"feature" is already available in Relational databases: it is a CHECK
Constraint, with no need for duplicated columns and added indices.

Do you want me to post the code ? Come on, you are the famous SQL guru
in the family, I am just a little old practitioner with a few happy
customers; your code will probably run circles around mine.

> I am not bothered by increasing the number of indexes.

Yes, that bothers me, especially adding *totally 100% redundant*
indices. That is probably partly why my customers are happy.

> Sybase (nee WATCOM)

Actually:
- Their baby DBMS is "SQL Anywhere", which is nee Watcom. It is an
excellent product for a limited market.

- Their flagship product for any size box is Adaptive Server
Enterprise. It is nee nothing; they designed and wrote it from
scratch. The *lineage* of the original (1984-1987) engineers is
easily traced back to Britton-Lee. And yes, they always had a few
Waterloo grads on staff. Brings back memories.

> You have made
> the assumption that references are done by more indexes. SQL implements  the constraints like
>  FOREIGN KEY (vin, vehicle_type)
>   REFERENCES Vehicles(vin, vehicle_type)
>
> by building pointer chains under the covers, so each compound key
> actually appears once in the schema.

Excuse me, but:

a. You posted SQL. I answered SQL. In order to maintain focus, avoid
tangents, and close this subject like gentlemen, let's talk SQL
only.

b. SQL is a standard; anything that does not comply with the Standard
cannot claim to be an SQL. The SQL standard requires that all
references in the database are by *KEY*, and specifically *NOT* by
pointers.

c. I made no assumptions. The SQL code that *you provided*
specifically creates two indices:
specifically creates two indices:
>>CREATE TABLE SUVs
>>(vin CHAR(17) NOT NULL
>> PRIMARY KEY, -- Index on (vin) for SUV.PK
>> vehicle_type CHAR(3) DEFAULT 'SUV' NOT NULL
>> CHECK(vehicle_type = 'SUV'),
>> UNIQUE (vin, vehicle_type), -- Dupe Index [1] on (vin, vehicle_type)
>> FOREIGN KEY (vin, vehicle_type) -- no suggestion that this FK is an additional index
>> REFERENCES Vehicles(vin, vehicle_type)
>> ON UPDATE CASCADE
>> ON DELETE CASCADE,
>> ..);

d. The SQL code that *you provided* specifically identifies data
columns, not pointers. Therefore, above the covers, there are two
indices, visible, with data columns, visible. No pointers. What a
product does under the covers is tangential.

e. The issue I have is above the covers:

e.1 Index[1] is a 110% duplicate of Index SUV.PK.

e.2 (vin) is unique, so (vin) plus anything will be unique. It has no
value (except to support this monstrous structure, which is not reqd
to ensure Exclusive Subtype CHECK).

e.3 There is also another, additional, 110% redundant index in the
parent, to provide the key for the REFERENCED key in the child. Your
SQL code again:

>>CREATE TABLE Vehicles
>>(vin CHAR(17) NOT NULL
>> PRIMARY KEY, -- Index on (vin) for Vehicle.PK
>> vehicle_type CHAR(3) NOT NULL
>> CHECK(vehicle_type IN ('SUV', 'SED')),
>> UNIQUE (vin, vehicle_type), -- Dupe Index [2] on (vin, vehicle_type)
>> ..);

e.4

> so each compound key actually appears once in the schema.

No. It appears once each time you declare a CONSTRAINT, not once
in the schema. SQL does not normalise the schema (wish that it
would!).

> Teradata uses a hashing algorithm.

Teradata is not SQL.

Sybase, MS, DB2, all create Standard SQL visible indices from data
columns, from the SQL command that you supplied.

Oracle is non-compliant, in that it twists the SQL definitions in
order to appear compliant. It famously uses ROWIDs, above and below
the covers, but the indices and data columns are visible, above the
covers.

> RDBMS companies have been working on PK-FK joins for a few
> decades, so they are pretty good at it now.

Agreed. But the Foreign *KEYS* that you declared in your SQL code are
still ordinary visible data columns, not pointers or pointer chains.

> What do you have against the Nested Sets?

There is a recent thread on that subject. I will take it up there.

Let's close this thread here.

Ambler and Fowler are famous for their "straw man" arguments,
suggesting that the RM or RDBMS can't do X, then proposing some
monstrous method of doing X. AFAIK, you do not do that. SQL, and any
RDBMS supplying SQL, already has everything required to ensure the
Referential Integrity of an Exclusive Subtype, with zero overhead. The
problem does not exist (we can ensure RI for Exclusive Subtypes),
therefore we do not need the "solution" of the additional structure.

Although you did not present the "straw man" intro to this article,
you did imply that Exclusive Subtypes cannot be verified in SQL (which
is not correct), and then promoted this structure. I do not
understand why, when there is a perfectly simple, zero-overhead
Exclusive Subtype CHECK Constraint, you Mr Celko, the famous SQL guru,
would be promoting this non-SQL "class hierarchy" structure, which is
massively inefficient, due to the number of totally redundant columns
and redundant indices it requires.

Response please.

Bob Badour

Oct 15, 2011, 3:37:52 AM
Derek Asirvadem wrote:

> On Oct 15, 1:29 pm, -CELKO- <jcelko...@earthlink.net> wrote:
>
>>Now you are obligated to actually post better techniques and explain
>>why they are better..
>
> Ok. Happy to take up the challenge. I have already identified the
> "feature" is already available in Relational databases; that it is a
> CHECK Constraint, no need for duplicating columns and adding indices.
>
> Do you want me to post the code ? Come on, you are the famous SQL guru
> in the family, I am just a little old practitioner with a few happy
> customers; your code will probably run circles around mine.

Don't count on it. Joe's an idiot. He mostly got a name for himself by
copying and pasting large random blocks of text in response to his betters.

Derek Asirvadem

Oct 15, 2011, 5:03:10 AM
Ok, on re-reading my post, I have come to realise the obvious: that if
Joe had known how to ensure RI for Exclusive Subtypes, he would not
have posted this "class hierarchy" monstrosity, with its 220% redundant
Indices; additional processing; Views; Instead-Of triggers; etc.
Additionally, he has issued a challenge, which is better answered
directly. So here goes.

------------------------------------------------
-- Referential Integrity of Exclusive Subtype --
------------------------------------------------

0. Attribution.

I am the original author, and you saw it here first, folks. As in,
tiny nugget of value that it is, it is my original conception and
work, I have been using this and deploying it for years, and I have
not seen it anywhere else. It is always a blinking surprise to the
development teams that I mentor in my travels. Even in this forum, it
seems to be news.

I am publishing it here for the benefit of the community.

----------------
1. No DRI in SQL
----------------

It is true that SQL has still not defined, let alone implemented, the
DECLARATIVE RI constraint that we need.

-------------
2. The Method
-------------

Any SQL that supports Instead-Of triggers supports this, so we are
using the same level playing field. Write these four SQL commands
(plus parms and commentary):

--
-- Validate Supertype (Vehicle AND vehicle_type) for Subtype
--
CREATE FUNCTION ValidateExclVehicle_check (
________@vin CHAR(17),
________@vehicle_type CHAR(3)
________)
____RETURNS TINYINT
AS
____-- Exists = 1
____-- Not Exists = Null from the empty subquery, 0 substituted
____RETURN (
________SELECT COALESCE(
____________(SELECT 1
________________FROM Vehicle
________________WHERE vin = @vin
________________AND vehicle_type = @vehicle_type),
____________0 )
____)

2.1 Note that in Relational terminology, vehicle_type is called the
Discriminator, as in, it discriminates the Exclusive Subtype.

--------------
3. Changed DDL
--------------

The original or source or example or starting point we will use is
post #2. That can be referenced, so I won't copy it here. The
*changed* DDL required is as follows.

-- You need one of these per parent or supertype

CREATE TABLE Vehicle (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____vehicle_type CHAR(3) NOT NULL
________CHECK (vehicle_type IN ('SUV', 'SED') )
____-- no second duplicate index
____)

-- for each subtype:

CREATE TABLE SUV (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_SUV_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT SUV_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SUV" ) = 1 )
____)

CREATE TABLE Sedan (
____vin CHAR(17) NOT NULL PRIMARY KEY,
____-- vehicle_type not duplicated here
____-- no second duplicate index
____CONSTRAINT Vehicle_Sedan_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

3.1 All I am doing is implementing the hoped-for requirement of
Declarative RI, in code; there is nothing magical about it. If and
when SQL comes to the party, all I have to do is search for occs of
"%_Excl_check" and change the *syntax* of the content within the
brackets.
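
To illustrate the behaviour, an assumed usage example (the VIN value
is made up):

INSERT INTO Vehicle (vin, vehicle_type) -- parent row first
____VALUES ('4Y1SL65848Z411439', 'SUV')

INSERT INTO SUV (vin)                   -- passes SUV_Excl_check
____VALUES ('4Y1SL65848Z411439')

INSERT INTO Sedan (vin)                 -- fails Sedan_Excl_check
____VALUES ('4Y1SL65848Z411439')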

--------
4. Chain
--------

I repeat, Subtypes are ordinary, normal Relational constructs; they
existed as such decades before OO, classes, or class hierarchies were
invented. Subtypes are very important because they eliminate Nulls;
support so-called "optional" Foreign Keys correctly; and service
optional columns correctly, thus simplifying code and keeping the
database fast.

There is no problem with subtypes having Subtypes of their own or with
class hierarchies. The "chain" is clean, carrying the true PK, and
nothing else. A stuffed-up chain, or one that is 220% heavier than it
needs to be, is plain stupid, a resource drain.

This also corrects Joe's Normalisation error, of mixing vehicle_type
and sedan_type together, damaging the integrity of both:

CREATE TABLE Sedan (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____sedan_type CHAR(3) DEFAULT '2DR' NOT NULL -- not vehicle_type
________CHECK (sedan_type IN ('2DR', '4DR') ),
____CONSTRAINT Vehicle_Sedan_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin),
____CONSTRAINT Sedan_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclVehicle_check ( vin, "SED" ) = 1 )
____)

CREATE TABLE TwoDoor (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____CONSTRAINT Sedan_TwoDoor_fk -- normal & expected FK
________FOREIGN KEY (vin)
________REFERENCES Sedan (vin),
____CONSTRAINT TwoDoor_Excl_check -- validate subtype in parent
________CHECK ( dbo.ValidateExclSedan_check ( vin, '2DR' ) = 1 )
____)
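
The companion function is not shown above; it follows the same pattern
as ValidateExclVehicle_check, checking (vin, sedan_type) in Sedan (a
sketch):

CREATE FUNCTION ValidateExclSedan_check (
________@vin CHAR(17),
________@sedan_type CHAR(3)
________)
____RETURNS TINYINT
AS
____RETURN (
________SELECT COALESCE(
____________(SELECT 1
________________FROM Sedan
________________WHERE vin = @vin
________________AND sedan_type = @sedan_type),
____________0 )
____)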

------------------------
5. Non-Exclusive Subtype
------------------------

> If an entity doesn't have to be exclusively one subtype, you play with
> the root of the class hierarchy:

If the Subtype is Non-Exclusive, simply exclude the CHECK CONSTRAINT
and *Function*. The Discriminator is not required for these.

No playing around necessary. The chain is unchanged.
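
For concreteness, a sketch of a Non-Exclusive subtype under the same
conventions; only the ordinary FK remains:

CREATE TABLE SUV (
____vin CHAR(17) NOT NULL
________PRIMARY KEY,
____CONSTRAINT Vehicle_SUV_fk
________FOREIGN KEY (vin)
________REFERENCES Vehicle (vin)
____)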

-----------------
6. Object Classes
-----------------

Now that the data has been modelled correctly, implementing OO classes
*on top of that* is pedestrian. Trying to model data, which is quite
different to processes and objects, using process or object modelling
methods such as OO, is idiotic. Further, the data exists
independently, and it existed long before the objects being
contemplated, so it has to be modelled first; not second; not as
embedded in an object. The reversal of the ancient chronology is for
idiots who only read the books that they themselves have written,
legends in their own mind. It is fair enough that unqualified
florists are now "developing applications", but it is not fair that:

a. the big names do not qualify and test the visions of these
legendary publishers or

b. that they do not know the traditional Relational construct for it.
(this technique is mine, but any technique, which remains within the
relational paradigm, and does not use masses of duplication, would be
acceptable.)

---------
7. Gotcha
---------

Please refrain from implementing stupid things such as two-way
CONSTRAINTS; otherwise you will tie yourself up in knots of your own
making. Electricity flows in one direction only (that is an analogy;
please do not come back with that AC/DC nonsense unless you are
willing to have it destroyed). That is a subject in itself, which I
won't expand on here, but it at least must be mentioned.

---------------------
8. Why is it better ?
---------------------

Well, "better" explicitly requires that the compared element is
"good". Here the compared element is a monstrosity, there is nothing
good about it. Since the Relational Subtype method existed before the
bloated class hierarchy *method*, that former should be the measure
against which the recent interloper is compared. Therefore, I will
provide a comparison as to why the *posted method* of implementing
class hierarchies is much worse than the pre-existing Relational
method, that I have posted here. To be clear, I fully support
Subtypes and class hierarchies and whatever new name isolated writers
choose to come up with, I took issue only with the implementation
*method*.

The pre-existing method is contained in (3) of this post, and used as
the measure. It is the ZERO in the benchmark. In comparison, the
"class hierarchy" method in post #2 uses the following *additional
resources*, to obtain the same result as above, which is quite
unnecessary:

a. One additional 110% redundant index in every supertype
- no matter what Joe believes, every index has a direct overhead, it
directly affects the speed of every INSERT and DELETE
- unless you are working with no concern for production databases,
which means your database is irrelevant, every additional resource
needs to be carefully considered, and every redundant resource needs
to be identified and eliminated
- such an approach eliminates the otherwise predictable post-mortem
gathering of mountains of "diagnostic" info; and the laborious working
through it; writing programs to filter and report; etc; etc

b. One additional column in every subtype
- this breaks 1NF, and increases table size, which affects overall
performance (fewer rows per I/O; fewer rows per memory page)

c. One additional 110% redundant index in every subtype
- ditto (a)

d. added complexity and administration
- this cost is substantial, and if one does not treat it with respect,
one's job migrates overseas

e. Hiding, obfuscation, and subversion, as prescribed
- odious in the extreme,
- makes (d) all the more difficult and costly

f. One additional View per supertype/subtype to "hide all this
complexity" as advised by Joe.
- no big deal, just a couple of superfluous views

g. One additional Instead-Of trigger per supertype/subtype, to be able
to INSERT/DELETE/UPDATE the otherwise simple tables (not views).
- that is a big deal

h. Development, testing, debugging, and maintenance of (d)(e)(f) and
(g)
- refer cost again

-------------
9. Conclusion
-------------

The total cost of the proposed class hierarchy *implementation method*
is substantial, both in time/dollars, and in machine resources; that
which I have quantified is at least 220% overhead plus development/
testing/maintenance. All of which can be eliminated, if the author of
such (yet unattributed) would only read a book on Relational
Databases, before making pronouncements of what it can and cannot do,
and inventing monstrosities such as this.

The proposed class hierarchy *implementation method* is devoid of any
value whatsoever.

Interested parties are referred instead to study ordinary Relational
Subtypes, and pre-existing Relational constructs and SQL CONSTRAINTS
as implementation methods.

"... there is nothing new under the Sun." Ecclesiastes 1:9. And those
who say there is, are frauds. Those who believe the charlatans, are
fools.

Regards
Derek

Derek Asirvadem

Oct 15, 2011, 5:41:24 AM
Roy Hann

On Oct 11, 6:55 pm, Roy Hann <specia...@processed.almost.meat> wrote:
>
> Surprisingly to me, Joe is advocating what I would have suggested.  It
> is entirely respectable, though it may not be all that common.

Ok, so you are both fools.

It is not common because 220% overhead and substantial cost is just
plain stupid. People are not quite as stupid as you think.

> The common approach is to just cram everything into one table with lots
> of nullable attributes to hide the conflation of fact types, then leave
> the whole mess under the carpet so that the application code has to
> figure out the real business model at run-time, every time.

So that's where you have been spending your time. Now I understand
the questions you ask.

> Explicit support for distributed key constraints in SQL would make the
> approach properly robust plus it would, in my opinion, have the
> psychological benefit of pointing the way and comforting us that we're
> doing it right.

Reading an appropriate book will provide both comfort and the method
of doing things correctly. A bit of genuine coding and experience
will provide knowledge about which methods are better than others.
SQL is 20 years past its use-by date, so no use waiting for them. If
a book is too long, try IDEF1X, it has a good, short synopsis. If
that is too long, try this:
____http://www.softwaregems.com.au/Documents/Documentary%20Examples/IDEF1X%20Notation.pdf

They are not "distributed keys". They are ordinary Foreign Keys. The
tables happen to be Subtypes.

If you find anything less than proper, robust, declarative (in the DDL
sense, not the syntax), and high-performance in my (2) and (3) above,
let me know. On second thought, don't. That will lead to me having
to write essays on the resources and operation of the computer. There
are perfectly good books on Amazon.com.

Regards
Derek

Derek Asirvadem

Oct 15, 2011, 5:51:55 AM
Bob

On Oct 15, 6:37 pm, Bob Badour <b...@badour.net> wrote:
>
> Don't count on it. Joe's an idiot. He mostly got a name for himself by
> copying and pasting large random blocks of text in response to his betters.

I see. That explains a few things. Thank you. I stand corrected.

-CELKO-

Oct 15, 2011, 7:58:40 PM
>> Response please. <<

>> traced back to Britton-Lee. And yes, they always had a few Waterloo grads on staff. brings back memories. <<

I never worked with the Britton-Lee machines, but wish I had. I have
a rule of thumb to look for software from the University of Waterloo.
Remember the C compilers? They produced executables 1/3 the size of
Microsoft's, conformed to standards and had a lot of nice stuff. But
the "Waterloons" (is that still the slang?) could not write a human
interface. It was as if they only took hardcore comp sci, math and
optimizer courses.

>> [ by building pointer chains under the covers, so each compound key actually appears once in the schema.] Excuse me, but:

a. You posted SQL. I answered SQL. In order to maintain focus, avoid
tangents, and close this subject like gentlemen, let's talk SQL only.

b. SQL is a standard, anything that does not comply with the Standard
cannot make the claim to be an SQL. The SQL standard requires that
all references in the database are by *KEY*, and specifically *NOT* by
pointers. <<

Not so. The standards use (over-use?) the term "effectively" to
describe the results and not the method. You can use pointers,
indexing in all its flavors, hashing (lots of academic work on
Minimal Perfect Hashes for DBs right now), or brute-force table scans
(looking good for SSDs and massive parallelism). Pointers and indexing
are simply the most popular because SQL products grew out of existing
storage engines and file systems.

Around the time of the SQL-89 standard, one of the vendors on the old
ANSI X3H2 committee wanted to propose a CREATE INDEX statement. The
X/Open consortium had such syntax in their portability guide, and the
X3H2 committee overlapped with them quite a bit. The proposal was
voted down on the grounds that CREATE INDEX / CREATE HASH / CREATE
SURROGATE_KEY / CREATE MAGIC_GENIE were all too close to a physical
implementation. The goal was to make the Standards as abstract as
possible so that ANY implementation could be conformant.

What the Standards have and implementations do not is a generalized
CHECK() constraint and the CREATE ASSERTION statement, which can
reference any table in the schema, with aggregates. The CREATE
ASSERTION is why the <name> in "CONSTRAINT <name> CHECK(..)" has to be
globally unique. Originally, the CREATE ASSERTION was invented because
all constraints on an empty table are TRUE, and therefore you cannot
otherwise force a table to contain data. Nice from a pure logical
viewpoint (Existential Import and all that jazz), but not so good for
practical programming with mandatory relationships.
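
An illustrative example (mine, for this thread) of the sort of
assertion meant here, which forces a table to contain at least one
row; no mainstream product will accept it:

CREATE ASSERTION Vehicles_not_empty
CHECK ((SELECT COUNT(*) FROM Vehicles) > 0);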

>> Teradata is not SQL. <<

It is as much ANSI/ISO SQL as any other commercial product, but they
tend to have Data Warehouse extensions because of their market, and
they optimize for huge amounts of data. The bad part of teaching a
Teradata class is that I am the only guy in the room with only a
Masters in Math, and everyone else is a PhD in Stats.

>> Sybase, MS, DB2, all create Standard SQL visible indices from data columns, from the SQL command that you supplied. <<

Nope, indexes are NOT standard. DB2 is my favorite: lots of platforms,
good conformance, easy to use. I remember Sybase as very good, but I
have not done any financial work in years, which is where they shine.
MS is always a decade behind, but fine for its niche (the programmers
are the problem here; no, really).

>> Oracle is non-compliant, in that it twists the SQL definitions in order to appear compliant. It famously uses ROWIDs, above and below the covers, but the indices and data columns are visible, above the covers. <<

I can write Oracle when I have to, and I can USUALLY read it in spite
of the dialect. But in addition to your remarks, let me add that
keeping it tuned is a full-time job, with 300+ parameters to watch. If
you can figure it out, it runs like a charm; one parameter wrong and
all bets are off. My big gripe with Oracle programmers is the over-use
of cursors. They might as well be in a file system in COBOL. Rant,
rant, rant!

>> [What do you have against the Nested Sets?] There is a recent thread on that subject. I will take it up there. <<

I will cruise over to it when I get my head above water. I need to
spend Sunday and Monday catching up with an on-line free course and
cleaning the garage. Oh, you are being too polite; I like an
adversarial dialogue :) My wife is an ordained Soto Zen Priest, and
she teaches by beating students with a stick.

-CELKO-

Oct 15, 2011, 8:47:09 PM
In this particular case, the way to do this in full, repeat FULL
Standard SQL, is with global CHECK() check constraints. Make and model
information is in the VIN so a regular expression can check it.

CREATE TABLE SUV
(vin CHAR(17) NOT NULL PRIMARY KEY
    REFERENCES Vehicles (vin),
CONSTRAINT vin_in_motor_pool
CHECK (EXISTS
       (SELECT *
          FROM Vehicles AS V1
         WHERE V1.vin = SUV.vin)),
CONSTRAINT suv_vin
CHECK (<< regular expression >>),
..
);
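
As a rough illustration only (the real make/model decoding rules are
more involved than this), a coarse well-formedness test with the
SIMILAR TO predicate could look like:

CONSTRAINT vin_format
CHECK (vin SIMILAR TO '[A-HJ-NPR-Z0-9]{17}') -- VINs exclude I, O, Q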

Derek Asirvadem

Oct 15, 2011, 10:27:01 PM
Forget about the make-and-model-in-the-vin business, as that
introduces changes far too late in the progression. Stick with the
meaning of the original postings.

Your suggestion has the following errors:

1 Re PK (vin). My method places the code in One Place, Once.

Your method may work, but it places the code in every Subtype, which
results in *repeated code*, and without a constraint_name, which
hinders administration.

2 Syntax. I used syntax that most SQLs would accept.

Your method performs the existence check within the table DDL, which
most SQLs disallow.

The correlated subquery won't work, because the table does not exist
yet (it is being created).

Do you ever check your code before posting ?

3 The purpose of the xxx_Excl_check is to ensure that the Subtype
exists in the parent *as that subtype*. Therefore the specific
combination of (vin, vehicle_type) must be checked for existence.

You check for (vin) only.

4 The existence check is not a "regular_expression", it is a full
query.

5 When existence checking, one should *never* use "SELECT *", as that
drags the whole row into cache, and returns it to the thread, no
matter what columns are used to qualify the query. And the thread
doesn't use the data it fetched. For existence checking, one should
always use "SELECT 1". That allows the optimiser to avoid handling
and transporting the data altogether. Further, if the query can be
serviced by an index alone (covered query), it avoids accessing the
data structure.

> [your other post]

Does not have any substance worth responding to.

If, and when, you produce evidence (output of SELECT
<true_column_list> ... , which excludes ROWID() functions, etc), then
and only then, might we have some substance to discuss. Until then,
your posting that "SQL implements [FK] constraints by building pointer
chains under the covers" is complete bunkum.

One of the fundamental differences between Relational and pre-
relational models (I will stick to practical understanding that
everyone is familiar with, refer Codd's Twelve Rules, and please do
not introduce distractions such as arguments about the RM) is that in
the pre-relational models, the references, links, chains, etc, were
indeed implemented with *internal pointers*, whereas the RM demanded
that all such references, links, etc, be implemented with *logical
keys only*. You seem to have missed that.

All the strictly-SQL-compliant products use and display logical keys
only. The "edge" products make internal pointers visible, but even
that is *in addition to* the logical key, which cannot be subverted.

Regards
Derek

Derek Asirvadem

Oct 15, 2011, 10:39:20 PM
Joe

I forgot to mention:

6 CONSTRAINT vin_in_motor_pool is *totally redundant* because you have
a constraint:
REFERENCES Vehicles (vin)
in the line immediately above it.

Regards
Derek

-CELKO-

Oct 16, 2011, 1:16:28 PM
Again, the CHECK() constraint in Standard SQL does allow ANY valid
search condition. No product I know of has implemented it, and the
only product with a CREATE ASSERTION statement was RDB, a product from
DEC that was bought by Oracle.

You asked for a Standard SQL and that was one solution.

Actually, the regular expression should have been the SIMILAR TO
predicate. Again, this is ANSI/ISO Standard and based on POSIX from
the Federal Standards. Have you ever downloaded a copy of the
Standards?

The VIN is a special case because we can validate that the vehicle is
an SUV from the VIN. Other identifiers might not have this property.
If you want to see the power of regular expressions, look up the
pattern

>> When existence checking, one should *never* use "SELECT *", as that
drags the whole row into cache, and returns it to the thread, no
matter what columns are used to qualify the query<<

LOL! Why does anyone still believe that myth? There was an early
version of Oracle that did that. It was corrected in the next release.
The options were "SELECT *", "SELECT <constant>", "SELECT <column>",
and "SELECT <expression>", and they are all treated as *. Here we get
into that "effectively" thing in Standard-speak. Since only the table
cardinality matters, the optimizer throws out the SELECT list in every
SQL product I have used or consulted on.

The SELECT * is not required, but it is the preferred style, because
a <column name> or <expression> can change, so the subquery would
become invalid; a <constant> tells you nothing about the nature of the
query, while SELECT * tells you that this is a table-level construct,
not dependent on the rows and columns. To stress this, consider that
EXISTS (SELECT * FROM .. WHERE..)
is equivalent to
(SELECT COUNT(*) FROM .. WHERE..) > 0

And that is how SQL products actually implement it under the covers.
The mechanism for counting rows can use the indexes and stop when it
gets the first row back.

Jonathan Leffler

Oct 16, 2011, 2:25:10 PM
On 10/14/11 11:03 PM, Derek Asirvadem wrote:
> Sybase, MS, DB2, all create Standard SQL visible indices from data
> columns, from the SQL command that you supplied.

Actually, Standard SQL does not support indexes at all. They are an
implementation detail that actual SQL DBMS use, but they are completely
outside the scope of the SQL standard.

--
Jonathan Leffler <jlef...@earthlink.net> #include <disclaimer.h>

Derek Asirvadem

Oct 16, 2011, 10:17:20 PM
On Oct 17, 5:25 am, Jonathan Leffler <jleff...@earthlink.net> wrote:
>
> Actually, Standard SQL does not support indexes at all.  They are an
> implementation detail that actual SQL DBMS use, but they are completely
> outside the scope of the SQL standard.

Sure. But let's not get side-tracked. (a) I am not writing a paper
with fully qualified statements; I am responding to a post. (b) All
SQLs [that are compliant with the standard and thus SQLs] have
indices. Whether it is an Extension or not is not relevant to this
thread, and I did not mean to imply that indices were standard, rather
that all the SQLs have them, and have them in a visible, column-based,
non-pointer form.

The point being argued is, Celko stated:
>> You have made the assumption that references are done by more
>> indexes. Sybase (nee WATCOM) SQL implements the constraints like
>> FOREIGN KEY (vin, vehicle_type)
>> REFERENCES Vehicles(vin, vehicle_type)
>> by building pointer chains under the covers, so each compound key
>> actually appears once in the schema. Teradata uses a hashing
>> algorithm. RDBMS companies have been working on PK-FK joins for a few
>> decades, so they are pretty good at it now.

Which I state is nonsense. Celko does not understand the effect of his
own SQL:
- each PRIMARY KEY constraint builds an index
- each UNIQUE constraint builds an index
- no SQL normalises the schema
- the nonsense about "pointer-chains" is just a red herring to avoid
dealing directly with my refutation of the nonsense he has posted.

In order to avoid further misunderstandings: yes, of course the B-Tree
in any index on any platform that has implemented SQL has pointers at
the non-leaf levels, and may have either pointers or actual data
columns (the key) at the leaf level. The context here is whether (a)
the index is real, (b) visible, and (c) consists of visible data
columns. Or not. What one platform or another does under the covers
is not relevant to the method posted by Celko, or the issue I have
raised, or the method posted by me.

Regards
Derek

Derek Asirvadem

Oct 16, 2011, 11:15:03 PM
On Oct 17, 4:16 am, -CELKO- <jcelko...@earthlink.net> wrote:
> Again, the CHECK() constraint in Standard SQL does allow ANY valid
> search conditon. No product I know has implemented it and the only
> product with a CREATE ASSERTION statement was RDB, a product form DEC
> that was bought by OrACLE.
>
> You asked for a Standard SQL and that was one solution.
>
> Actually, the regular expression should have been the SIMILAR TO
> predicate. Again, this is ANSI/ISO Standard and based on POSIX from
> the Federal Standards. Have you ever downloaded a copy of the
> Standards?
>
> The VIN is a special case because we can validate that the vehicle is
> an SUV from the VIN. Other identifiers might not have this property.
> If you want to see the power of regualr expressions, look up the
> pattern

Sounds and looks like the volumes of text you are famous for posting.
Smells like bulldust. All red herrings and distractions, in order to
avoid dealing directly with any of the issues I have raised.

> >> When existence checking, one should *never* use "SELECT *", as that
>
> drags the whole row into cache, and returns it to the thread, no
> matter what columns are used to qualify the query<<
>
> LOL! Why does anyone still beleive that myth? There was an early
> version of Oracle that did that. It was correct the next release. The
> options were "SELECT * ", "SELECT <constant>", "SELECT <column>", and
> "SELECT <expression>" and they are all treated as *. Here we get into
> that "effectively" thing in Standard-speak. Since only the table
> cardinaltiy matters,

Tables have "cardinality", do they ? Is that in your new book
*Mangled Relations for Dummies*

> the optimizer throws out the SELECT list in every
> SQL product I have used or consulted on.

Well you should start checking the showplans or query plans and
testing the code before you provide consulting. Until you post one of
those, proving your assertion, it remains bulldust, and just another
distraction.

> The SELECT * is not required, but it is the preferred style

Now it is "preferred", is it.

> because
> <column name> or <expression> can change, so the subquery will be
> invalid, <constant> tells you nothing about the nature of the query
> while SELECT * tells you that this is a table level construct, not
> dependent on the rows and columns.

You know, every time you post, you prove you have no idea about what
you are posting. Every post destroys more of the little credibility
you have left.

1. A query is a query. There is no such thing as a "table level
construct" for what you have identified or for the SELECT you have
given.

2. Some of the SQLs are smart enough to *ignore* the column list and
others are not.

3. Read the freaking manual for your particular platform. Most of
them will categorically state:
____In an EXISTS, always use (SELECT 1 FROM ... WHERE ...)

4. Stop trying to bamboozle the readers with bulldust.

5. You cannot even tell the difference between column issues and
qualifiers in the WHERE clause. What you are stating simply does not
apply.

> To stress this consider that
> EXISTS (SELECT * FROM .. WHERE..)
> is equivalent to
> (SELECT COUNT(*) FROM .. WHERE..) > 0

6. The better (not all) SQLs differentiate an EXISTS from a normal
SELECT. For EXISTS, the better SQLs stop processing the subquery when
the first row that satisfies the WHERE clause is found.

7. In every SQL, SELECT COUNT(*) has to scan the *entire table*

8. There is no "equivalence" between the two statements given.

You need to take some formal education in SQL before you post further
nonsense. Let alone "consult".

> And that is how SQL products actually inplement it under the covers.
> The mechanism for counting rows can use the indexes and stop when it
> gets the first row back.

9. By definition, by standard requirement (God, you're either an idiot
with a capital I or a shameless liar), SELECT COUNT(*) has to count
all rows. There may be faster methods of obtaining a row_count, but
we are discussing the SQL you have posted.

10. The SELECT COUNT(*) is in brackets. That has to be resolved
before the comparison outside the brackets is resolved. You need to
read books written by others on the subject, not only the fairy tales
that you write about SQL.

11. Since you have demonstrated the exact level of you knowledge of
SQL, by your own posts, and that has been pointed out to you, you are
no longer in a position to *tell* anyone anything about how something
works. You are a fraud.

--

Stop hiding behind this "under the covers" nonsense. It is
transparent. Unlike you, I (a) test my SQL and (b) obtain showplans
and resource consumption stats at a variety of levels, *before I
post*. Everything "under the covers" is laid bare.

And stop taking the position of knowing how SQL is parsed, compiled and
executed. You have proved you can't even write SQL, there is no way
that you can possibly write a compiler or speculate about what one
would do.

Finally, I note you have not addressed either the substance of my
refutation of your proposal, or my alternative method. Just an awful
lot of black powder smoke. And both your feet are bleeding.

--------------------------------------
I take it that my posted method stands
--------------------------------------

> My wife is an ordained Soto Zen Priest and she teaches by
> beating students with a stick.

I can understand her taking out her frustration with you, on the
kids. But that is not acceptable, it does not resolve the problem,
the circle continues ad infinitum. She needs to beat the source of
the frustration with a big stick, not pass it down to the kids.

Regards
Derek

Eric

Oct 17, 2011, 7:05:14 AM
On 2011-10-17, Derek Asirvadem <derek.a...@gmail.com> wrote:
> On Oct 17, 5:25 am, Jonathan Leffler <jleff...@earthlink.net> wrote:
>>
>> Actually, Standard SQL does not support indexes at all. They are an
>> implementation detail that actual SQL DBMS use, but they are completely
>> outside the scope of the SQL standard.
>
> Sure. But let's not get side-tracked. (a) I am not writing a paper
> with fully qualified statements, I am responding to a post. (b) all
> SQLs [that are compliant with the standard and thus SQLs] have
> indices. Whether it is an Extension or not is not relevant to this
> thread, and I did not mean to imply that indices were standard, rather
> that all the SQLs have them, and have them in a visible column, non-
> pointer, form.

It's sort-of fun to watch two people who don't know enough trying to
have an argument.

What on earth is "and have them in a visible column, non-pointer, form"
supposed to mean? All you can ever see of an index (apart from its
effect on query performance) is the list of columns indexed and some
vendor-dependent stuff usually about physical storage.

>
> The point being argued is, Celko stated:

Just to pick on one more thing:

> - each PRIMARY KEY constraint builds an index
> - each UNIQUE constraint builds an index

Not "builds", but "usually needs" :
- nowhere is there a requirement that primary and unique keys be
implemented with an index
- only one index is needed and there is nothing to prevent an SQL parser
being smart enough to realise this
- if the parser is not smart enough there may well be a way of
second-guessing it and still ending up with only one index.

And finally...

> ... under the covers is not relevant ...

True, but you don't always seem to understand where the covers are.

Eric


--
ms fnd in a lbry

-CELKO-

Oct 18, 2011, 12:36:10 PM
>> Actually, Standard SQL does not support indexes at all.  They are an implementation detail that actual SQL DBMS use, but they are completely outside the scope of the SQL standard. <<

You already read my short history of X3H2 and CREATE INDEX, so I will
not repeat it. Back in the early days, we worried about how to do the
indexes. B-Trees in their mutations were favored. Oracle briefly had a
version that carried the prefix of CHAR(n) and VARCHAR(n) strings at
different levels; it had too much overhead on INSERT and UPDATE.

Today, I would look at hashing and brute-force scans to enforce unique
constraints. In a large DB, I have to traverse several levels in an
index, but a good hash is one probe, no matter how big the table is.

If I have a massively parallel processor and SSDs, I can scan the SSD
at processor speeds; everything is effectively in Main Storage. Why
bother with indexing and all the trouble to maintain it?

-CELKO-

Oct 18, 2011, 1:16:26 PM
On Oct 16, 10:15 pm, Derek Asirvadem <derek.asirva...@gmail.com>
wrote:
>> Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust.  All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised.<<

No, you went right to personal attacks.

-CELKO-

Oct 18, 2011, 3:18:19 PM
>> Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust. All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised. <<

Personal attacks are not issues. You have never worked with the SQL
Standards, have probably worked on fewer than a dozen SQL
implementations, and have never been part of a team designing
optimizers. You also have a number of conceptual problems, which seem
to be the result of confusing the abstract standards and the RM with
particular implementations.


>> Tables have "cardinality", do they ? Is that in your new book “Mangled Relations for Dummies* <<

No, actually, it is in SQL:2005 Part II. It is a bad idea to go up
against a man who keeps the ANSI/ISO Standards on his desk :) And
after a decade on X3H2, I have a good idea where to look.

================
4.14.1 Introduction to tables

A table is a collection of rows having one or more columns. A row is a
value of a row type. Every row of the same table has the same row
type. The value of the i-th field of every row in a table is the value
of the i-th column of that row in the table. The row is the smallest
unit of data that can be inserted into a table and deleted from a
table.

A table T2 is part of a column C of a table T1 if setting the value of
T1.C to a NULL value (ignoring any constraints or triggers defined on
T1 or T1.C) would cause T2 to disappear.

The most specific type of a row is a row type. All rows of a table are
of the same row type and this is called the row type of that table.

The degree of a table, and the degree of each of its rows, is the
number of columns of that table. The number of rows in a table is its
cardinality. A table whose cardinality is 0 (zero) is said to be
empty.

The term Cardinality comes originally from Set Theory, where the worry
is more about finite versus transfinite sets, and even more about the
different kinds of infinite sets.

>> Well you should start checking the showplans or query plans and testing the code before you provide consulting. Until you post one of those, proving your assertion, it remains bulldust, and just another distraction. <<

Run an EXISTS (SELECT *), EXISTS (SELECT 1), EXISTS (SELECT x), and
EXISTS (SELECT a+b), and look for any difference in the SHOWPLAN,
EXPLAIN, or whatever your vendor uses for this system utility.
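
For anyone who wants to try it, a minimal sketch, assuming the
Vehicles/Sedans tables from earlier in the thread (the plan display
command is vendor-specific):

-- Four variants that should produce identical plans. Run each under
-- SET SHOWPLAN_ALL ON (SQL Server), EXPLAIN PLAN FOR (Oracle), or
-- your vendor's equivalent; the <select list> should not change them.
SELECT v.vin FROM Vehicles v
 WHERE EXISTS (SELECT * FROM Sedans s WHERE s.vin = v.vin);

SELECT v.vin FROM Vehicles v
 WHERE EXISTS (SELECT 1 FROM Sedans s WHERE s.vin = v.vin);

SELECT v.vin FROM Vehicles v
 WHERE EXISTS (SELECT s.vin FROM Sedans s WHERE s.vin = v.vin);

SELECT v.vin FROM Vehicles v
 WHERE EXISTS (SELECT 1+1 FROM Sedans s WHERE s.vin = v.vin);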

>> 1. A query is a query. There is no such thing as a "table level construct" for what you have identified or for the SELECT you have given. <<

No, sorry. Do not confuse a uniform syntax with uniform semantics. A
SELECT in a CURSOR body has limitations, a SELECT in a subquery has to
be in parentheses, a SELECT in an INSERT INTO has limitations, etc.

The model we used in SQL is a hierarchy. The outer level is the
schema; here the schema information tables and schema objects are
exposed to an ADMIN. An ADMIN has the DROP, CREATE and ALTER
privileges; a USER does not. Here is where DRI actions live, etc.

The “DROP <schema object>” is at the table level; we do not care
about the rows or columns inside the table. Likewise, EXISTS (SELECT *)
and COUNT(*) work at the table level.

The next level is the rows. The INSERT INTO uses rows as its unit of
work, as do multi-column CHECK() constraints, etc.

Finally, we have the column level, where the simple column CHECK(),
DEFAULT, NOT NULL, etc. live.
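
A minimal sketch of constraints at each of those levels (table and
column names are made up for illustration):

CREATE TABLE Rooms
(room_nbr INTEGER NOT NULL PRIMARY KEY,  -- column level: NOT NULL
 floor_nbr INTEGER DEFAULT 1 NOT NULL    -- column level: DEFAULT
   CHECK (floor_nbr BETWEEN 1 AND 40),   -- column level: simple CHECK()
 seat_cnt INTEGER NOT NULL,
 occupant_cnt INTEGER NOT NULL,
 CHECK (occupant_cnt <= seat_cnt));      -- row level: multi-column CHECK()

-- table level: DDL treats the whole table as the unit of work
DROP TABLE Rooms;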

>> 2. Some of the SQLs are smart enough to *ignore* the column list and others are not. <<

I cannot think of any current SQL product that does not use this
optimization pattern:
1) Throw exception if table not in the schema information tables
2) Look for single table parameter without a WHERE clause and check
cardinality in schema information tables. This is short-circuit
evaluation at the schema level. Hey, it pays off!
3) Look for empty tables in the FROM clause. This is short-circuit
evaluation at the table level.
4) Perform the filters in the WHERE, then do the joins in the FROM
clause. This step is done with special code for the joins, since we do
not have to return the whole row.
5) Return TRUE when the result set has a row. This is short-circuit
evaluation at the row level.
6) Otherwise, return FALSE; there is no UNKNOWN in this predicate.

The “special code” in step #4 is the fun part: skewed distributions,
de-dupping in the join, hashing, throwing out values and using bit
flag indexes, etc.

>> 3. Read the freaking manual for your particular platform. Most of them will categorically state: ____In an EXISTS, always use (SELECT 1 FROM ... WHERE ...) <<

Not in any of the five SQLs I have on my desk. The examples in Oracle
tend to use a constant, but that is not recommended anywhere. Again,
the use of EXISTS (SELECT * ..) is favored because it is not clear,
if the <select list> is invalid, whether the SQL engine should return
a warning and continue to process the predicate or throw an
exception. The star is always safe, even if the base table columns
change.

>> 4. Stop trying to bamboozle the readers with bulldust. <<

I will make a note of that :)

>> 5. You cannot even tell the difference between column issues and qualifiers in the WHERE clause. What you are stating simply does not apply. <<

Again, it is not a good idea to go up against a guy with a copy of
the Standards on his machine :) I said

> EXISTS (SELECT * FROM .. WHERE..)
> is equivalent to
> (SELECT COUNT(*) FROM .. WHERE..) > 0

And SQL:2006 says:

8.10 <exists predicate>
Function
Specify a test for a non-empty set.

Format
<exists predicate> ::= EXISTS <table subquery>

Syntax Rules
None.

Access Rules
None.

General Rules
1) Let T be the result of the <table subquery>.

2) If the cardinality of T is greater than 0 (zero), then the result
of the <exists predicate> is TRUE; otherwise, the result of the
<exists predicate> is FALSE.

Conformance Rules

1) Without Feature T501, “Enhanced EXISTS predicate”, conforming SQL
language shall not contain an <exists predicate> that simply contains
a <table subquery> in which the <select list> of a <query
specification> directly contained in the <table subquery> does not
comprise either an <asterisk> or a single <derived column>.

==========================================

>> 7. In every SQL, SELECT COUNT(*) has to scan the *entire table* <<

No, the schema information tables have the cardinality of a table. If
the stats are out of date, then the engine goes to the indexes. The
table itself is used only as a last resort.
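
As an illustration, in SQL Server that catalog lookup can be done by
hand (a sketch; the catalog names are SQL Server's, and the count is
only as fresh as the metadata):

-- SQL Server: cardinality from catalog metadata, no table scan
SELECT SUM(p.[rows])
  FROM sys.partitions AS p
 WHERE p.object_id = OBJECT_ID('Vehicles')
   AND p.index_id IN (0, 1); -- 0 = heap, 1 = clustered index

Oracle keeps a comparable figure in USER_TABLES.NUM_ROWS, current as
of the last statistics gathering.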

>> 8. The is no "equivalence" between the two statements given. You need to take some formal education in SQL before you post further nonsense. Let alone "consult". <<

I will remember that next time I see my name in an SQL textbook :)

>> 10. The SELECT COUNT(*) is in brackets. That has to be resolved before the comparison outside the brackets is resolved. You need to read books written by others on the subject, not only the fairy tales
that you write about SQL. <<

The optimizer is free to implement an execution plan as it pleases, so
long as it “effectively” returns the same results. Do you also believe
that the FROM clause is executed before the WHERE clause? Since you
did not know the term Cardinality, you might have to look up
“Cartesian Product” to understand this point.

>> Finally, I note you have not addressed either the substance of my refutation of your proposal, or my alternative method. <<

I lost track of it in all your name-calling and screaming. Basically,
you want to use this pattern of REFERENCES

vehicles ← sedans ← two_door

The classic schema I gave uses:

vehicles ← sedans
vehicles ← two_door

Validate a two-door and you have to access three tables; the classic
schema accesses only two.
Now drop the sedans and the two-doors are orphaned. The classic
schema does not have this problem.

To find data about a two-door sedan I have to use all three tables.
But to find the data for my four_door, I only need vehicles ← sedans.
Now add convertibles as another class. Do you subordinate it to
two_doors, or to sedans? Or directly to vehicles?

Pretty clearly, the paint job is in Vehicles. But where do the door
locking attributes fit? This is a version of the network /
hierarchical database design problems. How do I handle optional/
mandatory relationships? At what level/node does an attribute belong?

Also, Codd did not have anything to do with class models; he was into
DOMAINs. And SQL does have DRI actions; that is section 11.8
<referential constraint definition>.





Hugo Kornelis

unread,
Oct 18, 2011, 3:29:37 PM10/18/11
to
On Wed, 12 Oct 2011 17:51:11 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

(...)
Hi Derek,

I've tried to follow your debate with Joe Celko with interest, though
it's sometimes hard to keep track of the arguments in between the
mud-slinging.

As Joe can confirm, I often disagree with him, on many subjects. But
when I first saw him post the subtype/supertype implementation model
you criticise, I liked it for its robust protection against integrity
violations. You are correct that the redundant columns and the
extraneous keys require more disk space and hurt performance, but I
always think that integrity comes first, all the rest comes later.

But I have of course an open mind for better alternatives, and you
seem to have one. Unfortunately, the most relevant part for integrity
is left out of your post, or rather replaced by ellipsis. Given the
two tables above (plus the implied vehicles table), how will the
following (inconsistent) transaction be rejected:

BEGIN TRANSACTION;
INSERT INTO Vehicles (vin, vehicle_type)
VALUES ('12345678901234567', 'SUV');
INSERT INTO Sedans (vin, sedan_type)
VALUES ('12345678901234567', '4DR');
INSERT INTO TwoDoor (vin)
VALUES ('12345678901234567');
COMMIT TRANSACTION;

I think you intend the constraints Sedan_Excl_vk and 2dr_Excl_ck to
prevent this, but you left out their specification. In a complete ANSI
SQL implementation, I could easily create this constraint, as the ANSI
SQL standard permits the use of subqueries in a CHECK constraint
(SQL-92: full compliance; SQL-2003: Feature F671, "Subqueries in CHECK
constraints"). But the product I work with (Microsoft SQL Server) does
not implement this feature, and a quick Google search gives me the
impression that most (if not all) major vendors suffer from the same
limitation.

If your point is that in a fully ANSI-compliant RDBMS, the version Joe
posted suffers from needless redundancy (just like this sentence), I
agree. But if you have a way of coding the CHECK constraint that is
supported by some, or preferably all, major current RDBMS
implementations and that will disallow the inconsistent data I gave
above, please let me know. (Or, if you already gave it and I missed it
when wading through the discussion, please tell me where I can find
it).
--
Hugo Kornelis, SQL Server MVP
My SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis

Derek Asirvadem

unread,
Oct 18, 2011, 9:10:43 PM10/18/11
to
On Oct 17, 10:05 pm, Eric <e...@deptj.eu> wrote:
>
> It's sort-of fun to watch two people who don't know enough trying to
> have an argument.

It is even more fun to watch someone on the sidelines, who is not
party to the argument, picking up clauses and examining them in
isolation.

> What on earth is "and have them in a visible column, non-pointer, form"
> supposed to mean?

Let me spell it out for you:
- I was not posting a new thread (I would have used proper terms).
- As I detailed to Jonathan, I was not writing a fully qualified
thesis (if you are expecting that, you will find it lacking)
- I was posting a response to Celko's thread. That defines the
context.
- Celko was playing "slippery fish", to avoid dealing with specific
problems that I identified (check the whole thread)
- In an attempt to prevent the slipperiness, I used Celko's terms.
- As it progressed, I used further terms to make distinctions, using
his terms as the base (I would not have used his terms, or my-terms-
that-are-based-on-his-terms, in a stand-alone article)

> All you can ever see of an index (apart from its
> effect on query performance) is the list of columns indexed and some
> vendor-dependent stuff usually about physical storage.

Excellent, you seem to understand that correctly. So your question is
either rhetorical or plain absurd.

If you have an honest argument with me, just state it; if there is
something you wish to clarify about one of my statements, ask a
question. But this picking and gnawing on the carcasses left, after
others have feasted, is for the birds.

> Just to pick on one more thing:
>
> > - each PRIMARY KEY constraint builds an index
> > - each UNIQUE constraint builds an index
>
> Not "builds", but "usually needs" :

Correcting my English is petty. If you fancy yourself an editor, send
me your CV, I will send you all my posts to edit, before I post.

> - nowhere is there a requirement that primary and unique keys be
>   implemented with an index

Try reading the entire statement in context. You might draw a
different conclusion.

If that is still confusing, try this: I did not say "there is a
requirement..." so there is no point arguing against what I did not
say. I am quite happy to address issues regarding what I have said
(the thread is ready evidence). But I do not engage with isolated
scraps. It might give the readers the idea that such isolated scraps
have some credibility, or that bird-like behaviour is appropriate for
functioning adults.

> - only one index is needed and there is nothing to prevent an SQL parser
>   being smart enough to realise this

Marvellous. Identify which SQL does that (post evidence).

Until then your suggestion is a possibility in the future, not a
reality, which is what Celko posted (he did not say "sometime in the
future, some SQL could ...", he posted "Here is how to do this
[now] ..."), which I am arguing. I can simply state, no SQL does that
[now]. The implication is "now", on Earth, not "ten years from now",
not "if and when the vendors wake up and read Celko's post", not on
the fourth moon of Jupiter.

> - if the parser is not smart enough there may well be a way of
>   second-guessing it and still ending up with only one index.

Feel free to post the method. Otherwise your suggestion is yet
another fantasy of could-bes and guesses. Just like Celko. Any
examination reveals that his method is based on facilities that are
not available now, as stated "do this", that the method might be
useful at some point in the future, if and when some SQL vendor
supplies the underlying facility.

My method is based on the here and now, in any SQL (I have
specifically excluded the non-SQLs), on Earth.

> And finally...
>
> > ... under the covers is not relevant ...
>
> True, but you don't always seem to understand where the covers are.

Thanks for your opinion. I will make sure I give it the consideration
it deserves, keeping in mind your demonstrated level of capability, in
understanding technical matters, and your could-be, should-be,
attitude to addressing present day reality.

Should you wish to address any of the salient points in my post, a
piece of meat, rather than a scrap, I might give your avian opinion
more consideration.

Regards
Derek

-CELKO-

unread,
Oct 18, 2011, 10:35:12 PM10/18/11
to
>> As Joe can confirm, I often disagree with him, on many subjects. But when I first saw him post the subtype/supertype implementation model you criticise, I liked it for its robust protection against integrity violations. You are correct that the redundant columns and the extraneous keys require more disk space and hurt performance, but I always think that integrity comes first, all the rest comes later. <<

Hey, we do not disagree very often; where did you get a silly idea
like that? Sorry, I had to do a Monty Python arguer skit :)

You are of course right about data integrity. New programmers, who
never worked with file systems or the first non-RDBMS databases, do
not know that the original intent of a database was data integrity,
not performance. Removing redundancy was the first step toward that
goal.

>> In a complete ANSI SQL implementation, I could easily create this constraint, as the ANSI SQL standard permits the use of subqueries in a CHECK constraint (SQL-92: full compliance; SQL-2003: Feature F671, "Subqueries in CHECK constraints"). But the product I work with (Microsoft SQL Server) does not implement this feature, and a quick Google search gives me the impression that most (if not all) major vendors suffer from the same limitation.<<

I have to look at more products, but I do not know of any either. The
expense could be awful, like a bad TRIGGER that fires with every
insert, update and delete.

>> If your point is that in a fully ANSI-compliant RDBMS, the version Joe posted suffers from needless redundancy (just like this sentence), I agree. But if you have a way of coding the CHECK constraint that is supported by some, or preferably all, major current RDBMS implementations and that will disallow the inconsistent data I gave above, please let me know. <<

Wait until he sees Kuznetsov's pattern for preventing temporal gaps in
event sequences or my state change constraints :) Overlapping and
self-referencing uniqueness constraints are hard for an experienced
SQL programmer.

-CELKO-

unread,
Oct 18, 2011, 10:53:04 PM10/18/11
to
>> [only one index is needed and there is nothing to prevent an SQL parser being smart enough to realise this]
Marvelous. Identify which SQL does that (post evidence). <<

Mr. Hann can tell me if my memory is correct, but I think Ingres has
something like 17 kinds of indexes and it did index optimization,
picking the right one for a particular data distribution and merging
or removing redundant ones. MS SQL Server has the INCLUDE option,
which says to carry extra columns but not use them in the indexing
process. This lets you get a cheap covering index.
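
A minimal T-SQL sketch of that (table and column names hypothetical):

-- The key column drives the B-tree; the INCLUDE columns are carried
-- at the leaf level only, so this one index can answer
--   SELECT order_date, order_amt FROM Orders WHERE cust_id = @c
-- without touching the base table.
CREATE NONCLUSTERED INDEX IX_Orders_cust_covering
    ON Orders (cust_id)
    INCLUDE (order_date, order_amt);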

Hashing algorithms work better with more columns (fewer collisions),
so they will replace a shorter list of columns with a longer one in
the hash.

Informix lets you create an index on a search condition for a multi-
table join. This is not a VIEW, but pointers to the rows involved.

Bob Badour

unread,
Oct 18, 2011, 11:41:28 PM10/18/11
to
-CELKO- wrote:

>>>Sounds and looks like the volumes of text you are famous for posting. Smells like bulldust. All red herrings and distractions, in order to avoid dealing directly with any of the issues I have raised. <<
>
> Personal attacks are not issues. You have never worked with the SQL
> Standards, probably have worked on less than a dozen SQL
> implementations, have never been part of a team designing optimizers.
> You also have a number of conceptual problems, which seem to be the
> result of confusing the abstract standards and RM versus particular
> implementations.

I had a real laugh when Melton thanked the people who actually
contributed to the standard.

Toodles!

Hugo Kornelis

unread,
Oct 19, 2011, 5:41:40 AM10/19/11
to
On Tue, 18 Oct 2011 19:35:12 -0700 (PDT), -CELKO-
<jcel...@earthlink.net> wrote:

>>> As Joe can confirm, I often disagree with him, on many subjects. But when I first saw him post the subtype/supertype implementation model you criticise, I liked it for its robust protection against integrity violations. You are correct that the redundant columns and the extraneous keys require more disk space and hurt performance, but I always think that integrity comes first, all the rest comes later. <<
>
>Hey, we do not disagree very often, where dis you get a silly idea
>like that? Sorry, I had to do a Monty Python arguer skit :)

Hi Joe,

Back in the days when the SQL Server usenet groups were still lively
(before Microsoft replaced them with something worse), I often
challenged you when you posted suggestions that included code that
does not work well on SQL Server, or code that would make SQL Server
grind to a halt because of its awful performance. But I will admit
that Tony Rogerson often beat me to the punch.

(...)


>Wait until he sees Kuznetsov's pattern for preventing temporal gaps in
>event sequences or my state change constraints :) Overlapping and
>self-referencing uniqueness constraints are hard for an experienced
>SQL programmer.

Have you read Kuznetsov's pattern in his Defensive Database
Programming book, or on the web? In the former case, you might want to
check out who tech edited it, just for laughs. ;)

-CELKO-

unread,
Oct 19, 2011, 11:06:44 AM10/19/11
to
>> (before Microsoft replaced them with something worse), .. <<

I agree. So many of the postings are Indian students trying to get
their homework done. The Cut & Paste scrambles the text so it is hard
to read.

>> Have you read Kuznetsov's pattern in his Defensive Database Programming book, or on the web? <<

I have seen it but not gotten a copy. Too much to do with my own
second edition of Trees & Hierarchies, fixing bad plumbing and getting
ready for eye surgery.

>> In the former case, you might want to check out who tech edited it, just for laughs. ;) <<

Have you seen the 4-th edition of SMARTIES? They out-sourced the
typesetting to India. The company had ONE template for ALL books, so
all tables are set as if they were a display graphic (shadings,
ornaments, etc.) and not an SQL result set. Diagrams are misplaced or
repeated. The choice of fonts is awful. Code is set as if it were
text. I really hate it, and we now have an agreement to go back to the
style I used in the other seven books in my series.


Hugo Kornelis

unread,
Oct 19, 2011, 5:54:09 PM10/19/11
to
On Wed, 19 Oct 2011 08:06:44 -0700 (PDT), -CELKO-
<jcel...@earthlink.net> wrote:

>Have you seen the 4-th edition of SMARTIES?

No, not really. I happened to see one a few weeks ago and I briefly
skimmed the table of contents, but I did not have time for anything
more.

Based on what you write, I'm glad I didn't buy it. Such lousy
formatting would distract me so much from the code that the message
probably would not stick.
Is there a lot of new material that the earlier editions did not
cover?

-CELKO-

unread,
Oct 20, 2011, 12:09:20 PM10/20/11
to
>> [the 4-th edition of SMARTIES] Is there a lot of new material that
the earlier editions did not cover? <<

A lot of it is the same topics, but with newer constructs in the
solutions. ~785 pages versus ~550 pages for the second edition; I don't
have the third edition in front of me.

Table of Contents

Introduction
Chapter 1: Databases versus File Systems
Chapter 2: Transactions and Concurrency Control
Chapter 3: Schema Level Objects
Chapter 4: Locating Data and Special Numbers
Chapter 5: Base Tables and Related Elements
Chapter 6: Procedural, Semi-Procedural and Declarative Programming
Chapter 7: Procedural Constructs
Chapter 8: Auxiliary Tables
Chapter 9: Normalization
Chapter 10: Numeric Data Types
Chapter 11: Temporal Data Types
Chapter 12: Character Data Types
Chapter 13: NULLs — Missing Data in SQL
Chapter 14: Multiple Column Data Elements
Chapter 15: Table Operations
Chapter 16: Comparison or Theta Operators
Chapter 17: Valued Predicates
Chapter 18: CASE Expressions
Chapter 19: LIKE and SIMILAR TO Predicates
Chapter 20: BETWEEN and OVERLAPS Predicates
Chapter 21: The [NOT] IN() Predicate
Chapter 22: EXISTS() Predicate
Chapter 23: Quantified Subquery Predicates
Chapter 24: The Simple SELECT Statement
Chapter 25: Advanced SELECT Statements
Chapter 26: Virtual Tables: VIEWs, Derived Tables, CTEs and MQTs
Chapter 27: Partitioning Data in Queries
Chapter 28: Grouping Operations
Chapter 29: Simple Aggregate Functions
Chapter 30: Advanced Grouping, Windowed Aggregation and OLAP in SQL
Chapter 31: Descriptive Statistics in SQL
Chapter 32: Sub-Sequences, Regions, Runs, Gaps, and Islands
Chapter 33: Matrices in SQL
Chapter 34: Set Operations
Chapter 35: Subsets
Chapter 36: Trees and Hierarchies in SQL
Chapter 37: Graphs in SQL
Chapter 38: Temporal Queries
Chapter 39: Optimizing SQL


Kevin Kirkpatrick

unread,
Oct 20, 2011, 5:07:41 PM10/20/11
to
This seemed the most appropriate "jumping in" point. In my SQL vendor
of choice (Oracle), while there is no support for subqueries in ANSI
CHECK constraints, I have found a combination of features that allows
cross-table constraints to be enforced in a purely declarative and
high-performance manner. The basic idea is to use instant-refresh
materialized views that bring the data together, then put indexes on
the materialized views to enforce cross-row constraints. All such
objects (materialized view logs, materialized views, constraint
functions, and constraint indexes) used to enforce the constraints
should be considered physical implementation details like table
indexes, i.e. NOT exposed at the logical level and NOT accessible to
users/application code.

For instance, the vehicle sub-typing is easily accommodated with the
following script: (Note, this is a purely declarative solution - the 2
functions merely exist to return meaningful error messages for
constraint violations and play no role in enforcing said
constraints... calls to these functions could be replaced with the
expression "1/0", giving bizarre div-by-zero error messages, but
preventing constraint violations nonetheless).



SQL>
SQL> CREATE TABLE VEHICLES
2 (VIN NUMBER PRIMARY KEY,
3 VEHICLE_TYPE VARCHAR2(3) CHECK (VEHICLE_TYPE IN
('SUV','SED')));

Table created.

SQL>
SQL> CREATE TABLE SEDANS
2 (VIN NUMBER PRIMARY KEY,
3 SEDAN_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.

SQL>
SQL> CREATE TABLE SUVS
2 (VIN NUMBER PRIMARY KEY,
3 SUV_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.


/* And now the fun begins */

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON VEHICLES WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SEDANS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SUVS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW SEDAN_CONSTRAINT_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT V.VEHICLE_TYPE, V.ROWID V_ROW, S.ROWID S_ROW
4 FROM VEHICLES V, SEDANS S
5 WHERE V.VIN = S.VIN;

Materialized view created.

SQL>
SQL> CREATE MATERIALIZED VIEW SUV_CONSTRAINT_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT V.VEHICLE_TYPE, V.ROWID V_ROW, S.ROWID S_ROW
4 FROM VEHICLES V, SUVS S
5 WHERE V.VIN = S.VIN;

Materialized view created.

SQL>
SQL> CREATE OR REPLACE FUNCTION SEDAN_CONSTRAINT_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Only vehicles with type SED
permitted in SEDANS table');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX SEDAN_CONSTRAINT_IDX ON SEDAN_CONSTRAINT_MV
(DECODE(VEHICLE_TYPE,'SED',NULL,SEDAN_CONSTRAINT_FUN));

Index created.

SQL>
SQL>
SQL> CREATE OR REPLACE FUNCTION SUV_CONSTRAINT_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Only vehicles with type SUV
permitted in SUVS table');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX SUV_CONSTRAINT_IDX ON SUV_CONSTRAINT_MV
(DECODE(VEHICLE_TYPE,'SUV',NULL,SUV_CONSTRAINT_FUN));

Index created.


-------------------- PROOF OF CONCEPT --------------------------

SQL>
SQL> INSERT INTO VEHICLES VALUES (1, 'SED');

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (1, 'MY SEDAN');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO VEHICLES VALUES (2, 'SUV');

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (2, 'MY SUV');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL>
SQL> INSERT INTO VEHICLES VALUES (3, 'SED');

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (3, 'NOT AN SUV!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SUV_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SUV permitted in SUVS table
ORA-06512: at "APPS.SUV_CONSTRAINT_FUN", line 6


SQL>
SQL> INSERT INTO VEHICLES VALUES (4, 'SUV');

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (4, 'NOT A SEDAN!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SEDAN_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SED permitted in SEDANS table
ORA-06512: at "APPS.SEDAN_CONSTRAINT_FUN", line 6


SQL>
SQL> UPDATE VEHICLES SET VEHICLE_TYPE = 'SUV' WHERE VIN = 1;

1 row updated.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SEDAN_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SED permitted in SEDANS table
ORA-06512: at "APPS.SEDAN_CONSTRAINT_FUN", line 6


SQL>
SQL> UPDATE VEHICLES SET VEHICLE_TYPE = 'SED' WHERE VIN = 2;

1 row updated.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view
"APPS"."SUV_CONSTRAINT_MV"
ORA-20001: Only vehicles with type SUV permitted in SUVS table
ORA-06512: at "APPS.SUV_CONSTRAINT_FUN", line 6


SQL>
SQL> SPOOL OFF


Erwin

unread,
Oct 21, 2011, 11:11:06 AM10/21/11
to
On 20 okt, 23:07, Kevin Kirkpatrick <kvnkrkpt...@gmail.com> wrote:
> This seemed the most appropriate "jumping in" point.

Well, now that we're on the track of working declarative solutions I
can't resist jumping in too ...

VAR VEHICLE RELATION {VIN:... , ...} KEY {VIN};
VAR SED RELATION {VIN:... , ...} KEY {VIN};
VAR SUV RELATION {VIN:... , ...} KEY {VIN};

CONSTRAINT ALL_SEDANS_ARE_VEHICLES ISEMPTY(SED NOT MATCHING VEHICLE);
CONSTRAINT ALL_SUVS_ARE_VEHICLES ISEMPTY(SUV NOT MATCHING VEHICLE);
CONSTRAINT NO_SEDANS_ARE_ALSO_SUVS ISEMPTY(SED {VIN} INTERSECT SUV
{VIN});
CONSTRAINT ALL_VEHICLES_ARE_SUV_OR_SEDAN ISEMPTY(VEHICLE NOT MATCHING
(SED {VIN} UNION SUV {VIN}));

Even without support for "distributed keys", or whatever one might
name the concept, a minute or so of writing and you're done.

Hugo Kornelis

unread,
Oct 21, 2011, 4:44:49 PM10/21/11
to
On Thu, 20 Oct 2011 14:07:41 -0700 (PDT), Kevin Kirkpatrick
<kvnkr...@gmail.com> wrote:

(snip)
>This seemed the most appropriate "jumping in" point. In my SQL vendor
>of choice (Oracle), while there is not support for ANSI CHECK
>constraints, I have found a combination of features that allow for
>cross-table constraints to be enforced in a purely declarative and
>high-performance manner. The basic idea is to use instant-refresh
>materialized-views that bring the data together, then put indexes on
>the materialized views to enforce cross-row constraints. All such
>objects (materialized view logs, materialized views, constraint
>functions, and constraint indexes) used to enforce the constraints
>should be considered physical implementation details like table
>indexes, i.e. NOT exposed at the logical level and NOT accessible to
>users/application code .
(snip)

Thanks for your reply. I have no experience with Oracle, but I think I
can parse the code sufficiently to understand what is going on. You
create a materialized view (probably very similar to the indexed view
feature SQL Server has) that joins data from the subtype and supertype
tables together. Then you add an extra index on that materialized view,
not on one of the columns itself but on a virtual computed column,
derived with the DECODE function (not supported by SQL Server, but
from the looks of it a shorthand for a CASE expression), that invokes
a function if the type in the supertable does not match what should be
in the subtable; and that function then raises an error.

I *think* a similar approach would be possible in SQL Server, but
using a few different angles. I can't force an error from a function
(SQL Server functions allow no side effects), but I could add a column
to the view that is computed using a CASE expression with possible
results 1 or 1/0. That would not give a similar user-friendly error
message, but it would provide the functionality (I think - I have not
tried it, though!).

But I am a bit surprised, as your first critique on Joe Celko's
version was not about logical/physical level, but about performance
and overhead. I quote:

On Sat, 15 Oct 2011 02:41:24 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

(...)
>Ok, so you are both fools.
>
>It is not common, because 220% overhead and substantial cost, is just
>plain stupid. People are not quite as stupid as you think.

On Wed, 12 Oct 2011 17:51:11 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

>This "class hierarchy model" is one of the most idiotic things I have
>seen in my 34 years in databases.
(...)
>a. basically doubles the number of indices in the relevant tables, and
>that has a substantial overhead.
>
>b. requires vehicle_type to be carried in the subtypes, which AFAIC is
>completely redundant (in SUVs, we know all the rows are SUV, why carry
>a column in which every value will be "SUV").
(...)
>e. introduces complexity and processing; views; instead-of triggers,
>etc. I would rather the database remain open as possible, without
>such complications for use. Particularly when such nonsense is
>completely unnecessary.

I have no idea how Oracle implements materialized views, but the name
alone suggests that, like SQL Server, the content of the view is
stored on disk. So the storage overhead will be even MORE than the
single extra column that Joe Celko's version adds.
SQL Server also requires an index for each materialized view (hence
the name indexed view). Maybe Oracle doesn't, but you do create one
yourself. That should be about the same overhead as the index created
for the extra UNIQUE constraint in Joe Celko's version (for those
RDBMS's that use an index to implement a UNIQUE constraint). Maybe
even more because of the computed column.

I don't know where you get the 220% figure. I do agree that there is
substantial overhead and cost involved in Joe Celko's solution, but I
expect that your solution will require at least as much, maybe even
more overhead.


In your post to me, you introduce a different argument: that the
columns introduced by Joe do not belong in the logical level. I do
agree with that. But does that imply that they should not be in the
tables? There are several examples of columns that are not part of the
logical model but are added to the tables. One example is a surrogate
key column. Another example is a rowversion column (that's the name in
SQL Server; I don't know if Oracle has a similar feature) that
automatically changes value every time a row is updated, to facilitate
the implementation of optimistic concurrency checking.

There are various ways to not expose these columns to the database
user, such as building views on top of the tables that don't expose
these columns, or creating stored procedures to implement the data
access layer.

Kevin Kirkpatrick

unread,
Oct 21, 2011, 6:31:20 PM10/21/11
to
On Oct 21, 3:44 pm, Hugo Kornelis
<h...@perFact.REMOVETHIS.info.INVALID> wrote:
> On Thu, 20 Oct 2011 14:07:41 -0700 (PDT), Kevin Kirkpatrick
>
Correct.

> But I am a bit surprised, as your first critique on Joe Celko's...
<snipped>

I'm not Derek.

I do think Joe's solution suffers from the problem of exposing the
implementation of the constraint at the logical level. I was merely
proposing an SQL solution (one that admittedly is going to vary in
mileage with different SQL vendors) that does not suffer that
downfall. For the record, I'll sacrifice disk space usage 10 times
over in favor of proper constraint enforcement, and do not consider
220% or any similar linear overhead to be problematic. I benchmarked
my own solution and found it entailed a 75% overhead in disk space
(misleadingly large because it doesn't account for the fact that
VEHICLES, SEDANS, and SUVS are going to be much larger than 2 - 3
columns in practice) usage and 100% overhead in performance; meaning
that adding 1 million vehicles unconstrained took 30 seconds, and
adding 1 million vehicles, constrained, took 60 seconds. That, to me,
is a perfectly acceptable overhead (consider, if I could take the KEY
constraint off VEHICLES and load 1 million rows in 15 seconds... would
that justify removing the constraint?).

I should note that, upon reading Erwin's response (the lousy cheat,
using a TRDBMS) my solution /is/ overly complex and still exposes the
constraint at the logical level. There is no need whatsoever to have
a VEHICLE_TYPE in VEHICLES in the first place; VEHICLE_TYPE can be
inferred from which table the vehicle is inserted into, SUVS or
SEDANS. The following revision would be more appropriate (and
actually has significantly less overhead to boot).


SQL>
SQL> CREATE TABLE VEHICLES
2 (VIN NUMBER PRIMARY KEY);

Table created.

SQL>
SQL> CREATE TABLE SEDANS
2 (VIN NUMBER PRIMARY KEY,
3 SEDAN_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.

SQL>
SQL> CREATE TABLE SUVS
2 (VIN NUMBER PRIMARY KEY,
3 SUV_ATTRIBUTE VARCHAR2(150),
4 FOREIGN KEY (VIN) REFERENCES VEHICLES(VIN));

Table created.


SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SEDANS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW LOG ON SUVS WITH ROWID;

Materialized view log created.

SQL>
SQL> CREATE MATERIALIZED VIEW DISJOINT_VEHICLE_TYPES_MV
2 BUILD IMMEDIATE REFRESH FAST ON COMMIT AS
3 SELECT SUVS.ROWID V_ROW, SEDANS.ROWID S_ROW
4 FROM SUVS , SEDANS
5 WHERE SUVS.VIN = SEDANS.VIN;

Materialized view created.

SQL>
SQL>
SQL> CREATE OR REPLACE FUNCTION DISJOINT_VEHICLE_TYPES_FUN
2 RETURN NUMBER
3 DETERMINISTIC
4 IS
5 BEGIN
6 RAISE_APPLICATION_ERROR (-20001,'Vehicles must be entered as
exactly one type.');
7 END;
8 /

Function created.

SQL>
SQL> CREATE INDEX DISJOINT_VEHICLE_TYPES_IDX ON
DISJOINT_VEHICLE_TYPES_MV
(DECODE(V_ROW,V_ROW,DISJOINT_VEHICLE_TYPES_FUN));

Index created.

-- note: Shame on Oracle, the following is not acceptable:
-- CREATE INDEX DISJOINT_VEHICLE_TYPES_IDX ON
--   DISJOINT_VEHICLE_TYPES_MV (DISJOINT_VEHICLE_TYPES_FUN)

SQL>
SQL>
SQL> INSERT INTO VEHICLES VALUES (1);

1 row created.

SQL>
SQL> INSERT INTO SEDANS VALUES (1, 'MY SEDAN');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO VEHICLES VALUES (2);

1 row created.

SQL>
SQL> INSERT INTO SUVS VALUES (2, 'MY SUV');

1 row created.

SQL>
SQL> COMMIT;

Commit complete.

SQL>
SQL> INSERT INTO SUVS VALUES (1, 'ALREADY A SEDAN!');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-20001: Vehicles must be entered as exactly one type.
ORA-06512: at "APPS.DISJOINT_VEHICLE_TYPES_FUN", line 6


SQL>
SQL> INSERT INTO SEDANS VALUES (2, 'ALREADY AN SUV');

1 row created.

SQL>
SQL> COMMIT;
COMMIT
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-20001: Vehicles must be entered as exactly one type.
ORA-06512: at "APPS.DISJOINT_VEHICLE_TYPES_FUN", line 6


SQL>
SQL>
SQL> SPOOL OFF



Apart from the superior design, this approach is nifty in that the
materialized view will always be empty (although it's not a free lunch
- there is disk usage for the MV Logs!)


> There are various ways to not expose these columns to the database
> user, such as building views on top of the tables that don't expose
> these columns, or creating stored procedures to implement the data
> access layer.

Right, and again, I'm attempting to offer a solution that avoids
this. The whole "expose as a view" is fine for end-user querying, but
I would never put application developers in a position to work with
anything but base tables (INSTEAD-OF triggers??? blech!)

Erwin

unread,
Oct 21, 2011, 7:27:35 PM10/21/11
to
On 22 okt, 00:31, Kevin Kirkpatrick <kvnkrkpt...@gmail.com> wrote:
>
>
> I should note that, upon reading Erwin's response (the lousy cheat,
> using a TRDBMS)
>
>

I went back to the original question to verify whether TRDBMS's were
ruled out. It read "I have a question on how to efficiently implement
a product database table structure.".

It doesn't rule out TRDBMS's on the condition that they are
effectively available for implementing something.

Something many people still appear to believe is completely
impossible.

;-)

Hugo Kornelis

unread,
Oct 22, 2011, 5:21:07 AM10/22/11
to
On Fri, 21 Oct 2011 15:31:20 -0700 (PDT), Kevin Kirkpatrick
<kvnkr...@gmail.com> wrote:

>I'm not Derek.

Oops! You are completely right. My bad for not checking the names - I
was (and apparently still am) waiting for a reply from Derek, so when
I saw a reply I immediately jumped to the wrong conclusion. My
apologies.

It's interesting to see that Derek has not yet responded to my
question, even though he did already post a message elsewhere in this
thread a few hours after I posted...

Eric

unread,
Oct 22, 2011, 1:27:56 PM10/22/11
to
On 2011-10-19, Derek Asirvadem <derek.a...@gmail.com> wrote:
> On Oct 17, 10:05 pm, Eric <e...@deptj.eu> wrote:
>>
>> It's sort-of fun to watch two people who don't know enough trying to
>> have an argument.
>
> It is even more fun to watch someone on the sidelines, who is not
> party to the argument, picking up clauses and examining them in
> isolation.

A private argument, on Usenet? Whatever next? And if a clause is all I
want to comment on, why should I not pick it out? There is no isolation,
everybody has read, or can read, the rest of the thread.

>> What on earth is "and have them in a visible column, non-pointer, form"
>> supposed to mean?
>
> Let me spell it out for you:
> - I was not posting a new thread (I would have used proper terms).
> - As I detailed to Jonathan, I was not writing a fully qualified
> thesis (if you are expecting that, you will find it lacking)
> - I was posting a response to Celko's thread. That defines the
> context.
> - Celko was playing "slippery fish", to avoid dealing with specific
> problems that I identified (check the whole thread)
> - In an attempt to prevent the slipperiness, I used Celko's terms.
> - As it progressed, I used further terms make distinctions, using his
> used terms as the base (I would not have used his terms, or my-terms-
> that-are-based-on-his-terms in a stand-alone article)

All you have spelled out is the context in which you wrote that, whereas
what I wanted to know was what you meant by it and what is the
alternative (since why mention it if there isn't one?).

>> All you can ever see of an index (apart from its
>> effect on query performance) is the list of columns indexed and some
>> vendor-dependent stuff usually about physical storage.
>
> Excellent, you seem to understand that correctly. So your question is
> either rhetorical or plain absurd.

No, see above.

> If you have an honest argument with me, just state it; if there is
> something you wish to clarify about one of my statements, ask a
> question.

Which is exactly what I did, but you provided no clarification
whatsoever.

> But this picking and gnawing on the carcasses left, after
> others have feasted, is for the birds.

Now that's just rude.

>
>> Just to pick on one more thing:
>>
>> > - each PRIMARY KEY constraint builds an index
>> > - each UNIQUE constraint builds an index
>>
>> Not "builds", but "usually needs" :
>
> Correcting my English is petty. If you fancy yourself an editor, send
> me your CV, I will send you all my posts to edit, before I post.

I am _not_ correcting your English, I am questioning your meaning.
Declaring a Primary Key does not necessarily cause an index to be built,
though in most, if not all, current SQL-based systems, it requires that
one exist. However there is no theoretical reason why this should be so,
it is merely a readily available method that has been widely adopted.
And yes, I have deliberately mentioned a theoretical possibility as part
of my argument, you have no right to tell me that I can't.

>> - nowhere is there a requirement that primary and unique keys be
>>   implemented with an index
>
> Try reading the entire statement in context. You might draw a
> different conclusion.

Obviously not!

> If that is still confusing, try this: I did not say "there is a
> requirement..." so there is no point arguing against what I did not
> say. I am quite happy to address issues regarding what I have said
> (the thread is ready evidence). But I do not engage with isolated
> scraps. It might give the readers the idea that such isolated scraps
> have some credibility, or that bird-like behaviour is appropriate for
> functioning adults.

More rudeness.

I never claimed that you said so, it was merely a part of my argument.
If you want to insist that a discussion should be only on your terms
then you are obviously not sure you can win without this. And, of
course, you can't make rules of discussion here.

>> - only one index is needed and there is nothing to prevent an SQL parser
>> being smart enough to realise this
>
> Marvellous. Identify which SQL does that (post evidence).

I don't know for certain that one exists. So what? I know that it is
possible. Again, you can't make rules of discussion here.

> Until then your suggestion is possibility in the future, not a
> reality, which is what Celko posted (he did not say "sometime in the
> future, some SQL could ...", he posted "Here is how to do this
> [now] ..."), which I am arguing. I can simply state, no SQL does that
> [now]. The implication is "now", on Earth, not "ten years from now",
> not "if and when the vendors wake up and read Celko's post", not on
> the fourth moon of Jupiter.
>
>> - if the parser is not smart enough there may well be a way of
>>   second-guessing it and still ending up with only one index.
>
> Feel free to post the method.

Create the table without declaring the keys, then create the appropriate
index, then create the keys. No new indexes will be built.

I know this works in Oracle, because I do it frequently, and I am pretty
sure it works in other products.
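
A minimal sketch of that sequence in Oracle syntax (names are made up;
whether further overlapping key declarations can then share the same
index is vendor- and version-specific):

CREATE TABLE Vehicles2
(vin CHAR(17) NOT NULL,
 vehicle_type CHAR(3) NOT NULL);

-- build the one index yourself, exactly as you want it
CREATE UNIQUE INDEX Vehicles2_ux ON Vehicles2 (vin);

-- declare the key; the existing index is reused, none is built
ALTER TABLE Vehicles2 ADD CONSTRAINT Vehicles2_pk
      PRIMARY KEY (vin) USING INDEX Vehicles2_ux;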

> Otherwise your suggestion is a yet
> another fantasy of could-bes and guesses. Just like Celko. Any
> examination reveals that his method is based on facilities that are
> not available now, as stated "do this", that the method might be
> useful at some point in the future, if and when some SQL vendor
> supplies the the underlying facility.
>
> My method is based on the here and now, in any SQL (I have
> specifically excluded the non-SQLs), on Earth.
>
>> And finally...
>>
>> > ... under the covers is not relevant ...
>>
>> True, but you don't always seem to understand where the covers are.
>
> Thanks for your opinion. I will make sure I give it the consideration
> it deserves, keeping in mind your demonstrated level of capability, in
> understanding technical matters, and your could-be, should-be,
> attitude to addressing present day reality.
>
> Should you wish to address any of the salient points in my post, a
> piece of meat, rather than a scrap, I might give your avian opinion
> more consideration.

You have failed to understand what I was saying. You have descended to
insult. No doubt you will think you have "won". Enjoy your "victory".

Bob Badour

unread,
Oct 22, 2011, 3:30:09 PM10/22/11
to
Must the troll feedings continue?

-CELKO-

unread,
Oct 23, 2011, 9:33:13 AM10/23/11
to
>> would never put application developers in a position to work with anything but base tables (INSTEAD-OF triggers??? blech!) <<

While I do not like any kind of TRIGGER, I disagree. Developers are
the guys who need VIEWs, not application users. If you let developers
use VIEWs, then they all do complicated things the same way. If they
all write their own code, you get slightly different versions. My
personal favorite was the equivalent of "age > 18" versus "age >= 18"
in all the permutations of a five-table join.

derek.a...@gmail.com

unread,
Feb 2, 2013, 11:32:54 PM2/2/13
to
On Saturday, October 22, 2011 8:21:07 PM UTC+11, Hugo Kornelis wrote:

> I was (and apparently still am) waiting for a reply from Derek, so when
> I saw a reply I immediately jumped to the wrong conclusion. My
> apologies.
>
> It's interesting to see that Derek has not yet responded to my
> question, even though he did already post a message elsewhere in this
> thread a few hours after I posted...

I must have missed it that day, sorry. Your questions are posted in a reply to Kevin, which I did not read (too much white noise in other people's posts on this thread). Sorry about the one-year lag! I am responding to keep the thread complete.

> But I am a bit surprised, as your first critique on Joe Celko's
> version was not about logical/physical level, but about performance
> and overhead...

Yes. The thread got lost in cacophony to divert attention away from that core issue.

> I don't know where you get the 220% figure. I do agree that there is
> substantial overhead and cost involved in Joe Celko's solution, but I
> expect that your solution will require at least as much, maybe even
> more overhead.

The 220% overhead figure is loosely calculated as:
1. One additional redundant column in each Subtype
2. One additional redundant index in each Subtype
3. One additional redundant index in the Supertype

Of course the base for computing "additional overhead" is the model with the redundant columns removed, which has one index per supertype & subtype. So two extra indices are 200%, and one extra column times two subtypes is 20%.

A tighter calculation would be:
1. 10% additional overhead per Subtype (since it has only two columns)
2. 100% additional indexing overhead per Subtype (since it has only one index)
3. 100% additional indexing overhead (since it has only one index)

> In your post to me, you introduce a different argument: that the
> columns introduced by Joe do not belong in the logical level. I do
> agree with that. But does that imply that they should not be in the
> tables?

I think Kevin stated that, not I.

In any case, it has to do with logical vs physical, which is not relevant to my post.
Celko is introducing them as a logical construct (and Kevin may be trying to make it a physical one). Subtypes and supertypes are logical, not physical: the user sees them, understands them, and understands that the children of the subtypes are restricted to those subtypes, etc.

Anyway my basic point is, it is silly to implement sub-standard constructs, with substantial additional overhead for each subtype and for the supertype, when
(a) the standard that has been with us for 35 years, does so without the additional columns and indices, and
(b) the implementation is pure DRI since 2007 (it was code until then).

For people who appreciate a Data Model:
http://www.softwaregems.com.au/Documents/Student%20Resolutions/Anders%20DM.pdf

Or those who want DDL including the CHECK constraint:
http://www.softwaregems.com.au/Documents/Tutorial/Subtype_CHECK.sql

And code for the Function used by the CHECK constraint:
http://www.softwaregems.com.au/Documents/Tutorial/Subtype%20ValidateExclusive_fn.sql

Subtypes eliminate the "nullable foreign keys" that unnormalised data heaps have, with the attendant damaged integrity. They allow full Data and Referential Integrity, declaratively. They also eliminate additional indices (that are otherwise required) in the subordinate tables. Here is an example of that (a bare-bones sketch of the shape being avoided follows the link):
http://www.softwaregems.com.au/Documents/Documentary%20Examples/College%20Simple%20DM.pdf
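
For readers who skip the links, a minimal sketch of the nullable-FK
shape being avoided (all names hypothetical; the linked models are the
real examples):

-- without the Vehicles supertype: mutually exclusive nullable FKs
CREATE TABLE RegistrationPlates
(plate_nbr CHAR(10) NOT NULL PRIMARY KEY,
 sedan_vin CHAR(17) REFERENCES Sedans (vin), -- nullable
 suv_vin CHAR(17) REFERENCES SUVs (vin),     -- nullable
 CHECK ((sedan_vin IS NOT NULL AND suv_vin IS NULL)
     OR (sedan_vin IS NULL AND suv_vin IS NOT NULL)));

-- with the supertype, one NOT NULL FK does the job, no CHECK needed:
-- vin CHAR(17) NOT NULL REFERENCES Vehicles (vin)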

Cheers
Derek

derek.a...@gmail.com

unread,
Feb 2, 2013, 11:35:28 PM2/2/13
to
This
> 3. 100% additional indexing overhead (since it has only one index)
should be:
> 3. 100% additional indexing overhead for the Supertype (since it has only one index)

Cheers
Derek

hugoko...@gmail.com

unread,
Feb 3, 2013, 6:45:43 AM2/3/13
to
Hi Derek,

Thanks for your reply. (And for your email pointing me to it - I stopped doing usenet a long time ago, so I would not have found it otherwise. That's also why I may "look" different: I no longer have a usenet account, and I'm posting for the first time in my life through Google Groups).

Op zondag 3 februari 2013 05:32:54 UTC+1 schreef derek.a...@gmail.com het volgende:
> I must have missed it that day, sorry. Your questions are posted in a reply to Kevin, which I did not read (too much white noise in other people's posts on this thread). Sorry about the one-year lag! I am responding to keep the thread complete.

I did have a reply to Kevin, but my original response in this topic, where I ask you how to implement the CHECK constraints you hinted at in other posts, was a direct response to one of your posts. However, I can understand that you'd miss posts in this long topic, so no problem. I'm glad you posted now!


> The 220% overhead figure is loosely calculated as:
>
> 1. One additional redundant column in each Subtype
>
> 2. One additional redundant index in each Subtype
>
> 3. One additional redundant index in the Supertype
>
>
>
> Of course the base for computing "additional overhead" is the model with the redundant columns removed, which has one index per supertype & subtype. So two extra indices are 200%, and one extra column times two subtypes is 20%
>
>
>
> A tighter calculation would be:
>
> 1. 10% additional overhead per Subtype (since it has only two columns)
>
> 2. 100% additional indexing overhead per Subtype (since it has only one index)
>
> 3. 100% additional indexing overhead (since it has only one index)

Well, that explains why your overhead estimation is so much higher than I could imagine. Your mathematics are off.

Example one: Yesterday, I bought bread, milk, and eggs. Each of them cost me exactly a dollar, so $3 total. Today, the bread is up 20%, to $1.20, and the milk and eggs are up 100%, to $2.00 each. By your logic, my total pricetag would have gone up by 20%+100%+100%=220%, to $9.60. In reality, I now pay $5.20, an increase of "only" 73 1/3%.

Example two: Yesterday, I bought bread for $1.50, milk for $0.80, and eggs for $0.70, so (again) $3 total. Today, the bread is up 20%, to $1.80, and the milk and eggs are up 100%, to $1.60 and $1.40. Now, I now pay $4.80, and the relative increase is now 60%.

Example three: Same as above, but I also bought ham, jelly, and ground beef, for $12.00 total. They have not changed in price. The price increase is still $1.80, but as a percentage, it's now down to 12%.

The actual overhead of the extra columns and indexes depends on a lot of things, like how many other indexes there are, how many columns, what data types for the extra columns and the other columns, how the RDBMS implements indexes, unique constraints, and foreign key constraints, etc. To get a first approximation by RDBMS, I would implement both versions (with a minimal set of columns and no extra indexes), populate them with a number of rows high enough to minimize the impact of overhead (e.g. 10,000 per subtype), then calculate the space used for each scenario. That tells me the overhead as an exact number (divide by 10,000 to get a per-row figure), or the MAXIMUM overhead as a percentage (extra columns and extra indexes make the percentage go down).
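
In T-SQL the measurement itself is one call per table (a sketch; the
table names assume the two variants were built side by side):

-- after loading, say, 10,000 rows per subtype into each variant:
EXEC sp_spaceused 'Sedans_Classic'; -- redundant column + extra index
EXEC sp_spaceused 'Sedans_Lean';    -- without them
-- compare the reserved/data/index_size figures; divide the difference
-- by 10,000 for a per-row overhead number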


(snip)

> Anyway my basic point is, it is silly to implement sub-standard constructs substantial addition overhead for each subtype, and for the supertype, when
>
> (a) the standard that has been with us for 35 years, does so without the additional columns and indices, and
>
> (b) the implementation is pure DRI since 2007 (it was code until then).

The last version of the ANSI standard I have access to is 2003, so I can't comment on point b.
re point a: I was under the impression that user-defined functions were not in the ANSI standard, but luckily I verified and turned out to be wrong. They are. So technically you are right - your method that relies on user-defined functions is in the standard. However, the goal of the standard is ease of moving (both code and people) to other platforms. Unfortunately, the major DBMS vendors all have their own legacy syntax for functions. I checked SQL Server, Oracle, DB/2, and MySQL (using syntax documentation I found through Google). The only implementation that is at least close to the standard is DB/2; all others have their own syntax - so until that changes, whether or not the feature is part of the standard is a moot point.
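
To illustrate the divergence, here is the same trivial function twice
(a sketch; the first form is the ISO style, which DB/2 comes close to):

-- ISO SQL / DB2-style inline SQL function
CREATE FUNCTION dbl (x INTEGER)
RETURNS INTEGER
RETURN x * 2;

-- the same thing in Transact-SQL (SQL Server)
CREATE FUNCTION dbo.dbl (@x INTEGER)
RETURNS INTEGER
AS
BEGIN
    RETURN @x * 2;
END;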


(snipped links)

I checked the links you provided, and despite all I've said above, I do like your approach. In the future, if people ask me how to implement subtypes, I will probably give them the two versions.

Choosing which one to use depends on various things. Your version eliminates the storage overhead, but at the price of a performance overhead. When inserting or updating a single row, that won't be a big problem. But what happens when inserting or updating thousands or even millions of rows in a single statement? This probably depends on the chosen DBMS. I know that SQL Server is unable to inline scalar user-defined functions into an execution plan, so for SQL Server, the only option is to execute a separate call to the function for each row (or at least for each unique set of input parameters, which in this case would be the same, since the primary key is included in the input parameters). So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery. Your performance will, excusez mon French, suck.

Another possible problem you have to be aware of with your solution is that, because the subquery is hidden in the UDF, it is unknown to the DRI mechanisms. Some scenarios will allow violations of the CHECK constraints, because the RDBMS does not know it has to re-check. Here's an example:

-- I've got a new book:
INSERT Product (ProductId, ProductType, Name) VALUES (1, 'B', 'Talk That Talk');
INSERT ProductBook (ProductId, NumPages) VALUES (1, 11);
-- Oh, wait - it's not a book, it's a CD!
UPDATE Product SET ProductType = 'C' WHERE ProductId = 1;

This won't trigger an error in SQL Server, because the CHECK constraint is only checked when the specific row in ProductBook is inserted or updated.
Now this is an unusual scenario, but there are also lots of subtype situations where instances can migrate to another subtype. E.g. an employee who starts as a contractor but then decides to join the company and moves to the Payroll subtype. Or a supertype Person with subtypes Single, Married, Widowed, Divorced.

Bottom line: I like your solution. But it's not 100% safe, and though you avoid the storage overhead, you pay for this with a performance overhead. I'd recommend that people choose between the two solutions on a case by case basis.

Cheers,
Hugo

derek.a...@gmail.com

unread,
Feb 3, 2013, 8:01:28 PM2/3/13
to
Hugo

On Sunday, 3 February 2013 22:45:43 UTC+11, hugoko...@gmail.com wrote:
>
> > A tighter calculation would be:
> > 1. 10% additional overhead per Subtype (since it has only two columns)
> > 2. 100% additional indexing overhead per Subtype (since it has only one index)
> > 3. 100% additional indexing overhead (since it has only one index)
> > [Corrected to]
> > 3. 100% additional indexing overhead for the Supertype (since it has only one index)
>
> Well, that explains why your overhead estimation is so much higher than I could imagine. Your mathematics are off.

<shopping snipped>

> By your logic ...

That is neither logic, nor my logic, nor my mathematics, nor mathematics. That is your interpretation, and its application to some other scenario. I do not see any point in commenting on it. I will answer on the basis that you are not being contrary, that you actually do not understand, and limit my answer to the statements made.

If you want to understand what I stated, please just read my words (which I am responsible for), do not jump to conclusions (which in any case, you are responsible for).

2. If my Subtype table has one index, and Joe's Subtype table (to perform the same function, provide the same integrity, etc) has two indices, then it is a simple fact that Joe's table has
"100% additional indexing overhead per Subtype"

3. If my Supertype table has one index, and Joe's Supertype table (to perform the same function, provide the same integrity, etc) has two indices, then it is a simple fact that Joe's table has
"100% additional indexing overhead per Supertype"

I did not state "100% additional overall overhead", I limited my calculations to the subject matter, the implementation of SubType. I did not imply anything about the rest of the content of the tables.

Now if you wish to apply my statements to the price of sausage, that's your business, but I can't join you there.

A more reasonable example, if you are trying to apply it to everyday life, would be:
• I [Supertype] move around the world, and go about my consulting business with one heavy briefcase; Joe [Supertype] moves around carrying two heavy briefcases wherever he goes.
• Additionally, my whole development team [Subtypes] moves around the country carrying one heavy briefcase each; Joe's subordinates [Subtypes] each carry two heavy briefcases.
• Joe's team performs the same tasks as mine, using 100% additional *briefcase* overhead.
No implications are made re the weight of each person, or their speed, or their eyesight, or the price of fish.

> The actual overhead of the extra columns and indexes depends on a lot of things,

Yes.

> like how many other indedxes there are, how many columns, what data types for the extra columns and the other columns, how the RDBMS implements indexes, unique constraints, and foreign key constraints, etc.

That is not so relevant. All that exercise would produce is a difference between 100% and 98% or between 100% and 102%.

In the example we have one index, and Joe has two, so unless you are moving away from the example, the fact of the existence of the second *additional, redundant* index is 100% (two minus one divided by one).

The number of indices that the table may have for other reasons is irrelevant to the problem at hand, we are discussing Subtype implementation, so regardless of the number of indices on the table, if you implement the standard Subtypes structure, you need one more index; if you implement Joe's structure, you need two more indices, which is "100% additional indexing overhead per Subtype, and 100% additional indexing overhead per Supertype" than the standard.

> To get a first approximation by RDBMS, I would implement both versions (with a minimal set of columns and no extra indexes), populate them with a number of rows high enough to minimize the impact of overhead (e.g. 10,000 per subtype), then calculate space used for each scenario. That tells me the overhead as an exact number (divide by 10,000 to get a per-row figure), or the MAXIMUM overhead as a percentage (extra columns and extra indexes make the percentage go down).

Yes, that might be a good approach to benchmark the *overall* overhead, a new subject, not relevant to the statements that I made.

The fact remains that two minus one, divided by one, equals 100% additional. If there are 10 Subtypes, Joe has 11 additional redundant indices, and 10 additional redundant columns.

> > Anyway my basic point is, it is silly to implement sub-standard constructs with substantial additional overhead for each subtype, and for the supertype, when
> > (a) the standard that has been with us for 35 years, does so without the additional columns and indices, and
> > (b) the implementation is pure DRI since 2007 (it was code until then).
>
> The last version of the ANSI standard I have access to is 2003, so I can't comment on point b.
>
> re point a: I was under the impression that user-defined functions were not in the ANSI standard, but luckily I verified and it turned out I was wrong. They are. So technically you are right - your method that relies on user-defined functions is in the standard.

Yes, I know.

(In the genuine Relational world, we have implemented this Subtype from the outset. It is identified in the Relational Model, over 40 years ago, and it was fully catered for in the first publication for Standard for modelling Relational databases, which was 35 years ago. The difference is, until 2007 when Functions arrived, the constraints were implemented in the [required anyway] transactions. Since 2007, we can implement Functions, and implement Declarative CHECK Constraints for the Subtypes.)
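For readers who have not followed the links, the construct is along these lines (a sketch only, in T-SQL syntax; names are illustrative, not my posted production code):

-- Sketch only: one Function, called by one CHECK Constraint per Subtype.
-- Returns 1 if the Supertype row carries the expected discriminator.
CREATE FUNCTION dbo.Product_TypeIs (@ProductId INT, @ProductType CHAR(1))
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*)
              FROM Product
             WHERE ProductId   = @ProductId
               AND ProductType = @ProductType);
END;
GO
CREATE TABLE ProductBook (
    ProductId INT NOT NULL PRIMARY KEY       -- the single index on the Subtype
        REFERENCES Product (ProductId),
    NumPages  INT NOT NULL,
    CONSTRAINT ProductBook_Is_Book
        CHECK (dbo.Product_TypeIs(ProductId, 'B') = 1)
);

Note that the Subtype carries no discriminator column and no second index; the discriminator lives only in the Supertype.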

> However, the goal of the standard is ease of moving (both code and people) to other platforms. Unfortunately, the major DBMS vendors all have their own legacy syntax for functions. I checked SQL Server, Oracle, DB/2, and MySQL (using syntax documentation I found through Google). The only implementation that is at least close to the standard is DB/2, all others have their own syntax - so until that changes, whether or not the feature is part of the standard is a moot point.

1. Your statement re the goal is too simplistic, but let's not get distracted.

2. Sure, the syntax is different. So what ? The syntax for every SQL statement is different, particularly in the area of (a) extensions and (b) features that each vendor implemented before the standard syntax was published. So what ?

What is relevant, is that Functions (tables, datatypes, columns, everything in a database is User Defined, we do not have to keep telling ourselves that fact) are now Standard. And that you can use a function anywhere that you can use a column. So as long as the SQL platform does that (provide the standard requirement), you can implement standard Subtypes *declaratively* on that platform (and non-declaratively since whenever the platform was born). The syntax is irrelevant.

If you can't read my code, and make the necessary syntax changes on your platform to implement my code on your platform, then yes, there is a problem, of a different sort. Name your platform and I will write the code for you.

3. Regardless of the consideration of Subtype implementation, migration from any flavour of SQL to another flavour has a series of problems. Those problems are neither more nor less due to the use of standard support for Subtypes or a sub-standard support of Subtypes which doubles the number of indices. The point does not apply to the discussion.

> I checked the links you provided, and despite all I've said above, I do like your approach. In the future, if people ask me how to implement subtypes, I will probably give them the two versions.
>
> Choosing which one to use depends on various things. Your version eliminates the storage overhead, but at the price of a performance overhead.

Er, what "performance overhead" ? Please specify exactly, or withdraw the statement.

And please, limit your mathematics to the problem at hand, and identify whatever factors you determine to be "overhead" against the same for Joe's structure. We are comparing the two structures, not identifying "overhead" on one side only, that is the point of the thread. So identify whatever "performance overhead" Joe's structure has as well (hint: inserting/deleting to a table with two indices will perform slower than with one index).

> When inserting or updating a single row, that won't be a big problem. But what happens when inserting or updating thousands or even millions of rows in a single statement?

That is a very stupid thing to do, for any scenario, not only for Subtypes, because it will blow the transaction log. ANSI/IEC/ISO SQL is transactional. So whatever proscriptions apply to such attempts, apply to Subtype insertions as well, no more, no less. Each platform will handle that differently.

> This probably depends on the chosen DBMS. I know that SQL Server is unable to inline scalar user-defined functions into an execution plan, so for SQL Server, the only option is to execute a separate call to the function for each row (or at least for each unique set of input parameters - which in this case amounts to the same thing, since the primary key is included in the input parameters). So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery. Your performance will, excusez mon French, suck.

Excuse me, no, *your* performance will suck.

My performance (DB2 and Sybase) flies. I already have close to one hundred standard Subtype structures in production. Millions of transactional inserts have been tested. Probably a thousand full table unload/reloads (thousands to millions of rows each) have been tested.

Oracle carks itself with subqueries, so I wouldn't try even the single row insert on it. If an Oracle person reads this, and decides that they cannot implement Standard Subtypes, that they have to implement Joe's monstrosity instead, that would be a valid decision. (There are many standard features which Oracle does not support, resulting in their famously bloated databases, but let's not get distracted.)

When you get into contemplating that level of "performance" problem, it becomes very platform dependent, and dependent on the quality of code, for both transactional inserts and streamed inserts. We are better off if we limit this thread to logical issues and measurements, and keep away from physical implementations of those logical constructs, which are heavily dependent on platforms, their specific limitations, etc.

Actually, I have people who use my Subtype structure, and who have implemented it on MS SQL without any problems. So I can say for sure that (a) they did so without jumping through the hoops that you are speculating about, and (b) they suffer no performance problems or "performance problems", but I do not know what MS SQL does internally in each version, to agree with or refute your speculation about it. Therefore, I will state simply, I seriously doubt that that is the case for MS SQL.

> Another possible problem you have to be aware of with your solution is that, because the subquery is hidden in the UDF, it is unknown to the DRI mechanisms. Some scenarios will allow violations of the CHECK constraints, becuase the RDBMS does not know it has to re-check.

Excuse me, but it *IS* a DRI CHECK Constraint. Please read the code again.

"Some scenarios" is not correct, no, scenarios implies the general case, not platform specific. What you are talking about is how a particular platform operates, not scenarios. So the honest statement is, "some platforms will allow violations of the CHECK Constraint".

Well, that is a serious issue, and on the face of it I would say the indicated platform is totally unreliable, as well as non-compliant.

> Here's an example:
>
> -- I've got a new book:
>
> INSERT Product (ProductId, ProductType, Name) VALUES (1, 'B', 'Talk That Talk');
> INSERT ProductBook (ProductId, NumPages) VALUES (1, 11);
>
> -- Oh, wait - it's not a book, it's a CD!
>
> UPDATE Product SET ProductType = 'C' WHERE ProductId = 1;
>
> This won't trigger an error in SQL Server, because the CHECK constraint is only checked when the specific row in ProductBook is inserted or updated.
>
> Now this is an unusual scenario, but there are also lots of subtype situations where instances can migrate to another subtype. E.g. an employee that starts as a contractor but then decides to join the company and moves to the Payroll subtype. Or a supertype Person with subtypes Single, Married, Widowed, Divorced.

Yeah, sure. It appears that you do not understand context.

1. The example code provides the implementation details for standard Subtypes. There is no suggestion that it does anything more than that. It is not a full blown application.

2. The Supertype-Subtype structure is supposed to be considered as a single unit, not as a bunch of tables with disparate and singular relationships between them. You should not be updating anything in the cluster without considering the whole cluster, transactionally.

3. When writing an application, you are best advised to implement all inserts/updates/deletes as transactions; you should never write singular inserts/updates/deletes.

4. Due to the limitations of the SQL Spec (the Committee is about 20 years behind the high-end vendors, and about 35 years behind E F Codd), part of your data and referential integrity will be implemented declaratively, and the remainder will be implemented in transaction code. Those of us who care about quality and maintenance maximise the declarative part.

5. The example code, being an example of Subtype implementation, uses singular inserts/updates/deletes, rather than transactions, as is commonly accepted in posted examples.

6. The code is syntactically Sybase as declared in the header, but it is presented as standard code; you need to make the minor syntactic changes required for your platform.

7. More important, when contemplating an application of said Subtypes, (as you have pointed out) you need to consider how your platform operates with regard to constraint checking; deferred constraint checking, limitations of functions; limitations of check constraints; limitations of transactions; your transaction design; etc, and design accordingly. I have not provided that; I have provided standard code only, and it is up to the competent implementer to implement it appropriately for their platform.

8. I consider the constructs required to prevent your UPDATE to be ordinary Data Integrity or ordinary Referential Integrity, and that is another reason why the considered declarations or code is not in my posted example. UPDATE Product without the corresponding DELETE ProductBook, INSERT ProductCD falls into the same category of:
• INSERT Product without a INSERT ProductBook
• DELETE ProductBook without the corresponding DELETE Product
• etc

Therefore:

9. It is a no-brainer for me, on my platform, to ensure that your example UPDATE as well as those of the same category [8] are disallowed, as part of the implementation of ordinary Data and Referential Integrity for the whole database. How I do that will be standard SQL, but the implementation will be specific to my platform; its limitations; my existing transactions; my transaction standards; etc.

10. However, you are free to perceive those considerations as part of Subtype Integrity, limited to this cluster, and not as ordinary Data Integrity or ordinary Referential Integrity that would be part of your whole database (which means that your database outside this cluster is devoid of that basic integrity!). In which case, feel free to add them in. In a way that is appropriate to your platform. Any way that you choose to do it, is fine with me, as long as it maintains Data and Referential integrity. Given your discourse re MS SQL, I would not be asking for it to be limited to Standard SQL.

11. If you have trouble writing the required Declarative Constraints and transaction code (no offence, but this structure is new to you, and you do not understand it yet) that is required for ordinary Data and Referential integrity, that you are implying is part of the Subtype integrity, at your request, I would be happy to supply it. Using Standard SQL, and you will have to make the minor syntactical changes required for your platform. I won't be posting transaction basics or tutorials, I will assume the reader knows that, I will post just the code.

> Bottom line: I like your solution. But it's not 100% safe, and though you avoid the storage overhead, you pay for this with a performance overhead. I'd recommend people to choose between the two solutions on a case by case basis.

I refute the "not 100% safe" on the basis described above: implement whatever safety you need, that is not shown in , and cannot reasonably be expected to be shown in, the example of Subtypes.

I refute the "performance overhead", as detailed above, but I cannot be specific because you have not identified specifically what the alleged "performance overhead" is.

I strongly recommend that you do not recommend anything at all about this subject until:

a. you understand it, which clearly, has not been achieved yet

b. you have actually implemented it and tested it (both transactional inserts/updates/deletes, and streamed inserts, etc), and you have gained actual experience, and some basis for confidence

Until then, your recommendations are based on contemplation, not facts, not experience.

E.g. I recommend that people use standard structures which require half the number of indices, over one that requires twice the number of indices. That is a general recommendation, not limited to Subtypes.

Cheers
Derek

derek.a...@gmail.com

unread,
Feb 3, 2013, 9:02:46 PM2/3/13
to
Hugo

Since you are taking such detailed interest in the posted example code, and forming conclusions about what standard Subtypes can and cannot do, especially in regard to CHECK Constraints; Functions; transactions; etc, it is probably worth explaining a couple of other points.

As stated in my IDEF1X Notation (linked doc), both the RM, and IDEF1X as its Standard Modelling method, support a full range of Subtype configurations. The above examines Exclusive Subtypes and their integrity in detail. There are also Non-exclusive Subtypes, and the integrity for those is exactly the same. Again, I have given the code required for integrity of the Non-exclusive Subtype, but not for ordinary Data and Referential Integrity, which is required for all tables, Subtype or not. Consider this for Non-exclusive Subtype combinations:
• a ProductBook must have a ProductCD
• before there is a PersonWidowed, there must be a pre-existing PersonMarried
• etc

There is a correct method, using FKs between the Subtypes, but that does get cumbersome. Given that the Subtypes each have integrity, which we can rely on for basic existence, etc, and that we do not have to repeat that within the Subtypes cluster, it is often simpler to put the code required for the above in the single Function that will be called by the single CHECK constraint. I have hundreds of those.
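Sketched (illustrative only, reusing the Product examples from this thread; the Add transaction must insert the ProductCD before the ProductBook, or the check will reject it):

-- Sketch: the one Function carries the combination rule as well.
CREATE FUNCTION dbo.ProductBook_IsValid (@ProductId INT)
RETURNS INT
AS
BEGIN
    -- Basic existence: the generic Product row must be there
    IF NOT EXISTS (SELECT * FROM Product
                    WHERE ProductId = @ProductId)
        RETURN 0;
    -- Combination rule: a ProductBook must have a ProductCD
    IF NOT EXISTS (SELECT * FROM ProductCD
                    WHERE ProductId = @ProductId)
        RETURN 0;
    RETURN 1;
END;
GO
ALTER TABLE ProductBook
    ADD CONSTRAINT ProductBook_Valid
        CHECK (dbo.ProductBook_IsValid(ProductId) = 1);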

The second point is a general issue re the CHECK Constraint and the use of a function therein, not limited to Subtypes. The CHECK Constraint allows checking of the current inserted/updated row only. We can check other columns in the row, but we cannot check rows in other tables.

After 2007, with Functions, we can now check rows in other tables, and therefore, columns in rows in other tables. Which is what I am doing in the Subtype Check Function. That is a very important capability to note. We can now check:
• PersonMarried.SpouseId exists in PersonSingle(SpouseId) and Person(SpouseId).Sex is not Person(PersonId).Sex
• ProductCD.ArtistId exists in Artist and is not dead

Point being, Functions give us:
a. a vehicle to extend Business Rules in ways we could not before, and
b. to do so using Declarative Constraints

It is not limited to Subtypes.
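To make the second bullet concrete (a sketch; the Artist table and its DateOfDeath column are assumed here for illustration):

-- Sketch: a Declarative Constraint that reaches into another table.
CREATE FUNCTION dbo.Artist_IsAlive (@ArtistId INT)
RETURNS INT
AS
BEGIN
    RETURN (SELECT COUNT(*)
              FROM Artist
             WHERE ArtistId    = @ArtistId
               AND DateOfDeath IS NULL);
END;
GO
ALTER TABLE ProductCD
    ADD CONSTRAINT ProductCD_Artist_Alive
        CHECK (dbo.Artist_IsAlive(ArtistId) = 1);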

Cheers
Derek

derek.a...@gmail.com

unread,
Feb 5, 2013, 6:02:36 AM2/5/13
to

Hugo

A couple of things came to mind while I was waiting for your response.

-------------------------------------------------
Ordinary Data & Referential Integrity
-------------------------------------------------

A bit more clarity, re the subject UPDATE statement and my points [8] and [10] above, because the contexts are confusing to you. Let's take an example that has nothing to do with Subtypes, so that you can see the issue clearly, without the opportunity to mix things up. Let's say you have a typical Order (the parent) table and an OrderItem (the child) table. The data model states:
• each Order has one-or-many OrderItems
We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that ? In Standard SQL, please, not worrying about syntactical differences.

1. Via an Add transaction (stored proc, SQL ACID), so that the parent and the first child are inserted as an Atomic change to the database (a sketch follows after point 3).

2. And via a Drop transaction, so that the delete does not delete the last Orderitem for an Order. (Or by not providing an OrderItem_Drop.)

3. I trust that you know this, but for the benefit of anyone reading this, who does not know: in a normal secured SQL database, we do not allow users to make changes directly to the tables (and stuff things up), we allow changes to the database *only* through transactions. That scheme is provided via:
• on every table, SELECT is GRANTED to Roles that are nominated
• UPDATE/INSERT/DELETE is not granted to any Role or Group or User
• dbo aliases are not permitted
• EXECUTE permission for nominated transactions is GRANTED to nominated Roles
Otherwise you have chaos, not a database.
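Here is a minimal sketch of the Add transaction from point 1 (T-SQL syntax; table names, columns, and parameters are illustrative, error handling omitted):

-- Sketch: parent and first child inserted as one Atomic change.
CREATE PROCEDURE Order_Add
    @OrderId    INT,
    @CustomerId INT,
    @ProductId  INT,
    @Qty        INT
AS
BEGIN
    BEGIN TRANSACTION;
        INSERT INTO Orders (OrderId, CustomerId)
            VALUES (@OrderId, @CustomerId);
        INSERT INTO OrderItems (OrderId, ItemNo, ProductId, Qty)
            VALUES (@OrderId, 1, @ProductId, @Qty);
    COMMIT TRANSACTION;
END;

EXECUTE on Order_Add is then GRANTed to the nominated Roles, per point 3.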

(
It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities.
)

It is the same for a Subtype cluster, nothing more, nothing less.

Recall the subtype cluster is a single logical unit. Implement:
• one Subtype_Add transaction for adding Subtypes, such that the generic Type and the *correct* Subtype is added together, as an Atomic change to the database.
• one Subtype_Drop transaction for deleting Subtypes, such that both the Subtype and the generic Type are deleted together, as an Atomic change to the database.

For changing a book to a CD, you have two choices. Implement a transaction that does one of the following. In both cases they are still transactions, Atomic changes to the database:
• (given the above two transactions exist) one Subtype_Alter transaction to
••• execute the Drop transaction,
••• then execute the Add transaction.

• a fresh transaction that performs the following:
••• delete the old Subtype and generic Type
••• insert the new generic Type and Subtype

I can't stand code duplication, so I would go with the former, but some developers like to ensure their job security, they would go with the latter and with redundant indices.
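The former, sketched (illustrative names; it assumes the Drop and Add procs detect an enclosing transaction rather than opening their own):

-- Sketch: change a Book to a CD as one Atomic change, reusing the
-- existing Drop and Add transactions.
CREATE PROCEDURE Product_Subtype_Alter
    @ProductId INT,
    @NewType   CHAR(1),
    @Name      VARCHAR(60)
AS
BEGIN
    BEGIN TRANSACTION;
        EXEC Product_Subtype_Drop @ProductId;
        EXEC Product_Subtype_Add  @ProductId, @NewType, @Name;
    COMMIT TRANSACTION;
END;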

I hope this illustrates that, the issue you raised re the UPDATE, although it is important, has nothing to do with Subtypes, or my Subtypes structure. It has everything to do with the context of ordinary Data and Referential Integrity across the whole database; transactions; the issues re transactions on your platform; and how you implement them. That is why it is not in the Subtype example code. Whatever it is that you do for those pairs of tables anywhere in the database, that require integrity between them ... do that same thing for each Subtype, no more, no less.

-----------------
Performance
-----------------

> > ... So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery.

I am not sure that I understand the specific issue you are concerned about, so forgive me if I am wrong.

I have used MS SQL a fair bit over the decades, although I do not consider myself a specialist, and I can't keep up with the rewrites-from-scratch every three years. As you are probably aware, executing a QP without a subquery thousands of times is a non-issue; the issue is in the qualifying set size and the table size. And here the set size is *one*, since it is not a streamed insert.

My experience of it, and this is true up to version 2008 (after which, I don't know), is that it generates a very good query plan for subqueries (no normalisation of the QP ala current Sybase, but very good, ala pre-2003 Sybase). The QP is "flattened", there is no subquery in it, it is just a node in the tree. So let me assure you that executing a QP *with* a subquery, thousands of times, is a non-issue.

But please don't take my word for it, you can always try it. Use my DDL, so as to save time, load up a Product table with 16 million rows (a thousand is nothing these days). It takes only 5 to 10 mins. In case it needs to be said, please do not use a cursor: not only is it suh-low, it will skew the result badly. I don't have anything MS or MD in my stable, otherwise I would offer.
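For example, a set-based load along these lines (a sketch, T-SQL syntax, assuming the Product DDL from the examples in this thread):

-- Sketch: populate Product with 16 million rows, set-based, no cursor.
WITH nums AS (
    SELECT TOP (16000000)
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_columns AS a
         CROSS JOIN sys.all_columns AS b
         CROSS JOIN sys.all_columns AS c
)
INSERT INTO Product (ProductId, ProductType, Name)
SELECT n, 'B', 'Book #' + CAST(n AS VARCHAR(10))
FROM nums;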

(I have heard that recent releases of MS SQL have an issue with Functions, depending on the content of course, but not with the QP, or a QP which includes a subquery. I think it has to do with large amounts of code in Functions. Therefore we should be safe with a single-statement function.)

If that succeeds, but for some reason you are not blissfully happy, and you want to really push the construct further with a streamed insert, let me know, and I will supply the code.

> > Your version eliminates the storage overhead, but at the price of a performance overhead.
>
> Er, what "performance overhead" ? Please specify exactly, or withdraw the statement.

In your own time.

Cheers
Derek

hugoko...@gmail.com

unread,
Feb 5, 2013, 7:59:38 AM2/5/13
to
Hi Derek,

Thanks for your reply. I do need to address a couple of your points.


On Monday, 4 February 2013 02:01:28 UTC+1, derek.a...@gmail.com wrote:
> Hugo
>
> On Sunday, 3 February 2013 22:45:43 UTC+11, hugoko...@gmail.com wrote:
> > > A tighter calculation would be:
> > > 1. 10% additional overhead per Subtype (since it has only two columns)
> > > 2. 100% additional indexing overhead per Subtype (since it has only one index)
> > > 3. 100% additional indexing overhead (since it has only one index)
> > > [Corrected to]
> > > 3. 100% additional indexing overhead for the Supertype (since it has only one index)
> >
> > Well, that explains why your overhead estimation is so much higher than I could imagine. Your mathematics are off.
>
> <shopping snipped>
>
> > By your logic ...
>
> That is neither logic, nor my logic, nor my mathematics, nor mathematics. That is your interpretation, and its application to some other scenario. I do not see any point in commenting on it. I will answer on the basis that you are not being contrary, that you actually do not understand, and limit my answer to the statements made.
>
> If you want to understand what I stated, please just read my words (which I am responsible for), do not jump to conclusions (which in any case, you are responsible for).

You are indeed right that I jumped to a conclusion. I originally asked you where you got the 220% overhead figure that you introduced earlier in this discussion:

On Sat, 15 Oct 2011 02:41:24 -0700 (PDT), Derek Asirvadem
<derek.a...@gmail.com> wrote:

(...)
>Ok, so you are both fools.
>
>It is not common, because 220% overhead and substantial cost, is just
>plain stupid. People are not quite as stupid as you think.

In your reply to my question about that 220%, you mentioned points 2 and 3 as above, plus point 1 but with a 20% estimate in that reply (you later changed that to 10%). You never told me how those 20, 100, and 100 numbers had to be combined to arrive at your claimed 220% overall overhead. Since adding them happened to result in just the right number, I assumed that you had done that - which, of course, would violate logic and mathematical rules.
Apologies for this assumption. And, forgive my ignorance, I have to ask now - *how* do we get an overall 220% overhead from 10% on the subtype tables, 100% on the subtype indexes, and 100% on the supertype indexes?



> (In the genuine Relational world, we have implemented this Subtype from the outset. It is identified in the Relational Model, over 40 years ago, and it was fully catered for in the first publication for Standard for modelling Relational databases, which was 35 years ago. The difference is, until 2007 when Functions arrived, the constraints were implemented in the [required anyway] transactions. Since 2007, we can implement Functions, and implement Declarative CHECK Constraints for the Subtypes.)

And even before that, we could implement declarative CHECK constraints using subqueries, in products that implement the feature (SQL:2003, Feature F671 "Subqueries in CHECK constraints"). Or we could even directly declare the subtable hierarchy, again in products that implement the feature (SQL:2003, Feature S081 "Subtables"). These two options are far superior to both the method based on redundant columns as described by Joe in this topic and your function-based method.


> 2. Sure, the syntax is different. So what ? The syntax for every SQL statement is different, particularly in the area of (a) extensions and (b) features that each vendor implemented before the standard syntax was published. So what ?

So people who know that their code might be ported, or who for other reasons prefer to stay as close to the standard as possible, try to use ANSI features as much as possible, and non-standard features as little as possible. The method posted by Joe, that you criticize, uses CHECK and FOREIGN KEY constraints, directly portable across most (if not all) major RDBMS platforms. Your version requires much more work to port.
That doesn't make it a bad solution. As I said before - it is a good solution, but it's not the be-all-end-all. There are situations where I might choose your solution over Joe's, but there are also many situations where I'd prefer Joe's.


> What is relevant, is that Functions (tables, datatypes, columns, everything in a database is User Defined, we do not have to keep telling ourselves that fact) are now Standard. And that you can use a function anywhere that you can use a column. So as long as the SQL platform does that (provide the standard requirement), you can implement standard Subtypes *declaratively* on that platform (and non-declaratively since whenever the platform was born). The syntax is irrelevant.

Before we go off on a debate on what is and what isn't declarative, maybe we should agree on the same definition?

The definition I have always heard and used so far is that declarative code only describes the intended effect without detailing the steps to achieve it, and that the alternative is procedural code that describes multiple steps that have to be taken in the specified order. By that definition, stored procedures, triggers, and functions (*) are all procedural, since they can have multiple statements in their body that have to be executed in order.
(*) One exception to this is a very specific kind of function that SQL Server supports, "inline table-valued function" - actually a misnomer, for it should have been called a parametrized view.

The reason why I care is that the nature of procedural code makes it much harder for the DBMS to fully understand what the result of the code is. And that, in turn, makes it extremely hard, maybe even impossible, to implement an *efficient* way to guarantee that, no matter what we do, the constraint is never violated. (The inefficient way is trivial - simply check all CHECK constraints for all rows in all tables after every transaction (deferred) or data modifying statement (immediate). This is - if I read it correctly - how SQL:2003 defines the effect of CHECK constraints.)
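To make the distinction concrete, compare these two sketches (T-SQL; illustrative only):

-- Procedural: a scalar UDF; the body is a sequence of statements,
-- executed once per row when used in a query.
CREATE FUNCTION dbo.Triple_Scalar (@x INT)
RETURNS INT
AS
BEGIN
    RETURN 3 * @x;
END;
GO
-- Declarative: an inline table-valued function is a single expression
-- that the optimizer can expand into the calling query's plan.
CREATE FUNCTION dbo.Triple_Inline (@x INT)
RETURNS TABLE
AS
RETURN (SELECT 3 * @x AS Tripled);

The second can be invoked with CROSS APPLY and folded into the calling plan; the first is called row by row.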



> If you can't read my code, and make the necessary syntax changes on your platform to implement my code on your platform, then yes, there is a problem, of a different sort. Name your platform and I will write the code for you.
>

No worries, I can read and adapt your code just fine.


> Er, what "performance overhead" ? Please specify exactly, or withdraw the statement.

Executing the function, which involves evaluating a query, for every row inserted in each subtable, and for every row updated in each subtable by a statement that has the possibility to update the primary key column.

How much that will be, and how much more or less that will be than the performance overhead of Joe's version, depends on lots of factors. Chosen platform (and choices made by the engineering team of that vendor) probably being the biggest.


> > When inserting or updating a single row, that won't be a big problem. But what happens when inserting or updating thousands or even millions of rows in a single statement?
>
> That is a very stupid thing to do, for any scenario, not only for Subtypes, because it will blow the transaction log. ANSI/IEC/ISO SQL is transactional. So whatever proscriptions apply to such attempts, apply to Subtype insertions as well, no more, no less. Each platform will handle that differently.

You think executing an INSERT ... SELECT is a stupid thing to do?
Or an UPDATE ... WHERE?

Those are statements that potentially affect thousands or even millions of rows. I personally prefer to use them over writing code to loop over rowsets one by one - a single statement is fewer lines of code, easier to understand, and (at least on SQL Server) performs orders of magnitude better - as long as you don't do any stupid things. For SQL Server, using scalar user-defined functions qualifies as a stupid thing.

I don't see the relevance of your remark about log space. If I need to insert millions of rows in a single transaction, I'll need the same log space whether I use a single INSERT ... SELECT statement or a million INSERT ... VALUES statements. (In fact, the former case would probably use LESS log space, depending on how the vendor implements the transaction log).


> > This probably depends on the chosen DBMS. I know that SQL Server is unable to inline scalar user-defined functions into an execution plan, so for SQL Server, the only option is to execute a separate call to the function for each row (or at least for each unique set of input parameters - which in this case amounts to the same thing, since the primary key is included in the input parameters). So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery. Your performance will, excusez mon French, suck.
>
> Excuse me, no, *your* performance will suck.

*Your* as in: the performance of your code on SQL Server.

> My performance (DB2 and Sybase) flies. I already have close to one hundred standard Subtype structures in production. Millions of transactional inserts have been tested. Probably a thousand full table unload/reloads (thousands to millions of rows each) have been tested.

Great! Have you done a performance comparison test of your method versus "the Joe construction"?


> When you get into contemplating that level of "performance" problem, it becomes very platform dependent, and dependent on the quality of code, for both transactional inserts and streamed inserts. We are better off if we limit this thread to logical issues and measurements, and keep away from physical implementations of those logical constructs, which are heavily dependent on platforms, their specific limitations, etc.

The same can be said for storage overhead. Unique constraints don't have to be implemented by adding an index. Foreign keys don't have to be implemented by adding a column (some vendors use pointer chains instead). Etc.

Also, it is quite surprising to see someone who said "you are both fools" to Joe Celko and Roy Hann because they advocate a subtype/supertype solution that, allegedly, has 220% overhead now saying that we should keep away from physical implementations.

If we look at the logical side only, not caring about implementation details such as storage or performance, both solutions are good. They both achieve what they set out to achieve - ensure that there are no integrity violations in a subtype/supertype relationship.

If we look at implementation effects as well, we should look at all effects. In this case, the effects are on storage and on performance.


> Actually, I have people who use my Subtype structure, and who have implemented it on MS SQL without any problems. So I can say for sure that (a) they did so without jumping through the hoops that you are speculating about, and (b) they suffer no performance problems or "performance problems", but I do not know what MS SQL does internally in each version, to agree with or refute your speculation about it. Therefore, I will state simply, I seriously doubt that that is the case for MS SQL.

I'll try to find the time to do a simple test. If I find the time, I'll post repro code and test results in a later message.


> > Another possible problem you have to be aware of with your solution is that, because the subquery is hidden in the UDF, it is unknown to the DRI mechanisms. Some scenarios will allow violations of the CHECK constraints, because the RDBMS does not know it has to re-check.
>
> Excuse me, but it *IS* a DRI CHECK Constraint. Please read the code again.
>
> "Some scenarios" is not correct, no, scenarios implies the general case, not platform specific. What you are talking about is how a particular platform operates, not scenarios. So the honest statement is, "some platforms will allow violations of the CHECK Constraint".

I don't have to read the code again. I know what it does.
Will the sequence of two INSERT statements and one UPDATE statement cause a constraint violation on any of the platforms you use? On SQL Server, it won't. The UPDATE will be accepted, causing the data in my table to violate the subtype/supertype rules. The corresponding sequence of two INSERT statements and one UPDATE statement against Joe's implementation *will* cause an error to be raised.
This can be fixed - I would need to add a CHECK constraint on the Type column in the supertype table that checks that the corresponding primary key value does not exist in any of the subtype tables that are mutually exclusive with the one indicated by the type value. For a supertype/subtype relationship with (n) mutually exclusive types, that requires (n-1) subqueries.
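Roughly like this, for the two subtypes in our example (a sketch; names are illustrative):

-- Sketch: block the UPDATE loophole by checking the supertype's
-- discriminator against the subtype tables it excludes.
CREATE FUNCTION dbo.Product_HasWrongSubtype
    (@ProductId INT, @ProductType CHAR(1))
RETURNS INT
AS
BEGIN
    IF @ProductType = 'C' AND EXISTS
        (SELECT * FROM ProductBook WHERE ProductId = @ProductId)
        RETURN 1;
    IF @ProductType = 'B' AND EXISTS
        (SELECT * FROM ProductCD WHERE ProductId = @ProductId)
        RETURN 1;
    RETURN 0;
END;
GO
ALTER TABLE Product
    ADD CONSTRAINT Product_Type_Consistent
        CHECK (dbo.Product_HasWrongSubtype(ProductId, ProductType) = 0);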


(snip)
> Yeah, sure. It appears that you do not understand context.

The context is that:
1. Joe Celko posted a generic framework that can be used to protect integrity in a subtype/supertype relationship;
2. You called him and Roy Hann "fools" for proposing and endorsing that framework;
3. You then posted your, supposedly better, alternative; and
4. I assumed (yup, did it again!) that this alternative was designed to meet the same goals as the original: being a generic framework to protect integrity in a subtype/supertype relationship.

If the code you posted is *not* intended to be a generic method for subtype/supertype relationships, or if protecting integrity is *not* what the code sets out to do, please say so now - that would save both you and me a lot of time and effort!


> 1. The example code provides the implementation details for standard Subtypes. There is no suggestion that it does anything more than that. It is not a full blown application.

Ah, there's part of the answer already.


> 2. The Supertype-Subtype structure is supposed to be considered as a single unit, not as a bunch of tables with disparate and singular relationships between them. You should not be updating anything in the cluster without considering the whole cluster, transactionally.

Agreed that I shouldn't. Just as I should not delete a "parent" (hate those terms!) that still has a "child" in a foreign key relationship. But if I do so anyway, I trust the FOREIGN KEY constraint to slap me and prevent the change. Similarly, I expect whatever construct I use to guard the supertype/subtype relationship to slap me and prevent the update if I do something that would break integrity. The code I posted proves that there is at least one case on at least one platform where it doesn't.

> 3. When writing an application, you are best advised to implement all inserts/updates/deletes as transactions; you should never write singular inserts/updates/deletes.

When writing singular inserts/updates/deletes, a transaction is implicitly started. SQL Server (and probably other vendors too) support two options on when these are committed - implicitly at the end of the statement (if no constraint was violated), or only after an explicit COMMIT statement.
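For reference, the switch between those two modes on SQL Server (a sketch; behavior summarized in the comments):

-- Autocommit (the default): each singular statement commits on success.
SET IMPLICIT_TRANSACTIONS OFF;

-- Implicit transactions: a statement opens a transaction that stays
-- open until an explicit COMMIT (or ROLLBACK) is issued.
SET IMPLICIT_TRANSACTIONS ON;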

But I fail to see the point. Constraints (and I consider both your and Joe's code as a way to work around a limitation of a constraint, so I expect the same behaviour) should always work. Even when application code does not follow best practices. (In fact, in that case constraints are even MORE important!)

(snip)
> 5. The example code, being an example of Subtype implementation, uses singular inserts/updates/deletes, rather than transactions, as is commonly accepted in posted examples.

So why slap _me_ for using singular inserts/updates/deletes in my posted examples?


(snip)
> 11. If you have trouble writing the required Declarative Constraints and transaction code (no offence, but this structure is new to you, and you do not understand it yet) that is required for ordinary Data and Referential integrity, that you are implying is part of the Subtype integrity, at your request, I would be happy to supply it. Using Standard SQL, and you will have to make the minor syntactical changes required for your platform. I won't be posting transaction basics or tutorials, I will assume the reader knows that, I will post just the code.

This is now the fourth time in a single post that you hint at my incompetence. You are free to believe anything you want of me, but saying this out loud and then repeating it multiple times does not really add anything to the credibility of your arguments.


> > Bottom line: I like your solution. But it's not 100% safe, and though you avoid the storage overhead, you pay for this with a performance overhead. I'd recommend that people choose between the two solutions on a case by case basis.
>
> I refute the "not 100% safe" on the basis described above: implement whatever safety you need; that is not shown in, and cannot reasonably be expected to be shown in, the example of Subtypes.

Your code as posted does not provide 100% safety against integrity problems in the supertype/subtype relationship on at least one platform. The code I provided will result in a supertype indicating a CD, with a row in the table for the Book subtype.

Before starting this reply, I spent a lot of time trying to decipher the relevant sections of the SQL:2003 standard (not the latest, I know, but I have to make do with what I have). If I didn't misunderstand anything, your code (or at least the ANSI-standard equivalent code) will provide 100% safety in a fully compliant RDBMS, because (again, if I understand completely) the standard says that the database should re-check all CHECK constraints for all rows in all tables after any modification - or at least behave as if it does.

I have no experience with other mainstream RDBMS products, but I would be very surprised if it turns out that all other vendors do manage to pull that off with CHECK constraints that reference other tables through a user-defined function. (But I am not excluding the possibility - I have been surprised before). For those vendors that, like SQL Server, only check a CHECK constraint for inserted and modified rows, an extra CHECK constraint and an extra function would be required. I sketched a rough outline of this extra function above, but I'll be happy to provide code if you want me to.

> I refute the "performance overhead", as detailed above, but I cannot be specific because you have not identified specifically what the alleged "performance overhead" is.

I replied to this above.

> I strongly recommend that you do not recommend anything at all about this subject until:
>
> a. you understand it, which clearly, has not been achieved yet

And there's number five. :)

> b. you have actually implemented it and tested it (both transactional inserts/updates/deletes, and singular streams inserts, etc), and you have gained actual experience, and some basis for confidence

As I said above - if I find the time, I'll run some tests on SQL Server 2012 and post back with the results when finished.


> Until then, your recommendations are based on contemplation, not facts, not experience.

And until you run an actual side-by-side comparison test of your version against Joe's version, the exact same can be said about your recommendations.

(BTW, the only recommendation I actually made so far is to "choose between the two solutions on a case by case basis")

Cheers,
Hugo

hugoko...@gmail.com

unread,
Feb 5, 2013, 8:51:01 AM2/5/13
to
Hi Derek,

A quick reply (I really need to get going on some other stuff, but I can't let this slip) to some issues.


On Tuesday, 5 February 2013 12:02:36 UTC+1, derek.a...@gmail.com wrote:
> -------------------------------------------------
> Ordinary Data & Referential Integrity
> -------------------------------------------------
>
> A bit more clarity, re the subject UPDATE statement and my points [8] and [10] above, because the contexts are confusing to you. Let's take an example that has nothing to do with Subtypes, so that you can see the issue clearly, without the opportunity to mix things up. Let's say you have a typical Order (the parent) table and an OrderItem (the child) table. The data model states:
> • each Order has one-or-many OrderItems
> We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that ? In Standard SQL, please, not worrying about syntactical differences.

In standard SQL, this *is* in fact possible with declarative constraints. On the OrderItems table, add a constraint CHECK (EXISTS (SELECT * FROM Orders WHERE Orders.OrderID = OrderItems.OrderID)) [actual syntax may be slightly different; I never worked on a product that supports subqueries in a CHECK constraint so I don't know the exact syntax for implementing the correlation]. Then make this constraint deferred (unless you want a very complicated way of making the two tables read-only).
In a real database, this only works if both subqueries in CHECK constraints and deferrable constraints (feature F271 in SQL:2003) are supported.


Since the databases I know don't implement those features, I agree that this business rule cannot be implemented as a declared constraint.
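For illustration, on a platform that had both features, the declaration would look roughly like this (standard-SQL sketch; it will not run on the products discussed here):

-- Sketch, standard SQL: every Order must have at least one OrderItem,
-- checked at COMMIT rather than per statement.
ALTER TABLE Orders
    ADD CONSTRAINT Order_Has_Item
        CHECK (EXISTS (SELECT *
                         FROM OrderItems
                        WHERE OrderItems.OrderId = Orders.OrderId))
        DEFERRABLE INITIALLY DEFERRED;

-- Usage: both inserts happen inside one transaction; the deferred
-- constraint is verified when the transaction commits.
-- BEGIN; INSERT INTO Orders ...; INSERT INTO OrderItems ...; COMMIT;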

(snip basic stuff I fully agree with)


> It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities.

A foreign key instead of a check constraint would even be better, indeed. So, does this make me "seriously deranged"? And if so - why?
If you happen to be working on a platform that does support this feature set, using the tools available to you would, in my opinion, be the smart thing to do.

(You'd still need to provide the add and drop stored procedures and still need to use DCL to secure the database, but if you can use a declared constraint to express a business constraint, it would be folly not to do it.)

If your "fully deranged" refers to people trying to implement this on a database that doesn't support deferred constraints and then blaming the vendor - well, in that case I agree with the sentiment you expressed, allthough I would have used different words.


> It is the same for a Subtype cluster, nothing more, nothing less.

I sense a misunderstanding.

I am fully aware that your function-based framework does not force people to insert a row in the ProductBook table for a product of type 'B', or to insert a row in the ProductCD table for a product of type 'C'. Neither does Joe's framework. (And I actually believe that to be a good thing - the business requirements could easily allow us to know about books or CDs for which the number of pages/tracks is unknown). The code I posted before does indeed end up with product of type 'C' with no row in ProductCD, but that is not the problem I was trying to demonstrate.
The real issue with the code I posted is that, after running it, I have a product of type 'C', that still has a row in ProductBook. This is not allowed. The framework does prevent this if the usual order of actions is followed (insert supertype; insert wrong subtype). But a different order (insert supertype; insert correct subtype; update supertype) is not prevented.


(snip)
> For changing a book to a CD, you have two choices. Implement a transaction that does one of the following. In both cases they are still transactions, Atomic changes to the database:
(snip)

Yes, I understand that I can code my way around the limitations. I can also write stored procs to ensure that a normal foreign key relationship is never violated. But I still declare the foreign key relationship.

To me (and your opinion may differ), it is an either/or, with no in-between. Either I accept that my database has no out-of-the-box declarative method to ensure that the integrity of my supertype/subtype relationship is never violated; I accept this shortcoming, write access modules that do the work instead, and be done with it. Or I find a creative way to use whatever tools I have available to still get the RDBMS to do that work. (And I'd still write the access modules, as a convenience for the app developers - but I'd also know that whatever anyone does, my data's integrity is covered.)
Joe's solution is in the latter category. Yours is in-between - it uses functions and check constraints to prevent violation in some - admittedly the most common - cases, but leaves a back door open so that I still *need* those access modules, and *need* to ensure that they are 100% reliable in all cases.

(snip)

> -----------------
> Performance
> -----------------
>
> > > ... So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery.
>
> I am not sure that I understand the specific issue you are concerned about, so forgive me if I am wrong.

There's actually a combination of two issues. Both specific to SQL Server.
1. SQL Server is heavily optimized towards set-based operations. With very, very, very, very few exceptions (like running totals before the ROW or RANGE clause of the OVER clause was implemented), set-based SQL will always run rings around the row-by-row counterpart. (I know that this is not the case in all DBMS's. For instance, I have heard from reliable sources that Oracle has almost no performance difference between well-coded row-by-row logic and the equivalent query).
2. Scalar user-defined functions cannot be "inlined" - that is, their definition cannot be sucked into the main query before it is sent to the optimizer. The optimizer will produce a plan for the main query that builds most of the result set as efficiently as possible. Then it will send all rows one by one into the function - basically a cursor in disguise.

Add 1 and 2 together, and you'll see that every time a scalar UDF has to be evaluated against a large row set, performance will suffer.

(snip SQL Server basics I already knew)
> But please don't take my word for it, you can always try it. Use my DDL, so as to save time, load up a Product table with 16 million rows (a thousand is nothing these days). It takes only 5 to 10 mins. In case it needs to be said, please do not use a cursor: not only is it suh-low, it will skew the result badly. I don't have anything MS or MD in my stable, otherwise I would offer.

As promised in my previous reply - I will test when I have the time.

> (I have heard that recent releases of MS SQL have an issue with Functions, depending on the content of course, but not with the QP, or a QP which includes a subquery. I think it has to do with large amounts of code in Functions. Therefore we should be safe with a single-statement function.)

Not just recent releases. All versions that support user-defined functions (which off the top of my head is all versions since SQL Server 2005) suffer from this issue.
And the number of statements in the function is not related. See for example this blog post, where I demonstrate the performance difference with a UDF to triple the input value: http://sqlblog.com/blogs/hugo_kornelis/archive/2012/05/20/t-sql-user-defined-functions-the-good-the-bad-and-the-ugly-part-1.aspx


Cheers,
Hugo

derek.a...@gmail.com

unread,
Feb 7, 2013, 1:36:37 PM2/7/13
to
On Tuesday, 5 February 2013 23:59:38 UTC+11, hugoko...@gmail.com wrote:
>
> In your reply to my question about that 220%, you mentioned points 2 and 3 as above, plus point 1 but with a 20% estimate in that reply (you later changed that to 10%). You never told me how those 20, 100, and 100 numbers had to be combined to arrive at your claimed 220% overall overhead.

> Apologies for this assumption. And, forgive my ignorance, I have to ask now - *how* do we get an overall 220% overhead from 10% on the subtype tables, 100% on the subtype indexes, and 100% on the supertype indexes?

I have already stated:
> > A tighter calculation would be:
> > The 220% overhead figure is loosely calculated as ... A tighter calculation would be:
> > 1. 10% additional overhead per Subtype (since it has only two columns)
> > 2. 100% additional indexing overhead per Subtype (since it has only one index)
> > 3. 100% additional indexing overhead (since it has only one index)

> > ... two [indices] minus one [index divided] by one [index equals] 100% ...

100% plus 100% plus 10% equals 210%.

If there were 9 subtypes that would be 900% *additional* overhead for the *additional* indices for the subtypes; plus 100% *additional* overhead for the *additional* index for the supertype; plus some *additional* overhead for the *redundant* column in each subtype.

I don't know how to be more specific, other than providing DDL again and counting the columns and the indices. Most IT people I know can figure it out from the above.

I have already pointed out that you are assuming the 220% is *overall* overhead, and that I meant that as *additional* overhead, not *overall* overhead. I don't know if you missed that or if you are hung up about the overall overhead (which you detailed, but which I consider irrelevant to the issue because it is platform specific; and the *additional* overhead is relevant since we are comparing the two constructs at the logical level).

Maybe go over the posts again, with those reminders in mind. Please do not rush.

Please try to focus on the issue and the argument, and not on one or the other platform. I did not discuss platform-specific performance (of either submission), only logical constructs. I responded to some of your platform-specific issues, and in doing so, I might have given you the impression that they are relevant. They are not relevant to the logical consideration of the issues raised in the thread. They are relevant to platform-specific implementation, only.

> Since adding happened to result in just the right number, I assumed that you had done that - which, of course, would violate logic and mathematical rules.

No idea what you mean. Please re-state.

> > (In the genuine Relational world, we have implemented this Subtype from the outset. It is identified in the Relational Model, over 40 years ago, and it was fully catered for in the first publication for Standard for modelling Relational databases, which was 35 years ago. The difference is, until 2007 when Functions arrived, the constraints were implemented in the [required anyway] transactions. Since 2007, we can implement Functions, and implement Declarative CHECK Constraints for the Subtypes.)
>
> And even before that, we could implement declarative CHECK constraints using subqueries, in products that implement the feature (SQL:2003, Feature F671 "Subqueries in CHECK constraints"). Or we could even directly declare the subtable hierarchy, again in products that implement the feature (SQL:2003, Feature S081 "Subtables"). These two options are far superior to both the method based on redundant columns as described by Joe in this topic and your function-based method.
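
For illustration, a rough sketch of the F671 approach, on a hypothetical platform that supports subqueries in CHECK constraints (syntax approximated from SQL:2003; names invented for the example):

ALTER TABLE Books
 ADD CONSTRAINT book_is_book_product
 CHECK (EXISTS (SELECT *
 FROM Products
 WHERE Products.isbn = Books.isbn
 AND Products.product_type = 'B'));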

Marvellous. Feel free to post a full submission.

In the meantime, in this thread, we have two submissions being compared. Probably not a good idea to keep introducing obtuse issues. Better to stick to the thread.

> > 2. Sure, the syntax is different. So what ? The syntax for every SQL statement is different, particularly in the area of (a) extensions and (b) features that each vendor implemented before the standard syntax was published. So what ?
>
> So people who know that their code might be ported, or who for other reasons prefer to stay as close to the standard as possible, try to use ANSI features as much as possible, and non-standard features as little as possible. The method posted by Joe, that you criticize, uses CHECK and FOREIGN KEY constraints, directly portable across most (if not all) major RDBMS platforms. Your version requires much more work to port.

You consider changing one constraint definition "work", do you ? Ok, fine.

Use Joe's method with the 110% *additional* overhead per subtype, and 100% *additional* overhead per supertype. And stop discussing logical performance or platform-specific performance, because if you do, you will be proving yourself to be hypocritical or self-contradictory. Choose portability over performance, or performance over portability, and stick to it.

But if you keep both considerations active and arguing for space, that will leave you in a state of constant activity, without resolution. Going backwards and forwards at the same time. Lots of power and steam, but no progress in any direction. No decision, no commitment to anything. It is fine with me, if you want to do that. But please, you don't need to tell me about it, and you don't need to post it on a public newsgroup.

I have commitments, I stand for something. Therefore making a decision is easy. I spend more time moving, than I do talking about moving, or deciding to move, or thinking about deciding to move, or talking about thinking about deciding to move.

> That doesn't make it a bad solution. As I said before - it is a good solution, but it's not the be-all-end-all. There are situations where I might choose your solution over Joe's, but there are also many situations where I'd prefer Joe's.

Fine with me.

> > What is relevant, is that Functions (tables, datatypes, columns, everything in a database is User Defined, we do not have to keep telling ourselves that fact) are now Standard. And that you can use a function anywhere that you can use a column. So as long as the SQL platform does that (provide the standard requirement), you can implement standard Subtypes *declaratively* on that platform (and non-declaratively since whenever the platform was born). The syntax is irrelevant.
>
> Before we go off on a debate on what is and what isn't declarative, maybe we should agree on the same definition?
>
> The definition I have always heard and used so far is that declarative code only describes the intended effect without detailing the steps to achieve it, and that the alternative is procedural code that describes multiple steps that have to be taken in the specified order. By that definition, stored procedures, triggers, and functions (*) are all procedural, since they can have multiple statements in their body that have to be executed in order.
>
> (*) One exception to this is a very specific kind of function that SQL Server supports, "inline table-valued function" - actually a misnomer, for it should have been called a parametrized view.
>
> The reason why I care is that the nature of procedural code makes it much harder for the DBMS to fully understand what the result of the code is. And that, in turn, makes it extremely hard, maybe even impossible, to implement an *efficient* way to guarantee that, no matter what we do, the constraint is never violated. (The inefficient way is trivial - simply check all CHECK constraints for all rows in all tables after every transaction (deferred) or data modifying statement (immediate). This is -if I read it correctly- how SQL:2003 defines the effect of CHECK constraints.)

I hope all that is going to be relevant to the thread at some point.

I don't need long definitions or the considerations of procedural code, and I didn't realise that people are changing the meaning of words to such an extent that every time we have to have a technical conversation, we have to stop and check our definitions of technical terms. It is sickening what MS have done to the industry. I stick to the original definitions of the words, and I don't change anything. If anyone needs some other definition, the honest, professional thing to do, is to create a new word. Only dishonest, subversive, insane people seek to change the meaning of established words. But then, MS has been dishonest, subversive, insane since 1984.

• Declarative means declarative
••• as identified by E F Codd in the Relational Model
••• specified in SQL, as DDL
••• not procedural

Now you understand what I stated before. There is no need to worry about all the other possible definitions or non-definitions or options or considerations.

> > Er, what "performance overhead" ? Please specify exactly, or withdraw the statement.
>
> Executing the function, which involves evaluating a query, for every row inserted in each subtable, and for every row updated in each subtable with a statement that has the possibility to update the primary key column.

The two ISO/IEC/ANSI SQL platforms that I specialise on, do nothing of the sort. As stated already, I have close to a hundred of these implemented. I have scores of performance tests and QP examinations, that give me confidence about it. I won't get into platform-specific performance issues.

My question was intended to remain in the logical performance area (you know, count the indices, think about what overhead the constraint and the function would have). Instead you tell me that MS SQL performs like a dead bear. As already stated, and for reasons already given, I doubt it very much. But you are free to think that it performs this construct badly.

> How much that will be, and how much more or less that will be than the performance overhead of Joe's version, depends on lots of factors. Chosen platform (and choices made by the engineering team of that vendor) probably being the biggest.

Good. So you understand that performance is a platform-specific issue. Note that I have stated that for the comparison, we need to evaluate the logical concerns, not the platform-specific concerns. This is c.d.t., not SQLServer Central or the thousands of similar websites. The issue is subtype implementation, not how SQL Server of a particular version works. Try to maintain focus.

> > > When inserting or updating a single row, that won't be a big problem. But what happens when inserting or updating thousands or even millions of rows in a single statement?
>
> > That is a very stupid thing to do, for any scenario, not only for Subtypes, because it will blow the transaction log. ANSI/IEC/ISO SQL is transactional. So whatever proscriptions apply to such attempts, apply to Subtype insertions as well, no more, no less. Each platform will handle that differently.
>
> You think executing an INSERT ... SELECT is a stupid thing to do?
> Or an UPDATE ... WHERE?

I didn't state that. Read again.

> Those are statements that potentially affect thousands or even millions of rows. I personally prefer to use them over writing code to loop over rowsets one by one - a single statement is less lines of code, easier to understand, and (at least on SQL Server) performs orders of magnitude better - as long as you don't do any stupid things.

Evidently you are used to very small databases, very small commercial deployments, and single user applications. You probably don't need a RDBMS platform. The C: drive will do nicely. I don't know why you take so much interest in RDBMSs.

(no idea what a "rowset" is.)

In large corporations, with large databases, that are hosted on large servers, with thousands of online users, much as a fresh developer would like to use one statement to affect millions of rows, they are not allowed to do so. The auditors (we have them in large companies) will fry him in oil, and feed him to the hundreds of users who are baying because they are hung up on the locks, and the lock duration.

No, we have measures to prevent murder and aggravated assault.

They are called standards.

The Auditor will make a declaration such as "maximum transaction size is 1000" or "maximum locks per connection is 100" or both. If the fresh developer is so stupid, that he can't figure out the four lines of code required to convert his one statement that affects multi-million rows (MS people love their big numbers, don't they) to run in a multi-user environment, we take him aside and teach him. The developers learn all the required standards before they submit code for promotion to dbo, and the DBA (policeman) knocks it back if it doesn't comply. There are levels of testing from Dev Test to UAT to Stress test on Production environment, before the code gets to Production, so the auditors will catch whatever the DBA misses.

We don't hire Mentally Subversive types. They spend hours speculating and they get nothing done, but the worst part is, they try to invent their own "standards" from scratch, as if the universe began on their birthday.

It is up to people like me to write the standards. I did that 20 years ago, and they have not changed, they are platform-independent. There is a small platform-dependent component, but because the two vendors provide 100% backward and forward compatibility (I know, it is magic, but I don't have time to explain it), that too, hasn't changed in 20 years. I supply both a commentary with pictures and everything, and code templates. If a developer keeps breaching the standards (caught at promotion to dbo, anyway), he gets shown the door.

So there are perfectly simple ways (four lines of code, available in a template, for those who can't remember them), tried and tested, decades old, running in thousands of servers, that affect millions of rows, executed in batches of 1000 rows, with no harm to performance, no impact on thousands of users, and no impact on the log file.
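
The template itself was not posted, so the following is only a guessed shape of such a batching standard, not the actual four lines (Sybase/T-SQL-style; the migrated flag and price columns are invented for the example):

SET ROWCOUNT 1000                      -- cap each transaction at 1000 rows
WHILE 1 = 1
BEGIN
    BEGIN TRANSACTION
    UPDATE Product
       SET unit_price = new_unit_price,
           migrated   = 'Y'            -- each pass removes its rows from the predicate
     WHERE migrated   = 'N'
    IF @@ROWCOUNT = 0
    BEGIN
        COMMIT TRANSACTION
        BREAK                          -- nothing left to process
    END
    COMMIT TRANSACTION
END
SET ROWCOUNT 0                         -- remove the cap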

For database maintenance, such as changing the schema, and populating new tables from existing tables, by all means, skip the four lines of code, and execute the whole thing in a single transaction. But when you do that, you must take into consideration the platform-specific issues, out-of-scope for this thread, and the quality of the database implementation, otherwise it will fail, and you will have to keep repeating the steps, over and over, until you learn what those platform-specific and database-specific issues are. Then document them, and keep them in a safe place, because you will need it for the next maintenance job.

And, yes, that (not knowing your platform specific limitations) would be a very stupid thing to do. You know, come in on a weekend to perform db maintenance; code and test the changes; set 32 parallel threads up; start them off; go for a long boozy lunch; come back in three hours; only to find that 29 threads crashed after 10 mins, and three threads are still rolling back. MS types think that that is "normal". We call it subnormal.

So yes, what you say is important, but there are simple ways around those issues, without resorting to the extremities that you identify. But the platform issues are more important. And they are irrelevant to this thread.

> For SQL Server, using scalar user-defined functions qualifies as a stupid thing.

So don't use them!

> I don't see the relevance of your remark about log space. If I need to insert millions of rows in a single transaction, I'll need the same log space whether I use a single INSERT ... SELECT statement or a million INSERT ... VALUES statements. (In fact, the former case would probably use LESS log space, depending on how the vendor implements the transaction log).

See above.

Well, my two vendors use about 1% of the log space, but still, there are issues re getting up to currency. But platform-specific issues and performance for new developers are all irrelevant to a logical discussion about subtypes.

> > > This probably depends on the chosen DBMS. I know that SQL Server is unable to inline scalar user-defined functions into an execution plan, so for SQL Server, the only option is to execute a separate call to the function for each row (or at least for each unique set of input parameters -which in this case would be the same, since the primary key is included in the input parameters-). So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform a thousand executions of the subquery. Your performance will, excusez mon French, suck.
>
> > Excuse me, no, *your* performance will suck.
>
> *Your* as in: the performance of your code on SQL Server.

No, *your* as in, _your_ fixation about low-level performance of _your_ code on the _only_ server you know. Qualified professionals have no such problems. Recall the context. You made a declaration about code performance, and you provided excruciating detail about the performance of one platform.

First, it is irrelevant to a logical comparison of subtypes (the thread).

Second, it is relevant to only one platform. (FYI, although not relevant to the thread, the two that I know well have no such problems, my code does not suck, it flies.)

Third, if God ever cursed me, the way so many others are cursed by HIM, and I had to implement Subtypes on MS SQL, I would spend some time and get to know what works well, and what does not, then I would implement Subtypes, using the best method, based on knowledge and actual testing. I would not be writing reams on c.d.t or SQL Central; I would not have to have long discussions; I would not be speculating and contemplating; I would just do it.

And whatever I do, it would be standard SQL (not pedantic or obsessed about syntax).

And whatever I do, it would *NOT* have one redundant column plus one additional index per subtype, plus one additional index per supertype. No, I couldn't justify that to anyone, unless they came from an asylum.

And I wouldn't write about it on c.d.t or SQL Central.

Unless some idiot posts a "solution" that uses twice the indices necessary and has redundant columns. Sure, then I would post. On second thought, no, I won't post, because if I do, I would have to spend hours (days?) explaining to the MS people who don't understand the simplest things about their job and who are obsessed about things that do not matter.

> > My performance (DB2 and Sybase) flies. I already have close to one hundred standard Subtype structures in production. Millions of transactional inserts have been tested. Probably a thousand full table unload/reloads (thousands to millions of rows each) have been tested.
>
> Great! Have you done a performance comparison test of your method versus "the Joe construction"?

1. "Construction" ?!?!? Are you crazy ? No, don't answer that.

The English word (unless you have a private definition) has an established meaning. Construction Engineers, even bricklayers, would not do that. Construct means build, not destroy, not build using twice the resources. Part of the reason is, they get paid. The customer is not going to approve a house with an extra bedroom, or a shelving unit with an extra cabinet. No. The brickie or the construction engineer cannot deliver it free, and the customer is not going to pay for something they did not order. We have laws in some western countries for this purpose.

Celko gives his cancer away free. MS types suck it up, and implement monstrosities "free", they get paid the same if their code runs twice as slow, or if it needs a server twice as powerful. There is no pride, but they will print off reams of paper from SQL Central, "proving" that their "construction" is widely used (it is, like cancer), and that it is "standard" (cancer is standard in the MD world, they invent a new one every time they find a cure for the old one). They never read c.d.t.

I construct things, in such a way, that they never need to call me back to fix it. MS types need their future job security, so they are quite happy to place little bombs and cancers everywhere.

2. Test
No, I have not done a test on Joe's cancerous method on my platform. That would be very stupid. Given all the benchmarks that I have done, that include Functions, and indices, and duplicate/redundant indices, I already have a good idea of the logical cost. My memory has not failed. I do not take drugs. I do not need to test the same thing, from scratch, every time, to find out, gee whiz, it produces the same results.

I know what the result of one redundant index per subtype plus one redundant index per supertype minus one function is.

Perhaps that is the difference between people who were qualified in universities 35 years ago, and those who get qualified now. Perhaps that is the difference between "logical" and stupid (those who have to test the physical, over and over, from scratch, because they have no logic).

> > When you get into contemplating that level of "performance" problem, it becomes very platform dependent, and dependent on the quality of code, for both transactional inserts, and streamed inserts. We are better off if we limit this thread to logical issues and measurements, and keep away from physical implementations of those logical constructs, which are heavily dependent on platforms, their specific limitations, etc.
>
> The same can be said for storage overhead. Unique constraints don't have to be implemented by adding an index. Foreign keys don't have to be implemented by adding a column (some vendors use pointer chains instead). Etc.

So you haven't heard about "A Relational Model of Data for Large Shared Data Banks" by a gentleman named Dr E F Codd. It is a worthwhile paper to read. It might cut your work in half. It might eliminate your concerns about irrelevant matter. No, forget it, you are too obsessed with irrelevant matter.

> Also, it is quite surprising to see someone who said "you are both fools" to Joe Celko and Roy Hann because they advocate a subtype/supertype solution that, allegedly, has 220% overhead now saying that we should keep away from physical implementations.

Not worth responding to. That is your tail that you are chewing on. If you keep eating in that direction, you will swallow ... never mind.

> If we look at the logical side only, not caring about implementation details such as storage or performance, both solutions are good. They both achieve what they set out to achieve - ensure that there are no integrity violations in a subtype/supertype relationship.

That's all ?

You call that "logic" ?

Must be a private definition.

See my response [2] re Test.

> If we look at implementation effects as well, we should look at all effects. In this case, the effects are on storage and on performance.

Well, you could look at storage and performance logically, without getting into platform-specifics. But that does not appear to be in your private definition of "logical".

Let me guess, you are an all-or-nothing kinda guy. Black-or-white. Consider either no "effects" at all, or consider every possible "effect", logical, physical, platform-specific. And he he, you don't charge for it, it is "free". I am qualified in IT, I am not qualified in psychiatry, sorry, I cannot help you.

> > Actually, I have people who use my Subtype structure, and who have implemented it on MS SQL without any problems. So I can say for sure that (a) they did so without jumping through the hoops that you are speculating about, and (b) suffer no performance problems or "performance problems", but I do not know what MS SQL does internally in each version, to agree or refute your speculation about it. Therefore, I will state simply, I seriously doubt that that is the case for MS SQL.
>
> I'll try to find the time to do a simple test. If I find the time, I'll post repro code and test results in a later message.

If it takes more than 15 mins total development time, or one minute execution time, to load 16 million rows with the DDL I have already posted, Product plus two subtypes, please note, _you_are_doing_something_wrong_.

Come back here and post, or email me direct, and I will help you.

Last time I ran a very similar benchmark, for an Oracle guru, 2 x 16 million rows inserted took 13.2 secs, and that was on a tiny demo box (laptop), I did not have access to my dev box at the time. Because Oracle carks it with subqueries, the benchmark on that side never finished; the modified version that used derived tables took 30 something mins. Given your statements re row-processing to avoid function calls per row, I won't tell you what to do, just figure out whatever is the best for your particular version of your particular platform, and do it.

> > > Another possible problem you have to be aware of with your solution is that, because the subquery is hidden in the UDF, it is unknown to the DRI mechanisms. Some scenarios will allow violations of the CHECK constraints, because the RDBMS does not know it has to re-check.
> >
> > Excuse me, but it *IS* a DRI CHECK Constraint. Please read the code again.
> >
> > "Some scenarios" is not correct, no, scenarios implies the general case, not platform specific. What you are talking about is how a particular platform operates, not scenarios. So the honest statement is, "some platforms will allow violations of the CHECK Constraint".
>
> I don't have to read the code again. I know what it does.

Well, then, you should know that it is NOT an "unknown DRI mechanism" as far as the SQL Standard is concerned. Please be informed that it is a normal, visible, non-secret SQL DRI Constraint. The subquery is not hidden or "hidden" either.

You may know what it does on your platform, and you will not listen to what it does on other platforms, but you make statements anyway.

> Will the sequence of two INSERT statements and one UPDATE statement cause a constraint violation on any of the platforms you use? On SQL Server, it won't. The UPDATE will be accepted, causing the data in my table to violate the subtype/supertype rules. The corresponding sequence of two INSERT statements and one UPDATE statement against Joe's implementation *will* cause an error to be raised.

I have already explained at length, how my two platforms handle it. Read the posts again, 'cause I am not going to type it again.

> This can be fixed - I would need to add a CHECK constraint on the Type column in the supertype table that checks that the corresponding primary key value does not exist in any of the subtype tables that are mutually exclusive with the one indicated by the type value. For a supertype/subtype relationship with (n) mutually exclusive types, that requires (n-1) subqueries.
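
For concreteness, the extra constraint described here might look like this for the thread's two subtypes, again assuming a platform that allows subqueries in CHECK constraints (the key column name is invented):

ALTER TABLE Product
 ADD CONSTRAINT no_stray_subtype_row
 CHECK (NOT (product_type = 'B'
 AND EXISTS (SELECT *
 FROM ProductCD
 WHERE ProductCD.product_id = Product.product_id))
 AND NOT (product_type = 'C'
 AND EXISTS (SELECT *
 FROM ProductBook
 WHERE ProductBook.product_id = Product.product_id)));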

I have already warned against circular references. But you appear to insist on it. That is your insane "logic", nothing to do with me.

> > Yeah, sure. It appears that you do not understand context.
>
>
>
> The context is that:
>
> 1. Joe Celko posted a generic framework that can be used to protect integrity in a subtype/supertype relationship;
>
> 2. You called him and Roy Hann "fools" for proposing and endorsing that framework;

Half-truths are lies. I gave an explanation, and pointed out the excess baggage.

> 3. You then posted your, supposedly better, alternative; and

No, no, no ! I didn't !

It is better, or less overhead, or standard, only to logical, IT-qualified people.

Not for the rest. Your statement implies better overall, for everyone, and that is simply not true. I have evidence on c.d.t, of a couple of people who are absent logic, who have private definitions of "logic", that prove it is not "better" for everyone.

> 4. I assumed (yup, did it again!) that this alternative was designed to meet the same goals as the original: being a generic framework to protect integrity in a subtype/supertype relationship.

I cannot see how it was an assumption; I declared that in writing.

> If the code you posted is *not* intended to be a generic method for subtype/supertype relationships, or if protecting integrity is *not* what the code sets out to do, please say so now - that would save both you and me a lot of time and effort!

As stated several times, but I will repeat for your convenience, since it is still unclear to you, my posted solution is:
••• Standard (Relational, if you can read the Codd paper)
••• Standard (SQL)
•••••• excludes non-SQLs
•••••• you need to figure out your platform-specific issues and limitations
••• generic
••• protects both data and referential integrity (I did give exact methods, for logical people)

> > 1. The example code provides the implementation details for standard Subtypes. There is no suggestion that it does anything more than that. It is not a full blown application.
>
> Ah, there's part of the answer already.
>
> > 2. The Supertype-Subtype structure is supposed to be considered as a single unit, not as a bunch of tables with disparate and singular relationships between them. You should not be updating anything in the cluster without considering the whole cluster, transactionally.
>
> Agreed that I shouldn't. Just as I should not delete a "parent" (hate those terms!) that still has a "child" in a foreign key relationship. But if I do so anyway, I trust the FOREIGN KEY constraint to slap me and prevent the change. Similarly, I expect whatever construct I use to guard the supertype/subtype relationship to slap me and prevent the update if I do something that would break integrity.

Yes. Weird. You do appear to understand some things, but strangely, not others.

> The code I posted proves that there is at least one case on at least one platform where it doesn't.

No, it doesn't. Read [1] and [2] again, slowly.

> > 3. When writing an application, you are best advised to implement all inserts/updates/deletes as transactions; you should never write singular inserts/updates/deletes.
>
> When writing singular inserts/updates/deletes, a transaction is implicitly started. SQL Server (and probably other vendors too) support two options on when these are committed - implicitly at the end of the statement (if no constraint was violated), or only after an explicit COMMIT statement.

Too silly to respond to. And if I do, I would have to provide an entire tutorial on SQL transactions. There are many courses on the subject.

> But I fail to see the point. Constraints (and I consider both your and Joe's code as a way to work around a limitation of a constraint, so I expect the same behaviour) should always work. Even when application code does not follow best practices. (In fact, in that case constraints are even MORE important!)

Oooooooookay.

> > 5. The example code, being an example of Subtype implementation, uses singular inserts/updates/deletes, rather than transactions, as is commonly accepted in posted examples.
>
> So why slap _me_ for using singular inserts/updates/deletes in my posted examples?

I didn't slap you, I informed you that one had a transactional context (there's that word you do not understand again) and the other did not. And it is for you (assuming you can write the required code) to apply the transactional context when required, especially after it has been pointed out to you.

> > 11. If you have trouble writing the required Declarative Constraints and transaction code (no offence, but this structure is new to you, and you do not understand it yet) that is required for ordinary Data and Referential integrity, that you are implying is part of the Subtype integrity, at your request, I would be happy to supply it. Using Standard SQL, and you will have to make the minor syntactical changes required for your platform. I won't be posting transaction basics or tutorials, I will assume the reader knows that, I will post just the code.
>
> This is now the fourth time in a single post that you hint at my incompetence. You are free to believe anything you want of me, but saying this out loud and then repeating it multiple times does not really add anything to the credibility of your arguments.

Er, I have not been "arguing" re my submission. I have not stated anything new since my submission re Subtypes.

All the voluminous posts since then, have been responding to your voluminous questions and discourses, which have little to do with the submission. Which proves your level of being able to learn something new and to genuinely apply it on your platform without failing, and which proves your incompetence, yes. Any "argument" in that context, is about your issues.

> > > Bottom line: I like your solution. But it's not 100% safe, and though you avoid the storage overhead, you pay for this with a performance overhead. I'd recommend people to choose between the two solutions on a case by case basis.
> >
> > I refute the "not 100% safe" on the basis described above: implement whatever safety you need, that is not shown in , and cannot reasonably be expected to be shown in, the example of Subtypes.
>
> Your code as posted does not provide 100% safety against integrity problems in the supertype/subtype relationship on at least one platform. The code I provided will result in a supertype indicating a CD, with a row in the table for the Book subtype.

I cannot help you if you cannot read the posts and understand them.

> Before starting this reply, I spent a lot of time trying to decipher the relevant sections of the SQL:2003 standard (not the latest, I know, but I have to make do with what I have). If I didn't misunderstand anything, your code (or at least the ANSI-standard equivalent code) will provide 100% safety in a fully compliant RDBMS, because (again, if I understand completely) the standard says that the database should re-check all CHECK constraints for all rows in all tables after any modification - or at least behave as if it does.

Name-dropping won't help you.

I have already explained in detail, we have no deferred constraint checking, we do not need it, we are not so stupid as to write circular references.

> I have no experience with other mainstream RDBMS products, but I would be very surprised if it turns out that all other vendors do manage to pull that off with CHECK constraints that use reference other tables through a user-defined function. (But I am not excluding the possibility - I have been surprised before).

Evidently you cannot read. If you do not trust me, or the submitted code, just say so and stop this voluminous nonsense.

It is not reasonable to disbelieve what you (admittedly) do not know, but then hey, you are not logical either.

Maybe phone someone and check, or post a question on the Sybase newsgroup. (let me know, I would like to make sure that you have not screwed up the wording and sabotaged yourself).

> For those vendors that, like SQL Server, only check a CHECK constraint for inserted and modified rows, an extra CHECK constraint and an extra function would be required. I sketched a rough outline of this extra function above, but I'll be happy to provide code if you want me to.

The "not safe" refutation stands. Your concept of "not safe" is limited to the only plaform you know, and an inability to port my standrad code to your platform.

[3] No, I do not need your code. Just do not use my code on your platform.

Others who use your platform, and who do not have your level of skill, can use my code on your platform. They will not require 20 pages of explanation from me. And they will be able to get around any "advice" that you give them.

> > I refute the "performance overhead", as detailed above, but I cannot be specific because you have not identified specifically what the alleged "performance overhead" is.
>
> I replied to this above.

No, you did not, in the logical context. Yes, you did go on for days about irrelevant platform-specific issues which you could not figure out for yourself (and I cannot help you with). That is limited capability or incompetence or whatever you call it. That is not "performance overhead" logically. That is performance overhead for a person who does not understand the construct required, and who does not understand his platform enough to implement it without making a disaster of it.

The disaster is yours, it has nothing to do with the logical construct submitted.

> > I strongly recommend that you do not recommend anything at all about this subject until:
> >
> > a. you understand it, which clearly, has not been achieved yet
>
> And there's number five. :)

Yes. Supported by volumes of evidence produced by your own hand (ie. excluding my statements). We've entered double-digits in this post.

> > b. you have actually implemented it and tested it (both transactional inserts/updates/deletes, and singular streams inserts, etc), and you have gained actual experience, and some basis for confidence
>
> As I said above - if I find the time, I'll run some tests on SQL Server 2012 and post back with the results when finished.

Whenever you have fifteen minutes.

> > Until then, your recommendations are based on contemplation, not facts, not experience.
>
> And until you run an actual side-by-side comparison test of your version against Joe's version, the exact same can be said about your recommendations.

I did not recommend anything. I submitted a solution for qualified IT people. They can figure out the logical cost. And if they are incompetent, they will have to get into the physical, and then they will figure out the physical costs. But if they are stupid, they will implement a disaster of a test, and then form a conclusion that it does not work. Then they will post on a blog somewhere on SQL Team or SQL Central that it doesn't work.

So for you, I am telling you straight, do not use my solution, use Joe's. Do not waste any time "testing" it, no matter what you do, it will fail.

> (BTW, the only recommendation I actually made so far is to "choose between the two solutions on a case by case basis")

I did read that, yes. That was the recommendation that I recommended against. It is based on incompetence and fear of the unknown, not on knowledge.

> A quick reply (I really need to get going on some other stuff, but I can't let this slip) to some issues.

None of your replies are quick, you can stop lying to yourself.

> > We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that ? In Standard SQL, please, not worrying about syntactical differences.
>
> In standard SQL, this *is* in fact possible with declarative constraints. On the OrderItems table, add a constraint CHECK (EXISTS (SELECT * FROM Orders WHERE Orders.OrderID = OrderItems.OrderID)) [actual syntax may be slightly different; I never worked on a product that supports subqueries in a CHECK constraint so I don't know the exact syntax for implementing the correlation]. Then make this constraint deferred (unless you want a very complicated way of making the two tables read-only).
>
> In a real database, this only works if both subqueries in CHECK constraints and deferrable constraints (feature F271 in SQL:2003) are supported.
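
A sketch of the deferred variant, assuming a platform with both F671 and F271 (SQL:2003-style syntax; names invented). The parent-side version expresses "every order has at least one item", and deferral lets one transaction insert the order and its first item before the check fires at COMMIT:

ALTER TABLE Orders
 ADD CONSTRAINT order_has_item
 CHECK (EXISTS (SELECT *
 FROM OrderItems
 WHERE OrderItems.order_id = Orders.order_id))
 DEFERRABLE INITIALLY DEFERRED;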

That is a contradiction, again, like so many others from you. Contradiction inside the one cranium is insanity. Which you appear to be quite sanguine with.

Real databases do not support deferred constraint checking. Real databases support standard Subtypes, with full Data & Referential Integrity (both directions, yes!) without deferred constraint checking. We don't have the hilarious CASCADE either. Amazing. We don't read the standard and argue about it either, we just do it.

> Since the databases I know don't implement those features, I agree that this business rule can not be implemented as a declared constraint.

Okay.

> > It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities.
>
> A foreign key instead of a check constraint would even be better, indeed. So, does this make me "fully deranged"? And if so - why?

Er, the foreign key that you suggest *IS* the circular reference.
1. Look the term up, and understand it.
2. Read the above paragraph again, slowly, with the knowledge that the foreign key that you suggest *IS* the foreign key that I identify in the first sentence, and that it *IS* the circular reference that I identify in the next clause.

If you still don't understand, then you need formal education, which I do not have time to provide.

Further, with your demonstrated technical ability, you would require at least twice the normal time.

Further, as stated already, your demonstrated mental condition is not something that I am qualified to address, and overcome, which is a pre-requisite to education.

> If you happen to be working on a platform that does support this feature set, using the tools available to you would, in my opinion, be the smart thing to do.

Ooooookay. But that opinion has the same credibility as all your opinions of this page.

> (You'd still need to provide the add and drop stored procedures and still need to use DCL to secure the database, but if you can use a declared constraint to express a business constraint, it would be folly not to do it).

Ooookay.

> If your "fully deranged" refers to people trying to implement this on a database that doesn't support deferred constraints and then blaming the vendor - well, in that case I agree with the sentiment you expressed, allthough I would have used different words.

Ooookay.

> > It is the same for a Subtype cluster, nothing more, nothing less.
>
> I sense a misunderstanding.
>
> I am fully aware that your function-based framework does not force people to insert a row in the ProductBook table for a product of type 'B', or to insert a row in the ProductCD table for a product of type 'C'.

Er, it does force exactly that !

You say you can read and understand code, but the evidence is, you cannot! Maybe you can read and understand isolated code snippets. But you definitely cannot understand the effect, the overall total logic, of all the posted code snippets working together in an integrated "system".

No use me asking you to "read again". You have already proved to be non-logical.

> Neither does Joe's framework. (And I actually believe that to be a good thing - the business requirements could easily allow us to know about books or CDs for which the number of pages/tracks is unknown).

You are getting distracted and tangential again. One of Joe's tricks.

> The code I posted before does indeed end up with product of type 'C' with no row in ProductCD, but that is not the problem I was trying to demonstrate.

No, it does not. You do not understand transactions. Read further.

The most accurate thing you could say is "If I [Hugo] implemented the code that you [Derek] posted, on my platform, given my abject understanding of the construct, and my complete and total lack of knowledge about Standard Transactions, it would not prevent breaches in data & referential integrity." Which really means "I [Hugo] do not understand Data & Referential Integrity enough to implement it correctly or consistently; when given some example code, I will stuff it up; I need the entire application coded and given to me."

> The real issue with the code I posted is that, after running it, I have a product of type 'C', that still has a row in ProductBook. This is not allowed.

I agree it is not allowed. But the simple evidenced fact is, you have allowed it. Further proof of my point immediately above.

> The framework does prevent this if the usual order of actions is followed (insert supertype; insert wrong subtype). But a different order (insert supertype; insert correct subtype; update supertype) is not prevented.

Barking up the wrong tree.

Maybe THREE circular references, that should do it.

> > For changing a book to a CD, you have two choices. Implement a transaction that does one of the following. In both cases they are still transactions, Atomic changes to the database:
>
> Yes, I understand that I can code my way around the limitations. I can also write stored procs to ensure that a normal foreign key relationship is never violated. But I still declare the foreign key relationship.

Yes, that is the Standard practice. It has two separate and distinct purposes (scary for an all-or-nothing person!). But those who are not deranged state it in the reverse Order. Note the capital "O".
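
For the book-to-CD change discussed above, the transaction would be roughly this shape (a sketch only - the actual transaction code was not posted; T-SQL-style, with the thread's Product/ProductBook/ProductCD names and invented columns):

BEGIN TRANSACTION
    DELETE dbo.ProductBook             -- remove the old subtype row first
     WHERE product_id = @product_id
    UPDATE dbo.Product                 -- then flip the discriminator
       SET product_type = 'C'
     WHERE product_id = @product_id
    INSERT dbo.ProductCD (product_id, track_cnt)  -- then add the new subtype row
    VALUES (@product_id, @track_cnt)
COMMIT TRANSACTION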

> To me (and your opinion may differ), it is an either/or, with no in-between. Either I accept that my database has no out-of-the-box declarative method to ensure that the integrity of my supertype/subtype relationship is never violated;

Presumably you mean, via code.

In that case, yes, that is exactly what I advised. That would leave you with no extra indices, no redundant columns.

> I accept this shortcoming, write access modules that do the work instead and be done with it.

Yes! Assuming "access modules"means stored procs with transaction control (no autocommit baby puke).

> Or I find a creative way to use whatever tools I have available to still get the RDBMS to do that work. (And I'd still write the access modules, as a convenience for the app developers - but I also know that whatever anyone does, my data's integrity is covered).

No, no ! Don't do that. Given your long discourse, you will probably prevent the access modules from working.

> Joe's solution is the latter category. Yours is in-between - it uses functions and check constraints to prevent violation in some -admittedly the most common- cases, but leaves a back door open so that I still *need* those access modules, and *need* to ensure that they are 100% reliable in all cases.

Oh, really ?

That's a *neeed* that everyone has, in every cluster of tables, in which you *neeeed* to ensure data & referential integrity, because SQL declarations do not provide 100% data & referential integrity, and the "back door" is always open, all the time, for every pair of related tables. The whole village could be gang-raping you and you wouldn't know it, yes. And if you don't provide the access modules for *all* the tables that *neeed* them, you will have spinsters and orphans, yes. And father-less single mothers.

It is no more *neeedy* or less *neeedy* than any other pair of *neeedy* tables. The "back door" is the same "back door" for every table.

Disgusting.

> > > > ... So insert a thousand rows, and instead of converting the subquery that is hidden in the function to a join, SQL Server will have to perform thousand executions of the subquery.
> >
> > I am not sure that I understand the specific issue you are concerned about, so forgive me if I am wrong.
>
> There's actually a combination of two issues. Both specific to SQL Server.

Finally, after 22 pages, an admission of what I said in the first place. Hallelujah! God is Merciful! Praise be to Allah!

Now please, we do not need to hear about what one idiotic, broken platform does or does not do with regard to performance.

> 1. < portions irrelevant to a logical evaluation of the subject <snipped> >
>
> 2. < portions irrelevant to a logical evaluation of the subject <snipped> >
>
> Add 1 and 2 together, and ... < portions irrelevant to a logical evaluation of the subject <snipped> >

I repeat, for you, given your demonstrated competence on your platform: *do* *not* implement my Subtype submission.

> > But please don't take my word for it, you can always try it. Use my DDL, so as to save time, load up a Product table with 16 million rows (a thousand is nothing these days). It takes only 5 to 10 mins. In case it needs to be said, please do not use a cursor: not only is it suh-low, it will skew the result badly. I don't have anything MS or MD in my stable, otherwise I would offer.
>
> As promised in my previous reply - I will test when I have the time.

Nah. Forget it. Honestly. Given the evidence above, it will fail. Trust me. I did a uni-level psychology course once.

> > (I have heard that recent releases of MS SQL have an issue with Functions, depending on the content of course, but not the QP or the QP which includes a subquery. I think it has to do with large amounts of code in Functions. Therefore we should be safe with a single-statement function.)
>
> Not just recent releases. All versions that support user-defined functions (which off the top of my head is all versions since SQL Server 2000) suffer from this issue.

Maybe they will fix it in the next century. I can't wait.

> And the number of statements in the function is not related. See for example this blog post, where I demonstrate the performance difference with a UDF to triple the input value: http://sqlblog.com/blogs/hugo_kornelis/archive/2012/05/20/t-sql-user-defined-functions-the-good-the-bad-and-the-ugly-part-1.aspx

Nah, it is not for me. The article takes far too long to explain simple things that logical people can understand in one sentence.

----

Just to be clear, I did not say you are stupid. I said you were incompetent; contradictory; arguing dishonestly and hypocritically; insane (deranged, particularly). You relate things that are not related, and you fail to relate things that are related. Clever in that insane devilish Einstein sort of way, clever enough to avoid being locked up, but not clever enough to avoid killing people or contracting AIDS or hanging yourself on a rope that you are setting up to hang someone else. Dahmer, Bundy, et al, stayed under the radar for decades, all their friends said they were highly intelligent. They wrote books.

Celko is stupid. And many here and elsewhere initially informed me of that, for which I am grateful, before it was proved to me in several instances. He knows when to shut up and leave the premises. You don't.

Cheers,
Derek

derek.a...@gmail.com

unread,
Feb 7, 2013, 2:39:45 PM2/7/13
to
Hugo

On Monday, 4 February 2013 12:01:28 UTC+11, derek.a...@gmail.com wrote:

> If you can't read my code, and make the necessary syntax changes on your platform to implement my code on your platform, then yes, there is a problem, of a different sort. Name your platform and I will write the code for you.

"Syntax changes" has now expanded to include platform-specific SQL implementation considerations and platform-specific performance issues (eg. that the use of Functions would be crippling).

1. For you Hugo, I retract that offer.

2. For anyone else, who is a MS SQL specialist, and who can understand my code, in toto, and who can work with, and around, any MS platform specific issues, the offer remains open. That is, anyone who is genuinely interested in making it work, correctly, and fast, who is free of mental conditions that may cause them to sabotage themselves and blame others for it. If you like, I will supply intended-for-MS code, and you mess with it; we work back-and-forth handling issues, and make joint decisions re options, etc. As per normal professional adult IT working environment, where everything is recorded, and that is not an issue.

Apostates need not apply.

No circular references, no duplication of any software or data asset.

ACID Transactions only (you *must* be conversant in that; autocommit boffins need not apply).

If you cannot write a test to load 16 million rows in less than 15 minutes development time, you need not apply.

Half a day total development time, not counting communication time, is my estimate.

Cheers
Derek

Lennart Jonsson

unread,
Feb 8, 2013, 12:53:26 AM2/8/13
to
On 2013-02-04 02:01, derek.a...@gmail.com wrote:
[...]
> My performance (DB2 and Sybase) flies. I already have close to one hundred standard Subtype structures in production. Millions of transactional inserts have been tested. Probably a thousand full table unload/reloads (thousands to millions of rows each) have been tested.
>

I tried to read through this thread, so there might be pieces that I
missed. I like the idea that you present, but AFAIK DB2 (LUW) does not
support sub queries in check constraints. I tried to rewrite your code
for DB2 (LUW), but get an error. Are you on ZOS, or am I missing
something? Can you provide a short example for DB2?
[...]

Cheers
/Lennart

derek.a...@gmail.com

unread,
Feb 8, 2013, 4:36:51 AM2/8/13
to
On Friday, 8 February 2013 16:53:26 UTC+11, Lennart Jonsson wrote:
> On 2013-02-04 02:01, derek.a...@gmail.com wrote:
>
> I tried to read through this thread, so there might be pieces that I
> missed. I like the idea that you present, but AFAIK DB2 (LUW) does not
> support sub queries in check constraints.

Someone else may have said that, but I did not (I don't read or reply to all the posts, for reasons that are probably obvious). Also, there are a few non-technical types here who keep on misunderstanding my posts, and keep on posting solutions that do not work, instead of trying to understand my posts, but their posts look like they are trying to understand ... and in those exchanges I gave up trying to correct them.

Neither DB2 nor Sybase allow subqueries in CHECK constraint definitions, and my posted Sybase code does not contain that. However that is exactly what I want, and I gave the method to overcome that "limitation".

The method uses a Function (UDF) which contains the SELECT (which is a straight query, not a subquery) and returns an INT true/false. The CHECK Constraint was a constraint-with-search-condition, which called the Function as an expression (exactly the same as the posted Sybase code, with syntax differences of course). In DB2, the constraint defn is separate (in Sybase it is part of the CREATE/ALTER TABLE). In DB2, the CREATE TABLE named the Check Constraint at the table, not column level.
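
Compressed into a sketch, the shape described is roughly as follows (the full tutorial code is at the links further down; column names and datatypes here are assumed; Sybase/T-SQL-style syntax):

CREATE FUNCTION dbo.ValidateExclusive_fn
    (@product_id INT, @product_type CHAR(1))
RETURNS INT
AS
BEGIN
    -- 1 if the supertype row exists with the expected discriminator, else 0
    IF EXISTS (SELECT 1
                 FROM dbo.Product
                WHERE product_id   = @product_id
                  AND product_type = @product_type)
        RETURN 1
    RETURN 0
END

ALTER TABLE dbo.ProductBook
    ADD CONSTRAINT ProductBook_Exclusive_ck
    CHECK (dbo.ValidateExclusive_fn(product_id, 'B') = 1)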

> I tried to rewrite your code
> for DB2 (LUW), but get an error. Are you on ZOS, or am I missing
> something? can you provide a short example for DB2?

I have not actually *coded* on DB2 for a few years. The last time I implemented subtypes in a DB2 project was late 2011 in NZ. I was the architect and 100% Sybase DBA, the bank gave me a DB2 DBA for the duration of the assignment (I am a consultant, and they pay us to direct the coders, not to code; it took all of the six months for them to accept that I did not have a coder on the Sybase side). We had 200 tables on the Sybase side, and 35 of them were replicated on the DB2/WebSphere side, updated via stored procs that we called. The subtype cluster was 22 Event subtypes. One data model, I produced DDL and code, and he converted it to DB2 syntax. Therefore I cannot give you a DB2 code snippet, I can provide direction; but the code is pretty much the same as Sybase.

After some minor initial problems (he did not understand Relational Keys, and he was CASTing, which I stopped, and forced him to use private DataTypes for all Keys), he had no problems, and was about one day behind me with changes and minor releases.

They already had an established system, which meant I could not dictate everything I wanted to on their side, and that's why they gave me one DBA. Various surprising (to me) bits of code were compiled. But I am pretty sure, the constraint defns were SQL defns, not language code. In my notes, at the contract/statement of work stage, when I was checking various items; exactly what I would produce; etc, in response to my "We have had the ability to call a Function from a Check Constraint since 2007", they responded with "<roll-eyes> yes, Derek, we have had the ability to call a Function from a Check Constraint since 2006".

For Sybase-to-DB2, the syntax changes are (which you probably know):
• library, not "dbo"
• variables do not have the @ sign
• semi-colons at end of each statement
Type was SQL, Scalar. And a few silly things such as the function had to be defined as "no SQL", even though it was SQL, but I am sure you can figure those out. No External, and Deterministic. Same as for a column function.

As far as I can recall, it was V9.1 for z/OS.
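
As a rough guess at the DB2 shape of the same function and constraint (untested here; note Lennart's follow-up below, where DB2 LUW rejects this kind of body as a subquery; names and datatypes assumed):

CREATE FUNCTION ValidateExclusive_fn
    (p_product_id INTEGER, p_product_type CHAR(1))
RETURNS INTEGER
LANGUAGE SQL
DETERMINISTIC
NO SQL  -- as recalled above; the literal description of this body would be READS SQL DATA
RETURN CASE WHEN EXISTS (SELECT 1
                           FROM Product
                          WHERE product_id   = p_product_id
                            AND product_type = p_product_type)
            THEN 1 ELSE 0 END;

ALTER TABLE ProductBook
    ADD CONSTRAINT ProductBook_Exclusive_ck
    CHECK (ValidateExclusive_fn(product_id, 'B') = 1);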

The tested Sybase code that I posted is directly from a tutorial.

For people who appreciate a Data Model:
http://www.softwaregems.com.au/Documents/Student%20Resolutions/Anders%20DM.pdf

For those who want DDL including the CHECK constraint:
http://www.softwaregems.com.au/Documents/Tutorial/Subtype_CHECK.sql

And code for the Function used by the CHECK constraint:
http://www.softwaregems.com.au/Documents/Tutorial/Subtype%20ValidateExclusive_fn.sql

Please let me know how it goes.

Cheers
Derek

Lennart Jonsson

unread,
Feb 8, 2013, 6:10:16 AM2/8/13
to
On 02/08/2013 10:36 AM, derek.a...@gmail.com wrote:
> On Friday, 8 February 2013 16:53:26 UTC+11, Lennart Jonsson wrote:
>> On 2013-02-04 02:01, derek.a...@gmail.com wrote:
>>
>> I tried to read through this thread, so there might be pieces that I
>> missed. I like the idea that you present, but AFAIK DB2 (LUW) does not
>> support sub queries in check constraints.
>
> Someone else may have said that, but I did not (I don't read or reply to all the posts, for reasons that are probably obvious). Also, there are a few non-technical types here who keep on misunderstanding my posts, and keep on posting solutions that do not work, instead of trying to understand my posts, but their posts look like they are trying to understand ... and in those exchanges I gave up trying to correct them.
>
> Neither DB2 nor Sybase allow subqueries in CHECK constraint definitions, and my posted Sybase code does not contain that. However that is exactly what I want, and I gave the method to overcome that "limitation".
>
> The method uses a Function (UDF) which contains the SELECT (which is a straight query, not a subquery) and returns an INT true/false. The CHECK Constraint was a constraint-with-search-condition, which called the Function as an expression (exactly the same as the posted Sybase code, with syntax differences of course). In DB2, the constraint defn is separate (in Sybase it is part of the CREATE/ALTER TABLE). In DB2, the CREATE TABLE named the Check Constraint at the table, not column level.
>
>> I tried to rewrite your code
>> for DB2 (LUW), but get an error. Are you on ZOS, or am I missing
>> something? can you provide a short example for DB2?
>
> I have not actually *coded* on DB2 for a few years. The last time I implemented subtypes in a DB2 project was late 2011 in NZ. I was the architect and 100% Sybase DBA, the bank gave me a DB2 DBA for the duration of the assignment (I am a consultant, and they pay us to direct the coders, not to code; it took all of the six months for them to accept that I did not have a coder on the Sybase side). We had 200 tables on the Sybase side, and 35 of them were replicated on the DB2/WebSphere side, updated via stored procs that we called. The subtype cluster was 22 Event subtypes. One data model, I produced DDL and code, and he converted it to DB2 syntax. Therefore I cannot give you a DB2 code snippet, I can provide direction; but the code is pretty much the same as Sybase.
>
> After some minor initial problems (he did not understand Relational Keys, and he was CASTing, which I stopped, and forced him to use private DataTypes for all Keys), he had no problems, and was about one day behind me with changes and minor releases.
>
> They already had an established system, which meant I could not dictate everything I wanted to on their side, and that's why they gave me one DBA. Various surprising (to me) bits of code were compiled. But I am pretty sure, the constraint defns were SQL defns, not language code. In my notes, at the contract/statement of work stage, when I was checking various items; exactly what I would produce; etc, in response to my "We have had the ability to call a Function from a Check Constraint since 2007", they responded with "<roll-eyes> yes, Derek, we have had the ability to call a Function from a Check Constraint since 2006".
>
> For Sybase-to-DB2, the syntax changes are (which you probably know):
> • library, not "dbo"
> • variables do not have the @ sign
> • semi-colons at end of each statement
> Type was SQL, Scalar. And a few silly things such as the function had to be defined as "no SQL", even though it was SQL, but I am sure you can figure those out. No External, and Deterministic. Same as for a column function.
>
> As far as I can recall, it was V9.1 for z/OS.
>
> The tested Sybase code that I posted is directly from a tutorial.
>
> For people who appreciate a Data Model:
> http://www.softwaregems.com.au/Documents/Student%20Resolutions/Anders%20DM.pdf
>
> For those who want DDL including the CHECK constraint:
> http://www.softwaregems.com.au/Documents/Tutorial/Subtype_CHECK.sql
>
> And code for the Function used by the CHECK constraint:
> http://www.softwaregems.com.au/Documents/Tutorial/Subtype%20ValidateExclusive_fn.sql
>
> Please let me know how it goes.
>
> Cheers
> Derek
>

Thanks, I'll have a look this weekend


/Lennart


Lennart Jonsson

unread,
Feb 8, 2013, 2:29:42 PM2/8/13
to
On 2013-02-08 10:36, derek.a...@gmail.com wrote:
> On Friday, 8 February 2013 16:53:26 UTC+11, Lennart Jonsson wrote:
>> On 2013-02-04 02:01, derek.a...@gmail.com wrote:
>>
>> I tried to read through this thread, so there might be pieces that I
>> missed. I like the idea that you present, but AFAIK DB2 (LUW) does not
>> support subqueries in check constraints.
>
> Someone else may have said that, but I did not (I don't read or reply to all the posts, for reasons that are probably obvious). Also, there are a few non-technical types here who keep on misunderstanding my posts, and keep on posting solutions that do not work, instead of trying to understand my posts; their posts look like they are trying to understand ... and in those exchanges I gave up trying to correct them.
>
> Neither DB2 nor Sybase allow subqueries in CHECK constraint definitions, and my posted Sybase code does not contain that. However that is exactly what I want, and I gave the method to overcome that "limitation".
>
> The method uses a Function (UDF) which contains the SELECT (which is a straight query, not a subquery) and returns an INT true/false. The CHECK Constraint was a constraint-with-search-condition, which called the Function as an expression (exactly the same as the posted Sybase code, with syntax differences of course). In DB2, the constraint defn is separate (in Sybase it is part of the CREATE/ALTER TABLE). In DB2, the CREATE TABLE named the Check Constraint at the table, not column level.
>

I tried to rewrite the function in a number of ways (db2 9.7 and 10.1
for LUW), but db2 treats it as a sub select no matter what. It might be
possible to fool the compiler by calling a stored procedure that uses
dynamic sql from the function (I'll try later on). DB2 for z/OS probably
does it another way.

I can only speculate on why LUW treats it as a sub select, but LUW
pretty much inlines functions and triggers into the query that is
evaluated. A query like:

insert into ... values ( ... )

can be thought of as:

insert into ... select * from (values ( ... ))
where TRUE

<=>

insert into ... select * from (values ( ... ))
where ValidateExclusive_fn (ProductId, 'B') = 1

<=>

insert into ... select * from (values ( ... ))
where
(select COALESCE(1, 0)
from Product
where ProductId = x.ProductId
and ProductType = x.ProductType) = 1


BTW, is the COALESCE necessary? Either we have a row with tuple(1) or we
don't have a row at all?

I like the idea that you present, always nice to see new ways of doing
things. I'll try to think of a way to get it to work within db2 luw


Cheers
/Lennart


[...]

derek.a...@gmail.com

unread,
Feb 8, 2013, 5:30:46 PM2/8/13
to
On Saturday, 9 February 2013 06:29:42 UTC+11, Lennart Jonsson wrote:
>
> I tried to rewrite the function in a number of ways (db2 9.7 and 10.1
> for LUW), but db2 treats it as a sub select no matter what. It might be
> possible to fool the compiler by calling a stored procedure that uses
> dynamic sql from the function (I'll try later on). DB2 for z/OS probably
> does it another way.

Yes. Coming from the Sybase environment, where there is rigid consistency and predictability, that sort of thing in DB2 (and there are a few) really annoyed me when I was coding. Also the finicky differences between deployments, within the same version. I appreciate that LUW has to be different due to the executables-only requirement, but not *that* different.

Sybase architecture is such that we only see source SQL; it compiles that and executes a genuine Query tree as an executable, with the stack, etc, as an internal process (think: Unix process). We don't ever handle executables or worry about fixing them all up so that they work together. In that regard I think DB2/LUW has left the province of a data sublanguage.

> I can only speculate on why LUW treats it as a sub select, but LUW
> pretty much inlines functions and triggers into the query that is
> evaluated. A query like:
>
> insert into ... values ( ... )
>
> can be thought of as:
>
> insert into ... select * from (values ( ... ))
> where TRUE
>
> <=>
>
> insert into ... select * from (values ( ... ))
> where ValidateExclusive_fn (ProductId, 'B') = 1
>
> <=>
>
> insert into ... select * from (values ( ... ))
> where
> (select COALESCE(1, 0)
> from Product
> where ProductId = x.ProductId
> and ProductType = x.ProductType) = 1

Uggh!

I don't have a problem with any product parsing the SQL that is thrown at it, and determining a Query Plan or Query Tree for internal execution. All the SQLs have to do some form of that; the high-end ones do an elaborate job; the low-end ones do a mediocre job. Sybase does a phenomenal job, it flattens the query, builds a QP, and then Normalises it.

There's genuine flattening, and then there's forcing everything into a simple plan or template (sure, you can convert any code into a subquery). LUW seems to be doing the latter (like the low-end nonSQLs do), inlining everything.

And none of the DB2s Normalise the tree. Before the tree can be normalised, it has to be a valid tree (echoing Codd's Normalisation, unsurprisingly!). If the tree is a forced-simple-tree, it is not a genuine tree or a valid one. Here we get to contemplate the consequences of invalid trees.

> BTW, is the COALESCE necessary? Either we have a row with tuple(1) or we
> don't have a row at all?

Correct.

I generalised that [tutorial] code to work with most SQLs (eg. COALESCE() is standard SQL, but Sybase and others had ISNULL() to do the same thing; the syntax and operation of ISNULL is intuitive, COALESCE is non-intuitive). If we don't have a row at all, the SELECT returns NULL. The COALESCE converts the NULL to zero, in order to RETURN an INT.
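
For the two-argument case the substitution is mechanical; a trivial illustration (value-only SELECTs, which Sybase permits without a FROM):

SELECT ISNULL(NULL, 0)      -- Sybase/MS: returns 0
SELECT COALESCE(NULL, 0)    -- standard: returns 0
SELECT ISNULL(1, 0)         -- returns 1: the first argument is not Null
SELECT COALESCE(1, 0)       -- returns 1: the first non-Null element of the list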

I use strict Null handling, to avoid ambiguities and avoidable errors (if you have standards, there is no "Null Problem" in SQL).

> I like the idea that you present, always nice to see new ways of doing
> things. I'll try to think of a way to get it to work within db2 luw

Thank you.

Cheers
Derek

hugoko...@gmail.com

unread,
Feb 8, 2013, 7:42:27 PM2/8/13
to
Hi Derek,

Wow, you sure put in a lot of effort. Thanks for your long reply! I'll try to keep my reply from being overly long by limiting the quotebacks, not responding to any ad-hom (that saves a LOT of time!), and only responding to the main issues.


> 100% plus 100% plus 10% equals 210%

That's not how percentages work. I tried to explain this in a previous post using a simple shopping analogy, but you still fail to see the fallacy in your computation. I can't explain it any better than this. You may want to pick up a high school maths book and read up on the basics. Or you can continue to yell insults my way from your high horse without really listening to what I actually say.


> In the meantime, in this thread, we have two submissions are being compared. Probably not a good idea to keep introducing obtuse issues. Better to stick to the thread.

I thought you were interested in implementations that are possible in a full ANSI implementation. I apparently misunderstood. I'll stick to just the two options under consideration in this thread.


> • Declarative means declarative
>
> ••• as identified by E F Codd in the Relational Model
>
> ••• specified in SQL, as DDL
>
> ••• not procedural

Thanks for the definition. SQL Server only has a procedural implementation for functions that return a scalar value. I guess that makes your suggested method unusable for SQL Server. I'm glad I didn't yet do any performance comparisons, for it would not be a fair comparison.


> My question was intended to remain in the logical performance area (you know, count the indices, think about what overhead the constraint and the function would have).

"Count the indexes"? I see. That explains our misunderstandings. You seem to think number of indexes is a relevant performance metric. I don't.
I measure response time, because I care about my end users who want the next screen to load as fast as possible. I measure disk space because I know that the company has no unlimited budget for enterprise storage. And I measure all kinds of metrics that are early warning signs for response time and disk space.

But we're in c.d.t. here. In theory, one could measure performance by counting indexes and only thinking about their impact without ever actually measuring them. By that theory, your solution is great.


> (no idea what a "rowset" is.)

You know "row". You know "set". Now combine the two.


> The Auditor will make a declaration such as "maximum transaction size is 1000" or "maximum locks per connection is 100" or both.

You must have inherited some very ineptly designed databases, or be working with very inferior DBMS products and/or hardware. In a properly tuned database on modern hardware, affecting 10,000 or 100,000 rows in a single statement should not be any problem at all. Millions is indeed over the edge during business hours, but perfectly acceptable as part of a batch job during off hours.


> It is up to people like me to write the standards. I did that 20 years ago, and they have not changed,

Ah, that explains why you still work with 1000 as the cutoff number.


> And, yes, that (not knowing your platform specific limitations) would be a very stupid thing to do. You know, come in on a weekend to perform db maintenance; code and test the changes; set 32 parallel threads up; start them off; go for a long boozy lunch; come back in three hours; only to find that 29 threads crashed after 10 mins, and three threads are still rolling back. MS types think that is "normal". We call it subnormal.

Your last experience with the SQL Server platform must be well before my first. Which, for the record, is about twenty years ago now.


> > For SQL Server, using scalar user-defined functions qualifies as a stupid thing.
>
>
>
> So don't use them!

I don't use them, you do. In a solution you present to the world as being "generic" and superior.


> No, *your* as in, _your_ fixation about low-level performance of _your_ code on the _only_ server you know. Qualified professionals have no such problems. Recall the context. You made a declaration about code performance, and you provided excruciating detail about the performance of one platform.

If you claim something to be better and generic, don't be surprised if someone challenges you on a specific implementation. If it sucks on one platform, it is not generic.


> Second, it is relevant to only one platform. (FYI, although not relevant to the thread, the two that I know well have no such problems, my code does not suck, it flies.)

So it is fast on two platforms. Maybe even faster than the Celko version. (You claim it is, based on your understanding of those platforms, but not based on actual performance comparisons).
And it's probably slower than the Celko version on one other platform (based on my understanding of that platform, since I haven't yet done my performance comparisons either).
Neither of us have sufficient knowledge of other platforms to make claims about them.

Based on the above, I'd call both methods valuable tools that have to be assessed on a case by case basis. I would not bash either method the way you bash Celko's method. Maybe your next job will take you to a platform that performs better with Celko's method. Always keep an open eye, never discard theories, however unlikely they seem, without proper evidence.


> Third, if God ever cursed me, the way so many others are cursed by HIM, and I had to implement Subtypes on MS SQL, I would spend some time and get to know what works well, and what does not, then I would implement Subtypes, using the best method, based on knowledge and actual testing. I would not be writing reams on c.d.t or SQL Central; I would not have to have long discussions; I would not be speculating and contemplating; I would just do it.

So that would probably be the extra indexes and extra columns.

> And whatever I do, it would be standard SQL (not pedantic or obsessed about syntax).
>
>
>
> And whatever I do, it would *NOT* have one redundant column plus one additional index per subtype, plus one additional index per supertype. No, I couldn't justify that to anyone, unless they came from an asylum.

So that would probably not be "the best method, based on knowledge and actual testing".

Hmmmm.


> > Great! Have you done a performance comparison test of your method versus "the Joe construction"?
>
(snip irrelevant rant)
>
> 2. Test
>
> No, I have not done a test on Joe's cancerous method on my platform. That would be very stupid. Given all the benchmarks that I have done, that include Functions, and indices, and duplicate/redundant indices, I already have a good idea of the logical cost. My memory has not failed. I do not take drugs. I do not need to test the same thing, from scratch, every time, to find out, gee whiz, it produces the same results.

The scientific method has five steps: phrase a question, formulate a hypothesis based on prior proven knowledge, do a prediction based on the hypothesis, test the prediction, analyse whether the test results support or disprove the hypothesis. You did the first three things, and now you claim that the reasoning for your hypothesis is so solid and logical that testing is not required.

There is a reason scientists work the way they do.


> > If we look at the logical side only, not caring about implementation details such as storage or performance, both solutions are good. They both achieve what they set out to achieve - ensure that there are no integrity violations in a subtype/supertype relationship.
>
>
>
> That's all ?
>
>
>
> You call that "logic" ?
>
>
>
> Must be a private definition.
>
>
>
> See my response [2] re Test.

Indexes are an implementation detail. Disk space used is an implementation detail. Response time is an implementation detail.
You focus only on the one implementation detail that supports your position, and turn a blind eye to the others. That definitely doesn't meet my definition of logic. I'll let others decide on theirs.


> Well, you could look at storage and performance logically, without getting into platform-specifics. But that does not appear to be in your private definition of "logical".

That would mostly appear in my private definition of "utter waste of time". Since the actual storage and performance impact of extra columns and extra constraints varies vastly by vendor, looking at those aspects without getting into platform-specifics would be moot.

(Note that I say "extra constraints" above, not "extra indexes". The code posted by Joe does not call for extra indexes. You suffer the same problem I suffered in some of my previous posts: looking at how *your* two platforms implement a feature and assuming that all others do so. The extra uniqueness constraints Joe adds are redundant, only needed to satisfy a requirement of the SQL syntax. Any vendor that wishes to do so could add in the code to detect redundant uniqueness constraints and avoid the unnecessary extra indexes).


> Last time I ran a very similar benchmark, for an Oracle guru, 2 x 16 million rows inserted took 13.2 secs, and that was on a tiny demo box (laptop), I did not have access to my dev box at the time. Because Oracle carks it with subqueries, the benchmark on that side never finished; the modified version that used derived tables took 30 something mins.

Thirty *MINUTES* ?????
Sheesh!
I guess that makes Oracle a second platform that doesn't like your solution.


> > 3. You then posted your, supposedly better, alternative; and
>
>
>
> No, no, no ! I didn't !
>
>
>
> It is only better, or less overhead, or standard, only to logical IT qualified people.
>
>
>
> Not for the rest. Your statement implies better overall, for everyone, and that is simply not true. I have evidence on c.d.t, of a couple of people who are absent logic, who have private definitions of "logic", that prove it is not "better" for everyone.

Thanks for overloading my sarcasm detector. Surely, any argument that is so dripping with sarcasm has to be true in spite of all other evidence.

Your version is "better, or less overhead" for everyone who considers counts of columns and indexes to be the most important metrics.
For the (apparently ill-informed) people like me, who think response times and disk space are more important, your solution might or might not be better on your two platforms (I accept your claims that they perform well, but since you never tested the alternative there's no way to know if they perform better), on Oracle (though the 30 minute figure you quote suggests that it sucks), or on other platforms.

Comparisons on SQL Server would be unfair. SQL Server can't use either declarative UDFs or subqueries in a check constraint, so your solution cannot be properly implemented at all. The closest one could do is to use a procedural UDF. With the issues I pointed out before.


> > The code I posted proves that there is at least one case on at least one platform where it doesn't.
>
>
>
> No, it doesn't. Read [1] and [2] again, slowly.

If by "It is not a full blown application", you meant "depending on your platform, you may have to add a check constraint on the supertype table and a function that references all the subtype tables to be used in that check constraint - then yes, you are right.

If you meant anything else, you are wrong. I can read [1] and [2] a million times, but if I then hit the Execute button on my SQL script again, I still get the same results. In at least one implementation (SQL Server), check constraints that use a UDF are only checked when rows are updated or inserted, and only for those rows. That means that in at least one implementation (SQL Server), you *need* that extra check constraint and extra function.

I may not be as smart as you (hey, you say that so often that it simply has to be true, right?), but I was at least smart enough to see this potential problem. Other people might take your words at face value, copy your code, make the minor modifications to make it run on their platform and implement it. Then, depending on their test matrix, they either have wasted some development time, or they may introduce inconsistent data in their database.

If you claim your code to be generic, then at least add a disclaimer that some platforms require an additional check constraint on the supertype table.


> I didn't slap you, I informed you that one had a transactional context (there's that word you do not understand again) and the other did not. And it is for you (assuming you can write the required code), to apply the transactional context when required, especially after it is pointed out to you.

With a transactional context, there can still be bugs in the code. If that were not the case, we would not need to define any constraints in the database at all - but we do, and for good reasons.

If we define a constraint and then still can execute code that introduces a constraint violation without raising an error, it's a bug. On at least one platform, your code has a bug, and fixing that bug requires the addition of an extra check constraint and an extra function (that includes as many subqueries as there are subtypes).
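
To give an idea of the size of that fix, here is a sketch of the kind of extra function and constraint I mean (hypothetical names; the real code needs one test per subtype table):

CREATE FUNCTION dbo.ValidateSupertype_fn (@ProductId INT, @ProductType CHAR(1))
RETURNS INT
AS
BEGIN
    -- reject a type indicator that contradicts an existing subtype row
    IF @ProductType <> 'B' AND EXISTS
        (SELECT * FROM dbo.ProductBook WHERE ProductId = @ProductId)
        RETURN 0;
    IF @ProductType <> 'C' AND EXISTS
        (SELECT * FROM dbo.ProductCD WHERE ProductId = @ProductId)
        RETURN 0;
    RETURN 1;
END
GO
ALTER TABLE dbo.Product ADD CONSTRAINT Product_Subtype_ck
    CHECK (dbo.ValidateSupertype_fn(ProductId, ProductType) = 1);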



> > > We can implement Declarative Constraints to ensure that a child cannot exist without the appropriate parent (CONSTRAINT Child.FK REFERENCES Parent.PK), but we cannot do so to ensure that at least one child exists for each parent. How do you implement that ? In Standard SQL, please, not worrying about syntactical differences.
>
> >
>
> > In standard SQL, this *is* in fact possible with declarative constraints. On the OrderItems table, add a constraint CHECK (EXISTS (SELECT * FROM Orders WHERE Orders.OrderID = OrderItems.OrderID)) [actual syntax may be slightly different; I never worked on a product that supports subqueries in a CHECK constraint so I don't know the exact syntax for implementing the correlation]. Then make this constraint deferred (unless you want a very complicated way of making the two tables read-only).
>
> >
>
> > In a real database, this only works if both subqueries in CHECK constraints and deferrable constraints (feature F271 in SQL:2003) are supported.
>
>
>
> That is a contradiction

Prove. Or retract.

But you might want to reread first. To help you understand, here is a short overview of my logic:
[1] You can ensure that a parent has at least one child in ANSI standard SQL, using (list of specific features).
[2] In an actual implementation, you can only use this method if that implementation supports (list of optional ANSI features used in [1]).
[3] Hence, an implementation that does not support those features cannot use the method suggested in [1].


> Real databases do not support deferred constraint checking.

By what definition of "real databases"?
Are you even aware that deferred constraint checking is ANSI standard?

> Real databases support standard Subtypes, with full Data & Referential Integrity (both directions, yes!) without deferred constraint checking.

Doesn't that make your whole function-based idea redundant?



> > > It is worth noting that the seriously deranged will implement an FK in the parent, that references the child, which results in a circular reference, and then insist that the platform must support "deferred constraint checking". These people have no sense of Rules, or Order of Rules. But those of us who don't have that form of insanity in the first place, don't have the demands in the second place. DB2 and Sybase do not have "deferred constraint checking"; we have enjoyed secured databases with full data and referential integrity for 35 years without it. Typically the platforms that provided "deferred constraint checking" did not have transactions, and were not SQL anyway. Some of them have implemented a weird form of non-ACID non-SQL "transactions" since then. Therefore as stated, SQL only, and no weirdness with non-SQLs that provide asylum facilities.
>
> >
>
> > A foreign key instead of a check constraint would even be better, indeed. So, does this make me "fully deranged"? And if so - why?
>
>
>
> Er, the foreign key that you suggest *IS* the circular reference.

Yes. And any other implementation of the same business rule would be the same circular reference, just less visible.

You dismiss deferred constraint checking without any argument (other than "real" databases don't do it), then say that the very problem they are designed to solve is unsolvable.

But this is getting off-topic, so I'll stop here and return to the main issue.


> > I am fully aware that your function-based framework does not force people to insert a row in the ProductBook table for a product of type 'B', or to insert a row in the ProductCD table for a product of type 'C'.
>
>
>
> Er, it does force exactly that !
>
>
>
> You say you can read and understand code, but the evidence is, you cannot! Maybe you can read and understand isolated code snippets. But you definitely cannot understand the effect, the overall total logic, of all the posted code snippets working together in an integrated "system".

My ability to read code is obviously way better than your ability to read English. Please look at what I actually wrote, not at what you want to believe I wrote.

"your function-based framework does not force people to insert a row in the ProductBook table for a product of type 'B'" - in case you still don't understand, let me illustrate this using an example:

INSERT Product VALUES (99, 'B', 'Gone with the Wind')

This statement inserts a product of type 'B'. It does not raise an error. In other words, it does not force me to insert a row in the ProductBook table.
So again, I am fully aware that your code allows this. And that Joe's code allows it too. No difference between the two methods here.


> I agree it is not allowed. But the simple evidenced fact is, you have allowed it. Further proof of my point immediately above.

I assume that by "you have allowed it" you mean "I [Derek] posted code and claimed it to be generic, you [Hugo] saw that it has an issue on at least one platform, you [Hugo] then posted a simple script to prove the issue, and then you [Hugo] outlined how to fix this issue and warned about the performance implications".

Also note that the code to fix this is far longer than the code you posted. It's not a simple, small change. If it were, I wouldn't be making such a fuss about it. This is too big to be dismissed as minor platform-dependent changes.


> Now please, we do not need to hear about what one idiotic, broken platform does or does not do with regard to performance.

You claim that your solution is generic and better. I disprove that statement by proving it is not better in at least one case. And now you try to refute that proof by calling that case "idiotic" and "broken".
That is logic I cannot refute. I bow to the sheer logic of these arguments.

P.S.: Don't forget to exclude Oracle as well, given the 30 minute duration you measured on your test. It only has the largest market share overall, and MS SQL Server has the largest market share on the Windows market, so who cares about those niche products, eh? It works on the two platforms you happen to have access to, so who cares about the rest. And you can still call it generic because it works on more than one platform.


> > 1. < portions irrelevant to a logical evaluation of the subject <snipped> >
>
> >
>
> > 2. < portions irrelevant to a logical evaluation of the subject <snipped> >
>
> >
>
> > Add 1 and 2 together, and ... < portions irrelevant to a logical evaluation of the subject <snipped> >

Earlier, you wrote that tests are not necessary because you can predict performance based on logical understanding of how constructs are implemented in a specific database. Now you label my explanation of how constructs are implemented in a specific database as irrelevant.

Please, make up your mind. Don't switch between positions all the time.


> > And the number of statements in the function is not related. See for example this blog post, where I demonstrate the performance difference with a UDF to triple the input value: http://sqlblog.com/blogs/hugo_kornelis/archive/2012/05/20/t-sql-user-defined-functions-the-good-the-bad-and-the-ugly-part-1.aspx
>
>
>
> Nah, it is not for me. The article takes far too long to explain simple things that logical people can understand in one sentence.

In my blog posts, I try to ensure that everyone can understand me. Smart people can glance over it, zoom in on the relevant bits and skip the rest. Normal people can read the rest as well. I write like that because I try to get my message across to as many people as possible.

If I had a deep desire to come across as way superior to everyone else, I would probably write differently.


Cheers,
Hugo

hugoko...@gmail.com

unread,
Feb 8, 2013, 8:13:12 PM2/8/13
to
On Friday, 8 February 2013 23:30:46 UTC+1, derek.a...@gmail.com wrote the following:

(snipped most)

> I generalised that [tutorial] code to work with most SQLs (eg. COALESCE() is SQL but Sybase and others had ISNULL() to do the same thing. The syntax and operation of ISNULL is intuitive; COALESCE is non-intuitive.). If we don't have a row at all, the SELECT returns NULL. The COALESCE converts the NULL to zero, in order to RETURN an INT.

1. You don't have to convert NULL to zero in order to return an int. Any data type in a relational database should support a NULL value that is distinct from any other value allowed by the data type, so you can return NULL as an int.
2. The SELECT actually returns an empty set, not a NULL. It is converted to NULL as part of the implicit cast required in order to RETURN an int value.
3. In order to get the COALESCE to do what you claim it does, you need to change the order: RETURN COALESCE((SELECT ...), 0). This will convert the empty set to NULL, then replace it with zero.

Proof:
SELECT dbo.ValidateExclusive_fn(1234, 'B')
returns NULL


If it returns zero on other database platforms, it is either a bug or a deviation from the ANSI standard.


I think the code in the function can be simplified to omit the COALESCE, and the CHECK constraints changed to CHECK (dbo.ValidateExclusive_fn (ProductId, 'B') IS NOT NULL)
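
A sketch of that simplification (same illustrative names as before):

CREATE FUNCTION dbo.ValidateExclusive_fn (@ProductId INT, @ProductType CHAR(1))
RETURNS INT
AS
BEGIN
    -- the empty set becomes NULL on the RETURN; no COALESCE needed
    RETURN
        (SELECT 1
            FROM dbo.Product
            WHERE ProductId = @ProductId
            AND ProductType = @ProductType);
END
GO
ALTER TABLE dbo.ProductBook ADD CONSTRAINT ProductBook_ck
    CHECK (dbo.ValidateExclusive_fn(ProductId, 'B') IS NOT NULL);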


Cheers,
Hugo

derek.a...@gmail.com

unread,
Feb 8, 2013, 9:18:56 PM2/8/13
to
On Saturday, 9 February 2013 12:13:12 UTC+11, hugoko...@gmail.com wrote:
>
> <snipped entire load of insanity>

Hugo

You have a really disgusting habit of trying to drag me into your insanity.



Everyone Else

Hugo evidently does not understand syntactical differences between platforms, and he is obsessed with syntax that does not work on his platform (in this post as well as in the entire set of his posts in this thread), and he is unable to resolve such minor issues. I trust other readers are not so crippled.

As stated, the original code works on Sybase, using ISNULL. As stated, I converted it to work on MySQL for someone who implemented my Subtype structure there. It does not have ISNULL, but it does have COALESCE. I converted ISNULL to COALESCE (which required syntactical changes that are not worthy of explanation, except for the insane). Sybase has COALESCE, so I tested the converted code. As stated, since COALESCE is in the standard, and ISNULL is not, I posted the COALESCE version, so that more people, rather than fewer, could use it. My MySQL colleague did not write back re any problem (syntactical or otherwise), he only confirmed that the whole structure was tested and coded and it worked.

There is no suggestion that the example provided will work on every SQL platform, as is, unchanged for syntactical differences between platforms. I will not be posting examples for each platform, because that will only be relevant to babies who cannot resolve minor issues, and because it will elevate minor syntax issues to the relevance of blog posts.

People like Hugo need to be told in excruciating detail, how to write simple code; I have not provided that level of direction. Most developers don't need it; they can read my code and make whatever minor syntactical changes are required without having to write a blog post about it, or wail about syntax on one platform not working on some other platform. The seriously deranged will pose that (their insane mental spinning) as if it were a fault in someone else's example code.

Cheers
Derek

Lennart Jonsson

unread,
Feb 8, 2013, 10:01:22 PM2/8/13
to
But is COALESCE() evaluated at all? We have two cases:

[1] SELECT COALESCE(1,0) from LATERAL(VALUES(1)) WHERE 1=0

[2] SELECT COALESCE(1,0) from LATERAL(VALUES(1)) WHERE 1=1

The result of [1] should be an empty set, and the result of [2] should
be {(1)}, so:

SELECT 1 from LATERAL(VALUES(1)) WHERE ...

looks equivalent. I assume that the empty set in [1] is transformed into
null in the equals predicate:

(SELECT 1 from LATERAL(VALUES(1)) WHERE 1=0) = 1

null = 1

null

If I'm not mistaken, a check constraint must not evaluate to FALSE. My
question was therefore whether the construction should have been:

COALESCE((SELECT 1 from LATERAL(VALUES(1)) WHERE ...), 0)


/Lennart





derek.a...@gmail.com

unread,
Feb 9, 2013, 1:48:03 AM2/9/13
to
On Saturday, 9 February 2013 14:01:22 UTC+11, Lennart Jonsson wrote:
>
> > I use strict Null handling, to avoid ambiguities and avoidable errors (if you have standards, there is no "Null Problem" in SQL).
>
> But is COALESCE() evaluated at all? We have two cases:
>
> <snipped>
>
> If I'm not mistaken, a check constraint must not evaluate to FALSE.

Yes. The detail you presented is good for novice coders; just that sentence is the relevant issue to be addressed. The intent of the function is to return zero for the empty set. Taking into consideration that the reader is responsible for making any changes that are required to make that function work on their platform, we could say that is the end of the story; the rest is coding detail issues.

(My example CHECK Constraint specifically requires 1, but I think you understand all that)

Thank you for giving my words attention.

> My question was therefore whether the construction should have been:
>
> COALESCE((SELECT 1 from LATERAL(VALUES(1)) WHERE ...), 0)

(Apologies, I did not get that from your previous post.)

I looked at the function code again with your comments in mind, and I realised that it may be "too clever", and therefore may need some explanation.

I am not going to divert and give explanations for each of them, but keep in mind that SQL requires that COALESCE is implemented as a Function; however, that does not limit the scope of any particular implementation. So for most SQLs, you still need a SELECT outside your code above:

SELECT COALESCE(
____(SELECT 1 from LATERAL(VALUES(1)) WHERE ...)
____, 0)


In Sybase we check for existence by:
SELECT 1
____FROM Product
____WHERE ProductId = @ProductId
____AND ProductType = @ProductType

The 1 is recommended over asterisk. The WHERE clause in this case grabs one row (or not), but in many cases of existence checking there is more than one row (thousands, whatever). Unlike some SQL platforms, in Sybase the 1 will cause the existence check to succeed on the first qualifying row, and the remaining [thousands of] rows are not scanned or processed by the back end.

So what we need is:
--[1]--
SELECT COALESCE( (
____SELECT 1
________FROM Product
________WHERE ProductId = @ProductId
________AND ProductType = @ProductType
________)
____, 0 )

I think you will be totally with me, so far, yes ?

Ok, now for the Schnitzel with red cabbage and wine.

If you know that COALESCE is a function that returns **the first** non-Null element from a list; that a function can be used wherever a column or value can be used (the value being 1 in this case); that if the existence check fails, the inner SELECT will return NULL instead of 1; **and** you have Sybase or MySQL, we can write the above as:
--[2]--
SELECT COALESCE( 1, 0 )
____FROM Product
____WHERE ProductId = @ProductId
____AND ProductType = @ProductType

The placement of COALESCE binds it to the SELECT. The SELECT returns 1 or NULL. The COALESCE does nothing if it is 1 (because I placed it first in the list), but it converts a Null to 0. Tested on Sybase ASE 15.0 and 15.5.

Since my MySQL colleague whom I posted that to, didn't mention anything, except that all the functions and DDL and transaction code worked as planned (implemented from my directions), I can only presume that either the above worked as is, or that if it did not, given that he understands the intent, the minor change required was **so** trivial that it was unworthy of mention. I know that it worked on MySQL but I don't know if it worked in that exact, literal, syntactical form.

I have no idea if DB2 can handle [2]; I would hazard a guess that z/OS would but LUW wouldn't. If it doesn't, revert to [1].

Of course this is a simple example; in large applications, the SELECTs are "more complex". I have noticed in my dealings with many MS SQL teams, that they always code SELECT [1] with (eg) 16 subqueries, where I code a single SELECT [2] with no subqueries. MS SQL (up to 2008, no idea what the latest MS abortion does) has very much the same Optimiser as Sybase 12 (superseded in 2007); it handles [2] beautifully. When I ask them why they did not know that, why they had not thought it out, applied logic and tried that, they always give the same answer: they read it in some blog on SQL Central or SQLTeam or the back of a bus, and they did not think for themselves. Great for herd animals being led around by the nose. Not only do blog posters have a lot to answer for when they die; but blog readers have a liability in their own enslavement.

(In case it needs to be said, that para is not aimed at you, Lennart, you are precise and thoughtful.)
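
A trivial illustration of that difference, using the tables from this thread (Title is a hypothetical column):

-- [1] a scalar subquery per derived column:
SELECT p.ProductId,
    (SELECT b.Title FROM ProductBook b WHERE b.ProductId = p.ProductId)
    FROM Product p

-- [2] the same result as one joined SELECT (ProductBook has at most
-- one row per ProductId, so the LEFT JOIN is equivalent):
SELECT p.ProductId, b.Title
    FROM Product p
    LEFT JOIN ProductBook b ON b.ProductId = p.ProductId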

Thanks again for pointing that out to me. I will think about adding explanatory notes to my posted function.

Cheers
Derek

hugoko...@gmail.com

unread,
Feb 9, 2013, 4:45:10 AM2/9/13
to
Derek:

Calling someone's arguments insane is a very convenient way to avoid having to address them. Most commonly used by people who have no other rebuttal.



Everyone else:

This topic presents two possible methods to implement a supertype/subtype hierarchy in a relational database. One based on extra UNIQUE and FOREIGN KEY constraints, the other one based on CHECK constraints that call a user-defined function.
Both methods have the same goal: preventing details in a subtype that don't correspond to the type indicator in the supertype table (e.g. providing book details for a product marked as CD). Both methods achieve that goal, provided you put in the effort to adapt the code to your specific platform. This effort can vary from minor code changes in most cases, to major additions in some cases.

Derek claims that his method is better. He bases this observation on counting indexes. He is so convinced of the logic of his arguments that actually testing the two alternatives would be a waste of time.

I claim that both methods have their merits. Which one is better depends on how specific platforms implement UNIQUE and FOREIGN KEY constraints, how specific platforms implement CHECK constraints that call a user-defined function, and (most important) on what metrics are relevant in your situation.

If you ever have to implement such a scenario, here are your options:

1. Pick the one you like best, based on whatever reason is important to you. Don't consider the alternative. This guarantees that you'll save some research effort, but not much else.

2. Set up a test environment and implement both. Test both (don't forget to include a test case where the type indicator in the supertype table changes while a row is present in the subtype table - that change should be rejected). Then define what performance characteristics are important in your situation and set up and run a test with a representative load to measure the relevant performance characteristics.
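
A sketch of that test case (hypothetical column lists):

INSERT INTO Product (ProductId, ProductType, Name) VALUES (1, 'B', 'Some book');
INSERT INTO ProductBook (ProductId, Title) VALUES (1, 'Some book');
-- this change should be rejected by whichever method is under test:
UPDATE Product SET ProductType = 'C' WHERE ProductId = 1;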


Derek also made a lot of explicit and implicit statements about my mental condition, my intelligence, my reading ability, my intentions for participating in this discussion, and so on. I trust that the readers in this group are smart enough not to fall for this fallacy.


Cheers,
Hugo

derek.a...@gmail.com

unread,
Feb 9, 2013, 5:14:47 AM2/9/13
to
On Saturday, 9 February 2013 17:48:03 UTC+11, derek.a...@gmail.com wrote:

The Russian Circus showcases a cunning array of stunts

The gypsies in Kolland expose a ...

Cheers
Derek

Lennart Jonsson

unread,
Feb 9, 2013, 1:36:01 PM2/9/13
to
On 2013-02-09 07:48, derek.a...@gmail.com wrote:
[...]
> So what we need is:
> --[1]--
> SELECT COALESCE( (
> ____SELECT 1
> ________FROM Product
> ________WHERE ProductId = @ProductId
> ________AND ProductType = @ProductType
> ________)
> ____, 0 )
>
> I think you will be totally with me, so far, yes ?
>

Yes, I think we are in agreement all in all. I'll just lay out a few
more thoughts below.


> Ok, now for the Schnitzel with red cabbage and wine.
>
> If you know that COALESCE is a function that returns **the first** non-Null element from a list;
> that a function can be used wherever a column or value can be used (the value being 1 in this case);
> that if the existence check fails, the inner SELECT will return NULL
> instead of 1; **and** you have Sybase or MySQL, we can write the above as:
> --[2]--
> SELECT COALESCE( 1, 0 )
> ____FROM Product
> ____WHERE ProductId = @ProductId
> ____AND ProductType = @ProductType
>

The following are my arguments for why I believe that COALESCE((select
...),0) should be the norm, and that select COALESCE(1, 0) is an
exception that happens to work in some products. Not terribly important,
we can agree to disagree :-)

I think that it is mandated that we select FROM something (at least
SQL2003). I tried to find where it is stated in SQL2003 and SQL2008, but
boy, those papers are not for the faint of heart. Cheating a bit and
checking with:

http://developer.mimer.se/validator/parser200x/index.tml#parser

indicates that this is the case:

select 1;
^
syntax error: ;
correction: . * FROM <identifier> ;

Furthermore we are allowed to declare a function like:

CREATE FUNCTION f (x INT, y CHAR(1))
RETURNS SMALLINT
LANGUAGE SQL
DETERMINISTIC
READS SQL DATA
RETURN COALESCE((select 1 from T),0);

Conforms to Core SQL-2003

But not:

CREATE FUNCTION f (x INT, y CHAR(1))
RETURNS SMALLINT
LANGUAGE SQL
DETERMINISTIC
READS SQL DATA
RETURN select COALESCE(1,0);
^----- ^
syntax error: RETURN select
correction: BEGIN select
syntax error: ;
correction: . * INTO <identifier> FROM <identifier> ;


> The placement of COALESCE binds it to the SELECT. The SELECT returns 1 or NULL.
> The COALESCE does nothing if it is 1 (because I placed it first in the list),
> but it converts a Null to 0. Tested on Sybase ASE 15.0 and 15.5.
>

My understanding is that a query is - logically - evaluated in the
following order:

1. FROM (JOIN)
2. WHERE (ON)
3. GROUP BY -- irrelevant here
4. HAVING -- irrelevant here
5. SELECT (Projection)
6. ORDER BY

The DBMS is of course free to apply another evaluation order, as long as
it preserves the semantics of the query.

Also COALESCE is a function from [a', a', ...,a'] -> a', for simplicity
I'll use [int, int] -> int

Given the query:

SELECT COALESCE(1, 0) FROM T WHERE 1=0

there is nothing that we can apply COALESCE on. We have to create
something from nothing in order to return 0.


Perhaps it is just a matter of definition, the standard is rather vague
on a number of corner cases (I have a vague memory of a discussion
concerning partitions of an empty set, it turned out that you could
implement an operation (grouping sets(?)) with several semantics without
violating any rules).

However, to me the construction COALESCE((select ...),0) is more obvious
than select COALESCE(1,0) ...

> Since my MySQL colleague whom I posted that to, didn't mention anything, except
> that all the functions and DDL and transaction code worked as planned (implemented from
> my directions), I can only presume that either the above worked as is, or that if it did
> not, given that he understands the intent, the minor change required was **so** trivial
> that it was unworthy of mention. I know that it worked on MySQL but I don't know if it
> worked in that exact, literal, syntactical form.
>

Some things I'm pretty sure that they had to change. MySQL does not
support CHECK constraints. A common trick I've seen people use to mimic
CHECK constraints is an in-memory table such as:

create table whatever ( msg varchar(255) not null primary key )
engine=MEMORY

and then insert and update triggers that force a primary key violation in
that table when the condition of the mimicked "CHECK" constraint is
violated. Now we are talking ugly ;-)
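
For completeness, a sketch of the trigger half of that trick (hypothetical names; seeding the message row once makes the trigger's insert a guaranteed duplicate-key error):

insert into whatever (msg) values ('CHECK ProductBook_Exclusive violated');

delimiter //
create trigger ProductBook_bi before insert on ProductBook
for each row
begin
    if (select count(*) from Product
        where ProductId = NEW.ProductId
        and ProductType = 'B') = 0 then
        -- the duplicate key on the MEMORY table aborts the statement
        insert into whatever (msg)
            values ('CHECK ProductBook_Exclusive violated');
    end if;
end//
delimiter ;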


Cheers
/Lennart

[...]

derek.a...@gmail.com

unread,
Feb 9, 2013, 7:34:51 PM2/9/13
to
Lennart

On Sunday, 10 February 2013 05:36:01 UTC+11, Lennart Jonsson wrote:
>
> Yes, I think we are in agreement all and all. I'll just lay of a few
> more thoughts below.

Sure.

> The following are my arguments for why I believe that COALESCE((select
> ...),0) should be the norm, and that select COALESCE(1, 0) is an
> exception that happens to work in some products. Not terribly important,
> we can agree to disagree :-)

Yes.

Agreeing to disagree is the closure of an unresolved argument, for gentlemen. There is no conflict here; no right or wrong to be concluded. Just a discussion, to further understanding. Consider the arguments that are set forth, and choose your position carefully and consciously, not as a result of propaganda or passive contamination.

> I think that it is mandated that we select FROM something (at least SQL2003).

There is the rub.

1. From personal interaction, I know that the SQL Committee is made up of capable, qualified people from vendors (I was a Sybase Partner from 1997 to 2009), as well as unqualified people, as well as lunatics who think themselves geniuses, as well as pure subversives (I have had years of written communications with some of them). Each has their own interests in mind, and each is in mental war with the others (notice the circus performers on this thread). Getting them to agree to *anything* is a nightmare. Often, they are merely formalising a feature that some vendor or two has implemented years (decades?) before. So the result is commonly contradictory.

They *mandate* insanity. Eg:
• mandate tight ACID Transactions *and also* mandate methods to break ACID, and to break transactions
• mandate the correct (logical, required for an implementation to work) handling of Nulls *and also* mandate the insane null handling (that does have a theoretical abstract basis, but is idiotic in any implementation)
• and due to their poor understanding of the Relational Model, mandate all kinds of nonsense that they allege to be mandated by the RM (which it is not!)

So the implementer who is sane, clear of mind, does not follow the SQL Standard as "mandated". The Order in the Universe is God, my mind, then the SQL Committee. I look at the mandate, have a good laugh, remember fondly the days when professionals were proud and not at war. I implement Null handling that is logical for an implementation (half the mandate, but not the second half, that contradicts the first half). I had transaction standards that were developed and perfected in the 1970's, by IBM, rewritten for Sybase, which I could not be asked to contravene by a bunch of circus performers. So I implement the ACID Transactions half, and (by Standard decree) reject the insane half of the mandate. So my implementations remain sane and non-contradictory.

Specific case. Sybase implements the insanity "Null Problem" as mandated, because it is a compliance requirement. But that is not the final word, for engineers who are human and who have not been separated from their brains. They provide us with a session-level option ANSI_NULL_BEHAVIOUR. The default setting is OFF. The behaviour when it is OFF is:
• NULL does not equal anything in the range of values for the subject Domain
• NULL equals NULL
• NULL does not equal NOT NULL
• the empty set returns NULL
thereby implementing sanity and predictability and allowing simplified Null handling in app code (if other standards are used, but let's not get distracted by tangents)
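
To illustrate (a sketch; in ASE the session option is spelled ansinull, and the semantics are as listed above):

set ansinull off  -- the default
SELECT * FROM Product WHERE Color = NULL
-- returns the rows whose Color IS NULL

set ansinull on   -- the "mandated" behaviour
SELECT * FROM Product WHERE Color = NULL
-- returns no rows: the predicate is UNKNOWN for every row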

Therefore we work in an environment that is sane, and coding is simplified. Most developers do not know that there is a "Null Problem". If they are one of the few that do know, they know it is a problem in the SQL Spec, it is not a problem in their platform, and they do not argue the theoretical existence of the "Null problem". The criminally insane, who demand that they should be able to write insane code, and produce unpredictable results, are free to set the option to ON, and enjoy coding insanity; they like bathing in excreta, as insane people do. Generally our hiring practices ensure that we do not hire such people.

Oracle, which does not genuinely comply with the SQL Standard in more important areas, complies with the standard in this area, and provides no relief for sane people. So in that environment, the "Null problem" is very real, and fixed in their platform; they have to concern themselves with it in every code segment. The product is excreta, but the marketing is excellent.

Another context. Say a bank hires me to re-write their existing system, which "works" in the application sense, but that is broken, in the areas of contention; concurrency; difficulty to isolate and fix code problems; etc. This has happened many times; the assignment includes elevating the mindset of their development team, in order to ensure that they do not repeat the first insanity. Half way through the project, a senior manager insists that I implement a "transaction" that updates millions of rows because it is a "business requirement". I do not get into arguments with the insane. I write up the requirement, with the specific notice that it contradicts the original contract signed by him, and (as required by the original contract), ask for his and the auditor's signatures, identifying that the contradiction is in his domain, not mine, and that I will not let him contaminate his contract or my project with it. I explain only that without contravening *his* contract, he can have the "business requirement" that he wants, but he can't have it his *way*. They never sign the change to the contract.

Say the priest tells me that this month, my work detail is to feed the homeless at the church kitchen: I will jump to perform the task, and make sure that the women do not carry heavy loads, etc. Say he comes to me and tells me that I should send my niece to him for private tuition on Saturday afternoons: I will look him directly in the eye, and tell him that if he ever says that to anyone in my congregation, I will kill him, very slowly, and feed his organs to the birds.

My righteousness does not come from the SQL Committee, or the purchaser of my services, or the priest. It comes from the Order in the Universe:
• God first
• Society (the real physical one, not the virtual one) second
••• family, as part of that
• pre-existing standards (produced by greater minds than mine, before I was born) third
• and the clarity of an uncorrupted mind, to evaluate representations, and to choose carefully, fourth.
Since my memory is intact, and I remember history, I am well aware of the consequences of each such representation, and being older than a toddler, I know I am responsible for the consequences of whatever I choose. If the SQL Committee or the senior manager or the priest tell me something that fits into that domain, I accept it as *fifth*; if the request or mandate contradicts that domain, I reject it, because it violates the Order of nature, God, logic.

Therefore, my implementations have never had the "Null Problem" or insane "transactions", etc.

I keep track of the progress of the SQL Committee, but I do not accept that the mandate is a mandate for app implementers, as presented by the saboteurs. The vendors mostly do the same, but there the "mandate" is a mandate, so they are required to implement the insanity and contradictions, depending on how separated they are from their brains (separation being what allows one to implement insanity and contradictions). But in many cases, their feature is already written, they do not re-code to comply with the mandate, they declare partial compliance (compliance by intent but not by word), and their customers are quite happy.

My vendors have implemented the "Null Problem" because it is a vendor mandate, but in a manner that does not cause insanity in the platform. The reasonably capable and intelligent (say IQ 90) write code that does not violate logic. The unqualified and the not so intelligent, and the God-less, do not understand the "Null Problem", and therefore implement insanity. Especially if they read blog posts, or they have never read history.

The mandate is really aimed at the vendors, not application implementers. There are many books and blog posts that advocate the mandate *for us*. There are many books and blog posts that advocate portability. It is a joke, they are written by subversives and saboteurs. If you really understand databases and systems; the nature of the "mandate"; the relevance of "portability", they are a sad joke.

The insane database and application implementers (Hugo for instance, but I doubt he has ever written a system for payment) view the SQL mandate *for vendors* as a mandate for themselves, because it validates their insanity. They wave it around like a flag of honour and obsess about how to write more blog posts to influence others to write contradictions and insanity into their systems. And they laugh at those of us who listen and obey. And they scream at those of us who reject insanity, and furiously wave the flag that means nothing to us, because it constitutes the reason for existence to them.

Something that sane people should keep in mind when dealing with insane people. They are pretending to be sane; their "reality" is fragile. They seek recognition and most of all, validation, from sane people, because (a) they secretly know they are insane, children of the devil, and (b) like vampires, they need validation from us, to constantly support their fragile "reality". And if they do not get it, it is a matter of life and death, the death of their reality. So they vociferously present insanity as "logic"; they wave every flag that one of their fellow devil-worshipping saboteurs wrote as a "mandate". Do not give them any recognition or validation, let them shrivel up and die without your blood.

Nietzsche is famous, especially if you look up the entry in the bible of madness. The problem with his writings is, they are referenced most during the criminal trials of serial killers; used as a defence argument; that everyone has the right to "experience" the darkest parts of themselves, in order to "feel whole". Anyone who is close to society will realise that that is dangerous to society. Anyone who reads something deeper than wiki will find out that he spent most of his life locked up in an institution for the insane, that he comes from a species that is inbred for thousands of years, that practices devil-worship; cannibalism; paedophilia; bestiality; etc. Anyone who is close to God will realise that he is a child of the devil, against God's Word, against Nature. But those forces that subvert and destroy human society market this insane person's insane cravings as "self actualisation", and "philosophy". So more and more normal people are infected with insanity, under the guise of something that it is not.

You are neither insane, nor incompetent, but you have not recognised the constant propaganda of sabotage for what it is. You do not have an Order or Hierarchy in your life (eg. God, society, yourself, and last, the insane) and therefore you do not have protection from the forces of evil. And you have taken the propaganda of the saboteurs and subversives to heart.

I am asking you to get closer to God; closer to your race, religion, culture, language, history; closer to those who came before you, who are proud of their work, and did not have to explain it; closer to honour. See all these issues (above) for what they are; seek clarity of mind that comes from authority (the [d]evil will tell us that everyone is equal; we are the centre of the universe; we can think for ourselves in isolation. That is a guaranteed path to mental illness and separation from human society). Do not do anything, or accept anything, that violates that Order. The insane will demand that we think in an isolated, fragmentary way; I ask you to reject anything that is not integrated with the whole. Then we can enjoy integrity of mind.

That's the context all of us work in. Back to the specific context of the question.

No. SELECT does not require a FROM. The mandate of the RM is higher than the vendor-mandate of the SQL Committee. SQL is data sublanguage defined in the RM; it is the servant of the RM, not the other way around.

SELECT "My Report Header"
SELECT <column_list>
____FROM <joined_tables>
SELECT <total_column_list>
____FROM <joined_tables>
SELECT variables = <total_column_list>
____FROM <joined_tables_2>
SELECT "Totals"
____FROM variables

The RM defines the source as Domains, not tables or entities.

SELECT 2 + 3
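
For what it is worth, the platforms themselves disagree on this very point. A sketch (behaviours I am confident of; check the manual for yours):

SELECT 2 + 3                         -- Sybase/MS, PostgreSQL, MySQL: one row, one column, value 5
SELECT 2 + 3 FROM DUAL               -- Oracle demands a FROM; DUAL is its one-row dummy table
SELECT 2 + 3 FROM SYSIBM.SYSDUMMY1   -- DB2 likewise, via its one-row dummy table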

> I tried to find where it is stated in SQL2003 and SQL2008, but boy, those papers are not for the faint of heart.

Yes. And the 2012 Spec is twice the size. Good for unemployed lawyers and the criminally insane. Required reading for the vendors. Irrelevant to application and database implementers (there are good articles that inform us, using one thousandth the words; these are relevant).
Well, that is a brain dead parser, it proves nothing, except that that parser parses.

It is not physically possible to parse the total requirement of SQL (any version of the standard, eg. SQL 2008), because it is self-contradictory. If a parser is written, that allows all that, then it is not a parser, it is a syntax-checker (fraudulently promoting itself as a "parser"). Syntax is irrelevant, every vendor has a different syntax for whatever feature is required for compliance.

What you are trying to prove is not relevant to me, or to the explanation I provided in my previous post.

The direction I give you is:
• read and understand my example code
••• anything that you do not understand there, is worthy of question or discussion
• figure out how to implement that on your platform (without any regard to the SQL Standard because it does not apply to you, and without too much regard to "portability", both of which are heavily marketed and propagated by those who sabotage any human endeavour)
••• anything that you do not understand there is not worthy of question or discussion, because you clearly have all the resources to figure out (a) the issue and (b) the resolution, for yourself
• educate yourself re genuine standards (eg. Transactions, the elimination of the "Null Problem")
• forget about the SQL Standard because it applies to vendors, not to implementers
• learn to recognise deceit and sabotage, and reject them

> Perhaps it is just a matter of definition, the standard is rather vague
> on a number of corner cases

No kidding!

> (I have a vague memory of a discussion
> concerning partitions of an empty set

The proposition is absurd, do not give it further thought or energy.

> it turned out that you could
> implement an operation (grouping sets(?)) with several semantics without
> violating any rules).

You are giving it your blood, that it feeds on, in order to exist as something relevant to implementers. I don't.

> However, to me the construction COALESCE((select ...),0) is more obvious
> than select COALESCE(1,0) ...

Definitely.

Then code that, always, and forget about code that works on other platforms.
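
Spelled out, with the names used in the examples in this post (a sketch, not a prescription):

SELECT COALESCE(
____(SELECT 1                        -- scalar subquery: one row, or the empty set
________FROM Product
________WHERE ProductId = @ProductId
________AND ProductType = @ProductType
____), 0 )                           -- the empty set arrives as NULL; COALESCE renders it 0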

> Some things I'm pretty sure that they had to change. MySQL does not
> support CHECK constraints. A common trick I've seen people use to mimic
> CHECK constraints is an in-memory table such as:

No. MySQL (which is not SQL anyway, and we really should not be discussing this, because it gives relevance to some minor coding issues) has two "engines". The other one (InnoDB) supports more of SQL, such as the constraints required. MySQL is a broken toy, the other engine is a broken but serious toy, but we can make it work, if we stick to Standards.
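
The constraint in question, sketched for that engine (table and column names are hypothetical): the legal values go in a reference table, and the domain is enforced with a declarative FOREIGN KEY, which the serious engine does honour.

CREATE TABLE WidgetStatus
(status CHAR(1) NOT NULL PRIMARY KEY
) ENGINE = InnoDB;

INSERT INTO WidgetStatus (status) VALUES ('A'), ('I');

CREATE TABLE Widget
(widget_id INT NOT NULL PRIMARY KEY,
status CHAR(1) NOT NULL,             -- stands in for CHECK( status IN ('A', 'I') )
FOREIGN KEY (status) REFERENCES WidgetStatus (status)
) ENGINE = InnoDB;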

No triggers were used anywhere in the application. He did use them initially, because that was the best he knew, and his transaction context was poor. I gave him my transaction context and rewrote all the non-transactions for one table cluster, as functions (in order to obtain transactions in the stupid thing, because functions are transactional and everything else is not!). Following which he rewrote all his code as functions, and eliminated the triggers.
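
The shape of that rewrite, roughly (a sketch, assuming MySQL/InnoDB, all names hypothetical; he used functions, I show a procedure here only because a procedure may contain explicit transaction control):

DELIMITER //
CREATE PROCEDURE Part_Add (IN p_part_code CHAR(8), IN p_name VARCHAR(30))
BEGIN
    DECLARE EXIT HANDLER FOR SQLEXCEPTION
    BEGIN
        ROLLBACK;                    -- any failure undoes the whole logical unit of work
        RESIGNAL;
    END;
    START TRANSACTION;
    INSERT INTO Part (part_code, name) VALUES (p_part_code, p_name);
    -- ... the remaining statements of the same logical unit ...
    COMMIT;
END //
DELIMITER ;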

The almost-final version of the Data Model is here:
http://www.softwaregems.com.au/Documents/Student%20Resolutions/Mark%20Simonetti/Mark%20DM%20V0_11.pdf
After that, I provided direction for about two months offline, which included about 12 progressions (minor-minor releases of the DM), during which time he completed all the code.

As you see, there are not only subtypes, but subtypes of subtypes. All defined via declarative constraints; all protected by transactions (functions in his case). He did spin his wheels a bit due to circular references (an insanity that his platform supports, along with the deferred constraint checking that is required to sort-of resolve the insanity), but after I explained the insanity of circular references, the wheel-spinning ended. His mind was clear, but contaminated by the propaganda (like yours); I removed the contamination. He figured out almost everything himself after that.

Generally, we did not discuss specific syntax; there were only two occasions where I researched the MySQL documentation and discussed it with him.

The whole database and app was written with portability in mind, but we did not obsess about specifics, for reasons explained above.

> Now we are talking ugly ;-)

Yes. Don't do that. Figure out a method that does not cause nausea or blindness.

Please understand one thing. At no point did I state that my code is better, or that others should code as I do. Therefore I do not need to defend it. I was merely explaining what my code does, to someone who had questioned it.

By way of explanation re the core question (not the minor implementation details that the contamination has programmed you to obsess about), I offer the following. I'll give you another perspective. Paraphrase, not code:

Let me assume you understand Codd's concept of Domain. I want the intersection of the Domain of EXISTS{ TRUE | FALSE } and the Domain of NOT_NULL(). COALESCE gives me NOT_NULL(). The SELECT gives me EXISTS{ TRUE | FALSE }. It does not matter if I code
____EXISTS{ TRUE | NOT_NULL() }
or
____NOT_NULL( EXISTS{ TRUE | FALSE } )

I will get the same result. The Optimiser can figure it out, and it does not matter which Order it Chooses.

What would you say, if I coded:
--[3]--
SELECT 1 * COALESCE( 1, 0 )
____FROM Product
____WHERE ProductId = @ProductId
____AND ProductType = @ProductType

COALESCE means the English word coalesce. "select the first non-Null value from a list" is a good definition for a Reference Manual; it cannot comment on the context of its use. Here it means: collect all those things together and give me a 1 or a 0.

[3] is a Normalised version of [1]. [2] is a Normalised version of [3].

Thank you for your attention.

Cheers
Derek

derek.a...@gmail.com

unread,
Feb 9, 2013, 7:43:30 PM2/9/13
to
On Saturday, 9 February 2013 20:45:10 UTC+11, hugoko...@gmail.com wrote:

I cannot add to, or subtract from, God's curse against your behaviour

http://www.kingjamesbibleonline.org/Isaiah-5-20/

James K. Lowden

unread,
Feb 10, 2013, 7:38:51 PM2/10/13
to
On Sat, 09 Feb 2013 19:36:01 +0100
Lennart Jonsson <erik.lenna...@gmail.com> wrote:

> My understanding is that a query is - logically - evaluated in the
> following order:
>
> 1. FROM (JOIN)
> 2. WHERE (ON)
> 3. GROUP BY -- irrelevant here
> 4. HAVING -- irrelevant here
> 5. SELECT (Projection)
> 6. ORDER BY

Except #6. ORDER BY, strictly speaking, returns a cursor, not a
virtual table. Logically, it's a post-processor for the SELECT
statement. That's why it can't be included in subqueries.

> Given the query:
>
> SELECT COALESCE(1, 0) FROM T WHERE 1=0
>
> there is nothing that we can apply COALESCE on. We have to create
> something from nothing in order to return 0.

Absolutely so. But YMMV. You know what they say,

"In theory there's no difference between theory and practice.
In practice, there is."
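
And in practice, concretely (T being whatever table is at hand; the second form assumes the platform accepts a FROM-less SELECT):

select COALESCE(1, 0) from T where 1 = 0
-- zero rows: there is nothing to project

select COALESCE((select 1 from T where 1 = 0), 0)
-- one row: the empty scalar subquery yields NULL, which COALESCE turns into 0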

--jkl

James K. Lowden

unread,
Feb 10, 2013, 7:38:54 PM2/10/13
to
On Sat, 9 Feb 2013 16:34:51 -0800 (PST)
derek.a...@gmail.com wrote:

> ? NULL equals NULL

Well, for the equality operator, Kleene's 3VL says

V1 V2  V1 = V2
T  T   T
T  F   F
T  U   U
F  T   F
F  F   T
F  U   U
U  T   U
U  F   U
U  U   U <== you're saying U U T

Could you please tell me which 3VL supports your contention that NULL =
NULL?

> No. SELECT does not require a FROM. The mandate of the RM is higher
> than the vendor-mandate of the SQL Committee.

Lennart said, reasonably, that SELECT is the SQL version of RM
RESTRICT. If there's no FROM, there's logically nothing to restrict,
ergo "select 1" should return no row (not 1 and not NULL).

You could argue that

select 1

is short for an implicit row constructor, e.g.,

select 1 from ( VALUES( 'foo' ) ) as T

but you didn't. I might, but then I'm not sure why SQL requires SELECT
when there's no projection required. IMO

VALUES(1)

would serve as well or better.
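
Some products accept exactly that, as far as I know:

VALUES (1)                            -- standalone table value constructor: DB2, PostgreSQL
select * from ( VALUES (1) ) as T(c)  -- the same constructor wrapped, where a SELECT is demanded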

> SQL is data sublanguage defined in the RM; it is the servant of the
> RM, not the other way around.

If only. AFAIK the word "relational" doesn't appear in the SQL
standard unless to explicitly deny fealty to it.

Lennart Jonsson

unread,
Feb 11, 2013, 2:15:02 AM2/11/13
to
On 02/11/2013 01:38 AM, James K. Lowden wrote:
> On Sat, 09 Feb 2013 19:36:01 +0100
> Lennart Jonsson <erik.lenna...@gmail.com> wrote:
>
>> My understanding is that a query is - logically - evaluated in the
>> following order:
>>
>> 1. FROM (JOIN)
>> 2. WHERE (ON)
>> 3. GROUP BY -- irrelevant here
>> 4. HAVING -- irrelevant here
>> 5. SELECT (Projection)
>> 6. ORDER BY
>
> Except #6. ORDER BY, strictly speaking, returns a cursor, not a
> virtual table. Logically, it's a post-processor for the SELECT
> statement.

Agreed


> That's why it can't be included in subqueries.
>

Wasn't this introduced in SQL-2011 (perhaps even SQL-2008)? Looking at
the draft 6IWD6-02-Foundation-2011-01.pdf (can be found at
http://www.wiscorp.com/SQLStandards.html link
http://www.wiscorp.com/sql20nn.zip)

7.13 <query expression>:
[...]
<query expression> ::=
    [ <with clause> ] <query expression body>
    [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ]
[...]
<query expression body> ::=
    <query term>
  | <query expression body> UNION [ ALL | DISTINCT ]
        [ <corresponding spec> ] <query term>
  | <query expression body> EXCEPT [ ALL | DISTINCT ]
        [ <corresponding spec> ] <query term>

<query term> ::=
    <query primary>
  | <query term> INTERSECT [ ALL | DISTINCT ]
        [ <corresponding spec> ] <query primary>

<query primary> ::=
    <simple table>
  | <left paren> <query expression body>
        [ <order by clause> ] [ <result offset clause> ] [ <fetch first clause> ] <right paren>

and the conformance rule (page 420):

12) Without Feature F851, "<order by clause> in subqueries", in conforming SQL language, a <query expression> contained in another <query expression> shall not immediately contain an <order by clause>.

it is my impression that you actually can ORDER BY in a subquery. From a logical point of view it doesn't make much sense, but I suspect that the construction relates to "fetch first ...".

FWIW, I find it rather difficult to interpret the document, so it might be a misunderstanding on my behalf.


Cheers
/Lennart

[...]



Derek Asirvadem

unread,
Feb 11, 2013, 3:05:46 AM2/11/13
to
On Monday, 11 February 2013 11:38:51 UTC+11, James K. Lowden wrote:
>
> Except #6. ORDER BY, strictly speaking, returns a cursor, not a
> virtual table.

Let me assure you that my platform Sybase does NOT use a cursor; it does not use a virtual table or a derived table or an inline table either. It uses an internal worktable (two columns: the ORDER BY key and a pointer to the result set). Same with MS SQL until 2008. Same with DB2 (not LUW) until four years ago.

In 2013, I would say using a cursor for such an operation is a very primitive thing to do.

> Logically, it's a post-processor for the SELECT
> statement. That's why it can't be included in subqueries.

Ok, so scalar subqueries (single row, single column) are excluded in this discussion, otherwise your point would be moot.

Isn't ORDER BY only relevant to the *outermost* query ? When would ORDER BY be relevant in a subquery (and therefore the prohibition of it be relevant) ?

> You know what they say,
> "In theory there's no difference between theory and practice.
> In practice, there is."

Yeah, sure.

The issue, I think, is difference between implementations (platforms), which is being discussed at length. How does the [cerebral or temporal] difference between theory and practice apply to that ?

Cheers
Derek

Derek Asirvadem

unread,
Feb 11, 2013, 5:51:46 AM2/11/13
to

On Monday, 11 February 2013 11:38:54 UTC+11, James K. Lowden wrote:
> On Sat, 9 Feb 2013 16:34:51 -0800 (PST) derek.a...@gmail.com wrote:
>
> Could you please tell me which 3VL supports your contention that NULL = NULL?

I stated:
"
Specific case. Sybase implements the insanity "Null Problem" as mandated, because it is a compliance requirement. But that is not the final word, for engineers who are human and who have not been separated from their brains. They provide us with a session-level option ANSI_NULL_BEHAVIOUR. The default setting is OFF. The behaviour when it is OFF is:
• NULL does not equal anything in the range of values for the subject Domain
• NULL equals NULL
• NULL does not equal NOT NULL
• the empty set returns NULL
thereby implementing sanity and predictability and allowing simplified Null handling in app code (if other standards are used, but let's not get distracted by tangents)

Therefore we work in an environment that is sane, and coding is simplified. Most developers do not know that there is a "Null Problem". If they are one of the few that do know, they know it is a problem in the SQL Spec, it is not a problem in their platform, and they do not argue the theoretical existence of the "Null problem". The criminally insane, who demand that they should be able to write insane code, and produce unpredictable results, are free to set the option to ON, and enjoy coding insanity, they like bathing in excreta, as insane people do. Generally our hiring practices ensure that we do not hire such people.
"

Where, in that, does the implicit or explicit statement lie, that 3VL or "a" 3VL, contends that NULL = NULL ?

I am not sure, but it is possible that you skimmed my post, and read or inferred something that is not there. Let me save another back-and-forth, and dismiss waiting for your answer, if I may.

If I implied in any way that Sybase ANSI_NULL_BEHAVIOUR = OFF is an implementation of 3VL, I did not mean to do that; my intention was to convey the opposite: relating how logical, practical implementers need a logical, predictable, non-theoretical, practical platform. To be clear (a short sketch follows the list):
• Sybase ANSI_NULL_BEHAVIOUR = OFF (where NULL = NULL) is 2VL.
• The Null handling required by the SQL Spec is 3VL (great for theory; insane in practice or implementation, kind of guaranteed sabotage by the communist or committee members)
• Sybase ANSI_NULL_BEHAVIOUR = ON (where NULL != NULL) is 3VL, the SQL Spec implemented.
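
A sketch of that difference in code (the session option is spelled ansinull in the set command; treat this as an illustration of the behaviour described above, and verify against the manual):

set ansinull off
select case when null = null then 'equal' else 'not equal' end
-- 'equal' : the comparison evaluates TRUE; two truth values in play

set ansinull on
select case when null = null then 'equal' else 'not equal' end
-- 'not equal' : the comparison evaluates UNKNOWN, the SQL Spec's 3VL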


> ergo "select 1" should return no row (not 1 and not NULL).

I disagree. A result set is demanded. Here it is one row, one column.

> IMO
>
> VALUES(1)
>
> would serve as well or better.

Ok.

> > SQL is [the] data sublanguage defined in the RM; it is the servant of the
> > RM, not the other way around.
>
> If only.

Hah! I love it!

You young people think that history began when you were born, or when the project that you are working on started, or that the Iraq war started on 11 Sep 2001. That is not intended as an insult, it is intended to make you think, and perhaps to do a bit of reading of history (not the propaganda written by the victors; not the propaganda written about the RM by the detractors and subversives) before forming conclusions, and certainly before making statements on a public newsgroup where people may be a bit knowledgeable about the subject.

The short answer is, the RM was published in 1970, SQL as a standard was published 1989. It is not possible to contemplate the reverse.

The slightly longer answer or quick Synopsis:
• The RM was entirely invented by Dr E F Codd, published in Jun 1970.
(many want to take some credit, and ride along on his coattails; and after he passed away, many suggest there are "faults" and "incompletions" in it. Disgusting creeps, I say.)
____In it, he defined a Data Sublanguage, which was ground-breaking and industry-changing. Up to then, we had excellent pre-Relational DBMSs, but they were all what is now known as proprietary, and the calls to the db were embedded in the 3G language constructs; they did not have the formal separation of process-language vs data-sublanguage that Codd devised.

• Codd worked at IBM, and they had a huge investment in IMS, etc, and not changing, so things did not progress as fast as they should have

• Circa 1975 to 1978. IBM first implemented the RM as System/R, and Codd worked closely with that team. It was successful, but not product-ready, a first good cut. That was the child of the RM.
____Various vendors implemented parts of the RM, myself included, at Cincom/TOTAL. We were competing for the same market.

• Circa 1978 to 1982. IBM eventually realised the value of the RM and commissioned their Structured Query Language; Codd worked less closely with them, and produced most of his related papers during this time. That was the grandchild of the RM.
____Various vendors rushed to provide competing products. Codd wrote his Twelve Rules, to stop Vendors from cheating and misrepresenting their offering as "relational" or a "data sublanguage".
____Digital Equipment Corp produced their Rdb, which was a beautiful rendition of the RM/data sublanguage, on their VAX/VMS systems; but it was not as complete as SQL. There were many others, that were not memorable.
____Britton-Lee produced their Database Machine, based completely on the RM, but with high performance OLTP and OLAP in mind, which was hugely successful, but in a niche market.
____Codd left IBM, and made the biggest mistake of his life: he formed a consultancy with a criminal. That engaged him in such a way that nothing he did after that succeeded to any level of substance, and the criminal turned coattail riding into a Hollywood movie, complete with sequels and sequins and syphilis.

• Circa 1982 to 1984. IBM/SQL was very successful. The market forces pushed IBM to provide SQL to the public, so that it could become a standard, and so that other vendors did not have to lag behind it, or commit commercial espionage to provide it; the basis being that the RM was public domain, and SQL derived directly from it. IBM finally did so, after more than ten years of pushing, circa 1986. IEC/ISO and later ANSI published it as a standard in 1989.
____ Vendors started their SQLs before the standard was published.
____ 1984. The founders of Britton-Lee founded Sybase, and produced a genuine high-performance RM implementation; the first one with a Client/Server architecture; the beginning of the mid-range database assault on the mainframe database market.

• IBM changed the name of their offering to DB2, to differentiate it.

(I may have missed a couple of milestones. The dates may have mistakes, in any case they are irrelevant in today's context. Do not trust wiki.)

> If only.

Sheesh.

Therefore the SQL Standard as of first publication is the grandchild of the RM. And now, it is one that has been adopted out to 20 foster homes per year, where the carers are convicted paedophiles.

> AFAIK the word "relational" doesn't appear in the SQL
> standard unless to explicitly deny fealty to it.

So what ? It is a standard. On what basis should it be expected to identify its origins (any of the above milestones), prior to its inception as a standard ? Standards generally do not include history; the books or articles do that (or the memories of old soldiers like me), and the books are sadly wanting.

Now the RM is 43 years old. The understanding of the RM amongst some vendors has progressed greatly. The RM has:

a. been "completed".
I have double quotes around that because neither I nor the high-end vendors view it as incomplete; but some non-technical types and flower arrangers did make that charge in books that were slightly related to the RM; that "incompletion" has been closed.

b. been Extended
Since we have had the RM, there are some natural progressions that took place in the minds of genuine RM implementers; that were pressed on their vendors; which the vendors eventually implemented in their platforms; and which many years later (decades, sometimes) the communists added to the SQL Standard. Such Extensions are natural, and cannot be credited to the vendors (who don't try to) or the commies (who do try to). They are items which only exist within the context of the RM, and are meaningless without it. A few vendors are ten years ahead of the rest. The non-SQLs are 20 years behind the rest.

Neither of those points constitute a denial of the statement that SQL is the servant, the grandchild, of the RM.

With regard to the RM (the grandfather), the SQL Spec is on the cusp of puberty (not toddler; not teenager; and certainly nowhere near adult). It provides only a fraction of the RM, and it provides that fraction in a primitive form. Setting aside the missing fraction, after 27 years, that specified fraction should have progressed, so that SQL is not the cumbersome, plodding, data sublanguage that it is today. If not for the commissariat, it would have.

If you have only read the books that purport to be about the RM, which contain few RM items (crime of omission), and are full of their pet non-RM items (separate crime of commission & fraud), then sure, you and many thousands have been misled. Those cretins misrepresent the RM, and suggest that it has "faults", and "missing bits" (third crime). The only thing that has faults and missing bits is their souls, and that condition does not upset them, because they made a Faustian compact.

I strongly suggest you throw out any book that purports to be about the RM, that is not written by Dr E F Codd, and read only the RM and related papers. For one year. To clear your mind of all the non-RM rubbish that has been fed to you, and cited by various other non-RM authors. The RM is freely available (Codd's other papers are recommended, but they require a small fee).

Of course, if you have only read the books by these odious false prophets, who reject the messiah, sure, you can be excused for thinking as you did.

There is one particularly odious idiot who proposes an anachronism: remove the data sublanguage and replace it with an, um, full language. Yeah, right. Twenty years have passed, and not one spec has been produced (a florist's description of flower arrangement, yes, a florist's description of a toy language à la spacecraft to the next galaxy, yes, but a technical spec, no).

Cheers
Derek

James K. Lowden

unread,
Feb 11, 2013, 10:19:34 AM2/11/13
to
On Mon, 11 Feb 2013 02:51:46 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

>
> On Monday, 11 February 2013 11:38:54 UTC+11, James K. Lowden wrote:
> > On Sat, 9 Feb 2013 16:34:51 -0800 (PST) derek.a...@gmail.com wrote:
> >
> > Could you please tell me which 3VL supports your contention that
> > NULL = NULL?

> Where, in [quote omitted], does the implicit or explicit statement
> lie, that 3VL or "a" 3VL, contends that NULL = NULL ?
...
> To be clear:
> ? Sybase ANSI_NULL_BEHAVIOUR = OFF (where NULL = NULL)
> is 2VL.

Um, no. A logic with 3 values can't be a 2VL.

This is c.d.t.; we're supposed to be able to refer here to the theory
of databases. I showed you a truth table and asked which 3VL you were
referring to. If you're going to contend that 2 values are 3, I'm
going to stand well clear.

> > > SQL is [the] data sublanguage defined in the RM; it is the
> > > servant of the RM, not the other way around.
> >
> > If only.
>
> Hah! I love it!
>
> You young people think that history began when you were born,

I was born ... when?

> > AFAIK the word "relational" doesn't appear in the SQL
> > standard unless to explicitly deny fealty to it.
>
> So what ? It is a standard.

Yes, it's a standard defined explicitly in terms of itself, not in
terms of RM. Your claim that, "SQL is [the] data sublanguage defined
in the RM" is thus denied.

> Standards generally do not include history; the books or articles do
> that (or the memories of old soldiers like me), and the books are
> sadly wanting.

Standards generally do not rest on much of any theory, let alone
something as robust as RM. It would have been nice if a standard query
language had evolved that implemented relational operations per se.
Instead we're living with a 1970s-era 4GL, a pushme-pullyou brute. IBM
imagined it would be used by end-users, who at long last would write
their own !@#$% reports. They didn't, and now programmers, lacking any
proper mathematical expression, are required to squoosh their queries
through the opaque mess that is SQL.

> I strongly suggest you throw out any book that purports to be about
> the RM, that is not written by Dr E F Codd, and read only the RM and
> related papers.

It's good advice. A little late, and more than a little presumptuous,
but good advice, no doubt. I would only add that Codd is easier to
understand and appreciate with some prior knowledge of first order
predicate logic.

> There is one particularly odious idiot who proposes an anachronism:
> remove the data sublanguage and replace it with an, um, full
> language.

I guess you are referring to Tutorial D? Why the circumlocution?

--jkl

hugoko...@gmail.com

unread,
Feb 11, 2013, 11:17:12 AM2/11/13
to
Op maandag 11 februari 2013 09:05:46 UTC+1 schreef Derek Asirvadem het volgende:
> On Monday, 11 February 2013 11:38:51 UTC+11, James K. Lowden wrote:
>
> >
>
> > Except #6. ORDER BY, strictly speaking, returns a cursor, not a
>
> > virtual table.
>
>
>
> Let me assure that my platform Sybase does NOT use a cursor;

James said "returns", not "use".
A set is unordered, by definition. Hence, after applying ORDER BY, what is returned can not be a set. The ANSI standard calls it a cursor instead, which is a confusing term because a cursor is also the name of an SQL construction that appears to be mostly designed to give consultants an easy way to come in and play hero.
Unfortunately, this overloaded term is what we are not stuck with.

How various vendors implement this is irrelevant in this group. (But I can assure you that you have been ill informed about the implementation details of MS SQL Server; I'll gladly provide more details if you want, but I'm not sure if that would be on-topic here).


> When would ORDER BY be relevant in a subquery (and therefore the prohinbition of it be relevant) ?

When combined with OFFSET and FETCH, or whatever other vendor-specific implementation of that functionality. Without it, it would indeed make no sense whatsoever.
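
For example (a sketch in the syntax quoted from the standard elsewhere in this thread; table and column names made up):

SELECT *
FROM (SELECT order_id
        FROM Orders
        ORDER BY order_id
        OFFSET 10 ROWS
        FETCH FIRST 5 ROWS ONLY) AS T
-- here the inner ORDER BY is meaningful: it determines *which* five rows are fetched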


Cheers,
Hugo

hugoko...@gmail.com

unread,
Feb 11, 2013, 11:21:52 AM2/11/13
to
Op maandag 11 februari 2013 17:17:12 UTC+1 schreef hugoko...@gmail.com het volgende:
> Unfortunately, this overloaded term is what we are not stuck with.

Typo. This should have read:

"Unfortunately, this overloaded term is what we are now stuck with."

Derek Asirvadem

unread,
Feb 11, 2013, 7:48:54 PM2/11/13
to

On Tuesday, 12 February 2013 02:19:34 UTC+11, James K. Lowden wrote:
>
> On Mon, 11 Feb 2013 02:51:46 -0800 (PST) Derek Asirvadem wrote:
>
> > On Monday, 11 February 2013 11:38:54 UTC+11, James K. Lowden wrote:
>
> > > Could you please tell me which 3VL supports your contention that
> > > NULL = NULL?
>
>
> > Where, in [quote omitted], does the implicit or explicit statement
> > lie, that 3VL or "a" 3VL, contends that NULL = NULL ?
>
> > To be clear:
>
> > ? Sybase ANSI_NULL_BEHAVIOUR = OFF (where NULL = NULL)
> > is 2VL.
>
> Um, no. A logic with 3 values can't be a 2VL.

With the capability that you have demonstrated, and the detail I have already provided, I am sure that you can construct a truth table for Sybase non-ANSI Null Handling.

Or you could look up the manual, there might be one there.
http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.commands/html/commands/commands85.htm

I think they call it reduction, but I am not sure.

There are many blog posts about it, but you know what bloggers are like: they spin their wheels about minor irrelevant matters, or post at length about things they only half-understand (20 pages instead of my single statement):
http://sybasease.blogspot.com.au/2005/05/nulls-in-sybase-ase.html

As I stated previously, the OFF setting allows developers to NOT know 3VL, to NOT care about 3VL, to NOT have to make 3VL considerations in every code segment. Coding SQL is a fraction of the developer's job, and we do not want them to be contaminated by the neurosis of the academics who think there is a problem where there is none; we want them to focus on the parts that they can control and to forget about the parts they cannot control. We want good code from them, not theoretical arguments.

> This is c.d.t.; we're supposed to be able to refer here to the theory
> of databases. I showed you a truth table and asked which 3VL you were
> referring to.

Which is either insulting, or plain stupid, because there is only one 3VL. The consequences (my reply) are the consequences of your communication.

> If you're going to contend that 2 values are 3, I'm
> going to stand well clear.

I didn't contend that, but you keep telling me that I did.

If you are so fixated in the belief that the 3VL requirement must be implemented in a platform, and you cannot imagine an alternative, not even when a specific one is detailed to you, I can't help you.

If you are trying to understand my statement, or need expansion, that is one thing; following that path will provide understanding of what you do not understand yet. Trying to prove the statement false by picking at fragments taken out of context is a different thing; following that path will provide only a small comfort in your denial of reality, until the next time something exposes that which you deny.

> > > > SQL is [the] data sublanguage defined in the RM; it is the
> > > > servant of the RM, not the other way around.
> > >
> > > If only.
>
> > Hah! I love it!
>
> > You young people think that history began when you were born,
>
> I was born ... when?

In the year of the Snake. Or maybe the Rat. Whatever it was, it was one in which the level of education delivered caused an ensuing level of ignorance of history, that in turn caused you to make the comment "If only". That year.

> > > AFAIK the word "relational" doesn't appear in the SQL
> > > standard unless to explicitly deny fealty to it.
> >
> > So what ? It is a standard.
>
> Yes, it's a standard defined explicitly in terms of itself, not in
> terms of RM. Your claim that, "SQL is [the] data sublanguage defined
> in the RM" is thus denied.

Ok, by you. You must have used 3VL logic to conclude that one. Or perhaps 4VL.

Maybe if, instead of ripping my quoted statement out of the statements that surround it and considering it in isolation, you considered it in context, you would not need a truth table to evaluate the truth of said fragmented statement. But you might need a truth table nevertheless.

Maybe you would understand me better if I stated an even shorter historical synopsis (I can't ask you to avoid fragmented thinking, but if you quote me, please do not break this sentence up):

••• The RM defined a Data Sublanguage, nothing else did; the first SQL was the rendition of the Data Sublanguage defined in the RM; the SQL Standard is the progressed definition of a portion of the Data Sublanguage defined in the RM; any SQL implementation is the progressed rendition of the Data Sublanguage defined in the RM.

No doubt you will argue, and I will have to patch that up in response, but let's use that to get started, and please do not break it up into little fragments and argue the fragments; please argue the whole.

Or, if you like working the logic backwards:

••• Denying that SQL (an implementation or the standard) is a rendition of the Data Sublanguage defined in the RM is a denial of historical fact. A person making such denials will be in denial of their own family, lineage, and heritage; they are both symptoms of the one mental impairment that is installed by modern education.

I prayed for guidance in dealing with your problem, and I was lead to:
http://www.kingjamesbibleonline.org/book.php?book=Matthew&chapter=23&verse=24

> > Standards generally do not include history; the books or articles do
> > that (or the memories of old soldiers like me), and the books are
> > sadly wanting.
>
> Standards generally do not rest on much of any theory, let alone
> something as robust as RM. It would have been nice if a standard query
> language had evolved that implemented relational operations per se.
> Instead we're living with a 1970s-era 4GL, a pushme-pullyou brute.

I agree with you on those two points.

3GL not 4GL. It remains a data sublanguage; nowhere near a 4GL or a language on its own, or one that can be used to produce an application. It has to be compiled, and included with either another 3GL or a 4GL, in order to create anything that executes.

> IBM imagined it would be used by end-users, who at long last would write
> their own !@#$% reports.

That is a half-truth, and half-truths are lies.

For the purpose of reporting, and if you use the theoretical definition of "user", yes, agreed. And make no mistake, with the $0 to $500 simple reporting tools of today, millions of users do exactly that (as determined in the RM: instead of "terminal" and "menu", read "PC" and "simple report program"). For five figures, you can enjoy very powerful tools, which allow millions of users to overcome the limitations posed by morasses of data provided by today's developers, again with simple clicks and drags. Those are the 4GLs, if you will. Codd's vision has been realised, from at least 20 years ago.

For the purpose of applying changes to the database, no. It had nothing to do with IBMs imagination. Codd defined that there shall be one (and only one) data sublanguage, for both purposes. The Twelve Rules expands the RM in this regard, and details this.

The problem is, the pathetic SQL Comminterns have not progressed to providing a "proper mathematical expression" or relational calculus or relational algebra as a front end. Given the richness of the mid- and lower level of the SQL implementations, and the investment, both of vendors and of their customer bases, which we would not want to re-invent, the obvious thing to do is to write a relational calculus front end that simply calls SQL underneath, but no one has done that.

> They didn't, and now programmers, lacking any
> proper mathematical expression, are required to squoosh their queries
> through the opaque mess that is SQL.

It is not that bad. First, totally granted that SQL is cumbersome, and problematic. But
• the better vendors have removed the opaqueness (those who are fixated on it will still demand to have it and enjoy it, instead of producing code). It is not broken as some academics love to rant about (the standard is, but some implementations are not [the blind implementations may well be as broken as the standard]).
• I think it is very limited, but for reasons that those academics have not identified.
• I don't know about you or your site. But we have tools these days, most of us do not code SQL directly; we point and click, or drag, or type a few letters and the tool fills in the rest. So the largesse demanded is somewhat tamed.

In days of old, programmers had a small understanding of "proper mathematical expression". Programmers of the last 15 years have none at all; some of them have not even heard of it or the RM. If we had one that allowed "proper mathematical expression", they would not be able to use it. The academics who espouse a data sublanguage or full language that allows "proper mathematical expression" are grossly dishonest: they trumpet the need for a prescription that can never be used by the patients. I am hoping you are not one of them.

The summary point is
• SQL can definitely be improved (but that effort is severely limited by the very same academics on the committee)
• there are many capabilities that it should have (within the scope of what it does have), that are simply missing, that were identified 20 years ago, and are still missing
• SQL is not broken as per the charges of some academics (summary point, note the qualifications to that statement, above)
• In the absence of any of the large array of SQL development tools, that have been available for 20 years, one will find it cumbersome, yes. Like programming in a 2GL in the days of 3GLs, and complaining about the problems of 2GLs. The problem is not the 2GL; the problem is the person who cannot progress to a 3GL.

I am not defending SQL.

I am saying, there is nothing, zilch, zero on the horizon, in terms of a replacement, and the academics who imagine that (on the far side of the horizon) have nothing that will reach our horizon in this century. So stop waiting for a replacement; stop thinking about how broken it is; just get your head around it (your particular implementation) and get really good at it. It is a mundane but important part, but certainly not the whole, of any development today. Just deal with it instead of wailing about its rather obvious limitations.

As for the theoreticians (non-implementers) they are free to argue about the imagined problems that are not real to implementers. Eg. the "SQL Problem", the "Null Problem", etc, has been discussed ad infinitum for 25 years (that I personally know of), and nothing has been resolved, nothing has changed in the physical universe. The "SQL Problem", the "Null Problem", etc, are non-problems to an implementer ... unless they get confused by the writings of the subversives.

> > I strongly suggest you throw out any book that purports to be about
> > the RM, that is not written by Dr E F Codd, and read only the RM and
> > related papers.
>
> It's good advice. A little late, and more than a little presumptuous,
> but good advice, no doubt.

Thank you. I had no control of the timing, so I cannot be held responsible for the lateness.

> I would only add that Codd is easier to
> understand and appreciate with some prior knowledge of first order
> predicate logic.

Indeed.

> > There is one particularly odious idiot who proposes an anachronism:
> > remove the data sublanguage and replace it with an, um, full
> > language.
>
> I guess you are referring to Tutorial D?

Yes, Andrew Warden or Hugh Darwen or Tweedle Dee or whatever he calls himself these days. He has not written a single line of code, but he has tricked a score of talented and eager young people into writing the impossible. If it were not so sad and disgusting, it would be hilarious, yes.

Apparently he was on the SQL committee for a few years. One can understand and appreciate the impotence of that committee, the contradictory demands in the standard, given that they have committed saboteurs and subversives such as he on it.

> Why the circumlocution?

Because:
• it is difficult to write the name without throwing up my breakfast
• despite his assault on my senses during the three years that I spent in dutiful discussion on his website, due to God's Grace, I still have a tiny bit of politeness remaining
• I failed to submit my writings to you for your prior approval
• I was born in the year of the Rooster
• I went to Montessori, Sunday School, Boy Scouts, and the best English public (means private) school in the country, for which I am eternally grateful to God and my parents
• if I were to attack the pestilence directly (non-circumlocution as implied), then I should write a paper and publish it (which is begging, actually)
• following which I should open a new thread on c.d.t
• and this is not that thread, this is another thread, in which the false prophet who denies the messiah is a minor reference

Cheers
Derek

Derek Asirvadem

unread,
Feb 11, 2013, 7:54:50 PM2/11/13
to
On Tuesday, 12 February 2013 02:19:34 UTC+11, James K. Lowden wrote:
>
> ... and more than a little presumptuous,

I do not believe I am being presumptuous at all.

Please advise.

Cheers
Derek

James K. Lowden

unread,
Feb 12, 2013, 2:11:14 AM2/12/13
to
On Mon, 11 Feb 2013 16:48:54 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > > > AFAIK the word "relational" doesn't appear in the SQL
> > > > standard unless to explicitly deny fealty to it.
> > >
> > > So what ? It is a standard.
> >
> > Yes, it's a standard defined explicitly in terms of itself, not in
> > terms of RM. Your claim that, "SQL is [the] data sublanguage
> > defined in the RM" is thus denied.
>
> ??? The RM defined a Data Sublanguage, nothing else did; the first
> SQL was the rendition of the Data Sublanguage defined in the RM; the
> SQL Standard is the progressed definition of a portion of the Data
> Sublanguage defined in the RM; any SQL implementation is the
> progressed rendition of the Data Sublanguage defined in the RM.

Words fail me, clearly. To me, the fact that the SQL standard defines
the language without reference to the relational model ipso facto means
it's not "defined in the RM". Your argument, to the contrary, seems to
be:

1. The relational model defined a data sublanguage
2. SQL was an implementation ("rendition") of that sublanguage
3. Ergo, SQL is defined by the RM

Exactly how does #3 follow from #2?

Put another way: How poor does an implementation
have to be -- and how much must it disavow RM -- before it can no
longer be claimed to be defined by it?

At the risk of angering the copyright gods, let me quote from "The
RELATIONAL MODEL for DATABASE MANAGEMENT: VERSION 2", 1990, by EF Codd,
pages 12-13.

"SOL was invented in late 1972. Although it was claimed that
the language was based on several of my early papers on the relational
model, it is quite weak in its fidelity to the model. Past and current
versions of this relational language are in many ways inconsistent with
both of the abstract machines RM/V1 and RM/V2. Numerous features are
not supported at all, and others are supported incorrectly. Each of
these inconsistencies can be shown to reduce the usefulness and
practicality of SOL.

"The most noteworthy error in several current implementations
is that SQL permits the use and generation of improper or corrupted
relations, that is, "relations" that contain duplicate tuples or rows.
In Chapter 23 this problem is examined in sufficient detail to
demonstrate the seriously adverse consequences of this very simple
infidelity to the model. As each feature of RM/V2 is introduced in this
book, the attempt is made to comment on SQL's support, non-support, or
violation of the feature.

"Several relational languages other than SQL have been
developed. An example that I consider superior to SQL is Query Language
(abbreviated QUEL). This language was invented by Held and Stonebraker
at the University of California, Berkeley, and was based on the
language ALPHA [Codd 1972]. More attention is devoted to SQL, however,
because it has been adopted as an ANSI standard, and is supported in
numerous DBMS products."

Chapter 23 is entitled "Serious Flaws in SQL".

If you want to say SQL is defined by the relational model because IBM
claimed as much back in the day, OK. I for one don't find that
argument persuasive.

--jkl

James K. Lowden

unread,
Feb 12, 2013, 2:57:15 AM2/12/13
to
On Mon, 11 Feb 2013 16:48:54 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > > ? Sybase ANSI_NULL_BEHAVIOUR = OFF (where NULL = NULL)
> > > is 2VL.
> >
> > Um, no. A logic with 3 values can't be a 2VL.
>
...
> As I stated previously, the OFF setting allows developers to NOT know
> 3VL, to NOT care about 3VL, to NOT have to make 3VL considerations in
> every code segment.

The manual says of the SET statement that with ANSI null behavior
turned off, comparisons to NULL yield FALSE, and NULL = NULL is TRUE.

> As I stated previously, the OFF setting allows developers to NOT know
> 3VL, to NOT care about 3VL, to NOT have to make 3VL considerations in
> every code segment.

Not so fast. What about arithmetic? Would this be accurate?

V1 V2 = < ¬< > <&&>
2 NULL F F F F F

I'm not a Sybase expert, but you are. If you'd be kind enough to fill
in that table correctly, we could discuss whether Sybase in fact
defines a 2VL and, if so, its merits.

--jkl

Derek Asirvadem

unread,
Feb 12, 2013, 7:25:09 AM2/12/13
to
On Tuesday, 12 February 2013 18:57:15 UTC+11, James K. Lowden wrote:
> On Mon, 11 Feb 2013 16:48:54 -0800 (PST) Derek Asirvadem wrote:
>
> > > > ? Sybase ANSI_NULL_BEHAVIOUR = OFF (where NULL = NULL)
> > > > is 2VL.
> > >
> > > Um, no. A logic with 3 values can't be a 2VL.
> >
> > As I stated previously, the OFF setting allows developers to NOT know
> > 3VL, to NOT care about 3VL, to NOT have to make 3VL considerations in
> > every code segment.
>
> The manual says of the SET statement that with ANSI null behavior
> turned off, comparisions to NULL yield FALSE, and NULL = NULL is TRUE.

I thought I said that, rather clearly, and that you had great difficulty *accepting* that ... but it is good that you found that the manual confirmed it. (I know most commands by heart; I did not read the manual page, I just gave you the link.)

> > As I stated previously, the OFF setting allows developers to NOT know
> > 3VL, to NOT care about 3VL, to NOT have to make 3VL considerations in
> > every code segment.
>
> Not so fast.

Personally, I thought this was progressing really slowly.

> What about arithmetic? Would this be accurate?
>
> V1 V2 = < ¬< > <&&>
> 2 NULL F F F F F
>
> I'm not a Sybase expert, but you are.

Anyone who told you that is exaggerating greatly.

> If you'd be kind enough to fill
> in that table correctly, we could discuss whether Sybase in fact
> defines a 2VL and, if so, its merits.

Oh, come on. Now you are really getting up my nose. I am getting tired of spending time, writing answers to your questions, giving you info, only to find that instead of closing the issues raised like a gentleman, you raise new issues from the margins of the new info. Please lift your game.

• Read the manual
• build the truth table
• do not forget strings and bits and the empty set
• The product is honest, there are no tricks. I do not know what sort of dumpy SQL you are used to, but I do know that some of the SQLs are very inconsistent, so I suppose you can be forgiven for the "not so fast" and for needing to verify every condition.
• The product is predictable. whatever truth table you build, given the precision you have demonstrated (in technical matters, but unfortunately not in arguments or honour) thus far, will be correct. Whatever code you write, to prove those entries, will produce the predicted results.

Hint: "comparisons to NULL yield false". Whatever derives from that statement, derives precisely, predictably, and consistently.

Put another way, with ANSI_NULL = OFF, Null holds a single fixed position, in the entire Domain of values. It has a slightly special status, like a spoiled child, in that if you compare it with anything, it asserts its contrived self, and insists on being first or last or greater or smaller, the Chosen One. Rather than being the Unknown Value, which cannot be reasonably coded for, Null is the Non-Value, which is simple to code for. We have tamed the Cursed One into the Chosen One. No doubt 3VL lives in the hearts and minds of obstinate academics, 30 years after it was solved, but it simply does not exist for us.
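
Assembled from the statements above, the truth table is trivial (a sketch, not a page from any manual):

comparison         OFF (non-ANSI)   ON (SQL Spec 3VL)
NULL =  NULL       TRUE             UNKNOWN
NULL =  <value>    FALSE            UNKNOWN
NULL <> NULL       FALSE            UNKNOWN
NULL <> <value>    TRUE             UNKNOWN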

You can tell me what to call it: 2VL; non-3VL; NonVL; VeryVL; TrivialVL; TriVL, and I will be sure to include it in my next formal communication to Sybase Engineering. No guarantees about changing the manual, because (a) no one in the Sybase community knows about the non-problem, or cares about the name we have not given it, and (b) 3VL (in any form) is not mentioned in the manuals.

Sybase has no merits. Not for your crowd anyway. You will get bored with it in one week flat. You need PostgreSQL, so that you can fiddle and fuss about with abstract things, behaviour that changes with every minor version, a product that cannot be used to produce code (except for single-user systems), and one that provides endless material for abstract discussion. Ingres, diarrhoea and abysmal failure that it was, did not die, the academics loved it so much that they gave it a new front end, silicon boobs and falsies. You will love it. Sybase has only one function that you will like: the RAND() function.

----------

Tweedle Dee asked me the exact same question (first one re Null code) in an exchange, three full years after I informed him that we did not have the SQL Null Problem. He could not believe that we (Sybase, thousands of implementations, millions of users) had worked around his beloved Null Problem that he insists everyone must have. Even after I answered it at length; provided many manual pages; executed requested code; and re-ran the code with more values at his request, he asked me the exact same question (second one re Null code, above).

http://www.softwaregems.com.au/Documents/The%20Third%20Manifesto/Miscellany/Hugh%20Darwen%2002%20Feb%2012.pdf

You are the same. You are simply not getting that intelligent vendors and implementers simply do not have the disease that the two of you love, cherish, and insist every human being must have. It is not that we have found penicillin, and therefore we have a cure for your syphilis, no, we never slept with prostitutes, we never contracted the disease. You are still checking if penicillin is effective against stage II syphilis, Hugh is at stage III, so communication is futile.

[If you are inspecting my signature with a microscope, "Standard-compliant" refers to *all* the standards that apply to my work, not the SQL Standard which is unfit to be deemed a standard, which is a mundane issue in my work, and which I have real standards to work around.]

Cheers
Derek

Derek Asirvadem

unread,
Feb 12, 2013, 9:47:53 AM2/12/13
to
On Tuesday, 12 February 2013 18:11:14 UTC+11, James K. Lowden wrote:
> On Mon, 11 Feb 2013 16:48:54 -0800 (PST) Derek Asirvadem wrote:
>
> > > > > AFAIK the word "relational" doesn't appear in the SQL
> > > > > standard unless to explicitly deny fealty to it.
> > > >
> > > > So what ? It is a standard.
> > >
> > > Yes, it's a standard defined explicitly in terms of itself, not in
> > > terms of RM. Your claim that, "SQL is [the] data sublanguage
> > > defined in the RM" is thus denied.
> >
> > ??? The RM defined a Data Sublanguage, nothing else did; the first
> > SQL was the rendition of the Data Sublanguage defined in the RM; the
> > SQL Standard is the progressed definition of a portion of the Data
> > Sublanguage defined in the RM; any SQL implementation is the
> > progressed rendition of the Data Sublanguage defined in the RM.
>
> Words fail me, clearly.

I accept that words fail you, but clearly ?

Look, you are an academic, an abstractionist. I am an engineer, non-abstract. We have a chasm between us, no, the Grand Canyon. That words fail you, or why words fail you, is clear as mud. You've been arguing for days, and I still have no idea what your intent is, what you are trying to prove. Every time I answer your questions, and provide info, you drop the previous argument (without the courtesy of conceding anything) and pick up some fragment from the margins of the new info, and argue some new issue.

Let's try to close at least one point, shall we, before embarking on the next new point. Since you are fragmenting things, and acting as per above para, nothing gets closed, the scope of the argument is wide open, and keeps getting wider. Even the filthy disgusting lawyers (scribes) are required to (a) declare the charge, and (b) declare "Agreed Facts", so that the case can be narrowed down to disputed facts, otherwise cases will take decades to close, not years.

So let me interject precisely, so that what is agreed or not is clear, so that the question of dispute can be corralled from the forest.

> To me, the fact that the SQL standard defines
> the language without reference to the relational model

Agreed so far.

> ipso facto means
> it's not "defined in the RM".

Either you are stupid, or you are not stupid and you are twisting my words, which is an offence. Are you trying to understand what I mean (read the words I have posted), or are you arguing some thus far unknown point by making the above statement ?

Rubbish. Are you unable to imagine a circumstance where the son writes himself an autobiography, that does not mention his father ??? Especially if the autobiography (being an independent standard) is required to be independent ??? You need an honest biography (note the "auto" has been removed), if you expect lineage to be identified.

Did you miss my Synopsis ??? My Short Synopsis ???

Are you so removed from reality, that you are unable to contemplate the ordinary and everyday occurrence that a father can define and imagine and create the child that has not yet been born ??? And somewhat after the child is born, the father can accurately state, this child is mine, I made him ??? That it does not matter one iota if the child acknowledges that ??? In an autobiography that is not required to mention his father ???

> Your argument,

Excuse me. I have no argument with you. I don't know you from a bar of soap, and I have nothing to prove to you. I was sitting here enjoying my serenity, after having answered a practical non-abstract question that Lennart asked; you came along, hijacked the thread, incised a single fragment, and mounted an argument with me. And you did so repeatedly, after I answered each of your fragmented questions. Evidently you are trying to prove something, which remains undeclared.


> to the contrary,

No, I am not arguing to the contrary. I said something else, which you keep ignoring, and you separately propose what I am arguing. Disgusting, really.

> seems to
> be:
>
> 1. The relational model defined a data sublanguage

Yes.

> 2. SQL was an implementation ("rendition") of that sublanguage

Yes.

> 3. Ergo, SQL is defined by the RM

I did not state that. You propose that I did. In spite of the fair amount of detail that I provided ergo-ing otherwise. Twice.

Anyone saying that [3] would be very stupid, yes. Anyone reading everything that I wrote and interpreting that body as [3] is very stupid. Anyone reading everything that I wrote and proposing that *I* stated it, or argued it, is a disgusting liar.

Let me state it differently, one last time, and if you behave dishonestly, again, I will stop responding to you. Do not break the following up, and do not interpret and re-interpret my statements.

1. The relational model defined a data sublanguage
1.a. Any data sublanguage that came after the data sublanguage defined in the RM, whether it references the RM or not, and which fulfils basic elements of the RM [1], is [ipso facto] *an* implementation (rendition) of the RM.
2. IBM System/R was a fully credited implementation (rendition) of the data sublanguage defined in the RM [1], working closely with the author of [1]
3. IBM/SQL was a fully credited implementation (rendition) of the data sublanguage defined in the RM [1], working with the author of [1], and derived directly from [2]
4. The SQL standard was an un-credited implementation (rendition) of the data sublanguage defined in the RM [1], and derived directly from [3], and the documentation for [3]
5. Any implementation that purports to be an implementation of the SQL Standard [4] is an implementation (rendition) of the data sublanguage defined in the RM [1], and derived directly from [4], which is derived directly from [3], which is derived directly from [2], which is derived directly from [1].
6. Therefore it cannot be disputed the any SQL implementation or the SQL Standard is defined *IN* the RM, in the manner that the RM defines it.
7. An idiot who cannot keep track of chronological events and lineage, or a disingenuous liar who isolates one fragment of truth and waves it around, whilst committing a fraud, will propose that since the SQL Definition does not lie in the RM, therefore SQL is not defined in the RM. Time to take that creature out the back, and put a bullet through its head, in order to avoid contaminating the community with that insanity.

> Exactly how does #3 follow from #2?

It doesn't. No one said it does. You proposed the insanity, and then you proposed that the insanity is an insanity. You are masturbating, thinking that it is sex, and complaining that it is horrible.

If you genuinely want to know how "something like [3] that you do not currently understand" follows from [2] which you appear to understand, read the above points [1] through [7], twice a day, for a month. You won't have any questions.

> Put another way: How poor does an implementation
> have to be -- and how much must it disavow RM -- before it can no
> longer be claimed to be defined by it?

Hallelujah! Finally, some sense.

Good question. Leading somewhere else, so I don't see how it is the previous issue "put another way". I will have to think about it.

"Disavow" does not matter. The child can disavow the father or the grandfather until doomsday, that disavowal does not change the fact of his heritage.

The question begs: Implementation of what ?

If I take "how much must it disavow RM" to mean "how much must it not-ontain RM", then it is a good question. I will probably reverse the wording, to remove the double negatives, etc.

> At the risk of angering the copyright gods ...

God help me. I will answer the rest tomorrow, if I can bring myself to work through your fragmented thought processes and incorrectly attributed propositions.

Cheers
Derek

James K. Lowden

unread,
Feb 13, 2013, 8:36:28 PM2/13/13
to
On Tue, 12 Feb 2013 06:47:53 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > > > Yes, it's a standard defined explicitly in terms of itself, not
> > > > in terms of RM. Your claim that, "SQL is [the] data sublanguage
> > > > defined in the RM" is thus denied.
> > >
> > > ??? The RM defined a Data Sublanguage, nothing else did; the first
> > > SQL was the rendition of the Data Sublanguage defined in the RM;
> > > the SQL Standard is the progressed definition of a portion of the
> > > Data Sublanguage defined in the RM; any SQL implementation is the
> > > progressed rendition of the Data Sublanguage defined in the RM.
> >
> > Words fail me, clearly.
>
> I accept that words fail you, but clearly ?
>
> Look, you are an academic, an abstractionist. ... You've been
> arguing for days, and I still have no idea what your intent is, what
> you are trying to prove.

I'm going to stop trying soon, not because I think I'm right and you're
wrong, or any permutation thereof, but because of your certitude, and
because when I try to restate your argument in my own words, you accuse
me of stupidity or dishonesty. (Yes, "argument". Not as in "fight",
but as in a sequence of logical statements intended to prove a
conclusion. I'll adopt another word if you'll propose one.)

What I'm trying to prove is that the statement "SQL is [the] data
sublanguage defined in the RM" is false. What I've learned is that the
truth or falsity of that statement depends on the meaning of the word
"defined".

My argument rests on the fact that the relational model doesn't define
SQL, and SQL's definition -- i.e., the SQL Standard -- doesn't refer to
the relational model.

That's what I mean by "defined". :-)

For you, none of that matters because (if I may paraphrase) SQL depends
on RM for intellectual heritage. As you said,

> 1. The relational model defined a data sublanguage
> 1.a. Any data sublanguage that came after the data sublanguage
> defined in the RM, whether it references the RM or not, and which
> fulfils basic elements of the RM [1], is [ipso facto] *an*
> implementation (rendition) of the RM.
> 2. IBM System/R was a fully credited implementation (rendition) of
> the data sublanguage defined in the RM [1], working closely with the
> author of [1]
> 3. IBM/SQL was a fully credited implementation (rendition) of the
> data sublanguage defined in the RM [1], working with the author of
> [1], and derived directly from [2]
> 4. The SQL standard was an un-credited implementation (rendition) of
> the data sublanguage defined in the RM [1], and derived directly from
> [3], and the documentation for [3]

I think it matters "whether it references the RM or not" (1a). In
fact, I think that's all that matters. If I accept your line of
reasoning, I suppose I have to accept your conclusion. I don't, but I
see how you got there.

In short, I think we can agree:

There exists a "defined" such that A defined B.

Or not. Surely, one of the two.

--jkl

James K. Lowden

unread,
Feb 13, 2013, 10:52:17 PM2/13/13
to
On Tue, 12 Feb 2013 04:25:09 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> > What about arithmetic? Would this be accurate?
> >
> > V1 V2 = < ¬< > <&&>
> > 2 NULL F F F F F
> >
> > I'm not a Sybase expert, but you are.
>
> Anyone who told you that is exaggerating greatly.

You're too modest. What is an expert, if not someone who knows "most
commands by heart"?

> I am getting tired of spending time, writing answers to your
> questions, giving you info, only to find that instead of closing the
> issues raised like a gentleman, you raise new issues from the margins
> of the new info.
...
> Hint: "comparisons to NULL yield false". Whatever derives from that
> statement, derives precisely, predictably, and consistently.

I'm really not trying to dodge your questions. You have repeated
dodged mine, though. I ask a very simple question, please confirm/deny
5 truth values, and you don't. You spend hundreds of words dodging the
question, all while complaining about how much time you're spending
"giving me info".

> You can tell me what to call it: 2VL; non-3VL; NonVL; VeryVL;
> TrivialVL; TriVL, and I will be sure to include it in my next formal
> communication to Sybase Engineering.

Touché. I choose "non-3VL", because it's a 3VL, except not.

Look, you say everything's solved, it's totally simple, that "no one in
the Sybase community knows about the non-problem, or cares about the
name we have not given it". But someone is wrong on the Internet, and
this time it's you. En garde!

Consider a column "a" defined as "a float NULL" in the create table
statement. Then

INSERT into T (a) values (NULL)

Now the WHERE clause:

WHERE not a < 2
AND a != 2
AND not a = 2
AND not a > 2

returns the row. At least in ANSI land,

WHERE a is NULL

returns the row and none of

WHERE not a < 2
WHERE a != 2
WHERE not a = 2
WHERE not a > 2

does.
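
To make that concrete, here is a minimal, self-contained sketch of the
ANSI behaviour (table T and column a follow the example above; the
expected results under standard 3VL are noted in comments):

CREATE TABLE T (a FLOAT NULL);
INSERT INTO T (a) VALUES (NULL);

SELECT * FROM T WHERE a IS NULL;  -- returns the row
SELECT * FROM T WHERE NOT a < 2;  -- no rows: a < 2 is UNKNOWN,
                                  -- and NOT UNKNOWN is UNKNOWN
SELECT * FROM T WHERE a != 2;     -- no rows, for the same reason
SELECT * FROM T WHERE NOT a = 2;  -- no rows
SELECT * FROM T WHERE NOT a > 2;  -- no rows

A WHERE clause keeps a row only when its predicate evaluates to TRUE;
UNKNOWN is filtered out just as FALSE is.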

Quite honestly and sincerely, with no malice of intent or forethought,
I don't see what problem you think is solved, or how subverting simple
arithmetic doesn't create a worse problem.

Certainly

a < 2 or a = 2 or a > 2

is always true in Boolean logic.

If it's ever false in Sybase logic, then Sybase logic is not Boolean.
The *only* form of 2VL logic is Boolean, therefore Sybase logic is not
2VL. Neither is it a recognized 3VL insofar as I'm aware. (That is
where we started. BTW, there is no single 3VL. I mentioned Kleene and
Łukasiewicz when I first joined this thread.)

Perhaps you willingly concede all that, but maintain the problem is
solved, for practical purposes. But here, "practical" means you can
put up with or internalize the inconsistencies, and work around them.
Not everyone can do that, so it's practical only for those who can.

I remember problems along these lines, a long time ago, when
System 10 was new. Especially nasty are tests like NOT IN, because

select * from A
where v not in (select a from T where a = NULL)

returns every row in A, and

select * from T as t
where not exists (
select 1 from T where a = t.a)

returns all rows in T where a is NULL.

Or something like that. As I recall. I don't have a system handy to
test with.
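
Under ANSI semantics, for comparison, the NOT IN trap can be sketched
like this (hypothetical data, not tested on any particular server):

CREATE TABLE A (v INT NULL);
CREATE TABLE T (a INT NULL);
INSERT INTO A (v) VALUES (1);
INSERT INTO T (a) VALUES (2);
INSERT INTO T (a) VALUES (NULL);

-- v NOT IN (2, NULL) expands to v <> 2 AND v <> NULL. The second
-- conjunct is UNKNOWN, so the whole predicate is never TRUE and the
-- query returns no rows at all, however many rows A holds.
SELECT * FROM A WHERE v NOT IN (SELECT a FROM T);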

If my characterization above is correct, and you think that's "problem
solved", there's probably nothing I can say that will change your
mind, nor you mine.

But as to the question of whether forcing all tests with NULL to FALSE
reduces the logic to 2VL, the answer must unequivocally be No.

--jkl

Derek Asirvadem

unread,
Feb 14, 2013, 4:31:46 PM2/14/13
to
On Thursday, 14 February 2013 14:52:17 UTC+11, James K. Lowden wrote:
>
> > You can tell me what to call it
>
> I choose "non-3VL", because it's a 3VL, except not.

Marvellous. I accept that Sybase's Null handling when ANSINULL is OFF, is a 3VL except not. And therefore that it cannot be 2VL.

The example code you provide is a special form of straw man argument. It is not stupid. It is "clever" in the sense of the "cleverness" of neurotics such as Date and Darwen. It is dishonest, non-logic presented as "logic". If I answer, it will afford it some credibility, which is otherwise absent.

I do not agree that System 10 had that problem. Of course, neurotic or contrary programmers have that problem, all the time, regardless of the version. Generally, we do not hire them.

There is nothing else in your post that is worth responding to. I have already answered all your points in previous posts.

Cheers
Derek

Derek Asirvadem

unread,
Feb 14, 2013, 4:46:05 PM2/14/13
to
On Thursday, 14 February 2013 12:36:28 UTC+11, James K. Lowden wrote:
>
> ... because when I try to restate your argument in my own words, you accuse
> me of stupidity or dishonesty.

That is correct. There is a substantial volume of my words, in response to yours. Don't restate my words; just read them. There is no need to go "so you are saying that ..." or "by your logic ..." when the words already exist on the board. Just read the words.

> ... (if I may paraphrase) ...

Don't. You make a total mess of it, and since you are arguing, you make a mess that suits your argument. That is what is dishonest about it. Just read the words and try to understand them.

Cheers
Derek

James K. Lowden

unread,
Feb 14, 2013, 11:03:35 PM2/14/13
to
On Thu, 14 Feb 2013 13:31:46 -0800 (PST)
Derek Asirvadem <derek.a...@gmail.com> wrote:

> The example code you provide is a special form of straw man
> argument. It is not stupid. It is "clever" in the sense of the
> "cleverness" of neurotics such as Date and Darwen. It is dishonest,
> non-logic presented as "logic".

I do tire of your insults. You seem to enjoy decreeing that day is
night, and harrumphing that anyone who disagrees is some combination of
neurotic, dishonest, and illogical. Preferably in the same sentence.

What reason have I to deceive you? Why should I be dishonest, or present a
straw man argument? Just to prove a point on the usenet?

Not only do I lack motivation, but you lack facts. My queries aren't
dishonest. They're just simple -- not clever -- illustrations of the
*fact* that the logic you so admire resolves to TRUE when it is
tautologically FALSE.

The reason I remember those kinds of queries from over 10 years ago is
that they were problematic. The real ones weren't as simple as what I
posted for illustration; they were more complex. The NOTs and ANDs
were sometimes many lines apart.

SQL was intended, I'm sure you agree, to be a language for deriving
logical inferences. If SQL is "defined in the RM" as you so avidly
believe, it should at least be logical. But Sybase's treatment of NULL
before ANSI contravened the generally accepted meaning of the symbols
both outside the language *and* within it. You can't apply Boolean
logic. You can't apply De Morgan's laws. You just have to remember the
weirdness and live with its limitations.
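
A sketch of how that plays out, assuming the pre-ANSI rule that every
comparison against NULL yields FALSE (the same hypothetical table T as
before, one row holding NULL):

-- Each comparison below yields FALSE for the NULL row, so each NOT
-- yields TRUE and the row is returned:
SELECT * FROM T WHERE NOT a < 2 AND NOT a = 2 AND NOT a > 2;

-- By ordinary Boolean reasoning that predicate is equivalent to
--   NOT (a < 2 OR a = 2 OR a > 2)
-- and (a < 2 OR a = 2 OR a > 2) is a tautology over an ordered domain,
-- so the query should return nothing. The connectives are two-valued,
-- but the comparisons no longer mean what arithmetic says they mean.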

I'm far from the first or only person to notice. I don't know why I
thought I could explain it to you, especially after Darwen couldn't.

"I can explain it to you. But I can't understand it for you."
-- Ed Koch

I don't know how to penetrate your shield of certitude. Every
question I've posed in *theoretical* terms -- that's the "t" in c.d.t.,
remember? -- you declined to answer. Many words you wrote, but
answer be there none.

So here's my first and last piece of advice to you: instead of vacant
assessments about the quality of everyone's posts, and claiming to know
the answer without showing it, how about you restrict yourself to saying
something -- hopefully correct -- about the theory we're all supposedly
here to discuss? Unless I miss my guess, that should save you a *lot*
of time. You might not have to type a single word.

--jkl

Derek Asirvadem

unread,
Feb 15, 2013, 3:34:40 AM2/15/13
to
On Friday, 15 February 2013 15:03:35 UTC+11, James K. Lowden wrote:

Too clever for words.

I love the bit where, to TELL me that you are honest, you TELL me that you're not deceptive, despite the consistent evidence otherwise.

This may be c.d.t, theoretical and all that, but you fail to realise that my exchange with Lennart was not theory, it was practice, thousands of implementations. I was not discussing the theory of the Null Problem, or of 3VL, I was answering questions about an implementation which does not have the null problem or 3VL (it has, according to you, who needs to name it from that theoretical position, "3VL except not").

That was before you told me that I needed your permission and direction. Hilarious but sad.

Just letting you know, I can't be asked to respond to the rest, or to answer your way; I was in the midst of serving someone else. Go back to Lennart's post and keep your "challenge", which has taken this long to be exposed, to yourself.

Goodbye
Derek

paul c

unread,
Feb 16, 2013, 6:20:01 PM2/16/13
to
On 14/02/2013 8:03 PM, James K. Lowden wrote:
> SQL was intended, I'm sure you agree, to be a language for deriving
> logical inferences. If SQL is "defined in the RM" as you so avidly
> believe, it should at least be logical.

James, I like your posts and agree with pretty much everything you've
said lately, plus what I've noticed on your site. (I'm interested mostly
in basics, so there will be details you know and care about that I
don't.) I agree with your second sentence but have to quibble with the
first.

From Codd 1970: "For expository reasons, we shall frequently make use of
an array representation of relations, but it must be remembered
that this particular representation is not an essential part of the
relational view being expounded."

Then: (such an) "array which represents an n-ary relation R has the
following properties:
(1) Each row represents an n-tuple of R.
(2) The ordering of rows is immaterial.
(3) All rows are distinct.
(4) The ordering of columns is significant -- it corresponds to the
ordering S1, S2, ..., Sn of the domains on which R is defined (see,
however, remarks below on domain-ordered and domain-unordered
relations).
(5) The significance of each column is partially conveyed by labeling
it with the name of the corresponding domain."

It's pretty clear to me that the SQL originators missed Codd's point
about this kind of array being merely for "expository reasons". So they
implemented something that is more physical than the logical kind of
relations Codd had in mind. Perhaps they missed the point on purpose, I
don't know about that, but the result of what might have seemed a minor
detour at the time has generated nearly thirty years of confusion, SQL
practioners imagining that they are using the logic Codd had in mind and
imagining they are applying his model. This is exactly parallel to the
old tale of the Emperor's New Clothes, walking around naked and not
knowing it.

I saw your BSD youtube presentation which references your website.
Another comment I have, in spite of my comments above, is that I agree
that there's value in some of the existing SQL implementations. I think
the main value is that they might be used as a completely physical layer
by a more faithful logical implementation of Codd's idea. This might
need to excise some of their silly gewgaws.

Keep up the good work.

Rob

unread,
Feb 16, 2013, 9:30:04 PM2/16/13
to
On Saturday, February 16, 2013 4:20:01 PM UTC-7, paul c wrote:
> On 14/02/2013 8:03 PM, James K. Lowden wrote:
> > SQL was intended, I'm sure you agree, to be a language for deriving
> > logical inferences.
[omitted text].

Something that never seems to get addressed is the pre-history of DB technology before Codd, before SQL. Within that universe, IMS and IMS-like databases were used much as today to track the state of some model, but were queried by a mechanism called RPG (for Report Program Generator). The RPG spec could be given within COBOL or in a standalone interface, but the idea was to spit out retrieved data in a report formed as the interpretive software made a single pass over the linearized physical (after RDBMS, logical) order of the records, or segments as they were called. Any sorts or aggregates were computed after-the-fact from an intermediate containing the retrieved data in the logical/physical order in which it was accumulated.

So when the System R folks at IBM were defining Sequel (which became SQL), the objective was not logical inference, it was report-writing. After all, the whole point of Codd's thrust was to replace IMS (et al.) with a system that was immune to changes in the physical storage structure of the data. But that didn't obviate the major purpose of such databases, report-writing. If you examine the most primitive SQL form of the SELECT statement, it is a non-procedural specification of a traversal of the logical product (in the FROM clause) to retrieve the attributes (in the SELECT clause) restricted by the predicate (in the WHERE clause). And again, sorted (as specified in the ORDER clause), formatted and aggregated (SELECT clause again).
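
As a small illustration of that clause-to-report mapping (the tables and
columns here are hypothetical, invented only for the sketch):

SELECT c.customer_name,               -- retrieve and format the attributes
       SUM(o.amount) AS total_amount  -- aggregate, as a report-writer would
  FROM Customers AS c,                -- traverse the logical product
       Orders AS o
 WHERE c.customer_id = o.customer_id  -- restrict by the predicate
 GROUP BY c.customer_name
 ORDER BY c.customer_name;            -- sort the accumulated intermediate

The comma-style FROM list is deliberate: it is literally the logical
product that the WHERE predicate then restricts.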

Database technology has come a long way since the 1970s when SQL was initially specified, and relational DBMSs do a lot more than report-writing. But to suggest that in the 1970s, the authors were interested in "logical inference" requires great poetic license compared to "replacing the report-writer capabilities of IMS", and feels like history rewritten.

Rob

Eric

unread,
Feb 17, 2013, 5:00:15 PM2/17/13
to
On 2013-02-16, paul c <toledob...@oohay.ac> wrote:
...
> From Codd 1970: "For expository reasons, we shall frequently make use of
> an array representation of relations, but it must be remembered
> that this particular representation is not an essential part of the
> relational view being expounded."

<snip properties of these arrays>

> It's pretty clear to me that the SQL originators missed Codd's point
> about this kind of array being merely for "expository reasons". So they
> implemented something that is more physical than the the logical kind of
> relations Codd had in mind. Perhaps they missed the point on purpose
> ...

I don't think it's all that clear, because Codd also says:

"The simplicity of the array representation which becomes feasible
when all relations are cast in normal form is not only an advantage
for storage purposes but also for communication of bulk data between
systems which use widely different representations of the data."

I think they did the wrong thing, but they did have that justification.

Eric
--
ms fnd in a lbry

Eric

unread,
Feb 17, 2013, 5:09:15 PM2/17/13
to
On 2013-02-17, Rob <rmps...@gmail.com> wrote:
> On Saturday, February 16, 2013 4:20:01 PM UTC-7, paul c wrote:
>> On 14/02/2013 8:03 PM, James K. Lowden wrote:
>> > SQL was intended, I'm sure you agree, to be a language for deriving
>> > logical inferences.
> [omitted text].

[more omitted text]

> Database technology has come a long way since the 1970s when SQL
> was initially specified, and relational DBMSs do alot more than
> report-writing. But to suggest that in the 1970s, the authors were
> interested in "logical inference" requires great poetic license compared
> to "replacing the report-writer capabilities of IMS", and feels like
> history rewritten.

To quote from Codd[1970]:

"The adoption of a relational model of data, as described above,
permits the development of a universal data sublanguage based on an
applied predicate calculus. A first-order predicate calculus suffices
if the collection of relations is in normal form."

I rather think there is a connection between "predicate calculus" and
"logical inference", yes? So perhaps they should have been interested.

Not that I am denying that the report-writing mentality had a strong
influence.

Derek Asirvadem

unread,
Sep 29, 2013, 8:46:00 PM9/29/13
to
On Monday, 18 February 2013 09:09:15 UTC+11, Eric wrote:

> On 2013-02-17, Rob <...@gmail.com> wrote:
>
> > On Saturday, February 16, 2013 4:20:01 PM UTC-7, paul c wrote:
>
> >> On 14/02/2013 8:03 PM, James K. Lowden wrote: > SQL was intended, I'm
>
> >> sure you agree, to be a language for deriving logical inferences.
>
> > [omitted text].
>
> [more omitted text]
>
> > Database technology has come a long way since the 1970s when SQL
> > was initially specified, and relational DBMSs do alot more than
> > report-writing. But to suggest that in the 1970s, the authors were
> > interested in "logical inference" requires great poetic license compared
> > to "replacing the report-writer capabilities of IMS", and feels like
> > history rewritten.
>
> To quote from Codd[1970]:
>
> "The adoption of a relational model of data, as described above,
> permits the development of a universal data sublanguage based on an
> applied predicate calculus. A first-order predicate calculus suffices
> if the collection of relations is in normal form."
>
> I rather think there is a connection between "predicate calculus" and
> "logical inference", yes? So perhaps they should have been interested.

The point is absurd.

Rob's point (as I see it) is that the asylum dwellers rewrite history, as they often do. Charging either the System R team, or the Sequel Team, or the SQL Committee (of old, before they were subverted), with having missed something that is relevant today, which was not a criterion in the 1970s, is simply dishonest.

No one has suggested that there is NOT a connection between the two, or that the subject should NOT be considered. But some people think that if the sky is not black, that means, by some insane "logic", that it must be white. There are no colours in-between; they cannot look upwards. We used to put them in asylums. Now they walk the streets. And write books.

Clearly, one of the indicators of insanity is the inability to discern the abstract from the real. Human beings are capable of contemplating both, and precisely because we can differentiate them. Higher intelligence is in part about the ability to think in theoretical or abstract terms. But one who cannot actually cross over to the real side of the fence and implement something abstract, in a way that works in the real world, is merely a raving lunatic with nothing to demonstrate their existence on the physical plane.

Those who can contemplate the abstract (limited to the fact that it is abstract) *and* specify or implement something physical (specifically *not* limited to the abstract, but limited to the physical) are normal. Not necessarily of higher intelligence.

Those who demand that the physical implementation must be limited to the limitations of the abstract, are lunatics.

Case in point. Date and Darwen and the whole TTM cult keep chanting the mantra that "SQL is broken", "SQL Suffers The Null Problem". Yes, yes, of course it does. The SQL Committee, in its hallowed state of specifying absurdity, has ensured that the language will never work *as specified*. Any vendor who implements SQL as a physical platform has to resolve that absurdity. Some do, e.g. Sybase, MS, DB2, so the people who purchase those products and implement applications and databases in those platforms are oblivious to The Null Problem which abstractionists scream is real. Yes, yes, it is a very real problem in the cranium, but not outside it. There is no null problem in the universe or the server or the database or the application.

But 30 years after the null problem was resolved, the cult keeps on chanting the mantra. You can't teach a neurotic that the sky has not fallen in in 30 years, therefore it is not falling in now. They are attached to their "reality", it is what makes them insane, special.

Other vendors implement the Null Problem. Products such as Oracle, which are passed off as "DBMS" and even "RDBMS" "platforms", have the Null Problem. It can't do simple subqueries without cacking itself, but it can do the Null Problem Really Well. If a functioning human being has to implement an application in Oracle, the onus would be on them to resolve the Null Problem demanded by the SQL Committee, present in the platform, and the absence of it, demanded by integrity (not abstract), of not contradicting oneself. So they would implement standards in both the database, and the application, such that it does not have the Null Problem disease which the platform provides. One is not required to use all aspects of the platform, just because it is there. Thousands of Oracle implementers have done so. They are normal.

And there are many more thousands of Oracle implementers who have *not* done so. They are abnormal. They spend years explaining insane behaviour as if it were normal. Typical of the insane, they will not take responsibility for the disease, they will point out that it is an acquired disease, and they infect anything they touch. They have made the insane sane, but that is the act of the insane. In their state, they have failed to make the discernment that normal humans make.

The sane do not enter into arguments with them. They are still chanting their beloved mantra, 30 years after it was resolved. They write books about it.

Cheers
Derek

Derek Asirvadem

unread,
Sep 29, 2013, 8:59:56 PM9/29/13
to
This one should not pass unnoticed.

On Monday, 18 February 2013 09:09:15 UTC+11, Eric wrote:
>
> To quote from Codd[1970]:
>
> "A firstorder predicate calculus suffices
> if the collection of relations is in normal form."

Codd specifies what that Normal Form is; that reference is not merely a generic or general term.

And the asylum dwellers propose that the RM has nothing to do with Normal Forms.

(It should be noted that they have instead defined, and marketed, various abnormal forms that have (a) nothing to do with the RM and (b) nothing to do with Normal Forms, as "normal forms". Clever, in the lunatic sense.)

There is a second, consequential point: if one understands and implements the RM in its un-perverted form, i.e. the Normal Form specified therein, "first order predicate logic suffices". I have scores of Relational databases in which that is true.

Cheers
Derek

Derek Asirvadem

unread,
Sep 29, 2013, 11:01:46 PM9/29/13
to
Rob

Thank you for an excellent post.

I would like to add a couple of points, if I may.

On Sunday, 17 February 2013 13:30:04 UTC+11, Rob wrote:
>
> Something that never seems to get addressed is the pre-history of DB technology before Codd, before SQL. Within that universe, IMS and IMS-like databases were used much as today to track the state of some model, but were queried by a mechanism called RPG (for Report Program Generator). The RPG spec could be given within COBOL or in a standalone interface, but the idea was to spit out retrieved data in a report formed as the interpretive software made a single pass over the linearized physical (after RDBMS, logical) order of the records, or segments as they were called. Any sorts or aggregates were computed after-the-fact from an intermediate containing the retrieved data in the logical/physical order in which it was accumulated.

Absolutely.

> So when the System R folks at IBM were defining Sequel (which became SQL), the objective was not logical inference, it was report-writing. After all, the whole point of Codd's thrust was to replace IMS (et al.) with a system that was immune to changes in the physical storage structure of the data. But that didn't obviate the major purpose of such databases, report-writing. If you examine the most primitive SQL form of the SELECT statement, it is a non-procedural specification of a traversal of the logical product (in the FROM clause) to retrieve the attributes (in the SELECT clause) restricted by the predicate (in the WHERE clause). And again, sorted (as specified in the ORDER clause), formatted and aggregated (SELECT clause again).

Absolutely.

The RM is applicable, by engineers, to data. Period. It is not merely the data model, it is about modelling data. It is not about the data sublanguage, but about how data is perceived, understood, modelled. (I believe Jan Hidders stated the same thing, in different words, in another post.)

I have two commercial instances of the following. I had a fairly complex Relational database that does both OLTP and DSS from a single set of tables. Completely faithful to the RM (the real RM, not the nonsense written up in books by the insane). Completely Normalised (Codd's Normalisation, plus Normalisation as a principle, not the absurdities of abstract "thinkers" that produce unnormalised models). I pride myself on the fact that I can produce any report (planned and unplanned) with a single SELECT statement. That includes various types of Pivots, etc. (without using the PIVOT function). Platform is Sybase ASE.

After that was well-established, I had the need to implement that db+app with no database footprint. I chose awk. The entire Sybase database, the rows (forget 'tuple', it is an implementation) from all reference or static tables, are implemented in a single file (segment), and the source data, which is site-specific, in a separate single file. The reference tables are a fully integrated data dictionary with constraints, etc. (allows customisation, and traps errors).

awk (just one example) is RPG, post 1970. I notice that programmers who have difficulty with awk simply do not understand RPG as a working principle, as a Method. I perform a single pass of the two datasets, and produce any of the same reports using the awk-equivalent of a single SELECT.

for ( row in awk_array ) {
    printf ...
}

Relations are as simple as:

for ( parent_row in parent_array ) {
    for ( child_row in child_array ) {
        printf ...
    }
}

Which means, ORDER BY is merely choosing a path through the Ordinals of some parent array (table).

Which also means, I never perform table scans (which is the common method of handling awk arrays); I read only the rows that are relevant to the query. On an ancient Unix box, the entire process, regardless of report complexity, runs at sub-second speed. 24 (hourly) or 48 (half-hourly) text reports such as this:
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/SEQUOIA_120806_sysmon_09

are converted into a "raw" grid such as this (ugly but readable, compared to 24x40 pages of text):
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Capture.pdf

or something more logical such as this:
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/Sysmon%20Processor%20Eg%20Date%20Trend.pdf

and with a few more clicks and drags, using Excel or Numbers, into something like this:
http://www.softwaregems.com.au/Documents/Article/Sysmon%20Processor/New%20Storage%20DiskGroup.pdf

awk has a beautiful implementation of arrays, one that easily provides for Relational tables. It also has a very nice method of handling the empty set (the Null Problem to the cultists) without any fanfare.

I use Codd's concept of Ordinals as well, but that is only for advanced implementers of the RM, those who understand surrogates without suffering from black-or-white thinking.

The point is, if one understands the RM, one can implement it in any language, any record filing platform. One does need to make the discernment between data and modelling the data, versus the data sublanguage for operating on the data.

(Darwen, who has no technical qualifications, and cannot program his way out of a cardboard box, but who has declared that he will invent a full "relational" language, has been unable to make that normal human discernment. Twenty years of dishonestly bashing the RM, but nothing to replace it. The padded cell waits.)

The implementation is a commercial product; if anyone has questions about details of the implementation, please note, there are *some aspects* that I will not be able to answer here. I will however answer them if you send me a private email.

Cheers
Derek
derek.a...@gmail.com

Eric

unread,
Sep 30, 2013, 4:41:41 PM9/30/13
to
On 2013-09-30, Derek Asirvadem <derek.a...@gmail.com> wrote:
> On Monday, 18 February 2013 09:09:15 UTC+11, Eric wrote:
>> On 2013-02-17, Rob <...@gmail.com> wrote:
>>> On Saturday, February 16, 2013 4:20:01 PM UTC-7, paul c wrote:
>>>> On 14/02/2013 8:03 PM, James K. Lowden wrote:
>>>>> SQL was intended, I'm sure you agree, to be a language for deriving
>>>>> logical inferences.
>>>> [omitted text].
>>
>> [more omitted text]
>>
>>> Database technology has come a long way since the 1970s when SQL
>>> was initially specified, and relational DBMSs do alot more than
>>> report-writing. But to suggest that in the 1970s, the authors were
>>> interested in "logical inference" requires great poetic license compared
>>> to "replacing the report-writer capabilities of IMS", and feels like
>>> history rewritten.
>>
>> To quote from Codd[1970]:
>>
>> "The adoption of a relational model of data, as described above,
>> permits the development of a universal data sublanguage based on an
>> applied predicate calculus. A first-order predicate calculus suffices
>> if the collection of relations is in normal form."
>>
>> I rather think there is a connection between "predicate calculus" and
>> "logical inference", yes? So perhaps they should have been interested.
>
> The point is absurd.

James said that SQL was intended to be a language for deriving logical
inferences.

Rob said that saying this required "great poetic license".

I suggested that there is evidence to the contrary.

Where is the absurdity?

> Rob's point (as I see it) is that the asylum dwellers rewrite history,
> as they often do. Charging either the System R team, or the Sequel Team,
> or the SQL Committee (of old, before they were subverted), with having
> missed something that is relevant today, which was not a criterion in
> the 1970s, is simply dishonest.

You are making this up (and most of the rest of it) to support your own
rather strange viewpoint.

snip ><

> Those who demand that the physical implementation must be limited to
> the limitations of the abstract, are lunatics.

The physical layer can do whatever the hell it likes, as long as it can
produce correct results efficiently enough. However it is supposed to
have an interface limited to the abstract model being supported (in this
case, Relational Theory). There are two levels here, and those who
cannot distinguish them are very wrong. This does not imply lunacy.
Unfortunately you seem to be one of those who fails to make the
distinction.

snip ><

> But 30 years after the null problem was resolved, ...

So tell me, in as few sentences as possible, what *you* think the Null
Problem is.

While you are at it, tell me what Oracle "features" show that it can "do
the Null Problem Really Well".

And finally, people can disagree with you, and even be wrong in doing
so, without being lunatics, or insane, or abnormal.

Jan Hidders

unread,
Oct 1, 2013, 7:59:46 AM10/1/13
to
Let me chime in here.

Derek, you and I probably share many points of view and are probably
fighting the same fight debunking dbdebunk, but I'm not cool with your
aggressive wording and think it is counterproductive.

-- Jan Hidders
