foreign key constraint versus referential integrity constraint

Keith H Duggar

unread,

Oct 21, 2009, 11:29:15 AM10/21/09

to

I'm trying to understand the relationship between referential
integrity and foreign key constraints. My example is suppose we
have the ubiquitous supplier who sends the usual Orders with
the usual info but also includes a Coupon that can be redeemed
on future Orders:

Orders { OrderID, CustomerID, ... usual stuff ..., CouponID }

and the database tracks if a Coupon is redeemed and if so the
order it was redeemed on:

Redemptions { CouponID, OrderID }

Assume there are relvars rvOrders and rvRedemptions of type
Orders and Redemptions respectively. So if a coupon is redeemed
an entry will be added to rvRedemptions with the OrderID of the
order coupon was redeemed on (not he order it was mailed out
with).

So Coupons may or may not be redeemed so there may be CouponID
appearing in rvOrders that have no entry in rvRedemptions.

Now it seems to me that 1) conceptually to me rvOrders:CouponID
is a foreign key to rvRedemptions 2) referential integrity is
violated since rvOrders can have CouponID not appearing in
rvRedemptions 3) this does not seem "evil" to me.

Please help me understand where my thinking is going wrong. To
that end here are some questions. Is rvOrders:CouponID in fact
a foreign key to rvRedemptions? Or is a foreign key constraint
a synonym for a referential integrity constraint? If it is not
a foreign key then what does one call the constraint that joins
on rvOrders:CouponID and rvRedemptions:CouponID are semantically
correct? If it is a foreign key then does it violate referential
integrity? If so is this evil and why?

Thanks in advance!

KHD

Tegiri Nenashi

unread,

Oct 21, 2009, 11:49:54 AM10/21/09

to

I prefer the term "inclusion dependency": projection of one relation
(that is rvRedemptions v [CouponID]) is a supepersetset of projection
of the other (i.e. rvOrders v [CouponID]). I thought that all three
terms are the same; perhaps with foreign key constraint adding some
insignificant matter, like the "smaller" set being unique.

Sampo Syreeni

unread,

Oct 21, 2009, 12:05:25 PM10/21/09

to

> So Coupons may or may not be redeemed so there may be CouponID
> appearing in rvOrders that have no entry in rvRedemptions.
>
> Now it seems to me that 1) conceptually to me rvOrders:CouponID
> is a foreign key to rvRedemptions

You have the direction wrong. Coupons are created and go out with
orders, so the set of coupons defined in rvOrders is the total set
known of by the system and serves to define the associated domain.
rvRedemptions has the set of coupons that have been redeemed, and of
course it's not possible to redeem an inexisting coupon. So there's an
inclusion dependency from rvRedemptions to rvOrders, which is then
enforced using a foreign key, in that direction. Notice that a foreign
key can't be used to enforce this constraint unless CouponID is a
candidate key of rvRedemptions.

> Or is a foreign key constraint a synonym for a referential integrity constraint?

It is an SQL mechanism, and not particularly powerful or convenient
one at that, to implement referential integrity constraints. On the
theoretical side the slightly more general concept of inclusion
dependency is used instead of referential integrity constraint; it
additionally applies to the case where multiple such constraints can
have arbitrary overlap in their attributes, and the "target" side need
not constitute a candidate key. The definition is "one projection must
remain the subset of another".
--
Sampo

Sampo Syreeni

unread,

Oct 21, 2009, 12:18:25 PM10/21/09

to

> I prefer the term "inclusion dependency": projection of one relation
> (that is rvRedemptions v [CouponID]) is a supepersetset of projection
> of the other (i.e. rvOrders v [CouponID]). I thought that all three
> terms are the same; perhaps with foreign key constraint adding some
> insignificant matter, like the "smaller" set being unique.

BTW, one limitation of foreign keys which I find particularly annoying
is that they only work when we're talking about base tables whereas
I've already bumped a few times into a situation where I would have
liked to constrain (on) the contents of a view. That sometimes happens
when you have to go beyond 3NF or you're working with a conceptual
model which allows multiple inheritance and/or union types. Do you
happen to know whether this sort of thing is formally covered by the
concept of inclusion dependency?
--
Sampo

Cimode

unread,

Oct 21, 2009, 3:00:10 PM10/21/09

to

Quite frankly, I do not see how that could bring any value in the
context of arbitrary cardinalities...

Regards...

paul c

unread,

Oct 21, 2009, 3:15:34 PM10/21/09

to

Perhaps a clue lies in asking whether it is a requirement that a coupon
is only issued once and if so, how the design should reflect that
(presumably a coupon can be redeemed only once).

(Since I first saw Codd's 1970 paper I've never been able to understand
why on the one hand he talked about "symmetrical" access but on the
other he used the very restrictive term "foreign key".) For me, a
"reference" is a reference even if most dbms' have similar restrictions
to Codd's term. It seems arbitrary and unnecessarily doctrinaire to
restrict references to only primary keys and unique indexes because I
think if one wants a particular constraint, one should not have specify
two constraints, eg., a 'reference' as well as a key or index. This
doesn't prevent a language from offering shortcuts that are understood
to involve both.)

Sampo Syreeni

unread,

Oct 21, 2009, 6:36:47 PM10/21/09

to

> (Since I first saw Codd's 1970 paper I've never been able to understand
> why on the one hand he talked about "symmetrical" access but on the
> other he used the very restrictive term "foreign key".) For me, a
> "reference" is a reference even if most dbms' have similar restrictions
> to Codd's term.

I think the later talk about domains and e-relations in RM/T (and to a
lesser extent in RM/V2) makes that a bit easier to understand. As I
interpret it, Codd started with a very clean and purely syntactic
model in his seminal article, which will definitely lead to pure
symmetry in access. But from the very start he also worried deeply
about the semantics and making them explicit as metadata, which leads
to the second influential discovery of his: the integral, reflective
data dictionary. Later on his development of the relational model not
only added surrounding functionality (which weighs in so heavily in RM/
V2, what with all of'em T-joins and the lot) but semantics as well.

I think what was driving Codd was the need to make his model fully
self-contained. If you really think about it, that tends to exclude
any and all free-form input by the user. I mean, let's say you let an
end user input a free form name. It's pretty much guaranteed that lots
of nonsense is going to be input then. After that the name field
doesn't really constitute a well-defined domain at all; to a
mathematician like Codd its proper type should probably be more or
less 'general-ape-shit;no-real-type;use-at-own-risk'. I.e. the field
could be straightforwardly likened to a blob of text as we call the
more sustained kind of drivel today; certainly not something one would
make into a domain, and not something which would fit Codd's neat idea
of formal, repetitive business data.

So, what he did was to go the way we'd now call Master Data
Management. He went on to separate all of the data we'd intuitively
call entities, and more formally would define as a) relatively stable
data b) centrally and well managed as a fully controlled vocabulary,
so that c) applying the closed world assumption to it is actually well
founded because there is a formal process to keep the real world and
the minimal model of the database trying to approximate it in close
correspondence, d) attaching to things that have real, societally
established permanence, uniqueness and omnipresence which can be
likened to "a candidate key on an established class of real world
entities which is at most slowly varying", and most of all e)
discrete, enumerated data that is mostly used to classify/qualify
other data, the latter usually being composed of abstract measurements
on a continuous scale (originally account balances, given that we're
after all talking about developments which happened within the
International *Business* Machines Corporation). What we'd nowadays
call "complex", "semistructured", "unstructured", "unconventional" or
even "nasty" data was explicitly excluded by Codd from the very
beginning.

That then lead to a sort of revival of the earlier intuitive idea of
entities as opposed to relationships. Only this time there was a clear
distinction in the criteria: relations were always relations, and they
represented a theory of the outlying reality that was perfectly
symmetric. Nevertheless, the connection to the real world was also
taken into account in that some relations had additional semantics
bolted on in addition to just the logical constraints. They referred
to real world entities with the kinds of properties that would make it
especially easy to keep the correspondence between the minimal model
of the DB and the real world the DB was trying to represent in synch.

The difference to today's MDM and OLAP-dimension frenzy was then that
the logical level design criteria were fully unaffected by such
semantic, real world concerns. They remained completely syntactic, as
they should have. The only thing added was the idea that some
relations which happened to have a certain commonly reoccurring
structure, and in addition were meant to semantically correspond to
real life in what was also a commonplace correspondence, could have
those facts formally notarized in the data dictionary, using a well-
known syntax.

That, and only that, was what made a relation an e-relation. You
neither could nor had to reference the e-relation, but still its
semantics necessarily made it so that a) it made no sense to refer to
any real life modelable thing that wasn't already modeled by the e-
relation, b) thus semantically speaking we had an inclusion dependency
towards the e-relation already, c) of course all such semantic
constraints should be enforced if possible, d) a foreign key
constraint is pretty much The Mechanism to do that in the databases of
the past couple of decades, and finally e) foreign keys seem an awful
lot like unidirectional links, eventhough they could and should be a
lot more besides.

So, in addition to the semantics, we also have the direct implication
that an e-relation fully defines a domain (though not all domains need
to be defined via such enumeration). Then it's quite clear that all
all relations referring in any way to the kind of real life object the
e-relation represents, should be limited to utilizing the
corresponding domain/type determined by it as the type of the
attribute. (In SQL databases that means declaring foreign keys
referring to primary keys. In my ideal database it would amount to
declaring domains, and in addition to those also full inclusion
dependencies which could span any combination of attributes on both
sides, perhaps even using views in the middle to cater for the more
complex cases; I take constraints rather seriously.)

At the logical level all that was done was to declare some
dependencies, syntactic constraints, and telling a little bit about
how to read/update the data to a live person, using a controlled
vocabulary to do so (e.g. "social security number is the primary key
and is supposed to uniquely identify persons; thus if you have a
duplicate number, it's semantically fucked, leaving you only the
option of gathering even more data, until you can again make the
minimal model of the database correspond to reality").

But of course the all-round inclusion dependency also closely mimics
earlier referential semantics in its effects, and the usual way we
tend to use data typically reinforces the notion. You rarely see the
symmetry ("there is no link, there are just equal values"). Still, I
can tell you that when you do ad hoc OLAP queries long enough, you'll
eventually find yourself starting the query "against the tide". That
is then when you finally get it -- you simply couldn't have done it in
the presence of a directional link without "going through the whole
fucking thing using a for-loop" -- and off you are, into the
relational brotherhood. :)

> It seems arbitrary and unnecessarily doctrinaire to
> restrict references to only primary keys and unique indexes because I
> think if one wants a particular constraint, one should not have specify
> two constraints, eg., a 'reference' as well as a key or index. This
> doesn't prevent a language from offering shortcuts that are understood
> to involve both.)

That is arbitrary indeed, and one of the few things where genuine
progress has been made in the theory of dependency and normalization
on the issues Codd originated. I also think that it is one of the very
few points where Codd as well fell victim to the earlier record
centric thought -- "referential integrity" often implies the mentality
that we're "linking" from some dependent data to stuff that "is
defined and primarily lives at another place". That is precisely the
record and index centric way of doing things, whereas the central
relational innovations were to a) place such references squarely at
the logical level, which makes them fully symmetric (i.e. in the
current case we might as well say that rvOrders refers to
rvRedemptions as we could say the opposite; it's not about directional
linking, but about referring to the same logical entity in two
different places at once, and just defining the propositional
semantics correctly), b) focus attention on the formal properties,
modelling and constraints (e.g. inclusion dependencies) within the
data instead of playing by real-world intuition alone, and c)
abolishing even the idea that base entities and relationships should
somehow be separated (i.e. what matters is only the formally
verifiable, syntactic dependencies that will necessarily result from
faithfully modeling the data, which then reveals that relationships
and base entities really behave much the same way).
--
Sampo

paul c

unread,

Oct 21, 2009, 8:47:28 PM10/21/09

to

Sampo Syreeni wrote:
...

> I think the later talk about domains and e-relations in RM/T (and to a
> lesser extent in RM/V2) makes that a bit easier to understand. As I
> interpret it, Codd started with a very clean and purely syntactic
> model in his seminal article, which will definitely lead to pure
> symmetry in access. But from the very start he also worried deeply
> about the semantics and making them explicit as metadata, which leads
> to the second influential discovery of his: the integral, reflective
> data dictionary. Later on his development of the relational model not
> only added surrounding functionality (which weighs in so heavily in RM/
> V2, what with all of'em T-joins and the lot) but semantics as well.

> ...

Regarding semantics, far be it from me to try to describe Codd's
motives. While I think it's certain he later on veered from his
original principles, I think no one will ever know if there were
personal or more general reasons that drove him to the so-called
"semantic model", I suspect it was because of various forces that we are
all subject to from time to time. For all I know he might have been
bored with the lack of uptake of his ideas and thought a little
speculation might stimulate progress. Unfortunately most people, who
pretended to read his stuff, including apparently the original system R
implementors, chose to fasten on one sentence or paragraph, sometimes
taking that as the pre-eminent gospel and ignoring the rest. What he
wrote is either full of caveats between the lines or extremely
case-based from the get-go. Pretty much every quote of Codd's I see is
out-of-context.

Mr. Scott

unread,

Oct 21, 2009, 9:59:54 PM10/21/09

to

"Keith H Duggar" <dug...@alum.mit.edu> wrote in message
news:76f4fd43-fa76-4c6f...@r36g2000vbn.googlegroups.com...

A foreign key constraint is an inclusion dependency and an inclusion
dependency is a referential constraint, but there are referential
constraints that are not inclusion dependencies and there are inclusion
dependencies that are not foreign key constraints. For example, a
constraint that states that if there is a row in one table there cannot be a
corresponding row in a different table is neither an inclusion dependency
nor a foreign key constraint but is still a referential constraint
nonetheless. Also, a constraint that states that whenever there is a row in
one table there must be at least one corresponding row in another is not a
foreign key constraint, but it is still an inclusion dependency.

I think the first problem with your example above is that there are three
different kinds of facts about three different kinds of objects that need to
be represented. There are orders, there are coupons, and there are
redeptions. A coupon is associated with the order that it was shipped with,
but there may be orders that aren't shipped with coupons. A redemption
associates a coupon with an order, presumably other than the one that the
coupon was shipped with. So basically, the three kinds of facts are those
that assert that there is an order, O, those that assert that a coupon was
shipped with a particular order, C, and those that assert that a coupon was
redeemed on a particular order, R. C implies O because coupons are shipped
with orders, R implies C because a there has to be a coupon before it can be
redeemed, and also R implies O because there has to be an order on which to
redeem the coupon.

Since there are three kinds of facts, there should be three tables:

one for orders, O{O#}

one for coupons, C{C#, O#}
with a foreign key from C[O#] to O[O#],

and one for redemptions, R{C#, CO#, O#}
with foreign keys from R[O#] to O[O#]
and from R[C#, CO#] to C[C#, O#],
along with the constraint that CO# != O#

To be able to enforce the constraint CO# != O# requires a foreign key
constraint that is not just key based but superkey based.

Hope this helps.

Keith H Duggar

unread,

Oct 21, 2009, 10:54:30 PM10/21/09

to

I'm guessing this is just a limitation of some particular products?
Because if I'm understanding "The Principle of Interchangeability"
that for example Date's discusses in "Databases In Depth" then the
RM has nothing to say about the arbitrary distinction between base
versus virtual relvars (views). So in principle one should be able
to define constraints on any relvars base or otherwise at least in
so far as the RM is concerned. Is this correct?

KHD

paul c

unread,

Oct 21, 2009, 10:57:18 PM10/21/09

to

Date's POI has this caveat about "no unnecessary" which always troubles
me. I would like to know if Codd ever said his foreign key couldn't be
defined against his views.

Keith H Duggar

unread,

Oct 21, 2009, 11:22:06 PM10/21/09

to

Thanks to everyone. I think I understand more clearly now the
inherent asymmetry of foreign key constraints. But now then my
question is what is the common name for the semantic constraint
that "attribute RV1:X and RV2:Y are sematincally joinable"?

Because simply having the same type is not enough for example
both OrderID and CouponID might both have type GenericID and
yet we may want to state that it doesn't make sense to join
them. Likewise attribute names for joinable attributes may be
different in different relations (ie needing rename).

In other words suppose we go with Mr. Scott's reformulation
but with some additional type information (name : type) and
attribute name changes in Redemptions for argument sake

Orders {
OrderID : GenericID }

Coupons {
OrderID : GenericID ,
CouponID : GenericID }

Redemptions {
CID : GenericID ,
OID : GenericID }

where CID is the redeemed coupon's ID and OID is the ID of the
order on which the coupon was redeemed (not the order the coupon
was shipped with).

So as above, we cannot tell it is ok to join CID with CouponID
from the attribute name, since they are different, nor can we
assume they can be joined simply because they are both type of
GenericID because for example so is OID which is not appropriate
to join. So if we wanted to express the additional constraint
that is semantically appropriate to join CouponID with CID
what kind of constraint would this be? Likewise if we wanted
to express it is semantically inappropriate to join OrderID
and CID?

By the way, Mr. Scott, I removed O# from your R because it
seemed redundant to me since we can recover the OrderID of the
order the coupon was shipped with from the Orders relation (if
we assume that CouponID is a candidate key of Coupons). Is
that not correct?

Thanks again for the help!

KHD

Sampo Syreeni

unread,

Oct 22, 2009, 7:34:22 AM10/22/09

to

> Thanks to everyone. I think I understand more clearly now the
> inherent asymmetry of foreign key constraints. But now then my
> question is what is the common name for the semantic constraint
> that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because
that's what makes it possible to enforce them automatically, simply by
manipulating the symbols in the database. So what you actually mean
is, what syntactic construct is appropriate for enforcing the
semantics you describe. I'd argue that that's what domains were
invented for.

> Because simply having the same type is not enough for example
> both OrderID and CouponID might both have type GenericID and
> yet we may want to state that it doesn't make sense to join
> them.

I see at least two ways to view this. The first would be that you just
made a modelling error -- you assigned the same syntactic type to two
things that are actually quite different. What you should have done is
to assign them two different types/domains, which would then stop them
from being joined.

The second way would be to say that you've now implicitly created a
union type (Order|Coupon), and its key, GenericID, has to be such that
OrderID's and CouponID's never overlap. That way you can join willy
nilly, but the results will be identical to the case where you've
isolated the ID's into two separate types. Carried out in full this
approach eventually leads to a generalization hierarchy converging
towards a fully generic Object, and thus database wide object/unique
identifiers.

It is well known that this is rather a contentious and dangerous
design choice. In fact we can already see the first signs of trouble
in your minimal example: what on earth do Orders and Coupons have to
do with each other? What precisely are you trying to express by
creating the union type? In order for that generalization to make
sense, at the very minimum you should be able to point to some
hypothetical field that could be shared between the two types, i.e.
data belonging uniquely to the union type. What would that be,
precisely?

If you have the means of keeping such ID's private -- this would
necessitate making the ID's fully opaque from an outside user's point
of view, which cannot be done in any DBMS I know of -- you might just
be able to justify their existence as surrogates. But this is still
stuff that requires tremendous discipline and insight into the entity
integrity issues that need to be addressed in addition to the
referential integrity between the surrogates. I wouldn't recommend
going down this slippery slope unless you're *absolutely* sure you
know what you're doing -- I haven't seen a single wider scale
deployment which got this right and didn't suffer integrity issues in
the long run.

> Likewise attribute names for joinable attributes may be
> different in different relations (ie needing rename).

Yes. Ideally the attributes wouldn't carry just a name and a physical
level type (e.g. string, integer) but also a semantic type/domain.
That way the name would indicate the semantics, or role, of the
attribute in this given relation, the domain would indicate what it
can be joined with and where to look for the set of permissible
values, and the syntacting type associated with the domain would tell
how to actually store the data.
--
Sampo

Nilone

unread,

Oct 22, 2009, 8:40:52 AM10/22/09

to

On Oct 22, 5:22 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
> Thanks to everyone. I think I understand more clearly now the
> inherent asymmetry of foreign key constraints. But now then my
> question is what is the common name for the semantic constraint
> that "attribute RV1:X and RV2:Y are sematincally joinable"?
>
> Because simply having the same type is not enough for example
> both OrderID and CouponID might both have type GenericID and
> yet we may want to state that it doesn't make sense to join
> them. Likewise attribute names for joinable attributes may be
> different in different relations (ie needing rename).

If two attributes have the same domain and are semantically
comparable, I would call that a type. So I would call OrderID and
CouponID types, and GenericID would be their supertype. It just so
happens that the attributes of these types in Orders and Coupons have
the same name as the name of their type.

Orders { OrderID : OrderID }
Coupons { OrderID : OrderID, CouponID : CouponID }
Redemptions { CID : CouponID, OID : OrderID }
OrderID <: GenericID
CouponID <: GenericID

Mr. Scott

unread,

Oct 22, 2009, 9:40:08 AM10/22/09

to

"Keith H Duggar" <dug...@alum.mit.edu> wrote in message
news:ee296634-d56d-4b09...@l33g2000vbi.googlegroups.com...

If you use generic ids then I think you would have to explicitly specify the
ISA relationships between the objects represented in the database, but if
you use a separate domain or type for each kind of object, then the
relationships become implicit. I would like to explore this further, but I
don't have the time right now.

>
> By the way, Mr. Scott, I removed O# from your R because it
> seemed redundant to me since we can recover the OrderID of the
> order the coupon was shipped with from the Orders relation (if
> we assume that CouponID is a candidate key of Coupons). Is
> that not correct?

You can indeed recover the OrderID from the coupon in a join, but how would
you declare the constraint that coupons can't be applied to the order with
which they were shipped?

paul c

unread,

Oct 22, 2009, 12:45:33 PM10/22/09

to

Mr. Scott wrote:
...

> A foreign key constraint is an inclusion dependency and an inclusion
> dependency is a referential constraint, but there are referential
> constraints that are not inclusion dependencies and there are inclusion
> dependencies that are not foreign key constraints. For example, a
> constraint that states that if there is a row in one table there cannot be a
> corresponding row in a different table is neither an inclusion dependency
> nor a foreign key constraint but is still a referential constraint

> nonetheless. ...

Tempts me to use a new term (at least I'm guessing it's new) - exclusion
dependency, even if there is nothing new about what it connotes. I
think it is hard to separate the influence of the typical dbms product
that doesn't allow one to express all the possibilities that an algebra
allows (maybe easier for me being quite ignorant of most products). This
makes talk of theory harder so most people end up habitually confusing
various product documentation with theory. Personally, given that any
dbms is likely to have a number of practical limitations, I don't see
why a dbms couldn't allow restricted use of negation so that a foreign
'reference'/exclusion dependency might be read as "A{attr} = A{attr} AND
(NOT B{attr})", which is basically of the same form as any inclusion
dependency or what I call a reference. (Any dbms that has a 'delete'
operator already supports similar negation implictly.) When people talk
of referential integrity I suspect that they are usually talking about
such an equation. Also suspect that we have a number of qualified names
for that basic form (of which 'primary key' was apparently the first)
simply because using the terms in a dbms' language makes it easy for
implementers to physically optimize. IMHO the implementation artifacts
don't really contribute to, nor involve, any essential theory except for
the usually ignored theory of optimization.

(I once read an interview of Codd where he more or less acknowledged
that in his first two papers he was struggling to find the best terms to
use. Eg., he said that the American 'normalization' of relations with
China inspired him to use same term. I've read where CJ Date hadn't
heard of this, but I recall the interview, probably from a magazine
called 'DBMS' around 1994.)

paul c

unread,

Oct 22, 2009, 1:22:01 PM10/22/09

to

Sampo Syreeni wrote:
> ... What we'd nowadays

> call "complex", "semistructured", "unstructured", "unconventional" or
> even "nasty" data was explicitly excluded by Codd from the very
> beginning.

> ...

He was dealing with structured data all along. What he excluded was two
things i) unnecessary structure, ii) hidden structure.

Sampo Syreeni

unread,

Oct 22, 2009, 3:16:53 PM10/22/09

to

> Tempts me to use a new term (at least I'm guessing it's new) - exclusion
> dependency, even if there is nothing new about what it connotes.

I've seen that term being used, although far less commonly than is the
case for inclusion. There is not a whole lot of theory on that sort of
thing. (Which is why I forgot referential integrity can take that form
as well; Thanks, Mr. Scott!)

I've never seen a real life schema where this sort of constraint was
being enforced. Quite possibly because people don't come to think of
the logical presence, and thus the necessity of the constraint. The
main context in which I have in fact seen these used is a theoretical,
data modelling one. There the commonest case is to enforce the
distinction between plain and disjoint union types.

In practical, relational use that however mostly translates into a
data model which places the separate pieces of a disjoint union into
separate tables, and so makes the enforcement implicit. (Cf.
dependency preservation and all that.) The other kind of union then
tends to yield tables with lots of correlatedly null columns, or a
constellation of tables with a shared key (often a surrogate/OID/
whatever in case the participants in the union are different enough;
unsurprisingly the pattern often also leads to redundant columns as
well).

> Personally, given that any
> dbms is likely to have a number of practical limitations, I don't see
> why a dbms couldn't allow restricted use of negation so that a foreign
> 'reference'/exclusion dependency might be read as "A{attr} = A{attr} AND
> (NOT B{attr})", which is basically of the same form as any inclusion
> dependency or what I call a reference.

Well, not quite the same form because it's more general. But no, there
is no reason why that sort of functionality couldn't be available. And
in fact, even SQL supports it nowadays, via assertions, eventhough
just about nobody implements that part of the standard.

That's really quite a shame. Every single form of dependency I can
think of could be asserted that way, and the wider use of assertions
instead of more specialised forms of integrity constraint would
basically for RDBMS vendors to treat constraints in a fully
declarative, generalized, high level fashion. And if you then think
about it, solving that problem would also pretty much solve the
problem of incremental maintenance of materialized views, and vice
versa.

> Also suspect that we have a number of qualified names
> for that basic form (of which 'primary key' was apparently the first)
> simply because using the terms in a dbms' language makes it easy for
> implementers to physically optimize.

Quite so. More specifically, if you special case for just a few forms
of constraints (or other functionality at that), you can then special
case your underlying implementation to deal particularly well with
those special cases. That can considerably simplify implementation and
lead to substantially higher performance in those cases which fit into
your defined interface. Just like in our original example here:
foreign keys as a concept have been heavily optimized by always
demanding that the target is a candidate key and unconditionally
creating a a unique index for it.

That optimization comes at the price of generality. So when we think
about the relational model in all, such solutions don't really fit in
too well. I mean, from the very start relational data management has
been about making the model and its support machinery completely
general and declarative. No difference between fields at the logical
level. No difference between entities and relationships. No difference
in the level at which data reside, unlike in hierarchical databases.
No difference between pointers and the data pointed to, because
pointers are explicitly forbidden. No difference between table and
view. And so on.

Thus what the relational model would like is purely declarative,
algebraically complete assertions as the means of guaranteeing
integrity. Not specialized mechanisms like foreign keys -- those
belong in the same era as pointers to data do.

> IMHO the implementation artifacts
> don't really contribute to, nor involve, any essential theory except for
> the usually ignored theory of optimization.

In this regard, there is a slight complication, though. If you think
about foreign keys, they have one extra aspect in addition to the
inclusion dependency they declare has to hold. That is the procedural
rule used to deal with impending violation.

Roughly there are two general approaches to this, and SQL allows a
limited form of both wrt foreign keys: 1) deny and roll back the
transaction if a violation is about to happen, or the hazier, far more
complicated option of 2) making it so that the update takes effect,
and then using whatever means to automatically propagate the logical
consequences of the update to once again make the integrity to fully
and globally check out.

As I said, even SQL has the latter in the limited form of "on delete
cascade". But in the fully general form, those operational semantics
would be exceedingly complex, and evenmore complex to specify exactly
if full generality was to be expected. So in that sense, we do have
some very serious theory going on in here, and the kind of theory that
hasn't been fully fleshed out yet even in the best of DB literature.
--
Sampo

Keith H Duggar

unread,

Oct 24, 2009, 1:53:07 PM10/24/09

to

On Oct 22, 7:34 am, Sampo Syreeni <de...@iki.fi> wrote:
> > Thanks to everyone. I think I understand more clearly now the
> > inherent asymmetry of foreign key constraints. But now then my
> > question is what is the common name for the semantic constraint
> > that "attribute RV1:X and RV2:Y are sematincally joinable"?
>
> Hmm. Usually we talk about constraints when they're syntactic, because
> that's what makes it possible to enforce them automatically, simply by
> manipulating the symbols in the database. So what you actually mean
> is, what syntactic construct is appropriate for enforcing the
> semantics you describe. I'd argue that that's what domains were
> invented for.

I agree that it is possible to enforce such join semantics with
domains but I'm not sure it is entirely appropriate. I say that
because to the greatest extent the RM is orthogonal to domains.
To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals
only with rather simple types: numbers, strings, perhaps dates
and times, and not much else. In this chapter, I've tried to show
that this is indeed a misconception. Rather, relations can have
attributes of /any type whatsoever/ -- the relational model
nowhere predicates what those types must be, and in fact they
can be as complex as we like ... In other words, the question as
to what types are supported is orthogonal to the question of
support for the relational model itself. Or (less precisely but
more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and
effective for a universe with a single domain as it is for a
universe with a rich domain set (such as you propose to handle
my GenericID join constraint example). And placing constraints
on /relational expressions/ seems as fundamental to the RM as
placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are
admitting that the RM itself has no support for constraining
relational expressions and I'm not prepared to admit that
limitation. Especially since it does provide support for
constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms
to operate the other way around ie to allow designers to force
cross-domain operations. Codd with "Domain Check Overrides" and
date with the THE_ operators that provide type coercion.

> > Because simply having the same type is not enough for example
> > both OrderID and CouponID might both have type GenericID and
> > yet we may want to state that it doesn't make sense to join
> > them.
>
> I see at least two ways to view this. The first would be that you just
> made a modelling error -- you assigned the same syntactic type to two
> things that are actually quite different. What you should have done is
> to assign them two different types/domains, which would then stop them
> from being joined.
>
> The second way would be to say that you've now implicitly created a
> union type (Order|Coupon), and its key, GenericID, has to be such that
> OrderID's and CouponID's never overlap. That way you can join willy
> nilly, but the results will be identical to the case where you've
> isolated the ID's into two separate types. Carried out in full this
> approach eventually leads to a generalization hierarchy converging
> towards a fully generic Object, and thus database wide object/unique
> identifiers.

I think there are other views, see above. And I don't think either
of the views is accurate in this case and certainly may not apply
to every case one can imagine. In short, we should not let our lack
of imagination guide our design principles.

> It is well known that this is rather a contentious and dangerous
> design choice. In fact we can already see the first signs of trouble
> in your minimal example: what on earth do Orders and Coupons have to
> do with each other? What precisely are you trying to express by
> creating the union type? In order for that generalization to make
> sense, at the very minimum you should be able to point to some
> hypothetical field that could be shared between the two types, i.e.
> data belonging uniquely to the union type. What would that be,
> precisely?

It's not for us to wonder such whys because our imagination is
usually quite limited. For example, GenericID seems an entirely
reasonable thing to me if I labelled every item I sent out (even
the coupons) with a barcode and have other relations expressing
facts about those barcodes apart from what they label. But
arguing about such specific design semantics I think is irrelevant
to basic questions of what constraint capabilities the relational
model supports.

Anyhow, the question here is not one of our imagination but rather
simply this: if it makes sense for the RM to support constraints
on relational /values/ (taken on by variables) why does it not
make sense to support constraints on relational /expressions/?
That is a question of general principle not specific design.

> If you have the means of keeping such ID's private -- this would
> necessitate making the ID's fully opaque from an outside user's point
> of view, which cannot be done in any DBMS I know of -- you might just
> be able to justify their existence as surrogates. But this is still
> stuff that requires tremendous discipline and insight into the entity
> integrity issues that need to be addressed in addition to the
> referential integrity between the surrogates. I wouldn't recommend
> going down this slippery slope unless you're *absolutely* sure you
> know what you're doing -- I haven't seen a single wider scale
> deployment which got this right and didn't suffer integrity issues in
> the long run.

Frankly I don't understand the point of the above paragraph at
all. Also I don't see how entity integrity enters into this at
all; I'm assuming we follow the sage advice of having no NULLs
to begin with; so entity integrity is non-issue.

> > Likewise attribute names for joinable attributes may be
> > different in different relations (ie needing rename).
>
> Yes. Ideally the attributes wouldn't carry just a name and a physical
> level type (e.g. string, integer) but also a semantic type/domain.
> That way the name would indicate the semantics, or role, of the
> attribute in this given relation, the domain would indicate what it
> can be joined with and where to look for the set of permissible
> values, and the syntacting type associated with the domain would tell
> how to actually store the data.

I don't think I'm talking about "physical level" types (posreps)
at all. And again, I'm not convinced of your conclusion that the
responsibility of constraining /relational/ expressions should be
shoved off to domains.

Consider Date's point that RM is orthogonal to type support.
Then what argument from principle do you have for prohibiting
the RM from defining constraints on relational expressions by
pushing that responsibility off to domain support? How would
we express the constraints in a universe with one domain?

Thanks!

KHD

Bob Badour

unread,

Oct 24, 2009, 3:13:32 PM10/24/09

to

Keith H Duggar wrote:

Domains, however, define what is or is not joinable because natural join
applies the equality comparison to attributes with like names, and
domains define equality comparisons.

Keith H Duggar

unread,

Oct 24, 2009, 5:03:23 PM10/24/09

to

Domains also define a set of permissible values for attributes.
However, that doesn't stop RM from adding additional constraints
to limit those values further. Why should relational operations
be any different in this respect from relational variables? Ie
why shouldn't the RM allow us to further constrain what domains
permit for both values AND operations?

KHD

Bob Badour

unread,

Oct 24, 2009, 7:07:09 PM10/24/09

to

Keith H Duggar wrote:

You asked about semantic joinability. Semantically, the join works if
like-named attributes have an equality comparison defined and not if
they don't.

> Why should relational operations
> be any different in this respect from relational variables? Ie
> why shouldn't the RM allow us to further constrain what domains
> permit for both values AND operations?

Because the RM doesn't have anything much to say about domains. Domains
are a separate--albeit vitally important--concern.

paul c

unread,

Oct 24, 2009, 11:47:17 PM10/24/09

to

Bob Badour wrote:
...

> You asked about semantic joinability. Semantically, the join works if
> like-named attributes have an equality comparison defined and not if
> they don't.

> ...

In RT, how could they not define equality?

Bob Badour

unread,

Oct 24, 2009, 11:56:24 PM10/24/09

to

paul c wrote:

What reason would one have to define an equality comparison to compare
employee ids with department ids or with names? Or charge with current
or potential? Or age with price?

paul c

unread,

Oct 24, 2009, 11:56:09 PM10/24/09

to

None in theory, I thought you were talking about attributes in the same
domain.

Cimode

unread,

Oct 25, 2009, 3:24:01 AM10/25/09

to

Hi paul,

A valid set of questions (pun intended). My conclusion were that Codd
was focused onto defining a side of the relational model that would be
part of information system as opposed as being a part of mathematics
since relational domain analysis is more loosely coupled with set
theory than relational general model. While relational domain analysis
is not mandatory in the context of the general relational model, it is
mandatory in the context of implementation of RM. It even makes
header based definitions somehow moot...IMHO...

Regards...

com...@hotmail.com

unread,

Oct 25, 2009, 4:08:59 AM10/25/09

to

On Oct 24, 8:56 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> What reason would one have to define an equality comparison to compare
> employee ids with department ids or with names? Or charge with current
> or potential? Or age with price?

shoot your age and you are worth your weight in gold.

Bob Badour

unread,

Oct 25, 2009, 10:40:45 AM10/25/09

to

paul c wrote:

Nope. Natural join doesn't care about attributes in the same domain per
se. It compares like named attributes regardless of domain.

TTM explains when and how the most specific types can differ and still
have an equality comparison return true: they must share some common
supertype other than the universal supertype. In this case, the MST of
the join will be the most specific common supertype. If the MSTs of the
original relations are mutually exclusive, the body will be empty. If
not, the join body will consist of values drawn from subtypes of both
MSTs and that appear in both relations.

Otherwise, the body of the join will be empty and the type of the like
named attribute will be the universal supertype. As a type with a set of
values and a set of operations, the universal supertype's set of values
is the union of all values, and its set of operations is empty.

I expect most implementations would treat an attribute with the
universal supertype as an error, or at least complain loudly with a warning.

Bob Badour

unread,

Oct 25, 2009, 10:44:56 AM10/25/09

to

com...@hotmail.com wrote:

My age in what measure? Years? Integral or rational? If integral,
cardinal or ordinal? According to what calendar?

paul c

unread,

Oct 25, 2009, 3:55:34 PM10/25/09

to

I can't make much comment on type theory, which I think is optional, not
essential to RT. Appendix A, the relational underpinning of TTM,
doesn't depend on MST's or inheritance, just set membership except that
set equality crops up for unions of headings, as far as I can tell. It
does say, eg., in the <AND> definition: "It is required that if <A,T1>
'is member of' Hr1 and <A,T2> 'is member of' Hr2, then T1 = T2" (edited
quote). I presume those qualified attributes are the "like named" ones
you mean. While I wouldn't argue when you say "it compares like named
attributes regardless of domain", TTM does require that the "state is
set" beforehand, Appendix A uses the qualification to ensure that the
type for both attributes is the same.

Bob Badour

unread,

Oct 25, 2009, 4:07:25 PM10/25/09

to

paul c wrote:

Which still leaves Keith with the same answer: data types or domains
determine the semantics of joinability. In a system that supports
specialization and generalization, the attributes have to share a common
supertype. In a system that does not support specialization and
generalization, the types must be the same.

Marshall

unread,

Oct 25, 2009, 6:05:31 PM10/25/09

to

On Oct 24, 10:53 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
>
> Anyhow, the question here is not one of our imagination but rather
> simply this: if it makes sense for the RM to support constraints
> on relational /values/ (taken on by variables) why does it not
> make sense to support constraints on relational /expressions/?
> That is a question of general principle not specific design.

This question, it seems to me, is clear and to the point.
And I would answer it by saying that we shouldn't really
even make the distinction! (At least not formally.)

There are quite a lot of constraint taxonomies out there,
and I haven't ever really been able to derive any particular
value of any of them. Most generally, ever constraint
is a constraint on the entire database; if the constraint
mentions only a single variable, then we might say
it is a "table constraint" or "relvar constraint" or some
such, but that's only a degenerate case of a database
constraint. So let us say there is only one type of
constraint, the database constraint, and be done with it.

Now that I've said that, let me take it back a bit.
Transition constraints don't seem to be quite like
anything else. Transition constraints are therefore
annoying. :-)

It's Sunday and I'm hot, so I'm posting with
reckless abandon. Read my sentences at your
own risk.

Marshall

unread,

Oct 25, 2009, 6:09:53 PM10/25/09

to

On Oct 24, 12:13 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
> Domains, however, define what is or is not joinable because natural join
> applies the equality comparison to attributes with like names, and
> domains define equality comparisons.

That may be the most practical perspective, but from a theoretical
perspective, I find myself lately thinking that every value of every
domain is theoretically able to be compared for equality with
every value of every domain. If we are considering comparing
an integer and a string, we can say with confidence ahead of
time that the comparison will return that they are unequal.

I also am suspicious of the idea of allowing a domain designer
to define equality. From a theoretical standpoint, the system
ought to know enough to figure it out, and from a practical
standpoint, I've seen enough class designers screw it up
with subtle and annoying consequences. Damn them!

Marshall

(Still hot; still grumpy.)

Sampo Syreeni

unread,

Oct 25, 2009, 6:47:08 PM10/25/09

to

Sampo Syreeni

unread,

Oct 25, 2009, 6:54:16 PM10/25/09

to

On Oct 24, 7:53 pm, Keith H Duggar <dug...@alum.mit.edu> wrote:

(Sorry for the resend, Google Groups seems to be taunting me...)

> I agree that it is possible to enforce such join semantics with
> domains but I'm not sure it is entirely appropriate.

Uhm, being the practical kind of guy, I wonder where you're going with
this. I mean, you too agree that domains/types do get the job done.
Was there something else you wanted in addition to that?

> To put this another way, the RM should be just as complete and
> effective for a universe with a single domain as it is for a
> universe with a rich domain set (such as you propose to handle
> my GenericID join constraint example).

And it is. There is no reason why the relational model wouldn't work
just fine with only a single type, such as an integer. (Encoding then,
say, textual data, would become somewhat convoluted, but it could be
done.) So the type system is indeed completely orthogonal to the
model.

However, that only means that when we model data, the relational model
by itself isn't sufficient to fully handle all of our conceptual
modelling needs. It's a very general theory, and as such it can
benefit from the addition of more specific constraints and semantics,
for example in the form of a rich type system.

From this point of view I would then say that your original question
about joinability is something that sits squarely within the type
system, and thus is itself fully orthogonal to the relational model.
Trying to somehow make the skeletal relational model responsible for
enforcing such constraints seems much akin to asking why the hammer
ain't working too well with the cream, right after casting aside the
whisk.

> By pushing such expression constraints off to domains, we are
> admitting that the RM itself has no support for constraining
> relational expressions and I'm not prepared to admit that
> limitation. Especially since it does provide support for
> constraining the values of relational variables.

To reiterate, it seems to me that you're now trying to have your cake
and eat it too. First you say that domains, a particular
implementation of a type system, are outside of the relational model.
Then you go on to claim that the relational model has a problem
because in this amputated form it cannot handle a problem that is
caused by the lack of a type system.

The way I see it, both Date and Codd advocate a view of the relational
model which already includes a rich type system, and actively utilizes
it to express e.g. join semantics. Domains, they're Codd's particular
idea of how to handle typing in a form that actually reflects much of
the relevant metadata about the types back into a shared encoding that
is accessible under relational semantics. Granted, there will be other
ways of handling types, but Codd's idea doesn't seem all that bad to
me, and is also amenable to straightforward extension should we find
it somehow lacking.

> I will also note that both Codd and Date proposed RM mechanisms
> to operate the other way around ie to allow designers to force
> cross-domain operations. Codd with "Domain Check Overrides" and
> date with the THE_ operators that provide type coercion.

True, but I would be inclined to believe that in practice those
mechanisms would only be used to circumvent modelling errors, before
the conceptual model was changed to explicitly allow the join. Most
commonly that would probably include the addition of a common
supertype, which Bob Badour already mentioned as a necessary
prerequisite for joinability. Under the domain model, this fact would
also be explicitly recorded in the data dictionary.

> I think there are other views, see above.

As I said, I could think of "at least two" ways to view it. However, I
don't think your quotation from Date falls outside of either of them.

> In short, we should not let our lack of imagination guide our design principles.

There we agree fully. My point is just that a) yes, joinability is
something that probably should be explicitly constrained in the
database, b) domains at least get the job done, c) unlike abstract
typing, domains also reflect type metadata back into a formal, shared
and accessible form within the database which is a clear plus, so that
d) in the absence of a viable alternative, they or something closely
related to them is what we should use until we think of something even
better.

> It's not for us to wonder such whys because our imagination is
> usually quite limited. For example, GenericID seems an entirely
> reasonable thing to me if I labelled every item I sent out (even
> the coupons) with a barcode and have other relations expressing
> facts about those barcodes apart from what they label.

In that case, it's the systematic barcoding of certain real world
items that makes the union type sensible. The union type could then be
given the name of BarcodedItem, and could perhaps have its own
dependent data, like type of barcode, printing method, and so on.

My point was simply that you have to have a valid, real life/business
reason, or semantic justification for the union type/generalization.
Since you just gave one, I'm happy with that. At the same time you
should then note that under these semantics, you cannot add to the
union type willy nilly. For example including people in it is out of
the picture, unless you actually insist on branding your customers
with a barcode. ;)

> Anyhow, the question here is not one of our imagination but rather
> simply this: if it makes sense for the RM to support constraints
> on relational /values/ (taken on by variables) why does it not
> make sense to support constraints on relational /expressions/?

I don't really see what the connection with this and the above is, but
again we're in full agreement: indeed support for constraints on
relational expressions should be available. And it in fact would be if
RDBMS vendors just implemented generic support of SQL assertions.

> > If you have the means of keeping such ID's private -- this would
> > necessitate making the ID's fully opaque from an outside user's point
> > of view, which cannot be done in any DBMS I know of -- you might just
> > be able to justify their existence as surrogates. But this is still
> > stuff that requires tremendous discipline and insight into the entity
> > integrity issues that need to be addressed in addition to the
> > referential integrity between the surrogates. I wouldn't recommend
> > going down this slippery slope unless you're *absolutely* sure you
> > know what you're doing -- I haven't seen a single wider scale
> > deployment which got this right and didn't suffer integrity issues in
> > the long run.
>
> Frankly I don't understand the point of the above paragraph at
> all.

The point is that when you start creating union types, and so the keys
used to them willy nilly, you tend to rapidly get to the point where
the resulting space of identifiers has no semantic relevance. What you
then have is a space of invented, autogenerated, and so on,
identifiers. Those cannot be correlated against the real world
entities they represent in any way except by reference to the data
dependent on them. What you then have is surrogate keys, and if you
take a look at RM/T and RM/V2, Codd makes an eloquent case against
revealing them to the users. Because making them truly opaque is then
not supported by current RDBMSes, I would hesitate to use them at all.

> Also I don't see how entity integrity enters into this at
> all; I'm assuming we follow the sage advice of having no NULLs
> to begin with; so entity integrity is non-issue.

Heh, I did say going that way requires tremendous discipline and
insight...

At its most general, entity integrity has nothing to do with nulls.
Rather it has to do with whether you can reliably correlate the data
in your database with real life facts. It is essentially a semantic
construct which tries to ensure that it is possible to unambiguously
update the contents of your database against the changing reality
we're trying to capture/encode.

This is not a process that can be done strictly at the logical,
relational level. All we can do there is to to put up syntactic
constraints to help in the process along. The main, commonest,
simplest, and especially easiest to implement example is the key. It
tries to capture the semantics of real life things/entities a) always
having a certain set of characteristics we can use to correlate them
with the data we have on them (i.e. the not null syntactic
constraint), and b) being separable from each other once we're privy
to that set of characteristics (i.e. the uniqueness constraint).

That means that entity integrity is only certain to be maintained when
a) the set of facts we actually have on real life objects/concepts is
sufficient to uniquely identify their representation in the database,
b) in full generality, that does *not* mean that all of the data on
the primary key is available, only that all of the data required to
identify any candidate key is present, and that in case c) does not
mean that the identification has to take a form that only involves one
table; the data might be distributed all over your database, and in
general will be if you're using something other than full, natural
keys to link your data together.

Maybe a recycled example could make this clearer... Suppose you have
persons and the cars that they own. You're keying them both using
surrogates, which obviously have no real world significance because
you cannot take a look at a person or a car and immediately read off
the corresponding surrogate. (Tattooing the number on each person or
carving it into the manufacturer's stamp in a car would make the key
natural, and not a surrogate.) You can clearly have a not null and
unique primary key for each of the entities, but that no longer
guarantees entity integrity: there might be more than one person
called John Smith, and more than two cars with the make of Audi. In
order to maintain entity integrity, the syntactic constraint that the
surrogates be not null and unique is not sufficient; when you try to
update your database to reflect reality, you don't necessarily know
which John Smith is which, and certainly you can't tell one Audi apart
from the other. In particular, you don't have any reliable means of
stopping somebody from adding yet another John Smith or Audi that are
already described in your database; suddenly duplicates and the
resulting update anomalies will abound.

So the first lesson is that surrogates don't really work towards
entity integrity. Only keys referring to real life, measurable/
observable things, that is natural keys, do that.

Then, secondly, there is the more complicated case of distributed
information over multiple tables. Suppose that mess of surrogates you
have tells you that there are two John Smith's, which cannot be easily
told apart from each other. But then you notice that doing a join
against the table of vehicles owned tells you that the two Johns own
mutually disjoint sets of vehicles, which you *can* tell apart from
each other by their licence plate numbers (i.e. a natural key).
Suddenly you can tell those people apart at some level of certainty --
and counterwise if the sets overlapped, you could be perfectly certain
that entity integrity has been violated, the two Johns are actually
the same person, their records need to be unified/deduplicated, and
the update has to be cascaded into the rest of the database to reflect
the new knowledge.

Here the lesson is that keys and the related single table reasoning
aren't all there is to entity integrity, and that surrogates usually
make the situation so complicated that establishing entity integrity
even at the syntactic level can easily force you to scan through your
entire database. Which of course nobody ever does when using
surrogates, which then almost surely leads to integrity issues in the
long run.

Formally speaking, then, entity integrity is an inseparable part of
integrity as a whole, and it involves not just syntax but semantics as
well; especially model theory. Since a database essentially contains a
logical theory, the first part is about internal consistency; the
theory has to be consistent in order to have a real life model in the
first place. The second, more difficult part is that the theory also
has to be refined enough to exclude real life models which are
inappropriate, such as the one where we have one John Smith in the
database and a million of them in the wild, with no means of telling
which one is which at the logical level. And finally, third, the
hardest part is to strike the proper engineering balance between
maintainable detail, availability of real life identifying data and
the processes of efficiently correlating reality with the symbolic
universe represented by the database.

More often than not, then, surrogates and the improper generalizations
they imply hinder this process. Most often by creating a false sense
of complacency which obscures the real, underlying, logical issues
which need to be addressed. Then the result is cumulative damage to
overall integrity via simple laziness and neglect.

> I don't think I'm talking about "physical level" types (posreps)
> at all.

You seem to be, because your conflating the two types of identifiers,
simply because they're, well, "identifiers". That don't mean that they
semantically are of the same type, eventhough they might look like
that when they carry the same physical type (posrep).

> Consider Date's point that RM is orthogonal to type support.
> Then what argument from principle do you have for prohibiting
> the RM from defining constraints on relational expressions by
> pushing that responsibility off to domain support?

None. Thus I think we're actually in what they call "vigorous
agreement".

> How would we express the constraints in a universe with one domain?

In a highly convoluted fashion. Such a convoluted fashion that I'm not
inclined to give an example. Nor would I probably be inclined to go
through such an exercise when designing a database.

That, to me, suggests that types and domains simply make life easier,
and should be actively utilized when modelling data.
--
Sampo

Bob Badour

unread,

Oct 25, 2009, 7:15:22 PM10/25/09

to

Marshall wrote:

> On Oct 24, 12:13 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
>>Domains, however, define what is or is not joinable because natural join
>>applies the equality comparison to attributes with like names, and
>>domains define equality comparisons.
>
> That may be the most practical perspective, but from a theoretical
> perspective, I find myself lately thinking that every value of every
> domain is theoretically able to be compared for equality with
> every value of every domain. If we are considering comparing
> an integer and a string, we can say with confidence ahead of
> time that the comparison will return that they are unequal.

For the comparison, that's fine. But what is the type of the resulting
attribute? We are discussing natural join.

Suppose we have 2 relations:

state: {
(id in char(2),name in string) |
('AK','Alaska'), ('NY', 'New York) ...
}

inventory_bin: {
(id in integer, contents in string) |
( 1, 'Soup' ), ( 2, 'Nuts' ) ...
}

When we evaluate state join inventory_bin, we know the body is empty,
but what is the type of the id attribute?

state join inventory_bin: {
(id in ??,name in string,contents in string) |
}

> I also am suspicious of the idea of allowing a domain designer
> to define equality. From a theoretical standpoint, the system
> ought to know enough to figure it out, and from a practical
> standpoint, I've seen enough class designers screw it up
> with subtle and annoying consequences. Damn them!

I suppose in TTM, equality is implicit based on type and any arbitrary
possrep.

> Marshall
>
> (Still hot; still grumpy.)

Yeah, it was hot here today to. It got all the way up into the 50's F.
But to be fair, there was a nice gusty wind to take the edge off the
heat. ;)

paul c

unread,

Oct 25, 2009, 8:05:47 PM10/25/09

to

> ...

Not that I'm advocating it but I imagine there would be nothing
illogical if the model (just another word for 'interpretation') let the
heading be (id in char(2), id in integer, name ... etc.) even if that's
contrary to TTM, not to mention sql convention. Surely the problem has
less to do the technical heading than with preferring a language to
reference only by name?

Bob Badour

unread,

Oct 25, 2009, 9:22:14 PM10/25/09

to

paul c wrote:

It would be very illogical. Names are very important things. What does
id mean if you pretend it means two mutually exclusive things?

paul c

unread,

Oct 25, 2009, 9:43:15 PM10/25/09

to

Bob Badour wrote:
> paul c wrote:

...
>> Not that I'm advocating it but I imagine there would be nothing
>> illogical if the model (just another word for 'interpretation') let
>> the heading be (id in char(2), id in integer, name ... etc.) even if
>> that's contrary to TTM, not to mention sql convention.
>
> It would be very illogical. Names are very important things. What does
> id mean if you pretend it means two mutually exclusive things?

Obviously you wouldn't, you would have to name the type as well as the
attribute. Personally, another alternative wouldn't bother me, eg.
excluding both both attributes from the heading union.

Bob Badour

unread,

Oct 25, 2009, 10:07:29 PM10/25/09

to

paul c wrote:

A sometimes disappearing attribute that is the very basis for
comparison? That makes no sense to me whatsoever.

Marshall

unread,

Oct 26, 2009, 1:17:02 AM10/26/09

to

Well. If the system is untyped, the question doesn't even
come up. (Not that I would ever use such a system, but
I mention it for completeness.)

In a typed system, the type is whatever the intersection
of int and string is. Since we know, as mentioned, that
there are no values that are both ints and strings, we
certainly know that whatever the type is, it will be
uninhabited. (That is, it has no values.) So either "the
intersection of int and string" or else simply the empty
type, aka "bottom" or "void" or what have you.

Now that I've said all that, I agree that there are
good arguments for saying (how did you put it?)
you get an error or a warning or something like that
instead.

> > I also am suspicious of the idea of allowing a domain designer
> > to define equality. From a theoretical standpoint, the system
> > ought to know enough to figure it out, and from a practical
> > standpoint, I've seen enough class designers screw it up
> > with subtle and annoying consequences. Damn them!
>
> I suppose in TTM, equality is implicit based on type and any arbitrary
> possrep.

It makes sense to me.

> > Marshall
>
> > (Still hot; still grumpy.)
>
> Yeah, it was hot here today to. It got all the way up into the 50's F.
> But to be fair, there was a nice gusty wind to take the edge off the
> heat. ;)

On the weekends I often hang out with my oldest daughter at
the local Starbucks; I sip coffee and she people-watches. I
sit and she prefers to stand. If we go in the afternoon that
means I'm in the shade and she's not. Today we went at
twelve thirty, and she got sunburned.

Marshall

Bob Badour

unread,

Oct 26, 2009, 1:52:25 AM10/26/09

to

Marshall wrote:

Union not intersection. It has to be a type compatible with both integer
and string.

> Since we know, as mentioned, that
> there are no values that are both ints and strings, we
> certainly know that whatever the type is, it will be
> uninhabited.

It's not the type that is uninhabited only the relation body. The
relation body is empty not because the type is empty but because no
integer values in the type happen to equal any string values in the type.

> (That is, it has no values.) So either "the
> intersection of int and string" or else simply the empty
> type, aka "bottom" or "void" or what have you.

It has to be "top" not "bottom".

> Now that I've said all that, I agree that there are
> good arguments for saying (how did you put it?)
> you get an error or a warning or something like that
> instead.
>
>
>>>I also am suspicious of the idea of allowing a domain designer
>>>to define equality. From a theoretical standpoint, the system
>>>ought to know enough to figure it out, and from a practical
>>>standpoint, I've seen enough class designers screw it up
>>>with subtle and annoying consequences. Damn them!
>>
>>I suppose in TTM, equality is implicit based on type and any arbitrary
>>possrep.
>
> It makes sense to me.
>
>
>>>Marshall
>>
>>>(Still hot; still grumpy.)
>>
>>Yeah, it was hot here today to. It got all the way up into the 50's F.
>>But to be fair, there was a nice gusty wind to take the edge off the
>>heat. ;)
>
> On the weekends I often hang out with my oldest daughter at
> the local Starbucks; I sip coffee and she people-watches. I
> sit and she prefers to stand. If we go in the afternoon that
> means I'm in the shade and she's not. Today we went at
> twelve thirty, and she got sunburned.

Ouch!

com...@hotmail.com

unread,

Oct 26, 2009, 2:41:42 AM10/26/09

to

I was just (humorously seriously) giving examples of such peculiar
comparisons that I'd say have reasons. Otherwise I agree with you.
Certainly the corresponding expressions depend on
the variable and function predicates.

philip

Mr. Scott

unread,

Oct 26, 2009, 3:06:29 AM10/26/09

to

"Marshall" <marshal...@gmail.com> wrote in message
news:386975f9-472c-4184...@r24g2000prf.googlegroups.com...

> On Oct 24, 10:53 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
>>
>> Anyhow, the question here is not one of our imagination but rather
>> simply this: if it makes sense for the RM to support constraints
>> on relational /values/ (taken on by variables) why does it not
>> make sense to support constraints on relational /expressions/?
>> That is a question of general principle not specific design.
>
> This question, it seems to me, is clear and to the point.
> And I would answer it by saying that we shouldn't really
> even make the distinction! (At least not formally.)

I think we should make the distinction, and formally.

(p /\ q) -> r is not the same as (p -> r) /\ (q -> r)
but (p \/ q) -> r is the same as (p -> r) \/ (q -> r)

A view consisting of a natural join, for example, represents a set of
conjunctions. Each row of the join represents a conjunction of
propositions, one for each operand. A constraint defined on a join would be
of the form (p /\ q) -> r. That is definitely not the same as constraints
defined on one or more tables, which would take the form (p \/ q) -> r.

paul c

unread,

Oct 26, 2009, 4:43:55 AM10/26/09

to

Bob Badour wrote:
...

> A sometimes disappearing attribute that is the very basis for
> comparison? That makes no sense to me whatsoever.

Loopholes usually annoy the systematic mind. Only mystics can make the
incomparable such as these attributes disappear but they can be ignored
with enough willpower. From the day it was built, my father's Ford
FANCON (no, I didn't make the spelling error) had many loopholes but
during the times it was operational he preferred to drive rather than
walk.

Some systems will cop out with an error but without being able to inform
exactly what the error really is. That's because loopholes create their
own logic.

paul c

unread,

Oct 26, 2009, 5:00:14 AM10/26/09

to

Mr. Scott wrote:
> "Marshall" <marshal...@gmail.com> wrote in message
> news:386975f9-472c-4184...@r24g2000prf.googlegroups.com...
>> On Oct 24, 10:53 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
>>> Anyhow, the question here is not one of our imagination but rather
>>> simply this: if it makes sense for the RM to support constraints
>>> on relational /values/ (taken on by variables) why does it not
>>> make sense to support constraints on relational /expressions/?
>>> That is a question of general principle not specific design.
>> This question, it seems to me, is clear and to the point.
>> And I would answer it by saying that we shouldn't really
>> even make the distinction! (At least not formally.)
>
> I think we should make the distinction, and formally.
>
> (p /\ q) -> r is not the same as (p -> r) /\ (q -> r)
> but (p \/ q) -> r is the same as (p -> r) \/ (q -> r)
>
> A view consisting of a natural join, for example, represents a set of
> conjunctions. Each row of the join represents a conjunction of
> propositions, one for each operand. A constraint defined on a join would be
> of the form (p /\ q) -> r. That is definitely not the same as constraints
> defined on one or more tables, which would take the form (p \/ q) -> r.
>

...

I guess the attitude, interpretation if you like, that relational ops
implement logic leads to that but another attitude is that they merely
apply logic to obtain relations that consist of simple propositions. I
believe most people happily accept the latter interpretation when
looking at a relation value that has been obtained by a language devices
such as insert or assignment where the definition is based on union.
The 'OR' disappears. I think there is a big difference between the
implementation and the application of logic. Another question is what
happens to the join's conjunction when we project, does it survive or
not depending on which attributes we choose?

paul c

unread,

Oct 26, 2009, 5:25:10 AM10/26/09

to

Cimode, I tend to look at it the same way, at least the part about the
information system motive, not sure about headers. If I recall it
wasn't until his second paper that he introduced the pair of attribute
and domain name, he was adjusting his theory to meet practice.

com...@hotmail.com

unread,

Oct 26, 2009, 5:51:26 AM10/26/09

to

Keith H Duggar <dug...@alum.mit.edu> wrote:

> what is the common name for the semantic constraint
>that "attribute RV1:X and RV2:Y are sematincally

Concerns about "semantic joinability" are confused.

Relation operators, attributes and expressions
correspond to logic operators, (bindable) variables
and expressions. So predicates of relation variables
plus predicates corresponding to functions determine
the semantics of relation expressions. (Note that you
need an EQUALS function with two appropriately
typed parameters whenever a relation operator
evaluation uses an attribute's values to remove
duplicate tuples or to compare tuples from multiple
arguments.)

The predicates entirely determine the possible
queries and their meanings. If you don't make
mistakes in understanding predicates then you will
never want to write a query that can't be expressed
in terms of the predicates and answered in terms of
relations.

The predicates entirely determine valid database
states. Attribute types and constraint expressions are
logically redundant. What they do is constrain
updates to valid database states. If you don't make
mistakes in observing the modeled world and
evaluating the predicates in it then you will never
want to make an update that violates a constraint.

Your problem is you don't know that you should be
given the predicates. Your difficulties arise from this.

> why shouldn't the RM allow us to further constrain what domains

> permit for both values AND [relational] operations?

I hope it is now clear that "semantic" limiting of
relation expressions is confused.

There is actually a limit on logic expressions, namely
those that correspond to relation expressions. You can
only write those that involve AND, AND NOT, OR of
expressions with the same free variables, EXISTS,
RENAME and functions (which must include implicitly
needed EQUALSs). Otherwise evaluation cost is a
function of type cardinalities instead of just the
number of tuples in the database.

philip

vldm10

unread,

Oct 26, 2009, 6:20:52 AM10/26/09

to

On Oct 21, 11:36 pm, Sampo Syreeni <de...@iki.fi> wrote:
> > (Since I first saw Codd's 1970 paper I've never been able to understand
> > why on the one hand he talked about "symmetrical" access but on the
> > other he used the very restrictive term "foreign key".) For me, a
> > "reference" is a reference even if most dbms' have similar restrictions
> > to Codd's term.
>
> I think the later talk about domains and e-relations in RM/T (and to a
> lesser extent in RM/V2) makes that a bit easier to understand. As I
> interpret it, Codd started with a very clean and purely syntactic
> model in his seminal article, which will definitely lead to pure
> symmetry in access. But from the very start he also worried deeply
> about the semantics and making them explicit as metadata, which leads
> to the second influential discovery of his: the integral, reflective
> data dictionary. Later on his development of the relational model not
> only added surrounding functionality (which weighs in so heavily in RM/
> V2, what with all of'em T-joins and the lot) but semantics as well.
>
> I think what was driving Codd was the need to make his model fully
> self-contained. If you really think about it, that tends to exclude
> any and all free-form input by the user. I mean, let's say you let an
> end user input a free form name. It's pretty much guaranteed that lots
> of nonsense is going to be input then. After that the name field
> doesn't really constitute a well-defined domain at all; to a
> mathematician like Codd its proper type should probably be more or
> less 'general-ape-shit;no-real-type;use-at-own-risk'. I.e. the field
> could be straightforwardly likened to a blob of text as we call the
> more sustained kind of drivel today; certainly not something one would
> make into a domain, and not something which would fit Codd's neat idea
> of formal, repetitive business data.
>
> So, what he did was to go the way we'd now call Master Data
> Management. He went on to separate all of the data we'd intuitively
> call entities, and more formally would define as a) relatively stable
> data b) centrally and well managed as a fully controlled vocabulary,
> so that c) applying the closed world assumption to it is actually well
> founded because there is a formal process to keep the real world and
> the minimal model of the database trying to approximate it in close
> correspondence, d) attaching to things that have real, societally
> established permanence, uniqueness and omnipresence which can be
> likened to "a candidate key on an established class of real world
> entities which is at most slowly varying", and most of all e)
> discrete, enumerated data that is mostly used to classify/qualify
> other data, the latter usually being composed of abstract measurements
> on a continuous scale (originally account balances, given that we're
> after all talking about developments which happened within the
> International *Business* Machines Corporation). What we'd nowadays
> call "complex", "semistructured", "unstructured", "unconventional" or
> even "nasty" data was explicitly excluded by Codd from the very
> beginning.
>
> That then lead to a sort of revival of the earlier intuitive idea of
> entities as opposed to relationships. Only this time there was a clear
> distinction in the criteria: relations were always relations, and they
> represented a theory of the outlying reality that was perfectly
> symmetric. Nevertheless, the connection to the real world was also
> taken into account in that some relations had additional semantics
> bolted on in addition to just the logical constraints. They referred
> to real world entities with the kinds of properties that would make it
> especially easy to keep the correspondence between the minimal model
> of the DB and the real world the DB was trying to represent in synch.
>
> The difference to today's MDM and OLAP-dimension frenzy was then that
> the logical level design criteria were fully unaffected by such
> semantic, real world concerns. They remained completely syntactic, as
> they should have. The only thing added was the idea that some
> relations which happened to have a certain commonly reoccurring
> structure, and in addition were meant to semantically correspond to
> real life in what was also a commonplace correspondence, could have
> those facts formally notarized in the data dictionary, using a well-
> known syntax.
>
> That, and only that, was what made a relation an e-relation. You
> neither could nor had to reference the e-relation, but still its
> semantics necessarily made it so that a) it made no sense to refer to
> any real life modelable thing that wasn't already modeled by the e-
> relation, b) thus semantically speaking we had an inclusion dependency
> towards the e-relation already, c) of course all such semantic
> constraints should be enforced if possible, d) a foreign key
> constraint is pretty much The Mechanism to do that in the databases of
> the past couple of decades, and finally e) foreign keys seem an awful
> lot like unidirectional links, eventhough they could and should be a
> lot more besides.
>
> So, in addition to the semantics, we also have the direct implication
> that an e-relation fully defines a domain (though not all domains need
> to be defined via such enumeration). Then it's quite clear that all
> all relations referring in any way to the kind of real life object the
> e-relation represents, should be limited to utilizing the
> corresponding domain/type determined by it as the type of the
> attribute. (In SQL databases that means declaring foreign keys
> referring to primary keys. In my ideal database it would amount to
> declaring domains, and in addition to those also full inclusion
> dependencies which could span any combination of attributes on both
> sides, perhaps even using views in the middle to cater for the more
> complex cases; I take constraints rather seriously.)
>
> At the logical level all that was done was to declare some
> dependencies, syntactic constraints, and telling a little bit about
> how to read/update the data to a live person, using a controlled
> vocabulary to do so (e.g. "social security number is the primary key
> and is supposed to uniquely identify persons; thus if you have a
> duplicate number, it's semantically fucked, leaving you only the
> option of gathering even more data, until you can again make the
> minimal model of the database correspond to reality").
>
> But of course the all-round inclusion dependency also closely mimics
> earlier referential semantics in its effects, and the usual way we
> tend to use data typically reinforces the notion. You rarely see the
> symmetry ("there is no link, there are just equal values"). Still, I
> can tell you that when you do ad hoc OLAP queries long enough, you'll
> eventually find yourself starting the query "against the tide". That
> is then when you finally get it -- you simply couldn't have done it in
> the presence of a directional link without "going through the whole
> fucking thing using a for-loop" -- and off you are, into the
> relational brotherhood. :)
>
> > It seems arbitrary and unnecessarily doctrinaire to
> > restrict references to only primary keys and unique indexes because I
> > think if one wants a particular constraint, one should not have specify
> > two constraints, eg., a 'reference' as well as a key or index. This
> > doesn't prevent a language from offering shortcuts that are understood
> > to involve both.)
>
> That is arbitrary indeed, and one of the few things where genuine
> progress has been made in the theory of dependency and normalization
> on the issues Codd originated. I also think that it is one of the very
> few points where Codd as well fell victim to the earlier record
> centric thought -- "referential integrity" often implies the mentality
> that we're "linking" from some dependent data to stuff that "is
> defined and primarily lives at another place". That is precisely the
> record and index centric way of doing things, whereas the central
> relational innovations were to a) place such references squarely at
> the logical level, which makes them fully symmetric (i.e. in the
> current case we might as well say that rvOrders refers to
> rvRedemptions as we could say the opposite; it's not about directional
> linking, but about referring to the same logical entity in two
> different places at once, and just defining the propositional
> semantics correctly), b) focus attention on the formal properties,
> modelling and constraints (e.g. inclusion dependencies) within the
> data instead of playing by real-world intuition alone, and c)
> abolishing even the idea that base entities and relationships should
> somehow be separated (i.e. what matters is only the formally
> verifiable, syntactic dependencies that will necessarily result from
> faithfully modeling the data, which then reveals that relationships
> and base entities really behave much the same way).
> --
> Sampo

E-relations in RM/T have been introduced in order for the properties
of an entity to be represented as binary relations. As Codd wrote,
“split into as many binary relations as there are properties to be
recorded.” In these binary relations, surrogates have a key role. E.
Codd writes in RM/T, “Each surrogate appearing in this e-attribute
uniquely identifies the entity being described.”
E. Codd, however, does not show or solve the only important thing in
this paper : how to decompose any relation into binary relations. In
fact, in the RM/T he only expressed the desire for any relation to be
represented through binary relations.

E. Codd is undoubtedly among the most significant people in the
history of computer science and his work is of great and lasting
value. However, the e-relations and surrogates introduced in his paper
RM/T don’t have theoretical significance.

Vladimir Odrljin

Mr. Scott

unread,

Oct 26, 2009, 9:15:43 AM10/26/09

to

"paul c" <toledob...@oohay.ac> wrote in message
news:yydFm.50294$PH1.7159@edtnps82...

I'm not really sure what you mean by "The 'OR' disappears." An insert into
a union view is inherently ambiguous.

The propositions represented in the rows of a projection imply the
propositions represented in the operand of the projection. That's why when
you insert through a projection, the columns on the operand that are not
represented must either allow nulls or have a default constraint defined.

Bob Badour

unread,

Oct 26, 2009, 9:26:42 AM10/26/09

to

com...@hotmail.com wrote:

> On Oct 25, 7:44 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
>>com...@hotmail.com wrote:
>>
>>>On Oct 24, 8:56 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>>
>>>>What reason would one have to define an equality comparison to compare
>>>>employee ids with department ids or with names? Or charge with current
>>>>or potential? Or age with price?
>>
>>>shoot your age and you are worth your weight in gold.
>>
>>My age in what measure? Years? Integral or rational? If integral,
>>cardinal or ordinal? According to what calendar?
>
> I was just (humorously seriously) giving examples of such peculiar
> comparisons that I'd say have reasons.

But surely you understand that's what type casts are for. Right?

paul c

unread,

Oct 26, 2009, 11:16:38 AM10/26/09

to

Mr. Scott wrote:
...

> I'm not really sure what you mean by "The 'OR' disappears." An insert into
> a union view is inherently ambiguous.

> ...

I wasn't thinking of views. The person who wishes to assert all the
propositions in a base relation AND some other proposition uses some
language verb that is defined on UNION. Eg., if A is base, INSERT I
INTO A 'loses' the 'OR'. He has used a disjunction to form a conjunction.

(By the way, I thought the regulars here knew better than to get me
going on view updates! But since the door has opened if briefly, I'll
comment that I've never understood why insert to base is treated as an
instruction that produces a conjunction but the definition of a union
view is not treated as such an instruction. I gather this is generally
considered either heretical or cranky, possibly because it might play
hob with de Morgan etc.)

Tegiri Nenashi

unread,

Oct 26, 2009, 12:27:58 PM10/26/09

to

On Oct 25, 11:06 pm, "Mr. Scott" <do_not_re...@noone.com> wrote:
> I think we should make the distinction, and formally.
>
> (p /\ q) -> r is not the same as (p -> r) /\ (q -> r)
> but (p \/ q) -> r is the same as (p -> r) \/ (q -> r)

I don't follow. If these are BA expressions with the "->" as material
implication, then

(p v q) -> r = ~(p v q) v r = (~p ^ ~q) v r
(p -> r) v (q -> r) = (~p v r) v (~q v r) = (~p v ~q) v r

If the "->" is interpreted as deduction symbol (that is partial
boolean lattice order), then

(p v q) < r is not the same as (p < r) or (q < r)

Mr. Scott

unread,

Oct 26, 2009, 3:01:43 PM10/26/09

to

"Tegiri Nenashi" <tegiri...@gmail.com> wrote in message
news:84d21c7e-c6df-48a3...@12g2000pri.googlegroups.com...

You're right.

(p /\ q) -> r is not the same as (p -> r) /\ (q -> r)

but (p \/ q) -> r is.

Tegiri Nenashi

unread,

Oct 26, 2009, 3:51:46 PM10/26/09

to

On Oct 26, 11:01 am, "Mr. Scott" <do_not_re...@noone.com> wrote:
> "Tegiri Nenashi" <tegirinena...@gmail.com> wrote in message

(p ^ q) -> r = (p -> r) v (q -> r)

and, dually

(p v q) -> r = (p -> r) ^ (q -> r)

(Everything is perfectly symmetric in boolean algebra).

Now, before getting into constraint classification, what language do
you suggest to express them in? Is it RC, RA, or something else?

Message has been deleted

com...@hotmail.com

unread,

Oct 27, 2009, 12:33:31 AM10/27/09

to

On Oct 26, 6:26 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> But surely you understand that's what type casts are for. Right?

(boolean)1!

philip

Marshall

unread,

Oct 27, 2009, 12:52:22 AM10/27/09

to

I don't see how that would work. An integer isn't type compatible
with a string. Is it?

The other points you made seem to be variations on the
same question.

> > On the weekends I often hang out with my oldest daughter at
> > the local Starbucks; I sip coffee and she people-watches. I
> > sit and she prefers to stand. If we go in the afternoon that
> > means I'm in the shade and she's not. Today we went at
> > twelve thirty, and she got sunburned.
>
> Ouch!

Fortunately, I'm not hot any more. Now I'm thinking that
I haven't had any alcohol in a week or more. Maybe
a drink is in order? Plus, I think there is a cupcake in
a cupboard somewhere downstairs. I must away!

Marshall

Bob Badour

unread,

Oct 27, 2009, 1:29:35 AM10/27/09

to

Marshall wrote:

It doesn't have to be compatible with a string. It only has to be
compatible with TOP or the universal supertype. String doesn't have to
be compatible with integer either. It only has to be compatible with TOP
or the universal supertype.

> The other points you made seem to be variations on the
> same question.

Indeed. I think it is an important point to understand because it is so
easy to get wrong. The type of the joined attribute has to be TOP which
has the union of all values and an empty set of operations (or actually
a single "equals" comparison operation) not an empty set of values and
the union of all operations.

>>>On the weekends I often hang out with my oldest daughter at
>>>the local Starbucks; I sip coffee and she people-watches. I
>>>sit and she prefers to stand. If we go in the afternoon that
>>>means I'm in the shade and she's not. Today we went at
>>>twelve thirty, and she got sunburned.
>>
>>Ouch!
>
> Fortunately, I'm not hot any more. Now I'm thinking that
> I haven't had any alcohol in a week or more. Maybe
> a drink is in order? Plus, I think there is a cupcake in
> a cupboard somewhere downstairs. I must away!
>
> Marshall

Having bathed each of my 3 dogs several times, I now have to wash the
skunk smell out of the vestibule before it pollutes the house any more
than it already has, when I would much rather just go to bed.

Cimode

unread,

Oct 27, 2009, 8:34:04 AM10/27/09

to

A reasonnable assumption. Since I went back to domain analysis for
designing a storage physical representation for a relational
implementation, that conclusion quickly imposed itself on me. Domain
analysis is for instance more relevant than relational general theory
for physically representing set disjointedness ...

Marshall

unread,

Oct 27, 2009, 10:22:44 AM10/27/09

to

On Oct 26, 10:29 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> Marshall wrote:
>
> >>>In a typed system, the type is whatever the intersection
> >>>of int and string is.
>
> >>Union not intersection. It has to be a type compatible with both integer
> >>and string.
>
> > I don't see how that would work. An integer isn't type compatible
> > with a string. Is it?
>
> It doesn't have to be compatible with a string. It only has to be
> compatible with TOP or the universal supertype. String doesn't have to
> be compatible with integer either. It only has to be compatible with TOP
> or the universal supertype.

OK, I didn't get much sleep last night so I'm not very sharp, but
I'll just regurgitate the argument I thought of before I went to
bed:

Consider relations A and B each with a single, common attribute.
Natural join and inner union will behave much like intersection
and union in this case. If the result type of the join isn't an
intersection type, then we lose the property:

A = A join (A union B)

because the type of the attribute of the expression is different
than the type of the attribute of A.

More generally, the values in the result of a join are the
intersection
of the values in the operands; why wouldn't the result type be the
intersection type?

Also, what about the analogous situation with inner union? Would
the type of the common attributes be a union type or an intersection
type? The values are the union of the values of the operands,
so I would propose that it has to be a union type.

> > Fortunately, I'm not hot any more. Now I'm thinking that
> > I haven't had any alcohol in a week or more. Maybe
> > a drink is in order? Plus, I think there is a cupcake in
> > a cupboard somewhere downstairs. I must away!
>

> Having bathed each of my 3 dogs several times, I now have to wash the
> skunk smell out of the vestibule before it pollutes the house any more
> than it already has, when I would much rather just go to bed.

Ouch! That sucks about the skunk.

I skipped the drink but I found the cupcake. It was getting a
bit on the dry side but still quite tasty! Now it's time for me
to make my son some pancakes and send him off to school.

Marshall

Sampo Syreeni

unread,

Oct 27, 2009, 12:27:01 PM10/27/09

to

> More generally, the values in the result of a join are the
> intersection
> of the values in the operands; why wouldn't the result type be the
> intersection type?

That's because we're mixing two different levels of typing. The first,
weaker kind only considers set operations and typing in general. In
this frame of mind, and as Bob already pointed out, you will have to
find the common supertype of string and integer before you even have
the comparison operator available. The lowest common supertype in this
case is TOP, which as it happens *only* has a single equality
comparison operator. The type of the join column is then the supertype
where we finally find the means of comparison, i.e. TOP.

But you can also go deeper and consider the semantics of the equality
operator. Then you actually know that the result set is always the
intersection of the incoming ones, that because of that we can unify
the two sides of the resulting comparison to get "the join column" of
one homogeneous type, and furthermore that, in the case of types whose
least common supertype is TOP, the intersection type will also be
uninhabited. What happens is that now we're actually propagating types
through different kinds of operators, and the equality comparison
operator is just one of them, with its particular properties. A
minimum nontrivial counterexample in this vein would probably be a
join using a less-than comparison between integers and floats.

> Consider relations A and B each with a single, common attribute.
> Natural join and inner union will behave much like intersection
> and union in this case. If the result type of the join isn't an
> intersection type, then we lose the property:
>
> A = A join (A union B)
>
> because the type of the attribute of the expression is different
> than the type of the attribute of A.

That equality never held for types. Only for the values, or in other
words for the values considered as inhabitants of TOP. That's
essentially because type logics tend to be intuitionistic, or even
modal, not classical.
--
Sampo

paul c

unread,

Oct 27, 2009, 12:33:26 PM10/27/09

to

Marshall wrote:
...

> Consider relations A and B each with a single, common attribute.
> Natural join and inner union will behave much like intersection
> and union in this case. If the result type of the join isn't an
> intersection type, then we lose the property:
>
> A = A join (A union B)
>
> because the type of the attribute of the expression is different
> than the type of the attribute of A.

> ...

Marshall, I probably am diverging from your purpose but let me ask if
that property is important because without it you don't have a
relational lattice or is it important because without it some practical
use is lost?

(I may be far out in left field as far as mathematicians and type
theorists are concerned, but my simple interpretation of the Information
Principle is that the type of A{c} JOIN B{c} is {c}, whether it is a
union or intersection doesn't need to be asked.)

Bob Badour

unread,

Oct 27, 2009, 12:41:48 PM10/27/09

to

Marshall wrote:

> On Oct 26, 10:29 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
>>Marshall wrote:
>>
>>>>>In a typed system, the type is whatever the intersection
>>>>>of int and string is.
>>
>>>>Union not intersection. It has to be a type compatible with both integer
>>>>and string.
>>
>>>I don't see how that would work. An integer isn't type compatible
>>>with a string. Is it?
>>
>>It doesn't have to be compatible with a string. It only has to be
>>compatible with TOP or the universal supertype. String doesn't have to
>>be compatible with integer either. It only has to be compatible with TOP
>>or the universal supertype.
>
> OK, I didn't get much sleep last night so I'm not very sharp, but
> I'll just regurgitate the argument I thought of before I went to
> bed:
>
> Consider relations A and B each with a single, common attribute.
> Natural join and inner union will behave much like intersection
> and union in this case. If the result type of the join isn't an
> intersection type, then we lose the property:
>
> A = A join (A union B)
>
> because the type of the attribute of the expression is different
> than the type of the attribute of A.

I don't see that you do. The type will be the MST of the resulting join.
Because you will join A with a supertype of A, the MST will be the same
as the type of A. This is different from the case joining an integer
with a string because the MST is the universal supertype.

> More generally, the values in the result of a join are the
> intersection
> of the values in the operands; why wouldn't the result type be the
> intersection type?

I am not sure what you mean by the intersection type. If you join 2
types that are both supertypes of some set of subtypes, the resulting
type will be the union type of those subtypes or the most specific
supertype of those subtypes. In the case where there are no common
subtypes, the resulting type will be the union type of the types
themselves or the most specific supertype of them, and the body will be
empty.

> Also, what about the analogous situation with inner union? Would
> the type of the common attributes be a union type or an intersection
> type? The values are the union of the values of the operands,
> so I would propose that it has to be a union type.

If you mean regular old union, the resulting type will be the most
specific supertype, which might be a union type. If you union circles
with hyperbola, the type will be conic sections. If you union circles
with ellipses, the type will be ellipses.

>>>Fortunately, I'm not hot any more. Now I'm thinking that
>>>I haven't had any alcohol in a week or more. Maybe
>>>a drink is in order? Plus, I think there is a cupcake in
>>>a cupboard somewhere downstairs. I must away!
>>
>>Having bathed each of my 3 dogs several times, I now have to wash the
>>skunk smell out of the vestibule before it pollutes the house any more
>>than it already has, when I would much rather just go to bed.
>
> Ouch! That sucks about the skunk.
>
> I skipped the drink but I found the cupcake. It was getting a
> bit on the dry side but still quite tasty! Now it's time for me
> to make my son some pancakes and send him off to school.

I didn't skip anything, but I missed Ollie's radio collar, which I took
off him before bathing him. I have to mix up another batch of peroxide
and baking soda just for it. I should have just left it on him.

TroyK

unread,

Oct 27, 2009, 3:41:33 PM10/27/09

to

On Oct 27, 11:41 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:

> I didn't skip anything, but I missed Ollie's radio collar, which I took
> off him before bathing him. I have to mix up another batch of peroxide

> and baking soda just for it. I should have just left it on him.- Hide quoted text -
>
> - Show quoted text -

I mostly lurk, but on the subject of skunks and dogs, I can offer some
meaningful advice (since, like mine, your dogs will undoubtedly play
this game again)

Next time, try Dawn dishwashing liquid. In the past, I've tried a
specialty shampoo, and the baking soda/peroxide recipe, but Dawn alone
seems to do the trick the best. I'm fortunate in that I have 3 Germain
Shorthaired Pointers, so their fur tends not to hold on to the scent
as much as a longer-haired dog. Plus, they have improved on the art of
the quick kill, which greatly reduces the emissions from the
unfortunate animal.

Now to get back to thinking about types, supertypes, domains and
such :)

TroyK

David BL

unread,

Oct 28, 2009, 1:01:56 AM10/28/09

to

On Oct 28, 1:41 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> Marshall wrote:

> > Consider relations A and B each with a single, common attribute.
> > Natural join and inner union will behave much like intersection
> > and union in this case. If the result type of the join isn't an
> > intersection type, then we lose the property:
>
> > A = A join (A union B)
>
> > because the type of the attribute of the expression is different
> > than the type of the attribute of A.
>
> I don't see that you do. The type will be the MST of the resulting join.
> Because you will join A with a supertype of A, the MST will be the same
> as the type of A. This is different from the case joining an integer
> with a string because the MST is the universal supertype.

I agree with Bob, although use of the definite article in "the type"
is potentially confusing because a value may have many types (only one
of which is the MST).

Properties like

A = A join (A union B)

are true because it concerns equality of values on the two sides of
the equation, and has nothing to do with static type analysis. It is
similar to how D&D talk about an ellipse variable that inadvertently
holds a circle value when the width and height happen to be equal.
Obviously for every A and B, the MST of

A join (A union B)

will match the MST of A.

> > More generally, the values in the result of a join are the
> > intersection
> > of the values in the operands; why wouldn't the result type be the
> > intersection type?
>
> I am not sure what you mean by the intersection type.

I cannot see any reason not to allow the type system to form a
lattice, i.e. for any two given types there is a well defined "union
type" and "intersection type".

Static type analysis of an expression containing a natural join could
assume that its static type is either the intersection type or the
union type of the static types of the operands. I would suggest the
former is more useful than the latter because statically typed
languages make upcasts implicit and downcasts explicit - so a more
specialised static type is favourable.

Of course at "run time" a particular join result (i.e. value) may have
a dynamic type (i.e. its MST) that is more specific than its static
type.

Marshall

unread,

Oct 28, 2009, 9:26:27 AM10/28/09

to

On Oct 27, 9:33 am, paul c <toledobythe...@oohay.ac> wrote:
> Marshall wrote:
>
> ...

>
> > Consider relations A and B each with a single, common attribute.
> > Natural join and inner union will behave much like intersection
> > and union in this case. If the result type of the join isn't an
> > intersection type, then we lose the property:
>
> > A = A join (A union B)
>
> > because the type of the attribute of the expression is different
> > than the type of the attribute of A.

> > ...
>
> Marshall, I probably am diverging from your purpose but let me ask if
> that property is important because without it you don't have a
> relational lattice or is it important because without it some practical
> use is lost?

Heh. I think that's a fair question.

I think absorption, (the above property) is in fact something
with practical use, to the optimizer anyway. But I guess
my point is more general, which is to say that the behavior
of the type system ought to match the behavior of the
runtime system. If join puts attribute values together with
AND, then the type system should put type values together
with AND.

Marshall

unread,

Oct 28, 2009, 9:31:37 AM10/28/09

to

On Oct 27, 12:41 pm, TroyK <cs_tr...@juno.com> wrote:
> On Oct 27, 11:41 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
>
> > I didn't skip anything, but I missed Ollie's radio collar, which I took
> > off him before bathing him. I have to mix up another batch of peroxide
> > and baking soda just for it. I should have just left it on him.- Hide quoted text -
>
> > - Show quoted text -
>
> I mostly lurk, but on the subject of skunks and dogs, I can offer some
> meaningful advice (since, like mine, your dogs will undoubtedly play
> this game again)
>
> Next time, try Dawn dishwashing liquid.

I don't think he's going to like that advice; Bob has gone on record
repeatedly saying mean things about Dawn.

Marshall

unread,

Oct 28, 2009, 9:35:20 AM10/28/09

to

On Oct 27, 9:41 am, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> I didn't skip anything, but I missed Ollie's radio collar, which I took
> off him before bathing him. I have to mix up another batch of peroxide
> and baking soda just for it. I should have just left it on him.

Ack! Out of time for news this morning. Hope to reply later.

Marshall

Bob Badour

unread,

Oct 28, 2009, 9:42:53 AM10/28/09

to

Marshall wrote:

I am actually allergic to Dawn. I use Sunlight. I started with Sunlight
to wash out as much of the oil as possible before I used the peroxide
and baking soda.

paul c

unread,

Oct 28, 2009, 2:41:48 PM10/28/09

to

Mr. Scott wrote:
> "Marshall" <marshal...@gmail.com> wrote in message
> news:386975f9-472c-4184...@r24g2000prf.googlegroups.com...
>> On Oct 24, 10:53 am, Keith H Duggar <dug...@alum.mit.edu> wrote:
>>> Anyhow, the question here is not one of our imagination but rather
>>> simply this: if it makes sense for the RM to support constraints
>>> on relational /values/ (taken on by variables) why does it not
>>> make sense to support constraints on relational /expressions/?
>>> That is a question of general principle not specific design.
>> This question, it seems to me, is clear and to the point.
>> And I would answer it by saying that we shouldn't really
>> even make the distinction! (At least not formally.)
>

> I think we should make the distinction, and formally.
>
> (p /\ q) -> r is not the same as (p -> r) /\ (q -> r)
> but (p \/ q) -> r is the same as (p -> r) \/ (q -> r)
>

> A view consisting of a natural join, for example, represents a set of
> conjunctions. Each row of the join represents a conjunction of
> propositions, one for each operand. A constraint defined on a join would be
> of the form (p /\ q) -> r. That is definitely not the same as constraints
> defined on one or more tables, which would take the form (p \/ q) -> r.
> ...

Forgot to mention that I don't see that a "a constraint defined on a
join" would necessarily be "of the form (p /\ q) -> r". I had thought
that many people think it could be any truth-valued expression such as
"(p /\ q) = r".

This leads me to think that most, if not all, view definitions can be
interpreted as constraints. It is interesting to me to then ask what
makes a view different from a base. Is it enough to say that a view
always has one constraint (of possibly several) that is an equality and
a view may be 'updated' without reference to the view?

A more opaque way but perhaps less useful way of saying this is that a
relation's definition in the first place amounts to nothing more than a
constraint.

Cimode

unread,

Oct 28, 2009, 4:17:19 PM10/28/09

to

Snipped

> If I recall it
> wasn't until his second paper that he introduced the pair of attribute
> and domain name, he was adjusting his theory to meet practice.

Precisely.

As for the headers being less relevant in the context of domain
analysis, I meant that domain analysis clearly focuses on the body of
each un-ary relation being primarily a domain subset (*before*
relation level constraints are applied; I use the term with great
caution !) to be considered an attribute of a specific relation. It
is only once domain requirements are met that the assignment of a
specific header defines the attribute (then the header set defining
the relation).

My observation is that a lot of work has been done into defining
relations in order to algebrically operate them as relations but very
little has been done to use domains to clarify relational algebra.
`
My conclusion is that the relational consensus that constraint
specialization through domain analysis could be in any way orthogonal
to relation structural definitions restricts the possibility of
clarifying further relational model operations and implementations.

But that is only my opinion.

Regards...

Tegiri Nenashi

unread,

Oct 28, 2009, 4:29:32 PM10/28/09

to

On Oct 28, 10:41 am, paul c <toledobythe...@oohay.ac> wrote:
> Mr. Scott wrote:
> > "Marshall" <marshall.spi...@gmail.com> wrote in message

Is view definition a constraint? IMO it's purely terminological
matter. Consider relations x and y defined by some algebraic
identities. Is adding new view z (as a function of x and y) adding a
constraint to the system?

Let's analyze a simpler example. Consider two real values constrained
by the equality:

x + y = 5

Is introducing a new variable z, say

z = x - 2y

a new constraint imposed onto the system? Not really, because,
variable z is redundant and can be eliminated, and it doesn't affect
the formal property of the system of being under constrained.

paul c

unread,

Oct 28, 2009, 4:46:31 PM10/28/09

to

Tegiri Nenashi wrote:
...

> Is view definition a constraint? IMO it's purely terminological
> matter. Consider relations x and y defined by some algebraic
> identities. Is adding new view z (as a function of x and y) adding a
> constraint to the system?
>
> Let's analyze a simpler example. Consider two real values constrained
> by the equality:
>
> x + y = 5
>
> Is introducing a new variable z, say
>
> z = x - 2y
>
> a new constraint imposed onto the system? Not really, because,
> variable z is redundant and can be eliminated, and it doesn't affect
> the formal property of the system of being under constrained.

That is a form of argument that I've seen quite often regarding various
RM questions, not just this one. I'd have no problem with it were it
not called an "example". Since it is about arithmetic, it's at best a
mere analogy to relations and we need to decide whether the analogy
should apply. To try to answer that I would ask when do we ever record
"extensions" of arithmetic equations? In other words, just because we
have abstract operations for both numbers and relations doesn't mean one
should mimic the other. If that's so, maybe somebody else can put it
better.

paul c

unread,

Oct 28, 2009, 4:54:42 PM10/28/09

to

Tegiri Nenashi wrote:
> ... Is adding new view z (as a function of x and y) adding a
> constraint to the system?
> ...

I would say not exactly. More exactly it is adding another relation and
constraining the resulting system (aka the resulting database). That is
different from merely constraining an existing system/database.

Bob Badour

unread,

Oct 28, 2009, 6:41:28 PM10/28/09

to

paul c wrote:

> Tegiri Nenashi wrote:
> ...
>
>> Is view definition a constraint? IMO it's purely terminological
>> matter. Consider relations x and y defined by some algebraic
>> identities. Is adding new view z (as a function of x and y) adding a
>> constraint to the system?
>>
>> Let's analyze a simpler example. Consider two real values constrained
>> by the equality:
>>
>> x + y = 5
>>
>> Is introducing a new variable z, say
>>
>> z = x - 2y
>>
>> a new constraint imposed onto the system? Not really, because,
>> variable z is redundant and can be eliminated, and it doesn't affect
>> the formal property of the system of being under constrained.
>
> That is a form of argument that I've seen quite often regarding various
> RM questions, not just this one. I'd have no problem with it were it
> not called an "example". Since it is about arithmetic, it's at best a
> mere analogy to relations and we need to decide whether the analogy
> should apply.

Ahem.

x + y = 5 is a relation. z = x - 2y is a relation. They are linear
polynomial functions, and all functions are relations.

x*x + y*y + z*z - r*r = 0 is also a relation. It is a relation
describing a sphere of radius r centered at the origin. It is also a
polynomial. While it is not a function, it is a relation.

> To try to answer that I would ask when do we ever record
> "extensions" of arithmetic equations?

Whenever anyone writes the word "let":

Let u = x-3, v=y+2, w=z-1...

> In other words, just because we
> have abstract operations for both numbers and relations doesn't mean one
> should mimic the other. If that's so, maybe somebody else can put it
> better.

Whether involving numbers or no numbers, a relation is a relation. What
we can do with relations doesn't change because some of them involve
numbers and some of them do not.

paul c

unread,

Oct 28, 2009, 6:57:15 PM10/28/09

to

That's very good, accurate up to a point and no argument except for a
couple of things i) when he mentioned "variable" Tegiri didn't make it
clear whether he was talking about one of Codd's non-binary relations, I
presume traditional math philosophy would have to have some recasting
for that (don't ask me how!) ii) even if I'm wrong about i), Codd's
relations are slightly different in usage both because they use
different operators than the arithmetic ones and because we can 'bend'
the relational ops to produce a certainty from an uncertainty, notably
when we use union to 'insert' to a relation - this seems quite different
to me from what arithmetic allows.

Gene Wirchenko

unread,

Oct 28, 2009, 6:59:29 PM10/28/09

to

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Beautiful line.

>to wash out as much of the oil as possible before I used the peroxide

^
... and hierarchical stains ...

>and baking soda.

Sincerely,

Gene Wirchenko

Mr. Scott

unread,

Oct 28, 2009, 7:19:59 PM10/28/09

to

"paul c" <toledob...@oohay.ac> wrote in message
news:Mf0Gm.50609$PH1.4563@edtnps82...

The form (p /\ q) -> r applies because whenever r does not hold, neither can
(p /\ q). r is in fact your "any truth valued expression." In the case of
a join view, the antecedent is a conjunction, not a disjunction.

>
> This leads me to think that most, if not all, view definitions can be
> interpreted as constraints. It is interesting to me to then ask what
> makes a view different from a base. Is it enough to say that a view
> always has one constraint (of possibly several) that is an equality and a
> view may be 'updated' without reference to the view?
>

Tables house data; views just present it. That is in a nutshell the
difference between tables and views. What is presented by a view implies
what is in the operands of the view's definition. As a consequence, in
order to be fully updatable and therefore interchangable, each and every set
of inserts, updates and deletes applied to a view must map one-to-one to a
set of inserts, updates and deletes applied to those operands. Views that
are joins or unions or restrictions or projections in general aren't fully
updatable. There are exceptions, of course. A view defined on a pair of
tables that participate in mutual foreign keys is fully updatable because
each and every set of inserts, updates and deletes applied to the view maps
one-to-one to a set of inserts, updates and deletes applied to the tables.

paul c

unread,

Oct 28, 2009, 7:51:57 PM10/28/09

to

Mr. Scott wrote:
> ...

> Tables house data; views just present it. That is in a nutshell the
> difference between tables and views. What is presented by a view implies
> what is in the operands of the view's definition. As a consequence, in
> order to be fully updatable and therefore interchangable, each and every set
> of inserts, updates and deletes applied to a view must map one-to-one to a
> set of inserts, updates and deletes applied to those operands. Views that
> are joins or unions or restrictions or projections in general aren't fully
> updatable. There are exceptions, of course. A view defined on a pair of
> tables that participate in mutual foreign keys is fully updatable because
> each and every set of inserts, updates and deletes applied to the view maps
> one-to-one to a set of inserts, updates and deletes applied to the tables.

> ...

Doesn't this amount to saying that tables are stored and views are not?
(whereas I don't see why a view couldn't be stored because of some
practical reason or other.)

paul c

unread,

Oct 28, 2009, 7:53:47 PM10/28/09

to

(Not to discount Bob B's reply - I may be stepping over some unmarked
line in the above, which would make it a provocation but I often find
those useful for seeing things more clearly.)

Tegiri Nenashi

unread,

Oct 28, 2009, 8:10:57 PM10/28/09

to

Ok, equality relation in u = x-3 certainly contributed to the
confusion. I meant to bring in algebraic analogy without any reference
to relations. It was just an algebra of real numbers (aka real number
field). The idea was that the topic of [linear] constraints over real
number field is well understood (so it can be viewed as a role model
for development in database field).

Certainly, in database field we have different algebraic axioms (they
are somewhat similar to boolean algebra!), but the other concepts stay
the same (variables, constants, operations, and equations). The
objects of the algebra are relations and not numbers, of course -- yet
another source of confusion because relations structured into tables
might have numbers in them!

Bob Badour

unread,

Oct 28, 2009, 8:41:41 PM10/28/09

to

paul c wrote:

Ignore Mr. Scott. He doesn't know what he is talking about. Base
relations and views equally represent data and neither houses anything,
because housing implies something physical. A view can be stored or not
stored. Regardless, a view is derived from base relations. Base
relations, themselves, are derived from physical storage structures and
might not be stored anywhere as is either.

It's easy enough to construct various schema with identical predicates
where what is base in each is derived in all the others. The database
designer generally chooses the base relations as a matter of his own
convenience.

Mr. Scott

unread,

Oct 28, 2009, 9:40:20 PM10/28/09

to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:4ae8e4c5$0$26502$9a56...@news.aliant.net...

Ignore Mr. Badour. He doesn't know what he is talking about. Table
definitions are constraints. View definitions are queries. The difference
is clear to anyone who has a clue. (Obviously, Mr. Badour doesn't.)
Constraints specify what can be true, not what is supposed to be true.
Queries manipulate what is supposed to be true. Base relations (what are
housed in tables) are not derived from physical storage structures: they are
instead a logical expression of what is supposed to be true. How they are
physically represented is irrelevant. What is presented by a view is the
result of querying that logical expression of what is supposed to be true.

paul c

unread,

Oct 28, 2009, 10:58:04 PM10/28/09

to

Mr. Scott wrote:

> ... Table
> definitions are constraints. View definitions are queries. ...

I detect some mysticism here. There has to be a reason to distinguish b
between 'table' and 'view' this way, otherwise we don't need both terms.
What is the reason (or reasons)?

paul c

unread,

Oct 28, 2009, 11:04:11 PM10/28/09

to

Mr. Scott wrote:
...

> Constraints specify what can be true, not what is supposed to be true.

> ...

I thought constraints constrain, ie., limit. (I've often thought that
isn't enough in practice, eg., I've never seen a default defined
algebraically and beyond that I wouldn't mind a variation on constraints
that lets me force an assertion, eg., some tuple that is always present,
whether the user has thought to include it or not, probably CJ Date
would disagree with that.)

Bob Badour

unread,

Oct 28, 2009, 11:09:30 PM10/28/09

to

paul c wrote:

I cannot make sense of what you wrote. I suspect you have omitted much
context, internal dialogue and assumptions.

paul c

unread,

Oct 28, 2009, 11:14:54 PM10/28/09

to

I usually try to omit at least two out of three of those, otherwise even
I can't guess what I'm talking about! Being of a minimalist persuasion,
not wanting more concepts than I can handle, I think I'd rather have
constraints, unlike CJ Date's, that are applied against values without
requiring them to be 'truth-valued' and 'and-ed' if you will. I don't
have a good name for this.

Mr. Scott

unread,

Oct 29, 2009, 12:18:59 AM10/29/09

to

"paul c" <toledob...@oohay.ac> wrote in message

news:LC7Gm.49742$Db2.7559@edtnps83...

A proposition either can be true or can't be true. Specifying what can be
true also specifies what can't; likewise, specifying what can't be true also
specifies what can.

Mr. Scott

unread,

Oct 29, 2009, 12:32:16 AM10/29/09

to

"paul c" <toledob...@oohay.ac> wrote in message

news:0x7Gm.49741$Db2.23278@edtnps83...

There is nothing mystical here. Constraints specify what can and cannot be
assigned a positive truth value, they do not specify what has been assigned
a positive truth value. Queries manipulate what has been assigned a
positive truth value or what can be derived from what has been assigned a
positive truth value.

Bob Badour

unread,

Oct 29, 2009, 10:30:50 AM10/29/09

to

paul c wrote:

If not truth-valued, what would they be? Either something passes the
constraint or it doesn't.

paul c

unread,

Oct 29, 2009, 7:58:38 PM10/29/09

to

Bob Badour wrote:
> paul c wrote:

...

>> I usually try to omit at least two out of three of those, otherwise
>> even I can't guess what I'm talking about! Being of a minimalist
>> persuasion, not wanting more concepts than I can handle, I think I'd
>> rather have constraints, unlike CJ Date's, that are applied against
>> values without requiring them to be 'truth-valued' and 'and-ed' if you
>> will. I don't have a good name for this.
>
> If not truth-valued, what would they be? Either something passes the
> constraint or it doesn't.

Sorry, I've forgotten what I was driving at, I was remembering some
notes I made months ago but can't find them right now. I'm sure the
notions will come back to me in a day or two. (Natch', I'm also sure you
aren't holding your breath!)

com...@hotmail.com

unread,

Oct 30, 2009, 9:11:09 PM10/30/09

to

On Oct 28, 8:04 pm, paul c <toledobythe...@oohay.ac> wrote:

> Mr. Scott wrote:
> > Constraints specify what can be true, not what is supposed to be true.

Relational dbms constraints restrict updates to valid database states.
They are logically redundant. Variable predicates and (the history of)
the
world fully determine valid database states.

> I thought constraints constrain, ie., limit. (I've often thought that
> isn't enough in practice, eg., I've never seen a default defined
> algebraically and beyond that I wouldn't mind a variation on constraints
> that lets me force an assertion, eg., some tuple that is always present,
> whether the user has thought to include it or not, probably CJ Date
> would disagree with that.)

1.Constraint (v union relation{t}) = v // variable v must always
contain tuple t
2.You have to start the dbms with its database in a valid state.
So before the user may query, the designer and/or user have to
assign initial values for variables so that all constraints are met.
So any values given by the designer (which might be all of them) would
be
your defaults.

In oo, establishing such a valid initial state is called construction
or initialization. All module-based programming languages (have to)
have such a phase in a module's life.

An application is a finite state machine, ie a variable with permitted
update
operators with permitted sequences of calls. You have to understand
this in order to actually describe or understand how any application
behaves.
More simple fundamental computing theory orthogonal to the relational
model.

philip

Reinier Post

unread,

Oct 31, 2009, 7:02:39 PM10/31/09

to

Mr. Scott wrote:

>[...] Constraints specify what can and cannot be

>assigned a positive truth value, they do not specify what has been assigned
>a positive truth value. Queries manipulate what has been assigned a
>positive truth value or what can be derived from what has been assigned a
>positive truth value.

Formally: a database instance is a set of named relations,
i.e. a mapping from names to relations.
A database is a set of possible database instances, such that
each instance maps the same names to relations with, per name, the same signature.
It is defined by a database schema (which maps each relation name to its signature)
and database constraints (which limit the possible combinations of relations
to which the names are mapped). (I can go on and define 'signature', etc.)
A database constraint is a predicate on database instances.

A 'table' (in this context) is a named relation in a database instance,
or perhaps a database restricted to that name (i.e. a relation name mapped to
the set of all possible values for that relation).

A 'view' (in this context) is a named mapping from database instances to relations.

--
Reinier

Mr. Scott

unread,

Oct 31, 2009, 9:39:31 PM10/31/09

to

"Reinier Post" <r...@raampje.lan> wrote in message
news:4aecc20f$0$14682$703f...@news.kpn.nl...

I really don't get what you're driving at. I also don't fully agree with
your characterization. What's the point you're trying to make?

Reinier Post

unread,

Nov 1, 2009, 5:41:24 PM11/1/09

to

Mr. Scott wrote:

>I really don't get what you're driving at. I also don't fully agree with
>your characterization. What's the point you're trying to make?

The point is that your characterization can be formalized.

--
Reinier