For instance, consider the following states for a relation describing
people's marital status, and a transition constraint that says: Single
people can't become Divorced:
Current                        Proposed
Jane Jones   Married           Jane Jones   Married
Jane Smith   Single            Jane Smith   Divorced
Should the proposed state be rejected? Or did Jane Jones get Divorced
becoming Jane Smith at the same time that Jane Smith married Bob Jones? It
is impossible to tell: not enough information has been provided.
Since a transition that violates a transition constraint can result in the
same /possible state/ as a transition that doesn't, the notions of
relational assignment and multiple assignment are broken: an update must be
submitted as a transition or something equivalent, not just as sets of
relation values, otherwise transition constraints cannot always be enforced.
So, should Relation Variables be discarded in favor of Relation Schemata?
I think a transition constraint must be defined as a relational
expression in the general form of some join between t and t'. So in all
cases I'd say that the precise nature and syntax of the update
operation is irrelevant. The proposed transition has to be permitted or
rejected based purely on the relation values of t and t'. Assignment
won't break; Jane Smith cannot get divorced.
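For the marital-status example above, something like this sketches the idea,
written as pseudo-SQL with the apostrophe marking the proposed relation
value; the table and column names (MaritalStatus, Name, Status) are only
illustrative:
NOT EXISTS
(SELECT *
FROM MaritalStatus AS t /* current value, t */
JOIN MaritalStatus' AS t1 /* proposed value, t' */
ON t.Name = t1.Name
WHERE t.Status = 'Single'
AND t1.Status = 'Divorced') ;
The proposed state is accepted only if no name that is Single in t shows up
as Divorced in t'.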
--
David Portas
<argggghhhh/>. Sorry Brian, but this still isn't right. It is illogical
to talk about the transition of a tuple from one value to another, as
though they were entities from the real world themselves. Look, say
mathematically you are talking about a relation composed of three
tuples:
R := {x, y, z}
x, y and z are not variables! They are aliases for values. I can't
compare x, y and z with their future selves - they only have one value,
today, tomorrow, for evermore.
Maybe the next state of the relvar will be R := {x, z, j}. Sure I can
compare j and y but saying they somehow correspond to each other in
some other way is nonsensical logically. And they certainly aren't
'transitioning'.
> t would be empty for an inserted tuple, t' would be
> empty for a deleted tuple, and neither would be empty for tuples
> corresponding in an update. From that set of transitions, only those that
> satisfy all transition constraints are /possible transitions/. The problem
> is that more than one transition can result in the same /possible state/ but
> that not all of those are /possible transitions/.
>
> For instance, consider the following states for a relation describing
> people's marital status, and a transition constraint that says: Single
> people can't become Divorced:
I'm glad that you've seen the light with the whole not-hiding
attributes thing (it was a revelation for me when the information
principle finally clicked), but the 'corresponding tuple' conjecture
still needs breaking!
Brian Selzer wrote:
> For instance, consider the following states for a relation describing
> people's marital status, and a transition constraint that says: Single
> people can't become Divorced:
Single people can get married and then divorced in between updates.
Anyhows...
>
> Current Proposed
> Jane Jones Married Jane Jones Married
> Jane Smith Single Jane Smith Divorced
>
> Should the proposed state be rejected? Or did Jane Jones get Divorced
> becoming Jane Smith at the same time that Jane Smith married Bob Jones?
Brian, this example makes no sense whatsoever. If anything it
highlights what we've all been saying. In fact you explain why
yourself:
> is impossible to tell: not enough information has been provided.
Exactly. There is insufficient information to identify the people
concerned. The design is fubar. Nothing else to it.
The whole point is that here someone's name is insufficient to identify
them. In fact, forget the database, this would be a problem just in the
company when one of these women tried to explain to a secretary who she
was.
Now if you want to talk about constraints then fair enough, but lets
get the other stuff clarified first. All best, Jim.
In fact, given any modestly complex schema and two database states S1
and S2, it is possible to generate a superginormous number of different
sets of transitions that would effect the transition from S1 to S2. In
fact the only reason the number is even finite is because we use finite
computers with necessarily finite domains; remove that constraint and I
can construct an infinite number of valid transitions between S1 and S2
for a schema containing only a single relvar of only a single attribute.
> For instance, consider the following states for a relation describing
> people's marital status, and a transition constraint that says: Single
> people can't become Divorced:
>
> Current Proposed
> Jane Jones Married Jane Jones Married
> Jane Smith Single Jane Smith Divorced
>
> Should the proposed state be rejected? Or did Jane Jones get Divorced
> becoming Jane Smith at the same time that Jane Smith married Bob Jones? It
> is impossible to tell: not enough information has been provided.
So apparently you have this idea that we want to model tuple identity
100% of the time with no exceptions, and you have this example
which shows that it is possible to model a situation where we
don't have tuple identity, and you expect us to see that as some
kind of flaw. The problem is that we *don't* want tuple identity
100% of the time, so the above doesn't show any flaw. All it
shows is you intentionally doing the wrong thing for the effect
you're trying to achieve. That is all it has ever shown; that is
all it will ever show, no matter how many times you keep posting
the same example and pointing to it and saying "See the flaw?!"
The flaw is only in your idea of what the RM should and should
not be able to do. The ambiguity in your above example? It is
*supposed* to be possible to do that. If you took away the
possibility of doing that, it would make the model less expressive.
The idea of transition constraints is inherently non-set-theoretic,
because it assumes tuple-identity. Since it is possible to
model non-tuple-identity scenarios in the RM, there will be
some scenarios in which transition constraints do not make
sense. This is not a flaw in RM; rather it is an indication that
transition constraints are a lower-level albeit useful hack
on top of some subset of RM schemas, that are applicable
only some of the time.
> Since a transition that violates a transition constraint can result in the
> same /possible state/ as a transition that doesn't, the notions of
> relational assignment and multiple assignment are broken: an update must be
> submitted as a transition or something equivalent, not just as sets of
> relation values, otherwise transition constraints cannot always be enforced.
No, that's not what it shows. (And why have you suddenly shifted from
talking about insert/update/delete operations to relational assignment
and multiple assignment? They have different properties, especially
with regards to tuple identity.) What it shows is that transition
constraints are not universally applicable, because sometimes we're not
modelling tuple identity.
> So, should Relation Variables be discarded in favor of Relation Schemata?
This statement doesn't even make any sense. Nor does it appear to have
anything to do with what came before it.
Marshall
More precisely, the idea of *tuple-level* transition constraints is
inherently non set-theoretic. Fortunately, logical operations of any
kind at tuple-level are not possible or permitted in RM. The only
applicable way to define a transition constraint is in set-based
fashion at the logical level. Truly relational transition constraints
would therefore suffer none of the problems mentioned.
--
David Portas
Please express this constraint using relational calculus.
I'm particularly interested to see how "people" and "become"
are expressed.
Thank you.
-- Keith -- Fraud 6
Yes, exactly; thank you for pointing this out. I was not sufficiently
explicit.
I haven't seen much written about transition constraints, and
would be happy to read more if anyone has any good pointers.
I went a-Googling a week or two ago for writings on the topic
and the only interesting thing I came across was a forgotten
post by me asking people about transition constraints! Ha ha!
Marshall
Is the cross product of a set of attributes any less superginormous? Yet,
that is how relations are defined. How about the set of consistent database
states? I find this argument less than compelling.
>> For instance, consider the following states for a relation describing
>> people's marital status, and a transition constraint that says: Single
>> people can't become Divorced:
>>
>> Current Proposed
>> Jane Jones Married Jane Jones Married
>> Jane Smith Single Jane Smith Divorced
>>
>> Should the proposed state be rejected? Or did Jane Jones get Divorced
>> becoming Jane Smith at the same time that Jane Smith married Bob Jones?
>> It
>> is impossible to tell: not enough information has been provided.
>
> So apparently you have this idea that we want to model tuple identity
> 100% of the time with no exceptions, and you have this example
> which shows that it is possible to model a situation where we
> don't have tuple identity, and you expect us to see that as some
> kind of flaw. The problem is that we *don't* want tuple identity
> 100% of the time, so the above doesn't show any flaw. All it
> shows is you intentionally doing the wrong thing for the effect
> you're trying to achieve. That has all it has ever shown; that is
> all it will ever show, no matter how many times you keep posting
> the same example and pointing to it and saying "See the flaw?!"
> The flaw is only in your idea of what the RM should and should
> not be able to do. The ambiguity in your above example? It is
> *supposed* to be possible to do that. If you took away the
> possibility of doing that, it would make the model less expressive.
>
Identity beyond that provided by a candidate key (that is, a single database
state) is in the eye of the beholder: whether a value means or represents
the same thing in successive database states depends on the perception of
the user. In addition, you don't lose any expressiveness at all by updating
via transitions instead of updating via relational or multiple assignment.
In fact, the effect of relational or multiple assignment can be thought of
as degenerate transition forms: the transition would simply be a set of
triples in which only t or t' were not empty, but not both. The point is,
if the user *knows* that tuples should correspond, there's no mechanism for
conveying that information using only relational or multiple assignment, and
there's also no mechanism for defining transition constraints (unless
perhaps, aggregate functions are involved).
> The idea of transition constraints is inherently non-set-theoretic,
> because it assumes tuple-identity. Since it is possible to
> model non-tuple-identity scenarios in the RM, there will be
> some scenarios in which transition constraints do not make
> sense. This is not a flaw in RM; rather it is an indication that
> transition constraints are a lower-level albeit useful hack
> on top of some subset of RM schemas, that are applicable
> only some of the time.
>
I disagree. If tuple-identity were non-set-theoretic, then the Principle of
Full Normalization would be just a bunch of hogwash.
I prefer to use the term limitation instead of flaw. Because one cannot
define enforceable transition constraints, the model is incomplete. That's
not necessarily a flaw, but it certainly limits what can be done with the
model. Codd defined a data model in this way:
<<<<
It is a combination of three components:
1) a collection of data structure types (the building blocks of any database
that conforms to the model);
2) a collection of operators or inferencing rules, which can be applied to
any valid instances of the data types listed in (1), to retrieve or derive
data from any parts of those structures in any combinations desired;
3) a collection of general integrity rules, which implicitly or explicitly
define the set of consistent database states or changes of state or both --
these rules may sometimes be expressed as insert-update-delete rules.*
*E. F. Codd, "Data Models in Database Management," Proceedings of the 1980
Workshop on Data Abstraction, Databases and Conceptual Modeling, Pingree
Park, Colorado
>>>>
It is clear from this definition that transition constraints should be
definable and enforceable, and are not just a useful hack. He also said in
the same document that a model without operators or integrity rules should
be regarded as partial or incomplete.
>
>> Since a transition that violates a transition constraint can result in
>> the
>> same /possible state/ as a transition that doesn't, the notions of
>> relational assignment and multiple assignment are broken: an update must
>> be
>> submitted as a transition or something equivalent, not just as sets of
>> relation values, otherwise transition constraints cannot always be
>> enforced.
>
> No, that's not what it shows. (And why have you suddenly shifted from
> talking
> about insert/update/delete operations to relational assignment and
> multiple
> assignment? They have different properties, especially with regards to
> tuple identity.) What it shows is that transition constraints are not
> universally applicable, because sometimes we're not modelling
> tuple identity.
>
I wasn't talking about insert/update/delete operations. I was talking about
a transition, which is a set that specifies what is different between the
current database state and the proposed database state and how. For each
element of the set, if t is not empty, then a proposition that was true no
longer is; if t' is not empty, then a new proposition is now true. Updates,
therefore, are still set-based; the only difference is that now there is
enough information to enforce transition constraints if they are defined.
Tuples have identity, whether you like it or not: a candidate key value
*identifies* a tuple. Using transitions instead of relational or multiple
assignment simply allows a user to convey the information that tuples in
successive database instances correspond in some way so that transition
constraints can be enforced.
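To make that concrete, here is one rough way such a transition might be
written down, purely as a sketch (the table, the column names, and the
NULL-for-empty convention are my own shorthand, and it shows pairs for a
single relation rather than the full set of triples):
CREATE TABLE MaritalStatusTransition
(old_name   VARCHAR(30),   old_status VARCHAR(15),   -- t  (both NULL for an insert)
 new_name   VARCHAR(30),   new_status VARCHAR(15));  -- t' (both NULL for a delete)
Each row pairs a tuple that is no longer asserted with the tuple that
replaces it, so the correspondence between t and t' is stated explicitly
rather than inferred from key values.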
>
>> So, should Relation Variables be discarded in favor of Relation Schemata?
>
> This statement doesn't even make any sense. Nor does it appear to have
> anything to do with what came before it.
>
If relational assignment is broken, then there's no need to think in terms
of relation variables, because if you can't assign a value to a variable,
then they are of no use; instead, you can think in terms of relation
schemata.
>
> Marshall
>
But that assumes that the key value from the current state identifies a
tuple in the proposed state. That is outside of the scope of a key as
defined by the model; moreover, it injects meaning into the key that may or
may not match what the user perceives that meaning to be. Thus, the user
must specify during the update whether or not tuples in successive database
states correspond so that transition constraints can be enforced.
> --
> David Portas
>
Argument?! I was agreeing with you. You said "more than one"
and I pointed out that the number was actually always very large.
The two positions are consistent. What were you imagining this
was an argument of? Never mind; I don't actually want to know.
It is pretty clear to me that you fundamentally misunderstand
some of the basics of the RM. It is also pretty clear that you
are convinced that you do understand them. Have you noticed
yet that *no one* buys your argument about people changing
their names? Nobody. You bring us the same example again
and again, you defend it with detailed explanations and what
you probably think of as well-reasoned logic, and yet we're
all unmoved. So which is more likely: you understand and
everyone else everywhere gets it wrong, or that you have
a basic misunderstanding?
Marshall
I think you're confusing attributes with tuples. Even if you're not, I
agree: tuples are values. If they *can* correspond, it is in the mind of
the designer of the database who defined the transition constraint, and thus
the fact that they *do* correspond must be conveyed by the user during the
update.
Not without violating the constraint. The constraint ensures that the
"marriage" event precedes the "divorce" event.
> Anyhows...
>
>>
>> Current Proposed
>> Jane Jones Married Jane Jones Married
>> Jane Smith Single Jane Smith Divorced
>>
>> Should the proposed state be rejected? Or did Jane Jones get Divorced
>> becoming Jane Smith at the same time that Jane Smith married Bob Jones?
>
> Brian, this example makes no sense whatsoever. If anything it
> highlights what we've all been saying. In fact you explain why
> yourself:
>
>> is impossible to tell: not enough information has been provided.
>
> Exactly. There is insufficient information to identify the people
> concerned. The design is fubar. Nothing else to it.
>
The design is a simple example of what can happen if a key can change along
with another attribute that participates in a transition constraint. I've
stated before that key stability is irrelevant: if a key can change, it
will, and if it can, then the attribute participating in a transition
constraint can also change at the same time.
> The whole point is that here someone's name is insufficient to identify
> them. In fact, forget the database, this would be a problem just in the
> company when one of these women tried to explain to a secretary who she
> was.
>
But identity beyond a single database state is in the mind of the database
designer, or the user, or both. The fact that that identity exists must
therefore be conveyed by the user during updates.
Sorry about that. I misinterpreted your statement.
> It is pretty clear to me that you fundamentally misunderstand
> some of the basics of the RM. It is also pretty clear that you
> are convinced that you do understand them. Have you noticed
> yet that *no one* buys your argument about people changing
> their names? Nobody. You bring us the same example again
> and again, you defend it with detailed explanations and what
> you probably think of as well-reasoned logic, and yet we're
> all unmoved. So which is more likely: you understand and
> everyone else everywhere gets it wrong, or that you have
> a basic misunderstanding?
>
I was just trying to be concise. If you can't see that it can happen from
this simple example, then how would you possibly be able to see that it can
happen in much more complicated and subtle situations? It is extremely
difficult not only to articulate the circumstances surrounding the numerous
instances where I've seen similar situations, but also to defend the bozos
that designed the systems in the first place along with the people who
wanted to change the keys. Most of the work I do is cleaning up the messes
that others have made. One situation involved a poorly-written trigger
that, in part, prevented an inventory row from being deleted unless the
quantity on hand was zero. Since the trigger was defined as FOR UPDATE,
DELETE, it rejected key updates if any affected row had a nonzero quantity on hand, because it
could not correlate the rows in the deleted pseudotable with those in the
inserted pseudotable, and thus it appeared that those rows were being
deleted. The symptoms in this case involved an update being interpreted as
a delete, but the underlying problem is the same: the inability to correlate
tuples during an update in the presence of a transition constraint where the
key can change.
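A hypothetical reconstruction of the sort of trigger I mean, with invented
table and column names, might look like this in SQL Server:
CREATE TRIGGER trg_Inventory_NoDeleteWithStock
ON Inventory
FOR UPDATE, DELETE
AS
-- Any row in the deleted pseudotable whose key has no match in the
-- inserted pseudotable is treated as a deletion attempt, so a key update
-- on a row with stock on hand is rejected even though nothing is really
-- being deleted.
IF EXISTS (SELECT *
           FROM deleted AS d
           WHERE d.qty_on_hand <> 0
             AND NOT EXISTS (SELECT *
                             FROM inserted AS i
                             WHERE i.part_no = d.part_no))
BEGIN
    RAISERROR ('Inventory rows with stock on hand cannot be deleted.', 16, 1);
    ROLLBACK TRANSACTION;
END;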
So, yes, I do understand. I wouldn't say that everyone else everywhere gets
it wrong, but instead that because of the work that I do, my experience with
unusual situations is broader than most.
>
> Marshall
>
"Suppliers with status 20 cannot be deleted."
I think this is indeed an example of a transition constraint. Can this
be expressed using just JOIN ?
It is the designer's responsibility to choose the right key and to
implement the correct transition constraint. That is all.
> Thus, the user
> must specify during the update whether or not tuples in successive database
> states correspond so that transition constraints can be enforced.
>
Exactly. As far as the DBMS is concerned, no such correspondence exists
at the logical level. There is only the relation value before the
update and the one after it. A constraint based on a notion of
tuple-level transitions (identifying tuples by something other than
explicit values in columns) would clearly be a dire violation of the
Information Principle. It doesn't take much analysis to demonstrate
that the consequences would be logical nonsense.
--
David Portas
In SQL (assume SupplierId is the key):
NOT EXISTS
(SELECT SupplierId
FROM Suppliers /* before */
WHERE Status = 20
EXCEPT
SELECT SupplierId
FROM Suppliers' /* after */) ;
--
David Portas
I am not confusing anything. If you agree tuples are not variables,
then you agree that tuples cannot 'change'. And by that one is saying
that they /cannot/ have a transition. That's the logic, and it's
unavoidable - how can you argue against it?
This is a non-sequitur. Nevertheless I repeat there is insufficient
information to provide continual identification in these propositions.
It is always possible for all keys to 'change' and you are then
logically looking at a different 'thing' altogether. If this is not
what is desired, then there is a gaping design flaw for the task at
hand. The point is the holes that have been picked in that example can
be applied to /any/ example that is trying to illustrate what you say.
> I was just trying to be concise.
You ended up being terse. The same thing has happened to me, more times
than I care to recall. Just chuckle and try again.
> If you can't see that it can happen from this simple example, then how
> would you possibly be able to see that it can happen in much more
> complicated and subtle situations?
The solution is not to assume that we are all people of very limited vision.
The solution is to present a situation that has more of the earmarks of a
real situation that you are dealing with.
> It is extremely difficult not only to articulate the circumstances
> surrounding the numerous instances where I've seen similar situations,
> but also to defend the bozos that designed the systems in the first
> place along with the people who wanted to change the keys. Most of the
> work I do is cleaning up the messes that others have made.
Back in the day, that was most of the work I did. BTDT.
I've heard everything from "what's data normalization?" to "we don't have
time for all that theoretical crap?"
The problem in c.d.t. is not whether the people who designed, built, and
used the systems you repair were or were not bozos. The problem is whether
you are or are not writing for bozos in c.d.t.
There is a brief mention and example in TTM (3rd ed.). Transition
constraints don't appear to be well understood or supported. For
example Oracle exposes the tuple-level update flaw in its FOR EACH ROW
triggers.
--
David Portas
Identity beyond that provided by the identifying keys is a nonsense. By
not encoding it you have told the system it does not exist. If this is
not true who is at fault?
How come? One can easily think about a transition constraint in set
terms by extending the original universe with a new attribute value.
Then declarative TC specification would be no different from any other
integrity constraint specification. That, by the way, is what
Dataphor does with its 'old' and 'new' values.
>Since it is possible to
> model non-tuple-identity scenarios in the RM, there will be
> some scenarios in which transition constraints do not make
> sense.
What exactly do you have in mind when you talk about "some scenarios in
which transition constraints do not make sense" ?
What specific flaw that Oracle exposes do you have in mind?
>
> --
> David Portas
Is this an example of, to quote, "a relational expression in the
general form of some join between t and t'." ????????????
It looks like it to me. In D&D terms, adding Dave P's apostrophe, the
SQL might become something like:
(Suppliers <AND> {Status 20} <AND> Suppliers') {} = TABLE_DUM;
I like the apostrophe because it emphasizes that we are really comparing
relations, not rows or what have you. Then from the dbms's point of
view, we are not implementing time, as far as it is concerned, 'before'
means 'this one' and 'after' means 'that one'.
p
How do you tell the system that it does exist? How can you know when you
design a system whether a key that appears to be very stable will remain
that way? I was called in to fix a problem at a company where management
decided to add a prefix to each part number so that they could tell at a
glance which plant produced a part. So an identifier that was very stable
suddenly wasn't for several days while admin staff were busily updating
every part number in the system, and duplicating parts that were made at
more than one plant. In the end many reports and the application had to be
modified in order to summarize the information for parts produced at
multiple plants and to work around the redundancy introduced as a result.
During that period the database became corrupt because it allowed changes to
occur that shouldn't have been allowed. In this case, preventing those
changes was left up to the application, which was not designed to deal with
multiple entries for the same part and which remained broken for several
weeks--making the problem even worse. It can be surprising how a change
that doesn't seem that significant can spiral into a total disaster. The
point is, keys that appear to be very stable can change for the stupidest
reasons, and constraints should be definable and enforceable in the
database--including transition constraints. The designers of a system are
making a gross assumption about the stability of those keys if they think
that what is stable today will remain stable tomorrow.
Closed-world assumption.
p
http://www.dbazine.com/ofinterest/oi-articles/celko35/view?searchterm=Celko%20transition
I examined triggers, CHECK() and DRI actions. The last one seems to be
the best.
CREATE TABLE StateChanges
(previous_state VARCHAR(15) NOT NULL,
current_state VARCHAR(15) NOT NULL,
PRIMARY KEY (previous_state, current_state));
INSERT INTO StateChanges VALUES ('Grape Juice', 'Grape Juice');
INSERT INTO StateChanges VALUES ('Grape Juice', 'Wine');
INSERT INTO StateChanges VALUES ('Wine', 'Vinegar');
INSERT INTO StateChanges VALUES ('Wine', 'Spoilage');
INSERT INTO StateChanges VALUES ('Vinegar', 'Spoilage');
INSERT INTO StateChanges VALUES ('Spoilage', 'Spoilage');
-- 'Grape Juice' is the initial state and a DEFAULT. 'Spoilage' is a
-- terminal state.
CREATE TABLE WineCellarChanges
( ..
previous_state VARCHAR(15) DEFAULT 'Grape Juice' NOT NULL,
current_state VARCHAR(15) DEFAULT 'Grape Juice' NOT NULL,
FOREIGN KEY (previous_state, current_state)
REFERENCES StateChanges (previous_state, current_state)
ON UPDATE CASCADE,
transition_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL,
..);
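For instance, this update (wine_id is a hypothetical key column) tries to
jump straight from 'Grape Juice' to 'Vinegar'; it is rejected because that
pair is not a row of StateChanges:
UPDATE WineCellarChanges
   SET previous_state = 'Grape Juice',
       current_state = 'Vinegar'
 WHERE wine_id = 1;  -- fails the FOREIGN KEY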
>> For instance, consider the following states for a relation describing people's marital status, and a transition constraint that says: Single people can't become Divorced: <<
You are wording it in the negative. I would make 'single' the initial state
and 'dead' the (very) terminal state for the transitions. Basically
the StateChanges table is a directed graph written as an adjacency list
model.
>> Since a transition that violates a transition constraint can result in the same /possible state/ as a transition that doesn't, the notions of relational assignment and multiple assignment are broken: <<
Using DRI, I think we avoid an illegal transition. I can also use the
transition_time to keep a history and do a "rollback" if needed.
Notice that state changes can be temporal -- grape juice becomes wine
or vinegar in (n) days, so the time stamp can be important.
--CELKO--
BTDT. This is a classic case, one that has happened enough times so that
it's well documented in the literature.
The consequences should have been eminently predictable, to a seasoned DBA
or database designer. That person should have pushed back to management,
alerting them to the probable disaster awaiting if they changed the keys for
part number.
The specific change wanted by management was, itself, a bad one. It adds
"intelligence" to a key, with the attendant disadvantages.
If a DBA did push back, and management went ahead anyway, then management
screwed up, big time. Unfortunately, management screwups are often, in
today's world, blamed on somebody else, and management continues to make the
same mistake.
Because facts are about things, and things /do/ change. If you want to
limit the set of possible database states to include only facts that
reflect alterations that /can/ occur, then you need to know what each
thing looked like at the end of the last change and what it looks like
in each consistent database state.
I disagree. The value of a key does not necessarily mean the same
thing in successive database states, unless you impose an immutability
constraint on it. Therefore, any conclusion drawn from correlating
facts from successive states by using a mutable key is based on a false
premise. Consequently, transition constraints that involve joining
successive relation values on a mutable key are incorrect, and will
allow transitions that should not occur and will prevent transitions
that should. You need to tell the database that a key's values mean
the same thing in successive database states. You do that by imposing
an immutability constraint.
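As a sketch of what such an immutability constraint might look like, in the
same before/after style as the Suppliers example elsewhere in this thread
(the apostrophe marks the proposed value; Parts and PartNo are invented
names), every key value present before the update must still be present
after it:
NOT EXISTS
(SELECT PartNo
FROM Parts /* before */
EXCEPT
SELECT PartNo
FROM Parts' /* after */) ;
Note that as written this also forbids deleting a part outright, which may
or may not be what is wanted.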
You have still not explained how you intend to go against this logic.
That is honestly completely irrelevant, Brian. We are /only/ concerned
with statements. You are confusing levels, and this seems to be the key
point.
Then you are trying to rewrite the definition of identity that
underlies all logic. This is not a sensible thing to do in light of the
thousands of years of thought it took to reach it.
I can only think the reason you cannot see this is because you have
invested so much time and become too entrenched. I hope you can give
this a rethink given the lack of success you have had convincing this
audience. Jim.
I agree with David.
However this is not an issue with the logical model. It is not like an
identifying 'hair_colour' attribute changing from brown to red. Rather
it is just a /translation/ of one label to another, even though it
represents /exactly/ the same value it did before. (no different from
translating a database into French, say). It's a subtle difference, but
the key is still completely stable, and imo should have been altered in
old relation values too so as to maintain consistency given the
management's foolhardy renaming of the world.
Selzer has the ball. He moves up court. He stops, he
looks... Selzer shoots! He misses! Wide by a mile!
What kind of drugs is he *using?*
[snip]
Forgive this bump, but I'm a little curious. Was my above
question irrelevant or tangent or ignorant in some way? Are
such constraints usually specified in relational calculus?
Thank you in advance to anyone offering guidance.
-- Keith -- Fraud 6
Let's see.... Aspirin...Tylenol.... I think I need some now. My ISP's
news server is on the fritz, so part of the following may appear to be
a repost....
Can a candidate key value identify something beyond the scope of a
single database state? It appears to be the general consensus that it
can: choose a stable key...surrogates are sometimes necessary.... The
fact is, key stability is a workaround because a candidate key value
CANNOT identify something beyond the scope of a single database state.
The stability of a key is irrelevant, because within the scope that its
values can be used, it can identify a fact. Moreover, what it
identifies depends on the context of a query or update. Consider a
line of people waiting to be served at the bank, (Bob, Brian, Lynn,
Susan, Mark, Lisa). Many different queries can be answered, for
example: "Who's first in line?" "How many people are in front of
Susan?" "Who's in front of Lisa?" "Is Joe in line?" In a relational
database, you can model a line as a relation schema, LINE {PERSON,
POSITION} where both PERSON and POSITION are candidate keys. Since
PERSON is a candidate key, the answer to "Is Joe in line?" can be
answered. Since POSITION is a candidate key, the answer to "Who's
first in line?" can be answered. Once Bob has been served, the line
changes, (Brian, Lynn, Susan, Mark, Lisa). The queries above can still
be answered, but in this state the facts are all different. In the
original state, Bob is first in line; in this state Brian is first in
line. The value for POSITION in one state does not identify the same
thing in the other. For example, the value "1" for POSITION in the
first state identifies the tuple (Bob, 1); in the other it identifies
the tuple (Brian, 1). It is clear that (Bob, 1) and (Brian, 1) are not
identical. Similarly, the value "Brian" for PERSON in the first state
identifies the tuple (Brian, 2), whereas in the other it identifies
(Brian, 1). It is clear that (Brian, 1) and (Brian, 2) are not
identical. Therefore, a candidate key value can only logically
identify something in a single database state. Furthermore, any given
tuple can represent different things depending on the perspective of
the observer. (Brian, 2) could represent Brian's position in the line
in one context, or the person who's second in line in another. So, the
meaning of the answer to a query depends on the key used to identify a
fact. Similarly, the meaning of an update depends also on the key used
to identify a fact. For example, if I issue the update,
UPDATE LINE SET PERSON = 'Joe' WHERE PERSON = 'Mark'
What I'm saying is, "Hey, wait a minute! That's not Mark, it's Joe."
On the other hand, if I issue the update,
UPDATE LINE SET PERSON = 'Joe' WHERE POSITION = 5
What I'm saying is, "Joe is fifth in line, not whoever was there
before."
The point is, by definition, a candidate key value can only be used to
identify a tuple in a single database state. Using one beyond that
scope implies that it has already been assigned meaning--that it
identifies more than just a tuple. Meaning is something that should be
left up to the designer of a database and its users, not the data
model. In order to separate meaning from the model, erroneous
assumptions, such as "A candidate key value must identify the same
thing throughout an update", must be discarded. It *can* identify the
same thing, if the designer so chooses, but that doesn't mean that it
always does.
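For concreteness, the LINE schema described above might be declared along
these lines (the types are an assumption; each UNIQUE NOT NULL column serves
as a candidate key):
CREATE TABLE LINE
(PERSON   VARCHAR(30) NOT NULL UNIQUE,
 POSITION INTEGER     NOT NULL UNIQUE);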
Because a state-transition involves more than one database state, a
transition constraint is *always* concerned with what values mean, not
just how relation values can change from one database state to another.
That's the rub: how do you divorce meaning from the model and still
enforce transition constraints? At the same time, the model must
provide a mechanism for defining transition constraints that is also
not based on any erroneous assumptions, such as, "if a key value
changes, then you're talking about different things." At a minimum,
there must be a way to correlate the facts in one database state with
those in the next. Facts are values and cannot change, but what
they're about can, so in order to define a transition constraint, you
must be sure that you're looking at facts from both states that are
about the same thing. There are two ways to accomplish this.
A change marks a temporal boundary between two successive situations in
one universe. Determining if a thing in the preceding situation is the
same thing in the succeeding situation requires either knowledge of how
a change can affect something or the ability to observe something as it
changes. You must either know that the change will not affect at least
one set of identifying properties of something, take note of them in
the preceding situation and then compare them to the same set of
identifying properties of something in the succeeding situation, or you
must identify the thing in the preceding situation and then watch it as
it passes through the temporal boundary into the succeeding situation.
In other words, if you know that a set of identifying properties will
not change, then given that set of identifying properties, you can
identify the thing in both situations; or, if you can observe something
as it changes, then once you've identified it, you know that it is the
same thing in both situations. In the context of a database, a set of
identifying properties that is not affected by the change may not be
relevant to the discussion. If that is the case, then in order to
enforce a transition constraint, you must either inject irrelevant
information into the discussion, or you must be able to observe things
throughout a transition, and then convey how that transition affected
each thing individually.
As has been made clear in earlier posts, keys can change, so unless you
introduce object identifiers or immutable surrogates, you cannot
*always* enforce transition constraints based only on key values. (I
perceive other problems with relations that have more than one
candidate key, but I haven't drawn any conclusions.) My main objection
is the injection of irrelevant information when it's not necessary. If
a change can be observed, then it doesn't matter if keys change, you're
still looking at the same thing. That's why I think that the update
semantics of the model should be augmented to allow the user to specify
how facts correlate if needed. This does not break the model, it just
makes it possible to enforce transition constraints when keys can
change.
No, your question isn't irrelevant, nor tangential, nor ignorant. The
problem is that I don't think that it *can* be expressed in relational
calculus without making gross assumptions.
If you make the gross assumption that key values in the old state
identify tuples in the new state, then
NOT (EXISTS p IN new EXISTS q IN old
     (q.LAST = p.LAST
      AND q.FIRST = p.FIRST
      AND q.STATUS = 'Single'
      AND p.STATUS = 'Divorced'))
The problem is that transition constraints describe the changes that
can occur to things in the universe and that can be recognized in the
differences between tuples in the current database state and tuples in
a proposed database state. Just because a candidate key value can
identify a fact in the current database state doesn't always mean that
it can identify one in the proposed database state, nor does it mean
that the one identified is about the same thing. The definition of a
candidate key limits its scope to a single database state. Keys can
change, but that doesn't mean that the facts they determine in their
respective database states don't refer to the same thing at different
times.
I posted a detailed explanation to J M Davitt, but here's the short
version.
A candidate key value can only identify a tuple within a single
database state. Using that value to identify a tuple in a different
database state implies that the value identifies more than just a tuple
(or fact). So if a tuple from the current database state represents a
fact about something, and a tuple with the same key value in a proposed
database state represents a fact about something, then the gross
assumption is that both facts are about the same thing. So, you're
right, tuples are values and cannot change, but using a candidate key
value beyond the scope of its definition injects meaning into that
value so that it doesn't just identify a tuple, but also something in
the universe. Therefore, if it's possible for a key value to identify
something in the universe, then it's also possible for that key to
change and still identify the same thing. It's also possible for
things to change so that the key value identifies something else.
Hence, a transition constraint prevents database states that are
illogical not because values are different, but because the new values
represent facts about things that cannot be true given the current
state of things reflected in the old values.
Bottom line: if a transition constraint can be defined, then tuples
represent facts about things that can change.
This is not an explanation of how you are going against the logic. It's
fluff. If tuples cannot change there can be no transition.
>
> A candidate key value can only identify a tuple within a single
> database state.
No. That is only a /necessary/ condition of a candidate key. There are
further characteristics that separate choosing a good key from a bad
one.
> Using that value to identify a tuple in a different
> database state implies that the value identifies more than just a tuple
> (or fact).
It gives us a clue that the key for a tuple /may/ also be a good
identifying attribute for an item in the real world. This may not be
true.
> So if a tuple from the current database state represents a
> fact about something, and a tuple with the same key value in a proposed
> database state represents a fact about something, then the gross
> assumption is that both facts are about the same thing.
More than gross, it's an atrocious assumption. Facts are not about
things, they /concern/ things. In fact the above assumption shows a
misunderstanding about the nature of the statements we are recording.
This is the root of your mistake imho.
> So, you're
> right, tuples are values and cannot change, but using a candidate key
> value beyond the scope of its definition injects meaning into that
> value so that it doesn't just identify a tuple, but also something in
> the universe.
An identifying value can be used as a key in a statement, but we cannot
assume the reverse. It is illogical. I think you are confusing the role
of identifying an item and identifying a statement of fact.
> Therefore, if it's possible for a key value to identify
> something in the universe, then it's also possible for that key to
> change and still identify the same thing.
It can be renamed and happily identify the same 'thing' (i.e. a domain
renaming). However this would require a renaming in old relation values
too to maintain consistency.
It can also change values (within a domain) and identify a snapshot of
a 'thing'.
However it no longer identifies a thing's identity over several
snapshots of it. If that is what matters it was an error using that
identifier in the initial propositions. You should have used an
identifier that represents that 'thing' over that time.
> It's also possible for
> things to change so that the key value identifies something else.
This sentence makes no sense. Things don't have key values, only
tuples.
> Hence, a transition constraint prevents database states that are
> illogical not because values are different, but because the new values
> represent facts about things that cannot be true given the current
> state of things reflected in the old values.
>
> Bottom line: if a transition constraint can be defined, then tuples
> represent facts about things that can change.
No. Things can change; however, to identify something over time in the
real world we need /some identifying attribute/. How on earth would we
ever know it was the same thing otherwise! When you meet someone you
have not known since they were a child, and they have changed their
name through marriage, you cannot identify them. They have to establish
a consistent identifier for you to recognise them.
All you have done is highlight bad key choices. This is a valid
problem, but external to the RM.
oids are clearly no use as they are not observable identifiers.
However, you are correct - surrogates /can/ be used. The problem is
your use of the term 'immutable' - it is the wrong perspective,
starting from a db schema viewpoint, as opposed to correctly starting
from the necessities of real-world identification. It's consistency of
identification that matters there rather than some notion of
immutability.
> (I perceive other problems with relations that have more than one
> candidate key, but I haven't drawn any conclusions.) My main objection
> is the injection of irrelevant information when it's not necessary.
A surrogate is not irrelevant. It may be vital information.
You may indeed be right. It's difficult to enforce an immutability
constraint because it's also a transition constraint, and if you take
your eyes off the ball, someone may have switched it with another while
you weren't looking. So a surrogate will work, provided the meaning
injected into the surrogate is independent of time--that is, the values
have never been used before, will never change, and will never be used
again.
I tend to look at things from more of a practical standpoint than a
theoretical one, so when I think about constraints, I think more in
terms of how they can be enforced rather than what they do, probably
because over the years I've had to work around so many implementation
limitations. My perspective is therefore more pessimistic than
optimistic. I tend to focus on whether or not a constraint can be
circumvented rather than which is the best way to declare one.
> > (I perceive other problems with relations that have more than one
> > candidate key, but I haven't drawn any conclusions.) My main objection
> > is the injection of irrelevant information when it's not necessary.
>
> A surrogate is not irrelevant. It may be vital information.
>
If it's not required to get the job done, then it's not relevant and
thus shouldn't be defined in the schema; if it's defined in the schema,
then it's relevant. For that matter, if a proposition about something
appears in the database, then the proposition is true and the thing
both exists and is relevant. If a proposition about something doesn't
appear in the database, then either the proposition is false or the
thing is either not relevant or does not exist.
In what way does whether a fact is about something or a fact concerns
something make any difference in this context? If this is the root of
my mistake, then could you please explain? I guess if a fact describes
a class of things (in the mathematical sense, things with a common
property), then it could /concern/ something indirectly. Is that the
distinction you're making? Regardless, if a fact /concerns/
something, then there's still something out there in the universe
that's associated with the fact.
>
> For instance, consider the following states for a relation describing
> people's marital status, and a transition constraint that says: Single
> people can't become Divorced:
>
> Current Proposed
> Jane Jones Married Jane Jones Married
> Jane Smith Single Jane Smith Divorced
The schema does not contain enough information to validate marital status.
For instance, if a person switches from single to divorced, do you reject it
and have them call you an idiot because they are trying to correct a
mistake?
Or, do you put in some absurd requirement that the person change to married,
then to divorced?
Why not just record what you know? A table of claimed marital statuses with
effective dates would detail the history of the claimed states, recording
what you've been told and offering future investigation.
P.S. the Catholic Church solves this problem the old fashioned way, with a
single piece of paper per individual. If two people want to get married in
parish A, the church secretary at Parish A sends the vital stats off to
Church B, the groom's baptism church, and Church C, the bride's baptism
church. After both have confirmed that neither are committing bigamy, they
give the go-ahead.
Sometimes I think it helps to remember that we are really just in the
record-keeping business.
--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
Because you can't define an enforceable transition constraint in terms
of a single database state. Adding an additional relation to record
claimed states just shifts the transition constraint from the one
relation to the other, and nothing prevents a user from changing what
has already been recorded. Let me try to be more clear. It is
possible to issue an update that shouldn't be allowed. In your
suggested solution, the user could simply replace the entry that
indicated the last marital status, instead of adding an additional
entry. This allows the user to circumvent the constraint. While in
the above example you may want to allow that in order to correct a
mistake, in an accounting environment, changes must often be traceable.
So as an answer to your question above, you could modify the constraint
above to allow an illegal change only if an explanation for the
correction is recorded along with it, perhaps in another relation.
I think I was very clear: values do not change, things do. Codd
mentioned "key updates" in the context of checking for consistency in
the paper that defined the Relational Model. Of course, he was then
concerned only with checking for the consistency of individual
instantaneous states, but when he later defined what a data model is,
he included "changes of state" as a type of integrity rule that can be
defined. So it's clear that Codd knew that keys could change when he
defined the Relational Model, and it is clear from his definition of a
data model that he expected a data model to provide a mechanism for
determining allowable state changes. Therefore, if tuples are values,
and values cannot change, then either state-change integrity rules are
nonsense, or tuples must represent facts about things that can change,
and a system conforming to the model must be able to detect those
changes and react accordingly--even if a key changes.
I agree that it's just a translation, or perhaps a transliteration.
But there are some practical consequences to changing a key, even in a
systematic way. In practice, a great deal of data is often under the
management of the DBA while it's in the database, but passes beyond the DBA's
span of control, when it is extracted and either fed to some other body of
data or printed in a report.
In particular, some of the old foreign keys will be beyond the control of
any DBA to translate into the new key value. Thus, when people go through
the archived data, or perhaps the warehoused data, the issue will arise of
whether a given foreign key should be referred back to the "old reference
table" or the "new reference table". This issue won't arise in the case
under discussion, because you can tell "at a glance" (shudder!) whether a
key is old style or new style. But there are key changes where it's not
obvious.
As an example, in September 1752, the American colonies adopted the "new
style calendar" (Gregorian). For several years thereafter, dates had to be
qualified as O.S. (Old Style) or N.S. (New Style). An unqualified date was
subject to misinterpretation, and attachment to the wrong day. This had
practical consequences in, for example, the calculation of mortgage
amortizations. Yeah, yeah, this has nothing to do with computerized
databases, but it illustrates a problem with data management nicely.
And, in the long run there isn't much difference between mismanaging data
with quills and parchment or with computers and long term storage. The
biggest difference is how much data you can screw up in an hour.
> A candidate key value can only identify a tuple within a single
> database state.
Here's where you're going wrong, Brian. A candidate key (from the logical
model) doesn't identify a tuple. Not even within a single database state.
One might say that it specifies a tuple. That isn't the same thing as
identifying the tuple.
Then why did Codd use exactly that term when he defined the Relational
Model? I quote,
>>>>
Normally, one domain (or combination of domains) of a given relation
has values which uniquely identify each element (n-tuple) of that
relation. Such a domain (or combination) is called a primary key.*
*E. F. Codd, "A Relational Model of Data for Large Shared Data Banks,"
Communications of the ACM (June 1970)
PMFJI, I would say that the VALUE of a candidate key identifies one and
only one tuple FOREVER! Nothing to do with 'states'. The value can't
ever change; rather, a different value would identify a different tuple.
Surely since we can distinguish two tuples based only on their values,
then we can identify them.
p
Stupid me, I have to take part of that back - the value of a candidate
key obviously could identify several tuples but I still think that would
hold forever. Might have been better to say the value of a candidate
key identifies a tuple regardless of time.
p
A candidate key does not identify a tuple. A candidate key is a
constraint on a relvar and not on a tuple.
No argument about a candidate key being a constraint. I'm talking about
the value of a candidate key. If you can infer the values of the other
attributes from that value, I'd say you have achieved identification.
p
And one cannot infer anything from a subset of the attributes when one
is talking about a tuple. The only thing that identifies a tuple is the
tuple's value. Just as the only thing that identifies the number 5 is
the number 5.
There must be a subtlety here that eludes me. If a candidate key of a
tuple has a value of 1 and a tuple in a relation that has that candidate
key has a value of (1,2) then I would say that the value 1 certainly
identifies that tuple.
p
There must be a subtlety here that eludes me. If a candidate key, k, of
a relation has a value of 1 in some tuple and a tuple in that relation
has a value of {k 1, x 2} then I would say that the value k = 1
certainly identifies that tuple.
Okay, maybe now I'm seeing the subtlety, if you are talking about the
tuple after it's been identified, i.e., a tuple in the context of a
relation. If I've got that right, I could have been more clear that I
was talking about identifying a tuple within a relation. Your emphasis
on language precision, which some people might call pedantry, hurts my
head, but I suppose it's necessary and I really shouldn't complain if
the theorists are to find any common ground with the hackers (or if the
hackers are to get their act together).
p
k=1, x=2 identifies that tuple
k=1, x=3 identifies a different tuple
A candidate key specifies an irreducible subset of attributes which must
be unique within a relvar. Both tuples {k 1, x 2} and {k 1, x 3} are
equally valid within the relvar. Just not at the same time.
> Okay, maybe now I'm seeing the subtlety, if you are talking about the
> tuple after it's been identified, i.e., a tuple in the context of a
> relation. If I've got that right, I could have been more clear that I
> was talking about identifying a tuple within a relation.
A relation is a value, which means it is identified by that value. The
concept of a candidate key is pretty much meaningless with respect to
relations. What is irreducible for a relvar may very well be reducible
for the relvar's current value.
> Your emphasis
> on language precision, which some people might call pedantry, hurts my
> head, but I suppose it's necessary and I really shouldn't complain if
> the theorists are to find any common ground with the hackers (or if the
> hackers are to get their act together).
I am not even sure what you are saying above. This is a precise field.
If you do not understand what a tuple is, you will not understand much
of anything in this field.
"precise field"? that's a laugh, there are plenty who avoid precision
and claim the same thing. okay, just in case there's some meat in this
thread, which i'm starting to doubt, stupid me, i'll take back the
pedantry comment, for now, and substitute mysticism for it. surely the
notion of tuple is a mere construct to enable talk about relations.
sorry about the commas, at least i avoided latin.
p
Mmm, I'm going to have to disagree with that one. It's meaningless
as a prescriptive constraint, yes. But it still has descriptive value.
> What is irreducible for a relvar may very well be reducible
> for the relvar's current value.
This one I agree with.
Marshall
I think there is a subtlety here that eludes you.
> If a candidate key, k, of
> a relation has a value of 1 in some tuple and a tuple in that relation
> has a value of {k 1, x 2} then I would say that the value k = 1
> certainly identifies that tuple.
Let's look at it a different way. Let's specify the candidate key
as a functional dependency constraint.
Given a relation R with two attributes, k and v, where k -> v.
(Aka, {k} is a candidate key.)
Expressing this as a constraint:
forall k, v, k', v' in R: k = k' => v = v'
"For all pairs of tuples (k, v) and (k', v') in R, if k equals k'
then v = v'."
So if we are restricting ourselves to only looking at one tuple (k, v),
what does that mean? Well, it doesn't mean anything, because
what it's saying depends on there being two tuples to compare.
If we're talking about a relation with only one tuple, then the
pair (k,v) and the pair (k', v') are the same pair, so again it
doesn't say anything.
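As a throwaway illustration (invented table and column names), the same
pairwise condition can be checked with a self-join that returns rows
exactly when the constraint is violated:

SELECT r1.k
FROM   R r1
JOIN   R r2 ON r1.k = r2.k AND r1.v <> r2.v;
-- no rows back means k -> v holds; with a single tuple in R the join can
-- never produce anything, which is the point above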
Marshall
> Bob Badour wrote:
>
>>The concept of a candidate key is pretty much meaningless
>>with respect to relations.
>
> Mmm, I'm going to have to disagree with that one. It's meaningless
> as a prescriptive constraint, yes. But it still has descriptive value.
The most specific type is interesting. I guess the concept of a
candidate key for the MST has some meaning.
>>What is irreducible for a relvar may very well be reducible
>>for the relvar's current value.
>
> This one I agree with.
I prefer to think in terms of types. The MST of the relation must be a
subtype of the declared type of the relvar. A candidate key is part of
the type declaration for a relvar.
All words are constructs that enable us to talk. This is a precise
field; hence, we use precise terms like tuple.
> All words are constructs that enable us to talk. This is a precise
> field; hence, we use precise terms like tuple.
in the morning, not feeling the plonk so much, i think i'll think that
is just as circular as it seems now.
p
In the interest of precision, Bob, A candidate key specifies an
irreducible subset of attributes which must be unique within /any
relation/ for a given relvar. There's a huge difference between a
relation and a relvar.
Sorry for the delay in posting, my ISP's news server hiccupped, so
several posts I sent before didn't make it here.
I looked at your article. I found several issues with it.
The trigger is broken because "FROM 01, N1" is a cross join and the
trigger is not a row trigger. If more than one row is affected, it may
reject a transition that should be allowed.
CHECK constraints require that the user be cooperative, for example:
If a row for WineCellarChanges contains (...,'Vinegar', 'Spoilage')
and someone issues an update: (...,'Grape Juice', 'Wine')
Then obviously, the transition shouldn't be allowed, but the check
constraint would pass it.
The same may not be true for DRI constraints in SQL (I haven't really
looked hard at this, because the rest of the article either is broken
code or leaves gaping holes that do not prevent updates that should be
prevented), but this does not apply to the Relational Model, because the
entire set of relations is replaced, so it's possible to issue a
relational or multiple assignment that would circumvent the transition
constraint. If you wouldn't mind a suggestion, perhaps you should make
sure that your code actually does what it says it does before you
publish it with your name on it. Considering the lambasting I've
received on this forum just for using the wrong terminology, I'm really
surprised that you included that link at all.
I disagree about the emphasis on precision being pedantry. It's pedantry
only when the distinction makes no difference.
In that context, I have to back track a little. A few responses back, I
said that a candidate key doesn't identify a tuple.
Perhaps I should have said "relvar that contains a tuple". In the context
of an update, it's easy to confuse the two.
Let's go back to the start of this thread:
> A transition can be defined as a set of triples (r, t, t') where r is the
name
> of a relation, t is a tuple from the current state, and t' is a tuple from
> the proposed state.
There's a problem here: what ties t and t' together? Do they share a
common identity? Well, no they don't, because they aren't the same tuple.
However what I think Brian implies is that there is a relvar that has the
state t before the transition and the state t' after the transition. Notice
that it's the relvar whose identity endures the transition, not the
contents.
A small digression: all this reminds me of the (in)famous SQL construct:
UPDATE .... WHERE CURRENT of <cursor>
End digression.
Anyway, how does
(r, t, t')
differ from the pair of transitions?
(r, t,empty)
(r,empty,t')
Or how does
(r, t0, tn)
differ from the sequence of transitions?
(r0, t0, t1',r1)
(r1, t1, t2',r2)
...
(rn-1, tn-1, tn',rn)
Where rk are the states of some relvar r caused by a sequence of n
assignments for k in [0,n], ti is the set of tuples in ri not appearing
in ri+1 for i in [0,n-1], and tj' is the set of tuples in rj not
appearing in rj-1 for j in [1,n].
Your comment regarding 'where current of' is apropos. If a dbms
implements 'on update cascade' semantics, the dbms has full use of its
internal representation for implementing those semantics. There is
nothing wrong with the dbms using physical pointers for keeping track of
what goes where.
I suggest 'on update cascade' semantics can only apply for something
like the SQL UPDATE statement and not for generalized relational
assignment. Obviously, for INSERT, the dbms cannot cascade inserts out
of thin air. If a dbms implements 'on delete cascade' and 'on update
cascade' semantics for a set of relations, then (r, t, empty) followed
by (r, empty, t) will have the effect of deleting all references to t
from all referencing relations.
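In SQL terms (a bare sketch, names invented), the declarations in
question are the familiar:

CREATE TABLE parent (k0 INTEGER PRIMARY KEY);
CREATE TABLE child (
  fk0 INTEGER REFERENCES parent (k0)
      ON UPDATE CASCADE
      ON DELETE CASCADE
);
-- note that deleting a parent row and re-inserting it, i.e. (r, t, empty)
-- followed by (r, empty, t), is not an in-place update: the delete
-- cascades first, so the child rows referring to t are already gone by
-- the time t reappears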
Because the Tutorial D comma operator combines multiple queries into a
single atomic statement, I expect that Tutorial D will have problems
implementing 'on update cascade' semantics.
I question the need for 'on update cascade', but I note that some views
would exhibit behavior similar to 'on update cascade'. This suggests
those views pose problems for generalized assignment too.
For example, if we have two base relations ree and rer (for referree and
referrer) where ree has two candidate keys k0 and k1 and where rer has a
foreign key reference k0 to ree, one could create views
vee as ree{all but k0}
and
ver as (ree join rer){all but k0}.
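A rough SQL rendering of that setup (types and the extra attributes are
invented):

CREATE TABLE ree (
  k0    INTEGER PRIMARY KEY,    -- candidate key k0
  k1    INTEGER UNIQUE,         -- candidate key k1
  other INTEGER
);
CREATE TABLE rer (
  fk0    INTEGER REFERENCES ree (k0),
  detail INTEGER
);
CREATE VIEW vee AS              -- ree{all but k0}
  SELECT k1, other FROM ree;
CREATE VIEW ver AS              -- (ree join rer){all but k0}
  SELECT r.k1, r.other, s.detail
  FROM   ree r JOIN rer s ON s.fk0 = r.k0;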
It would appear that updates to vee will cascade to ver, but what
happens in the case of an arbitrary assignment?
Suppose somebody made a copy of vee yesterday in relation rhist. What
would happen to ver if someone assigned the value of rhist to vee?
That's probably because it hasn't clicked yet that a transition is a *set*
of triples, where each element represents only a distinct component of the
overall difference between the current database state and a /possible
state/. The above construct involves something that passes over a result
set in a particular sequence. That's a totally different thing altogether.
My thinking is that a user must assert which combination of component
differences applies to a particular change, since there can be more than one
combination for a /possible state/ and not all of those combinations may be
allowed.
>
> End digression.
>
> Anyway, how does
>
> (r, t, t')
>
> differ from the pair of transitions?
>
> (r, t,empty)
> (r,empty,t')
>
In the transition {(r, t, t')}, t and t' both /concern/ the same thing;
however, in the transition {(r, t, empty), (r, empty, t')}, t and t'
/concern/ different things.
A thing can come into existence, can change in appearance, and can become
nonexistent. The distinction between the two transitions makes it possible
for the database designer to constrain information about each type of event.
For example, the U.S. government requires additional documentation for cash
transactions that exceed $5,000. The above distinction makes it possible to
reject such a cash transaction without the requisite documentation. Since a
transition is a set of component differences, one component could represent
the documentation while another could represent the cash transaction, and a
transition constraint could then eliminate from consideration all
transitions that don't include both the documentation and the cash
transaction whenever the amount exceeds $5,000.
Please permit me to withdraw my above-quoted statement;
upon further reflection, (derived from vc's comment) it does
not hold up.
Marshall
I'm well aware that you are talking about a set of transitions. Your
professional history of dealing with bozos has conditioned you into a habit
of condescension that is out of place in this newsgroup.
What may not have clicked with you is that the "set of transitions from one
state to another" describes, nearly exactly, the progress of a processor
executing a computer program. Abstracting out all of the intermediate
states, where some, but not all of the transitions have been carried out,
is the essence of atomicity of transactions.
> My thinking is that a user must assert which combination of component
> differences applies to a particular change, since there can be more than
one
> combination for a /possible state/ and not all of those combinations may
be
> allowed.
>
> >
> > End digression.
> >
> > Anyway, how does
> >
> > (r, t, t')
> >
> > differ from the pair of transitions?
> >
> > (r, t,empty)
> > (r,empty,t')
> >
>
> In the transition {(r, t, t')}, t and t' both /concern/ the same thing;
> however, in the transition {(r, t, empty), (r, empty, t')}, t and t'
> /concern/ different things.
>
How do you know?
PS: in the above, I intended to say "the pair of transitions", rather
than "the transition composed of the pair of triples".
If you treat
(r, t, empty)
(r, empty, t')
as a single transition composed of two triples, you're going to be in
deeper water than you already are.
I ASS-U-MEd that you were confusing triples with transitions because of the
digression below. I was wrong, and I apologize. I did not intend to be
condescending.
> What may not have clicked with you is that the "set of transitions from
> one
> state to another" describes, nearly exactly, the progress of a processor
> executing a computer program. Abstracting out all of the intermediate
> states, where some, but not all of the transitions have been carried out,
> is the essence of atomicity of transactions.
>
I'm not sure that I understand what you're trying to say. Yes, there may be
intermediate states within a transaction, and from a logical standpoint,
it's not necessary that we know what those are.... Did I somehow imply by
saying "from one state to another" (I didn't think I said that...maybe it
was in another thread) that there would be a series of successive
transitions between the current state and each possible state? That was
definitely not my intended meaning. What I did intend was that the user
specifies one and only one transition, which when applied to the current
database state yields one and only one possible database state. Just as the
set of all state constraints defines a set of consistent database states,
each of which may become current, the set of all transition constraints
defines a set of possible transitions, each of which by itself describes the
difference between the current database state and a possible database state.
>
>> My thinking is that a user must assert which combination of component
>> differences applies to a particular change, since there can be more than
> one
>> combination for a /possible state/ and not all of those combinations may
> be
>> allowed.
>>
>> >
>> > End digression.
>> >
>> > Anyway, how does
>> >
>> > (r, t, t')
>> >
>> > differ from the pair of transitions?
>> >
>> > (r, t,empty)
>> > (r,empty,t')
>> >
>>
>> In the transition {(r, t, t')}, t and t' both /concern/ the same thing;
>> however, in the transition {(r, t, empty), (r, empty, t')}, t and t'
>> /concern/ different things.
>>
>
> How do you know?
>
In the one transition both tuples appear in the same triple, whereas in the
other, they appear in different triples. So if they appear in the same
triple, then they concern the same thing. If they don't appear in the same
triple, then either they don't concern the same thing, or it wasn't
important that they do. In a closed world, if you're not told that facts
concern the same thing, then you cannot prove that they do, so they don't.
> PS: in the above, I intended to say "the pair of transitions", rather
> than "the transition composed of the pair of triples".
> If you treat
>
> (r, t, empty)
> (r, empty, t')
>
> as a single transition composed of two triples, you're going to be in
> deeper water than you already are.
>
Is this not what happens with relational assignment? The old values are
deleted and the new values are inserted.
OK. No harm done. The subject of UPDATE ... WHERE CURRENT is worth a side
discussion here, because
it blurs the distinction between content based addressing and location based
addressing. The CURRENT row of a result table is like "the can of
Campbell's chicken noodle soup that I have in my left hand", although at a
different level of abstraction.
I should mention in passing that I am NOT one of those purists in this
forum who consider SQL so fallen as to be not worth the effort. On a
practical level I'll settle for SQL until I learn something better. And I'm
not in a hurry to learn something better. Later in this post, I'm going to
discuss triggers that are fired on update, in order to get a more concrete
handle on the abstraction that you are describing as a "transition".
And, sometimes, expressing things in a
The above is also worth a discussion of its own. Essentially the relational
model (or at least the part of the model I'm familiar with) models the state
of the database between transactions, not the intermediate states of the
database during a transaction.
During a transaction there is code operating on the client side and also
code that's operating on the server side. Describing database based
constraints that operate in that environment can get awfully complex.
In that context I'm going to refer you to the discussions about tutorial D
that surface in this forum. I've never fooled around with D, so I lack the
specific concrete knowledge to comment on those discussions. But here's
what I've gleaned from following the discussions.
In a language like SQL, a transaction is a series of actions, starting
with an action that requires a transaction, and ending with a COMMIT or a
ROLLBACK. (Actually, DEC Rdb/VMS had a specific "start transaction"
action, but I don't think that's standard SQL.) Control is passed back and
forth between the client and the server in the middle of all of this.
In D, if I understand correctly from what I've read in this forum, a
transaction can (and should?) be expressed as a single message from the
client to the server. Thus the entire transaction is atomic, not only with
regard to the effect on the database, but also with regard to the client. I
can't be sure of this, but I expect that this would simplify transition
analysis a great deal.
A model such as your "set of triples" could be subjected to a rigorous
analysis in D, in a way that baffles me with regard to SQL. Or so I
imagine. I await eagerly the input of those who know both SQL and D.
It depends on what you mean by the above. I only know the internals of one
product (DEC Rdb/VMS) well enough to comment on the above. With regard to
Oracle RDBMS, all I can do is trust the documentation and the engineers and
hope for the best. I forget what the Oracle documentation says about this.
Let me say a word about triggers, here. In SQL, a trigger can be fired up
by an update. In fact, a trigger can even be specified as being fired up
once for the entire update, or once for each row that is about to be
updated. In the latter case,
the procedure invoked by the trigger has available two values within the
context of a single row, namely the old value and the new value.
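Concretely, an Oracle-flavoured sketch of the row-level case (the table,
the column, and the no-decrease rule are only for illustration):

CREATE TRIGGER no_salary_cut_row
BEFORE UPDATE OF salary ON employees
FOR EACH ROW
BEGIN
  -- :OLD and :NEW are the two values the row trigger gets to see
  IF :NEW.salary < :OLD.salary THEN
    RAISE_APPLICATION_ERROR(-20001, 'salaries may not decrease');
  END IF;
END;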
In this context, your concept that the triple (r, t, t') refers to the same
"thing" has a concrete meaning that I can wrap my mind around: the same
"thing" refers to the same table row. However, I have the uneasy
feeling that by "the same thing" you mean "the same thing" in the world of
the subject matter. (Oops, maybe I ASS-U-ME too much). I await your
shedding some light on this before proceeding.
anyway, back to SQL triggers. Note that at the end of the update, but
before the commit, the client is going to receive control. The trigger
alerts the client by generating an error. So this kind of trigger operates
not merely in the context of a single transaction, but in the context of a
single action within that transaction. As such, it provides a poor
concrete simile for the kind of analysis you are attempting with the sets of
triples.
By an amazing coincidence, the "old" and "new" values in the context of a
trigger during an update suffer from the exact same mystery as the UPDATE
.... WHERE CURRENT suffers from.
I see your point. On the other hand, since the set of triples in a
transition describes the entire difference between two successive database
states, it is not exactly the same thing. The idea of location based
addressing doesn't have any place in the conceptual or logical models. What
I'm trying to point out is that how tuples correlate during an update is
something that either the database designer must specify, or that the user
must specify, or both. The database designer specifies it with a
system-generated surrogate. The user specifies it in the WHERE clause. I
believe that if the model provided the capability for the user to specify
which tuples will be affected, that is, how a specific set of tuples were
selected by the user for updating, then a variety of transition constraints
could be defined that cannot when only using a surrogate. For example,
assume that you were asked to develop a set of job descriptions for every
person on the payroll, so you create the following table with two candidate
keys:
ASSIGNED_DUTIES {EMPLOYEE#, DUTY, POSITION#}
The candidate keys are {EMPLOYEE#, DUTY} and {POSITION#, DUTY}
After conducting interviews of all of the employees and their supervisors,
you end up with a complete set of duties and a list of positions from which
you can create job descriptions. Over time, the duties of an employee may
change, which has the effect of altering the duties for the position that
the employee is filling. In this way, if someone quits, you know exactly
what the company needs to fill the position. Now, consider the following
updates:
(1) UPDATE ASSIGNED_DUTIES
SET DUTY = 'Record labor using barcode terminal'
WHERE EMPLOYEE# = 45
AND DUTY = 'Fill out labor ticket'
(2) UPDATE ASSIGNED_DUTIES
SET DUTY = 'Record labor using barcode terminal'
WHERE POSITION# = 22
AND DUTY = 'Fill out labor ticket'
(3) UPDATE ASSIGNED_DUTIES
SET DUTY = 'Record labor using barcode terminal'
WHERE DUTY = 'Fill out labor ticket'
The semantics of these updates are different, even though they may affect
the same row. In (1) you're changing one of the duties of an employee; in
(2) you're changing one of the duties for a position, and in (3) you're
making a blanket change. Similarly, consider the following updates:
(4) UPDATE ASSIGNED_DUTIES
SET EMPLOYEE# = 99
WHERE EMPLOYEE# = 45
(5) UPDATE ASSIGNED_DUTIES
SET EMPLOYEE# = 99
WHERE POSITION# = 22
From the perspective of (4), you're assigning the duties that employee #45
had to employee #99, whereas in (5) you're assigning the duties for position
#22 to employee #99. It's a subtle difference, I know, but that difference
can easily be captured if you supply the attributes used to select each
tuple in a transition. So instead of a set of triples, you could have a set
of quadruples,
(r, s, t, t')
where r is a relation name, s is a set of attribute names that were used to
select a tuple, t is the tuple selected from the current database state, and
t' is the corresponding tuple from the proposed state.
If this information that is already supplied by the user could be made
available, you could define transition constraints based not only on what is
different on a tuple-by-tuple basis, but also on the perspective of the user
when they issued the update. In the above example, for (1) you could
require documentation from the employee's supervisor; whereas for (2) you
could require documentation from the HR manager instead. I realize that
this is pushing the limit a bit, and I really haven't given it a whole lot
of thought, but I think it could be useful. This information is provided
when a user issues an update, so why not make it possible to take advantage
of it?
I absolutely agree. I personally think that modifications should be cached
until the end of a transaction and then submitted all at once in a single
operation (a transition) wherein all constraints could be checked and
enforced en masse. A transaction is supposed to be a single unit of work, so
to me it doesn't make any sense to do otherwise.
I wasn't referring to any specific implementation. Conceptually, relational
assignment replaces the current relation value with a new relation value.
Without immutable surrogates and a means of preventing key updates, one
cannot assume that tuples in the new value correspond to tuples in the old
one, so at the conceptual level, every tuple in the new relation value
represents a different fact--even if all of its attribute values are
identical to one that had already existed. The reason for this is that the
facts represented in the current relation value concern things that existed
at the instant that that value became current, and what is reflected in
those facts depends on the situation that existed at that same point in
time. The facts represented in the proposed relation value, on the other
hand, originate from a later situation. It's as if there's an imaginary
timestamp or transaction sequence number attached to each tuple in the
current state, and another in the proposed state, so "=" (logical identity)
will always return false. Therefore, it's necessary for the database
designer or the user to tell the system how tuples correlate.
>
> Let me say a word about triggers, here. In SQL, a trigger can be fired
> up
> by an update. In fact, a trigger can even be specified as being fired
> up
> once for the entire update, or once for each row that is about to be
> updated. In the latter case,
> the procedure invoked by the trigger has available two values within the
> context of a single row, namely the old value and the new value.
>
> In this context, your concept that the triple (r, t, t') refer to the
> same
> "thing" has a concrete meaning that I can wrap my mind around: the same
> "thing" refers to the same table row. However, I have the uneasy
> feeling that by "the same thing" you mean "the same thing" in the world
> of
> the subject matter. (Oops, maybe I ASS-U-ME too much). I await your
> shedding some light on this before proceeding.
>
I probably just answered this question in the above paragraph, but I'll put
it another way. Because database states are instantaneous snapshots of the
universe, transitions *always* concern changes that occur in the universe,
not just differences between relation values or tuples.
> anyway, back to SQL triggers. Note that at the end of the update, but
> before the commit, the client is going to receive control. The trigger
> alerts the client by generating an error. So this kind of trigger operates
> not merely in the context of a single transaction, but in the context of
> a
> single action within that transaction. As such, it provides a poor
> concrete simile for the kind of analysis you are attempting with the sets
> of
> triples.
>
Indeed. A row trigger allows changes to each row to be checked, but not in
context with the entire set of differences. A statement trigger exhibits
the same problems as relational assignment: you can't correlate the rows if
keys can change. A transition on the other hand, presents all of the
information at once. It's a simple thing to transform a transition into
three sets of relations, one for new information, one for changing
information, and one for information that is becoming history. These can
then be compared with the current state and possibly the proposed state by
using normal relational operators in order to determine whether or not the
entire transition should be allowed. In the context of a trigger, you could
have a set of deleted values, a set of changed values and a set of inserted
values. (Of course, such a trigger would be limited in scope to just a
single table.)
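For concreteness, a SQL Server-flavoured statement trigger along those
lines might look roughly like this (names and the sample rule are
invented); note that the join on EmpID is exactly the correlation step
that falls apart if the key itself can change:

CREATE TRIGGER no_salary_cut_stmt ON Employees
AFTER UPDATE
AS
BEGIN
  -- 'inserted' and 'deleted' hold the whole sets of new and old rows
  IF EXISTS (SELECT 1
             FROM inserted i
             JOIN deleted d ON d.EmpID = i.EmpID   -- correlate on the key
             WHERE i.Salary < d.Salary)
  BEGIN
    RAISERROR('salaries may not decrease', 16, 1);
    ROLLBACK TRANSACTION;
  END
END;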
Does the set of triples describe the entire difference or does it prescribe
the entire difference?
> The idea of location based
> addressing doesn't have any place in the conceptual or logical models.
Agreed.
> What
> I'm trying to point out is that how tuples correlate during an update is
> something that either the database designer must specify, or that the user
> must specify, or both.
Here's where you're losing me. What does this "correlation" signify? Does
it mean that the old tuple and the new tuple
both refer to the same item in the universe of discourse (subject matter)?
Does it mean that the old tuple and the new tuple are both stored in the
same row of the same table (implementation)? Or does it mean that the new
one "replaces" the old one in the sense of overwriting the old one in some
(part of) a variable? Or something else?
> The database designer specifies it with a
> system-generated surrogate.
would you explain this a little more clearly?
> The user specifies it in the WHERE clause. I
> believe that if the model provided the capability for the user to specify
> which tuples will be affected, that is, how a specific set of tuples were
> selected by the user for updating, then a variety of transition
constraints
> could be defined that cannot when only using a surrogate.
If a tuple is a value, then you can't update it. If it's a variable, then
you can't identify it by its contents.
Which is it?
In the five updates above, you are updating one or both of the candidate
keys. Earlier, you said, IIRC, that keys should be immutable. Why isn't
this a contradiction?
The set of triples does that in the hypothetical implementation of 'on
update cascade' that Selzer made up.
> Does the set of triples describe the entire difference or does it prescribe
> the entire difference?
The question to ask oneself is: "What relevance do these implementation
details have to any theory?"
>>The idea of location based
>>addressing doesn't have any place in the conceptual or logical models.
>
> Agreed.
True, which is why another implementation might use physical location
internally and represent it differently to the user. For example,
consider an implementation that represents the update to the triggered
procedure as a single relvar with two attributes 'new' and 'old' where
each attribute is a relation valued attribute with a type that is a
sub-type of R's constrained to at most one tuple. Such an implementation
can associate individual tuples in the initial state with individual
tuples in the final state using physical location but the user always
interacts with relvars and relations.
>>What
>>I'm trying to point out is that how tuples correlate during an update is
>>something that either the database designer must specify, or that the user
>>must specify, or both.
>
> Here's where you're losing me. What does this "correlation" signify?
He isn't losing me. He is just flat-out plain old wrong. The dbms can
correlate tuples during an update by physical location. After all, the
dbms has access to and manages the internal physical representation of
the data.
Does
> it mean that the old tuple and the new tuple
> both refer to the same item in the universe of discourse (subject matter)?
> Does it mean that the old tuple and the new tuple are both stored in the
> same row of the same table (implementation)? Or does it mean that the new
> one "replaces" the old one in the sense of overwriting the old one in some
> (part of) a variable? Or something else?
Who cares? He is a self-aggrandizing ignorant who demonstrably talks
gibberish. Even if he accidentally manages to reply with something that
sounds cogent, how will you know whether he understands what he says the
same way a normal educated person would understand it?
>>The database designer specifies it with a
>>system-generated surrogate.
>
> would you explain this a little more clearly?
Why do you keep inviting him to waste more of everyone's time?
[yet another idiotic straw man snipped]
> In the five updates above, you are updating one or both of the candidate
> keys. Earlier, you said, IIRC, that keys should be immutable. Why isn't
> this a contradiction?
If you are going to pretend to engage the self-aggrandizing ignorants,
please do a better job of calling them on their bullshit. What is the
point of asking him about the details of a straw man? It's a straw man.
'Nuff said.
What implementation details? This is not about implementation. It is about
the limitations of relational and multiple assignment. This is about how
having variables as part of the definition of a database is illogical. When
you're talking about changes of state, you must be talking about database
values, not just relation values, because a proposed state for one relation
may depend on the current state of another. For that matter, the entire
concept of multiple assignment is just a kludge to work around problems that
were introduced when D&D decided to wrap variables around relations instead
of databases. Yes, multiple assignment solves some symptoms of the problem,
but it doesn't address the root cause, nor does it allow for the definition
of enforcible transition constraints. There are other problems with using
variables to define databases. Date's translation of Heath's theorem in
terms of relvars instead of relations is a huge leap. The statement that a
relvar is equal to the join of its projections is patently false: because a
relvar is a variable, there is an infinite number of values for each
projection (also variables) that when joined are equal to any one value for
the original relvar. Hence, a relvar is equal to the join of its
projections only in the presence of a circular inclusion dependency or
mutual foreign keys. Using variables in the definition of a database causes
a huge amount of confusion and limits the expressiveness of the model
because assignment involves only one kind of transition: replacement. One
set of tuples is discarded in favor of another set. There is no correlation
between the tuples in each set. Tuples may appear identical, that is, their
component attribute values may be identical, but because assignment replaces
the current relation value or values and because no information is provided
about how tuples correlate, all of the new tuples are logically distinct
because they belong to a different database state. In a closed world, if
you can't prove that they mean the same thing, then they don't.
I was trying to stay away from dbms internals, as much as I could, because
Brian keeps saying that his idea is not at that level.
As I've said fairly often, DEC Rdb/VMS is the only implementation I'm
familiar with at the internals level.
Rdb stores table rows in a data structure that, internally, is called a
"record". (I can hear Joe Celko warming up his blowtorch now.) Updating a
table row would involve locating a record that contains the "old tuple",
and replacing the record contents with the "new tuple". Aside from a little
space management, index updating, and before-image maintenance (for the
sake of snapshot transactions), it's no big deal.
I believe what I've just written is nothing more than a specific
implementation of what you have described above.
The thing is that this is all done out of sight of the database-client, and
whether this is an update of type 1 through 5 (using Brian's prior example)
is of no consequence to this layer of the Rdb software. Hell, it could even
be an error correction or a roll-forward after a database restore.
I get the impression that Brian is trying to construct a new flavor of
client-server interface, one that would permit his definition of transition
constraints to be easily implemented. That's the part I'm not following.
I'm still not convinced that it is a straw man. I'm by no means convinced
that it isn't a straw man, either. That's why I'm probing.
No, he's a self-aggrandizing ignorant constructing a straw man to make
himself look far more important than he really is.
It's a straw man. He's saying: "If the dbms tells a triggered procedure
that some set of tuples was replaced with some other set of tuples, I
cannot tell if there is a correlation between them." But there is no
reason why the dbms would necessarily do that. It could just as easily
tell the triggered procedure that a set of replacements happened.
If the dbms tells the triggered procedure that a set of replacements
happened, it hands the correlations over on a silver platter...
I'm not sure what you're driving at with this question. If you know the one
state, and you know what's about to change, then you can determine the next
state. From one point of view the set of triples is one description of the
differences between one state and the next. From the other it's what the
user prescribes by choosing one of those descriptions.
>
>> The idea of location based
>> addressing doesn't have any place in the conceptual or logical models.
>
> Agreed.
>
>> What
>> I'm trying to point out is that how tuples correlate during an update is
>> something that either the database designer must specify, or that the
>> user
>> must specify, or both.
>
> Here's where you're losing me. What does this "correlation" signify?
> Does
> it mean that the old tuple and the new tuple
> both refer to the same item in the universe of discourse (subject
> matter)?
Yes. And because "correlation" is subjective, it should be specified by the
database designer or the user.
> Does it mean that the old tuple and the new tuple are both stored in the
> same row of the same table (implementation)?
No. The implementation is irrelevant.
>... Or does it mean that the new
> one "replaces" the old one in the sense of overwriting the old one in some
> (part of) a variable? Or something else?
>
I wasn't thinking in terms of variables. A database state is a set of named
sets of sets of named values. In other words: it's a value. Given a
database state and a description of how the individual components of that
state will be different in a proposed database state (a transition), you can
derive that proposed database state. Similarly, given a database state and
a proposed database state, you can derive a set (possibly infinite) of
transitions.
>
>
>> The database designer specifies it with a
>> system-generated surrogate.
>
> would you explain this a little more clearly?
>
Assuming that you have a system-generated surrogate, and assuming that the
system can prevent updates to it, then you can use the system-generated
value to correlate tuples in one database state with those in another during
an update.
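Sketched in SQL (R_old, R_new and the rule being tested are all invented;
RowID is the immutable surrogate):

-- pair tuples across the two states on the surrogate, then test the
-- transition rule; any row returned means the transition must be rejected
SELECT 1
FROM   R_old o
JOIN   R_new n ON n.RowID = o.RowID
WHERE  n.Salary < o.Salary;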
>
>> The user specifies it in the WHERE clause. I
>> believe that if the model provided the capability for the user to specify
>> which tuples will be affected, that is, how a specific set of tuples were
>> selected by the user for updating, then a variety of transition
> constraints
>> could be defined that cannot when only using a surrogate.
>
> If a tuple is a value, then you can't update it. If it's a variable, then
> you can't identify it by its contents.
> Which is it?
>
A tuple is a value. So is a relation. So is a database state. Yet Codd
included "changes of state" in the types of integrity rules that a data
model should include. (D&D also strongly support the idea of transition
constraints in TTM, Chapter 4--at least the one that's on the TTM web
site--I don't have a copy of the actual book). Since values can't be
updated, either "transition constraint" is meaningless, or a transition must
involve something other than just values. Since tuples represent facts that
concern things in the universe of discourse, it seems plausible that those
things are what Codd, Date and Darwen had in mind when they wrote about
allowable changes of state. A transition constraint involves more than one
database state, so in order to define one, there must be a way to specify
that tuples in one state correspond to tuples in another, while at the same
time keeping in mind that a candidate key value within a tuple in one
database state that represents a fact concerning something may be different
in a tuple in the next database state that represents a fact concerning the
same thing.
What I've been trying to say is that in order to define an enforcible
transition constraint, there must be a way to correlate the tuples in the
current database state with those in a proposed database state. This means
that either the database designer must specify that at least one candidate
key will never change by defining an immutable system-generated surrogate
that can then be used to correlate tuples or the user must be able to
specify how the tuples in one database state correlate to the tuples in the
next during an update.
>
>
No, it's not what I'm saying at all. I used the example of a triggered
procedure as an illustration of the problem, since the underlying
theoretical problem mirrors it. What I'm saying is that transition
constraints can be defined declaratively, without using triggered procedures
at all, but not only by comparing the current set of tuples with the
proposed set of tuples.
> If the dbms tells the triggered procedure that a set of replacements
> happened, it hands the correlations over on a silver platter...
It hands over the correlations one at a time, and if there are
intrarelational dependencies, you're screwed. Furthermore, a dbms that
hands over correlations has provided an implementation-specific extension to
the model as it is currently defined by D&D. That's why many applications
that were written for DB2 don't run on SQL Server, and why many applications
that run on SQL Server don't run on Oracle. So, depending on an
implementation-specific extension is contrary to the concept of data
independence.
What I'm trying to do is point out that relational assignment is not
sufficient to provide support for transition constraints, that INSERT,
UPDATE, and DELETE are not just shorthands for relational assignment, that
there is a fundamental difference between them, that the root problem lies
in defining a database as a container for relation variables where
modifications are defined in terms of relational assignment, and that there
is a more fundamental mechanism for modifying a database from which INSERT,
UPDATE, DELETE, relational assignment and multiple assignment can all be
derived. Furthermore, this more fundamental mechanism provides a means to
define enforcible transition constraints declaratively, spanning multiple
relation schemata, using standard relational operators from the algebra or
the calculus, in the presence of volatile keys, without using surrogates.
I don't understand your reference to Dataphor here. Dataphor's
transition constraints are tuple-based, and aren't enforced usefully
when using relvar assignment (as opposed to insert/update/delete).
--
Jon
So what ? The Dataphor row-level transition constraint is a special
case of the more general relvar constraint. Are you criticizing
Dataphor for failure to implement a more general relvar constraint
mechanism ? I am sure they had a lot of reasons for not doing so.
> and aren't enforced usefully
> when using relvar assignment (as opposed to insert/update/delete).
I am not intimately familiar with Dataphor, but my impression was that
they did not implement relational assignment at all, just the
'shorthand' delete/update/insert operators. Is that so ? In any
case, they did what they could by implementing row-level declarative
transition constraints 'set-theoretically', something the OP was
wondering about.
> --
> Jon
I can see only one: Performance.
>> and aren't enforced usefully
>> when using relvar assignment (as opposed to insert/update/delete).
>
> I am not intimately familiar with Dataphor, but my impression was that
> they did not implement relational assignment at all, just the
> 'shorthand' delete/update/insert operators. Is that so ?
No, they have relational assignment; implemented as delete-insert, as
far as I can tell. Which means that their transition constraints can be
circumvented, if they're not carefully written.
> In any
> case, they did what they could by implementing row-level declarative
> transition constraints 'set-theoretically', something the OP was
> wondering about.
Then I probably misunderstand what you and Marshall mean by
'set-theoretically'. My point was just that Dataphor's TCs are (for
update) based on comparing pairs of tuples, and the pairing is based on
the formulation of the 'update' statement. When using relational
assignment (including translating an update shorthand to an assignment)
this pairing is lost; thus, that kind of TC doesn't work. (It also has
other problems.) TTM-style transition constraints, based on comparing
(pairs of) relations, not tuples, avoid those problems. I thought this
was what you meant by the term 'set-theoretic': set-at-a-time vs.
tuple-at-a-time.
--
Jon
I did not see any examples of Dataphor relational assignment in the
online manuals they have on their website. Could you provide a
reference ? Also, if the r.a is indeed implemented as a
delete/insert/update and you specify some constraint for a d/i/u, it's
unclear why such a constraint should be circumvented.
>Which means that their transition constraints can be
> circumvented, if they're not carefully written.
>
> > In any
> > case, they did what they could by implementing row-level declarative
> > transition constraints 'set-theoretically', something the OP was
> > wondering about.
>
> Then I probably misunderstand what you and Marshall mean by
> 'set-theoretically'. My point was just that Dataphor's TCs are (for
> update) based on comparing pairs of tuples, and the pairing is based on
> the formulation of the 'update' statement. When using relational
> assignment (including translating an update shorthand to an assignment)
> this pairing is lost; thus, that kind of TC doesn't work. (It also has
> other problems.)
Could you be more specific about other problems ?
> TTM-style transition constraints, based on comparing
> (pairs of) relations, not tuples, avoid those problems.
For example (Dataphor vs. TTM) ?
> I thought this
> was what you meant by the term 'set-theoretic': set-at-a-time vs.
> tuple-at-a-time.
Are you saying that a predicate "AgePrev - AgeCurrent <= 0" is somehow
less 'set-theoretical' than "sum(account) = 1000" ? At the logical
level, there is no 'at a time' in either case, at the implementation
level there may be (in both cases), but not otherwise.
> --
> Jon
I don't know if they mention it specifically in the documentation
anywhere; maybe not, since it is a somewhat obvious consequence of
relvars being, well, variables. I know they support it because I use it.
> Also, if the r.a is indeed implemented as a
> delete/insert/update and you specify some constraint for a d/i/u, it's
> unclear why such a constraint should be circumvented.
Consider the TC that employee salaries may not decrease:
transition constraint TC1 on update new.Salary >= old.Salary
Now consider the relvar R = table { row { 1 ID, 70000 Salary } }.
"update R set { Salary := 60000 }" will fail because of TC1, but the
corresponding relational assignment, which can be simplified to "R :=
table { row { 1 ID, 60000 Salary } }" will succeed. I cannot see how
tuple-base TCs (or referential actions, for that matter) can work if one
takes update-as-relational-assignment completely seriously.
Including "on delete false" in TC1 might on first glance seem to resolve
the problem, since then the assignment will also fail. But *any*
relational assignment (in Dataphor) will fail in that case (except if
the relvar is empty to begin with), so that's not very useful.
I also note that the delete constraint might be needed even if we avoid
relational assignment, because else we could circumvent the TC by first
deleting the tuple, and then inserting a new corresponding (i.e. same
key) tuple with lower salary. But the point is: Tuple-based TCs and
relational assignment don't work well together.
>> My point was just that Dataphor's TCs are (for
>> update) based on comparing pairs of tuples, and the pairing is based on
>> the formulation of the 'update' statement. When using relational
>> assignment (including translating an update shorthand to an assignment)
>> this pairing is lost; thus, that kind of TC doesn't work. (It also has
>> other problems.)
>
> Could you be more specific about other problems ?
Relvar R with the TC1 update constraint:
ID Salary
==+------
0 60000
1 70000
update R set { ID := (ID + 1) mod 2 };
TC1 isn't violated, yet employee 1's salary has decreased. This may be
why some hold that keys must be constant / tuples must have identity
(speaking loosely).
>> TTM-style transition constraints, based on comparing
>> (pairs of) relations, not tuples, avoid those problems.
>
> For example (Dataphor vs. TTM) ?
A corresponding TTM TC would be something like this, I think:
IS_EMPTY(R RENAME (Salary AS NewSalary) JOIN R' RENAME (Salary AS
OldSalary) WHERE NewSalary < OldSalary)
It will prevent the updates above, including the relational assignment.
It will not preventing deleting an employee and reinserting it with a
lower salary, though. Another constraint is needed for that, but it will
work with both assignment and insert/update/delete.
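In SQL-ish terms, with R_old and R_new standing for the two relation
values (which is not how any particular product exposes them), that
set-level check amounts to:

-- mirrors the IS_EMPTY test: any row returned is a violating pair, and
-- since only Salary was renamed, the natural join pairs tuples on ID
SELECT o.ID, o.Salary AS OldSalary, n.Salary AS NewSalary
FROM   R_old o
JOIN   R_new n ON n.ID = o.ID
WHERE  n.Salary < o.Salary;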
>> I thought this
>> was what you meant by the term 'set-theoretic': set-at-a-time vs.
>> tuple-at-a-time.
>
> Are you saying that a predicate "AgePrev - AgeCurrent <= 0" is somehow
> less 'set-theoretical' than "sum(account) = 1000" ?
You are leaving out the crucial context of these predicates: The point
is how to determine which AgePrev/AgeCurrent pairs to compare. Hopefully
the examples above make it clear what I mean.
--
Jon
I'm not sure that the comment about tuples matters, but anyway, I think
those are excellent examples and points you make.
Regarding the salary increment and going back to algebra, say D&D
algebra since I don't really know much of any others, I wonder whether
delete and insert expressed in terms of assignments aren't in fact
capable of handling the constraint of not lowering anybody's salary.
Not to say that end users must use an algebra, but an algebra is what an
engine is likely to use underneath whatever syntax the users have. So I
assume an engine that was instructed to obey the TC would invoke
something like (assuming the user language allowed transition
constraints):
R' = R <AND> <NOT> {ID 1} <AND> <NOT> {Salary LE 60000} as part of a
'delete' "step". Depending on the implementation, this might raise an
exception or not, just as a subsequent assignment such as
R'' = R' <OR> {ID 1, Salary 60000, ...) might or might not raise an
exception. I know most people would prefer an exception, I'm just
pointing out that the most elemental algebras don't embody exceptions.
Surely a verb like 'update' is just a shortcut if an implementation uses
an algebra such as D&D's which apart from no exceptions, has no operator
for update?
Just my two cents, we can only accomplish what the 'bare metal' allows
(these days D&D's Algebra is as close as I get to bare metal, which I
find a little ironic, since it is not supposed to be about bare metal at
all!)
p
Technically, the Dataphor TC is not a tuple constraint because it
references two tuples, or rather two sets of tuples, not just a
single tuple. You can think in terms of ordered pairs too, but it's a
matter of taste.
My point is that if 'update' is shorthand for the respective relational
assignment, then by the same token the update constraint must be
shorthand for the respective relvar constraint: IS_EMPTY(R RENAME
(Salary AS NewSalary) JOIN R' RENAME (Salary AS
OldSalary) WHERE NewSalary < OldSalary).
If Dataphor silently ignores shorthand constraints with relational
assignment, then the implementation is deficient. Alternatively, one
can treat update constraints as matching old/new rows by some hidden
row_id rather than by the primary key, but such treatment is also no
good since hidden row_id's is not part of the relational model.
Is it the case that the Dataphor relational assignment does not honor
any constraints ? If so, how can one seriously consider using the
assignment?
[...]
> >> My point was just that Dataphor's TCs are (for
> >> update) based on comparing pairs of tuples, and the pairing is based on
> >> the formulation of the 'update' statement. When using relational
> >> assignment (including translating an update shorthand to an assignment)
> >> this pairing is lost; thus, that kind of TC doesn't work. (It also has
> >> other problems.)
If the Dataphor relational assignment does not honor constraints, then
the implementation is seriously broken.
> >
> > Could you be more specific about other problems ?
>
> Relvar R with the TC1 update constraint:
>
> ID Salary
> ==+------
> 0 60000
> 1 70000
>
> update R set { ID := (ID + 1) mod 2 };
>
> TC1 isn't violated, yet employee 1's salary has decreased. This may be
> why some hold that keys must be constant / tuples must have identity
> (speaking loosely).
Right, if the Dataphor TC matches, conceptually, old/new tuples by a
hidden id as it appears to do, then there is no way to enforce TC1
with the above update (except by disallowing the key update).
>
> >> TTM-style transition constraints, based on comparing
> >> (pairs of) relations, not tuples, avoid those problems.
> >
> > For example (Dataphor vs. TTM) ?
>
> A corresponding TTM TC would be something like this, I think:
>
> IS_EMPTY(R RENAME (Salary AS NewSalary) JOIN R' RENAME (Salary AS
> OldSalary) WHERE NewSalary < OldSalary)
>
> It will prevent the updates above, including the relational assignment.
> It will not preventing deleting an employee and reinserting it with a
> lower salary, though. Another constraint is needed for that, but it will
> work with both assignment and insert/update/delete.
>
> >> I thought this
> >> was what you meant by the term 'set-theoretic': set-at-a-time vs.
> >> tuple-at-a-time.
> >
> > Are you saying that a predicate "AgePrev - AgeCurrent <= 0" is somehow
> > less 'set-theoretical' than "sum(account) = 1000" ?
>
> You are leaving out the crucial context of these predicates: The point
> is how to determine which AgePrev/AgeCurrent pairs to compare. Hopefully
> the examples above make it clear what I mean.
Even with the apparently broken Dataphor constraint implementation,
one can still regard its TCs as dealing with sets obtained as a result
of matching old/new tuples by the row id rather than some row-by-row
sequential process (which it may very well be under the covers).
> --
> Jon
I can't recall I have called it a "tuple constraint". I have called it a
"tuple-based transition constraint". I know it is not a tuple constraint.
> My point is that if 'update' is shorthand for the respective relational
> assignment, then by the same token the update constraint must be
> shorthand for the respective relvar constraint: IS_EMPTY(R RENAME
> (Salary AS NewSalary) JOIN R' RENAME (Salary AS
> OldSalary) WHERE NewSalary < OldSalary).
In that case, the shorthand is insufficient, because it doesn't specify
what you should join on. You could say it should join on the key by
default, but what if there's more than one key? What if you want keys to
be able to change?
> If Dataphor silently ignores shorthand constraints with relational
> assignment, then the implementation is deficient.
Transition constraints aren't shorthand in Dataphor; they don't have
relvar-level TCs. Whether or not the implementation is deficient is a
matter of viewpoint; if they define relational assignment as delete
followed by insert, they honour the TCs just fine. Just not as the user
might expect... so I am inclined to say that using relational assignment
in Dataphor is not very practical, and their TCs a bit of a hack.
They're still useful, though.
> Alternatively, one
> can treat update constraints as matching old/new rows by some hidden
> row_id rather than by the primary key,
There are no primary keys in Dataphor.
> but such treatment is also no
> good since hidden row_id's is not part of the relational model.
Agreed.
> Is it the case that the Dataphor relational assignment does not honor
> any constraints ?
Of course not. It's just TCs that are an issue, and the way they do it
does make sense.
> If so, how can one seriously consider using the
> assignment?
I just use it to fill empty tables in a somewhat novel way. It isn't
really needed for anything.
--
Jon
> vc wrote:
>
>>Jon Heggland wrote:
>>
[snip]
>>My point is that if 'update' is shorthand for the respective relational
>>assignment, then by the same token the update constraint must be
>>shorthand for the respective relvar constraint: IS_EMPTY(R RENAME
>>(Salary AS NewSalary) JOIN R' RENAME (Salary AS
>>OldSalary) WHERE NewSalary < OldSalary).
>
> In that case, the shorthand is insufficient, because it doesn't specify
> what you should join on. You could say it should join on the key by
> default, but what if there's more than one key? What if you want keys to
> be able to change?
Natural join joins on all attributes with the same name. Given the
rename operations specified, that would be every attribute except
Salary. If one wanted to limit the join, one could rename additional
attributes or project the relations onto the attributes of interest.
Right. I forgot ;)
>
> > but such treatment is also no
> > good since hidden row_id's is not part of the relational model.
>
> Agreed.
>
> > Is it the case that the Dataphor relational assignment does not honor
> > any constraints ?
>
> Of course not. It's just TCs that are an issue, and the way they do it
> does make sense.
[...]
I understand that in Dataphor it's impossible, syntactically, to refer
to the old relvar value for the purpose of defining a 'real' TC
constraint, right ? Interestingly, it would have been trivial to
implement such a feature efficiently in a multiversioning database, but
apparently they did not want to be implementation dependent, although
they could have used less efficient ways to do so in other databases.
> Jon
Not a bad idea, but you can't really apply it to Dataphor's TCs without
changing the syntax quite a bit by involving explicit renames and/or
projections. The resulting shorthand may not be all that short.
--
Jon
I think that's my whole point. UPDATE cannot be a shorthand for relational
assignment unless keys cannot change. If keys can change, then tuples from
successive database states cannot be paired for comparison by joining R to
R'. So we're stuck with either implementation-specific row-level transition
constraints such as FOR EACH ROW triggers or Dataphor's implementation, or
introducing tuple identifiers or sets of object identifiers to guarantee
that at least one key won't change. A better solution is needed. Instead
of exposing only the proposed state, that is, R', the transition itself
could also be exposed as a set of relations, Rd, Ru, Ri, representing
respectively the tuples that will be removed, the tuples that will be
different and how, and the tuples that are being added. For example, given
the relation schema R {A, B, C}, a constraint could be defined declaratively
not in terms of just R and R', but in terms of R and relations conforming to
the following schemata:
Rd: {A, B, C}
Ru: {A, B, C, A', B', C'}
Ri: {A', B', C'}
For that matter, given an additional relation schema, S {A, D}, a constraint
could be defined in terms of R, S, the above three relation schemata, and
the following:
Sd: {A, D}
Su: {A, D, A', D'}
Si: {A', D'}
With this scheme, enforcible set-based transition constraints can be defined
declaratively, involving multiple relation schemata, using standard
operators from the algebra or the calculus, in the presence of volatile
keys, without using object identifiers.
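As a toy illustration only (the rules are invented, and the primed
attributes are written A2, B2, C2 since primes aren't legal SQL names),
constraints over those delta relations could look like:

-- reject the transition if any updated tuple lowers C
SELECT 1
FROM   Ru                  -- Ru {A, B, C, A2, B2, C2}
WHERE  C2 < C;
-- and, spanning two schemata: reject it if a tuple added to S has no
-- matching tuple added to R
SELECT 1
FROM   Si
WHERE  NOT EXISTS (SELECT 1 FROM Ri WHERE Ri.A2 = Si.A2);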