1. the fact that values for an attribute either were or could have been
derived?
2. how values for an attribute were derived?
3. how values for an attribute could have been derived?
For example, if a system is to be written that accepts US zip codes and
populates city and state based on the zip, storing all three values,
must the code for the derivation and the fact that this is derived data
be known only through the code?
Similarly, if data could have been derived, but was not, is there any
way to specify that? For example, if the zip+4 information could be
derived from the rest of the address, but we don't want to require that
it be derived to the DBMS -- apps could collect it directly from a user
if that meets the requirements for that app -- could we identify that
the zip+4 can be derived using this or that service or this or that
code?
If there are materialized attributes, such as a student GPA, where the
data should never be collected by any application and should only be
derived, is there a way to specify or even ensure that it is derived
(then materialized) data? Is there any notation that works with
derived, but stored, data any differently than any other attributes?
I also have not seen anything in conceptual modeling techniques,
including ORM (I'm not an expert on that), to collect information about
what is or can be derived from what in the conceptual model. Does
anyone have suggestions in that area? Thanks. --dawn
This is slightly off topic, but city and state are not always determined by
knowing zip code. I live in a zip code that spans two towns. On the way
through to Maine, going north from Errol, there is a zip code that spans
two states.
> "dawn" <dawnwo...@gmail.com> wrote in message
> news:1145622076.9...@t31g2000cwb.googlegroups.com...
>
>>Is there database theory that includes identification of
>>
>>1. the fact that values for an attribute either were or could have been
>>derived?
>>2. how values for an attribute were derived?
>>3. how values for an attribute could have been derived?
What absurd questions!
The set of every attribute that could have been derived is the set of
every attribute. For every inverse function fprime(x), x could have been
derived by f(fprime(x)) from fprime(x). Duh!
The whole point of logical independence, which is itself a very
important principle of sound data management, is to stop forcing users
to ask stupid questions like #1 and #2 above.
Do I remember correctly that this idiot claims to have some background
in mathematics? Unbelievable!
Yes, I tried to be careful with the wording to indicate that the
organization determined that the software would work this way. In
reality, it is more likely that entering a zip code would bring up a
best guess for the user to override. I guessed that it was less likely
that anyone had used anything at a database level (rather than app
level) to indicate that something was "mostly-derivable." So, I
simplified the example. You could put names of a, b, c... for the ones
I used to make it a more abstract question.
I'm looking at that fuzzy line where code and data meet. Given that I
have seen (or at least noticed) more derived and derivable data of
late, I thought perhaps there were ways to at least do the conceptual
modeling, but perhaps even the implementation of such attributes
differently so as to reflect this information about the attributes,
perhaps supplying appropriate derivation code.
Thanks. --dawn
Stalkers and harassers often do know everything about their prey. I do
hope you get another hobby soon.
Ah, you know Bob has a hobby. Therefore, ...
No, I can not bear to write such illogic. Dawn is not so
limited.
Sincerely,
Gene Wirchenko
Yeah, I stalk her by adding her to the twit filter and encouraging
everyone else to follow suit. Somehow, that doesn't seem to follow the
usual definition.
Quote "I stalk her". Propaganda?
Sarcasm.
you talk about Dawn more than you talk about database theory.
Marshall
Please explain.
I only reply to what people are discussing. If they are engaging
self-aggrandizing ignorants and ignoring glaring obvious idiocy to reply
to things of secondary importance, I reply to that. If they are
discussing database theory, I reply to that.
Is it my fault you allowed the newsgroup to be overrun by
self-aggrandizing ignorants who fill it to overflowing with utter
nonsense? I think not.
That, for what is is, is not /your/ explanation in this context.
You utterly fail at explaining.
Back to the question and helping you by focussing:
Are you stalking her or not?
No, I am not.
Ok.
I'll take that as a promise.
Bob is the guy who behaves right. All we should protest against Dawn's
unacceptable behavior.
Regards
Alfredo
Could you please provide more content? These are questions I have. I
cannot see what is stupid about them, so I would appreciate some
clearer instruction. Did I phrase them in a way where they do not make
sense to you? Am I using vocabularly that is not aligned with yours?
My questions might not be relational theory questions, but they are
related to database theory. If you guys who persist in beating me up
would please provide some meat in your responses, rather than just the
"you and your postings are stupid" responses, perhaps I could learn
something from you and your responses would be more enlightening for
others as well. It is not a productive dialog for one person to ask a
question or make a statement and another to say "stupid" repeatedly.
You can ignore my postings if they do not meet your standards,
otherwise please provide constructive criticisms.
Thanks in advance for clarifying. --dawn
By this I mean derived data, vritual fields, computed columns...
derived from other stored data, but then stored (materialized). Is
there any identification (flagging) of such columns in any theory, any
conceptual models, any logical models?
> 2. how values for an attribute were derived?
If this is a UDF in SQL Server (which I have not used, so I'm
guessing), then we have the code for the derived data, but if we
materialize the values then how is the derivation function associated
with this new column? I'm not so concerned with how that is done in
practice right now, but how that is reflected in any modeling.
> 3. how values for an attribute could have been derived?
>
> For example, if a system is to be written that accepts US zip codes and
> populates city and state based on the zip, storing all three values,
> must the code for the derivation and the fact that this is derived data
> be known only through the code?
I see I missed the word "application" before the last "code."
> Similarly, if data could have been derived, but was not, is there any
> way to specify that? For example, if the zip+4 information could be
> derived from the rest of the address, but we don't want to require that
> it be derived to the DBMS -- apps could collect it directly from a user
> if that meets the requirements for that app -- could we identify that
> the zip+4 can be derived using this or that service or this or that
> code?
>
> If there are materialized attributes, such as a student GPA, where the
> data should never be collected by any application and should only be
> derived, is there a way to specify or even ensure that it is derived
> (then materialized) data? Is there any notation that works with
> derived, but stored, data any differently than any other attributes?
I am referring to any notation anywhere, perhaps in a conceptual or
logical data model, for example. While there are terms to distinguish
between a base table and a materialized view, for example, I have not
found the right terms to distingish a derived materialized attribute
from on that is not derived from values in the database. I would like
to write about such attributes and am not finding terminology or
notation to do so.
> I also have not seen anything in conceptual modeling techniques,
> including ORM (I'm not an expert on that), to collect information about
> what is or can be derived from what in the conceptual model. Does
> anyone have suggestions in that area? Thanks. --dawn
I hope this helps to clarify the question. Thanks. --dawn
> I am referring to any notation anywhere, perhaps in a conceptual or
> logical data model, for example. While there are terms to distinguish
> between a base table and a materialized view, for example, I have not
> found the right terms to distingish a derived materialized attribute
> from on that is not derived from values in the database. I would like
> to write about such attributes and am not finding terminology or
> notation to do so.
I think you talk about using expressions like values.
A cross of functional calculus and predicate calculus or predicate calculus
with equality.
Actually, the word the ignorant is looking for is "snapshot". A snapshot
is a stored, derived relation. If the fucking kook ever opened a book,
she would know that.
I'm talking about taking a single derived value or an entire column of
derived values and adding those values into a stored column. I am not
talking about taking a snapshot of an entire table or view. For
example, if we have a base relation
Students {Name, Major}
and we have a derived attribute (stored procedures, udf...) for the
GPA, then we add a column to our base relation to get
Student {Name, Major, GPA}
We then either need the DBMS (as in the case of Sybase, it appears), a
trigger, or some other process to keep the GPA in synch with the stored
data from which it is derived. The purpose of this would be
performance in read-only situations.
Did that clarify? Thanks. --dawn
That is usuallly taken care of by specifying the database constraints
that define the relationships between the derived column and the other
data, and then marking it as either derived or not.
This is rather trivial. So I'm wondering if I understood your question
correctly.
-- Jan Hidders
It is very possible that I'm missing something basic. I know how to
tie derived data specifications to the stored data from which they are
derived. Now let's use that derived data to determine a value and
store it back in an existing base table. Is there a way for the DBMS
to retain the information about the source of this materialized
attribute?
Or said differently, is there a way, other than app or trigger code, to
tie the materialized version of an attribute to the derived attribute
from which the value was taken?
David pointed me to Sybase as a DBMS that seems to have the feature of
storing materialized attributes, otherwise I'm figuring with most
DBMS's it is an app or trigger where the code resides to store the
data, but I might be completely missing something as my direct
experience with sql constraint specification is limited. Given that
the RM has as a goal not to have duplicate data and this is clearly a
duplication of data, I was thinking this might be one place where the
application would have the knowledge, rather than the DBMS.
Additionally, perhaps you can specify a constraint that says that a
particular attribute can only be maintained by a specified process, but
my googling came up short. I likely do not know the correct
terminology.
Thanks. --dawn
What do you mean by 'the source' and what information do you want to
keep about it? Apparently not the definition of the derived column. Are
you talking about data provenance here?
> Or said differently, is there a way, other than app or trigger code, to
> tie the materialized version of an attribute to the derived attribute
> from which the value was taken?
?? The derived attribute *is* the materialized version.
> David pointed me to Sybase as a DBMS that seems to have the feature of
> storing materialized attributes, otherwise I'm figuring with most
> DBMS's it is an app or trigger where the code resides to store the
> data, but I might be completely missing something as my direct
> experience with sql constraint specification is limited.
One simply specifies the query that derives the value of the column,
or in the case of a materlized view the whole table. That is the
definition of the derived column. What more do you want?
> Given that
> the RM has as a goal not to have duplicate data and this is clearly a
> duplication of data, I was thinking this might be one place where the
> application would have the knowledge, rather than the DBMS.
Another goal of the RM is to shield the data from sloppy applications,
so if duplication cannot be avoided then at least the DBMS should be in
charge of it and not the applications.
-- Jan Hidders
I'm going to skip attempting to translate most of the terminology I
typically use and give it another go.
Example: I have a file STUDENT_COURSES that includes grades and a field
named COURSE_GRADE. In a STUDENT file, I have a virtual field named
CURRENT_GPA and backing it is code (stored procedure-like) that blows
through all courses for a student and computes the GPA. So, this
CURRENT_GPA field is a virtual field, derived data, computed column,
user-defined function or whatever you want to call it.
Additionally, I have decided to materialize this value by firing off a
process whenever any of the underlying stored data are changed so that
the value of CURRENT_GPA at that point in time is stored in the STUDENT
file as something like M_GPA. There could then be a slight lag where
CURRENT_GPA and M_GPA are not in synch. Users will now want to query
the M_GPA data because it is faster.
> > Or said differently, is there a way, other than app or trigger code, to
> > tie the materialized version of an attribute to the derived attribute
> > from which the value was taken?
>
> ?? The derived attribute *is* the materialized version.
>
> > David pointed me to Sybase as a DBMS that seems to have the feature of
> > storing materialized attributes, otherwise I'm figuring with most
> > DBMS's it is an app or trigger where the code resides to store the
> > data, but I might be completely missing something as my direct
> > experience with sql constraint specification is limited.
>
> One simply specifies the query that derives the value of the column,
> or in the case of a materlized view the whole table. That is the
> definition of the derived column. What more do you want?
In my neck of the woods, that specification would be on the derived
field (CURRENT_GPA), but there would be no information on the
materialized version (M_GPA) that specifies that it is a snapshot of
CURRENT_GPA. So, there is nothing in the dbms that tells us things
like the fact that no apps should update M_GPA because there is a
process that derives and stores it and that is the only process that
should do that. Although it makes sense in my implementation that
there would be no constraint in the dbms (since there are few
constraints specified to the dbms), it sure would be nice to have
information that ties the materialized attribute to the derived
attribute.
> > Given that
> > the RM has as a goal not to have duplicate data and this is clearly a
> > duplication of data, I was thinking this might be one place where the
> > application would have the knowledge, rather than the DBMS.
>
> Another goal of the RM is to shield the data from sloppy applications,
> so if duplication cannot be avoided then at least the DBMS should be in
> charge of it and not the applications.
Yes, that is my thinking. Any materialized attributes should be able
to be specified as such. I might take a look at Sybase, but was
looking for something less implementation-specific on how materialized
attributes should be specified, identified in notation, etc. within a
base table. Are there constraints that can be specified to indicate
that a particular attribute "comes from" a derived attribute (or stored
procedure)?
Sorry for my awkward terminology. Thanks. --dawn
Jan, how could you let such astonishing ignorance stand unchallenged?
The RM says nothing about the physical duplication of data and actively
encourages multiple logical views on the same data.
> Another goal of the RM is to shield the data from sloppy applications,
> so if duplication cannot be avoided then at least the DBMS should be in
> charge of it and not the applications.
Absolutely, I agree. Similarly for pointers. If one must have pointers
to achieve adequate physical performance, the dbms should manage them to
ensure correctness and to ease the burden on others.
Dawn, hello! Long time no see.
I saw your post and decided it was time to end my long exile, you have
asked the questions that are at the heart of my company and project.
In order to even begin to deal with this question, you have to deal with
the 800lb gorilla called "Normalization", which forbids derived data. I
wrote an essay on that some months ago, which is here:
http://docs.secdat.com/index.php?gppn=Normalization+and+Automation
The states basically that derived data is part of life, and since
normalization can't deal with it, normalization is not enough to guide
serious application development. But you also can't throw it away,
because it gives good things.
Yikes! Another self-aggrandizing ignorant! Plonk.
As for the theory, I have never seen one, so I made up my own. It is
described here:
http://docs.secdat.com/index.php?gppn=Review+of+Automations
There are no proofs, just desriptions. This means it is not a theory
but only the suggestion of a theory. But the code is downloadable and
it runs :)
>
> For example, if a system is to be written that accepts US zip codes and
> populates city and state based on the zip, storing all three values,
> must the code for the derivation and the fact that this is derived data
> be known only through the code?
My system would specify that this way:
table zips {
column zip9 { primary_key: Y; }
column { city; }
foreign_key { states; }
}
table anytable {
foreign_key { zips; }
column city {
automation_id: FETCH;
auto_formula: zips.city;
}
column state {
automation_id: FETCH;
auto_formula: zips.state;
}
}
In the Andromeda implementation any attempt to write to anytable.city or
anytable.state would give the error "Direct writes not allowed to
derived column -city-" (or state as the case may be).
If you wanted your program to know which columns were safe to write to,
you look in the data dictionary. The Andromeda UI uses this to make
columns read-only so things are clear to the user.
>
> If there are materialized attributes, such as a student GPA, where the
> data should never be collected by any application and should only be
> derived, is there a way to specify or even ensure that it is derived
> (then materialized) data? Is there any notation that works with
> derived, but stored, data any differently than any other attributes?
In Andromeda:
table students {
column student_id { primary_key: Y; }
column credits_taken {
automation_id: SUM;
auto_formula: schedule.credits_taken;
}
column credits_earned {
automation_id: SUM;
auto_formula: schedule.credits_earned;
}
column cum_gpa {
chain calc {
test { compare: @credits_earned = 0; return: 0; }
test { return: @credits_earned / @credits_taken; }
}
}
}
table schedule {
foreign_key students { primary_key: Y; }
foreign_key courses { primary_key: Y; }
column credits_taken {
automation_id: FETCH;
auto_formula: courses.credits;
}
column credits_earned { description: Numerical Grade; }
}
>
> I also have not seen anything in conceptual modeling techniques,
> including ORM (I'm not an expert on that), to collect information about
> what is or can be derived from what in the conceptual model. Does
> anyone have suggestions in that area? Thanks. --dawn
>
My own experience is that it is best to leave the design of the tables,
including the derivations, to human beings, since the operations
requires judgement. Once you've got the designs, feed them to the
builder to build/upgrade your database. Concentrate on automating the
deterministic stuff, but don't bother trying to automate the stuff that
requires judgement.
As for ORM, data and code are fundamentally different, trying to match
your data tier to code tier by smashing one into the shape of the other
smells bad on general principles. I have no reason to pursue it, I've
found dictionary-based library techniques produce far more code in less
time that accomplishes far more.
Ken,
I read that essay, then I read the other one:
http://docs.secdat.com/index.php?gppn=Data+is+More+Valuable+Than+Code
It looks to me as if the first sentence of paragraph two has a dramatic
typo. (If by some chance it's not a typo, it needs a much better
explanation :-)
Marshall
!!!
Thanks Marshall. The explanation has something to do with an 18 hour
day and too much coffee.
>
>
> Marshall
>
I would be careful to avoid the idea that the procedure is processing
through the table row-by-row. The calculated value of the CURRENT_GPA is
intrinsic and should always be correct. Whenever any row in any table that
"feeds" this value changes, the changes must propagate all the way through
within that transaction, and must be wholly visible everywhere when the
transaction commits.
>
> In my neck of the woods, that specification would be on the derived
> field (CURRENT_GPA), but there would be no information on the
> materialized version (M_GPA) that specifies that it is a snapshot of
> CURRENT_GPA. So, there is nothing in the dbms that tells us things
> like the fact that no apps should update M_GPA because there is a
> process that derives and stores it and that is the only process that
> should do that. Although it makes sense in my implementation that
> there would be no constraint in the dbms (since there are few
> constraints specified to the dbms), it sure would be nice to have
> information that ties the materialized attribute to the derived
> attribute.
You want to avoid a duality between two values, one materialized, one not.
Materialize it and be done with it, otherwise you will be fielding calls
from people getting incorrect results.
--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)
Welcome back. Seems like you were out for the academic calendar.
> I saw your post and decided it was time to end my long exile, you have
> asked the questions that are at the heart of my company and project.
>
> In order to even begin to deal with this question, you have to deal with
> the 800lb gorilla called "Normalization", which forbids derived data.
Interesting perspective. I think of SQL Views as permitting derived
data, including a means to include an attribute from a function that
accesses a stored procedure, for example.
> I
> wrote an essay on that some months ago, which is here:
>
> http://docs.secdat.com/index.php?gppn=Normalization+and+Automation
>
> The states basically that derived data is part of life, and since
> normalization can't deal with it, normalization is not enough to guide
> serious application development. But you also can't throw it away,
> because it gives good things.
Great stuff! I like the name you (or others?) gave this concept.
There is a distinction between derived columns/attributes and what you
called automated columns. Very good. That is what I'm asking about --
stored derived data, which I will now call "automated" (even if I call
them attributes instead of columns) instead of "stored derived" based
on
http://docs.secdat.com/index.php?gppn=Review+of+Automations
I like most of what you wrote in your normalization + automation
column, with only minor quibbles about the normalization issue. I,
too, believe that we should treat our metadata, including business
rules, as data. Those writing database-independent software and those
writing IDEs typically do this, each reinventing the wheel, of course.
Obviously our theory needs to include derived data, but my read on the
RM is that it does, just not "in base tables." So, it separates
"non-derived" and derived along set lines, rather than along attribute
lines. That is a mistake, in my opinion.
Anecdote: In an environment where derived data can be added to a
relation, a SQL-trained data modeler wanted to normalize the derived
data into the mix, at which point I had a little go-round on that with
the upshot that now the derived data are not related to the key, the
whole key, and nothing but the key. There is no need to normalize
derived data. I think that might be one of your points here too.
I do know that asking some relational modeling folks how to handle
atomated columns (yup, that term works great!) puts them way outside
their comfort zone where there should never be any redundancy in data.
Of course it is that kind of thinking that prompted people to take the
whole bloomin' database and port it to another one, reshaping it,
deriving stuff, etc, for reporting purposes. Talk about redundancy!
Cheers! --dawn
Jan, do you see the sort of shit that results when you encourage the
willful ignorants?
What's worse this latest ignorant uses multiple email addresses, which
just makes more work for filtering the idiot.
> Jan, do you see the sort of shit that results when you encourage the
> willful ignorants?
>
> What's worse this latest ignorant uses multiple email addresses, which
> just makes more work for filtering the idiot.
The repeated, purposeful use of logical fallacies in your postings
(e.g. see http://en.wikipedia.org/wiki/Logical_fallacy ) is
troublesome, coming from someone I am guessing is capable of a logical
argument. Perhaps someday you will take the time to lay one out. Is
there something logically wrong about derived data or automated data or
my questions about them?
I sometimes miss in being entirely logical, as most do, but I attempt
to correct my errors once I understand them. I'm guessing I'm not the
only one who would appreciate that approach from you as well.
Of course, you will not see this unless someone quotes it since you
filter me out, thankfully.
--dawn
I didn't catch the typo, but the whole first paragraph seems like
wishful thinking. Lots of people know they have crappy data which was
derived by crappy code.
p
> I didn't catch the typo, but the whole first paragraph seems like
> wishful thinking. Lots of people know they have crappy data which was
> derived by crappy code.
>
Yes, but in a lot of cases crappy code can be pulled out and replaced,
while crappy data is a permanent disaster.
I didn't catch the typo either. Could it have been fixed by now?
>
> Interesting perspective. I think of SQL Views as permitting derived
> data, including a means to include an attribute from a function that
> accesses a stored procedure, for example.
If there is anyplace where we could nail down the precise formulation of how
views support derived data, then c.d.t. is the place.
In my book, you begin with the ability to specify derived data, that is the
"theory" part of it. Views can be used to implement some derived data, but
not all. They have two drawbacks in practice. First, they become unwieldy
as they get complex. Second, they cannot create new rows.
Another drawback is more of a balance question. Materializing derivations
within the transaction lengthens the transaction, takes more space, and
increases in some cases the chances of contention. But then the reads are
fast. With a view, the transactions are kept shorter, at the price of an
ever-increasing read cost as the complexity increases.
Finally, SQL itself is a piss-poor language in which to store the
authoritative fundamental definition of derived data, so the view statement
itself can't be your method of definition. You need a better language in
which to define the variations, and then you decide whether to use a view
as an implementation method.
>
> I like most of what you wrote in your normalization + automation
> column, with only minor quibbles about the normalization issue. I,
> too, believe that we should treat our metadata, including business
> rules, as data.
I wish I were more persuasive on this point. It seems so self-evident to me
that I often find myself at a loss to explain it.
> Those writing database-independent software and those
> writing IDEs typically do this, each reinventing the wheel, of course.
> Obviously our theory needs to include derived data, but my read on the
> RM is that it does, just not "in base tables." So, it separates
> "non-derived" and derived along set lines, rather than along attribute
> lines. That is a mistake, in my opinion.
If I understand what you are saying, you want to see the derived data become
attributes of the key, or, columns. Me too. In super-plain English, a
derived value is just another column in the table.
>
> Anecdote: In an environment where derived data can be added to a
> relation, a SQL-trained data modeler wanted to normalize the derived
> data into the mix, at which point I had a little go-round on that with
> the upshot that now the derived data are not related to the key, the
> whole key, and nothing but the key. There is no need to normalize
> derived data. I think that might be one of your points here too.
I don't understand what each of you were proposing. How did he want to
normalize the derived data?
>
> I do know that asking some relational modeling folks how to handle
> atomated columns (yup, that term works great!) puts them way outside
> their comfort zone where there should never be any redundancy in data.
yeah, they have forgotten what normalization is for.
> Of course it is that kind of thinking that prompted people to take the
> whole bloomin' database and port it to another one, reshaping it,
> deriving stuff, etc, for reporting purposes. Talk about redundancy!
Ha! LOL!
>>
>>
>> Marshall
>>
>
> I didn't catch the typo, but the whole first paragraph seems like
> wishful thinking. Lots of people know they have crappy data which was
> derived by crappy code.
I fixed the typo early this morning after seeing Marshall's post.
But those poor folks with the crappy data will pay you plenty to clean it
up, there are whole industries that do this. The code will sooner or later
be thrown away.
>
> p
yeah, i had a feeling that the crappy idea of cleaning crappy data would
crop up. not many in IT have the courage or scruples to tell the poor
folks to just throw it all out.
p
Not many have the skills to clean it up either.
Nice chattin' with ya
I doubt that. There is no shortage of snake oil salesmen.
p
love the optimism :-)
>
> In my book, you begin with the ability to specify derived data, that is the
> "theory" part of it. Views can be used to implement some derived data, but
> not all. They have two drawbacks in practice. First, they become unwieldy
> as they get complex.
agreed
> Second, they cannot create new rows.
Hmm. I guess I do think about derived data as columns rather than
rows. Other than aggregate rows, I'm having trouble coming up with
derived rows -- can you give an example?
> Another drawback is more of a balance question. Materializing derivations
> within the transaction lengthens the transaction,
Yes.
> takes more space,
a whole lot less than if you port a lot of your data elsewhere for
reporting
> and
> increases in some cases the chances of contention. But then the reads are
> fast. With a view, the transactions are kept shorter, at the price of an
> ever-increasing read cost as the complexity increases.
>
> Finally, SQL itself is a piss-poor language in which to store the
> authoritative fundamental definition of derived data,
I'm sure that's true. I did work on a project to define the language
used to specify derived data before, but only at the very start where
we could argue the approaches. Too much fun.
> so the view statement
> itself can't be your method of definition. You need a better language in
> which to define the variations, and then you decide whether to use a view
> as an implementation method.
Of course I don't use SQL views to implement anythiing (when not using
a SQL DBMS)
> >
> > I like most of what you wrote in your normalization + automation
> > column, with only minor quibbles about the normalization issue. I,
> > too, believe that we should treat our metadata, including business
> > rules, as data.
>
> I wish I were more persuasive on this point. It seems so self-evident to me
> that I often find myself at a loss to explain it.
>
> > Those writing database-independent software and those
> > writing IDEs typically do this, each reinventing the wheel, of course.
> > Obviously our theory needs to include derived data, but my read on the
> > RM is that it does, just not "in base tables." So, it separates
> > "non-derived" and derived along set lines, rather than along attribute
> > lines. That is a mistake, in my opinion.
>
> If I understand what you are saying, you want to see the derived data become
> attributes of the key, or, columns. Me too. In super-plain English, a
> derived value is just another column in the table.
Yes. That is precisely how they are implemented in a database
environment I'm trying not to mention.
> > Anecdote: In an environment where derived data can be added to a
> > relation, a SQL-trained data modeler wanted to normalize the derived
> > data into the mix, at which point I had a little go-round on that with
> > the upshot that now the derived data are not related to the key, the
> > whole key, and nothing but the key. There is no need to normalize
> > derived data. I think that might be one of your points here too.
>
> I don't understand what each of you were proposing. How did he want to
> normalize the derived data?
The specific case I recall is a "file" with a composite key, each of
which being a dimension that one might use for slicing and dicing. The
file didn't have much base data in this file that would be a
relationship file in an ERD among the three parts of the key. I wanted
to put derived data related to just one part of the key (each part) in
the file. This turned the file into a virtual cube. The DBA said it
was not normalized with virtual fields that were not related to the
whole key. (He was a she, by the way)
> > I do know that asking some relational modeling folks how to handle
> > atomated columns (yup, that term works great!) puts them way outside
> > their comfort zone where there should never be any redundancy in data.
>
> yeah, they have forgotten what normalization is for.
>
> > Of course it is that kind of thinking that prompted people to take the
> > whole bloomin' database and port it to another one, reshaping it,
> > deriving stuff, etc, for reporting purposes. Talk about redundancy!
>
> Ha! LOL!
Good to have you back. Cheers! --dawn
> Another drawback is more of a balance question. Materializing derivations
> within the transaction lengthens the transaction, takes more space, and
> increases in some cases the chances of contention. But then the reads are
> fast. With a view, the transactions are kept shorter, at the price of an
> ever-increasing read cost as the complexity increases.
>
Hi Ken. Long Time no See. Welcome back.
It's going to take me a long time to go through the material you've pointed
us to.
In the meantime, I'd like to understand your main point a little better.
Oracle has a tool called a "Snapshot"
that was new to me on one of my projects. When the DBA described snapshots,
my reaction was that it sounded like a materialized view. Her response was
that that is exactly what it is.
In retrospect, I think my response was an oversimplification. Some
transforms can be carried out in a snapshot that can't, to my knowledge, be
implemented in a view. I've got two questions for you. First, is what I've
just said true or not? Second, if true, is it relevant to the point you
made in your earlier post?
> Finally, SQL itself is a piss-poor language in which to store the
> authoritative fundamental definition of derived data, so the view
statement
> itself can't be your method of definition. You need a better language in
> which to define the variations, and then you decide whether to use a view
> as an implementation method.
>
Well, I'm an eighty-twenty kind of guy myself. My attitude toward good
tools with some defects is anathema in c.d.t. I've done some good things
with SQL, and I'm not about to apologize to anyone in this group for doing
that.
At the same time, it's worthwhile knowing what the features of a better
language than SQL would be.
So, Ken, what makes SQL a piss-poor language?
> >
> > I like most of what you wrote in your normalization + automation
> > column, with only minor quibbles about the normalization issue. I,
> > too, believe that we should treat our metadata, including business
> > rules, as data.
>
> I wish I were more persuasive on this point. It seems so self-evident to
me
> that I often find myself at a loss to explain it.
>
I understand your frustration. You could shorten your argument down to the
level of a slogan, like "theory IS practical".
It might come out somthing like this: "software IS data, but it's just not
expressed in the most usable form". Trouble is, to people who just want to
run it, and not manage it, software IS expressed in the most usable form.
I think that's where the disconnect is between you and those you wish you
could persuade.
>
> agreed
>
>> Second, they cannot create new rows.
>
> Hmm. I guess I do think about derived data as columns rather than
> rows. Other than aggregate rows, I'm having trouble coming up with
> derived rows -- can you give an example?
You know its funny you say this. To make a long story short, the only
automated creation of rows I have ever used is the materialization of
aggregates.
This is done in Andromeda by what I call the auto-insert feature of a
foreign key. When a row is inserted into a child table that would fail RI
validation to parent table X, it actually inserts a row into table X to
satisfy the RI.
I say this is funny because this totally surprised me. I had assumed for no
good reason that I would need to arbitrarily insert rows into any Table X
based on any event in Table Y. But as I was working through real-world
cases I discovered that if there was no key between X and Y, it was
fiendishly difficult to verify the data was cascading correctly. So I
began in practice to cascade only between related tables. Of course I
realized eventually that the auto-insert foreign key could be used in all
cases, and then it dawned on me that this was no coincidence. A definition
of an automated write from one table to another could not make sense unless
the data could be demonstrated correct after the fact, and the way we
relate rows between tables is called a foreign key.
>
>> Another drawback is more of a balance question. Materializing
>> derivations within the transaction lengthens the transaction,
>
> Yes.
>
>> takes more space,
>
> a whole lot less than if you port a lot of your data elsewhere for
> reporting
>
Ha! this is the second time you've mentioned this, the point is well taken.
>> and
>> increases in some cases the chances of contention. But then the reads
>> are
>> fast. With a view, the transactions are kept shorter, at the price of an
>> ever-increasing read cost as the complexity increases.
>>
>> Finally, SQL itself is a piss-poor language in which to store the
>> authoritative fundamental definition of derived data,
>
> I'm sure that's true. I did work on a project to define the language
> used to specify derived data before, but only at the very start where
> we could argue the approaches. Too much fun.
I don't know if you saw my second post to your original question, but I have
a language defined and it works well. My largest project is about 250
heavily automated tables described in a data file about 9000 lines long.
Success was elusive until I followed this track:
1) describe informally what we need to know about tables, including
automation. I need to know types, keys, formulas....
2) design the tables to record what we need to know about tables, IOW,
define the data dictionary
3) come up with a file format that can be easily typed and read, and can
also be easily parsed and loaded to tables for processing.
From a theoretical point it is important to stress that statements in a data
dictionary language are propositions, which, by Codd, makes them data. The
path to getting a working system is to concentrate first on what data you
are trying to record, then coming up with a syntax for it.
>
>> so the view statement
>> itself can't be your method of definition. You need a better language in
>> which to define the variations, and then you decide whether to use a view
>> as an implementation method.
>
> Of course I don't use SQL views to implement anythiing (when not using
> a SQL DBMS)
LOL
>
>> > Anecdote: In an environment where derived data can be added to a
>> > relation, a SQL-trained data modeler wanted to normalize the derived
>> > data into the mix, at which point I had a little go-round on that with
>> > the upshot that now the derived data are not related to the key, the
>> > whole key, and nothing but the key. There is no need to normalize
>> > derived data. I think that might be one of your points here too.
>>
>> I don't understand what each of you were proposing. How did he want to
>> normalize the derived data?
>
> The specific case I recall is a "file" with a composite key, each of
> which being a dimension that one might use for slicing and dicing. The
> file didn't have much base data in this file that would be a
> relationship file in an ERD among the three parts of the key. I wanted
> to put derived data related to just one part of the key (each part) in
> the file. This turned the file into a virtual cube. The DBA said it
> was not normalized with virtual fields that were not related to the
> whole key. (He was a she, by the way)
Yeah, a shame. She was correct, but wrong.
>
> "Kenneth Downs" <knode.wa...@see.sigblock> wrote in message
> news:ue2ti3-...@pluto.downsfam.net...
>
>> Another drawback is more of a balance question. Materializing
>> derivations within the transaction lengthens the transaction, takes more
>> space, and
>> increases in some cases the chances of contention. But then the reads
>> are
>> fast. With a view, the transactions are kept shorter, at the price of an
>> ever-increasing read cost as the complexity increases.
>>
>
> Hi Ken. Long Time no See. Welcome back.
Thanks.
>
> It's going to take me a long time to go through the material you've
> pointed us to.
>
> In the meantime, I'd like to understand your main point a little better.
> Oracle has a tool called a "Snapshot"
> that was new to me on one of my projects. When the DBA described
> snapshots,
> my reaction was that it sounded like a materialized view. Her response
> was that that is exactly what it is.
I googled variations of "oracle snapshot" and seem to be getting features
that look like backups, which i don't think is what you mean. Can you tell
me more?
>
> In retrospect, I think my response was an oversimplification. Some
> transforms can be carried out in a snapshot that can't, to my knowledge,
> be
> implemented in a view. I've got two questions for you. First, is what
> I've
> just said true or not? Second, if true, is it relevant to the point you
> made in your earlier post?
If you can flesh out more about snapshots I'll try to answer.
>
>
>
>> Finally, SQL itself is a piss-poor language in which to store the
>> authoritative fundamental definition of derived data, so the view
> statement
>> itself can't be your method of definition. You need a better language in
>> which to define the variations, and then you decide whether to use a view
>> as an implementation method.
>>
>
> Well, I'm an eighty-twenty kind of guy myself. My attitude toward good
> tools with some defects is anathema in c.d.t. I've done some good things
> with SQL, and I'm not about to apologize to anyone in this group for
> doing that.
>
> At the same time, it's worthwhile knowing what the features of a better
> language than SQL would be.
>
> So, Ken, what makes SQL a piss-poor language?
Let me clarify.
SQL is great as an interface to relational a DMBS. It has its detractors,
but I'm not one of them. It is childishly simple to generate SQL (much
easier than generating HTML or XML for instance) and so in practical terms
I love it. It is Good Enough(tm).
My complaint is that a view definition, written in SQL, is a terrible format
in which to store meta-data. Technically the view definition tells you
everything you need to know about where the derived columns came from, but
if your entire system (the other tiers besides the server) are based on the
use of a data dictionary, you need that dictionary in very simple clean
tables for easy querying. SQL DDL is a dead letter here, you need
something else.
Therefore you want some superior syntax for defining the derived columns. In
my case, that syntax allows human beings to type files that get converted
into data and loaded to data dictionary tables. Then you generate the view
definition, though in my case I generate trigger code instead.
>
>
>> >
>> > I like most of what you wrote in your normalization + automation
>> > column, with only minor quibbles about the normalization issue. I,
>> > too, believe that we should treat our metadata, including business
>> > rules, as data.
>>
>> I wish I were more persuasive on this point. It seems so self-evident to
> me
>> that I often find myself at a loss to explain it.
>>
>
> I understand your frustration. You could shorten your argument down to
> the level of a slogan, like "theory IS practical".
point well taken
>
> Hmm. I guess I do think about derived data as columns rather than
> rows. Other than aggregate rows, I'm having trouble coming up with
> derived rows -- can you give an example?
>
well I said only aggregate in an earlier post, but I had forgotten about
another case :(
Consider a generic list of things that must be done to complete a job, such
as documents required to move an animal from country A to country B. The
documents required vary by country, so you have a list of which documents
are required by country.
When a job is created, you want to COPY all of the rows for Country B into
the JOBS_DOCS table. The user can then override these defaults by deleting
or disabling some rows and adding new ones, but the original copies came
from the master list.
How do you describe this copy command? Right now I do this in the client
code (gasp!) It's on my list of things to think about someday.
So, maybe an all-determinant
required { country, document }
would allow you to know what country requires which documents
and what document is required by which countries. I guess a
jobs { job }
is needed. In no particular language, some operator('job','country')
should do
{ jobs | 'job' = job } , { required | 'country' = country }
which is supposed to yield a set of { job, country, document }
Notation? '{ }' demark sets; ',' is product; '|' is where; '=' equals,
''' wraps literals, and '( )' enclose parameters and arguments.
"But, Shirley, was that really your question?"
> When a job is created, you want to COPY all of the rows for Country B into
> the JOBS_DOCS table. The user can then override these defaults by deleting
> or disabling some rows and adding new ones, but the original copies came
> from the master list.
>
> How do you describe this copy command? Right now I do this in the client
> code (gasp!) It's on my list of things to think about someday.
Copy command? No doubt, you want to do something with the response -
throw away the country and put the rest it in the job-docs table;
seems fine by me - but I'm not sure it's a copy.
Created? You *are* thinking client code, aren't you? Nothing wrong
with that, but a different frame of mind will serve you better when
there are data to be wrangled.
> Kenneth Downs wrote:
>> dawn wrote:
>>
>>
>>>Hmm. I guess I do think about derived data as columns rather than
>>>rows. Other than aggregate rows, I'm having trouble coming up with
>>>derived rows -- can you give an example?
>>>
>>
>>
>> well I said only aggregate in an earlier post, but I had forgotten about
>> another case :(
>>
>> Consider a generic list of things that must be done to complete a job,
>> such
>> as documents required to move an animal from country A to country B. The
>> documents required vary by country, so you have a list of which documents
>> are required by country.
>
> So, maybe an all-determinant
>
> required { country, document }
I think you know what I'm saying, but we are coming at it from very
different angles.
Andromeda specifies everything as properties of tables, columns, or keys,
because it is specifying a database. Since writing the original post I
have asked myself how to specify it as a property of a foreign key.
Lets simplify considerably and say that every job has a checklist of items
that must be performed. The checklist for each job is taken from a master
list at the time the job is created. Whatever rows are in the master list
are copied to the job.
Translating that into tables, we have a table MASTERLIST, a table JOBS, and
a table JOBLIST. What is important is the table JOBLIST:
table JOBLIST {
foreign_key JOBS { primary_key: Y; }
foreign_key MASTERLIST { primary_key: Y; }
}
which states that the table JOBLIST is composed of columns that match the
primary key of JOBS, and the primary key of MASTERLIST, and that all of
these columns together form the primary key of this table.
Now, how would you specify something here so that an INSERT into JOBS copies
all rows from MASTERLIST into JOBLIST?
I have experimented with flags on the foreign_key definitions, but never
found anything that was totally satisfactory. At one time I experimented
with a flag on a foreign key that basically said, "When an insert to the
parent table of this key occurs, populate the child table with the
cross-join of all other foreign keys". This works but I don't like it
much. However, it is the most likely path that I will take.
>
>
> Notation? '{ }' demark sets; ',' is product; '|' is where; '=' equals,
> ''' wraps literals, and '( )' enclose parameters and arguments.
>
The notation for specifying databases in Andromeda is here:
http://docs.secdat.com/index.php?gppn=ADD+Syntax
It is not related to the notation you are using.
The rest of your post seems directed by assumptions that do not hold for the
question at hand.
Ah, Andromeda. I see.
> Lets simplify considerably and say that every job has a checklist of items
> that must be performed. The checklist for each job is taken from a master
> list at the time the job is created. Whatever rows are in the master list
> are copied to the job.
>
> Translating that into tables, we have a table MASTERLIST, a table JOBS, and
> a table JOBLIST. What is important is the table JOBLIST:
>
> table JOBLIST {
> foreign_key JOBS { primary_key: Y; }
> foreign_key MASTERLIST { primary_key: Y; }
> }
>
> which states that the table JOBLIST is composed of columns that match the
> primary key of JOBS, and the primary key of MASTERLIST, and that all of
> these columns together form the primary key of this table.
>
> Now, how would you specify something here so that an INSERT into JOBS copies
> all rows from MASTERLIST into JOBLIST?
I wouldn't. The point, perhaps not sufficiently explicit, is that
some operation is needed -- which should be performed "at the time
the job is created."
> I have experimented with flags on the foreign_key definitions, but never
> found anything that was totally satisfactory. At one time I experimented
> with a flag on a foreign key that basically said, "When an insert to the
> parent table of this key occurs, populate the child table with the
> cross-join of all other foreign keys". This works but I don't like it
> much. However, it is the most likely path that I will take.
And, from what I gather, Andromeda provides some sort of framework
for performing designer-specified operations when some 'data
manipulation events' occur, is that right?
I can see where this might be very useful, but I suspect that
usefulness has bounds. And, in that regard, Andromeda is very much
like many other 'interface builder' or 'application builder' or
'report builder' or even 'system builder' tools out there. I've
used more than a couple, and it seems that everything works as
long as you don't need anything the tool designers didn't provide
features for. At that point, one has had to abandon the tool and
start coding "by hand." Whether - or when - one can use the tool
again has always created a point of no return decision in my
development projects.
>>Notation? '{ }' demark sets; ',' is product; '|' is where; '=' equals,
>>''' wraps literals, and '( )' enclose parameters and arguments.
>>
>
>
> The notation for specifying databases in Andromeda is here:
>
> http://docs.secdat.com/index.php?gppn=ADD+Syntax
I took a look; I didn't see anything among the automations,
chains, or cascading actions that looked like it would be
a good fit. The description of automations, which seem to
be column and foreign-key (a new feature) oriented, didn't
seem like good fits. The phrase
[i]t is easy enough to provide any number of ways to
add rows to TABLE X on the event of changes to TABLE Y
seemed a bit of a teaser. Perhaps what you're looking for
might be, in Andromeda terms, a "row automation?"
Have you asked the Andromeda authors?
[..]
> And, from what I gather, Andromeda provides some sort of framework
> for performing designer-specified operations when some 'data
> manipulation events' occur, is that right?
>
> I can see where this might be very useful, but I suspect that
> usefulness has bounds. And, in that regard, Andromeda is very much
> like many other 'interface builder' or 'application builder' or
> 'report builder' or even 'system builder' tools out there. I've
> used more than a couple, and it seems that everything works as
> long as you don't need anything the tool designers didn't provide
> features for. At that point, one has had to abandon the tool and
> start coding "by hand." Whether - or when - one can use the tool
> again has always created a point of no return decision in my
> development projects.
This is true - in my book it is all or nothing. Using a tool that you
know you will have to jettison before the project is completed is no fun
at all.
[..]
> Have you asked the Andromeda authors?
Ken is the Andromeda author!
[..]
Cheers, Frank.
>
> I took a look; I didn't see anything among the automations,
> chains, or cascading actions that looked like it would be
> a good fit. The description of automations, which seem to
> be column and foreign-key (a new feature) oriented, didn't
> seem like good fits. The phrase
> [i]t is easy enough to provide any number of ways to
> add rows to TABLE X on the event of changes to TABLE Y
> seemed a bit of a teaser. Perhaps what you're looking for
> might be, in Andromeda terms, a "row automation?"
>
You are right about that sentence, it is too philosophical in a document
that should be a review and light reference. I took it out. I'm afraid
the current sentence is still to philosophical, but perhaps a few more
iterations will do it.
In any event, yes, the idea is to define a new method of row automation not
currently provided.
The concept in my previous post, of populating a child table automatically
on INSERT to parent, has appeal to me because it rounds out the operations
Andromeda provides around foreign keys. If this operation were put in, you
would be able to create parents, create children, copy down and aggregate
up through a foreign key. It becomes the 4-lane highway of automation.
The day's off to a good start -- I've already learned something.
> [..]
>
> Cheers, Frank.
Jay,
You would benefit by paying more attention when I point out the
snake-oil salesmen and other self-aggrandizing ignorants.