I did not obfuscate in replying to her thread, but was able to avoid
having to know what she meant by that term.
Can someone define "flatten database" for me, in 100 words or less (OK
500 words is fine).
Fred Z.
"Flatten database" means removing all the curvatures. Start applying slicing
and dicing intil the database get above the 1st normal form. If there is no
schema to begin with, then use stylesheet transformations generously. I
could go on, but it would exhaust your limit of 500 words.
>Can someone define "flatten database" for me, in 100 words or less (OK
>500 words is fine).
Hi Fred,
The only situation where I have heard this term is when (part of) the
contents of a database are copied to one or more sequential files. The
reason to do this is usually to speed up a batch process (using balanced
line processing to process many thousands of transactions against the
flattened contents of the database, then rebuilding the database).
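A rough sketch of that batch pattern in Python (the data and names are invented; the real thing would read a sorted database dump and a sorted transaction file, and the balanced-line merge here is simplified to updates only):

```python
# Dump (part of) the database to a sorted sequential "master" file, apply a
# sorted transaction file against it with a two-pointer merge, then reload.
master = [(1, 100), (2, 200), (4, 400)]   # (key, balance), sorted by key
trans  = [(2, +50), (4, -100)]            # (key, delta), sorted by key

def balanced_line(master, trans):
    out, t = [], 0
    for key, bal in master:
        # Apply every transaction whose key matches the current master record.
        while t < len(trans) and trans[t][0] == key:
            bal += trans[t][1]
            t += 1
        out.append((key, bal))
    return out

print(balanced_line(master, trans))  # [(1, 100), (2, 250), (4, 300)]
```

The point is that both files are read strictly sequentially, which is what makes this fast for many thousands of transactions.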
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
Denormalize.
--
Daniel A. Morgan
University of Washington
damo...@x.washington.edu
(replace 'x' with 'u' to respond)
unless it's put on paper, i think Mikito is right (i don't know how to
re-combine posts, pardon me) when he says ""Flatten database" means
removing all the curvatures. Start applying slicing and dicing until the
database gets above the 1st normal form. If there is no schema to begin
with, then use stylesheet transformations generously. I could go on, but
it would exhaust your limit of 500 words."
i hope i would've asked the interviewer "define flat?" and if they dodged,
"alright, how flat?", but i avoid wine before interviews. maybe i should
apply that to newsgroups too!
p
all
I would guess "normalize" is more likely. If the source dbms permits
non-1NF structures, then the term "flatten" is sometimes used to mean that
the data are put in 1NF. Terms I might use as synonyms to "flatten" are
"unnest", "explode", "normalize". It isn't quite the same as normalizing,
however, because the embedded lists are not placed in a separate relation,
but rather the "scalar" data are repeated in each row, while the list has
one value per row. Reasons to flatten the structure would include a need to
use a SQL-92-based tool or pour the data into a single Excel worksheet.
If the source dbms is a SQL-based product, then the user might still be
dumping tag-delimited lists in as attribute values (we all know there are
people who do that, right?) or even using some non-1NF capabilities of the
particular RDBMS.
Example:
Start with:
(12345,DeSmith,John,{jsm...@aol.com,John...@yahoo.com})
(23127,Meador,Beth,{beth_...@msn.com,bme...@hotmail.com,b...@ibm.org})
Then flatten it to get:
(12345,DeSmith,John,jsm...@aol.com)
(12345,DeSmith,John,John...@yahoo.com)
(23127,Meador,Beth,beth_...@msn.com)
(23127,Meador,Beth,bme...@hotmail.com)
(23127,Meador,Beth,b...@ibm.org)
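In Python terms, a minimal sketch of that flattening (field names taken from the example above; the code itself is just my illustration, not any product's feature):

```python
# Flatten records whose last field is a list of e-mail addresses:
# the "scalar" fields are repeated once per list element.
def flatten(records):
    flat = []
    for person_id, last, first, emails in records:
        for email in emails:
            flat.append((person_id, last, first, email))
    return flat

people = [
    (12345, "DeSmith", "John", ["jsm...@aol.com", "John...@yahoo.com"]),
    (23127, "Meador", "Beth",
     ["beth_...@msn.com", "bme...@hotmail.com", "b...@ibm.org"]),
]

for row in flatten(people):
    print(row)
```

Two input records with 2 and 3 embedded addresses become 5 flat rows, matching the listing above.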
That's my best guess. --dawn
Dawn,
Flattening is DEnormalizing. Period. You have it backwards and twisted. It
could be called Neo-normalizing.
OK, I'd vote for both terms being incorrect. It is not denormalizing
because it is starting with non-1NF data -- agreed? And it is not
normalizing because, well, it clearly isn't (the key, the whole keyu and
nothing but the key ...). However, the reason that "we" sometimes call it
normalizing is that it takes a non-SQL92-compatible structure and turns it
into something that can be used with ODBC, for example. By formal
definitions (until perhaps recently) the source data are not in first normal
form. So, informally, to get ODBC to work (for example) someone might
suggest they are normalizing it. I'm one of the people who says "flatten"
instead because normalizing is clearly wrong. However, it is still an
informal way people might say it, as in "we need to normalize the data in
order to pull it into Excel". (Wrong, but you can see how such language
evolved).
> You have it backwards and twisted.
I think I had the concept correct and the "language that gets used" correct,
even if such language is inaccurate. Agreed?
> It
> could be called Neo-normalizing.
That's going a bit too far now, don't you think?! smiles. --dawn
I'll just add one other point that the reason that "flattening" from a
non-1NF is sometimes referred to as normalizing is that there is a sense
that the nested structures are pulled into separate "rows" thereby
normalizing, and then a view created that ties them to their original table,
but now as if both relations (the parent and the child) were normalized and
then joined. Did that help clarify the language?
> You have it backwards and twisted.
This might still be true. --dawn
No. Putting anything into rows (separate or otherwise) is not normalizing.
Normalizing is simply a process of removing redundancy from data and
reducing the possibility of insert, update and delete anomalies. It is not
done at the row level, it is done at the individual data element level.
(Remember functional dependencies? Did you ever read the Elmasri/Navathe
book you bought?) The structure of a row is defined +as a result+ of the
process, it is not the process itself. Flattening is the process of doing
the opposite (DEnormalizing)- taking normalized data from many normalized
tables (think of it as a kind of hierarchy (not that it is), or parent-child
relationships (Joe Celko, do not yell at me- I am just trying to explain
this in simple terms)) and putting it into fewer tables (usually to one
table) (IOW, taking away the hierarchy "pyramid", therefore causing it to
appear "flat" or "flattened", hence the term "flattening"). This could
introduce redundancy, and therefore cause the above mentioned anomalies.
I _still_ don't understand why you insist on making this complex. It isn't -
it is soooo basic - but I give up.
> > You have it backwards and twisted.
>
> This might still be true. --dawn
Well, yes...
> I _still_ don't understand why you insist on making this complex. It
> isn't -
> it is soooo basic - but I give up.
If you can bear to hear the truth you've spoken,
Twisted by knaves to make a trap for fools,
Or see the works you gave your life to, broken,
And stoop and mend 'em up with worn out tools...
--Kipling--
I'm not sure you understood me. I really do know how to normalize data
according to the usual practices -- I'm trying to explain a term, which is
the topic of this thread. I also explained WHY when working with embedded
lists, putting the data into a form that is usable as a SQL VIEW is
sometimes referred to this way. This is an incorrect use of the term
"normalize" but if you listen a bit, you might be able to hear how such
language evolved:
George: Can you pull that into Excel using ODBC?
Harry: Do you want to have a person's list of e-mail addresses all in one
cell or in separate cells?
G: Separate ones
H: The e-mail addresses are in a list, so I'll have to normalize the data
first
G: OK, let me know when you've flattened the data into Excel, OK?
> Normalizing is simply a process of removing redundancy from data
That is true for 2NF and beyond, but that is where I get on my soapbox about
1NF. 1NF is not about reducing redundancy, is it?
> and
> reducing the possibility of insert, update and delete anomalies. It is not
> done at the row level, it is done at the individual data element level.
Of course, but one can have a row with embedded lists, thereby indicating
that the data are not in 1NF, right?
> (Remember functional dependencies? Did you ever read the Elmasri/Navathe
> book you bought?)
About half of it and almost all of Date's 8th edition. I'll still admit to
ignorance, but perhaps not as much as you might think, so you might have to
decide whether or not I'm dumb instead ;-)
> The structure of a row is defined +as a result+ of the
> process, it is not the process itself. Flattening is the process of doing
> the opposite (DEnormalizing)- taking normalized data from many normalized
> tables (think of it as a kind of hierarchy (not that it is), or
> parent-child
> relationships (Joe Celko, do not yell at me- I am just trying to explain
> this in simple terms)) and putting it into fewer tables (usually to one
> table) (IOW, taking away the hierarchy "pyramid", therefore causing it to
> appear "flat" or "flattened", hence the term "flattening"). This could
> introduce redundancy, and therefore cause the above mentioned anomalies.
>
> I _still_ don't understand why you insist on making this complex. It
> isn't -
> it is soooo basic - but I give up.
I was not trying to make it complex but to answer the original question
since I have often used the non-RDBMS phrase "flatten the data" and,
therefore, thought I might be able to shed some light on at least what I
meant when I said it.
>> > You have it backwards and twisted.
>>
>> This might still be true. --dawn
> Well, yes...
I'm not sure you are reading me right, but I'm sure you will feel free to
continue to hold that opinion. Ah well ...
--dawn
Which part of reading comprehension did you miss? ;-)
My one word response was "Denormalize." You will find it just
under the signature "Fred Z."
So I'm not sure what it is you are responding to. And given that I
teach the subject here at the university I think I have at least
a basic knowledge thereof.
I was quite excited and thought I'd check out what classes you were
teaching (note: heavy dose of facetiousness here), especially as they
applied to database theory and computer science.
I looked for you here: http://www.cs.washington.edu/people/faculty/
Nope.
So maybe you are an adjunct? I looked for you here:
http://www.cs.washington.edu/people/faculty/adjuncts.shtml
Nope.
An affiliate perhaps? Nope...
http://www.cs.washington.edu/people/faculty/adjuncts.shtml#affiliate
Now. I happen to know that you were *once* an Oracle certification
instructor for a technical extension program that was sponsored by the
University of Washington. But those types of instructors would never
insinuate that they teach theory or computer science at the
undergraduate or graduate level, would they? Geez, that would be
bordering on unethical.
Perhaps they haven't updated the links yet.
Regards,
Dan
> Now. I happen to know that you were *once* an Oracle certification
> instructor
I have never taught Oracle certification in my life. So who is posturing
now mon ami?
My academic affiliations over the years include 7 colleges and
universities in three different states.
--
Ok. Erratum: Change the wording "certification" to "extension program
certificate".
http://www.extension.washington.edu/ext/certificates/oad/oad_gen.asp
Thanks,
- Dan
Geez, Dan, I was on your side. I was responding to Dawn's insistence that
flattening was something other than denormalizing.
You are not explaining- you are twisting a simple explanation so it fits
your view of the world. Flattening is denormalizing. It is not normalizing.
> I also explained WHY when working with embedded
> lists, putting the data into a form that is usable as a SQL VIEW is
> sometimes referred to this way. This is an incorrect use of the term
> "normalize" but if you listen a bit, you might be able to hear how such
> language evolved:
>
> George: Can you pull that into Excel using ODBC?
> Harry: Do you want to have a person's list of e-mail addresses all in one
> cell or in separate cells?
> G: Separate ones
> H: The e-mail addresses are in a list, so I'll have to normalize the data
> first
> G: OK, let me know when you've flattened the data into Excel, OK?
This example makes no sense. I've NEVER heard ANYONE refer to the things
you've mentioned in this way, or anything close to it, and I've heard some
pretty dumb conversations in meetings.
>
> > Normalizing is simply a process of removing redundancy from data
>
> That is true for 2NF and beyond, but that is where I get on my soapbox
> about
> 1NF. 1NF is not about reducing redundancy, is it?
In a sense, it is. 1NF is a description of a step in the process of
achieving a minimum of 3NF. You are under the misunderstanding that it is a
(first?) type of finished normalization. Technically speaking, I suppose it
is, but no one I know of would stop at 1NF. It is just the first gas station
between Chaostown and Organizedville. Fill 'er up and keep driving. Next stop
2NF. No services available- just a porta-potty. Hardly even worth a visit.
As Tony said to me, not so long ago, maybe you are flogging a dead horse.
DA Morgan is telling us that he's "the professor". If you go back a few
months, you'll see Dawn announcing the same thing, although in different
phrasing. So I think it's wisest to let them play out their little game of
"my credentials are better than yours", and wait until they have something
substantive to respond to.
But I didn't insist on that, Alan -- I tried to explain that the term
"flattening" was sometimes used interchangably with the term "normalize"
(even if incorrectly). In that case it means that data that were in a
structure not usable with ODBC (to put it in concrete terms) were virtually
put into 1NF (one aspect of being normalized) and then virtually joined into
a VIEW of the data that SQL92 understands (that's where it is then
denormalized).
And because I think you might have what it takes to understand this point,
I'm going to give it one more shot and I'm really, really hoping that you
come back and say "OK, I do understand your point now".
Take two people, A & B, who work with non-1NF data. If you have no other
reference for data stored in a non-1NF structure, think of data in XML
documents (others might prefer to think of a VSAM or indexed-sequential file
defined with a COBOL OCCURS clause, even if that shows your age). Use the
same example I used earlier with (PersonID, LastName, FirstName, Email*)
tuples that include a list of Email addresses.
A: Can you put that data in Excel?
B. Sure. The data are not normalized -- do you want all of the e-mail
addresses for a person in a single cell?
A. No -- why don't you normalize it for me.
B. Will do. I'll flatten it and send you the spreadsheet.
A. Thanks.
This conversation is one that could definitely happen in the world of
non-RDBMS products. Can you see why that is? Have I had any success
explaining why a person using the term "flatten" might be thinking in terms
of normalizing, even if the 1NF step is only the first of a process like
this in our example:
1. put the list of Email addresses into its own (virtual) table, with a
candidate key of the PersonID plus the e-mail address -- that is, put into
1NF, the first step in "normalizing" (at least in the "old" relational
version of normalizing).
2. create a view of the data spanning both the original table sans Email and
the new (virtual) table of e-mail addresses
3. use the view of the now "normalized" (from the first step) data,
recognizing that the step of making a view was a denormalizing step (but not
the same one that would put us back to the original data -- it is still a
step that keeps us in the world of SQL92).
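For the curious, here is how those three steps might look in SQL, sketched with Python's sqlite3 standing in for any SQL92 database (the table, column, and view names are mine, chosen to match the earlier example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Step 1: the embedded e-mail list becomes its own (virtual) table,
# with a candidate key of PersonID plus the address itself.
cur.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, "
            "last TEXT, first TEXT)")
cur.execute("CREATE TABLE email (person_id INTEGER, address TEXT, "
            "PRIMARY KEY (person_id, address))")

cur.execute("INSERT INTO person VALUES (12345, 'DeSmith', 'John')")
cur.executemany("INSERT INTO email VALUES (?, ?)",
                [(12345, 'jsm...@aol.com'), (12345, 'John...@yahoo.com')])

# Step 2: a view joins parent and child back together -- the "flattened"
# shape that an ODBC/SQL92 tool can consume.
cur.execute("""CREATE VIEW flat AS
               SELECT p.person_id, p.last, p.first, e.address
               FROM person p JOIN email e ON e.person_id = p.person_id""")

# Step 3: use the view; the scalar columns repeat once per address.
for row in cur.execute("SELECT * FROM flat"):
    print(row)
```

The view gives one row per (person, address) pair, with DeSmith's scalar columns repeated for each of his two addresses.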
That was my last shot at it, so if you still think I'm either trying to skew
my information or am making no sense, well, I tried my best.
Cheers! --dawn
Sorry ... I've lost track of who said what when so please accept my
sincere apology.
I happen to know that I never taught Oracle Certification. What I
taught was a class that led to a University of Washington certificate
that had nothing to do with Oracle and its certs. And the difference
is far from superficial. One is a canned program consisting of
PowerPoint slides manufactured by Oracle Corp. and taught at every
community college and training company around. The other a curriculum
that I custom developed for the university approved by the Computer
Science faculty and only taught to grad students ... to qualify for
my class requires a BA or BS.
If you have a point ... make it. Right now all I see is someone
trying to score points in a vacuum. The fact is I have 36 years in
this industry and know what denormalization is. And further the
mistake was yours in not properly reading the thread and realizing
that I had posted the correct answer. So get off your treadmill and
just acknowledge that you made a very small mistake. End of thread
as I'll not respond to this nonsense again.
Give me a break. I work in the business world and am taking a single year
"out" to teach with my lowly master's degree (and not even one in Computer
Science!) at the college where my husband teaches because they asked if I
could help them out.
I figured this would help me brush up my own knowledge in the areas I am
teaching since you can really learn a topic when you teach it! By the end
of this year I will have taught more CS courses than I have taken. I am
able to do this in a way that I think is competent because I have a quarter
of a century of experience and have been an avid reader and experimenter,
keeping quite current over the years.
So, while I have no problem mentioning facts, both relevant and not (I'm a
bit chatty), I would hope that you won't find me playing any one-upmanship
on just about anything except that my husband and I raised two wonderful,
intelligent, beautiful, and kind daughters. Now THAT I'm very proud of
(enough to end my sentence with a preposition).
Cheers! --dawn
Nothing to do with Oracle? I understand the consultant mentality to sell
oneself, but quit while you are ahead.
I think that anyone who accesses the link above can see how Oracle-centric
the program is. A curriculum is published.
[snip]
The other a curriculum
> that I custom developed for the university approved by the Computer
> Science faculty and only taught to grad students ... to qualify for
> my class requires a BA or BS.
Well, two things: (1) I certainly wouldn't forget attribution to Jack Cline
for program development, and (2) it is not only taught to grad students;
professionals with no academic aspirations can apply and register as well.
Moreover, degree is not necessarily a criterion for admission. Admission
requirements state the following:
"Participants should have at least four years of undergraduate education,
technical training, or equivalent experience; familiarity with a programming
language; and a basic understanding of database management concepts (How to
register.
http://www.extension.washington.edu/ext/certificates/oad/oad_hta.asp)."
>
> If you have a point ... make it.
Ok. My point is this: You are NOT a professor at a university and you do
not teach a formal undergraduate or graduate curriculum, nor are you
necessarily qualified to do so. You teach in a professionalization program
(approved by the university as an extension certificate program) focusing
on specific and single vendor set of technologies.
Passively letting people make the assumption you are a university
instructor, using wording that insinuates that fact, or perpetuating the
myth with misleading wording is as bad as actual active misrepresentation or
lying.
[snip]
> End of thread
> as I'll not respond to this nonsense again.
Well Daniel, that would be a change. There are very few instances (over the
course of many years) where I can recall you resisting the impulse or
temptation of getting the last word in.
Regards,
- Dan
In general, flattening is de-normalizing as a few have already stated.
The opposite, but not twisted, is Neo-normalizing where each thing
(even each symbol) is represented just once in a db.
An example that Neo-normalizes the symbol '1' where box has thickness
of 1 cm, weight of 1 kg and cost of $1 is shown at
www.xdb2.com/example/ex033.asp
An example that Neo-normalizes the string 'brown' where person named
brown lives on brown street, drives a brown car, has brown hair, wears
a brown suit, and has one brown eye is shown at
www.xdb2.com/example/ex030.asp
Certainly.
Can I try to bring some light here?
Some database texts actually do treat the step from NFNF (or UNF or
non-1NF or whatever you want to call it) to 1NF as the first step of the
normalization process which makes it valid to call this step
"normalization". Note that normalization means in general that you are
transforming something such that it afterwards adheres to a certain
normal form, i.e., a form in which it satisfies certain criteria, and
that is of course exactly what is happening in this step. So the term
"normalization" is certainly justified and correct.
However, it is also defendable to classify it as denormalization since
one of the goals of normalization was to remove redundancy, but
flattening the database may actually introduce redundancy, which sets it
somewhat apart from the other steps. Since in industry the term
normalization is often loosely used synonymously with any form of
restructuring of the schema that removes redundancy, it then makes sense
to see it as denormalization. But, from an academic point of view that
is not strictly correct. Obviously, *defining* flattening as
denormalization is clearly wrong, even under the latter interpretation,
since not all denormalization is flattening.
To close off a small test question for those who really understand the
subject: Introductory database texts sometimes differ in how to get to
1NF: in some cases they tell you to immediately split off the nested
columns into separate relations (thus avoiding redundancy) and in others
the relation is merely flattened and the splitting off is postponed to
the later normalization steps. So the question is this. Can we always
use both approaches and if we can does it matter for the end result
which one we take?
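To make the two routes concrete, here is a toy sketch on the earlier (PersonID, LastName, FirstName, Email*) example (this only shows what each route produces; it is not meant as an answer to the test question):

```python
nested = [(12345, "DeSmith", "John", ["a@x", "b@x"])]

# Route 1: split the nested column off immediately into its own relation.
persons = {(pid, last, first) for pid, last, first, _ in nested}
emails  = {(pid, e) for pid, _, _, es in nested for e in es}

# Route 2: flatten first (repeating the scalar columns), then decompose
# during the later normalization steps.
flat = {(pid, last, first, e)
        for pid, last, first, es in nested for e in es}
persons2 = {(pid, last, first) for pid, last, first, _ in flat}
emails2  = {(pid, e) for pid, _, _, e in flat}

# On this input the two routes agree; note, though, that a person with an
# EMPTY e-mail list would survive route 1 but vanish under route 2, since
# flattening an empty list yields no rows at all.
print(persons == persons2 and emails == emails2)
```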
-- Jan Hidders
> Nothing to do with Oracle?
The company ... not the product. Comprehension includes context.
> I think that anyone who accesses the link above can see how Oracle-centric
> the program is. A curriculum is published.
Of course it is but that is irrelevant. What is relevant is that
Oracle Certification is a program of the Oracle Corporation and has
nothing, zero, nada, nyet, zilch to do with the program at the
University of Washington.
I hadn't expected providing clarification to be a problem. I'll try
for fuzziness in the future.
> Dawn,
>
> Flattening is DEnormalizing. Period. You have it backwards and twisted. It
> could be called Neo-normalizing.
Neo is a bot, right?
> Give me a break.
Sure. I'll just let you and Dan Morgan sort the whole thing out.
What is the "it" you are referring to? I can't tell.
> as denormalization since
> one of the goals of normalization was to remove redundancy, but
> flattening the database may actually introduce redundancy, which sets it
> somewhat apart from the other steps. Since in industry the term
> normalization is often loosely used synonymously with any form of
> restructuring of the schema that removes redundancy, it then makes sense
> to see it as denormalization. But, from an academic point of view that
> is not strictly correct. Obviously, *defining* flattening as
> denormalization is clearly wrong, even under the latter interpretation,
> since not all denormalization is flattening.
I believe your conclusion to be illogical. It is not as simple as A > B,
therefore B < A. While all denormalization is not flattening, it does NOT
follow that all flattening is not denormalization. All flattening of a
relational database is denormalizing. Show me an example that proves
otherwise, and I'll happily have learned something.
I now understand why you understand it the way you do, but flattening is not
normalizing. Flattening a relational database is denormalizing. Not all
denormalizing is flattening, but all flattening is denormalizing- unless
someone can prove this to be wrong.
> I now understand why you understand it the way you do, but flattening is
> not
> normalizing. Flattening a relational database is denormalizing. Not all
> denormalizing is flattening, but all flattening is denormalizing- unless
> someone can prove this to be wrong.
I have not wanted to offer a definition of "flattening" because I've never
seen a formal definition of "flat data". I've seen database data contrasted
with "flat files" enough times to have an inkling of what the writers
intended by the term "flat files". And not all writers mean exactly the
same thing. Let me come back to this.
First, a problem with this discussion is that "normalized", "denormalized",
and "flattened" are all being discussed as though they were points (or
directions) along a scale with only one dimension. I don't think so. There
is plenty of data that is neither normalized nor flat, nor anywhere in
between.
When you are faced with complex data rendered in a normalized form, and you
are asked how to flatten it, you might say "denormalize". When you are
asked how to get from New York to San Francisco, you might say "cross the
Hudson". Both are true, but don't really deal with the subject.
Now back to what I think the people who have written about data in "flat
files" meant. Most of the time, the writer was referring to a stream of
records, where all the records were of the same type. Some writers wanted
the records themselves to be "flat records", and some writers meant
different things by that.
By "flat records" some writers meant that no field of a flat record could be
a record. This turns out not to be a very useful distinction.
Other writers meant that a "flat record" could not contain an array of data,
otherwise known as a repeating group. Hence, Dawn's comments.
If there is an actual definition of "flattened", it would be nice to add it
to the glossary. If not, we should just accept that IT argot, like that
of other crafts, has a lot of terminology that just came into being without
formal analysis.
A very reasonable discussion/analysis of the points. Thank you.
Yes, you are always welcome!
> Some database texts actually do treat the step from NFNF (or UNF or
> non-1NF or whatever you want to call it) to 1NF as the first step of the
> normalization process which makes it valid to call this step
> "normalization". Note that normalization means in general that you are
> transforming something such that it afterwards adheres to a certain normal
> form, i.e., a form in which it satisfies certain criteria, and that is of
> course exactly what is happening in this step. So the term "normalization"
> is certainly justified and correct.
>
> However, it is also defendable to classify it as denormalization since one
> of the goals of normalization was to remove redundancy, but flattening the
> database may actually introduce redundancy, which sets it somewhat apart
> from the other steps. Since in industry the term normalization is often
> loosely used synonymously with any form of restructuring of the schema
> that removes redundancy, it then makes sense to see it as denormalization.
> But, from an academic point of view that is not strictly correct.
> Obviously, *defining* flattening as denormalization is clearly wrong, even
> under the latter interpretation, since not all denormalization is
> flattening.
Yes -- thanks, you clearly were tracking with me and it looks like Alan now
understands this point too. He wants to say that when you flatten you
ALWAYS are denormalizing, which seems to me to have a semantic problem, but
otherwise I understand that point. If you start with data that are not
normalized, do we say we are denormalizing if we introduce more redundancy?
Or should we reserve the term "denormalize" for times when we start with
data that are at least in 1NF and head to a structure that is not? I'm OK
either way, but it makes a difference in whether I can give a counter
example to Alan's claim or simply agree with it.
> To close off a small test question for those who really understand the
> subject: Introductory database texts sometimes differ in how to get to
> 1NF: in some cases they tell you to immediately split off the nested
> columns into separate relations (thus avoiding redundancy) and in others
> the relation is merely flattened and the splitting off is postponed to the
> later normalization steps. So the question is this. Can we always use both
> approaches
Let's pretend I meet your criteria for answering this question ;-) then I
would say that yes, either approach may be used and, if done properly, will
yield the desired result.
> and if we can does it matter for the end result which one we take?
It is conceivable to me that if the list in question is implicitly ordered,
then one of these approaches might be more likely to encourage the data
modeler to take care to add in an explicit ordering attribute. My guess
would be that starting by splitting off the nested columns up front might be
more likely to retain ordering information, so I would favor that approach
if you really are trying to end up with data in 1NF.
I would opt for a third approach, however, and drag those lists along for
the normalization ride, treating the list as an entity that, as a whole,
needs to depend on the key, the whole key, and nothing but the key. In
other words, skip the 1NF step for lists that seem like good candidates for
child/nested/embedded tables -- those with strong life-cycle ties to the
parent, for example.
smiles. --dawn
> -- Jan Hidders
Let's reserve "denormalizing" for occasions when we start with data that is
at least 1NF. Otherwise, we are "reorganizing". Actually, what say we not
use "flatten" at all? We are either normalizing (always moving up the
normalization steps), denormalizing (always moving down the steps), or
renormalizing (either a sideways move in normalization or correcting an
error that does not make substantial changes), or reorganizing (working with
NNF data, but not normalizing it).
I thought you might prefer that, so we then can come up with the
counter-example you requested showing that flattening data is not always
"denormalizing", right? We can be flattening the data by moving from one
unnormalized structure to another.
> Otherwise, we are "reorganizing". Actually, what say we not
> use "flatten" at all?
You need not use it, but I will continue to for this simple reason -- it is
a visual term and even if the theory isn't clear to everyone immediately
regarding whether data are being normalized, denormalized, both or neither,
when used in context, I suspect most would understand that such "flat" data
would fit cleanly into a single worksheet in Excel.
> We are either normalizing (always moving up the
> normalization steps), denormalizing (always moving down the steps), or
> renormalizing (either a sideways move in normalization or correcting an
> error that does not make substantial changes), or reorganizing (working
> with
> NNF data, but not normalizing it).
Those terms work for me and I doubt I'll ever again attempt to explain any
relationship between the words "normalize" and "flatten" -- I've learned my
lesson on that ;-) but thanks for tracking it through until you understood
my comments -- much appreciated.
Cheers! --dawn
No, as seen below...
>
> > Otherwise, we are "reorganizing". Actually, what say we not
> > use "flatten" at all?
>
> You need not use it, but I will continue to for this simple reason -- it
> is
> a visual term and even if the theory isn't clear to everyone immediately
> regarding whether data are being normalized, denormalized, both or
> neither,
> when used in context, I suspect most would understand that such "flat"
> data
> would fit cleanly into a single worksheet in Excel.
As long as flattening is used in context to mean denormalizing, otherwise,
you are reorganizing :) But, anyway, -generally speaking, I agree- I was
really talking about using these terms in this NG for clarity, not in the
real world. In the real world, my motto is "Whatever works." If someone
grasps the concept of what is going to happen if I say "flatten", or,
"dilute it", or "turn it into water ice" that's just peachy keen.
[snip]
>I now understand why you understand it the way you do, but flattening is not
>normalizing. Flattening a relational database is denormalizing. Not all
>denormalizing is flattening, but all flattening is denormalizing- unless
>someone can prove this to be wrong.
http://www.jargon.8hz.com/html/F/flatten.html says:
[common] To remove structural information, esp. to filter
something with an implicit tree structure into a simple sequence of
leaves; also tends to imply mapping to flat-ASCII. “This code flattens
an expression with parentheses into an equivalent canonical form.”
By this definition, it appears that converting XML to
comma-delimited would be flattening. XML does not, to my knowledge,
use the term "normal form", so this flattening would not be
denormalising.
Sincerely,
Gene Wirchenko
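(An editorial aside: the Jargon File sense of "flatten" that Gene quotes -- filtering an implicit tree structure into a simple sequence of leaves -- can be sketched in a few lines of Python. All names and the sample data here are illustrative, not taken from any poster's code.)

```python
# A minimal sketch of "flattening" in the Jargon File sense: removing
# structural information by reducing a nested structure to its leaves.

def flatten(tree):
    """Yield the leaves of a nested list structure, left to right."""
    for node in tree:
        if isinstance(node, list):
            yield from flatten(node)   # recurse into the sub-structure
        else:
            yield node                 # a leaf: emit it as-is

# An expression with parentheses, modeled as nested lists, flattens to a
# comma-delimited sequence of leaves; the tree structure is lost.
expr = ["a", ["b", ["c", "d"]], "e"]
print(",".join(flatten(expr)))  # a,b,c,d,e
```

Note that the operation is lossy: `["a", ["b"]]` and `[["a"], "b"]` flatten to the same sequence, which is exactly the "removing structural information" the jargon entry describes.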
A word of advice: Never argue with a fool. People may not notice the
difference.
For months now, this group has been filled with uneducated, ego-puffed
idiots like the ignoramus you are responding to.
Search google with "Daniel Morgan"+Cobb. It is hilarious, but pathetic
that someone who doesn't even know the inventor of the most successful
and theory oriented data model "contributes" to database theory. You
should see how he "guides and directs" newbies to learn normalization
in oracle newsgroups though :)
Some other trolls include Dawn, Neo etc who have nothing meaningful in
their lives, with only an axe to grind. Gone are the days when people
with a brain contributed to this group. Paul, Bob, Jan, Vadim, Mikito
etc. are all quiet, not sure why.
Dave
Thanks. I'll keep that in mind.
Read your first paragraph. <EG> That might be it.
Sincerely,
Gene Wirchenko
Thanks for the constructive advice.
- Dan
It seems the answer to this has to take into account several other
factors, such as whether there exists variability in the cardinality of
nested attributes in relation to their parent, and whether we are
examining this problem in the context of pure relations where NULLs are
not addressed, or from the SQL/relational database perspective.
In a nested customer to orders relationship, "flattening" before
splitting off would require some guarantee that the maximum cardinality
could be accommodated in a single row/tuple. Thus, a customer with only
one order associated with it would require the same flattened structure
as one with 1000 orders. This might lead to the disadvantage of a
large number of empty/NULL attributes, leading to some complexity in
decoding or further splitting off the data in a later step (actually,
I'll come back to this later step). Furthermore, who is to say for
sure in many cases what the maximum cardinality can or will be? So
there are some definite possible disadvantages to this approach, at
least in certain cases, variability in cardinality being one.
The second issue that comes to mind is that I can't recall the
normalization formalism (1NF-3NF and BCNF) where this flattened approach
is reconciled in the normalization process, except for the cases
where the nested properties are involved in further functional
dependencies (i.e., orders functionally determine other attributes).
Aside from that case, once we have a 1NF structure such as (CUSTOMER
ID, ORDER1 ID, ORDER2 ID, ...... , ORDER1000 ID), there is nothing in
the normalization process, nor in the definition of a relation, that
absolutely requires splitting off orders away from customers.
I've seen authors simply say that 1NF also means that flattened
sequences of nested attributes should be avoided, but such statements
are often informally worded and seem more in line with a best practice
type of recommendation.
So, in conclusion, it seems to me that both approaches can be taken
when taking formal principles into account; however, there are cases
where the tuple structure and definition would end up being less
desirable when flattening is done as a separate process from
splitting.
Regards,
Dan
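(An editorial aside: the trade-off Dan describes -- padding a flattened row out to a guessed maximum cardinality versus splitting the nested attribute into its own relation -- can be sketched in Python. The customer/order data set, the column width, and all names here are made up purely for illustration.)

```python
# "Flattening" nests all orders into one wide row per customer, padded
# with None (standing in for NULL) up to a maximum cardinality that must
# be guessed in advance. "Splitting" moves orders to their own relation.

MAX_ORDERS = 4  # the guess Dan warns about: who says what the max will be?

customers = {"C1": ["O1"], "C2": ["O2", "O3", "O4"]}

def flatten_rows(data, width):
    """One wide row per customer, None-padded to `width` order columns."""
    rows = []
    for cust, orders in sorted(data.items()):
        if len(orders) > width:
            raise ValueError("cardinality exceeds the flattened width")
        rows.append((cust,) + tuple(orders) + (None,) * (width - len(orders)))
    return rows

def split_rows(data):
    """Orders in a separate relation keyed by customer: no padding needed."""
    return [(cust, o) for cust, orders in sorted(data.items()) for o in orders]

print(flatten_rows(customers, MAX_ORDERS))
# [('C1', 'O1', None, None, None), ('C2', 'O2', 'O3', 'O4', None)]
print(split_rows(customers))
# [('C1', 'O1'), ('C2', 'O2'), ('C2', 'O3'), ('C2', 'O4')]
```

The flattened form works until a customer's 1001st order arrives; the split form has no such ceiling, which is one way of restating Dan's conclusion.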
Geez..harsh, but to the point! Bob's advice sometime back about maintaining
a twit filter w/ huge killfile list was helpful.
Btw, how did the seminar go? Ping me offline.
--
Anith
I'm not a troll, I have meaning in my life outside of this group, I have no
axe to grind, and I think I have made some good contributions to cdt over
the past year or so. But perhaps I should take the advice not to argue with
a fool, eh?
Thankfully, those I admire on the list are not the ones making such
statements (are any of you guys listening in willing to correct the above
misconception? If not, I can find a sandbox where I'm more welcome). So,
Anith & Dave, I'm sorry you feel that way and I'll have to search the
archives at some point to see what contributions you have made previously to
cdt to encourage significant discourse here, compared to what you have
contributed to this thread which was, quite frankly, hurtful.
--dawn
>>>> Some other trolls include Dawn ....
[snip]
I'm afraid I don't agree.
>>
[snip]
>
> I'm not a troll, I have meaning in my life outside of this group, I have
> no axe to grind, and I think I have made some good contributions to cdt
> over the past year or so.
[snip]
Yes, you have.
Not that it means anything, but I have come to respect and appreciate your
contributions.
Over time, I haven't always or even partially been in total agreement, but I
have been won over and believe you participate here with the best of
intentions. You are actually much more intellectual than I would have ever
originally thought and you've introduced perspectives that challenge and
make us (me at least) readdress our own (or mine at least) foundations of
belief.
At first I was circumspect in accepting everything at face value. I
apologize for that.
Regards,
- Dan
I don't see you as a troll, even though I'm a definite relational fan. I
think it's good to have someone who can challenge fundamental assumptions
and force us to carefully think about our beliefs.
Ultimately there's no right or wrong way to model business data - I
think that doing it as first-order predicates, relational-style, is the
most elegant and productive way, but others may disagree, and that's
fine. If we all agreed on everything 100% this newsgroup would be pretty
pointless!
As for Neo, maybe he has something there, maybe not, but I think
basically there's a communication barrier where his complicated examples
leave everyone without a clue what he's going on about.
Paul.
> Ultimately there's no right or wrong way to model business data
>
> Paul.
There may well be more than one "right" one but there are certainly
a large number of models that are dead wrong. Anyone that has been
in this business for awhile has seen that a very many times.
Well it depends what you mean by "wrong" I guess :)
I could store my business data and rules written in longhand in a large
notebook in chronological order of when I thought of it.
That's not wrong in the sense that it would give me false information,
it's just not very useful for querying and reporting.
You could have business rules enforced in the client applications rather
than centrally in the DBMS.
That's not wrong in the sense that they won't be enforced, it's just a
logistical nightmare making sure all the client apps are synchronised.
If, however, I stored all my business rules as the number "7", that
would be absolutely wrong, because I can't distinguish between them.
So when people slag off XML or multi-valued databases, they aren't
saying they are Wrong in some absolute sense, just that the practical
ramifications of such a model can be less than desirable in some ways.
Paul.
Dawn, it was a blunt remark & I apologize for assuming a concurrence without
having read most of what you have written over nearly the last year.
--
Anith
Below is a relatively simple example (except for the query at end) that
models the following in a normalized manner: John is a person and
engineer. John is male. Mary is a person, doctor and dentist. Mary is
female. Franny is a performer whose classification and gender are
unknown. Franny likes John. John likes Mary. Note that each thing can
have different number of classifications, attributes and values.
// Create items in directory to classify things.
(CREATE *person.item ~in = dir)
(CREATE *engineer.item ~in = dir)
(CREATE *doctor.item ~in = dir)
(CREATE *dentist.item ~in = dir)
(CREATE *performer.item ~in = dir)
// Create abbreviations
(CREATE *abbr.item ~in = dir)
(CREATE *M.cls = abbr)
(CREATE *F.cls = abbr)
(CREATE *fem.cls = abbr)
// Create male and female genders with abbreviation codes.
// Note that female has two abbreviations: F and Fem.
(CREATE *gender.item ~in = dir)
(CREATE *male.cls = gender & it.abbr = +M)
(CREATE *female.cls = gender & it.abbr = F & it.abbr = fem)
// Create a person named John who is an engineer.
(CREATE *.name = +John
& it.gender = male
& it.cls = person
& it.cls = engineer)
// Create a person named Mary who is a doctor and dentist.
(CREATE *.name = +Mary
& it.gender = female
& it.cls = person
& it.cls = doctor
& it.cls = dentist)
// Create a performer named Franny, gender unknown.
(CREATE *.name = +Franny
& it.cls = performer)
// Create the verb 'like' to relate things.
(CREATE *like.cls = verb)
// Create relationships
(CREATE Franny.like = John)
(CREATE John.like = Mary)
// Find something that likes an engineer
// who likes a dentist
// where the dentist's gender is abbreviated as fem.
// Finds Franny.
(SELECT *.like=(*.cls=engineer
& *.like=(*.cls=dentist & *.gender=(*.abbr=fem))))
For the sake of having an argument we could continue this ad infinitum.
But in the end the simple fact is that bad modeling increases the cost
of business and can, in some cases, lead to bankruptcy as has been
proven multiple times. You and your longhand may allow you to succeed
in Botswana (no offense intended to those living there) but it will
put you in Chapter 7 in the US in short order. Wrong is wrong. And the
history books are fool of examples (misspelling intentional).
[snip a whole lot of garbage]
So, in conclusion, ..... [snip]
I was wrong, wrong, wrong... :-( Please disregard for those that already
haven't.
- Dan
> And further the mistake was yours in not properly reading the thread
> and realizing that I had posted the correct answer. So get off your
> treadmill and just acknowledge that you made a very small mistake.
Actually, YOU made the "small mistake". Actually, TWO small mistakes.
First, you thought Alan was disagreeing with you when he was actually
supporting your position, and now you seem to think that it was Dan,
not Alan, who wrote what you misunderstood.
Here's the sequence of events, in order:
Fred Z asked:
> Can someone define "flatten database" for me, in 100 words or less
DA Morgan answered:
> Denormalize.
Dawn countered:
> I would guess "normalize" is more likely.
Alan, agreeing with DA Morgan, wrote (to Dawn):
> Flattening is DEnormalizing. Period. You have it backwards and
> twisted.
DA Morgan, misunderstanding, wrote:
> Which part of reading comprehension did you miss? ;-)
>
> My one word response was "Denormalize." You will find it just
> under the signature "Fred Z."
>
> So I'm not sure what it is you are responding to. And given that I
> teach the subject here at the university I think I have at least
> a basic knowledge thereof.
See? He was responding to Dawn, who in turn was disagreeing with you.
-tih
--
Tom Ivar Helbekkmo, Senior System Administrator, EUnet Norway Hosting
www.eunet.no T +47-22092958 M +47-93013940 F +47-22092901 FWD 484145
> Actually, YOU made the "small mistake".
I see this misunderstanding was cleared up in another thread. Sorry! :-)
Assume a binary relation with an atomic first column and a nested
second column, and also assume that the first column is a candidate
key. In that case unnesting the second column does not introduce redundancy.
-- Jan Hidders
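(An editorial aside: Jan's observation can be illustrated with a tiny Python sketch. The relation and its values are invented for the example; `frozenset` stands in for the nested column.)

```python
# A binary relation whose atomic first column is a candidate key and
# whose second column holds a nested set. Unnesting repeats the key
# value once per member, but since no other attribute depends on the
# key within this relation, no fact ends up stored twice.

nested = {("C1", frozenset({"O1"})),
          ("C2", frozenset({"O2", "O3"}))}

def unnest(rel):
    """Unnest the second column: one flat tuple per nested member."""
    return {(key, member) for key, members in rel for member in members}

print(sorted(unnest(nested)))
# [('C1', 'O1'), ('C2', 'O2'), ('C2', 'O3')]
```

Had the relation carried a third column functionally dependent on the key, unnesting would repeat that value alongside every member, which is where redundancy would creep in; Jan's candidate-key condition on a binary relation rules that out.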
Just for the record, there is actually such a thing as the XML normal
form. The theory on that is by no means complete yet, but developing
quite rapidly. There's the work by Arenas and Libkin, but also by
Millist Vincent.
-- Jan Hidders
"Jan Hidders" <jan.h...@REMOVETHIS.pandora.be> wrote in message
news:apoPd.9882$6Y1.4...@phobos.telenet-ops.be...
>1NF: in some cases they tell you to immediately split off the nested
>columns into separate relations (thus avoiding redundancy) and in others
>the relation is merely flattened and the splitting off is postponed to
>the later normalization steps. So the question is this. Can we always
>use both approaches and if we can does it matter for the end result
>which one we take?
What some, here, find quite objectionable is the reference to a
table/relation as being in any way 'flat'.
As for the question, I suspect you had an answer in mind. I don't see
that it would matter as the 'splitting' is to be done on the same
columns on the same table, at whatever point one decides to do so.
Perhaps I might ask a related question. Do you believe that a table
corresponds, but perhaps only weakly, or even inconsistently, to a
"relation" as understood in the set theory of the RM, or do you
believe it is fair to call the database table a "relation", without
much qualification?
I think that's mainly their problem. :-) Most real experts in database
theory are perfectly comfortable with it.
> As for the question, I suspect you had an answer in mind. I don't see
> that it would matter as the 'splitting' is to be done on the same
> columns on the same table, at whatever point one decides to do so.
Really? Note that this is a mathematical theorem about mathematically
defined objects. Can you supply a mathematical proof?
> Perhaps I might ask a related question. Do you believe that a table
> corresponds, but perhaps only weakly, or even inconsistently, to a
> "relation" as understood in the set theory of the RM, or do you
> believe it is fair to call the database table a "relation", without
> much qualification?
The term "table" is vague and can have different meanings that might
depend upon the context in which it is used. Without that one cannot
really answer your question.
-- Jan Hidders
>Mark Johnson wrote:
>> Jan Hidders <jan.h...@REMOVETHIS.pandora.be> wrote:
>> As for the question, I suspect you had an answer in mind. I don't see
>> that it would matter as the 'splitting' is to be done on the same
>> columns on the same table, at whatever point one decides to do so.
>Really? Note that this is a mathematical theorem about mathematically
>defined objects. Can you supply a mathematical proof?
That you had an answer in mind? Yes. I'll try it, and get back to you.
>> Perhaps I might ask a related question. Do you believe that a table
>> corresponds, but perhaps only weakly, or even inconsistently, to a
>> "relation" as understood in the set theory of the RM, or do you
>> believe it is fair to call the database table a "relation", without
>> much qualification?
>The term "table" is vague and can have different meanings that might
>depend upon the context in which it is used. Without that one cannot
>really answer your question.
It could mean a number of things. But Codd spoke of the table as a grid,
2-D tables, corresponding to relations. But he also spoke of
relations as relationships, presumably because the tuples were no
longer ordered by a set of ordered domains. In short, in other words,
you have the logical presentation, regardless of how it is stored, and
then this theoretical notion either of relations or relationships. And
it confuses me as to how closely these correspond. Is that the RM in
that presentation, or something so different as to be something else?
But I'll give it some more thought.
No, of the mathematical theorem.
-- Jan Hidders
> Perhaps I might ask a related question. Do you believe that a table
> corresponds, but perhaps only weakly, or even inconsistently, to a
> "relation" as understood in the set theory of the RM, or do you
> believe it is fair to call the database table a "relation", without
> much qualification?
It seems to me, after years of doing both, that actually designing,
building and populating databases is considerably simpler than talking about
the same thing in c.d.t.
In practice, I had no problem talking about SQL tables and relations as if
they were essentially the same thing. Please, lurkers, DON'T teach me the
difference between a bag and a set. I already know.
In here, people are quite a bit more picky about terminology, perhaps
appropriately so. Sloppy terminology probably results in sloppy theory.
But here's my current response: define a "relational table" as the
representation of a relation in a relational database.
Describe an "SQL table" as an approximation to a relational table. Move on
from there.
>define a "relational table" as the representation of a relation in a relational database.
>Describe an "SQL table" as an approximation to a relational table. Move on
>from there.
That paper of Codd's that I quoted in a couple of messages showed his
desire to even replace the idea of the table/relation correspondence
with that of a table/relationship. If the tuples were not always in
order, as they are in picking elements from successive sets in the
theory, then he perhaps felt uncomfortable still terming a table based
on that, a relation. Thus, terminology. Once you move from the theory
to the implementation, everything else can change, as well. And as
things are changed on the working side of it, perhaps the connection
with the theory grows more tenuous. If that poses problems, then that
can be stated. If it both solves and causes problems, so too. Of that,
the use of self-referential tables is pretty much the example of the
moment, and such use of Connect By, or similar.
Were you quoting from the 1970 paper? If so, the paper is clear about the
difference between a relation and a relationship. In a relationship, the
attributes are specified by name rather than by order. And Codd gives a
motivation for doing things this way. It's a burden on the user, Codd says,
to make the user remember the order in which a tuple of degree 30 or so
specifies its values. I agree with that.
> If the tuples were not always in
> order, as they are in picking elements from successive sets in the
> theory, then he perhaps felt uncomfortable still terming a table based
> on that, a relation. Thus, terminology. Once you move from the theory
> to the implementation, everything else can change, as well. And as
> things are changed on the working side of it, perhaps the connection
> with the theory grows more tenuous. If that poses problems, then that
> can be stated.
> If it both solves and causes problems, so too. Of that,
> the use of self-referential tables is pretty much the example of the
> moment,
The use of self referential tables is explicitly not ruled out in the 1970
paper.
> and such use of Connect By, or similar.
I don't wish to get involved in the religious wars over the use of "Connect
by". I clearly recognize it as an Oracle specific construct. But I don't
hesitate to use it whenever I think it convenient. The nested set model is
also very clever, and very useful. I don't hesitate to use it when I think
it's convenient.
>"Mark Johnson" <1023...@compuserve.com> wrote in message
>news:hjd1021r1i12pggg3...@4ax.com...
>> "David Cressey" <dcre...@verizon.net> wrote:
>> If it both solves and causes problems, so too. Of that,
>> the use of self-referential tables is pretty much the example of the
>> moment,
>The use of self referential tables is explicitly not ruled out in the 1970
>paper. and such use of Connect By, or similar.
I really don't have a problem with these self-referential tables,
either, myself. But they may present problems of their own, even while
solving problems that the database, inspired by the RM perhaps,
otherwise would make it far more difficult to accomplish without them.
As I said, there's a reason why Oracle included a Connect By, and
related. Many people use it. But I also do try to see the other side.
They might suggest that reconstructing such implicit paths in this way
could prove expensive, time-consuming, prohibitive, just as one
complaint. I'm sure you are aware of them.