Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

A Normalization Question

0 views
Skip to first unread message

VHarris001

unread,
Jun 26, 2004, 12:02:09 PM6/26/04
to
Over the years, I've tried several different relational database schemes for
tracking personal and professional information, and in every instance have been
frustrated with the results.

The biggest problem seems to lie in identifying entities, or 'things.' For
instance, when I set up a corporation as a business entity in a table, and use
other tables for employees, vendors, customers, strategic partners, etc., I
find that some employees are also vendors, customers, and partners. Of course,
this means that I must enter the same person (or 'thing') into multiple tables.

At one point, I thought I could work around this difficulty by 'normalizing'
the data -- for instance, by creating just one table for 'things.' But in
trying to do this, I discovered that I still could not 'normalize' the data
sufficient to eliminate duplicate entries in tables -- to which the only
solution seems to be dramatically increasing the number of tables.

The only way to accomplish this that seems to make sense to me is to create one
table that has a unique instance for each entity and another table that relates
entities. (I note that a entity-relationship approach was was being discussed
in another thread here.).

In this conception, the unique corporation would have a unique identifier
(probably a unique number) in the entity table. Same with the person who is an
employee. Also, in the entity table would be the entities, for example, of
"Delaware Corporation." and 'Human Being.'

In the relationship table, the corporation entity would be related to the
"Delaware Corporation' entity, the employee entity would be related to the
'Human Being' entity, and the corporation entity and the employee entity would
be tied together in the relationship table as type 'employer-employee.'

When the employee also purchases from the employer, the relationship table
would be updated to tie the corporation and the human being as type
'vendor-customer,' etc.

In this way, the entity table contains all entities, and the relationship table
contains all the pointers between entites, and describes the relationships
between those entities.

To relate this to the other thread about things and relations, in this
conception, the names Brown and Browne, and the color brown would all three be
entities in the entity table. If human being Brown was incorrectly tied to the
entity 'Browne' in the relationship table, only the relationship table would be
updated to point to the entity name Brown.

In this way, as was commented on in the other thread, even the letter 'b,' if
helpful, could be set up as an entity or a thing.

I guess this would be normalization in the extreme, except that once the data
is normalized to this degree, it no longer makes sense to keep different data
types in different tables, because the number of tables becomes unwieldy.

Is there any database scheme that is set up in this manner, and that could be
used to eliminate the necessity of entering the same 'entity' in multiple
tables?

V Harris

Alan

unread,
Jun 28, 2004, 8:52:40 AM6/28/04
to
"VHarris001" <vharr...@aol.com> wrote in message
news:20040626120209...@mb-m16.aol.com...

Your normalization problem is really a business definitions problem. If you
were to create an ERD first, you would have no problems normalizing. This is
a standard employee-supervisor problem, where an employee has a supervisor
and a supervisor is, of course, an employee. It is a recursive relationship.
You need one table with multiple FKs (one for each relationship) that refer
to itself.

For example, we'll use Tax ID Number (TIN) as the PK. In the U.S., all TINs
are nine digits ssn and company tin and do not overlap. So your table may
look like this:
THINGS
thing_tin (pk),
name,
address,
etc
vendor_tin (fk),
emp_tin (fk),
supplier_tin( fk),
customer_tin (fk),
partner_tin (fk),
etc

You then join the table to itself, using the proper FK. If (when!) you have
multiple addresses (or other attributes), you then split out all of the
non-fk attributes in the THINGS table, leaving behind a many-to-many linking
table:

THINGS
thing_tin (pk),
vendor_tin (fk),
emp_tin (fk),
supplier_tin( fk),
customer_tin (fk),
partner_tin (fk),
etc (fk)


VENDOR_ATTRIBUTES
vendor_tin (pk/fk),
name,
address,
etc

EMPLOYEE ATTRIBUTES
emp_tin (pk/fk),
name,
address,
etc


ETC_ATTRIBUTES
.
.
.

That should do it.


Alan

unread,
Jun 28, 2004, 8:58:50 AM6/28/04
to

Neo

unread,
Jun 28, 2004, 2:05:27 PM6/28/04
to
> At one point, I thought I could work around this difficulty by 'normalizing'
> the data -- for instance, by creating just one table for 'things.' But in
> trying to do this, I discovered that I still could not 'normalize' the data
> sufficient to eliminate duplicate entries in tables -- to which the only
> solution seems to be dramatically increasing the number of tables.

The relational model is fundamentally limited. Its limitations have
pluses and minus. On the plus side, it can represent/manipulate fairly
structured data (ie tables) efficiently. On the minus side, it is
either impractical or impossible to represent/maniplate/normalize some
scopes of data.

> Is there any database scheme that is set up in this manner, and that could be
> used to eliminate the necessity of entering the same 'entity' in multiple
> tables?

XDb's data model (TDM) is more general than RM. On the plus side, it
can normalize any data. On the minus side, it is less efficient under
a limited scope of data (ie tables).

XDb1, an experimental db, is a limited implementation of TDM and can
normalize to a greater extent than RM over certains scopes of data.

XDb2, under development, is a more complete implementation of TDM and
is able to normalize nearly all scopes of data.

Hugo Kornelis

unread,
Jun 28, 2004, 5:57:44 PM6/28/04
to
Hi Neo,

Good to see that you're back. You haven't forgotten that you still owe me
$1.000, have you?

http://tinyurl.com/yvm5g


Best, Hugo
--

(Remove _NO_ and _SPAM_ to get my e-mail address)

mAsterdam

unread,
Jun 28, 2004, 8:10:42 PM6/28/04
to
Hugo Kornelis wrote:

> Good to see that you're back. You haven't forgotten that you still owe me
> $1.000, have you?
>
> http://tinyurl.com/yvm5g

Why, you don't get it, do you?

You should normalize that into
$ isa sign
1 isa sign
. isa sign
0 isa sign
It don't mean a thing and the order doesn't matter.

Gene Wirchenko

unread,
Jun 28, 2004, 11:26:12 PM6/28/04
to
Hugo Kornelis <hugo@pe_NO_rFact.in_SPAM_fo> wrote:

>Hi Neo,
>
>Good to see that you're back. You haven't forgotten that you still owe me
>$1.000, have you?

Well, I have not.

Hey, Neo, when are you going to pay your debt?

> http://tinyurl.com/yvm5g
>
>
>Best, Hugo

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Neo

unread,
Jun 29, 2004, 1:41:09 PM6/29/04
to
> Good to see that you're back. You haven't forgotten that you still owe me
> $1.000, have you?

RM Sol#1 and #2 fail to meet the challenge because, they contain
redundant schema (schema is data) to store the class hierarchy which
cannot be accessed by the provided report procedure, and stores
redundant data (ie "brown"). Deficiencies in the genericness of the
solutions can be shown by the inability to handle things without an
attribute (ie name) and things with multiple attributes each of
different type, etc.

Below are measurements made thus far using RM solutions that aren't as
generic or normalized as XDb1's.

Small Report Generation Summary (provided by Hugo)
---------------------------------------------------------------------
Solution Time(ms) Platform Notes
-------------- -------- ----------------- --------------------------
RM#1 SqlSrvr2K 14.3 1.3 Ghz PC Unnormalized, non-generic
RM#2 SqlSrvr2K 11.0 1.3 Ghz PC Unnormalized, non-generic
XDb1 4.4.7 16 1.3 Ghz PC Debug ver, norm and gener


Small Report Generation Summary (provided by Neo)
---------------------------------------------------------------------
Solution Time(ms) Platform Notes
------------- -------- ----------------- --------------------------
RM#1 SqlSrvr7 65.0 500 Mhz Server Unnormalized, non-generic
RM#2 SqlSrvr7 68.9 500 Mhz Server Unnormalized, non-generic
XDb1 4.5.7 1.632 500 Mhz Server Normalized, generic
XDb1 4.5.9 6.561 233 MHz PocketPC Normalized, generic


Large Report (28,940 rows) Generation Summary (provided by Neo)
200 Goat Hierarchy (5 generations x 40 goats/generation,
each goat having two parents, except 1st gen).
---------------------------------------------------------------------
Solution Time(sec) Platform Notes
------------- -------- ----------------- --------------------------
RM#5 SqlSrvr7 40.5 500 Mhz Server Unnormalized, non-generic
XDb1 4.5.7 2.9 500 Mhz Server Normalized, generic
XDb1 4.5.9 16.971 233 Mhz PocketPC Normalized, generic


Larger Report (276,620 rows) Generation Summary (provided by Neo).
400 Goat Hierarchy (10 gen x 40 goats/gen),
---------------------------------------------------------------------
Solution Time(ms) Platform Notes
------------- -------- ----------------- --------------------------
RM#5 SqlSrvr7 105 min 500 Mhz Srvr, NT Avg of 2 runs, UnNrm,UnGen
XDb1 4.5.10 44 min 500 Mhz Srvr, NT Avg of 2 runs, Norm, gener
XDb1 4.5.10 57 min 450 Mhz PC, 98 1 run, Normalized, generic
XDb1 4.5.10 195 min 233 Mhz PocketPC 1 run, Normalized, generic

Note: SQL Server scripts are available from website.

Neo

unread,
Jun 29, 2004, 1:45:07 PM6/29/04
to
> Hey, Neo, when are you going to pay your debt?

As soon as you are able to generate the report from
normalized/NULL-less data using RM Sol#1 and #2, which fail to meet
the challenge because they contain redundant schema (schema is data)

Tony

unread,
Jun 30, 2004, 5:57:47 AM6/30/04
to
neo5...@hotmail.com (Neo) wrote in message news:<4b45d3ad.04062...@posting.google.com>...

Neo, why do you cut and paste the same answer over and over again? It
gets boring. But it least there can't be many people around here by
now who don't know you are a charlatan who does not honour his
promises, so I guess you are doing us all a service of sorts...

Stefan Oedenkoven

unread,
Jun 30, 2004, 9:15:31 AM6/30/04
to
Hi,
just one example how to normalize your PersonEntitytypes: employees and
customers. You can add easily strategic partners and so on.

CREATE TABLE Person
(PersonID INTEGER PRIMARY KEY
Name VARCHAR(20)
Address VARCHAR(30)
etc.
)

CREATE TABLE EmployeePerson
(fPersonID INTEGER REFERENCES Person ON DELETE CASCADE
fDepartment REFERENCES Department ON DELETE SET NULL
salery INTEGER
employedSince DATE
PRIMARY KEY (fPerson, fDepartment)
etc...
)

CREATE TABLE CustomerPerson
(fPersonID INTEGER REFERENCES Person ON DELETE CASCADE
fCustomer REFERENCES Customers ON DELETE CASCADE
Phone VARCHAR(40)
email VARCHAR(40)
PRIMARY KEY (fPerson, fCustomer)
etc...
)

So you will avoid duplicate entries. In this case a Person can be employee
and Customer, even an employee in different Departments - if needed. To
let a person only be member of one department just delete fDepartment from
the Primary Key.

regards,
Stefan

"VHarris001" <vharr...@aol.com> schrieb im Newsbeitrag
news:20040626120209...@mb-m16.aol.com...

Neo

unread,
Jun 30, 2004, 1:35:29 PM6/30/04
to
> Neo, why do you cut and paste the same answer over and over again?

Because persons make the same type of comments or ask the same
question over and over again and the answer is still the same.

> It gets boring.

If persons are't bored by making the same type of comments or asking
the same question over and over again, then perhaps they aren't bored
by getting the same answer over and over again.

> But at least there can't be many people around here by
> now who don't know you are a charlatan...

Thanks for your comment. Perhaps, but why not convince the last
remaining few people that Neo is a charlatan, by using RM Sol#1 or #2
to create the hierarchal reports for the following without NULLs or
redundancy:

Case1: God is the parent of an unnamed person. God is also the parent
of second person with three names (string 'john', integer 100, decimal
3.14).

Case2: john isa person. john's color is brown. mary isa person. mary's
color is brown. brown is a person.

Please, please post your solution/report in original thread.

Neo

unread,
Jun 30, 2004, 1:59:42 PM6/30/04
to
> THINGS
> thing_tin (pk),
> name,
> vendor_tin (fk),
> etc (fk)
>
> VENDOR_ATTRIBUTES
> vendor_tin (pk/fk), ...
>
> ETC_ATTRIBUTES
> ...

If one wanted to store two persons named brown whose color was also
brown, how could one use the above schema and avoid redundancy?

Alan

unread,
Jun 30, 2004, 2:57:46 PM6/30/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04063...@posting.google.com...

That's like asking how one would store information about airline flights in
the above scehema. You wouldn't. Anyway...

"Brown" is an instance of an attribute. Actually, two different attributes:

Attribute 1 = last_name
Attribute 2 = color of person

You could normalize to a higher NF to prevent storing the value of either
attribute more than once. This is done is real estate databases to save
space:

STREETS
street_id
street_name

HOUSES
house_number
street_id
etc

The idea here is that the street_id will _usually_ take less space on disk
than the street name, so it is not necessary to record the street name for
each house- just the street id.


Laconic2

unread,
Jun 30, 2004, 3:45:10 PM6/30/04
to
Alan,

The real problem here is that Neo has conflated the concepts of duplication
and redundancy.

Alan

unread,
Jun 30, 2004, 9:06:13 PM6/30/04
to
"Laconic2" <laco...@comcast.net> wrote in message
news:x56dnYk38c9...@comcast.com...

I know, but I felt some degree of debunking was needed, even if not
complete.
>


Neo

unread,
Jul 1, 2004, 2:53:04 PM7/1/04
to
> The real problem here is that Neo has conflated the concepts of duplication
> and redundancy.

With respect to dbs, normalization is the process of eliminating or
replacing duplicate things with a reference to the original thing
being represented. Within the context of a db, duplicate references
are not considered redundant because among other reasons:

1) they are unrelated to the thing being represented.
2) they are implementation specific.
3) they typically aren't/shouldn't be exposed to the db-user.
4) they are automatically maintained by the db.

In the most general sense, normalization is the process of removing
redundancy. If you need to hear this from an authority in order to
accept this common sense idea, see C.J. Date's "An Intro to Database
Systems", 6th Ed, Chapter 10, Further Normalization I: 1NF, 2NF, 3NF,
BCNF, pg 280 where he begins with "what is wrong with this design:
redundancy", "redundancy leads to several problems", "so perhaps a
good design principle is 'one fact in one place' (ie. avoid
redundancy). The subject of further normalization is essentially just
a formalization of simple ideas like this". In the summary of the same
chapter, he restates "normalization ... the purpose of such reduction
is to avoid redundancy".

In addition, see pg 291: "By now the reader might well be wondering
whether there is any end to this progression and whether there might
be a 6NF, a 7NF, and so on ad infintum. Although this is a good
question to ask, we are obviously not in a position to give it any
detailed consideration as yet. We content ourselves with the rather
equivocal statement that there are indeed additional normal forms not
shown in Fig 10.2, but that 5NF is actually the 'final' form in a
special (but important) sense."

Because RM is a limited data model and because some forms or
normalization are either impossible or impractical, some RM users have
a limited understanding of duplication, redundancy and normalization
preventing them from recognizing it in a db containing the word
"brown" serveral times in several tables.

Neo

unread,
Jul 1, 2004, 3:17:39 PM7/1/04
to
> I know, but I felt some degree of debunking was needed, even if not
> complete.

How does your updated schema avoid redundancy of the word "brown" if a
person named "brown" whose color is "brown" lives on a street named
"brown". Also if one considered the street named "brown" and the word
"brown" to be things, how would your schema handle this with respect
to an entry in table THINGS?

Alan

unread,
Jul 1, 2004, 3:27:13 PM7/1/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...

> > The real problem here is that Neo has conflated the concepts of
duplication
> > and redundancy.
>
> With respect to dbs, normalization is the process of eliminating or
> replacing duplicate things with a reference to the original thing
> being represented. Within the context of a db, duplicate references
> are not considered redundant because among other reasons:
>
> 1) they are unrelated to the thing being represented.
> 2) they are implementation specific.
> 3) they typically aren't/shouldn't be exposed to the db-user.
> 4) they are automatically maintained by the db.
>
>

Neo,

I get the feeling I'm wasting my time, but... You do not understand
redundancy in respect to relational theory. Redundancy (and
Normalization!!!) in the RD world is based exclusively on Functional
Dependencies. It has nothing at all to do with items 2, 3, or 4 in your
list. An RDBMS is "merely" an implementation mechanism for the finished
relational schema, which is based on normalization, which is based on
functional dependencies. Sorry if I was being redundant. You can report me
to the...

Here is a reference for _you_: "Fundamentals of Database Systems", Elmasri &
Navathe, Third Ed. pages 476-495. They start with functional dependencies
and explain to you how (and when and why) to get all the way to 5NF and all
points in between. In only 20 pages!


Alan

unread,
Jul 1, 2004, 3:37:27 PM7/1/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...

There would never be a relation called "THINGS" containing the attributes in
the way you infer. This is your first and most important error. "THINGS" may
be a superset of attributes. You need to understand functional dependencies.
In normalization, one does not consider the values of the attributes, only
the semantics of the relationships among them. The fact that a name, street,
color, and anything other attribute has the value of "Brown" has absolutely
zero, nada, zip, zilch to do with normalization or redundancy. Nothing at
all. Not even on the radar. All it ever does is lead to the kind of confused
state you are in.

See my response in another thread.


Jan Hidders

unread,
Jul 1, 2004, 4:29:00 PM7/1/04
to
Neo wrote:
>>The real problem here is that Neo has conflated the concepts of duplication
>>and redundancy.
>
>
> With respect to dbs, normalization is the process of eliminating or
> replacing duplicate things with a reference to the original thing
> being represented.

Not things, *facts*! Normalization is about preventing that a certain
fact is being represented in more than one way. That's why having
several copies of the string "Brown" is not redundancy.

-- Jan Hidders

ben brugman

unread,
Jul 2, 2004, 6:50:27 AM7/2/04
to
Just a suggestion.

For people in different roles on could model an entity person,
and have 'sepparate' entities for the subtypes of a person.

If persons and corporations can have the same role, for
example customer, you could create an entity stakeholder.
The stakeholder can then be a person or a business (corperation).

The entity stakeholder has for example the attributes adres and phone.
The entity person does have the attribute date of birth.
The entity business does have again different attributes.

A customer is a stakeholder. (Can be a person or a business).
A employee is a person and the person again a stakeholder.

Offcourse the model has to be usefull for the 'roles' you
recognise and the functions the information system has
to 'offer'. So a lot of the decisions have to be based on that.

Over generalisation and using the same entity (implemented
as a single table) for all entities often does not help the
building of software because of the enormous amount of
possibilities. Limiting the amount of possibilities in the
businessmodel/datamodel/function model often helps to build
the information system.

Good luck,
ben brugman


"VHarris001" <vharr...@aol.com> wrote in message
news:20040626120209...@mb-m16.aol.com...

x

unread,
Jul 2, 2004, 8:33:41 AM7/2/04
to
**** Post for FREE via your newsreader at post.usenet.com ****


"Jan Hidders" <jan.h...@REMOVETHIS.pandora.be> wrote in message
news:gu_Ec.171583$073.8...@phobos.telenet-ops.be...

> > With respect to dbs, normalization is the process of eliminating or
> > replacing duplicate things with a reference to the original thing
> > being represented.

All *things* in a database are references. :-)

> Not things, *facts*! Normalization is about preventing that a certain
> fact is being represented in more than one way. That's why having
> several copies of the string "Brown" is not redundancy.

But *facts* are *things* ! :-)

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
*** Usenet.com - The #1 Usenet Newsgroup Service on The Planet! ***
http://www.usenet.com
Unlimited Download - 19 Seperate Servers - 90,000 groups - Uncensored
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

x

unread,
Jul 2, 2004, 8:57:52 AM7/2/04
to
**** Post for FREE via your newsreader at post.usenet.com ****


"Alan" <al...@erols.com> wrote in message
news:2kj6rkF...@uni-berlin.de...


> I get the feeling I'm wasting my time, but...

What is normal for one, isn't normal for everyone. :-)

> You do not understand
> redundancy in respect to relational theory.

It is hard to understand it.
There are many peoples who don't understand it :-)

> Redundancy (and
> Normalization!!!) in the RD world is based *exclusively* on Functional
> Dependencies.

Really ?

> Sorry if I was being redundant. You can report me to the...

Aren't we all ? :-)

x

unread,
Jul 2, 2004, 9:22:43 AM7/2/04
to
**** Post for FREE via your newsreader at post.usenet.com ****


"Alan" <al...@erols.com> wrote in message

news:2kggoeF...@uni-berlin.de...


> You could normalize to a higher NF to prevent storing the value of either
> attribute more than once. This is done is real estate databases to save
> space:

> STREETS
> street_id
> street_name

> HOUSES
> house_number
> street_id
> etc

> The idea here is that the street_id will _usually_ take less space on disk
> than the street name, so it is not necessary to record the street name for
> each house- just the street id.

No. The idea here is that you may need to know if a given street exists and
what houses are on it.

VHarris001

unread,
Jul 2, 2004, 10:11:31 AM7/2/04
to
>Subject: Re: A Normalization Question
>From: "Alan" not...@uhuh.rcn.com
>Date: 2004-06-28 8:52 AM Eastern Daylight Time
>Message-id: <swUDc.10129$Av3....@nwrdny01.gnilink.net>

>
>"VHarris001" <vharr...@aol.com> wrote in message
>news:20040626120209...@mb-m16.aol.com...
>> Over the years, I've tried several different relational database schemes
>for
>> tracking personal and professional information, and in every instance have
>been
>> frustrated with the results.
>>
>> The biggest problem seems to lie in identifying entities, or 'things.'
>For
>> instance, when I set up a corporation as a business entity in a table, and
>use
>> other tables for employees, vendors, customers, strategic partners, etc.,
>I
>> find that some employees are also vendors, customers, and partners. Of
>course,
>> this means that I must enter the same person (or 'thing') into multiple
>tables.
>>
(snip)

I should have made more clear that I didn't intend for this to be just a
business database, but rather a comprehensive database for both business and
personal use.

So, can the example you cite above accomodate both the business and personal
relationships of, say, a closely held (family) corporation?

That is, it would track regular employees, but could it also track some of the
employees as husband and wife, children, parents, grandparents and other
assorted relatives, who are both employees and non-employees?

For instance, a business database should not include certain personal
information about an employee, while a private database should include personal
information about relatives. If a relative is also an employee, must there be
another table to contain the personal data?

We probably wouldn't want to have in the above table these fields:

THINGS
married_date (fk),
separation_date (fk),
divorce_date (fk),
asset_division (fk),
custody_arrangement (fk),
support_payment (fk)
etc.

These fields are,of course, unnecessary for most THINGS, but are very necessary
for some THINGS.

To normalize the data, must I crreate another table MARITAL HISTORY?

And how about tracking the extended family? Can we show all these
relationships and attributes on the ERD?

V Harris

Jan Hidders

unread,
Jul 2, 2004, 5:08:17 PM7/2/04
to
x wrote:
> "Jan Hidders" <jan.h...@REMOVETHIS.pandora.be> wrote in message
> news:gu_Ec.171583$073.8...@phobos.telenet-ops.be...
>>Not things, *facts*! Normalization is about preventing that a certain
>>fact is being represented in more than one way. That's why having
>>several copies of the string "Brown" is not redundancy.
>
> But *facts* are *things* ! :-)

Yes, but not all things are facts. :-)

-- Jan Hidders

Alan

unread,
Jul 2, 2004, 6:31:31 PM7/2/04
to

"x" <x-f...@yahoo.com> wrote in message news:40e5...@post.usenet.com...

> **** Post for FREE via your newsreader at post.usenet.com ****
>
>
> "Alan" <al...@erols.com> wrote in message
> news:2kggoeF...@uni-berlin.de...
>
>
> > You could normalize to a higher NF to prevent storing the value of
either
> > attribute more than once. This is done is real estate databases to save
> > space:
>
> > STREETS
> > street_id
> > street_name
>
> > HOUSES
> > house_number
> > street_id
> > etc
>
> > The idea here is that the street_id will _usually_ take less space on
disk
> > than the street name, so it is not necessary to record the street name
for
> > each house- just the street id.
>
> No. The idea here is that you may need to know if a given street exists
and
> what houses are on it.

No. Almost every town as an Elm Street, Washington, etc. It is a rare street
that doesn't already exist, along with every possible house number. I worked
at a major Multiple Listing Service. I know why we did this. In this case,
further normalization increased performance instead of decreasing it, and
saved tons of space in the process.

Neo

unread,
Jul 6, 2004, 1:21:00 PM7/6/04
to
> There would never be a relation called "THINGS" containing the attributes in
> the way you infer.

Sure there would because even attributes are things. Everything is a
thing. It is possible for an attribute to have attributes and thus the
attribute with respect to its attributes becomes a thing. For example,
'brown' can have the attribute language whose value(s) could be
english, spanish, etc.

> In normalization, one does not consider the values of the attributes, only
> the semantics of the relationships among them.

If one chooses to ignore some things (ie 'brown', 'brown', 'brown'),
then one has a limited understanding/implementation of normalization.

> The fact that a name, street, color, and anything other attribute has the
> value of "Brown" has absolutely zero, nada, zip, zilch to do with
> normalization or redundancy. Nothing at all.

If the string 'brown' has the attribute language, it would be stored
three times, each having the same value. Changing the third 'brown's
language (ie to western english), would create an update anomaly thus
proving that things weren't normalized.

Neo

unread,
Jul 6, 2004, 1:43:05 PM7/6/04
to
> > With respect to dbs, normalization is the process of eliminating or
> > replacing duplicate things with a reference to the original thing
> > being represented. Within the context of a db, duplicate references
> > are not considered redundant because they are unrelated to the thing
> > being represented.
>
> Redundancy (and Normalization!!!) in the RD world is based exclusively
> on Functional Dependencies.

RM is a limited data model. That RM's concept of
redundancy/normalization is only based on functional dependencies, may
be the reason why RM's fails to see 'brown', 'brown', 'brown' as
redundant.

> Here is a reference for _you_: "Fundamentals of Database Systems", Elmasri &
> Navathe, Third Ed. pages 476-495. They start with functional dependencies
> and explain to you how (and when and why) to get all the way to 5NF and all
> points in between. In only 20 pages!

Unfortunately 5NF does not cover everthing. A better reference is C.J.
Date "An Intro to Database Systems", 6th Ed, Chapter 10, Further
Normalization I: 1NF, 2NF, 3NF, because he states on pg 291 "By now


the reader might well be wondering whether there is any end to this
progression and whether there might be a 6NF, a 7NF, and so on ad

infintum...We content ourselves with the rather equivocal statement
that there are indeed additional normal forms ..." Normalizing
'brown', 'brown', 'brown' is higher than 5NF.

Neo

unread,
Jul 6, 2004, 1:53:25 PM7/6/04
to
> > With respect to dbs, normalization is the process of eliminating or
> > replacing duplicate things with a reference to the original thing
> > being represented.
>
> Not things, *facts*! Normalization is about preventing that a certain
> fact is being represented in more than one way.

Give me an accurate definition of fact and I will show 'brown' can
also be a fact.

> That's why having several copies of the string "Brown" is not redundancy.

"Having several copies" (ie 'brown', 'brown', 'brown') is redundant.
It requires a limited data model to ignore this fact.

Neo

unread,
Jul 6, 2004, 1:58:44 PM7/6/04
to
> > But *facts* are *things* ! :-)
>
> Yes, but not all things are facts. :-)

This is why normalization of facts is a limited form
compared to normalization of things.

Alan

unread,
Jul 6, 2004, 2:02:19 PM7/6/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...
> > There would never be a relation called "THINGS" containing the
attributes in
> > the way you infer.
>
> Sure there would because even attributes are things. Everything is a
> thing. It is possible for an attribute to have attributes and thus the
> attribute with respect to its attributes becomes a thing. For example,
> 'brown' can have the attribute language whose value(s) could be
> english, spanish, etc.

If the subject of the database was languages, yes, but not in the example
originally stated. While in a universal sense, everything is a thing, the
whole purpose of normalization is to determine which things relate to which
other things. You are overgeneralizing to the nth degree.

>
> > In normalization, one does not consider the values of the attributes,
only
> > the semantics of the relationships among them.
>
> If one chooses to ignore some things (ie 'brown', 'brown', 'brown'),
> then one has a limited understanding/implementation of normalization.

Apparently, that is the problem you are having. Please read SOMETHING about
functional dependencies before you post this nonsense again.


>
> > The fact that a name, street, color, and anything other attribute has
the
> > value of "Brown" has absolutely zero, nada, zip, zilch to do with
> > normalization or redundancy. Nothing at all.
>
> If the string 'brown' has the attribute language, it would be stored
> three times, each having the same value. Changing the third 'brown's
> language (ie to western english), would create an update anomaly thus
> proving that things weren't normalized.

We are talking about relational databases. A string is a data type. Strings
don't have attributes, except when considered as part of the data dictionary
(length, etc.), ior maybe in the very specific case where there is a table
whose subject has to do with strings - but not in the normal business world
that the database is representing. _Entities_ have attributes. "Brown" is
not an entity. It isn't even an attribute. It is a value of an attribute.
The attribute may be different in each entity. It may be COLOR in an
AUTOMOBILE entity, it may be LAST_NAME in an EMPLOYEE entity, it may be
STREET_NAME in an ADDRESS entity. DO NOT consider the values of the data
when normalizing. Again, you are confusing yourself by doing this. Your view
of this entire topic is completely twisted.


Alan

unread,
Jul 6, 2004, 2:19:11 PM7/6/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...
> > > With respect to dbs, normalization is the process of eliminating or
> > > replacing duplicate things with a reference to the original thing
> > > being represented. Within the context of a db, duplicate references
> > > are not considered redundant because they are unrelated to the thing
> > > being represented.
> >
> > Redundancy (and Normalization!!!) in the RD world is based exclusively
> > on Functional Dependencies.
>
> RM is a limited data model. That RM's concept of
> redundancy/normalization is only based on functional dependencies, may
> be the reason why RM's fails to see 'brown', 'brown', 'brown' as
> redundant.

Yes, it has limitations, but normalizing data is not one of them. The RM
defines normalization. You just refuse to accept normalization as it is
because you will owe someone $1000.

>
> > Here is a reference for _you_: "Fundamentals of Database Systems",
Elmasri &
> > Navathe, Third Ed. pages 476-495. They start with functional
dependencies
> > and explain to you how (and when and why) to get all the way to 5NF and
all
> > points in between. In only 20 pages!
>
> Unfortunately 5NF does not cover everthing.

I never said it did.

A better reference is C.J.
> Date "An Intro to Database Systems", 6th Ed, Chapter 10, Further
> Normalization I: 1NF, 2NF, 3NF, because he states on pg 291 "By now
> the reader might well be wondering whether there is any end to this
> progression and whether there might be a 6NF, a 7NF, and so on ad
> infintum...We content ourselves with the rather equivocal statement
> that there are indeed additional normal forms ..."

How does that make it better? It's a contention, not a proven fact. Anyway,
how would you know it is better since you never read Navathe?

>Normalizing
> 'brown', 'brown', 'brown' is higher than 5NF.

Even if it is, that doesn't make it better- it just makes it different.


Jan Hidders

unread,
Jul 6, 2004, 3:25:47 PM7/6/04
to
Neo wrote:
>>>With respect to dbs, normalization is the process of eliminating or
>>>replacing duplicate things with a reference to the original thing
>>>being represented.
>>
>>Not things, *facts*! Normalization is about preventing that a certain
>>fact is being represented in more than one way.
>
> Give me an accurate definition of fact and I will show 'brown' can
> also be a fact.

You have to show that in the tuple ("brown", "brown", "brown") the
string "brown" each time represents the same fact.

>>That's why having several copies of the string "Brown" is not redundancy.
>
> "Having several copies" (ie 'brown', 'brown', 'brown') is redundant.

I'm afraid that repeating the claim does not make it more true.

> It requires a limited data model to ignore this fact.

It's true in any data model we know. It's true in the flat relational
model, it's true in the nested relational model, it's true in the format
data model, it's true in object-oriented data models, it's true in
object-relational data models, it's true in ER data models, in
object-role data models, in the functional data model, in the XML data
model, in the semistructured data model, and it's also true in the MV
data model (to the extent that this is actually a well-defined notion).

-- Jan Hidders

Larry Coon

unread,
Jul 6, 2004, 5:39:59 PM7/6/04
to
Neo wrote:

> "Having several copies" (ie 'brown', 'brown', 'brown') is redundant.

Not necessarily:

create table house_color (
customer_id numeric(5) not null,
exterior_color varchar(10) not null,
interior_color varchar(10) not null,
trim_color varchar(10) not null,
primary key (customer_id)
)

Here a tuple that contained ('brown', 'brown', 'brown')
would be representing three independent, distinct facts,
therefore it would not be redundant. Jan was correct
when he said, "not things, facts." There is one -thing-
three times, but that simply doesn't matter.


Larry Coon
University of California

Neo

unread,
Jul 6, 2004, 9:39:00 PM7/6/04
to
> ...the whole purpose of normalization is to determine which things
> relate to which other things.

This is an incorrect understanding of normalization.

> You are overgeneralizing to the nth degree.

No, you/RM are undergeneralizing to the 5th degree.

> In normalization, one does not consider the values of the attributes...

The general form of normalization applies to all data models. Because
RM is a limited data model, it is either impossible or impractical to
normalize some types of data (ie values of attributes) while other
data models (ie TDM) can.

> > If one chooses to ignore some things (ie 'brown', 'brown', 'brown'),
> > then one has a limited understanding/implementation of normalization.
>

> Please read SOMETHING about functional dependencies
> before you post this nonsense again.

RM incorrectly senses that 'brown', 'brown', 'brown' are not redunant,
no matter how many times one rereads it, because it is a limited data
model.


> We are talking about relational databases.

I am talking about representing things in general
and RM is only one limited method of doing so.

> Strings don't have attributes except ...

Sure they can as I have demonstrated earlier. It is not upto a data
model to decide which data is "part of the data dictionary" or "in the
normal business world". If it does, the data model is limited.

> _Entities_ have attributes.
> "Brown" is not an entity.

Because 'brown' can have attributes (ie language, typeset, font, size,
color, blinking, etc) it can by your definition be an entity.

> Your view of this entire topic is completely twisted.

No, your/RM's view of this entire topic is completely limited.

Neo

unread,
Jul 6, 2004, 10:49:11 PM7/6/04
to
> Yes, [RM] has limitations, but normalizing data is not one of them.
> The RM defines normalization.

RM defines a limited form of normalization. The general form of
normalization which is the central theme to all xNFs (where x may be
infinite), allows one to identify 'brown', 'brown', 'brown' as being
redundant which XDb1/TDM normalizes.

> You just refuse to accept normalization ...

I accept the general form of normalization that can be applied to all
data models and allows one to recognize that 'brown', 'brown', 'brown'
is redundant. I refuse to accept RM's limited form of normalization as
the general form of normalization which doesn't allow one to recognize
that 'brown', 'brown', 'brown' is redundant.

> > A better reference is C.J. Date's "An Intro to Database Systems"...


>
> How does that make it better?

> Anyways how would you know it is better since you never read Navathe?

While I have not read all 873 pages of Elmasri/Navathe's "Fund of Db
Sys" 2nd Ed that sits several books under my C.J. Date's "Intro to Db
Sys" 6th Ed, I have read enough of it to know that compared to that of
Date's p288-9, their fundamental explanation of normalization on p407
is limited: "Normalization of data can be looked on as a process
during which unsatisfactory relation shemas are decomposed by breaking
up their attributes into smaller relations schemas that possess
desirable properties". C.J. Date's is better because his fundamental
explanation comes closer to the general form of normalization that can
be applied to data in any model, even those that don't have any
relations.

> > Normalizing 'brown', 'brown', 'brown' is higher than 5NF.
>
> Even if it is, that doesn't make it better- it just makes it different.

It makes RM's concept of normalization, limited.

Neo

unread,
Jul 6, 2004, 11:13:46 PM7/6/04
to
> You have to show that in the tuple ("brown", "brown", "brown") the
> string "brown" each time represents the same fact.

The first, second and third string "brown" represent the same fact.
Each string "brown" is composed of the symbols "b", "r", "o", "w", "n"
in successive order. If you give me an accurate definition of fact and
I will show that the string "brown" is/can be a fact.

> > It requires a limited data model to ignore this fact.
>
> It's true in any data model we know.

May be true in models that "we" knows, but "we" don't know all data
models (or at least that of XDb1/TDM).

Neo

unread,
Jul 6, 2004, 11:32:12 PM7/6/04
to
> > "Having several copies" (ie 'brown', 'brown', 'brown') is redundant.
>
> a tuple that contained ('brown', 'brown', 'brown')
> would be representing three independent, distinct facts,
> therefore it would not be redundant.

Although each cell of the tuple represents a different thing (ie
fact), they are all named by the same thing (ie string 'brown'). The
string 'brown' is redundant. Because RM is a limited data model, it is
hard to see this fact. It is more obvious in XDb1/TDM.

Steve Lancour

unread,
Jul 7, 2004, 1:19:29 AM7/7/04
to
Neo wrote:

Neo,

It appears to me the left vertical leg of the letter "r" in "brown" is
the same as the left vertical leg of the letter "n". The curved part of
the "r" looks like the curved part of the "n". Further, the circular
part of the "b" looks suspiciously similar to the rightmost 135 degrees
of the "o". Cannot, then, this example be further "normalized"?

Steve Lancour

Tony

unread,
Jul 7, 2004, 8:07:39 AM7/7/04
to
neo5...@hotmail.com (Neo) wrote in message news:<4b45d3ad.04070...@posting.google.com>...

You are of course confusing logical and physical issues - and your
issue is absurd in any case. It is not LOGICALLY redundant to record
that "Car X is brown" and "Dog Y is brown", because these are two
different facts. You are presumably saying that it is PHYSICALLY
redundant to store the 5 characters of "brown" twice on the disk, and
so you want to physically store the word "brown" once and then point
to it many times.

The relational model is LOGICAL and is not concerned with the physical
issues. If your argument had any merit at all, an RDBMS
implementation could in fact store every data value only once, and
then physically point to it from all other records. For all you know,
maybe that IS what some RDBMSs do (it isn't, but it would be
impossible to tell using SQL).

Perhaps you might like to build an XDb1/TDM physical implementation of
an RDBMS like that. It would have a big index file full of values
like this:

Index# Value
1 "brown"
2 3.1415926535
3 DATE '2004-07-07'
4 "blue"
5 DATE '2004-07-01'
6 "Mary"
...

When the user inserts a row into a table using SQL like this:

INSERT INTO person (name, eye_color, date_of_birth) VALUES
('Mary','green',DATE '1970-01-01');

... your RDBMS can look up the values 'Mary', 'green' and DATE
'1970-01-01' in your humungous VALUES list and either return the
index# if it exists, or create a new entry and return its index# if
not. Then it can store the record using only your index# pointers
like (6,42,10232).

For a SELECT, a similar process will be invoked.

I suggest you go for it: clearly you think this will be a better
RDBMS, and the SQL interface will make it marketable. Also it will
keep you busy so you don't have so much time to post your ridiculous
ramblings here.

Alan

unread,
Jul 7, 2004, 9:25:52 AM7/7/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...
> > Yes, [RM] has limitations, but normalizing data is not one of them.
> > The RM defines normalization.
>
> RM defines a limited form of normalization. The general form of
> normalization which is the central theme to all xNFs (where x may be
> infinite), allows one to identify 'brown', 'brown', 'brown' as being
> redundant which XDb1/TDM normalizes.
>
> > You just refuse to accept normalization ...
>
> I accept the general form of normalization that can be applied to all
> data models and allows one to recognize that 'brown', 'brown', 'brown'
> is redundant. I refuse to accept RM's limited form of normalization as
> the general form of normalization which doesn't allow one to recognize
> that 'brown', 'brown', 'brown' is redundant.\

It's not, it's not, it's not. You can preach this nonsense till the cows
come home, but it will never be true. Several examples have been provided to
you.

>
> > > A better reference is C.J. Date's "An Intro to Database Systems"...
> >
> > How does that make it better?
> > Anyways how would you know it is better since you never read Navathe?
>
> While I have not read all 873 pages of Elmasri/Navathe's "Fund of Db
> Sys" 2nd Ed that sits several books under my C.J. Date's "Intro to Db
> Sys" 6th Ed, I have read enough of it to know that compared to that of
> Date's p288-9, their fundamental explanation of normalization on p407
> is limited: "Normalization of data can be looked on as a process
> during which unsatisfactory relation shemas are decomposed by breaking
> up their attributes into smaller relations schemas that possess
> desirable properties". C.J. Date's is better because his fundamental
> explanation comes closer to the general form of normalization that can
> be applied to data in any model, even those that don't have any
> relations.

That's on old version, maybe 12 years old. They are up to 4th Ed. now.

Anyway, As I stated elswhere, I give up.


Alan

unread,
Jul 7, 2004, 9:25:56 AM7/7/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...

Well, I give up. Several people here have demonstrated time and again and in
several ways that you are not entirely correct. I will no longer waste my
time.


Alan

unread,
Jul 7, 2004, 9:40:07 AM7/7/04
to

"Larry Coon" <lcno...@assist.org> wrote in message
news:40EB1C...@assist.org...

Thank you! You have provided an expert, lucid, perfect, unarguable example.

Here's an idea. To achieve perfect normalization (zero redundancy), there
should be a database that contains one table for each possible attribute in
the universe- that way each value is stored only once. So, there would be a
table called COLOR with every possible color stored in it. Another table
called NUMBERS with Numbers stored in it (no sense storing a number more
than once), and so on. Then, all you need to do is create tables that run
your business with meaningless id codes in them so you can connect all of
your attributes into rows of business data. But wait, these codes are
numbers. No sense storing those again. Lets create an ID column for the ID
column. No, wait... Lather, rinse, repeat, lather, rinse, repeat, lather,
rinse....


x

unread,
Jul 7, 2004, 11:20:53 AM7/7/04
to
**** Post for FREE via your newsreader at post.usenet.com ****


"Alan" <al...@erols.com> wrote in message

news:2l2btjF...@uni-berlin.de...

> "Neo" <neo5...@hotmail.com> wrote in message
> news:4b45d3ad.04070...@posting.google.com...

> > > > A better reference is C.J. Date's "An Intro to Database Systems"...

> > > How does that make it better?
> > > Anyways how would you know it is better since you never read Navathe?

> > While I have not read all 873 pages of Elmasri/Navathe's "Fund of Db
> > Sys" 2nd Ed that sits several books under my C.J. Date's "Intro to Db
> > Sys" 6th Ed, I have read enough of it to know that compared to that of
> > Date's p288-9, their fundamental explanation of normalization on p407
> > is limited: "Normalization of data can be looked on as a process
> > during which unsatisfactory relation shemas are decomposed by breaking
> > up their attributes into smaller relations schemas that possess
> > desirable properties". C.J. Date's is better because his fundamental
> > explanation comes closer to the general form of normalization that can
> > be applied to data in any model, even those that don't have any
> > relations.

> That's on old version, maybe 12 years old. They are up to 4th Ed. now.

Only 12 years ? This means 1992 - still new compared to the relational
model.

x

unread,
Jul 7, 2004, 11:28:08 AM7/7/04
to
**** Post for FREE via your newsreader at post.usenet.com ****


"Alan" <al...@erols.com> wrote in message

news:2l2cohF...@uni-berlin.de...

> Here's an idea. To achieve perfect normalization (zero redundancy), there
> should be a database that contains one table for each possible attribute
in
> the universe- that way each value is stored only once. So, there would be
a
> table called COLOR with every possible color stored in it. Another table
> called NUMBERS with Numbers stored in it (no sense storing a number more
> than once), and so on. Then, all you need to do is create tables that run
> your business with meaningless id codes in them so you can connect all of
> your attributes into rows of business data. But wait, these codes are
> numbers. No sense storing those again. Lets create an ID column for the ID
> column. No, wait... Lather, rinse, repeat, lather, rinse, repeat, lather,
> rinse....

He might do just that (with an SQL DBMS) for saving disk space. :-)

Neo

unread,
Jul 7, 2004, 1:29:23 PM7/7/04
to
> You have to show that in the tuple ("brown", "brown", "brown") the
> string "brown" each time represents the same fact.

A clearification. While the 1st, 2nd and 3rd "brown" in the above
tuple do represent different things (ie fact), the name of each of
those things is the same (ie string "brown") and it is the name of
those things that can be normalized.

Below is approximately how XDb1 normalizes the above tuple.

Thing Person Color Street
1 ->2 ->3 ->4

Person Name
2 ->5

Color Name
3 ->5

Street Name
4 ->5

Name Sym1 Sym2 Sym3 Sym4 Sym5 ....
5 ->6, ->7, ->8, ->9, ->10

Symbol
6 b
7 r
8 o
9 w
10 n

For more info, see www.xdb1.com/GUI/Labels.asp

Neo

unread,
Jul 7, 2004, 2:08:28 PM7/7/04
to
> It appears to me the left vertical leg of the letter "r" in "brown" is
> the same as the left vertical leg of the letter "n". The curved part of
> the "r" looks like the curved part of the "n". Further, the circular
> part of the "b" looks suspiciously similar to the rightmost 135 degrees
> of the "o". Cannot, then, this example be further "normalized"?

You are blurring two different issues.

In the original example, because the string 'brown' (name of three
things) is stored multiple times in the db, it is a candidate for
normalization.

In the original example, because we did not store the graphical data
to render symbols on the screen, they were not a candidate for
normalization. This was left upto the the hardware/system (ie the
graphics card) which may in fact normalize some graphical parts (ie a
pixel, lines, arcs, circles, squares).

If graphical things were being stored in a db, and the db stored the
same two graphical things twice, the second one would be a candidate
for normalization.

Larry Coon

unread,
Jul 7, 2004, 5:23:55 PM7/7/04
to
Tony wrote:

> You are of course confusing logical and physical issues - and your
> issue is absurd in any case. It is not LOGICALLY redundant to record
> that "Car X is brown" and "Dog Y is brown", because these are two
> different facts. You are presumably saying that it is PHYSICALLY
> redundant to store the 5 characters of "brown" twice on the disk, and
> so you want to physically store the word "brown" once and then point
> to it many times.

You couldn't do that -- if more than one thing POINTS
to "brown," then the pointers are stored redundantly.
Infinite regress rears its ugly head.

So in Neo's world, any attribute (eg: "brown") can only
appear once in the entire database. Got a brown car and
a brown dog? Tough -- Neo's dbms can't handle that.

Larry Coon

unread,
Jul 7, 2004, 5:28:37 PM7/7/04
to
Neo wrote:

> Because RM is a limited data model [. . .]

I think your entire argument boils down to a strawman.
You mischaracterize the application of the relational
model, then use that mischaracterization to assert that
the relational model is limited. Hard to be persuasive
with such a fundamental fallacy in your logic.

Larry Coon

unread,
Jul 7, 2004, 5:46:06 PM7/7/04
to
Neo wrote:

> A clearification. While the 1st, 2nd and 3rd "brown" in the above
> tuple do represent different things (ie fact), the name of each of
> those things is the same (ie string "brown") and it is the name of
> those things that can be normalized.
>
> Below is approximately how XDb1 normalizes the above tuple.
>
> Thing Person Color Street
> 1 ->2 ->3 ->4
>
> Person Name
> 2 ->5
>
> Color Name
> 3 ->5
>
> Street Name
> 4 ->5

You'll have to explain why storing "brown" three
times is redundant, while storing ->5 three times
is okay.

Also, you confuse adding useless layers of indirection
with normalization.

Jan Hidders

unread,
Jul 7, 2004, 6:04:15 PM7/7/04
to
Neo wrote:
>>You have to show that in the tuple ("brown", "brown", "brown") the
>>string "brown" each time represents the same fact.
>
> The first, second and third string "brown" represent the same fact.
> Each string "brown" is composed of the symbols "b", "r", "o", "w", "n"
> in successive order.

No, those are three *different* facts:
1. The first string consists of "b", "r", "o", "w", "n" in that order.
2. The second string consists of "b", "r", "o", "w", "n" in that order.
3. The third string consists of "b", "r", "o", "w", "n" in that order.
You have failed to show that they represent the *same* fact.

Note that they do not represent the fact that the letter 'r' is the
second letter in the string "brown" because if that were the case you
could do an update to them that changes this fact. But since this fact
holds by definition, is is an analytical fact to put it in Kantian
terms, such an update is a logical impossibility.

-- Jan Hidders

Jan Hidders

unread,
Jul 7, 2004, 6:24:20 PM7/7/04
to
Neo wrote:
>>You have to show that in the tuple ("brown", "brown", "brown") the
>>string "brown" each time represents the same fact.
>
> A clearification. While the 1st, 2nd and 3rd "brown" in the above
> tuple do represent different things (ie fact), the name of each of
> those things is the same (ie string "brown") and it is the name of
> those things that can be normalized.

Why would you do that? What update anomaly would that solve? Recall that
this is what normalization is about, and not for example finding a more
compact representation.

-- Jan Hidders

Neo

unread,
Jul 7, 2004, 9:09:39 PM7/7/04
to
> Several people here have demonstrated time and again and in
> several ways that you are not entirely correct.

I am entirely correct. The following tuple has redundant data:

Thing Person Color Street
1 brown brown brown

The above tuple says (among other things) that there is a person named
brown, a color named brown and a street named brown. The person, color
and street in the tuple are all different things but their names are
the same thing, the string 'brown'. The above tuple stores that string
'brown' three times and this is redundant. RM blinds you of this fact


because it is a limited data model.

Below is approximately how XDb1 normalizes the above tuple.

Thing Person Color Street
1 ->2 ->3 ->4

Person Name
2 ->5

Color Name
3 ->5

Street Name
4 ->5

Name Sym1 Sym2 Sym3 Sym4 Sym5 ....


5 ->6, ->7, ->8, ->9, ->10

Symbol
6 b
7 r
8 o
9 w
10 n

Note, there are no tables in XDb1, only things. However certain groups
of things can be viewed a list, tree, table, matrix, etc. For more
info, see www.xdb1.com/GUI/Labels.asp

Neo

unread,
Jul 7, 2004, 9:19:18 PM7/7/04
to
> That's on old version, maybe 12 years old. They are up to 4th Ed. now.

The general form of normalization hasn't changed in the last 12 years
and won't change in the next 12 years. Different data models (ie based
on lists, trees, tables, cubes, etc) will each have some unique xNFs,
but all of them will be a subset of the general form of normalization.

Neo

unread,
Jul 7, 2004, 10:02:57 PM7/7/04
to
> It is not LOGICALLY redundant to record that "Car X is brown"
> and "Dog Y is brown", because these are two different facts.

It is true that "Car X is brown" and "Dog Y is brown" are two separate
and different facts thus they can't be normalized. But these two facts
are built upon other facts (which are stored in the db) one of which
is that both the car and the dog are the same color which has the name
brown. In your example, which is slightly different than the others
examples in this thread, the color named brown is redundant and could
be normalized as follows:

"Car X is brown" and "Dog Y is ->brown"
or
"Car X is ->brown" and "Dog Y is ->brown" and "brown isa color"

> The relational model is LOGICAL and is not concerned with the physical
> issues.

The above is meaningless in demonstrating that the color brown within
the two facts is not redundant.

> If your argument had any merit at all, an RDBMS
> implementation could in fact store every data value only once, and
> then physically point to it from all other records. For all you know,
> maybe that IS what some RDBMSs do (it isn't, but it would be
> impossible to tell using SQL).

The above is meaningless in demonstrating that the color brown within
the two facts is not redundant.

Alan

unread,
Jul 7, 2004, 10:15:48 PM7/7/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...

Yes, but they've re-explained it in a way that even might get through to
you.


Alan

unread,
Jul 7, 2004, 10:19:04 PM7/7/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...


<Sigh> Old proverb from somewhere: "Who is the greater fool- the fool or the
man who argues with the fool?" I am through arguing.


Neo

unread,
Jul 7, 2004, 10:56:29 PM7/7/04
to
> You'll have to explain why storing "brown" three times is redundant,
> while storing ->5 three times is okay.

With respect to dbs, normalization is the process of eliminating or


replacing duplicate things with a reference to the original thing

being represented. Within the context of a db, duplicate references
are not considered redundant because they are unrelated to the thing
being represented.

> Also, you confuse adding useless layers of indirection with normalization.

Normalization requires the equivalent of a references. It is you who
is confused about the following tuple containing redundant data.

Thing Person Color Street
1 brown brown brown

The above tuple says (among other things) that there is a person named
brown, a color named brown and a street named brown. The person, color

and street in the tuple are all different things from each other and
also different from their names which are the same thing, the string


'brown'. The above tuple stores that string 'brown' three times and

this is redundant. Below is approximately how XDb1 normalizes the
above tuple.

Thing Person Color Street
1 ->2 ->3 ->4

Person Name
2 ->5

Color Name
3 ->5

Street Name
4 ->5

Name Sym1 Sym2 Sym3 Sym4 Sym5 ....


5 ->6, ->7, ->8, ->9, ->10

Symbol
6 b
7 r
8 o
9 w
10 n

Note, there are no tables in XDb1, only things. However certain groups

Neo

unread,
Jul 7, 2004, 11:11:40 PM7/7/04
to
> You mischaracterize the application of the relational model, then use that
> mischaracterization to assert that the relational model is limited.

RM's application is to represent data. If RM can represent some scopes
of data without redundancy, it is not me that is mischaracterizing RM,
it is RM that is limited.

Marshall Spight

unread,
Jul 7, 2004, 11:56:45 PM7/7/04
to
"Neo" <neo5...@hotmail.com> wrote in message news:4b45d3ad.04070...@posting.google.com...

Neo,

I am curious: does it bother you at all, the steady stream of
database professionals, college professors, etc. who come
through here and try to explain to you that you are misunderstanding
normalization? Do you ever consider the possibility that,
in fact, they might be right and you might be wrong?

It strikes me that there are two possible ways to interpret the
situation. Either you have broken though the previous
ceiling on human understanding, through genius or deep
insight, or else you simply misunderstand what's going on.

Is it that your opinion of yourself is so high, or that your
opinion of everyone else is so low?

Can I ask how old you are?


Marshall


Larry Coon

unread,
Jul 8, 2004, 12:30:37 AM7/8/04
to
Neo wrote:

> RM's application is to represent data. If RM can represent some scopes
> of data without redundancy, it is not me that is mischaracterizing RM,
> it is RM that is limited.

I assume you meant to write "cannot" rather than "can,"
but that notwithstanding, yes, the relational model CAN
and DOES represent data without redundancy. You simply
continue to misunderstand the concept of redundancy.

Paul

unread,
Jul 8, 2004, 4:48:14 AM7/8/04
to
Neo wrote:
>>You'll have to explain why storing "brown" three times is redundant,
>>while storing ->5 three times is okay.
>
> With respect to dbs, normalization is the process of eliminating or
> replacing duplicate things with a reference to the original thing
> being represented. Within the context of a db, duplicate references
> are not considered redundant because they are unrelated to the thing
> being represented.

You could say that the string "brown" is a reference to the actual
colour brown. Or the string "Brown" is a reference to the surname Brown.

This is where the database's knowledge ends and the human mind begins;
the jump from syntax to semantics.

Redundancy in terms of databases is about removing semantic duplicates,
not about removing syntactic duplicates. i.e. about the logical level,
not the physical level.

I think the type of redundancy you're talking about is more lower-level
like information theory, Huffman coding etc.:
http://en.wikipedia.org/wiki/Information_theory

It sounds like you're trying to roll these two concepts into one when
it's more powerful and useful to have them separate.

Paul.

Tony

unread,
Jul 8, 2004, 5:17:22 AM7/8/04
to
> > The relational model is LOGICAL and is not concerned with the physical
> > issues.
>
> The above is meaningless in demonstrating that the color brown within
> the two facts is not redundant.
>
> > If your argument had any merit at all, an RDBMS
> > implementation could in fact store every data value only once, and
> > then physically point to it from all other records. For all you know,
> > maybe that IS what some RDBMSs do (it isn't, but it would be
> > impossible to tell using SQL).
>
> The above is meaningless in demonstrating that the color brown within
> the two facts is not redundant.

For someone who is obsessed with redundancy, you seem quite happy to
repeat yourself. In fact, everything you write is redundant, because
if no one ever read it they wouldn't be missing anything.

Larry Coon

unread,
Jul 8, 2004, 11:41:53 AM7/8/04
to
Neo wrote:

> With respect to dbs, normalization is the process of eliminating or
> replacing duplicate things with a reference to the original thing
> being represented.

Nope. That's just tokenizing data.

If this is normalization, can you tell me what data
anomalies you avoid by doing this? I know Jan already
asked you this, but apparently I missed your answer.

> Within the context of a db, duplicate references
> are not considered redundant because they are unrelated to the thing
> being represented.

Ah, how convenient for you. You get to cling to definitions
that suit you (like the above) while ignoring ones (like the
definitions for redundancy and normalization) which don't.

How about numbers? Do you tokenize those? Given the
following schema:

create table person_pets (
person_id numeric(5) not null,
number_of_dogs int,
number_of_cats int,
number_of_fish int,
primary key (person_id)
)

Given the tuple (12345, 2, 2, 2) what do you do with
those horrific redundant 2's?

Neo

unread,
Jul 8, 2004, 1:40:26 PM7/8/04
to
> does it bother you at all, the steady stream of database professionals,
> college professors, etc. who come through here and try to explain to you
> that you are misunderstanding normalization?

No it doesn't bother me at all, the steady stream of database
professional and college professors, etc who come through here and try
to explain to me that I am misunderstanding normalization because it
is they who have a limited understanding of normalization which is a


subset of the general form of normalization.

> Do you ever consider the possibility that,


> in fact, they might be right and you might be wrong?

Yes, I haved considered it and will continue to consider the
possibility that I might be wrong. But currently, I am confident that
a db that contains the string 'brown' three times has redundant data.

> It strikes me that there are two possible ways to interpret the
> situation. Either you have broken though the previous
> ceiling on human understanding, through genius or deep
> insight, or else you simply misunderstand what's going on.

Yes, those are two possible ways out of many.

> Is it that your opinion of yourself is so high,
> or that your opinion of everyone else is so low?

Normalization (and its many forms within the context of different data
models) is independent of my opinion of myself of that of everyone
else.

> Can I ask how old you are?

Ultimately, the general form of normalization is independent of my
age, skin color or brain size.

Neo

unread,
Jul 8, 2004, 1:46:46 PM7/8/04
to
> I assume you meant to write "cannot" rather than "can,"
> but that notwithstanding, yes, the relational model CAN
> and DOES represent data without redundancy. You simply
> continue to misunderstand the concept of redundancy.

While my understanding of normalization may not be perfect, it is
closer to the general form compared to those who cannot recognize that
a db containing the string 'brown' three times has redundant data.

Neo

unread,
Jul 8, 2004, 1:53:35 PM7/8/04
to
> Yes, but they've re-explained it in a way that even might get through to
> you.

Their explanation that a db containing the string 'brown' three times
isn't redundant, didn't get through to me. They will have to try again
as to why the same thing (string 'brown') three times in a db is not
redundant.

Jan Hidders

unread,
Jul 8, 2004, 2:15:59 PM7/8/04
to

Because there is no update anomaly.

-- Jan Hidders


Neo

unread,
Jul 8, 2004, 2:47:53 PM7/8/04
to
> You could say that the string "brown" is a reference to the actual
> colour brown. Or the string "Brown" is a reference to the surname Brown.

While the first string 'brown' does act as a unsystematic reference to
a person, the second string 'brown' as a unsystematic reference to a
color, and the third string 'brown' as a unsystematic reference to a
street, the issue isn't about how each string acts but about the
string itself being redundant). The same thing (string 'brown') is in
the db three times. It is redundant.

> Redundancy in terms of databases is about removing semantic duplicates,
> not about removing syntactic duplicates. i.e. about the logical level,
> not the physical level.

The above is an example of a limited form of normalization. The
general form of normalization applies to everthing being represented
(stored, not merely implied) within a db. If sytactic is stored
(meaning it has a location within the db), they are also candidates
for normalization.

The role of each string 'brown' as a unsystematic reference is
probably syntactic. The string 'brown' itself is probably semantic.
Regardless, the same thing is in the db three times, it is redundant.

> I think the type of redundancy you're talking about is more lower-level
> like information theory, Huffman coding etc.:
> http://en.wikipedia.org/wiki/Information_theory

The lowest (most general) level will be applicable to all higher (more
specific) levels.

Neo

unread,
Jul 8, 2004, 3:07:11 PM7/8/04
to
> You have failed to show that they represent the *same* fact.

You have *failed* to define what is a fact?

However, regardless of whether the string 'brown' is categorized as a
fact or not by your definition, they still represent the same thing,
the string 'brown'. Having three things that represent the same thing
in one db is redundant. While the role that each string 'brown' plays
is different (first unsystematically refers a person, second to a
color, third to a street) the three strings themselves represent the
same thing, the string 'brown'. This is redundant. A thing and the
string which names it are two separate things. Below is approximately
how XDb1 normalizes the three strings.

Thing Person Color Street
1 ->2 ->3 ->4

Person Name
2 ->5

Color Name
3 ->5

Street Name
4 ->5

String Sym1 Sym2 Sym3 Sym4 Sym5 ....


5 ->6, ->7, ->8, ->9, ->10

Symbol
6 b
7 r
8 o
9 w
10 n

Note, there are no tables in XDb1, only things. However certain groups

of things can be considered a list, tree, table, matrix, etc.

Paul

unread,
Jul 8, 2004, 3:37:01 PM7/8/04
to
Neo wrote:
> The lowest (most general) level will be applicable to all higher (more
> specific) levels.

OK, I can see how you're thinking.

But as an analogy you wouldn't write all your programs in machine code
would you even though it's possible. You'd use assembler which in turn
might be used by C and in turn that might be used to create some
higher-level language.

You wouldn't build a computer by mining all the silicon etc. yourself,
you'd get the large components ready-made. Even the people who make the
printed circuit board would buy the capacitors etc. ready-made from some
one else (or I assume they would).

It's useful to break down tasks into different levels, where each level
hides its implementation details from the level above. Do you agree?

That's what the relational model does: it separates out the logical and
the physical. So that the data analysts can concentrate totally on the
business model and not be worried about the implementation details. And
the DBMS writers can concentrate on the physical implementation without
worrying about what specific data will go in the database.

As I said before:
It sounds like you're trying to roll two concepts of "redundancy" into

one when it's more powerful and useful to have them separate.

Are you suggesting that it's better to *not* distinguish between the two
forms of redundancy?

Check out this famous painting by Magritte:
http://en.wikipedia.org/wiki/Image:MagrittePipe.jpg
and this discussion of it:
http://en.wikipedia.org/wiki/Ren%E9_Magritte

I think it's kind of relevant to the distinction between the word
"brown" itself and the colour that is referred to as "brown".

Paul.

Alan

unread,
Jul 8, 2004, 3:56:03 PM7/8/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...


You will need an infinite number of monkeys to key in all the possible
"things" you will need to store one time. I can't wait till you need to
write a report using the jibberish you've stored.


Gene Wirchenko

unread,
Jul 8, 2004, 3:59:14 PM7/8/04
to
Larry Coon <lcno...@assist.org> wrote:

[snip]

>Ah, how convenient for you. You get to cling to definitions
>that suit you (like the above) while ignoring ones (like the
>definitions for redundancy and normalization) which don't.
>
>How about numbers? Do you tokenize those? Given the
>following schema:
>
>create table person_pets (
> person_id numeric(5) not null,
> number_of_dogs int,
> number_of_cats int,
> number_of_fish int,
> primary key (person_id)
>)
>
>Given the tuple (12345, 2, 2, 2) what do you do with
>those horrific redundant 2's?

I see where you are headed with that. Someone has already tried
the infinite recursion argument to no apparent avail.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Jan Hidders

unread,
Jul 8, 2004, 4:18:43 PM7/8/04
to
Neo wrote:
>>You have failed to show that they represent the *same* fact.
>
> You have *failed* to define what is a fact?

I did that elsewhere.

> However, regardless of whether the string 'brown' is categorized as a
> fact or not by your definition, they still represent the same thing,
> the string 'brown'. Having three things that represent the same thing
> in one db is redundant.

No, it is not. There is only redundancy if one of them can be ignored
without loss of information. You cannot. So there is no redundancy. It's
as simple as that. This is the standard definition of redundancy in
database theory that applies to all data models. If that is not the type
of "redundancy" you would like to discuss then please use another term
and give an exact definition, so we know what you are talking about.

-- Jan Hidders

Neo

unread,
Jul 8, 2004, 4:27:22 PM7/8/04
to

The central issue is not why we normalize, but whether the string
"brown" (not its role as a unsystematic reference) is redundant. As
you are alluding, one way to know if something is redundant is to
change one of its attributes and check for corruption. This is
difficult to see with strings, because strings typically don't change
and if they change we consider them to be a completely different
string.

I realize the following are extraordinary examples, however a general
data model can't (application above it can) have prejudices as to what
data/changes it will accept. Suppose, the world is taken oven by
islamic terrorist. As part of their spoils, they want every word in
every computer to be spelled backwards, thus 'brown' needs to be
update to 'nworb'. Or suppose, the French, take over and want every
string to end in a silent t. In the above tuple, updating one and not
the others, corrupts the db. Below is approximately how XDb1
normalizes the three strings and updating it from 'brown' to 'nworb'
or appending the symbol t does not corrupt the db.

D Guntermann

unread,
Jul 8, 2004, 4:13:44 PM7/8/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...
[snip]

> > Redundancy in terms of databases is about removing semantic duplicates,
> > not about removing syntactic duplicates. i.e. about the logical level,
> > not the physical level.
>
> The above is an example of a limited form of normalization.
The
> general form of normalization applies to everthing being represented
> (stored, not merely implied) within a db.

Neo,

Since you supplant all instances of a specific encoding with references to a
single character value, wouldn't the references, which are pieces of
information themselves, be redundant?

For example

Value Storage Address (16 bits) Value (8 bits)
1 'B'
2 'R'
3 'O'
4 'W'
5 'N'
6 '1'
7 '2'
8 '3'

Fact in Neo's database


Thing Person Color Street
1 brown brown brown

Each attribute would contain redundancy in terms of the references (by your
logic):

Person(1,2,3,4,5)
Color (1,2,3,4,5) <--- oh no! the most generalized form has been broken!
1,2,3,4,5 are repeated.
Street (1,2,3,4,5) <--- oh no! the most generalized form has been broken!
1,2,3,4,5 are repeated.

Not only do you needlessly denormalize information (shame!) but you don't
meet your own criteria for the most generalized normal form. You have
replicated references in this case at least three times, and imagine the
chaos with non-related data that use the same references.

Moreover, the reference size requirements are 16 bits while the character
value itself is probably only 8 bits. Thus, you needlessly design your
system as inefficiently as possible. If information theory was your guide,
you'd be doing the exact opposite of what one of its objectives is --
conveying value of information in the most efficient form possible.

If sytactic is stored
> (meaning it has a location within the db), they are also candidates
> for normalization.

[snip]


> The lowest (most general) level will be applicable to all higher (more
> specific) levels.

I submit, and I hope you will listen, that you in fact are focused on a
lower level model that is generalized only to the extent that we encode
meaning in languages. If you really want to pursue this form of
generalization, and you are really honest about finding the purest and most
generalized form, then you better start with the bit and make your
references to single bit values from there. In one sense, you are working
towards a true Turing machine, a powerful concept, but not at all practical.

Regards,

Dan


D Guntermann

unread,
Jul 8, 2004, 4:47:03 PM7/8/04
to

"Larry Coon" <lcno...@assist.org> wrote in message
news:40EC69...@assist.org...
> Tony wrote:
>
> > You are of course confusing logical and physical issues - and your
> > issue is absurd in any case. It is not LOGICALLY redundant to record

> > that "Car X is brown" and "Dog Y is brown", because these are two
> > different facts. You are presumably saying that it is PHYSICALLY
> > redundant to store the 5 characters of "brown" twice on the disk, and
> > so you want to physically store the word "brown" once and then point
> > to it many times.
>
> You couldn't do that -- if more than one thing POINTS
> to "brown," then the pointers are stored redundantly.
> Infinite regress rears its ugly head.
>
Yes! Neo read and listen!

The proof of the inconsistency of your logic and its inception as a fallacy
is here, stated succinctly by Mr. Coon.

To eliminate redundancy of words (sequences of characters), you create one
word and subsequently create references or pointers to that that word. But
this is not good enough. These characters within sequences are redundant
themselves, so you create references or pointers to a single symbol.

But then these references or pointer values are redundant, so to ensure no
redundancy in these values, you must create a new set of pointers or
references using a new alphabet. But wait! Even this new set of
pointers/or references have values, both in terms of digits and as a
sequence, that are needlessly replicated, so a new set is needed to point to
the pointers that pointed to the pointers that point to a single value.

I leave it as an exercise to see how far you can go repeating this cycle
before you finally get the "ultimate generalized form of normalization",
where no duplication of a symbolic value exists.

Otherwise you can skip that and attempt to eliminate redundancy at the bit
level. Of course the only two values you could use are 0 and 1 which gives
you two states to create both values and unique non-redundant references or
pointers. Good luck with that one!

Another option would be to create an alphabet that no one uses or is aware
of that consists of enough symbols to ensure that there will never be a need
to replicate the symbol in a database ever again. The tens of thousands of
chinese symbols is but a drop in the bucket in comparison to this endeavor.
When you get this done, you can get rid of the binary, octal, decimal, and
hex number systems and create a 1 digit numeric base system where each
numeric value, integer or real, is represented as a single symbol, don't
forget up to infinity, and infintessimally small!

Good Luck!

> Larry Coon
> University of California

- Dan


Neo

unread,
Jul 8, 2004, 5:19:12 PM7/8/04
to
> So in Neo's world, any attribute (eg: "brown") can only
> appear once in the entire database. Got a brown car and
> a brown dog? Tough -- Neo's dbms can't handle that.

Person Name
1 ->5

Car Color
2 ->4

Dog Color
3 ->4

Color Name

Larry Coon

unread,
Jul 8, 2004, 5:48:49 PM7/8/04
to
Neo wrote:

> I realize the following are extraordinary examples, however a general
> data model can't (application above it can) have prejudices as to what
> data/changes it will accept. Suppose, the world is taken oven by
> islamic terrorist. As part of their spoils, they want every word in
> every computer to be spelled backwards, thus 'brown' needs to be
> update to 'nworb'. Or suppose, the French, take over and want every
> string to end in a silent t. In the above tuple, updating one and not
> the others, corrupts the db. Below is approximately how XDb1
> normalizes the three strings and updating it from 'brown' to 'nworb'
> or appending the symbol t does not corrupt the db.

Oh yes, it does, in exactly the same way the relational
model suffers from this, uh, "problem." Suppose a database
contains two "browns" and one "green." Given your requirement,
It is just as much a "corruption" to change both browns and
miss the green as it is to change one brown but not the other.
Therefore, tokenizing each individual string does nothing to
solve your "problem" here.

Larry Coon

unread,
Jul 8, 2004, 5:51:46 PM7/8/04
to
Neo wrote:

> While my understanding of normalization may not be perfect, it is
> closer to the general form compared to those who cannot recognize that
> a db containing the string 'brown' three times has redundant data.

Because you're still confusing physical and logical
redundancy. And ironically, you're dragging normalization
down to physical storage, yet you're referring to it
as a "higher" form of normalization.

Larry Coon

unread,
Jul 8, 2004, 5:54:21 PM7/8/04
to
Gene Wirchenko wrote:

> I see where you are headed with that. Someone has already tried
> the infinite recursion argument to no apparent avail.

Several people, actually, as I read through this thread.
Me among them. But that wasn't what I was going for here.
I was simply asking whether in Neo's concept of a correctly
normalized database, no number is stored more than once.

Alan

unread,
Jul 8, 2004, 6:16:30 PM7/8/04
to

"D Guntermann" <gunte...@hotmail.com> wrote in message
news:I0JwE...@news.boeing.com...

I tried that argument already. He apparently ignored it. It's also in an old
joke I keep referring to:

"Did you hear about the prgrammer that was found dead in his shower? He was
frozen staring at the instructions on a bottle of shampoo: "Lather, Rinse,
Repeat."


>
> I leave it as an exercise to see how far you can go repeating this cycle
> before you finally get the "ultimate generalized form of normalization",
> where no duplication of a symbolic value exists.
>
> Otherwise you can skip that and attempt to eliminate redundancy at the bit
> level. Of course the only two values you could use are 0 and 1 which
gives
> you two states to create both values and unique non-redundant references
or
> pointers. Good luck with that one!
>
> Another option would be to create an alphabet that no one uses or is aware
> of that consists of enough symbols to ensure that there will never be a
need
> to replicate the symbol in a database ever again. The tens of thousands
of
> chinese symbols is but a drop in the bucket in comparison to this
endeavor.
> When you get this done, you can get rid of the binary, octal, decimal, and
> hex number systems and create a 1 digit numeric base system where each
> numeric value, integer or real, is represented as a single symbol, don't
> forget up to infinity, and infintessimally small!

The data entry would be done by the infinite number of monkeys I referred to
elsewhere.

>
> Good Luck!

He'll need more than that!

Alan

unread,
Jul 8, 2004, 6:18:26 PM7/8/04
to

"Larry Coon" <lcno...@assist.org> wrote in message
news:40EDC1...@assist.org...

Not to mention the multiple instances of "n" and "r".

Alan

unread,
Jul 8, 2004, 6:25:00 PM7/8/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.04070...@posting.google.com...
> > You have failed to show that they represent the *same* fact.
>
> You have *failed* to define what is a fact?
>
> However, regardless of whether the string 'brown' is categorized as a
> fact or not by your definition, they still represent the same thing,
> the string 'brown'. Having three things that represent the same thing
> in one db is redundant.\

It's physically redundant, not logically redundant. It has nothing to do
with normalization of a data model, which is a logical model. There is
nothing wrong with physical redundancy, as it speeds things up tremendously.
It is the old tradeoff of space vs. speed. Space is cheap, time is not.

Jan Hidders

unread,
Jul 8, 2004, 7:29:05 PM7/8/04
to
Neo wrote:
>
> The central issue is not why we normalize, but whether the string
> "brown" (not its role as a unsystematic reference) is redundant.

The two are connected. Once you understand why we normalize you will
understand why the notion of redundancy is defined in database theory as
it is and why multiple occurrences of the same string in a data
structure are not necessarily redundant.

> As you are alluding, one way to know if something is redundant is to
> change one of its attributes and check for corruption. This is
> difficult to see with strings, because strings typically don't change
> and if they change we consider them to be a completely different
> string.

Actually, that is very simple to see with strings. You replace the
occurrence of the string with a variable X. If the value of X can be
derived from the integrity constraints then this occurrence of the
string is redundant. If you cannot then it isn't.

> [...] Suppose, the world is taken oven by islamic terrorist. As part


> of their spoils, they want every word in every computer to be spelled

> backwards, thus 'brown' needs to be update to 'nworb'. [...] In the

> above tuple, updating one and not the others, corrupts the db.

Yes, but only when you start from an instance that is not already an
instance from the new database schema is concerned. To show that either
instances of the new or the old database schema have redundancy you have
to demonstrate that there are update anomalies that stay within that
particular database schema.

-- Jan Hidders

Dan

unread,
Jul 8, 2004, 9:35:52 PM7/8/04
to

"Alan" <al...@erols.com> wrote in message
news:2l2btjF...@uni-berlin.de...

>
> "Neo" <neo5...@hotmail.com> wrote in message
> news:4b45d3ad.04070...@posting.google.com...
> > > Yes, [RM] has limitations, but normalizing data is not one of them.
> > > The RM defines normalization.
> >
> > RM defines a limited form of normalization. The general form of
> > normalization which is the central theme to all xNFs (where x may be
> > infinite), allows one to identify 'brown', 'brown', 'brown' as being
> > redundant which XDb1/TDM normalizes.
> >
> > > You just refuse to accept normalization ...
> >
> > I accept the general form of normalization that can be applied to all
> > data models and allows one to recognize that 'brown', 'brown', 'brown'
> > is redundant. I refuse to accept RM's limited form of normalization as
> > the general form of normalization which doesn't allow one to recognize
> > that 'brown', 'brown', 'brown' is redundant.\
>
> It's not, it's not, it's not. You can preach this nonsense till the cows
> come home, but it will never be true. Several examples have been provided
to
> you.
>
> >
> > > > A better reference is C.J. Date's "An Intro to Database Systems"...
> > >
> > > How does that make it better?
> > > Anyways how would you know it is better since you never read Navathe?
> >
> > While I have not read all 873 pages of Elmasri/Navathe's "Fund of Db
> > Sys" 2nd Ed that sits several books under my C.J. Date's "Intro to Db
> > Sys" 6th Ed, I have read enough of it to know that compared to that of
> > Date's p288-9, their fundamental explanation of normalization on p407
> > is limited: "Normalization of data can be looked on as a process
> > during which unsatisfactory relation shemas are decomposed by breaking
> > up their attributes into smaller relations schemas that possess
> > desirable properties". C.J. Date's is better because his fundamental
> > explanation comes closer to the general form of normalization that can
> > be applied to data in any model, even those that don't have any
> > relations.
>
> That's on old version, maybe 12 years old. They are up to 4th Ed. now.
>
> Anyway, As I stated elswhere, I give up.
>
>
Don't give up!

I see this discussion as very beneficial to the entire group, or at least
for me. It forces everyone to revisit the fundementals and re-examine our
own understanding of them.

Actually, if Neo's ideas were practical at the physical level and he never
introduced his form of "normalization" to the logical user level as a form
of data model, I wouldn't have an issue with his "implementation" at all as
long as it behaved functionally as a user would expect -- operations and
inference rules over semantics units.

- Dan


Marshall Spight

unread,
Jul 8, 2004, 9:48:50 PM7/8/04
to
"Paul" <pa...@test.com> wrote in message news:40eda260$0$554$ed26...@ptn-nntp-reader03.plus.net...

>
> Check out this famous painting by Magritte:
> http://en.wikipedia.org/wiki/Image:MagrittePipe.jpg

I love that pipe! I got to see it when it came to San Francisco
a few years ago, along with a bunch of his others. Magritte rocks!


Marshall


Neo

unread,
Jul 8, 2004, 11:33:48 PM7/8/04
to
> > ... try again as to why the same thing (string 'brown')
> > [stored] three times in a db is not redundant.

>
> Because there is no update anomaly.

Because the three strings each represent the same thing, the string
'brown', it is redundant. To create an update anomaly, we need to
perform an update. This is unusual with strings, because strings


typically don't change and if they change we consider them to be a
completely different string.

I realize the following are unusual examples, however a general data


model can't (application above it can) have prejudices as to what

updates are unusual. Suppose, the world is taken oven by Islam and
they desire every string in a computer to be spelled backwards, thus
'brown' needs to be updated to 'nworb'. Or suppose, the French take
over and want every string to end in a silent 't' (as in buffet). In
the above tuple, updating one and not the others, creates an update
anomaly. Below is approximately how XDb1 normalizes the three strings
and updating it from 'brown' to 'nworb' or 'brownt' does not create an
update anomaly.

Neo

unread,
Jul 8, 2004, 11:47:02 PM7/8/04
to
> Because you're still confusing physical and logical
> redundancy. And ironically, you're dragging normalization
> down to physical storage, yet you're referring to it
> as a "higher" form of normalization.

Having the same person three times in a db is redundant.
Having the same string three times in a db is redundant.
Having the same thing three times in a db is redundant.

One is having difficulty realizing that a person and a string are both
things (ie data, fact, entity, etc).

Below is approximately how XDb1 normalizes the three strings:

Thing Person Color Street
1 ->2 ->3 ->4

Person Name
2 ->5

Color Name
3 ->5

Street Name
4 ->5

String Sym1 Sym2 Sym3 Sym4 Sym5 ....
5 ->6, ->7, ->8, ->9, ->10

Symbol
6 b
7 r
8 o
9 w
10 n

Note, there are no tables in XDb1, only things. However certain groups
of things can be viewed as a list, tree, table, matrix, etc.

Neo

unread,
Jul 9, 2004, 12:10:09 AM7/9/04
to
> >>You have failed to show that they represent the *same* fact.
> >
> > You have *failed* to define what is a fact?
>
> I did that elsewhere.

My apologies. Could you please refer to that post or restate?

Neo

unread,
Jul 9, 2004, 12:20:10 AM7/9/04
to
> There is only redundancy if one of them can be ignored
> without loss of information. You cannot.

Below is approximately how XDb1 ignores the redundant "brown" strings
that were in the place of ->2, ->3 and ->4 without loss of
information. The person, color and street in tuple1 are not redundant,
the string that names them was.

Neo

unread,
Jul 9, 2004, 1:05:47 AM7/9/04
to
> But then these references or pointer values are redundant, so to ensure no
> redundancy in these values, you must create a new set of pointers or
> references using a new alphabet. But wait! Even this new set of
> pointers/or references have values, both in terms of digits and as a
> sequence, that are needlessly replicated, so a new set is needed to point to
> the pointers that pointed to the pointers that point to a single value.

References aren't within the scope of normalizing things in a db
because they are independent of the things being represented. The
actual implementation of references is a hardware issue, not a logical
one. On some hardware (ie PCs), references to the same thing probably
do have the same values. On other hardware (ie brain?), references to
the same thing are likely different paths to the same neuron(s) and
don't have no duplicate "values".

In an ideal db (ie brain?) the user isn't aware of the hardware values
that may or may not be associated with a reference.

Dan

unread,
Jul 9, 2004, 3:19:42 AM7/9/04
to

"Neo" <neo5...@hotmail.com> wrote in message
news:4b45d3ad.0407...@posting.google.com...

> > But then these references or pointer values are redundant, so to ensure
no
> > redundancy in these values, you must create a new set of pointers or
> > references using a new alphabet. But wait! Even this new set of
> > pointers/or references have values, both in terms of digits and as a
> > sequence, that are needlessly replicated, so a new set is needed to
point to
> > the pointers that pointed to the pointers that point to a single value.
>
> References aren't within the scope of normalizing things in a db
> because they are independent of the things being represented.

But you expose them (references/pointers) to us as part of a data model and
use it to support your notion of a "most general form of normalization."
You do it over and over. If they are not in scope, then don't bring them
up. On the other hand, if you want to do the referencing by value and
expose that to us, then that is fine, but then again you will be duplicating
values (or objects in Neo speak), a feature where redundancy is desired in
RM. If you were to do this, then we can merely question the wisdom and the
faculty of logic of the person whose universe of discourse we intend to
formalize within a DBMS based on symbols.

Very few users are going to specify their system or their universe of
discourse in terms of a model where alphabet symbols are one type of node
and sequences of symbols are another type of node with a morass of directed
arcs between these nodes. That provide no value in terms of information or
meaning when considering the units in isolation.

Moreover, when thinking about your "normalized model", I've come to the
conclusion that you actually introduce the potential for unintended update
anomolies precisely because of the fact that normalization is based on
functional dependencies (read semantic rules) while your efforts are
constrained to syntactic dependencies.

If I were to delete the 'r' in 'brown' in your string list, would all
instances of 'brown' be modified (e.g. person name, street name, etc.).? A
properly normalized database would.....but it me either unintended or
incorrect in your form of normalization.

If I were to change the 'br' in 'brown' of the street name to 'cl', would
all other references to 'brown' associated with other "objects" reflect this
update? How would you know one way or the other how the system should
behave without semantic considerations? A user might want all strings of
'brown' changed to 'clown' and in other cases, only the name of the object
he or she is altering.

The
> actual implementation of references is a hardware issue, not a logical
> one.

I agree, but then it shouldn't be exposed as part of your normalization. It
is an implementation issue.

On some hardware (ie PCs), references to the same thing probably
> do have the same values. On other hardware (ie brain?), references to
> the same thing are likely different paths to the same neuron(s) and
> don't have no duplicate "values".
>
> In an ideal db (ie brain?) the user isn't aware of the hardware values
> that may or may not be associated with a reference.

None of us thinks coherently in terms of broken up characters. We think in
terms of words and sentences (predicates). If we have to spell, we might
have several redundant points of reference in our heads from which to draw
from. I would be surprised if only one value of anything existed in our
head.


Marshall Spight

unread,
Jul 9, 2004, 3:52:19 AM7/9/04
to
"Neo" <neo5...@hotmail.com> wrote in message news:4b45d3ad.04070...@posting.google.com...

> > Because you're still confusing physical and logical
> > redundancy. And ironically, you're dragging normalization
> > down to physical storage, yet you're referring to it
> > as a "higher" form of normalization.
>
> Having the same person three times in a db is redundant.
> Having the same string three times in a db is redundant.
> Having the same thing three times in a db is redundant.

You keep treating redundancy and normalization as if
they were the same things; they're not.


Marshall


VHarris001

unread,
Jul 9, 2004, 8:02:46 AM7/9/04
to
>
>How about numbers? Do you tokenize those? Given the
>following schema:
>
>create table person_pets (
> person_id numeric(5) not null,
> number_of_dogs int,
> number_of_cats int,
> number_of_fish int,
> primary key (person_id)
>)
>
>Given the tuple (12345, 2, 2, 2) what do you do with
>those horrific redundant 2's?
>
>
>Larry Coon
>University of California
>

Why isn't each "pet" considered a unique thing rather than a quantity?

V Harris

Alan

unread,
Jul 9, 2004, 8:07:16 AM7/9/04
to

"Dan" <gunter...@verizon.com> wrote in message
news:YDmHc.40521$qw1....@nwrddc01.gnilink.net...

Unfortunately, I don't agree that his implemenation is practical. You wind
up storing nothing but pointers to data. This would be a nightmare when it
comes time to extract the data. Imagine trying to debug a report. Then there
is the question of data entry. How would the system know what data was
already entered? Ex: A user goes to enter the string "Brown" as a car color.
The system would need to check to see if that string was already enetered.
The overhead would be enormous.


VHarris001

unread,
Jul 9, 2004, 8:14:55 AM7/9/04
to
>
>Neo,
>
>Since you supplant all instances of a specific encoding with references to a
>single character value, wouldn't the references, which are pieces of
>information themselves, be redundant?
>

[snip]

>Regards,
>
>Dan
>


Normalizing the data this finely might be necessary in some cases. Wouldn't a
database program that allowed, without requiring, this degree of normalization,
be useful?

It seems that if neo's database REQUIRES normalization to the nth degree, but
if it allows it, it seems that would be a useful feature. No?

V Harris

VHarris001

unread,
Jul 9, 2004, 8:43:50 AM7/9/04
to
Dan wrote:

>
>If I were to change the 'br' in 'brown' of the street name to 'cl', would
>all other references to 'brown' associated with other "objects" reflect this
>update? How would you know one way or the other how the system should
>behave without semantic considerations?

Why would it ever be necessary to change the 'r' in brown? If it was necessary
to have another thing spelled 'clown,' shouldn't the correct approach be to
create the new thing 'clown' in the thing table and point to the new 'thing?'

> A user might want all strings of
>'brown' changed to 'clown' and in other cases, only the name of the object
>he or she is altering.
>

If the change is a correction, just change the pointer to 'clown.' If the
change is an update, one could simply add a new date/timestamp pointer to
'clown' that superceded the prior pointer. This would not only point to the
correct change, but leave a history to the update.


V Harris

It is loading more messages.
0 new messages