
Proposal: 6NF


Karen Hill

Sep 29, 2006, 3:35:07 PM
6NF would be a database that uses no Nulls.

This way, one could specify that they demand their relations in 6th
Normal Form, and have a database design that uses no Nulls.

Those who like nulls can have their 3NF, while the rest of us reach
for higher standards like 6NF.

regards,

Karen

J M Davitt

Sep 29, 2006, 5:29:54 PM
Karen Hill wrote:
> 6NF would be a database that uses no Nulls.

Um, "no nulls" is necessary for 1NF. And I
believe someone already has dibs on "6NF."

Bob Badour

Sep 29, 2006, 5:56:06 PM
J M Davitt wrote:
> Karen Hill wrote:
>
>> 6NF would be a database that uses no Nulls.
>
> Um, "no nulls" is necessary for 1NF. And I
> believe someone already has dibs on "6NF."

Of course, the original 6NF would obviate NULL in any case.

Karen Hill

Sep 29, 2006, 6:09:13 PM

J M Davitt wrote:
> Karen Hill wrote:
> > 6NF would be a database that uses no Nulls.
>
> Um, "no nulls" is necessary for 1NF. And I
> believe someone already has dibs on "6NF."
>

How so? Plenty of people have nulls in 3NF. How is no nulls necessary
for 1NF?

Bob Badour

Sep 29, 2006, 7:09:25 PM
Karen Hill wrote:

There are basically two schools of thought on NULL. The SQL school and
the consistent school. The consistent school observes that 1NF requires
logical identity and what is sometimes called the Information Principle.
These both require one represent all information explicitly as values,
and NULL is not a value.

Hence, no table with a nullable column can be in 1NF let alone 3NF. The
SQL school fudges the books ignoring both logical identity and the
information principle.

Codd started down the SQL path with his RM V/2, but that path turned out
to be the primrose path.

paul c

Sep 29, 2006, 7:27:50 PM

I think JM is saying nulls aren't possible in 1NF, ergo not either in
3NF. Just because somebody thinks they have 3NF with nulls, doesn't
mean they do, no matter what their SQL product documentation tells them.

p

paul c

Sep 29, 2006, 7:35:33 PM

I guess it would have been a little more accurate to say there are some
people who discredit nulls who also aren't sure if 1NF really means
anything at all. I'm sort of in that camp, although I sometimes wonder,
if there really is a 1NF, whether part of its definition could be
that no RVA could constitute, by itself, a key. That is part of my
interpretation of the information principle, but I'm sure some other
people would find my interpretation objectionable.

p

Bob Badour

Sep 29, 2006, 8:13:44 PM
paul c wrote:

> paul c wrote:
>
>> Karen Hill wrote:
>>
>>> J M Davitt wrote:
>>>
>>>> Karen Hill wrote:
>>>>
>>>>> 6NF would be a database that uses no Nulls.
>>>>
>>>> Um, "no nulls" is necessary for 1NF. And I
>>>> believe someone already has dibs on "6NF."
>>>>
>>>
>>> How so? Plenty of people have nulls in 3NF. How is no nulls necessary
>>> for 1NF?
>>>
>>
>> I think JM is saying nulls aren't possible in 1NF, ergo not either in
>> 3NF. Just because somebody thinks they have 3NF with nulls, doesn't
>> mean they do, no matter what their SQL product documentation tells them.
>

> I guess it would have been a little more accurate to say there are some
> people who discredit nulls who also aren't sure if 1NF really means
> anything at all. I'm sort of in that camp, although I sometimes wonder,
> if there really is a 1NF, whether part of its definition could be
> that no RVA could constitute, by itself, a key. That is part of my
> interpretation of the information principle, but I'm sure some other
> people would find my interpretation objectionable.

Why on earth would anyone want to proscribe the use of RVA's in keys? A
value is a value. Period.

paul c

Sep 29, 2006, 10:26:29 PM

I'll try to phrase an answer that is as terse as that question.

p

Bob Badour

Sep 29, 2006, 10:30:22 PM
paul c wrote:

Let me try: "I dunno. It sounded good when I said it fast." ?

frosty

Sep 29, 2006, 11:48:45 PM

11NF!

--
frosty
On my guitar amp, The numbers all go to eleven.
Look... right across the board.


David Cressey

Sep 30, 2006, 9:14:13 AM

"Karen Hill" <karen_...@yahoo.com> wrote in message
news:1159558507....@m7g2000cwm.googlegroups.com...

First, let's find out where we all agree, then work our way towards where
there are different viewpoints.

Here's a starter: A schema in DKNF will have no nullable columns.


JOG

Sep 30, 2006, 10:12:39 AM

I don't even need 1NF to want to discard nulls. So long as one realises
that the aim is simply to record propositions of fact in one consistent
way, it becomes clear that nulls are theoretical nonsense, period.

If more people were taught this at the start of their database
education, perhaps we could get away from the 'plugging holes in a
table' syndrome that does so much damage.

Brian Selzer

Sep 30, 2006, 1:08:06 PM

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1159625559....@i42g2000cwa.googlegroups.com...

> Bob Badour wrote:
>> Karen Hill wrote:
>>
>> > J M Davitt wrote:
>> >
>> >>Karen Hill wrote:
>> >>
>> >>>6NF would be a database that uses no Nulls.
>> >>
>> >>Um, "no nulls" is necessary for 1NF. And I
>> >>believe someone already has dibs on "6NF."
>> >
>> > How so? Plenty of people have nulls in 3NF. How is no nulls necessary
>> > for 1NF?
>>
>> There are basically two schools of thought on NULL. The SQL school and
>> the consistent school. The consistent school observes that 1NF requires
>> logical identity and what is sometimes called the Information Principle.
>> These both require one represent all information explicitly as values,
>> and NULL is not a value.
>>
>> Hence, no table with a nullable column can be in 1NF let alone 3NF. The
>> SQL school fudges the books ignoring both logical identity and the
>> information principle.
>>
>> Codd started down the SQL path with his RM V/2, but that path turned out
>> to be the primrose path.
>
> I don't even need 1NF to want to discard nulls. So long as one realises
> that the aim is simply to record propositions of fact in one consistent
> way, it becomes clear that nulls are theoretical nonsense, period.
>

There is a problem with your reasoning. The aim isn't simply to record
propositions of fact: it is also to make use of those facts. Recording
propositions is only about 5% of the typical activity for a database. With
today's technology, querying multiple tables is almost always more expensive
than querying one, and a database in 6NF would force most queries to involve
multiple tables. Even if you separated only the NULLable columns into
different tables, the effort required to perform many simple queries can
become impractical. For example, if you have a table with 30 columns where
15 are NULLable, you would need as many as 16 tables to eliminate NULLs. As
a consequence, every query that would have involved a NULLable column would
now require a join.

NULLs are as much a theoretical nonsense as the empty set or the imaginary
component of a complex number.

David Portas

Sep 30, 2006, 1:45:03 PM
Brian Selzer wrote:
>
> There is a problem with your reasoning. The aim isn't simply to record
> propositions of fact: it is also to make use of those facts. Recording
> propositions is only about 5% of the typical activity for a database. With
> today's technology, querying multiple tables is almost always more expensive
> than querying one, and a database in 6NF would force most queries to involve
> multiple tables. Even if you separated only the NULLable columns into
> different tables, the effort required to perform many simple queries can
> become impractical. For example, if you have a table with 30 columns where
> 15 are NULLable, you would need as many as 16 tables to eliminate NULLs. As
> a consequence, every query that would have involved a NULLable column would
> now require a join.

Why does the state of "today's technology" indicate a problem with
JOG's reasoning? If Codd had only concerned himself with the technology
of his day then we wouldn't have a relational model at all.

> NULLs are as much a theoretical nonsense as the empty set or the imaginary
> component of a complex number.

Why do you say that? Imaginary numbers and empty sets are real values
with the normal properties that mathematics and logic expect values to
have. Nulls are not. As Date has pointed out, the "Assignment
Principle" is so fundamental that it wasn't even thought worth
mentioning - until the designers of SQL broke it!

--
David Portas

Roy Hann

Sep 30, 2006, 4:20:05 PM
"Brian Selzer" <br...@selzer-software.com> wrote in message
news:WvxTg.11808$7I1....@newssvr27.news.prodigy.net...

> Recording propositions is only about 5% of the typical activity for a
> database. With today's technology, querying multiple tables is almost
> always more expensive than querying one,

Simply wrong.

Consider a table SHIPMENTS with, say, a million rows. One nullable
attribute might be CAUSE_OF_LOSS. Since shipments are almost never lost
this attribute is almost invariably null. Supposing that the DBA is even
slightly skilled and chooses to physically key both tables on
SHIPMENT_NUMBER, which query for the value of shipments lost to pirates will
be quicker? The one that scans the million rows? Or the one that scans the
CAUSE_OF_LOSSES table then joins the (possibly empty) restriction with the
SHIPMENTS table on SHIPMENT_NUMBER?

I would argue that the limitations of today's technology are precisely why
this approach is so useful. Not only does it make it easier to reason about
what is going on, but it also helps the real system go like blazes. I
regularly apply such decompositions to solve performance problems in systems
with tables with hundreds of millions of rows.
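Roy's decomposition can be sketched in miniature. The following is an illustrative assumption, not his actual schema: table and column names are borrowed from his post, the sample data is invented, and SQLite (via Python) is used only so the sketch is self-contained and runnable:

```python
import sqlite3

# Hypothetical stand-in for the SHIPMENTS decomposition described above:
# lost shipments get a row in CAUSE_OF_LOSSES instead of a nullable
# CAUSE_OF_LOSS column on SHIPMENTS.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE shipments (
    shipment_number INTEGER PRIMARY KEY,
    destination     TEXT NOT NULL
);
CREATE TABLE cause_of_losses (
    shipment_number INTEGER PRIMARY KEY
        REFERENCES shipments (shipment_number),
    cause_of_loss   TEXT NOT NULL
);
INSERT INTO shipments VALUES (1, 'Shanghai'), (2, 'Rotterdam'), (3, 'Aden');
INSERT INTO cause_of_losses VALUES (3, 'PIRACY');
""")

# Shipments lost to pirates: restrict the small table, then join back
# to SHIPMENTS on the shared key.
lost = con.execute("""
    SELECT s.shipment_number, s.destination
    FROM   cause_of_losses c
    JOIN   shipments s ON s.shipment_number = c.shipment_number
    WHERE  c.cause_of_loss = 'PIRACY'
""").fetchall()
print(lost)  # [(3, 'Aden')]
```

The query only ever scans the (tiny) losses table and joins the restriction back on the key, which is the access path described above.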

[snip]


> For example, if you have a table with 30 columns where 15 are NULLable,
> you would need as many as 16 tables to eliminate NULLs.

That is a far-fetched possibility. Maybe you could construct a hypothetical
table with nullable attributes that would have to be decomposed that way,
but in practice it doesn't happen. One rarely needs more than three or four
additional tables, and often fewer. And as noted above, they often actually
improve performance.

Roy


JOG

Sep 30, 2006, 4:33:38 PM

Congratulations on proving my point.

Karen Hill

Sep 30, 2006, 6:55:02 PM

Brian Selzer wrote:

> multiple tables. Even if you separated only the NULLable columns into
> different tables, the effort required to perform many simple queries can
> become impractical. For example, if you have a table with 30 columns where
> 15 are NULLable, you would need as many as 16 tables to eliminate NULLs. As
> a consequence, every query that would have involved a NULLable column would
> now require a join.

Wrong wrong wrong.

You don't have to separate data into 16 tables to eliminate nulls.
Just create a type for each field. For example, let us say "first
name" is not a varchar but a firstname type. A first name has three
states: 1. first name was provided; 2. first name not given; 3. does
not have a first name. Instead of putting a NULL if someone has not
provided their first name or does not have one, you would instead
put in the type "FIRST NAME NOT GIVEN". With a type, a bool can be
used to mark whether the field is in state 1 or in states 2 and 3.

Example using postgresql:

CREATE TYPE firstname AS (
  nameProvided bool,
  name VARCHAR
);

CREATE TABLE foo (
  name firstname NOT NULL
);

Now with this, if I do know the first name I can do this:
INSERT INTO foo (name) VALUES (ROW(true, 'jeff'));

Now if the name is not given the INSERT would look like this:
INSERT INTO foo (name) VALUES (ROW(false, 'FIRST NAME NOT GIVEN'));

Now instead of having a NULL which tells you NOTHING, you have
something you can actually work with!

For example:

SELECT * FROM foo
WHERE (name).nameProvided = false
  AND (name).name = 'FIRST NAME NOT GIVEN';

You can distinguish between someone who does not have a first name and
someone who has chosen not to provide you with their first name.

Could this be called 7NF (since 6 is taken) or in honor of Date, the
Date Normal Form?

regards,
karen

Marshall

Sep 30, 2006, 8:04:56 PM
Brian Selzer wrote:
>
> With today's technology, querying multiple tables
> is almost always more expensive than querying one ...

A couple of years ago, working on a database of major
importance to our company, we had reached a situation where,
due to evolving product requirements, we had a case of *two*
tables that had ceased to have any importance. These tables
sat smack in the middle of the very most often used join paths.

Each table below had a foreign key to the table listed
immediately prior:

Customers <- A <- B <- C <- D <- E <- F1, F2, F3 etc.

Tables C and D had evolved such that every B
had exactly one C, and every C had exactly one D.
Because table E was of such importance to the product,
there were a huge number of queries that included
"Customers join A join B join C join D join E" with
various other things. 9 and 10 table joins were not
uncommon.

Once the product changes had settled down and it
became clear that tables C and D were never going
to be relevant again, I embarked on a course of
getting rid of them. Hundreds of queries would need
to be rewritten, but I really thought it would be worth
it to shorten the join path by 2! Performance was
always an issue, and I felt certain that a 7 table join
would perform better than a 9 table join.

However, I have enough experience to know that
one doesn't go on one's feelings about performance
but rather on measurement. So I devised a series
of test queries to measure the benefit of all this
work.

Total measured benefit: bupkiss. I changed my
mind about the refactoring.


Marshall

Brian Selzer

Sep 30, 2006, 8:12:28 PM

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:1159638303....@m73g2000cwd.googlegroups.com...

> Brian Selzer wrote:
>>
>> There is a problem with your reasoning. The aim isn't simply to record
>> propositions of fact: it is also to make use of those facts. Recording
>> propositions is only about 5% of the typical activity for a database.
>> With
>> today's technology, querying multiple tables is almost always more
>> expensive
>> than querying one, and a database in 6NF would force most queries to
>> involve
>> multiple tables. Even if you separated only the NULLable columns into
>> different tables, the effort required to perform many simple queries can
>> become impractical. For example, if you have a table with 30 columns
>> where
>> 15 are NULLable, you would need as many as 16 tables to eliminate NULLs.
>> As
>> a consequence, every query that would have involved a NULLable column
>> would
>> now require a join.
>
> Why does the state of "today's technology" indicate a problem with
> JOG's reasoning? If Codd had only concerned himself with the technology
> of his day then we wouldn't have a relational model at all.
>

The argument JOG made focused only on recording information, not retrieving
it. Why would anyone abandon a sound mechanism that can significantly
reduce the computing capacity required to answer a query?

>> NULLs are as much a theoretical nonsense as the empty set or the
>> imaginary
>> component of a complex number.
>
> Why do you say that? Imaginary numbers and empty sets are real values
> with the normal properties that mathematics and logic expects values to
> have. Nulls are not.

Imaginary numbers are real values...that's news to me!

Every value is a set and every set is a value, so what is the empty set?
The absence of a value? Does that mean that the absence of a value is a
value? Isn't that what a NULL represents--the absence of a value?

>... As Date has pointed out, the "Assignment


> Principle" is so fundamental that it wasn't even thought worth
> mentioning - until the designers of SQL broke it!
>

SQL broke a lot of things. Just because one data sublanguage is screwed up
doesn't mean that NULLs aren't useful.

> --
> David Portas
>


Brian Selzer

Sep 30, 2006, 8:21:35 PM

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:zLmdnQxwyMHjTIPY...@pipex.net...

> "Brian Selzer" <br...@selzer-software.com> wrote in message
> news:WvxTg.11808$7I1....@newssvr27.news.prodigy.net...
>
>> Recording propositions is only about 5% of the typical activity for a
>> database. With today's technology, querying multiple tables is almost
>> always more expensive than querying one,
>
> Simply wrong.
>
> Consider a table SHIPMENTS with, say, a million rows. One nullable
> attribute might be CAUSE_OF_LOSS. Since shipments are almost never lost
> this attribute is almost invariably null. Supposing that the DBA is even
> slightly skilled and chooses to physically key both tables on
> SHIPMENT_NUMBER, which query for the value of shipments lost to pirates
> will be quicker? The one that scans the million rows? Or the one that
> scans the CAUSE_OF_LOSSES table then joins the (possibly empty)
> restriction with the SHIPMENTS table on SHIPMENT_NUMBER?
>

An even slightly skilled DBA would use an index on such a column. For that
matter, if several nullable attributes are often queried together, the DBA
could use a compound index on those columns. Try that with 6NF.

> I would argue that the limitations of today's technology are precisely why
> this approach is so useful. Not only does it make it easier to reason
> about what is going on, but it also helps the real system go like blazes.
> I regularly apply such decompositions to solve performance problems in
> systems with tables with hundreds of millions of rows.
>

Your argument is based on query bias and depends on the physical ordering of
the tables. I'm sure I could come up with some exceptions, too--I believe I
wrote "almost always."

Brian Selzer

Sep 30, 2006, 8:51:41 PM

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1159648418.4...@i42g2000cwa.googlegroups.com...

In what way did that prove your point?


Marshall

Oct 1, 2006, 2:48:06 AM
Brian Selzer wrote:
>
> Every value is a set

No.


> and every set is a value,

Yes.


> so what is the empty set?

A value.


> The absence of a value?

No.


> Does that mean that the absence of a value is a value?

No.


> Isn't that what a NULL represents--the absence of a value?

No.


Marshall

Roy Hann

Oct 1, 2006, 4:45:15 AM
"Brian Selzer" <br...@selzer-software.com> wrote in message
news:jSDTg.10095$6S3....@newssvr25.news.prodigy.net...

>> Consider a table SHIPMENTS with, say, a million rows. One nullable
>> attribute might be CAUSE_OF_LOSS. Since shipments are almost never lost
>> this attribute is almost invariably null. Supposing that the DBA is even
>> slightly skilled and chooses to physically key both tables on
>> SHIPMENT_NUMBER, which query for the value of shipments lost to pirates
>> will be quicker? The one that scans the million rows? Or the one that
>> scans the CAUSE_OF_LOSSES table then joins the (possibly empty)
>> restriction with the SHIPMENTS table on SHIPMENT_NUMBER?
>
> An even slightly skilled DBA would use an index on such a column.

If the attribute in question appeared in a WHERE clause sufficiently often
that might be useful. But the DBA can't just index every column hoping it
will be useful one day. Indices are not free. Indices have to be
maintained at run time, they have to be considered by the query optimizer,
and they take time to back up. Properly designed tables that correctly
model the business are very nearly free, or better.

> Your argument is based on query bias and depends on the physical ordering
> of the tables.

In the first place I wasn't making an argument, I was telling you what I do
in the real world, and it works. You will have a tough time persuading me I
don't see what I do plainly see.

Secondly, there is no such thing as "physical ordering of tables". The
operation is performed in whatever order the DBMS optimizer decides is
optimal. What I suggested is a very reasonable strategy that my particular
optimizer often does choose, unless it chooses a still better one.

Roy


David Portas

Oct 1, 2006, 4:48:03 AM
Brian Selzer wrote:
>
> The argument JOG made focused only on recording information, not retrieving
> it. Why would anyone abandon a sound mechanism that can significantly
> reduce the computing capacity required to answer a query?

Because your argument is merely an assumption based on what some
systems of today are capable of.

Consider two possible models:

1. {i,a,b}
2. {i,a}, {i,b}

In each relation the key is {i}. If the values i,a and b are of the
same types in each case then there is no fundamental reason why they
can't have the same physical representation. If the physical model is
the same then there is no reason why one should perform significantly
differently to the other.


>
> Imaginary numbers are real values...that's news to me!
>

Then I would encourage you to read something about mathematics. Complex
numbers are just as real as the reals or the integers. They are all
values.


> what is the empty set?
> The absence of a value?

Absolutely not. The empty set is a value. Again, I encourage you to
read some maths.


> SQL broke a lot of things. Just because one data sublanguage is screwed up
> doesn't mean that NULLs aren't useful.

If you are proposing some version of "null" that does NOT break the
assignment principle then you should state clearly what your proposal
is.

--
David Portas

Roy Hann

Oct 1, 2006, 5:05:31 AM
"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:1159692483.4...@c28g2000cwb.googlegroups.com...

> Brian Selzer wrote:
>>
>> The argument JOG made focused only on recording information, not
>> retrieving
>> it. Why would anyone abandon a sound mechanism that can significantly
>> reduce the computing capacity required to answer a query?
>
> Because your argument is merely an assumption based on what some
> systems of today are capable of.

It's worse. His entire position is based on not knowing even what some of
today's products are already capable of. For example, he seems unaware of
the role of the optimizer.

Roy.


Cimode

Oct 1, 2006, 8:59:12 AM
Imaginary numbers are not values belonging to the domain of reals.
Nevertheless they are values.

NULLs are NOT values.

NULL is SQL's poor way of handling missing data by making arbitrary
assumptions about what *should be there*. For each NULL stored in a
database there is more than one possible value it could stand for, so
the predicate cannot be validated (therefore you have no facts in your
database). In a way, one could conceive of a NULL as an occurrence of
some random function over a limited set of possibilities, but certainly
not as a value. To make your database state facts about reality in the
form of predicates, you use values that belong to well-defined domains
of values that are derived from one another.

NULL is the carpet under which you hide the dirt of missing data
instead of throwing it in the trash. It is practical, YES. But it has
the drawback of making your count results false unless you, your team,
and all the people working on the db during the system's life cycle put
additional IS NULL/IS NOT NULL conditions in all procedures selecting
values from tables allowing NULLs. Adding such conditions kills
performance at the overall server level. So: results FALSE, performance
(response time/concurrency) down. Do you still believe that NULLs are
a good idea?

David Cressey

Oct 1, 2006, 9:47:43 AM

"Cimode" <cim...@hotmail.com> wrote in message
news:1159707552.2...@k70g2000cwa.googlegroups.com...

What's your proposal for a systematic way of dealing with missing data?


Brian Selzer

Oct 1, 2006, 10:16:36 AM

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:P-Sdnd58Trp7GYLY...@pipex.net...

I understand fully the role of the optimizer. That's one of my points. If
you arbitrarily split a table with a nullable column, then you're robbing
the optimizer of possible execution plans. It may make sense to split a
table, for example, removing non-key columns that are seldom used in queries
into another table in order to boost the performance of all other queries.
The point I'm trying to make is that the decision should not be arbitrary.

> Roy.
>


dawn

Oct 1, 2006, 10:27:27 AM

Unless this is only a discussion about SQL-based DBMS's, the term
"null" might refer to a value (the empty set) if used in conjunction
with two-valued logic (e.g. IBM UniData). Since most computer
languages are two-valued, it might be best if you clarify that these
"bad nulls" are nulls used with a 3VL language.

On a side note, I would suggest that the industry move away from use of
a 3VL with DBMS tools as soon as feasible. I understand that from a
practical perspective, the elimination of nulls is a tactic to help
mitigate the problems with the language.

For future databases and systems, we should be looking at whatever will
take the place of SQL so that we get past the 3VL NULL issue without
bandaids. Even if we eliminate NULLS from base tables, are we going to
eliminate NULLS in views too? Otherwise we are still stuck with 3VL
until we move beyond it. Ultimately we need to address the cause, the
3VL language SQL.

--dawn

Roy Hann

Oct 1, 2006, 11:01:13 AM
"Brian Selzer" <br...@selzer-software.com> wrote in message
news:85QTg.6549$TV3....@newssvr21.news.prodigy.com...

>
> "Roy Hann" <spec...@processed.almost.meat> wrote in message
> news:P-Sdnd58Trp7GYLY...@pipex.net...
>> "David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
>> news:1159692483.4...@c28g2000cwb.googlegroups.com...
>>> Brian Selzer wrote:
>>>>
>>>> The argument JOG made focused only on recording information, not
>>>> retrieving
>>>> it. Why would anyone abandon a sound mechanism that can significantly
>>>> reduce the computing capacity required to answer a query?
>>>
>>> Because your argument is merely an assumption based on what some
>>> systems of today are capable of.
>>
>> It's worse. His entire position is based on not knowing even what some
>> of today's products are already capable of. For example, he seems
>> unaware of the role of the optimizer.
>
> I understand fully the role of the optimizer.

So why did you object that I was depending on a "physical order" of tables?

> That's one of my points. If you arbitrarily split a table with a nullable
> column, then you're robbing the optimizer of possible execution plans.

There is no logical way to decide whether to represent information as a row,
an attribute, or a relationship. I can choose anything that is sensible.
Whatever I choose is going to eliminate some possible plans but it is also
going to create others.

And let me repeat: I am not speculating that my approach works. I use it
routinely and it works very well. I therefore don't care what plans the
optimizer never got to consider and nor should anyone else.

> It may make sense to split a table, for example, removing non-key columns
> that are seldom used in queries into another table in order to boost the
> performance of all other queries. The point I'm trying to make is that the
> decision should not be arbitrary.

What is arbitrary about saying that an attribute which doesn't apply to one
type of fact must necessarily belong to another? It is the inescapable
conclusion. To suggest the opposite is not only arbitrary, it is
capricious, counterproductive, and misleading. If it weren't so tiresomely
familiar, its bizarreness would be blindingly obvious.

Roy


J M Davitt

Oct 1, 2006, 12:33:09 PM

I can't see that you've proposed anything new. In fact, I think
you've confused a couple of things.

Using TYPE firstname, /firstname.name/ holds a first name -- unless
/firstname.nameProvided/ is false, in which case it holds... what?
What is "FIRST NAME NOT GIVEN"? A magic value? More magic
than a NULL not-a-value?

I think you are wrestling with the various flavors of NULL, and by
flavors I refer to the myriad things NULL is supposed to
represent: sometimes "Not Applicable," sometimes "Not Known,"
sometimes "this OUTER JOIN had none." (Or some such.)

You're on the right track when considering TYPEs, and the key feature
your TYPE system needs is an equality comparison operator that
can determine that ('t', 'John'), ('f', 'not given'), and
('f', 'not applicable') are different values in /firstname/.

But I see nothing that looks like a proposed normal form.

paul c

Oct 1, 2006, 12:41:40 PM
David Cressey wrote:
> "Cimode" <cim...@hotmail.com> wrote in message
> news:1159707552.2...@k70g2000cwa.googlegroups.com...
>> ..

>
> What's your proposal for a systematic way of dealing with missing data?
>
>

David, PMFJI, I really do think that an enterprise that is truly
depending on missing data doesn't have a problem that is limited to its
db, as it will soon go out of business. Actually, I don't really see
how one can depend on missing data in general, so no systematic solution
is called for. Heh, I also think that the IBM approach is best - namely,
It's Better Manually!

p

Cimode

Oct 1, 2006, 12:58:20 PM
Not mine...Check McGoveran's on thethirdmanifesto.com

Brian Selzer

Oct 2, 2006, 5:25:03 AM

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:1159692483.4...@c28g2000cwb.googlegroups.com...

> Brian Selzer wrote:
>>
>> The argument JOG made focused only on recording information, not
>> retrieving
>> it. Why would anyone abandon a sound mechanism that can significantly
>> reduce the computing capacity required to answer a query?
>
> Because your argument is merely an assumption based on what some
> systems of today are capable of.
>
> Consider two possible models:
>
> 1. {i,a,b}
> 2. {i,a}, {i,b}
>
> In each relation the key is {i}. If the values i,a and b are of the
> same types in each case then there is no fundamental reason why they
> can't have the same physical representation. If the physical model is
> the same then there is no reason why one should perform significantly
> differently to the other.
>
>

There is indeed a fundamental reason. The database schemata above are not
equivalent. 2 would also require a single foreign key constraint if only
one of a or b is nullable, or mutual foreign keys if neither is nullable.
If both a and b are nullable, then you would need:

3. {i}, {i, a}, {i, b}

and a pair of foreign key constraints referencing {i}, or if a can be null
only when b is null, then you would need:

4. {i}, {i, a, b}

and a single foreign key constraint referencing {i}.

As far as performance goes, assuming that one of a or b or neither is
nullable, the performance of a join depends on the cardinalities of {i, a}
and {i, b}, the number of rows targeted in each, and the type of join--that
is, whether it's a loop join, a merge join, or a hash join. A loop join
would require a scan or an index seek for each row (presumably in the table
with the smaller cardinality). A merge join would require that {i, a} and
{i, b} be presorted or sorted on the fly--provided, of course, that the
domain for i has a partial order defined on it. A hash join would require
either (1) that a hash index be maintained, or (2) that a hash be computed
on the fly during a scan. None of this extra computation would be required
for {i, a, b}. I would argue then that your statement is only true if {i,
a} and {i, b} are presorted (as is the case when a clustered index is
defined in SQL Server), or if the cardinalities of {i, a} or {i, b} or both
are small.
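The "hash computed on the fly during a scan" mentioned above can be sketched as a toy build-and-probe join (an illustration only, not any DBMS's actual implementation):

```python
# Toy hash join of {i, a} and {i, b} on key i: build a hash table over one
# input, then probe it while scanning the other. Illustrative only.
def hash_join(r_a, r_b):
    # Build phase: hash the (presumably smaller) input on i.
    table = {i: a for (i, a) in r_a}
    # Probe phase: a single scan of the other input.
    return [(i, table[i], b) for (i, b) in r_b if i in table]

r_a = [(1, "a1"), (2, "a2"), (3, "a3")]
r_b = [(1, "b1"), (3, "b3"), (4, "b4")]
print(hash_join(r_a, r_b))  # [(1, 'a1', 'b1'), (3, 'a3', 'b3')]
```

The build phase is exactly the "extra computation" the post says a single table {i, a, b} would avoid.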

>>
>> Imaginary numbers are real values...that's news to me!
>>
>
> Then I would encourage you to read something about mathematics. Complex
> numbers are just as real as the reals or the integers. They are all
> values.
>
>
>> what is the empty set?
>> The absence of a value?
>
> Absolutely not. The empty set is a value. Again, I encourage you to
> read some maths.
>

The Axiom of the Empty Set states that there exists a set that contains no
sets. Since every set is a value, the empty set does not contain a value.

The empty set does not contain a value; every non-empty set contains at
least one value. Doesn't that imply that the empty set indicates the
absence of a value?

>
>> SQL broke a lot of things. Just because one data sublanguage is screwed
>> up
>> doesn't mean that NULLs aren't useful.
>
> If you are proposing some version of "null" that does NOT break the
> assignment principle then you should state clearly what your proposal
> is.
>

Use something similar to the empty set instead of SQL NULL. It would
eliminate 3VL altogether.

Some operators:

Ø = Ø is TRUE
x > Ø is TRUE unless x = Ø or x = +infinity
x < Ø is TRUE unless x = Ø or x = -infinity
x + Ø = x use the empty sum (0)
x * Ø = x use the empty product (1)

Some aggregates:

SUM use the empty sum (0) for each affected Ø (in other words,
ignore Ø)
COUNT use the sum of the cardinality of the value in each affected cell
(again, ignore Ø)
AVG if every affected cell is empty, then 0 else SUM/COUNT as
above.
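The aggregate rules proposed above can be sketched with an explicit sentinel standing in for Ø (EMPTY is a hypothetical marker, not part of any existing SQL dialect; COUNT here treats each non-empty cell as having cardinality 1):

```python
# Sketch of the proposed semantics: ignore EMPTY in SUM and COUNT, and let
# AVG over an all-empty column be 0. EMPTY stands in for the proposed value.
EMPTY = object()

def agg_sum(cells):
    return sum(c for c in cells if c is not EMPTY)   # empty sum is 0

def agg_count(cells):
    return sum(1 for c in cells if c is not EMPTY)   # cardinality 1 per cell

def agg_avg(cells):
    n = agg_count(cells)
    return 0 if n == 0 else agg_sum(cells) / n

col = [10, EMPTY, 20, EMPTY, 30]
print(agg_sum(col), agg_count(col), agg_avg(col))  # 60 3 20.0
print(agg_avg([EMPTY, EMPTY]))                     # 0
```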

> --
> David Portas
>


Brian Selzer

unread,
Oct 2, 2006, 5:26:29 AM10/2/06
to

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:ybudneB2Qvz...@pipex.net...

If it works for most of the queries being submitted, then go for it! I've
often seen exactly the opposite happen, where splitting a table caused a few
queries to scream while at the same time significantly degrading the
performance of all the rest.

>> It may make sense to split a table, for example, removing non-key columns
>> that are seldom used in queries into another table in order to boost the
>> performance of all other queries. The point I'm trying to make is that
>> the decision should not be arbitrary.
>
> What is arbitrary about saying that an attribute which doesn't apply to
> one type of fact must necessarily belong to another? It is the
> inescapable conclusion. To suggest the opposite is not only arbitrary, it
> is capricious, counterproductive, and misleading. If it weren't so
> tiresomely familiar its bizarreness would be blindingly obvious.
>

I think you're making the assumption that an attribute doesn't apply if its
value isn't known. That is not always true. It is often the case that not
all of the information about something is received at the same time. The
interval between when the first piece of information arrives and when the
final piece arrives may be long enough that it makes sense to record parts
of that information at different times rather than wait until all of the
relevant information is available and then record it en masse.

> Roy
>


Cimode

unread,
Oct 2, 2006, 9:53:59 AM10/2/06
to
True. Utilization of NULLs has become a norm, and therefore people have
learnt to live with it. They have learnt to make subjective adjustments
when they cannot explain incoherent results (I have lost count of how many
statisticians, accountants, and marketing people I have encountered who
were called into data warehouses to *validate* subjectively false counts).
As a consequence, an element of subjectivity has been introduced back into
validating the accuracy of results.


> p

Cimode

unread,
Oct 2, 2006, 9:58:54 AM10/2/06
to
The consequences of NULL use go far beyond the simple problem of
execution. NULLs hinder performance at all levels (response time,
concurrency) and introduce a very strong element of subjectivity into the
interpretation of the data presented to users.

> > Roy.
> >

Brian Selzer

unread,
Oct 2, 2006, 11:26:01 AM10/2/06
to

"Cimode" <cim...@hotmail.com> wrote in message
news:1159797534.2...@e3g2000cwe.googlegroups.com...

I don't agree. Judicious use of NULL can improve performance, response
time, and concurrency. In addition, when used correctly, no subjectivity is
introduced. For every database schema with NULLable attributes, there is
always an equivalent schema without. Any subjectivity would therefore be
independent of the database schema. On the other hand, arbitrary or
incorrect use of NULL can cause all of the problems you describe.

>> > Roy.
>> >
>


Cimode

unread,
Oct 2, 2006, 12:12:10 PM10/2/06
to

> I don't agree. Judicious use of NULL can improve performance, response
> time, and concurrency.

Give me an example of *judicious* use of NULLS.

> In addition, when used correctly, no subjectivity is

*Used correctly*? By whom? Can you seriously guarantee that a
developer will never make a mistake in one of the query conditions over
the entire life cycle? Do you think that developers have in mind all the
consequences of omitting an IS NULL/IS NOT NULL? Besides, at what price
does such *correct usage* of NULLs come (say, how many extra hours of
development)? Imagine you have 60 stored procedures referring to one
table, Table1, with 2 nullable columns. Imagine that each stored procedure
has 2 NULL conditions. That makes 60 * 2 extra conditions and scans across
the stored procedures. I do not see HOW that can improve response time OR
concurrency. But I am positive you will provide an example.


> introduced. For every database schema with NULLable attributes, there is
> always an equivalent schema without. Any subjectivity would therefore be
> independent of the database schema.

The fact that there is a way to get rid of NULLs does not say anything
about subjectivity or objectivity.

A normalized schema allows systems to make inferences about data
according to predefined rules. Such rules do not change every time a
user reads false query results. Use of NULLs imposes subjectivity on
every read done on false results.

> On the other hand, arbitrary or
> incorrect use of NULL can cause all of the problems you describe.

Use of NULL is necessarily arbitrary no matter what you do. If you find
a NULL value in a field, is there a systematic way to say anything about
it, across all tables and all schemas? NO.

Besides, even if you apply some discipline to a systematic treatment of
NULLs, how can you guarantee that none of the developers will make a
mistake, now or after you leave the company to get more money?

Don't you realize that you are stating that incorrect use of something
causes numerous problems, and therefore it is a good idea to use it? Is
that your way of proving that NULLs are useful? I say they create
problems; don't use them. Period.

J M Davitt

unread,
Oct 2, 2006, 8:19:14 PM10/2/06
to

A wrong answer isn't better just because it costs less.

In addition, when used correctly, no subjectivity is
> introduced. For every database schema with NULLable attributes, there is
> always an equivalent schema without.

Well, being the god* you seem to be, I won't doubt
that solely because you said so.

* Genius of Design

Brian Selzer

unread,
Oct 3, 2006, 12:08:00 AM10/3/06
to

"J M Davitt" <jda...@aeneas.net> wrote in message
news:60iUg.3585$Cq3...@tornado.ohiordc.rr.com...

But an incomplete answer may be better than nothing. For example, if you
delay arresting a suspected terrorist until you have enough evidence to
convict, you get 9/11; if you pick him up and then develop the evidence
through interrogation, search and seizure, you may not get enough for a
conviction, but you might prevent an attack.

> In addition, when used correctly, no subjectivity is
>> introduced. For every database schema with NULLable attributes, there is
>> always an equivalent schema without.
>
> Well, being the god* you seem to be, I won't doubt
> that solely because you said so.
>
> * Genius of Design
>

I'm not sure I understand your point. Are you asking for proof that a
schema with nullable attributes can always be transformed into one without,
or are you trying to put me back in my place because you don't think I know
what I'm talking about? It's obvious to me that an outer join of a set of
restricted projections of a table with nullable columns is equivalent to
the original table, provided that each projection does not allow nulls, that
each projection includes at least one key and either a subset of the columns
that allow nulls or a subset of the columns that don't, and that each column
is represented in at least one of the projections.
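The claim above — decompose into null-free restricted projections, then recover the original with an outer join — can be checked directly in SQLite (table names here are hypothetical):

```python
import sqlite3

# Original table t with a nullable column z, decomposed into two null-free
# projections: t_xy keeps {x, y}; t_xz keeps {x, z} restricted to rows
# where z is not null. A LEFT JOIN restores the original, nulls included.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER PRIMARY KEY, y TEXT NOT NULL, z TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                [(1, "y1", "z1"), (2, "y2", None), (3, "y3", "z3")])

con.execute("CREATE TABLE t_xy AS SELECT x, y FROM t")
con.execute("CREATE TABLE t_xz AS SELECT x, z FROM t WHERE z IS NOT NULL")

restored = con.execute("""
    SELECT t_xy.x, t_xy.y, t_xz.z
    FROM t_xy LEFT JOIN t_xz ON t_xz.x = t_xy.x
    ORDER BY t_xy.x""").fetchall()
original = con.execute("SELECT x, y, z FROM t ORDER BY x").fetchall()
print(restored == original)  # True: the outer join restores the dropped row
```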

J M Davitt

unread,
Oct 3, 2006, 2:51:26 AM10/3/06
to

So, let me get this straight: we round up all the NULLs,
pack 'em together in Guantanamo Bay, and the world will
be a better place?

>
>> In addition, when used correctly, no subjectivity is
>>
>>>introduced. For every database schema with NULLable attributes, there is
>>>always an equivalent schema without.
>>
>>Well, being the god* you seem to be, I won't doubt
>>that solely because you said so.
>>
>>* Genius of Design
>>
>
>
> I'm not sure I understand your point.

How are two schemas - one "with NULLable attributes" and
one "without" - equivalent?

Are you asking for proof that a
> schema with nullable attributes can always be transformed into one without,
> or are you trying to put me back in my place because you don't think I know
> what I'm talking about? It's obvious to me that an outer join of a set of
> restricted projections of a table with nullable columns is equivalent to
> the original table, provided that each projection does not allow nulls, that
> each projection includes at least one key and either a subset of the columns
> that allow nulls or a subset of the columns that don't, and that each column
> is represented in at least one of the projections.

What does SQL's OUTER JOIN have to do with what you described?

How does a description of COMPOSE make your point?

David Portas

unread,
Oct 3, 2006, 9:32:36 AM10/3/06
to
Brian Selzer wrote:
>
> As far as performance goes, assuming that one of a or b or neither is
> nullable, the performance of a join depends on the cardinalities of {i, a}
> and {i, b}, the number of rows targeted in each, and the type of join--that
> is, whether it's a loop join, a merge join, or a hash join. A loop join
> would require a scan or an index seek for each row (presumably in the table
> with the smaller cardinality). A merge join would require that {i, a} and
> {i, b} be presorted or sorted on the fly--provided, of course, that the
> domain for i has a partial order defined on it. A hash join would require
> either (1) that a hash index be maintained, or (2) that a hash be computed
> on the fly during a scan. None of this extra computation would be required
> for {i, a, b}. I would argue then that your statement is only true if {i,
> a} and {i, b} are presorted (as is the case when a clustered index is
> defined in Sql Server), or if the cardinalities of {i, a} or {i, b} or both
> are small.

You missed my point entirely. My point was that the physical models of
1 and 2 might as well be identical (accepting your comments about
foreign keys, which are a logical issue). If the physical model is
identical then there is no reason to suppose any difference in
performance.


> The empty set does not contain a value; every non-empty set contains at
> least one value. Doesn't that imply that the empty set indicates the
> absence of a value?

The empty set IS a value. Whether the empty set implies something or
not depends on the interpretation the user puts on any proposition
containing the empty set. It is still a value.


> Use something similar to the empty set instead of SQL NULL. It would
> completely eliminate 3VL altogether.
>
> Some operators:
>
> Ø = Ø is TRUE
> x > Ø is TRUE unless x = Ø or x = +infinity
> x < Ø is TRUE unless x = Ø or x = -infinity
> x + Ø = x use the empty sum (0)
> x * Ø = x use the empty product (1)

So Ø looks like some value of the domain in question. If all you are
proposing is better domain support then I agree. Support for
user-defined domains and operators is a far better alternative to
nulls. Unfortunately the task of defining sensible results for such a
value for every operator could be onerous. For that reason I think the
technique of decomposition into separate relations still has its place.

--
David Portas

Chris Smith

unread,
Oct 3, 2006, 11:44:44 AM10/3/06
to
Brian Selzer <br...@selzer-software.com> wrote:
> I think you're making the assumption that an attribute doesn't apply if its
> value isn't known. That is not always true. It is often the case that not
> all of the information about something is received at the same time. The
> interval during which the first bit of information is received and the final
> bit may be sufficiently long that it makes sense to record parts of that
> information at different times rather than wait until all of the relevant
> information is available and then recording it en mass.

I think you're confusing tuples with things. In the relational model,
properly understood, a tuple represents a fact, not a thing. What you
are saying is that sometimes there are two facts about the same subject,
and that you can find out one, and then later perhaps find out another.
If you think of tuples as representing things, then you'd logically
expect that all the simple attributes of that thing would be in one
relation schema; and if some are unknown, you'd look for a way to
represent that. If you understand tuples as facts, though, you are no
longer forced into the presumption that all the basic facts about a
thing need to be expressed in the same relation schema, and that allows
you to consider designs where you can store each fact as you discover
it.
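One way to read this in schema terms (the relation names below are illustrative, not from the post): each fact about a subject gets its own relation, recorded as it is discovered, with no placeholder needed for facts not yet known.

```python
# Fact-oriented design: one relation per fact. A person whose phone is not
# yet known simply has no tuple in person_phone -- no NULL required.
person_name  = {(1, "Ann"), (2, "Bob")}   # fact: person x is named y
person_phone = {(1, "555-0100")}          # fact: person x has phone z

# Record a newly discovered fact when it arrives, independently of the rest.
person_phone.add((2, "555-0199"))

# "People with a known phone" is a natural join; no 3VL is involved.
known = {(i, n, p) for (i, n) in person_name
                   for (j, p) in person_phone if i == j}
print(sorted(known))
# [(1, 'Ann', '555-0100'), (2, 'Bob', '555-0199')]
```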

This confusion -- that tuples in relations represent things, not facts
-- is very common. There are, in fact, numerous database design
tutorials, references, etc. that adopt it. I could make the case that,
in fact, SQL is designed for it and that this thing-oriented pseudo-
relational database design is more common in practice than relational
database design. It is, however, not relational design.

--
Chris Smith

David Cressey

unread,
Oct 3, 2006, 4:10:12 PM10/3/06
to

"paul c" <toledob...@oohay.ac> wrote in message
news:8dSTg.78337$R63.70243@pd7urf1no...

Paul,

My phrase "a systematic way of dealing with missing data" is from Codd's
twelve rules. I think Codd got this one right. Not because the RDM is any
more dependent on dealing with missing data than any other data model might
be, but because, in the real world, you are going to be faced with the
reality that data that "ought to be there" isn't there.

You either have a systematic way of dealing with missing data, or you deal
with missing data in an unsystematic way, when the eventuality happens.
Which is worse?


I didn't intend the inference that you drew, that a company should design
its data strategy around dependency on missing data. I do think that a
system that a company depends on should not choke up and refuse to work if
some item of expected data is not present.

I also don't believe that "it's better manually". Manual systems also have
ways of dealing with missing data. Sometimes they depend on "common sense"
on the part of the people in the system. Sometimes that works. Sometimes
it doesn't. I think a good, well designed system that incorporates a
systematic way of dealing with missing data can be significantly more robust
than a system that relies on uncommon levels of common sense.

Karen Hill

unread,
Oct 3, 2006, 6:58:05 PM10/3/06
to

David Cressey wrote:

>
> My phrase "a systematic way of dealing with missing data" is from Codd's
> twelve rules. I think Codd got this one right. Not because the RDM is any
> more dependent on dealing with missing data than any other data model might
> be, but because, in the real world, you are going to be faced with the
> reality that data that "ought to be there" isn't there.

Here is an example:

Creating an order in a table for a non-existent customer?

> You either have a systematic way of dealing with missing data, or you deal
> with missing data in an unsystematic way, when the eventuality happens.
> Which is worse?
>

I would rather have a system that would refuse to create an order for a
non-existent customer.

Aloha Kakuikanu

unread,
Oct 3, 2006, 7:24:17 PM10/3/06
to

Karen Hill wrote:
> David Cressey wrote:
>
> >
> > My phrase "a systematic way of dealing with missing data" is from Codd's
> > twelve rules. I think Codd got this one right. Not because the RDM is any
> > more dependent on dealing with missing data than any other data model might
> > be, but because, in the real world, you are going to be faced with the
> > reality that data that "ought to be there" isn't there.
>
> Here is an example:
>
> Creating an order in a table for a non-existent customer?

The problem of the non-existent customer (or other data, for that matter)
has been solved once and for all by
http://www.economist.com/science/PrinterFriendly.cfm?story_id=7854216

Marshall

unread,
Oct 3, 2006, 8:19:01 PM10/3/06
to
Karen Hill wrote:
> David Cressey wrote:
>
>
> Here is an example:
>
> Creating an order in a table for a non-existent customer?
>
> > You either have a systematic way of dealing with missing data, or you deal
> > with missing data in an unsystematic way, when the eventuality happens.
> > Which is worse?
>
> I would rather have a system that would refuse to create an order for a
> non-existent customer.

I would expect any systematic approach to enable me to specify
what is and is not allowed to be missing.


Marshall

paul c

unread,
Oct 3, 2006, 9:20:17 PM10/3/06
to

David, regarding the "inference", I wasn't trying to put words in your
mouth, just making what might seem an extreme statement to some in order
to make my point. As for the "it's better manually" comment, I meant it
only as a joke, although I have noticed from time to time that certain
manual systems were far more efficient and more "cost-effective" than
the computerized systems that replaced them!

I didn't mean to distract from the technical topic. I imagine I'm one
of the few people who likes Hugh Darwen's approach, even though I admit
it can seem ponderous (ie., many, many tables) to some programmers.
Then there is Fabian Pascal's approach which is slightly less pragmatic
since it depends on an implementation that doesn't exist as far as I
know, but which seems also a little more ideal to me. My attitude about
the systems I've seen is that they could do with some ruthless expunging
of table columns that were added in haste. Whenever I was able to get
to the REAL user or the big boss, it was surprising how many
requirements cooked up by consultants and ordinary users turned out to
be not essential, just subjective "nice-to-haves".

magoo

paul c

unread,
Oct 3, 2006, 9:25:55 PM10/3/06
to

I second that. I've seen nothing in relational theory that requires
rows or tuples to stand for things (or entities). Saying it does
strikes me as nothing more than closed-minded willfulness.

p

Brian Selzer

unread,
Oct 4, 2006, 2:28:59 AM10/4/06
to

"J M Davitt" <jda...@aeneas.net> wrote in message
news:OLnUg.7759$OE1....@tornado.ohiordc.rr.com...

The moon might be better than Camp Delta--or maybe Bikini island. The
problem is that not all NULLs want to kill us and are willing to die in the
process.

>>
>>> In addition, when used correctly, no subjectivity is
>>>
>>>>introduced. For every database schema with NULLable attributes, there
>>>>is always an equivalent schema without.
>>>
>>>Well, being the god* you seem to be, I won't doubt
>>>that solely because you said so.
>>>
>>>* Genius of Design
>>>
>>
>>
>> I'm not sure I understand your point.
>
> How are two schemas - one "with NULLable attributes" and
> one "without" - equivalent?
>

Two database schemata are equivalent if all of the information contained in
an instance of one can be derived from the information contained in an
instance of the other. Database schemata A and B are equivalent if and only
if a pair of inverse mappings exist between A and B: one mapping from A to B
such that for every legal instance of A, there exists one and only one legal
instance of B, and one mapping from B to A such that for every legal
instance of B, there exists one and only one legal instance of A. In other
words, given any instance of A, only one instance of B can be derived using
the mapping from A to B such that when the mapping from B to A is applied to
that derived instance, the result is the original instance for A, and given
any instance of B, only one instance of A can be derived using the mapping
from B to A such that when the mapping from A to B is applied to that
derived instance, the result is the original instance for B. Note that this
applies to database schemata with and without nullable attributes. For
example, a database schema D consisting of a single relation,

R {A, B, C}

that satisfies the functional dependency A --> B is not equivalent to a
database schema D' consisting of two relations,

S {A, B}, T {A, C}

without the circular inclusion dependency S[A] <--> T[A] because for any
given instance of D, there can be a huge number (infinite if any domain for
A, B or C is infinite) of instances of D' where the natural join of the
extensions of S and T is equal to the extension of R. The circular
inclusion dependency ensures that the projections of S JOIN T over {A, B}
and {A, C} are equal to the extensions of S and T respectively.

I hope that the above description and example illustrate what I mean by
equivalent database schemata. For a database schema without nullable
attributes, the mappings involve sets of projections and natural joins. For
a database schema with nullable attributes, the operations are different. A
simple projection would still include nullable attributes. To eliminate
them, the decomposition operation must also restrict the projection to
eliminate any tuple with a null. As a consequence, the inverse operation
must be an outer join which restores the eliminated tuples. In addition,
instead of a circular inclusion dependency, all that is needed is a single
foreign key constraint. There is no need to maintain a 1:1 correspondence
between tuples from each projection because since the original attribute is
nullable, the 1:1 correspondence didn't exist to begin with.
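The non-equivalence claimed in the example above can be demonstrated directly: two distinct instances of D' can have the same natural join, so without the circular inclusion dependency no inverse mapping from D back to D' exists (a sketch using Python sets):

```python
# R {A, B, C} vs S {A, B}, T {A, C}: without the inclusion dependency
# S[A] <--> T[A], the mapping D' -> D (natural join) is not invertible.
def natural_join(s, t):
    return {(a, b, c) for (a, b) in s for (a2, c) in t if a == a2}

s1, t1 = {(1, "b1")}, {(1, "c1")}
s2, t2 = {(1, "b1"), (2, "b2")}, {(1, "c1")}  # (2, "b2") dangles: no match in T

# Two distinct D' instances...
assert (s1, t1) != (s2, t2)
# ...map to the same D instance, so the inverse mapping is not well-defined:
print(natural_join(s1, t1) == natural_join(s2, t2))  # True
```

The dangling tuple (2, "b2") is exactly what the circular inclusion dependency forbids, which is why adding it restores equivalence.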

Cimode

unread,
Oct 4, 2006, 4:35:01 AM10/4/06
to

David Cressey wrote:

> Paul,


> You either have a systematic way of dealing with missing data, or you deal
> with missing data in an unsystematic way, when the eventuality happens.
> Which is worse?

There is a systematic way of dealing with the problem of missing data
proposed by Darwen (I made a typo earlier: it was not McGoveran). It
consists of maintaining a repository that keeps track of attempts to
insert missing data into tables, and then being able to *bring them back*
into the table where they were supposed to be inserted, in case they get
updated to a known value. I have to admit it creates an extra layer of
administration, as you have to maintain the repository, but this is
nothing compared to the work that has to be done when letting NULLs in.
It is systematic and, most of all, ensures accuracy. Accepting NULLs
certainly does NOT bring a systematic way of dealing with missing data.
Quite frankly, I do not see HOW adding subjectivity into the system by
interpreting results can be in any way *systematic*.

JOG

unread,
Oct 4, 2006, 5:28:11 AM10/4/06
to
All of your points represent a wild goose chase in my eyes Brian. A
proposition with a NULL in it is no proposition at all. From a logical
perspective, case closed. A relation tuple with a NULL in it is no
relation tuple at all. From a mathematical perspective, case closed.
Trying to invoke the 'kludge perspective' is hardly going to convince a
theoretical newsgroup.

Roy Hann

unread,
Oct 4, 2006, 6:47:17 AM10/4/06
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1159954091.1...@m73g2000cwd.googlegroups.com...

Rest assured he's not convincing this very successful practioner either.

Roy


Cimode

unread,
Oct 4, 2006, 6:56:57 AM10/4/06
to
Could you give me an example of the performance advantages of using
NULLs? You have stated there were advantages... Could you present them?

Cimode

unread,
Oct 4, 2006, 7:09:09 AM10/4/06
to

Brian Selzer wrote:
By definition, one cannot qualify as a *relation* a set of N-tuples that
does NOT state N assertive facts. Accepting a NULL value in 1 attribute
prevents stating 1 fact; you can then state only N-1 facts, and therefore
you cannot call it a relation.

Bob Badour

unread,
Oct 4, 2006, 8:50:52 AM10/4/06
to
David Portas wrote:

> Brian Selzer wrote:
>
>>As far as performance goes, assuming that one of a or b or neither is
>>nullable, the performance of a join depends on the cardinalities of {i, a}
>>and {i, b}, the number of rows targeted in each, and the type of join--that
>>is, whether it's a loop join, a merge join, or a hash join.

Bullshit. First and foremost, the cost of the join depends on the
physical storage arrangement. As David already pointed out, one can
cluster {i, a} and {i, b} or partition {i, a, b} so that the physical
arrangement is identical. If one clusters {i, a} and {i, b}, then the
cost of the join is zero. And if one partitions {i, a, b}, then the cost
of no join at all is the same as the cost of joining {i, a} and {i, b}.
Idiot.

[further irrelevancies snipped]


> You missed my point entirely. My point was that the physical models of
> 1 and 2 might as well be identical (accepting your comments about
> foreign keys, which are a logical issue). If the physical model is
> identical then there is no reason to suppose any difference in
> performance.

He's a self-aggrandizing ignorant. They are universally skilled at
ducking points (and clues.)


>>The empty set does not contain a value; every non-empty set contains at
>>least one value. Doesn't that imply that the empty set indicates the
>>absence of a value?
>
> The empty set IS a value. Whether the empty set implies something or
> not depends on the interpretation the user puts on any proposition
> containing the empty set. It is still a value.
>
>
>>Use something similar to the empty set instead of SQL NULL. It would
>>completely eliminate 3VL altogether.
>>
>>Some operators:
>>
>>Ø = Ø is TRUE
>>x > Ø is TRUE unless x = Ø or x = +infinity
>>x < Ø is TRUE unless x = Ø or x = -infinity
>>x + Ø = x use the empty sum (0)
>>x * Ø = x use the empty product (1)
>
> So Ø looks like some value of the domain in question. If all you are
> proposing is better domain support then I agree. Support for
> user-defined domains and operators is a far better alternative to
> nulls. Unfortunately the task of defining sensible results for such a
> value for every operator could be onerous. For that reason I think the
> technique of decomposition into separate relations still has its place.

I would not let the idiot off the hook quite so easily.


>>x > Ø is TRUE unless x = Ø or x = +infinity
>>x < Ø is TRUE unless x = Ø or x = -infinity

So, if x is of a domain that is a total order, the additional 'value'
renders it not an order at all. That's a loss that is not easy to accept.


>>x + Ø = x use the empty sum (0)
>>x * Ø = x use the empty product (1)

Since when has voodoo mathematics ever benefited anyone? Consider:

(x + 1) * (Ø + 1)
= (x + 1) * (1)
= (x + 1)

(x + 1) * (Ø + 1)
= (x + 1) * Ø + (x + 1) * 1
= (x + 1) + (x + 1)
= 2 * (x + 1)

How many C-level executives do you know who will accept a number that
might be a billion dollars or might be two billion? (I mean other than
at Enron.)
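The inconsistency above is easy to mechanize: encoding the proposed rules x + Ø = x and x * Ø = x (EMPTY is a stand-in sentinel), the two evaluation orders of (x + 1) * (Ø + 1) disagree.

```python
# Encode the proposed rules: x + EMPTY = x and x * EMPTY = x.
EMPTY = object()

def add(x, y):
    if x is EMPTY: return y
    if y is EMPTY: return x
    return x + y

def mul(x, y):
    if x is EMPTY: return y
    if y is EMPTY: return x
    return x * y

x = 1_000_000_000  # a billion dollars

# Evaluate (x + 1) * (EMPTY + 1) directly...
direct = mul(add(x, 1), add(EMPTY, 1))
# ...and by distributing first: (x + 1) * EMPTY + (x + 1) * 1
distributed = add(mul(add(x, 1), EMPTY), mul(add(x, 1), 1))

print(direct, distributed)  # 1000000001 2000000002 -- distributivity fails
```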

Brian Selzer

unread,
Oct 4, 2006, 9:22:03 AM10/4/06
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1159954091.1...@m73g2000cwd.googlegroups.com...

Is the empty set a value? Yes, it is. So why can't a null be?

By your reasoning, a proposition with an integer in it is no proposition at
all; a proposition with a string in it is no proposition at all; a
proposition with a widgit in it is no proposition at all.

Also, haven't you heard of the existential quantifier? For example, if a
schema {X, Y, Z} has X as a key and Z can be null, then the sentence

EXISTS x (P(x) XOR Q(x))

where P is defined in terms of X and Y and where Q is defined in terms of
X, Y, and Z is a perfectly valid construct in FOPL.


JOG

unread,
Oct 4, 2006, 9:59:46 AM10/4/06
to
Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1159954091.1...@m73g2000cwd.googlegroups.com...
> > All of your points represent a wild goose chase in my eyes Brian. A
> > proposition with a NULL in it is no proposition at all. From a logical
> > perspective, case closed. A relation tuple with a NULL in it is no
> > relation tuple at all. From a mathematical perspective, case closed.
> > Trying to invoke the 'kludge perspective' is hardly going to convince a
> > theoretical newsgroup.
> >
>
> Is the empty set a value? Yes, it is. So why can't a null be?

Because an empty set is a value and a NULL is not.

>
> By your reasoning, a proposition with an integer in it is no proposition at
> all; a proposition with a string in it is no proposition at all; a
> proposition with a widgit in it is no proposition at all.

Poppycock. Those are all fine propositions.

>
> Also, haven't you heard of the existential quantifier?

Not in a proposition of fact, absolutely not.

This has been explained many times now. A null _indicates_ that there
is a missing/unknown/inapplicable value, it is not a value itself. A
proposition with a missing/unknown/inapplicable hole is not a
proposition.

J M Davitt

unread,
Oct 4, 2006, 9:16:45 PM10/4/06
to
Brian Selzer wrote:

[big snip]

Schemata. Instances. Information. Pairs of inverse mappings.
Laws. (And lawyers and judges?) Nullable attributes. Natural
joins. Decomposition operations. Restricted projections.
Inverse outer joins. Circular inclusion dependency.

I'm left scratching my head when, as it turns out, the "pairs
of inverse mappings" have something to do with tuples from
projections exchanging some sort of correspondence with each
other. Or not. I think it depends. Right?

I'm sorry I asked. I should have known better...

David Cressey

unread,
Oct 5, 2006, 7:21:29 AM10/5/06
to

"David Cressey" <dcre...@verizon.net> wrote in message
news:F4uTg.998$rH1.633@trndny05...
> First, let's find out where we all agree, then work our way towards where
> there are different viewpoints.
>
> Here's a starter: A schema in DKNF will have no nullable columns.
>
I got no answers to the first point, but I think we all agree.

Here's the next point.

An outer join between two relational tables will, in general, produce a
table with nullable columns.

Note: Brian Selzer has carried this point further in another subthread.

Bob Badour

unread,
Oct 5, 2006, 8:20:18 AM10/5/06
to
David Cressey wrote:

You are begging the question.

Sampo Syreeni

unread,
Oct 5, 2006, 9:17:11 AM10/5/06
to
On 2006-10-05, David Cressey wrote:

> An outer join between two relational tables will, in general, produce
> a table with nullable columns.

I'd add that the same applies if we want the relational model to be
closed under arbitrary unions. To me that seems somehow more elementary,
and it also ties neatly with inheritance modelling: even if you
represent entities of subtypes and supertypes by different relations,
sometimes it's handier to work on their union.
--
Sampo Syreeni, aka decoy - mailto:de...@iki.fi, tel:+358-50-5756111
student/math+cs/helsinki university, http://www.iki.fi/~decoy/front
openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

Bob Badour

unread,
Oct 5, 2006, 9:36:33 AM10/5/06
to
Sampo Syreeni wrote:
> On 2006-10-05, David Cressey wrote:
>
>> An outer join between two relational tables will, in general, produce
>> a table with nullable columns.
>
> I'd add that the same applies if we want the relational model to be
> closed under arbitrary unions. To me that seems somehow more elementary,
> and it also ties neatly with inheritance modelling: even if you
> represent entities of subtypes and supertypes by different relations,
> sometimes it's handier to work on their union.

You are making a great blunder. Null is totally unnecessary for handling
specialization and generalization.

Roy Hann

unread,
Oct 5, 2006, 9:55:51 AM10/5/06
to
"Sampo Syreeni" <de...@iki.fi> wrote in message
news:Pine.SOL.4.62.06...@kruuna.helsinki.fi...

> On 2006-10-05, David Cressey wrote:
>
>> An outer join between two relational tables will, in general, produce a
>> table with nullable columns.
>
> I'd add that the same applies if we want the relational model to be closed
> under arbitrary unions.

I'm not sure I do. Persuade me.

> To me that seems somehow more elementary, and it also ties neatly with
> inheritance modelling: even if you represent entities of subtypes and
> supertypes by different relations, sometimes it's handier to work on their
> union.

Really? It might mean less typing on the keyboard, but "handier"? I always
suspect an erroneous intention when I seem to need an outer join. Animal
is a supertype of bird. Why should I expect to be able to ask how many eggs
a camel lays?

Usually this kind of idiocy appears when using crappy tools that simply
assume you're OK about outer joins and their consequent spray of nulls,
instead of solving the problem properly. Report generators are a particular
object of my loathing in this respect.

Roy


Marshall

unread,
Oct 5, 2006, 1:01:36 PM10/5/06
to
On Oct 5, 6:17 am, Sampo Syreeni <d...@iki.fi> wrote:
> On 2006-10-05, David Cressey wrote:
>
> > An outer join between two relational tables will, in general, produce
> > a table with nullable columns.
>
>I'd add that the same applies if we want the relational model to be
> closed under arbitrary unions.

The inner union operator does not require anything like nulls, obeys
closure, and furthermore has more desirable algebraic properties.


Marshall
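
[Editor's note: a sketch of the "inner union" Marshall mentions, assuming the usual relational-lattice definition — project each relation onto the attributes they share, then take the ordinary set union. The relation names and data are invented; non-empty relations are assumed.]

```python
# Inner union: project both relations onto their common attributes,
# then take the set union.  No padding of missing columns, hence no nulls.

def inner_union(r, s):
    # r, s: non-empty lists of tuples represented as dicts,
    # with identical keys within each relation
    common = set(next(iter(r)).keys()) & set(next(iter(s)).keys())
    project = lambda rel: {tuple(sorted((k, t[k]) for k in common)) for t in rel}
    return project(r) | project(s)

birds  = [{"id": 1, "name": "robin",     "eggs_laid": 3}]
camels = [{"id": 2, "name": "dromedary", "humps": 1}]

# Only the attributes common to both relations survive: id and name.
print(sorted(inner_union(birds, camels)))
```

Closure holds because the result is again a relation (over the common attributes), unlike an outer join, whose padded result is not.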

Sampo Syreeni

unread,
Oct 5, 2006, 1:36:34 PM10/5/06
to
On 2006-10-05, Roy Hann wrote:

> I always suspect an erroneous intention when I seem to need an outer
> join. Animal is a supertype of bird. Why should I expect to be able to
> ask how many eggs a camel lays?

Suppose you want a list of all animals which lay few to no eggs, with no
eggs taken to include the inapplicability of the whole concept of laying
anything. Furthermore, at the time your code is expected to be
finalized, you only know that there will be animals, some of which lay
eggs, but you don't know which kinds of properties all of the animals
eventually described in the database might have. Structural multiple
inheritance in its full generality is a factor, so your DBA insists that
the joint presence and absence of attributes must be efficiently
enforceable. At the current moment it must be that any animal which lays
eggs also has a known wingspan, but unbeknownst to you, two years down
the road a second kind of egg-laying animal will enter the database: the
one with a known extinction date but no wingspan. Eventually your
listing has to include both kinds of animals because both do lay eggs,
but your code can no longer be changed at the time it finally dawns on
the data architect that extinct species have to be included as well.

Evidently you can make all of this work by querying a union view of all
animals, with nulls for the eggs_laid attribute for those species that
don't lay anything, from the start.

You cannot implement the view without the nulls because the laying stuff
is not applicable to all animals, and you want both the animals which
lay few eggs and those to which the concept of laying eggs is
inapplicable. You cannot factor out the number of eggs laid into its own
relation because then it would be difficult to enforce the constraint
that it occurs in conjunction with either wingspan or extinction date,
but not both; each indivisible group of attributes has to live in a
single relation, while partial overlaps between the sets of attributes
possessed by each type can cause each individual attribute to live in
multiple relations at the same time. Hence, at the time you compile your
routine, you cannot know the name of the relation eventually containing
the extinct species, so you cannot work around the problem by taking a
union of projections in your code; in the interests of logical data
independence, the union has to be on the database side. In principle you
could use two views, one summing up all the animals with eggs_laid and
one summing up all the ones without, but this approach seems rather
inelegant to me, if applied more than once. Finally, it just so happens
that your DBMS/DML doesn't support introspection, so that you cannot
find all the relations containing the eggs_laid attribute dynamically,
by catalog lookup.

The example is somewhat contrived, I know, and I'm not completely sure
one couldn't enforce the typing constraints in some less cumbersome
fashion. But still, I find this sort of thing interesting, if not else
then because it gives a nontrivial example of horizontal splitting.

> Usually this kind of idiocy appears when using crappy tools that
> simply assume you're OK about outer joins and their consequent spray
> of nulls, instead of solving the problem properly. Report generators
> are a particular object of my loathing in this respect.

Agreed. Nulls aren't exactly an ease-of-use feature.

Bob Badour

unread,
Oct 5, 2006, 2:00:22 PM10/5/06
to
Sampo Syreeni wrote:

> On 2006-10-05, Roy Hann wrote:
>
>> I always suspect an erroneous intention when I seem to need an outer
>> join. Animal is a supertype of bird. Why should I expect to be able to
>> ask how many eggs a camel lays?
>
>
> Suppose you want a list of all animals which lay few to no eggs, with no
> eggs taken to include the inapplicability of the whole concept of laying
> anything.

So, you are saying that laying zero eggs includes the inapplicability of
the whole concept of laying anything. If zero suffices, why NULL?


> Furthermore, at the time your code is expected to be
> finalized, you only know that there will be animals, some of which lay
> eggs, but you don't know which kinds of properties all of the animals
> eventually described in the database might have.

Ah, but one will have to know what those are before one commits the
schema and one will have to know the properties of interest before one
writes a query.

[further long-winded ramblings snipped]

Sampo Syreeni

unread,
Oct 5, 2006, 5:21:17 PM10/5/06
to
On 2006-10-05, Bob Badour wrote:

> So, you are saying that laying zero eggs includes the inapplicability
> of the whole concept of laying anything. If zero suffices, why NULL?

Because a bird that has laid zero eggs might then be observed to lay
more, and we would want to assign a nonzero value to its eggs_laid. With
camels it's different.

>> Furthermore, at the time your code is expected to be finalized, you
>> only know that there will be animals, some of which lay eggs, but you
>> don't know which kinds of properties all of the animals eventually
>> described in the database might have.
>
> Ah, but one will have to know what those are before one commits the schema
> and one will have to know the properties of interest before one writes a
> query.

The whole point of my exercise was that judicious use of nulls allows
you to escape the first half, at least in this one case. In my opinion
such separation of concerns is a worthy design goal, because schemas
tend not to be entirely static and added dependent attributes are
perhaps the most common form of real life schema evolution.

Hugo Kornelis

unread,
Oct 5, 2006, 5:59:03 PM10/5/06
to
On Fri, 6 Oct 2006 00:21:17 +0300, Sampo Syreeni wrote:

>On 2006-10-05, Bob Badour wrote:
>
>> So, you are saying that laying zero eggs includes the inapplicability
>> of the whole concept of laying anything. If zero suffices, why NULL?
>
>Because a bird that has laid zero eggs might then be observed to lay
>more, and we would want to assign a nonzero value to its eggs_laid. With
>camels it's different.

Hi Sampo,

So basically you are saying that you want to store the fact that a bird
is able to lay eggs and a camel is not. A proposition such as
"<Birdtype> is <YesOrNo> able to lay eggs."

In that case, you should model it as such - in a relational table, you
add a column "CanLayEggs" with a domain consisting of the values Yes and
No.

Don't try to make NULL mean "there may not be a value here" - it doesn't
mean that. The *only* meaning NULL has is "value is missing". If you are
interested in the reason why a value is missing, you should model a
proposition to store that reason.

Best, Hugo
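
[Editor's note: Hugo's suggestion can be sketched directly. The table and column names are invented for illustration; the point is that applicability becomes an explicit Yes/No fact rather than something inferred from a NULL. SQLite via Python's sqlite3 is used here only as a convenient test bed.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE animal_type (
        name         TEXT PRIMARY KEY,
        can_lay_eggs TEXT NOT NULL CHECK (can_lay_eggs IN ('Y', 'N'))
    )""")
con.executemany("INSERT INTO animal_type VALUES (?, ?)",
                [("robin", "Y"), ("camel", "N")])

# The reason an eggs_laid figure is absent is now queryable directly:
rows = con.execute(
    "SELECT name FROM animal_type WHERE can_lay_eggs = 'N'").fetchall()
print(rows)  # camels simply cannot lay eggs; no NULL needed to say so
```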

J M Davitt

unread,
Oct 5, 2006, 6:16:05 PM10/5/06
to
David Cressey wrote:
> "David Cressey" <dcre...@verizon.net> wrote in message
> news:F4uTg.998$rH1.633@trndny05...
>
>>First, let's find out where we all agree, then work our way towards where
>>there are different viewpoints.
>>
>>Here's a starter: A schema in DKNF will have no nullable columns.
>>
>
> I got no answers to the first point, but I think we all agree.

(I guess my contribution was lost in the ether.)

>
> Here's the next point.
>
> An outer join between two relational tables will, in general, produce a
> table with nullable columns.

Wait a minute: what does OUTER JOIN have to do with
relational theory? Sure, SQL offers an OUTER JOIN,
but...

There are *lots* of things I don't like about "outer
join between two relational tables:"
- "outer" join
What's outer about a join?
- join "between"
"On" or "of," sure; but "between?" Seems strange.
- "relational tables"
Relations or tables: they're different. Pick one.
- "in general"
Very vague.
- "[A] table with nullable columns..."
...is not a relation.

> Note: Brian Selzer has carried this point further in another subthread.

Yes, he has.

JOG

unread,
Oct 5, 2006, 7:59:39 PM10/5/06
to
On Oct 5, 10:21 pm, Sampo Syreeni <d...@iki.fi> wrote:
> On 2006-10-05, Bob Badour wrote:
>
> > So, you are saying that laying zero eggs includes the inapplicability
> > of the whole concept of laying anything. If zero suffices, why NULL?
>
> Because a bird that has laid zero eggs might then be observed to lay
> more, and we would want to assign a nonzero value to its eggs_laid. With
> camels it's different.
>
> >> Furthermore, at the time your code is expected to be finalized, you
> >> only know that there will be animals, some of which lay eggs, but you
> >> don't know which kinds of properties all of the animals eventually
> >> described in the database might have.
>
> > Ah, but one will have to know what those are before one commits the schema
> > and one will have to know the properties of interest before one writes a
> > query.
>
> The whole point of my exercise was that judicious use of nulls allows
> you to escape the first half, at least in this one case. In my opinion
> such separation of concerns is a worthy design goal, because schemas
> tend not to be entirely static and added dependent attributes are
> perhaps the most common form of real life schema evolution.
> --
> Sampo Syreeni, aka decoy - mailto:d...@iki.fi, tel:+358-50-5756111
> student/math+cs/helsinki university, http://www.iki.fi/~decoy/front

> openpgp: 050985C2/025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2

If you had two relations:

Animals (id [PK], name)
Egg_layers (id [FK], eggs_laid)

and you want a count of animals that lay few eggs (assuming a few eggs
is 5 or less) or aren't egg laying at all, would the following not do
the job without any null kludges?

SELECT COUNT(1)
FROM Animals
ANTIJOIN Egg_layers
ON Animals.id = Egg_layers.id
WHERE Egg_layers.eggs_laid > 5

Perhaps I have misunderstood the issue, given my confusion at this darn
new google groups interface...

Bob Badour

unread,
Oct 5, 2006, 9:17:37 PM10/5/06
to
Sampo Syreeni wrote:

> On 2006-10-05, Bob Badour wrote:
>
>> So, you are saying that laying zero eggs includes the inapplicability
>> of the whole concept of laying anything. If zero suffices, why NULL?
>
> Because a bird that has laid zero eggs might then be observed to lay
> more, and we would want to assign a nonzero value to its eggs_laid. With
> camels it's different.

Are you now suggesting you want to use NULL to indicate that a bird laid
an egg? That seems absurd.


>>> Furthermore, at the time your code is expected to be finalized, you
>>> only know that there will be animals, some of which lay eggs, but you
>>> don't know which kinds of properties all of the animals eventually
>>> described in the database might have.
>>
>>
>> Ah, but one will have to know what those are before one commits the
>> schema and one will have to know the properties of interest before one
>> writes a query.
>
>
> The whole point of my exercise was that judicious use of nulls allows
> you to escape the first half, at least in this one case.

Bullshit.


> In my opinion
> such separation of concerns is a worthy design goal,

What concerns? The concerns for laziness and stupidity? Null certainly
has no use either for the concern for correctness or for the concern for
performance.


> because schemas
> tend not to be entirely static and added dependent attributes are
> perhaps the most common form of real life schema evolution.

I suggest updatable views handle schema evolution while NULL contributes
nothing useful.

Brian Selzer

unread,
Oct 6, 2006, 4:32:36 AM10/6/06
to

"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1159970386....@i42g2000cwa.googlegroups.com...

> Brian Selzer wrote:
>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>> news:1159954091.1...@m73g2000cwd.googlegroups.com...
>> > All of your points represent a wild goose chase in my eyes Brian. A
>> > proposition with a NULL in it is no proposition at all. From a logical
>> > perspective, case closed. A relation tuple with a NULL in it is no
>> > relation tuple at all. From a mathematical perspective, case closed.
>> > Trying to invoke the 'kludge perspective' is hardly going to convince a
>> > theoretical newsgroup.
>> >
>>
>> Is the empty set a value? Yes, it is. So why can't a null be?
>
> Because an empty set is a value and a NULL is not.
>

Why not?

>>
>> By your reasoning, a proposition with an integer in it is no proposition
>> at
>> all; a proposition with a string in it is no proposition at all; a
>> proposition with a widgit in it is no proposition at all.
>
> poppycock. Those are all fine propositions.
>
>>
>> Also, haven't you heard of the existential quantifier?
>
> Not in a proposition of fact, absolutely not.
>
> This has been explained many times now. A null _indicates_ that there
> is a missing/unknown/inapplicable value, it is not a value itself. A
> proposition with a missing/unknown/inapplicable hole is not a
> proposition.
>

The empty set /indicates/ the absence of a value, yet it /is/ a value; a
null /indicates/ the absence of a value, yet it /isn't/ a value? Why the
double standard?

Brian Selzer

unread,
Oct 6, 2006, 4:32:13 AM10/6/06
to

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:1159882356.4...@i42g2000cwa.googlegroups.com...

>Brian Selzer wrote:
>>
>> As far as performance goes, assuming that one of a or b or neither is
>> nullable, the performance of a join depends on the cardinalities of {i,
>> a}
>> and {i, b}, the number of rows targeted in each, and the type of
>> join--that
>> is, whether it's a loop join, a merge join, or a hash join. A loop join
>> would require a scan or an index seek for each row (presumably in the
>> table
>> with the smaller cardinality). A merge join would require that {i, a}
>> and
>> {i, b} be presorted or sorted on the fly--provided, of course, that the
>> domain for i has a partial order defined on it. A hash join would
>> require
>> either (1) that a hash index be maintained, or (2) that a hash be
>> computed
>> on the fly during a scan. None of this extra computation would be
>> required
>> for {i, a, b}. I would argue then that your statement is only true if
>> {i,
>> a} and {i, b} are presorted (as is the case when a clustered index is
>> defined in Sql Server), or if the cardinalities of {i, a} or {i, b} or
>> both
>> are small.

>
>You missed my point entirely. My point was that the physical models of
>1 and 2 might as well be identical (accepting your comments about
>foreign keys, which are a logical issue). If the physical model is
>identical then there is no reason to suppose any difference in
>performance.
>

I agree. If the physical models are identical, then the performance would
be similar. But common sense argues against the idea that the physical
models should be identical. If {i, a, b} and {i, a}, {i, b} are equivalent,
then since i, a and b are of the same types, it would take at least 1/3
extra storage in separate tables because each i would occur twice.

>
>> The empty set does not contain a value; every non-empty set contains at
>> least one value. Doesn't that imply that the empty set indicates the
>> absence of a value?
>
>The empty set IS a value. Whether the empty set implies something or
>not depends on the interpretation the user puts on any proposition
>containing the empty set. It is still a value.
>
>
>> Use something similar to the empty set instead of SQL NULL. It would
>> completely eliminate 3VL altogether.
>>
>> Some operators:
>>
>> Ø = Ø is TRUE
>> x > Ø is TRUE unless x = Ø or x = +infinity
>> x < Ø is TRUE unless x = Ø or x = -infinity
>> x + Ø = x use the empty sum (0)
>> x * Ø = x use the empty product (1)
>
>So Ø looks like some value of the domain in question. If all you are
>proposing is better domain support then I agree. Support for
>user-defined domains and operators is a far better alternative to
>nulls. Unfortunately the task of defining sensible results for such a
>value for every operator could be onerous. For that reason I think the
>technique of decomposition into separate relations still has its place.
>

I agree that decomposition into separate relations still has its place. I
just don't think that nulls should be dismissed arbitrarily.

After further consideration (prompted by Bob's harangue), I think that the
results of some of the operators enumerated above are not sensible. Ø does
not belong to any numeric domain, and you can't add apples and oranges, but
on the other hand, there is only one empty set, so Ø = Ø should be TRUE.

>--
>David Portas


Brian Selzer

unread,
Oct 6, 2006, 4:33:35 AM10/6/06
to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:M6OUg.80$cz....@ursa-nb00s0.nbnet.nb.ca...

> David Portas wrote:
>
>> Brian Selzer wrote:
>>
>>>As far as performance goes, assuming that one of a or b or neither is
>>>nullable, the performance of a join depends on the cardinalities of {i,
>>>a}
>>>and {i, b}, the number of rows targeted in each, and the type of
>>>join--that
>>>is, whether it's a loop join, a merge join, or a hash join.
>
> Bullshit. First and foremost, the cost of the join depends on the physical
> storage arrangement. As David already pointed out, one can cluster {i, a}
> and {i, b} or partition {i, a, b} so that the physical arrangement is
> identical. If one clusters {i, a} and {i, b}, then the cost of the join is
> zero. And if one partitions {i, a, b}, then the cost of no join at all is
> the same as the cost of joining {i, a} and {i, b}. Idiot.
>
> [further irrelevancies snipped]

Not true. The cost of the join is not zero. There are trade-offs when
choosing a physical design. Since i is duplicated, 33% more storage is
needed for {i, a} and {i, b} than for {i, a, b}. So if you have 10,000,000
rows, and each i takes 4 bytes, then even if they're clustered, the join of
{i, a} and {i, b} would require 160MB to be read instead of 120MB. And that
doesn't even include the additional storage needed for index overhead. So
by choosing to partition {i, a, b} you're stating that you're willing to
accept at least a 33% decrease in performance. That just doesn't make any
sense.

I hadn't considered that. The presence of Ø throws a wrench into the
distributive property. Hmm. I guess substituting a typed value for Ø is
the wrong approach, but I think that x + Ø = Ø is also incorrect. Perhaps
it would be better to just ignore Ø.
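
[Editor's note: Brian's storage arithmetic above, under his stated assumptions (10,000,000 rows, 4 bytes for each of i, a, and b, no page or index overhead), can be checked back-of-envelope. As David Portas's reply points out, a physical design need not actually store i twice, so this is only the naive row layout.]

```python
ROWS = 10_000_000
COL_BYTES = 4          # i, a and b each assumed to occupy 4 bytes
MB = 1_000_000

combined = 3 * COL_BYTES * ROWS      # one table {i, a, b}: 3 columns per row
split    = 2 * 2 * COL_BYTES * ROWS  # two tables {i, a} and {i, b}: i duplicated

print(combined // MB, split // MB)   # 120 160, matching the figures in the post
print(round(split / combined - 1, 3))  # storing i twice costs ~33% more I/O
```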


Brian Selzer

unread,
Oct 6, 2006, 5:32:08 AM10/6/06
to

"J M Davitt" <jda...@aeneas.net> wrote in message
news:12ZUg.8729$pq4....@tornado.ohiordc.rr.com...

I don't think it depends. In the above example, if S and T do not "exchange
some sort of correspondence with each other," then database schema D
containing only R is broken, meaning that there are some legal combinations
of values for S and T that cannot be represented in a legal value for R
(without using nulls, of course).

Brian Selzer

unread,
Oct 6, 2006, 5:39:23 AM10/6/06
to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:R8iVg.1104$cz.1...@ursa-nb00s0.nbnet.nb.ca...

How? Not every view is updatable. Also, updatable views can violate the
assignment principle.


David Portas

unread,
Oct 6, 2006, 7:16:43 AM10/6/06
to
Brian Selzer wrote:
>
> I agree. If the physical models are identical, then the performance would
> be similar. But common sense argues against the idea that the physical
> models should be identical. If {i, a, b} and {i, a}, {i, b} are equivalent,
> then since i, a and b are of the same types, it would take at least 1/3
> extra storage in separate tables because each i would occur twice.
>

That is pure speculation. When I said identical I meant precisely that.
The only difference is in the metadata. There is no reason at all to
store any values of i twice. Why should there be? If it is the same key
value of the same type then it seems perfectly sensible to store it
only once regardless of how many relations it appears in.

If you want to speculate some more then consider that only two bits of
metadata may need to be added per tuple, which is presumably exactly
the same as would be used to support nullable columns.


> I agree that decomposition into separate relations still has its place. I
> just don't think that nulls should be dismissed arbitrarily.

I have not dismissed nulls arbitrarily. There are good reasons to
reject SQL's definition of a null. I'm still waiting for your
definition. Apparently you don't have one for numerics so dare I ask if
you have one for boolean? Do you have a null for a graphics image type
or a date type or a cartesian co-ordinate? How does your null behave
under the operators for those types? I think you have your work cut out
here...

--
David Portas

David Cressey

unread,
Oct 6, 2006, 7:21:16 AM10/6/06
to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:6M6Vg.656$cz....@ursa-nb00s0.nbnet.nb.ca...

I'm not sure which question I'm begging.

Please note that I did NOT assert that outer join is a relational operation
or that the result of an outer join is a relational table.

David Cressey

unread,
Oct 6, 2006, 7:24:04 AM10/6/06
to

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:JbqdnVhWQvB...@pipex.net...

> Usually this kind of idiocy appears when using crappy tools that simply
> assume you're OK about outer joins and their consequent spray of nulls,
> instead of solving the problem properly. Report generators are a
particular
> object of my loathing in this respect.

Either that or people want to read crappy reports. How much or how little
you loathe the way ordinary people approach the data is as much a matter of
culture as it is of logic.


Roy Hann

unread,
Oct 6, 2006, 7:57:42 AM10/6/06
to
"David Cressey" <dcre...@verizon.net> wrote in message
news:o1rVg.851$Ye.584@trndny04...

>
> "Roy Hann" <spec...@processed.almost.meat> wrote in message
> news:JbqdnVhWQvB...@pipex.net...
>
> Either that or people want to read crappy reports. How much or how little
> you loathe the way ordinary people approach the data is as much a matter
> of
> culture as it is of logic.

Actually I have no particular problem with the printed result. People,
including me, are perfectly accustomed to see printed reports with lacunae
and whitespace, and we make of it what we will. However the whitespace is
not null, it is just whitespace, and crucially that is where it ends. The
report is not a table in a database and there is no means nor temptation to
misuse it as the input to further productions within the database.

Roy


JOG

unread,
Oct 6, 2006, 8:07:02 AM10/6/06
to
Brian Selzer wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote in message
> news:1159970386....@i42g2000cwa.googlegroups.com...
> > Brian Selzer wrote:
> >> "JOG" <j...@cs.nott.ac.uk> wrote in message
> >> news:1159954091.1...@m73g2000cwd.googlegroups.com...
> >> > All of your points represent a wild goose chase in my eyes Brian. A
> >> > proposition with a NULL in it is no proposition at all. From a logical
> >> > perspective, case closed. A relation tuple with a NULL in it is no
> >> > relation tuple at all. From a mathematical perspective, case closed.
> >> > Trying to invoke the 'kludge perspective' is hardly going to convince a
> >> > theoretical newsgroup.
> >> >
> >>
> >> Is the empty set a value? Yes, it is. So why can't a null be?
> >
> > Because an empty set is a value and a NULL is not.
> >
>
> Why not?

Because that is how it is defined. This is not an opinion, it is a fact.

"The person with id:1 has NAME:Brian and has AGE:NULL"
...is not a proposition. It is a kludge to squash two propositions into
one, representing:

The person with ID:1 has NAME:Brian.
AND
The person with ID:1 has a MISSING/INAPPLICABLE/UNKNOWN: age.

Clearly the value in question is actually "age".

I worry that you do not actually want to get your head round this
admittedly subtle issue, but are rather just being polemic. Hopefully
not, because from a theory point of view this is all there is to it.

>
> >>
> >> By your reasoning, a proposition with an integer in it is no proposition
> >> at
> >> all; a proposition with a string in it is no proposition at all; a
> >> proposition with a widgit in it is no proposition at all.
> >
> > poppycock. Those are all fine propositions.
> >
> >>
> >> Also, haven't you heard of the existential quantifier?
> >
> > Not in a proposition of fact, absolutely not.
> >
> > This has been explained many times now. A null _indicates_ that there
> > is a missing/unknown/inapplicable value, it is not a value itself. A
> > proposition with a missing/unknown/inapplicable hole is not a
> > proposition.
> >
>
> The empty set /indicates/ the absence of a value, yet it /is/ a value; a
> null /indicates/ the absence of a value, yet it /isn't/ a value? Why the
> double standard?

Wrong, wrong, wrong! The empty set is a value by itself, not an
indicator of anything. Any set is a value, regardless of its contents.
/Please/ dive into some mathematical text books if you still find this
genuinely confusing.
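
[Editor's note: JOG's decomposition of the NULL-ridden row into two complete propositions can be sketched as two NULL-free relations. Table names and sample data are invented; SQLite via sqlite3 serves as the test bed.]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE person     (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE person_age (id INTEGER PRIMARY KEY REFERENCES person,
                             age INTEGER NOT NULL);
    INSERT INTO person VALUES (1, 'Brian'), (2, 'Karen');
    INSERT INTO person_age VALUES (2, 35);  -- Brian's age simply not asserted
""")

# Every stored row is now a complete proposition; "age unknown" is
# expressed by the absence of a person_age row, not by a NULL marker.
known = con.execute("""
    SELECT p.name, a.age FROM person p JOIN person_age a ON a.id = p.id
""").fetchall()
print(known)
```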

Bob Badour

unread,
Oct 6, 2006, 8:44:01 AM10/6/06
to
David Cressey wrote:

You assumed outer join involves NULL to conclude outer join involves NULL.

See: http://en.wikipedia.org/wiki/Begging_the_question

David Cressey

unread,
Oct 6, 2006, 8:47:03 AM10/6/06
to

"J M Davitt" <jda...@aeneas.net> wrote in message
news:FufVg.4131$Cq3....@tornado.ohiordc.rr.com...

> David Cressey wrote:
> > "David Cressey" <dcre...@verizon.net> wrote in message
> > news:F4uTg.998$rH1.633@trndny05...
> >
> >>First, let's find out where we all agree, then work our way towards
where
> >>there are different viewpoints.
> >>
> >>Here's a starter: A schema in DKNF will have no nullable columns.
> >>
> >
> > I got no answers to the first point, but I think we all agree.
>
> (I guess my contribution was lost in the ether.)
>
> >
> > Here's the next point.
> >
> > An outer join between two relational tables will, in general, produce a
> > table with nullable columns.
>
> Wait a minute: what does OUTER JOIN have to do with
> relational theory? Sure, SQL offers an OUTER JOIN,
> but...

Good point. As I pointed out to Bob Badour, I have NOT asserted that outer
join is a relational operator or that the result of an outer join is a
relational table.

I'm just trying to establish the common ground, before we get to areas of
disagreement.


>
> There are *lots* of things I don't like about "outer
> join between two relational tables:"
> - "outer" join
> What's outer about a join?

I'm just using the term I'm given.

> - join "between"
> "On" or "of," sure; but "between?" Seems strange.

I don't get this objection.

> - "relational tables"
> Relations or tables: they're different. Pick one.

I disagree. The term "relational tables" is very widespread in writings on
the relational model. I'm not asserting that a relational table IS a
relation. But I am asserting that a relational table REPRESENTS a relation.
Not all tables are relational.

> - "in general"
> Very vague.

This was just to avoid going down a rat hole. You can come up with an outer
join that doesn't produce any nullable columns, but this is the trivial
case. The general case of outer join will produce nullable columns.


> - "[A] table with nullable columns..."
> ...is not a relation.

I'm coming to that.

David Cressey

unread,
Oct 6, 2006, 8:59:15 AM10/6/06
to

"Brian Selzer" <br...@selzer-software.com> wrote in message
news:hwoVg.7861$TV3....@newssvr21.news.prodigy.com...

> I agree that decomposition into separate relations still has its place.
I
> just don't think that nulls should be dismissed arbitrarily.
>
> After further consideration (prompted by Bob's harangue), I think that the
> results of some of the operators enumerated above are not sensible. Ø does
> not belong to any numeric domain, and you can't add apples and oranges,
but
> on the other hand, there is only one empty set, so Ø = Ø should be TRUE.
>

I've been following your discussion on this subject, and I largely agree
with the major points you've been making. (Unusual, since we've disagreed
in the past).

I also agree with the above, that nulls should not be dismissed arbitrarily.
In another subthread, I'm trying to develop the theme one point at a time.
Where I actually get to depends on the responses I get.

But here's where I anticipate getting to: as a theoretical tool, nulls are
unnecessary. A system without nulls can be just as expressive as one with
nulls. As a practical matter nulls are just about indispensable. A system
built with no accommodation for nulls just isn't going to work as practically
as one that allows for nulls.

Where I'm very uncertain is whether admitting nulls inescapably leads to
3VL. I think not. But I'm not there yet.

But on nulls and the empty set. The empty set is very clearly a value.
Null is very clearly not a value. Using the empty set in place of a null is
very clearly the road to confusion.

There is one point I'm confused on: what is the domain of the empty set?
does it even have a domain? To me, the empty set of character strings is
not "the same thing" as the empty set of integers. But I may be thinking
like a computer person and not like a mathematician.


Cimode

unread,
Oct 6, 2006, 9:07:17 AM10/6/06
to

Brian Selzer wrote:
> >> Is the empty set a value? Yes, it is. So why can't a null be?
> >
> > Because an empty set is a value and a NULL is not.
> >
>
> Why not?

> The empty set /indicates/ the absence of a value, yet it /is/ a value; a


> null /indicates/ the absence of a value, yet it /isn't/ a value? Why the
> double standard?

The *absence of a value* (known as missing data) certainly is not a
value. A value is the output of a function. Since missing data stands
for several possible outputs, it is not one value but several possible
values. The *unpredictability* of missing data disqualifies it as a value.

The empty set OTOH perfectly qualifies as a value as it is the output
of a function that can be defined and predicted.

Bob Badour

unread,
Oct 6, 2006, 9:29:53 AM10/6/06
to
David Cressey wrote:

The empty set is the empty set. Its most specific type is a subtype of
every set type, and has only one proper subtype: the universal subtype.

In general, a set can contain strings, integers, cars and trees or
anything else. When you speak of a string set or an integer set, you are
implicitly defining a proper subtype of set via specialization by
constraint.

David Cressey

unread,
Oct 6, 2006, 9:36:18 AM10/6/06
to

"Roy Hann" <spec...@processed.almost.meat> wrote in message
news:k9udnQPr6v8...@pipex.net...

Well put. I particularly like the use of "lacunae" and the phrase "make of
it what you will".

for the sake of example:

We want a report of each salesman with some details for each sale the
salesman closed during the month.
What about salesmen that closed no sales during the month?
Yeah, they should be included.
What should we put in for sales details in this case?
Nothing.

First point: when I first started producing reports like this, I didn't
have an outer join available to me. So I had to do it the clumsy way, by
taking the union of the inner join and the set of salesmen without any
sales. I suspect that this union is just as "non relational" as an outer
join is, but I'm going to defer to the mathematicians in the group on that
score.
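The clumsy-way construction described above (the union of the inner join with the NULL-padded salesmen who closed no sales) can be sketched in Python with the sqlite3 module. The salesman/sale schema and the data here are hypothetical, just to show that both formulations return the same rows:

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE salesman (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sale (id INTEGER PRIMARY KEY, salesman_id INTEGER, amount REAL);
INSERT INTO salesman VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO sale VALUES (10, 1, 250.0);
""")

# The modern way: a left outer join pads the no-sales salesman with NULL.
outer = cur.execute("""
    SELECT s.name, sa.amount
    FROM salesman s LEFT OUTER JOIN sale sa ON sa.salesman_id = s.id
    ORDER BY s.name
""").fetchall()

# The clumsy way: inner join, unioned with the salesmen having no sales.
unioned = cur.execute("""
    SELECT s.name, sa.amount
    FROM salesman s JOIN sale sa ON sa.salesman_id = s.id
    UNION ALL
    SELECT s.name, NULL
    FROM salesman s
    WHERE NOT EXISTS (SELECT 1 FROM sale sa WHERE sa.salesman_id = s.id)
    ORDER BY 1
""").fetchall()

print(outer)            # [('Ann', 250.0), ('Bob', None)]
print(outer == unioned) # True
```

Either way, the lacuna shows up as a NULL (Python's None) in the report row for the salesman with no sales.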

Second point: I can't recall EVER using an outer join, other than in the
context of generating a report, or perhaps some extract that, for our
purposes here, can be considered the moral equivalent of a report.

Third point. I extend your concept of "lacunae" to NULLS in tables. A
NULL in a table simply indicates a lacuna in the data. It means nothing
more than whitespace in a report. Make of it what you will. It's the
database designs that assign a "conventional meaning" to a NULL, as if it
were a message, rather than no message, that get into trouble, IMO.

Having said that, the question arises of what to do when an application
tries to do an insert, without providing a value for every column in the
table. Some people would have the database server always reject such an
input. Others would have the database server always substitute a default
value, drawn from the domain of possible values for that column and
specified somewhere in the metadata. Others would have the insert always
succeed, and put a NULL in where no value was specified.

My answer is: It depends. What are you trying to do with the data, and
how do you want your systems to behave?

More later. In the meantime, again, well put, Roy.


David Cressey

unread,
Oct 6, 2006, 10:15:12 AM10/6/06
to

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:1160133403.2...@m73g2000cwd.googlegroups.com...


> I have not dismissed nulls arbitrarily. There are good reasons to
> reject SQL's definition of a null. I'm still waiting for your
> definition. Apparently you don't have one for numerics so dare I ask if
> you have one for boolean? Do you have a null for a graphics image type
> or a date type or a cartesian co-ordinate? How does your null behave
> under the operators for those types? I think you have your work cut out
> here...

I don't know what Brian's going to say, but I'll give you my definition.

A null is a marker indicating the absence of a value, in a place where a
value might normally have been expected.

A null is not a value. A null has no type. And to borrow Roy's phrase,

Bob Badour

unread,
Oct 6, 2006, 10:22:28 AM10/6/06
to
David Cressey wrote:

Roy's phrase has relevance and utility for reports where a human reads
the report and where lacunae and whitespace have utility related to
primate psychology. A dbms is a formal logic system where it has no
place at all.

David Cressey

unread,
Oct 6, 2006, 3:03:57 PM10/6/06
to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:EEtVg.1274$cz.1...@ursa-nb00s0.nbnet.nb.ca...


>... A dbms is a formal logic system ...

Bullshit.


Bob Badour

unread,
Oct 6, 2006, 3:12:01 PM10/6/06
to
David Cressey wrote:

Idiot. Plonk.

Cimode

unread,
Oct 6, 2006, 3:55:34 PM10/6/06
to
Without knowing it, Barking Dog Prime means that he conceives a dbms as
a formal logic system that should handle missing data in a
deterministic manner, as opposed to the non-deterministic method
embodied by SQL NULLs. On that, he is 100% right and you are wrong.

The only flaw in Bob's reasoning is that he simply cannot imagine that
such a formal system has one purpose: to bridge the gap between the
subjective and collective interpretation of data and the formal
structural representation of data. In the precise case of missing
data, he cannot conceive that a perfectly relational system should
satisfy both the conditions of formalism and of subjectivity to be
successful. As a consequence, he cannot accept for a second that even
if they are fundamentally wrong on a logical level (with all the
consequences we know of), NULLs are supremely successful at
representing subjectivity in a system....

So do me a favor, do not engage him; he is a lost cause for inquisitive
and questioning purposes...

dawn

unread,
Oct 6, 2006, 4:21:07 PM10/6/06
to

JOG wrote:
> Brian Selzer wrote:
> > "JOG" <j...@cs.nott.ac.uk> wrote in message
> > news:1159954091.1...@m73g2000cwd.googlegroups.com...
> > > All of your points represent a wild goose chase in my eyes Brian. A
> > > proposition with a NULL in it is no proposition at all. From a logical
> > > perspective, case closed. A relation tuple with a NULL in it is no
> > > relation tuple at all. From a mathematical perspective, case closed.
> > > Trying to invoke the 'kludge perspective' is hardly going to convince a
> > > theoretical newsgroup.
> > >
> >
> > Is the empty set a value? Yes, it is. So why can't a null be?
>
> Because an empty set is a value and a NULL is not.

This is entirely a matter of definition. I have worked with "NULL
values" and with an SQL NULL. The first is a value. The way it might
be handled in a 2VL language that works with such values could be
modeled with the empty set. If you are talking about the SQL NULL or
if you define the relational model to include 3VL (which it seems many
theorists do not), then a NULL is defined as a non-value.

So, your above statement is entirely about the definition of NULL,
right?

Given the definition of NULL that I typically use (with non-SQL based
solutions), NULL is a value and can be modeled mathematically with the
empty set. In that case, a relation tuple with a NULL is as valid
mathematically as one without. Agreed? --dawn

Hugo Kornelis

unread,
Oct 6, 2006, 5:35:05 PM10/6/06
to
On Fri, 06 Oct 2006 08:32:36 GMT, Brian Selzer wrote:

>
>"JOG" <j...@cs.nott.ac.uk> wrote in message
>news:1159970386....@i42g2000cwa.googlegroups.com...
>> Brian Selzer wrote:
>>> "JOG" <j...@cs.nott.ac.uk> wrote in message
>>> news:1159954091.1...@m73g2000cwd.googlegroups.com...
>>> > All of your points represent a wild goose chase in my eyes Brian. A
>>> > proposition with a NULL in it is no proposition at all. From a logical
>>> > perspective, case closed. A relation tuple with a NULL in it is no
>>> > relation tuple at all. From a mathematical perspective, case closed.
>>> > Trying to invoke the 'kludge perspective' is hardly going to convince a
>>> > theoretical newsgroup.
>>> >
>>>
>>> Is the empty set a value? Yes, it is. So why can't a null be?
>>
>> Because an empty set is a value and a NULL is not.
>>
>
>Why not?

Hi Brian,

Because relational databases supporting NULL *define* it as a marker
denoting the absence of a value. Dawn actually makes a good point about
context: in C for instance, NULL has a completely different meaning.

But since this is a discussion about relational databases, I'll assume
the standard definition of NULL for relational databases unless you
specifically state otherwise.

(snip)


>The empty set /indicates/ the absence of a value, yet it /is/ a value; a
>null /indicates/ the absence of a value, yet it /isn't/ a value? Why the
>double standard?

Wrong. The empty set *IS* a value. Its domain is the domain of sets. A
set is a value that can hold zero, one, or more values of a specified
domain. The empty set happens to hold zero values.

NULL is a marker that says "no value here". If NULL would appear in a
column that is defined to use datatype "set" (not sure if any product
supports it and not sure if I'd want it, but bear with me), then it
would still be different from an empty set. Kinda like the difference
between "the suitcase is empty" and "where's the &*(^^%#$ suitcase!"
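The suitcase distinction can be sketched in Python, with None standing in for the NULL marker and the built-in set as the set-valued domain. This is an informal analogy, not a claim about any particular DBMS:

```python
missing = None   # "where's the suitcase?" -- a marker, not a value
empty = set()    # "the suitcase is empty" -- a genuine value holding nothing

# The empty set behaves like any other set value:
assert len(empty) == 0
assert empty | {1, 2} == {1, 2}
assert empty.issubset({1, 2})

# The marker supports no set behavior at all; asking for its length
# raises an error, because there is no value there to operate on:
try:
    len(missing)
except TypeError:
    print("the marker has no set operations")
```

The empty set participates in set arithmetic like any other set; the marker can only be tested for, never operated on.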

Best, Hugo

Hugo Kornelis

unread,
Oct 6, 2006, 5:46:04 PM10/6/06
to
On Fri, 06 Oct 2006 12:59:15 GMT, David Cressey wrote:

>There is one point I'm confused on: what is the domain of the empty set?
>does it even have a domain? To me, the empty set of character strings is
>not "the same thing" as the empty set of integers. But I may be thinking
>like a computer person and not like a mathematician.

Hi David,

Here are some thoughts from someone who is far from a mathematician and
who is more a database practitioner than a database theorist, so take
them with whatever amount of salt you see fit.

When I worked with sets during the Dutch equivalent of highschool, I
usually had to use a two-part notation. I can't replicate the symbols
here and I don't recall all the correct names, but it consisted of a
definition of a domain and a listing or formula to define the values. So
you could have a set that was defined as a subset of the domain of
positive integers consisting of the numbers 2, 4, and 7; but you could
also have a set defined as a subset of the domain of real values
consisting of the numbers 2, 4, and 7.
Later, after highschool, I started to see a simplified notation for sets
that exposes only the values in the set but not the domain.

Are the two sets above equal? I guess that you could defend both answers
here - the sets have the same members, but not the same domain
definition. I also guess that the notation used can sometimes be an
indication of what the answer would be in any given UoD.

For a general answer, I'm tempted to say that there have to be two
equality operators for set arithmetic, one looking at the values of the
set members only, the other also looking at the domain.

Anyway, whatever you favor as an answer to the question of equality of
the two sets above - once you've chosen an answer, the answer to
equality of two empty sets logically follows.
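The two candidate equality operators can be sketched in Python: the built-in set equality compares members only, so a domain-aware comparison has to carry the domain alongside the members. The tuple encoding of "domain plus members" here is just an illustrative convention:

```python
# Member-wise equality: Python compares 2 == 2.0, so the "integer" set
# and the "real" set with the same members compare equal.
ints = {2, 4, 7}
reals = {2.0, 4.0, 7.0}
print(ints == reals)       # True

# Domain-aware equality: pair each set with its declared domain, and the
# same members under different domains no longer compare equal.
typed_a = ("integer", frozenset({2, 4, 7}))
typed_b = ("real", frozenset({2, 4, 7}))
print(typed_a == typed_b)  # False
```

Whichever operator one picks, two empty sets compare the same way as the populated sets above: equal member-wise, and equal domain-aware only when their declared domains match.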

Best, Hugo

Cimode

unread,
Oct 6, 2006, 5:59:33 PM10/6/06
to

dawn wrote:

> Given the definition of NULL that I typically use (with non-SQL based
> solutions), NULL is a value and can be modeled mathematically with the
> empty set. In that case, a relation tuple with a NULL is as valid
> mathematically as one without. Agreed? --dawn

An ultra-high density of bullshit and confusion packed into a few
sentences...

The fact that a construct can be modeled mathematically does not
systematically qualify it as a value. The *ONLY* valid mathematical
definition of a *value* consists of being the output of a
*predetermined* transformation (function).

NULLs are pure input. Nothing can be said of the transformation OR
the output, and certainly NOT that it could be NULL.

Not convinced? I dare you to produce/define ANY function that certainly
produces NULL as an output. If you can't, then just shut the hell
up...and stop confusing people...

OTOH, one can produce several functions that produce an empty set as an
output. An empty set is therefore a value.
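For illustration, here is one such function sketched in Python: the intersection of two disjoint sets is a defined, predictable transformation whose output is the empty set.

```python
# Set intersection is a total, deterministic function; applied to
# disjoint sets, its output is the empty set -- a genuine value.
def intersection(a, b):
    return a & b

print(intersection({1, 2}, {3, 4}))  # set()
```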

After that logical demonstration and all the arguments presented, if
there is still a chance that you or anybody believes that NULL is a
value, that can only mean one thing: that you simply cannot or do not
want to build logical and sound reasoning.

dawn

unread,
Oct 6, 2006, 6:20:43 PM10/6/06
to

Cimode wrote:
> dawn wrote:
>
> > Given the definition of NULL that I typically use (with non-SQL based
> > solutions), NULL is a value and can be modeled mathematically with the
> > empty set. In that case, a relation tuple with a NULL is as valid
> > mathematically as one without. Agreed? --dawn
>
> An ultra high level density of bullshit and confusion packed in few
> sentences...
>
> The fact that a construct can be modeled mathematically does not
> systematically qualify it as a value. The *ONLY* valid mathematical
> definition of a *value* consists of being the output of a
> *predetermined* transformation (function).

Define a function f such that f("F") = "Female" and f("M") = "Male" and
f(null) = null

Functions like this are very common when working with 2VL languages.
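A sketch of such a function in Python, with None standing in for the non-SQL NULL dawn describes. The function is total and deterministic, and None is simply one of its outputs (note that unknown codes also map to None in this sketch):

```python
# f maps "F" -> "Female", "M" -> "Male", and null -> null, with None
# playing the role of the null value.
def f(code):
    mapping = {"F": "Female", "M": "Male"}
    return mapping.get(code)  # dict.get returns None for missing keys

print(f("F"))   # Female
print(f(None))  # None
```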

> NULLS are a pure input. Nothing can be said of the transformation OR
> the output and certainly NOT that it could be NULL.

Again, that depends on your definition of NULL. I don't understand how
you would not think it depends on your definition. Do you think these
letters fell from the sky with a meaning attached?

>
> Not convinced? I dare you to produce/define ANY function that certainly
> produces NULL as an output. If you can't then just shut the hell
> up...and stop confusing people...

I did produce one. If you would like to see it in action, you can use
the open source dbms OpenQM. Remember, this is NOT an SQL NULL.
OpenQM does not support SQL.

> OTOH, one can produce several functions that produce an empty set as an
> output. An empty set is therefore a value.

And guess what? The languages packaged with OpenQM will treat NULL in
their logic as if it were the empty set.

> After that logical demonstration and all the arguments presented, if
> there is still a chance that you or anybody believe that NULL is a
> value

This is not part of a belief system, but part of a defined system. You
might work with a system that treats a NULL as the lack of a value
while I do not.

> that can only mean one thing: that you simply can not or do not
> want to build logical and sound reasonning.

Guess again, brother. --dawn
