Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]

62 views
Skip to first unread message

mountain man

unread,
Jun 1, 2005, 8:27:30 AM6/1/05
to
Last year there was this thread with the subject line of
"Issues with the logical consistency of The Third Manifesto",
in which opinions were expressed about Maurice Gittens
paper 'A critical reading of the Third Manifesto' (available
at http://www.gits.nl/)

I did not read *all* of the responses, but notice that many
found the original article too long. There actually now
exists a very small (4 page) update entitled "A Codd
inspired amendment ..." on Maurice Gittens site (above)
which may have escaped the attention of some.

In this subsequent article, the author makes the assertion
that Codd's view of a relational database differs from that
of Date and Darwin.

Here is the relevant quote:


======[quoting M.Gittens]=====================

According to E.F. Codd a relational database is defined as:

"A relational database is a time-varying collection of data,
all of which can be accessed and updated as if they were
organized as a collection of tabular time-varying tabular
(nonhierarchic) relations of assorted degrees defined on a
given set of simple domains."


According to the Third manifesto:

"The question as to what data types are supported is orthogonal
to the question of support of the relational model."


Let us take time to notice the fundamental difference between the
two positions presented here. As a result of the position that
support for the relational model is orthogonal to the supported
data types, The Third Manifesto proceeded to allow domains to have
an arbitrarily complex structure and also to support arbitrarily
complex user defined operators.

Codd on the other hand specifically states that relational
databases must be based on "simple" domains. Codd also says that
domain values should not be decomposable further by the DBMS. Which
is to say: According to Codd, the question as to what data types
are supported is not orthogonal to the question of support of the
relational model. Based on this evidence one can but conclude that
Codd's view of relational database is logically different from The
Third Manifesto's view of a relational database. Such logical
differences are big differences.

======[end quoting M.Gittens]==================


Any opinions on this POV?

--
Pete Brown
IT Managers & Engineers
Falls Creek
Australia
www.mountainman.com.au/software

erk

unread,
Jun 1, 2005, 10:05:40 AM6/1/05
to
mountain man wrote:
> ======[quoting M.Gittens]=====================
>
> According to E.F. Codd a relational database is defined as:
>
> "A relational database is a time-varying collection of data,
> all of which can be accessed and updated as if they were
> organized as a collection of tabular time-varying tabular
> (nonhierarchic) relations of assorted degrees defined on a
> given set of simple domains."
>
> According to the Third manifesto:
>
> "The question as to what data types are supported is orthogonal
> to the question of support of the relational model."
>
> Let us take time to notice the fundamental difference between the
> two positions presented here. As a result of the position that
> support for the relational model is orthogonal to the supported
> data types, The Third Manifesto proceeded to allow domains to have
> an arbitrarily complex structure and also to support arbitrarily
> complex user defined operators.

True enough, but the terms "complex" and "simple" are highly
subjective. Yes, you can define domains that are complex, but they're
"simple" in this way: they're opaque to the RDBMS. They're black boxes.
And I don't think you can prescribe or proscribe enough to keep people
from designing badly.

> Codd on the other hand specifically states that relational
> databases must be based on "simple" domains.

Does he define "simple" anywhere? If not, that's a problem - we don't
know exactly what he meant.

> Codd also says that
> domain values should not be decomposable further by the DBMS.

Correct, and Date says the same.

> Which
> is to say: According to Codd, the question as to what data types
> are supported is not orthogonal to the question of support of the
> relational model.

You're basing that conclusion on the word "simple," which if not
defined is fairly useless. Which paper did Codd's definition come from?
I know that Codd has allowed relation-valued attributes (RVAs) in other
papers, but don't know the chronology and evolution of his definitions.

> Based on this evidence one can but conclude that
> Codd's view of relational database is logically different from The
> Third Manifesto's view of a relational database. Such logical
> differences are big differences.

Agreed, if they exist, but the above isn't convincing. While Date and
Codd definitely disagree on nulls, it's hard to say much about the
conclusions Gittens draws from the above.

- Eric

Alfredo Novoa

unread,
Jun 1, 2005, 10:06:37 AM6/1/05
to
On Wed, 01 Jun 2005 12:27:30 GMT, "mountain man"
<hobbit@southern_seaweed.com.op> wrote:

>In this subsequent article, the author makes the assertion
>that Codd's view of a relational database differs from that
>of Date and Darwin.

Indeed, it differs, but in a tiny degree compared to the differences
between Codd's view of the Relational Model and SQL's view.

<Star Wars mode>
The duty of any disciple is to try to surpass his master. :)
</Star Wars mode>


Regards

-CELKO-

unread,
Jun 1, 2005, 10:34:16 AM6/1/05
to
You forgot about NULLs -- Date wants none, Codd had two kinds in RM II
and SQL has one.

mountain man

unread,
Jun 3, 2005, 8:45:51 PM6/3/05
to
"-CELKO-" <jcel...@earthlink.net> wrote in message
news:1117636456.1...@g44g2000cwa.googlegroups.com...

> You forgot about NULLs -- Date wants none, Codd had two kinds in RM II
> and SQL has one.


There have been, there are,
and there always will be
NULLS in the real world.

Does this fact not conflict
with Date's wants, and if
so, why does Date want
to ignore them?

Jan Hidders

unread,
Jun 4, 2005, 5:30:15 AM6/4/05
to
mountain man wrote:
>
> There have been, there are,
> and there always will be
> NULLS in the real world.

NULLs are an abstraction and like all abstractions they do not really
exist. The relevant question here is if they are necessary and useful to
describe the world.

> Does this fact not conflict
> with Date's wants, and if
> so, why does Date want
> to ignore them?

They unnecessarily complicate the model. Anything that can be modeled
with them can also be modeled without them, and to the extent that they
are convenient this is mostly due to the fact that the possibilities for
user-defined domains were too restricted.

-- Jan Hidders

Marshall Spight

unread,
Jun 4, 2005, 10:54:42 AM6/4/05
to
> Anything that can be modeled with [nulls]

> can also be modeled without them, and to the extent that they
> are convenient this is mostly due to the fact that the possibilities for
> user-defined domains were too restricted.

That statement surprises me. What about outer join?

In application code I often run in to situations where there
is either a legal value or else there is a special case, and
good practice is *not* to use a special value, but rather
to "protect" the value with another variable. This leads
to the unfortunate fact that you still sometimes have a
(meaningless) value in the primary variable.

Ah! But pehaps when you say user-defined domains were
too restricted, you meant that they left off union types:

data MyParticularSituation = IHaveOne of int
| ElseIDont;

I think that covers the "special values" situation pretty
well, but it seems overly manual for capturing the
cardinality=0 case.


Marshall

Eric Junkermann

unread,
Jun 4, 2005, 7:02:29 AM6/4/05
to
In message <3R6oe.1962$F7....@news-server.bigpond.net.au>, mountain man
<hobbit@southern_seaweed.com.op> writes

>
>There have been, there are,
>and there always will be
>NULLS in the real world.
>

Definition and examples, please.
--
Eric Junkermann

Paul

unread,
Jun 4, 2005, 7:47:26 PM6/4/05
to
Marshall Spight wrote:
>>Anything that can be modeled with [nulls]
>>can also be modeled without them, and to the extent that they
>>are convenient this is mostly due to the fact that the possibilities for
>>user-defined domains were too restricted.
>
> That statement surprises me. What about outer join?

Outer joins are really just a syntactic shorthand for a normal join and
some kind of union. If you weren't allowed NULLs you could still have a
kind of outer join, you'd just have to specify default values for the
columns to be used instead of a NULL.

Paull

mountain man

unread,
Jun 5, 2005, 3:06:45 AM6/5/05
to
"Eric Junkermann" <er...@deptj.demon.co.uk> wrote in message
news:u$GfI$CFpYo...@deptj.demon.co.uk...


IMO I view nulls as a void representing incomplete
information that arises because that information
is not (yet) available to the DBMS, for various
reasons.

It could be that the data has been deleted.
It could be that it has not yet been entered.
It could be that it was missed by data entry.

It could be that a power outage caused an
incomplete transaction in a DBMS that does
not have transaction audit capabilities, such
as Access, or a number of larger systems.

In the real world various things prevent the
total gathering of information and the process
whereby that information is entered and then
maintained in the database.

Thus I view nulls as a void representing the
fact that there exists incomplete or missing
information.

mountain man

unread,
Jun 5, 2005, 3:06:49 AM6/5/05
to
"Jan Hidders" <jan.h...@REMOVETHIS.pandora.be> wrote in message
news:Hweoe.110129$zI7.6...@phobos.telenet-ops.be...

> mountain man wrote:
>>
>> There have been, there are,
>> and there always will be
>> NULLS in the real world.
>
> NULLs are an abstraction and like all abstractions they do not really
> exist.

Information is an abstraction which may or may not exist.
If it exists I call it data or information. But if it does not
exist (within the DBMS) for any given reason, then it is
called "missing", void, or null.

> The relevant question here is if they are necessary and useful to describe
> the world.


Very useful.

>> Does this fact not conflict
>> with Date's wants, and if
>> so, why does Date want
>> to ignore them?
>
> They unnecessarily complicate the model.

The model cannot be complicated by essences of reality.
In reality, information may be partial, and thus have elements
in it which have null values.

> Anything that can be modeled with them can also be modeled without them,
> and to the extent that they are convenient this is mostly due to the fact
> that the possibilities for user-defined domains were too restricted.


Modelling holes in information is straightforward
with the use of nulls. You either have them, or
you dont. You could probably replace then with
a value, say x, but what would this gain?

Jan Hidders

unread,
Jun 5, 2005, 5:40:31 AM6/5/05
to
mountain man wrote:
>
> The model cannot be complicated by essences of reality.

Since everything in reality can be described without them, I don't see
how null values can be considered "essences of reality" under any
reasonable definition of that term.

> In reality, information may be partial, and thus have elements
> in it which have null values.

You don't need null values to represent partial information.

> Modelling holes in information is straightforward
> with the use of nulls. You either have them, or
> you dont. You could probably replace then with
> a value, say x, but what would this gain?

It would keep the meta-model simpeler. But that's not the only way to
model partial information w/o null values and in many cases the worst
way to do it.

-- Jan Hidders

Jan Hidders

unread,
Jun 5, 2005, 6:00:07 AM6/5/05
to
Marshall Spight wrote:
>
> Ah! But pehaps when you say user-defined domains were
> too restricted, you meant that they left off union types:
>
> data MyParticularSituation = IHaveOne of int
> | ElseIDont;

Yes, that is what I meant.

> I think that covers the "special values" situation pretty
> well, but it seems overly manual for capturing the
> cardinality=0 case.

True. But I would argue that this is because it makes the true
complexity of the situation explicit. You can try to sweep that under
the rug but then it will (1) complicate the meta-model because you need
a logic and domain operations that deal with null values and (2) it may
come back to bite you on unexpected moments because the system doesn't
force you always to take into account that a certain value may or may
not be there.

Note that especially (1) is important because not only does it mean that
it becomes harder for users / programmers to understand in detail how
the system works, but it makes it harder to implement it efficiently and
correctly.

-- Jan Hidders

Paul

unread,
Jun 5, 2005, 6:20:37 AM6/5/05
to
Jan Hidders wrote:
>> Modelling holes in information is straightforward
>> with the use of nulls. You either have them, or
>> you dont. You could probably replace then with
>> a value, say x, but what would this gain?
>
> It would keep the meta-model simpeler. But that's not the only way to
> model partial information w/o null values and in many cases the worst
> way to do it.

What are better ways to do it? Decomposing relations into several
smaller ones as described here?:

http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

Are there other methods?

Why is the "special values in the domain" method inferior to other methods?

Paul.

Jan Hidders

unread,
Jun 5, 2005, 1:12:02 PM6/5/05
to
Paul wrote:
> Jan Hidders wrote:
>> [...] But that's not the only way to model partial information w/o

>> null values and in many cases the worst way to do it.
>
> What are better ways to do it? Decomposing relations into several
> smaller ones as described here?:
>
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

Yes. But note that the "better" should be a qualified term here. As
always it depends upon many factors whether this is really better or not.

> Are there other methods?

Probably. But I think these are the most important ones.

> Why is the "special values in the domain" method inferior to other
> methods?

Do you always ask that many questions? :-) On this matter I largely
agree with Chris Date's position, so I will refer you to his writings.

-- Jan Hidders

mountain man

unread,
Jun 5, 2005, 7:57:23 PM6/5/05
to
"Jan Hidders" <jan.h...@REMOVETHIS.pandora.be> wrote in message
news:jMzoe.111207$SI7.6...@phobos.telenet-ops.be...

> mountain man wrote:
>>
>> The model cannot be complicated by essences of reality.
>
> Since everything in reality can be described without them, I don't see how
> null values can be considered "essences of reality" under any reasonable
> definition of that term.

While everything in reality may be descibed in an infinite number of
ways what has this got to do with simple databases? The database
may also hold things in it, for example names and addresses, which
are in fact elements of a "postal address reality".

What is the point of your argument?


>> In reality, information may be partial, and thus have elements
>> in it which have null values.

> You don't need null values to represent partial information.


http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

Thanks Paul for the above reference which I have just read.

According to this reference we can replace a null in the salary
field with "Salary not known" and/or "Unsalaried". This has
taken some work to do, by a database professional, to derive
an "improved" version of the personnel table (when needed).

So what? The original design schema is simply missing information
for these elements, and this information needs to be entered,
and/or determined and entered.

Why should a qualified database professional spend time on
such a problem when the only real and viable solution to this
problem is to identify the missing information and then to get
it into the database?

A simple workflow routine, channelling the appearances of
any critical nulls (not taken care of by the constraints!) to
the people in the organisation that are directlt responsible
for the entry of that element of data, also fixes the problem.

Normalisation appears to be a theoretical sledge hammer
trying to cover up underlying integrity issues without actually
solving the integrity issue at its fundamental level. At least
this is the impression I get after reading the above reference.

It's the long way around a problem, and does not in fact
ultimately solve the problem of the missing information,
which the original schema -- by the guidance of the RM
presumeably at implementation - should have been
required as mandatory.

Alexandr Savinov

unread,
Jun 6, 2005, 4:38:15 AM6/6/05
to
Jan Hidders schrieb:

> mountain man wrote:
>
>>
>> There have been, there are,
>> and there always will be
>> NULLS in the real world.
>
>
> NULLs are an abstraction and like all abstractions they do not really
> exist. The relevant question here is if they are necessary and useful to
> describe the world.

Everything can be considered reality and everything can be considered
abstraction - it is too general issue just like existence.

Nulls are equivalent of empty set (absence, hole etc.) and I can imagine
a model without any other elements but not without nulls. In other
words, null is what any model starts from and after that we can add any
other non-primitive elements we want to see :-) Those other elements may
pollute the model or may be argued to be unnecessary but not null
values. Since nulls play such a role they are actually not true
"values"( in the sense we use "normal" values, say, 5, 10 or "some text").

>> Does this fact not conflict
>> with Date's wants, and if
>> so, why does Date want
>> to ignore them?
>
>
> They unnecessarily complicate the model. Anything that can be modeled
> with them can also be modeled without them, and to the extent that they
> are convenient this is mostly due to the fact that the possibilities for
> user-defined domains were too restricted.

Nothing can be modelled without null values. Null is a bootstrapping
point for everything else. It is clear in theory, for example, in set
theory. But it is also true in practice. Null values have very concrete
semantics but the problem is that they are frequently used
inappropriately (for example, as unknown values).

--
alex
http://conceptoriented.com

Paul

unread,
Jun 6, 2005, 5:56:59 AM6/6/05
to
mountain man wrote:
> According to this reference we can replace a null in the salary
> field with "Salary not known" and/or "Unsalaried". This has
> taken some work to do, by a database professional, to derive
> an "improved" version of the personnel table (when needed).
>
> So what? The original design schema is simply missing information
> for these elements, and this information needs to be entered,
> and/or determined and entered.

If you replaced both "Salary unknown" and "Unsalaried" with NULLs, how
do you distinguish between the two?

> Why should a qualified database professional spend time on
> such a problem when the only real and viable solution to this
> problem is to identify the missing information and then to get
> it into the database?

"Unsalaried" isn't missing information. And it's not always feasible to
fill in the gaps.

Paul.

Paul

unread,
Jun 6, 2005, 6:24:42 AM6/6/05
to
Alexandr Savinov wrote:
> Nulls are equivalent of empty set (absence, hole etc.) and I can imagine
> a model without any other elements but not without nulls. In other
> words, null is what any model starts from and after that we can add any
> other non-primitive elements we want to see :-) Those other elements may
> pollute the model or may be argued to be unnecessary but not null
> values. Since nulls play such a role they are actually not true
> "values"( in the sense we use "normal" values, say, 5, 10 or "some text").

I don't think the NULL or empty set of set theory is exactly the same as
the NULL of relational database theory.

After all, simple arithmetic is often built starting from the NULL set,
with it playing the part of zero, which is a value.

The "set" metaphor in RDBMS terms is more like table=set,
row=member_of_set. So a set theory kind of NULL would more correspond to
a table with no rows.

And each row represents a logical proposition, and standard first order
predicate logic is two-valued.

Sure, everything can ultimately be based on set theory and the empty
set, but it makes sense to use more advanced structures (like predicate
logic) where necessary and forget about any fundamental construction.

Also, there are different types of missing values, and these are
specific to the domain, not universal, so it makes sense to have them
modelled as part of a domain. Aggregate functions may want to deal with
different types of NULLs in different ways (e.g. Unsalaried vs. Salary
Unknown).

Paul.

Alexandr Savinov

unread,
Jun 6, 2005, 6:56:51 AM6/6/05
to
Paul schrieb:

The situation with NULLs clearly demonstrates some serious problems of
the relational model. I personally do not see any problem with NULLs but
I am not thinking in terms of relational model. I've read this paper

http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf

and I am really surprised - it provides completely unacceptable
"solution" - it compomises the whole relational model IMO. NULL values
have an absolutely concrete meaning and it does not matter how we call
them after that. And this meaning, the things with such a semantics, is
the basis of the model. This semantics can be expressed as "the absence"
of thing. Any model starts from empty state before it can be populated
with other kind of things and this initial state is strongly associated
with the semantics of absence. If something disappears, if something is
deleted then acutally we get null.

I absolutely understand that in practice we need different kinds of
special values. But having custom special values with the semantics
defined exclusively by the user is different from system special values
with the built-in semantics. Semantics is determined by the consequences
of having such values. If these consequences are defined by the user
then the the model and the system do not care. But there is one
fundamental value without which a model and a database cannot exist at
all - it is NULL - and it has its semantics defined at the model and at
the system level as absence.

The main problem is that people try to use NULL value in order to
overload its meaning by some custom semantics. In relational model it is
quite possible because in RM most of operational semantics is defined
manually in SQL queries, i.e., RDBMS is unaware of its data meaning (it
is the user who knows the semantics). In this case we may ourselves
define the meaning of such special system level things as NULL value -
it is a source of numerous problems and a defect of the model. The
fundamental meaning cannot be overloaded because it is the system that
interprets this value in order to maintain the database in the
consistent state and derive meaningful consequences. In other words, we
cannot provide our own interpetation of NULLs becuase the system and the
model already interpret it in some special way. If we will use NULL
arbitrarily then the database will have some unpredictable meaning.
(Yet, again, in RM such a behaviour is not enforced becuase most things
are done manually.) For example, if we delete a record then all rows in
other tables that had it as a property must be nullified (in this
property). And this bechaviour cannot be changed because it is the
semantics of absence of things, which is what a good model starts from
(along with other fundamental concepts like existence of things).

--
alex
http://conceptoriented.com

FrankHamersley

unread,
Jun 6, 2005, 8:55:36 AM6/6/05
to
Paul wrote:
> mountain man wrote:
>>According to this reference we can replace a null in the salary
>>field with "Salary not known" and/or "Unsalaried". This has
>>taken some work to do, by a database professional, to derive
>>an "improved" version of the personnel table (when needed).
>>
>>So what? The original design schema is simply missing information
>>for these elements, and this information needs to be entered,
>>and/or determined and entered.
>
> If you replaced both "Salary unknown" and "Unsalaried" with NULLs, how
> do you distinguish between the two?

Without redressing the fundamental weaknesses in the schema you can't.

However I remain to be convinced that the domain "overloading" used to
underpin Date's example is a useful contribution to the state of the
art. Frankly my first impression was to see it as justifying a steaming
hot pile! There are no signs of any elegance regardless of how
unsatisfactory that is to purists, although I expect Date would cite the
limitations of the current technologies as largely responsible for the
latent ugliness of the "null-less solution".

Personally I see the better solution within the context of this thread
is to have a discrete attribute with a domain of "Salaried, Unsalaried".
The argument about to null or not to null (sic) is for another time.

Felicitations to all,
Frank.

erk

unread,
Jun 6, 2005, 9:39:12 AM6/6/05
to
mountain man wrote:
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>
> Thanks Paul for the above reference which I have just read.
>
> According to this reference we can replace a null in the salary
> field with "Salary not known" and/or "Unsalaried". This has
> taken some work to do, by a database professional, to derive
> an "improved" version of the personnel table (when needed).
>
> So what? The original design schema is simply missing information
> for these elements, and this information needs to be entered,
> and/or determined and entered.
>
> Why should a qualified database professional spend time on
> such a problem when the only real and viable solution to this
> problem is to identify the missing information and then to get
> it into the database?

The trouble isn't the missing information per se. The trouble is that
that introduces into queries. Examine SQL's use of nulls in aggregate
functions to get an idea. To briefly list some problem areas: does a
"missing" salary count as 0 in a sum? Does it affect an average? When
selecting employees with a salary below, say, $10,000, does "missing"
get included?

The point is that the answers to the above questions can (and do!) vary
from domain to domain; "missing" values in salary might be treated very
differenly from missing values in social security number (for example).

And all of the above sidesteps the problems involved with
distinguishing "types of nulls" - missing versus not applicable versus
other 4 and 5 VLs that I've never really understood.

> A simple workflow routine, channelling the appearances of
> any critical nulls (not taken care of by the constraints!) to
> the people in the organisation that are directlt responsible
> for the entry of that element of data, also fixes the problem.

And until that data is entered, what is to be done with queries over
those tuples? Are they to be completely ignored until null values are
"corrected"?

> Normalisation appears to be a theoretical sledge hammer
> trying to cover up underlying integrity issues without actually
> solving the integrity issue at its fundamental level. At least
> this is the impression I get after reading the above reference.

I don't understand what "solving the integrity issue at its fundamental
level" means, but nulls have the same issue. At best, it's a solution
to a small range of problems, and as SQL suggests, can be difficult to
get "right," if right is even possible.

> It's the long way around a problem, and does not in fact
> ultimately solve the problem of the missing information,
> which the original schema -- by the guidance of the RM
> presumeably at implementation - should have been
> required as mandatory.

Along with domain design, it data to be specified, rather than leaving
the hard decisions and inconsistencies to be discovered later.

- Eric

Paul

unread,
Jun 6, 2005, 9:48:35 AM6/6/05
to
Alexandr Savinov wrote:
> The situation with NULLs clearly demonstrates some serious problems of
> the relational model. I personally do not see any problem with NULLs but
> I am not thinking in terms of relational model. I've read this paper
>
> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>
>
> and I am really surprised - it provides completely unacceptable
> "solution" - it compomises the whole relational model IMO. NULL values
> have an absolutely concrete meaning and it does not matter how we call
> them after that. And this meaning, the things with such a semantics, is
> the basis of the model. This semantics can be expressed as "the absence"
> of thing. Any model starts from empty state before it can be populated
> with other kind of things and this initial state is strongly associated
> with the semantics of absence. If something disappears, if something is
> deleted then acutally we get null.

But the absence of something is done at the row level, not at the
attribute level. If a proposition is true, it gets represented by a row
in a table of the database. If not, it is absent from the database.

Here's some discussion of a very concrete example of a NULL problem in
SQL: http://www.firstsql.com/iexist2.htm

The author says that his RDBMS solves this problem though, and that is
just a problem with SQL and NULLs, not with the relational model and NULLs.

Are there any rebuttals of this claim that anyone has? Any concrete
examples of logical inconsistencies caused by NULLs?

I guess the other claim against NULLs is that they make things more
complex and less intuitive, which is a more subjective point. You could
argue that Darwen's method of avoiding NULLs is more complex that using
NULLs in the first place.

Paul.

Alfredo Novoa

unread,
Jun 6, 2005, 10:19:18 AM6/6/05
to
On Mon, 06 Jun 2005 14:48:35 +0100, Paul <pa...@test.com> wrote:

>I guess the other claim against NULLs is that they make things more
>complex and less intuitive, which is a more subjective point. You could
>argue that Darwen's method of avoiding NULLs is more complex that using
>NULLs in the first place.

Nulls undermine the conceptual integrity of the Relational Model which
is based in predicate logic.

A tuple with nulls does not match to a logical proposition.

This causes many problems. Some of them were pointed by Eric.


Regards

paul c

unread,
Jun 6, 2005, 11:02:17 AM6/6/05
to

i'm an admirer of Date's but i have to say i like your answer. no Latin
to look up, either.

p

Alexandr Savinov

unread,
Jun 6, 2005, 11:07:02 AM6/6/05
to
Paul schrieb:

> Alexandr Savinov wrote:
>
>>The situation with NULLs clearly demonstrates some serious problems of
>>the relational model. I personally do not see any problem with NULLs but
>>I am not thinking in terms of relational model. I've read this paper
>>
>>http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>>
>>
>>and I am really surprised - it provides completely unacceptable
>>"solution" - it compomises the whole relational model IMO. NULL values
>>have an absolutely concrete meaning and it does not matter how we call
>>them after that. And this meaning, the things with such a semantics, is
>>the basis of the model. This semantics can be expressed as "the absence"
>>of thing. Any model starts from empty state before it can be populated
>>with other kind of things and this initial state is strongly associated
>>with the semantics of absence. If something disappears, if something is
>>deleted then acutally we get null.
>
>
> But the absence of something is done at the row level, not at the
> attribute level. If a proposition is true, it gets represented by a row
> in a table of the database. If not, it is absent from the database.

Yes, you are right, especially from the point of view of logic where we
consider rows as statements. But existence and presence of things can be
viewed as follows. Objects (rows) need to exhibit themselves somehow in
order exist. There are two ways how it can be done:
1. using references (or some other kind of a dedicated representation
mechanism like names or primary keys),
2. using object contents (by value).
Both mechanisms have their advantages and disadvantages, which will not
be described here. Assume now that we have only the second mechanism. In
this case if an object has all nulls in its description then the whole
object is qualified as null and non-existing. Formally, it is NULL. In
other words, we may take our special NULL element and then add to it
more fields (dimensions) with NULL values and it will be still the same
NULL object. If some dimension is not NULL then the object exhibts
itself in this way. One way to delete objects consists in assigning all
NULLs to its fields.

If we use only the first mechanism (object life cycle is defined by its
reference) then NULLs do not play so important role (in contrast to
theory). However, in practice these two mechanisms should be connected.
These means that we need to specify when an object has to be really
deleted (by removing its reference) depending on its NULLs. It can be
implemented as constraints.


> Here's some discussion of a very concrete example of a NULL problem in
> SQL: http://www.firstsql.com/iexist2.htm
>
> The author says that his RDBMS solves this problem though, and that is
> just a problem with SQL and NULLs, not with the relational model and NULLs.
>
> Are there any rebuttals of this claim that anyone has? Any concrete
> examples of logical inconsistencies caused by NULLs?

The main problem with NULLs is that they have very concrete meaning
however RM does not rely on it heavily. As a consequence everybody uses
NULLs as a convenience technique by overloading its formal semantics.
For example, in the above article (http://www.firstsql.com/iexist2.htm)
the authour write:

"The null in the row (S1,P1,NULL) means value unknown; in other words,
supplier S1 does supply part P1, but the relevant quantity is not known,
missing."

Interpreting NULL as UNKNOWN is a typical and the most serious problem
because UNKNOWN has also special and very concrete meaning. If NULL
means "nothing (is possible)" then UNKNOWN means "everything (is
possible)". In terms of possibility distribution this means that NULL is
constant 0 while UNKNOWN is constant 1. UNKNOWN can be useful only in
deductive databases while NULL is useful in normal databases.

Problems with NULLs appear when users start using NULL as a convenience
method for encoding their own semantics (unsalaried, partially salaried
etc.) The result is that DBMS has its own interpretation while users may
get their own interpretation.

> I guess the other claim against NULLs is that they make things more
> complex and less intuitive, which is a more subjective point. You could
> argue that Darwen's method of avoiding NULLs is more complex that using
> NULLs in the first place.

May be the method of avoiding NULLs at all is not so bad taking into
account possible errors. In other words, once NULLs cannot be used
appropriately it is more reliable to disable them at all. In this case I
would probably also adivce to implement any custom special states and
values by using dedicated fields rather than reusing NULLs. It is then
the database that uses the mechanism of NULLs for its internal purposes
without intervention of users. However, from theoretical point of view
we need to eventually understand what NULLs mean and how they have to be
used.

--
alex
http://conceptoriented.com

Paul

unread,
Jun 6, 2005, 11:23:13 AM6/6/05
to
Alfredo Novoa wrote:
>>I guess the other claim against NULLs is that they make things more
>>complex and less intuitive, which is a more subjective point. You could
>>argue that Darwen's method of avoiding NULLs is more complex that using
>>NULLs in the first place.
>
> Nulls undermine the conceptual integrity of the Relational Model which
> is based in predicate logic.
>
> A tuple with nulls does not match to a logical proposition.
>
> This causes many problems. Some of them were pointed by Eric.

OK so basically the problem is with aggregates?

Why not say then that all aggregates that involve a NULL return NULL?

And if you want an actual result, you must specify an explicit behaviour
by the use of COALESCE or WHERE clauses.

So could this actually be another problem with SQL's treatment of NULLs
rather than with NULLs per se?

Paul.

paul c

unread,
Jun 6, 2005, 11:38:31 AM6/6/05
to
Alexandr Savinov wrote:
>
> ... If something disappears, if something is
> deleted then acutally we get null.

i think we get 'false', according to the CWA.

> ...

i'd like to know what problem the never-ending arguments about nulls are
aimed at. in mundane applications at least (which i think is where most
people spend their time), what is the problem with using empty strings
for unknown names or zeroes for unknown number values? if the db user
has allowed tuples with such attribute values then they should expect to
have to judge the results accordingly, such as ones produced by
aggregate functions. if they want to record some other kind of
'meta-data' such as keying problems, then they can define relations
specific to those.

i'd rather see a debate about something rather than literally 'nothing',
say second-order predicate calculus.

(don't want to be another troll but can't resist - sometimes i wonder if
all the null talk here and seemingly everywhere else isn't just another
big hoax, like the OO and XML ones, foisted on the world by coders or
kibitzers from outside fields who aren't happy with their procedural lot
in life, not to mention middlemen who see money in it.)

pc

Alexandr Savinov

unread,
Jun 6, 2005, 11:47:59 AM6/6/05
to
Paul schrieb:

> Alfredo Novoa wrote:
>
>>>I guess the other claim against NULLs is that they make things more
>>>complex and less intuitive, which is a more subjective point. You could
>>>argue that Darwen's method of avoiding NULLs is more complex that using
>>>NULLs in the first place.
>>
>>Nulls undermine the conceptual integrity of the Relational Model which
>>is based in predicate logic.
>>
>>A tuple with nulls does not match to a logical proposition.
>>
>>This causes many problems. Some of them were pointed by Eric.
>
>
> OK so basically the problem is with aggregates?

Assume that we have a set of 3 values S = {1, 3, 10}. We want to
aggreage them and apply some function func: A = func(S). Do we have a
problem? No. Now remove some item from the set so that we have S = {1,
3} and then apply again the aggregation function. Do we have a problem? No.

Having null values is actually a way of removing data items from
consideration. In this example we apply the aggregation function to the
set {1, 3} which is equivalent to applying it to the set {1, 3, null}.

Some difficulties may appear in multidimensional case (in the case of
many columns). What if a row has null in field F1? This means that this
object does not exist along the dimension F1. If we project all rows
onto this dimension then we will not be able to find it there - it is
absent. In particular, aggregation functions and other procedures will
not see it at all (if it does not exist then it is not visible).

> Why not say then that all aggregates that involve a NULL return NULL?

It is possible but I do not find it very natural because we need the
properties of NULLs and aggregations to be consistent with other
properties of the model being developed. We cannot say "let's do it so"
- but need to have a kind of global consistency. For example, take a row
<1, 3> and then consider this point in 3-dimensional space by adding one
new dimension. How it will look like (represented)? I find it very
natural to write it as follows: <1, 3, null>. This actually says that
this object does not exist in this dimension, it is not visible, it
cannot be counted or aggregated. We might add some other properties of
nulls and then derive their consequences. And finally we will develop
yet another data model.

Formally, objects exist in all dimensions but in most of them they have
null values. In order to optimize such a property (a limited number of
dimensions for some objects) we use multidimensional hierarchical system
which formally describes what is the data semantics, its
dimensionality, its projections and many other issues unsolved in other
models.

--
alex
http://conceptoriented.com

Paul

unread,
Jun 6, 2005, 11:55:21 AM6/6/05
to
paul c wrote:
> i'd like to know what problem the never-ending arguments about nulls are
> aimed at. in mundane applications at least (which i think is where most
> people spend their time), what is the problem with using empty strings
> for unknown names or zeroes for unknown number values?

Because the empty string or zeros might also be valid values.

But I guess the principle stands: you could use -1 or "ZZZ" or some
other value that would never be valid. This is really just user-defined
domains lite though, because it means you have to manually set the
behaviour each time you use an aggregate function, rather than the
aggregate function knowing automatically what to do with these special
values.

> i'd rather see a debate about something rather than literally 'nothing',
> say second-order predicate calculus.
>
> (don't want to be another troll but can't resist - sometimes i wonder if
> all the null talk here and seemingly everywhere else isn't just another
> big hoax, like the OO and XML ones, foisted on the world by coders or
> kibitzers from outside fields who aren't happy with their procedural lot
> in life, not to mention middlemen who see money in it.)

I think it's a big issue because there doesn't seems to be much
consensus on it, even amongst database theory bigwigs. Maybe there is
some theoretical significance to it, but maybe it's just a subjective
software engineering thing that doesn't really have a right or wrong answer.

I think it will always be a standard database flamewar topic.

Paul.

Paul

unread,
Jun 6, 2005, 12:01:07 PM6/6/05
to
Alexandr Savinov wrote:
> Assume that we have a set of 3 values S = {1, 3, 10}. We want to
> aggreage them and apply some function func: A = func(S). Do we have a
> problem? No. Now remove some item from the set so that we have S = {1,
> 3} and then apply again the aggregation function. Do we have a problem? No.
>
> Having null values is actually a way of removing data items from
> consideration. In this example we apply the aggregation function to the
> set {1, 3} which is equivalent to applying it to the set {1, 3, null}.

Wouldn't it be "1 + 3 + unknown", say, which should be unknown also?

>> Why not say then that all aggregates that involve a NULL return NULL?
>
> It is possible but I do not find it very natural because we need the
> properties of NULLs and aggregations to be consistent with other
> properties of the model being developed. We cannot say "let's do it so"
> - but need to have a kind of global consistency. For example, take a row
> <1, 3> and then consider this point in 3-dimensional space by adding one
> new dimension. How it will look like (represented)? I find it very
> natural to write it as follows: <1, 3, null>. This actually says that
> this object does not exist in this dimension, it is not visible, it
> cannot be counted or aggregated.

To me it says we know the x & y coordinates but at the moment the z
coordinate is unknown. So if we are working with a geometric projection
that collapses the z axis, we have perfect knowledge. But if we need the
z coordinate, everything becomes unknown.

Paul.

Alexandr Savinov

unread,
Jun 6, 2005, 12:33:00 PM6/6/05
to
Paul schrieb:

You describe the semantics of UNKNOWN which is also a special "value".
And I completely aggree with your interpretation until you call it NULL.

Generally, in order to effectively manipulate data semantics we might
need two special "values" with concrete semantics defined at the system
level:

NULL - absence (of value, record or any other entity).
UNKNOWN - presence but having unknown identifier.

They can be formally defined via possibility distribution over a set of
values where 0 means impossibility and 1 means that the value is
possible (but not necessary). NULL is constant zero, i.e., no values are
possible. UNKNOWN is constant one, i.e., all values are possible. If we
have a normal value assigned to a variable then the distribution has all
values zero except for a single point with value 1 which corresponds to
the selected value. Using the mechansim of possiblity distribution
(multidimensional and hierachical) we can also impose semantic constraints.

Here is one example of using nulls:
Attribute/property/variable does not make sense for an object. requently
for the sake of schema simplicity we have wide tables where some columns
do not makes sense for some objects. For example, we have a list of
products (mostly cars) and define a column Power. However, sometimes we
also sell products which do not have a motor so what should we write in
the field Power? This is precisely the case where the object does not
exist in this dimension and we can formally correctly write NULL.

Unknown values are complex in processing and can be used only in
sophisticaed databases like deductive databases. The unknown "value"
actually means that this field has one normal value but it is unknown,
i.e., everything is possible. In this case it is very natural when the
result of aggregation is also unknown. For example, if we have a new
employee then he is guaranteed to have some concrete salary which is
however currently unknown for some reason (no paper arrived the
accounting department or the salary is still negotiated). In this case
we could formally corrected write UNKNOWN. This would inhject a great
deal of uncertainty into the whole database so that many related queries
will return also UNKNOWN (in the case the database is able to process
such values at all).

--
alex
http://conceptoriented.com

paul c

unread,
Jun 6, 2005, 12:58:13 PM6/6/05
to
Alexandr Savinov wrote:
> paul c schrieb:

>
>> Alexandr Savinov wrote:
>>
>>>
>>> ... If something disappears, if something is deleted then acutally we
>>> get null.
>>
>>
>>
>> i think we get 'false', according to the CWA.
>
>
> Actually it is one and the same, i.e., null, false, empty set
> ...

if they are one and the same then we don't need nulls. we already have
boolean true and false.

p

paul c

unread,
Jun 6, 2005, 1:12:23 PM6/6/05
to
Paul wrote:
> paul c wrote:
>
>>i'd like to know what problem the never-ending arguments about nulls are
>>aimed at. in mundane applications at least (which i think is where most
>>people spend their time), what is the problem with using empty strings
>>for unknown names or zeroes for unknown number values?
>
>
> Because the empty string or zeros might also be valid values.
> ...

perhaps i confused things a little by mentioning empty strings instead
of blanks. personally, i find it hard to care whether i know that i
don't know a person's name or whether i simply call him him Mr. 'blank'.
maybe i know the wrong people but most people i know are content
with items on an invoice having a price of zero dollars (maybe i shipped
an item an item before i knew its price) assuming this is anticipated by
the application or preferably the db with partial invoices or somesuch
notion. i think this is easier to explain to people than 3-valued logic.


p

paul c

unread,
Jun 6, 2005, 1:22:17 PM6/6/05
to
Paul wrote:

> ...


> I think it's a big issue because there doesn't seems to be much
> consensus on it, even amongst database theory bigwigs. Maybe there is
> some theoretical significance to it, but maybe it's just a subjective
> software engineering thing that doesn't really have a right or wrong answer.
>
> I think it will always be a standard database flamewar topic.
>
> Paul.

one implication of Alfredo's comment, "Nulls undermine the conceptual

integrity of the Relational Model which is based in predicate logic"

seems pretty clear. Rather than persist with 'subjective' arguments,
null proponents should try to implant their notion in predicate logic
before they implement it in dbms's, let alone applications! Being able
to make objective general implications seems a valuable result of the
relational theory / model.

p

Mikito Harakiri

unread,
Jun 6, 2005, 2:32:00 PM6/6/05
to

Hear, hear! It totally escapes me how could somebody invent putting
nonumeric value ("not salaried") into numeric column. For all reporting
purposes unknown salary is salary equal to 0. If you want to run a
report of all salaries that are wrong (and, therefore, are subject to
adjustment) then, having a flag "salary entry is incorrect" is not
going to help. An entry clerk that misses a digit in the salary is not
going to mark it with such a flag anyway:-)

erk

unread,
Jun 6, 2005, 4:57:41 PM6/6/05
to
Alexandr Savinov wrote:
> Assume that we have a set of 3 values S = {1, 3, 10}. We want to
> aggreage them and apply some function func: A = func(S). Do we have a
> problem? No. Now remove some item from the set so that we have S = {1,
> 3} and then apply again the aggregation function. Do we have a problem? No.

Incorrect - you may have a problem. You're treating S as a variable,
whose "contents" can vary from time to time. A domain, however, isn't a
variable. It's a set whose definition (whether intentional or not) is
fixed. A function over a varying domain, such as you describe,
represents the sort of situation meant for relational constraints.

Adding elements seems to represent a greater problem than removing
them, but still.

I think the above point is addressed in any of the conversations on
dbdebunk about variables vs. values vs. types vs. objects vs. ...

> Having null values is actually a way of removing data items from
> consideration. In this example we apply the aggregation function to the
> set {1, 3} which is equivalent to applying it to the set {1, 3, null}.

Not really. Is null to be counted, for example?

> Some difficulties may appear in multidimensional case (in the case of
> many columns). What if a row has null in field F1? This means that this
> object does not exist along the dimension F1. If we project all rows
> onto this dimension then we will not be able to find it there - it is
> absent. In particular, aggregation functions and other procedures will
> not see it at all (if it does not exist then it is not visible).

How about conditional tests on those attributes?

> It is possible but I do not find it very natural because we need the
> properties of NULLs and aggregations to be consistent with other
> properties of the model being developed. We cannot say "let's do it so"
> - but need to have a kind of global consistency.

So all "objects" need to be addressable by all predicates? I think
that's a nonsense. What's the point, when a simple clause like is2D(x)
can properly "distinguish"?

> For example, take a row
> <1, 3> and then consider this point in 3-dimensional space by adding one
> new dimension. How it will look like (represented)? I find it very
> natural to write it as follows: <1, 3, null>. This actually says that
> this object does not exist in this dimension, it is not visible, it
> cannot be counted or aggregated.

It's not a 3-D point, so why even consider it? If it doesn't exist in
the "dimension" of 3-D points, why even mention it? Are "objects" like
"Love" and "hate" both written <null, null, null> because they have no
"projection" into 3-D space?

> We might add some other properties of
> nulls and then derive their consequences. And finally we will develop
> yet another data model.
>
> Formally, objects exist in all dimensions but in most of them they have
> null values.

Null==absent? Why? Or rather, why bother?

> In order to optimize such a property (a limited number of
> dimensions for some objects) we use multidimensional hierarchical system
> which formally describes what is the data semantics, its
> dimensionality, its projections and many other issues unsolved in other
> models.

- Eric

erk

unread,
Jun 6, 2005, 5:13:19 PM6/6/05
to
Alexandr Savinov wrote:
> You describe the semantics of UNKNOWN which is also a special "value".
> And I completely aggree with your interpretation until you call it NULL.
>
> Generally, in order to effectively manipulate data semantics we might
> need two special "values" with concrete semantics defined at the system
> level:
>
> NULL - absence (of value, record or any other entity).
> UNKNOWN - presence but having unknown identifier.

FYI, this sort of thing has been rehashed many times, most of them
somewhat unsatisfactorily. The fact that data manipulation can be done
without nulls makes their value primarily that of placebos, while their
costs can be great, as they're hard to make consistent.

> Here is one example of using nulls:
> Attribute/property/variable does not make sense for an object. requently
> for the sake of schema simplicity we have wide tables where some columns
> do not makes sense for some objects.

How is that simple? Your table (it's not a relationa) now has many
possible meanings. Keep in mind that a relation is a predicate, not an
"entity." Otherwise how would you characterize a "linking" /
"cross-reference" relation?

> For example, we have a list of
> products (mostly cars) and define a column Power. However, sometimes we
> also sell products which do not have a motor so what should we write in
> the field Power? This is precisely the case where the object does not
> exist in this dimension and we can formally correctly write NULL.

A bad example. "Mostly cars"? Why not define the car and (other
product) relations properly?

> Unknown values are complex in processing and can be used only in
> sophisticaed databases like deductive databases. The unknown "value"
> actually means that this field has one normal value but it is unknown,
> i.e., everything is possible. In this case it is very natural when the
> result of aggregation is also unknown. For example, if we have a new
> employee then he is guaranteed to have some concrete salary which is
> however currently unknown for some reason (no paper arrived the
> accounting department or the salary is still negotiated). In this case
> we could formally corrected write UNKNOWN. This would inhject a great
> deal of uncertainty into the whole database so that many related queries
> will return also UNKNOWN (in the case the database is able to process
> such values at all).

And that's precisely the reason such things are better dealt with
through normalized relations. Then you can get consistent and correct
answers, by specifying exactly what you want: average salaries for
those who have them, count of those without salaries. The joins, far
from being a burden, allow you to be specific without have to work
around however the DBMS has "defined nulls."

- Eric

mAsterdam

unread,
Jun 6, 2005, 6:11:11 PM6/6/05
to
Alexandr Savinov wrote:
> Paul schrieb:
>
>> Alexandr Savinov wrote:
>>
>>> Assume that we have a set of 3 values S = {1, 3, 10}. We want to
>>> aggreage them and apply some function func: A = func(S). Do we have a
>>> problem? No. Now remove some item from the set so that we have S = {1,
>>> 3} and then apply again the aggregation function. Do we have a
>>> problem? No.
>>>
>>> Having null values is actually a way of removing data items from
>>> consideration. In this example we apply the aggregation function to the
>>> set {1, 3} which is equivalent to applying it to the set {1, 3, null}.
>>
Most of the time, in database context NULL denotes, as you point out,
the _absence_ of a value where you would expect one - still you are
freely using "NULL value" as a special kind of value.
ISTM this is inviting a lot of non-issues.

Did you consider viewing "NULL value" as a contradictio in terminis?

Kenneth Downs

unread,
Jun 6, 2005, 5:51:09 PM6/6/05
to
paul c wrote:

Suitable for framing!

--
Kenneth Downs
Secure Data Software, Inc.
(Ken)nneth@(Sec)ure(Dat)a(.com)

Marshall Spight

unread,
Jun 7, 2005, 12:46:48 AM6/7/05
to
> Most of the time, in database context NULL denotes, as you point out,
> the _absence_ of a value where you would expect one - still you are
> freely using "NULL value" as a special kind of value.

In fact, I don't think this is true. The SQL standard mostly takes
the stand that NULL means "there is a value, but we don't know
what it is." This is not the same thing as saying that there isn't
a value.

If SQL's NULL was really the absense of a value, then 1+NULL would
be 1. (The sum of a series of numbers that is one long is that
number.) Instead it is NULL, which is consistent with "value exists
but is unknown" and inconsistent with "the absense of a value."

In fact, it is my expectation that it would not be very hard to
come up with a precise, useful, and not too hard to understand
semantics for allowing the system to deal with "value absent."
It doesn't appear to be possible to do this for "value not known"
though; you get into this whole crappy 3VL space.

You probably still would like to have user-defined special values.

Marshall

mountain man

unread,
Jun 7, 2005, 4:47:15 AM6/7/05
to
"erk" <eric...@gmail.com> wrote in message
news:1118065152.5...@o13g2000cwo.googlegroups.com...

> mountain man wrote:
>> http://www.hughdarwen.freeola.com/TheThirdManifesto.web/Missing-info-without-nulls.pdf
>>
>> Thanks Paul for the above reference which I have just read.
>>
>> According to this reference we can replace a null in the salary
>> field with "Salary not known" and/or "Unsalaried". This has
>> taken some work to do, by a database professional, to derive
>> an "improved" version of the personnel table (when needed).
>>
>> So what? The original design schema is simply missing information
>> for these elements, and this information needs to be entered,
>> and/or determined and entered.
>>
>> Why should a qualified database professional spend time on
>> such a problem when the only real and viable solution to this
>> problem is to identify the missing information and then to get
>> it into the database?
>
> The trouble isn't the missing information per se. The trouble is that
> that introduces into queries. Examine SQL's use of nulls in aggregate
> functions to get an idea. To briefly list some problem areas: does a
> "missing" salary count as 0 in a sum? Does it affect an average? When
> selecting employees with a salary below, say, $10,000, does "missing"
> get included?

Yes, these are the problems that need to be confonted
when dealing with nulls.


> The point is that the answers to the above questions can (and do!) vary
> from domain to domain; "missing" values in salary might be treated very
> differenly from missing values in social security number (for example).


Of course, some nulls are more critical than others. ;-)
And one goes after the most critical set first, and then
work down the priority list.


> And all of the above sidesteps the problems involved with
> distinguishing "types of nulls" - missing versus not applicable versus
> other 4 and 5 VLs that I've never really understood.

While Date et al propose classification systems to categorise
various "types of nulls", if the data is subsequently correctly
maintained in the database (eg: the salary is entered) then the
problem is fixed once and for all at its source, and the
classification systems become redundant.


>> A simple workflow routine, channelling the appearances of
>> any critical nulls (not taken care of by the constraints!) to
>> the people in the organisation that are directlt responsible
>> for the entry of that element of data, also fixes the problem.
>
> And until that data is entered, what is to be done with queries over
> those tuples? Are they to be completely ignored until null values are
> "corrected"?


Of course, they are treated as high-priority integrity exceptions!
If the organisation is geared up to identify such events, and the
resources are available to immediately detect such events, and
then correct them, then what more can you do?


>> Normalisation appears to be a theoretical sledge hammer
>> trying to cover up underlying integrity issues without actually
>> solving the integrity issue at its fundamental level. At least
>> this is the impression I get after reading the above reference.
>
> I don't understand what "solving the integrity issue at its fundamental
> level" means,


That the null value (eg: null salary) is populated
with a valid value (eg: salary of 123.00).

> but nulls have the same issue. At best, it's a solution
> to a small range of problems, and as SQL suggests, can be difficult to
> get "right," if right is even possible.


Null values need to be managed with care, but if they
are managed with care, it is possible to get things "right".

From a practical aspect, the management of null values
involves their elimination from the database, on a search
and destroy basis.

Also, from the practical perspective, we need to know
how the entry of nulls to the database is happening.
What rules are failing, etc, (ie: why are nulls appearing)
and correct the issue.


--
Pete Brown
IT Managers & Engineers
Falls Creek
Australia
www.mountainman.com.au/software

mountain man

unread,
Jun 7, 2005, 4:47:14 AM6/7/05
to
"Marshall Spight" <marshal...@gmail.com> wrote in message
news:1118119608.5...@g49g2000cwa.googlegroups.com...

...[trim]...

> If SQL's NULL was really the absense of a value, then 1+NULL would
> be 1. (The sum of a series of numbers that is one long is that
> number.) Instead it is NULL, which is consistent with "value exists
> but is unknown" and inconsistent with "the absense of a value."


On the contrary, in TSQL (SQL Server) you can toggle
between these two functionalities by using the following
SET command:

SET CONCAT_NULL_YIELDS_NULL { ON | OFF }

Remarks
When SET CONCAT_NULL_YIELDS_NULL is ON,
concatenating a null value with a string yields a NULL result.
For example, SELECT 'abc' + NULL yields NULL.
When SET CONCAT_NULL_YIELDS_NULL is OFF,
concatenating a null value with a string yields the string itself
(the null value is treated as an empty string). For example,
SELECT 'abc' + NULL yields abc.

If not specified, the setting of the concat null yields null
database option applies.

Alexandr Savinov

unread,
Jun 7, 2005, 5:26:18 AM6/7/05
to

Why do you say that we do not need nulls - may be we do not need false :-)

But actually you are absolutely right - we do not need so many terms for
designating the same things. Unfortunately, it is a historical and
political issue. Different branches of science are born and then exist
independently starting from some fundamental assumptions and only later
people start recognizing that they expressed in other terms what has
existed before or in parallel. Political aspect is that different
scientific gangs are trying to establish and retain their priority and
independence so they are against unification of terms. The terms become
their flag and they fight for the possession of the flag as their
symbol. Different terms of course play also very positive role because
they actually reflect different priorities in the theory and its subtle
features.

For example, what is the difference between row, record, tuple and
object? They designate one and the same but in different contexts in
order to emphasize different aspects of the theory where they are used.

--
alex
http://conceptoriented.com

mountain man

unread,
Jun 7, 2005, 5:32:44 AM6/7/05
to
"Paul" <pa...@test.com> wrote:

> Here's some discussion of a very concrete example of a NULL problem in
> SQL: http://www.firstsql.com/iexist2.htm
>
> The author says that his RDBMS solves this problem though, and that is
> just a problem with SQL and NULLs, not with the relational model and
> NULLs.
>
> Are there any rebuttals of this claim that anyone has?

Yes, in respect of his "problem with SQL and nulls" note that
his concrete problem as expressed in the above article is the
behaviour of a standard query *converted* to using an
"EXISTS" clause.

You'll note that there are absolutely no problems
with the *standard* SQL query results, even though
nulls exist in the base table.

Also, it should be pointed out that the author uses
in the standard query (afterwards converted) sub-select
statements, which are not necessary, and in fact
probably exacerbate any problems.

The original query may be written as follows without
the sub-select statement, and has no problems with
the null.


select distinct spx.sno
from sp spx
where spx.pno = 'P1'
and spx.qty <> 1000

Paul

unread,
Jun 7, 2005, 5:41:40 AM6/7/05
to
Alexandr Savinov wrote:
> For example, what is the difference between row, record, tuple and
> object? They designate one and the same but in different contexts in
> order to emphasize different aspects of the theory where they are used.

Maybe row, record and tuple are all used to describe the same thing, but
object is different I think. Rows are identified by their contents
alone, but two objects can have identical contents but be different.

I think this is what Date & Darwen describe as the "first great blunder"
in one of their publications - the idea that class=table and row=object.
They argue that it should be domain=class

Although SQL confuses the issue by allowing tables without candidate
keys, and thus duplicate rows, although you still can't uniquely
identify them.

Paul.

Alexandr Savinov

unread,
Jun 7, 2005, 6:00:39 AM6/7/05
to
erk schrieb:

> Alexandr Savinov wrote:
>
>>Assume that we have a set of 3 values S = {1, 3, 10}. We want to
>>aggreage them and apply some function func: A = func(S). Do we have a
>>problem? No. Now remove some item from the set so that we have S = {1,
>>3} and then apply again the aggregation function. Do we have a problem? No.
>
>
> Incorrect - you may have a problem. You're treating S as a variable,
> whose "contents" can vary from time to time. A domain, however, isn't a
> variable. It's a set whose definition (whether intentional or not) is
> fixed. A function over a varying domain, such as you describe,
> represents the sort of situation meant for relational constraints.

I am sorry but I do not see any problem.
Another point is that variable is something that stores a reference.
This reference may point to
- one object (record),
- a collection of records which is dynamically defined (for example, a
result of some query),
- a table/domain which is statically defined in the schema
- anything else that can be represented by reference
So you are right that domain is not a variable but reference to a domain
can be stored in as many variables as we like just like we can store in
variable references to records or result sets. If store in a variable a
reference to a collection then this collection can be defined by using
different mechanisms by collection is collection and it always has some
set of internal elements (actually they may change in time, for example,
if we delete some rows from a table). But all this has nothing to do
with nulls or I do not understand you point.

In my example variable S is a collection defined explicitly in the
current scope by enumerating its elements. But it might be a result of
some query or its might be a static table. It has no influence on the
semantics of NULLs we are discussing. I want to say that the semantics
of NULL has to be defined as absence of thing (not in relational model
where it hardly makes sense). If we define null as absence then as a
minor advantage we avoid problems with aggregation because absent things
are simply not visible, i.e., null values are skipped. Essentially this
precisely what I wanted to illustrate in my example.

>>Having null values is actually a way of removing data items from
>>consideration. In this example we apply the aggregation function to the
>>set {1, 3} which is equivalent to applying it to the set {1, 3, null}.
>
>
> Not really. Is null to be counted, for example?

No, and this precisely what I wanted to emphasize. Because you cannot
count what does not exist. The problem is to understand that different
things may exist in different dimensions. And again, it is not
relational model (forget about it for a while - otherwise it is not
possible to understand what null means and many other important things
too).

In order to understand
- how different things may exist hierarchically and multidimensionally,
- how the model may have canonical semantics (for example, we can
compare two models if they are semantically equivalent or one of them is
more specific than another),
- what is the dimensionality of the model (how many degrees of freedom
it has),
- how grouping and aggregation works,
- what null means.
you might want to read about the concept-oriented model. But as a said
ouy need to forget for a moment about relational model because if you
always project new terms and concepts onto your good old coordiante
system you will always get a wrong view.

?

?

Yes, "Love" and "hate" has to be written <null, null, null> if
- these three variables do not make sense for them (say, Colour, Weight,
Size)
- these objects do not exist, do not exhibit themselves, along these
dimensions,
- we do not want to see them along these dimensions

Whatever we call such a behavior it has very concrete formal semantics
in the concept-oriented data model and in any other model where we want
to have such useful properties as dimensionality and canonical semantics.

>>We might add some other properties of
>>nulls and then derive their consequences. And finally we will develop
>>yet another data model.
>>
>>Formally, objects exist in all dimensions but in most of them they have
>>null values.
>
>
> Null==absent? Why? Or rather, why bother?

There several reasons but all of them relate to informal properties of
"good" model. Simple answer is "in order to avoid practial problems and
disputes about the meaning of null". When we give such a definition then
we get a very simple and effective data model (along with other principles).

I am sorry, but the only thing I can advice is to read something about
concept-oriented model. It is difficult because the description is bad
(I actually lost an interest to it because almost everything is clear
for me).

--
alex
http://conceptoriented.com

Tony Andrews

unread,
Jun 7, 2005, 6:16:57 AM6/7/05
to
Alexandr Savinov wrote:
> I am sorry, but the only thing I can advice is to read something about
> concept-oriented model. It is difficult because the description is bad
> (I actually lost an interest to it because almost everything is clear
> for me).

So let's get this straight. You have single-handedly developed
something called the "concept-oriented model", which you claim is
better than the relational model. However, you have not managed to
explain this model in a way that anyone other than yourself can
understand, and have now lost interest in trying to do so. Is there
any point in even bringing it up in discussions any more?

Paul

unread,
Jun 7, 2005, 6:22:37 AM6/7/05
to
mountain man wrote:
>>Here's some discussion of a very concrete example of a NULL problem in
>>SQL: http://www.firstsql.com/iexist2.htm
>>
>>The author says that his RDBMS solves this problem though, and that is
>>just a problem with SQL and NULLs, not with the relational model and
>>NULLs.
>>
>>Are there any rebuttals of this claim that anyone has?
>
> Yes, in respect of his "problem with SQL and nulls" note that
> his concrete problem as expressed in the above article is the
> behaviour of a standard query *converted* to using an
> "EXISTS" clause.
>
> You'll note that there are absolutely no problems
> with the *standard* SQL query results, even though
> nulls exist in the base table.

"EXISTS" is standard SQL surely? There are many times when an EXISTS
clause would make the logic clearer than a JOIN.

> Also, it should be pointed out that the author uses
> in the standard query (afterwards converted) sub-select
> statements, which are not necessary, and in fact
> probably exacerbate any problems.

Maybe they just used a simple example for pedagogical purposes, but the
point is that any treatment of NULLs should be totally watertight and
always work as expected, irrespective of however many subqueries you use.

But I was thinking more of a rebuttal of the claim that FirstSQL handles
NULLs in a totally consistent and logical way. Can anyone give an
example where FirstSQL's treatment of NULLs leads to problems? Maybe
such an example doesn't exist? I'm thinking primarily logical problems
rather than ease-of-use problems.

Paul.

Alexandr Savinov

unread,
Jun 7, 2005, 6:35:24 AM6/7/05
to
erk schrieb:

> Alexandr Savinov wrote:
>
>>You describe the semantics of UNKNOWN which is also a special "value".
>>And I completely aggree with your interpretation until you call it NULL.
>>
>>Generally, in order to effectively manipulate data semantics we might
>>need two special "values" with concrete semantics defined at the system
>>level:
>>
>>NULL - absence (of value, record or any other entity).
>>UNKNOWN - presence but having unknown identifier.
>
>
> FYI, this sort of thing has been rehashed many times, most of them
> somewhat unsatisfactorily. The fact that data manipulation can be done
> without nulls makes their value primarily that of placebos, while their
> costs can be great, as they're hard to make consistent.
>
>
>>Here is one example of using nulls:
>>Attribute/property/variable does not make sense for an object. requently
>>for the sake of schema simplicity we have wide tables where some columns
>>do not makes sense for some objects.
>
>
> How is that simple? Your table (it's not a relationa) now has many
> possible meanings. Keep in mind that a relation is a predicate, not an
> "entity." Otherwise how would you characterize a "linking" /
> "cross-reference" relation?

By schema simplicity I meant the number of tables. Sometimes it is
really simpler to have one wide table than numerous specific tables.
There is alway trade off between these two extremes and do not say that
having one wide table is good practice - it is only an example for
demostrating some properties of nulls.

Tables are not entities in relational model in other conventional models
(and it is one of fundamental drawbacks). However, they should be
treated as entities at least formally (in a "good" data model). Indeed,
what are tables and why need to define them as a special kind of things
different from entities? It is a violation of Occams's rasor principle.
I want to have everything treated as only one sort of things. For
example, why tables do not have their own fields where I might store
some properties? In order to understand why do we need tables and what
is their role consider the following example. Instead of one wide table
for all types of objects we are tought to introduce several more special
tables with only a smaller subset of columns. Such tables will have less
rows. However, then we might want to inroduce even more specialized
tables and so on. Finally we get a large number of tables each with a
relatively small number of rows. Interestingly, we can continue this
process and get tables with no rows at all! Thus theoretically we can
express our semantics without rows by using only tables. One simple
conclusion is that "data semantics can be expressed by tables rather
than only by rows (in tables)". It is very strong and somewhat
surprising result. In such an approach we do not need rows because
anything can be modelled by means of tables. Such a model is ugly from
relational point of view (although we simply followed an advice to
decompose tables) however it is interesting from theoretical point of
view (with big consequences). Manipulating data in this model means
adding/remove tables rather than records and one table bears some
semantics which is encoded in its static name. Thus tables in good model
should be treated as normal entities but with special role so that we
can always choose where to store our semantics - in table entity or in
record entity. Of course, such a model is more complex and more
interesting but I just wanted to comment your comment that tables are
not entities.

Yet, again, you touch an issue which has nothing to do with null values
and their semantics.

>>For example, we have a list of
>>products (mostly cars) and define a column Power. However, sometimes we
>>also sell products which do not have a motor so what should we write in
>>the field Power? This is precisely the case where the object does not
>>exist in this dimension and we can formally correctly write NULL.
>
>
> A bad example. "Mostly cars"? Why not define the car and (other
> product) relations properly?

I deliberatly gave such an example!
However, in real world complex applications we (unfortuanately) have a
lot of such situations which cannot be avoided. In other words, it may
well happen that for some record some attribute simply does not make
sense (including bad design).

Theoretically it is not an issue of good or bad design. The problem is
if we want to keep meaningless columns in one table or optimize
representation. I mean that independent of our desire or schema any
object in the database will formally have all dimensions. The question
is only how we *represent* the database semantics. In one case null is
written explicitly. In other cases it still exists as a value of an
object along meaningless dimensions but is not explicitly written
because of good schema. For database and for the model this object still
has null as a value along this dimensions independent of how it is
represented.

>>Unknown values are complex in processing and can be used only in
>>sophisticaed databases like deductive databases. The unknown "value"
>>actually means that this field has one normal value but it is unknown,
>>i.e., everything is possible. In this case it is very natural when the
>>result of aggregation is also unknown. For example, if we have a new
>>employee then he is guaranteed to have some concrete salary which is
>>however currently unknown for some reason (no paper arrived the
>>accounting department or the salary is still negotiated). In this case
>>we could formally corrected write UNKNOWN. This would inhject a great
>>deal of uncertainty into the whole database so that many related queries
>>will return also UNKNOWN (in the case the database is able to process
>>such values at all).
>
>
> And that's precisely the reason such things are better dealt with
> through normalized relations. Then you can get consistent and correct
> answers, by specifying exactly what you want: average salaries for
> those who have them, count of those without salaries. The joins, far
> from being a burden, allow you to be specific without have to work
> around however the DBMS has "defined nulls."

That is a (serious) problem in relational model because depending on
chosen schema you may get different results. However, in good model
normalization does not change the data semantics - it is only an issue
of where and how it will be stored. You can store everything in one wide
table or you can create specialized tables but all queries must return
one and the same result. This is how concept-oriented model works where
normalization is not an issue. The model has some dimensionality and
data in it has some semantics and the question is what structure for
dimensions we prefer - it is a design issue which has nothing to do what
results will be returned by our queries. If data semantics depends on
the structure of its relations, how you use joins and how you use nulls
then it is bad model.

--
alex
http://conceptoriented.com

Alfredo Novoa

unread,
Jun 7, 2005, 6:51:51 AM6/7/05
to
On Mon, 06 Jun 2005 16:23:13 +0100, Paul <pa...@test.com> wrote:

>OK so basically the problem is with aggregates?

The problem is basically with queries, but aggregate queries are
probably the most problematic when we allow nulls.

>Why not say then that all aggregates that involve a NULL return NULL?

This is what SQL does. Sometimes we want to get a value but we get a
null instead.

I have to mantain an application that breaks very often due to this
problem.

>And if you want an actual result, you must specify an explicit behaviour
>by the use of COALESCE or WHERE clauses.

And this is very cumbersome and error prone.

>So could this actually be another problem with SQL's treatment of NULLs
>rather than with NULLs per se?

This problem is inherent to nulls.


Regards

Alfredo Novoa

unread,
Jun 7, 2005, 6:51:46 AM6/7/05
to
On Mon, 06 Jun 2005 15:02:17 GMT, paul c <toledob...@oohay.ac>
wrote:

>i'm an admirer of Date's but i have to say i like your answer. no Latin
>to look up, either.

What's the problem with Latin? :)


Regards

Alexandr Savinov

unread,
Jun 7, 2005, 6:52:25 AM6/7/05
to
Tony Andrews schrieb:

I said, I am sorry.
I did not claim that it is better, because it is already a political
issue (ok, may be I implicitly meant that but I did not want to say that).
I mentioned this model because there is a description and everybody can
read it and I can answer questions if any.
I mentioned this model as the continuation of long discussion in order
to point to the source of ideas.

To keep things simpler: if you are interested to know more about null
values and other general data model issues then read the stuff on
conceptoriented.com. If you are not interested then do not read. If
there are questions then I can answer.

--
alex
http://conceptoriented.com

Tony Andrews

unread,
Jun 7, 2005, 6:55:46 AM6/7/05
to
Alexandr Savinov wrote:
<SNIP>

> In order to understand why do we need tables and what
> is their role consider the following example. Instead of one wide table
> for all types of objects we are tought to introduce several more special
> tables with only a smaller subset of columns. Such tables will have less
> rows. However, then we might want to inroduce even more specialized
> tables and so on. Finally we get a large number of tables each with a
> relatively small number of rows. Interestingly, we can continue this
> process and get tables with no rows at all! Thus theoretically we can
> express our semantics without rows by using only tables. One simple
> conclusion is that "data semantics can be expressed by tables rather
> than only by rows (in tables)". It is very strong and somewhat
> surprising result.

You are not kidding!

> In such an approach we do not need rows because
> anything can be modelled by means of tables. Such a model is ugly from
> relational point of view (although we simply followed an advice to
> decompose tables) however it is interesting from theoretical point of
> view (with big consequences).

Which part of the decomposition process reduces 1 row to 0 rows? I
must have missed that part...

Alexandr Savinov

unread,
Jun 7, 2005, 7:15:53 AM6/7/05
to
Tony Andrews schrieb:

I'll try to exlain using another example (but I am not sure if it is
necessary taking into account your apriori negative predisposition).
This example is especially illustrative for those who absolutely trusts RM.

How can you model a tree of items? There are two major alternatives:
1. Create a table with folders and a table with items. Each folder and
each item has a field which specifies its parent folder.
2. Craate one table for each folder and store all items belonging to
this folder in this table. There exist as many tables as we have folders.

Conclusion: This example demonstrates that tables have the same semantic
load as normal records, i.e., they should be treated not only as a
structural element but also as means for representing data semantics.
Addding/removing tables changes the model semantics rather than only its
structure.

Problem: A good model should somehow take this into account and provide
facilities for representing and manipulating data semantics in at least
these two forms.

Solution: I think before we can describe this item it is necessary to
accept the first two (it does not make sense to solve a problem if it is
not accepted as a problem).

Comment: Any data modeller with enough practical experience may draw
this conclusion and formulate this problem - we can express our data
semantics by means of tables but the relational model does not provide
good means for representing and manipulating it. In other words, tables
should be treated as normal entities with special (organizing) role with
respect to other entities. The above example shows how rows are
converted into tables and vice versa. In extreme case we might represent
everything by tables only however, it is difficult in relational model
because it is not intended to solve such problems.

--
alex
http://conceptoriented.com

Tony Andrews

unread,
Jun 7, 2005, 7:32:34 AM6/7/05
to
Alexandr Savinov wrote:
> Tony Andrews schrieb:

> > Which part of the decomposition process reduces 1 row to 0 rows? I
> > must have missed that part...
>
> I'll try to exlain using another example (but I am not sure if it is
> necessary taking into account your apriori negative predisposition).

Sorry if my negative predisposition is so obvious. (I can't deny that
I have it.) However, I will try to understand...

> This example is especially illustrative for those who absolutely trusts RM.
>
> How can you model a tree of items? There are two major alternatives:
> 1. Create a table with folders and a table with items. Each folder and
> each item has a field which specifies its parent folder.
> 2. Craate one table for each folder and store all items belonging to
> this folder in this table. There exist as many tables as we have folders.
>
> Conclusion: This example demonstrates that tables have the same semantic
> load as normal records, i.e., they should be treated not only as a
> structural element but also as means for representing data semantics.
> Addding/removing tables changes the model semantics rather than only its
> structure.

False: your alternative #2 does not represent the tree structure at
all, it is just a bunch of unrelated tables, each of which contains
some items. Where did the folder tree structure go? Try again!

Alexandr Savinov

unread,
Jun 7, 2005, 8:02:35 AM6/7/05
to
Tony Andrews schrieb:

Yes, you are right, we need to connect the tables so that each table
known its parent table. But it does not change the point - it only even
better demonstrates limitations of relational model. One solution is to
connect them statically via your application. Another solutions is where
you can store this information a table from #1 where one row represents
one table. But do not say that #2 is now equivalent to #1. Because the
main issue here is that for each row from Folder table there is one real
table which contains a set of items. Now very important point (for the
solution): if in case #1 each item specifies the parent folder by using
some field, i.e., by using legal relational facility, then in #2 each
item belongs to its folder (table) by using 'instance of' relation,
i.e., it is a row in this table. Thus we use for data modeling two
absolutely different types of relations. And again, the whole example
demonstrates that tables should be legal element for use in representing
data semantics.

Here is even simnpler example. Assume you have a number of departments.
There is two ways how you can represent them:
1. As records in a table of departments
2. As individual tables (possibly with support from one common
meta-table from #1 as you noticed because relational model does not
allow for tables to have fields and to be used as entities)

It is very general and wide spread trade off and it is important to
recognize that there is such an alternative (but it is prohibited to
think so in RM). Once you start a new model you need to decide what
entities will be represented by tables. Normally these are special
entities which have high level organizing role. Sometimes we can to keep
them as normal tables without support. But in many cases we need to have
additional properites and structure for them. In the latter case we also
duplicate them as normal records in a (meta-)table. I can continue with
advantages and disadvantages and analysis of properties but if you
recognize this point as an alternative then you can do it easily yourself.

I return to the initial point: tables should be considered normal
entities with special role and vice versa normal entities should be able
to play a role of tables.

Another aspect of this problem. A schema is not part of the relational
model but it is heavily used in complex applications. You know that any
implementation has a table of tables and it is reflects the fact that
tables are rows and have normal properties. Thus practice also confirms
my hypothesis however currently there is no a data model (theory) for that.


--
alex
http://conceptoriented.com

Paul

unread,
Jun 7, 2005, 8:23:32 AM6/7/05
to
Alfredo Novoa wrote:
>>Why not say then that all aggregates that involve a NULL return NULL?
>
> This is what SQL does. Sometimes we want to get a value but we get a
> null instead.

Are you sure? I've just tried summing a column that contains a NULL in
PostgreSQL and it doesn't return NULL - it treats the NULLs as zeros.

>>So could this actually be another problem with SQL's treatment of NULLs
>>rather than with NULLs per se?
>
> This problem is inherent to nulls.

I'm not asking whether the fact that NULLs are cumbersome and errorprone
is inherent to NULLs, but whether a DBMS with NULLs will always have
queries that return incorrect results (as per Date's example with the
EXISTS clause).

Could it be that a variant of SQL exists that has NULLs, so might be
considered cumbersome (a somewhat subjective opinion?), but has zero
logical inconsistencies?

Paul.

Paul

unread,
Jun 7, 2005, 8:31:16 AM6/7/05
to
Mikito Harakiri wrote:
> Hear, hear! It totally escapes me how could somebody invent putting
> nonumeric value ("not salaried") into numeric column. For all reporting
> purposes unknown salary is salary equal to 0.

I disagree; suppose 90% of salaries are unknown. Someone does a quick
query to get the total estimated salary bill, not realising this. They
are going to get a shock when the real thing comes through!

I would have thought that it would be safest to return an "unknown",
flagging to the user that there are unkown salaries in the column. Then,
having been alerted to that fact, they can proceed to use COALESCE to
explicitly treat the NULLs as zero if they want.

But I think (and this is a subjective opinion) that treating unknowns as
zeros or empty strings should be manually defined behaviour rather than
the default.

And the "salary" column could be thought of not as a numeric column, but
as a column that holds the answer to the question: "What is employee X's
salary?"

Now adding support for this involves a lot more complication at the
domain level, but maybe it's worth it. I guess only experience of using
such a system will tell.

Paul.

Paul

unread,
Jun 7, 2005, 11:01:25 AM6/7/05
to
mountain man wrote:
> Nulls exist in set theory, they exist in the real world, and
> they exist in data elements. While they may be problematic,
> their problematics can be effectively managed and resolved.

The empty set in set theory isn't really like a SQL NULL though - it's a
perfectly well defined value - a set with no elements.

Relational database theory comes from a mix of set theory and predicate
logic. Set theory is for the tables and rows, but once you get down to
the row level, it switches to predicate logic. And standard predicate
logic doesn't say anything about NULL arguments. So a set-theoretical
empty set really corresponds to an empty table, I'd say, rather than a
NULL attribute.

>>>Why not say then that all aggregates that involve a NULL return NULL?
>>
>>This is what SQL does. Sometimes we want to get a value but we get a
>>null instead.
>

> This is user-definable in TSQL by use of the command
> SET CONCAT_NULL_YIELDS_NULL { ON | OFF }

That's only for binary operations though, not for aggregates like SUM or
AVG. I don't think MSSQL has an concatenation aggregate function (it's a
non-commutative operation, so it wouldn't be well-defined on a set
anyway) and I think this options is mainly for interoperability with
Oracle, which I think for some reason didn't used to differentiate
between empty strings and NULL strings.

Paul.

Jon Heggland

unread,
Jun 7, 2005, 11:09:00 AM6/7/05
to
In article <42a58ced$1...@news.fhg.de>, sav...@host.com says...
> Tony Andrews schrieb:

> > False: your alternative #2 does not represent the tree structure at
> > all, it is just a bunch of unrelated tables, each of which contains
> > some items. Where did the folder tree structure go? Try again!
>
> Yes, you are right, we need to connect the tables so that each table
> known its parent table. But it does not change the point - it only even
> better demonstrates limitations of relational model.

No, it doesn't. The RM handles the case just fine, if you use it
properly. This is not a limitation of the RM any more than it is a
limitation of a shovel that it is really bad at digging if you hold it
upside down.

> One solution is to connect them statically via your application.

Which defies the purpose of a DBMS.

> Another solutions is where
> you can store this information a table from #1 where one row represents
> one table.

I have trouble parsing this.

> But do not say that #2 is now equivalent to #1. Because the
> main issue here is that for each row from Folder table there is one real
> table which contains a set of items. Now very important point (for the
> solution): if in case #1 each item specifies the parent folder by using
> some field, i.e., by using legal relational facility, then in #2 each
> item belongs to its folder (table) by using 'instance of' relation,
> i.e., it is a row in this table. Thus we use for data modeling two
> absolutely different types of relations.

So you have added complexity, but not expressive power. That is what
most "new" data models do. It is pointless.

> And again, the whole example
> demonstrates that tables should be legal element for use in representing
> data semantics.

No. It is not necessary.

> Here is even simnpler example. Assume you have a number of departments.
> There is two ways how you can represent them:
> 1. As records in a table of departments
> 2. As individual tables (possibly with support from one common
> meta-table from #1 as you noticed because relational model does not
> allow for tables to have fields and to be used as entities)

Again, you introduce unnecessary complexity.

> I return to the initial point: tables should be considered normal
> entities with special role and vice versa normal entities should be able
> to play a role of tables.

Why? What purpose does this serve?

> Another aspect of this problem. A schema is not part of the relational
> model but it is heavily used in complex applications. You know that any
> implementation has a table of tables and it is reflects the fact that
> tables are rows and have normal properties.

No, tables (relvars) are not rows (tuples). A relvar can be *described*
*using* tuples. This is very different.
--
Jon

paul c

unread,
Jun 7, 2005, 10:59:52 AM6/7/05
to
Paul wrote:
> Mikito Harakiri wrote:
>
>>Hear, hear! It totally escapes me how could somebody invent putting
>>nonumeric value ("not salaried") into numeric column. For all reporting
>>purposes unknown salary is salary equal to 0.
>
>
> I disagree; suppose 90% of salaries are unknown. Someone does a quick
> query to get the total estimated salary bill, not realising this. They
> are going to get a shock when the real thing comes through!
> ...


they'll get a bigger shock when the business goes under. if 90% of
salaries are unknown, serves them right. they have bigger problems than
the database.

p

Alexandr Savinov

unread,
Jun 7, 2005, 11:18:15 AM6/7/05
to
Jon Heggland schrieb:
> In article <e%hpe.7195$F7....@news-server.bigpond.net.au>,
> hobbit@southern_seaweed.com.op says...
>
>>"Tony Andrews" <andr...@onetel.com> wrote in message
>>news:1118141746.2...@o13g2000cwo.googlegroups.com...
>>
>>>Alexandr Savinov wrote:
>>><SNIP>

>>>
>>>Which part of the decomposition process reduces 1 row to 0 rows? I
>>>must have missed that part...
>>
>>I imagine this can be done by querying the system table register
>>for the name of a table, instead of querying a table for the name
>>(value) of a key.
>>
>>Instead of writing or deleting a row, you could
>>create or drop a table with zero rows.
>
>
> In that case, you are using the rows in the "system table register"
> table for your information; the actual tables you create and drop are
> just irrelevant side effects -- you don't use them for anything at all.

Yes, these are side effects if we are not going to insert any rows into
those system table rows. But this example was used to demonstrate that
tables are used to represent some entities from the problem domain and
in this sense it is possible to model the problem domain by using
tables. The fact that tables are normally used to include records while
records are normally used to include fields simply emphasizes their
special use. In general case any entity has two aspects:

- it can be viewed as a collection of other entities (collection here
means formally logical sum or a set normally designated as {}),

- it can be viewed as a combination of other entities (combination here
means formally logical product designated as <>).
If one of these aspects is absent (not implemented) then we get either a
table/container or a row/object.

Another point is how can we model the problem domain. Here again we have
two ways:

- we can define an entity as a collection of other entities (let's call
it 'instance of' relation between a table and its member). In this case
the members physically belong to the collection.

- we can use a relation between an object and its value. In this case
the record logically belongs to its values (other records), for example,
a record <1, 5, 10> belongs to 1, to 5 and to 10 simultaniously.

Some consequences of such a definition can be found on
conceptoriented.com (not all - many things are not described).

In relational model tables are simply containers (collection of other
entities) and it is impossible to directly define new properties for
them. Records are simply objects (combinations of other entities) and we
cannot add records to records. And such a definition is really effective
and covers a huge number of situations. However, the question was how
can we treat tables as entities and can they be treated as entities at
all. There was an opinion that tables are not entities and have nothing
to do with the data semantics.

> Alternatively, if the "system table register" is not organized as a
> relational table, you are no longer using the relational model, and the
> whole point is lost.

As far as I understand the relational model does not use any system
table or a table of tables - it is how RDBMSs are implemented.
Relational model works independently of how this meta information is
stored and if it exists at all.

--
alex
http://conceptoriented.com

Jon Heggland

unread,
Jun 7, 2005, 9:07:09 AM6/7/05
to
In article <42a591c4$0$41901$ed26...@ptn-nntp-reader03.plus.net>,
pa...@test.com says...

> Alfredo Novoa wrote:
> >>Why not say then that all aggregates that involve a NULL return NULL?
> >
> > This is what SQL does. Sometimes we want to get a value but we get a
> > null instead.
>
> Are you sure? I've just tried summing a column that contains a NULL in
> PostgreSQL and it doesn't return NULL - it treats the NULLs as zeros.

Not exactly: SQL ignores NULLs in aggregate functions (except COUNT(*)).
It is not treated as zero for AVG, for instance. Also note that x + NULL
evaluates to NULL; therefore, SQL's SUM is not iterated addition -- it
has a much more complicated definition.
--
Jon

Tony Andrews

unread,
Jun 7, 2005, 9:29:49 AM6/7/05
to
Alexandr Savinov wrote:
> Here is even simnpler example. Assume you have a number of departments.
> There is two ways how you can represent them:
> 1. As records in a table of departments
> 2. As individual tables (possibly with support from one common
> meta-table from #1 as you noticed because relational model does not
> allow for tables to have fields and to be used as entities)

#1 is the right way
#2 is the wrong way

> It is very general and wide spread trade off and it is important to
> recognize that there is such an alternative (but it is prohibited to
> think so in RM).

What would be the benefits of approach #2? The drawbacks are obvious
enough. How would you express a relationship between departments with
approach #2? Or associate employees with departments?

> Another aspect of this problem. A schema is not part of the relational
> model but it is heavily used in complex applications. You know that any
> implementation has a table of tables and it is reflects the fact that
> tables are rows and have normal properties. Thus practice also confirms
> my hypothesis however currently there is no a data model (theory) for that.

I'm not sure what you mean by this. AFAIK it has always been the case
that we would expect that the relational schema (metadata) would itself
be defined in relations in the form of the "system catalog". No new
theory is needed, since it is just another application of the
relational model.

mountain man

unread,
Jun 7, 2005, 9:48:52 AM6/7/05
to
"Alfredo Novoa" <alfred...@hotmail.com> wrote in message
news:32vaa1l02d4hu4p2h...@4ax.com...

> On Mon, 06 Jun 2005 16:23:13 +0100, Paul <pa...@test.com> wrote:
>
>>OK so basically the problem is with aggregates?
>
> The problem is basically with queries, but aggregate queries are
> probably the most problematic when we allow nulls.

Nulls exist in set theory, they exist in the real world, and
they exist in data elements. While they may be problematic,
their problematics can be effectively managed and resolved.

>>Why not say then that all aggregates that involve a NULL return NULL?
>
> This is what SQL does. Sometimes we want to get a value but we get a
> null instead.

This is user-definable in TSQL by use of the command
SET CONCAT_NULL_YIELDS_NULL { ON | OFF }

> I have to mantain an application that breaks very often due to this
> problem.


Find the corresponding SET command.
Which SQL-DBMS is it?

Alexandr Savinov

unread,
Jun 7, 2005, 10:07:20 AM6/7/05
to
Tony Andrews schrieb:

> Alexandr Savinov wrote:
>
>>Here is even simnpler example. Assume you have a number of departments.
>>There is two ways how you can represent them:
>>1. As records in a table of departments
>>2. As individual tables (possibly with support from one common
>>meta-table from #1 as you noticed because relational model does not
>>allow for tables to have fields and to be used as entities)
>
>
> #1 is the right way
> #2 is the wrong way

We do not have here "wrong" or "right". It is a matter of fact: we can
model things by using records and we can model things by using tables.
When you say "wrong" you probably mean it is not desirable but you still
do it! If you have tables in your model then you have some things
represented by them! Any model is some thing existing in the problem
domain.

When you say "wrong" then you contradict to another principle. You
actually suggest to keep all data in one common table because you find
it wrong to decompose schema and introduce more tables for special
purposes. So again, entities can be represented by tables and vice versa
one table represents one entity. It is not right or wrong, it is what we
currently have.

>>It is very general and wide spread trade off and it is important to
>>recognize that there is such an alternative (but it is prohibited to
>>think so in RM).
>
>
> What would be the benefits of approach #2? The drawbacks are obvious
> enough. How would you express a relationship between departments with
> approach #2? Or associate employees with departments?

As I wrote approach #2 is already used in any model that has tables in
it because any table represents some element of the problem domain.
However, RM does not provide means for manipulating and representing
data in this way so it is qualified as illegal procedure.

When you ask about benefits and indicate drawbacks then you are actually
critisizing relational model. In other words, you came to the conclusion
that RM does not allow us to deal with some entities such as those
represented by tables. So you are right that it is difficult to
represent relationships between departments in approach 2 but you need
to address this question to those who is responsible for RM - not me!
Because I know that it is a serious problem in RM (among others).

>>Another aspect of this problem. A schema is not part of the relational
>>model but it is heavily used in complex applications. You know that any
>>implementation has a table of tables and it is reflects the fact that
>>tables are rows and have normal properties. Thus practice also confirms
>>my hypothesis however currently there is no a data model (theory) for that.
>
>
> I'm not sure what you mean by this. AFAIK it has always been the case
> that we would expect that the relational schema (metadata) would itself
> be defined in relations in the form of the "system catalog". No new
> theory is needed, since it is just another application of the
> relational model.

No, it is not another application. Schema with its information is an
integral part of the model so it is the same application. If you have a
data model and you define a couple of tables then you may ask a question
what properites these tables have and where they are living actually. RM
does not ask these questions and does not provide an answer. Schema
(meta-data) is an integral part of database but on the other hand it is
something completely different. What if you have meta-meta-data? What
might be a solution.

--
alex
http://conceptoriented.com

mountain man

unread,
Jun 7, 2005, 10:16:42 AM6/7/05
to
"Tony Andrews" <andr...@onetel.com> wrote in message
news:1118141746.2...@o13g2000cwo.googlegroups.com...
> Alexandr Savinov wrote:
> <SNIP>
>
> Which part of the decomposition process reduces 1 row to 0 rows? I
> must have missed that part...

I imagine this can be done by querying the system table register


for the name of a table, instead of querying a table for the name
(value) of a key.

Instead of writing or deleting a row, you could
create or drop a table with zero rows.

Such routines were handy in the "old days" because the
utilities of the proprietory DBMS usually included a create
table (file) command.

Jon Heggland

unread,
Jun 7, 2005, 10:48:20 AM6/7/05
to
In article <e%hpe.7195$F7....@news-server.bigpond.net.au>,
hobbit@southern_seaweed.com.op says...
> "Tony Andrews" <andr...@onetel.com> wrote in message
> news:1118141746.2...@o13g2000cwo.googlegroups.com...
> > Alexandr Savinov wrote:
> > <SNIP>
> >
> > Which part of the decomposition process reduces 1 row to 0 rows? I
> > must have missed that part...
>
> I imagine this can be done by querying the system table register
> for the name of a table, instead of querying a table for the name
> (value) of a key.
>
> Instead of writing or deleting a row, you could
> create or drop a table with zero rows.

In that case, you are using the rows in the "system table register"

table for your information; the actual tables you create and drop are
just irrelevant side effects -- you don't use them for anything at all.

Alternatively, if the "system table register" is not organized as a

relational table, you are no longer using the relational model, and the
whole point is lost.

--
Jon

Paul

unread,
Jun 7, 2005, 10:49:47 AM6/7/05
to
Jon Heggland wrote:
>>>>Why not say then that all aggregates that involve a NULL return NULL?
>>>
>>>This is what SQL does. Sometimes we want to get a value but we get a
>>>null instead.
>>
>>Are you sure? I've just tried summing a column that contains a NULL in
>>PostgreSQL and it doesn't return NULL - it treats the NULLs as zeros.
>
> Not exactly: SQL ignores NULLs in aggregate functions (except COUNT(*)).
> It is not treated as zero for AVG, for instance. Also note that x + NULL
> evaluates to NULL; therefore, SQL's SUM is not iterated addition -- it
> has a much more complicated definition.

I'd suggest that SQL's SUM *should* be iterated addition. And that all
aggregates should return NULL if any of the attributes in the aggregated
column are NULL. Surely if any of them are unknown, we must say that the
sum or average is unknown also?

What are the arguments for not doing this?

Paul.

Alexandr Savinov

unread,
Jun 7, 2005, 11:42:23 AM6/7/05
to
Jon Heggland schrieb:
> In article <42a5b411$0$8714$ed26...@ptn-nntp-reader02.plus.net>,
> pa...@test.com says...

>
>>>Not exactly: SQL ignores NULLs in aggregate functions (except COUNT(*)).
>>>It is not treated as zero for AVG, for instance. Also note that x + NULL
>>>evaluates to NULL; therefore, SQL's SUM is not iterated addition -- it
>>>has a much more complicated definition.
>>
>>I'd suggest that SQL's SUM *should* be iterated addition.
>
>
> I agree. Which would also mean that the empty sum is 0, not NULL. Date
> has written quite a lot on this.

>
>
>>And that all
>>aggregates should return NULL if any of the attributes in the aggregated
>>column are NULL. Surely if any of them are unknown, we must say that the
>>sum or average is unknown also?
>
>
> Yes, if NULL means unknown.

To say "if NULL means unknown" is the same as to say "if 0 is equal 1"
or "if empty set is full set". Unknown and null are two different things
and deserve to have to special designations.

NULL means absence
UNKNOWN means presence with unknown identity

Everything else has a custom semantics defined and maintained by the user.

--
alex
http://conceptoriented.com

Alfredo Novoa

unread,
Jun 7, 2005, 11:25:09 AM6/7/05
to

On Tue, 7 Jun 2005 15:07:09 +0200, Jon Heggland <hegg...@idi.ntnu.no>
wrote:

Thanks for the clarification.

Here is clear that SQL nulls are a complete botch-up.


Regards


Alexandr Savinov

unread,
Jun 7, 2005, 11:36:32 AM6/7/05
to
Jon Heggland schrieb:

I did not describe any data model here - you try to see an evil where it
does not exist. I provided a couple of examples and a couple of
alternative designs and interpreations with the purpose to demonstrate
that tables can be viewed as entities - that is all. Moreover, this
point has a little to do with the relational model - it is a higher
level issue.

>>And again, the whole example
>>demonstrates that tables should be legal element for use in representing
>>data semantics.
>
>
> No. It is not necessary.

So why you represent real entities from your problem domain by tables?
Again, it is not a matter of necessity, right or wrong. Whenever you
create a table you want it to represent some entity. If you define
tables Departments, Products and Personel then you defined three kinds
of entities and your model consists of three elements already! And you
are saying it is not necessary. What is not necessary, representing the
state of data in the database? I do not understand. The initial point
was that tables represent real entities even if you do not like that.
You might define a table ProductItems and then keep your products in one
wide table. And then you might find it inconvenient and two additional
tables, say, Cars and Houses. So you introduced into your model two
additional entities called Cars and Houses. The only problem is that RM
does not allow to add properties to these entities because they are
represented by tables. This is why we might want to represent them by
normal rows is a special table ProductTypes.

>>Here is even simnpler example. Assume you have a number of departments.
>>There is two ways how you can represent them:
>>1. As records in a table of departments
>>2. As individual tables (possibly with support from one common
>>meta-table from #1 as you noticed because relational model does not
>>allow for tables to have fields and to be used as entities)
>
>
> Again, you introduce unnecessary complexity.

I do not introduce anything at all and if there is some complexity then
it is done deliberatly - I provide examples which demonstrate that
tables *are* used for modeling and representing data semantics.

>>I return to the initial point: tables should be considered normal
>>entities with special role and vice versa normal entities should be able
>>to play a role of tables.
>
>
> Why? What purpose does this serve?

Because our data has such a property. We always represent entities by
tables and want our records to store internal records.

>>Another aspect of this problem. A schema is not part of the relational
>>model but it is heavily used in complex applications. You know that any
>>implementation has a table of tables and it is reflects the fact that
>>tables are rows and have normal properties.
>
>
> No, tables (relvars) are not rows (tuples). A relvar can be *described*
> *using* tuples. This is very different.

Of course, and I absolutely agree here, tables are not vars - but only
in relational model. But in data modeling they are! And it would nice to
have a model that formalizes this fact. As I illustrated in several
previous examples we frequently use tables as records and vice versa.

You project everything on relational model so it does not make sense to
continue. It is the same if I were explaining what a complex number is
and you would reply that we do not the imaginable part because real
numbers do not need it. Of course, real numbers do not need the
imaginable part because they are defined so. But numerous practical
applications show that real numbers simply ignore this part of number
and in many applications it is much easier to have it explicitly.

--
alex
http://conceptoriented.com

Jon Heggland

unread,
Jun 7, 2005, 11:24:35 AM6/7/05
to
In article <42a5b411$0$8714$ed26...@ptn-nntp-reader02.plus.net>,
pa...@test.com says...
> > Not exactly: SQL ignores NULLs in aggregate functions (except COUNT(*)).
> > It is not treated as zero for AVG, for instance. Also note that x + NULL
> > evaluates to NULL; therefore, SQL's SUM is not iterated addition -- it
> > has a much more complicated definition.
>
> I'd suggest that SQL's SUM *should* be iterated addition.

I agree. Which would also mean that the empty sum is 0, not NULL. Date

has written quite a lot on this.

> And that all


> aggregates should return NULL if any of the attributes in the aggregated
> column are NULL. Surely if any of them are unknown, we must say that the
> sum or average is unknown also?

Yes, if NULL means unknown.

> What are the arguments for not doing this?

I don't know; you'd have to ask the SQL designers. Probably convenience;
practice without theory. Quite a lot of people seem to prefer a
(slightly?) incorrect answer instead of no answer.
--
Jon

Alfredo Novoa

unread,
Jun 7, 2005, 11:25:03 AM6/7/05
to

On Tue, 07 Jun 2005 15:49:47 +0100, Paul <pa...@test.com> wrote:

>I'd suggest that SQL's SUM *should* be iterated addition. And that all
>aggregates should return NULL if any of the attributes in the aggregated
>column are NULL. Surely if any of them are unknown, we must say that the
>sum or average is unknown also?
>
>What are the arguments for not doing this?

It does not eliminate nulls. Everything else is a botch-up.


Regards

erk

unread,
Jun 7, 2005, 12:02:26 PM6/7/05
to
Alexandr Savinov wrote:
> Another point is that variable is something that stores a reference.

A variable stores a value. The value it stores can change over time.
References and pointers are implementation "niceties" that are often
overexposed; they are, in essence, variable-variables. Contrast with
relational theory, in which relations are the only variables.

> This reference may point to
> - one object (record),
> - a collection of records which is dynamically defined (for example, a
> result of some query),
> - a table/domain which is statically defined in the schema
> - anything else that can be represented by reference

References without limitations, while useful in some meta-theory, are
invitations to unrestrained graph theory. Not what most practicioners
need, as relational and OO and other languages all reign this in in
different ways.

> So you are right that domain is not a variable but reference to a domain
> can be stored in as many variables as we like just like we can store in
> variable references to records or result sets.

Your previous example showed a changing domain, n'est-ce pas?

> If we define null as absence then as a
> minor advantage we avoid problems with aggregation because absent things
> are simply not visible, i.e., null values are skipped.

So we can't count the number of "objects" that "contain" a "null
value"?

> No, and this precisely what I wanted to emphasize. Because you cannot
> count what does not exist.

So an object can "contain" a null, but we can't be aware of that fact
because the null cannot be counted? Is this some baroque application of
Heisenberg's Uncertainty Theorem to data management? :-)

> The problem is to understand that different
> things may exist in different dimensions.

Projection is the usual reference here, but projections aren't
completely unrestrained as you would have them.

> When we give such a definition then
> we get a very simple and effective data model (along with other principles).


>
> I am sorry, but the only thing I can advice is to read something about
> concept-oriented model. It is difficult because the description is bad
> (I actually lost an interest to it because almost everything is clear
> for me).

I wish you'd share it with the rest of us.

- erk

Tony Andrews

unread,
Jun 7, 2005, 11:55:08 AM6/7/05
to
Alexandr Savinov wrote:
> Here is even simnpler example. Assume you have a number of departments.
> There is two ways how you can represent them:
> 1. As records in a table of departments
> 2. As individual tables (possibly with support from one common
> meta-table from #1 as you noticed because relational model does not
> allow for tables to have fields and to be used as entities)

#1 is the right way


#2 is the wrong way

> It is very general and wide spread trade off and it is important to


> recognize that there is such an alternative (but it is prohibited to
> think so in RM).

What would be the benefits of approach #2? The drawbacks are obvious


enough. How would you express a relationship between departments with
approach #2? Or associate employees with departments?

> Another aspect of this problem. A schema is not part of the relational


> model but it is heavily used in complex applications. You know that any
> implementation has a table of tables and it is reflects the fact that
> tables are rows and have normal properties. Thus practice also confirms
> my hypothesis however currently there is no a data model (theory) for that.

I'm not sure what you mean by this. AFAIK it has always been the case

Jon Heggland

unread,
Jun 7, 2005, 11:54:28 AM6/7/05
to
In article <42a5bac9$1...@news.fhg.de>, sav...@host.com says...
> Jon Heggland schrieb:

> > In that case, you are using the rows in the "system table register"
> > table for your information; the actual tables you create and drop are
> > just irrelevant side effects -- you don't use them for anything at all.
>
> Yes, these are side effects if we are not going to insert any rows into
> those system table rows.

You are not making sense. In the RM, we do not insert rows into rows.
(Well, it is possible to have tuple-valued attributes, but I don't think
that is what you are talking about.)

> But this example was used to demonstrate that
> tables are used to represent some entities from the problem domain and
> in this sense it is possible to model the problem domain by using
> tables.

"Entity" is a very informal concept. Whether something is regarded as an
"entity" or not depends on the viewpoint. A relvar may be considered as
an entity, but it is not very useful to do so when your other entities
are departments, employees and managers and such -- it mixes
perspectives best kept separate. This is conceptual/semantic modelling,
which is more of an art than a science.

> The fact that tables are normally used to include records while
> records are normally used to include fields simply emphasizes their
> special use. In general case any entity has two aspects:
>
> - it can be viewed as a collection of other entities (collection here
> means formally logical sum or a set normally designated as {}),
>
> - it can be viewed as a combination of other entities (combination here
> means formally logical product designated as <>).

This makes no sense to me. Is an entity in your world a boolean
expression?

> - we can define an entity as a collection of other entities

Can we? That is a recursive definition. Where does it end?

> (let's call
> it 'instance of' relation between a table and its member). In this case
> the members physically belong to the collection.

Physically? What do you mean by that?

> - we can use a relation between an object and its value. In this case
> the record logically belongs to its values (other records), for example,
> a record <1, 5, 10> belongs to 1, to 5 and to 10 simultaniously.

What does "logically belongs" mean? Is "record", "object" and "entity"
the same in your terminology? Are 1, 5 and 10 also
records/objects/entities? Or values? What do they "logically belong" to?

> In relational model tables are simply containers (collection of other
> entities) and it is impossible to directly define new properties for
> them.

Is that a problem? Why?

> Records are simply objects (combinations of other entities) and we
> cannot add records to records.

"Record" is not a RM term. Do you mean tuples (rows)? If so, in what
sense is a tuple a combination of other tuples? And why, if this is true
in your world, can't we add tuples to tuples? I must admit I am on the
verge of invoking Date's incoherence principle here.

> And such a definition is really effective
> and covers a huge number of situations. However, the question was how
> can we treat tables as entities and can they be treated as entities at
> all.

Before we answer that, we should ask ourselves what the purpose of such
treatment would be. Of course, the catalog in a database treats the
tables as entities, but I don't think that is what you mean.

> > Alternatively, if the "system table register" is not organized as a
> > relational table, you are no longer using the relational model, and the
> > whole point is lost.
>
> As far as I understand the relational model does not use any system
> table or a table of tables - it is how RDBMSs are implemented.
> Relational model works independently of how this meta information is
> stored and if it exists at all.

It must of course exist in a real implementation. But I agree that it
has little relevance when discussing how the RM works. My point was just
that the "empty tables" approach to modelling is faulty -- an empty
table does not contain any information (except its "closed world
assumption" interpretation, but that is of limited use in such
circumstances).
--
Jon

Jon Heggland

unread,
Jun 7, 2005, 12:04:28 PM6/7/05
to
In article <42a5c072$1...@news.fhg.de>, sav...@host.com says...
> Jon Heggland schrieb:

> > Yes, if NULL means unknown.
>
> To say "if NULL means unknown" is the same as to say "if 0 is equal 1"
> or "if empty set is full set". Unknown and null are two different things
> and deserve to have to special designations.

That depends on how you define "NULL". You define it as absence; other
people obviously define it otherwise. It is of course a problem. The
point in this thread is how SQL defines it. ("Non-empty set" is a better
term than "full set", by the way.)

> NULL means absence
> UNKNOWN means presence with unknown identity

Not in SQL. Anyway, you subscribe to Codd's I-marks and A-marks, and
4VL? What about "unknown whether absent or not"?
--
Jon

Paul

unread,
Jun 7, 2005, 11:53:13 AM6/7/05
to
Alexandr Savinov wrote:
> To say "if NULL means unknown" is the same as to say "if 0 is equal 1"
> or "if empty set is full set". Unknown and null are two different things
> and deserve to have to special designations.
>
> NULL means absence
> UNKNOWN means presence with unknown identity

I don't think NULL should mean absence.

As an example, consider storing middle names. Some people don't have
middle names. But I think it would be good practice to store an empty
string for people without middle names, and reserve NULLs for people
whose middle name is unknown, whether or not it exists.

So NULL means unknown, irrespective of whether it's present or absent.

This is presupposing you want to have NULLs in the first place, which is
another question altogether.

Paul.

erk

unread,
Jun 7, 2005, 12:26:24 PM6/7/05
to
Alexandr Savinov wrote:
> By schema simplicity I meant the number of tables. Sometimes it is
> really simpler to have one wide table than numerous specific tables.
> There is alway trade off between these two extremes and do not say that
> having one wide table is good practice - it is only an example for
> demostrating some properties of nulls.

Not every "choice" represents an arbitrary continuum. There are
repercussions to those choices, and I have yet to encounter a
situation, even for simple databases, where the wide table with lots o'
nullable columns made sense.

> Tables are not entities in relational model in other conventional models
> (and it is one of fundamental drawbacks).

In what sense is it a drawback? Relational deals with predicates, not
tables or objects.

> However, they should be
> treated as entities at least formally (in a "good" data model).

Applying relational to the system catalogs for the database renders the
sort of meta-manipulation that I think you're leaning toward.

> I want to have everything treated as only one sort of things.

Why is that desirable? Occam's Razor says entities should not be
multiplied beyond necessity - and I think necessity dictates that there
is more than one type of thing.

> Thus theoretically we can
> express our semantics without rows by using only tables.

Have you seen Neo's XDb? I think you'd like it.

> One simple
> conclusion is that "data semantics can be expressed by tables rather
> than only by rows (in tables)".

How do you express any commonality between things, and how is that any
"better" (e.g. simpler) than relations which store facts that have
common sets of attributes?

> It is very strong and somewhat surprising result.

It's not actually very surprising, since it's been done before.

> In such an approach we do not need rows because
> anything can be modelled by means of tables.

I'd suggest bits - you can "model" anything with them.

> Such a model is ugly from
> relational point of view (although we simply followed an advice to
> decompose tables) however it is interesting from theoretical point of
> view (with big consequences). Manipulating data in this model means
> adding/remove tables rather than records and one table bears some
> semantics which is encoded in its static name.

Semantics encoded in the name alone? Can you give an example? I suspect
this is a meta-nonsense.

> Yet, again, you touch an issue which has nothing to do with null values
> and their semantics.

I think the above has much to do with nulls, even though I didn't
mention the word. Nulls are all "about" what we know about each fact X,
and how X relates to other facts Y and Z. The existence or
non-existence of nulls is at the heart of the matter.

> > A bad example. "Mostly cars"? Why not define the car and (other
> > product) relations properly?
>
> I deliberatly gave such an example!
> However, in real world complex applications we (unfortuanately) have a
> lot of such situations which cannot be avoided.

Yes, they can.

> In other words, it may
> well happen that for some record some attribute simply does not make
> sense (including bad design).

Coping with a pre-existing bad design is another matter - I thought we
were discussing doing good design?

> Theoretically it is not an issue of good or bad design.

Oh, then ignore my question. We'll proceed with judgement-free design.
:-)

> The problem is
> if we want to keep meaningless columns in one table or optimize
> representation. I mean that independent of our desire or schema any
> object in the database will formally have all dimensions. The question
> is only how we *represent* the database semantics. In one case null is
> written explicitly. In other cases it still exists as a value of an
> object along meaningless dimensions but is not explicitly written
> because of good schema. For database and for the model this object still
> has null as a value along this dimensions independent of how it is
> represented.

So null is there even if it's not? Then I much prefer my nulls
not-there.

> That is a (serious) problem in relational model because depending on
> chosen schema you may get different results.

Well, that's a good thing. The schema reflects the logical properties
of what you're "modeling." So naturally different choices will give you
different "results" - unless you think that a query should always
return the same results regardless of the underlying data and
structures?

> However, in good model
> normalization does not change the data semantics - it is only an issue
> of where and how it will be stored. You can store everything in one wide
> table or you can create specialized tables but all queries must return
> one and the same result. This is how concept-oriented model works where
> normalization is not an issue. The model has some dimensionality and
> data in it has some semantics and the question is what structure for
> dimensions we prefer - it is a design issue which has nothing to do what
> results will be returned by our queries.

Does that mean it's a performance consideration only? Surely semantics
has an impact on query results?

> If data semantics depends on
> the structure of its relations, how you use joins and how you use nulls
> then it is bad model.

The alternative, then, is to have 'data semantics' independent of the
relations (which are what one designs) and their constraints? So the
semantics arise magically from the data themselves?

Seriously, write to Neo.

- Tired in Pittsburgh

Tony Andrews

unread,
Jun 7, 2005, 12:28:39 PM6/7/05
to
Alexandr Savinov wrote:
> Tony Andrews schrieb:
> > Alexandr Savinov wrote:
> >
> >>Here is even simnpler example. Assume you have a number of departments.
> >>There is two ways how you can represent them:
> >>1. As records in a table of departments
> >>2. As individual tables (possibly with support from one common
> >>meta-table from #1 as you noticed because relational model does not
> >>allow for tables to have fields and to be used as entities)
> >
> >
> > #1 is the right way
> > #2 is the wrong way
>
> We do not have here "wrong" or "right". It is a matter of fact: we can
> model things by using records and we can model things by using tables.
> When you say "wrong" you probably mean it is not desirable but you still
> do it! If you have tables in your model then you have some things
> represented by them! Any model is some thing existing in the problem
> domain.

I mean it would be wrong to do so. You *cannot* and should not try to
model things by using tables with no rows.

> When you say "wrong" then you contradict to another principle. You
> actually suggest to keep all data in one common table because you find
> it wrong to decompose schema and introduce more tables for special
> purposes.

No, I don't suggest any such thing. I use different tables to record
different *kinds* of proposition. I would *never*, as you propose, use
different tables to represent single instances of the same kind of
proposition.

> > What would be the benefits of approach #2? The drawbacks are obvious
> > enough. How would you express a relationship between departments with
> > approach #2? Or associate employees with departments?
>
> As I wrote approach #2 is already used in any model that has tables in
> it because any table represents some element of the problem domain.
> However, RM does not provide means for manipulating and representing
> data in this way so it is qualified as illegal procedure.

And you think it should be legal because...?

> When you ask about benefits and indicate drawbacks then you are actually
> critisizing relational model. In other words, you came to the conclusion
> that RM does not allow us to deal with some entities such as those
> represented by tables. So you are right that it is difficult to
> represent relationships between departments in approach 2 but you need
> to address this question to those who is responsible for RM - not me!

No, *you* are the odd one out here, who for some reason thinks the RM
should offer two different ways to achieve one thing, the second being
your bizarre plan to create 1 table per fact.

> > I'm not sure what you mean by this. AFAIK it has always been the case
> > that we would expect that the relational schema (metadata) would itself
> > be defined in relations in the form of the "system catalog". No new
> > theory is needed, since it is just another application of the
> > relational model.
>
> No, it is not another application. Schema with its information is an
> integral part of the model so it is the same application. If you have a
> data model and you define a couple of tables then you may ask a question
> what properites these tables have and where they are living actually. RM
> does not ask these questions and does not provide an answer.

Because RM is logical, not physical. It doesn't care where the data
"lives".

Before embarking on this whole enterprise, have you actually studied
the relational model in depth? I did see that in your references in
one of your papers you listed only a 1970 paper by Codd. I think you
really need to have a very thorough understanding of the relational
model before you try to overthrow it, and I'm not convinced that you
have that at the moment - I can't see how you could make some of the
claims you do if you did! ;-)

Alexandr Savinov

unread,
Jun 7, 2005, 12:42:11 PM6/7/05
to
Jon Heggland schrieb:

> In article <42a5bac9$1...@news.fhg.de>, sav...@host.com says...
>
>>Jon Heggland schrieb:
>>
>>>In that case, you are using the rows in the "system table register"
>>>table for your information; the actual tables you create and drop are
>>>just irrelevant side effects -- you don't use them for anything at all.
>>
>>Yes, these are side effects if we are not going to insert any rows into
>>those system table rows.
>
>
> You are not making sense. In the RM, we do not insert rows into rows.
> (Well, it is possible to have tuple-valued attributes, but I don't think
> that is what you are talking about.)

In RM yes, we are not inserting rows into rows. But in reality a row in
a (system) table represents another table which can be used to insert
records. Since the table and its counter-part in the system catalogue
are the same we insert records into records. Again, this is not a part
of the relational model.

>>But this example was used to demonstrate that
>>tables are used to represent some entities from the problem domain and
>>in this sense it is possible to model the problem domain by using
>>tables.
>
>
> "Entity" is a very informal concept. Whether something is regarded as an
> "entity" or not depends on the viewpoint. A relvar may be considered as
> an entity, but it is not very useful to do so when your other entities
> are departments, employees and managers and such -- it mixes
> perspectives best kept separate. This is conceptual/semantic modelling,
> which is more of an art than a science.

By entities we normally mean something that has some properties (there
are some other important criteria).

So you prefer to separate things and add new dedicated mechanisms for
each use. I prefer to decrease the number of primary things and reduce
other to a small number of basic elements. Why to we need to have
conceptual/semantic modeling separate from other data modeling issues?

>>The fact that tables are normally used to include records while
>>records are normally used to include fields simply emphasizes their
>>special use. In general case any entity has two aspects:
>>
>>- it can be viewed as a collection of other entities (collection here
>>means formally logical sum or a set normally designated as {}),
>>
>>- it can be viewed as a combination of other entities (combination here
>>means formally logical product designated as <>).
>
>
> This makes no sense to me. Is an entity in your world a boolean
> expression?

If you interpret collection and combination as logical connectives (AND
and OR) then it can be expressed so, if they are algebraic operations
(sum and product) then it is algebraic expression, if they are set
operations (intersection and union) then...

Whatever name these operations have we have:

x = <a1,a2,...,an>{b1,b2,...,bm}

where ai and bi are other entities from the model. In particular, this
new entity x may be a member of some combination, say,
y=<...,x,...>{...} or some collection z=<...>{...,x,...}

>>- we can define an entity as a collection of other entities
>
>
> Can we? That is a recursive definition. Where does it end?

Recursion is prohibited. There exists top and bottom.

>>(let's call
>>it 'instance of' relation between a table and its member). In this case
>>the members physically belong to the collection.
>
>
> Physically? What do you mean by that?

If x = <a1,a2,...,an>{b1,b2,...,bm}
then x physically includes b1,b2,...,bm

The meaning is that physical inclusion is hierarchical and permanent. If
an entity bi was included (created) into x then it exists there forever
and cannot change its container (records cannot be moved without change
of identity and references are always constant).

>>- we can use a relation between an object and its value. In this case
>>the record logically belongs to its values (other records), for example,
>>a record <1, 5, 10> belongs to 1, to 5 and to 10 simultaniously.
>
>
> What does "logically belongs" mean? Is "record", "object" and "entity"
> the same in your terminology? Are 1, 5 and 10 also
> records/objects/entities? Or values? What do they "logically belong" to?

If x = <a1,a2,...,an>{b1,b2,...,bm}
then x is logically included into a1,a2,...,an

The logical inclusion is flexible and we can freely change it. A record
is included into all its values simultaniously as a group or category.

Records/objects/entities/items are one and the same - there is no
necessity to distinguish them.

Value is role of object in a combination where it is a member:

If x = <a1,a2,...,an>{b1,b2,...,bm} then
x, a1,a2,...,an are all normal objects but a1,a2,...,an play a role of
values for x.

>>In relational model tables are simply containers (collection of other
>>entities) and it is impossible to directly define new properties for
>>them.
>
>
> Is that a problem? Why?

It depends (too general question).

For relational model no, it is an advantage.

But when you create your 101st data model you start feeling that tables
have some additional meaning in addition to being a container for
records, i.e., you feel that in many cases tables behave like normal
entities (like records) and frequently we need to treat them like normal
entities.

>>Records are simply objects (combinations of other entities) and we
>>cannot add records to records.
>
>
> "Record" is not a RM term. Do you mean tuples (rows)? If so, in what
> sense is a tuple a combination of other tuples? And why, if this is true
> in your world, can't we add tuples to tuples? I must admit I am on the
> verge of invoking Date's incoherence principle here.

In concept-oriented terms:

r = <r1,r2,...,rn>{}

where r is a record and r1,r2,...,rn are other records.

Records, items, rows and tuples are used as equivalent (depending on
context).

A table is defined as follows:

t = <>{r1,r2,...,rn}

>>>Alternatively, if the "system table register" is not organized as a
>>>relational table, you are no longer using the relational model, and the
>>>whole point is lost.
>>
>>As far as I understand the relational model does not use any system
>>table or a table of tables - it is how RDBMSs are implemented.
>>Relational model works independently of how this meta information is
>>stored and if it exists at all.
>
>
> It must of course exist in a real implementation. But I agree that it
> has little relevance when discussing how the RM works. My point was just
> that the "empty tables" approach to modelling is faulty -- an empty
> table does not contain any information (except its "closed world
> assumption" interpretation, but that is of limited use in such
> circumstances).

Definitely empty tables are useless because they are costly resource
while records are cheap. There are also other obvious arguments.

But there was an opinion that tables are not entities and cannot be used
for modelling. I have another opinion - tables can be treated as tables
and frequently we need to treat them as entitites.

--
alex
http://conceptoriented.com

Alexandr Savinov

unread,
Jun 7, 2005, 12:48:19 PM6/7/05
to
Paul schrieb:

I meant that there are two basic interpretations:

1. as an absence of item (it is skiped from aggregates)
2. as a present but unknown item (it brings uncertainty in aggregates)

They are basic because they can be formally described. And then we need
at least to distinguish them, i.e., we cannot say "if 1. is 2".

The second question as you correctly noticed is what kind of behavior we
would like to have in our database. Here I would accept different
opinions and solutions but have to be consistent. If we say that special
value X means absence then the values have to absent for *all*
aggreagtion function rather than SUM will skip the value while COUNT
will take it into account.

--
alex
http://conceptoriented.com

Jon Heggland

unread,
Jun 7, 2005, 1:04:08 PM6/7/05
to
In article <42a5bf13$1...@news.fhg.de>, sav...@host.com says...
> Jon Heggland schrieb:

> >>But do not say that #2 is now equivalent to #1. Because the
> >>main issue here is that for each row from Folder table there is one real
> >>table which contains a set of items. Now very important point (for the
> >>solution): if in case #1 each item specifies the parent folder by using
> >>some field, i.e., by using legal relational facility, then in #2 each
> >>item belongs to its folder (table) by using 'instance of' relation,
> >>i.e., it is a row in this table. Thus we use for data modeling two
> >>absolutely different types of relations.
> >
> >
> > So you have added complexity, but not expressive power. That is what
> > most "new" data models do. It is pointless.
>
> I did not describe any data model here - you try to see an evil where it
> does not exist.

You described a strange way of representing some information in the RM;
it was noted that it didn't capture all the essential information, and
you introduced some vague functionality to remedy this -- which I
interpreted as non-relational. RM + new structures/operators = new data
model.

> I provided a couple of examples and a couple of
> alternative designs and interpreations with the purpose to demonstrate
> that tables can be viewed as entities - that is all.

Tables are viewed as entities in the database's catalog. I'm sorry, what
is your point again?

> >>And again, the whole example
> >>demonstrates that tables should be legal element for use in representing
> >>data semantics.
> >
> > No. It is not necessary.
>
> So why you represent real entities from your problem domain by tables?
> Again, it is not a matter of necessity, right or wrong.

I disagree, if we are talking about the qualities of data models. A data
model should contain only necessary and sufficient functionality.

> Whenever you create a table you want it to represent some entity.

Actually, I want it to record facts (propositions). "Entities" is a too
fuzzy and informal concept for me.

> If you define
> tables Departments, Products and Personel then you defined three kinds
> of entities and your model consists of three elements already! And you
> are saying it is not necessary.

No, of course these relvars are necessary. What is *not* necessary is
structure and operators to enable you to represent each department as a
table with no rows, and still be able to assign personnel to departments
in a proper manner.

> What is not necessary, representing the
> state of data in the database? I do not understand. The initial point
> was that tables represent real entities even if you do not like that.

I prefer the view that relvars contain true propositions about the real
world. What *is* a "real entity"? It depends---hence it is a concept of
limited use in formal discussions.

> You might define a table ProductItems and then keep your products in one
> wide table. And then you might find it inconvenient and two additional
> tables, say, Cars and Houses. So you introduced into your model two
> additional entities called Cars and Houses.

Please explain more clearly. Does ProductItems initially include Cars
and Houses? What are the attributes of Cars and Houses? Do they differ
from ProductItems? What inconvenience am I remedying?

> The only problem is that RM
> does not allow to add properties to these entities because they are
> represented by tables.

You mean each car and house is a table? Why on earth would I do that? If
not, what is keeping you from adding attributes to the Cars and Houses
tables?

> >>2. As individual tables (possibly with support from one common
> >>meta-table from #1 as you noticed because relational model does not
> >>allow for tables to have fields and to be used as entities)
> >
> > Again, you introduce unnecessary complexity.
>
> I do not introduce anything at all and if there is some complexity then
> it is done deliberatly - I provide examples which demonstrate that
> tables *are* used for modeling and representing data semantics.

You introduce "support from one common meta-table".

You "demonstrate that tables *are* used for modeling and representing
data semantics"---how? These examples seem contrived; are you suggesting
that anyone actually creates their databases along the lines of your #2?
I still don't see your point---it's like me saying I provide examples
that shovels *are* used upside down, therefore we should redesign them
to work equally well both ways.

> >>I return to the initial point: tables should be considered normal
> >>entities with special role and vice versa normal entities should be able
> >>to play a role of tables.
> >
> > Why? What purpose does this serve?
>
> Because our data has such a property.

What property? Nesting, inclusion? Are you arguing for
network/hierarchical models?

> We always represent entities by
> tables and want our records to store internal records.

What are "internal records"? Why do we want our record to store them?

> >>Another aspect of this problem. A schema is not part of the relational
> >>model but it is heavily used in complex applications. You know that any
> >>implementation has a table of tables and it is reflects the fact that
> >>tables are rows and have normal properties.
> >
> > No, tables (relvars) are not rows (tuples). A relvar can be *described*
> > *using* tuples. This is very different.
>
> Of course, and I absolutely agree here, tables are not vars - but only
> in relational model.

Vars? Rows/tuples, you mean?

> But in data modeling they are! And it would nice to
> have a model that formalizes this fact. As I illustrated in several
> previous examples we frequently use tables as records and vice versa.

In metamodelling, you mean? I still think your wording is sloppy---we do
not use tables as records, we store information about tables in records.
You can use the RM for this, as most relational-based implementations
do.

> You project everything on relational model so it does not make sense to
> continue.

I agree about the sense bit, but I do not think it is a case of
relational blindness on my part. It may rather be that we are talking
about different things without realising it. I have some trouble
understanding your messages; often, it seems that important words are
missing. In any case, I have interpreted you as believing the RM has
insufficient expressiveness for some purposes. I do not think you have
substantiated this.
--
Jon

erk

unread,
Jun 7, 2005, 1:12:06 PM6/7/05
to
Alexandr Savinov wrote:
> Yes, you are right, we need to connect the tables so that each table
> known its parent table. But it does not change the point - it only even
> better demonstrates limitations of relational model. One solution is to
> connect them statically via your application. Another solutions is where

> you can store this information a table from #1 where one row represents
> one table. But do not say that #2 is now equivalent to #1. Because the

> main issue here is that for each row from Folder table there is one real
> table which contains a set of items. Now very important point (for the
> solution): if in case #1 each item specifies the parent folder by using
> some field, i.e., by using legal relational facility, then in #2 each
> item belongs to its folder (table) by using 'instance of' relation,
> i.e., it is a row in this table. Thus we use for data modeling two
> absolutely different types of relations. And again, the whole example

> demonstrates that tables should be legal element for use in representing
> data semantics.

Here's what I can conclude from the above:
1. We have two relational database designs that accomplish the same
thing
2. #2 requires an additional relation to identify the various folders
(each of which is a separate table) as folders - this relation is never
detailed
3. There's no inherent advantage to #2 (that I can identify)
4. #2 requires more work

> Here is even simnpler example. Assume you have a number of departments.
> There is two ways how you can represent them:
> 1. As records in a table of departments

> 2. As individual tables (possibly with support from one common
> meta-table from #1 as you noticed because relational model does not
> allow for tables to have fields and to be used as entities)
>

> It is very general and wide spread trade off and it is important to
> recognize that there is such an alternative (but it is prohibited to
> think so in RM).

No, it's not. You CAN define a separate relation for each department -
for example, DEPATMENT_01, DEPARTMENT_02, etc. What I think you're
arguing for is multi-values, as in Pick. We've had that discussion
before, but in short, if you really want to store a list in an
attribute, you can. You just can't expect an RDBMS to know how to cope
with it, apart from operations you define over it.

But again, if you want to query the "contents" of several departments,
you now need to include some other relation, even if it's a sort of
catalog relation. None of the above looks in the least appealing to me,
and you've yet to identify the benefit.

> Once you start a new model you need to decide what
> entities will be represented by tables. Normally these are special
> entities which have high level organizing role. Sometimes we can to keep
> them as normal tables without support. But in many cases we need to have
> additional properites and structure for them. In the latter case we also
> duplicate them as normal records in a (meta-)table. I can continue with
> advantages and disadvantages and analysis of properties but if you
> recognize this point as an alternative then you can do it easily yourself.

I see no advantages - can you enumerate them?

> I return to the initial point: tables should be considered normal
> entities with special role and vice versa normal entities should be able
> to play a role of tables.

And you've still given no reasons for this. Your examples show
relational doing it more simply, and yielding a result with more
general power.

> Another aspect of this problem. A schema is not part of the relational
> model but it is heavily used in complex applications. You know that any
> implementation has a table of tables and it is reflects the fact that

> tables are rows and have normal properties. Thus practice also confirms
> my hypothesis however currently there is no a data model (theory) for that.

Nor does there need to be, although a common model would be nice. It's
fairly simple (and most DBMS vendors do it for SQL) to define "system
catalog" relations that allow you to use the full relational operators
on your database (which could have constraints you define - for
example, to control naming, enforce patterns, etc). This would be
simply a relational database design, not an extension of the theory.

- Eric

Alexandr Savinov

unread,
Jun 7, 2005, 1:10:21 PM6/7/05
to
erk schrieb:

> Alexandr Savinov wrote:
>
>>Another point is that variable is something that stores a reference.
>
>
> A variable stores a value. The value it stores can change over time.
> References and pointers are implementation "niceties" that are often
> overexposed; they are, in essence, variable-variables. Contrast with
> relational theory, in which relations are the only variables.
>
>
>>This reference may point to
>>- one object (record),
>>- a collection of records which is dynamically defined (for example, a
>>result of some query),
>>- a table/domain which is statically defined in the schema
>>- anything else that can be represented by reference
>
>
> References without limitations, while useful in some meta-theory, are
> invitations to unrestrained graph theory. Not what most practicioners
> need, as relational and OO and other languages all reign this in in
> different ways.

A constraint for variables is type, which specifies from where we are
allowed to take objects (type is also a reference but it references a
scope or container or table).

I prefer to distinguish only two types of references and variables:
- variables for referencing individual items/objects/records,
- variables for referencing collections (of items/objects/records),

>>So you are right that domain is not a variable but reference to a domain
>>can be stored in as many variables as we like just like we can store in
>>variable references to records or result sets.
>
>
> Your previous example showed a changing domain, n'est-ce pas?
>
>
>>If we define null as absence then as a
>>minor advantage we avoid problems with aggregation because absent things
>>are simply not visible, i.e., null values are skipped.
>
>
> So we can't count the number of "objects" that "contain" a "null
> value"?

Objects exhibit themselves by having 1. a reference (should be
constant), and 2. properties (may change). So when we say that an object
is absent because of having null (having no property) we mean that it is
absent only along one concrete dimension. If we count references (by
means of COUNT(*)) then it will return all objects in the domain
(reference are not null by definition - if it is null then the object is
deleted or marked as non-existing). But we get objects existing in some
dimension and then count them by their values in this dimension. In this
case objects with null values for this property should not be counted
(if we striclty follow the semantics of absence).

>>No, and this precisely what I wanted to emphasize. Because you cannot
>>count what does not exist.
>
>
> So an object can "contain" a null, but we can't be aware of that fact
> because the null cannot be counted? Is this some baroque application of
> Heisenberg's Uncertainty Theorem to data management? :-)

No, I would be glad to reach such a level :-)

The problem as I noticed is what concretely we are going to count. Any
object has its semantics distributed globally over the whole database.
This means that an object taken by itself is meaningless - it has only a
reference. So we can count objects by references and find all of them.
Further we can count objects as they exhibit themselves along available
dimensions, i.e., depending on their properties (see above). Then we can
count the same objects as they exhibit themselves in other parts of this
model (the notorious concept-oriented approach by the way).

>>The problem is to understand that different
>>things may exist in different dimensions.
>
>
> Projection is the usual reference here, but projections aren't
> completely unrestrained as you would have them.

Yes, projection is the same as aggregation (do not understand me
absolutely), i.e., aggregation is the process or a procedure which shows
how information/objects look like from some side/dimnension.

Deprojection is an opposite process (called also cylindrical extension)
but has a little use in data modeling.

--
alex
http://conceptoriented.com

Alexandr Savinov

unread,
Jun 7, 2005, 1:25:03 PM6/7/05
to
erk schrieb:

>>The problem is
>>if we want to keep meaningless columns in one table or optimize
>>representation. I mean that independent of our desire or schema any
>>object in the database will formally have all dimensions. The question
>>is only how we *represent* the database semantics. In one case null is
>>written explicitly. In other cases it still exists as a value of an
>>object along meaningless dimensions but is not explicitly written
>>because of good schema. For database and for the model this object still
>>has null as a value along this dimensions independent of how it is
>>represented.
>
>
> So null is there even if it's not? Then I much prefer my nulls
> not-there.

Assume you have two tables Products =<Price, Size> and Personel = <Age,
Salary>. Then you have a record in Products r=<$100, big>.

Question: What Age has the record r?

Answer: null.

Comment: Even if the null value is not stored explicitly because of such
two-table design, it does not change the data semantics, i.e., record r
in the canonical semantics has no value for dimension Age or it does not
exist in this dimension. If you transform equivalently the model then
you have a change to see them explicitly as real null values in
4-columns table. Independent of how you represent your data and how you
decompose schema the semantics reamains the same.

--
alex
http://conceptoriented.com

erk

unread,
Jun 7, 2005, 1:37:06 PM6/7/05
to
Alexandr Savinov wrote:
> But this example was used to demonstrate that
> tables are used to represent some entities from the problem domain and
> in this sense it is possible to model the problem domain by using
> tables.

Keep in mind that relations also "model" relationships, not just
"entities." They are predicates, which of course vary infinitely; data
design is the process of deciding the types of facts with which your
application needs to concern itself.

> Some consequences of such a definition can be found on
> conceptoriented.com (not all - many things are not described).

Some advice for your FAQs: I found them very difficult to get through
because of the sheer number of nouns, and inconsistent (or opaque)
usage. If nothing else, the volume of distinct nouns swamps any attempt
to understand them (at least for me).

Here's a partial list: concept, entity, space, collection, set,
combination, primitive concept, dimension, variable, domain, value,
item, characteristic, coordinate, reference, custom reference, group,
constraint, part, representation, description. There are many more, and
verbs and adjectives too, and none of it helps describe much to me
beyond the 4th page. You might want some "semantics" - e.g. algebraic
definitions, a glossary, something. I think fewer words might be better
than what you have.

> In relational model tables are simply containers (collection of other
> entities) and it is impossible to directly define new properties for
> them.

Every relation (not table - use that when discussing SQL, for clarity)
is defined by the designer. What properties are you talking about?

> Records are simply objects (combinations of other entities) and we

> cannot add records to records. And such a definition is really effective


> and covers a huge number of situations.

Just because it covers "a huge number of situations" doesn't mean other
methods don't cover those same situations, and better. Relations do.

> However, the question was how

> can we treat tables as entities and can they be treated as entities at
> all.

What does "entity" mean, and why is it valuable to treat X as one?

> There was an opinion that tables are not entities and have nothing
> to do with the data semantics.

Relations define the data - of course it's semantic.

> As far as I understand the relational model does not use any system
> table or a table of tables - it is how RDBMSs are implemented.
> Relational model works independently of how this meta information is
> stored and if it exists at all.

True, relational doesn't directly address the situation, but no
theoretical extension is needed - just, perhaps, a standard schema for
the system relations.

- erk

Alexandr Savinov

unread,
Jun 7, 2005, 1:48:37 PM6/7/05
to
Tony Andrews schrieb:
> Alexandr Savinov wrote:
>
>>Tony Andrews schrieb:
>>
>>>Alexandr Savinov wrote:
>>>
>>>
>>>>Here is even simnpler example. Assume you have a number of departments.
>>>>There is two ways how you can represent them:
>>>>1. As records in a table of departments
>>>>2. As individual tables (possibly with support from one common
>>>>meta-table from #1 as you noticed because relational model does not
>>>>allow for tables to have fields and to be used as entities)
>>>
>>>
>>>#1 is the right way
>>>#2 is the wrong way
>>
>>We do not have here "wrong" or "right". It is a matter of fact: we can
>>model things by using records and we can model things by using tables.
>>When you say "wrong" you probably mean it is not desirable but you still
>>do it! If you have tables in your model then you have some things
>>represented by them! Any model is some thing existing in the problem
>>domain.
>
>
> I mean it would be wrong to do so. You *cannot* and should not try to
> model things by using tables with no rows.

Again, nobody advices to do so and I hardly can imagine that somebody
does so. We are doing a pure experiment in order to make explicit some
interesting propertites.

No, I did not offer to have one table for one fact or one table for all
entities. But if you want to understand what your data mean then you
need to be able to see how it will look like if put in one table, or if
each record will be represented by one table. It is an imaginable
experiment what was said in the very beginning.

>>>I'm not sure what you mean by this. AFAIK it has always been the case
>>>that we would expect that the relational schema (metadata) would itself
>>>be defined in relations in the form of the "system catalog". No new
>>>theory is needed, since it is just another application of the
>>>relational model.
>>
>>No, it is not another application. Schema with its information is an
>>integral part of the model so it is the same application. If you have a
>>data model and you define a couple of tables then you may ask a question
>>what properites these tables have and where they are living actually. RM
>>does not ask these questions and does not provide an answer.
>
>
> Because RM is logical, not physical. It doesn't care where the data
> "lives".

But I do care where my data lives and I want to have a model where I can
ignore it if I do not care and where I can specify a location if it is
important. I want to see my data at the highest level of abstraction and
at the lowest level of abstraction.

> Before embarking on this whole enterprise, have you actually studied
> the relational model in depth? I did see that in your references in
> one of your papers you listed only a 1970 paper by Codd. I think you
> really need to have a very thorough understanding of the relational
> model before you try to overthrow it, and I'm not convinced that you
> have that at the moment - I can't see how you could make some of the
> claims you do if you did! ;-)

The only thing I can advice in this situation is to start reading from
the beginning of paper - not from the end. If you read Codd's papers in
the same way then I am not surprised that you do not understand his
ideas :-)

--
alex
http://conceptoriented.com

mAsterdam

unread,
Jun 7, 2005, 3:10:32 PM6/7/05
to
Marshall Spight wrote:
>>Most of the time, in database context NULL denotes, as you point out,
>>the _absence_ of a value where you would expect one - still you are
>>freely using "NULL value" as a special kind of value.
>
> In fact, I don't think this is true. The SQL standard mostly takes
> the stand that NULL means "there is a value, but we don't know
> what it is." This is not the same thing as saying that there isn't
> a value.

True. Sure you know not everybody is happy about that (and the lovely
'mostly'), but I have to admit it talks about databases :-)

> If SQL's NULL was really the absense of a value, then 1+NULL would
> be 1. (The sum of a series of numbers that is one long is that
> number.) Instead it is NULL, which is consistent with "value exists
> but is unknown" and inconsistent with "the absense of a value."

Yep. Sharp. Did I get infected by some of Date's writings?

> In fact, it is my expectation that it would not be very hard to
> come up with a precise, useful, and not too hard to understand
> semantics for allowing the system to deal with "value absent."

I am less confident. BTW this reminds me of people saying it wouldn't be
that tough to come up with a usable, relatively simple 'list' algebra.

> It doesn't appear to be possible to do this for "value not known"
> though; you get into this whole crappy 3VL space.
>
> You probably still would like to have user-defined special values.

Jan Hidders

unread,
Jun 7, 2005, 4:33:35 PM6/7/05
to
Marshall Spight wrote:
>>Most of the time, in database context NULL denotes, as you point out,
>>the _absence_ of a value where you would expect one - still you are
>>freely using "NULL value" as a special kind of value.
>
> In fact, I don't think this is true. The SQL standard mostly takes
> the stand that NULL means "there is a value, but we don't know
> what it is."

As a small side note: the reason that the "mostly" above is not
"completely" is because that would make the result of an SQL query
uncomputable.

-- Jan Hidders

Jan Hidders

unread,
Jun 7, 2005, 4:35:14 PM6/7/05
to
mAsterdam wrote:
>
> I am less confident. BTW this reminds me of people saying it wouldn't be
> that tough to come up with a usable, relatively simple 'list' algebra.

That's funny. That reminds me of people saying that such algebras are
already available in the literature. :-)

-- Jan Hidders

mAsterdam

unread,
Jun 7, 2005, 5:14:43 PM6/7/05
to

Earlier I wrote (in fact _we_ wrote, I summarized):

> [NULL]
> The insanity bit. No! The humility marker.
> mu: The absence of an answer to a question which requires an answer.
>
> /adj./
> 1. Attributes to something the absence of values.
> Ex: "The *null* set is the empty set, often represented by {}."
>
> /n. colloq./
> 1. A noted appearance of the absence of values.
> Ex: "This table contains *nulls*."
>
> Common usage:
>
> - Confusion arises when people use terms like "null value",
> a paradox to some, a contradictio in terminis to others.
>
> - Confusion arises due to the fact that nullness (the absence of value)
> is often represented on computers by the number 0.
> (Obviously, 0 is not null.)
>
> - In some contexts, 'null' and 'nil' mean the same thing; in others,
> they do not.
>
> In databases traditionally NULL is used and and opposed.
> If you want to go into this, please first search for
> mu NIL void NULL undef, 2VL 3VL.
>
> "It isn't the things we don't know that give us trouble.
> It's the things we know that ain't so." - Will Rogers

I now think this is to much from the "NULL is the absence of value"
point of view. Any suggestions for improvement?

paul c

unread,
Jun 7, 2005, 5:57:17 PM6/7/05
to
Alfredo Novoa wrote:
> On Mon, 06 Jun 2005 15:02:17 GMT, paul c <toledob...@oohay.ac>
> wrote:
>
>
>>i'm an admirer of Date's but i have to say i like your answer. no Latin
>>to look up, either.
>
>
> What's the problem with Latin? :)
>
>
> Regards
>

Latin is too adhoc! :)

p

Mikito Harakiri

unread,
Jun 7, 2005, 6:33:49 PM6/7/05
to

Paul wrote:
> I disagree; suppose 90% of salaries are unknown. Someone does a quick
> query to get the total estimated salary bill, not realising this. They
> are going to get a shock when the real thing comes through!
>
> I would have thought that it would be safest to return an "unknown",
> flagging to the user that there are unkown salaries in the column. Then,
> having been alerted to that fact, they can proceed to use COALESCE to
> explicitly treat the NULLs as zero if they want.
>
> But I think (and this is a subjective opinion) that treating unknowns as
> zeros or empty strings should be manually defined behaviour rather than
> the default.
>
> And the "salary" column could be thought of not as a numeric column, but
> as a column that holds the answer to the question: "What is employee X's
> salary?"
>
> Now adding support for this involves a lot more complication at the
> domain level, but maybe it's worth it. I guess only experience of using
> such a system will tell.

Because we have nulls the developers and entry clerk now can afford to
be lazy and type nulls even where it takes a little thought to figure
out the correct value. Take the Bonus/Sales_Comission column, for
example, in the Emp table. If Bonus is null I bet that this person had
_no_ bonus, or Bonus = zero. Naturally, someone would want to know what
the total (salary+bonus) compensation were and, oops, he has to program
around this nulls mess.

Gene Wirchenko

unread,
Jun 7, 2005, 7:17:38 PM6/7/05
to
On 6 Jun 2005 11:32:00 -0700, "Mikito Harakiri"
<mikharaki...@yahoo.com> wrote:

[snip]

>Hear, hear! It totally escapes me how could somebody invent putting
>nonumeric value ("not salaried") into numeric column. For all reporting
>purposes unknown salary is salary equal to 0. If you want to run a

Give me a report of the number of people for whom we do not know
their salary. If you have 0 meaning an unknown salary, and since 0
can also mean a salary of 0, you can not determine the count.

>report of all salaries that are wrong (and, therefore, are subject to
>adjustment) then, having a flag "salary entry is incorrect" is not
>going to help. An entry clerk that misses a digit in the salary is not
>going to mark it with such a flag anyway:-)

Sincerely,

Gene Wirchenko

paul c

unread,
Jun 7, 2005, 8:27:07 PM6/7/05
to
Gene Wirchenko wrote:
> On 6 Jun 2005 11:32:00 -0700, "Mikito Harakiri"
> <mikharaki...@yahoo.com> wrote:
>
> [snip]
>
>
>>Hear, hear! It totally escapes me how could somebody invent putting
>>nonumeric value ("not salaried") into numeric column. For all reporting
>>purposes unknown salary is salary equal to 0. If you want to run a
>
>
> Give me a report of the number of people for whom we do not know
> their salary. If you have 0 meaning an unknown salary, and since 0
> can also mean a salary of 0, you can not determine the count.
>
> ...

just wait until the next payday and count the number of people who
squawk. if nobody complains, drop the salary attribute from whatever
table you are looking at since it is clearly redundant (and wrong).

p

mountain man

unread,
Jun 7, 2005, 10:49:19 PM6/7/05
to
"Alfredo Novoa" <alfred...@hotmail.com> wrote in message
news:g2fba11tarn9pb4m8...@4ax.com...
>
> On Tue, 7 Jun 2005 15:07:09 +0200, Jon Heggland <hegg...@idi.ntnu.no>
> wrote:
>
>>In article <42a591c4$0$41901$ed26...@ptn-nntp-reader03.plus.net>,
>>pa...@test.com says...
>>> Alfredo Novoa wrote:
>>> >>Why not say then that all aggregates that involve a NULL return NULL?
>>> >
>>> > This is what SQL does. Sometimes we want to get a value but we get a
>>> > null instead.
>>>
>>> Are you sure? I've just tried summing a column that contains a NULL in
>>> PostgreSQL and it doesn't return NULL - it treats the NULLs as zeros.

>>
>>Not exactly: SQL ignores NULLs in aggregate functions (except COUNT(*)).
>>It is not treated as zero for AVG, for instance. Also note that x + NULL
>>evaluates to NULL; therefore, SQL's SUM is not iterated addition -- it
>>has a much more complicated definition.
>
> Thanks for the clarification.

On the contrary, the answer is wrong for the MS TSQL.
The AVG and SUM commands ignore nulls - they are
not treated as *anything*, which in fact, they're not.

And x + NULL does not evaluate to NULL but x.
See the appropriate SET command in another post.
Here the SQL SUM *is* iterated addition.


> Here is clear that SQL nulls are a complete botch-up.

On the contrary, some understanding of the management
of nulls by SQL is missing some relevant information, and
I suspect it may be purposeful in order to mimick the
rhetorical [see Gittens] arguments of Fearless Leader
CJDate ;-)


--
Pete Brown
IT Managers & Engineers
Falls Creek
Australia
www.mountainman.com.au/software


mountain man

unread,
Jun 7, 2005, 10:49:17 PM6/7/05
to
"paul c" <toledob...@oohay.ac> wrote in message
news:IDipe.1593625$6l.719448@pd7tw2no...
> Paul wrote:

>> Mikito Harakiri wrote:
>>
>>>Hear, hear! It totally escapes me how could somebody invent putting
>>>nonumeric value ("not salaried") into numeric column. For all reporting
>>>purposes unknown salary is salary equal to 0.
>>
>>
>> I disagree; suppose 90% of salaries are unknown. Someone does a quick
>> query to get the total estimated salary bill, not realising this. They
>> are going to get a shock when the real thing comes through!
>> ...
>
>
> they'll get a bigger shock when the business goes under. if 90% of
> salaries are unknown, serves them right. they have bigger problems than
> the database.


This defines the fractal basin boundary between use of theory
and use of practice in regard to database systems theory.

An automated data integrity exception identification system
serves many purposes for the organisation (who owns the
data, and for whom the data has the "ultimate" meaning).
Especially seeing salary is but one element in a system
which may have thousands of elements.

It is loading more messages.
0 new messages