Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Re: 3 value logic. Why is SQL so special?

4 views
Skip to first unread message
Message has been deleted

frosty

unread,
Sep 8, 2006, 3:49:37 PM9/8/06
to
You should read the thread with the subject:
"So what's null then if it's not nothing?"

--
frosty

Karen Hill wrote:
> X-No-Archive:yes
>
> I know that visual basic, lisp, python, c , c++ , perl, all have 2
> value logic. What makes SQL so special that it needs 3vl when all
> these langauges make do with 2vl?
>
> I read a book on SQL which was authored by Date before I knew of the
> whole Date controversy and when I was learning SQL. I was thoroughly
> confused when he went on a spiel about nulls. I echoed this sentiment
> at work and was chastized and told that Date is a kook. Yet I have
> some time later accepted nulls and work with them just fine. In the
> back of my mind though, coming from a C++ background I still feel that
> if c++ can make it on 2vl why not SQL?
>
> Why do nulls make us feel so strange?


Evan Keel

unread,
Sep 8, 2006, 6:34:18 PM9/8/06
to

"Karen Hill" <karen_...@yahoo.com> wrote in message
news:1157742315....@b28g2000cwb.googlegroups.com...

> X-No-Archive:yes
>
> I know that visual basic, lisp, python, c , c++ , perl, all have 2
> value logic. What makes SQL so special that it needs 3vl when all
> these langauges make do with 2vl?
>
> I read a book on SQL which was authored by Date before I knew of the
> whole Date controversy and when I was learning SQL. I was thoroughly
> confused when he went on a spiel about nulls. I echoed this sentiment
> at work and was chastized and told that Date is a kook. Yet I have
> some time later accepted nulls and work with them just fine. In the
> back of my mind though, coming from a C++ background I still feel that
> if c++ can make it on 2vl why not SQL?
>
> Why do nulls make us feel so strange?
>

As you know, 3 valued logic is simple: true, false, don't know. The problem
with null is that it is often used as a default value which is almost always
surely wrong. Null does not mean N/A, or missing. That's why you can't do
arithmetic with it. Null is a semantic construct that started out correctly
and ended up in most RDMS implementations. FWIW, I try to create subtypes
where there are no nulls.

Evan


Bob Badour

unread,
Sep 8, 2006, 10:33:08 PM9/8/06
to
Evan Keel wrote:

> "Karen Hill" <karen_...@yahoo.com> wrote in message
> news:1157742315....@b28g2000cwb.googlegroups.com...
>
>>X-No-Archive:yes
>>
>>I know that visual basic, lisp, python, c , c++ , perl, all have 2
>>value logic. What makes SQL so special that it needs 3vl when all
>>these langauges make do with 2vl?
>>
>>I read a book on SQL which was authored by Date before I knew of the
>>whole Date controversy and when I was learning SQL. I was thoroughly
>>confused when he went on a spiel about nulls. I echoed this sentiment
>>at work and was chastized and told that Date is a kook. Yet I have
>>some time later accepted nulls and work with them just fine. In the
>>back of my mind though, coming from a C++ background I still feel that
>>if c++ can make it on 2vl why not SQL?
>>
>>Why do nulls make us feel so strange?
>
> As you know, 3 valued logic is simple: true, false, don't know.

Actually, I don't know that and neither does anyone else really. The
3-vl semantics in SQL are inconsistent. Sometimes null behaves like
unknown and other times it behaves like inapplicable.


The problem
> with null is that it is often used as a default value which is almost always
> surely wrong.

That's hardly the only or even the primary problem. Null--even when
handled with the utmost care--breaks fundamental identities.


Null does not mean N/A, or missing.

Except that sometimes it does.


That's why you can't do
> arithmetic with it.

What do you mean? x + null = null <-- that's arithmetic.

Except that sometimes x + null = x due to the inconsistencies in SQL.


Null is a semantic construct that started out correctly
> and ended up in most RDMS implementations.

Some of the best minds in database management would disagree that it
ever started out correctly.


FWIW, I try to create subtypes
> where there are no nulls.

That's nice. I simply don't allow null in any database I design.

Alexandr Savinov

unread,
Sep 9, 2006, 6:27:01 AM9/9/06
to
Karen Hill schrieb:

> X-No-Archive:yes
>
> I know that visual basic, lisp, python, c , c++ , perl, all have 2
> value logic. What makes SQL so special that it needs 3vl when all
> these langauges make do with 2vl?
>
> I read a book on SQL which was authored by Date before I knew of the
> whole Date controversy and when I was learning SQL. I was thoroughly
> confused when he went on a spiel about nulls. I echoed this sentiment
> at work and was chastized and told that Date is a kook. Yet I have
> some time later accepted nulls and work with them just fine. In the
> back of my mind though, coming from a C++ background I still feel that
> if c++ can make it on 2vl why not SQL?
>
> Why do nulls make us feel so strange?

Because they have different formal and informal interpretations.

As a matter of fact, in most cases you use 2-valued logic:

either an element exists in a database or it does not.

So the two values can be called "existence" and "non-existence". Indeed,
if your take any database then you either will find some concrete
element in it or you will not.

After that the question is how to denote these two cases and how to
interpret them formally from the point of view of numerous existing
logics. All these issues produce infinite and frequently fruitless
debates mostly of religious character. But the fact is that in most
practical cases given a database you either find a data item in it or
you cannot find it there. 3rd value, 4th value or an infinite number of
values (say, fuzzy logic) can be introduced but these approaches are not
widely used.

As to nulls, then the simplest and the most natural interpretation of it
is as "non-existence". More precisely, null is a marker for a thing that
is known to be non-existing. If you ask why to mark things as
non-existing, then the answer is that in many cases it is more efficient
to use this method instead of physical deletion.

--
http://conceptoriented.com

Evan Keel

unread,
Sep 10, 2006, 7:37:26 PM9/10/06
to

"Bob Badour" <bba...@pei.sympatico.ca> wrote in message
news:EJpMg.15039$9u.1...@ursa-nb00s0.nbnet.nb.ca...

I'm with you on this one.

Evan


JOG

unread,
Sep 11, 2006, 11:47:05 AM9/11/06
to
> X-No-Archive:yes
>
> I know that visual basic, lisp, python, c , c++ , perl, all have 2
> value logic. What makes SQL so special that it needs 3vl when all
> these langauges make do with 2vl?
>
> I read a book on SQL which was authored by Date before I knew of the
> whole Date controversy and when I was learning SQL. I was thoroughly
> confused when he went on a spiel about nulls. I echoed this sentiment
> at work and was chastized and told that Date is a kook. Yet I have
> some time later accepted nulls and work with them just fine. In the
> back of my mind though, coming from a C++ background I still feel that
> if c++ can make it on 2vl why not SQL?

Well, clearly it could, and indeed should. It is extremely simple to
recognise Nulls as gibberish if one simply rewinds back to the
proposition being recorded. Consider that

"The employee with empid 101 has a firstname John, lastname Doe and
middlename NULL."

makes absolutely no sense in native english. Either one is trying to
say:

"The employee with empid 101 has a firstname John and lastname Doe"
AND
"The employee with empid 101 and has no middlename"

or trying to state that:

"The employee with empid 101 has a firstname John and lastname Doe"
AND
"The employee with empid 101 does have a middlename, but we don't know
what it is"

To this day, I don't see any relevance whatsoever for 3VL in recording
these two distinct propositions. (There is of course a third option,
where one doesn't know anything /at all/ concerning the person's
middlename, but then why /on earth/ would you be trying to record
something about it anyway?)

As ever its just bad design, generated by those who have no idea what
the tool they are using actually does.

Thank god these people aren't building bridges.

Roy Hann

unread,
Sep 11, 2006, 12:45:01 PM9/11/06
to
"JOG" <j...@cs.nott.ac.uk> wrote in message
news:1157989625.2...@q16g2000cwq.googlegroups.com...

>
> Well, clearly it could, and indeed should. It is extremely simple to
> recognise Nulls as gibberish if one simply rewinds back to the
> proposition being recorded. Consider that

I think you have shown a few specific glitches without giving a proper
appreciation of the full horror of the situation. From my point of view, as
a practitioner slaving at the code face, THE overwhelming problem with being
allowed to fob the DBMS off with a null is that it allows the (so-called)
database designer to just sweep a whole lot of inconvenient details under
the carpet.

He "appears" to have produced a complete database design, and that
appearance is reinforced by the fact that he can write an SQL script that
will run without error to create a database. Unfortunately he's done next
to nothing about capturing any understanding of what data the application
will be expected to handle or what the application should (and should not)
do with it. Instead of being forced to discover that there are, for
example, six different types of customer, with different business rules, the
database designer declares a one-shape-fits-all table with a lot of nullable
attributes and leaves me to figure out what is really going on, and to write
the giant tangle of code to make it happen. Gee, thanks. What a hero.

I have come to suspect that *at least* 75% of the many tens of thousand of
lines of code I see in a year are there only because some slack-ass DB
designer didn't want to spec out a few more tables.

Roy


Paul

unread,
Sep 11, 2006, 2:24:56 PM9/11/06
to

Bob Badour <bba...@pei.sympatico.ca> wrote:


> > FWIW, I try to create subtypes
> > where there are no nulls.

> That's nice. I simply don't allow null in any database I design.


Why not?

(Personal work experience). Aircraft. Flight Schedules.

Dep time,
Arr time,

&c.


Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
Actual_Departure_Time = <null> untill we know different, then we put
in a value.


Otherwise you have to take account of the fact that these aircraft
took off in 9999 or else in 0001 or whatever.


Null is easier. The aircraft hasn't taken off yet, we don't know when
it will take off, or even if it will take off.


Paul...

--

plinehan __at__ yahoo __dot__ __com__

XP Pro, SP 2,

Oracle, 9.2.0.1.0 (Enterprise Ed.)
Interbase 6.0.1.0;

When asking database related questions, please give other posters
some clues, like operating system, version of db being used and DDL.
The exact text and/or number of error messages is useful (!= "it didn't work!").
Thanks.

Furthermore, as a courtesy to those who spend
time analysing and attempting to help, please
do not top post.

JOG

unread,
Sep 11, 2006, 2:36:28 PM9/11/06
to
Paul wrote:
> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
>
> > > FWIW, I try to create subtypes
> > > where there are no nulls.
>
> > That's nice. I simply don't allow null in any database I design.
>
>
> Why not?

you're kidding right?

>
> (Personal work experience). Aircraft. Flight Schedules.
>
> Dep time,
> Arr time,
>
> &c.
>
>
> Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
> Actual_Departure_Time = <null> untill we know different, then we put
> in a value.
>
>
> Otherwise you have to take account of the fact that these aircraft
> took off in 9999 or else in 0001 or whatever.
>
>
> Null is easier. The aircraft hasn't taken off yet, we don't know when
> it will take off, or even if it will take off.

So an aircraft might not take off and yet it has a departure time
attribute? That makes absolutely no sense at all.

Bob Badour

unread,
Sep 11, 2006, 2:48:24 PM9/11/06
to
Paul wrote:

>
> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
>
>
>>> FWIW, I try to create subtypes
>>>where there are no nulls.
>
>
>
>>That's nice. I simply don't allow null in any database I design.
>
>
>
> Why not?
>
> (Personal work experience). Aircraft. Flight Schedules.
>
> Dep time,
> Arr time,
>
> &c.
>
>
> Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
> Actual_Departure_Time = <null> untill we know different, then we put
> in a value.
>
>
> Otherwise you have to take account of the fact that these aircraft
> took off in 9999 or else in 0001 or whatever.

Or whatever. One describes the actual departure time of a flight in a
relation describing actual departures and not in a relation describing
aircraft.


> Null is easier. The aircraft hasn't taken off yet, we don't know when
> it will take off, or even if it will take off.

How is null easier than not inserting anything into an actual departures
relation?

Anne & Lynn Wheeler

unread,
Sep 11, 2006, 3:02:28 PM9/11/06
to

"JOG" <j...@cs.nott.ac.uk> writes:
> So an aircraft might not take off and yet it has a departure time
> attribute? That makes absolutely no sense at all.

i think the characteristic was scheduled departure time. scheduled
departure time can be useful for a number of things.

commercial aircraft tend to have scheduled departure times (helps give
you an approx idea when to show up at the airport) ... as well as
scheduled arrival times.

at one time the FAA didn't interfer with departure times ... and let
aircraft circle if weather cut down on capacity of airfield landing
rates. at some point the FAA started holding plane departures if there
was predictions about reduced capacity to land the plane.

you frequently hear pilots making announcements about particular
arrival being ahead of time or late ... which is the difference
between the scheduled arrival time and the actual arrival time. both
the actual arrival time and the difference (early/late) between the
actual arrival time and the scheduled arrival time ... aren't actually
known until near to the time that the plane actually arrives.

in the routes database used for scheduling reservations involving
connecting flights ... there typically is an attempt to match
scheduled arrival time to connecting scheduled departure time (within
some fudge factor).

however as anybody who has had a late arrival and missed their
connecting flight ... the scheduled arrival time and the actual
arrival time may be different. also some carriers when they see that
their are a significant connecting passengers ... on a flight there is
starting to show a predicted late arrival time ... may attempt to
delay the departure of some number of flights as convinience to
connecting passengers. however, this can have cascading effects if the
connecting departing flight then impacts other passengers that have
connecting flights at the next arrival.

lots of flights that have significant, long term differences between
scheduled arrival and actual arrival may indicate various loading and
capacity issues and reason to update projected schedules. changes in
projected specific schedules then can have cascading effects with
regard to scheduling departure/arrivals for purposes of route and load
planning involving connecting flights.

various past posts mentioning 3-value logic
http://www.garlic.com/~lynn/2003g.html#40 How to cope with missing values - NULLS?
http://www.garlic.com/~lynn/2004f.html#2 Quote of the Week
http://www.garlic.com/~lynn/2004l.html#75 NULL
http://www.garlic.com/~lynn/2005.html#15 Amusing acronym
http://www.garlic.com/~lynn/2005b.html#17 [Lit.] Buffer overruns
http://www.garlic.com/~lynn/2005i.html#35 The Worth of Verisign's Brand
http://www.garlic.com/~lynn/2005m.html#19 Implementation of boolean types
http://www.garlic.com/~lynn/2005t.html#20 So what's null then if it's not nothing?
http://www.garlic.com/~lynn/2005t.html#23 So what's null then if it's not nothing?
http://www.garlic.com/~lynn/2005t.html#33 What ever happened to Tandem and NonStop OS ?
http://www.garlic.com/~lynn/2005u.html#12 3vl 2vl and NULL
http://www.garlic.com/~lynn/2006e.html#34 CJ Date on Missing Information

various past posts mentiong working with OAG database of all
(world-wide) commercial schedule flight segments as part of airline
res system (over 4k airports and over 400k commercial scheduled flight
segments)
http://www.garlic.com/~lynn/99.html#136a checks (was S/390 on PowerPC?)
http://www.garlic.com/~lynn/2000f.html#20 Competitors to SABRE?
http://www.garlic.com/~lynn/2001d.html#69 Block oriented I/O over IP
http://www.garlic.com/~lynn/2002g.html#2 Computers in Science Fiction
http://www.garlic.com/~lynn/2002j.html#83 Summary: Robots of Doom
http://www.garlic.com/~lynn/2003o.html#17 Rationale for Supercomputers
http://www.garlic.com/~lynn/2004b.html#6 Mainframe not a good architecture for interactive workloads
http://www.garlic.com/~lynn/2004o.html#23 Demo: Things in Hierarchies (w/o RM/SQL)
http://www.garlic.com/~lynn/2004q.html#85 The TransRelational Model: Performance Concerns
http://www.garlic.com/~lynn/2005o.html#24 is a computer like an airport?
http://www.garlic.com/~lynn/2005p.html#8 EBCDIC to 6-bit and back
http://www.garlic.com/~lynn/2006o.html#18 RAMAC 305(?)


Anne & Lynn Wheeler

unread,
Sep 11, 2006, 3:19:49 PM9/11/06
to
Bob Badour <bba...@pei.sympatico.ca> writes:
> How is null easier than not inserting anything into an actual
> departures relation?

re:
http://www.garlic.com/~lynn/2006q.html#22 3 value logic. Why is SQL so special?

you have a row entry for a specific flight that has fields for
scheduled and actual departures and arrival ... possibly with
provision for computed values (difference between scheduled and
actual).

or possibly you have two different tables ... one with scheduled
flights and another with actual flights. an entry isn't inserted in
the table of actual flights until the flight has actually happened.

the absence of a row in the actual flight table takes the place of
null value fields (for not yet known information) in the table of
specific flights. two tables then possibly involves referential
integrity if there is an attempt to match an actual flight against a
scheduled flight (in two different tables).

there is also, always the possibility that you might have an
unscheduled actual flight ... an unplanned flight for which there
hadn't been a (pre-)scheduled departure/arrival ... but for which
there is actual departure ... with initially a projected arrival (as
opposed to a scheduled arrival) ... and then eventually an actual
arrival.

Bob Badour

unread,
Sep 11, 2006, 4:06:09 PM9/11/06
to

And your point would be? What is the theoretical foundation or even the
ad hoc design criterion to choose among the options you present?

Bob Badour

unread,
Sep 11, 2006, 4:09:36 PM9/11/06
to
Anne & Lynn Wheeler wrote:

> "JOG" <j...@cs.nott.ac.uk> writes:


>> Paul wrote:
>>
>>> Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
>>> Actual_Departure_Time = <null> untill we know different, then we put
>>> in a value.

>>So an aircraft might not take off and yet it has a departure time


>>attribute? That makes absolutely no sense at all.
>
> i think the characteristic was scheduled departure time. scheduled
> departure time can be useful for a number of things.

The characteristic was quite explicitly the actual departure time. If
you were less interested in self-promotion, you might actually pause
long enough to read with comprehension.

paul c

unread,
Sep 11, 2006, 7:16:53 PM9/11/06
to
Paul wrote:
>
> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
>
>>> FWIW, I try to create subtypes
>>> where there are no nulls.
>
>> That's nice. I simply don't allow null in any database I design.
>
>
> Why not?
>
> (Personal work experience). Aircraft. Flight Schedules.
>
> Dep time,
> Arr time,
>
> &c.
> ...

The point gets made here over and over that personal work experience is
most often the worst basis for talking about designs but some people
continue to tout it. Before Codd came along, the airlines had THE
state-of-the-art networks after that they "missed the boat" if I may put
it that way. I'd say any airline example is at least doubly poor
because of their tpf/acp heritage which placed almost total emphasis on
physical design to the detriment of the logical, not to mention the
adhoc and mostly thoughtless IATA regulations/so-called standards. My
personal experience helping one airline go bankrupt argued more for
suppressing the bulk of their traditional techniques wherever possible.
(Of course I wasn't completely responsible for that bankruptcy, eg.,
that airline had 6 people in IT for every plane!)

p

paul c

unread,
Sep 11, 2006, 7:38:32 PM9/11/06
to

Hear, hear. So rare to see the most important argument against nulls
stated, usually the arguments are degenerate technicalese. A DB with
nulls is most likely incomplete. Told this, even the most incompetent
exec' ought to be able to ask the right question about the latest design.

p

Paul

unread,
Sep 12, 2006, 2:32:56 PM9/12/06
to


"JOG" <j...@cs.nott.ac.uk> wrote:


> > > That's nice. I simply don't allow null in any database I design.
> > Why not?
> you're kidding right?


No.

> > Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
> > Actual_Departure_Time = <null> untill we know different, then we put
> > in a value.

> So an aircraft might not take off and yet it has a departure time
> attribute? That makes absolutely no sense at all.


I see that you are from Nottigham university - I would have thought
that a reasonable grasp of the English language was necessary to
either take or give courses there.

What part of "Scheduled" did you not understand?

I can *_schedule_* a flight for Dublin airport for 6 months time - the
damn airport might have been dirty bombed by Osama and his pals in the
meantime, meaning that the aircraft will never take off from there.

I don't know how familiar you are with aviation, but *_actual_*
departure time is important - pilots fill out VR's (Voyage Reports)
which are legally binding documents which have to have the actual
departure time (apart from pay issues for staff).

Now, if the field "Actual_Departure_Time" is not null, then it has to
be something - yes?

That "something" is normally some arbitrary value - i.e. 01/01/1970 or
whatever - the point is any application programmer has to take that
system defined arbitrary value into account when writing code against
the system.

*_WHEN_* the value used is <null>, then the programmer doesn't need
any special knowledge of the system - i.e. "By the way, the date for
planes which haven't taken off yet is xxx/yyy/zzzz" - just a simple,
"Oh, that's null until the VR is filed".


What could be simpler or more logical?

Paul

unread,
Sep 12, 2006, 2:42:26 PM9/12/06
to


Bob Badour <bba...@pei.sympatico.ca> wrote:


> > Otherwise you have to take account of the fact that these aircraft
> > took off in 9999 or else in 0001 or whatever.

> Or whatever. One describes the actual departure time of a flight in a
> relation describing actual departures and not in a relation describing
> aircraft.


Being your usual irascible self, eh Bob? The table I was describing
was Flight_Schedules, there's a before, a during and an after.

That way, one can run reports on, say, Scheduled_Departure_Time
against Actual_Departure_Time to check on punctuality (a stat that
some airlines publish in their advertising - I know that Ryanair
does).


What value would *_you_* put in for *_Actual_Departure_Time_* in such
a table?


> > Null is easier. The aircraft hasn't taken off yet, we don't know when
> > it will take off, or even if it will take off.

> How is null easier than not inserting anything into an actual departures
> relation?


The relation has to exist before the flight takes off - the Scheduled
time is there, the pilot's name might be there, the aircraft type
might be there, but the actual time is a field which should be in the
table, with a null value until such time as the Ops staff are happy
that it has taken off - unless you are suggesting a separate table?

David Portas

unread,
Sep 12, 2006, 2:59:41 PM9/12/06
to
Paul wrote:
>
> Now, if the field "Actual_Departure_Time" is not null, then it has to
> be something - yes?

I think you did not read or understand JOG's reply. If the entity does
not have an Actual_Departure_Time attribute then it should be
represented by a table that does not have an Actual_Departure_Time
attribute. Your alternative is unsound in several ways.

>
> That "something" is normally some arbitrary value - i.e. 01/01/1970 or
> whatever - the point is any application programmer has to take that
> system defined arbitrary value into account when writing code against
> the system.
>
> *_WHEN_* the value used is <null>, then the programmer doesn't need
> any special knowledge of the system - i.e. "By the way, the date for
> planes which haven't taken off yet is xxx/yyy/zzzz" - just a simple,
> "Oh, that's null until the VR is filed".
>
>
> What could be simpler or more logical?


Put like that, almost anything. You have made the mistake of thinking
that Null is a value. It is not. Nulls break everything that is
logical, including such simple matters as assignment, the Information
Rule and Data Independence.

--
David Portas

Bob Badour

unread,
Sep 12, 2006, 3:26:44 PM9/12/06
to
Paul wrote:

> "JOG" <j...@cs.nott.ac.uk> wrote:
>
>>>>That's nice. I simply don't allow null in any database I design.
>>>
>>>Why not?
>>
>>you're kidding right?
>
> No.
>
>
>>>Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
>>>Actual_Departure_Time = <null> untill we know different, then we put
>>>in a value.
>
>>So an aircraft might not take off and yet it has a departure time
>>attribute? That makes absolutely no sense at all.
>
> I see that you are from Nottigham university - I would have thought
> that a reasonable grasp of the English language was necessary to
> either take or give courses there.
>
> What part of "Scheduled" did you not understand?
>
> I can *_schedule_* a flight for Dublin airport for 6 months time - the
> damn airport might have been dirty bombed by Osama and his pals in the
> meantime, meaning that the aircraft will never take off from there.
>
> I don't know how familiar you are with aviation, but *_actual_*
> departure time is important - pilots fill out VR's (Voyage Reports)
> which are legally binding documents which have to have the actual
> departure time (apart from pay issues for staff).
>
> Now, if the field "Actual_Departure_Time" is not null, then it has to
> be something - yes?

Not if it doesn't yet exist.


> That "something" is normally some arbitrary value - i.e. 01/01/1970 or
> whatever - the point is any application programmer has to take that
> system defined arbitrary value into account when writing code against
> the system.

Since your assumption that it has to be something is false, everything
you predicate on the assumption is unsound.


> *_WHEN_* the value used is <null>, then the programmer doesn't need
> any special knowledge of the system - i.e. "By the way, the date for
> planes which haven't taken off yet is xxx/yyy/zzzz" - just a simple,
> "Oh, that's null until the VR is filed".
>
> What could be simpler or more logical?

Logic.

Bob Badour

unread,
Sep 12, 2006, 3:42:24 PM9/12/06
to
Paul wrote:

> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
>>>Otherwise you have to take account of the fact that these aircraft
>>>took off in 9999 or else in 0001 or whatever.
>
>>Or whatever. One describes the actual departure time of a flight in a
>>relation describing actual departures and not in a relation describing
>>aircraft.
>
> Being your usual irascible self, eh Bob? The table I was describing
> was Flight_Schedules, there's a before, a during and an after.

I disagree. The attribute you described was for an actual completed
flight and not for a schedule at all. A scheduled flight would have a
scheduled departure and a scheduled arrival defining a scheduled before,
a scheduled during and a scheduled after. As anyone who has taken a few
flights--especially anyone who has flown on the Romanian national
airline--can attest, the scheduled before, scheduled during and
scheduled after need have no particular relationship to the actual
completed before, actual completed during or actual completed after,
which any reasonably intelligent person would define in a sentence
describing an actual completed flight using an actual departure and an
actual arrival.


> That way, one can run reports on, say, Scheduled_Departure_Time
> against Actual_Departure_Time to check on punctuality (a stat that
> some airlines publish in their advertising - I know that Ryanair
> does).

Since no NULL attribute is required to form the query in question, I
fail to see any point in what you wrote here.


> What value would *_you_* put in for *_Actual_Departure_Time_* in such
> a table?

I would not use any Actual_Departure_Time attribute to describe a
scheduled flight. I would use such an attribute to describe a flight in
progress or a completed flight.


>>>Null is easier. The aircraft hasn't taken off yet, we don't know when
>>>it will take off, or even if it will take off.
>
>>How is null easier than not inserting anything into an actual departures
>>relation?
>
> The relation has to exist before the flight takes off

I agree the relation variable describing completed flights exists before
any particular flight departs. I further agree the relation variable
describing flights in progress exists before any particular flight takes
off. However, neither relation variable need have a tuple describing any
scheduled flight before it takes off.


- the Scheduled
> time is there, the pilot's name might be there, the aircraft type
> might be there,

Why would I have the scheduled time in relations describing the actual
flight? It's already available in the schedule. The pilot's name or the
aircraft type, of course, might differ from the schedule.


but the actual time is a field which should be in the
> table, with a null value until such time as the Ops staff are happy
> that it has taken off - unless you are suggesting a separate table?

Of course, I am suggesting a separate relation. I am not an idiot, and
only an idiot would suggest otherwise.

JOG

unread,
Sep 12, 2006, 8:05:34 PM9/12/06
to
Paul wrote:
> "JOG" <j...@cs.nott.ac.uk> wrote:
>
>
> > > > That's nice. I simply don't allow null in any database I design.
> > > Why not?
> > you're kidding right?
>
>
> No.
>
> > > Scheduled_Departure_Time = 'DD:MM:YYYY 10:15'
> > > Actual_Departure_Time = <null> untill we know different, then we put
> > > in a value.
>
>
> > So an aircraft might not take off and yet it has a departure time
> > attribute? That makes absolutely no sense at all.
>
>
> I see that you are from Nottigham university - I would have thought
> that a reasonable grasp of the English language was necessary to
> either take or give courses there.
>
> What part of "Scheduled" did you not understand?

Sorry, I am not interested in peurile flaming games so I will simply
ignore this. I am only interested in helping people with theory so that
the field may progress.

>
> I can *_schedule_* a flight for Dublin airport for 6 months time - the
> damn airport might have been dirty bombed by Osama and his pals in the
> meantime, meaning that the aircraft will never take off from there.
>
> I don't know how familiar you are with aviation, but *_actual_*
> departure time is important - pilots fill out VR's (Voyage Reports)
> which are legally binding documents which have to have the actual
> departure time (apart from pay issues for staff).

Whether the attribute is "Scheduled" or "Actual" departure time is
completely irrelevant. In fact the whole example is irrelevant - it
matters not whether you are talking about aeroplane takeoffs, employee
records or monkey fish.

So I repeat, If an attribute does not exist for a certain statement,
why /on earth/ are you trying to record it? The formal proposition you
are encoding does not fit the predicate you are attempting to kludge it
into. This is basic theoretical stuff.

>
> Now, if the field "Actual_Departure_Time" is not null, then it has to
> be something - yes?

No. You do not understand what the tool you are using is doing. The
field should be in a separate relation, obviously. Then if no value
for the field exists, no row is input.

>
> That "something" is normally some arbitrary value - i.e. 01/01/1970 or
> whatever - the point is any application programmer has to take that
> system defined arbitrary value into account when writing code against
> the system.
>
> *_WHEN_* the value used is <null>, then the programmer doesn't need
> any special knowledge of the system - i.e. "By the way, the date for
> planes which haven't taken off yet is xxx/yyy/zzzz" - just a simple,
> "Oh, that's null until the VR is filed".
>
>
> What could be simpler or more logical?

The illogic of using nulls has been covered to death in the literature.
Surely you ought have read it given your job?

Either way, I gave you a rudimentary breakdown of the simplicity of the
matter here:
http://groups.google.com/group/comp.databases.theory/msg/62dedf999812fd70

and Roy Hann expanded here:
http://groups.google.com/group/comp.databases.theory/msg/44edaf876e369802

I refer you back to them.

Many take pride in their professions. I can only encourage you to do
the same, and get up to speed on the theory.

paul c

unread,
Sep 12, 2006, 9:17:17 PM9/12/06
to
JOG wrote:
> Paul wrote:
>> "JOG" <j...@cs.nott.ac.uk> wrote:
>>...

> The illogic of using nulls has been covered to death in the literature.
> Surely you ought have read it given your job?
> ...

Right. If I may go further, if a single 'table' is seen as desireable
for some reason, AFAIAC, actual departure time can be safely deemed to
be equal to estimated departure time until the biz rules dictate that a
plane has departed. I say 'biz rules' because the purpose of the system
should dictate interpretation, not the viewpoint of an objective
observer. I don't think this violates any RM principle.

Some misconceived queries might use actual departure time to determine
that a flight has departed (note the word 'flight' as opposed to
'airplane', this is a common abstraction in the airline biz', one of the
few that makes sense, along with 'flight segment'), but the writers of
those queries should think twice, assuming that the purposes of the
schema have been explained to them. After all is said and done, a
schema is an arbitrary abstraction, the fact that this is so is what
allows us to make systems that are models of other systems.

p

Alexandr Savinov

unread,
Sep 13, 2006, 4:46:56 AM9/13/06
to

David Portas schrieb:


> Paul wrote:
>> Now, if the field "Actual_Departure_Time" is not null, then it has to
>> be something - yes?
>
> I think you did not read or understand JOG's reply. If the entity does
> not have an Actual_Departure_Time attribute then it should be
> represented by a table that does not have an Actual_Departure_Time
> attribute. Your alternative is unsound in several ways.

That is a naive position because moving an attribute in another table
does not mean that this attribute does not describe the entities from
the original table. In other words:

*an attribute does not need to be in a table in order to describe
entities from this table*

So if you put Actual_Departure_Time in table A, B, C or whatever other
table you find appropriate then it will still describe entities from
Flights table (if linked appropriately of course). Moreover, formally
these alternative schemas are equivalent (again, if we link the tables
appropriately).

In this sense there are two extremes:

1. putting all the data in one wide table (a kind of canonical
representation), and

2. putting all the data in separate tables (a kind of "one table - one
attribute" approach)

In the first case all absent attributes will (have to) be denoted by
nulls. In the second case the absent attributes are denoted by the
physical absence of the records. From the point of view of users (data
semantics) nothing changes.

Real world schemas are normally somewhere in between because both
extreme designs are not efficient: the first due to space, the second
because of time.

--
http://conceptoriented.com

David Portas

unread,
Sep 13, 2006, 6:40:34 AM9/13/06
to
Alexandr Savinov wrote:
>
> So if you put Actual_Departure_Time in table A, B, C or whatever other
> table you find appropriate then it will still describe entities from
> Flights table (if linked appropriately of course). Moreover, formally
> these alternative schemas are equivalent (again, if we link the tables
> appropriately).
>

What model is that? Relational tables are not "linked" other than in
the conceptual sense that their propositions describe some subsets of
the same reality.


> In this sense there are two extremes:
>
> 1. putting all the data in one wide table (a kind of canonical
> representation), and
>
> 2. putting all the data in separate tables (a kind of "one table - one
> attribute" approach)
>
> In the first case all absent attributes will (have to) be denoted by
> nulls. In the second case the absent attributes are denoted by the
> physical absence of the records. From the point of view of users (data
> semantics) nothing changes.

Again, a model without nulls cannot be equivalent to one with nulls.
One represents information as values in relations and the other does
not. From the user's point of view *everything* changes - that after
all is the essence of the OP's question.


> Real world schemas are normally somewhere in between because both
> extreme designs are not efficient: the first due to space, the second
> because of time.
>

If the models were truly equivalent then in principle they could have
the same physical representation and therefore any hypothetical
difference in storage space and efficiency would disappear.

--
David Portas

Damien

unread,
Sep 13, 2006, 8:35:48 AM9/13/06
to

I normally just read CDT rather than posting, but I think Alexandr is
getting at the fact that the following two models are notionally
equivalent:

(Apologies for posting SQL here, it's closer to how I think than
relations, plus I've more experience creating examples in it. Hopefully
you guys can translate)

Model 1:

Create table PartA (KeyColumn int not null, DataColumn1 datetime not
null);
Create table PartB (KeyColumn int not null, DataColumn2 datetime not
null);
Create view PartNullable(KeyColumn,DataColumn1,DataColumn2) as select
KeyColumn,DataColumn1,DataColumn2 from PartA full outer join PartB on
PartA.KeyColumn = PartB.KeyColumn;

Model 2:
Create table PartNullable(KeyColumn int not null, DataColumn1 datetime
null, DataColumn2 datetime null);
Create view PartA(KeyColumn,DataColumn1) as select
KeyColumn,DataColumn1 from PartNullable where DataColumn1 is not null;
Create view PartB(KeyColumn,DataColumn2) as select
KeyColumn,DataColumn2 from PartNullable where DataColumn2 is not null;

So if you're needing to perform operations considering both DataColumn1
and DataColumn2, you have to accept that one or other may be absent.

Of course, all of the above is purely from a practical perspective,
whereas this is comp.databases.THEORY, so I guess I should just shut up
and go back to lurking.

Damien

Bob Badour

unread,
Sep 13, 2006, 8:47:29 AM9/13/06
to
Damien wrote:

But since full outer join is just a shortcut for a UNION statement, a
better solution for the first view is to use an explicit union with an
actual value of some sort.

Regardless, Savinov is an idiot. One can cluster data for performance
without using NULL.


> Of course, all of the above is purely from a practical perspective,
> whereas this is comp.databases.THEORY, so I guess I should just shut up
> and go back to lurking.

Theory is practical. Practice in ignorance is anything but practical.

Alexandr Savinov

unread,
Sep 13, 2006, 9:13:11 AM9/13/06
to

Damien schrieb:

This example is a good illustration that one and the same data semantics
can be expressed in different forms (structures). Here both Model 1 and
Model 2 describe an entity with two attributes. It is already the second
question (for theory) how these attributes are distributed among tables.

> Of course, all of the above is purely from a practical perspective,
> whereas this is comp.databases.THEORY, so I guess I should just shut up
> and go back to lurking.

Although we are in comp.databases.THEORY, 80% of posts here have nothing
to do with data issues at all :)

> Damien

--
http://conceptoriented.com

Cimode

unread,
Sep 13, 2006, 5:06:37 PM9/13/06
to
A simple alternative solution WITHOUT using the god damn NULLS...
Flight entity is just equivalent to an assumed and recurring plane
departure being associated to a specific departure airport. 1:N
cardinality

_DT: departure_time
_AT: arrival_time
_ADT: actual departure time
_AAT: actual arrival time
_DA: departure_airport
_AA: arrival_airport
_ADA: actual departure_airport
_AAA: actual arrival_airport
_LN: last name
_FN: first name
_DOB: date of birth

Note: I use ONLY concatenated PRIMARY key (no surrogate, never liked
them)

passenger:
passenger_LN, passenger_FN, passenger_DOB
plane_model:
plane_model (PK), plane_capacity
plane:
planeid, plane_model
book:
flight_DT, flight_DA, flight_AT, flight_AA, passengerlname,
passengerfname, passenger_birthdate
plane_flight:
planeid, flight_DT, flight_DA, flight_AT, flight_AA

flight:
flight_DT, flight_DA, flight_AT, flight_AA

plane_departure:
flight_DT, flight_DA, flight_AT, flight_AA, flight_ADT,
flight_ADA
plane_departure_check:
flight_DT, flight_DA, flight_AT, flight_AA, flight_ADT,
flight_ADA, passenger_LN, passenger_FN, passenger_DOB
plane_arrival:
flight_DT, flight_DA, flight_AT, flight_AA, flight_AAT,
flight_AAA


>From the above logical structure you can pretty much get anything you
want including:

> How late/early are the planes on current schedules
> If the planes could land at their supposed destination (you may associate a reason for it: BAD WEATHER, HIJACK). You know if the plane actually landed where it was supposed to land or not...You need to create and additional entity for that...
> What plane model should be scheduled to do a flight depending on book counts
> Which passengers did not leave where there were supposed to...

And many other information...
This is a purely pedagogical case (far from being complete) to
demonstrate that it is perfectly possible to build some logical design
in minutes (took me 20 of them) WITHOUT using NULLS...while sticking to
the God Damn Real World (lazyness) excuse...

Regards all...

Cimode

unread,
Sep 13, 2006, 5:20:16 PM9/13/06
to

Anne & Lynn Wheeler wrote:

> there is also, always the possibility that you might have an
> unscheduled actual flight ... an unplanned flight for which there
> hadn't been a (pre-)scheduled departure/arrival ... but for which
> there is actual departure ... with initially a projected arrival (as
> opposed to a scheduled arrival) ... and then eventually an actual
> arrival.

See proposed schema...

ANY unscheduled departure is still a FLIGHT before being a new
DEPARTURE.

A new DEPARTURE line should be added IF AND ONLY IF the plane leaves
from a specific AIRPORT at a SPECIFIC TIME to SOME SUPPOSED AIRPORT and
SOME SUPPOSED ARRIVAL TIME which is nessecarily what constitutes a
FLIGHT. 2 inserts instead of one...That way such the FLIGHT can be
unscheduled only ONCE.

Note: been putting caps for emphasis. No hostile intent. ;)

Anne & Lynn Wheeler

unread,
Sep 13, 2006, 8:37:08 PM9/13/06
to

"Cimode" <cim...@hotmail.com> writes:
> And many other information...
> This is a purely pedagogical case (far from being complete) to
> demonstrate that it is perfectly possible to build some logical design
> in minutes (took me 20 of them) WITHOUT using NULLS...while sticking to
> the God Damn Real World (lazyness) excuse...

small matter ... scheduled arrival, projected arrival, actual arrival
can have different business uses. the scheduled arrival ... for
scheduled flights have all sorts of issues for reservations, planning,
etc. the projected arrival may be used in real time for things like
decisions about holding other flights for the convinence of other
connecting passengers. actual arrival may be used for long term
planning purposes ... as well as computed values like difference
between scheduled and actual.

one approach about information not yet known is to have different
tables that don't yet have entries for the unknown values. another
possible mechanism is to have fields mean different things depending
on other state information ... for instance have a state flag to
switch the meaning of the field containing the projected arrival to
actual arrival (or other domain knowledge that changes what a field
means based on other circumstances).

it may turn out to not only be useful to have different tables ... but
possibly even completely different servers and applications. for
instance the projected arrival tends to be of relatively short-term
use ... and rarely needs to be carried for very long. the table of
projected flight departures and arrivals ... may be relatively few
flights with lots of inserts, updates, and deletes. people needing
real-time information for scheduling and provisioning adjustments tend
to have totally concerns (i.e. will the equipment coming in on a
different scheduled flight be available in time to turn around for a
brand new flight, will the crew arriving on a different flight be
available in time for the their new assigned flight, will connecting
passengers be able to make their connection, etc) than some of the
longer term considerations. Having totally different tables and
databases eliminates the requirement for trying to maintain global
encompassing information.

an unscheduled flight ... is they are flying some equipment that isn't
a "scheduled" flight ... but for which they have to file a flight plan
and actually fly ... or not ... possible to file a flight plan and
then have it abort because of weather ... and then file a brand new
flight plan ... which possibly might be a totally different
unscheduled flight (as opposed to a flight plan for a scheduled
flight).

res. system with 400k plus flight segments for scheduled flights,
scheduled departures, scheduled arrivals ... tends to be close to 400k
plus flight segments every day (i.e. a database with 400k plus
records) .... slightly less than that because there are re-occuring
flights every day ... but there are also re-occuring flight segments
specified just for weekdays ... with different re-occuring flight
segments specified for saturday and/or sunday.

actual flight departures and arrivals on the order of 400k some per day
accumulate and can be kept for months ... tens of millions of records
with 400k some inserts everyday for new flights and 400k some deletes everyday
for old flights that are aged out (actually this tends to be
partitioned into smaller subsets that are carrier specific).

PNR (passenger name record) can be a couple orders larger ... starts
when the reservation is made ... possibly a couple months before the
actual flight (hundred or so people per flt) and may be kept from a
couple months to possibly a couple years. a few tens of millions
reservations are made every day, a few tens of millions of updates for
flights taken, and a few tens of millions of deletes as reservations
are canceled or aged out ... the number of records should hover around
the avg. number of reservations made per day (few tens of millions)
times how long the records are kept (several hundred days). That can
be several billion records ... the total size isn't too bad since PNR
record information is fairly condensed and runs around 4k bytes
... frequently has lots of nulls for unknown information since they
try and have everything in a single record
read/write/update/insert/delete/etc ... so possibly only several
terabytes aggregate. another kind of PNR null might be zip code field
that is not applicable (as opposed to unknown) ... to get around
possible SQL issues ... fill it in with anything as a substitute for
actual null ... like the letters N/A. Then any mailing processing
would have to know that N/A actually means null when it goes to
generate an actual address (as opposed to the zip code: N/A). Not
available information could be filled with N/A also (for not available
as opposed to not applicable) or possibly UNKN ... which i've also
seen used.

Bob Badour

unread,
Sep 13, 2006, 8:53:58 PM9/13/06
to

That's an awefully long-winded way to repeat stuff others in this thread
have already dealt with more accurately and more succinctly.

paul c

unread,
Sep 14, 2006, 9:30:47 AM9/14/06
to

I haven't examined all of the above, but:

1) Where is flight number? This is IATA-mandated and perhaps the most
common term used in the biz as well as by customers. An individual
flight segment is identified by flight number (which these days also
includes alpha characters), qualified by origin and destination airport
codes and date and time - note the time includes only hours and minutes,
not seconds, note also that a flight number can't be used twice in one
day with the same origin and destination - 'lollipop' flights are
illegal because they are a way to escape certain airport fees, also only
certain apps, such as loading and gate assignment, have anything to do
with plane identification).

The main point I'm trying to make here is that much of what WE think
doesn't matter and usually only adds to an inflated cost to make and run
a system. The strictest interpretation of the biz rules (many of which
are well-documented in this case) is the main avenue towards rightness
and economy.


2) Just as the DBA isn't finished if he includes nulls, he's not
finished until he considers the frequent usages of his schemas.
Personally, if I had a choice, I would never use all the above
attributes as the usual 'key' as massive changes to many tables are
needed whenever flights are re-scheduled and this is frequent and
common. Better to have some kind of flight assignment table so that
most of the app's deal only with a unique assigned internal number,
whether most people would call this a surrogate, I don't know.

p

Cimode

unread,
Sep 14, 2006, 3:44:19 PM9/14/06
to
My main point was to close the NULL debate by showing that doing
without them is NOT as difficult as people think but as you seem
interested here are the answers to your question...

> I haven't examined all of the above, but:
>
> 1) Where is flight number? This is IATA-mandated and perhaps the most
> common term used in the biz as well as by customers.

Just keep in mind that I purposely used only concatenated keys for the
example's sake. It is easy to use a given IATA surrogate key kept in
synch with the concatenated key of
flight: the final entity would look like...
flight_DT, flight_DA, flight_AT, flight_AA,
flight_number(SK)
and Flight number can replace occurrences of the concatenated keys in
all other entities....

> An individual
> flight segment is identified by flight number (which these days also
> includes alpha characters), qualified by origin and destination airport
> codes and date and time - note the time includes only hours and minutes,
> not seconds, note also that a flight number can't be used twice in one
> day with the same origin and destination - 'lollipop' flights are
> illegal because they are a way to escape certain airport fees, also only
> certain apps, such as loading and gate assignment, have anything to do
> with plane identification).
>
> The main point I'm trying to make here is that much of what WE think
> doesn't matter and usually only adds to an inflated cost to make and run
> a system. The strictest interpretation of the biz rules (many of which
> are well-documented in this case) is the main avenue towards rightness
> and economy.

Your point is valid but it was not my intent to produce the best design
(this would be impossible without knowing the details of the project)
but rather to demonstrate that doing without NULLS is not an
abstraction but a perfectly applicable rule in "REAL WORLD"
example...Thank you for such comment.

> 2) Just as the DBA isn't finished if he includes nulls, he's not
> finished until he considers the frequent usages of his schemas.
> Personally, if I had a choice, I would never use all the above
> attributes as the usual 'key' as massive changes to many tables are
> needed whenever flights are re-scheduled and this is frequent and
> common. Better to have some kind of flight assignment table so that
> most of the app's deal only with a unique assigned internal number,
> whether most people would call this a surrogate, I don't know.

In the point provided, the modification (using surrogate key flight
number) would not require to be implemented more than once. IN a word,
it is a minor one shot adjustment.

But thanks for pointing the issue of having stable keys...

Regards...
> p

Cimode

unread,
Sep 14, 2006, 3:47:43 PM9/14/06
to

There are numerous ways to tell people NOT to use NULLS...There are
fewer ways to tell them it is perfectly applicable when they state
specific example: show it using their example...

Paul

unread,
Sep 14, 2006, 3:55:06 PM9/14/06
to

Bob Badour <bba...@pei.sympatico.ca> wrote:


> That's an awefully long-winded way to repeat stuff others in this thread
> have already dealt with more accurately and more succinctly.


A bit like quoting 90 lines to add 2? Ever hear of snipping Bob - it's
part of the (human)-relational model! 8-)


Paul...


p.s did it inspire awe or was it just awful? ditto smiley!

Cimode

unread,
Sep 14, 2006, 4:10:06 PM9/14/06
to

Paul wrote:
> Bob Badour <bba...@pei.sympatico.ca> wrote:

> > That's an awefully long-winded way to repeat stuff others in this thread
> > have already dealt with more accurately and more succinctly.

BB is an example of what NOT to do to defend RM...I can bet, he snipped
when looking at my post because it was me...As I exposed his ignorance
and incoherence on several occasions in this board before, he is
simply pissed nothing less more nothing more... As a consequence, he
lacks coherence into fighting the fight he tries to advocate...He does
not fight for the ideas but for his character...Else he would have
agreed with most I have stated. (Thing I did on few occasions on his
behalf...)

Bob Badour

unread,
Sep 14, 2006, 4:25:28 PM9/14/06
to
Paul wrote:

>
> Bob Badour <bba...@pei.sympatico.ca> wrote:
>
>>That's an awefully long-winded way to repeat stuff others in this thread
>>have already dealt with more accurately and more succinctly.
>
> A bit like quoting 90 lines to add 2? Ever hear of snipping Bob - it's
> part of the (human)-relational model! 8-)

While the content was more-or-less irrelevant, the 90 lines were central
to the point I was making.

Cimode

unread,
Sep 15, 2006, 4:07:39 AM9/15/06
to
You may have made a point but you certainly did not get it through.
You just pointed out a valid point (namely NOT using NULLS) but you did
not clarify sufficiently. Being succint or consise is no excuse for
assuming people will understand things right away. You should be using
examples to clarify your point. And in no case, you should be blaming
others for trying to clarify things.

Chris Lim

unread,
Sep 16, 2006, 8:08:03 AM9/16/06
to
Cimode wrote:
> And many other information...
> This is a purely pedagogical case (far from being complete) to
> demonstrate that it is perfectly possible to build some logical design
> in minutes (took me 20 of them) WITHOUT using NULLS...while sticking to
> the God Damn Real World (lazyness) excuse...

In a database without NULLs, would you have to do LEFT OUTER JOINs in
all queries where you want all rows from your 'main' table returned,
regardless of whether they have the optional attributes or not? Say,
for example, all flights, with or without an actual departure time. And
if so, how do the 'missing' attributes get returned from these queries,
if NULLs are not allowed?

Chris

David Portas

unread,
Sep 16, 2006, 10:11:03 AM9/16/06
to
Chris Lim wrote:
>
> In a database without NULLs, would you have to do LEFT OUTER JOINs in
> all queries where you want all rows from your 'main' table returned,
> regardless of whether they have the optional attributes or not? Say,
> for example, all flights, with or without an actual departure time. And
> if so, how do the 'missing' attributes get returned from these queries,
> if NULLs are not allowed?

As Bob already stated, an "outer join" is not really a join. It is
SQL's syntax shortcut for a UNION. Obviously the answer to your
question depends on the desired result from such a UNION. The user can
contrive a query to return whatever values are deemed suitable.

--
David Portas, SQL Server MVP

Whenever possible please post enough code to reproduce your problem.
Including CREATE TABLE and INSERT statements usually helps.
State what version of SQL Server you are using and specify the content
of any error messages.

SQL Server Books Online:
http://msdn2.microsoft.com/library/ms130214(en-US,SQL.90).aspx
--

Cimode

unread,
Sep 16, 2006, 1:03:42 PM9/16/06
to
I know what you are getting at. But you are using the wrong example
and asking the wrong question. The impact of using NULL values have
nothing to do with presentation issues. You don't design data schemas
according to the way data should look on user's screen but according to
rules that allow to preserve integrity(for instance NULLS will make
your count results false if you do not add a filtering condition to
exclude NULLS from ALL your queries!). Keep in imnd that NULLS is just
SQL's way (a poor way) of dealing with missing data. For explanations
on how to deal with missing data while preserving a normalized schema,
you may want to take a look at McGoveran paper on the thirdmanifesto
website...


> Chris

Cimode

unread,
Sep 16, 2006, 1:22:54 PM9/16/06
to

David Portas wrote:
> Chris Lim wrote:

> As Bob already stated, an "outer join" is not really a join. It is
> SQL's syntax shortcut for a UNION. Obviously the answer to your
> question depends on the desired result from such a UNION. The user can
> contrive a query to return whatever values are deemed suitable.

A UNION equivalent an LEFT OUTER JOIN BULLSHIT!!!. A LEFT OUTER JOIN is
just SQL miserable way to force a presentation inspired perspective
into design issues that should be regulated by normalization to
preserve integrity. The proof is that you can do a UNION using 2
relations but using LEFT OUTER JOIN when dealing with relations has no
meaning. Consider 2 tables table1(id1, key1) and table2(id2, key2)

Table1
1, A
2, B
3, C

Table2
8, A
10, C

Table1 LEFT OUTER JOIN Table2 on key1 = key2 produces the following
view

1, A, 8, A
2, B, NULL, NULL
3, C, 10, C

Table1 UNION Table2 produces

1, A
2, B
3, C
8, A
10, C

So basically stating that UNION = LEFT OUTER JOIN is stating that

1, A, 8, A
2, B, NULL, NULL
3, C, 10, C
=
1, A
2, B
3, C
8, A
10, C

?...

David Portas

unread,
Sep 16, 2006, 1:56:59 PM9/16/06
to
Cimode wrote:
> A UNION equivalent an LEFT OUTER JOIN BULLSHIT!!!. A LEFT OUTER JOIN is
> just SQL miserable way to force a presentation inspired perspective
> into design issues that should be regulated by normalization to
> preserve integrity. The proof is that you can do a UNION using 2
> relations but using LEFT OUTER JOIN when dealing with relations has no
> meaning. Consider 2 tables table1(id1, key1) and table2(id2, key2)
>
> Table1
> 1, A
> 2, B
> 3, C
>
> Table2
> 8, A
> 10, C
>

T1{k,i1,i2} UNION (T1{k,i1} JOIN T2{k,i2})

Obviously some suitable value or expression has to be supplied for i2
on the left hand side of the UNION.

--
David Portas

Marshall

unread,
Sep 16, 2006, 2:43:43 PM9/16/06
to

Isn't that going to run in to a problem if we infer a key for the
result? With the left outer join, we can infer k as a key of
the result, but with the above union we can't. (The key
is {k,i1,i2}.) And we'll have more than one row for some
values of k.

Wouldn't it be more like:

(T1{k,i1,i2} MINUS (T1{k} JOIN T2{k}))


UNION (T1{k,i1} JOIN T2{k,i2})

(Not sure if I got the syntax right.) I still don't see
how to do key inference correctly on the above, but
at least it won't have multiple rows for some k values.

Marshall

Marshall

unread,
Sep 16, 2006, 3:38:14 PM9/16/06
to

Okay, here it is:

((T1{k} MINUS T2{k}) JOIN { (i2=<default>) }) UNION T2)
JOIN T1

and you can infer k as the key of the result. Whew!

(I used <default> to represent whatever value you want to
put in in place of where SQL would put a NULL.)

That's twice today I've replied to my own post. So
here's a cat with a box on his head:

http://video.google.com/videoplay?docid=8135777110359469253


Marshall

CMCC

unread,
Sep 16, 2006, 3:50:11 PM9/16/06
to

Cimode wrote:
> David Portas wrote:
> > Chris Lim wrote:
>
> > As Bob already stated, an "outer join" is not really a join. It is
> > SQL's syntax shortcut for a UNION. Obviously the answer to your
> > question depends on the desired result from such a UNION. The user can
> > contrive a query to return whatever values are deemed suitable.
> A UNION equivalent an LEFT OUTER JOIN BULLSHIT!!!. A LEFT OUTER JOIN is
> just SQL miserable way to force a presentation inspired perspective
> into design issues that should be regulated by normalization to
> preserve integrity. The proof is that you can do a UNION using 2
> relations but using LEFT OUTER JOIN when dealing with relations has no
> meaning. Consider 2 tables table1(id1, key1) and table2(id2, key2)
>
> Table1
> 1, A
> 2, B
> 3, C
>
> Table2
> 8, A
> 10, C
>
>

select id1,key1,id2,key2 from Table1 join Table2 on key1=key2
union
select id1,key1,-1,'WHAT?' from Table1 where key1 not in (select key2
from Table2)

Carlos

Chris Lim

unread,
Sep 16, 2006, 3:58:32 PM9/16/06
to
Cimode wrote:
> I know what you are getting at. But you are using the wrong example
> and asking the wrong question. The impact of using NULL values have
> nothing to do with presentation issues. You don't design data schemas
> according to the way data should look on user's screen but according to

Okay bad example. I wasn't really thinking about returning the data to
the end-user, but from a programmer's view if I had to perform some
calculations on data where I wanted flights with and without an actual
departure date (and the calculation uses actual departure date if it's
present and scheduled departure date if not). Without doing an outer
join I can't see any way of doing this without having two queries or a
UNION, which effectively doubles the amount of code to be written.

Another example. Say you needed to identify all flights that had the
same scheduled_departure_date and actual_departure_date as another
flight (including flights with no actual_departure_date). If
actual_departure_date is in a separate table, then wouldn't you need
two queries (or a UNION) to do the same thing you would in a single
query if you allowed NULLs?

> SQL's way (a poor way) of dealing with missing data. For explanations
> on how to deal with missing data while preserving a normalized schema,
> you may want to take a look at McGoveran paper on the thirdmanifesto
> website...

Thanks, I'll check that out.

Cheers,
Chris

Cimode

unread,
Sep 16, 2006, 4:10:32 PM9/16/06
to
Not the point. A UNION has nothing to do with a JOIN. They are both
separate operators applyable on relations. Trying to prove that one is
derived from another is just
meaningless, the same way you would try to derive a DIVIDE operator
with a PLUS operator...This is a dead end.

> David Portas

David Portas

unread,
Sep 16, 2006, 5:42:02 PM9/16/06
to
Cimode wrote:
>
> Not the point. A UNION has nothing to do with a JOIN. They are both
> separate operators applyable on relations. Trying to prove that one is
> derived from another is just
> meaningless, the same way you would try to derive a DIVIDE operator
> with a PLUS operator...This is a dead end.
>

In Codd's algebra that's true but only because it doesn't have a
complete set of operators. JOIN is the relational AND operator and
UNION is a special case of the corresponding OR operator. Via De
Morgan's laws they CAN be defined in terms of each other. NAND or NOR
plus REMOVE are the truly primitive operators according to Date and
Darwen. Details in TTM(3).

--
David Portas

Cimode

unread,
Sep 16, 2006, 5:48:10 PM9/16/06
to

Chris Lim wrote:
> Cimode wrote:
> > I know what you are getting at. But you are using the wrong example
> > and asking the wrong question. The impact of using NULL values have
> > nothing to do with presentation issues. You don't design data schemas
> > according to the way data should look on user's screen but according to
>
> Okay bad example. I wasn't really thinking about returning the data to
> the end-user, but from a programmer's view if I had to perform some
> calculations on data where I wanted flights with and without an actual
> departure date (and the calculation uses actual departure date if it's
> present and scheduled departure date if not). Without doing an outer
> join I can't see any way of doing this without having two queries or a
> UNION, which effectively doubles the amount of code to be written.
>
> Another example. Say you needed to identify all flights that had the
> same scheduled_departure_date and actual_departure_date as another
> flight (including flights with no actual_departure_date). If
> actual_departure_date is in a separate table, then wouldn't you need
> two queries (or a UNION) to do the same thing you would in a single
> query if you allowed NULLs?

Chris Lim wrote:
> Cimode wrote:
> > I know what you are getting at. But you are using the wrong example
> > and asking the wrong question. The impact of using NULL values have
> > nothing to do with presentation issues. You don't design data schemas
> > according to the way data should look on user's screen but according to
>
> Okay bad example. I wasn't really thinking about returning the data to
> the end-user, but from a programmer's view if I had to perform some
> calculations on data where I wanted flights with and without an actual
> departure date (and the calculation uses actual departure date if it's
> present and scheduled departure date if not). Without doing an outer
> join I can't see any way of doing this without having two queries or a
> UNION, which effectively doubles the amount of code to be written.

> Another example. Say you needed to identify all flights that had the
> same scheduled_departure_date and actual_departure_date as another
> flight (including flights with no actual_departure_date). If
> actual_departure_date is in a separate table, then wouldn't you need
> two queries (or a UNION) to do the same thing you would in a single
> query if you allowed NULLs?

You should not be surprised. SQL is not a relational language for a
long time. SQL is by nature a redundant language. Additionally, your
second example is a bad one two. At the first place, there should not
be a second table for flights if you pretend your schema is normalized.
If 2 relations have the same attributes then they should be one
relation. In a word, one can not state: I have a shitty non normalized
schema design and I complain because it makes queries more difficult to
write: in this precise cas, it is because the schema is not normalized
that you have to go through such pain...

Cimode

unread,
Sep 16, 2006, 6:00:57 PM9/16/06
to
Let's say that such additions do not me feel always confortable because
it gets RM way too far from the math background that supports it..

Nevertheless, you are correct. Which is the precise reason why such
definitions should not be done accordingly to SQL 's LEFT OUTER JOIN as
initial comments were made. Only leads to additional confusion I was
trying to clarify to an audience that has no clue about the difference
between operators and hears that both LEFT OUTER JOIN and UNON a
perfectly equal. Distinguishing the two operators is essential to begin
use them. Thank you for helping clarifying...

Regards...

> David Portas

Chris Lim

unread,
Sep 16, 2006, 7:59:33 PM9/16/06
to
Cimode wrote:
> long time. SQL is by nature a redundant language. Additionally, your
> second example is a bad one two. At the first place, there should not
> be a second table for flights if you pretend your schema is normalized.

No I didn't mean a separate table - I meant within the same table. For
example, in SQL Server the query to find the occurrences of flights
with the same scheduled and actual date (if the table allowed NULLs)
would be:

SELECT scheduled_date,
ISNULL(actual_date, '1 Jan 1900') AS actual_date,
COUNT(*) AS cnt
FROM flights
GROUP BY scheduled_date, ISNULL(actual_date, '1 Jan 19000')
HAVING COUNT(*) > 1

I'm just wondering what the equivalent would be if you did not allow
NULLs. Wouldn't it be something like:

SELECT scheduled_date,
CONVERT(DATETIME, '1 Jan 1900') AS actual_date,
COUNT(*) AS cnt
FROM flights
WHERE NOT EXISTS( SELECT *
FROM flight_actuals fa
WHERE fa.FlightID = f.FlightID)
GROUP BY scheduled_date
HAVING COUNT(*) > 1

UNION

SELECT f.scheduled_date,
fa.actual_date,
COUNT(*) AS cnt
FROM flights f
INNER JOIN flight_actuals fa
ON fa.flightID = f.flightID
GROUP BY f.scheduled_date, fa.actual_date
HAVING COUNT(*) > 1

Or is there a simpler way?

Chris

Chris Lim

unread,
Sep 16, 2006, 8:21:17 PM9/16/06
to
Cimode wrote:
> You should not be surprised. SQL is not a relational language for a
> long time. SQL is by nature a redundant language. Additionally, your
> second example is a bad one two. At the first place, there should not

I also wanted to confirm something regarding the number of extra tables
that would need to be created to avoid NULLs. Am I right in saying that
if I have a Customer table which has a lot of optional attributes (that
a customer may or may not have or supply (e.g. middle name, birth date,
gender, number of children, ethnicity, etc)), then I would need a
separate table for each of those attributes (e.g. CustomerMiddleName,
CustomerBirthDate etc) to avoid NULLs? That seems like an awful lot of
overhead to me.

And if I needed a query to identify customers with the same last name +
middle name + birth_date + gender, what would the query look like for
that? If they were NULLable columns in the same table, it would be
something like:

SELECT last_name,
ISNULL(middle_name, '') AS middle_name,
ISNULL(birth_date, '1 Jan 2050') AS birth_date,
ISNULL(gender, '') AS gender,
COUNT(*) AS cnt
FROM Customer
GROUP BY last_name,
ISNULL(middle_name, ''),
ISNULL(birth_date, '1 Jan 2050'),
ISNULL(gender, '')
HAVING COUNT(*) > 1

What would be the equivalent in the no NULLs database?

Chris

mAsterdam

unread,
Sep 17, 2006, 3:06:19 AM9/17/06
to
Cimode wrote:
> If 2 relations have the same attributes then they should be one
> relation.

No. As long as the name of the relation and the meaning of the
propositions differ it is ok to have two relations with the same
attributes. The meaning of the propositions depends on the external
predicate.

Cimode

unread,
Sep 17, 2006, 1:52:53 PM9/17/06
to

I see your point...
Thank your for pointing that out but such definition is valid only for
SQL systems. I should have stated that the proposed definition assumes
an implementation that supports domain derivability (in a word, forget
SQL)

Keep in mind that the answer was specific to the example of FLIGHT with
actual departure time and arrival time. In this specifc example,
having 2 relations with these 2 attributes is a mistake EVEN IF they do
NOT share the same external predicates. For instance, consider 2
type of FLIGHTS, some SCHEDULED and some UNPLANNED. In order to deal
with NULLS properly, you may think of the 2 following entities
FLIGHT_SCHEDULED: departure_time, arrival_time, flight_IATA_number and
FLIGHT_UNPLANNED: departure_time, arrival_time, reason (code possible
reason explaining why the FLIGHT had to leave while not scheduled).
You may also argue that both respond to different external predicates.
Nevertheless, a better design should be

FLIGHT: departure_time, arrival_time
FLIGHT_SCHEDULED: flight_IATA_number
FLIGHT_UNPLANNED: reason

Both FLIGHT_SCHEDULED and FLIGHT_UNPLANNED being derived from FLIGHT
type (and therefore sharing all its attributes) with specific
attributes of their own.

Some systems already support domain derivability(Caché from
Intersystems). It's just a shame, SQL beats the crap out of them on
pretty much everything else...

Regards...

Cimode

unread,
Sep 17, 2006, 2:16:34 PM9/17/06
to

You can not have a table containing NULLS and say it is correctly
designed, therefore dealing with NULLS imposes creating extra tables
(in SQL at least)

Consider 2 types of FLIGHTS some scheduled having a departure and
arrival time.
Some flights are scheduled and have a IATA_Flight_number others are
unplanned.

Consider the two following SQL implementations, one having NULLS and
the other not

1) FLIGHT: flight_departure_time, flight_arrival_time,
flight_IATA_number(allowing NULL)

in this design ALL flight_IATA_number would be NULL for all unplanned
flights

2)
FLIGHT: flight_departure_time, flight_arrival_time
SCHEDULED: flight_departure_time, flight_arrival_time,
flight_IATA_number
UNPLANNED: flight_departure_time, flight_arrival_time


Using first method, to count ONLY scheduled flights require this

select count(*) from FLIGHT where flight_IATA_number is not NULL

while the second method allows

select count(*) from SCHEDULED

Does not that say it all? The simple fact of allowing NULLS imposes to
you to do an index scan and a count while all it should do is a count.

:
> Chris

Cimode

unread,
Sep 17, 2006, 2:24:26 PM9/17/06
to

Chris Lim wrote:
> Cimode wrote:
> > You should not be surprised. SQL is not a relational language for a
> > long time. SQL is by nature a redundant language. Additionally, your
> > second example is a bad one two. At the first place, there should not
>
> I also wanted to confirm something regarding the number of extra tables
> that would need to be created to avoid NULLs. Am I right in saying that
> if I have a Customer table which has a lot of optional attributes (that
> a customer may or may not have or supply (e.g. middle name, birth date,
> gender, number of children, ethnicity, etc)), then I would need a
> separate table for each of those attributes (e.g. CustomerMiddleName,
> CustomerBirthDate etc) to avoid NULLs? That seems like an awful lot of
> overhead to me.
Depends what you call *awful lot of overhead*.
If *awful lot of overhead* is creating 3 extra tables to do it the
right way ONCE at design time then I may call you a lazy ass. If
*awful lot of overhead* is all the redundant code and conditions
(including NULLS, excluding NULLS) you and all the developpers will
have to add to make your queries return correct count results ( during
the entire system life cycle) then you will find out quicly the *awful
lot of overhead* really happens only when NULLS are allowed not the
opposite!

If you wan to find out more about NULLS and their impact, you will need
to do some reading. My time is as limited as yours, therefore I can
not answer all questions...(especially if you don't post your table
definitions) ;)

Bob Badour

unread,
Sep 17, 2006, 2:28:19 PM9/17/06
to
Chris Lim wrote:

> Cimode wrote:
>
>>You should not be surprised. SQL is not a relational language for a
>>long time. SQL is by nature a redundant language. Additionally, your
>>second example is a bad one two. At the first place, there should not
>
> I also wanted to confirm something regarding the number of extra tables
> that would need to be created to avoid NULLs. Am I right in saying that
> if I have a Customer table which has a lot of optional attributes (that
> a customer may or may not have or supply (e.g. middle name, birth date,
> gender, number of children, ethnicity, etc)), then I would need a
> separate table for each of those attributes (e.g. CustomerMiddleName,
> CustomerBirthDate etc) to avoid NULLs? That seems like an awful lot of
> overhead to me.

Overhead in what sense? It just means you have a few extra names.
Temporal data caused Date et al to propose a 6th normal form, and I
think their arguments for independently varying attributes apply to
optional data as well.


> And if I needed a query to identify customers with the same last name +
> middle name + birth_date + gender, what would the query look like for
> that?

That would depend entirely on the language syntax.


If they were NULLable columns in the same table, it would be
> something like:
>
> SELECT last_name,
> ISNULL(middle_name, '') AS middle_name,
> ISNULL(birth_date, '1 Jan 2050') AS birth_date,
> ISNULL(gender, '') AS gender,
> COUNT(*) AS cnt
> FROM Customer
> GROUP BY last_name,
> ISNULL(middle_name, ''),
> ISNULL(birth_date, '1 Jan 2050'),
> ISNULL(gender, '')
> HAVING COUNT(*) > 1
>
> What would be the equivalent in the no NULLs database?

How about the following?

WITH INTEGRATE Customer(
cust#
, last_name
, middle_name
, birth_date
, gender
) OPTIONAL (middle_name, birth_date, gender)
AS T1
, SUMMARIZE T1 BY {ALL BUT cust#} ADD COUNT(cust#) AS cnt
AS T2
, T2 WHERE cnt > 1;

In the above, I introduce an INTEGRATE operator as a short-cut to join a
set of 6th normal form relations and an OPTIONAL operator as a short-cut
to GROUP potentially missing attributes as relation valued attributes
with cardinality at most 1.

I welcome any suggestions for better names or for pointers to where Date
et al have specified different short-cuts for handling 6NF relations.

mAsterdam

unread,
Sep 17, 2006, 7:08:21 PM9/17/06
to
Cimode wrote:
> mAsterdam wrote:
>> Cimode wrote:
>>> If 2 relations have the same attributes then they should be one
>>> relation.
>> No. As long as the name of the relation and the meaning of the
>> propositions differ it is ok to have two relations with the same
>> attributes. The meaning of the propositions depends on the external
>> predicate.
>
> I see your point...
> Thank your for pointing that out but such definition is valid only for
> SQL systems. I should have stated that the proposed definition assumes
> an implementation that supports domain derivability (in a word, forget
> SQL)

SQL has nothing to do with it.

Is it ok to have two different relations with the same attributes?

No, you say, and you build on that.
Yes, I say, so IMO the rest of your building is irrelevant.

[snip]

JOG

unread,
Sep 17, 2006, 7:33:02 PM9/17/06
to

I concur only with mAsterdam Cimode. According to relational theory two
relations may quite happily consist of the same attributes, while being
totally distinct relations - and that this has nothing at all to do
with SQL.

Volker Hetzer

unread,
Sep 18, 2006, 11:21:39 AM9/18/06
to
Chris Lim schrieb:
Besides, wasn't there a rule that the result of a relational operation
should be a table too?
Therefore an "insert into xxx select ... from ..." should always work.
Voila, there are the nulls in xxx.
What I don't understand is that everyone who decries nulls, somewhere
works with default values. Ok, big deal, NULL is an universally applicable
default value that cannot be confused with real data.
What am I missing?
Btw, what would be the consequences of NULL=NULL being true?

Lots of Greetings!
Volker
--
For email replies, please substitute the obvious.

Gene Wirchenko

unread,
Sep 18, 2006, 12:15:28 PM9/18/06
to
Volker Hetzer <firstname...@ieee.org> wrote:

[snip]

>What I don't understand is that everyone who decries nulls, somewhere
>works with default values. Ok, big deal, NULL is an universally applicable
>default value that cannot be confused with real data.
>What am I missing?

That it can be confused with real data. For one thing, NULL is
*NOT* a value. It is the absence of a value.

>Btw, what would be the consequences of NULL=NULL being true?

The value is unknown. What are the chances of two unknown pieces
of data being equal?

Sincerely,

Gene Wirchenko
Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Volker Hetzer

unread,
Sep 18, 2006, 3:16:55 PM9/18/06
to
Gene Wirchenko schrieb:

> Volker Hetzer <firstname...@ieee.org> wrote:
>
> [snip]
Pity.

>> What I don't understand is that everyone who decries nulls, somewhere
>> works with default values. Ok, big deal, NULL is an universally applicable
>> default value that cannot be confused with real data.
>> What am I missing?
>
> That it can be confused with real data.

How?
Every interface worth its name has either a special flag or allows the mapping
to a certain value upon reading it out of the database. And if not, then, at
least in oracle, I can create a simple boolean (C convention) flag in the select
list: select expr, nvl2(expr,0,1) Expr_Is_Null ...

> For one thing, NULL is
> *NOT* a value. It is the absence of a value.

Conceptually, so is any other default value. Only, with NULL, the database agrees.
With other default values I have to code it all in the sql.

Roy Hann

unread,
Sep 18, 2006, 3:32:35 PM9/18/06
to
"Gene Wirchenko" <ge...@ocis.net> wrote in message
news:h3htg2hlprsaaefku...@4ax.com...

>>Btw, what would be the consequences of NULL=NULL being true?
>
> The value is unknown. What are the chances of two unknown pieces
> of data being equal?

It's worse than that. Suppose one user can't be bothered to find out if the
customer is insolvent and another user decides the null means insolvency or
otherwise isn't a relevant attribute.

Roy


Gene Wirchenko

unread,
Sep 18, 2006, 3:38:00 PM9/18/06
to
Volker Hetzer <firstname...@ieee.org> wrote:

>Gene Wirchenko schrieb:
>> Volker Hetzer <firstname...@ieee.org> wrote:
>>
>> [snip]
>Pity.
>
>>> What I don't understand is that everyone who decries nulls, somewhere
>>> works with default values. Ok, big deal, NULL is an universally applicable
>>> default value that cannot be confused with real data.
>>> What am I missing?
>>
>> That it can be confused with real data.
>How?
>Every interface worth its name has either a special flag or allows the mapping
>to a certain value upon reading it out of the database. And if not, then, at
>least in oracle, I can create a simple boolean (C convention) flag in the select
>list: select expr, nvl2(expr,0,1) Expr_Is_Null ...

This means that there is no just getting the value. Now, you
have to check everything for NULL. Quite the mess.

>> For one thing, NULL is
>> *NOT* a value. It is the absence of a value.
>Conceptually, so is any other default value. Only, with NULL, the database agrees.

Nonsense. NULL is not a value. A default value is a value.

>With other default values I have to code it all in the sql.

Yes, you set the default.

David Cressey

unread,
Sep 18, 2006, 4:22:55 PM9/18/06
to

"Volker Hetzer" <firstname...@ieee.org> wrote in message
news:eemdi3$1t3$1...@nntp.fujitsu-siemens.com...

> What am I missing?


> Btw, what would be the consequences of NULL=NULL being true?

NULL=NULL should not be true.
NULL=NULL should not be false.
NULL=NULL should not be UNKNOWN.
NULL=NULL should be NULL.


Volker Hetzer

unread,
Sep 18, 2006, 4:54:10 PM9/18/06
to
Gene Wirchenko schrieb:
> Volker Hetzer <firstname...@ieee.org> wrote:
>
>> Gene Wirchenko schrieb:
>>> Volker Hetzer <firstname...@ieee.org> wrote:
>>>
>>> [snip]
>> Pity.
>>
>>>> What I don't understand is that everyone who decries nulls, somewhere
>>>> works with default values. Ok, big deal, NULL is an universally applicable
>>>> default value that cannot be confused with real data.
>>>> What am I missing?
>>> That it can be confused with real data.
>> How?
>> Every interface worth its name has either a special flag or allows the mapping
>> to a certain value upon reading it out of the database. And if not, then, at
>> least in oracle, I can create a simple boolean (C convention) flag in the select
>> list: select expr, nvl2(expr,0,1) Expr_Is_Null ...
>
> This means that there is no just getting the value.
There is no just getting the value with default values either, because
a) I have to differentiate between value and non-value anyway, with default
values I just mix up application defaults with database values. Btw, the example I gave
was the most complex one for when default values don't work for the application.
Normally, if I'm concerned about NULLs I select nvl(expr,<DEFAULT>)... for the
client app and select expr for database internal work (like insert into ... select ...)
b) as soon as I've got an outer join, I've got NULLs anyway. And even if NULL would
be replaced by a bit of SQL syntax, every application, table, view, coder, company
could have its own default value, which might even change over time. Which is hard
to maintain and to code.

> Now, you
> have to check everything for NULL. Quite the mess.

Otherwise I'd have to check for the default value.
Worse, the default value may be different depending on what application
accesses the data (0, -1, INT_MIN, INT_MAX, ''), so it IMHO makes sense to store
an universally valid default. Which NULL happens to be.

>>> For one thing, NULL is
>>> *NOT* a value. It is the absence of a value.
>> Conceptually, so is any other default value. Only, with NULL, the database agrees.
>

That's why I wrote "conceptually". Both express a "not filled in", albeit one with
database support and one without it.

>
>> With other default values I have to code it all in the sql.
>
> Yes, you set the default.

But with NULL I don't. Or, only at the end of everything, when retrieving
final results and client specific.

By the way, before I exit this thread (I'm not that proficient in theory),
has anything changed between Date/Codd's original arguments and today?

Chris Lim

unread,
Sep 18, 2006, 4:58:30 PM9/18/06
to
Roy Hann wrote:
> It's worse than that. Suppose one user can't be bothered to find out if the
> customer is insolvent and another user decides the null means insolvency or
> otherwise isn't a relevant attribute.

I don't see the problem with NULLs. Yes you have to be careful when
using nullable columns in queries, but it's not the mess that you guys
make out (at least not in practical terms). It's certainly easier (when
writing queries) to deal with NULLs than to handle separate tables for
every optional attribute, which is the proposed solution (considering
how many NULLs there are in a 'typical' database, the number of extra
tables is enormous).

This is one argument where the theory just does not seem to have a
case, at least until SQL supports a better way of combining all the
optional attributes of an entity together to make querying easier. I
mean, with the whole surrogate key vs natural key debate I can see both
sides to the argument, and even though I wouldn't do it personally, I
can see how a database with purely natural keys could work. But a
database without NULLs? It might be theorectically correct, but it
would be a nightmare to write queries against.

Chris

Cimode

unread,
Sep 18, 2006, 5:11:29 PM9/18/06
to
What does IMO stand for?
> [snip]

mAsterdam

unread,
Sep 18, 2006, 6:43:06 PM9/18/06
to
Cimode wrote:

> What does IMO stand for?

In my opinion. See, for instance,
http://en.wikipedia.org/wiki/Internet_slang

Frank Hamersley

unread,
Sep 18, 2006, 8:47:29 PM9/18/06
to
Gene Wirchenko wrote:
[..]

> The value is unknown. What are the chances of two unknown pieces
> of data being equal?

Que! Exactly the same chance as 2 known pieces of data being equal!

Cheers, Frank.

-CELKO-

unread,
Sep 18, 2006, 10:14:59 PM9/18/06
to
>>, what would be the consequences of NULL=NULL being true?

NULL=NULL should not be true.
NULL=NULL should not be false.
NULL=NULL should not be UNKNOWN.
NULL=NULL should be NULL. <<

NULL is a missing *attribute* value; UNKNOWN is a *logical* value. The
first rule of NULLs is that they propagate. You can easily set up
contradictions that depend on the order evaluation when you have a
BOOLEAN data type. All SQLK data types must allow NULLs by definition.


NULL OR TRUE = NULL -- by definition
UNKNOWN OR TRUE = TRUE -- by definition

NULL AND TRUE = NULL -- by definition
UNKNOWN AND TRUE = UNKNOWN -- by definition

This is why we have the <exp> IS [NOT] [TRUE | FALSE |UNKNOWN]
predicate in SQL-92

Bob Badour

unread,
Sep 19, 2006, 12:16:46 AM9/19/06
to
Volker Hetzer wrote:

> Gene Wirchenko schrieb:
>
>> Volker Hetzer <firstname...@ieee.org> wrote:
>>
>>> Gene Wirchenko schrieb:
>>>
>>>> Volker Hetzer <firstname...@ieee.org> wrote:

Since default values are not necessary, this whole thread of discussion
is an empty waste of time.

Chris Lim

unread,
Sep 19, 2006, 12:22:45 AM9/19/06
to
Bob Badour wrote:
> Since default values are not necessary, this whole thread of discussion
> is an empty waste of time.

But it's not a NULL waste of time, is it? ;-)

Bob Badour

unread,
Sep 19, 2006, 12:23:49 AM9/19/06
to
Chris Lim wrote:

> Roy Hann wrote:
>
>>It's worse than that. Suppose one user can't be bothered to find out if the
>>customer is insolvent and another user decides the null means insolvency or
>>otherwise isn't a relevant attribute.
>
>
> I don't see the problem with NULLs. Yes you have to be careful when
> using nullable columns in queries, but it's not the mess that you guys
> make out (at least not in practical terms).

Then why have I had to spend so much time in my career explaining to
reasonably intelligent people why their queries returned the wrong answer?


It's certainly easier (when
> writing queries) to deal with NULLs than to handle separate tables for
> every optional attribute,

I must insist you back up that statement quantitatively and
qualitatively. It is far easier to deal with two names than with
surprisingly inconsistent semantics for the same reason it is far easier
to deal with a compile-time error than a run-time error.


which is the proposed solution (considering
> how many NULLs there are in a 'typical' database, the number of extra
> tables is enormous).

So? All a table is is a variable name. Do you likewise suggest
programmers use the same name for all variables in their programs?


> This is one argument where the theory just does not seem to have a
> case, at least until SQL supports a better way of combining all the
> optional attributes of an entity together to make querying easier.

SQL never will, which is ultimately why SQL is irrelevant.


I
> mean, with the whole surrogate key vs natural key debate I can see both
> sides to the argument, and even though I wouldn't do it personally, I
> can see how a database with purely natural keys could work.

If you can see two sides to the same thing, you don't have a clue what
you are talking about. A natural key is nothing more or less than a
familiar surrogate.


But a
> database without NULLs? It might be theorectically correct, but it
> would be a nightmare to write queries against.

I disagree. My personal experience dealing with scores of intelligent
database users suggests that NULL is the nightmare.

Chris Lim

unread,
Sep 19, 2006, 12:41:07 AM9/19/06
to
Bob Badour wrote:
> SQL never will, which is ultimately why SQL is irrelevant.

Then I guess your argument does not apply to databases using SQL (even
though this is comp.databases.theory, I mistakenly thought you were
talking about the real world, i.e. SQL).

I don't know why you have dealt with so many people who have trouble
with NULLs - to me this is a beginner's problem that is dealt with very
easily.

Chris

Marshall

unread,
Sep 19, 2006, 12:55:43 AM9/19/06
to
Bob Badour wrote:
>
> So? All a table is is a variable name. Do you likewise suggest
> programmers use the same name for all variables in their programs?

I run in to variations on this weird meme in a variety of contexts.
"We have to minimize the number of <X>" for some value of
X that costs nothing.

In comp.lang.java.programmer, it was not uncommon to have
someone describe a problem that would trivially be solved
by adding a new class, but they would reject that solution because
"too many classes."

Perhaps the (often misdirected) habit of conserving everything
in the natural world makes them feel like they have to do the
same in the world of abstractions. Whatever the cause, it's
annoying.


Marshall

Marshall

unread,
Sep 19, 2006, 12:56:22 AM9/19/06
to

How long have you been working with SQL? Just curious.


Marshall

Bob Badour

unread,
Sep 19, 2006, 12:58:17 AM9/19/06
to
Chris Lim wrote:

At this point in time, any discussion of NULL is a NULL waste of time as
opposed to for example a statutory waste of time.

Chris Lim

unread,
Sep 19, 2006, 1:04:15 AM9/19/06
to
Marshall wrote:
> In comp.lang.java.programmer, it was not uncommon to have
> someone describe a problem that would trivially be solved
> by adding a new class, but they would reject that solution because
> "too many classes."

The key difference here is 'trivially'. Adding lots (and we are talking
about lots when you consider the number of nullable columns) of tables
makes it much hard to query that data (especially if you also disallow
outer joins!). And if you are going to prove me wrong with an example,
at least use a table with lots of nullable columns instead of just one
or two.

Chris

Bob Badour

unread,
Sep 19, 2006, 1:05:51 AM9/19/06
to
Chris Lim wrote:

I can only conclude that you are either a neophyte or a dilettante. The
problems with NULL are legion and very well documented elsewhere.

NULL confuses reasonably intelligent people because it breaks important
identities those people have been conditioned to expect such as:

SUM(A) + SUM(B) = SUM(A+B)

select sum(a) from atable;
select sum(b) from atable;
select sum(a+b) from atable;

In the presense of NULL, those queries violate the identity above.

Chris Lim

unread,
Sep 19, 2006, 1:06:09 AM9/19/06
to
Marshall wrote:
> How long have you been working with SQL? Just curious.

12 years, but I learnt about NULLs and how to deal with them at
university.

Chris

Chris Lim

unread,
Sep 19, 2006, 1:16:56 AM9/19/06
to
Bob Badour wrote:
> NULL confuses reasonably intelligent people because it breaks important
> identities those people have been conditioned to expect such as:

I don't disagree with that. But show me a way to structure the database
without NULLs that does not make it overly complicated to write queries
for (in SQL). Otherwise I will take the lesser of two evils (i.e the
NULLs).

Chris

Chris Lim

unread,
Sep 19, 2006, 1:52:15 AM9/19/06
to
Bob Badour wrote:
> NULL confuses reasonably intelligent people because it breaks important
> identities those people have been conditioned to expect such as:

Just to confirm that I understand the method to avoid NULLs, let's take
this example:

CREATE TABLE Customers(
CustomerID INT NOT NULL,
FirstName VARCHAR(30) NOT NULL,
LastName VARCHAR(30) NOT NULL,
StatusCode CHAR(1) NOT NULL,
MiddleName VARCHAR(30) NULL,
BirthDate DATETIME NULL,
OccupationCode CHAR(5) NULL,
EthnicityCode CHAR(5) NULL,
ReligionCode CHAR(5) NULL

PRIMARY KEY (CustomerID)
);


SELECT CustomerID,
FirstName,
LastName,
MiddleName,
BirthDate,
OccupationCode,
EthnicityCode,
ReligionCode
FROM Customers;

Okay, we do not have any of the problems associated with NULLs here,
but my point is not the problems with NULLs (as you say they are
well-documented), but the overhead inccurred when avoiding NULLs. Am I
correct that this would be the way to avoid NULLs?

CREATE TABLE Customers(
CustomerID INT NOT NULL,
FirstName VARCHAR(30) NOT NULL,
LastName VARCHAR(30) NOT NULL,
StatusCode CHAR(1) NOT NULL,

PRIMARY KEY (CustomerID)
);

CREATE TABLE CustomerMiddleName(
CustomerID INT NOT NULL,
MiddleName VARCHAR(30) NOT NULL,

PRIMARY KEY (CustomerID)
);


CREATE TABLE CustomerBirthDate(
CustomerID INT NOT NULL,
BirthDate DATETIME NOT NULL,

PRIMARY KEY (CustomerID)
);

CREATE TABLE CustomerOccupation(
CustomerID INT NOT NULL,
OccupationCode CHAR(5) NOT NULL,

PRIMARY KEY (CustomerID)
);

CREATE TABLE CustomerEthnicity(
CustomerID INT NOT NULL,
EthnicityCode CHAR(5) NOT NULL,

PRIMARY KEY (CustomerID)
);

CREATE TABLE CustomerReligion(
CustomerID INT NOT NULL,
ReligionCode CHAR(5) NOT NULL

PRIMARY KEY (CustomerID)
);

SELECT CustomerID,
FirstName,
LastName,
MiddleName,
BirthDate = (SELECT BirthDate
FROM CustomerBirthDate c
WHERE c.CustomerID = Customers.CustomerID),
OccupationCode = (SELECT
OccupationCode
FROM CustomerOccupation c
WHERE c.CustomerID =
Customers.CustomerID),
EthnicityCode = (SELECT EthnicityCode
FROM CustomerEthnicity c
WHERE c.CustomerID =
Customers.CustomerID),
ReligionCode = (SELECT ReligionCode
FROM CustomerReligion c
WHERE c.CustomerID =
Customers.CustomerID)
FROM Customers;

If there is an easier way in SQL to do this (without using outer joins,
which I understand is also a no-no), let me know. Otherwise here we
have a trivial table and query which has a reasonable amount of
overhead added to it by avoiding NULLs. Imagine a more complicated
query.

Chris

Roy Hann

unread,
Sep 19, 2006, 2:59:37 AM9/19/06
to
"Chris Lim" <black...@hotmail.com> wrote in message
news:1158642255....@h48g2000cwc.googlegroups.com...

You are incorrectly assuming that each nullable column gives rise to a
separate table. Based on the three or four dozen databases that I work
with regularly that wouldn't usually happen. In most cases a table will
have 5, 10, maybe more nullable columns because it conflates two or three
different entity types. Thus just one or two additional tables are often
sufficient.

But if more were required, so what? You've got to write the code to sort it
out somewhere, so why not in the query?

Roy


Jonathan Leffler

unread,
Sep 19, 2006, 3:03:07 AM9/19/06
to
-CELKO- wrote:
>>> , what would be the consequences of NULL=NULL being true?
>
> NULL=NULL should not be true.
> NULL=NULL should not be false.
> NULL=NULL should not be UNKNOWN.
> NULL=NULL should be NULL. <<
>
> NULL is a missing *attribute* value; UNKNOWN is a *logical* value. The
> first rule of NULLs is that they propagate. You can easily set up
> contradictions that depend on the order evaluation when you have a
> BOOLEAN data type. All SQLK data types must allow NULLs by definition.
>
>
> NULL OR TRUE = NULL -- by definition

By which definition? I thought that was TRUE by definition?

OR T ? F
T T T T
? T ? ?
F T ? F

(Recommended to use constant-width font for viewing the table.)

> UNKNOWN OR TRUE = TRUE -- by definition
>
> NULL AND TRUE = NULL -- by definition
> UNKNOWN AND TRUE = UNKNOWN -- by definition
>
> This is why we have the <exp> IS [NOT] [TRUE | FALSE |UNKNOWN]
> predicate in SQL-92


--
Jonathan Leffler #include <disclaimer.h>
Email: jlef...@earthlink.net, jlef...@us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/

Chris Lim

unread,
Sep 19, 2006, 3:38:02 AM9/19/06
to
Roy Hann wrote:
> You are incorrectly assuming that each nullable column gives rise to a
> separate table. Based on the three or four dozen databases that I work
> with regularly that wouldn't usually happen. In most cases a table will
> have 5, 10, maybe more nullable columns because it conflates two or three
> different entity types. Thus just one or two additional tables are often
> sufficient.

I do tend to use this approach for sub-typing. e.g. In my Customers
example, I would split out attributes about a business customer (a
company) into a separate table, attributes about a personal customer (a
person) into a separate table, and leave the generic attributes in the
Customers table.

However in one of the databases I currently work on, there are dozens
of attributes captured (particularly for personal customers) for
marketing purposes. These are all optional as a customer does not have
to supply them. To avoid NULLs, I would end up with dozens of tables,
complicating the extremely large number of queries that reference them.

There is also a duplicate-checking process which identifies potential
duplicates between customers, using several of the optional attributes
(e.g. First Name, Middle Name, Last Name, Birth Date, as a unique
identifier is not always available). The current queries are
complicated enough (and NULLs are easily handled with ISNULLs).
Increasing the number of tables just to avoid NULLs would be a
nightmare in this scenario (unless outer joins were permitted, but then
you'd end up dealing with NULLs anyway).

> But if more were required, so what? You've got to write the code to sort it
> out somewhere, so why not in the query?

I would weigh up the advantages of avoiding NULLs with the extra
overhead incurred by creating the extra tables (in terms of complexity
in querying the data). In some cases I do create extra tables, but I
just find the thought of taking this approach to the extreme to avoid
all NULLs to be a bit, well, extreme.

Chris

Roy Hann

unread,
Sep 19, 2006, 4:32:56 AM9/19/06
to
"Chris Lim" <black...@hotmail.com> wrote in message
news:1158651482.2...@i42g2000cwa.googlegroups.com...

> However in one of the databases I currently work on, there are dozens
> of attributes captured (particularly for personal customers) for
> marketing purposes. These are all optional as a customer does not have
> to supply them. To avoid NULLs, I would end up with dozens of tables,
> complicating the extremely large number of queries that reference them.

I am afraid I don't see the benefit of using NULL to "record" the certain
knowledge that someone declined to supply a piece of information. However,
if this were the only way to use NULL I wouldn't bother objecting.
Unfortunately there is a very different way to use NULL and it is only your
self-discipline (and mine) that keeps us out of trouble. So as far as I can
see nullability is a barely useful feature that can also very easily cause
serious problems in return. Too many of my predecessors had no
self-discipline whatever and created a legacy of complication and
uncertainty that I have to live with every day.

> There is also a duplicate-checking process which identifies potential
> duplicates between customers, using several of the optional attributes
> (e.g. First Name, Middle Name, Last Name, Birth Date, as a unique
> identifier is not always available).

Having worked on both medical and police intelligence applications I know
very well how hard the problem of identification is. But there are
solutions to the problem based on a subtle understanding of what the
business problem really is. NULLs have no part in any of the robust
solutions.

> The current queries are
> complicated enough (and NULLs are easily handled with ISNULLs).

No doubt--because you are doing at run-time what could have been done at
design time. How is your life made simpler by ISNULL?

> Increasing the number of tables just to avoid NULLs would be a
> nightmare in this scenario (unless outer joins were permitted, but then
> you'd end up dealing with NULLs anyway).

I will let that pass, although I have some opinions on it.

>> But if more were required, so what? You've got to write the code to sort
>> it
>> out somewhere, so why not in the query?
>
> I would weigh up the advantages of avoiding NULLs with the extra
> overhead incurred by creating the extra tables (in terms of complexity
> in querying the data).

The compexity doesn't go away. Any additional tables merely reveal the
complexity that is otherwise hidden, lying in wait but still very much
there. If you don't resolve it with a complex query you have to resolve it
with complex code somewhere else. As horrid as it is, SQL is still almost
always more concise than whatever ghastly 3GL language we might use
otherwise.

> In some cases I do create extra tables, but I
> just find the thought of taking this approach to the extreme to avoid
> all NULLs to be a bit, well, extreme.

I think perhaps we are not as far apart as it seems. For instance we agree
that entity subtyping makes a large number of spuriously nullable attributes
unnecessary. Many remaining nullable attributes probably don't need to be
nullable even given the limitations of SQL data types; they are used freely
only because many people see no harm in them.

I won't go so far as to insist you don't ever need to introduce nullable
attributes into an SQL database, but only because I couldn't defend that
position except to say that I don't remember the last time I had absolutely
no choice but to do so.

Roy


Chris Lim

unread,
Sep 19, 2006, 5:00:58 AM9/19/06
to
Roy Hann wrote:
> No doubt--because you are doing at run-time what could have been done at
> design time. How is your life made simpler by ISNULL?

For example, in the duplicate-checking process I mentioned earlier,
there might be some code such as:

SELECT FirstName,
LastName,
ISNULL(MiddleName, '') AS MiddleName,
COUNT(*) AS Cnt
FROM Customers
GROUP BY
FirstName,
LastName,
ISNULL(MiddleName, '')
HAVING
COUNT(*) > 1

This single query covers both customers with and without a MiddleName.
If I had a separate CustomerMiddleName table, I would need to do a
UNION to achieve the same thing (or a LEFT OUTER JOIN, but apparently
these are bad too - what is your opinion on this?):

SELECT C.FirstName,
C.LastName,
'' AS MiddleName,
COUNT(*) AS Cnt
FROM Customers C
WHERE
NOT EXISTS(SELECT *
FROM CustomerMiddleName CM
WHERE CM.CustomerID = C.CustomerID)
GROUP BY
C.FirstName,
C.LastName

UNION

SELECT C.FirstName,
C.LastName,
CM.MiddleName,
COUNT(*) AS Cnt
FROM Customers C
INNER JOIN CustomerMiddleName CM
ON CM.CustomerID = C.CustomerID
GROUP BY
FirstName,
LastName,
MiddleName
HAVING
COUNT(*) > 1


> The compexity doesn't go away. Any additional tables merely reveal the
> complexity that is otherwise hidden, lying in wait but still very much
> there. If you don't resolve it with a complex query you have to resolve it
> with complex code somewhere else. As horrid as it is, SQL is still almost

Isn't it better to hide complexity then, assuming that you end up with
the correct result? Like you say, self-discipline keeps us out of
trouble.

Chris

Chris Lim

unread,
Sep 19, 2006, 5:28:41 AM9/19/06
to
Chris Lim wrote:
> SELECT FirstName,
> LastName,
> ISNULL(MiddleName, '') AS MiddleName,

Bad example of ISNULL since it's not required in this case!

A simpler example:

SELECT CustomerID,
FirstName
+ ISNULL(' ' + MiddleName, '')
+ ' ' + LastName AS FullName
FROM Customers

If you don't have a problem with outer joins then it's not too bad, but
you still have to deal with NULLs:

SELECT C.CustomerID,
C.FirstName
+ ISNULL(' ' + CM.MiddleName, '')
+ ' ' + C.LastName AS FullName
FROM Customers C
LEFT OUTER JOIN CustomerMiddleName CM
ON CM.CustomerID = C.CustomerID

Otherwise it requires a UNION:

SELECT C.CustomerID,
C.FirstName
+ ' ' + C.LastName AS FullName


FROM Customers C
WHERE NOT EXISTS(
SELECT *
FROM CustomerMiddleName CM
WHERE CM.CustomerID = C.CustomerID
)

UNION

SELECT C.CustomerID,
C.FirstName
+ ' ' + CM.MiddleName
+ ' ' + C.LastName AS FullName


FROM Customers C
INNER JOIN CustomerMiddleName CM
ON CM.CustomerID = C.CustomerID

Chris

Roy Hann

unread,
Sep 19, 2006, 6:05:08 AM9/19/06
to
"Chris Lim" <black...@hotmail.com> wrote in message
news:1158656458.8...@m73g2000cwd.googlegroups.com...

>> The compexity doesn't go away. Any additional tables merely reveal the
>> complexity that is otherwise hidden, lying in wait but still very much
>> there. If you don't resolve it with a complex query you have to resolve
>> it
>> with complex code somewhere else. As horrid as it is, SQL is still
>> almost
>
> Isn't it better to hide complexity then, assuming that you end up with
> the correct result?

I have no problem hiding irrelevant complexity. But it has to be
understood, and hidden in an appropriate way so that your assumption that
the result is correct will always be safe.

In an ideal DBMS much complexity could be safely hidden inside suitable
datatypes. Of course many/most SQL implementations have very limited
support for complex types. But even so, we do seem to have an urge to
introduce spurious complexity even when SQL doesn't impose it on use. For
example, I don't think I've ever seen a database where the internal fine
structure of a postal address wasn't layed out in all it's irrelevant glory.
I can see why in many businesses you'd want to know the postal code, but the
rest of it is usually irrelevant to anyone but the posty walking the route.
(And just to be clear, I am sure that there are obscure applications for
which one does need to model the fine structure of the postal address,
perhaps down to the level of the forward sortation area and local delivery
unit. My point is that much complexity I see is not really dictated by the
enterprise of interest at all but is either thoughtlessly introduced, or
introduced "just in case" by rogue designers. Which is ironic since so many
would probably then also claim to want to minimize the number of tables "to
reduce complexity".)

Roy

PS: This is a particularly sore point with me because my address is strange.
As a result I regularly have trouble using Internet shopping sites, the GPS
in my car can't be told where I live, and I recently had a credit card
application declined because they couldn't match my address to some register
or other. In all cases it was because a bunch of database designers got
right up themselves modelling irrelevant complexities thus making everyone's
life much harder.

I'm over it now.

Really.


peter koch

unread,
Sep 19, 2006, 7:19:12 AM9/19/06
to

Bob Badour wrote:
> Chris Lim wrote:
>
> > Roy Hann wrote:
> >

[snip]

>
> Then why have I had to spend so much time in my career explaining to
> reasonably intelligent people why their queries returned the wrong answer?
>

[snip]

>
> I must insist you back up that statement quantitatively and
> qualitatively. It is far easier to deal with two names than with
> surprisingly inconsistent semantics for the same reason it is far easier
> to deal with a compile-time error than a run-time error.
>

[snip]


> But a
> > database without NULLs? It might be theorectically correct, but it
> > would be a nightmare to write queries against.
>
> I disagree. My personal experience dealing with scores of intelligent
> database users suggests that NULL is the nightmare.

All these arguments against NULL are only valid against some specific
implementations of that concept - here SQL in its various dialects.
They are not arguments against the concept of having null-values (e.g.
to represent unknown values) in some database system. And this is
comp.databases.theory after all.

/Peter

Bob Badour

unread,
Sep 19, 2006, 8:16:14 AM9/19/06
to
peter koch wrote:

I have found the arguments against 3VL and NVL for N > 2 compelling. For
example, if DEE and DUM are canonical relations for true and false, what
are the canonical relations for the other logical values?

We currently have no theory that I am aware of for dealing with missing
information. At this time, I cannot even imagine what such a theory
would look like because science and the scientific method have such
strong roots in empiricism. Hopefully someone much smarter than me will
come up with a sound one, though.

The whole point of NULL is to have something other than a value to
represent missing information. To use a value to represent unknown, what
we really need are extensible data type systems that as much as possible
enforce consistency. For example, if SUM is an aggregate of +, an
extensible data type system should enforce consistent semantics +
semantics for SUM.

peter koch

unread,
Sep 19, 2006, 9:28:54 AM9/19/06
to

Other logical values? I believe I miss your point here. If we can agree
that there is a basic boolean type which can take the values true and
false (this is all we need in order to create other, more complex
types), there could be a nullable type named e.g. nullable<boolean>
consisting of a pair of booleans.
If you prefer to see this as tables/relations this is alright with me -
but why on earth the names DEE resp. DUM - and how do you describe
anything at all if the logical types are not primitive?

>
> We currently have no theory that I am aware of for dealing with missing
> information. At this time, I cannot even imagine what such a theory
> would look like because science and the scientific method have such
> strong roots in empiricism. Hopefully someone much smarter than me will
> come up with a sound one, though.

That I do not understand either. Somewhere else in this thread you gave
an example where missing information was involved. Do you mean to say
that that example was meaningless?

>
> The whole point of NULL is to have something other than a value to
> represent missing information. To use a value to represent unknown, what
> we really need are extensible data type systems that as much as possible
> enforce consistency. For example, if SUM is an aggregate of +, an
> extensible data type system should enforce consistent semantics +
> semantics for SUM.

I would say that "+" was defined with respect to a type (or a set of
types) and that SUM (in its least generic form) must be defined in
terms of +. Of course since for a given type T, nullable<T> extends T
and depending on both nullable and T the meaning of SUM could be given
different interpretations (as could SUM also for a given T)

/Peter

Bob Badour

unread,
Sep 19, 2006, 10:01:10 AM9/19/06
to
peter koch wrote:

What is NULL if not the basis for 3 valued logic or 4 valued logic?


If we can agree
> that there is a basic boolean type which can take the values true and
> false (this is all we need in order to create other, more complex
> types), there could be a nullable type named e.g. nullable<boolean>
> consisting of a pair of booleans.

Can we agree that the algebra of nullable<boolean> is not boolean
algebra and is not 2-valued logic? Similarly, we can create a string
type or an integer type whose algebras are neither boolean algebra nor
2-valued logic.


> If you prefer to see this as tables/relations this is alright with me -
> but why on earth the names DEE resp. DUM - and how do you describe
> anything at all if the logical types are not primitive?

Relations are primitive in the relational model. DEE and DUM are the
values of a canonical relation type.


>>We currently have no theory that I am aware of for dealing with missing
>>information. At this time, I cannot even imagine what such a theory
>>would look like because science and the scientific method have such
>>strong roots in empiricism. Hopefully someone much smarter than me will
>>come up with a sound one, though.
>
> That I do not understand either. Somewhere else in this thread you gave
> an example where missing information was involved. Do you mean to say
> that that example was meaningless?

No, not at all. My example demonstrated a succinct syntax for explicit
expression of how to treat missing information in a specific context.
The example was basically an explicit override of the default closed
world assumption. The example used neither defaults nor markers.


>>The whole point of NULL is to have something other than a value to
>>represent missing information. To use a value to represent unknown, what
>>we really need are extensible data type systems that as much as possible
>>enforce consistency. For example, if SUM is an aggregate of +, an
>>extensible data type system should enforce consistent semantics +
>>semantics for SUM.
>
> I would say that "+" was defined with respect to a type (or a set of
> types) and that SUM (in its least generic form) must be defined in
> terms of +.

How does saying "one defines it with respect to a type" change what I
said about using a data type system? It sounds to me like you agree
vehemently.


Of course since for a given type T, nullable<T> extends T

I don't necessarily agree. Whether one creates a union type as a proper
supertype of T or a disjoint type is a matter of design. In both cases,
one extends the type system but only the former extends T.


> and depending on both nullable and T the meaning of SUM could be given
> different interpretations (as could SUM also for a given T)

If one expresses the semantics of SUM as an aggregate over + with
identity element 0, the type system will enforce consistency. Of course,
since SUM and + are merely symbols, one could express entirely different
semantics (as questionable as doing so might be.)

Bob Badour

unread,
Sep 19, 2006, 10:39:17 AM9/19/06
to
Roy Hann wrote:

I had a similar problem recently. After I filed my correct address with
the IRS, they discarded half of it because it didn't fit their idea of
what an address should look like. As a result, I received some very
important correspondence from them about nine months after the fact. (I
am dumbfounded I ever received it at all.)

They didn't seem to get the fact that from my perspective the address
they have on file for me is the address I filed with them on physical
documents and not the partial address they recorded in their database.

Bob Badour

unread,
Sep 19, 2006, 10:47:05 AM9/19/06
to
Chris Lim wrote:

> Roy Hann wrote:
>
>>You are incorrectly assuming that each nullable column gives rise to a
>>separate table. Based on the three or four dozen databases that I work
>>with regularly that wouldn't usually happen. In most cases a table will
>>have 5, 10, maybe more nullable columns because it conflates two or three
>>different entity types. Thus just one or two additional tables are often
>>sufficient.
>
> I do tend to use this approach for sub-typing. e.g. In my Customers
> example, I would split out attributes about a business customer (a
> company) into a separate table, attributes about a personal customer (a
> person) into a separate table, and leave the generic attributes in the
> Customers table.
>
> However in one of the databases I currently work on, there are dozens
> of attributes captured (particularly for personal customers) for
> marketing purposes. These are all optional as a customer does not have
> to supply them. To avoid NULLs, I would end up with dozens of tables,
> complicating the extremely large number of queries that reference them.

Or you could correctly record the known information. In the case of a
middle name, you know the customer entered a character string of length
zero for the middle name. If the customer was given several options for
some question and chose the option to decline answering, you know they
chose that option. etc.

I have worked with direct marketing databases with millions of customer
records that got that part right. In fact, I worked with one in 1994
right around the time you started your career. Even though the database
had no nulls, I was forever explaining to the business analysts why the
queries they wrote using outer joins did not behave as expected.

In the ludicrously oversimplified examples you gave with middle names
and isnull, had you correctly recorded a known zero-length string for
the middle name instead of using NULL, the comparison you gave would
become the much simpler "a = b".

JOG

unread,
Sep 19, 2006, 3:11:47 PM9/19/06
to
Bob Badour wrote:
> What is NULL if not the basis for 3 valued logic or 4 valued logic?

An ugly hack to workaround the fact that most designers are either too
lazy or too lacking in pride to design their databases correctly?

It is loading more messages.
0 new messages