boolean datatype ... wtf?

paul c

unread,

Sep 28, 2010, 2:53:45 PM9/28/10

to

Looking for qbql, I happened on a nice post (Sept 16) by Vadim at:

http://vadimtropashko.wordpress.com/

I know as little about the machinations in 'ask tom' as I do about what
goes on in the heads of the self-styled 'stackoverflow' posters
mentioned here lately but talk about chasing one's tail (for nearly ten
years at that)! Running around in circles because it would seem that
the 'type' was there all along had it occurred to the SQL authors that a
heading can be empty and therefore a tuple can be the empty set and vice
versa. Instead, people are arguing about ways to re-invent a relation
that's inherent and staring them in face. I think Vadim was right on
when he disparaged the recording of derived info (what he called
'calculated') instead of the determining info. What a wasteful industry
IT is.

Maybe there's an excuse for the original post dating 'only' four years
after TTM 1st edition was published. I'll bet some of these posters are
the same people who say there's been no progress in RT for twenty years,
but I'd say that's only because they don't bother reading what the
deeper thinkers write.

Paul Mansour

unread,

Sep 29, 2010, 7:51:51 AM9/29/10

to

What is the problem with a Boolean data type?
.
It is fundamental - so fundamental that in TTM it is the only
required scalar data type: “We require that at least one built-in
scalar type be supported : Namely, type “Boolean” (BOOLEAN in Tutorial
D”).

Date and Darwen go on to give the obvious reasons for this.

The fact that DEE and DUM may be interpreted as TRUE and FALSE is not
relevant. The result of A=B or A>B is not a relation.

Erwin

unread,

Sep 29, 2010, 8:48:49 AM9/29/10

to

On 29 sep, 13:51, Paul Mansour <psmansour2...@yahoo.com> wrote:
>
> What is the problem with a Boolean data type?
> .
> It is fundamental - so fundamental that in TTM it is the only
> required scalar data type: “We require that at least one built-in
> scalar type be supported : Namely, type “Boolean” (BOOLEAN in Tutorial
> D”).

It is fundamental _FOR A COMPUTING LANGUAGE_ because that computing
language might have a need for evaluating some expression that follows
the 'IF' (or for evaluating a ternary operator of the style <xpr> ?
result1 : result2).

It is, imo, _NOT_ fundamental in the context of actual database
design. In fact, I think that the justifiable cases for including a
BOOLEAN in an actual database design (not talking of derived relvars
aka views) are few and far between, if existant at all. I think that
is precisely what VT was talking of: the RM already has a way for
representing truth information (as the presence/absence of some tuple
in some relvar with some particular predicate), and as a consequence
the type BOOLEAN (_WITHIN DATABASE RELVARS_) must be considered
redundant and unnecessary.

That the relation type {} has the same cardinality as the set of 2VL
truth values, and that therefore, when amended/augmented/provided with
a proper set of operators, it is isomorphic to boolean algebra, seems
all too obvious. Whether that can be exploited usefully in the design
of a language that wants to eliminate true and false (as such, only to
replace those with DEE and DUM), is a bit less obvious.

Tony Andrews

unread,

Sep 29, 2010, 9:39:22 AM9/29/10

to

On Sep 28, 7:53 pm, paul c <toledobythe...@oohay.ac> wrote:
> I know as little about the machinations in 'ask tom' as I do about what
> goes on in the heads of the self-styled 'stackoverflow' posters

> mentioned here lately ...

In what sense are these stackoverflow posters 'self-styled'? Do you
mean that they don't actually post to stackoverflow.com, but just
pretend that they do? I'm just curious!

paul c

unread,

Sep 29, 2010, 9:46:04 AM9/29/10

to

On 29/09/2010 4:51 AM, Paul Mansour wrote:
...

>
> What is the problem with a Boolean data type?
> .
> It is fundamental - so fundamental that in TTM it is the only
> required scalar data type: “We require that at least one built-in
> scalar type be supported : Namely, type “Boolean” (BOOLEAN in Tutorial
> D”).
>
> Date and Darwen go on to give the obvious reasons for this.
>
> The fact that DEE and DUM may be interpreted as TRUE and FALSE is not
> relevant. The result of A=B or A>B is not a relation.

Aren't A=B and A>B relations?

paul c

unread,

Sep 29, 2010, 9:47:49 AM9/29/10

to

They didn't understand Bob B's question about keys, so they concluded it
wasn't important.

Bob Badour

unread,

Sep 29, 2010, 10:03:43 AM9/29/10

to

paul c wrote:

They are neither more nor less relations than X*X + Y*Y + Z*Z = R*R is a
relation.

But what does it mean to be the result of a relation?

Erwin

unread,

Sep 29, 2010, 10:07:21 AM9/29/10

to

Here we go again ...

But then again, incidentally, he did not say that 'A=B' isn't a
relation. He said that _THE RESULT OF_ 'A=B' isn't a relation.

Conceptually, it could be argued that what a language implementation
achieves when it evaluates an expression such as 'A=B', is to join a
tuple {A,B} with the EQUALS relation (which is implicitly defined by
the system) of the pertinent type, and then check whether the result
of that join is empty or not ...

Fair bit of hairsplitting going on, if you ask me ...

Paul Mansour

unread,

Sep 29, 2010, 10:13:02 AM9/29/10

to

I was under the impression that they are not, even if A and B are
relations. But I may have misintepreted C.J. Date on this.

In "Database in Depth" he writes:

"In Chapter 2, I mentioned the fact that the equality comparison
operator "=" applies to every type. In partitular, therefore it
applies to relation types.... Now I must immediatly explain that these
opeartors are not relational operators as such -- that is they are not
part of the relational algebra-- because their result is a truth
value, not a relation."

paul c

unread,

Sep 29, 2010, 10:26:56 AM9/29/10

to

> ...

I guess it would have been more precise to ask "aren't the equals and
greater than operators expressible as relations?"

Not to say they should be recorded as relations but I don't see the need
to record a value 'Y' of type 'character' or suchlike as was suggested
in the asktom column when it seems that projection is conceptually all
that is needed to determine truth or falseness. Not suggesting either
that that was Vadim T's main point.

Paul Mansour

unread,

Sep 29, 2010, 10:41:31 AM9/29/10

to

Erwin, I don't disagree that a boolean type may be rarely if ever
necessary in a base relation. But it seems somewhat awkard and
artificial to allow them in derived relvars and not database relvars.
And they occur whether one wants them to or not if you have a where or
a having clause.

I also realize that there is a distinction between database design,
and something like tutorial D, a language interface to a DMBS. But it
does not seem unresonable to say that boolean types are fundamental to
relational database theory and to any RDMS, even if that RDMS does not
have an explicit boolean type available to the user (wrongly, I
think). Maybe that is only because they are fundamental to the
language interface, but that seems like a distinction without much
difference.

Bob Badour

unread,

Sep 29, 2010, 10:45:24 AM9/29/10

to

Paul Mansour wrote:

An algebra is a set of values and a set of operations closed on that set
of values.

Date is saying that comparisons are not part of the relational algebra
in the same way that division is not part of the integer algebra because
dividing any two arbitrary integers is not necessarily closed on integers.

Consider an expression of the form:

R = f(A,B)

where both A and B are relations and f is some binary operation. If f is
natural join, then R is a relation. If f is a comparison, then R is not
a relation; it is a boolean.

Similarly, if A and B are integers and f is multiplication, then R is an
integer whereas if f is division, R is not necessarily an integer.

The expression R = f(A,B) itself, however, is always a relation. We
could re-express the relation as g(R,A,B)
g: {(R,A,B)| R = f(A,B)}

If f is part of some algebra, all three of R, A and B, will have the
same type.

If one does away with the form R = f(A,B) entirely, then comparison is
just a relation Eq(A,B) and natural join is just a relation Jn(R,A,B)
and division is just a relation Dv(Quotient,Dividend,Divisor) | Divisor
!= 0.

paul c

unread,

Sep 29, 2010, 10:55:19 AM9/29/10

to

On 29/09/2010 7:41 AM, Paul Mansour wrote:
> I also realize that there is a distinction between database design,
> and something like tutorial D, a language interface to a DMBS. But it
> does not seem unresonable to say that boolean types are fundamental to
> relational database theory and to any RDMS, even if that RDMS does not
> have an explicit boolean type available to the user (wrongly, I
> think). Maybe that is only because they are fundamental to the
> language interface, but that seems like a distinction without much
> difference.
>

I presume 'boolean types', plural, is just a typo' here (but not in the
asktom column!) Ie., there might be many syntaxes but only one type.

Vadim Tropashko

unread,

Sep 29, 2010, 11:00:36 AM9/29/10

to

> Fair bit of hairsplitting going on, if you ask me ...- Hide quoted text -

I would argue that we join (ordinary finite) relations with equality
predicates much more often that it appeared at the first sight. One
operation which utility can rival that of natural join is composition
(aka set intersection join). This is merely a natural join projected
to the set of distinct attributes. For example composition of

Pets=[nickname specie]
folly dog
scruffy cat
;

with

Fed=[name nickname time]
claire folly 200
max folly 200
max folly 300
max scruffy 200
;

is

Pets/^Fed=[name specie time]
claire dog 200
max cat 200
max dog 200
max dog 300
;

It is essentially natural join with the common column "nickname"
REMOVEd (to use A-Algebra terminology:-).

Renaming -- which is arguably second most used relational operation --
is composition with equality predicate:

Pets/^[nickname=petname]=[petname specie]
folly dog
scruffy cat
;

Paul Mansour

unread,

Sep 29, 2010, 11:12:31 AM9/29/10

to

Yes indeed. Should have been "it is" rather than "they are" as well.

paul c

unread,

Sep 29, 2010, 11:38:05 AM9/29/10

to

On 29/09/2010 7:45 AM, Bob Badour wrote:
> Paul Mansour wrote:
>
>> On Sep 29, 9:46 am, paul c <toledobythe...@oohay.ac> wrote:
>>

...

>>> Aren't A=B and A>B relations?
>>
>> I was under the impression that they are not, even if A and B are
>> relations. But I may have misintepreted C.J. Date on this.
>>
>> In "Database in Depth" he writes:
>>
>> "In Chapter 2, I mentioned the fact that the equality comparison
>> operator "=" applies to every type. In partitular, therefore it
>> applies to relation types.... Now I must immediatly explain that these
>> opeartors are not relational operators as such -- that is they are not
>> part of the relational algebra-- because their result is a truth
>> value, not a relation."
>
> An algebra is a set of values and a set of operations closed on that set
> of values.
>
> Date is saying that comparisons are not part of the relational algebra
> in the same way that division is not part of the integer algebra because
> dividing any two arbitrary integers is not necessarily closed on integers.
>
> Consider an expression of the form:
>
> R = f(A,B)
>
> where both A and B are relations and f is some binary operation. If f is
> natural join, then R is a relation. If f is a comparison, then R is not
> a relation; it is a boolean.

> ...

Thanks for the precise example. It seems to me that Date means your
last sentence as his starting point. Nothing 'wrong' with that, but my
question might be 'why does that have to be so'? Isn't it just as valid
to say that R can be one of DEE or DUM for certain f?

Bob Badour

unread,

Sep 29, 2010, 11:45:40 AM9/29/10

to

paul c wrote:

Sure, but that's a different f. The first f is part of a closed boolean
algebra, and the second f is not.

Paul Mansour

unread,

Sep 29, 2010, 12:36:28 PM9/29/10

to

> to say that R can be one of DEE or DUM for certain f?- Hide quoted text -
>
> - Show quoted text -

Regarding "why does that have to be so?"...

The quote from Date above follows and refers to a discussion of the
equality operator for types. It appears to suggest that there should
be some consistancy in the result of the operator.

Are you suggesting that 5=6 returns DEE as well, or that 5=6 returns
a boolean scalar while R1=R2 (where Rx is a relation) returns DEE or
DUM? It seems to me that while it is very important to recognize the
value of the concepts of DEE and DUM, that attempting to eliminate a
boolean type and replace it with DEE and DUM is only going to add
complexity.

paul c

unread,

Sep 29, 2010, 12:47:29 PM9/29/10

to

Okay, then my next question is why couldn't any operation on relations
return a relation? eg., subsetof(r1,r2).

Is it because language designers want to use the same 'type' for ops
that don't operate on relations? If so, I'd say that's their right,
just as it's Erwin S's right to call "whether A=B should be treated as a
relation" hairsplitting. I don't see a logical requirement for either
attitude, although I'm sure there are lots of practical ones.

Regarding division by zero, as impractical as it might be, I also might
have a mental block about that. If there were an op called 'integer
division', is there a theoretical reason why its operands couldn't be
relations and the result an empty relation?

Bob Badour

unread,

Sep 29, 2010, 1:04:50 PM9/29/10

to

paul c wrote:

No. The result of 1/0 is undefined not an empty set of values, and the
result of 0/0 is indeterminant not the set of all integers. Dividing by
zero is not the only case where division is not closed on integers: 1/2
is not an integer.

paul c

unread,

Sep 29, 2010, 1:12:21 PM9/29/10

to

I think all I'm suggesting is that a relation called
'integer_five_equals_six' with value (omitting types):

x y
5 6

is considered true if it appears in a db. We could materialize it to
see what integers make it true but if we don't care which ones do that,
we can avoid the materialization and just project it on the empty set of
attributes. I'm thinking this comes down to saying that strictly
speaking a minimal language doesn't even need to embed names such as
'dee' and 'dum', it could be inherent in the syntax, eg., I think TTM
allows what 'dee' stands for to be written as RELATION {} {} or suchlike.

No argument if you say that is too ponderous or more complex than users
want but I think that is an answer to a different question than "is it
logical?".

paul c

unread,

Sep 29, 2010, 1:23:10 PM9/29/10

to

On 29/09/2010 10:04 AM, Bob Badour wrote:
> paul c wrote:
>
>> On 29/09/2010 8:45 AM, Bob Badour wrote:
>>
>>> paul c wrote:
>>>

...

>> Regarding division by zero, as impractical as it might be, I also
>> might have a mental block about that. If there were an op called
>> 'integer division', is there a theoretical reason why its operands
>> couldn't be relations and the result an empty relation?
>
> No. The result of 1/0 is undefined not an empty set of values, and the
> result of 0/0 is indeterminant not the set of all integers. Dividing by
> zero is not the only case where division is not closed on integers: 1/2
> is not an integer.

Yes, I was taught that dividing the integer 1 by the integer 0 is
undefined. Maybe I should have suggested such an op be called
'relational integer division'. Surely the set of integers for which 1/2
results in an integer is empty?

Bob Badour

unread,

Sep 29, 2010, 2:42:52 PM9/29/10

to

paul c wrote:

But the result of division is not empty.

David BL

unread,

Sep 29, 2010, 11:53:34 PM9/29/10

to

Nonsense. Assuming that we don't consider the boolean values to be
relations, an expression like

R = f(A,B)

is not a set of tuples. In a sentence in which this expression
appears it stands for or evaluates to some boolean value. By
contrast the expressions

g

and

{(R,A,B)| R = f(A,B)}

indeed stand for a set of tuples.

David BL

unread,

Sep 30, 2010, 2:03:22 AM9/30/10

to

On Sep 29, 8:48 pm, Erwin <e.sm...@myonline.be> wrote:

> the RM already has a way for
> representing truth information (as the presence/absence of some tuple
> in some relvar with some particular predicate), and as a consequence
> the type BOOLEAN (_WITHIN DATABASE RELVARS_) must be considered
> redundant and unnecessary.

What if I want to record a base relation like the following?

a b c d
----------
F F F F
F F T T
F T F T
T F F F
T F T T

These boolean attributes can't be eliminated in the manner you
suggest.

On another note, I can easily imagine applications with boolean
attributes that cannot be regarded as derived or calculated from other
information recorded in the database. E.g. soft cover versus hard
cover, read versus unread, fiction versus non-fiction for a book.

I agree that one typically has the option of using more relvars to
record these attributes instead as the presence/absence of some
tuple. However I think it won't be possible when these boolean
attributes form part of the primary key. I can imagine applications
where we don't have convenient and simple identifiers for entities,
and they end up being uniquely identified by a large number of
attributes, and I would have thought there could easily be cases where
some of the attributes in the key are boolean.

Tony Andrews

unread,

Sep 30, 2010, 6:32:58 AM9/30/10

to

On Sep 30, 7:03 am, David BL <davi...@iinet.net.au> wrote:
> On another note, I can easily imagine applications with boolean
> attributes that cannot be regarded as derived or calculated from other
> information recorded in the database. E.g. soft cover versus hard
> cover, read versus unread, fiction versus non-fiction for a book.

I am with you there, even though it could be argued that you could
have cover_type='soft', status='read', classification='fiction' etc.
instead. A pretty common "real world" example in my experience would
be where we have to design a database to record data gleaned from the
public via a paper or web-based form with lots of "checkbox" questions
- e.g. for a car insurance application:
[ ] Is the car kept in a locked garage overnight?
[ ] Does the car have an immobiliser fitted
[ ] Have you made a claim in the past 3 years
... and so on for dozens of questions to which the business needs to
know just the Boolean answer.

We could again introduce a lot of specific domains like garaged_status
('GARAGED','NOT GARAGED'), but I'm not sure what would be gained by
that. Or we could create a plethora of tables like:
create table applications_with_garages (application_id references
applications primary key);
create table applications_with_immobolisers (application_id
references applications primary key);
... etc.

That may be the right approach in a theoretical true RDBMS, but I'm
pretty sure it would get me sacked as a lunatic in any SQL-based DBMS
team!

Erwin

unread,

Sep 30, 2010, 7:47:04 AM9/30/10

to

On 30 sep, 08:03, David BL <davi...@iinet.net.au> wrote:
>
> What if I want to record a base relation like the following?

Then I'm tempted to suggest that you are resorting to theoretical
artifice for which, while not invalid per se, it will be extremely
hard to find a compelling practical case where such base relations are
needed. (pls note that what triggered the discussion here, was a blog
entry by VT commenting on some other discussion that was very clearly
rooted in 'practical applications'.)

> a b c d
> ----------
> F F F F
> F F T T
> F T F T
> T F F F
> T F T T
>
> These boolean attributes can't be eliminated in the manner you
> suggest.
>
> On another note, I can easily imagine applications with boolean
> attributes that cannot be regarded as derived or calculated from other
> information recorded in the database. E.g. soft cover versus hard
> cover, read versus unread, fiction versus non-fiction for a book.
>
> I agree that one typically has the option of using more relvars to
> record these attributes instead as the presence/absence of some
> tuple. However I think it won't be possible when these boolean
> attributes form part of the primary key.

That is correct, but that is also where my question for a compelling
case comes in.

> I can imagine applications
> where we don't have convenient and simple identifiers for entities,
> and they end up being uniquely identified by a large number of
> attributes, and I would have thought there could easily be cases where
> some of the attributes in the key are boolean.

Well, if that really is "easy" then you can easily provide a
compelling case.

Booleans in relations mean that the external predicate for that
relation must contain constructs such as "it is <boolattr> that ...".
Meaning specifically that there might be tuples that say that "it is
false that ...".

Booleans inside a candidate key means that a natural language
formulation for the identifier represented by that key, is something
like "the class of things for which it is <boolattr> that ...", e.g.,
"the class of things for which it is true that they are male", and
"the class of things for which it is false that they are male".

Does that strike you as a very sensible idea ? I say it smells like
excessive circumlocution, baked into the database itself.

Erwin

unread,

Sep 30, 2010, 8:04:45 AM9/30/10

to

On 30 sep, 12:32, Tony Andrews <tony.andrew...@gmail.com> wrote:
> On Sep 30, 7:03 am, David BL <davi...@iinet.net.au> wrote:
>
> > On another note, I can easily imagine applications with boolean
> > attributes that cannot be regarded as derived or calculated from other
> > information recorded in the database. E.g. soft cover versus hard
> > cover, read versus unread, fiction versus non-fiction for a book.
>
> I am with you there, even though it could be argued that you could
> have cover_type='soft', status='read', classification='fiction' etc.
> instead.

If you have to include person gender in some database design, which
would be your preferred option:
(a) a boolean to say "male Y/N", or
(b) something that more akin to the style "additional type GENDER with
values [that mean] 'male' and 'female' " ?

> Or we could create a plethora of tables like:
> create table applications_with_garages (application_id references
> applications primary key);
> create table applications_with_immobolisers (application_id
> references applications primary key);
> ... etc.
>
> That may be the right approach in a theoretical true RDBMS, but I'm
> pretty sure it would get me sacked as a lunatic in any SQL-based DBMS
> team!

If your SQL-based DBMS had proper physical data independence, then I
am quite convinced your claim would be false.

(Allthough by now, SQL has existed so long that it would take quite
some amount of time before the majority of data management people
using SQL have their totally derailed and confused thinking completely
straightened out ...)

Roy Hann

unread,

Sep 30, 2010, 8:23:41 AM9/30/10

to

Tony Andrews wrote:

> On Sep 30, 7:03 am, David BL <davi...@iinet.net.au> wrote:
>> On another note, I can easily imagine applications with boolean
>> attributes that cannot be regarded as derived or calculated from other
>> information recorded in the database. E.g. soft cover versus hard
>> cover, read versus unread, fiction versus non-fiction for a book.
>
> I am with you there, even though it could be argued that you could
> have cover_type='soft', status='read', classification='fiction' etc.
> instead. A pretty common "real world" example in my experience would
> be where we have to design a database to record data gleaned from the
> public via a paper or web-based form with lots of "checkbox" questions
> - e.g. for a car insurance application:
> [ ] Is the car kept in a locked garage overnight?
> [ ] Does the car have an immobiliser fitted
> [ ] Have you made a claim in the past 3 years
> ... and so on for dozens of questions to which the business needs to
> know just the Boolean answer.

No, the business doesn't want to know true or false.

It wants to know the nature of the insurance application. Some
"business analyst" or other nitwit intervened between the customer and
business to strip the meaningful information from the customers'
response so that it can be interpreted only with reference to some
secret decoder ring. That contributes nothing but cost, burden, mystery
and confusion where there needed to be none of these things.

I have actually screamed out loud at seeing an unintelligible stream of
numbers, codes, and flags where actual values would have been just as
easy and so much better. What is this fetish for encoding and
concealing?

> That may be the right approach in a theoretical true RDBMS, but I'm
> pretty sure it would get me sacked as a lunatic in any SQL-based DBMS
> team!

Now there's a badge of honour I can get behind!

--
Roy

David BL

unread,

Sep 30, 2010, 9:41:29 AM9/30/10

to

If you're saying it is statistically unlikely that you'll find
examples then fair enough, but I thought you were saying something
stronger than that.

> > I can imagine applications
> > where we don't have convenient and simple identifiers for entities,
> > and they end up being uniquely identified by a large number of
> > attributes, and I would have thought there could easily be cases where
> > some of the attributes in the key are boolean.
>
> Well, if that really is "easy" then you can easily provide a
> compelling case.

Compelling... hmmm. Is plausible enough?

Example : I have the means to measure (x,y) in a cave but not z. I
want to record pH measurements taken on every stalagmite or
stalactite. Relation is

pHLevels(x,y, isStalagmite,pH)

isStalagmite is part of the key.

> Booleans in relations mean that the external predicate for that
> relation must contain constructs such as "it is <boolattr> that ...".
> Meaning specifically that there might be tuples that say that "it is
> false that ...".
>
> Booleans inside a candidate key means that a natural language
> formulation for the identifier represented by that key, is something
> like "the class of things for which it is <boolattr> that ...", e.g.,
> "the class of things for which it is true that they are male", and
> "the class of things for which it is false that they are male".
>
> Does that strike you as a very sensible idea ? I say it smells like
> excessive circumlocution, baked into the database itself.

So you're saying two valued enumerated types are ok, but boolean
attributes are not.

I regard this as a matter of personal choice (or you could say style
or syntax or religion). Obviously you could work either way and map
between them easily enough. Since it can be cumbersome to define
enums and think of "neutral" attribute names I'm not going to agree
with you that boolean attributes should be universally outlawed purely
on the basis that enums are more descriptive or map more reasonably to
natural language.

David BL

unread,

Sep 30, 2010, 10:27:08 AM9/30/10

to

On Sep 30, 9:41 pm, David BL <davi...@iinet.net.au> wrote:

> Example : I have the means to measure (x,y) in a cave but not z. I
> want to record pH measurements taken on every stalagmite or
> stalactite. Relation is
>
> pHLevels(x,y, isStalagmite,pH)
>
> isStalagmite is part of the key.

My example is no good. Obviously it is easy to eliminate the 2-valued
attribute in this case by introducing two relations

Stalagmites(x,y,pH)
Stalactites(x,y,pH)

One can always remove a 2-valued attribute by using a pair of relvars
(irrespective of whether it is part of the key). If this is done
repeatedly the number of relvars grows exponentially.

In my example of recording a relation with 4 boolean attributes, one
could in fact introduce 16 relvars that each have an empty heading and
hold DEE or DUM.

So a "compelling example" would need a key involving perhaps 5 or more
2-valued attributes in the key such that splitting into many relvars
is awkward.

Tricky...

Tegiri Nenashi

unread,

Sep 30, 2010, 11:17:44 AM9/30/10

to

On Sep 30, 3:32 am, Tony Andrews <tony.andrew...@gmail.com> wrote:
> create table applications_with_immobolisers (application_id
> references applications primary key);
> ... etc.
>
> That may be the right approach in a theoretical true RDBMS, but I'm
> pretty sure it would get me sacked as a lunatic in any SQL-based DBMS
> team!

Ah, this is why one don't usually find any unary relation in SQL Dbms!
This is very odd from theoretical perspective as one might expect
there are many more unary relations than binary ones, many more binary
than ternary and so on (akin to Zipfian distribution). Well SQL made
creating a table more expensive than adding an column, that is one of
its many implementational sins.

Vadim Tropashko

unread,

Sep 30, 2010, 1:09:16 PM9/30/10

to

Expanded this idea a little:
http://vadimtropashko.wordpress.com/2010/09/30/impedance-mismatch/

Paul Mansour

unread,

Sep 30, 2010, 2:23:38 PM9/30/10

to

Isn't creating a new table conceptually more expensive than adding a
column, regardless of the implementation? Perhaps this is why Date's
well-known supplier and parts database has a S.STATUS column rather
than a STATUS table, and P.COLOR column column rather than P.COLOR
table, and all the attendent foreign keys.

Bob Badour

unread,

Sep 30, 2010, 3:22:37 PM9/30/10

to

Paul Mansour wrote:

Concepts are free.

If one uses 6NF for temporal data, little difference exists between
creating a new table or adding a column.

Brian

unread,

Sep 30, 2010, 8:24:17 PM9/30/10

to

> creating a new table or adding a column.- Hide quoted text -

>
> - Show quoted text -

Ignorance or irresponsibility?

Is Badour exhibiting his complete ignorance to the consequences of
decomposing a 5NF database scheme into one that is in 6NF? Unless the
5NF scheme was already in 6NF, the equivalent 6NF scheme will contain
numerous additional cyclical referential constraints that were not
present in the 5NF scheme. Assuming that the introduction of the
additional attribute doesn't violate 5NF, no additional constraints
are required that aren't implied by the candidate key constraint on
the 5NF relation scheme. Introducing a 6NF relation scheme to the
equivalent 6NF database scheme requires explicit declaration of a
cyclical referential constraint. If it isn't just plain ignorance,
then doesn't such careless disregard of those consequences border on
irresponsibility? People here have the impression that Badour knows
what he's talking about. Deliberately or carelessly misleading those
who look up to them is behaviour usually attributible to politicians
and lawyers, con-men and thieves--predatory personalities.

Erwin

unread,

Oct 1, 2010, 8:18:30 AM10/1/10

to

> and lawyers, con-men and thieves--predatory personalities.- Tekst uit oorspronkelijk bericht niet weergeven -
>
> - Tekst uit oorspronkelijk bericht weergeven -

Explain "cyclical referential constraint".

And explain "additional attribute".

Tony Andrews

unread,

Oct 1, 2010, 8:41:39 AM10/1/10

to

On Sep 30, 1:04 pm, Erwin <e.sm...@myonline.be> wrote:
> On 30 sep, 12:32, Tony Andrews <tony.andrew...@gmail.com> wrote:
> > Or we could create a plethora of tables like:
> > create table applications_with_garages (application_id references
> > applications primary key);
> > create table applications_with_immobolisers (application_id
> > references applications primary key);
> > ... etc.
>
> > That may be the right approach in a theoretical true RDBMS, but I'm
> > pretty sure it would get me sacked as a lunatic in any SQL-based DBMS
> > team!
>
> If your SQL-based DBMS had proper physical data independence, then I
> am quite convinced your claim would be false.

But would this unary table approach to avoid a Boolean attribute
really be considered good practice in a true RDBMS? If I want to know
"does the applicant's car have an immboliser", is the absence of a row
in applications_with_immobilisers sufficient to answer it? I thought
that was a proposed solution to missing information ("it is unknown
whether the applicant's car has an immobiliser"), now it seems to be
acting as available information - i.e. the predicate "applicant 123's
car does not have an immboliser". Seems dubious to "record" a known
fact by, er, not recording it.

Erwin

unread,

Oct 1, 2010, 9:54:26 AM10/1/10

to

Does this make a meaningful difference ?

I mean, you came up with the example of checkboxes on paper. The
checkbox on paper can be "marked" if the corresponding label/property
is "true". If the checkbox on paper is not marked, you interpret that
as "false", no ? You wouldn't consider/interpret this as
"unanswered", or "unknown", or whatever, no ?

If the answer is two-way, then those two ways are isomorphic to "tuple
present"/"tuple absent", no ? And "unknown"/"unanswered" simply
doesn't enter the picture, no ?

If you want to _explicitly_ take the option of "unknown"/"unanswered"
into account, then on the paper version, you would be _forced_ to
provide an _additional_ checkbox saying "I know the answer to the next
question", or "You can consider the next checkbox as having been
answered", no ?

Erwin

unread,

Oct 1, 2010, 9:58:02 AM10/1/10

to

> answered", no ?- Tekst uit oorspronkelijk bericht niet weergeven -

>
> - Tekst uit oorspronkelijk bericht weergeven -

PS

This is the same phenomenon as http forms containing a checkbox.
Checked implies that the posted URL contains a <fieldname>=true
portion, unchecked means that the posted URL does not contain
<fieldname> at all. Checkbox unanswered is, by itself,
irrepresentible in HTML.

Tony Andrews

unread,

Oct 1, 2010, 10:19:46 AM10/1/10

to

You have a strong point there!

Tony Andrews

unread,

Oct 1, 2010, 10:39:04 AM10/1/10

to

However, I can't help feeling it side-steps the real issue. Assuming
we DO want to distinguish between:
- we KNOW the car has an immobiliser
- we KNOW the car does not have an immobiliser
- we DON'T KNOW whether the car has an immobiliser or not

... what is the correct way to model this?

Paul Mansour

unread,

Oct 1, 2010, 10:51:37 AM10/1/10

to

> answered", no ?- Hide quoted text -

>
> - Show quoted text -

Erwin, suppose then that we do want to explicity take "unknown" into
account with two checkboxes as you say, or I think better yet, with
three mutually exclusive radio buttons, to explicity specifiy Yes,
No, and Unknown or Male,Female and Unkown, or what have you. How then
would you model this? Is this a column with three allowable values,
or a column with two values where no corresponding row indicates the
third option? Alternatively, there could be just two radio buttons
indicating, say, Male and Female, and neither one is pre-selected, so
it is clear whether or not the question was answered. Same question,
how is this then modeled? I think Tony Andrews has a good question,
and it's more fundamental than the GUI.

Bob Badour

unread,

Oct 1, 2010, 11:08:05 AM10/1/10

to

>>Ignorance or irresponsibility?
>>
>>Is Badour exhibiting his complete ignorance to the consequences of
>>decomposing a 5NF database scheme into one that is in 6NF? Unless the
>>5NF scheme was already in 6NF, the equivalent 6NF scheme will contain
>>numerous additional cyclical referential constraints that were not
>>present in the 5NF scheme. Assuming that the introduction of the
>>additional attribute doesn't violate 5NF, no additional constraints
>>are required that aren't implied by the candidate key constraint on
>>the 5NF relation scheme. Introducing a 6NF relation scheme to the
>>equivalent 6NF database scheme requires explicit declaration of a
>>cyclical referential constraint. If it isn't just plain ignorance,
>>then doesn't such careless disregard of those consequences border on
>>irresponsibility? People here have the impression that Badour knows
>>what he's talking about. Deliberately or carelessly misleading those
>>who look up to them is behaviour usually attributible to politicians
>>and lawyers, con-men and thieves--predatory personalities.
>

> Explain "cyclical referential constraint".
>
> And explain "additional attribute".

As a general observation, lower normal forms imply dependencies that
"equivalent" higher normal forms do not imply. In that sense, the higher
normal form is not really equivalent to the lower normal form. However,
in general, we choose the higher normal form precisely because the
dependencies implied by the lower normal form are invalid.

BS assumes going from 5NF to 6NF that the additional dependencies
implied by the 5NF design are valid and must be enforced, when the
designer may have chosen 6NF precisely because those dependencies are
invalid.

None of BS's BS does anything to address my observation that proper
support for 6NF demands a product where the addition of a table is no
more costly than the addition of a column.

Tegiri Nenashi

unread,

Oct 1, 2010, 12:37:20 PM10/1/10

to

Two unary relations? CarsWithImmobiliser and CarsWithoutImmobiliser?
It sounds awkward, but think about it would you ever have any trouble
when querying such a system (as compared to 3VL with NULL)?

Tony Andrews

unread,

Oct 1, 2010, 12:48:05 PM10/1/10

to

On Oct 1, 5:37 pm, Tegiri Nenashi <tegirinena...@gmail.com> wrote:
> Two unary relations? CarsWithImmobiliser and CarsWithoutImmobiliser?
> It sounds awkward, but think about it would you ever have any trouble
> when querying such a system (as compared to 3VL with NULL)?

I though that was frowned upon due to POOD?

Brian

unread,

Oct 1, 2010, 1:54:47 PM10/1/10

to

BULLSHIT! If any of the dependencies were invalid, then the schema
would not have been in 5NF.

For example, given a 5NF schema

{A,B,C,D} KEY {A,B}

The notrivial functional dependencies,

AB -> CD, AB -> C, AB -> D

are implied by the key.

In the 6NF schema,

{A,B,C} KEY{A,B}, {A,B,D} KEY {A,B}

The functional dependency AB -> CD is lost. If that dependency were
invalid in the 5NF schema, {A,B,C,D} KEY {A,B}, then it would not be
in 5NF because that dependency is implied by the key, so the correct
5NF schema would be,

{A,B,C} KEY {A,B}, {A,B,D} KEY {A,B},

which just happens to also be in 6NF.

In order for the functional dependency AB -> CD to be preserved, a
cyclical inclusion dependency

{A,B,C}[A,B] = {A,B,D}[A,B]

must be introduced, because only when it is in force does the FD AB ->
CD hold in the view formed by joining {A,B,C} and {A,B,D} for all
{A,B} in {A,B,C} or {A,B,D}.

{A,B,C}[A,B] = {A,B,D}[A,B] is equivalent to the conjunction of the
noncyclical inclusion dependencies,

{A,B,C}[A,B] IN {A,B,D}[A,B] AND {A,B,D}[A,B] IN {A,B,C}[A,B]

>
> None of BS's BS does anything to address my observation that proper
> support for 6NF demands a product where the addition of a table is no

> more costly than the addition of a column.- Hide quoted text -

That wasn't your original observation. It still doesn't change the
fact that you're wrong, however.

Erwin

unread,

Oct 1, 2010, 2:07:25 PM10/1/10

to

The strong form of POOD as I understand it ("No two distinct relvars
can be allowed to have the same heading") is not tenable. For it
implies that there can only exist one single niladic relvar, meaning
that the database can contain at most one niladic predicate, sort of
meaning that "there can only exist at most one truism".

In (appendix E of) the manifesto book, Darwen suggests that Date knows
and understands that and no longer goes by that strong form of POOD.
Darwen also suggests that Date has even abandoned the weaker forms of
POOD (though I cannot precisely explain what that might mean). In
"Database Explorations", Date seems to suggest that the latter is not
really the case (that is, he still does believe there is value in
POOD). The situation is extremely foggy from where I stand, and it is
probably flat out wrong to make any firm statement on the matter.

Erwin

unread,

Oct 1, 2010, 2:21:19 PM10/1/10

to

On 1 okt, 17:08, Bob Badour <bbad...@pei.sympatico.ca> wrote:
> Erwin wrote:
>

> > Explain "cyclical referential constraint".
>
> > And explain "additional attribute".
>
> As a general observation, lower normal forms imply dependencies that
> "equivalent" higher normal forms do not imply. In that sense, the higher
> normal form is not really equivalent to the lower normal form. However,
> in general, we choose the higher normal form precisely because the
> dependencies implied by the lower normal form are invalid.
>
> BS assumes going from 5NF to 6NF that the additional dependencies
> implied by the 5NF design are valid and must be enforced, when the
> designer may have chosen 6NF precisely because those dependencies are
> invalid.
>
> None of BS's BS does anything to address my observation that proper
> support for 6NF demands a product where the addition of a table is no

> more costly than the addition of a column.- Tekst uit oorspronkelijk bericht niet weergeven -

>
> - Tekst uit oorspronkelijk bericht weergeven -

Last things first: of course I agree with your last paragraph. In
fact, I believe that in SIRA_PRISE, addition of a table is _AT MOST AS
COSTLY_ as addition of an attribute. Thus: never more expensive, and
oft times much cheaper.

First things last: I was inviting BS to demonstrate to us where there
are "cyclical referential constraints" and/or "additional attributes"
in the following equivalent designs, which are 5NF and 6NF,
respecitvely:

(a)
VAR 5NF RELATION {K1:... K2:... A1:... A2:...} KEY {K1 K2};

(b)
VAR 6NF1 RELATION {K1:... K2:... A1:...} KEY {K1 K2};
VAR 6NF2 RELATION {K1:... K2:... A2:...} KEY {K1 K2};
CONSTRAINT A1A2ALWAYSPAIRED ISEMPTY(6NF1{K1 K2} XUNION 6NF2{K1 K2});

You said something about "the designer may have _chosen_ 6NF", but of
course, if my constraint A1A2ALWAYSPAIRED simply does not apply, then
there wasn't any "choice" to make to begin with.

Erwin

unread,

Oct 1, 2010, 2:26:27 PM10/1/10

to

On 1 okt, 19:54, Brian <br...@selzer-software.com> wrote:

>
> > Erwin wrote:
>
> > > Explain "cyclical referential constraint".
>
> > > And explain "additional attribute".
>

> The functional dependency AB -> CD is lost. If that dependency were
> invalid in the 5NF schema, {A,B,C,D} KEY {A,B}, then it would not be
> in 5NF because that dependency is implied by the key, so the correct
> 5NF schema would be,
>
> {A,B,C} KEY {A,B}, {A,B,D} KEY {A,B},
>
> which just happens to also be in 6NF.
>
> In order for the functional dependency AB -> CD to be preserved, a
> cyclical inclusion dependency
>
> {A,B,C}[A,B] = {A,B,D}[A,B]

If you care to explain, my question amounted to "why do you qualify
this as 'cyclical' ?".

And you also spoke of "additional attribute". Please point out to me
where this "additional attribute" is in your very own answer ...

Erwin

unread,

Oct 1, 2010, 2:41:25 PM10/1/10

to

> and it's more fundamental than the GUI.- Tekst uit oorspronkelijk bericht niet weergeven -

>
> - Tekst uit oorspronkelijk bericht weergeven -

Well, one column with three allowable values is a possible and by no
means "definitely invalid" way to do it, but at any rate it makes the
issue off-topic, because the original topic was "boolean attributes",
and "three possible values" cannot possibly be represented by a single
boolean. Note that since "question unanswered" is very explicitly
made part of the business domain, it is not invalid in this case to
include in the database explicit assertions of absence.

Three relvars (instead of the theoretical four - because no known
answer can exist if an answer is asserted to not exist) are another
possible and by no means "definitely the only" way to do it. It's the
one I prefer, but I might reconsider this position if I have to apply
this technique for thousands and thousands of questions in a survey
(thousands and thousands of relvar names to remember, you know).

One relvar with two boolean attributes (a nullable boolean in an SQL
table would essentially boil down to the same, if SQL supported it) is
yet another way to do it that is not invalid per se, but it's the
option I would only resort to as a last refuge.

Tegiri Nenashi

unread,

Oct 1, 2010, 3:37:50 PM10/1/10

to

On Oct 1, 11:41 am, Erwin <e.sm...@myonline.be> wrote:
> ... but I might reconsider this position if I have to apply

> this technique for thousands and thousands of questions in a survey
> (thousands and thousands of relvar names to remember, you know).

One thing that makes a user more comfortable with attributes is that
for each relation a set of its attributes is exhibited naturally. It
is certainly more challenging to find a relation of interest, than an
attribute in a relation in poorly designed database, where dictionary
is no help, and one have to guess relation names. Compare it to
properly designed database, where one just looks up all the foreign
keys from master entity table, e.g. Cars <- CarsWithImmobiliser

Erwin

unread,

Oct 1, 2010, 4:16:25 PM10/1/10

to

Yes. That's why I said what you quoted.

Gene Wirchenko

unread,

Oct 1, 2010, 5:29:31 PM10/1/10

to

On Wed, 29 Sep 2010 05:48:49 -0700 (PDT), Erwin <e.s...@myonline.be>
wrote:

[snip]

>It is, imo, _NOT_ fundamental in the context of actual database
>design. In fact, I think that the justifiable cases for including a
>BOOLEAN in an actual database design (not talking of derived relvars
>aka views) are few and far between, if existant at all. I think that
>is precisely what VT was talking of: the RM already has a way for
>representing truth information (as the presence/absence of some tuple
>in some relvar with some particular predicate), and as a consequence
>the type BOOLEAN (_WITHIN DATABASE RELVARS_) must be considered
>redundant and unnecessary.

I do not want the exponential explosion if I have customers with
boolean attributes. In my client billing system, there are two:
whether the client billing is PDF only and whether they pay by credit
card transaction where we process a payment after a delay if we have
not heard otherwise.

Why would I want to split the client table into four tables?

[snip]

Sincerely,

Gene Wirchenko

Erwin

unread,

Oct 1, 2010, 6:24:34 PM10/1/10

to

On 1 okt, 23:29, Gene Wirchenko <ge...@ocis.net> wrote:
>
> I do not want the exponential explosion if I have customers with
> boolean attributes. In my client billing system, there are two:
> whether the client billing is PDF only and whether they pay by credit
> card transaction where we process a payment after a delay if we have
> not heard otherwise.
>
> Why would I want to split the client table into four tables?
>
> [snip]
>
> Sincerely,
>
> Gene Wirchenko

OK. I admit that on the sheer face of it, your argument seems right.

But I propose we get a look _beyond_ the "face of it".

If you speak about "whether client billing is PDF only", you are
already suggesting that "client billing" really isn't just a boolean.
In particular, you are suggesting that there really are other billing
modes than "PDF", and that there may exist clients who want "PDF +
postal mail", that there are other clients who want "postal mail
only", and (for the humorous part of it) "no billing at all".

I propose you consider a design with a relvar that simply records
"client wants PDF billing", another one that records "client wants
postal mail billing", a constraint "client must express at least one
billing mode".

With such a design, if newly arising technology allows for "novel
billing modes", then all you have to do to your database is add
another relvar, plus include that relvar in the union that is part of
the constraint.

Similar observations apply to "customer's preferred mode of payment".

When I was a "young Turk" in IT land, I formulated my own design
principle : "codes are poor, entities are rich". What I meant by that
is that "A boolean is a piece of 'codified' information, and therefore
it doesn't convey much information after all. A separate entity (ER
terminology, relational equivalent being 'a separate relvar' holding
particular tuples) allows for specifying WHY that boolean piece of
information happens to be true.".

IOW: if a boolean is true, then somewhere deeper down inside, _THERE
IS ALWAYS A REASON WHY_, and _KNOWING_ that 'reason why' is _ALWAYS
MORE AND BETTER_ than just 'knowing that it is so'.

Oh, and I am very much aware of the case where legislation does not
allow you to actually be 'aware' of 'the reason why'.

Brian

unread,

Oct 1, 2010, 6:39:31 PM10/1/10

to

> where this "additional attribute" is in your very own answer ...- Hide quoted text -

>
> - Show quoted text -

Suppose you have a 5NF schema R {A,B,C,D,E} KEY {A,B}

The inclusion dependency equivalent 6NF schema,
Rc {A,B,C} KEY {A,B},
Rd {A,B,D} KEY {A,B},
Re {A,B,E} KEY {A,B},
Rc[AB] = Rd[AB] AND Rc[AB] = Re[AB]

can also be written

Rc[AB] IN Rd[AB] AND Rd[AB] IN Re[AB] AND Re[AB] IN Rc[AB]

Notice the cyclical nature of the constraint:

Rc references Rd, Rd references Re and Re references Rc

When there are only two relation schemata, the cyclical nature of the
constraint doesn't stand out like it does when there are three or
more.

In this example, there is an additional attribute in the 5NF relation
schema then in the example I used in my last post. Note that no
additional constraints are needed that aren't implied by the key.
Notice further that the equivalent 6NF schema has not only an
additional relation schema, Re {A,B,E} KEY {A,B}, but also an
additional cyclical constraint, Rc[AB] = Re[AB].

Erwin

unread,

Oct 1, 2010, 7:16:57 PM10/1/10

to

So set equality is, by definition, "cyclic" ?

If A and B are sets, then A=B implies both A in B && B in A.

> When there are only two relation schemata, the cyclical nature of the
> constraint doesn't stand out like it does when there are three or
> more.

As I have shown above, it stands out EXACTLY AS OBVIOUS when there are
less than three.

Oh yes, and of course A in A && A in A is also a "cyclic" construct ?

My God, man, shut the fuck up.

Brian

unread,

Oct 1, 2010, 10:58:26 PM10/1/10

to

Huh? I though we were talking about inclusion dependencies. It is
not necessary that the lists of attributes X and Y in the arbitrary
inclusion dependency,

P[X] IN Q[Y]

be identical. All that is required is that they be the same length.

Since the lists need not be identical, the projections over those
lists need not be identical for the constraint to be satisfied.

The expession

P[X] = Q[Y]

is not an application of the set equality predicate: it is just a
shortcut for the pair of inclusion dependencies,

P[X] IN Q[Y] AND Q[Y] IN P[X].

Hugo Kornelis

unread,

Oct 2, 2010, 5:41:37 AM10/2/10

to

On Fri, 1 Oct 2010 07:51:37 -0700 (PDT), Paul Mansour wrote:

>Erwin, suppose then that we do want to explicity take "unknown" into
>account with two checkboxes as you say, or I think better yet, with
>three mutually exclusive radio buttons, to explicity specifiy Yes,
>No, and Unknown or Male,Female and Unkown, or what have you. How then
>would you model this? Is this a column with three allowable values,
>or a column with two values where no corresponding row indicates the
>third option? Alternatively, there could be just two radio buttons
>indicating, say, Male and Female, and neither one is pre-selected, so
>it is clear whether or not the question was answered. Same question,
>how is this then modeled? I think Tony Andrews has a good question,
>and it's more fundamental than the GUI.

The most logical choices, IMO, are:

1) A nullable column with two possible values. Whether boolean or
CHAR(n) and a CHECK constraint for the legal values is immaterial; any
column with two possible values is basically a boolean in disguise
(except that the DBMS doesn't hand you the algebraic operations on a
platter, you have to explicitly code the ones you need).

2) For the NULL haters, that same column (but now not nullable) in a
seperate table, so you can avoid the NULL by omitting the row there.

(And of course, the NULL haters who also don't like extra tables will
simply add a third value to the CHECK constraint and add the extra logic
in their queries to make that third value act like NULL, so they can
think that they have avoided the issue by simply renaming it).

Best, Hugo

Paul Mansour

unread,

Oct 2, 2010, 7:27:50 AM10/2/10

to

Erwin, thanks for your response to my previous questions. I was
trying to get my head around why if it is a good idea to split a
column with two potential values into two separate relvars, it would
not also be a good idea to put a column with n potential values into n
separate relvars, when, in your response to Gene Wirchenko, you
suggest just such a design, on the grounds that it is then easy to add
new values, by simply adding a new relvar. What does such a design
possibly get you over a single column (perhaps with a lookup table)
with n possible values whether in the main entity table or off in its
own table to avoid nulls? Furthermore it takes a schema change to add
a new category, rather than simply adding a new row to a lookup
table. So now you are suggesting that not only boolean columns are
bad design but even columns with 3 or more unique values. Is there a
number (say 25?), where you just say to heck with it, I'll go to a
single table? I must be missing something.

Paul Mansour

unread,

Oct 2, 2010, 7:52:38 AM10/2/10

to

On Oct 2, 5:41 am, Hugo Kornelis

This is how I thought it would be done. And I wouldn't have thought it
would be controversial.

You write above that “any column with two possible values is basically
a boolean in disguise”. Contrast this a comment by Erwin above.
Erwin writes about Gene’s post that in Genes’s case, his Boolean
“really isn't just a Boolean”.

This highlights something I was thinking about during this discussion.
I wonder is it useful to make the distinction between truly Boolean
and superficially Boolean? Consider what was I think Vadim’s original
examply, IsCreditWorthy – truly a Boolean as it divides the world into
two parts, credit worthy and not credit worthy (and as Vadim notes,
computable, so you don’t need a column for it anyway). You can’t
simply add another category without changing the meaning of the first
two categories. Now consider a column Size with the two allowable
values: “Small” and “Large”. Maybe this is just superficially
Boolean? “Medium” is probably just around the corner, even if it was
not part of the original requirements of the system. We can add it
without changing the name of the column, or changing the meaning of
the two existing values. Perhaps then if a column is superficially
Boolean it should not be treated any differently than a column with n
potential values. Or maybe I’m just talking nonsense.

Erwin

unread,

Oct 2, 2010, 9:18:33 AM10/2/10

to

If you abstract away all the differences, then after that everything
is the same.

Being a two-valued type does not suffice to be a boolean.

Stating things like " 'x implies y' is equivalent to 'not y implies
not x' " is sensible if x and y are truth-valued, but not if x and y
are, say, gender-valued.

CAN one model gender as a boolean ? Of course one can. Is it also a
good idea ? I don't think so. It's circumlocution to say "it is
false that z is a man".

CAN one include booleans in base relvars ? Of course one can. Is it
also a good idea ? I don't think so. It's circumlocution to say,
within a proposition represented by the tuple in which the boolean
appears, "and it is the case/not the case that ...".

I think that if such circumlocution is built into the database
structure, then the expressions we need to write to manipulate the
data will expose such circumlocution too.

I don't think I said something else anywhere in this thread.

Bob Badour

unread,

Oct 2, 2010, 9:42:50 AM10/2/10

to

Hugo Kornelis wrote:

Hi Hugo,

What makes either of those choices logical?

Gene Wirchenko

unread,

Oct 4, 2010, 5:49:44 PM10/4/10

to

On Fri, 1 Oct 2010 15:24:34 -0700 (PDT), Erwin <e.s...@myonline.be>
wrote:

>On 1 okt, 23:29, Gene Wirchenko <ge...@ocis.net> wrote:

>>
>> I do not want the exponential explosion if I have customers with
>> boolean attributes. In my client billing system, there are two:
>> whether the client billing is PDF only and whether they pay by credit
>> card transaction where we process a payment after a delay if we have
>> not heard otherwise.
>>
>> Why would I want to split the client table into four tables?
>>
>> [snip]

>OK. I admit that on the sheer face of it, your argument seems right.

>
>But I propose we get a look _beyond_ the "face of it".

In your proposal below, you have neglected to tell me why I
should split the client table into four tables.

If I have to split the client table, I have to rewrite a lot of
other programs. If I split again, I get to do it again. Is there
some benefit to this? I do not see it.

>If you speak about "whether client billing is PDF only", you are
>already suggesting that "client billing" really isn't just a boolean.
>In particular, you are suggesting that there really are other billing
>modes than "PDF", and that there may exist clients who want "PDF +
>postal mail", that there are other clients who want "postal mail
>only", and (for the humorous part of it) "no billing at all".

Ah, no. There are semantics that you do not realise.

All clients billing generates PDFs. We keep an archive of them
for internal use. If a client wants to be billed only by PDF, then we
do not print anything on paper. If he does not, then we do. There
are only two choices.

>I propose you consider a design with a relvar that simply records
>"client wants PDF billing", another one that records "client wants
>postal mail billing", a constraint "client must express at least one
>billing mode".

It does not happen. If it did, I would make a change. Actually,
since this is a client billing system, we would probably bill for
that.

>With such a design, if newly arising technology allows for "novel
>billing modes", then all you have to do to your database is add
>another relvar, plus include that relvar in the union that is part of
>the constraint.

So what?

I will also have to implement the use of the "newly arising
technology". That will probably be far more complex than a simple
table structure change.

>Similar observations apply to "customer's preferred mode of payment".

Ditto with me.

>When I was a "young Turk" in IT land, I formulated my own design
>principle : "codes are poor, entities are rich". What I meant by that
>is that "A boolean is a piece of 'codified' information, and therefore
>it doesn't convey much information after all. A separate entity (ER
>terminology, relational equivalent being 'a separate relvar' holding
>particular tuples) allows for specifying WHY that boolean piece of
>information happens to be true.".

In these cases, call it whim. The system need not concern itself
with why.

>IOW: if a boolean is true, then somewhere deeper down inside, _THERE
>IS ALWAYS A REASON WHY_, and _KNOWING_ that 'reason why' is _ALWAYS
>MORE AND BETTER_ than just 'knowing that it is so'.

Not in this case. The system can not derive the value itself.

>Oh, and I am very much aware of the case where legislation does not
>allow you to actually be 'aware' of 'the reason why'.

Not applicable here.

This is a mature application. Great changes in the way that
things are done are rather unlikely. If my boss says that the frog
can only jump one of two ways, I will use a boolean. If the frog gets
more dextrous in the future, I will deal with it then. Our frogs must
be slackers, because I do not have to do this very often. My
congratulations on the success of your frog-breeding program.

Sincerely,

Gene Wirchenko

Hugo Kornelis

unread,

Oct 4, 2010, 6:19:46 PM10/4/10

to

On Sat, 2 Oct 2010 04:52:38 -0700 (PDT), Paul Mansour wrote:

(snip)

>You write above that “any column with two possible values is basically
>a boolean in disguise”. Contrast this a comment by Erwin above.
>Erwin writes about Gene’s post that in Genes’s case, his Boolean
>“really isn't just a Boolean”.

Hi Paul,

Upon rereading, I must admit that my statement was too bold. Not any
two-valued domain is boolean. Whe I wrote this, I was thinking of
examples I have come across where a two-values domain was indeed a
boolean in disguise, but I forgot some other cases, such as gender.

The domain gender has two possible values, male and female. But in order
to qualify as "boolean in disguise", it also has to support boolean
operators. I can define NOT (NOT male = female, and vice versa), but I
am stuck at AND and OR. What should the result of "male AND female" be?
I can think of a lot of possible answers, but none of them falls in the
domain {male, female}.

(Interesting - Erwin mentions the same gender domain in his reply, and a
different but related test: he usees the implies operator rather than
the and and or operators.)

Of course, one could model gender as an "is_female" attributem, which
would be boolean. And when this attribute is implemented as a character
column with its domain limited to 'yes' and 'no', it would be a boolean
is disguise.

>This highlights something I was thinking about during this discussion.
>I wonder is it useful to make the distinction between truly Boolean
>and superficially Boolean? Consider what was I think Vadim’s original
>examply, IsCreditWorthy – truly a Boolean as it divides the world into
>two parts, credit worthy and not credit worthy (and as Vadim notes,
>computable, so you don’t need a column for it anyway). You can’t
>simply add another category without changing the meaning of the first
>two categories. Now consider a column Size with the two allowable
>values: “Small” and “Large”. Maybe this is just superficially
>Boolean? “Medium” is probably just around the corner, even if it was
>not part of the original requirements of the system. We can add it
>without changing the name of the column, or changing the meaning of
>the two existing values. Perhaps then if a column is superficially
>Boolean it should not be treated any differently than a column with n
>potential values. Or maybe I’m just talking nonsense.

Hmmmm, no. I don't think this is nonsense. I think this is yet another
way to distinguish domains that just happen to have two values from
domain that of course have two values because they're booleans in
disguise.
If, as a result of some freaky genetic coincidence, children with a
third possible gender start being born tomorrow, the "gender" column
needs to have its domain expanded to a third value. But the "is_female"
column can remain unchanged - a person is still either a female, or not
(though in the latter case, one could now no longer automatically assume
that the not female person is a male). The pseudo-boolean still has two
values even after such a drastic change; the non-boolean does get a
third possible value.

Best, Hugo

Hugo Kornelis

unread,

Oct 4, 2010, 6:36:14 PM10/4/10

to

On Sat, 2 Oct 2010 06:18:33 -0700 (PDT), Erwin wrote:

>If you abstract away all the differences, then after that everything
>is the same.
>
>Being a two-valued type does not suffice to be a boolean.
>
>Stating things like " 'x implies y' is equivalent to 'not y implies
>not x' " is sensible if x and y are truth-valued, but not if x and y
>are, say, gender-valued.

Hi Erwin,

True. I have just acknowledged this in my reply to Paul.

>CAN one model gender as a boolean ? Of course one can. Is it also a
>good idea ? I don't think so. It's circumlocution to say "it is
>false that z is a man".
>
>CAN one include booleans in base relvars ? Of course one can. Is it
>also a good idea ? I don't think so. It's circumlocution to say,
>within a proposition represented by the tuple in which the boolean
>appears, "and it is the case/not the case that ...".

I disagree. It is my conviction that a database should be modeled to
store the information that a business needs - not less (of course), but
also definitely not more. Designers in general and database designers in
specific too often try to cram more in the database than the users
really need and want.

Consider a company that sells to both natural persons and businesses. A
lot of commercial mailings go to only the natural persons, so they need
an attribute to store whether a customer is a natural person. I would
choose a boolean attribute "is_natural_person". You would call that
circumlocution and choose an attribute "person_type", with "natural
person" or "business" as its domain.

But now the company finds a new market. Two more types of customers
start buying - government and charity organisations. Mailings are still
for natural persons only.
Your model changes: the domain for the "person_type" attribute gets two
extra values. My model remains unchanged, as a customer still can only
be or not be a natural person.
This domain change is of course only a minor change. The major change
falls on the users' shoulders. Now, for all the customers that will
never get commercial mailings, some user will have to find out if they
are business, government or charity - in order to populate a column that
was added for the sole purpose of these commercial mailings.

If all the users are interested in is whether it is or isn't true that a
customer is a private person, then it is in no way circumlocution to
represent this in the database with a boolean that represents just that
and nothing else.

Best, Hugo

Hugo Kornelis

unread,

Oct 4, 2010, 6:53:28 PM10/4/10

to

Hi Bob,

Sometimes, the simplest looking questions work out to be the hardest to
answer. It all made so much sense to me when I wrote this (and still
does), that I find it hard to explain why this appears logical to me.

Pauls question was about how to model a situation where it is required
to represent three possibilities: male, female, or gender unknown.

My first observation is that these three possibilities do not come from
a single domain. Male and female are values from the domain of genders;
unknown is not. So it would be wrong to define a domain with three
values to represent the three possibilities.

But if unknown is not in the same domain as male and female, then what
is it? Assuming that Pauls hypothetical example is about gender data, it
is not a data value, but a condition where no data value is present. The
absence of any data value. Which is almost a verbatim quote from the
definition of "the null value" in the ANSI standard for SQL.

Now, NULL is not the same as unknown. NULL is the absence of any data
value. Unknown is the absence of any data value for a specific reason.
Unknown data should in an SQL database always be represented as NULL,
but a NULL does not always represents unknown data. However, in the case
proposed by Paul, I know that there are only three possible states:
male, female, or gender unknown. So in this specific case, a NULL does
indeed represent gender unknown, since there are no other scenarios that
allow the data to be missing.

With these observations, the rest seems obvious to me. Unknown can not
be represented in the same domain as male and female, but NULL can
(since NULL is part of any domain in an SQL database). The domain now
consists of male, female and NULL (representing the situation where the
gender is unknown). This domain quickly maps to a data type and a CHECK
constraint to allow only two values, and no NOT NULL constraint.

That's the logic of the first choice. But I know that this group has a
lot of visitors who feel that NULL should not be in a relational
database. Their response when challenged on how to represent missing
data is always to break optional attributes into seperate tables, in a
1:(0-1) relationship with the original table, where the attribute now is
mandatory. That simple step to avoid NULLs in tables is what brought me
to the second choice.

Best, Hugo

Paul Mansour

unread,

Oct 4, 2010, 8:57:07 PM10/4/10

to

> single table? I must be missing something.- Hide quoted text -

>
> - Show quoted text -

Erwin, I wrote above "I must be missing something" and I certainly
was! It wasn't until reading Gene's response that I realized you were
not specifiying unique values, but allowing for one or more values. I
did a careless reading of your original post. Sorry about that.

Brian

unread,

Oct 5, 2010, 5:06:21 PM10/5/10

to

On Oct 4, 6:36 pm, Hugo Kornelis

Here's another example for the null-haters. Suppose that P{A,B,C} is
a table with key {A,B} that permits C to be null. To eliminate the
null, it must first be determined whether there should be a value for
C or not--in other words, is each NULL in column C an applicable NULL,
or an inapplicable NULL. So now you have P1{A,B,D} where D is a
boolean indicating whether there should be a value for C, and
P2{A,B,C} where C cannot be NULL, and P2[AB] IN P1[AB]. If you also
need to eliminate D, then you need P1{A,B}, P2{A,B} and P3{A,B,C},
with P2[AB] IN P1[AB] and P3[AB] IN P2[AB]. So if you disallow
boolean types too, you would need three tables and two interrelational
constraints to eliminate nulls from just one column! Moreover, the
resulting schema may be in violation of POOD, since P1 and P2 have
exactly the same heading!

Brian

Keith H Duggar

unread,

Oct 5, 2010, 6:33:46 PM10/5/10

to

On Oct 4, 6:36 pm, Hugo Kornelis
<h...@perFact.REMOVETHIS.info.INVALID> wrote:

So let's see. In your database the user's code changes from

where R.is_natural_person = true
to
where R.is_natural_person = true

ie no change. And in the "other database it changes from

where R.person_type = natural
to
where R.person_type = natural

wow, yeah, I see what you mean. That is a huge difference! Not.
For real man, enumerated types are a tried and true model that
people find makes code /better/ not worse than only bools.

By the way, the evaluation of whether a commercial mailing should
be sent should be encapsulated into a predicate function. So in a
/good/ design a client would write neither of the above. Instead
they would write

where sendCommericialMailing(R)

and sendCommericialMailing is simple to implement in both models.

> If all the users are interested in is whether it is or isn't true that a
> customer is a private person, then it is in no way circumlocution to
> represent this in the database with a boolean that represents just that
> and nothing else.

They are not interested in whether they are "private person" (why
are you changing terms?). They are interested in whether they need
to send a commercial mailing. Furthermore, when their world expands
they will be interested in meeting new requirements. That is going
to be easier with the enumerated type.

KHD

Keith H Duggar

unread,

Oct 5, 2010, 6:43:32 PM10/5/10

to

On Oct 5, 5:06 pm, Brian <br...@selzer-software.com> wrote:
> Here's another example for the null-haters. Suppose that P{A,B,C} is
> a table with key {A,B} that permits C to be null. To eliminate the
> null, it must first be determined whether there should be a value for
> C or not--in other words, is each NULL in column C an applicable NULL,
> or an inapplicable NULL. So now you have P1{A,B,D} where D is a
> boolean indicating whether there should be a value for C, and
> P2{A,B,C} where C cannot be NULL, and P2[AB] IN P1[AB]. If you also
> need to eliminate D, then you need P1{A,B}, P2{A,B} and P3{A,B,C},
> with P2[AB] IN P1[AB] and P3[AB] IN P2[AB]. So if you disallow
> boolean types too, you would need three tables and two interrelational
> constraints to eliminate nulls from just one column! Moreover, the
> resulting schema may be in violation of POOD, since P1 and P2 have
> exactly the same heading!

Let's rephrase the above in plain language:

Suppose we throw trash all around the inside of our house. In order
to eliminate the trash we'll need to determine whether each item in
the house is trash, non-trash, or smelly-trash. In other words, we
are going to attach a note to each item. But these notes are just
more trash! If you disallow notes then we will need a second house
(or at least storage facility) into which we can move either the
trash or the non-trash. But what about the smelly-trash?! It has the
same look as normal trash!

In other words, your post was truly stupid.

KHD

Brian

unread,

Oct 5, 2010, 7:12:12 PM10/5/10

to

I never said anything about trash. NULLs are not trash, they're
indicators: no value here. Eliminating each requires not only
splitting off the nullable column into a separate relation but also
introducing a boolean indication to distinguish between inapplicable
nulls and applicable nulls. The bottom line: if you're a null hater
and a boolean hater, then you need two additional tables to eliminate
each nullable column, along with two additional interrelational
contraints. Are you denying that?

Hugo Kornelis

unread,

Oct 6, 2010, 2:41:45 AM10/6/10

to

On Tue, 5 Oct 2010 16:12:12 -0700 (PDT), Brian wrote:

(snip)

>I never said anything about trash. NULLs are not trash, they're
>indicators: no value here. Eliminating each requires not only
>splitting off the nullable column into a separate relation but also
>introducing a boolean indication to distinguish between inapplicable
>nulls and applicable nulls. The bottom line: if you're a null hater
>and a boolean hater, then you need two additional tables to eliminate
>each nullable column, along with two additional interrelational
>contraints. Are you denying that?

Hi Brian,

I have no idea where this extra relation comes from.

Yes, it is true that, in general, a NULL value can have various reasons
(and that goes beyond "applicabe/inmapplicable", so a simple boolean
would not do). In practice, a specific column can often only be NULL
for one of those reasons, so that it is not required to store the
reason; the reason is metadata that should be stored in the data
dictionary, not in the data.

There are also cases where more than one reasons may apply to the NULLs
in a single column. In those cases, there are two possibilities. Either
the reason why the value is missing is important to the business, or it
is not. If it is not, then we only need the nullable column in the
design with NULLs and nothing else - and when transforming this to the
NULL-less design, there is no need at all to suddenly introduce an extra
indication to store information the business is not interested in.

And if the distinction between the various reasons for missing
information IS important to the business, then we would have had to add
another column in the design with NULLs. After all, "birthdate" and
"reason birthdate is not recorded" are two completely different
propositions, and any attempt to combine them in a single column would
be a serious violation of 1NF. (This, by the way, is the fallacy in
Date's arguuments in his "Much ado about Nothing" debate with Codd
(http://www.dbdebunk.com/page/page/1706814.htm), which Codd
unfortunately did not see through). So in the case where the business
does want to record the reason for a missing attribute value, the only
correct way to model this is to include an extra column - one that is
also nullable, and that is constrained to be null when the attribute
column is not null, and to be not null when the attribute column is
null. Converting THIS schema to a NULL-less design will of course
require the addition of two extra tables - but since they originate in
two nullable columns, the ratio of (at most) one additional table for
each nullable column still applies.

So yes: I am denying that more than one table is ever required to
eliminate a nullable column - with or without the boolean data type.

Best, Hugo

Hugo Kornelis

unread,

Oct 6, 2010, 3:02:34 AM10/6/10

to

On Tue, 5 Oct 2010 15:33:46 -0700 (PDT), Keith H Duggar wrote:

>> This domain change is of course only a minor change. The major change
>> falls on the users' shoulders. Now, for all the customers that will
>> never get commercial mailings, some user will have to find out if they
>> are business, government or charity - in order to populate a column that
>> was added for the sole purpose of these commercial mailings.
>
>So let's see. In your database the user's code changes from
>
> where R.is_natural_person = true
>to
> where R.is_natural_person = true
>
>ie no change. And in the "other database it changes from
>
> where R.person_type = natural
>to
> where R.person_type = natural
>
>wow, yeah, I see what you mean. That is a huge difference! Not.

Hi Keith,

Interesting world you're living in, where users get to change the
application code. Do you then let your database developers research
whether each customer who is not a natural person is a company, charity
or government and enter that information in the database?

>For real man, enumerated types are a tried and true model that
>people find makes code /better/ not worse than only bools.

No disagreement here - in cases where several types are relevant to the
business, so that an enumeration makes sense.

>By the way, the evaluation of whether a commercial mailing should
>be sent should be encapsulated into a predicate function. So in a
>/good/ design a client would write neither of the above. Instead
>they would write
>
> where sendCommericialMailing(R)
>
>and sendCommericialMailing is simple to implement in both models.

Not really. The code above would work in a database that supports the
boolean data type; other databases would throw a syntax error; you'd
have to code around the limitation imposed by the missing data type by
having the function return (e.g.) a CHAR(1) value, and then change the
predicate to

WHERE sendCommercialMailing(R) = 'T'

So thanks for making my point - you have given me yet another good
reason to endorse the inclusion of a boolean data type in databases.

>> If all the users are interested in is whether it is or isn't true that a
>> customer is a private person, then it is in no way circumlocution to
>> represent this in the database with a boolean that represents just that
>> and nothing else.
>
>They are not interested in whether they are "private person" (why
>are you changing terms?).

Sorry for the shift in terms from "natural person" to "private person";
English is not my first language. I had to google some terms to write
that post, and while writing I accidentally switched terms.

> They are interested in whether they need
>to send a commercial mailing. Furthermore, when their world expands
>they will be interested in meeting new requirements. That is going
>to be easier with the enumerated type.

Yes, you are right. Requirements may change. Who knows, maybe next year
management will decide that income, shoe size, or eye color will be
relevant for deciding who to send a commercial mailing. Meeting those
new requirements will be much easier when those attributes are already
in the database. Sure, it's a bit more work for the tens or hundreds of
employees who actually USE the database to enter data of new customers,
and it'll be some more maintenance of the application code, but that is
only a small price to pay for the savings when management does decide to
add shoe size as a deciding factor for mailings.

Sounds silly to you? I sure hope so!
But what exactly is the difference between the above scenario, and havng
end users find out and enter "company / charity / government" if the
business is only interested in whether a customer is a natural person or
not?

Best, Hugo

Erwin

unread,

Oct 6, 2010, 5:13:56 AM10/6/10

to

On 6 okt, 09:02, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

> Hi Keith,
>
> Interesting world you're living in, where users get to change the
> application code.

It must be understood that the "user" is the _DBMS_ user.

That is, the person that you presumably call the "end user" ONLY in
the case where that end user has direct interaction with the DBMS,
through a query console or so.

In the other cases, the DBMS user is the application developer writing
the code of the programs that will be accessing the database.

This notion is very clearly explained in the introductory chapters of
"Introduction to Database Systems". 8th edition, but I suspect this
is one of those pieces of text in the book that have been present ever
since edition 1.

> So thanks for making my point - you have given me yet another good
> reason to endorse the inclusion of a boolean data type in databases.

You are conflating "inclusion of booleans in databases (/database
designs)" with "inclusion of booleans in data manipulation languages
that access/interact with databases".

I thought the initial subject was the former. You are debating the
latter.

Erwin

unread,

Oct 6, 2010, 5:44:08 AM10/6/10

to

On 5 okt, 00:53, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

> On Sat, 02 Oct 2010 10:42:50 -0300, Bob Badour wrote:
>
> >What makes either of those choices logical?
>
> Hi Bob,
>

> My first observation is that these three possibilities do not come from
> a single domain. Male and female are values from the domain of genders;
> unknown is not. So it would be wrong to define a domain with three
> values to represent the three possibilities.
>
> But if unknown is not in the same domain as male and female, then what
> is it?

From domain boolean ?

Even Brian acknowledged that NULLs are (what he called) "indicators",
or iow, boolean flags, or yet iow, 2VL truth values. They say whether
data is really present yes or no. That sounds very boolean to me.

Now guess what. Would the fact that I think it is better to eliminate
booleans from a logical database design by vertical decomposition,
have anything to do with the fact that I also think it is better to
eliminate nulls from a logical database design by vertical
decomposition ?

> That's the logic of the first choice.

I think you missed the point about 'logic'.

Hugo Kornelis

unread,

Oct 6, 2010, 7:43:43 AM10/6/10

to

On Wed, 6 Oct 2010 02:13:56 -0700 (PDT), Erwin wrote:

>On 6 okt, 09:02, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>wrote:
>> Hi Keith,
>>
>> Interesting world you're living in, where users get to change the
>> application code.
>
>It must be understood that the "user" is the _DBMS_ user.
>
>That is, the person that you presumably call the "end user" ONLY in
>the case where that end user has direct interaction with the DBMS,
>through a query console or so.
>
>In the other cases, the DBMS user is the application developer writing
>the code of the programs that will be accessing the database.
>
>This notion is very clearly explained in the introductory chapters of
>"Introduction to Database Systems". 8th edition, but I suspect this
>is one of those pieces of text in the book that have been present ever
>since edition 1.

Apologies for the terminology mismaatch, then. In my world, the "user"
is the one who uses an application. One who develops an application, I
call (who boring) a "developer". This terminology mismatch supports my
idea that many database developers tend to forget who they develop for.
(And who pays their salary).
Too many ideas vented by database developers (and database development
academics) focus on what's good for the database developer rather than
what's good for the peron I call the "end user" (you forgot to tell me
how you call him or her). But let's not drift too far off-topic.

>> So thanks for making my point - you have given me yet another good
>> reason to endorse the inclusion of a boolean data type in databases.
>
>You are conflating "inclusion of booleans in databases (/database
>designs)" with "inclusion of booleans in data manipulation languages
>that access/interact with databases".
>
>I thought the initial subject was the former. You are debating the
>latter.

The initial subject was the former, and I think I presented lots of
arguments for including boolean as a data type in databsae designs.

It is also my experience that most database implementations either allow
the boolean datatype in DDL *and* allow boolean-valued expressions as
predicates in queries, or disallow the boolean datatype in DDL *and*
disallow boolean-valued expressions as predicates in queries.

But you are right, that connection is not necessary. It would be
possible for an implementation to allow a user-defined function to
return a boolean data type, yet not offer the option to use that same
datatype in column declarations. (Though I would not like such an
exception-ridden implementation - but again, let's not stray off-topic).

Best, Hugo

Hugo Kornelis

unread,

Oct 6, 2010, 7:50:23 AM10/6/10

to

On Wed, 6 Oct 2010 02:44:08 -0700 (PDT), Erwin wrote:

>On 5 okt, 00:53, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>wrote:
>> On Sat, 02 Oct 2010 10:42:50 -0300, Bob Badour wrote:
>>
>> >What makes either of those choices logical?
>>
>> Hi Bob,
>>
>> My first observation is that these three possibilities do not come from
>> a single domain. Male and female are values from the domain of genders;
>> unknown is not. So it would be wrong to define a domain with three
>> values to represent the three possibilities.
>>
>> But if unknown is not in the same domain as male and female, then what
>> is it?
>
>From domain boolean ?

Eh? I guess I don't understand what you mean. Why bring up the domain
boolean at this point in my reasoning? Either I misunderstand what you
try to say, or you misunderstood what I tried to say.

>Even Brian acknowledged that NULLs are (what he called) "indicators",
>or iow, boolean flags, or yet iow, 2VL truth values. They say whether
>data is really present yes or no. That sounds very boolean to me.

Probably. But I guess Brian kind of lost me when he wrote that a single
nullable column has to be replaced by not one but two seperate tables in
a null-less design.

>Now guess what. Would the fact that I think it is better to eliminate
>booleans from a logical database design by vertical decomposition,
>have anything to do with the fact that I also think it is better to
>eliminate nulls from a logical database design by vertical
>decomposition ?

I think I accomodated for the "NULL haters" by also including the
alternative design with an extra table and no nullable columns.

>> That's the logic of the first choice.
>
>I think you missed the point about 'logic'.

Well, then I'm sure you can explain exactly when and where I missed the
point about logic, how you think I went wrong and what I should have
written.
For if you make only this remark without further arguments, it is just
an unfounded ad-hom attack that adds nothing to the discussion.

Best, Hugo

Erwin

unread,

Oct 6, 2010, 9:19:08 AM10/6/10

to

On 6 okt, 13:50, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

I'm afraid I can't, but I can always try.

Your use of the word 'logic' seemed to me to appeal to the notion of
"common sense". Especially if your use of the word derives from the
Dutch "het is logisch dat ...". Bob's use of the word 'logic'
presumably appealed to the notion of "formal logic", i.e. the
scientific discipline in mathematics. In formal logic, the
proposition "All pink elephants are blue" can be true (if there are no
pink elephants). In common sense, the same proposition is usually
considered nonsense. (I add this example merely to demonstrate that
there is indeed a difference between the two.)

As another example, in your reply you talked of domains. However,
formal logic doesn't necessarily have such a thing as "domains". Only
sorted logic does. And even then, sorted logic does not prescribe
which domains can exist and which can't. Even sorted logic does not
proscribe against a domain such as {black chair, black night, black
woman}. Your reply gave me the impression that you were building on
the assumptions that all considered domains must "make sense" in some
sense.

> For if you make only this remark without further arguments, it is just
> an unfounded ad-hom attack that adds nothing to the discussion.

Huh ? Believe me, ad-hom attacks coming from me are worded quite
differently.

Erwin

unread,

Oct 6, 2010, 9:29:45 AM10/6/10

to

On 6 okt, 13:50, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

> Probably. But I guess Brian kind of lost me when he wrote that a single

> nullable column has to be replaced by not one but two seperate tables in
> a null-less design.

In lots of situations where nulls show up in a design, the reality is
that "missing" information may be "genuinely unknown" (e.g. natural
person is known to have a professional occupation, but which
profession that is, happens to be unknown), or it may be
"inapplicable" (e.g. natural person is known to be unemployed, i.e. is
known to not have any professional occupation at all). Or take
religious belief: "known to have one, but unknown which one it is",
versus "known to not have one".

It's explored in more depth in "How to handle missing information
without using nulls" on www.thethirdmanifesto.com, and in "Database
Explorations", chpt 23. If you have only a single separate relvar for
the "null" case, AND it is the case that "inapplicable" applies, then
the predicate of that separate table must be a disjunction, such that
in the end, if a tuple appears in that table, you still don't _really_
know what that tuple means.

Bob Badour

unread,

Oct 6, 2010, 9:51:18 AM10/6/10

to

Hugo Kornelis wrote:

> On Wed, 6 Oct 2010 02:44:08 -0700 (PDT), Erwin wrote:
>
>>On 5 okt, 00:53, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>>wrote:
>>
>>>On Sat, 02 Oct 2010 10:42:50 -0300, Bob Badour wrote:
>>>
>>>>What makes either of those choices logical?
>>>
>>>Hi Bob,
>>>
>>>My first observation is that these three possibilities do not come from
>>>a single domain. Male and female are values from the domain of genders;
>>>unknown is not. So it would be wrong to define a domain with three
>>>values to represent the three possibilities.
>>>
>>>But if unknown is not in the same domain as male and female, then what
>>>is it?
>>
>>From domain boolean ?
>
> Eh? I guess I don't understand what you mean. Why bring up the domain
> boolean at this point in my reasoning? Either I misunderstand what you
> try to say, or you misunderstood what I tried to say.

Hugo, you misunderstand the simplest of things. I added you to my killfile.

Null is an indicator that evaluates to NULL or NOT NULL. ie. it is a
boolean.

>>Even Brian acknowledged that NULLs are (what he called) "indicators",
>>or iow, boolean flags, or yet iow, 2VL truth values. They say whether
>>data is really present yes or no. That sounds very boolean to me.
>
> Probably. But I guess Brian kind of lost me when he wrote that a single
> nullable column has to be replaced by not one but two seperate tables in
> a null-less design.

Brian makes shit up. It's easier to just filter him and be done with it
citing Date's _Principle of Incoherence_

>>Now guess what. Would the fact that I think it is better to eliminate
>>booleans from a logical database design by vertical decomposition,
>>have anything to do with the fact that I also think it is better to
>>eliminate nulls from a logical database design by vertical
>>decomposition ?
>
> I think I accomodated for the "NULL haters" by also including the
> alternative design with an extra table and no nullable columns.

By "NULL hater", do you perhaps mean "intelligent and informed data
manager" ? After all, the intelligent and informed try to avoid NULLs
like the plague.

>>>That's the logic of the first choice.
>>
>>I think you missed the point about 'logic'.
>
> Well, then I'm sure you can explain exactly when and where I missed the
> point about logic, how you think I went wrong and what I should have
> written.

When you wrote "logical" you meant something else where logic had no
bearing whatsoever. I am not sure whether the options you gave were the
first 2 things that popped into your head or whether they best matched
your limited preconceptions or whether you simply found them
aesthetically pleasing; however, logic had nothing to do with your
decision process.

> For if you make only this remark without further arguments, it is just
> an unfounded ad-hom attack that adds nothing to the discussion.

I asked my question in the hope you would learn to distinguish logic,
the foundation of data management, from internal bias or intuition. I
agree with Erwin that you completely missed the point.

paul c

unread,

Oct 6, 2010, 10:18:34 AM10/6/10

to

On 06/10/2010 2:44 AM, Erwin wrote:
> Even Brian acknowledged that NULLs are (what he called) "indicators",
> or iow, boolean flags, or yet iow, 2VL truth values. They say whether
> data is really present yes or no. That sounds very boolean to me.
>
> Now guess what. Would the fact that I think it is better to eliminate
> booleans from a logical database design by vertical decomposition,
> have anything to do with the fact that I also think it is better to
> eliminate nulls from a logical database design by vertical
> decomposition ?
>

Reminds me of Codd who frequently mentioned 'marks' of various kinds.
Not to put words in his mouth, but that often gave me the feeling he was
straying into the hardware realm, as if they were a hardware signal to
the logical system to 'switch gears'. Many of the machines Codd used in
the 1950's and 1960's had 'variable word' sizes, with reserved byte
configurations that would signal physical exceptions. Some of these
were called 'field marks' and 'word marks' and their use even persisted
into the 1990's on certain IBM terminals. When hardware designers in
their wisdom moved away from such 'modal' features the notion survived,
eg. I think it was Kernighan and Ritchie who put the string delimiter in
the C language.

It was some years after Date's first books before he started to treat
projection precisely. That was when I began to wonder how a projection
operator could possibly deal with a table containing nulls unless it
specified two sets of attributes, both the ones to result and the ones
to be projected away. Either that or (I thought) some exception ought
to be raised, not that I'm in favour of lots of exceptions.

paul c

unread,

Oct 6, 2010, 11:00:16 AM10/6/10

to

On 06/10/2010 6:51 AM, Bob Badour wrote:
> Null is an indicator that evaluates to NULL or NOT NULL. ie. it is a
> boolean.

This makes me want to take another stab at an old point. I think it is
a boolean and how it's constructed within a relational db seems crucial
which I'd say it can't be in any logical way. Maybe this is part of
what what Erwin S means too.

It's reminiscent of how Table_Dee has a maximum cardinality of one,
whereas a made-up domain such as {'Yes', 'No'} is something else when it
is specified by an attribute. To construct NOT Table_Dee, we use a
relation that has no tuples, such as Table_Dum, but that's not the same
thing as imagining that we can record all the values of some boolean
domain in a single relation. I'd say it needs to be more widely
understood that it is the empty relation which means 'NOT' where a
relational db is concerned.

paul c

unread,

Oct 6, 2010, 11:31:57 AM10/6/10

to

On 06/10/2010 6:51 AM, Bob Badour wrote:

> Hugo Kornelis wrote:
>
>> On Wed, 6 Oct 2010 02:44:08 -0700 (PDT), Erwin wrote:
>>
>>> On 5 okt, 00:53, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>>> wrote:
>>>
>>>> On Sat, 02 Oct 2010 10:42:50 -0300, Bob Badour wrote:
>>>>
>>>>> What makes either of those choices logical?
>>>>
>>>> Hi Bob,
>>>>
>>>> My first observation is that these three possibilities do not come from
>>>> a single domain. Male and female are values from the domain of genders;
>>>> unknown is not. So it would be wrong to define a domain with three
>>>> values to represent the three possibilities.
>>>>
>>>> But if unknown is not in the same domain as male and female, then what
>>>> is it?
>>>
>>> From domain boolean ?
>>
>> Eh? I guess I don't understand what you mean. Why bring up the domain
>> boolean at this point in my reasoning? Either I misunderstand what you
>> try to say, or you misunderstood what I tried to say.
>
> Hugo, you misunderstand the simplest of things.

I don't think it's so simple. I know people who think there is a third
sex, maybe even a fourth, and other people who don't. They would all
come up with different domains if they designed a db. Personally, in my
own db, I'd just use {male, notmale}, but I wouldn't call those two
values Boolean. Some people I know would be angry with me for choosing
that domain but it wouldn't bother me, just as it wouldn't bother me if
they chose a domain values of {female, notfemale}. If I assigned sex to
a Boolean 'domain', my db could record only one of the sexes.

Bob Badour

unread,

Oct 6, 2010, 2:03:03 PM10/6/10

to

paul c wrote:

> On 06/10/2010 6:51 AM, Bob Badour wrote:
>
>> Null is an indicator that evaluates to NULL or NOT NULL. ie. it is a
>> boolean.
>
> This makes me want to take another stab at an old point. I think it is
> a boolean and how it's constructed within a relational db seems crucial
> which I'd say it can't be in any logical way. Maybe this is part of
> what what Erwin S means too.
>
> It's reminiscent of how Table_Dee has a maximum cardinality of one,

You misspoke. Table_Dee has a cardinality of exactly one. Table_Dum has
a cardinality of exactly zero. Nilary relations have a maximum
cardinality of one and a minimum cardinality of zero.

Bob Badour

unread,

Oct 6, 2010, 2:05:24 PM10/6/10

to

paul c wrote:

> On 06/10/2010 6:51 AM, Bob Badour wrote:
>
>> Hugo Kornelis wrote:
>>
>>> On Wed, 6 Oct 2010 02:44:08 -0700 (PDT), Erwin wrote:
>>>
>>>> On 5 okt, 00:53, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>>>> wrote:
>>>>
>>>>> On Sat, 02 Oct 2010 10:42:50 -0300, Bob Badour wrote:
>>>>>
>>>>>> What makes either of those choices logical?
>>>>>
>>>>> Hi Bob,
>>>>>
>>>>> My first observation is that these three possibilities do not come
>>>>> from
>>>>> a single domain. Male and female are values from the domain of
>>>>> genders;
>>>>> unknown is not. So it would be wrong to define a domain with three
>>>>> values to represent the three possibilities.
>>>>>
>>>>> But if unknown is not in the same domain as male and female, then what
>>>>> is it?
>>>>
>>>> From domain boolean ?
>>>
>>> Eh? I guess I don't understand what you mean. Why bring up the domain
>>> boolean at this point in my reasoning? Either I misunderstand what you
>>> try to say, or you misunderstood what I tried to say.
>>
>> Hugo, you misunderstand the simplest of things.
>
> I don't think it's so simple. I know people who think there is a third
> sex, maybe even a fourth, and other people who don't.

I don't see what bearing that has on whether the null marker is a
boolean flag.

Brian

unread,

Oct 6, 2010, 2:14:17 PM10/6/10

to

On Oct 6, 2:41 am, Hugo Kornelis

I think you misunderstand what I'm saying: whenever there is no value,
either there isn't supposed to be a value, or there is supposed to be
a value but it hasn't been supplied. In other words, either the
predicate of the atomic formula that expresses the assertion does not
have a place for an individual term that corresponds to the nullable
column name, or there is a place but an individual variable instead of
an individual constant occupies that place. Why there is a variable
instead of a constant in that place doesn't change the fact that the
predicate still has a place for that term. So whether there is
supposed to be a value determines the atom's predicate. It follows
that since it can't be determined just from the absence of a value
whether there is supposed to be a value, it must be explicitly stated
that there is or is not supposed to be a value. If you're only a null
hater, then only two relvars are needed: one with an additional
boolean attribute that specifies for each tuple whether there should
be a value or not, and another relvar for the values that have been
supplied. If you're also a boolean hater, then the relvar with the
boolean attribute would have to be split as well.

Erwin

unread,

Oct 6, 2010, 2:18:27 PM10/6/10

to

On 6 okt, 13:43, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

>
> This terminology mismatch supports my
> idea that many database developers tend to forget who they develop for.
> (And who pays their salary).
> Too many ideas vented by database developers (and database development
> academics) focus on what's good for the database developer rather than
> what's good for the peron I call the "end user" (you forgot to tell me
> how you call him or her). But let's not drift too far off-topic.

When I read this, I couldn't help but be mused at the thought that my
ideas are being objected to on the grounds that it leads to "an
explosion of relvar names" (that the database developer in question is
not willing to have to remember). Whose comfort are those opponents
thinking of ?

(And I find "end user" a good term to denote the end user.)

paul c

unread,

Oct 6, 2010, 3:38:48 PM10/6/10

to

yes, the word 'maximum' here was pointless, albeit not untrue. What is
a 'nilary' relation, don't remember that term, is it one with no attributes?

Bob Badour

unread,

Oct 6, 2010, 4:25:37 PM10/6/10

to

paul c wrote:

Yes. Nilary (perhaps more common as nullary?) is to unary as unary is to
binary and as binary is to ternary.

http://en.wikipedia.org/wiki/Arity

Hugo Kornelis

unread,

Oct 6, 2010, 5:47:56 PM10/6/10

to

On Wed, 6 Oct 2010 11:14:17 -0700 (PDT), Brian wrote:

>I think you misunderstand what I'm saying: whenever there is no value,
>either there isn't supposed to be a value, or there is supposed to be
>a value but it hasn't been supplied.

No I did not misunderstand. Both "there is not supposed to be a value
here" and "there should be a value here but it has not been supplied"
are fine examples of reasons why a value is missing in the database
population. Note that with "missing", I do not mean that there should be
a value but isn't, but just there is no value. (Not being a native
English speaker, I don't know which association is prevalent when
English readers read "missing").

My point is that this distinction is not always relevant. I'll use an
example to explain. Let's say that the Customers table has a nullable
column "Gender", and a CHECK constraint that allows only the values Male
and Female. A NULL in this column can have two causes - either the
customer is not a natural person but a legal person (there is not
supposed to be a value for its gender), or the customer is a natural
person but never specified his/her gender (there is supposed to be a
value but it hasn't been supplied).

Many documents I read on or linked from sites such as dbdebunk.com or
thethirdmanifesto.com make this observation and then assume that,
because there are multiple situations that lead to a missing, one HAS to
somehow distinguish between those situations. But that need not be
necessary; that depends on the business. Only the domain expert can tell
us if, for a specific application, this distinction is important.

If it is, then this has to be modeled as an extra predicate. (Okay, in
this specific example, the extra predicate may not be required, as it
can be derived from the predicate that distinquishes natural persons
from legal persons - if such a predicate even exists in the model. But
in other situations, the extra predicate would be required.) That extra
predicate also has to allow for missing values (as there is no value
applicable for customers where the gender is recorded).

> In other words, either the
>predicate of the atomic formula that expresses the assertion does not
>have a place for an individual term that corresponds to the nullable
>column name, or there is a place but an individual variable instead of
>an individual constant occupies that place.

My background is NIAM - the Dutch counterpart of ORM (Object Role
Modeling). NIAM centers around Fact Types and their "readings". These
concepts are very similar to predicats, but there might be subtle
differences I am unaware of.
In NIAM, a missing fact can not be read. Regardless of why the fact is
missing. For the Fact Type gender, the reading form would be Customer
"The customer identified by customernumber <CustomerNumber> has the
<Gender> gender". For reading an individual fact, you would replace
<CustomerNumber> with the identifier of a customer, and <Gender> with
his or her gender, to get readings such as "The customer identified by
customernumber 125 has the male gender", or "The customer identified by
customernumber 8623 has the female gender". For customer 17 (a legal
person), this fact can not be read, as there is no value in the domain
of <Gender> that forms the reading "The customer identified by
customernumber 17 has no gender". But for customer 229 (a natural person
whose gender is not recorded), we can't read this fact either, as there
is also no value in the domain of <Gender> that forms the reading "The
customer identified by customernumber 222 has a gender but we don't know
which".

Your text above suggests (to me) that predicates in the relational model
accept variables as well as values, so that a predicate that is similar
to the reading form I used above could be instantiated to read "The
customer identified by customernumber 222 has a *variable* gender". If
that is a correct interpretation, it sheds a new light on many things I
have read. Please confirm if I am not misinterpretating you.

> Why there is a variable
>instead of a constant in that place doesn't change the fact that the
>predicate still has a place for that term. So whether there is
>supposed to be a value determines the atom's predicate.

Yes, I get that. This is a problem you always get when you combine
several atomic fact types (falling back to the NIAM terminology) in a
single table. NIAM and ORM avoid this problem by working with elementary
fact types. If there is no information about the gender of a customer,
there is no fact for that customer in the gender fact type.

The FTD notation that prof Nijssen (one of the founders of NIAM)
developed later removes the requirement to work on elementary fact
types, and then solves the predicate problem by supplying multiple
predicates for each FTD (which maps to a table in the relational model).
So an FTD with one optional role (column) would have two predicates, one
that includes this role and one that does not.

And in relational database, the answer is to create seperate tables for
optional attributes, or even to limit tables to at most one non-key
column (which maps right back to the elementary fact types used in ORM
and NIAM).

> It follows
>that since it can't be determined just from the absence of a value
>whether there is supposed to be a value, it must be explicitly stated
>that there is or is not supposed to be a value.

If, and only if, that distinction is relevant to the business. Otherwise
you simply don't record the value and you're done.

> If you're only a null
>hater, then only two relvars are needed: one with an additional
>boolean attribute that specifies for each tuple whether there should
>be a value or not, and another relvar for the values that have been
>supplied. If you're also a boolean hater, then the relvar with the
>boolean attribute would have to be split as well.

I still don't see the leap here, nor the connection with hating or not
hating booleans.
If the business is not interested in recording is a value is absent
because there should not be a value or because it just happens to not be
there, then you need either one nullable column, or one extra relvar,
with no booleans either way.
And if the business does require you to record this distinction than you
either need two nullable columns, or two extra relvars. Again, with no
booleans in either situation.

You imply that it's possible to track both values and (for missing
values) whether a missing value is applicable or not, in just two
relvars with no nulls, but with booleans. I don't see how those relvars
would be designed.

Best, Hugo

Hugo Kornelis

unread,

Oct 6, 2010, 6:20:05 PM10/6/10

to

On Wed, 6 Oct 2010 06:19:08 -0700 (PDT), Erwin wrote:

>On 6 okt, 13:50, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
>wrote:
>> On Wed, 6 Oct 2010 02:44:08 -0700 (PDT), Erwin wrote:
>> >I think you missed the point about 'logic'.
>>
>> Well, then I'm sure you can explain exactly when and where I missed the
>> point about logic, how you think I went wrong and what I should have
>> written.
>
>I'm afraid I can't, but I can always try.
>
>Your use of the word 'logic' seemed to me to appeal to the notion of
>"common sense". Especially if your use of the word derives from the
>Dutch "het is logisch dat ...".

This use of the word is also endorsed by most English dictionaries. For
instance, http://www.merriam-webster.com/dictionary/logic describes
formal logic as the first meaning, but lists "a particular mode of
reasoning viewed as valid or faulty" as the second meaning.

And within the context of the discussion, this was indeed the meaning.
If you read back, you'll see that I started my reply to a question
posted by Paul Mansour with:

>>>>>> The most logical choices, IMO, are:

In this sentence, the word "logical" obviously did not refer to formal
logic, as that would not make any sense in that context.
Bob Badour then challenged me:

>>>>>What makes either of those choices logical?

I tried to explain, and concluded my explanation with:

>>>> That's the logic of the first choice.

And at that point, you claimed that I missed the point about 'logic'.

> Bob's use of the word 'logic'
>presumably appealed to the notion of "formal logic", i.e. the
>scientific discipline in mathematics.

If that is the case, then Bob must have completely misunderstood what I
meant when I wrote "The most logical choices, IMO, are". Frankly, I fail
to see anything in that message that could possibly lead to such an
interpretation. My command of the English language must be far below the
level I always presumed it to be.

(snipped explanation of formal logic - I am familiar with that concept)

>As another example, in your reply you talked of domains. However,
>formal logic doesn't necessarily have such a thing as "domains". Only
>sorted logic does. And even then, sorted logic does not prescribe
>which domains can exist and which can't. Even sorted logic does not
>proscribe against a domain such as {black chair, black night, black
>woman}. Your reply gave me the impression that you were building on
>the assumptions that all considered domains must "make sense" in some
>sense.

I was refering to domains in a data model. Pauls question was about how
to model a specific situation where the user interface allowed exactly
one of the three options Male, Female, and Unknown. I replied how I
would model this.

Best, Hugo

Hugo Kornelis

unread,

Oct 6, 2010, 6:51:01 PM10/6/10

to

Thanks for the link. Interesting reading. I was aware of the vertical
decomposition, but I had not seen the "very loosely speaking" horizontal
decomposition before. Since the concept of horizontal decomposition
builds on the assumption that the distinction between "unknown" and "not
applicable" HAS to be made, I don't think this approachh should always
be used.

However, my understanding of vertical decomposition may always have been
incorrect. I have always thought that the vertical decomposition of
PERSON:
ID NAME JOB SALARY
1234 Anne Lawyer 100,000
1235 Boris Banker ?
1236 Cindy ? 70,000
1237 Davinder ? ?
leads to these three relvars:
CALLED DOES_JOB EARNS
ID NAME ID JOB ID SALARY
1234 Anne 1234 Lawyer 1234 100,000
1235 Boris 1235 Banker 1236 70,000
1236 Cindy
1237 Davinder
which requires outer joins to reconstruct the original. But the
presentation you point me to includes all persons in all three relvars,
so that the original can be reconstructed with an innder join, but no
nulls are eliminated at all, and the DOES_JOB and EARNS relvars still
have the awkward predicate problem.

In the three relvars I sketch above:
* No NULL values are required
* It's not possible to distinguish between "unsalaried" and "salary
unknown", so **IF** that distinction is required, extra relvars are
required.
* Predicates for each table are simple - they are given in the
presentation, so no need to repeat them here, but there are no tuples
with question marks to blur their meaning.
* So for each tuple that appears in a table, we know _exactly_ what it
means.

I was surprised to see the rows with the question marks in the
presentation. Removing them is not exactly rocket science, so there must
obviously be some other problem that prohibits this apparently simple
solution. But I fail to see what that problem might be.

Best, Hugo

Hugo Kornelis

unread,

Oct 6, 2010, 7:16:00 PM10/6/10

to

On Wed, 06 Oct 2010 10:51:18 -0300, Bob Badour wrote:

>Hugo, you misunderstand the simplest of things. I added you to my killfile.

Either you debate with me (present your arguments AND read mine), or you
don't. But killfiling me (i.e. refusing to hear my arguments) and then
continuing to present your arguments is very similar to what my children
did when they were six - yell their opinion, then stick fingers in their
ears so that they can't hear any counterarguments.

>Null is an indicator that evaluates to NULL or NOT NULL. ie. it is a
>boolean.

How can Null ever evaluate to NOT NULL?
The predicates "SomeColumn IS NULL" and "SomeColumn IS NOT NULL" however
do evaluate to a boolean. As do the predicates "SomeColumn = 40" or
"SomeColumn < 5" - so this is not really a feature unique to nulls.

(snip)

>By "NULL hater", do you perhaps mean "intelligent and informed data
>manager" ? After all, the intelligent and informed try to avoid NULLs
>like the plague.

Call it however you like. Fact is that there are people who are
convinced that NULLs should never be used in a database, and there are
those who believe that NULLs are an adequate tool to handle the problem
of missing information in a relational database. Rather than sticking my
head in that wasps nest, I tried to stay out of it by giving both a
design with a nullable column and an alternative without one.

When I used the term "NULL hater", I intended it as a short and jocular
reference to the people who believe that NULLs should be avoided; I did
not intend it to be derogatory. If you, or anyone else, felt offended by
the term, please accept my apologies. I have avoided using this term in
the replies I wrote today.

>>>>That's the logic of the first choice.
>>>
>>>I think you missed the point about 'logic'.
>>
>> Well, then I'm sure you can explain exactly when and where I missed the
>> point about logic, how you think I went wrong and what I should have
>> written.
>
>When you wrote "logical" you meant something else where logic had no
>bearing whatsoever. I am not sure whether the options you gave were the
>first 2 things that popped into your head or whether they best matched
>your limited preconceptions or whether you simply found them
>aesthetically pleasing; however, logic had nothing to do with your
>decision process.

Well, thanks for this fascinating insight in my mental processes. But I
am sorry to have to disappoint you - your mindreading skills are even
less than your social interaction skills, as you are completely wrong on
how my decision process took place.

But I have never been too modest to admit that I am fallable, and I am
always willing to learn. So on the question originally raised by Paul
Mansour, how would you answer? And why is that answer better than mine?

>> For if you make only this remark without further arguments, it is just
>> an unfounded ad-hom attack that adds nothing to the discussion.
>
>I asked my question in the hope you would learn to distinguish logic,
>the foundation of data management, from internal bias or intuition. I
>agree with Erwin that you completely missed the point.

Obivously. I hope your answer to Pauls question will help me understand
the points I am still missing. For so far, you are wasting lots of words
to tell me how miserably I fail, but you don't spend half as much effort
in explaining the better ways.

Best, Hugo

David BL

unread,

Oct 7, 2010, 9:57:37 PM10/7/10

to

On Oct 6, 9:51 pm, Bob Badour <bbad...@pei.sympatico.ca> wrote:

> Null is an indicator that evaluates to NULL or NOT NULL. ie. it is a
> boolean.

That is nonsensical.

"evaluate" means literally to "ascertain the value of", but NULL is
not a value because it cannot be an element of some set.

This is evident from the fact that in formal set theory it is required
that all elements of sets be comparable with '=' and this must be an
equivalence relation. However no appropriate equivalence relation is
defined on any set that is purported to contain an element with the
usual characteristics of NULL.

Brian

unread,

Oct 9, 2010, 2:12:48 AM10/9/10

to

On Oct 6, 5:47 pm, Hugo Kornelis

Let me clarify.

The extension of a predicate is the logical disjunction of all and
only ground atoms that satisfy the predicate. Under the closed world
assumption, the tuples in a relation map one-to-one onto the set
composed of exactly those ground atoms. A ground atom is an atomic
formula that does not contain individual variables as terms. A
relation, therefore does not permit information to be missing. A
table that allows nulls, however, is not a relation. Codd
differentiated between applicable and inapplicable null markers in his
1990 book, citing applicable and inapplicable as the main reasons for
missing information. An inapplicable null indicates that there is no
value because there shouldn't be a value, for whatever reason. An
applicable null represents a particular but unspecified value. In
first-order logic, an unbound individual variable is a term that
represents a particular but unspecified individual. An atomic formula
consists of an n-ary predicate symbol along with an n-tuple of terms.
0-ary predicate symbols are propositions. Terms can be individual
variables or individual constants. In first-order languages that
allow function symbols, terms can be individual variables or m-ary
function symbols along with an m-tuple of terms. 0-ary function
symbols are individual constants. In either case an atomic formula
can contain individual variables as terms. The atoms in a table that
has two non-nullable columns and one nullable column can have one of
two distinct predicates, one that accepts tuples with three terms, and
one that accepts tuples with two. Where there is supposed to be a
value but it hasn't been supplied, an arbitrary but distinct
individual variable is placed in a tuple containing three terms; where
there is not supposed to be a value, there is no place for it because
the tuple contains only two terms.

Did that clarify?

I would argue that the distinction is always relevant because it
determines which predicate applies and therefore which fact is
represented.

> > If you're only a null
> >hater, then only two relvars are needed: one with an additional
> >boolean attribute that specifies for each tuple whether there should
> >be a value or not, and another relvar for the values that have been
> >supplied. If you're also a boolean hater, then the relvar with the
> >boolean attribute would have to be split as well.
>
> I still don't see the leap here, nor the connection with hating or not
> hating booleans.
> If the business is not interested in recording is a value is absent
> because there should not be a value or because it just happens to not be
> there, then you need either one nullable column, or one extra relvar,
> with no booleans either way.
> And if the business does require you to record this distinction than you
> either need two nullable columns, or two extra relvars. Again, with no
> booleans in either situation.
>
> You imply that it's possible to track both values and (for missing
> values) whether a missing value is applicable or not, in just two
> relvars with no nulls, but with booleans. I don't see how those relvars
> would be designed.

Table: P ( A INT NOT NULL, B INT NOT NULL, C INT NULL, D INT NULL ),
PRIMARY KEY ( A, B )

Relations:
VAR Q { A INT, B INT, I BOOL, J BOOL } // I,J indicate C,D apply,
respectively
KEY { A, B };

VAR R { A INT, B INT, C INT } KEY { A, B }; // for supplied C values
CONSTRAINT COUNT( R ) = COUNT( R JOIN ( Q WHERE I ) ); // but only
where applicable

VAR S { A INT, B INT, D INT } KEY {A, B }; // for supplied D values
CONSTRAINT COUNT( S ) = COUNT( S JOIN ( Q WHERE J ) ); // but only
where applicable

Now imagine the relvars split due to the elimination of the introduced
boolean attributes I and J.

VAR Q { A INT, B INT }
KEY { A, B };

VAR QI { A INT, B INT } a tuple here indicates that C is applicable
KEY { A, B };
CONSTRAINT COUNT( QI ) = COUNT( QI JOIN Q );

VAR R { A INT, B INT, C INT } KEY { A, B }; // for supplied C values
CONSTRAINT COUNT( R ) = COUNT( R JOIN QI WHERE ); // but only where
applicable

VAR QJ { A INT, B INT } a tuple here indicates that D is applicable
KEY { A, B };
CONSTRAINT COUNT( QJ ) = COUNT( QJ JOIN Q );

VAR S { A INT, B INT, D INT } KEY {A, B }; // for supplied D values
CONSTRAINT COUNT( S ) = COUNT( S JOIN QJ ); // but only where
applicable

>
> Best, Hugo

Erwin

unread,

Oct 9, 2010, 8:59:58 AM10/9/10

to

On 6 okt, 23:47, Hugo Kornelis <h...@perFact.REMOVETHIS.info.INVALID>
wrote:

> Yes, I get that. This is a problem you always get when you combine
> several atomic fact types (falling back to the NIAM terminology) in a
> single table. NIAM and ORM avoid this problem by working with elementary
> fact types. If there is no information about the gender of a customer,
> there is no fact for that customer in the gender fact type.
>
> The FTD notation that prof Nijssen (one of the founders of NIAM)
> developed later removes the requirement to work on elementary fact
> types, and then solves the predicate problem by supplying multiple
> predicates for each FTD (which maps to a table in the relational model).
> So an FTD with one optional role (column) would have two predicates, one
> that includes this role and one that does not.
>
> And in relational database, the answer is to create seperate tables for
> optional attributes, or even to limit tables to at most one non-key
> column (which maps right back to the elementary fact types used in ORM
> and NIAM).
>

Relating to this, Darwen wrote a paper/draft investigating what he
calls "multirelations" (essentially: a "supra-relation" encompassing
multiple relations, where the heading of one of those
relationsencompassed can be a proper subset of the declared heading of
the multirelation - iow: a construct that allows a tuple with heading
{A:type} to appear in a multirelation that has heading {A:type
B:...}).

The draft may still be present on www.thethirdmanifesto.com, the most
recent version of the text appears as chpt 24 in "Database
Explorations".

You might want to read that too.

paul c

unread,

Oct 14, 2010, 2:23:36 PM10/14/10

to

Thanks. It doesn't seem to mention 'nilary' but does mention
'nullary' and 'niladic' which I take to be the same.

We're all prisoners of the lingo we know. I never learned much SQL
but when consumer cpu's started to proliferate I started to use 'C'
because I didn't want to learn a bunch of different assembly
languages. That's the first time I ran into 'null' and for me the
meaning was extremely narrow, just referring to a pointer that didn't
point to anything useful. Old assembler people would have equated
that with a 'flag', which is why your mention elsewhere of it being a
'designator' didn't confuse me. They were all used to core/storage
dumps where no matter what the cause the first step was always to take
a quick look at the register contents. Often there would be register
that had a value which was obviously the hex value of some four-
character ebcdic or ascii sequence and one could a lot of time by not
reading the rest of the dump. I still haven't got used to the idea
that 'null' can be a value in the same context that the number three
might be a value. With any luck, I never will.

(Sending this msg is painful. Local internet supplier (BC Telus) has
dropped nntp. This google groups I/F sucks (a word I hardly ever use
but it seem to suit here). For me the most obvious feature is missing
- the ability at first glance to see the latest individual post
subjects, author and date/time and number of lines. Can't figure out
how to have this google web interface do the same, seems I have to
start at the 'beginning' and keep clicking 'newer' until I get to the
latest post. Just another example of the internet going backwards,
not unlike other worldly phenomena but kind of ironic for a fairly
modern innovation to do that.)

Clifford Heath

unread,

Oct 15, 2010, 8:42:13 AM10/15/10

to

paul c wrote:
> (Sending this msg is painful. Local internet supplier (BC Telus) has
> dropped nntp.

There are free NNTP services. I like gmane.org, but Googling reveals more.
Teranews has a service that is free after they charge $3.95 setup fee on
your credit card - presumably to verify you're an adult because it gets
you 50MB/day across all the binary groups, not just the kosher ones.

Clifford Heath.

paul c

unread,

Oct 15, 2010, 12:42:04 PM10/15/10

to

On 15/10/2010 5:42 AM, Clifford Heath wrote:
> There are free NNTP services. I like gmane.org, but Googling reveals more.
> Teranews has a service that is free after they charge $3.95 setup fee on
> your credit card - presumably to verify you're an adult because it gets
> you 50MB/day across all the binary groups, not just the kosher ones.
>
> Clifford Heath.

Thanks, Clifford.