Aggregates on empty result sets.

Neil Zanella

unread,

Jul 27, 2004, 3:00:07 PM7/27/04

to

Hello,

I wonder whether anyone knows what the SQL standard has to say about what
aggregate functions such as MAX() and SUM() should return when the result
set is empty. I also wonder whether poular database management systems
such as Oracle, PostgreSQL, MySQL, and SQLite all conform to the
standard with regards to this particular issue. I am particularly
interested in what the result should be when the underlying data
type for the fields of interest is INTEGER.

Thanks,

Neil

Kristian Damm Jensen

unread,

Jul 27, 2004, 4:25:28 PM7/27/04

to

Neil Zanella wrote:
> Hello,
>
> I wonder whether anyone knows what the SQL standard has to say about
what
> aggregate functions such as MAX() and SUM() should return when the
result
> set is empty.

count(): 0
sum(): NULL
avg(): NULL
min(): NULL
max(): NULL

Except for sum() which could be 0, this should not come as a big
surprise.

> I also wonder whether poular database management systems
> such as Oracle, PostgreSQL, MySQL, and SQLite all conform to the
> standard with regards to this particular issue.

I have frankly no idea, but I would be surprised id they didn't.
Regards,
Kristian

Mikito Harakiri

unread,

Jul 27, 2004, 6:36:45 PM7/27/04

to

nzan...@cs.mun.ca (Neil Zanella) wrote in message news:<b68d2f19.04072...@posting.google.com>...

SUM should return 0
MIN should return the maximum element in the domain
MAX should return the minimum element in the domain

This is what makes sence mathematically. ANSII/ISO perspective is different.

Vadim Tropashko

unread,

Jul 27, 2004, 9:02:37 PM7/27/04

to

I just doublechecked that CAS -- Mapple, for example, -- indeed knows
these trivialities

> sum('a[k]','k'=1..0);

0

> min();

infinity

Avg should be undefined of course. And the least ad-hock way to report
this fact to user is raising exception

> sum('a[k]','k'=1..0)/sum('1','k'=1..0);
Error, numeric exception: division by zero

(as opposed to returning lame NULL).
1

Christopher Browne

unread,

Jul 27, 2004, 9:52:49 PM7/27/04

to

Clinging to sanity, mikha...@yahoo.com (Mikito Harakiri) mumbled into her beard:

Mind you, if NULL is considered to be a "contagion," then the answer
to all of the above should be NULL if the set is nonempty, but any
element happens to be NULL.

PostgreSQL returns NULL for all of the above. I could go along with
expecting SUM to be 0, but NULL seems the best answer for both min and
max.
--
select 'cbbrowne' || '@' || 'acm.org';
http://cbbrowne.com/info/spiritual.html
Multics Emacs: a lifetime of convenience, a moment of regret.

Kaydell

unread,

Jul 27, 2004, 11:43:15 PM7/27/04

to

nzan...@cs.mun.ca (Neil Zanella) wrote in message news:<b68d2f19.04072...@posting.google.com>...

I would guess that NULL would be returned. Keep in mind that there
is a difference between "NULL" in SQL which means "no such anser" like
in math. What number is less than negative one and greater than one.
Well, there is no such number. In SQL, when a query has no answer,
the SQL NULL value is returned. This is different than an empty text
field. This has an answer. A string that is zero bytes long. It is
also different than the NULL pointer in "C".

Kaydell

Kristian Damm Jensen

unread,

Jul 28, 2004, 1:25:48 AM7/28/04

to

But we are not talking "should" in the sense "I would like it to be" or
"it makes sense". We are talking "what is does the standard define".

And the SQL-standard say

sum() is NULL
min() is NULL
max() is NULL

Regards,
Kristian

P.S. Mathematically you are correct of course, and while
"maximum/minimum element in the domain" could be a little hard to
implement (infinity not being a value), I find it mysterious that the
standard does not define the sum over an empty set to be 0.

Tony

unread,

Jul 28, 2004, 10:30:41 AM7/28/04

to

nzan...@cs.mun.ca (Neil Zanella) wrote in message news:<b68d2f19.04072...@posting.google.com>...

Here is what Oracle 8.1.7 does:

SQL> create table t (col integer);

Table created.

SQL> select sum(col),min(col),max(col),avg(col),count(col)
2 from t;

SUM(COL) MIN(COL) MAX(COL) AVG(COL) COUNT(COL)
---------- ---------- ---------- ---------- ----------
0

I believe that conforms to the ANSI standard.

Neil Zanella

unread,

Jul 29, 2004, 9:23:03 PM7/29/04

to

"Kristian Damm Jensen" <da...@ofir.dk> wrote in message

> But we are not talking "should" in the sense "I would like it to be" or

> "it makes sense". We are talking "what does the standard define".

Thanks. Everyone knows what the answer should be mathematically,
but we all know that standards don't always conform to mathematics,
and often for practical reasons...

> And the SQL-standard sais:

>
> sum() is NULL
> min() is NULL
> max() is NULL
>
> Regards,
> Kristian

Kristian, I am assuming that your answer applies to both the situations
where the result set is empty as well as to where one of the fields in
the nonempty result set carries the SQL NULL value.

Anyways, when are databases going to start supporting domains??? And if
databases support domains, then is it possible to define SUM(), MIN(),
and MAX() for such domains??? Imagine defining a complex data type in
SQL. SQL would have to allow the user to return 0 in the case where
the result set was empty. I am guessing that there is no such
mechanism in SQL, so probably the designers just decided to
return NULL for SUM() as well for consistency. Just
speculating...

Regards,

Neil

--CELKO--

unread,

Jul 30, 2004, 10:13:03 AM7/30/04

to

>> I wonder whether anyone knows what the SQL standard has to say
about what
aggregate functions such as MAX() and SUM() should return when the
result set is empty. <<

I think you meant when the set is empty, not the result set. Standard
SQL follows the expected rule "ab nulo, ex nulo" (from nothing, comes
nothing) because it is a set-oriented language. The aggregate
functions work with the **elements** of the set, not the set qua set.
So if I have an empty set, show me the minimum (or max) value in it --
ain't no such animal! Likewise, the SUM() and therefore the AVG() have
nothing to work with and COUNT(<exp>) has nothing with which to build
the <exp>. Again, "ab nulo, ex nulo".

The COUNT(*) is very different; it is the **cardinality** operation on
the set as a whole, so it returns zero on an empty set. Maybe we
ought to have a different syntax, something like CARD(SELECT * FROM ..
WHERE), but its too late to change now.

Empty sets are converted into NULLs when they are used as a scalar.
Thus,

SELECT MIN(x), COUNT(*)
FROM Empty;

will return (NULL, 0) because we need a row to show the zero.
However, without that zero value, you can see that

SELECT MIN(x)
FROM Empty;

will return an empty result.

I have never figured out why people think that the sum of an empty set
is an actual number, usually zero. My guess is that there is a
convention in the summation of sequences in mathematics that an empty
element in the sequence is converted from "undefined" to zero. This
convention lets you write easier indexing expressions on the
summation, so the computations (i.e. procedural coding and model) are
also easier.

But we are in a set-oriented model and not a procedural one.

Alan

unread,

Jul 30, 2004, 11:04:27 AM7/30/04

to

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:18c7b3c2.04073...@posting.google.com...

People are number-oriented, not set-oriented, and that's why they expect a
numerical result. If a room is empty, and you ask a person, "How many people
are in the room?" they will likely answer "None", (meaning zero) but not
"null", or not even "the room is empty so I can't answer how many people are
in there." BTW, "none" does not mean "null".

Kristian Damm Jensen

unread,

Jul 30, 2004, 2:51:39 PM7/30/04

to

--CELKO-- wrote:

<snip>

> I have never figured out why people think that the sum of an empty
set
> is an actual number, usually zero. My guess is that there is a
> convention in the summation of sequences in mathematics that an empty
> element in the sequence is converted from "undefined" to zero.

There is indeed a convention. I am not certain whether it originates
with mathematicians or computer scientist, since my education was a
mixture of both.

But the convention is simply convenient. It lets you write expression
like sum(A)+sum(B)>=sum(A union B) without having to point out that A
or B being the empty set, Ø, is a special case. In stead you define
sum(Ø)=0 since 0 is neutral with regard to addition like Ø is neutral
with regard to set union.

(I do hope Ø comes out right on your screen. I'm not to confident.)

> This
> convention lets you write easier indexing expressions on the
> summation, so the computations (i.e. procedural coding and model) are
> also easier.
>
> But we are in a set-oriented model and not a procedural one.

Procedural has nothing to do with it. In fact I recognize this
convention primarily from purely functional programming language like
SML, Haskell or Miranda.

Regards,
Kristian

Neil Zanella

unread,

Jul 31, 2004, 3:15:53 PM7/31/04

to

"Kristian Damm Jensen" <da...@ofir.dk> wrote in message news:<cee5br$e...@odbk17.prod.google.com>...

> --CELKO-- wrote:
>
> <snip>
>
> > I have never figured out why people think that the sum of an empty
> > set is an actual number, usually zero. My guess is that there is a
> > convention in the summation of sequences in mathematics that an empty
> > element in the sequence is converted from "undefined" to zero.

Mathematicians like to define a sum operator on finite sets in such a way
that the sum of a finite set is equal to the sum of all elements in the set.
Next they would like such operator to have the property that the sum operator
is distributive over disjoint finite partitions of sets. Since disjoint finite
partitions can involve any finite number of empty sets, they then go on to
define the sum of an empty set to be zero, just so that they can say what
they wanted to say about the sum operator, namely that it is distributive
over finite partitions of sets. So, they make the definition suit the
world they like to create for themselves.

This is an oversimplified example of why mathematicians might define something
to be one way or another: so that they can say things that don't sound more
complicated than they ought to. In this case, it's all about avoiding
trivialities. The same principles applies to the product of an empty
set being one.

I know SQL domains aren't gonna allow anything like GAP objects,
http://mirrors.ccs.neu.edu/GAP/. That would be too interesting
to be true, but perhaps also equally useless? Certainly not so
useful to accomplish your average SQL programming tasks. :-)
Imagine storing group elements in a database!!! I am sure
someone can think of uses which I haven't thought of...

:-) So, is someone going to write a GAP binding for some
popular SQL database??? And who will use it??? :-)

> There is indeed a convention. I am not certain whether it originates
> with mathematicians or computer scientist, since my education was a
> mixture of both.
>
> But the convention is simply convenient. It lets you write expression
> like sum(A)+sum(B)>=sum(A union B) without having to point out that A

> or B being the empty set, , is a special case. In stead you define
> sum( )=0 since 0 is neutral with regard to addition like is neutral

> with regard to set union.

You're right.
The "zero" in the monoid of finite sets with union is the empty set.

> (I do hope comes out right on your screen. I'm not to confident.)

It didn't. But I know what you typed.

> > This
> > convention lets you write easier indexing expressions on the
> > summation, so the computations (i.e. procedural coding and model) are
> > also easier.
> >
> > But we are in a set-oriented model and not a procedural one.

Sort of, but set theory doesn't emphasize the notion of NULL values,
so the SQL standard had to come up with something.

> Procedural has nothing to do with it. In fact I recognize this
> convention primarily from purely functional programming language like
> SML, Haskell or Miranda.

How so. Can you please provide us with some examples? Do these languages
also carry notions of operations on empty sets and binary operations on
null values?

Thanks,

Neil

Kristian Damm Jensen

unread,

Jul 31, 2004, 5:07:25 PM7/31/04

to

Mainly because mathematicians does not have a NULL and don't want it.

In a sense SQL is cheating, it's does not treat arithmetic on NULL
consistent:

create table T(id int not null, value int null)
insert into table T(id, value) values(1, 5)
insert into table T(id) values (2)
select sum(values) from T

returns 5.

But the mathematically equivalent expression

select 5+NULL

returns NULL

<snip>

> Sort of, but set theory doesn't emphasize the notion of NULL values,
> so the SQL standard had to come up with something.

Exactly.

<snip>

> > Procedural has nothing to do with it. In fact I recognize this
> > convention primarily from purely functional programming language
like
> > SML, Haskell or Miranda.
>
> How so. Can you please provide us with some examples? Do these
languages
> also carry notions of operations on empty sets and binary operations
on
> null values?

All of these languages are list-based, not that lists are the only
datatype or the primary datatype, just that anything interesting
written in one of these language are bound to use lists a lot. My
examples below will be written in Miranda, but could easily be
rewritten to any other functional programming language.)

But the notion we are discussing here apply equally well to lists. The
sum of all elements of a list may be defined as

sum [] = 0
sum (x:xs) = x + sum xs

[] is the empty list.
x:xs is the list with the first element called x and the rest of the
list (posssibly empty) called xs.

If sum should conform to the SQL-standard, this would become

sum [] = NULL
sum (x:[]) = x
sum (x:y:ys) = x + sum (y:ys)

This may seem like no big deal. But consider the amount of extra coding
necessary, if every time you call sum, you have to make special
encodings for the empty list.

(Or consider the amount of coding some of us do on a daily basis to
handle nulls in SQL. At least some of which could be avoided be
defining the sum over an empty set to be 0.)

Regards,
Kristian

Lennart Jonsson

unread,

Aug 1, 2004, 2:44:29 AM8/1/04

to

"Kristian Damm Jensen" <da...@ofir.dk> wrote in message news:<ceh1md$s...@odah37.prod.google.com>...

> Neil Zanella wrote:
> > "Kristian Damm Jensen" <da...@ofir.dk> wrote in message
> news:<cee5br$e...@odbk17.prod.google.com>...
> > > --CELKO-- wrote:
> > >
> > > <snip>
> > >

[...]

>
> But the notion we are discussing here apply equally well to lists. The
> sum of all elements of a list may be defined as
>
> sum [] = 0
> sum (x:xs) = x + sum xs
>
> [] is the empty list.
> x:xs is the list with the first element called x and the rest of the
> list (posssibly empty) called xs.
>
> If sum should conform to the SQL-standard, this would become
>
> sum [] = NULL
> sum (x:[]) = x
> sum (x:y:ys) = x + sum (y:ys)
>
> This may seem like no big deal. But consider the amount of extra coding
> necessary, if every time you call sum, you have to make special
> encodings for the empty list.

Yes, and it will also stop us (or atleast make it much more
cumbersome) from defining functions like reduce:

reduce f u [] = u
reduce f u (x:xs) = f x (reduce f u xs)

now, sum and similar functions can easily be defined as

sum xs = reduce add 0 xs
prod xs = reduce prod 1 xs
sort xs = reduce insert [] xs
...

/Lennart

Mikito Harakiri

unread,

Aug 1, 2004, 3:10:37 PM8/1/04

to

nzan...@cs.mun.ca (Neil Zanella) wrote in message news:<b68d2f19.04073...@posting.google.com>...

> Mathematicians like to define a sum operator on finite sets in such a way
> that the sum of a finite set is equal to the sum of all elements in the set.
> Next they would like such operator to have the property that the sum operator
> is distributive over disjoint finite partitions of sets. Since disjoint finite
> partitions can involve any finite number of empty sets, they then go on to
> define the sum of an empty set to be zero, just so that they can say what
> they wanted to say about the sum operator, namely that it is distributive
> over finite partitions of sets. So, they make the definition suit the
> world they like to create for themselves.

Simple aggregate is easy. It's "group by" that challenges consistency
of the whole approach. Indeed, empty set becomes empty group,
therefore, shouldn't we label with 0 every value in the domain that is
missing in the relation?!

Example:

select dept, sum(sal) from emp
group by dept

Accounting 200000
Engineering 1000000
............ and departments that are not in emp relation:
Finance 0

Note that the problem of domain dependency is easily solved if we
declare the aggregate on empty set to be the empty set.

When working with aggregates it might be desirable to consider empty
set and the record with sum=0 as equivalent from math perspective...

--CELKO--

unread,

Aug 1, 2004, 11:22:04 PM8/1/04

to

>> SELECT dept, SUM(sal) FROM Employees GROUP BY dept;

.. and departments that are not in emp relation: Finance 0 <<

But if Finance is not the Employees table, how does it get a group?
You might build such a thing:

SELECT dept, SUM(sal) FROM Employees GROUP BY dept
UNION
SELECT DISTINCT dept, {NULL | 0.00} FROM OtherDepartments;

or something with an OUTER JOIN, but this is not a simple GROUP BY.

Laconic2

unread,

Aug 2, 2004, 9:24:16 AM8/2/04

to

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:18c7b3c2.04073...@posting.google.com...

> I have never figured out why people think that the sum of an empty set
> is an actual number, usually zero.

This question is a psychological question rather than a logical one, so
I'll give an answer about the way people think, rather than an apology.

Most of us learn to use numbers for counting, long before we begin to use
numbers for measuring. And most of our intuition about numbers is based on
the counting numbers, rather than the so called "real" numbers.

Now let's take a column that contains an integer value:

ARRESTS (It's the number of of times a person has been arrested).

now let's take the following

select sum(ARRESTS) from PEOPLE where <predicate>

You may call this whatever you like, and it may make sense logically. But
deep down inside of us, most of us think of this number as a count. We are
really counting a set of arrests. The fact that we can use addition to
combine the individual counts is just mathematical convenience. So we
expect to get the same answer from SUM that we would have gotten from COUNT.

As an aside, that, if I had chosen, say, MARRIAGES instead of ARRESTS, I
would have had to take care not to double count the same MARRIAGE. But
that's another story.

If the set of people is empty, then the set of arrests is empty, and its
count is zero. Well, what if there are some people in the result set whose
ARRESTS are set to NULL? Then I'd expect SUM(ARRESTS) to yield NULL as
well. But I'm not sure
whether every body would agree.

Well, in that case why isn't COUNT(ARRESTS) NULL as well? Because that's a
question about the data, not a question about the arrests. But that's
another story.

Again, this is about the way people think, rather than an abstract logical
discussion.

Why do I care? Because the way people think about the data turns out to be
a critical success factor for nearly every large scale database. You ignore
the people factor at your peril.

Marshall Spight

unread,

Aug 2, 2004, 12:02:34 PM8/2/04

to

"--CELKO--" <jcel...@earthlink.net> wrote in message news:18c7b3c2.04080...@posting.google.com...

(I'm renaming your "OtherDepartments" to "Departments", because
I don't see anything "other" about it.)

ISTM that what you want is something like

for all dept in Departments, for all sal in Employees, (dept, sum(sal))

The closest thing is an outer join with a group by, I guess, but
you have to IFNULL or some such, because SQL doesn't do
folds on a zero length sequence right.

Marshall

Todd B

unread,

Aug 2, 2004, 4:10:13 PM8/2/04

to

"Alan" <al...@erols.com> wrote in message news:<2mv6bsF...@uni-berlin.de>...

> "--CELKO--" <jcel...@earthlink.net> wrote in message
> news:18c7b3c2.04073...@posting.google.com...
> > >> I wonder whether anyone knows what the SQL standard has to say
> > about what
> > aggregate functions such as MAX() and SUM() should return when the
> > result set is empty. <<
> >

>

> People are number-oriented, not set-oriented, and that's why they expect a
> numerical result. If a room is empty, and you ask a person, "How many people
> are in the room?" they will likely answer "None", (meaning zero) but not
> "null", or not even "the room is empty so I can't answer how many people are
> in there." BTW, "none" does not mean "null".

I don't know. For me, personally, an empty set, in that situation,
doesn't mean that there are no people in the room. An empty set to me
means that there is no information about whether or not there are
people in the room. So, it doesn't mean "I am absolutely sure that
the room is empty of people". Rather, I think it means, "I cannot be
certain about the occupation of that room by people". So, to me, that
would be a null value.

Todd

Alan

unread,

Aug 2, 2004, 4:21:12 PM8/2/04

to

"Todd B" <toddkenn...@yahoo.com> wrote in message
news:ef8e4d1e.04080...@posting.google.com...

Okay, I should have phrased it as, "...and you ask a NORMAL person..." :)

I'm not talking about empty sets. I'm talking about empty rooms. That the
room represents a set would not be realized by non-math/cs people.

Kristian Damm Jensen

unread,

Aug 3, 2004, 5:17:06 AM8/3/04

to

toddkenn...@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.04080...@posting.google.com>...

I don't get this.

I set is the set of people in the room, and you claim that the empty
set to you represents the information "that there is no information
about whether or not there are people in the room", how *would* you
tell if the room was empty? Or that an information you could never get
from the set?

Regards,
Kristian
>
> Todd

Todd B

unread,

Aug 3, 2004, 12:05:31 PM8/3/04

to

da...@ofir.dk (Kristian Damm Jensen) wrote in message news:<2c9e2992.04080...@posting.google.com>...

> toddkenn...@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.04080...@posting.google.com>...
> > "Alan" <al...@erols.com> wrote in message news:<2mv6bsF...@uni-berlin.de>...

> > > People are number-oriented, not set-oriented, and that's why they expect a

> > > numerical result. If a room is empty, and you ask a person, "How many people
> > > are in the room?" they will likely answer "None", (meaning zero) but not
> > > "null", or not even "the room is empty so I can't answer how many people are
> > > in there." BTW, "none" does not mean "null".
> >
> > I don't know. For me, personally, an empty set, in that situation,
> > doesn't mean that there are no people in the room. An empty set to me
> > means that there is no information about whether or not there are
> > people in the room. So, it doesn't mean "I am absolutely sure that
> > the room is empty of people". Rather, I think it means, "I cannot be
> > certain about the occupation of that room by people". So, to me, that
> > would be a null value.
>
> I don't get this.
>
> I set is the set of people in the room, and you claim that the empty
> set to you represents the information "that there is no information
> about whether or not there are people in the room", how *would* you
> tell if the room was empty? Or that an information you could never get
> from the set?
>
> Regards,
> Kristian

Yes, I do claim that there is no information. I may be way off track,
but to me, an empty set doesn't gaurantee me that it is absolutely
100% surely empty of elements, whatever elements those may be. I look
at an empty set as 'completely' empty. That is, even empty of the
information of whether it may contain something or not. I know this
isn't a practical view of set theory, but it does fit in with
relational 3-valued logic.

create table room {
roomnumber integer not null primary key }

create table people {
person integer not null primary key }

create table people_in_room {
person integer not null references people (person) on update cascade
on delete cascade,
roomnumber integer not null references room (roomnumber) on update
cascade on delete cascade,
presence date not null,
primary key (person, roomnumber, date)
}

Now, let's say there is nothing in the table people_in_room. You
suggest that you can absolutely gaurantee that the database tells you
that there are zero people in the room at any specific time. Ideally,
this would probably be the case, but practically, I think that it's
better to look at the empty set as an unknown rather than a count(*)
equalling zero. Please enlighten me if you disagree.

Todd

Ed prochak

unread,

Aug 3, 2004, 1:04:36 PM8/3/04

to

da...@ofir.dk (Kristian Damm Jensen) wrote in message news:<2c9e2992.04080...@posting.google.com>...

> toddkenn...@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.04080...@posting.google.com>...
> > "Alan" <al...@erols.com> wrote in message news:<2mv6bsF...@uni-berlin.de>...
> > > "--CELKO--" <jcel...@earthlink.net> wrote in message
> > > news:18c7b3c2.04073...@posting.google.com...
> > > > >> I wonder whether anyone knows what the SQL standard has to say
> > > > about what
> > > > aggregate functions such as MAX() and SUM() should return when the
> > > > result set is empty. <<
> > > >
>
> > >
> > > People are number-oriented, not set-oriented, and that's why they expect a
> > > numerical result. If a room is empty, and you ask a person, "How many people
> > > are in the room?" they will likely answer "None", (meaning zero) but not
> > > "null", or not even "the room is empty so I can't answer how many people are
> > > in there." BTW, "none" does not mean "null".
> >
> > I don't know. For me, personally, an empty set, in that situation,
> > doesn't mean that there are no people in the room. An empty set to me
> > means that there is no information about whether or not there are
> > people in the room. So, it doesn't mean "I am absolutely sure that
> > the room is empty of people". Rather, I think it means, "I cannot be
> > certain about the occupation of that room by people". So, to me, that
> > would be a null value.

The empty set is NOT the NULL. NULL is NOT a set value, it is a
placeholder for an attribute on a row. The empty set is a definite
answer to a question. NULL is another way of saying I don't know the
answer.
If the door to the room is closed and I ask you: who is in the room?
you would answer I don't know (IOW NULL)
If you then open the door and see the room is empty, and I ask again,
you would answer No one (IOW empty set)

Very different.

>
> I don't get this.
>
> I set is the set of people in the room, and you claim that the empty
> set to you represents the information "that there is no information
> about whether or not there are people in the room", how *would* you
> tell if the room was empty? Or that an information you could never get
> from the set?
>
> Regards,
> Kristian

Kristian, your thinking is correct. NULL means I do not know whether
there are people in the room of not. Or for a better example, consider
this question- what is the NAME of the person in the empty room?
It would be almost a comedy routine.

standing just outside the door, two friends continue their job as
census takers. As they talk, we listen...

A: Lou, who is in the room?
B: <Taking a peak inside> nobody, Abbott, it's empty.
A: I want the names of the people in the room.
B: there's no one there!
A: okay we'll come back to that later. what's the average age of the
people in the room?
B: I dunno.
A: You won't tell me the average age?
B: Abbott, I can't.
A: Sure you can, Lou. It's easy. you take the number of people in the
room (How many is that?)
B: I got that one Abbott. zero!
A: and you divide that into the total age in the room. (How much is
that?)
B: <really frustrated> I don't know!
A: you can be so stubborn sometimes. Forget I even asked. Let's just
go in here and sit down.
<they enter the room>
B: Abbot and Costello
A: what?
B: <smiling> that's who is in the room.
A: Why wouldn't you tell me that when I asked?!
<and he chases Lou out, hitting him with his hat as they go>

Laconic2

unread,

Aug 3, 2004, 1:58:55 PM8/3/04

to

"Todd B" <toddkenn...@yahoo.com> wrote in message

news:ef8e4d1e.0408...@posting.google.com...

> Now, let's say there is nothing in the table people_in_room. You
> suggest that you can absolutely gaurantee that the database tells you
> that there are zero people in the room at any specific time. Ideally,
> this would probably be the case, but practically, I think that it's
> better to look at the empty set as an unknown rather than a count(*)
> equalling zero. Please enlighten me if you disagree.

What Ed Prochak said. Plus my two cents:

First, the way you have rephrased the question brings the "open universe or
closed universe" into the discussion.

The open or closed universe question is not about NULLS as such, but it
does deal with the question of whether a database
asserts the falsehood of data it does not contain. Let's say we have a
table of employees of Sun Microsystems. And we ask for the count of
employees where first name is "Bill" and last name is "Gates". Let's say
the count comes back as zero.

Is that an assertion that there is no employee with that name, or is merely
a statement of ignorance about such a person?

In the case I've concocted, I claim it's a closed universe. That is, if
the table of employees doesn't assert that you are one of the employees,
it's asserting that you are not. Of course, he could have been hired five
minutes ago, and HR hasn't entered the name yet, but that argument could
be applied to any data whatsoever.

In an "open universe" table, the count describes the data in the database,
but it doesn't necessarily describe the count of anything in the real world,
except to assert a minimum value of the count.

Second, a useful way to look at NULLS as they enter into aggregates is to
think of them as the product of an outer join.

let's say we have

create table SALESMEN
(sales_id integer primary key,
sales_first_name char,
sales_last_name char);

create table ORDERS
(order_id integer,
order_amount number (9,2),
order_date date,
sales_id integer references SALESMEN.sales_id);

Now let's say we do an outer join between these two tables, group by
sales_id, and pull out the sum (order_amount).

We do an outer join, so as to include the salesmen with no orders. What
should the sum(order_amount) be for these people?

Well, in this case, I'm going to take the "closed universe" assumption
(above), and assert that the right answer is zero, for a salesmen with no
ORDERS. But I think an different case could be concocted where the right
answer is NULL.

Every NULL can be construed, eventually, as the product of an outer join on
fully normalized relations.
I could go on, but this post is already getting long.

Kristian Damm Jensen

unread,

Aug 3, 2004, 5:08:30 PM8/3/04

to

It doesn't fit with set theory in any way. (The normal kind anyway. I
suppose things get a little more tricky when you introduce fuzzy sets.)

> but it does fit in with
> relational 3-valued logic.
>
> create table room {
> roomnumber integer not null primary key }
>
> create table people {
> person integer not null primary key }
>
> create table people_in_room {
> person integer not null references people (person) on update cascade
> on delete cascade,
> roomnumber integer not null references room (roomnumber) on update
> cascade on delete cascade,
> presence date not null,
> primary key (person, roomnumber, date)
> }
>
> Now, let's say there is nothing in the table people_in_room. You
> suggest that you can absolutely gaurantee that the database tells you
> that there are zero people in the room at any specific time.
Ideally,
> this would probably be the case, but practically, I think that it's
> better to look at the empty set as an unknown rather than a count(*)
> equalling zero.

I still don't understand why.

And you haven't answered my question: How would ever be able to
represent zero people in the room?

If, as Laconic2 suggests in another post, the point is that you think
there may be people in the room that the database does not tell you
about, then count(*) will never tell you the exact number of people
anyway, but only supply you with a lower bound. Which a count(*) of
zero would supply anyway.

> Please enlighten me if you disagree.

I would, if I understood your reasoning.

To put it simple: How could an empty bag of beans mean anything else
than "I have no beans in this bag".

Kristian

Gene Wirchenko

unread,

Aug 3, 2004, 6:14:49 PM8/3/04

to

toddkenn...@yahoo.com (Todd B) wrote:

[snip]

>Now, let's say there is nothing in the table people_in_room. You
>suggest that you can absolutely gaurantee that the database tells you
>that there are zero people in the room at any specific time. Ideally,

According to what is in the database, that is the case.

>this would probably be the case, but practically, I think that it's
>better to look at the empty set as an unknown rather than a count(*)
>equalling zero. Please enlighten me if you disagree.

How many items are there in an empty set? Zero.

Suppose you query for a list of the people in the room and get
back a list of three people. Now, try the following logic:

You suggest that you can absolutely gaurantee that the database

tells you that there are three people in the room at any specific

time. Ideally, this would probably be the case, but practically, I

think that it's better to look at the three as an unknown rather than
a count(*) equalling three.

In a closed universe model, which is what a relational database
is, the database knows everything that is true. If it is not entered,
it is not true.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
I have preferences.
You have biases.
He/She has prejudices.

Todd B

unread,

Aug 3, 2004, 7:35:19 PM8/3/04

to

ed.pr...@magicinterface.com (Ed prochak) wrote in message news:<4b5394b2.04080...@posting.google.com>...

> da...@ofir.dk (Kristian Damm Jensen) wrote in message news:<2c9e2992.04080...@posting.google.com>...
> > toddkenn...@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.04080...@posting.google.com>...
> > > "Alan" <al...@erols.com> wrote in message news:<2mv6bsF...@uni-berlin.de>...

> > > > People are number-oriented, not set-oriented, and that's why they expect a
> > > > numerical result. If a room is empty, and you ask a person, "How many people
> > > > are in the room?" they will likely answer "None", (meaning zero) but not
> > > > "null", or not even "the room is empty so I can't answer how many people are
> > > > in there." BTW, "none" does not mean "null".
> > >
> > > I don't know. For me, personally, an empty set, in that situation,
> > > doesn't mean that there are no people in the room. An empty set to me
> > > means that there is no information about whether or not there are
> > > people in the room. So, it doesn't mean "I am absolutely sure that
> > > the room is empty of people". Rather, I think it means, "I cannot be
> > > certain about the occupation of that room by people". So, to me, that
> > > would be a null value.
>
> The empty set is NOT the NULL. NULL is NOT a set value, it is a
> placeholder for an attribute on a row. The empty set is a definite
> answer to a question. NULL is another way of saying I don't know the
> answer.

I wasn't talking about the answer to 'select * from
<pick_your_table>'. I was talking about the count() or sum() function
on an empty table, and how it should respond. So, here, you and I are
in agreement.

> If the door to the room is closed and I ask you: who is in the room?
> you would answer I don't know (IOW NULL)
> If you then open the door and see the room is empty, and I ask again,
> you would answer No one (IOW empty set)
>
> Very different.

In most of your database experience, then, is the door closed or open?

> >
> > I don't get this.
> >
> > I set is the set of people in the room, and you claim that the empty
> > set to you represents the information "that there is no information
> > about whether or not there are people in the room", how *would* you
> > tell if the room was empty? Or that an information you could never get
> > from the set?
> >
> > Regards,
> > Kristian
>
> Kristian, your thinking is correct. NULL means I do not know whether
> there are people in the room of not. Or for a better example, consider
> this question- what is the NAME of the person in the empty room?
> It would be almost a comedy routine.
>
> standing just outside the door, two friends continue their job as
> census takers. As they talk, we listen...
>

Very clever dialogue follows...

> A: Lou, who is in the room?
> B: <Taking a peak inside> nobody, Abbott, it's empty.
> A: I want the names of the people in the room.
> B: there's no one there!
> A: okay we'll come back to that later. what's the average age of the
> people in the room?
> B: I dunno.
> A: You won't tell me the average age?
> B: Abbott, I can't.
> A: Sure you can, Lou. It's easy. you take the number of people in the
> room (How many is that?)
> B: I got that one Abbott. zero!
> A: and you divide that into the total age in the room. (How much is
> that?)
> B: <really frustrated> I don't know!
> A: you can be so stubborn sometimes. Forget I even asked. Let's just
> go in here and sit down.
> <they enter the room>
> B: Abbot and Costello
> A: what?
> B: <smiling> that's who is in the room.
> A: Why wouldn't you tell me that when I asked?!
> <and he chases Lou out, hitting him with his hat as they go>

My point was that logic isn't so straightforward when nulls in
aggregates are included.

Am I a fan of nulls as attribute values? Right now, yes, because they
make things simple. I'm also a fan of updateable views, which is
somewhat of a contradiction. So, I guess you could call me somewhere
between an uneducataed and miseducated person -- take your pick.

Todd

Todd B

unread,

Aug 3, 2004, 7:45:48 PM8/3/04

to

"Laconic2" <laco...@comcast.net> wrote in message news:<gf-dnTo-Jq9...@comcast.com>...

This is the most interesting post I've seen so far in this thread
(meaning, probably, it makes the most sense to me :). Thanks for your
response, along with Ed Prochak's, Alan's, and Kristian's.

Todd

Ed prochak

unread,

Aug 4, 2004, 2:00:24 PM8/4/04

to

toddkenn...@yahoo.com (Todd B) wrote in message news:<ef8e4d1e.04080...@posting.google.com>...

> ed.pr...@magicinterface.com (Ed prochak) wrote in message news:<4b5394b2.04080...@posting.google.com>...

[]

> > The empty set is NOT the NULL. NULL is NOT a set value, it is a
> > placeholder for an attribute on a row. The empty set is a definite
> > answer to a question. NULL is another way of saying I don't know the
> > answer.
>
> I wasn't talking about the answer to 'select * from
> <pick_your_table>'. I was talking about the count() or sum() function
> on an empty table, and how it should respond. So, here, you and I are
> in agreement.
>
> > If the door to the room is closed and I ask you: who is in the room?
> > you would answer I don't know (IOW NULL)
> > If you then open the door and see the room is empty, and I ask again,
> > you would answer No one (IOW empty set)
> >
> > Very different.
>
> In most of your database experience, then, is the door closed or open?

It is closed until you open it, IOW execute the query. Note the metaphor:
room is a table, people are rows, their names and other things are attributes.

[]

>
> My point was that logic isn't so straightforward when nulls in
> aggregates are included.

I guess my point then is, it's not so crooked either.
8^? <<< (he says with a crooked smile)

>
> Am I a fan of nulls as attribute values? Right now, yes, because they
> make things simple.

when you understand it properly, three valued logic is wonderful.

> ... I'm also a fan of updateable views, which is

> somewhat of a contradiction. So, I guess you could call me somewhere
> between an uneducataed and miseducated person -- take your pick.
>
> Todd

How do NULL attributes contradict the usefullness of updateable views?
Maybe we both need to be edumacated? 8^)

Have a nice day, Todd.
Ed

--CELKO--

unread,

Aug 4, 2004, 5:43:45 PM8/4/04

to

>> I think that it's better to look at the empty set as an unknown
rather than a count(*) equalling zero. Please enlighten me if you
disagree. <<

COUNT(*) is the cardinality operator for the set as a whole. The
other aggregate functions work on the elements within the set. In set
theory, the cardinality of the empty set is defined as zero. SQL is
following straight mathematical conventions here.

Alfredo Novoa

unread,

Aug 6, 2004, 6:27:54 AM8/6/04

to

mikha...@yahoo.com (Mikito Harakiri) wrote in message news:<bdf69bdf.04072...@posting.google.com>...

> > I wonder whether anyone knows what the SQL standard has to say about what
> > aggregate functions such as MAX() and SUM() should return when the result

> > set is empty. I also wonder whether poular database management systems
> > such as Oracle, PostgreSQL, MySQL, and SQLite all conform to the
> > standard with regards to this particular issue. I am particularly
> > interested in what the result should be when the underlying data
> > type for the fields of interest is INTEGER.
>

> SUM should return 0

Obvious.

> MIN should return the maximum element in the domain
> MAX should return the minimum element in the domain

Why?

Why not "undefined" like 1/0?

Regards
Alfredo

Mikito Harakiri

unread,

Aug 9, 2004, 7:10:11 PM8/9/04

to

alf...@ncs.es (Alfredo Novoa) wrote in message news:<e4330f45.04080...@posting.google.com>...

> > MIN should return the maximum element in the domain
> > MAX should return the minimum element in the domain
>
> Why?

Because max and min are monothonic functions. Max is increasing when
you increase the count of the underlying set, while min is decreasing.

And there is also an authority argument: because Maple said so.
(Compared to SQL engines CAS systems posess much higher intellectual
capabilities).

> Why not "undefined" like 1/0?

It is undefined in the scope of rudimentary number implementations
like 32-bit arithmetics, but we can easily extend the numbers domain
to include 1/0 (plus infinity) and -1/0 (minus infinity). In fact, 1/0
is a very powerful concept utilized in some advanced exact arithmetics
implementations, but I'm digressing here.

Therefore, yes, we one define

select min(sal) from empty_employees_list

as

1/0

and

select max(sal) from empty_employees_list

as

-1/0

and the decision how to simplify the latter fraction can be handled by
the domain implementation.