[HACKERS] Weird function behavior from Sept 11 snapshot

Mike Mascari

unread,

Sep 12, 2000, 8:00:12 AM9/12/00

to

Under both 6.5 and 7.0:
----------------------
stocks=# create table test (key int4);
CREATE
stocks=# create function crap(int4) returns int4 as
'select sum(key) from test' language 'sql';
CREATE
stocks=# select version();

version
---------------------------------------------------------------------
PostgreSQL 7.0.0 on i686-pc-linux-gnu, compiled by gcc
egcs-2.91.66

Under the snapshot from yesterday:
---------------------------------

template1=# create table test (key int4);
CREATE
template1=# create function crap(int4) returns int4
as 'select sum(key) from test' language 'sql';
ERROR: return type mismatch in function: declared to return
int4, returns numeric
template1=# select version();

version
------------------------------------------------------------------------
PostgreSQL 7.1devel on i586-pc-linux-gnu, compiled by GCC
egcs-2.91.66

Is this correct behavior? All of the regression tests pass on the
snapshot version, BTW.

Mike Mascari

Tom Lane

unread,

Sep 12, 2000, 9:56:02 AM9/12/00

to

Mike Mascari <mas...@mascari.com> writes:
> Under the snapshot from yesterday:
> ---------------------------------

> template1=# create function crap(int4) returns int4

> as 'select sum(key) from test' language 'sql';
> ERROR: return type mismatch in function: declared to return
> int4, returns numeric

I changed sum() on integer types to return numeric as a way of
avoiding overflow. Also avg() on integers now returns numeric
so that you can get some fractional precision. If you think this
was a bad idea, there's still time to debate it ... but we've had
repeated complaints about both of those issues.

regards, tom lane

Thomas Lockhart

unread,

Sep 12, 2000, 10:06:32 AM9/12/00

to

> Is this correct behavior? All of the regression tests pass on the
> snapshot version, BTW.

This is the expected behavior, and is "correct". There was a change
recently to the aggregate functions to make them more robust. So
sum(int4) now calculates and returns a numeric result rather than an
int4.

The problem is that numeric is extremely slow compared to an int4
calculation, and I'd like us to consider doing the calculation in int4
(downside: silent overflow when dealing with non-trivial data), int8
(downside: no support on a few platforms), or float8 (downside: silent
truncation on non-trivial data).

Tom, do you recall measuring the performance difference on aggregate
functions between int4 and numeric for small-value cases? We probably
don't want to take order-of-magnitude performance hits to get this more
correct behavior, but I'm not sure what the performance actually is.

btw, Mike's function works when defined as

create function c(int4) returns int4
as 'select cast(sum(key) as int4) from test' language 'sql';

- Thomas

Tom Lane

unread,

Sep 12, 2000, 10:28:42 AM9/12/00

to

Thomas Lockhart <lock...@alumni.caltech.edu> writes:
> Tom, do you recall measuring the performance difference on aggregate
> functions between int4 and numeric for small-value cases? We probably
> don't want to take order-of-magnitude performance hits to get this more
> correct behavior, but I'm not sure what the performance actually is.

I have not tried to measure it --- I was sort of assuming that for
realistic-size problems, disk I/O would swamp any increase in CPU time
anyway. Does anyone want to check the time for sum() or avg() on an
int4 column over a large table, under both 7.0.* and current?

> The problem is that numeric is extremely slow compared to an int4
> calculation, and I'd like us to consider doing the calculation in int4
> (downside: silent overflow when dealing with non-trivial data), int8
> (downside: no support on a few platforms), or float8 (downside: silent
> truncation on non-trivial data).

Actually, using a float8 accumulator would work pretty well; assuming
IEEE float8, you'd only start to get roundoff error when the running
sum exceeds 2^52 or so. However the SQL92 spec is insistent that sum()
deliver an exact-numeric result when applied to exact-numeric data,
and with a float accumulator we'd be at the mercy of the quality of the
local implementation of floating point.

I could see offering variant aggregates, say "sumf" and "avgf", that
use float8 accumulation. Right now the user can get the same result
by writing "sum(foo::float8)" but it might be wise to formalize the
idea ...

regards, tom lane

Thomas Lockhart

unread,

Sep 12, 2000, 10:52:41 AM9/12/00

to

> ... Does anyone want to check the time for sum() or avg() on an

> int4 column over a large table, under both 7.0.* and current?

For 262144 rows on the current tree, I get the following:

sum(int4): 12.0 seconds
sum(float8): 5.2 seconds
sum(cast(int4 as float8): 5.7 seconds

This includes startup costs, etc, and are the minimum times from several
runs (there is pretty wide variability, presumably due to disk caching,
swapping, etc on my laptop). It is a safe bet that the original int4
implementation was as fast or faster than the float8 result above (int4
does not require palloc() calls).

> Actually, using a float8 accumulator would work pretty well; assuming
> IEEE float8, you'd only start to get roundoff error when the running
> sum exceeds 2^52 or so. However the SQL92 spec is insistent that sum()
> deliver an exact-numeric result when applied to exact-numeric data,
> and with a float accumulator we'd be at the mercy of the quality of the
> local implementation of floating point.

A problem with float8 is that it is possible to reach a point in the
accumulation where subsequent input values are ignored in the sum. This
is different than just roundoff error, since it degrades ungracefully
from that point on.

> I could see offering variant aggregates, say "sumf" and "avgf", that
> use float8 accumulation. Right now the user can get the same result
> by writing "sum(foo::float8)" but it might be wise to formalize the
> idea ...

How about using int8 for the accumulator (on machines which support it
of course)? Falling back to float8 or numeric on other machines? Or
perhaps we could have an option (runtime??) to switch accumulator modes.

I like the idea of something like "sumf" to get alternative algorithms,
but it would be nice if basic sum() could be a bit more optimized than
currently.

- Thomas

Thomas Lockhart

unread,

Sep 12, 2000, 12:45:52 PM9/12/00

to

Hmm. I recompiled the current snapshot with the optimizations from my
Mandrake RPM (using the Mandrake defaults, except for disabling
"fast-math"), and get the following:

7.0.2 current test
1.8 5.3 sum(i)
1.95 1.77 sum(f)
2.3 1.9 sum(cast(i as float8))

My previous tests on the current tree were with -O0, asserts enabled,
and few other options specified (mostly, the defaults for the Postgres
Linux build).

The Linux defaults in the Postgres tarball are:

-O2 -Wall -Wmissing-prototypes -Wmissing-declarations

whereas the defaults for Mandrake (with fast-math turned off since it
gives rounding trouble in date/time math):

-O3 -fomit-frame-pointer -fno-exceptions -fno-rtti -pipe -s
-mpentiumpro -mcpu=pentiumpro -march=pentiumpro
-fexpensive-optimizations
-malign-loops=2 -malign-jumps=2 -malign-functions=2
-mpreferred-stack-boundary=2 -fno-fast-math

I'll do some more tests with the default compiler options. The good news
is that the new fmgr interface is apparently as fast or faster than the
old one :)

- Thomas