
FUNC_MAX_ARGS benchmarks


Bruce Momjian

Aug 2, 2002, 9:17:28 PM
Thomas Lockhart wrote:
> > > With FUNC_MAX_ARGS=16:
> > > (average = 28.6 seconds)
> > > With FUNC_MAX_ARGS=32:
> > > (average = 29.15 seconds)
>
> That is almost a 2 percent cost. Shall we challenge someone to get us
> back 2 percent from somewhere before the 7.3 release? Optimizing a hot
> spot might do it...

I wasn't terribly concerned because this wasn't a 2% hit on a normal
workload test; it was a 2% hit on a test that bangs on function calls
as fast as it can.

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Thomas Lockhart

Aug 2, 2002, 10:44:31 PM
...
> I wasn't terribly concerned because this wasn't a 2% hit on a normal
> workload test; it was a 2% hit on a test that bangs on function calls
> as fast as it can.

Yeah, good point. But if we can get it back somehow that would be even
better :)

- Thomas


Tom Lane

Aug 3, 2002, 12:05:57 AM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
> I wasn't terribly concerned because this wasn't a 2% hit on a normal
> workload test; it was a 2% hit on a test that bangs on function calls
> as fast as it can.

No, it was a 2% hit on rather slow functions with only one call made
per query issued by the client. This is not much of a stress test.

A more impressive comparison would be

select 2+2+2+2+2+2+ ... (iterate 10000 times or so)

and see how much that slows down.

regards, tom lane
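
[A sketch of one way to generate and time such a query, assuming GNU
shell tools and a scratch database named "test"; this harness is
illustrative and not from the thread:]

# build "select 2+2+...+2;" with 10000 terms, then time a single run
yes 2 | head -10000 | paste -sd+ - | sed -e 's/^/select /' -e 's/$/;/' > bigquery.sql
time psql -d test -q -f bigquery.sql > /dev/null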


Joe Conway

Aug 3, 2002, 2:02:23 AM
Tom Lane wrote:
> No, it was a 2% hit on rather slow functions with only one call made
> per query issued by the client. This is not much of a stress test.
>
> A more impressive comparison would be
>
> select 2+2+2+2+2+2+ ... (iterate 10000 times or so)
>
> and see how much that slows down.

I ran a crude test as follows (using a PHP script on the same machine;
nothing else was running at the time):

do 100 times
select 2+2+2+2+2+2+ ... iterated 9901 times


for each of INDEX_MAX_KEYS = 16, 32, 64, and 128:
    #define INDEX_MAX_KEYS <n>
    #define FUNC_MAX_ARGS INDEX_MAX_KEYS
    make all
    make install
    initdb

The results were as follows:

INDEX_MAX_KEYS  |  16 |  32 |  64 | 128
----------------+-----+-----+-----+-----
Time in seconds |  48 |  49 |  51 |  55


Joe
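
[A rough shell equivalent of that driver loop, reusing the bigquery.sql
built in the earlier sketch; Joe's actual script was PHP, so this is
only an approximation:]

# run the generated query 100 times and report total elapsed time
time for i in $(seq 1 100); do
    psql -d test -q -f bigquery.sql > /dev/null
done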



Tom Lane

Aug 3, 2002, 12:41:37 PM
Joe Conway <ma...@joeconway.com> writes:
> I ran a crude test as follows (using a PHP script on the same machine;
> nothing else was running at the time):

> do 100 times
> select 2+2+2+2+2+2+ ... iterated 9901 times

> The results were as follows:
>
> INDEX_MAX_KEYS  |  16 |  32 |  64 | 128
> ----------------+-----+-----+-----+-----
> Time in seconds |  48 |  49 |  51 |  55

Okay, that seems like a good basic test.

Did you happen to make any notes about the disk space occupied by the
database? One thing I was worried about was the bloat that'd occur
in pg_proc, pg_index, and pg_proc_proname_args_nsp_index. Aside from
costing disk space, this would indirectly slow things down due to more
I/O to read these tables --- an effect that probably your test couldn't
measure, since it wasn't touching very many entries in any of those
tables.

Looks like we could go for 32 without much problem, though.

regards, tom lane
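
[The catalogs Tom names can also be sized directly, rather than du'ing
the whole cluster; a sketch, keeping in mind that relpages is only
refreshed by VACUUM:]

psql -d template1 -c "
SELECT relname, relpages
FROM pg_class
WHERE relname IN ('pg_proc', 'pg_index', 'pg_proc_proname_args_nsp_index')
ORDER BY relname;"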


Joe Conway

Aug 3, 2002, 1:12:41 PM
Tom Lane wrote:
> Did you happen to make any notes about the disk space occupied by the
> database? One thing I was worried about was the bloat that'd occur
> in pg_proc, pg_index, and pg_proc_proname_args_nsp_index. Aside from
> costing disk space, this would indirectly slow things down due to more
> I/O to read these tables --- an effect that probably your test couldn't
> measure, since it wasn't touching very many entries in any of those
> tables.

No, but it's easy enough to repeat. I'll do that today sometime.

Joe


Hannu Krosing

Aug 3, 2002, 1:55:39 PM
On Sat, 2002-08-03 at 18:41, Tom Lane wrote:
> Joe Conway <ma...@joeconway.com> writes:
> > I ran a crude test as follows (using a PHP script on the same machine;
> > nothing else was running at the time):
>
> > do 100 times
> > select 2+2+2+2+2+2+ ... iterated 9901 times
>
> > The results were as follows:
> > INDEX_MAX_KEYS  |  16 |  32 |  64 | 128
> > ----------------+-----+-----+-----+-----
> > Time in seconds |  48 |  49 |  51 |  55
>
> Okay, that seems like a good basic test.
>
> Did you happen to make any notes about the disk space occupied by the
> database? One thing I was worried about was the bloat that'd occur
> in pg_proc, pg_index, and pg_proc_proname_args_nsp_index. Aside from
> costing disk space, this would indirectly slow things down due to more
> I/O to read these tables --- an effect that probably your test couldn't
> measure, since it wasn't touching very many entries in any of those
> tables.

How hard would it be to change pg_proc.proargtypes from oidvector to
_oid and hope that toasting will take care of the rest?

This could also get the requested 2% speedup, not to mention the
potential for up to 64K arguments ;)

---------------
Hannu

Tom Lane

Aug 3, 2002, 2:21:12 PM
Hannu Krosing <ha...@tm.ee> writes:
> How hard would it be to change pg_proc.proargtypes from oidvector to _oid

Lack of btree index support for _oid would be the first hurdle.

Even if we wanted to do that work, there'd be some serious breakage
of client queries because of the historical differences in output format
and subscripting. (oidvector indexes from 0, _oid from 1. Which is
pretty bogus, but if the regression tests are anything to judge by there
are probably a lot of queries out there that know this.)

> This could also get the requested 2% speedup,

I'm not convinced that _oid would be faster.

All in all, it doesn't seem worth the trouble compared to just kicking
FUNC_MAX_ARGS up a notch. At least not right now. I think we've
created quite enough system-catalog changes for one release cycle ;-)

regards, tom lane
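
[A hypothetical example of the kind of client query that would break:]

# oidvector subscripts start at 0, so this fetches the first argument type
psql -d template1 -c \
  "SELECT proname, proargtypes[0] FROM pg_proc WHERE proname = 'substr';"
# were proargtypes a regular array (_oid), subscripts would start at 1
# and proargtypes[0] would silently return NULL instead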


Joe Conway

Aug 3, 2002, 5:25:38 PM
Tom Lane wrote:
> Did you happen to make any notes about the disk space occupied by the
> database? One thing I was worried about was the bloat that'd occur
> in pg_proc, pg_index, and pg_proc_proname_args_nsp_index. Aside from
> costing disk space, this would indirectly slow things down due to more
> I/O to read these tables --- an effect that probably your test couldn't
> measure, since it wasn't touching very many entries in any of those
> tables.

#define INDEX_MAX_KEYS 16
#define FUNC_MAX_ARGS INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
2.7M /opt/data/pgsql/data/base/1
2.7M /opt/data/pgsql/data/base/16862
2.7M /opt/data/pgsql/data/base/16863
2.7M /opt/data/pgsql/data/base/16864
3.2M /opt/data/pgsql/data/base/16865
2.7M /opt/data/pgsql/data/base/16866
17M /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 32
#define FUNC_MAX_ARGS INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.1M /opt/data/pgsql/data/base/1
3.1M /opt/data/pgsql/data/base/16862
3.1M /opt/data/pgsql/data/base/16863
3.1M /opt/data/pgsql/data/base/16864
3.6M /opt/data/pgsql/data/base/16865
3.1M /opt/data/pgsql/data/base/16866
19M /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 64
#define FUNC_MAX_ARGS INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.9M /opt/data/pgsql/data/base/1
3.9M /opt/data/pgsql/data/base/16862
3.9M /opt/data/pgsql/data/base/16863
3.9M /opt/data/pgsql/data/base/16864
4.4M /opt/data/pgsql/data/base/16865
3.9M /opt/data/pgsql/data/base/16866
24M /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 128
#define FUNC_MAX_ARGS INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
5.7M /opt/data/pgsql/data/base/1
5.7M /opt/data/pgsql/data/base/16862
5.7M /opt/data/pgsql/data/base/16863
5.7M /opt/data/pgsql/data/base/16864
6.3M /opt/data/pgsql/data/base/16865
5.7M /opt/data/pgsql/data/base/16866
35M /opt/data/pgsql/data/base


Joe



Hannu Krosing

Aug 3, 2002, 7:36:58 PM
On Sat, 2002-08-03 at 23:20, Tom Lane wrote:
> Hannu Krosing <ha...@tm.ee> writes:
> > How hard would it be to change pg_proc.proargtypes from oidvector to _oid
>
> Lack of btree index support for _oid would be the first hurdle.

Is that index really needed, or is it there just to enforce uniqueness?

Would the lookup not be in some internal cache most of the time?

Also (imho ;) btree index support should be done for all array types
whose elements have comparison ops, all at once (with semantics similar
to strings), not one by one for individual element types. It should be
in some ways quite similar to multi-key indexes, so perhaps some code
could be borrowed from there.

Otoh, it should be a SMOP to write b-tree index support just for
_oid :-p ; most likely one could re-use code from oidvector ;)

> Even if we wanted to do that work, there'd be some serious breakage
> of client queries because of the historical differences in output format
> and subscripting. (oidvector indexes from 0, _oid from 1. Which is
> pretty bogus, but if the regression tests are anything to judge by there
> are probably a lot of queries out there that know this.)

I would guess that oidvector is a sufficiently obscure type that nobody
actually uses it for user tables.

It is also used in only two tables and one index among the system
catalogs:

hannu=# select relname,relkind from pg_class where oid in (
hannu-# select attrelid from pg_attribute where atttypid=30);
relname | relkind
---------------------------------+---------
pg_index | r
pg_proc_proname_narg_type_index | i
pg_proc | r
(3 rows)

> > This could also get the requested 2% speedup,
>
> I'm not convinced that _oid would be faster.

Neither am I, but it _may_ be that having generally shorter oid arrays
wins us enough ;)

> All in all, it doesn't seem worth the trouble compared to just kicking
> FUNC_MAX_ARGS up a notch. At least not right now. I think we've
> created quite enough system-catalog changes for one release cycle ;-)

But going to _oid would free us from arbitrary limits on argument
count. Or at least from small arbitrary limits, as there will probably
still be the at-least-three-btree-keys-must-fit-in-a-page limit (which
works out to > 2600 args/function) and maybe some other internal limits
as well.

------------------
Hannu



Bruce Momjian

Aug 3, 2002, 8:40:34 PM
Hannu Krosing wrote:
> On Sat, 2002-08-03 at 23:20, Tom Lane wrote:
> > Hannu Krosing <ha...@tm.ee> writes:
> > > How hard would it be to change pg_proc.proargtypes from oidvector to _oid
> >
> > Lack of btree index support for _oid would be the first hurdle.
>
> Is that index really needed, or is it there just to enforce uniqueness?

Needed to look up functions based on their args.

The big issue with using arrays is that we don't have cache support for
variable-length fields. Until we get that, we are stuck with
NAMEDATALEN taking the full length, and oidvector taking the full
length.

And if we went with variable length, there might be a performance
penalty.

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Bruce Momjian

Aug 3, 2002, 9:05:03 PM

OK, time to get moving, folks. Looks like the increase of the function
args to 32 and NAMEDATALEN to 128 has been sufficiently tested. Tom
has some ideas on removing some memset() calls for function args to
speed things up, but we don't have to wait for that to get going. The
end of August is nearing.

Is there any reason to delay the change further?

---------------------------------------------------------------------------

--

Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Tom Lane

Aug 3, 2002, 9:38:21 PM
Hannu Krosing <ha...@tm.ee> writes:
>> Lack of btree index support for _oid would be the first hurdle.

> Is that index really needed, or is it there just to enforce uniqueness?

Both.

> Also (imho ;) btree index support should be done for all array types
> whose elements have comparison ops, all at once (with semantics similar
> to strings), not one by one for individual element types.

Fine, send a patch ;-)

>> Even if we wanted to do that work, there'd be some serious breakage
>> of client queries because of the historical differences in output format
>> and subscripting. (oidvector indexes from 0, _oid from 1. Which is
>> pretty bogus, but if the regression tests are anything to judge by there
>> are probably a lot of queries out there that know this.)

> I would guess that oidvector is a sufficiently obscure type that
> nobody actually uses it for user tables.

No, you miss my point: client queries that do subscripting on
proargtypes will break. Since the regression tests find this a useful
thing to do, I suspect there are clients out there that do too.

> But going to _oid would free us from arbitrary limits on argument count.

I didn't say it wouldn't be a good idea in the long run. I'm saying I
don't think it's happening for 7.3, given that Aug 31 is not that far
away anymore and that a lot of cleanup work remains undone on other
already-committed features. FUNC_MAX_ARGS=32 could happen for 7.3, though.

regards, tom lane

Tom Lane

Aug 3, 2002, 10:15:33 PM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
> OK, time to get moving folks. Looks like the increase in the function
> args to 32 and the NAMEDATALEN to 128 has been sufficiently tested.

I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
too much. But have we done equivalent checks on NAMEDATALEN? In
particular, do we know what it does to the size of template1?

regards, tom lane


Bruce Momjian

Aug 3, 2002, 10:54:22 PM
Tom Lane wrote:
> Bruce Momjian <pg...@candle.pha.pa.us> writes:
> > OK, time to get moving folks. Looks like the increase in the function
> > args to 32 and the NAMEDATALEN to 128 has been sufficiently tested.
>
> I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
> too much. But have we done equivalent checks on NAMEDATALEN? In
> particular, do we know what it does to the size of template1?

No, I thought we saw the number; was it 30%? No, we did a test for 64.
Can someone get us that number for 128?

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Joe Conway

Aug 4, 2002, 2:59:51 AM
Bruce Momjian wrote:

> Tom Lane wrote:
>>I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
>>too much. But have we done equivalent checks on NAMEDATALEN? In
>>particular, do we know what it does to the size of template1?
> No, I thought we saw the number; was it 30%? No, we did a test for 64.
> Can someone get us that number for 128?
>

I'll do 32, 64, and 128 and report back on template1 size.

Joe

Joe Conway

Aug 4, 2002, 3:19:47 AM
Bruce Momjian wrote:
> Tom Lane wrote:
>
>>Bruce Momjian <pg...@candle.pha.pa.us> writes:
>>
>>>OK, time to get moving folks. Looks like the increase in the function
>>>args to 32 and the NAMEDATALEN to 128 has been sufficiently tested.
>>
>>I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
>>too much. But have we done equivalent checks on NAMEDATALEN? In
>>particular, do we know what it does to the size of template1?
>
>
> No, I thought we saw the number; was it 30%? No, we did a test for 64.
> Can someone get us that number for 128?
>

These are all with FUNC_MAX_ARGS = 16.

#define NAMEDATALEN 32
du -h --max-depth=1 /opt/data/pgsql/data/base/
2.7M /opt/data/pgsql/data/base/1
2.7M /opt/data/pgsql/data/base/16862
2.7M /opt/data/pgsql/data/base/16863
2.7M /opt/data/pgsql/data/base/16864
3.2M /opt/data/pgsql/data/base/16865
2.7M /opt/data/pgsql/data/base/16866
2.7M /opt/data/pgsql/data/base/17117
19M /opt/data/pgsql/data/base

#define NAMEDATALEN 64
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.0M /opt/data/pgsql/data/base/1
3.0M /opt/data/pgsql/data/base/16863
3.0M /opt/data/pgsql/data/base/16864
3.0M /opt/data/pgsql/data/base/16865
3.5M /opt/data/pgsql/data/base/16866
3.0M /opt/data/pgsql/data/base/16867
19M /opt/data/pgsql/data/base

#define NAMEDATALEN 128
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.8M /opt/data/pgsql/data/base/1
3.8M /opt/data/pgsql/data/base/16863
3.8M /opt/data/pgsql/data/base/16864
3.8M /opt/data/pgsql/data/base/16865
4.4M /opt/data/pgsql/data/base/16866
3.8M /opt/data/pgsql/data/base/16867
23M /opt/data/pgsql/data/base

Joe Conway

Aug 4, 2002, 7:19:57 PM
Joe Conway wrote:
> These are all with FUNC_MAX_ARGS = 16.
>
> #define NAMEDATALEN 32
> du -h --max-depth=1 /opt/data/pgsql/data/base/
> 2.7M /opt/data/pgsql/data/base/1
> 2.7M /opt/data/pgsql/data/base/16862
> 2.7M /opt/data/pgsql/data/base/16863
> 2.7M /opt/data/pgsql/data/base/16864
> 3.2M /opt/data/pgsql/data/base/16865
> 2.7M /opt/data/pgsql/data/base/16866
> 2.7M /opt/data/pgsql/data/base/17117
> 19M /opt/data/pgsql/data/base
>

FWIW, this is FUNC_MAX_ARGS = 32 *and* NAMEDATALEN = 128:
du -h --max-depth=1 /opt/data/pgsql/data/base/
4.1M /opt/data/pgsql/data/base/1
4.1M /opt/data/pgsql/data/base/16863
4.1M /opt/data/pgsql/data/base/16864
4.1M /opt/data/pgsql/data/base/16865
4.8M /opt/data/pgsql/data/base/16866
4.1M /opt/data/pgsql/data/base/16867
26M /opt/data/pgsql/data/base

Tom Lane

Aug 4, 2002, 11:57:09 PM
Joe Conway <ma...@joeconway.com> writes:
> These are all with FUNC_MAX_ARGS = 16.

> #define NAMEDATALEN 32
> 2.7M /opt/data/pgsql/data/base/1

> #define NAMEDATALEN 64
> 3.0M /opt/data/pgsql/data/base/1

> #define NAMEDATALEN 128
> 3.8M /opt/data/pgsql/data/base/1

Based on Joe's numbers, I'm kind of thinking that we should go for
FUNC_MAX_ARGS=32 and NAMEDATALEN=64 as defaults in 7.3.

Although NAMEDATALEN=128 would be needed for full SQL compliance,
the space penalty seems severe. I'm thinking we should back off
until someone wants to do the legwork needed to make the name type
be truly variable-length.

Comments?

regards, tom lane

---------------------------(end of broadcast)---------------------------

Joe Conway

Aug 5, 2002, 12:09:10 AM
Tom Lane wrote:
> Joe Conway <ma...@joeconway.com> writes:
>
>> These are all with FUNC_MAX_ARGS = 16.
>
>> #define NAMEDATALEN 32
>> 2.7M /opt/data/pgsql/data/base/1
>
>> #define NAMEDATALEN 64
>> 3.0M /opt/data/pgsql/data/base/1
>
>> #define NAMEDATALEN 128
>> 3.8M /opt/data/pgsql/data/base/1
>
> Based on Joe's numbers, I'm kind of thinking that we should go for
> FUNC_MAX_ARGS=32 and NAMEDATALEN=64 as defaults in 7.3.
>
> Although NAMEDATALEN=128 would be needed for full SQL compliance,
> the space penalty seems severe. I'm thinking we should back off
> until someone wants to do the legwork needed to make the name type
> be truly variable-length.

FWIW, I reran the speed benchmark (select 2+2+2...) with
FUNC_MAX_ARGS=32 and NAMEDATALEN=128 and still got 49 seconds, i.e.
NAMEDATALEN=128 didn't impact performance of that particular test.

The earlier results, for reference:

INDEX_MAX_KEYS  |  16 |  32 |  64 | 128
----------------+-----+-----+-----+-----
Time in seconds |  48 |  49 |  51 |  55

(the 32-key column is the case rerun with NAMEDATALEN=128, same result)

What will the impact be on a medium-to-large production database? In
other words, is the bloat strictly in the system catalogs, scaling with
how extensive your database schema is (bad choice of words now, but I
don't know a better term for this)? Or will the bloat scale with the
size of the database including data?

Bruce Momjian

Aug 5, 2002, 1:21:56 AM
Tom Lane wrote:
> Joe Conway <ma...@joeconway.com> writes:
> > These are all with FUNC_MAX_ARGS = 16.
>
> > #define NAMEDATALEN 32
> > 2.7M /opt/data/pgsql/data/base/1
>
> > #define NAMEDATALEN 64
> > 3.0M /opt/data/pgsql/data/base/1
>
> > #define NAMEDATALEN 128
> > 3.8M /opt/data/pgsql/data/base/1
>
> Based on Joe's numbers, I'm kind of thinking that we should go for
> FUNC_MAX_ARGS=32 and NAMEDATALEN=64 as defaults in 7.3.
>
> Although NAMEDATALEN=128 would be needed for full SQL compliance,
> the space penalty seems severe. I'm thinking we should back off
> until someone wants to do the legwork needed to make the name type
> be truly variable-length.

I prefer 64 for NAMEDATALEN myself. Standards compliance is nice, but
realistically it seems a shame to waste so much space on an excessive
length that will never be used.

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Joe Conway

Aug 5, 2002, 2:09:08 AM
Bruce Momjian wrote:
> I prefer 64 for NAMEDATALEN myself. Standards compliance is nice, but
> realistically it seems a shame to waste so much space on an excessive
> length that will never be used.
>

But is the wasted space really never more than a few MB, even if the
database itself is, say, 1 GB? If so, and if the speed penalty is small
to non-existent, I'd rather be spec compliant. That way nobody has a
good basis for complaining ;-)

I guess I'll try another test with a larger data set.

Joe



Joe Conway

Aug 5, 2002, 2:37:41 AM
Joe Conway wrote:
> But is the wasted space really never more than a few MB, even if the
> database itself is, say, 1 GB? If so, and if the speed penalty is small
> to non-existent, I'd rather be spec compliant. That way nobody has a
> good basis for complaining ;-)
>
> I guess I'll try another test with a larger data set.
>

Starting with a pg_dumpall file of 138MB.


#define INDEX_MAX_KEYS 16
#define FUNC_MAX_ARGS INDEX_MAX_KEYS

#define NAMEDATALEN 32
du -h --max-depth=1 /opt/data/pgsql/data/base/
2.7M /opt/data/pgsql/data/base/1
2.7M /opt/data/pgsql/data/base/16862

119M /opt/data/pgsql/data/base/16863
3.1M /opt/data/pgsql/data/base/696496
3.1M /opt/data/pgsql/data/base/696623
3.1M /opt/data/pgsql/data/base/696750
2.8M /opt/data/pgsql/data/base/696877
2.8M /opt/data/pgsql/data/base/696889
2.8M /opt/data/pgsql/data/base/696901
2.8M /opt/data/pgsql/data/base/696912
18M /opt/data/pgsql/data/base/696924
3.0M /opt/data/pgsql/data/base/878966
2.7M /opt/data/pgsql/data/base/881056
2.7M /opt/data/pgsql/data/base/881075
2.8M /opt/data/pgsql/data/base/881078
3.1M /opt/data/pgsql/data/base/881093
3.1M /opt/data/pgsql/data/base/881225
2.8M /opt/data/pgsql/data/base/881604
3.3M /opt/data/pgsql/data/base/881620
31M /opt/data/pgsql/data/base/881807
31M /opt/data/pgsql/data/base/1031939
32M /opt/data/pgsql/data/base/1181250
31M /opt/data/pgsql/data/base/1332676
309M /opt/data/pgsql/data/base


#define INDEX_MAX_KEYS 32
#define FUNC_MAX_ARGS INDEX_MAX_KEYS

#define NAMEDATALEN 128


du -h --max-depth=1 /opt/data/pgsql/data/base/
4.1M /opt/data/pgsql/data/base/1
4.1M /opt/data/pgsql/data/base/16863

121M /opt/data/pgsql/data/base/16864
4.6M /opt/data/pgsql/data/base/696497
4.6M /opt/data/pgsql/data/base/696624
4.6M /opt/data/pgsql/data/base/696751
4.2M /opt/data/pgsql/data/base/696878
4.2M /opt/data/pgsql/data/base/696890
4.2M /opt/data/pgsql/data/base/696902
4.2M /opt/data/pgsql/data/base/696913
20M /opt/data/pgsql/data/base/696925
4.5M /opt/data/pgsql/data/base/878967
4.2M /opt/data/pgsql/data/base/881057
4.1M /opt/data/pgsql/data/base/881076
4.2M /opt/data/pgsql/data/base/881079
4.7M /opt/data/pgsql/data/base/881094
4.7M /opt/data/pgsql/data/base/881226
4.2M /opt/data/pgsql/data/base/881605
4.9M /opt/data/pgsql/data/base/881621
33M /opt/data/pgsql/data/base/881808
33M /opt/data/pgsql/data/base/1031940
33M /opt/data/pgsql/data/base/1181251
33M /opt/data/pgsql/data/base/1332677
343M /opt/data/pgsql/data/base

So the 119MB database only grows to 121MB. In fact, each of the > 10MB
databases seems to grow only about 2MB. Based on this, I'd go with:

#define INDEX_MAX_KEYS 32
#define FUNC_MAX_ARGS INDEX_MAX_KEYS

#define NAMEDATALEN 128

and take spec compliance.

Joe



Dave Page

Aug 5, 2002, 3:21:09 AM

> -----Original Message-----
> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: 05 August 2002 04:56
> To: Joe Conway
> Cc: Bruce Momjian; Thomas Lockhart; Neil Conway; PostgreSQL Hackers
> Subject: Re: [HACKERS] FUNC_MAX_ARGS benchmarks
>
> Joe Conway <ma...@joeconway.com> writes:
> > These are all with FUNC_MAX_ARGS = 16.
>
> > #define NAMEDATALEN 32
> > 2.7M /opt/data/pgsql/data/base/1
>
> > #define NAMEDATALEN 64
> > 3.0M /opt/data/pgsql/data/base/1
>
> > #define NAMEDATALEN 128
> > 3.8M /opt/data/pgsql/data/base/1
>
> Based on Joe's numbers, I'm kind of thinking that we should
> go for FUNC_MAX_ARGS=32 and NAMEDATALEN=64 as defaults in 7.3.
>
> Although NAMEDATALEN=128 would be needed for full SQL
> compliance, the space penalty seems severe. I'm thinking we
> should back off until someone wants to do the legwork needed
> to make the name type be truly variable-length.
>
> Comments?

In Joe's last test he had only about 2MB growth per db (I guess this
would not be the case had he used the name type in some of his tables).
I would rather lose a measly few MB and be standards compliant myself.

$0.02

Regards, Dave.

Tom Lane

Aug 5, 2002, 9:34:08 AM
Joe Conway <ma...@joeconway.com> writes:
> Tom Lane wrote:
>> Although NAMEDATALEN=128 would be needed for full SQL compliance,
>> the space penalty seems severe.

> What will the impact be on a medium-to-large production database? In
> other words, is the bloat strictly in the system catalogs, scaling with
> how extensive your database schema is (bad choice of words now, but I
> don't know a better term for this)? Or will the bloat scale with the
> size of the database including data?

The bloat would scale with the size of your schema, not with the amount
of data in your tables (unless you have "name" columns in your user
tables, which is something we've always discouraged). template1 is
clearly a worst-case scenario, percentagewise, for NAMEDATALEN.

I'm quite prepared to believe that the net cost is "a couple megs per
database" more or less independent of how much data you store. Maybe
that's negligible these days, or maybe it isn't ...

regards, tom lane
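
[A quick way to check a database for the discouraged "name" columns in
user tables, in the style of Hannu's earlier catalog query; "mydb" is a
placeholder name:]

psql -d mydb -c "
SELECT c.relname, a.attname
FROM pg_class c, pg_attribute a
WHERE a.attrelid = c.oid
  AND a.atttypid = (SELECT oid FROM pg_type WHERE typname = 'name')
  AND c.relkind = 'r'
  AND c.relname NOT LIKE 'pg%';  -- crudely skip the system catalogs"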


Joe Conway

Aug 5, 2002, 11:19:34 AM
Tom Lane wrote:
> The bloat would scale with the size of your schema, not with the amount
> of data in your tables (unless you have "name" columns in your user
> tables, which is something we've always discouraged). template1 is
> clearly a worst-case scenario, percentagewise, for NAMEDATALEN.
>
> I'm quite prepared to believe that the net cost is "a couple megs per
> database" more or less independent of how much data you store. Maybe
> that's negligible these days, or maybe it isn't ...

Seems to me it's negligible for the vast majority of applications. I
*know* it is for any application that I have.

We can always tell people who are doing embedded application work to
bump *down* NAMEDATALEN.

Joe

Tom Lane

Aug 5, 2002, 11:27:50 AM
Joe Conway <ma...@joeconway.com> writes:
> We can always tell people who are doing embedded application work to
> bump *down* NAMEDATALEN.

Good point. Okay, I'm OK with 128 ...

regards, tom lane


Bruce Momjian

Aug 5, 2002, 12:21:51 PM
Tom Lane wrote:
> Joe Conway <ma...@joeconway.com> writes:
> > We can always tell people who are doing embedded application work to
> > bump *down* NAMEDATALEN.
>
> Good point. Okay, I'm OK with 128 ...

Yes, good point. I think the major issue is pushing stuff out of the
cache because we have longer names. Did we see a performance hit at 128?
It seems it's more than just disk space.

I don't have trouble with 128, but other than standards compliance, I
can't see many people using names longer than 64 characters.

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Tom Lane

Aug 5, 2002, 12:26:26 PM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
> I don't have trouble with 128, but other than standards compliance, I
> can't see many people using names longer than 64 characters.

One nice thing about 128 is you can basically forget about the weird
truncation behavior on generated sequence names for serial columns
--- "tablename_colname_seq" will be correct for essentially all
practical cases. At 64 you might still need to think about it.

regards, tom lane
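
[A hypothetical illustration of that truncation, with names invented
for the example:]

# with NAMEDATALEN = 32, identifiers are silently cut at 31 bytes
psql -d test -c \
  "CREATE TABLE monthly_revenue_projection (line_item_number serial);"
# the conventional "monthly_revenue_projection_line_item_number_seq" is
# 48 characters, so the server truncates the sequence name and a client
# that computes <table>_<column>_seq would guess wrong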


Bruce Momjian

Aug 5, 2002, 12:54:43 PM
Tom Lane wrote:
> Bruce Momjian <pg...@candle.pha.pa.us> writes:
> > I don't have trouble with 128, but other than standards compliance, I
> > can't see many people using names longer than 64 characters.
>
> One nice thing about 128 is you can basically forget about the weird
> truncation behavior on generated sequence names for serial columns
> --- "tablename_colname_seq" will be correct for essentially all
> practical cases. At 64 you might still need to think about it.

Oh, good point. Does anyone remember the performance hit for
NAMEDATALEN at 64 vs. 128?

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


Peter Eisentraut

Aug 5, 2002, 3:38:54 PM
Joe Conway writes:

> I'd rather be spec compliant. That way nobody has a good basis for
> complaining ;-)

How long until someone figures out that to be spec-compliant you need
NAMEDATALEN 129? ;-)

--
Peter Eisentraut pet...@gmx.net

Tom Lane

Aug 5, 2002, 5:31:53 PM
Well, in fact it's not just a question of disk space.

The following numbers are stats for total elapsed time of "make
installcheck" over ten trials:

NAMEDATALEN = 32, FUNC_MAX_ARGS = 16

min | max | avg | stddev
-------+-------+--------+-------------------
25.59 | 27.61 | 26.612 | 0.637003401351409

NAMEDATALEN = 64, FUNC_MAX_ARGS = 32

min | max | avg | stddev
-------+-------+--------+-----------------
26.32 | 29.27 | 27.415 | 1.0337982824947

NAMEDATALEN = 128, FUNC_MAX_ARGS = 32

min | max | avg | stddev
-------+-------+--------+------------------
27.44 | 30.79 | 29.603 | 1.26148105195622

I'm not sure about the trend of increasing standard deviation --- that
may reflect more disk I/O being done, and perhaps more checkpoints
occurring during the test. But in any case it's clear that there's a
nontrivial runtime cost here. Does a 10% slowdown bother you?

regards, tom lane
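
[The summary rows read like psql aggregate output; one way to produce
them from the raw trial timings, assuming they were loaded into a
scratch table, would be:]

psql -d test <<'EOF'
CREATE TEMP TABLE trials (secs float8);
INSERT INTO trials VALUES (25.59);  -- ...one row per trial
SELECT min(secs), max(secs), avg(secs), stddev(secs) FROM trials;
EOF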

Joe Conway

Aug 5, 2002, 7:09:28 PM
Tom Lane wrote:
> Well, in fact it's not just a question of disk space.
>
> The following numbers are stats for total elapsed time of "make
> installcheck" over ten trials:
>
<snip>

> I'm not sure about the trend of increasing standard deviation --- that
> may reflect more disk I/O being done, and perhaps more checkpoints
> occurring during the test. But in any case it's clear that there's a
> nontrivial runtime cost here. Does a 10% slowdown bother you?

Hmmm -- didn't Neil do some kind of test that had different results,
i.e. not much performance difference? I wonder if the large number of
DDL commands in installcheck doesn't skew the results against longer
NAMEDATALEN compared to other benchmarks?

# pwd
/opt/src/pgsql/src/test/regress/sql
# grep -i 'CREATE\|DROP' * | wc -l
1114


Joe

Tom Lane

Aug 5, 2002, 7:46:50 PM
Joe Conway <ma...@joeconway.com> writes:
>> I'm not sure about the trend of increasing standard deviation --- that
>> may reflect more disk I/O being done, and perhaps more checkpoints
>> occurring during the test. But in any case it's clear that there's a
>> nontrivial runtime cost here. Does a 10% slowdown bother you?

> Hmmm -- didn't Neil do some kind of test that had different results,
> i.e. not much performance difference?

Well, one person had reported a 10% slowdown in pgbench, but Neil saw
a 10% speedup. Given the well-known difficulty of getting any
reproducible numbers out of pgbench, I don't trust either number very
far; but unless some other folk are willing to repeat the experiment
I think we can only conclude that pgbench isn't affected much by
NAMEDATALEN.

> I wonder if the large number of
> DDL commands in installcheck doesn't skew the results against longer
> NAMEDATALEN compared to other benchmarks?

Depends on what you consider skewed, I suppose. pgbench touches only a
very small number of relations, and starts no new backends over the
length of its run, thus everything gets cached and stays cached. At
best I'd consider it an existence proof that some applications won't be
hurt.

Do you have another application you'd consider a more representative
benchmark?

Joe Conway

Aug 5, 2002, 8:46:36 PM
Tom Lane wrote:
> Depends on what you consider skewed, I suppose. pgbench touches only a
> very small number of relations, and starts no new backends over the
> length of its run, thus everything gets cached and stays cached. At
> best I'd consider it an existence proof that some applications won't be
> hurt.
>
> Do you have another application you'd consider a more representative
> benchmark?

I'm not sure. Maybe OSDB? I'll see if I can get it running over the next
few days. Anyone else have other suggestions?

Joe



Tatsuo Ishii

Aug 5, 2002, 10:09:00 PM
> I don't have trouble with 128, but other than standards compliance, I
> can't see many people using names longer than 64 characters.

Don't forget that 128 is for *bytes*, not for characters (this is still
true in 7.3). In CJK (Chinese, Japanese, and Korean) a single character
can eat up to 3 bytes if the encoding is UTF-8.
--
Tatsuo Ishii
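
[The byte/character gap is easy to see in a UTF-8 database, assuming a
multibyte-enabled build; the string below is just an example
identifier:]

psql -d test -c \
  "SELECT length('データベース') AS chars, octet_length('データベース') AS bytes;"
# reports 6 characters but 18 bytes in UTF-8, so a 128-byte NAMEDATALEN
# holds only about 42 such characters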

Tom Lane

Aug 5, 2002, 10:55:48 PM
Tatsuo Ishii <t-i...@sra.co.jp> writes:
>> I don't have trouble with 128, but other than standards compliance, I
>> can't see many people using names longer than 64 characters.

> Don't forget that 128 is for *bytes*, not for characters (this is still
> true in 7.3). In CJK (Chinese, Japanese, and Korean) a single character
> can eat up to 3 bytes if the encoding is UTF-8.

True, but in those languages a typical name would be many fewer
characters than it is in Western alphabets, no? I'd guess (with
no evidence though) that the effect would more or less cancel out.

Bruce Momjian

Aug 10, 2002, 7:21:32 PM

OK, seems we have not come to a decision yet on this.

Do we have agreement to increase FUNC_MAX_ARGS to 32?

NAMEDATALEN will be 64 or 128 in 7.3. At this point, we had better
decide which one we prefer.

The conservative approach would be to go for 64 and perhaps increase it
again in 7.4 after we get feedback and real-world usage. If we go to
128, we will have trouble decreasing it if there are performance
problems.

---------------------------------------------------------------------------

Tom Lane wrote:
> Joe Conway <ma...@joeconway.com> writes:
> >> I'm not sure about the trend of increasing standard deviation --- that
> >> may reflect more disk I/O being done, and perhaps more checkpoints
> >> occurring during the test. But in any case it's clear that there's a
> >> nontrivial runtime cost here. Does a 10% slowdown bother you?
>
> > Hmmm -- didn't Neil do some kind of test that had different results,
> > i.e. not much performance difference?
>
> Well, one person had reported a 10% slowdown in pgbench, but Neil saw
> a 10% speedup. Given the well-known difficulty of getting any
> reproducible numbers out of pgbench, I don't trust either number very
> far; but unless some other folk are willing to repeat the experiment
> I think we can only conclude that pgbench isn't affected much by
> NAMEDATALEN.
>
> > I wonder if the large number of
> > DDL commands in installcheck doesn't skew the results against longer
> > NAMEDATALEN compared to other benchmarks?
>

> Depends on what you consider skewed, I suppose. pgbench touches only a
> very small number of relations, and starts no new backends over the
> length of its run, thus everything gets cached and stays cached. At
> best I'd consider it an existence proof that some applications won't be
> hurt.
>
> Do you have another application you'd consider a more representative
> benchmark?
>

> regards, tom lane
>

--
Bruce Momjian | http://candle.pha.pa.us
pg...@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Tom Lane

Aug 10, 2002, 8:22:56 PM
Bruce Momjian <pg...@candle.pha.pa.us> writes:
> Do we have agreement to increase FUNC_MAX_ARGS to 32?

I believe so.

> NAMEDATALEN will be 64 or 128 in 7.3. At this point, we had better
> decide which one we prefer.
> The conservative approach would be to go for 64 and perhaps increase it
> again in 7.4 after we get feedback and real-world usage. If we go to
> 128, we will have trouble decreasing it if there are performance
> problems.

It seems fairly clear to me that there *are* performance problems,
at least in some scenarios. I think we should go to 64. There doesn't
seem to be a lot of real-world demand for more than that, despite what
the spec says ...

regards, tom lane

Joe Conway

Aug 10, 2002, 9:21:24 PM
Bruce Momjian wrote:
> OK, seems we have not come to a decision yet on this.
>
> Do we have agreement to increase FUNC_MAX_ARGS to 32?
>
> NAMEDATALEN will be 64 or 128 in 7.3. At this point, we had better
> decide which one we prefer.
>
> The conservative approach would be to go for 64 and perhaps increase it
> again in 7.4 after we get feedback and real-world usage. If we go to
> 128, we will have trouble decreasing it if there are performance
> problems.

I guess I'd also agree with:
FUNC_MAX_ARGS 32
NAMEDATALEN 64
and work on the performance issues for 7.4.

Joe


Christopher Kings-Lynne

Aug 11, 2002, 5:40:06 AM
> > NAMEDATALEN will be 64 or 128 in 7.3. At this point, we had better
> > decide which one we prefer.
> >
> > The conservative approach would be to go for 64 and perhaps increase it
> > again in 7.4 after we get feedback and real-world usage. If we go to
> > 128, we will have trouble decreasing it if there are performance
> > problems.
>
> I guess I'd also agree with:
> FUNC_MAX_ARGS 32
> NAMEDATALEN 64
> and work on the performance issues for 7.4.

I agree too.

Chris
