
postmaster segfaults with HUGE table


Joachim Wieland

Nov 14, 2004, 4:33:01 AM

Hi,

this query makes postmaster (beta4) die with signal 11:

(echo "CREATE TABLE footest(";
for i in `seq 0 66000`; do
echo "col$i int NOT NULL,";
done;
echo "PRIMARY KEY(col0));") | psql test


ERROR: tables can have at most 1600 columns
LOG: server process (PID 2140) was terminated by signal 11
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing

Program received signal SIGSEGV, Segmentation fault.
0x4015d43c in mallopt () from /lib/tls/libc.so.6
(gdb) bt
#0 0x4015d43c in mallopt () from /lib/tls/libc.so.6
#1 0x00021680 in ?? ()
[...]
#16 0x00000001 in ?? ()
#17 0x0821bc9b in AllocSetDelete ()
Previous frame inner to this frame (corrupt stack?)

Furthermore, the backend doesn't react to query cancel requests from psql
during execution of the query.


Joachim



Neil Conway

Nov 14, 2004, 5:05:02 AM

Joachim Wieland wrote:
> this query makes postmaster (beta4) die with signal 11:
>
> (echo "CREATE TABLE footest(";
> for i in `seq 0 66000`; do
> echo "col$i int NOT NULL,";
> done;
> echo "PRIMARY KEY(col0));") | psql test
>
>
> ERROR: tables can have at most 1600 columns
> LOG: server process (PID 2140) was terminated by signal 11
> LOG: terminating any other active server processes
> LOG: all server processes terminated; reinitializing

At best you're going to get the error message above: "tables can have at
most 1600 columns". But this is definitely a bug: we end up triggering
this assertion:

TRAP: BadArgument("!(attributeNumber >= 1)", File: "tupdesc.c", Line: 405)

This specific assertion is triggered because we represent attribute
numbers throughout the code base as a (signed) int16 -- the assertion
failure has occurred because an int16 has wrapped around due to
overflow. A fix would be to add a check to DefineRelation() (or even
earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber"
columns. We eventually do perform this check in
heap_create_with_catalog(), but by that point it is too late: various
functions have been invoked that assume we're dealing with a sane number
of columns.
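
Roughly, the check I have in mind would look something like this (just a
sketch, not the eventual patch -- DefineRelation, ereport and
MaxHeapAttributeNumber are real names from the source, but the exact
placement and message wording here are illustrative):

    /*
     * Sketch only: reject over-wide relations as soon as the complete
     * column list (including inherited columns) is known, before any
     * downstream code stores an attribute number in a signed int16.
     */
    if (list_length(schema) > MaxHeapAttributeNumber)
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_COLUMNS),
                 errmsg("tables can have at most %d columns",
                        MaxHeapAttributeNumber)));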

BTW, I noticed that there's an O(n^2) algorithm to check for duplicate
column names in DefineRelation() -- with 60,000 column names it takes
minutes to execute, but an inconsequential amount of time with 1,500
column names. So I don't think there's any point rewriting the algorithm
to be faster -- provided we move the check for MaxHeapAttributeNumber
before this loop, we should be fine.
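
(The loop in question is essentially this shape -- a generic sketch, not
the actual code:

    /* compare each new column name against every earlier one */
    for (i = 1; i < ncolumns; i++)
        for (j = 0; j < i; j++)
            if (strcmp(colname[i], colname[j]) == 0)
                /* error: duplicate column name */ ;

i.e. on the order of n^2/2 strcmp() calls, harmless at n <= 1600.)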

Thanks for the report.

-Neil


Simon Riggs

Nov 14, 2004, 6:24:34 AM

This seems too obvious a problem to have gone unnoticed... presumably it
has been there for a while? Does this mean that we do not have
regression tests for each maximum setting -- i.e. are we missing a
whole class of tests in the regression suite?

--
Best Regards, Simon Riggs

Tom Lane

Nov 14, 2004, 6:29:43 PM

Neil Conway <ne...@samurai.com> writes:
> This specific assertion is triggered because we represent attribute
> numbers throughout the code base as a (signed) int16 -- the assertion
> failure has occurred because an int16 has wrapped around due to
> overflow. A fix would be to add a check to DefineRelation() (or even
> earlier) to reject CREATE TABLEs with more than "MaxHeapAttributeNumber"
> columns.

Good analysis. We can't check earlier than DefineRelation AFAICS,
because earlier stages don't know about inherited columns.

On reflection I suspect there are similar issues with SELECTs that have
more than 64K output columns. This probably has to be guarded against
in parser/analyze.c.

regards, tom lane


Neil Conway

Nov 15, 2004, 7:54:53 PM

On Sun, 2004-11-14 at 18:29 -0500, Tom Lane wrote:
> Good analysis. We can't check earlier than DefineRelation AFAICS,
> because earlier stages don't know about inherited columns.
>
> On reflection I suspect there are similar issues with SELECTs that have
> more than 64K output columns. This probably has to be guarded against
> in parser/analyze.c.

You're correct -- we also crash on extremely long SELECT statements.
Another variant of the problem would be a CREATE TABLE that inherits
from, say, 70 relations, each of which has 1,000 columns.

Attached is a patch. Not entirely sure that the checks I added are in
the right places, but at any rate this fixes the three identified
problems for me.

-Neil

too-many-columns-1.patch

Neil Conway

Nov 15, 2004, 7:58:07 PM

On Sun, 2004-11-14 at 11:24 +0000, Simon Riggs wrote:
> This seems too obvious a problem to have caused a bug

Well, I'd imagine that we've checked CREATE TABLE et al. with
somewhat-too-large values (like 2000 columns), which wouldn't be
sufficiently large to trigger the problem.

> presumably this has been there for a while?

Not sure.

> Does this mean that we do not have
> regression tests for each maximum setting ... i.e. are we missing a
> whole class of tests in the regression tests?

I'm always in favour of more regression tests -- patches are welcome :)

That said, there are some minor logistical problems with testing that a
70,000 column CREATE TABLE doesn't fail (it would be nice not to have to
include all that text in the regression tests themselves).

-Neil


Tom Lane

Nov 15, 2004, 8:53:53 PM

Neil Conway <ne...@samurai.com> writes:
> Attached is a patch. Not entirely sure that the checks I added are in
> the right places, but at any rate this fixes the three identified
> problems for me.

I think the SELECT limit should be MaxTupleAttributeNumber not
MaxHeapAttributeNumber. The point of the differential is to allow
you a bit of slop to do extra stuff (like sorting) when selecting
from a max-width table, but the proposed patch takes that away.
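
(For reference, the two limits as defined in src/include/access/htup.h:

    #define MaxTupleAttributeNumber 1664    /* 8 * 208 */
    #define MaxHeapAttributeNumber  1600    /* 8 * 200 */

so a SELECT target list gets 64 entries of slop beyond what a table can
actually store.)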

As for the placement issue, I'm OK with what you did in tablecmds.c,
but for the SELECT case I think that testing in transformTargetList
is probably not good. It definitely doesn't do to test at the top
of the routine, because we haven't expanded '*' yet. But testing at
the bottom probably won't do either since it doesn't cover later
addition of junk attributes --- in other words you could probably still
crash it by specifying >64K distinct ORDER BY values.

What I think needs to happen is to check p_next_resno at some point
after the complete tlist has been built. Since that's an int, we
don't need to worry too much about it overflowing, so one test at the
end should do (though I suppose if you're really paranoid you could
instead add a test everywhere it's used/incremented).

regards, tom lane


Tom Lane

Nov 15, 2004, 9:08:42 PM

Neil Conway <ne...@samurai.com> writes:
> On Sun, 2004-11-14 at 11:24 +0000, Simon Riggs wrote:
>> Does this mean that we do not have
>> regression tests for each maximum setting ... i.e. are we missing a
>> whole class of tests in the regression tests?

> That said, there are some minor logistical problems with testing that a
> 70,000 column CREATE TABLE doesn't fail (it would be nice not to have to
> include all that text in the regression tests themselves).

There are always limits; this just catches the next level up. Are we
going to try to test whether the behavior is appropriate when running
out of memory to store the tlist? (I think it's OK, but there's surely
no regression test for it.) Are we going to test whether the behavior
is appropriate when the list length overflows an int32? (I can pretty
much guarantee it isn't, but you'll need a 64-bit machine to reach this
level before you hit out-of-memory, and you'll need very large
quantities of patience as well as RAM.) A regression test for the
latter case is silly on its face, though if Moore's law can hold up for
another couple decades it might someday not be.

regards, tom lane

Neil Conway

Nov 15, 2004, 11:00:59 PM

On Mon, 2004-11-15 at 21:08 -0500, Tom Lane wrote:
> Are we going to try to test whether the behavior is appropriate when
> running out of memory to store the tlist?

We absolutely should: segfaulting on OOM is not acceptable behavior.
Testing that we recover safely when palloc() elogs (or _any_ routine
elogs) would be a good idea. I'd guess model checking would help here.

-Neil


Neil Conway

Nov 16, 2004, 12:25:37 AM

On Mon, 2004-11-15 at 20:53 -0500, Tom Lane wrote:
> I think the SELECT limit should be MaxTupleAttributeNumber not
> MaxHeapAttributeNumber.

Ah, true -- I forgot about the distinction...

> What I think needs to happen is to check p_next_resno at some point
> after the complete tlist has been built.

Attached is a revised patch -- I just did the check at the end of
transformStmt(), since otherwise we'll need to duplicate code in the
various places that resnos are used/incremented (set operation
statements, normal selects, updates, and so on). This is somewhat
fragile in that we usually assign p_next_resno to an AttrNumber and only
check for overflow at the end of the analysis phase, but it seems safe
for the moment...
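
The test itself amounts to something like this (paraphrasing the
attached patch -- the exact wording may differ):

    if (pstate->p_next_resno - 1 > MaxTupleAttributeNumber)
        ereport(ERROR,
                (errcode(ERRCODE_TOO_MANY_COLUMNS),
                 errmsg("target lists can have at most %d entries",
                        MaxTupleAttributeNumber)));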

BTW I figure this should be backpatched to REL7_4_STABLE. Barring any
objections I will do that (and apply to HEAD) this evening.

-Neil

too-many-columns-2.patch

Tom Lane

Nov 16, 2004, 3:07:43 PM

Neil Conway <ne...@samurai.com> writes:
> Attached is a revised patch -- I just did the check at the end of
> transformStmt(),

Looks OK offhand.

> BTW I figure this should be backpatched to REL7_4_STABLE. Barring any
> objections I will do that (and apply to HEAD) this evening.

No objection here.

regards, tom lane

