I have an interesting error that's coming up on an Ubuntu 64-bit system,
running PostgreSQL 8.4.2, and PostGIS 1.5.1 (packaged by me) as well as 1.4.0
(from Ubuntu repos). The query itself is somewhat complex (I'm sure it could
be optimized better - but it's been working well for quite some time). I've
attached two files - one is a dumbed-down version of the query I am running
(crashing.sql). After discussing this with some folks in the PostgreSQL IRC
channel, it seems that st_union(), or something else PostGIS-related, is
causing the error.
Running this query on various data will produce one of two results. One is
the error mentioned in the subject (ERROR: array size exceeds the maximum
allowed (134217727)). I can find very little information on this error. The
other outcome is that it often causes the PostgreSQL backend to segfault (see
gdboutput.txt).
I can attest that this query works fine on many installations of PostgreSQL
8.2+ and PostGIS 1.3+. The main difference I can see in the case where I'm
getting errors and segfaults is that I'm running on Ubuntu, whereas all of my
successful cases are on Fedora (7 through 12).
If I replace the st_union() function in the query with accum(), it still
produces the error/segfaults.
Also, strangely, if I remove one of the where conditions near the end of the
query (e.g., " AND (cl.depth = 0)"), it suddenly works fine without any issues
(except that I actually need that condition).
Does this sort of problem look familiar to anyone? I can try to put together
an example if that would be of any use, but since the same data/query works on
other Fedora systems that I am running, I'm not sure how easy it is to
replicate.
Regards,
Mike
It might be of interest to point out that substituting st_union() with
st_memunion() seems to have worked around this. I'm curious though, because
there is not a great deal of data being processed, and I am running this on a
fairly sturdy system that has more capacity than some of the Fedora systems
where the query works fine.
Mike
st_union builds a big array with all the geometries in it..
you were hitting a limit of the array type.
st_memunion should behave better.
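In other words, the workaround is just a swap of the aggregate; roughly
something like this (table and column names are only placeholders):

-- st_union accumulates all the input geometries into one array and unions
-- them at the end; st_memunion unions them incrementally instead
SELECT some_id,
       st_memunion(geom) AS ugeom   -- instead of st_union(geom)
FROM some_table
GROUP BY some_id;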
--strk;
() Free GIS & Flash consultant/developer
/\ http://strk.keybit.net/services.html
P.
Mike,
The issue from before was that array aggregate functions did not handle
NULLs correctly. 64-bit systems were more likely to segfault or give strange
errors in this case.
To rule out that we still have some of these issues in the code base, can
you add a
geom IS NOT NULL
condition to your WHERE filter? If that works, then the NULL issue is
probably still lurking somewhere.
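Roughly like this, with placeholder table and column names since we don't have
your full query in front of us:

SELECT some_id,
       st_union(geom) AS ugeom
FROM some_table
WHERE geom IS NOT NULL      -- extra guard to rule out NULL geometries
GROUP BY some_id;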
Leo and Regina
http://www.postgis.us
-----Original Message-----
From: postgis-us...@postgis.refractions.net
[mailto:postgis-us...@postgis.refractions.net] On Behalf Of Paul
Ramsey
Sent: Saturday, March 20, 2010 10:56 AM
To: PostGIS Users Discussion; mgl...@alumni.uwaterloo.ca
Subject: Re: [postgis-users] ERROR: array size exceeds the maximum
allowed(134217727)
Oops, sorry for the confusion. Not-so-intuitive name :P
--
Regards,
Chris Hermansen · mailto:chris.h...@timberline.ca
tel+1.604.714.2878 · fax+1.604.733.0631 · mob+1.778.840.4625
Timberline Natural Resource Group · http://www.timberline.ca
401 · 958 West 8th Avenue · Vancouver BC · Canada · V5Z 1E5
Thanks for the suggestion. I tried adding the clause 'not geom is null' to
the where statement in each of the two sub-queries where the st_union(geom)
functions are used, but it still segfaults.
I also tried this on a fresh database with very little data, and it doesn't
seem to cause problems. But I have two databases with live data where I can
cause this. I have been able to pare one of these down to remove personal
information and reduce unnecessary data, while still generating the crash with
that query. Would someone be interested in a dump of this db? Of course,
that someone would ideally be able to test this on a 64-bit (K)ubuntu system,
in the hopes that the problem can be replicated.
Mike
Sorry for not catching this one sooner (I'm subscribed to the digest)...but
thanks for pointing this out. The system is completely up to date. I could
reboot into an older kernel to see if that solves the problem. I'll report
results a bit later today.
Mike
------------------------------
Message: 7
Date: Sat, 20 Mar 2010 10:03:13 -0700
From: Chris Hermansen <chris.h...@timberline.ca>
Subject: Re: [postgis-users] ERROR: array size exceeds the maximum
allowed (134217727)
To: PostGIS Users Discussion <postgi...@postgis.refractions.net>
Very recently there was a kernel patch in Ubuntu regarding - if memory
serves - memory copying. How up to date is the system in question?
I have tried booting into the following kernels, all exhibiting the same
issue:
2.6.31-20-generic
2.6.31-19-generic
2.6.31-14-generic
I believe 2.6.31-14-generic is the original kernel installed with this version
of Ubuntu, so unless the kernel change you're thinking of occurred prior to
the 9.10 Karmic release, it's not the culprit here.
Mike
Too bad, it would have been an easy solution!
--
Regards,
Chris Hermansen · mailto:chris.h...@timberline.ca
Our apologies; we didn't notice this was in a subquery and that you have a
LIMIT statement in your query. So we presume that, regardless of your WHERE
clause, only 26 records are being selected.
So Paul could be right that you do have data that is hitting some compiled-in
or variable limit.
Does running the subquery alone work, or did you not try because it takes a
long time?
Thanks,
The query (when it works) is virtually instantaneous when I use st_memunion()
(or when I took out part of the where clause - although now I can't even get
that to work, so maybe it was just lucky). Each of the subqueries also works
fine without any obvious problem.
To me, it seems to be a combination of things that somehow add up to this
limitation...I don't see what settings could affect the array size limit (in
postgresql.conf). Can anyone point to a config option that might make a
difference, or maybe point to another place with settings I can tinker with?
Mike
Well, if you can reduce it to a small db dump and query that exercises
it, I can install a 64-bit Ubuntu VM and see if it happens here too.
P
This could very well be a bug in PostgreSQL 8.4.
Have you tried this on other PostgreSQL 8.4 installs on your Fedora systems?
You mentioned trying on older installs.
Your problem sounds exactly like this guy's, and he wasn't using PostGIS at
all but array_agg (which is the logic that PostGIS borrowed to implement
the faster collecting ST_Union behavior):
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01171.php
If you haven't already, it's probably good to check the plan difference between
the two queries (one with the additional WHERE and one without). It could be
doing something nuts like running the sub-select for each record you have in
the outer query only when your extra WHERE condition is added, and doing more
or less the same thing when you take it out.
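For example, plain EXPLAIN (without ANALYZE, so the statement isn't actually
executed) on both variants; the query below is only a placeholder sketch, not
your real one:

-- plan for the variant with the extra condition
EXPLAIN
SELECT cl.chat_id, st_union(s.geom) AS ugeom
FROM some_selection_table AS s
INNER JOIN some_link_table AS cl ON s.id = cl.selection_id
WHERE cl.depth = 0          -- the condition whose removal makes it work
GROUP BY cl.chat_id;

-- then run the same EXPLAIN with "cl.depth = 0" removed and compare the plans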
We've had some odd behavior in 8.4 with the planner doing strange things
that we haven't been able to pin down and had to implement workarounds for,
as we described in this article. It's probably unrelated, but just a thought
to throw out:
http://www.postgresonline.com/journal/index.php?/archives/149-Forcing-the-planners-hand-with-set-enable_seqscan-off-WTF.html
Leo and Regina,
SELECT csls.chat_id, csls.ugeom
FROM (
    SELECT '[stuff]' AS selection, st_union(geom) AS ugeom, csl1.chat_id
    FROM testdb.user_selections AS us1
    INNER JOIN testdb.chat_selection_links AS csl1 ON us1.id = csl1.selection_id
    WHERE (not us1.user_drawing and not csl1.deleted)
    GROUP BY chat_id
) AS csls
LEFT JOIN (
    SELECT '[stuff]' AS drawing, st_union(geom) AS ugeom, csl2.chat_id
    FROM testdb.user_selections AS us2
    INNER JOIN testdb.chat_selection_links AS csl2 ON us2.id = csl2.selection_id
    WHERE (us2.user_drawing and us2.deleted and not csl2.deleted)
    GROUP BY chat_id
) AS csld ON csls.chat_id = csld.chat_id;
That query there does not crash. I'll try to follow up on your suggestions in
the prior email.
Mike
In response to your earlier message, I did find that report of the identical
error message. However, as I'm sure you noticed, there is no apparent
investigation that followed after it. That was why I went to the PostgreSQL
IRC channel first...they walked me through generating a core dump, which is
what then pointed to it being a problem related to PostGIS. So while it's
certainly the same error being reported, the source of the problem in this
case appears to be PostGIS-specific - likely due to the logic that you note
was borrowed from array_agg, which caused the problem in the previous report.
As for looking at the different query plans for working/non-working versions of
the statement with different where conditions, I've attached several results in
text files. Two examples that worked, one that segfaults, and one that
produces the 'array size exceeds...' error message. All of these are the same
query, just with variations in the where condition. The query plan is a bit
much for me to interpret though - what does this look like to the experts?
Mike
crashdb=# \i ./crashing.sql
psql:./crashing.sql:36: ERROR: array size exceeds the maximum allowed
(134217727)
crashdb=#
P.
Thanks for looking into it - at least I know it's not just me (for the array
size limit at least).
What does a cleanly trapped error like this suggest to you? If you try
different variations on the where condition at the end of the statement, are
you able to get a segfault?
Mike
On Sun, Mar 21, 2010 at 12:58 PM, Mike Leahy wrote:
I've attached the smallest version of this query I can make before the error
goes away. It seems to be very particular, but at least I have been able to
take out some of the subqueries. Does this still produce the error in your
environments as well?
Mike
At any rate, one striking difference between the ones that don't work and
the ones that do is that the ones that work never employ a nested loop scan.
Both the segfault one and the exceeds-max-allowed one employ nested loops. So
maybe it's the interaction between nested loops and hash agg causing the
problem.
Could you two try disabling nested loop scans and run the crashing ones? Also
verify that the plan has indeed changed to not use nested loops.
set enable_nestloop = off;
set enable_hashjoin = on;
run query
--- If you can't disable nested loops, try disabling the hash join to see if
you can make the working queries crash
set enable_hashjoin = off;
set enable_nestloop = on;
run query
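To verify the plan actually changed without executing the crashing statement,
plain EXPLAIN (no ANALYZE) should do; for instance, using the first subquery
from your earlier mail as an example:

SET enable_nestloop = off;
SET enable_hashjoin = on;
-- plain EXPLAIN prints the plan without running the (crashing) statement;
-- substitute the actual crashing query here and look for "Nested Loop" nodes
EXPLAIN
SELECT st_union(geom) AS ugeom, csl1.chat_id
FROM testdb.user_selections AS us1
INNER JOIN testdb.chat_selection_links AS csl1 ON us1.id = csl1.selection_id
WHERE (not us1.user_drawing and not csl1.deleted)
GROUP BY chat_id;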
Ok,
Mike
Yes, I can confirm that the non-working queries work when enable_nestloop is
off, and enable_hashjoin is on.
The opposite is not the case - setting enable_nestloop on and enable_hashjoin
off does not make the 'working' variations of my query crash.
Nice catch - is there anything further I can do from my end to troubleshoot
this?
Mike
P.
On Sat, Mar 20, 2010 at 9:17 AM, Mike Leahy <mgl...@alumni.uwaterloo.ca> wrote:
> Running this query on various data will produce one of two results. One is
> the error mentioned in the subject (ERROR: array size exceeds the maximum
> allowed (134217727)). I can find very little information on this error. The
> other outcome is that it often causes the PostgreSQL backend to segfault (see
> gdboutput.txt).
gdboutput.txt:
Program terminated with signal 11, Segmentation fault.
#0  0x00007fa4be23615b in pfree () from /usr/lib/postgresql/8.4/bin/postgres
#1  0x00007fa4be18091b in makeMdArrayResult () from /usr/lib/postgresql/8.4/bin/postgres
#2  0x00007fa4b7f038bc in pgis_accum_finalfn () from /usr/lib/postgresql/8.4/lib/postgis-1.5.so
#3  0x00007fa4b7f039ee in pgis_geometry_union_finalfn () from /usr/lib/postgresql/8.4/lib/postgis-1.5.so
Any chance you can generate one of these seg faults from a build with
symbols and with assertions enabled? It might catch the problem
earlier and provide more info.
IIRC there were some memory management changes which required changes
to array_agg() and which had some risk of causing problems for other
sites with similar coding. Is it possible you're missing some of these
changes? I'm having trouble tracking them all down but at least
there's these:
commit 3d332de2eab8a01c0ef3f58ea69de2010fe8a1e1
Author: Tom Lane <t...@sss.pgh.pa.us>
Date: Thu Jul 23 20:45:27 2009 +0000
In a non-hashed Agg node, reset the "aggcontext" at group boundaries, instead
of individually pfree'ing pass-by-reference transition values. This should
be at least as fast as the prior coding, and it has the major advantage of
clearing out any working data an aggregate function may have stored in or
underneath the aggcontext. This avoids memory leakage when an aggregate
such as array_agg() is used in GROUP BY mode. Per report from Chris Spotts.
Back-patch to 8.4. In principle the problem could arise in prior versions,
but since they didn't have array_agg the issue seems not critical.
commit 7f83b61cc26eeac0c5a09add49f6cf899f87fc0b
Author: Tom Lane <t...@sss.pgh.pa.us>
Date: Sat Jun 20 18:45:28 2009 +0000
Fix things so that array_agg_finalfn does not modify or free its input
ArrayBuildState, per trouble report from Merlin Moncure. By adopting
this fix, we are essentially deciding that aggregate final-functions
should not modify their inputs ever. Adjust documentation and comments
to match that conclusion.
--
greg