On Tue, 18 Aug 2015 15:46:01 -0700 (PDT), David Anderson wrote:
> I am using gunion to build some GIS data from modeling results.
> Data set 1 is a table with a process for each polygon for each time
> period.
> Data set 2 is the table with the geometry data.
> Here is the query
> Process is a lookup table.
>
> insert into table3 (timestep,process,shape)
> select a.timestep,p.process,gunion(g.shape)
> from data_set2 a, data_set2 g, simpplle.process p
> where cast(a.slink as integer)=g.simple_id and p.PROCESS like
> '%FIRE%' and p.ID=a.PROCESS_ID
> group by a.timestep,p.process
>
> The query works just fine. Data is generated like I expect. The
> issue is that for a data set of a hundred thousand polygons, 7
> processes and 50 timesteps, it takes a very long time to generate the
> data set.
>
Hi David,
your query doesn't seem to contain any obvious pitfall, and it leaves
practically no margin for further optimization.
ST_Union is intrinsically a very costly (and slow) operation, and
"a hundred thousand polygons" is a rather demanding dataset: all
things considered, a long execution time is to be expected.
> I should mention that I am using the geopackage geometry format.
> Is there a way to speed things up? Can the spatial index be used?
> Should the data be put into native Spatialite format for the gunion
> operation?
>
a. the Spatial Index can't help at all in a problem like this: there
is no spatial selection to be performed, and the Spatial Index is
simply a spatial filter.
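just to clarify, the Spatial Index only pays off when a query actually
performs a spatial selection, e.g. a bounding-box filter like the
following sketch (table/geometry names follow your own schema; the
BuildMbr() coordinates are purely illustrative placeholders):

-- typical SpatiaLite idiom for exploiting the Spatial Index:
-- restrict the scan to the rows whose MBR intersects a search frame
SELECT g.simple_id, g.shape
FROM data_set2 AS g
WHERE g.ROWID IN (
    SELECT ROWID
    FROM SpatialIndex
    WHERE f_table_name = 'data_set2'
      AND search_frame = BuildMbr(0, 0, 100, 100));

your query contains no spatial predicate of this kind, so the index
never comes into play.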
b. using the native binary format will possibly help, but only in
a very marginal way.
you could anyway attempt an alternative approach by splitting
your problem into two distinct steps: this will surely help you
to better understand what really happens behind the scenes,
and may possibly introduce some performance benefit
(SQLite's own query optimizer sometimes makes odd choices).
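if you want to check which plan the optimizer actually picked, you can
prefix your query as-is with SQLite's standard EXPLAIN QUERY PLAN
(shown here on your original one-step query):

EXPLAIN QUERY PLAN
SELECT a.timestep, p.process, GUnion(g.shape)
FROM data_set2 AS a, data_set2 AS g, simpplle.process AS p
WHERE CastToInteger(a.slink) = g.simple_id
  AND p.process LIKE '%FIRE%'
  AND p.id = a.process_id
GROUP BY a.timestep, p.process;

comparing the plan of the one-step version against the two-step
version below will show you whether the optimizer made a poor choice.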
step #1
--------------
CREATE TEMP TABLE aux_table AS
SELECT a.timestep AS timestep, p.process AS process,
ST_Collect(g.shape) AS shape
FROM data_set2 AS a, data_set2 AS g, simpplle.process AS p
WHERE CastToInteger(a.slink) = g.simple_id
AND p.process LIKE '%FIRE%' AND
p.id = a.process_id
GROUP BY a.timestep, p.process;
this first step simply creates a temporary (intermediate)
table; ST_Collect() is a reasonably fast operation (the polygons
are not really dissolved: they are just aggregated into a
MultiPolygon collection), so this first step can be expected to
complete in a reasonably short time.
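here is a tiny self-contained sketch of the difference between the two
operations, using WKT literals (it assumes a SpatiaLite-enabled
connection):

-- ST_Collect simply wraps the inputs into a MultiPolygon:
-- both squares survive unchanged inside the collection
SELECT GeometryType(ST_Collect(
    ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    ST_GeomFromText('POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))')));

-- ST_Union actually dissolves the shared boundary:
-- the two adjacent squares collapse into a single Polygon
SELECT GeometryType(ST_Union(
    ST_GeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))'),
    ST_GeomFromText('POLYGON((1 0, 2 0, 2 1, 1 1, 1 0))')));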
step #2
-------------
INSERT INTO table3 (timestep, process, shape)
SELECT timestep, process, ST_UnaryUnion(shape)
FROM aux_table;
this final step actually performs the hard work by
dissolving all the polygons.
the time required by this second query is obviously
"non-compressible", and it will give you a direct measure of
the intrinsic computational weight of your problem.
bye Sandro