I suppose you are not using the R*Tree Spatial Index
in your query ;-)
Please, read the documentation:
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=misc-docs
you'll surely find many examples about this topic
bye Sandro
INSERT INTO inter( id, cover, geometry )
SELECT
grid.ID as id,
cover_utm.COV as cover,
ST_Multi( ST_Intersection(grid.geometry,cover_utm.geometry) )
FROM grid, cover_utm
WHERE st_intersects(grid.geometry,cover_utm.geometry)
this is an absolute RED ALARM; a well designed Spatial SQL
query very difficultly requires more than few minutes
to complete.
an execution time of *TWO* days surely is a strong signal
of some very badly written and poorly optimized SQL query.
and the reason for this is absolutely clear: you don't
use at all any Spatial Index.
> I read in the cookbook that this is a useful tool if I want
> to look into precise parts of my db
>
and this one exactly is your goal: you obviously don't
want to actually evaluate ST_Intersects for every possible
candidate pair, because this will be intolerably slow.
so you have to intelligently use the Spatial Index in
order to actually evaluate ST_Intersects only for the
(few) pairs whose MBRs actually overlaps.
and you can do such a thing inserting an appropriate
sub-query based on the Spatial Index.
more or less, something like the following SQL sample:
SELECT ln.line_id AS line_id, pg.polyg_id AS polyg_id
FROM lines AS ln, polygons AS pg
WHERE ST_Intersects(ln.geometry, pg.geometry)
AND ln.ROWID IN
(
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'lines'
AND search_frame = pg.geometry
)
in my own test DB:
- "lines" contains about one hundredth thousand features
- "polygons" contains about ten thousand features
- the above query completed in about 2 minutes
- I've then computed a raw estimation; the same query
(omitting to define the Spatial Index subquery) will
probably take about 40 hours to complete
PLEASE, CAREFULLY READ THE DOCUMENTATION ABOUT SPATIAL
INDEX USAGE IN SPATIALITE
it's absolutely a different thing than in PostGis ;-)
bye Sandro
INSERT INTO inter_iran
(
inter_id,
grid_id,
celletype,
cover_type,
geometry
)
SELECT
NULL,
grid_iran.grid_id,
grid_iran.grid_type,
cover_iran.cover_type,
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran
WHERE
st_intersects(grid_iran.geometry,cover_iran.geometry)
AND
GeometryType
(
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
) = "MULTIPOLYGON"
AND
grid_iran.rowid IN
(
SELECT pkid
FROM idx_grid_iran_geometry
WHERE
pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
)
)
;
> it seems that my R*tree spatial index has not been set correctly
>
AFAIK, a severe bug related with crazy R*Trees was signaled on
the latest version of Mac Os X 64-bit if sqlite and/or spatialite
are built using a too much advanced code optimization.
This seems to be related to specifc issues of the Mac own
C/C++ compiler.
I'm not aware of any other possible issue affecting R*Trees,
except the one affecting tables without a Primary Key after
a VACUUM operation: but this point is fully covered on the
documentation.
> 1) Do I have to run the query << SELECT CreateSpatialIndex('grid_iran',
> 'geometry'); >> before I populate my table, after, or it has no importance?
>
it's absolutely not relevant: before or after is absolutely the same
> 2) How do I know the index has been created correctly?
>
e.g. using CheckSpatialIndex()
> 3) If I have a table that already contains data, is it possible to rebuilt
> a broken spatial index or do I have to create a new table?
>
yes, using RecoverSpatialIndex()
Please, carefully read this document:
http://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf
bye Sandro
your query:
==========
INSERT INTO inter_iran (inter_id, grid_id, celletype, cover_type,
geometry)
SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry)
AND GeometryType (
st_multi( st_intersection(grid_iran.geometry,cover_iran.geometry))
) = "MULTIPOLYGON"
AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));
the art of debugging:
=================
a) never attempt to test a complex problem
b) break the original problem in many simpler steps
step #1 (the core select):
-----------------------------------
SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,
grid_iran.geometry,cover_iran.geometry
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1
AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));
q1-1: does it runs correctly ?
q1-2: does it runs fastly enough ?
- if not, identify the cause and then apply some correction
(retest again: goto step #1)
- if yes, goto step #2
step #2 (the full select):
-----------------------------------
SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1
AND GeometryType (
st_multi( st_intersection(grid_iran.geometry,cover_iran.geometry))
) = "MULTIPOLYGON"
AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));
q2-1: does it runs correctly ?
q2-2: does it runs fastly enough ?
- if not, identify the cause and then apply some correction
(retest again: goto step #2)
- if yes, goto step #3
PLEASE NOTE: any issue on step#1 and/or step#2
is probably related to malformed SQL, bad optimization,
wrong usage of R*Trees ... and so on.
testing by discrete steps (from simplest to much complex)
will help you a lot in identifying where the problem(s) is(are).
step #3 (the full complex query):
----------------------------------------------
INSERT INTO inter_iran (inter_id, grid_id, celletype, cover_type,
geometry)
SELECT ....
q3: do you notice some unexpected slowness ?
well, now you can surely exclude that this is caused by
any flaw into the underlying SELECT
you can now focus your attention on optimizing the
output performance of SQLite setting the corresponding
PRAGMAs as appropriate.
bye Sandro
SELECT grid_iran.grid_id, cover_iran.cover_id
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1
AND cover_iran.rowid IN
(
SELECT pkid FROM idx_cover_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(grid_iran.geometry),
MBRminY(grid_iran.geometry),
MBRmaxX(grid_iran.geometry),
MBRmaxY(grid_iran.geometry)
))
SELECT grid_iran.grid_id, cover_iran.cover_id
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1
AND cover_iran.rowid IN
(
SELECT pkid FROM idx_cover_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(grid_iran.geometry),
MBRminY(grid_iran.geometry),
MBRmaxX(grid_iran.geometry),
MBRmaxY(grid_iran.geometry)
))
no, here you can get a most recent version:
http://www.gaia-gis.it/gaia-sins/
> If not, can it pose a problem using R*tree?
>
I strongly doubt; the R*Tree implementation is
basically always the same
> I measured the time needed to fetch a certain number of rows using
> the "LIMIT" condition.
> I noticed that the delay varies when I switch the table in which
> R*tree is used:
>
when you "spatially join" two different tables using the one
or the other R*Tree isn't at all the same, and subtantial
different timings may derive depending on the one actually used.
always remember: the R*Tree is a "spatial filter" used to
reduce complexity (i.e. allowing to perform as few comparisons
as strictly required).
> As you can see, query 2 is slower than the first one, but query 1,
> after a quite linear progression, suddenly rises. I identified the
> limit being between the 47,661st and 47,662nd rows.
>
I strong suspect that at this point some "exceptional"
geometry is encounterd: may be, a really huge one having
much more points than any other one. or one presenting
a very big spatial extension; or a malformed one.
but you can easily check this on your DB using the
BlobExplore tool [geometry explore].
> I have imagined that the rupture causing trouble in query1 could
> be caused by the fact that when we arrive at the end of a row in
> the grid and we need to come back to the extreme left of the
> "shapefile", it implies that the R*tree has to move to another
> sector, and this could cost time.
>
yes, for sure it cost some time: may be few microseconds ? :-)
seriously speacking: how big is your DB-file ?
if its size widely exceeds the available RAM size, you can
easily experience poor performance, due to I/O stalls.
another important point: are you really sure that all your
geometries are full valid ones ?
you can easily check this using ST_IsValid(); sometimes
few broken or malformed "crazy" polygons may cause very
serious problems.
bye Sandro
Thanks again for your very useful advices.
I checked the geometry of my "cover_iran" layer, and this is the one
which causes trouble.
I ran my query ignoring the malformed polygons and seemed to do the
job correctly. I therefore assume that this is the main source of my
problems. It seems there are around 12.000 polygons with geometry
problems.
I extracted one of them to have a better idea of what's the matter,
and this is what I discovered:
http://s3.noelshack.com/uploads/images/18400038942087_spatialite_guierror.jpg
http://s3.noelshack.com/uploads/images/4991973722703_deformed_polygon.jpg
The sanitizegeometry() function appears to be of little help, I didn't
managed to solve the problem this way.
I read on the internet that it is possible to repair a broken geometry
by exporting the table as a shapefile, loading it into QGIS then
GRASS, rebuilding the topology, then re-load the resulting shapefile
in spatialite-gui. It carried out this manipulation, without much
result, my problem still being there.
In another thread you solved a problem by cutting the countries into
smaller pieces in order to simplify their geometry. According to the
QGIS view, I suspect it would be one way of solving my problem, at
least for these 12.000 polygons. The problem is I don't know exactly
how I could do it. Do you think I would be wise to cut these malformed
polygons into a whole lot of smaller ones? How could I do it without
having to cope with too many too small polygons?
Thanks again for your precious help
Damien