Intersecting two spatial layers

680 views
Skip to first unread message

Damien

unread,
Dec 7, 2011, 1:19:47 PM12/7/11
to spatiali...@googlegroups.com
Hello everybody,

I am currently trying to intersect two layers of polygons, in order to cut the former with the polygons of the latter.
The fact is that my layers are quite big: "grid" has 66 333 rows, and the second, "cover_utm" has 616 059 rows.

I created a spatialite database containing these two layers and created a table "inter" that should receive the result of the intersection.
The fact is that I launched the process more than 9 hours earlier, and it's not finished yet.

As I use Spatialite-gui to do this, I can see the value of the "FullscanStep" indicator that has climbed up to 1 billion. How far does it have to grow? Have I missed as step?

Thanks for your help, I very new to spatialite and it's not so easy to begin :-s

a.fu...@lqt.it

unread,
Dec 7, 2011, 1:30:27 PM12/7/11
to spatiali...@googlegroups.com
Hi Damien,

I suppose you are not using the R*Tree Spatial Index
in your query ;-)

Please, read the documentation:
https://www.gaia-gis.it/fossil/libspatialite/wiki?name=misc-docs

you'll surely find many examples about this topic

bye Sandro

Damien

unread,
Dec 9, 2011, 11:33:41 AM12/9/11
to spatiali...@googlegroups.com
Thanks for your answer,

I have looked into the spatialite cookbook, but I didn't find a step-by-step explanation on what happens when I intersect two layers.

About the R*Tree spatial index, I read in the cookbook that this is a useful tool if I want to look into precise parts of my my db, but if I have to to things to the entire database, it is said that this index will increase the necessary time.
Is it true?

To help you, here is the query I launched two days ago:

INSERT INTO inter( id, cover, geometry )
        SELECT
           grid.ID as id,
           cover_utm.COV as cover,
           ST_Multi( ST_Intersection(grid.geometry,cover_utm.geometry) )
        FROM grid, cover_utm
        WHERE st_intersects(grid.geometry,cover_utm.geometry)

 

a.fu...@lqt.it

unread,
Dec 9, 2011, 3:17:36 PM12/9/11
to spatiali...@googlegroups.com
> ... the query I launched two days ago ...
>

this is an absolute RED ALARM; a well designed Spatial SQL
query very difficultly requires more than few minutes
to complete.
an execution time of *TWO* days surely is a strong signal
of some very badly written and poorly optimized SQL query.
and the reason for this is absolutely clear: you don't
use at all any Spatial Index.

> I read in the cookbook that this is a useful tool if I want

> to look into precise parts of my db
>

and this one exactly is your goal: you obviously don't
want to actually evaluate ST_Intersects for every possible
candidate pair, because this will be intolerably slow.

so you have to intelligently use the Spatial Index in
order to actually evaluate ST_Intersects only for the
(few) pairs whose MBRs actually overlaps.
and you can do such a thing inserting an appropriate
sub-query based on the Spatial Index.

more or less, something like the following SQL sample:

SELECT ln.line_id AS line_id, pg.polyg_id AS polyg_id
FROM lines AS ln, polygons AS pg
WHERE ST_Intersects(ln.geometry, pg.geometry)
AND ln.ROWID IN
(
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'lines'
AND search_frame = pg.geometry
)

in my own test DB:
- "lines" contains about one hundredth thousand features
- "polygons" contains about ten thousand features
- the above query completed in about 2 minutes
- I've then computed a raw estimation; the same query
(omitting to define the Spatial Index subquery) will
probably take about 40 hours to complete

PLEASE, CAREFULLY READ THE DOCUMENTATION ABOUT SPATIAL
INDEX USAGE IN SPATIALITE
it's absolutely a different thing than in PostGis ;-)

bye Sandro

Damien

unread,
Dec 14, 2011, 5:07:23 AM12/14/11
to spatiali...@googlegroups.com
Hello again!

I followed your advice and adopted a step-by-step approach to my problem, beginning with a methodical reading of the cookbook. I launched my final query and it's still running after 20 minutes, then I think I screwed something :D

Could you please have a eye on my query and tell me where the problem is ?
You can find here a screenshot of my spatialite gui so you have the details of my tables' structure.

And here is the intersect query:
INSERT INTO inter_iran
    (
    inter_id,
    grid_id,
    celletype,
    cover_type,
    geometry
    )
SELECT
    NULL,
    grid_iran.grid_id,
    grid_iran.grid_type,
    cover_iran.cover_type,
    st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran
WHERE
    st_intersects(grid_iran.geometry,cover_iran.geometry)
AND
    GeometryType
        (
            st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
        ) = "MULTIPOLYGON"
AND
    grid_iran.rowid IN
        (
        SELECT pkid
        FROM idx_grid_iran_geometry
        WHERE
            pkid MATCH RtreeIntersects
                (
                MBRminX(cover_iran.geometry),
                MBRminY(cover_iran.geometry),
                MBRmaxX(cover_iran.geometry),
                MBRmaxY(cover_iran.geometry)
                )
        )
;

I precise that this time I have carefully created indexes on the primary keys of each table and also spatial indexes (I hope so).
Am I on the right way?  :D


Thanks for your help!

Damien

unread,
Dec 14, 2011, 6:04:41 AM12/14/11
to spatiali...@googlegroups.com
After some investigations, it seems that my R*tree spatial index has not been set correctly. I therefore have a few questions regarding how to correctly enable this index.

1) Do I have to run the query << SELECT CreateSpatialIndex('grid_iran', 'geometry'); >>  before I populate my table, after, or it has no importance?

2) How do I know the index has been created correctly?

3) If I have a table that already contains data, is it possible to rebuilt a broken spatial index or do I have to create a new table?

Thanks again :)

Damien

unread,
Dec 14, 2011, 12:48:23 PM12/14/11
to spatiali...@googlegroups.com
In order to add some information...

I just tried to modify my way of using R*tree with the query in the screenshot, but interestingly it seems to make spatialite-gui crash every time I do it  O_o

http://s3.noelshack.com/uploads/images/14938762064391_spaitalite_bug.jpg

a.furieri

unread,
Dec 16, 2011, 4:25:18 AM12/16/11
to SpatiaLite Users
Hi Damien,

> it seems that my R*tree spatial index has not been set correctly
>

AFAIK, a severe bug related with crazy R*Trees was signaled on
the latest version of Mac Os X 64-bit if sqlite and/or spatialite
are built using a too much advanced code optimization.
This seems to be related to specifc issues of the Mac own
C/C++ compiler.

I'm not aware of any other possible issue affecting R*Trees,
except the one affecting tables without a Primary Key after
a VACUUM operation: but this point is fully covered on the
documentation.


> 1) Do I have to run the query << SELECT CreateSpatialIndex('grid_iran',
> 'geometry'); >>  before I populate my table, after, or it has no importance?
>

it's absolutely not relevant: before or after is absolutely the same


> 2) How do I know the index has been created correctly?
>

e.g. using CheckSpatialIndex()


> 3) If I have a table that already contains data, is it possible to rebuilt
> a broken spatial index or do I have to create a new table?
>

yes, using RecoverSpatialIndex()

Please, carefully read this document:
http://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf

bye Sandro

a.furieri

unread,
Dec 16, 2011, 4:59:16 AM12/16/11
to SpatiaLite Users
Hi Damien,

your query:
==========
INSERT INTO inter_iran  (inter_id, grid_id, celletype, cover_type,
geometry)


SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry)
AND GeometryType (
st_multi( st_intersection(grid_iran.geometry,cover_iran.geometry))
) = "MULTIPOLYGON"
AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));


the art of debugging:
=================
a) never attempt to test a complex problem
b) break the original problem in many simpler steps

step #1 (the core select):
-----------------------------------


SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,

grid_iran.geometry,cover_iran.geometry
FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1


AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));

q1-1: does it runs correctly ?
q1-2: does it runs fastly enough ?
- if not, identify the cause and then apply some correction
(retest again: goto step #1)
- if yes, goto step #2


step #2 (the full select):
-----------------------------------


SELECT NULL, grid_iran.grid_id, grid_iran.grid_type,
cover_iran.cover_type,
st_multi( st_intersection( grid_iran.geometry,cover_iran.geometry))
FROM grid_iran,cover_iran

WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1
AND GeometryType (
st_multi( st_intersection(grid_iran.geometry,cover_iran.geometry))


) = "MULTIPOLYGON"
AND grid_iran.rowid IN
(
SELECT pkid FROM idx_grid_iran_geometry
WHERE pkid MATCH RtreeIntersects
(
MBRminX(cover_iran.geometry),
MBRminY(cover_iran.geometry),
MBRmaxX(cover_iran.geometry),
MBRmaxY(cover_iran.geometry)
));

q2-1: does it runs correctly ?
q2-2: does it runs fastly enough ?
- if not, identify the cause and then apply some correction
(retest again: goto step #2)
- if yes, goto step #3

PLEASE NOTE: any issue on step#1 and/or step#2
is probably related to malformed SQL, bad optimization,
wrong usage of R*Trees ... and so on.
testing by discrete steps (from simplest to much complex)
will help you a lot in identifying where the problem(s) is(are).


step #3 (the full complex query):
----------------------------------------------
INSERT INTO inter_iran (inter_id, grid_id, celletype, cover_type,
geometry)
SELECT ....

q3: do you notice some unexpected slowness ?
well, now you can surely exclude that this is caused by
any flaw into the underlying SELECT
you can now focus your attention on optimizing the
output performance of SQLite setting the corresponding
PRAGMAs as appropriate.

bye Sandro

Damien

unread,
Dec 17, 2011, 6:07:17 AM12/17/11
to spatiali...@googlegroups.com
Hi Sandro,

First of all, thank you for your precious help!

I have two probably unrelated problems.

The first deals with the CheckSpatialIndex() and RecoverSpatialIndex() functions. I tried them, but spatialite-gui tells me that he doesn't know them. I checked the "help" tool which displays the list of all functions (I presume), and they're not listed there, despite it has Create and Disable spatial index functions.
My version of spatialite-gui is v.1.4.0, is it the newest? If not, can it pose a problem using R*tree?

My second problem deals with the query I submitted to you. I made some tests on the first step with this query:
[QUERY 1]
SELECT grid_iran.grid_id, cover_iran.cover_id

FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1

  AND cover_iran.rowid IN
  (
    SELECT pkid FROM idx_cover_iran_geometry

    WHERE pkid MATCH RtreeIntersects
    (
      MBRminX(grid_iran.geometry),
      MBRminY(grid_iran.geometry),
      MBRmaxX(grid_iran.geometry),
      MBRmaxY(grid_iran.geometry)
   ))
 
I measured the time needed to fetch a certain number of rows using the "LIMIT" condition.
I noticed that the delay varies when I switch the table in which R*tree is used:
[QUERY 2]
SELECT grid_iran.grid_id, cover_iran.cover_id

FROM grid_iran,cover_iran
WHERE st_intersects(grid_iran.geometry,cover_iran.geometry) = 1

  AND cover_iran.rowid IN
  (
    SELECT pkid FROM idx_cover_iran_geometry

    WHERE pkid MATCH RtreeIntersects
    (
      MBRminX(grid_iran.geometry),
      MBRminY(grid_iran.geometry),
      MBRmaxX(grid_iran.geometry),
      MBRmaxY(grid_iran.geometry)
   ))
  
I ran both queries and plotted the time needed:

http://s3.noelshack.com/uploads/images/1121287410045_1.jpg

As you can see, query 2 is slower than the first one, but query 1, after a quite linear progression, suddenly rises. I identified the limit being between the 47,661st and 47,662nd rows.

http://s3.noelshack.com/uploads/images/10177680390174_2.jpg

When I run query1, spatialite take each cover_iran.cover_id and see which grid_iran.grid_id it intersects. You can see in the link below how my grid was constructed and numbered.

http://s3.noelshack.com/uploads/images/7719346048194_3.jpg

I have imagined that the rupture causing trouble in query1 could be caused by the fact that when we arrive at the end of a row in the grid and we need to come back to the extreme left of the "shapefile", it implies that the R*tree has to move to another sector, and this could cost time. But I am both not sure if it could be that, and what I could do if it was the case.

Do you have any idea on how to explain this slowness?

Thanks for your insight!

a.fu...@lqt.it

unread,
Dec 17, 2011, 11:24:42 AM12/17/11
to spatiali...@googlegroups.com
On Sat, 17 Dec 2011 03:07:17 -0800 (PST), Damien wrote
> The first deals with the CheckSpatialIndex() and
> RecoverSpatialIndex() functions.
> My version of spatialite-gui is v.1.4.0, is it the newest?
>

no, here you can get a most recent version:
http://www.gaia-gis.it/gaia-sins/

> If not, can it pose a problem using R*tree?
>

I strongly doubt; the R*Tree implementation is
basically always the same


> I measured the time needed to fetch a certain number of rows using
> the "LIMIT" condition.
> I noticed that the delay varies when I switch the table in which
> R*tree is used:
>

when you "spatially join" two different tables using the one
or the other R*Tree isn't at all the same, and subtantial
different timings may derive depending on the one actually used.
always remember: the R*Tree is a "spatial filter" used to
reduce complexity (i.e. allowing to perform as few comparisons
as strictly required).

> As you can see, query 2 is slower than the first one, but query 1,
> after a quite linear progression, suddenly rises. I identified the
> limit being between the 47,661st and 47,662nd rows.
>

I strong suspect that at this point some "exceptional"
geometry is encounterd: may be, a really huge one having
much more points than any other one. or one presenting
a very big spatial extension; or a malformed one.
but you can easily check this on your DB using the
BlobExplore tool [geometry explore].


> I have imagined that the rupture causing trouble in query1 could
> be caused by the fact that when we arrive at the end of a row in
> the grid and we need to come back to the extreme left of the
> "shapefile", it implies that the R*tree has to move to another
> sector, and this could cost time.
>

yes, for sure it cost some time: may be few microseconds ? :-)

seriously speacking: how big is your DB-file ?
if its size widely exceeds the available RAM size, you can
easily experience poor performance, due to I/O stalls.

another important point: are you really sure that all your
geometries are full valid ones ?
you can easily check this using ST_IsValid(); sometimes
few broken or malformed "crazy" polygons may cause very
serious problems.

bye Sandro

Damien

unread,
Dec 19, 2011, 4:12:02 PM12/19/11
to SpatiaLite Users
Good evening,

Thanks again for your very useful advices.

I checked the geometry of my "cover_iran" layer, and this is the one
which causes trouble.
I ran my query ignoring the malformed polygons and seemed to do the
job correctly. I therefore assume that this is the main source of my
problems. It seems there are around 12.000 polygons with geometry
problems.
I extracted one of them to have a better idea of what's the matter,
and this is what I discovered:

http://s3.noelshack.com/uploads/images/18400038942087_spatialite_guierror.jpg

http://s3.noelshack.com/uploads/images/4991973722703_deformed_polygon.jpg

The sanitizegeometry() function appears to be of little help, I didn't
managed to solve the problem this way.
I read on the internet that it is possible to repair a broken geometry
by exporting the table as a shapefile, loading it into QGIS then
GRASS, rebuilding the topology, then re-load the resulting shapefile
in spatialite-gui. It carried out this manipulation, without much
result, my problem still being there.

In another thread you solved a problem by cutting the countries into
smaller pieces in order to simplify their geometry. According to the
QGIS view, I suspect it would be one way of solving my problem, at
least for these 12.000 polygons. The problem is I don't know exactly
how I could do it. Do you think I would be wise to cut these malformed
polygons into a whole lot of smaller ones? How could I do it without
having to cope with too many too small polygons?

Thanks again for your precious help

Damien

Reply all
Reply to author
Forward
0 new messages