Difference in query execution times when using ST_Intersect(g1,g2) vs MbrIntersect(g1,g2), Why?

163 views
Skip to first unread message

Ankur

unread,
Oct 18, 2017, 3:29:28 AM10/18/17
to SpatiaLite Users
Hi, 

I am doing performance comparisons between different spatial DB's like PostGIS and Spatialite. 
I have these two queries that are giving different timings. I know Mbr gives approximate spatial relationship vs ST_ functions which uses actual geometry. 

My two queries are:

Query 1: 
time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p WHERE ST_Intersects(c.Geometry, p.Geometry) AND p.ROWID IN (SELECT pkid FROM idx_oil_Geometry WHERE xmin <= MbrMaxX(c.Geometry) AND xmax >= MbrMinX(c.Geometry) AND ymin <= MbrMaxY(c.Geometry) AND ymax >= MbrMinY(c.Geometry));" | spatialite spatialite.db > /dev/null

real 0m17.947s
user 0m17.394s
sys 0m0.384s

Query 2:
time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p WHERE MbrIntersects(c.Geometry, p.Geometry) AND p.ROWID IN (SELECT pkid FROM idx_oil_Geometry WHERE xmin <= MbrMaxX(c.Geometry) AND xmax >= MbrMinX(c.Geometry) AND ymin <= MbrMaxY(c.Geometry) AND ymax >= MbrMinY(c.Geometry));" | spatialite spatialite.db > /dev/null

real 0m2.914s
user 0m2.630s
sys 0m0.259s

Notice, when using Mbr, execution time is super fast but the results vary. For Query 1(ST_Intersect), I get 178 results, but for Query2 (MbrIntersect) I get 246 results. I get the point that since Mbr is approximate SR, there will be in-accurate results. 

So my question, is there any way I can improve my ST_Intersect query to achieve similar performance as with query 2 (MbrIntersect). 
Any help will be greatly appreciated.

Thanks !

a.fu...@lqt.it

unread,
Oct 18, 2017, 4:37:49 AM10/18/17
to spatiali...@googlegroups.com
On Wed, 18 Oct 2017 00:29:28 -0700 (PDT), Ankur wrote:
> Hi, 
>
> I am doing performance comparisons between different spatial DB's
> like
> PostGIS and Spatialite. 
> I have these two queries that are giving different timings. I know
> Mbr
> gives approximate spatial relationship vs ST_ functions which uses
> actual geometry. 
>

Hi Ankur,

you've already got the key point: just evaluating MBR relationships
is a basically simple operation and not surprisingly will be
computed in a negligible time.
even more important, the time required to compare two MBRs can be
assumed to be constant and does not depends on the total number of
vertices declared by the two Geometries.

the GEOS-based ST_Intersects and friends are intrinsically complex
operations and could easily require a lot of calculations.
the required time is variable and mainly depends on the total number
of vertices.
computing an intersection between simple Linestrings/Polygons
declaring a handful of points will be rather fast, but when
processing complex geometries declaring several thousand points
a very long time will be required.

short conclusion: just evaluating the MBRs is a quick method for
fast approximate filtering. the GEOS-based ST_xxxx methods ensure
exactness but tend to be rather slow.


> My two queries are:
>
> Query 1:
>
> time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p WHERE
> ST_Intersects(c.Geometry, p.Geometry) AND p.ROWID IN (SELECT pkid
> FROM
> idx_oil_Geometry WHERE xmin = MbrMinX(c.Geometry) AND ymin =
> MbrMinY(c.Geometry));" | spatialite spatialite.db > /dev/null
>
> real 0m17.947s
> user 0m17.394s
> sys 0m0.384s
>
> Query 2:
>
> time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p WHERE
> MbrIntersects(c.Geometry, p.Geometry) AND p.ROWID IN (SELECT pkid
> FROM
> idx_oil_Geometry WHERE xmin = MbrMinX(c.Geometry) AND ymin =
> MbrMinY(c.Geometry));" | spatialite spatialite.db > /dev/null
>
> real 0m2.914s
> user 0m2.630s
> sys 0m0.259s
>
> Notice, when using Mbr, execution time is super fast but the results
> vary. For Query 1(ST_Intersect), I get 178 results, but for Query2
> (MbrIntersect) I get 246 results. I get the point that since Mbr is
> approximate SR, there will be in-accurate results.
>

when using the fast but approximative MBR comparison a lot of
"false-positives" will be returned; i.e. not really intersecting
geometries but presenting intersecting MBRs.
(see the attached figure for a practical example)

> So my question, is there any way I can improve my ST_Intersect query
> to achieve similar performance as with query 2 (MbrIntersect).
> Any help will be greatly appreciated.
>

it's all or nothing: you have to necessarily choose between
quick/approximate and slow/precise, there is no possible
compromise.

what many users do in such a situation is including a
SpatialIndex subquery in their main spatial query, so to
quickly filter the (possibly few) candidate couples of
geometries, and then performing a more exact evaluation
only on behalf of the pre-selected candidates.

your queries apparently seem to intend using the Spatial
Index, but they adopt an invalid syntax.
the expected syntax for a valid Spatial Index query is:

SELECT c.Zip_Code, count(1)
FROM zipcodes c, oil p
WHERE ST_Intersects(c.Geometry, p.Geometry) = 1 AND p.ROWID IN (
SELECT pkid FROM SpatialIndex
WHERE f_table_name = 'oil' AND search_frame = c.geometry)
);

bye Sandro
mbr-false-positive.png

Ankur

unread,
Oct 18, 2017, 12:23:22 PM10/18/17
to SpatiaLite Users
Hi Sandro,

Thanks for a precise response. I realized that my spatial query needs SpatialIndex feature and so I tried the one you provided, but it's executing in similar time range. 

$ time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p WHERE ST_Intersects(c.Geometry, p.Geometry)=1 AND p.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name='oil' AND search_frame=c.Geometry);" | spatialite spatialite.db > /dev/null

real 0m19.540s
user 0m18.417s
sys 0m0.537s

So basically if I have less number of intersecting vertices between geometries, the possible execution time will reduce. Is that a correct statement ? 

-Ankur

a.fu...@lqt.it

unread,
Oct 19, 2017, 5:33:52 AM10/19/17
to spatiali...@googlegroups.com
On Wed, 18 Oct 2017 09:23:22 -0700 (PDT), Ankur wrote:
> Hi Sandro,
>
> Thanks for a precise response. I realized that my spatial query needs
> SpatialIndex feature and so I tried the one you provided, but it's
> executing in similar time range. 
>
> $ time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p
> WHERE ST_Intersects(c.Geometry, p.Geometry)=1 AND p.ROWID IN (SELECT
> ROWID FROM SpatialIndex WHERE f_table_name='oil' AND
> search_frame=c.Geometry);" | spatialite spatialite.db > /dev/null
>
> real 0m19.540s
> user 0m18.417s
> sys 0m0.537s
>

Hi Ankur,

I read from your first post:

>> time printf "SELECT c.Zip_Code, count(1) FROM zipcodes c, oil p
>> WHERE ST_Intersects(c.Geometry, p.Geometry) AND p.ROWID IN (SELECT
>> pkid FROM idx_oil_Geometry WHERE xmin = MbrMinX(c.Geometry) AND ymin
>> = MbrMinY(c.Geometry));" | spatialite spatialite.db > /dev/null
>>
>> real 0m17.947s
>> user 0m17.394s
>> sys 0m0.384s
>>

this query seems to already include a Spatial Index
("idx_oil_geometry")
although the adopted syntax is obsolete and invalid.
not surprisingly you are getting similar execution timings.


> So basically if I have less number of intersecting vertices between
> geometries, the possible execution time will reduce. Is that a
> correct
> statement ? 
>

not exactly.
the number of intersecting vertices plays no role; what is
really relevant is the total number of vertices in both
geometries.

quick explanation: in order to exactly determine an
intersection (if any) between two polygons you necessarily
have to follow an algorithm more or less like the following:

1. extract the first segment (v0-v1) from the first polygon
2. check if this segment is completely contained within the
second polygon (yes/no).
note: this step requires a time proportional to the
total number of points into the second polygon.
3. check if the same segment has any intersecting point
with some segment from the second polygon (if yes
you then have to split the segment in two halves,
one "outside" and the other "inside"),
note: this step too requires a time proportional to
the total number of points into the second polygon.
4. then you have to re-cycle starting from step 1.
for each segment found into the first polygon.
note: the overall cycle from 1. to 4. will require
a time proportional to the total number of points
into the first polygon.
5. and finally you have to collect all "inside"
segments then reassembling the outer ring of
the intersection between the two polygons.

real world algorithms are surely smarter and more
efficient than the above one, but it remains
confirmed that precisely determining the intersection
between two polygons will require a time proportional
the the total count of their vertices.

just a coarse indicative classification:
- very simple polygons (about 10 vertices): fast
- complex polygons (about thousand vertices): slow
- very complex polygons (hundredth thousands or
even million vertices): deadly slow.

you can easily apply similar considerations to the
linestring-linestring, linestring-polygon,
point-linestring and point-polygon intersections.

bye Sandro
Reply all
Reply to author
Forward
0 new messages