RTreeWithin bug

133 views
Skip to first unread message

Sacha

unread,
Oct 27, 2011, 9:00:37 AM10/27/11
to SpatiaLite Users
After some further investigation with my colleague, using v3.0.0, I
think we have come across a possible bug in the RTreeWithin method. I
originally posted about our problem in another thread here:

https://groups.google.com/group/spatialite-users/browse_thread/thread/fff75f062e6bb845/ed70ece838c75e49?hl=en&lnk=gst&q=rtreewithin#ed70ece838c75e49

It seems that when we are performing a boundingbox query using
RTreeWithin, only the R*tree nodes *within* the queried bounding box
are retrieved, and then any features contained by these returned
R*tree nodes that are within the queried bounding box are returned by
the query.

We believe that the initial bounding box query should find all R*tree
nodes that *intersect* the queried bounding box. The features
contained by these returned R*tree nodes should then use the
RTreeWithin callback to test whether they are within the queried
bounding box.

We are doing some further investigation to see where this bug could
possibly be happening, but I thought I would post our findings in case
anyone had already come across this, or knows of where in the code
this bug could possibly exist.

Mark Wright

unread,
Nov 1, 2011, 1:10:21 PM11/1/11
to SpatiaLite Users
I have been working with Sacha on diagnosing this problem. Our work on
the behaviour of the function and inspection of the code suggests that
the function RTreeWithin() is called at all levels of data (index
nodes and actual data points) during a “within” query, while the
RTreeIntersects() is likewise called at all levels during an
“intersects” query.

While this behaviour is a plausible expectation of correct behaviour,
we are convinced that the current output of RTreeWithin doesn’t
conform to its description, and more importantly it is difficult to
see any scenario in which it the current behaviour could be useful.
This is because currently the output of RTreeWithin depends entirely
on the geometry and arrangement of the R*tree nodes. An arrangement of
R*tree nodes where most of them by chance cross the boundingbox query
will result in most coordinate points being eliminated. Indeed, it’s
quite possible to imagine a scenario where *all* of the R*tree nodes
cross the boundingbox query in some way, which would lead to no points
at all being returned, even though many are inside the boundingbox.

Even worse than this, because the R*tree nodes of two separate
databases with identical points in them could be different (due to
different insertion/build strategies) it is entirely likely that
otherwise identical databases could produce different results with the
current RTreeWithin.

Because of this, we think that R*tree nodes should *always* be
processed with RTreeIntersect, and the distinction between functions
should be made only on actual coordinate points. This does make the
chain slightly more complex because it means it can’t be a simple
recursive call to the function – there needs to be some logic in there
too.

As a workaround, we had considered inserting an “if” statement at the /
* evaluating Within relationship */ in function RTreeWithin() saying
effectively “if we are processing a point, use current evaluation,
else use the 8 lines copied from RTreeIntersects()”. However, we can’t
currently see a way to know if the function is processing a node or a
point. Any ideas?

Yours,
Mark


On Oct 27, 1:00 pm, Sacha <sacha.dar...@riskaware.co.uk> wrote:
> After some further investigation with my colleague, using v3.0.0, I
> think we have come across a possible bug in the RTreeWithin method. I
> originally posted about our problem in another thread here:
>
> https://groups.google.com/group/spatialite-users/browse_thread/thread...

a.furieri

unread,
Nov 1, 2011, 4:40:02 PM11/1/11
to SpatiaLite Users
Hi Sasha and Mark,

first of all, I'll quote myself: please check my previous answer
given on August 8.

> the equivalent spatial filter is RTreeContains:
> this will extract both geometries fully contained
> within the BBOX *and* geometries crossing the BBOX
> itself.
> in other words, when accessing the R*Tree
> as a Spatial Index (quick BBOX filtering)
> you are always expected to use RTreeContains."
>

few technical highlights: the R*Tree simply represents a
really efficient way allowing to perform rough&fast spatial
filtering on the basis of MBR [aka BBOX] quick comparisons;
for any "finest/precise" spatial evaluation you are expected
to use "true" spatial functions, such as ST_Contains(),
ST_Within(), ST_Disjoint(), ST_Intersects() ...

you can access the R*Tree following *three* alternative
approaches: e.g.

#1: oldest style
-----------------------------------------
SELECT Name FROM GeoNames
WHERE ROWID IN (
SELECT pkid FROM idx_GeoNames_Geometry
WHERE xmin <= 11.9 AND xmax >= 11.8
AND ymin <= 43.5 AND ymin >= 43.4
);



#2: using the Geometry Call-Backs interface
-------------------------------------------
SELECT Name FROM GeoNames
WHERE ROWID IN (
SELECT pkid FROM idx_GeoNames_Geometry
WHERE pkid MATCH
RTreeIntersects(11.8, 43.4, 11.9, 43.5)
);



#3: using the VirtualSpatialIndex interface
-------------------------------------------
SELECT Name FROM GeoNames
WHERE ROWID IN (
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'GeoNames' AND
search_frame = BuildMbr(11.8, 43.4, 11.9, 43.5)
);


Please note well: there is no real difference between
all them.
it's mainly a problem of "aesthetic" preferences
and syntactic sugar; performances are absolutely
the same anyway.
not at all surprisingly, because the underlaying
R*Tree implementation always is one and the same.



The ugly ducklings:
===============
I initially introduced the RTreeWithin() and RTreeContains()
callbacks simply driven by (stupid) inertial forces ...

The really needed callback function was obviously
RTreeIntersects().
But we already had MbrsWithin() and MbrsContains():
so applying the same approach for geometry callbacks
functions accessing the R*Tree looked apparently good ...

BAD ERROR: this is not at all applicable to R*Tree callbacks:
the internal logic implemented by SQLite is strictly
bounded to "intersects".
Any attempt to implement "within" or "contains" logic
simply causes the whole callbacks chain to miserably fail.


Interim conclusion: *never* use RTreeContains and/or
RTreeWithin; always use RTreeIntersects instead.

Definitive conclusion: v.3.0.0 "stable" will officially
*DEPRECATE* both RTreeWithin and RTreeContains.
They'll probably be still maintained (for backward compatibility),
but they'll simply become alias-names for RTreeIntersects.

bye Sandro

Sacha

unread,
Nov 2, 2011, 5:58:08 AM11/2/11
to SpatiaLite Users
Thanks for the detailed response Sandro. I think we will make use of
the old style for our purposes now.

Stefan Keller

unread,
Apr 22, 2012, 2:17:40 PM4/22/12
to spatiali...@googlegroups.com
Hi Sandro,

2011/11/1 a.furieri <a.fu...@lqt.it>:

Hi Sandro

I'm migrating a simple ST_Contains (ST_Within) query from PostGIS to
spatialite and I'm little confused by the three versions you
summarized before.

That's what I'd like to do: I'd like to make a point in polygon test
for points (table orte/places) in order to spatially join and add an
attribute (Name) from a polygon table (table gemeinden/communities)
using spatialite 2.4 and 3.x (newest):

SELECT pt.ROWID, pt.PKUID, pt.NAME, pt.Geometry, po.name
FROM orte AS pt, gemeinden AS po
WHERE ST_Contains(pt.way, po.way);

Note: Thats ISO/OGC standard and includes an index von geometry
attributes from both(!) tables.

Now, for spatialite 3x this seems to be one of the newer syntaxes:

SELECT pt.ROWID, pt.PKUID, pt.NAME, pt.Geometry, po.name
FROM orte AS pt, gemeinden AS po
WHERE ST_Contains(po.Geometry, pt.Geometry)
AND pt.ROWID IN (
SELECT ROWID FROM idx_Orte_Geometry
WHERE RTreeContains(po.Geometry, pt.Geometry) = 1
);

My questions:
* Is this the preferred syntax for this task?
* What is the preferred syntax for spatialite v. 2.4 (which is in
current stable QGIS)?
* Why is the Spatial Index only applied for one table (here:
idx_Orte_Geometry) and not for both (here: idx_Gemeinden_Geometry)?

Yours, Stefan

Stefan Keller

unread,
Apr 23, 2012, 2:27:14 AM4/23/12
to spatiali...@googlegroups.com
2012/4/22 Stefan Keller <sfke...@gmail.com>:

I correct myself: This is what really used the index:

SELECT pt.PKUID, pt.NAME, pt.Geometry, po.Name


FROM orte AS pt, gemeinden AS po

WHERE ST_Within(pt.Geometry, po.Geometry)
AND pt.ROWID IN (
SELECT pkid FROM idx_Orte_Geometry
WHERE pkid MATCH RTreeIntersects(
MbrMinX(po.Geometry), MbrMinY(po.Geometry),
MbrMaxX(po.Geometry), MbrMaxY(po.Geometry))
)

But my questions remain:
* Isn't there a more elegant and intuitive subquery?
* At least when two polygons would be involved I would expect two
index subqueries one for each

Yours, S.

a.fu...@lqt.it

unread,
Apr 23, 2012, 4:10:45 AM4/23/12
to spatiali...@googlegroups.com
Hi Stefan,

SQLite implements a nice R*Tree, and it's really effective and fast:
anyway a major design issue exists; an R*Tree simply is a separate
table (it actually is a VirtualTable based on three further tables,
but for any practical purpose we can safely ignore at all the three
underlaying tables, simply focusing our attention on the VirtualRTree
table itself).

In other words, the SQL main core is absolutely unaware that some
logical relationship exist between the main-geometry table and the
supporting rtree-table.

SpatiaLite on its own adds few TRIGGERs to the main-geometry table,
so to ensure that any INSERT/UPDATE/DELETE affecting the main table
will immediately affect the corresponding RTree table at the same time.

But any SELECT absolutely requires to explicitly JOIN the RTree (one
way or the other) in order to query the Spatial Index; the usual way
is the one to define a further sub-query. e.g.:

form #1:
--------
SELECT pt.PKUID, pt.NAME, pt.Geometry, po.Name
FROM orte AS pt, gemeinden AS po
WHERE ST_Within(pt.Geometry, po.Geometry)
AND pt.ROWID IN (
SELECT pkid FROM idx_Orte_Geometry
WHERE xmin <= MbrMaxX(po.Geometry) AND
ymin <= MbrMaxY(po.Geometry) AND
xmax >= MbrMinX(po.Geometry) AND
ymax >= MbrMinY(po.Geometry)
);

the above SQL snippet shows the "earlier form" initially suggested
in order to access the Spatial Index.
It surely works: but it's rather verbose (and maybe, confusing).


form #2:
--------
SELECT pt.PKUID, pt.NAME, pt.Geometry, po.Name
FROM orte AS pt, gemeinden AS po
WHERE ST_Within(pt.Geometry, po.Geometry)
AND pt.ROWID IN (
SELECT pkid FROM idx_Orte_Geometry
WHERE pkid MATCH RTreeIntersects(
MbrMinX(po.Geometry), MbrMinY(po.Geometry),
MbrMaxX(po.Geometry), MbrMaxY(po.Geometry))
);

this second form is based on the so called geometry-callbacks
introduced in SQLite 3.7.3
Not a real progress: still too much verbose (and several serious
implementation issues emerged from direct on-the-field experience).
Please note: it's now *deprecated*, and will be removed from future
versions of SpatiaLite.


form #3:
--------
SELECT pt.PKUID, pt.NAME, pt.Geometry, po.Name
FROM orte AS pt, gemeinden AS po
WHERE ST_Within(pt.Geometry, po.Geometry)
AND pt.ROWID IN (
SELECT ROWID FROM SpatialIndex
WHERE f_table_name = 'orte'
AND search_frame = po.Geometry
);

the most recent one: it finally is quite elegant and simple.
I personally suggest anyone to swiftly adopt this one as the
"canonical form" for Spatial Index queries.


================================================================

> * At least when two polygons would be involved I would expect two
> index subqueries one for each
>

Absolutely not, a single subquery is enough. A short rationale:

SELECT * FROM a, b

ideally, this will compute the Cartesian Product of both tables;
i.e. any row found in "a" will be joined to any row found in "b".
In order to get this result two nested loops are required:
- the "outer" loop will scan table "a", fetching one single row
at each iteration step.
- the "inner" loop will do the same on behalf of table "b"

please note: reality isn't so simple; the SQL query planner/optimizer
will actually make some preliminary evaluations of its own, in order
to identify the most promising (=faster) data access strategy.
e.g. it could eventually decide to use table "b" in the outer loop,
consequently assigning table "a" to the inner loop.

accordingly to all this, one single table (the one chosen by the
query planner for the outer loop) will be full-scanned anyway; and
rather obviously any possible optimization (aka filtering/discarding
obviously uninteresting rows) will simply apply on behalf of the second
table assigned to the inner loop.

please note: exactly the same criteria will be applied when three (or
even
more) tables have to be joined; simply adding a further nested loop is
required for each table.

coming back to our Spatial Index sub-queries (above example): by
defining the
Spatial Index sub-query you'll pass a useful hint to the query planner.
So the planned query workflow become:

a) fetch one row from "gemeinden" (po) table
b) resolve the sub-query: this step will access the Spatial Index
corresponding
to the "orte" (pt) table, using the po.geometry MBR as a "spatial
filter".
only the (few) "orte" rows matching the spatial filter will be
further
evaluated, discarding the (many) uninteresting ones.
c) and finally evaluate any [po/pt] pair, checking if ST_Within() is
TRUE

As you can easily see, there is absolutely no reason to introduce a
second
Spatial Index in this query.

bye Sandro

--
Il messaggio e' stato analizzato alla ricerca di virus o
contenuti pericolosi da MailScanner, ed e'
risultato non infetto.

Stefan Keller

unread,
Apr 23, 2012, 10:49:49 AM4/23/12
to spatiali...@googlegroups.com
Thank you very much, Sandro.
I got it.
-S.

2012/4/23 <a.fu...@lqt.it>:

> --
> You received this message because you are subscribed to the Google Groups
> "SpatiaLite Users" group.
> To post to this group, send email to spatiali...@googlegroups.com.
> To unsubscribe from this group, send email to
> spatialite-use...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/spatialite-users?hl=en.
>

Stefan Keller

unread,
Apr 23, 2012, 4:08:29 PM4/23/12
to spatiali...@googlegroups.com
Hi Sandro,

2012/4/23 <a.fu...@lqt.it>:

I just tested this out with SpatiaLite ver. 2.4.0 which is the one in
current stable QGIS 1.7.
There, form #3 does not work because SpatialIndex does not yet exist.
So what do you recommend in this case? Deprecated form #2 or form #1?

Yours, S.

a.fu...@lqt.it

unread,
Apr 23, 2012, 4:48:08 PM4/23/12
to spatiali...@googlegroups.com
On Mon, 23 Apr 2012 22:08:29 +0200, Stefan Keller wrote:
> I just tested this out with SpatiaLite ver. 2.4.0 which is the one in
> current stable QGIS 1.7.
> There, form #3 does not work because SpatialIndex does not yet exist.
>

yes, it's a quite recent introduction

> So what do you recommend in this case? Deprecated form #2 or form #1?
>

IMHO form #1: it's absolutely stable, and it will surely be supported
for a very long time.

Micha

unread,
Apr 23, 2012, 4:50:15 PM4/23/12
to SpatiaLite Users


On Apr 23, 11:10 am, a.furi...@lqt.it wrote:
> Hi Stefan,
>
>
>
> form #3:
> --------
> SELECT pt.PKUID, pt.NAME, pt.Geometry, po.Name
> FROM orte AS pt, gemeinden AS po
> WHERE ST_Within(pt.Geometry, po.Geometry)
> AND pt.ROWID IN (
>    SELECT ROWID FROM SpatialIndex
>    WHERE f_table_name = 'orte'
>      AND search_frame = po.Geometry
> );
>
> the most recent one: it finally is quite elegant and simple.
> I personally suggest anyone to swiftly adopt this one as the
> "canonical form" for Spatial Index queries.
>
>

I'm working with spatialite 3.0.1 on WinXP (and spatialite-gui 1.5). I
see the SpatialIndex table, but it remains empty after I run:
SELECT CreateSpatialIndex('table_name','Geometry');
The last column in geometry_columns changes to '1' and the four idx_*
tables are created, but nothing in SpatialIndex.
What am I missing??
Thanks,
Micha

a.fu...@lqt.it

unread,
Apr 23, 2012, 5:09:39 PM4/23/12
to spatiali...@googlegroups.com
On Mon, 23 Apr 2012 13:50:15 -0700 (PDT), Micha wrote:
> I'm working with spatialite 3.0.1 on WinXP (and spatialite-gui 1.5).
> I
> see the SpatialIndex table, but it remains empty after I run:
> SELECT CreateSpatialIndex('table_name','Geometry');
> The last column in geometry_columns changes to '1' and the four idx_*
> tables are created, but nothing in SpatialIndex.
> What am I missing??
>

Hi Micha,

there is nothing odd in all this: please note, SpatialIndex is
a *VirtualTable*, not a real table ...
in other words, it's simply an abstract interface, actual data
are "magically" provided as appropriate by the underlying driver.

if you are curious to know, the actual workflow implemented by
SpatialIndex is:
a) checking if the table identified by "WHERE f_table_name = x"
actually exists, and if it has a corresponding R*Tree
b) if yes, than a nested query on behalf of this R*Tree is
silently performed using as appropriate the MBR deriving
from the "WHERE search_frame = g" clause
c) and finally all the ROWIDs found are inserted into the
resultset to be returned

in other words, it simply is "syntactic sugar", no more than
this (anyway, really useful because it allows writing simpler
and clearer queries) ;-)

Micha

unread,
Apr 24, 2012, 5:33:45 AM4/24/12
to SpatiaLite Users
Thanks, Sando.
Just to fully understand, should this query work:
(I have a polygon table adm2, and a points table, GeoPC, both with
spatial indices)

SELECT ROWID FROM SpatialIndex, adm2 WHERE f_table_name='GeoPC' and
search_frame=adm2.Geometry;

I get: SQL error: No such column ROWID

??
Thanks

a.fu...@lqt.it

unread,
Apr 24, 2012, 5:54:49 AM4/24/12
to spatiali...@googlegroups.com
On Tue, 24 Apr 2012 02:33:45 -0700 (PDT), Micha wrote:
> SELECT ROWID FROM SpatialIndex, adm2 WHERE f_table_name='GeoPC' and
> search_frame=adm2.Geometry;
>
> I get: SQL error: No such column ROWID
>

Hi Micha,

this is because there are *two* different columns named ROWID in
your query: one coming from the SpatialIndex table, and another
one coming from the GeoPC table.
you are simply required to disambiguate/qualify all column names,
as in:

SELECT SpatialIndex.ROWID, adm2.ROWID
FROM SpatialIndex, adm2
WHERE f_table_name='GeoPC' AND search_frame=adm2.Geometry;

Reply all
Reply to author
Forward
0 new messages