recent improvements affecting the Spatial Index

82 views
Skip to first unread message

Alessandro Furieri

unread,
Oct 31, 2023, 5:46:16 AM10/31/23
to SpatiaLite Users
Hi List,

in recent days the joint efforts of several people have made it possible
to obtain a remarkable optimization of the Spatial Index.

There are two areas specifically affected:

1. INSERT operations involving the R*Tree now take approximately 15% - 20%
   less time than before.
   Note: this improvement will be supported by all future versions of SQLite
   starting from 3.44 whose release is now imminent.
   
2. A brand new feature now allows for very fast initial loading of an R*Tree.
   This new functionality (RTree Bulk Load) is activated automatically every
   time a Spatial Index connected to an already populated GeoTable is created
   or rebuilt.
   Bulk Load mode reduces RTree loading times by approximately 2/3
   Support for RTree Bulk Load is completely internal to SpatiaLite and does
   not depend on SQLite.
   The complete implementation is already available in the current development
   version of SpatiaLite contained in the Fossil repository.
   
if you are looking for further details here you will find a comparative benchmark
that documents the speed gains that can be achieved with these latest optimizations:

https://www.gaia-gis.it/gaia-sins/rtree-benchmark.html

enjoy,
Sandro

Even Rouault

unread,
Oct 31, 2023, 6:08:13 AM10/31/23
to spatiali...@googlegroups.com, Alessandro Furieri

Sandro,

excellent!

you might want to resync with the latest version of https://github.com/rouault/sqlite_rtree_bulk_load . Fuzzers caught a out-of-bounds write issue in the node splitting function when writing geometries with huge coordinate values which I've corrected 2 days ago

I've realized that the term "bulk loading" might be overselling what is implemented actually (but which is a major enhancement over SQLite native RTree building).  Apparently in the literature, bulk loading involves sorting the geometries with increasing X values before loading them into the RTree which leads to a better RTree. This is often associated with a sort tile recursive (STR) RTree. Not totally sure if that would make sense for initial filling of the R*Tree.

Even

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/47ec6060-aef7-4354-a55f-f88acf4ca273n%40googlegroups.com.
-- 
http://www.spatialys.com
My software is free, but my time generally not.

a.fu...@lqt.it

unread,
Oct 31, 2023, 7:09:50 AM10/31/23
to spatiali...@googlegroups.com
On Tue, 31 Oct 2023 11:08:09 +0100, Even Rouault wrote:
> you might want to resync with the latest version of
> https://github.com/rouault/sqlite_rtree_bulk_load. Fuzzers caught
> a out-of-bounds write issue in the node splitting function when
> writing geometries with huge coordinate values which I've corrected 2
> days ago
>

Hi Even,

I've committed in the Fossil repo your latest patches.

bye Sandro

Pieter Roggemans

unread,
Oct 31, 2023, 7:17:17 AM10/31/23
to SpatiaLite Users
> I've realized that the term "bulk loading" might be overselling what is implemented actually (but which is a major enhancement over SQLite native RTree building).  Apparently in the literature, bulk loading involves sorting the geometries with increasing X values before loading them into the RTree which leads to a better RTree. This is often associated with a sort tile recursive (STR) RTree. Not totally sure if that would make sense for initial filling of the R*Tree.

This indeed what I meant before. The algorithm I implemented in SQL and posted in the SQLite forum is a "real" bulk load algorithm based on sorting on Morton code (or sorting on Hilbert code would give the same result) because that would give better results than naive sorting on coordinates, and - according to literature - should give another large improvement to the current fast implementation.

An "older" but commonly used "real" bulk algorithm is indeed "sort tile recursive (STR) RTree", which I referenced to before because it is the one used in GEOS. The "build" function referenced here is the one building the tree:

Normally "real" bulk load algorithms should be possible drop-in replacements to create the sqlite data structure.




Op dinsdag 31 oktober 2023 om 12:09:50 UTC+1 schreef a.fu...@lqt.it:
Reply all
Reply to author
Forward
Message has been deleted
0 new messages