batch update the whole table

54 views
Skip to first unread message

Dave Milter

unread,
Feb 25, 2018, 4:54:17 PM2/25/18
to SpatiaLite Users
Hello.

What is the best way to update every row in column added by `AddGeometryColumn`?

My current solution is looks like this (pseudo code):

```
CREATE TABLE Foo(id_key INTEGER PRIMARY KEY);

SELECT AddGeometryColumn('Foo', 'geom', 4326, 'POINT', 'XY', 1);
SELECT CreateSpatialIndex('Foo', 'geom');

SELECT AddGeometryColumn('Foo', 'proj', -1, 'POINT', 'XY', 0);
SELECT CreateSpatialIndex('Foo', 'proj');

void reproject()
{
   std::vector<(int64_t, double, double)> reprojected_points;
   for (id, longitude, latitude) in "SELECT id_key,X(geom),Y(geom) FROM FOO" {
     auto new_point = reproject(latitude, longitude);
     reprojected_points.push_back((id, new_point.x, new_point.y));
   }
   BEGIN TRANSACTION;
   for ((id, x, y) : reprojected_points) {
      UPDATE Foo SET proj = (x, y) WHERE id_key = id;
   }
   COMMIT;
}
```
As can you see I have two geometry columns, one with wgs-84 latitude, longitude and another one with some projection.
And I need time to time reproject  column with projection.

Reading data using "SELECT" and then reproject takes only 33ms,
but `SELECT` plus `UPDATE` takes 811ms.
Looks too long for me, because of I use for development fast enough computer (i7/16GB/SSD),
but my program may be used on relatively slow machines, that may be 10x slower then mine.

I tried execute `DisableSpatialIndex` for 'proj' right after `BEGIN TRANSACTION;`, and looks like without
spatial index update it takes 324ms.  That is good enough, but I need `SpatialIndex` for 'proj'.

May be there is other way to fill geometry column and spatial index with data?

br...@frogmouth.net

unread,
Feb 25, 2018, 5:08:36 PM2/25/18
to spatiali...@googlegroups.com

I think this is really a sqlite question (not a spatialite question), but you may be better off if you do not to store all the projected data inside the database (bad design to duplicate data), but rather to reproject the selected set on query (i.e. use ST_Transform on the results of the query). Keep the single geometry column.

 

If you do need to reproject it all, a set of smaller transactions may help (say 1000-5000 rows per transaction).

 

A slower machine may not be much worse, although disk performance does matter. If speed is really more important than transaction safety, see https://www.sqlite.org/faq.html#q19

 

Brad

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To post to this group, send email to spatiali...@googlegroups.com.
Visit this group at https://groups.google.com/group/spatialite-users.
For more options, visit https://groups.google.com/d/optout.

Dave Milter

unread,
Feb 25, 2018, 5:56:17 PM2/25/18
to SpatiaLite Users


On Monday, February 26, 2018 at 1:08:36 AM UTC+3, Brad Hards wrote:

I think this is really a sqlite question (not a spatialite question), but you may be better off if you do not to store all the projected data inside the database (bad design to duplicate data), but rather to reproject the selected set on query (i.e. use ST_Transform on the results of the query). Keep the single geometry column.

 


I suppose I need to add some context here. I execute some query against geography data (geom column) (latatude, longitude) and some queries
against XY (on plane) (proj column). I can not keep one geometry column, because of for example if I query all rows inside rectangle on the plane,
and can use only `proj` column, because of depend on projection rectangle on plane may be mapped to complex polygon on latitude/longitude plane.

But I need 'proj' column only during work, actually I do not need on disk. So great thing will be (in terms of sqlite) attaching :memory:  to my db and store
'proj' and index inside memory database. But looks like it is impossible, and not clear is :memory: idea is faster.

If you do need to reproject it all, a set of smaller transactions may help (say 1000-5000 rows per transaction).

 

A slower machine may not be much worse, although disk performance does matter. If speed is really more important than transaction safety, see https://www.sqlite.org/faq.html#q19

 



I tried your suggestion,  run every 1000 or 5000 updates inside BEGIN TRANSATION/COMMIT,
and do not see any visible difference, before was ~800 ms and with 1000 or 5000 also ~800ms, may be +-10ms.

I think this is SpatiaLite questions because of numbers. sqlite  specific code (without spatialite index update) takes 324ms,
with spatial index update ~800ms. May be there is way to improve batch index update with some spatial magic.

br...@frogmouth.net

unread,
Feb 25, 2018, 6:13:39 PM2/25/18
to spatiali...@googlegroups.com

You do realise that you could project your query box as well?

 

You can ATTACH a database, and just create a view, but you’d need to test and tune for your scenario / usage patterns.


Brad

 

Dave Milter

unread,
Feb 25, 2018, 6:45:10 PM2/25/18
to SpatiaLite Users


On Monday, February 26, 2018 at 2:13:39 AM UTC+3, Brad Hards wrote:

You do realise that you could project your query box as well?

 


If you mean project rectangle on some map plane to some complex geometry on latitude and longitude plane,
then I suppose it is impossible with spatialite.
May be with postgis that has geography type, but not with spatialite.

For example if I have map with azimuth projection and  center of azimuth projection at the south pole, and I do query
points that belongs to rectangle that cover the south pole.
Or I query objects that belong to rectangle that cover 180 longitude and so on things.
 
So as I understand at now queries that deals with srid=4326(wgs-84) should be done with special care,
and I can not jump forward and backward to other projections.

a.fu...@lqt.it

unread,
Feb 26, 2018, 6:31:37 AM2/26/18
to spatiali...@googlegroups.com
On Sun, 25 Feb 2018 13:54:17 -0800 (PST), Dave Milter wrote:
> What is the best way to update every row in column added by
> `AddGeometryColumn`?
>
> ------------------- <snip> -------------------
> ```
> As can you see I have two geometry columns, one with wgs-84 latitude,
> longitude and another one with some projection.
> And I need time to time reproject  column with projection.
>
> ------------------- <snip> --------------------
>
> May be there is other way to fill geometry column and spatial index
> with data?
>

Hi Dave,

here are the measured timings of some practical test.

dataset: road network of Tuscany Region, two tables
"house_nr" (POINT, SRID=3003) 1,481,026 rows
"roads" (LINESTRING, SRID=3003) 392,617 rows

platform: intel i7 3,6 GHz, 32GB, 1TB HDD)
Windows 7 pro 64 bit
libsqlite3 3.22.0
libspatialite 4.4.0-devel

test #1
--------------------------------------------------
CREATE TABLE house_nr_proj (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('house_nr_proj', 'geom', 4326, 'POINT', 'XY');
SELECT CreateSpatialIndex('house_nr_proj', 'geom');
INSERT INTO house_nr_proj
SELECT id, ST_Transform(geom, 4326)
FROM house_nr;
CREATE TABLE roads_proj (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('roads_proj', 'geom', 4326, 'LINESTRING',
'XY');
SELECT CreateSpatialIndex('roads_proj', 'geom');
INSERT INTO roads_proj
SELECT id, ST_Transform(geom, 4326)
FROM roads;

house_nr_proj: 3 minutes and 7 seconds
roads_proj: 58 seconds


test #2
--------------------------------------------------
CREATE TABLE house_nr_proj (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('house_nr_proj', 'geom', 4326, 'POINT', 'XY');
INSERT INTO house_nr_proj
SELECT id, ST_Transform(geom, 4326)
FROM house_nr;
SELECT CreateSpatialIndex('house_nr_proj', 'geom');
CREATE TABLE roads_proj (id INTEGER PRIMARY KEY);
SELECT AddGeometryColumn('roads_proj', 'geom', 4326, 'LINESTRING',
'XY');
INSERT INTO roads_proj
SELECT id, ST_Transform(geom, 4326)
FROM roads;
SELECT CreateSpatialIndex('roads_proj', 'geom');

house_nr_proj: 2 minutes, 28 seconds
roads: 42 seconds

in this second test I've created the Spatial Index
_AFTER_ fully populating the corresponding table.


test #3
--------------------------------------------------
PRAGMA cache_size = -65536;

house_nr_proj: 1 minute, 52 seconds
roads_proj: 41 seconds

this third test is identical to the above one,
but this time I've enabled a cache of 64MB.


test #4
--------------------------------------------------
PRAGMA cache_size = -131072;

house_nr_proj: 1 minute, 53 seconds
roads_proj: 39 seconds

same as above, cache of 128MB


test #5
--------------------------------------------------
PRAGMA cache_size = -131072;
PRAGMA journal_mode = off;

house_nr_proj: 1 minute, 47 seconds
roads_proj: 36 seconds

same as above, cache of 128MB and journal file
completely disabled.
note: this is a dangerous condition, an eventual
power failure or any other system stop could
produce a severely corrupted database file.


test #6
--------------------------------------------------
ATTACH DATABASE "./test.sqlite" AS a;

house_nr_proj: 1 minute, 40 seconds
roads_proj: 32 seconds

in this final test I've created a :memory: DB,
then attaching the filesystem-based DB; now both
"house_nr_proj" and "roads_proj" (and their related
Spatial Index) will be created directly on RAM.


CONCLUSIONS:
=================================================
1. creating the Spatial Index after (and not before)
fully populating the Spatial Table surely has
a positive impact on performance.

2. also configuring an appropriately dimensioned cache
will surely have a positive impact on performance,
but the specific amount of such benefit is variable
and strongly depends on the number of rows, on the
specific distribution of actual values and on the
intrinsic performances of the I/O subsystem.
(in other words; it should be carefully tuned by
practical tests)

3. disabling the journal file is a dangerous option,
but could help a little more.

4. using a :memory: DB surely is the fastest solution
you can deploy.

remarks: any kind of Index is strongly optimized so
to support very fast search operations.
being fast when performing insert, update or delete
operations comes in second place and is not a specific
design requisite, most notable if the final result can
be detrimental when querying the Index.

more specifically, creating/updating an R*Tree Spatial
Index is a very demanding task, and surely imposes an
extraordinary CPU workload.

what we can try to optimize is the overhead caused by
I/O traffic from/to the filesystem.
defining an optimal cache or, even better, using a
:memory: DB usually effectively minimizes or nullifies
any disk-base I/O, and could thus substantially help to
get faster timings.

bye Sandro



Dave Milter

unread,
Feb 26, 2018, 9:04:52 AM2/26/18
to SpatiaLite Users
Hi, Sandro.


On Monday, February 26, 2018 at 2:31:37 PM UTC+3, sandro furieri wrote:
On Sun, 25 Feb 2018 13:54:17 -0800 (PST), Dave Milter wrote:
> What is the best way to update every row in column added by
> `AddGeometryColumn`?
>

here are the measured timings of some practical test.


Thanks for sharing numbers. I retried my measurements using this knowledge,
and CreateSpatialIndex after filling 'proj' column indeed reduce time to 2x times.

But how in practice should I use CreateSpatialIndex?

I need update all rows in column several times, not once.

At first I can call 'CreateSpatialIndex' at the end,
but what about the second time?

I tried:

SELECT DisableSpatialIndex('Foo', 'proj');
for all rows
   UPDATE Foo SET proj=?1 WHERE id_key=?2

SELECT CreateSpatialIndex('Foo', 'proj');

and the second run cause error:
CreateSpatialIndex() error: either "Foo"."proj" isn't a Geometry column or a SpatialIndex is already defined

Should I also DROP TABLE Foo.proj or something?

May be I missed something, there is CreateSpatialIndex, but there are no DeleteSpatialIndex,
there is DisableSpatialIndex, but no there are no EnableSpatialIndex (which trigger full index recreation from column)?

mj10777

unread,
Feb 26, 2018, 9:11:42 AM2/26/18
to SpatiaLite Users


On Monday, 26 February 2018 15:04:52 UTC+1, Dave Milter wrote:
Hi, Sandro.

On Monday, February 26, 2018 at 2:31:37 PM UTC+3, sandro furieri wrote:
On Sun, 25 Feb 2018 13:54:17 -0800 (PST), Dave Milter wrote:
> What is the best way to update every row in column added by
> `AddGeometryColumn`?
>

here are the measured timings of some practical test.


Thanks for sharing numbers. I retried my measurements using this knowledge,
and CreateSpatialIndex after filling 'proj' column indeed reduce time to 2x times.

But how in practice should I use CreateSpatialIndex?
Mayby a combination of:

DisableSpatialIndex, RecoverSpatialIndex 


Mark

a.fu...@lqt.it

unread,
Feb 26, 2018, 10:13:22 AM2/26/18
to spatiali...@googlegroups.com
<pre>On Mon, 26 Feb 2018 06:11:42 -0800 (PST), 'mj10777' via SpatiaLite
Users wrote:
> On Monday, 26 February 2018 15:04:52 UTC+1, Dave Milter wrote:
>> But how in practice should I use CreateSpatialIndex?
>>
>> --------------- <snip> -----------------
>>
>> the second run cause error:
>> CreateSpatialIndex() error: either "Foo"."proj" isn't a Geometry
>> column or a SpatialIndex is already defined
>>
>> Should I also DROP TABLE Foo.proj or something?
>>

calling DisableSpatialIndex() just sets
"geometry_columns"."spatial_index_enabled" = 0
end removes all triggers intended to synchroniza
the R*Tree and the Spatial Table, but does not
drop the R*Tree itself.
if you really want to completely remove a Spatial
Index you are required to call (in this sequence):

SELECT DisableSpatialIndex('table_name', 'column_name');
DROP TABLE idx_<table_name>_<column_name>;


> Mayby a combination of:
>
> DisableSpatialIndex, RecoverSpatialIndex
>

yes, it could effectively work but it requires a
further passage, because after calling DisableSpatialIndex
"geometry_columns"."spatial_index_enabled" will be
set to 0, thus causing any following call to
RecoverSpatialIndex to fail.
so the correct sequence should be:

..... DisableSpatialIndex ....;
UPDATE geometry_columns SET spatial_index_enabled = 1
WHERE f_table_name = 'house_nr_proj' AND
f_geometry_column = 'geom';
..... RecoverSpatialIndex ....;

warning: such an approach is warmly deprecated, for
two very good reasons:
a. directly updating a meta-table never is a good idea
b. repeatedly calling RecoverSpatialIndex will cause
a catastrophic slowness, because this function will
scan the whole Spatial Table checking for every
single feature if a corresponding R*Tree entry do
correctly exists.
it's a cure exacerbating the disease.

the same is for repeatedly dropping and recreating
the Spatial Index; you'll simply end up by paying
several times a cost you've already payed before.

calling CreateSpatialIndex _after_ populating the
spatial table can be an effective solution only if
you are absolutely sure to never change in a second
time the spatial table.
but if you already know in advance that the spatial
table will certainly change from time to time, then
the simpler and most effective solution you can adopt
is the one to never touch an existing R*Tree, leaving
it absolutely free to follow its own internal logic.

bye Sandro





Dave Milter

unread,
Feb 26, 2018, 10:59:20 AM2/26/18
to SpatiaLite Users


On Monday, February 26, 2018 at 6:13:22 PM UTC+3, sandro furieri wrote:
yes, it could effectively work but it requires a
further passage, because after calling DisableSpatialIndex
"geometry_columns"."spatial_index_enabled" will be
set to 0, thus causing any following call to
RecoverSpatialIndex to fail.
so the correct sequence should be:

..... DisableSpatialIndex ....;
UPDATE geometry_columns SET spatial_index_enabled = 1
WHERE f_table_name = 'house_nr_proj' AND
       f_geometry_column = 'geom';
..... RecoverSpatialIndex ....;


Thank you for clarification:

I tried this:

```
SELECT DisableSpatialIndex('{tbl_name}', 'proj');

update my data

UPDATE geometry_columns SET spatial_index_enabled = 1 WHERE f_table_name = '{tbl_name}' AND
       f_geometry_column = 'proj';
SELECT RecoverSpatialIndex('{tbl_name}', 'proj', 1);
```

and I called this code two times in row, the first one was ok, but the second one cause error:

DisableSpatialIndex() error: either "Foo"."proj" isn't a Geometry column or no SpatialIndex is defined

 

b. repeatedly calling RecoverSpatialIndex will cause
    a catastrophic slowness, because this function will
    scan the whole Spatial Table checking for every
    single feature if a corresponding R*Tree entry do
    correctly exists.
    it's a cure exacerbating the disease.


But, but I call  RecoverSpatialIndex with no_check = TRUE,
is it still problem?

the same is for repeatedly dropping and recreating
the Spatial Index; you'll simply end up by paying
several times a cost you've already payed before.

calling CreateSpatialIndex _after_ populating the
spatial table can be an effective solution only if
you are absolutely sure to never change in a second
time the spatial table.
but if you already know in advance that the spatial
table will certainly change from time to time, then
the simpler and most effective solution you can adopt
is the one to never touch an existing R*Tree, leaving
it absolutely free to follow its own internal logic.



So you suggest do nothing? But 800 ms on fast machine and 5 seconds on my old notebook is too slow,
because of I load only the 20% of my real data to sql tables.

May be I can apply another suggestions from your timings  above?
It is possible to spread spatial index?

I mean I have two spatial columns: 'geo' with geographical data (SRID=4326) and 'proj' with (SRID=-1),
can I hold 'geo' index inside ordinary file, and 'proj' index inside attached ':memory:'?

a.fu...@lqt.it

unread,
Feb 26, 2018, 11:34:24 AM2/26/18
to spatiali...@googlegroups.com
On Mon, 26 Feb 2018 07:59:19 -0800 (PST), Dave Milter wrote:
>> b. repeatedly calling RecoverSpatialIndex will cause
>>     a catastrophic slowness, because this function will
>>     scan the whole Spatial Table checking for every
>>     single feature if a corresponding R*Tree entry do
>>     correctly exists.
>>     it's a cure exacerbating the disease.
>
> But, but I call  RecoverSpatialIndex with no_check = TRUE,
> is it still problem?
>

it could be marginally better, but still very far
from optimal.
please, forget at all any idea about selectively
enable/disable the Spatial Index: it's a dead end
going nowhere.


> So you suggest do nothing? But 800 ms on fast machine
> and 5 seconds on my old notebook is too slow
>

creating/updating a Spatial Index is an intrinsically
costly operation, and you necessarily have to pay some
price for this.
you can optimize the I/O overhead by configuring a
well-dimensioned cache, but you can do absolutely
nothing to speed up the R*Tree own logic.


> May be I can apply another suggestions from your timings  above?
> It is possible to spread spatial index?
>
> I mean I have two spatial columns: 'geo' with geographical data
> (SRID=4326) and 'proj' with (SRID=-1),
> can I hold 'geo' index inside ordinary file, and 'proj' index inside
> attached ':memory:'?
>

please, read more carefully my previous example
"test #6"; it's exactly the configuration you
are looking for.

bye Sandro

Dave Milter

unread,
Feb 26, 2018, 12:11:02 PM2/26/18
to SpatiaLite Users


On Monday, February 26, 2018 at 7:34:24 PM UTC+3, sandro furieri wrote:


please, read more carefully my previous example
"test #6"; it's exactly the configuration you
are looking for.


I reread it again. As I understand "test #6" has such architecture:
in ":memory:" st_tranformed database with only geometry ( SRID=4326 )
and in file the same data with SRID=3003 and may be additional data.

Actually I imagine little different architecture: two spatial indexes for one table,
one in ":memory:", one in  file.

The difference is in triggers, as I see your "tables with index only" in ":memory:" will be not changed,
if database in file was changed.
But not big deal I can add triggers on update/insert/delete by myself.

But, in case of ":memory:", the topic what we discuss here is going to work out of the box, am I right?
I mean at any time, when I want reproject my data with unknown projection, I can just
close sqlite handle of ":memory:", open new one with  the same name  ":memory:",
 attach file database to it, call SELECT InitSpatialMetadata, and after filling
memory database I can call "CreateSpatialIndex"?

a.fu...@lqt.it

unread,
Feb 26, 2018, 12:45:57 PM2/26/18
to spatiali...@googlegroups.com
On Mon, 26 Feb 2018 09:11:02 -0800 (PST), Dave Milter wrote:
> On Monday, February 26, 2018 at 7:34:24 PM UTC+3, sandro furieri
> wrote:
>
>> please, read more carefully my previous example
>> "test #6"; it's exactly the configuration you
>> are looking for.
>
> I reread it again. As I understand "test #6" has such architecture:
> in ":memory:" st_tranformed database with only geometry ( SRID=4326 )
> and in file the same data with SRID=3003 and may be additional data.
>
> Actually I imagine little different architecture: two spatial indexes
> for one table, one in ":memory:", one in  file.
>

sorry, this is not allowed in SpatiaLite; a Spatial Index is always
expected to be in the same database where the corresponding Spatial
Table is.


> The difference is in triggers, as I see your "tables with index only"
> in ":memory:" will be not changed, if database in file was changed.
> But not big deal I can add triggers on update/insert/delete by
> myself.
>

you are absolutely free to implement any alternative of your own, but
this will clearly become a different thing from SpatiaLite.


> But, in case of ":memory:", the topic what we discuss here is going
> to
> work out of the box, am I right?
> I mean at any time, when I want reproject my data with unknown
> projection, I can just close sqlite handle of ":memory:", open new
> one
> with the same name ":memory:", attach file database to it, call
> SELECT InitSpatialMetadata, and after filling memory database I can
> call "CreateSpatialIndex"?
>

if I understand well your intentions, the simpler solution I can
suggest you is:
- open a first connection to the permanent SpatiaLite database.
- then open a second connection to another database just intended
to store your own "unorthodox" and volatile R*Tree.
- the second DB connection does not strictly requires to support
SpatiaLite; you can directly create and fully manipulate an
R*Tree just using the basic SQLite support.
https://sqlite.org/rtree.html
- I'm obviously assuming that you are going to use some high
level language to develop your application, so dispatching
the SQL queries between the two connections shouldn't be a
serious issue.

bye Sandro

Dave Milter

unread,
Feb 26, 2018, 1:38:51 PM2/26/18
to SpatiaLite Users


On Monday, February 26, 2018 at 8:45:57 PM UTC+3, sandro furieri wrote:
On Mon, 26 Feb 2018 09:11:02 -0800 (PST), Dave Milter wrote:
> On Monday, February 26, 2018 at 7:34:24 PM UTC+3, sandro furieri
> wrote:
>
>> please, read more carefully my previous example
>> "test #6"; it's exactly the configuration you
>> are looking for.
>
> I reread it again. As I understand "test #6" has such architecture:
> in ":memory:" st_tranformed database with only geometry ( SRID=4326 )
> and in file the same data with SRID=3003 and may be additional data.
>
> Actually I imagine little different architecture: two spatial indexes
> for one table, one in ":memory:", one in  file.
>

sorry, this is not allowed in SpatiaLite; a Spatial Index is always
expected to be in the same database where the corresponding Spatial
Table is.


> The difference is in triggers, as I see your "tables with index only"
> in ":memory:" will be not changed, if database in file was changed.
> But not big deal I can add triggers on update/insert/delete by
> myself.
>

you are absolutely free to implement any alternative of your own, but
this will clearly become a different thing from SpatiaLite.



Why? I suppose it should work on top of spatialite and not touch it's functionlity.
I mean I have two tables (create by me, not internal table of spatialite):

memory.foo_proj_index

and database.foo_full,

If add triggers for updating memory.foo_proj_index on update of database.foo_full,
then after my trigger spatialite tirggers should fires and update geo index for
memory.foo_proj_index. I don't see how this may become interference to  spatialite?
 
if I understand well your intentions, the simpler solution I can
suggest you is:
- open a first connection to the permanent SpatiaLite database.
- then open a second connection to another database just intended
   to store your own "unorthodox" and volatile R*Tree.
- the second DB connection does not strictly requires to support
   SpatiaLite; you can directly create and fully manipulate an
   R*Tree just using the basic SQLite support.
   https://sqlite.org/rtree.html
- I'm obviously assuming that you are going to use some high
   level language to develop your application, so dispatching
   the SQL queries between the two connections shouldn't be a
   serious issue.


I also thinks about directly usage of rtree, but my data also contains polygons, line strings
and so on. Not sure that I want to handle them by my self.
About separate handles I am also in doubts, yes it is possible to 
work with  memory.foo_proj_index <-> database.foo_full connection via "hight level language"
instead of SQL, but triggers looks like simple and robust solution in compare with total control of code inside
module, who take care about database connection and prevent in future any non coherent calls to sqlite.
Thank you for your time.

a.fu...@lqt.it

unread,
Feb 26, 2018, 3:43:59 PM2/26/18
to spatiali...@googlegroups.com
On Mon, 26 Feb 2018 10:38:51 -0800 (PST), Dave Milter wrote:
> I also thinks about directly usage of rtree, but my data also
> contains
> polygons, line strings and so on.
> Not sure that I want to handle them by my self.
>

Hi Dave,

it's surprisingly simple and not at all complex.
recall: a Spatial Index is just based on Bounding Boxes, and has
no idea about linestrings, polygon and so on.
defining a BBOX simply requires four coordinated: minx, miny
(lower left corner) and maxx, maxy (upper right corner).

defining the BBOX corresponding to some POINT is trivial, because:
minx = x = maxx
miny = y = maxy

in the case of Linestrings or Polygons it's a little bit more
complex, but just requires looping on all vertices so to get
the min and max values for X and Y.
note: for Polygons you can simply ignore any Interior Ring,
because the BBOX is determined by the Exterior Ring only.

in terms of SQL statements all you need are:

CREATE VIRTUAL TABLE my_unorthodox_index
USING rtree(id, minx, maxx, miny, maxy);

INSERT INTO my_unortodox_index (id, minx, maxx, miny, maxy)
VALUES (?, ?, ?, ?, ?);

UPDATE my_unortodox_index
SET minx = ?, maxx = ?, miny = ?, maxy = ?
WHERE id = ?;

DELETE FROM my_unortodox_index WHERE id = ?;


> About separate handles I am also in doubts, yes it is possible to
> work with memory.foo_proj_index database.foo_full connection via
> "hight level language"
> instead of SQL, but triggers looks like simple and robust solution in
> compare with total control of code inside
> module, who take care about database connection and prevent in future
> any non coherent calls to sqlite.
>

you can easily get the one and the other.
nothing forbids you to develop a small extension of your own
implementing a SQL function intended to always keep in sync
the Spatial Table in the first connection and the "unortodox"
R*Tree in the second connection.
then you simply have to call such function from within
appropriate triggers and that's all.

using a "bare" R*Tree only indirectly related to some Spatial
Table will certainly be the most efficient solution, most
notably if it will be stored in a :memory: DB via an
independent connection.

bye Sandro
Reply all
Reply to author
Forward
0 new messages