I studied and used the fabulous VirtualKNN present in spatialite [0] for
calculations of minimum distances between points and lines on large datasets (millions of
lines) and I noticed that the classic approach of using spatial indexes is - for my goal -
enormously more performing than the use of VirtualKNN (2 sec against
100 seconds); this is also due to the fact that the max_distance option does not exist in the
virtualKNN, and it calculates the distances on all features.
these are the two queries compared, same input data and same output:
- geo-table `route` linestring 32632 with 116674 features
- geo-table `schools` point 32632 with 5529 features
VirtualKNN:
CREATE TABLE t_schuole_100mKNN AS
SELECT d.chiave_uni, d.fid as id_route, d.distance as distance,
ST_shortestline (d.geometry, s.geometry) as geom
FROM
(SELECT a.fid as fid, a.distance as distance, zz.chiave_uni as
key_uni, zz.geometry
FROM knn as a
JOIN
schools as zz
WHERE f_table_name = 'route'
AND f_geometry_column = 'geometry'
AND ref_geometry = zz.geometry
AND max_items = 1) d,
route s
WHERE ST_Distance (s.geometry, d.geometry) <= 100 and id_route = s.ogc_fid
ORDER BY d.chiave_uni;
SELECT
RecoverGeometryColumn ('t_schuole_100mKNN', 'geom', 32632, 'LINESTRING', 'XY');
output 1737 lines
takes about 100 seconds
classic spatialIndex use
CREATE TABLE t_schuole_100mMBR AS
SELECT a.ogc_fid as id_route, Min (ST_Distance (a.geometry, zz.geometry)) AS
distance,
zz.uni_key as uni_key, st_shortestline (a.geometry, zz.geometry)
as geom
FROM route as a, schools as zz
WHERE ST_Distance (a.geometry, zz.geometry) <= 100 AND a.ogc_fid IN (
SELECT rowid
FROM SpatialIndex
WHERE f_table_name = 'route'
AND search_frame = BuildCircleMbr (ST_X (zz.geometry),
ST_Y (zz.geometry), 100))
GROUP by zz.chiave_uni;
SELECT
RecoverGeometryColumn ('t_schuole_100mMBR', 'geom', 32632, 'LINESTRING', 'XY');
output 1737 lines
takes about 2 seconds
Perhaps the queries are not written optimally, but in my opinion a parameter
max_distance in virtualKNN would help a lot
thank you