radical KNN refactoring ?

113 views
Skip to first unread message

Alessandro Furieri

unread,
Jun 2, 2021, 4:31:05 AMJun 2
to SpatiaLite Users
Hi list,

few considerations about the current state of the art of the KNN module.

A) several users during last months/weeks reported the practical impossibility
   to successfully use KNN on Windows when using environments like Java or
   Python or C# and alike.
   
   typical sympthoms: anything run smoothly in spatialite_gui or spatialite CLI
   but there is no way to get the same SQL queries working on language
   connectors. any attempt fails reporting mysterious errors or directly
   causes an application crash.
   
   the technical causes explaining for such a behavior are absolutely clear:
   the current KNN implementation critically depends on a very special callback 
   API of libsqlite3 (sqlite3_rtree_query_info), that is intended to explore
   all the branches of the tree supporting any SpatialIndex.
   unhappily this very special API is not included within the "reflected"
   APIs directly available for dynamically loaded modules (as mod_spatialite)
   and requires to be directly linked to libsqlite3.dll
   
   the net effect of all this is that a robust, stable and affordable
   configuration is ensured only if and when both the main application
   and the SpatiaLite extension use the same identical libsqlite.dll
   (as it happens in C/C++ apps such as SpatiaLite GUI and CLI).
   but when using complex language frameworks such as Java, Python, C# etc
   it's very probable to end up with having _TWO_ different conflicting
   copies of libsqlite.dll (one referenced by the language and the other 
   by KNN), possibly of different versions or built using different 
   compilers and/or different settings, and this will surely be a fatal 
   toxic combination.
   
   the ideal solution will be building the SpatiaLite extension
   directly on the top of the same libraries used at run time
   by the language framework, but this seems a too much complex
   and demanding approach for average users and developers.
   a completely unrealistic expectation.
   
   
B) there is a pending proposal from Toto' Fiandaca suggesting to
   introduce a MaxDistance radius parameter to KNN.
   short rationale: the current implementation is rather slow,
   and introducing an alternative approach based on MaxDistance 
   will certainly ensure noticeably faster performances in the
   vast majority of cases.
   
   
evaluating altogether A) and B) leads to a rather obvious conclusion;
a complete refactoring of KNN is required for the next version,
a refactoring based on the following design assumptions:

1) getting definitely rid of the problematic sqlite3_rtree_query_info API

2) adopting a radically different approach internally based on 
   conventional SpatialIndex queries.
   
PROS:
===================================
- faster execution (at least assuming a reasonably small MaxDistace
  radius)
- robust stability ensured on all configurations, including
  Java, Python, C# etc language bindings

CONS:  
===================================
just one: the current approach based on full exploration
of all branches of the Rtree completely avoids any preliminary
assumption about MaxDistance. it's an approach that will
surely identify nearest features even when there is a
very scattered and irregular spatial distribution.
this will be no longer possible with the proposed refactored 
implementation because MaxDistance will then become a mandatory 
argument.

your considerations about all this ?

bye Sandro

Andrea Giraldi

unread,
Jun 2, 2021, 4:41:17 AMJun 2
to spatiali...@googlegroups.com
e zfZA¥¥%%%

















%Aawz

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/4c066fdf-6421-4c46-9bff-7f901fe8fa90n%40googlegroups.com.

Pedro Camargo

unread,
Jun 2, 2021, 4:56:05 AMJun 2
to spatiali...@googlegroups.com
Sandro, 

The refactoring seems to be a better route, as better performance and interoperability would be awesome to have. 
The downside does not seem to bad, though, as one can always set a really large max_distance and call it a day! 

Cheers, 
Pedro 



---- On Wed, 02 Jun 2021 18:31:05 +1000 alessandr...@gmail.com wrote ----

--

Totò Fiandaca

unread,
Jun 2, 2021, 5:24:37 AMJun 2
to SpatiaLite Users
thank you for considering this new implementation

Totò Fiandaca

Duncan

unread,
Jun 2, 2021, 6:08:11 AMJun 2
to SpatiaLite Users
Thanks Sandro. I see the down side but it is currently too slow (and problematic) for my purposes.
Perhaps a large default value for MaxDist (but how large?)

Maurizio Trevisani

unread,
Jun 2, 2021, 11:40:01 AMJun 2
to spatialite-users
Pay attention to the case when distance results 0 (when the position of the point Is on the boundary of the target). 
Thanks,
Maurizio

Il mer 2 giu 2021, 10:31 Alessandro Furieri <alessandr...@gmail.com> ha scritto:
--

Maurizio Trevisani

unread,
Jun 2, 2021, 11:43:22 AMJun 2
to spatialite-users
May be max_distance null means whatever distance the nearest feature is at.

a.fu...@lqt.it

unread,
Jun 2, 2021, 12:07:50 PMJun 2
to spatiali...@googlegroups.com
On Wed, 2 Jun 2021 17:43:08 +0200, Maurizio Trevisani wrote:
> May be max_distance null means whatever distance the nearest feature
> is at.
>

sorry Maurizio,

but after adopting the new proposed implementation detecting the
nearest feature will completely depend on the assumption made
about MaxDistace.
when the nearest feature falls beyond the allowed max radius it
will be absolutely impossible to identify and you'll simply
get a NULL result.

just to recapitulate the terms of the question:

1) exploring all branches of the Rtree one by one using
the dangerous sqlite3_rtree_query_info API of libsqlite.
surely a rather slow process, but one not requiring
any arbitrary assumption about MaxDistance.

2) internally performing many SQL queries based on the classic
SpatialIndex. this will be intrinsecally faster, but it
strictrly requires to define in advance a given search
frame (usually, a circle).

we must necessarily choose one the two alternatives;
they are radically different strategies and are mutually
exclusive. the one or the other.

ciao Sandro

Maurizio Trevisani

unread,
Jun 2, 2021, 4:28:29 PMJun 2
to spatialite-users
Yes, you're right!
Sorry.

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.

a.fu...@lqt.it

unread,
Jun 3, 2021, 2:45:25 AMJun 3
to spatiali...@googlegroups.com
Second thought.

A modified algorithm still fully based on the SpatialIndex and
MaxDistance
but capable to identify all neighbours, even those exceeding the given
distance radius:

pass #1: we'll start by identifying all neighbours within the radius.
at the end of this pass we'll check if all features have been
linked to the expected number of neighbours:
true: the algorithm will stop
false: the algorithm will continue

pass #2: now we'll consider only those features not yet presenting
the requested number of neighbours, and only for those
we'll perform a further search by applying a doubled radius.

pass #3: we'll check again if there are any incomplete features:
true: the algorithm will return again to perform pass #2
false: the algorithm will stop

short rationale: we are supposing that a "reasonable" MaxDistance is
known in advance.
so pass #1 is expected to resolve the vast majority of cases, and
just a handfull of "odd features" will still remain undetermined
or incomplete.
iteratively performing further queries each time doubling the search
radius will quickly converge; and because only a small (and
progressively
decreasing) number of items will be involved on each iteration it's
reasonable to expect a quick solution also in the case of relaxed
MaxDistance.

bye Sandro

Maurizio Trevisani

unread,
Jun 4, 2021, 3:42:54 AMJun 4
to spatiali...@googlegroups.com
Let user decide (by adding a parameter) if he wants to iterate as many
times as needed to reach the expected number of neighbourse.

I need, in some cases, to get just one or zero neighbours to a set of
points within the maximum distance, and accept also points coincident
to some soirce points (resulting at zero distance).

So actually I use KNN to search the nearest feature and then remove
any result having distance greater than maximum_distance.

Thx,
bye,
Maurizio
> --
> You received this message because you are subscribed to the Google Groups
> "SpatiaLite Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to spatialite-use...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/spatialite-users/b7ff9f31b08711df39b574ce8b713017%40lqt.it.
>

bric...@outlook.com

unread,
Jun 11, 2021, 11:20:08 PMJun 11
to SpatiaLite Users
It's more than just KNN. I mentioned the same issue a couple of years ago here in the context of PROJ. Furthermore, using "./configure --enable-knn=no --enable-proj=no" still doesn't avoid linking to libsqlite3. There are few other calls to sqlite3_enable_load_extension and sqlite3_rtree_query_callback which can't be compiled out.
Reply all
Reply to author
Forward
0 new messages