On Tue, 11 Feb 2025 08:50:45 -0800 (PST), Jim wrote:
> I have a 7 million row table that contains a geometry of a POINT.
> The table also has a spatial index build on the geometry column.
> I want to find the row nearest to my location
>
Hi Jim,
this is one of the most classic problems and has been covered
in depth in the academic literature.
It is generally known as the nearest neighbors problem; here
you can find a valid summary to begin to delve deeper into
the topic:
https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
https://en.wikipedia.org/wiki/Nearest_neighbor_search
In a nutshell:
- it's a decidedly complex problem
- many different algorithms have been proposed
- unfortunately none of them are simple
- and above all they all adapt very poorly to
a simple SQL implementation
The biggest obstacle for any SQL approach is that in
principle it's never possible to set a reasonable search
radius to narrow the search.
Just a few easy examples to understand better:
- if you are looking for the nearest pharmacy and you are
in some city, a search radius of a few hundred meters
should be reasonable.
- but if you are in a rural area the search radius will
have to be expanded to several km
- if, however, you find yourself in the middle of the
forests in a mountain range you'll have to widen the
search radius to many tens of km
- finally, if you are on a boat in the middle of the
ocean the search radius should be extended to several
thousand km
- there are no general rules; it all depends on what
you're looking for, where you are, and the statistical
dispersion of the sample.
Note: apparently you use geographic coordinates of longitude
and latitude. This introduces many additional complications
that we'll intentionally ignore to avoid drowning in
excessive complexity.
The problem is already complex enough in itself.
> ... and have come up with this select form:
>
> SELECT *
> FROM my_table
> WHERE ST_Distance(geometry, MakePoint(mylon, mylat) <= 0.013
> ORDER BY ST_Distance(geometry, MakePoint(-122.000, 47.000)) ASC
> LIMIT 1;
>
Your query suffers from a fairly obvious flaw.
In the WHERE clause you are imposing a completely arbitrary
distance radius, and as we just saw it could be a very severe
limit.
SELECT
p.id,
p.name, Min(ST_Distance(p.geom, x.mypoint, 1)) AS dist
FROM points AS p,
(SELECT MakePoint(-122.3655289, 47.7635415, 4326) AS mypoint) AS x;
A modified Query like the previous one, however, is really
robust; it will always find the closest point in any situation
because it does not assume any arbitrary distance radius.
Note: ST_Distance(g1, g2, 1) will return a distance expressed
in meters precisely calculated on the Earth's ellipsoid, so you
can ignore all the problems related to converting angular
distances to metric distances.
It works perfectly both neat the equator and near the poles.
However, it's clear that this is a very slow query when many
millions of points have to be examined.
It cannot be a practical solution.
... the rest of my analysis continues below your next post...
see you later,
Sandro