Select poinnts based on distance

20 views
Skip to first unread message

ckgoo...@gmail.com

unread,
Aug 1, 2024, 11:45:00 AMAug 1
to spatiali...@googlegroups.com

Hi, I have a simple Spatialite database.  One table called spt_cities with columns, id, name, country, location

Where location is a point geometry.

I populated this table with about 8000 cities downloaded from Wikidata.

I ultimately want to get any cities within a given distance of a given point.  But I am trying something simpler first.

I run the following query in the SQLite terminal:

SELECT name, country, ST_Distance( location, GeomFromText('POINT(6.0 40.0)') ) AS distance

FROM spt_cities

ORDER BY name LIMIT 10;

And I get 10 results listed but only the  name and country, not the distance.

How do I get the distance listed out?  And is this the right direction to be going to return results near a particular coordinate?

Best, Chris

a.fu...@lqt.it

unread,
Aug 1, 2024, 1:24:49 PMAug 1
to spatiali...@googlegroups.com
On Thu, 1 Aug 2024 16:36:42 +0100, ckgoo...@gmail.com wrote:
> And I get 10 results listed but only the name and country, not the
> distance.
>

Hi Chris,

note that ST_Distance() always returns a NULL value if one of
the two Geometries is an invalid one.

HINT: when using the SQLite front-end always specify a
.nullvalue NULL
directive, so to be always able to easly distiguish
NULLs from empty values.

So the problem seems to be in that your "location" values
are not valid Geometries.
You can test for this in a very simple way:

SELECT TypeOf(location), IsGeometryBlob(location),
ST_IsValid(location)
FROM spt_cities;

the first column is expexted to contain "blob", the second and
the thirs 1; if not your "location" values definitely aren't
valid SpatiaLite's geometries.


> And is this the right direction
> to be going to return results near a particular coordinate?
>

surely not.
the "LIMIT 10" clause simply aborts the query when the first
10 rows are ready, but this doesn't ensures at all that they
are the nearest to your reference point.

a more reasonable Spatial query could be the following:

SELECT name, country,
ST_Distance( location, GeomFromText('POINT(6.0 40.0)') ) AS distance
FROM spt_cities
ORDER BY distance LIMIT 10;

this will effectively return the nearest 10 cities.

HINT: study the SpatialIndex carefully before proceeding
further, and check that you understand in depth how it
should be used to obtain efficient and fast Spatial Queries

bye Sandro

ckgoo...@gmail.com

unread,
Aug 3, 2024, 11:55:34 AMAug 3
to spatiali...@googlegroups.com
Hi, you are right that my geometry object is not valid.
So I started from the beginning again and created my database, imported data into a non-spatial table, created my spatial table and then tried the following.
INSERT INTO spt_Cities (id, name, country, location )
VALUES ( 0, 'London', 'United Kingdom', MakePoint( -1,50, 4326) );
And this worked fine because when I then do:
SELECT TypeOf(location), IsGeometryBlob(location),
ST_IsValid(location)
FROM spt_cities;
I get blob,1,1.

I delete this line and then try inserting from my non-spatial table. It has columns of cityLabel, CountryLabel, long, lat.
INSERT INTO spt_Cities (id, name, country, location )
SELECT NULL, cityLabel, countryLabel,
MakePoint( long, lat, 4326)
FROM tbl_Cities;

But when I test the validity of the rows I don't get blob,1,1. I get null,-1,-1.
Myy only thought is that the long and lat columns in tbl_Cities are of type TEXT.
Can you see where I am going wrong?
Best, Chris
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/17dd49dea517bbfe197bdc9d30b5effc%40lqt.it.

a.fu...@lqt.it

unread,
Aug 3, 2024, 12:40:50 PMAug 3
to spatiali...@googlegroups.com
On Sat, 3 Aug 2024 16:22:21 +0100, ckgoo...@gmail.com wrote:
> But when I test the validity of the rows I don't get blob,1,1. I
> get null,-1,-1.
> Myy only thought is that the long and lat columns in tbl_Cities are
> of type TEXT.
>

and you are probably right.

as the documentation clearly states MakePoint() expects that
lat and long coordinates are DOUBLE values.
if they actually are TEXT strings MakePoint() will obviosuly
fail returning NULL.

note: it's not at all exceptional to encounter such problems
when importing data from a text file.

HINT: the easiest way to chek your input data is:

SELECT TypeOf(long), TipeOf(lat) FROM tbl_Cities;

if it's not 'real','real' but is 'text','text'
it will never work.
however you can correct quite simply by calling
the appropriate casting function as in:

SELECT( CastToDouble(long), CastToDouble(lon)
FROM tbl_Cities.

------------------------------------------

some general considerations.

It seems to me that you are not yet ready to
start working with SQLite and SpatiaLite, you
are still missing too many fundamental concepts,
you haven't studied deeply enough.

Studying well is never a waste of time,
it's the real secret to saving time later.

I advise you to take a few days to study and
understand well, perhaps starting from the
CookBoock which is now quite outdated in many
parts, but which continues to be an excellent
starting point for developing a fairly broad
competence.

You'll see that afterwards you'll certainly
be able to walk with much greater confidence
on your own legs without constantly blocking
for trivial and easily solvable reasons.

best regards,
Sandro

Mike Castle

unread,
Aug 3, 2024, 1:53:30 PMAug 3
to spatiali...@googlegroups.com
On Thu, Aug 1, 2024 at 10:24 AM <a.fu...@lqt.it> wrote:
> a more reasonable Spatial query could be the following:
>
> SELECT name, country,
> ST_Distance( location, GeomFromText('POINT(6.0 40.0)') ) AS distance
> FROM spt_cities
> ORDER BY distance LIMIT 10;
>
> this will effectively return the nearest 10 cities.
>
> HINT: study the SpatialIndex carefully before proceeding
> further, and check that you understand in depth how it
> should be used to obtain efficient and fast Spatial Queries

This approach will require a table scan and N calls to ST_Distance()
(where N == items in the table).

I would have thought, with the R-Tree stuff, there would be a way to
query "10 closest" entries as a subselect, then run ST_Distance
against just those 10. But the best I could find was the singular
ST_ClosestPoint() function.

Of course, we don't know from this example whether the OP will have
additional search parameters that might require a scan anyway... I
wonder... if we want N points, if there are more efficient approaches?
Or if they would all call ST_Distance() behind the scenes anyway so
it wouldn't really make a difference.

For example, start with a radius=1 unit circle around the center
point, and as long as something like:
SELECT COUNT(*)
FROM spt_cities
WHERE ST_Contains(location, MakeCircle(lng, lat, radius));

is less than 10, double the radius.

Then replace COUNT(*) with the name, country, ST_Distance(...) with
the ORDER_BY/LIMIT.

I don't know how to do that in SQL though, so would end up using the
host language. Maybe some SQL expert can help. :->

Of course, 10 be a parameter and no larger than the number of items in the DB.

mrc

ckgoo...@gmail.com

unread,
Aug 3, 2024, 2:50:57 PMAug 3
to spatiali...@googlegroups.com
Hi,
When I posted my question and quoted my query with a 'LIMIT 10' in it, this was perhaps misleading. I only put this in to stop my screen filling up with too many results. My actual objective is to simply return cities within a given distance / radius of my given coordinate. Ordering by distance isn't really necessary as my calling application will do this as sometimes I will sort by distance and sometimes by bearing.

And yes, I am hoping very much that there is a simple way to make use of the R*Tree functionality within Spatialite to get fast results. I'll go look at the Spatialite list of functions to see what might help to do this.

Best, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of Mike Castle
Sent: Saturday, August 3, 2024 6:46 PM
To: spatiali...@googlegroups.com
Subject: Re: [SpatiaLite-Users] Select poinnts based on distance

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/CA%2Bt9iMy88eDzuie0rOUeSeh%2BZ34TASuy5CacWx4Bn9%2BKxP44_w%40mail.gmail.com.

Mike Castle

unread,
Aug 3, 2024, 2:52:17 PMAug 3
to spatiali...@googlegroups.com
On Sat, Aug 3, 2024 at 9:40 AM <a.fu...@lqt.it> wrote:
> SELECT( CastToDouble(long), CastToDouble(lon)
> FROM tbl_Cities.

What is the benefit of using the domain specific CastToDouble(lat) vs
the more more generic CAST(lat AS FLOAT)?

I understand that, if lat is a value like "asdf", then CastToDouble
returns type null whereas CAST returns a type real, value 0.0;

But I'm wondering, in terms of defensive programming, which is
preferred? Or one of those "Well, it depends on what you want do..."
type of situations?

mrc

ckgoo...@gmail.com

unread,
Aug 3, 2024, 3:01:48 PMAug 3
to spatiali...@googlegroups.com
Hello Sandro,
Thanks for highlighting the CastToDouble function and explaining why. This has worked nicely and I've now populated my spt_Cities table and confirmed that the location column is a valid blob.

I was also able to do this by using the GeomFromText function.

I read through the Cookbook earlier this week and it was very useful. I find that trying out real examples is a good way for me to learn and really embed the theory in my own mind. But I recognize that it is probably time to revisit the Cookbook as more of it will now hopefully make sense. I also recognize that you have been very helpful so far in helping me with good explanations of where I am going wrong and I don't want to stretch your goodwill too far.

Best and many thanks, Chris


-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of a.fu...@lqt.it
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/d427844c4881a12a75dbedf02e0e3194%40lqt.it.

Mike Castle

unread,
Aug 3, 2024, 3:40:00 PMAug 3
to spatiali...@googlegroups.com
On Sat, Aug 3, 2024 at 11:50 AM <ckgoo...@gmail.com> wrote:
> My actual objective is to simply return cities within a given distance / radius of my given coordinate.

Ahh! In that case, yes, using something like:

ST_Contains(location, MakeCircle(lng, lat, radius));

where, is (likely) faster. Then again, with the number of items you
have at the moment, the difference may not be measurable. If you do
not see a difference, you might want to increase your data by an order
of magnitude or two.

Just somewhat related to your project, if not the original question,
is keeping the data in sync.

In a project I am working on, the input data comes in as text as "lat,lng".

Rather than splitting it in the host language (and then remembering to
keep that in place if I switch), is to use computed (aka, generated)
columns. That is, something like the following:
CREATE TABLE addr (
latlng VARCHAR NOT NULL,
lat VARCHAR GENERATED ALWAYS AS (SUBSTR(latlng, 1, INSTR(latlng,
",") - 1)) VIRTUAL,
lng VARCHAR GENERATED ALWAYS AS (SUBSTR(latlng, INSTR(latlng, ",")
+ 1)) VIRTUAL,
point GEOMETRY GENERATED ALWAYS AS (ST_Point(CastToDouble(lng),
CastToDouble(lat))) STORED,
CONSTRAINT pk_addr PRIMARY KEY (latlng)
);

The "point" column is readonly. Whenever I update the actual data, I
just need to update the "latlng" columng, and everything else happens
automatically.

Technically, the intermediate columns "lat" and "lng" are not
necessary for my case, but they are useful for debugging and makes the
definition for "point" a little shorter.

Cheers!
mrc

ckgoo...@gmail.com

unread,
Aug 3, 2024, 6:43:21 PMAug 3
to spatiali...@googlegroups.com
Hi Mike,

You wrote (how do you do that thing where you quote a previous email?):
ST_Contains(location, MakeCircle(lng, lat, radius));

So I made this query:
SELECT count (*) FROM spt_Cities
WHERE ST_Contains( location, MakeCircle( 0.0, 51.5, 500000, 4326));

Which returns 0. Yet:
select name, X( location ), Y( location )
FROM spt_Cities
WHERE name = 'London';

Gets me a row
London, -0.127, 51.50 Well within the circle.

So unclear why I don't get any rows returned.

Any thoughts? Think though that it is time to take a rest and see if it makes more sense in the morning.
Best, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of Mike Castle
Sent: Saturday, August 3, 2024 8:12 PM
To: spatiali...@googlegroups.com
Subject: Re: [SpatiaLite-Users] Select poinnts based on distance

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/CA%2Bt9iMwVNnGLTCCp4SzF%2BCSBcqL%2BcmSG4DYKJ1hEE26fvjUrwQ%40mail.gmail.com.

a.fu...@lqt.it

unread,
Aug 4, 2024, 12:19:28 AMAug 4
to spatiali...@googlegroups.com
Hi Mike,

your considerations are very interesting and insightful.

Golden rule: always avoid doing in C/C++ (Python, Java etc)
what you can do much better in SQL

SQLite offers several really useful and very powerful
functions for editing text strings, and they are certainly
worth to be carefully studied before you start reinventing
the wheel.

please see: https://www.sqlite.org/lang_corefunc.html

the ones that happen to be used quite frequently are
substr(), trim(), replace(), concat() and printf()

in general, cleaning up the input data thanks to
a few simple SQL statements takes much less time
than writing a piece of code in any programming
language.

best regards,
Sandrp


a.fu...@lqt.it

unread,
Aug 4, 2024, 12:31:18 AMAug 4
to spatiali...@googlegroups.com
On Sat, 3 Aug 2024 11:46:59 -0700, Mike Castle wrote:
> What is the benefit of using the domain specific CastToDouble(lat) vs
> the more more generic CAST(lat AS FLOAT)?
>
> I understand that, if lat is a value like "asdf", then CastToDouble
> returns type null whereas CAST returns a type real, value 0.0;
>
> But I'm wondering, in terms of defensive programming, which is
> preferred? Or one of those "Well, it depends on what you want do..."
> type of situations?
>

Hi Mark,

it's simply a matter of personal taste.

CastToDouble() is implemented by SpatiaLite while CASY AS FLOAT
is implemented by SQLite.
It comes naturally to me to use the SpatiaLite's version,
but of course you are absolutely free to follow your own
inclinations.

bye Sandro

Mike Castle

unread,
Aug 4, 2024, 12:33:11 AMAug 4
to spatiali...@googlegroups.com
On Sat, Aug 3, 2024 at 3:43 PM <ckgoo...@gmail.com> wrote:
> You wrote (how do you do that thing where you quote a previous email?):

Since you have a gmail.com address, I am going to guess you at least
sometimes use the web ui.

When you reply to a mail, that client automatically includes the
referenced email as a quote, only it is collapsed into a three-dot
button in the bottom of the mail. Click on that and the quote to open
it up. Then just edit as normal.

Historically (well, at least since I've been using email starting in
the late 80s), the sequence of 'greater-than space' has been the
default identifier for quoting text message (like email, usenet, etc).
Most email clients of the day allowed to tweak the identifier, but
most stayed with the '> ' sequence.

GMail, unfortunately, defaults to what is known as 'top-posting',
where the new content goes above the old. Very annoying when you get
pulled into an existing thread of messages and having to read them
backwards. So, if you want to do like I do, it is a bit of manual
labor.

> > ST_Contains(location, MakeCircle(lng, lat, radius));
>
> So I made this query:
> SELECT count (*) FROM spt_Cities
> WHERE ST_Contains( location, MakeCircle( 0.0, 51.5, 500000, 4326));
>
> Which returns 0. Yet:
> select name, X( location ), Y( location )
> FROM spt_Cities
> WHERE name = 'London';

I should have qualified my suggestion with "I didn't actually try
this, but it looks like it might work."

First, after rereading the docs, it looks like I swapped the args on
ST_Contains() anyway.

It would either be:

ST_Contains(circle, location)

OR

ST_Within(location, circle)


But, even with that knowledge, I too cannot get it to work. However,
a hint might be this:
spatialite> SELECT AsText(MakeCircle(0.0, 51.5, 1, 4326, 360));
LINESTRING(1 51.5, 0.707107 52.207107, 0 52.5, -0.707107 52.207107, -1
51.5, -0.707107 50.792893, 0 50.5, 0.707107 50.792893, 1 51.5)

Note that the for pair is (1, 51.5). That is one "unit" off from
center. And if we do something like:
spatialite> SELECT ST_Distance(MakePoint(1.0, 51.5), MakePoint(0.0, 51.5));
1.0

And the docs for ST_Distance says: always measured in CRS units

But, if we use the three argument version, where the simple presence
of the third argument enables "measured in meters", and the values 0/1
(true/false) selects between great circle/ellipsoid computations
(depending on the accuracy you need)
spatialite> SELECT ST_Distance(MakePoint(1.0, 51.5), MakePoint(0.0, 51.5), 0);
69220.0265463721
spatialite> SELECT ST_Distance(MakePoint(1.0, 51.5), MakePoint(0.0, 51.5), 1);
69439.976819321

So, I guess that 1 degree west of London is just shy of 70km?

See http://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html#p13

Now, I have been unable to wrap my head around "CRS", so the tiny bit
of this stuff that I'm doing, I always use the three-argument version
of ST_Distance. But, I believe that default CRS is effectively a
unitless plane and even with specifying that SRID=4326, it does not
seem to matter here.

Of course, I also don't understand why MakeCircle() with a 90 degree
step results in eight points rather than four. So I'm probably
missing a lot of things!

mrc

a.fu...@lqt.it

unread,
Aug 4, 2024, 12:51:44 AMAug 4
to spatiali...@googlegroups.com
On Sat, 3 Aug 2024 12:12:03 -0700, Mike Castle wrote:
> On Sat, Aug 3, 2024 at 11:50 AM <ckgoo...@gmail.com> wrote:
>> My actual objective is to simply return cities within a given
>> distance / radius of my given coordinate.
>
> Ahh! In that case, yes, using something like:
>
> ST_Contains(location, MakeCircle(lng, lat, radius));
>
> where, is (likely) faster.
>

all spatial functions are computationally very heavy,
but ST_Contains() is probably heavier than ST_Distance()
because in this case you are transforming a simple
measurement between two POINTS (Pythagorean theorem)
into a more complex calculation involving a POINT
and a POLYGON with many sides (approximating a Circle)

You're unnecessarily complicating a simple problem.

bye Sandro

Mike Castle

unread,
Aug 4, 2024, 12:53:01 AMAug 4
to spatiali...@googlegroups.com
On Sat, Aug 3, 2024 at 9:19 PM <a.fu...@lqt.it> wrote:
> Golden rule: always avoid doing in C/C++ (Python, Java etc)
> what you can do much better in SQL

"better" is possibly not the best word here. But, yes, I've been
trying to move more of my toy project out of Python and into SQL.

> SQLite offers several really useful and very powerful
> functions for editing text strings, and they are certainly
> worth to be carefully studied before you start reinventing
> the wheel.

Again, I think we probably have different definitions for "several",
"useful", and "powerful". Text manipulation is NOT a strong point of
SQL, even an implementation like SQLite. But "recent" additions have
improved the situation. (E.g., INSTR was added to SQLite in 2012,
which is fairly recently in the lifetime of SQL.)

> in general, cleaning up the input data thanks to
> a few simple SQL statements takes much less time
> than writing a piece of code in any programming
> language.

Depends on the experience in the different languages. But, I agree
that keeping more business logic in the SQL layer is what we should
strive for.

mrc

a.fu...@lqt.it

unread,
Aug 4, 2024, 1:04:43 AMAug 4
to spatiali...@googlegroups.com
On Sat, 3 Aug 2024 21:44:16 -0700, Mike Castle wrote:
> Depends on the experience in the different languages. But, I agree
> that keeping more business logic in the SQL layer is what we should
> strive for.
>

I can only appeal to my direct personal experience.

I'm almost 70 years old and started programming in C
in 1980; I only encountered SQL around 2000.

I initially found it natural to do almost everything
in C/C++ and only used SQL to retrieve data for further
processing in C/C++

Then I finally realized how much time I could save
by doing exactly the opposite, and since I got into
the habit of doing as little as possible in C/C++ my
productivity skyrocketed.

bye Sandro

a.fu...@lqt.it

unread,
Aug 4, 2024, 2:46:05 AMAug 4
to spatiali...@googlegroups.com
On Sat, 3 Aug 2024 21:20:45 -0700, Mike Castle wrote:
> So, I guess that 1 degree west of London is just shy of 70km?
>
> Now, I have been unable to wrap my head around "CRS", so the tiny bit
> of this stuff that I'm doing, I always use the three-argument version
> of ST_Distance. But, I believe that default CRS is effectively a
> unitless plane and even with specifying that SRID=4326, it does not
> seem to matter here.
>

Hi Mark,

SRS simply stands for Spatial Reference System,
and CRS stanrds for Coordinate Reference System.

Coordinates must necessarily specify some SRS/CRS
to be truly meaningful.

There are two different families of SRS, the
geographic ones and the projected planar ones;
and they have absolutely different characteristics.

1. Geographic coordinates represent the position of
points on the surface of a sphere, and are measured
as latitudes and longitudes.
Note: both them are ANGLES, and therefore their natural
unit of measurement is in DEGREES.

2. instead with planar coordinates the values ​​indicate the
position of the points on a classic Cartesian plane X,Y
and they usually measured in METERS.
A planar CRS always requires the use of a map projection
that allows a small part of the surface of the Earth's
sphere to be "flattened" into a plane.
It's absolutely essential that the projected area is
as small as possible to avoid intolerable deformations.
It is therefore not surprising to discover that while
the geographical SRS are a few dozen, the planar SRS
are many thousands, practically one for each State or
Region of the world.

3. The "default" SRS simply is unknown/unqualified.
You can only assume it matches some generic Cartesian
plane of unknown orientation and location on Earth
and lacking any unit of measure.
You can transform your coordinates from an SRS to
another (see ST_Transfom), but "default" coords
can never be transformed into something else.

Short conclusion: planar SRSs do not hold any surprises,
because we are all used to work with with Cartesian planes
and with Euclidean geometry.

Unfortunately geographical SRSs are a completely alien world
that holds unpleasant surprises for many.
And the very popular 4326 aka WGS 84 ends up being a constant
source of seemingly insoluble problems.

I summarize the main obstacles of long/lat systems:

a. Distances are measured in DEGREES because they are
actually ANGLES on the surface of the sphere.

b. Even worse, it's never possible to convert angular
measurements in degrees into linear measurements in
meters, because it varies depending on the position
of the two points.

c. More generally, forget everything you learned at
school about Euclidean geometry because it's only
valid on the plane.
Here, however, we are on a spherical surface and
the rules are completely different.

d. SpatiaLite allows you to calculate linear distances
expressed in meters very precisely even for geographical
SRSs. But it's important to understand that it's a very
expensive operation because it requires the use of complex
geodetic algorithms.
But above all it's important to understand well that the
SpatialIndex works exclusively on angular distances in
degrees.

HINT: whenever possible, always make an effort to transform
your WGS 84 4326 coordinates by projecting them into some
appropriate planar system, such as those of the SRID=32xxx
series; you can do it easily ans safely if the region of
your interest doesn't exceed the width of 6 degrees of
longitude, but 12 and even 18 degrees may be acceptable
(obviously the more you widen the zone the more the
margins of error increase).
This is often an excellent advice to avoid many
failures and unnecessary headaches.

best regards,
Sandro

ckgoo...@gmail.com

unread,
Aug 4, 2024, 5:36:43 AMAug 4
to spatiali...@googlegroups.com
Hi Mike,

I have had some success after working through the points you went through.

I think the MakeCircle call just returns a string and not a geometry object. Also, you called it with a step value of 360 but step means the number of degrees between each point. The same output is created when using a value of 45. So I am guessing that 8 vertices is the minimum number of points MakeCircle generates. If you leave out the optional number the step size is 10 degrees and you get a lot more points).

I did try wrapping MakeCircle in PolyFromText as this states it returns a geometry but this returned all 8000 of my cities and varying the radius didn't make any difference.

Then I moved to using BuildCircleMbr which the documentation states returns a geometry object which will be a square enclosing the circle of a radius given in degrees. So:
SELECT COUNT (*) FROM spt_Cities
WHERE ST_Contains( BuildCircleMbr( 0.0, 51.5, 1.0), location );
Returns me 12 and when I look at the cities returned they all make sense from my knowledge of southeast United Kingdom.

So this will do for now though I would still like to get the results limited to a genuine circle rather than an MBR at some point.

The thing I really don't know at the moment is whether this is taking advantage of the spatial index I created on my location column. I'm doing all my work from the SQLite terminal so I'll have to find a way to time queries and then construct some comparison tests. My current dataset of 8,000 cities is just to help me learn at the moment. I want to expand my dataset to include towns and villages (from OpenStreetMap data) eventually so using the power of R*Tree will become important.

Best, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of Mike Castle
Sent: Sunday, August 4, 2024 5:21 AM
To: spatiali...@googlegroups.com
Subject: Re: [SpatiaLite-Users] Select poinnts based on distance

spatialite> 51.5));
1.0

And the docs for ST_Distance says: always measured in CRS units

But, if we use the three argument version, where the simple presence of the third argument enables "measured in meters", and the values 0/1
(true/false) selects between great circle/ellipsoid computations (depending on the accuracy you need)
spatialite> SELECT ST_Distance(MakePoint(1.0, 51.5), MakePoint(0.0,
spatialite> 51.5), 0);
69220.0265463721
spatialite> SELECT ST_Distance(MakePoint(1.0, 51.5), MakePoint(0.0,
spatialite> 51.5), 1);
69439.976819321

So, I guess that 1 degree west of London is just shy of 70km?

See http://www.gaia-gis.it/gaia-sins/spatialite-sql-latest.html#p13

Now, I have been unable to wrap my head around "CRS", so the tiny bit of this stuff that I'm doing, I always use the three-argument version of ST_Distance. But, I believe that default CRS is effectively a unitless plane and even with specifying that SRID=4326, it does not seem to matter here.

Of course, I also don't understand why MakeCircle() with a 90 degree step results in eight points rather than four. So I'm probably missing a lot of things!

mrc

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/CA%2Bt9iMxNmm3SA3cU3esUFupgpBUAQ9Gup3qvH53P4ur%2BhxZPbg%40mail.gmail.com.

a.fu...@lqt.it

unread,
Aug 4, 2024, 6:49:57 AMAug 4
to spatiali...@googlegroups.com
On Sun, 04 Aug 2024 08:45:59 +0200, a.fu...@lqt.it wrote:
> c. More generally, forget everything you learned at
> school about Euclidean geometry because it's only
> valid on the plane.
> Here, however, we are on a spherical surface and
> the rules are completely different.
>

Very often a figure explains many more things than
a thousand words.

The attached image represents the Earth as it appears
in SRS WGS 84 4326, and as we will see contains many
dangerous traps and pitfalls.

As you can see I have drawn 3 parallels:

A is very near to the Equator
B passes through London
C is close to the North Pole

Judging from the figure it would seem that A, B and
C have exactly the same length, but this is not the
case at all:

A has a linear length of 40,000 km
B has a linear length of about 25,200 km
C has a linear length practically of ZERO

Said in other words:

- near latitude 0 (the Equator) 1 degree of
longitude roughly corresponds to 111 km

- at latitude 51N 1 degree of longitude roughly
corresponds to 70 km

- at latitude 90N (the North Pole) what appently
is a line actually simply is a point, and
therefore the length of 1 degree of longitude
becomes ZERO.

Concluding: the linear length of 1 degree of longitude
progressively decreases when moving from the Equator
towards one of the two Poles.

It's conceptually impossible to define a conversion
formula between degrees and meters because it depends
on the latitude.

But there is yet another paradoxical effect to note:
points A1 and A2 appear very far away, on the opposite
side of the world.
In reality they coincide and the distance betweeb them
is exactly ZERO

-----------------------------------------------------

Let's now examine the vagaries of the meridians.
X and Y seemed like two absolutely different
meridians without any relationship, but it's
not like that at all.
In fact points X1 and Y1 coincide, and the same
happens for X2 and Y2.

Concluding: each pair of meridians with a difference
between their longitudes of exactly 180 degrees trace
a single uninterrupted circle around the globe that
passes through both Poles.

-------------------------------

Final conclusions: using geographical data in WGS 84 4326
coordinates to make accurate measurements of lengths and
areas is asking for trouble.

you have been warned.

bye Sandro
etopo.jpg

a.fu...@lqt.it

unread,
Aug 4, 2024, 6:55:52 AMAug 4
to spatiali...@googlegroups.com
On Sun, 04 Aug 2024 12:49:51 +0200, a.fu...@lqt.it wrote:
> Very often a figure explains many more things than
> a thousand words.
>

Oops ... in my previous email I forgot to attach the
annotated figure ... here it is.

I apologize for the mess.

bye Sandro
etopo.jpg

a.fu...@lqt.it

unread,
Aug 4, 2024, 11:17:40 AMAug 4
to spatiali...@googlegroups.com
On Sun, 4 Aug 2024 10:18:25 +0100, ckgoo...@gmail.com wrote:
> SELECT COUNT (*) FROM spt_Cities
> WHERE ST_Contains( BuildCircleMbr( 0.0, 51.5, 1.0), location );
> Returns me 12 and when I look at the cities returned they all make
> sense from my knowledge of southeast United Kingdom.
>
> So this will do for now though I would still like to get the results
> limited to a genuine circle rather than an MBR at some point.
>
> The thing I really don't know at the moment is whether this is taking
> advantage of the spatial index I created on my location column.
>

Hello,

This is intended to be a comprehensive answer to all your many
questions
regarding SpatialIndex.

Suggested preliminary readings:

https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/rtree.html
https://blog.jrg.com.br/2016/05/23/Querying-spatial-data-using-spatialite-with-indexes/index.html
https://en.wikipedia.org/wiki/R-tree
https://en.wikipedia.org/wiki/R*-tree


Let's start from a first rough classification of Spatial SQL
problems which leads us to identify the following categories:

1. simple problems: dataset with a few hundred or thousands of rows.-
2. problems of medium difficulty: dataset with many hundreds of
thousands of rows.
3 complex problems: dataset with millions of rows.

Class 1. is quite trivial; any strategy will be reasonably fast
and efficient, even those based on pure brute force.

Class 2. will highlight the first flaws; brute force strategies
will start to feel too slow, but on a fast CPU the response
times could still be acceptable.

Class 3. will bring all the problems to the surface; and
inefficient and unintelligent strategies will be intolerably
slow (response times in the order of minutes, but also hours
and perhaps even days).

There is only one way to make a Spatial Query efficient and fast,
and is to be able to use the Spatial Index intelligently.
To take any advantage of the Spatial Index, SQL Queries must be
written in a very specific way.
There is no automatism whatsoever, there are no shortcuts,
there are no silver bullets.

If your Spatial SQL Query doesn't directly takes care
of the Spatial Index it will always be completely
ignored, even if it's present.

----------

Now let's see how and why the spatial Index can give
considerable efficiency to our Spatial Queries.

In a typical brute force approach any Spatial SQL
function in our query will affect all the rows of
the dataset (this is the infamous full table scan).
The more rows there are in the dataset the slower
the query will become.

Remember that any Spatial SQL function always requires
an heavy computational cost; it should start to become
clear that the more rows we have to process, the slower
the query will become.

The Spatial Index can be miraculous because it's nothing
more than a spatial filter allowing to drastically reduce
the number of rows to be processed.
the more we are able to narrow the filter, the faster we'll
be able to go.

Explained in simple words, a Spatial Index can be used as
a rectangular window allowing to very quickly extract only
the geometries that fall within that rectangle, ignoring
all the others.
The smaller the rectangle, the more efficient the filter
will be.

In the most typical case the rectangle to be passed to the
Spatial Index simply coincides with the reference
geometry itself.
(remember that all geometries of any type can always be
represented schematically by their MBR aka BBOX).

It's trivial to demonstrate that it'is useless to try
to calculate ST_Intersects() or ST_Contains() when the
two Geometries have their BBOXes that don't overlap;
we can discard them immediately because we can safely
anticipate the negative result.
However, the opposite condition is not true; two
geometries can have their BBOXes overlapped even if
they don't intersect at all.

Concluding: the Spatial Index is only a fairly coarse
but fast first filter.
Everything that passes this first check will then be
verified more accurately using a real Spatial SQL
function; but since the filter is expected to be
very selective, the impact on the performance of
the extra work necessary to discard false positives
will have a very modest impact.

---------------------------------------

Last step: how to use the Spatial Index for
ST_Distance(); the problem is that now we don't
expect the two BBOXes to be overlapping at all,
especially if we're talking about POINTs.

The best solution is to define the size of the
search rectangle to pass to the Spatial Index
based on some heuristic considerations and
knowledge of the reference dataset.

Just to give a very banal example: if I intend
to search for the nearest restaurant and I am
inside a town, then a search radius of 1 km
can be very reasonable.
But if I find myself in the middle of a forest,
a search radius of several tens of km seems
decidedly preferable.

A smarter generalized approach can be adaptive
in nature: we start with a small search radius,
and if nothing is found we'll try again by doubling
the search radius, and so on until a satisfactory
number of solutions is found.

best regards
Sandro




Mike Castle

unread,
Aug 4, 2024, 11:18:08 AMAug 4
to spatiali...@googlegroups.com
On Sun, Aug 4, 2024 at 2:36 AM <ckgoo...@gmail.com> wrote:
> So this will do for now though I would still like to get the results limited to a genuine circle rather than an MBR at some point.

My whole idea was to reduce having to do a table scan with
ST_Distance(), but, as Sandro pointed out in a message, it turns out
my idea is actually more computationally complex.

> The thing I really don't know at the moment is whether this is taking advantage of the spatial index I created on my location column. I'm doing all my work from the SQLite terminal so I'll have to find a way to time queries and then construct some comparison tests. My current dataset of 8,000 cities is just to help me learn at the moment. I want to expand my dataset to include towns and villages (from OpenStreetMap data) eventually so using the power of R*Tree will become important.

Again, from my reading of Sandro's messages, the spatial columns are
not going to help as long as we use SRID:4325/WGS 84 coordinates in
our respective projects. They might be useful when looking up
specific items by lat,lng. But they simply don't help for finding
closer items (because they would find items closer in degrees rather
than meters). We'd really have to project based transformations to
make good use of those indices.

As far as timing goes, you can do something like the following:
spatialite test.db 'SELECT COUNT(*) FROM table;'

You can wrap that whole command in whatever system you want to capture
timestamps, depending on what your OS has to offer.

On UNIX-like systems, for instance, I'd use the time(1) command to
wrap just the spatialite command:
time spatialite test.db

For more complicated queries, you could create a text file and do
something like:
spatialite test.db < input.sql

mrc

a.fu...@lqt.it

unread,
Aug 4, 2024, 11:30:44 AMAug 4
to spatiali...@googlegroups.com
On Sun, 4 Aug 2024 07:57:37 -0700, Mike Castle wrote:
> As far as timing goes, you can do something like the following:
> ............
>

Hi Mark,

it's even simpler and easier

.timer ON

after typing this directive on the console you'll
see the CPU time reported immediately below each
resultset.

bye Sandro


ckgoo...@gmail.com

unread,
Aug 4, 2024, 11:54:49 AMAug 4
to spatiali...@googlegroups.com
Hi Sandro,
You wrote:
> Final conclusions: using geographical data in WGS 84 4326 coordinates to make accurate measurements of lengths and areas is asking for trouble.

Thank you for your email and I think I'm beginning to understand as I've done some further reading.
Geospatial data is fundamentally stored as long-lat pairs. E.g. London is somewhere like -0.1,51.5.
When you want to start displaying points and lines on a 2D surface like paper or a screen you have to use a projection which describes how to manage the distortion inevitably arising from representing a 3D surface onto 2D. There are many projections with their own approach and which one is best often depends on the application. What looks like a centimetre on my piece of paper at the equator will cover a different distance to a centimetre on my piece of paper near a pole.

Now, in my specific application I do not draw a map or in any way render the points and lines graphically on my screen. This is because I'm trying to create an application which is an atlas for the blind and graphics is of no value. I am blind myself.

So for me, working in long-lat is always fine. If my user is moving around the globe and they move to a position, one function I want to provide is to say what cities are within a given distance e.g. 500km. It is always safe to translate 500km to an angle - here about 4.8 degrees.

So I would like to be able to ask my database to return me cities from my table which are less than 4.8 degrees away from my user's current position.

And having written this I am a little bit worried. With Spatialite, am I forced to use a projection which distorts all my locations? Or is there a way I can work always in degrees?

BTW, I have been reading the tutorial today (rather than the cookbook) which has been really interesting and will be enormously helpful. And I've seen your other email to me but not carefully gone through it yet. I will of course do so soon.

Best, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of a.fu...@lqt.it
Sent: Sunday, August 4, 2024 11:50 AM
To: spatiali...@googlegroups.com
Subject: Re: [SpatiaLite-Users] Select poinnts based on distance

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/fdbd73e94ca0ed804340d10518051daa%40lqt.it.

a.fu...@lqt.it

unread,
Aug 4, 2024, 12:37:30 PMAug 4
to spatiali...@googlegroups.com
On Sun, 4 Aug 2024 16:53:01 +0100, ckgoo...@gmail.com wrote:
> Geospatial data is fundamentally stored as long-lat pairs. E.g.
> London is somewhere like -0.1,51.5.
>

Hi Chris,

I'm sorry to disappoint you, but things aren't as you think.
WGS 84 4326 data is virtually universal when GPS devices come
into play, i.e. on mass consumer products.
In professional environments completely different SRSs are used,
and the official datasets released by national and regional
governments are never of the long/lat type, they are instead
of the projected planar type.

In the meantime I found a guide published by the British Ordnance
Survey which seems to me to be written in a very clear and exhaustive
way and which explains everything there is to know about the current
SRS systems used in the United Kingdom.
It's a read that I highly recommend you.

https://www.ordnancesurvey.co.uk/documents/resources/guide-coordinate-systems-great-britain.pdf


> Now, in my specific application I do not draw a map or in any way
> render the points and lines graphically on my screen.
> ...
> So for me, working in long-lat is always fine.
>

No, it's wrong.
Even if you are not interested in drawing geometries, you cannot
ignore all the mathematical and geometric complications arising
from wanting to translate angular distance measurements into
linear terms. It's a conceptually insoluble problem.


> If my user is moving
> around the globe and they move to a position, one function I want to
> provide is to say what cities are within a given distance e.g. 500km.
> It is always safe to translate 500km to an angle - here about 4.8
> degrees.
>

this us absolutaly wrong, becasuse you are assuming a fixed metric
length corresponding to 1 degree.

This may be a fairly reasonable assumption for the equatorial belt,
say between the two Tropics.
You'll start to encounter some accuracy problems in the Temperate
range (Italy, USA, Japan).

When you get even closer to the Poles (Canada, Iceland, Scandinavia,
Siberia) you'll discover that you have ended up in trouble.

Having said this for the sake of precision, it's also true that
the vast majority of the world's population lives in the less
cold areas of the Earth, so a simplifie approach, however imprecise,
could prove adequate for your soecific purposes.

bye Sandro

Mike Castle

unread,
Aug 4, 2024, 12:39:35 PMAug 4
to spatiali...@googlegroups.com
On Sun, Aug 4, 2024 at 9:26 AM Mike Castle <dal...@gmail.com> wrote:
> 4.8 degrees is 534km in the east/west direction, but 333km in the
> north/south direction.

I made a mistake there: I swapped east/west and north/south

Also, note that 4.8 degrees due north is 534.3km, while 4.8 degrees
due south is 533.8km.

4.8 degrees is 333.3km in both due east and due west.

mrc

Mike Castle

unread,
Aug 4, 2024, 12:39:46 PMAug 4
to spatiali...@googlegroups.com
On Sun, Aug 4, 2024 at 8:54 AM <ckgoo...@gmail.com> wrote:
> So I would like to be able to ask my database to return me cities from my table which are less than 4.8 degrees away from my user's current position.
>
> And having written this I am a little bit worried. With Spatialite, am I forced to use a projection which distorts all my locations? Or is there a way I can work always in degrees?

4.8 degrees is 534km in the east/west direction, but 333km in the
north/south direction.

SELECT ST_Distance(MakePoint(0.0, 51.5), MakePoint(0.0, 51.5 + 4.8), 1);
534254.126120441

SELECT ST_Distance(MakePoint(0.0, 51.5), MakePoint(0.0 + 4.8, 51.5), 1);
333254.767547239

ckgoo...@gmail.com

unread,
Aug 4, 2024, 4:06:15 PMAug 4
to spatiali...@googlegroups.com
Hi Sandro, Mike.
Think I have engaged my brain now. So a circle created with MakeCircle of radius 5 degrees will look fairly circular at the equator but near a pole will be much squashed so east-west distance is much less than north-south distance.
I'm just going to accept that for now. Maybe in the future I'll look at constructing a polygon with a number of points on it a fixed number of kilometres from the centre and convert these to degrees. But that is for another day.

I need to go read the links about spatial indices. Now.

Many thanks both for your patience.
Best, Chris

-----Original Message-----
From: spatiali...@googlegroups.com <spatiali...@googlegroups.com> On Behalf Of Mike Castle
Sent: Sunday, August 4, 2024 5:26 PM
To: spatiali...@googlegroups.com
Subject: Re: [SpatiaLite-Users] Select poinnts based on distance

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/spatialite-users/CA%2Bt9iMwo%3DzfdXEaktEf9%2B6m02_prX6EGDXv1fn2rX83QbyHB3Q%40mail.gmail.com.

Reply all
Reply to author
Forward
0 new messages