215 views

Skip to first unread message

Apr 17, 2009, 1:24:15 AM4/17/09

to SpatiaLite Users

Firstly, new to SQLite/SpatiaLite but well done Alessandro on a great

extension! Seems a perfect fit for some of our custom spatial data

analysis tools.

My problem is a variation on Nearest Neighbour.

I have ~300,000 customers (Polygons) and ~160,000 water pipes

(Polylines) across a City. For each customer, I need to find the

nearest water pipe.

I'm wondering what SQL will give the fastest results. I tried a

sample below, but I'm not sure this is the appropriate way of trying

to use a spatial index.

I realise the sample isn't the full SQL req'd, in this case I'm just

trying to solve the main headache in first getting an approximate (ie.

using MBRs?) 1<->Many table of Customer <->Nearest Pipes within a user-

defined search radius (for example below, 30 metres)

CREATE VIEW NearestPipes AS

SELECT Customers.Service_ID,WaterPipes.PipeID

FROM Customers,WaterPipes WHERE

WaterPipes.RowID IN(SELECT pkid FROM idx_WaterPipes_Geometry WHERE

MbrOverLaps(BuildMBR(xmin-30,ymin-30,xmax+30,ymax

+30,28356),Customers.Geometry))

extension! Seems a perfect fit for some of our custom spatial data

analysis tools.

My problem is a variation on Nearest Neighbour.

I have ~300,000 customers (Polygons) and ~160,000 water pipes

(Polylines) across a City. For each customer, I need to find the

nearest water pipe.

I'm wondering what SQL will give the fastest results. I tried a

sample below, but I'm not sure this is the appropriate way of trying

to use a spatial index.

I realise the sample isn't the full SQL req'd, in this case I'm just

trying to solve the main headache in first getting an approximate (ie.

using MBRs?) 1<->Many table of Customer <->Nearest Pipes within a user-

defined search radius (for example below, 30 metres)

CREATE VIEW NearestPipes AS

SELECT Customers.Service_ID,WaterPipes.PipeID

FROM Customers,WaterPipes WHERE

WaterPipes.RowID IN(SELECT pkid FROM idx_WaterPipes_Geometry WHERE

MbrOverLaps(BuildMBR(xmin-30,ymin-30,xmax+30,ymax

+30,28356),Customers.Geometry))

Apr 20, 2009, 3:27:10 AM4/20/09

to SpatiaLite Users

To add to this, managed to do a bit of trial and error on different

methods.

Following observations:

1. SpatiaLite seems to HATE having a geometric operator in the SQL

accessing the spatial index virtual table. (eg. Using the MBROverlaps

() operator in above example) Performance seems to be slower than if

chose to not use the spatial index at all. Best performance yielded

if use simple

2. Don't know whether this is from SQLite or the SpatiaLite

extensions, but the nested SQL statement pulling pkIDs from the

spatial index table HATES "OR" operators.

For example, the below sample is what I considered to have the most

efficient syntax:

SELECT Customers.Service_ID, Pipes.Label, X(Customers.Geometry)-30 As

CustXmin, Y(Customers.Geometry)-30 As CustYmin, X(Customers.Geometry)

+30 As CustXmax, Y(Customers.Geometry)+30 As CustYmax

FROM Customers, Pipes

WHERE Pipes.RowID IN(Select pkID From idx_Pipes_Geometry WHERE

((xmin>CustXmin AND xmin<CustXmax) OR

(xmax>CustXmin AND xmax<CustXmax) OR

(xmin<CustXmin AND xmax>CustXmax)) AND

((ymin>CustYMin AND ymin<CustYmax) OR

(ymax>CustYMin AND ymax<CustYmax) OR

(ymin<CustYMin AND ymax>CustYmax)))

AND Crosses(BuildMBR(CustXmin,CustYmin,CustXmax,CustYmax,

28356),Pipes.Geometry)

However, this had very slow performance.

Weirdly (to me at least, but SQL newbie), using an expansion method +

using UNION SELECT yields a much faster query. Doing this in this

solution has 3 x 3 permutations. So re-writing the above I get this:

SELECT Customers.Service_ID, Pipes.Label, X(Customers.Geometry)-30 As

CustXmin, Y(Customers.Geometry)-30 As CustYmin, X(Customers.Geometry)

+30 As CustXmax, Y(Customers.Geometry)+30 As CustYmax

FROM Customers, Pipes

WHERE Pipes.RowID IN(Select pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymin<CustYMin AND ymax>CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymin<CustYMin AND ymax>CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymin<CustYMin AND ymax>CustYmax)

AND Crosses(BuildMBR(CustXmin,CustYmin,CustXmax,CustYmax,

28356),Pipes.Geometry)

As I said, seems to be much faster than SQL statment above it, but

have no idea why(?). Still interested if anyone has a better

performing solution!

methods.

Following observations:

1. SpatiaLite seems to HATE having a geometric operator in the SQL

accessing the spatial index virtual table. (eg. Using the MBROverlaps

() operator in above example) Performance seems to be slower than if

chose to not use the spatial index at all. Best performance yielded

if use simple

2. Don't know whether this is from SQLite or the SpatiaLite

extensions, but the nested SQL statement pulling pkIDs from the

spatial index table HATES "OR" operators.

For example, the below sample is what I considered to have the most

efficient syntax:

SELECT Customers.Service_ID, Pipes.Label, X(Customers.Geometry)-30 As

CustXmin, Y(Customers.Geometry)-30 As CustYmin, X(Customers.Geometry)

+30 As CustXmax, Y(Customers.Geometry)+30 As CustYmax

FROM Customers, Pipes

WHERE Pipes.RowID IN(Select pkID From idx_Pipes_Geometry WHERE

((xmin>CustXmin AND xmin<CustXmax) OR

(xmax>CustXmin AND xmax<CustXmax) OR

(xmin<CustXmin AND xmax>CustXmax)) AND

((ymin>CustYMin AND ymin<CustYmax) OR

(ymax>CustYMin AND ymax<CustYmax) OR

(ymin<CustYMin AND ymax>CustYmax)))

AND Crosses(BuildMBR(CustXmin,CustYmin,CustXmax,CustYmax,

28356),Pipes.Geometry)

However, this had very slow performance.

Weirdly (to me at least, but SQL newbie), using an expansion method +

using UNION SELECT yields a much faster query. Doing this in this

solution has 3 x 3 permutations. So re-writing the above I get this:

SELECT Customers.Service_ID, Pipes.Label, X(Customers.Geometry)-30 As

CustXmin, Y(Customers.Geometry)-30 As CustYmin, X(Customers.Geometry)

+30 As CustXmax, Y(Customers.Geometry)+30 As CustYmax

FROM Customers, Pipes

WHERE Pipes.RowID IN(Select pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin>CustXmin AND xmin<CustXmax AND ymin<CustYMin AND ymax>CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmax>CustXmin AND xmax<CustXmax AND ymin<CustYMin AND ymax>CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymin>CustYMin AND ymin<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymax>CustYMin AND ymax<CustYmax

UNION SELECT pkID From idx_Pipes_Geometry WHERE

xmin<CustXmin AND xmax>CustXmax AND ymin<CustYMin AND ymax>CustYmax)

AND Crosses(BuildMBR(CustXmin,CustYmin,CustXmax,CustYmax,

28356),Pipes.Geometry)

As I said, seems to be much faster than SQL statment above it, but

have no idea why(?). Still interested if anyone has a better

performing solution!

Apr 21, 2009, 9:27:21 AM4/21/09

to spatiali...@googlegroups.com

I tried to reproduce the problem you pose.

So I loaded a DB using:

- PopulatedPlaces: about 5,000 polygons

[footprints for Tuscany's towns and villages]

- Streets: about 17,000 linestrings

[Tuscany's highways and first class roads]

So I loaded a DB using:

- PopulatedPlaces: about 5,000 polygons

[footprints for Tuscany's towns and villages]

- Streets: about 17,000 linestrings

[Tuscany's highways and first class roads]

The problem I actually tried to solve was:

"Which ones streets may be associated to each populated place

within a 100 meters range ?"

More or less it seems quite the same of your original problem

involving pipes and customers.

===============================================================

First I tried a "raw" brutal force approach;

not using any Spatial Index at all

SELECT pp.Name, st.Name, Distance(pp.Geometry, st.Geometry)

FROM PopulatedPlaces AS pp, Streets AS st

WHERE Distance(pp.Geometry, st.Geometry) < 100.0

After waiting some 20 minutes I decided to kill SpatiaLite,

because it was quite obvious such a solution requires a long

time [and obviously, lots and lots of CPU cycles] to be computed.

--------------------

So I tried a "sophisticated" approach ...

I built an R*Tree Spatial Index for Streets.Geometry,

and then I executed the following SQL query:

SELECT pp.Name, st.Name, Distance(pp.Geometry, st.Geometry)

FROM PopulatedPlaces AS pp, Streets AS st

WHERE Distance(pp.Geometry, st.Geometry) < 100.0 AND st.ROWID IN

(

SELECT pkid FROM idx_Streets_Geometry

WHERE xmin < (MbrMaxX(pp.Geometry) + 100) AND

xmax > (MbrMinX(pp.Geometry) - 100) AND

ymin < (MbrMaxY(pp.Geometry) + 100) AND

ymax > (MbrMinY(pp.Geometry) - 100)

)

this time I was able to get a result set in about 1 minute;

and this is not at all surprising, because in this case I was

taking full profit from the R*Tree Spatial Index.

bye, Sandro

Apr 23, 2009, 3:25:30 AM4/23/09

to SpatiaLite Users

Thankyou so much Alessandro!

I'm actually pleased that my approach was very close to your

approach. Both my syntax and your syntax was looking in the Spatial

Index table for overlaps between the Pipe/Streets indexed MBRs, and

extended MBRs from Customers/PopulatedPlaces.

However, your syntax is a LOT more efficient than mine for checking

overlaps in the X and Y dimensions. Also, performance-wise, your

syntax was about twice as fast as executing.

As a sidenote, there is another technique I've used previously on the

Nearest Neighbour problem that can yield even more speed.

In my real case, the maximum search radius from customer to pipe will

be around 500 metres, but in most cases the distance is less than 6

metres. Programatically, the best method I've found to get good speed

is NOT to search out for pipes in a 500 metre radius in the first

pass. The best method seems to be to iteratively run SQL statements

that gradually expand the "search" MBR bounding box from a very small

distance to larger distances.

Then use the returned results to UPDATE an "Customers.IsProcessed"

Field. Then run the next SELECT query with an expanded search MBR

bounding box but ignoring Customers that have already been previously

processed.

Nominally I have previously done this in about 10 iterations with

progressing larger search bounding boxes.

But you've helped me solve the hardest part of the problem, thanks

again!

I'm actually pleased that my approach was very close to your

approach. Both my syntax and your syntax was looking in the Spatial

Index table for overlaps between the Pipe/Streets indexed MBRs, and

extended MBRs from Customers/PopulatedPlaces.

However, your syntax is a LOT more efficient than mine for checking

overlaps in the X and Y dimensions. Also, performance-wise, your

syntax was about twice as fast as executing.

As a sidenote, there is another technique I've used previously on the

Nearest Neighbour problem that can yield even more speed.

In my real case, the maximum search radius from customer to pipe will

be around 500 metres, but in most cases the distance is less than 6

metres. Programatically, the best method I've found to get good speed

is NOT to search out for pipes in a 500 metre radius in the first

pass. The best method seems to be to iteratively run SQL statements

that gradually expand the "search" MBR bounding box from a very small

distance to larger distances.

Then use the returned results to UPDATE an "Customers.IsProcessed"

Field. Then run the next SELECT query with an expanded search MBR

bounding box but ignoring Customers that have already been previously

processed.

Nominally I have previously done this in about 10 iterations with

progressing larger search bounding boxes.

But you've helped me solve the hardest part of the problem, thanks

again!

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu