Hi there,
I am attempting to create an Origin-Destination (OD) travel cost matrix from a road network and set of locations using Spatialite. Building on Steven Woodbridge's posts I found that using a CROSS JOIN coupled with the Connection Cost mode of the virtual network tables was the most efficient approach.
When I ran the SQL on the same network but with a larger number of locations (4000 as opposed to 300), both Spatialite GUI and the CLI tool crashed almost immediately, with no error message. This was running with a default configuration, and looking into some of the settings, I tested journal_mode=OFF and cache_size=1000000 without any difference.
The problem was originally encountered with a commercial dataset,
but I have worked up an Ordnance Survey open data based network
sample that demonstrates the same issue. This data can be
downloaded here (OSOpenRoads_SU_NET.zip
- ~20Mb)
ITN_RoadLinks is the road network, with ITN_RoadLinks_net being
the spatialite network virtual table, and ITN_RoadLinks_Net_data
the underlying data. SamplePoints contains a set of roughly 4000
nodes randomly sampled from the network that are used as the OD
locations.
The query that results in a crash is as follows:
SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net WHERE NodeFrom = NodeID and Cost <= 100000000) as d INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo WHERE s.NodeID <> d.NodeFrom;
I set the Cost threshold to be very large to ensure that costs to all the other nodes in the network are returned. As an aside, it would be much neater if the Cost clause could be omitted altogether to achieve this.
Running a single or small selection of locations completes, with
the maximum number of locations that can be processed without
crashing being about 10. This number doesn't seem to be affected
by the amount of RAM.
SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net WHERE NodeFrom = NodeID and Cost <= 100000000 AND SampleID BETWEEN 1 AND 1) as d INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo WHERE s.NodeID <> d.NodeFrom;
I have seen the same behaviour (crashing) on Windows 7 (64 bit) and 8 (64 bit), using Spatialite 4.3 and 4.4
Bizarrely, though probably unrelated, the query below, basically
identical to the one above, takes orders of magnitude longer to
run. I'm guessing that it is query optimiser related, and I'd be
interested to know how to explain this behaviour.
SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net WHERE NodeFrom = NodeID and Cost <= 100000000 AND SampleID = 1) as d INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo WHERE s.NodeID <> d.NodeFrom;
Can anybody shed any light on the crashing, or suggest how to
diagnose the reason for it? I can work around it for the moment by
running queries for each location in turn, but I suspect that this
will only work up to a certain number of locations, and it would
be much more elegant to be able to wrap this up in one line of
SQL.
Thanks very much in advance,
Andy
-- Andy Harfoot GeoData Institute University of Southampton Southampton SO17 1BJ Tel: +44 (0)23 8059 2719 www.geodata.soton.ac.uk