Spatialite crash on network routing query

57 views
Skip to first unread message

Andrew Harfoot

unread,
Jan 3, 2017, 9:18:36 AM1/3/17
to SpatiaLite Users

Hi there,

I am attempting to create an Origin-Destination (OD) travel cost matrix from a road network and set of locations using Spatialite. Building on Steven Woodbridge's posts I found that using a CROSS JOIN coupled with the Connection Cost mode of the virtual network tables was the most efficient approach.

When I ran the SQL on the same network but with a larger number of locations (4000 as opposed to 300), both Spatialite GUI and the CLI tool crashed almost immediately, with no error message. This was running with a default configuration, and looking into some of the settings, I tested journal_mode=OFF and cache_size=1000000 without any difference.

The problem was originally encountered with a commercial dataset, but I have worked up an Ordnance Survey open data based network sample that demonstrates the same issue. This data can be downloaded here (OSOpenRoads_SU_NET.zip - ~20Mb)

ITN_RoadLinks is the road network, with ITN_RoadLinks_net being the spatialite network virtual table, and ITN_RoadLinks_Net_data the underlying data. SamplePoints contains a set of roughly 4000 nodes randomly sampled from the network that are used as the OD locations.

The query that results in a crash is as follows:

SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM
    (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net
        WHERE NodeFrom = NodeID and Cost <= 100000000) as d
        INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo
        WHERE s.NodeID <> d.NodeFrom;

I set the Cost threshold to be very large to ensure that costs to all the other nodes in the network are returned. As an aside, it would be much neater if the Cost clause could be omitted altogether to achieve this.

Running a single or small selection of locations completes, with the maximum number of locations that can be processed without crashing being about 10. This number doesn't seem to be affected by the amount of RAM.

SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM
    (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net
        WHERE NodeFrom = NodeID and Cost <= 100000000 AND SampleID BETWEEN 1 AND 1) as d
        INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo
        WHERE s.NodeID <> d.NodeFrom;

I have seen the same behaviour (crashing) on Windows 7 (64 bit) and 8 (64 bit), using Spatialite 4.3 and 4.4

Bizarrely, though probably unrelated, the query below, basically identical to the one above, takes orders of magnitude longer to run. I'm guessing that it is query optimiser related, and I'd be interested to know how to explain this behaviour.

SELECT d.OrigSampleID 'OrigSampleID', s.SampleID 'DestSampleID', d.Cost FROM
    (SELECT SampleID 'OrigSampleID', NodeFrom, NodeTo, Cost FROM SamplePoints CROSS JOIN ITN_RoadLinks_net
        WHERE NodeFrom = NodeID and Cost <= 100000000 AND SampleID = 1) as d
        INNER JOIN SamplePoints AS s ON s.NodeID = d.NodeTo
        WHERE s.NodeID <> d.NodeFrom;

Can anybody shed any light on the crashing, or suggest how to diagnose the reason for it? I can work around it for the moment by running queries for each location in turn, but I suspect that this will only work up to a certain number of locations, and it would be much more elegant to be able to wrap this up in one line of SQL.

Thanks very much in advance,

Andy

--
Andy Harfoot

GeoData Institute
University of Southampton
Southampton
SO17 1BJ

Tel:  +44 (0)23 8059 2719

www.geodata.soton.ac.uk

a.fu...@lqt.it

unread,
Jan 3, 2017, 11:32:52 AM1/3/17
to spatiali...@googlegroups.com
On Tue, 3 Jan 2017 14:18:31 +0000, Andrew Harfoot wrote:
> Can anybody shed any light on the crashing
>

Hi Andrew,

I've just tested your sample by using the current development sources
both on Win32 and Win64, and I discovered something interesting.

- Win32: spatialite CLI starts executing your SQL query and almost
immediately begins outputting resultset rows at a regular and
constant pace; once about 2 millions rows are printed a deadly
crash occurs.
- Win64: same as above, but in this case the crash happens after
printing more than 5.5 million rows.

the most realistic cause explaining such a behaviour is the
exhaustion of memory related resources, very limited on Win32
and more generous on Win64.
and this explanation fits reasonable well will the internal
logic adopted by the VirtualNetwork module that requires
a significant amount of available RAM for each invocation
(and in this case it's invoked million times)


> Bizarrely, though probably unrelated, the query below, basically
> identical to the one above, takes orders of magnitude longer to run.
> I'm guessing that it is query optimiser related, and I'd be
> interested to know how to explain this behaviour.
>

it's not completely unrelated.
VirtualNetwork is a VIRTUAL TABLE, i.e. it's not at all an
ordinary SQL table as any other; it pretends to be under
a purely syntactic aspect, but it's not under a more
realistic functional aspect.

it's more like an extension driver deploying an internal
logic completely unknown to SQLite, and just supporting
two very thin communication interfaces:
- one for receiving the initial input variables
- the other for returning resultset rows
so SQLite is surely able to expose a Virtual Table
(more or less) as if it was a native SQL Table, but
it obviously lacks any useful option allowing the
query optimizer to identify a "smart" access strategy,
most notably while parsing very complex SQL queries.
and never forget, a Virtual Table can easily introduce
extra memory requirements of its own (as VirtualNetwork
certainly does).

short conclusion: the VirtualNetwork module was initially
designed for resolving just a single From-To shortest path
at each time, and it correctly works if used this way.
it still continues to correctly work if many repeated
individual queries are executed under the supervision
of some external application (C, Python, Java etc).

what should be absolutely avoided is encapsulating
a VirtualNetwork query within a complex SQL query
expected to return several million rows, because
in this case it's not at impossible triggering
a dramatic escalation that will end up by exhausting
some critical resource.
at least from my own test such a "pure SQL approach"
could safely work only after carefully subdividing the
overall problem into many smaller problems so to limit
the expected number of rows to be returned (let say,
no more than 1 million for each single query).

bye Sandro

Andy Harfoot

unread,
Jan 6, 2017, 8:22:09 AM1/6/17
to spatiali...@googlegroups.com
Hi Sandro,

Thanks for looking at my problem, and for another insightful response.

I had previously been inserting the results from the select into a table, and hadn't therefore seen any resultset rows when crashes occurred. Running spatialite CLI with output to a CSV file has allowed me to do some further exploration of the crash and how far the query proceeds before crashing:

Typically a crash occurs on the 8th or 9th origin location, or invocation of the network virtual table, irrespective of the total number of locations in the SamplePoints table. This strongly supports what you suggested, that a limit is being reached internally when the calculations are batched in a single SQL statement. Running individual locations as separate statements does not cause a crash.

What is curious is how the number of records returned before a crash occurs varies with different settings, (tests run on 4.3.0a x64):

- spatialite.exe 4.4.0 RC0 returns fewer records than v4.3.0a (23589 vs 35382)
- cache_size=1000000 reduces the number of records returned to 15611
- Running ANALYZE beforehand reduces the number of records to 11796
- A no geometry network returns fewer records (23589)

I note that none of these record counts are anywhere near the numbers that you are reporting (5.5 million!). What platform and hardware were you using?

Cheers,

Andy
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To post to this group, send email to spatiali...@googlegroups.com.
Visit this group at https://groups.google.com/group/spatialite-users.
For more options, visit https://groups.google.com/d/optout.

a.fu...@lqt.it

unread,
Jan 7, 2017, 1:44:59 PM1/7/17
to spatiali...@googlegroups.com
On Fri, 6 Jan 2017 13:22:10 -0000, Andy Harfoot wrote:
> What is curious is how the number of records returned before a crash
> occurs varies with different settings, (tests run on 4.3.0a x64):
>
> - spatialite.exe 4.4.0 RC0 returns fewer records than v4.3.0a (23589
> vs 35382)
> - cache_size=1000000 reduces the number of records returned to 15611
> - Running ANALYZE beforehand reduces the number of records to 11796
> - A no geometry network returns fewer records (23589)
>
> I note that none of these record counts are anywhere near the numbers
> that you are reporting (5.5 million!). What platform and hardware
> were
> you using?
>

Hi Andy,

I tested your sample on Win7 Professional SP1 64 bit
- Intel i7 3.6 GHz CPU
- 32 GB RAM

your own platform has presumably a smaller RAM, and this could
easily explain the different figures between the two tests.
just to say, I replied the same identical test on a Linux 64 bit
Virtual Machine configured with only 8 GB RAM and the crash
occurred after printing about 1 million rows: this seems to
confirm that the total amount of installed RAM has some
influence on this crash.

When the going gets tough, the tough get going

it's now time to stop kidding on Windows; an objective
debugging session on a serious professional platform
is surely required to investigate on this crash in full
depth. so I duly switched to Linux repeating once again
your test, this time carefully monitoring all memory
events by using Valgrind [1], a powerful dynamic code
analyzer.

[1] https://en.wikipedia.org/wiki/Valgrind

Valgrind is a wonderful debugging tool, but unhappily it's
painfully slow; anyway after an all night long test session
now we finally know for sure what's really causing this crash.

A) there was a first code defect in the VirtualNetwork module;
after resetting an internal C data structure supporting
each single Shortest Path query a pointer remained not
properly initialized.
An uninitialized variable is a little dirty bastard
potentially causing havoc, but it can very often pass
completely unnoticed and harmless.
"uninitialized" automatically implies handling random
values, so code execution starts depending on rolling
dices in a substantially unpredictable way.

B) there was a second code defect in VirtualNetwork; a small
memory leak, that repeated for many million times ended
up in consuming a significant amount of RAM; and even worst,
it surely introduced a very high memory fragmentation.

conclusions:
1. when calling VirtualNetwork many times by executing many
distinct SQL queries the first issue never had a chance
to happen (the driver never requiring to be reset being
initialized and destroyed each time), so the overall effect
was absolutely innocent except may be for a tolerable waste
of useless RAM.
2. but when repeatedly calling many million times VirtualNetwork
from the context of a single complex SQL query a pervert
interaction between both issues had a very high probability to
create a severely corrupted memory context, thus leading to a
random crash.
and this is the end of the story.

(code already patched but not yet committed)

bye Sandro

Andy Harfoot

unread,
Jan 10, 2017, 4:55:22 AM1/10/17
to spatiali...@googlegroups.com
Hi Sandro,

Thank you very much for looking at the crashing issue in more detail, and identifying the two issues. I look forward to the code changes to be committed and compiled so that I can put them to the test. In the meantime I'll work around the issue by breaking the query down into individual locations.

Thanks again,

Andy

-----Original Message-----
From: spatiali...@googlegroups.com [mailto:spatiali...@googlegroups.com] On Behalf Of a.fu...@lqt.it
Reply all
Reply to author
Forward
0 new messages