Slow/Endless Query

98 views
Skip to first unread message

E. D. Antmann

unread,
Mar 31, 2013, 11:36:26 AM3/31/13
to spatiali...@googlegroups.com
Hello All,

I am a long-time reader but first-time poster, so thanks for all the past help in this form.  I have a file of approx 110,000 roads (line segments) from TIGER, and am trying to identify all of the intersection points.  To do this, I first merge them by their name attribute, in order to avoid intersections between segments of the same road, and then run ST_Intersection.  My code is below, after importing the TIGER shapefile as daderoads:

CREATE TABLE roads_merged (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, fullname TEXT NOT NULL);
SELECT AddGeometryColumn('roads_merged','Geometry', 3086, 'MULTILINESTRING', 'XY');
INSERT INTO roads_merged (fullname, Geometry) SELECT FULLNAME, CastToMultiLinestring(GUnion(Geometry)) FROM daderoads GROUP BY FULLNAME;
CREATE TABLE intersections (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT);
SELECT AddGeometryColumn('intersections','Geometry', 3086, 'POINT', 'XY');
INSERT INTO intersections (Geometry) SELECT ST_Intersection (r1.Geometry, r2.Geometry) FROM roads_merged as r1, roads_merged as r2 WHERE r1.fullname <> r2.fullname AND ST_Intersects(r1.geometry, r2.Geometry) AND WITHIN(r1.Geometry, r2.Geometry);

The first three lines work just fine, the process of merging by fullname attribute for the 110,000 segments takes under one minute.  However, the ST_Intersection task is very slow.  Lines four and five execute without issue, but the final line takes over ten hours (I killed it after that point).  I also tried it as "INSERT INTO intersections (Geometry) SELECT ST_Intersection (r1.Geometry, r2.Geometry) FROM roads_merged as r1, roads_merged as r2 WHERE r1.fullname <> r2.fullname", which also was still running at the ten hour mark.  The roads_merged table has 6,500 lines in it, so I think this is way too long.  I would very much appreciate any advice you may be able to give.

Thanks in advance, EDA

Micha Silver

unread,
Mar 31, 2013, 3:19:28 PM3/31/13
to E. D. Antmann, spatiali...@googlegroups.com
On 03/31/2013 06:36 PM, E. D. Antmann wrote:
Hello All,

I am a long-time reader but first-time poster, so thanks for all the past help in this form.ן¿½ I have a file of approx 110,000 roads (line segments) from TIGER, and am trying to identify all of the intersection points.ן¿½ To do this, I first merge them by their name attribute, in order to avoid intersections between segments of the same road, and then run ST_Intersection.ן¿½ My code is below, after importing the TIGER shapefile as daderoads:


CREATE TABLE roads_merged (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, fullname TEXT NOT NULL);
SELECT AddGeometryColumn('roads_merged','Geometry', 3086, 'MULTILINESTRING', 'XY');
INSERT INTO roads_merged (fullname, Geometry) SELECT FULLNAME, CastToMultiLinestring(GUnion(Geometry)) FROM daderoads GROUP BY FULLNAME;
CREATE TABLE intersections (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT);
SELECT AddGeometryColumn('intersections','Geometry', 3086, 'POINT', 'XY');
INSERT INTO intersections (Geometry) SELECT ST_Intersection (r1.Geometry, r2.Geometry) FROM roads_merged as r1, roads_merged as r2 WHERE r1.fullname <> r2.fullname AND ST_Intersects(r1.geometry, r2.Geometry) AND WITHIN(r1.Geometry, r2.Geometry);

The first three lines work just fine, the process of merging by fullname attribute for the 110,000 segments takes under one minute.ן¿½ However, the ST_Intersection task is very slow.ן¿½ Lines four and five execute without issue, but the final line takes over ten hours (I killed it after that point).ן¿½ I also tried it as "INSERT INTO intersections (Geometry) SELECT ST_Intersection (r1.Geometry, r2.Geometry) FROM roads_merged as r1, roads_merged as r2 WHERE r1.fullname <> r2.fullname", which also was still running at the ten hour mark.ן¿½ The roads_merged table has 6,500 lines in it, so I think this is way too long.ן¿½ I would very much appreciate any advice you may be able to give.


Adding a spatial index on the road_merged table will surely help.
For details, See:
http://northredoubt.com/n/2012/01/18/spatialite-and-spatial-indexes/
http://www.surfaces.co.il/?p=1196
and (somewhat old)
http://www.gaia-gis.it/spatialite-2.4.0-4/spatialite-cookbook/html/rtree.html

Thanks in advance, EDA
--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-use...@googlegroups.com.
To post to this group, send email to spatiali...@googlegroups.com.
Visit this group at http://groups.google.com/group/spatialite-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
ן¿½
ן¿½

E. D. Antmann

unread,
Mar 31, 2013, 4:03:15 PM3/31/13
to spatiali...@googlegroups.com, E. D. Antmann
Hello Micha,

Thanks for the reply.  I read those articles and replaced the final INSERT INTO line with the following: 

INSERT INTO intersections (Geometry) SELECT ST_Intersection (r1.Geometry, r2.Geometry) FROM roads_merged as r1, roads_merged as r2 WHERE r1.ROWID IN (Select ROWID FROM SpatialIndex WHERE f_table_name='roads_merged' AND search_frame='roads_merged') AND r1.fullname <> r2.fullname;

after creating a spatial index using: 

SELECT CreateSpatialIndex('roads_merged','Geometry');

This seemed like it would help, because when I ran EXPLAIN QUERY PLAN it gave the following:

SEARCH TABLE roads_merged AS r1 USING INTEGER PRIMARY KEY (rowid=?) (~25 rows)
EXECUTE LIST SUBQUERY 1
SCAN TABLE SpatialIndex VIRTUAL TABLE INDEX 2: (~0 rows)
SCAN TABLE roads_merged AS r2 (~500000 rows)

However, when I executed the code, it returned zero intersections in 0.000 seconds, even though many, many intersections do actually exist.  I'm not quite sure what I did wrong, as I have never worked with spatial indexes before.  Thanks, EDA

a.fu...@lqt.it

unread,
Apr 1, 2013, 4:39:22 AM4/1/13
to spatiali...@googlegroups.com
> SELECT ST_Intersection(r1.Geometry, r2.Geometry)
> FROM roads_merged as r1, roads_merged as r2
> WHERE r1.ROWID IN (
> Select ROWID
> FROM SpatialIndex
> WHERE f_table_name='roads_merged' AND search_frame='roads_merged')
>

I see something very dubios in the above SLQ snippet

WHERE f_table_name='roads_merged' AND search_frame='roads_merged'

"roads_merged" is a table name, so it's the expected value to
be passed to "f_table_name"; but "search_frame" is expecting to
receive some Geometry; surely not a text string.
so I suppose it should probably be:

WHERE f_table_name='roads_merged' AND search_frame = r2.Geometry

bye Sandro

--
Il messaggio e' stato analizzato alla ricerca di virus o
contenuti pericolosi da MailScanner, ed e'
risultato non infetto.

E. D. Antmann

unread,
Apr 1, 2013, 3:05:00 PM4/1/13
to spatiali...@googlegroups.com
Thanks again for that tip.  I put that updated SELECT line into the code, but it didn't seem to help matters, it is still going strong for several hours.  I threw the same dataset and task into ArcGIS, by using Intersect -> Line input, Point output, and it took under two minutes, so I still think something is amiss.  Hardware should be a non-issue, everything is going on in RamDisk with 8 Gb of memory and a Xeon.  Any other ideas on why this is taking so long?  The dataset has 6600 lines, and ArcGIS found a total of 56,000 intersection points.

a.fu...@lqt.it

unread,
Apr 1, 2013, 5:33:12 PM4/1/13
to spatiali...@googlegroups.com
> I threw the same dataset and task into ArcGIS, by using Intersect ->
> Line
> input, Point output, and it took under two minutes
> The dataset has 6600 lines, and ArcGIS found a total of 56,000
> intersection
> points.

I've performed a similar test: I've surely used a different TIGER
dataset from
your (I've just picked up one at random: for the sake of precision it
was
"tl_rd13_37021_roads"), and my HW surely is less powerful then your
(good old
HDD, 4 GB ram, Core i5) so we cannot pretend to seriously compare your
timings
and mines.

for the chronicles: I've measured a timing of about 2 minutes (18,000
roads,
62,000 intersection points), which falls very close to the other one.

so we can reasonably assume that "about 2 minutes" is the optimal time
required
to resolve this class of problem using a well designed approach.
any other worst timing is a clear symptom that there is something wrong
in
the adopted approach strategy.


> Any other ideas on why this is taking so long?
>

post-mortem report
------------------
let's go to discover what gone wrong in your initial approach; you've
first aggregated all roads as MultiLinestrings by their FullName.

this doesn't seems to be appropriate for TIGER roads; as I quickly
discovered, the surprisingly number of roads has a nice "white space"
name; on my TIGER dataset about 5,000 roads have no name at all.

so, when aggregating by FullName, you are actually creating a "huge
monster",
i.e. an incredibly complex MultiLinestring covering more or less the
whole map,
and aggregating all roads of unspecified (i.e. "white") name.

an abnormal geometry like this one has two interesting properties:
a) it will surely take a very long time to be evaluated by any Spatial
SQL function
b) it conceptually kills any possible optimization based on the Spatial
Index, because its huge BBOX will probably contain any other road
(thus making absolutely meaningless using the R*Tree filtering
strategy).

overall result: an approach like this one is kind-of "negative
optimization";
not surprisingly, the required time under these conditions will
certainly be
terrifying slow.

------------
if you are not completely convinced by this explanation, you can simply
run
yet again your original query, by simply applying this small change:

SELECT ST_Intersection (r1.Geometry, r2.Geometry)
FROM roads_merged as r1, roads_merged as r2
WHERE r1.fullname <> r2.fullname AND r1.fullname <> ' ' AND r2.fullname
<> ' '
AND ST_Intersects(r1.geometry, r2.Geometry) AND WITHIN(r1.Geometry,
r2.Geometry);

please note: there is absolutely no Spatial Index here (brute force
solution) !!!
you'll soon discover that this slightly modified query will complete
its execution
just in few minutes (once you've properly instructed SQL to ignore at
all the
"ugly no-name monster").
-------



my own SQL query
----------------

INSERT INTO intersections (id, Geometry)
SELECT NULL, ST_Intersection (r1.Geometry, r2.Geometry) AS g
FROM tl_rd13_37021_roads as r1, tl_rd13_37021_roads as r2
WHERE ST_Intersects(r1.geometry, r2.Geometry) = 1
AND ST_GeometryType(g) = 'POINT' AND r1.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'tl_rd13_37021_roads'
AND search_frame = r2.Geometry)
AND r1.fullname <> r2.fullname;

few points worth to be noted:
1) I've directly used the input roads as they were; there is absolutely
no need to aggregate them by their FullName (and it makes any
further
step to be harder)
2) ST_Intersection (r1.Geometry, r2.Geometry) doesn't always return
a POINT; I noticed (at least on my sample dataset) that many
LINESTRING and MULTIPOINT were returned
3) so I've introduced the "AND ST_GeometryType(g) = 'POINT'" clause
in order to discard all intersections not being a simple Point.
4) I've still included the "AND r1.fullname <> r2.fullname" clause
simply to maintain strict conformance to your initial query, but
I've serious doubts about it, once considered how many "no name"
roads are included in TIGER datasets.

E. D. Antmann

unread,
Apr 1, 2013, 10:58:30 PM4/1/13
to spatiali...@googlegroups.com
Sandro, thanks for your incredibly detailed response, it solved the problem fantastically.  Now, I feel awful to be a pest, but I have four or eight points on top of each other at each intersection.  I am sure this is an incredibly elementary mistake on my part.
Reply all
Reply to author
Forward
0 new messages