> I threw the same dataset and task into ArcGIS, by using Intersect ->
> Line
> input, Point output, and it took under two minutes
> The dataset has 6600 lines, and ArcGIS found a total of 56,000
> intersection
> points.
I've performed a similar test: I've surely used a different TIGER
dataset from
your (I've just picked up one at random: for the sake of precision it
was
"tl_rd13_37021_roads"), and my HW surely is less powerful then your
(good old
HDD, 4 GB ram, Core i5) so we cannot pretend to seriously compare your
timings
and mines.
for the chronicles: I've measured a timing of about 2 minutes (18,000
roads,
62,000 intersection points), which falls very close to the other one.
so we can reasonably assume that "about 2 minutes" is the optimal time
required
to resolve this class of problem using a well designed approach.
any other worst timing is a clear symptom that there is something wrong
in
the adopted approach strategy.
> Any other ideas on why this is taking so long?
>
post-mortem report
------------------
let's go to discover what gone wrong in your initial approach; you've
first aggregated all roads as MultiLinestrings by their FullName.
this doesn't seems to be appropriate for TIGER roads; as I quickly
discovered, the surprisingly number of roads has a nice "white space"
name; on my TIGER dataset about 5,000 roads have no name at all.
so, when aggregating by FullName, you are actually creating a "huge
monster",
i.e. an incredibly complex MultiLinestring covering more or less the
whole map,
and aggregating all roads of unspecified (i.e. "white") name.
an abnormal geometry like this one has two interesting properties:
a) it will surely take a very long time to be evaluated by any Spatial
SQL function
b) it conceptually kills any possible optimization based on the Spatial
Index, because its huge BBOX will probably contain any other road
(thus making absolutely meaningless using the R*Tree filtering
strategy).
overall result: an approach like this one is kind-of "negative
optimization";
not surprisingly, the required time under these conditions will
certainly be
terrifying slow.
------------
if you are not completely convinced by this explanation, you can simply
run
yet again your original query, by simply applying this small change:
SELECT ST_Intersection (r1.Geometry, r2.Geometry)
FROM roads_merged as r1, roads_merged as r2
WHERE r1.fullname <> r2.fullname AND r1.fullname <> ' ' AND r2.fullname
<> ' '
AND ST_Intersects(r1.geometry, r2.Geometry) AND WITHIN(r1.Geometry,
r2.Geometry);
please note: there is absolutely no Spatial Index here (brute force
solution) !!!
you'll soon discover that this slightly modified query will complete
its execution
just in few minutes (once you've properly instructed SQL to ignore at
all the
"ugly no-name monster").
-------
my own SQL query
----------------
INSERT INTO intersections (id, Geometry)
SELECT NULL, ST_Intersection (r1.Geometry, r2.Geometry) AS g
FROM tl_rd13_37021_roads as r1, tl_rd13_37021_roads as r2
WHERE ST_Intersects(r1.geometry, r2.Geometry) = 1
AND ST_GeometryType(g) = 'POINT' AND r1.ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = 'tl_rd13_37021_roads'
AND search_frame = r2.Geometry)
AND r1.fullname <> r2.fullname;
few points worth to be noted:
1) I've directly used the input roads as they were; there is absolutely
no need to aggregate them by their FullName (and it makes any
further
step to be harder)
2) ST_Intersection (r1.Geometry, r2.Geometry) doesn't always return
a POINT; I noticed (at least on my sample dataset) that many
LINESTRING and MULTIPOINT were returned
3) so I've introduced the "AND ST_GeometryType(g) = 'POINT'" clause
in order to discard all intersections not being a simple Point.
4) I've still included the "AND r1.fullname <> r2.fullname" clause
simply to maintain strict conformance to your initial query, but
I've serious doubts about it, once considered how many "no name"
roads are included in TIGER datasets.