Mark points as non-removable prior to rttopo generalization?


Alessandro Donati

Jul 18, 2017, 6:56:49 AM
to SpatiaLite Users
Hi,

I'm attempting to use librttopo through SpatiaLite. The aim is to simplify/generalize a geotable while preserving its topology.

I have followed the tutorial:

- Import shapefile into a sqlite db
- SELECT CreateTopology('topo');
- SELECT TopoGeo_FromGeoTable('topo', 'main', 'geotable', NULL, 0, 512, -1);
- SELECT TopoGeo_ToGeoTableGeneralize('topo', 'main', 'geotable', NULL,'geotable_simple', 1.19);

So far so good.

My problem is that I have to process several shapefiles that have been clipped from a huge original shapefile.
The resulting simplified datasets, once reconstructed, may not perfectly match at borders where the original shapefile was clipped.

A solution could be to mark some points belonging to the contour of the clipped datasets as non-removable, so that, when generalization comes into play, these points are preserved even if Douglas-Peucker would remove them at the given tolerance.
Is there a TopoGeo_xxx function to perform this task?
 
I've attached a picture describing the problem.

The simplified area is in red, the original area in brown, and the neighbouring dataset in violet. Point 1 is the point that simplification removes (correctly, because it falls within the tolerance) but that I would like to mark as non-removable.

I'd be glad to hear a better solution if this is a totally wrong approach.

Thanks in advance

Alessandro
border_effect.png

a.fu...@lqt.it

Jul 18, 2017, 2:18:31 PM
to spatiali...@googlegroups.com
On Tue, 18 Jul 2017 03:56:48 -0700 (PDT), Alessandro Donati wrote:
> My problem is that I have to process several shapefiles that have
> been
> clipped from a huge original shapefile.
> The resulting simplified datasets, once reconstructed, may not
> perfectly match at borders where the original shapefile was clipped.
>

Hi Alessandro,

if I understand correctly, you've built an independent Topology for
each Shapefile, then you've extracted a generalized dataset from
each Topology, and finally you've reassembled all the generalized
datasets into a single dataset.

this doesn't seem to be a correct approach, because the aggregation
of two (or more) independent Topologies surely isn't a Topology.
in your specific case it's rather evident that the previous clipping
introduced many topological inconsistencies along the clipping lines;
the artifacts you are now noticing are simply the direct
consequence of such inconsistencies.

I'd personally adopt the following approach:

1. create a single Topology
2. import all your Shapefiles into this Topology
3. carefully check the Edges and Faces near the old clipping lines;
you'll probably have to remove several Edges so as to eliminate any
artifact badly introduced by the previous clipping.
4. once you've reconstructed a perfectly clean Topology you'll
finally be able to extract a correctly simplified dataset fully
respecting the underlying Topology (a minimal SQL sketch follows).
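a minimal sketch of the four steps in SQL ('merged' and 'merged_simple'
are placeholder table names, and Edge 1234 is purely hypothetical;
ST_RemEdgeModFace() is the SQL/MM primitive that drops an Edge and
heals the two Faces it separates):

SELECT CreateTopology('topo');
SELECT TopoGeo_FromGeoTable('topo', 'main', 'merged', NULL, 0, 512, -1);
-- step 3: after inspecting the Edges near the old clipping lines,
-- remove each spurious Edge you've identified, e.g.:
SELECT ST_RemEdgeModFace('topo', 1234);
-- step 4: extract the simplified dataset from the clean Topology
SELECT TopoGeo_ToGeoTableGeneralize('topo', 'main', 'merged', NULL, 'merged_simple', 1.19);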

note: defining "magic points" to be specifically ignored
by the Douglas-Peucker simplification algorithm can't be an
acceptable solution.
it obviously is an inelegant workaround intended to circumvent
problems caused by the poor quality of the input datasets.
in such cases the solution is never changing the algorithm; the
real solution is cleaning the data before processing them ;-)

bye Sandro

Alessandro Donati

Jul 19, 2017, 5:43:06 AM
to SpatiaLite Users
Hi Sandro,

thanks for your reply

what you say makes perfect sense to me. An obstacle is the huge size of the original shapefile (several GB), which makes importing the whole dataset into the topology a really slow process. That's why I had to clip it.

I've noticed that TopoGeo_FromGeoTable uses 25% of my CPU but not much RAM (~300MB). The process of importing a single (clipped) dataset is very slow.

Some test data:

Clipped shapefile dataset
Size: ~7MB
Polygons: 207
Nodes (total number of polygon vertices): 441586
CPU Time: user 1233.016304 sys 11.887276 on Intel Core i5-3320M 2.6 GHz, 8GB RAM (though a 32-bit process), SSD hard disk, Windows 7

based on your experience, is this performance reasonable given the dataset complexity, or should I investigate this slowness further? BTW, all components (sqlite, spatialite, librttopo, etc.) are built in release mode using the original makefile.vc settings.

Is there an alternative workflow to the one I have described (in terms of SELECT TopoGeo_xxx) to speed up the process, perhaps at the expense of more RAM?

Thanks

Ciao
Alessandro

mj10777

Jul 19, 2017, 5:53:58 AM
to SpatiaLite Users


On Wednesday, 19 July 2017 11:43:06 UTC+2, Alessandro Donati wrote:
> Hi Sandro,
>
> thanks for your reply
>
> what you say makes perfect sense to me. An obstacle is the huge
> size of the original shapefile (several GB) which makes importing
> the whole dataset into the topology a really slow process.

If this shapefile is complete and precise, it might be worth the effort to import it once.

Then any other geometry tables (with their metadata) that may not be precise could be exported with TopoGeo_ToGeoTable, so that the less precise geometries are replaced with the precise ones from the Topology (see the sketch below).
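Something along these lines might do it (just a sketch; 'less_precise' and 'less_precise_fixed' are placeholder table names, and the argument pattern mirrors the TopoGeo_ToGeoTableGeneralize call from the first message):

-- export the imprecise table, rebuilding its geometries from the Topology
SELECT TopoGeo_ToGeoTable('topo', 'main', 'less_precise', NULL, 'less_precise_fixed');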

Mark  

a.fu...@lqt.it

Jul 19, 2017, 6:31:22 AM
to spatiali...@googlegroups.com
On Wed, 19 Jul 2017 02:43:05 -0700 (PDT), Alessandro Donati wrote:
> Hi Sandro,
>
> thanks for your reply
>
> what you say makes perfectly sense to me. An obstacle is the huge
> size
> of the original shapefile (several GB) which makes importing the
> whole
> dataset into the topology a really slow process. That's why I had to
> clip it.
>
> I've noticed that TopoGeo_FromGeoTable draws 25% of my CPU but not so
> much RAM (~300MB).
>

Alessandro,

I see that you are using an Intel i5 CPU, which has 4 logical cores.
both SQLite and SpatiaLite are single-threaded, so a 25% workload
simply means that you are squeezing every possible bit of
computational power out of a single core.


> The process of importing a single (clipped) dataset is very slow.
>

Topology == boring slowness (but ultra-high quality)

this is always true in a general way, and the current implementation
of librttopo (the topology engine) could hardly be judged a
"fast performer".


> Some test data:
>
> Clipped shapefile dataset
> Size: ~7MB
> Polygons: 207
> Nodes (total number of polygon vertexes): 441586
> CPU Time: user 1233.016304 sys 11.887276 on Intel Core i5-3320M 2.6
> GHz, 8GB RAM (yet 32 bit process), SSD hard disk, Windows 7
>
> based on your experience, are these performances reasonable given the
> dataset complexity or should I investigate more about this slowness?
>

your figures aren't at all exceptional.
during the preliminary Topology testing we frequently encountered
really huge datasets (in the "many GB / millions of features" range)
requiring a full week to be loaded into a Topology :-D


> BTW, all components (sqlite, spatialite, librttopo, etc) are built in
> release mode using original makefile.vc settings
>
> Is there an alternative workflow to what I have described (in terms
> of
> SELECT TopoGeo_xxx) to speed up the process, at expense of more RAM
> perhaps?
>

yes, you can usefully try the more recent import interfaces
based on librttopo 1.1:

TopoGeo_FromGeoTableNoFaceExt()
and
TopoGeo_Polygonize()

they implement a two-stage import process:
1. first, all Nodes and Edges are imported, completely ignoring Faces.
2. then the Faces are built in a separate final step (example below).
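something along these lines (a sketch only, using the plain NoFace
variant and reusing the argument pattern of your original
TopoGeo_FromGeoTable call; please check the exact signatures for
your SpatiaLite version):

-- stage 1: import all Nodes and Edges, completely ignoring Faces
SELECT TopoGeo_FromGeoTableNoFace('topo', 'main', 'geotable', NULL, 0, 512, -1);
-- stage 2: build all the Faces in a single final pass
SELECT TopoGeo_Polygonize('topo');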

caveat: these new functions are noticeably faster, but they usually
require an impressive amount of RAM.
I suppose that using 64-bit software should be a practical requirement,
because 32-bit software is physically limited to a 2GB address space,
which on Windows platforms usually shrinks to 1.5GB or (quite easily,
if your address space is highly fragmented) to a mere 0.8GB.

bye Sandro

Alessandro Donati

Jul 19, 2017, 6:52:59 AM
to SpatiaLite Users
Hi Sandro,

very explanatory answer.

Thank you

Ciao
Alessandro

Alessandro Donati

Jul 26, 2017, 9:51:06 AM
to SpatiaLite Users
Hi Sandro,

just tried TopoGeo_FromGeoTableNoFace and TopoGeo_Polygonize in place of TopoGeo_FromGeoTable.

I have picked a small dataset just to see how it goes

TopoGeo_FromGeoTable -> CPU Time: user 451.108092 sys 3.915625

TopoGeo_FromGeoTableNoFace -> CPU Time: user 290.193060 sys 0.436803
TopoGeo_Polygonize -> CPU Time: user 0.202801 sys 0.046800

so a ~40% speed improvement (great!), but I don't see any difference in RAM usage between the two. I have used spatialite.exe, which still uses 25% CPU, but RAM stays stable at 20 MB in both cases.

Can you explain this? Will RAM only come into play when the dataset size grows, or am I missing something?

Thanks

Ciao,
Alessandro

a.fu...@lqt.it

Jul 26, 2017, 10:14:15 AM
to spatiali...@googlegroups.com
ciao Alessandro,

your first assumption is true: processing a small dataset can never
require too much RAM.
you need to test a big/huge dataset if you really want to see
your RAM exploding.

Just for the sake of pedantic precision:
- TopoGeo_FromGeoTableNoFace() has no special memory requirements;
it's just a simplified version of TopoGeo_FromGeoTable() that only
takes care of Nodes and Edges while completely ignoring Faces.
- it's TopoGeo_Polygonize() that could eventually require a big
RAM allocation in order to load all the Edges and rebuild all
the Faces. As a rule of thumb, you can expect the RAM allocation
to be more or less proportional to the number of Edges in your
Topology (see below).
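so, assuming the default naming convention by which a Topology named
'topo' stores its Edges in the 'topo_edge' table, a quick way to
estimate the expected allocation in advance is simply counting them:

SELECT Count(*) FROM topo_edge;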

bye sandro

Alessandro Donati

Jul 26, 2017, 10:38:40 AM
to SpatiaLite Users
I see, thank you

Ciao
Alessandro

