Tips on shrinking databases while still keeping most features?

40 views
Skip to first unread message

Nolan Darilek

unread,
Jun 16, 2016, 11:00:14 AM6/16/16
to spatiali...@googlegroups.com
I'm building a GPS app for blind users. It's a bit different than your
typical Android navigation app in that it doesn't just show you a map,
place a pointer representing you, then let you interpret the results.
Instead it actually determines what street/path/river you're likely on,
tells you, determines if you're 30-40 meters from an intersection, then
describes that intersection ("3-way intersection: Avenue A. crossing
Guadalupe St. on the right.")


So obviously this needs some level of geospatial analysis. I have PoC
code running with Mapsforge which, while not a geospatial store, gets me
to about 80-90% of where I need to be. It describes some intersections
incorrectly though, and I'm basically faking more advanced geospatial
support in a way that isn't entirely accurate.


One thing Mapsforge does get right, though, is file size. Texas' map is
something like 180MB. My Spatialite ogr2ogr import starts at 1.5 GB,
then balloons another 1.5 GB when I add routing. I use the routing
tables to identify intersections but would eventually like to provide
pedestrian routes.


I'd rather use Spatialite for a host of reasons, but does anyone have
tips for shrinking the database size? I recognize that comparing a
geospatial store to a tile format is apples to oranges, so this isn't a
Spatialite criticism at all.


One obvious idea would be stripping out unnecessary OSM data, but I'm
not sure what this might be. In particular I'd like to provide rich
coverage, so if you leave a road and step onto a trail through a park,
or hop in a kayak, you get the same descriptions of trails and waterways
that you'd get with streets. But are there any areas of OSM data that I
might trim and save space that I'm simply unaware of? Or is ogr2ogr not
efficient in some way that other importers are?


Also, are there any other ways other than OSM trimming? I do vacuum the
database, but maybe there are SQLite settings or ogr2ogr switches that
would reduce data sizes further?


Worst case, is there a spatial data format that is more compact? I'd
like to eventually offer map downloads, but my understanding is that
Texas' map is on the small size, and other country's province maps are
even larger due to denser OSM coverage.


Thanks!

Jukka Rahkonen

unread,
Jun 16, 2016, 2:25:54 PM6/16/16
to spatiali...@googlegroups.com, Nolan Darilek
Hi,

I would make some analysis with the data first:

- How many features you have in the database?
- What kind of features, do you really need them all?
- How dense the geometries are, could you remove some vertices by
simplification?
- Are there unnecessary attributes?

If ogr2ogr is used with the default osmconf.ini the resulting
Spatialite-db is roughly 5 times bigger than the .pbf file.
http://download.geofabrik.de/north-america/us/texas-latest.osm.pbf is
255 MB so you seem to get about the same ratio.

-Jukka Rahkonen-

mj10777

unread,
Jun 17, 2016, 12:17:04 AM6/17/16
to SpatiaLite Users


On Thursday, 16 June 2016 17:00:14 UTC+2, Nolan Darilek wrote:
I'm building a GPS app for blind users. It's a bit different than your
typical Android navigation app in that it doesn't just show you a map,
place a pointer representing you, then let you interpret the results.
Instead it actually determines what street/path/river you're likely on,
tells you, determines if you're 30-40 meters from an intersection, then
describes that intersection ("3-way intersection: Avenue A. crossing
Guadalupe St. on the right.")


So obviously this needs some level of geospatial analysis. I have PoC
code running with Mapsforge which, while not a geospatial store, gets me
to about 80-90% of where I need to be. It describes some intersections
incorrectly though, and I'm basically faking more advanced geospatial
support in a way that isn't entirely accurate.


One thing Mapsforge does get right, though, is file size. Texas' map is
something like 180MB. My Spatialite ogr2ogr import starts at 1.5 GB,
then balloons another 1.5 GB when I add routing. I use the routing
tables to identify intersections but would eventually like to provide
pedestrian routes.


I'd rather use Spatialite for a host of reasons, but does anyone have
tips for shrinking the database size
Have you experimented with the CompressGeometry(geometry) function?

For (MULTI) LINESTRINGs and POLYGONs only
- with the exception of the first/last POINTs of each LINESTRING or POLYGON-Ring
-- each X/Y[Z/M] value will be stored as a 4-byte float, instead of a 8-byte double

The reading of this 'compressed' Geometry, is dealt with internally by spatialite.

Never having used this before, I wrote this script that:
- cloned a TABLE with the MULTIPOLYGONs (with a LINESTRING and Ring representation of each geometry)
-- of 631 complicated district and postal-code borders

The resulting size was about 1/3 less than the file without 'compression' (see comments in script).

SELECT DateTime('now'),'clone.berlin_admin_db.sql [begin] -with "BEGIN" and "COMMIT"';
SELECT DateTime('now'),'sample: rm  berlin_polygons_minimal.db ; spatialite berlin_polygons_minimal.db < clone.berlin_polygons.sql';
SELECT DateTime('now'),'sample: rm  berlin_polygons_compressed.db ; spatialite berlin_polygons_compressed.db < clone.berlin_polygons.sql';
--- 
SELECT DateTime('now'),'Goals of this script is to:"';
SELECT DateTime('now'),'--> clone the TABLE/VIEWs created with [MULTILINESTRINGs, MULTIPOLYGONs and a POINT].';
---
SELECT DateTime('now'),'ATTACH DATABASE ''berlin_admin_geometries.db'' AS db_import';
ATTACH DATABASE 'berlin_admin_geometries.db' AS db_import;
SELECT DateTime('now'),'--> cloning the TABLEs "';
SELECT DateTime('now'),'--> 631 complicated borders MULTIPOLYGON''S, with (MULTI) LINESTRING representative, RING and center POINT';
SELECT CloneTable('db_import','berlin_polygons','berlin_polygons',1,"::with-foreign-keys::","::with-triggers::");
---
SELECT DateTime('now'),'DETACH DATABASE ''berlin_admin_geometries.db'' AS db_import';
DETACH DATABASE db_import;
SELECT DateTime('now'),'size of berlin_polygons_minimal.db [no compression]: 20544512 bytes [20.5 MB]';
---
SELECT DateTime('now'),'calling CompressGeometry on:';
SELECT DateTime('now'),'- (MULTI)- POLYGON and LINESTRING Geometries';
SELECT DateTime('now'),'-> with the exception of the first and last POINTs ';
SELECT DateTime('now'),'--> each POINT will be stored as a 32-bit float, instead of a 64-bit double ';
SELECT DateTime('now'),'---> reducing each X/X[Z/M] value by 4 bytes';
---
UPDATE berlin_polygons SET
 soldner_polygon=CompressGeometry(soldner_polygon),
 soldner_linestring=CompressGeometry(soldner_linestring),
 soldner_ring=CompressGeometry(soldner_ring);
---
VACUUM;
SELECT DateTime('now'),'size of berlin_polygons_minimal.db [with compression]: 13933568 bytes [13.9 MB] [difference: 6610944 bytes ca. 33%]';
-- 
SELECT DateTime('now'),'-I-> [clone.berlin_polygons.sql] [end]';
---

The 631 LINESTRINGs loaded into QGIS with no noticeable speed difference.


This functionality has existed, at least, since spatialite version 3.0.1.

The Database designer would call 'CompressGeometry' only once, as done in the above script.

If you are using the spatialite api directly, such as checking the returned geometry-type (expecting: GAIA_LINESTRING)
- the you will receive : GAIA_COMPRESSED_LINESTRING instead

But otherwise you should not have to deal with this at all.

Hope this helps.

Mark
Reply all
Reply to author
Forward
0 new messages