TWKB


Rinigus

Apr 24, 2017, 2:59:59 AM
to SpatiaLite Users
Hi,

in addition to the formats already available for storing geometries, it would be great to get support for TWKB (https://github.com/TWKB/Specification/blob/master/twkb.md). Compared with other formats (WKB, SpatiaLite "Compressed"), TWKB lets you reduce storage requirements significantly by taking into account the precision that you actually require. When I tested rendering Sweden on mobile using Mapnik (SQLite), I did not notice any slowdown with TWKB, yet the database shrank from 1.22GB (WKB) to 563MB (TWKB). TWKB is supported by PostGIS, and it would be appropriate to support it in SpatiaLite as well. As far as I could see from the source files, it is also supported by the RT Topology Library https://git.osgeo.org/gogs/rttopo/librttopo.

Best wishes,

Rinigus

rinigus

Apr 26, 2017, 2:21:30 AM
to spatiali...@googlegroups.com
As a continuation of the TWKB request:

for those wishing to test TWKB, I wrote a small utility that converts WKB to TWKB: https://github.com/rinigus/wkb2twkb-sqlite . TWKB is also available as a format for SQLite geometry in Mapnik (current master branch).

Rinigus

--
You received this message because you are subscribed to the Google Groups "SpatiaLite Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spatialite-users+unsubscribe@googlegroups.com.
To post to this group, send email to spatialite-users@googlegroups.com.
Visit this group at https://groups.google.com/group/spatialite-users.
For more options, visit https://groups.google.com/d/optout.

Jukka Rahkonen

Apr 27, 2017, 4:30:58 AM
to SpatiaLite Users
Hi,

What is the difference in the db size if you compare it to compressed SpatiaLite geometries? This document suggests that for lines and polygons the saving is about 50%: http://www.gaia-gis.it/gaia-sins/SpatiaLite-Geometries-Addendum.pdf. If the difference is not remarkably big, do you consider that TWKB has some other features that make it better than compressed SpatiaLite geometries?

-Jukka Rahkonen-



mj10777

Apr 27, 2017, 4:43:58 AM
to SpatiaLite Users


On Thursday, 27 April 2017 10:30:58 UTC+2, Jukka Rahkonen wrote:
> Hi,
>
> What is the difference in the db size if you compare it to compressed SpatiaLite geometries? This document suggest that for lines and polygons the saving is about 50% http://www.gaia-gis.it/gaia-sins/SpatiaLite-Geometries-Addendum.pdf. If the difference is not remarkable big, do you consider that TWKB has some other features that makes it better than compressed Spatialite geometries?

If you read the portion that explains where the 50% reduction comes from, it is clear that this is a 'lossy' format:
- it replaces 64-bit values (doubles) with 32-bit values (floats), which could cause loss of precision
-- for LINESTRING and POLYGON points (with the exception of the first and last point)

TWKB seems to offer a lossless solution.
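
The precision loss from replacing doubles with floats can be sketched in a few lines. This is only an illustration: the coordinate values are made up, and the helper `to_float32` is a hypothetical name, not part of SpatiaLite. It shows that a 32-bit float keeps far fewer digits of a full coordinate than of a small offset.

```python
import struct


def to_float32(value: float) -> float:
    """Round-trip a 64-bit Python float through a 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


lon = 18.0685808       # hypothetical longitude in degrees
delta = 0.0001234567   # hypothetical small offset between two vertices

# Error introduced when each value is squeezed into 32 bits.
abs_error = abs(to_float32(lon) - lon)
delta_error = abs(to_float32(delta) - delta)

print(f"full coordinate error: {abs_error:.3e} degrees")
print(f"small offset error:    {delta_error:.3e} degrees")
```

The error on the small offset is several orders of magnitude smaller than on the full coordinate, so how much precision is actually lost depends on what the 32-bit values represent.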

Mark

rinigus

Apr 27, 2017, 5:12:21 AM
to spatiali...@googlegroups.com
Hi,

in practice, TWKB is used as a lossy format as well. You specify the precision when compressing the data; the maximum precision supported by LWGEOM is 7 digits after the decimal point.

As for space savings, that depends very much on your data (as usual). In the following examples, I used OSM data and precision 6, which seems to be adequate for drawing road maps. Here, I compare the compression against WKB. To compare with SpatiaLite Compressed, I would have to take time to adjust my scripts.
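
What "precision" means here can be sketched as follows, assuming coordinates in degrees and the usual scale-and-round scheme (multiply by 10^precision, round to an integer). The function names, the sample latitude and the metres-per-degree constant are illustrative assumptions, not taken from any library.

```python
METRES_PER_DEGREE = 111_320  # rough length of one degree of latitude


def quantize(value: float, precision: int) -> int:
    """Scale a coordinate by 10**precision and round it to an integer."""
    return round(value * 10 ** precision)


def dequantize(stored: int, precision: int) -> float:
    """Recover the (rounded) coordinate from its stored integer."""
    return stored / 10 ** precision


lat = 59.3293217  # hypothetical latitude in degrees

for precision in (5, 6, 7):
    stored = quantize(lat, precision)
    restored = dequantize(stored, precision)
    # Rounding can move a coordinate by at most half a unit in the last digit.
    worst_case_m = 0.5 * 10 ** -precision * METRES_PER_DEGREE
    print(f"precision {precision}: stored {stored}, restored {restored}, "
          f"worst-case rounding error ~{worst_case_m:.3f} m")
```

Each extra digit of precision divides the worst-case rounding error by ten, at the cost of larger integers to store.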

Compared with the SpatiaLite compressed format, you can immediately see that TWKB is much less "chatty" in the header. There is an advantage in using it even for POINTs. For example, OSM building label coordinates (GEOMETRY BLOB) shrink by a factor of about 2 when converted from WKB (double-precision coordinates) to TWKB (48% of the original WKB size remains). I presume that all SpatiaLite formats are larger than WKB because of the included BBOX.

For LINESTRING and POLYGON, I usually observe a reduction of the geometry BLOBs in the range of 70-75% (the remaining size is 25-30%).

For Mapnik-ready datasets that include an RTree, labels, and some additional info, I get the following reduction (WKB -> TWKB, in MB):

Denmark: 759 -> 446 (60% remained, this country has rather few buildings entered as polygons, mainly label+point)
Estonia: 319 -> 151 (47% remained)
Sweden: 1251 -> 558 (45% remained)
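
The "remained" percentages can be recomputed from the sizes listed above; the snippet below is just that arithmetic. The Denmark figure comes out slightly below the rounded 60% quoted.

```python
# (WKB size, TWKB size) in MB, as listed above.
sizes_mb = {
    "Denmark": (759, 446),
    "Estonia": (319, 151),
    "Sweden": (1251, 558),
}

remaining = {
    country: 100 * twkb / wkb for country, (wkb, twkb) in sizes_mb.items()
}

for country, percent in remaining.items():
    print(f"{country}: {percent:.1f}% of the WKB size remains")
```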

As for additional advantage, TWKB is supported by Mapnik and PostGIS. Mapnik does not support Spatialite Compressed (at least not yet).

Rinigus



a.fu...@lqt.it

Apr 27, 2017, 6:08:13 AM
to spatiali...@googlegroups.com
On Thu, 27 Apr 2017 12:12:20 +0300, rinigus wrote:
> When compared with Spatialite compressed format, you could
> immediately
> see that TWKB is much less "chatty" in the header. There is an
> advantage even to use it for POINTs.
>

please note: including an explicit definition of the BBOX
is a strict requirement for all Geometry formats supported
by SpatiaLite, for at least two reasons:
1. it supports very fast spatial queries filtered by
an arbitrary BBOX when no SpatialIndex is available.
2. it allows to quickly check (or even rebuild) an
existing Spatial Index.
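
The first reason can be illustrated with a minimal sketch. The blob layout used here (four little-endian doubles followed by raw vertex pairs) is a hypothetical stand-in, not SpatiaLite's actual BLOB geometry format, and `make_blob` and `bbox_intersects` are names invented for this example. The point is only that a BBOX stored in the header lets a filter reject a geometry without decoding its vertices.

```python
import struct

# Hypothetical header: minx, miny, maxx, maxy as little-endian doubles.
BBOX = struct.Struct("<4d")


def make_blob(coords):
    """Build a toy geometry blob: BBOX header followed by the vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    header = BBOX.pack(min(xs), min(ys), max(xs), max(ys))
    body = b"".join(struct.pack("<2d", x, y) for x, y in coords)
    return header + body


def bbox_intersects(blob, query):
    """Test the stored BBOX against a query box; the vertices are never read."""
    minx, miny, maxx, maxy = BBOX.unpack_from(blob, 0)
    qminx, qminy, qmaxx, qmaxy = query
    return minx <= qmaxx and maxx >= qminx and miny <= qmaxy and maxy >= qminy


blobs = {
    "road": make_blob([(0.0, 0.0), (1.0, 2.0)]),
    "lake": make_blob([(10.0, 10.0), (11.0, 12.0), (10.5, 13.0)]),
}
query = (0.5, 0.5, 3.0, 3.0)

hits = [name for name, blob in blobs.items() if bbox_intersects(blob, query)]
print(hits)
```

The cost of the check is fixed (32 bytes read per geometry in this toy layout), however many vertices the geometry has.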

so a hypothetical implementation of TWKB on SpatiaLite
will certainly require expanding the header so as to
include the BBOX definition (unless we are planning
to completely rewrite from scratch a substantial
portion of the existing code, which obviously is a
hard and intensively time-consuming task, and which will
very probably introduce a lot of unexpected regressions,
putting the library's stability and robustness at
serious risk).

maybe I'm wrong, but it seems to me that the game is not
worth the candle, all the more so considering that the
expected size benefits aren't at all impressive when
compared to the already available "compressed geom".

what is realistically possible with reasonably
little effort is implementing two further SQL
functions: ToTWKB() and FromTWKB()
(strictly modeled on ToEWKB/FromEWKB, ToWKB/FromWKB
and the like).

this will be enough to create and populate pure
SQLite tables internally storing TWKB geometries;
such tables will not be "vanilla" spatial tables
(they will lack supporting triggers, spatial
index and the like), but will still support a
reasonable level of spatial data processing.
it surely is a compromise, but it's a reasonable
one.

bye sandro


mj10777

Apr 27, 2017, 6:11:27 AM
to SpatiaLite Users


On Thursday, 27 April 2017 11:12:21 UTC+2, Rinigus wrote:
> Hi,
>
> in practice, TWKB is used as a lossy format as well. You specify precision while compressing the data with the 7 digits after the comma maximal precision supported by LWGEOM.

Yes, but the point is that one could store it without loss.
More than 7 digits is basically unrealistic other than for permanent storage.

> As for space savings, that is very much dependent on your data (as usual). In the following examples, I used OSM data and precision 6 which seems to be adequate for drawing road maps. Here, I compare compression with WKB. To compare with Spatialite Compressed, I would have to take time to adjust scripts.
>
> When compared with Spatialite compressed format, you could immediately see that TWKB is much less "chatty" in the header. There is an advantage even to use it for POINTs. For example, OSM building label coordinates (GEOMETRY BLOB) are compressed from WKB (double fp coordinates) to TWKB with the factor of 2 (48% remained from original WKB). Compared to WKB, I presume that all Spatialite formats are larger due to included BBOX.

The BBOX is helpful for many other reasons:
- it avoids reading all the data to retrieve the BBOX
-- that would be 4 doubles
Plus integers for srid, DimensionModel and DeclaredType, and pointers to the internal structures
- most of which TWKB also has (BBOX, DimensionModel and DeclaredType as optional), along with pointers to the internal structures
-- no direct srid storage support [bit 6 in the Metadata Header could be used for that]

So, other than the 3 optional fields and the missing srid (which are mandatory in SpatiaLite),
- the functionality of the structures is similar in nature

POINTS are not 'compressed' at all.

> When we talk about LINESTRING and POLYGON, I usually observe compression of Geometry BLOBs in the range of 70-75% (remaining size is 25-30% for geometry blobs).
>
> When talking about Mapnik-ready datasets that include RTree, labels, and some additional info, I get reduction (from WKB -> TWKB in MBs):
>
> Denmark: 759 -> 446 (60% remained, this country has rather few buildings entered as polygons, mainly label+point)
> Estonia: 319 -> 151 (47% remained)
> Sweden: 1251 -> 558 (45% remained)
>
> As for additional advantage, TWKB is supported by Mapnik and PostGIS. Mapnik does not support Spatialite Compressed (at least not yet).

It would be interesting to learn how many people actually use Un/CompressGeometry at all
- I don't remember anybody asking about it here in the last few years

Mark

rinigus

Apr 27, 2017, 6:46:49 AM
to spatiali...@googlegroups.com

> what is realistically possible with reasonable
> little effort is implementing two further SQL
> functions: ToTWKB() and FromTWKB()
> (strictly modeled on TOEWKB/FromEWKB, ToWKB/FromWKB
> and alike).

That's exactly what I was suggesting: getting ToTWKB() and FromTWKB(), not full SpatiaLite support. This would allow using SpatiaLite for data processing and then converting the result into an SQLite database with BLOBs that can be rolled out, in my case on mobile, where storage is rather limited.

> this will be enough to create and populate pure
> SQLite tables internally storing TWKB geometries;
> such tables will not be "vanilla" spatial tables
> (they will lack supporting triggers, spatial
> index and alike), but will still support a
> reasonable level of spatial data processing.
> it surely is a compromise, but it's a reasonable
> one.

Agreed. And if needed, the geometry can easily be recovered with FromTWKB().

Rinigus 

Jaak L

Apr 28, 2017, 4:40:53 AM
to SpatiaLite Users
Hello,

I just happened to run some tests with point and polygon data, without attributes. My summary follows, with plain file sizes and zipped sizes in brackets. For e.g. HTTP transfer you usually have gzip in the transport layer, so the zipped size is relevant. I used QGIS and CARTO for converting/creating the files.
a) points: ~100K, world cities from geonames
  - geojson - 13.7M (1.5M zip)
  - geopackage - 11.5M (4.5M)
  - spatialite - 20.8M (5.8M)
  - shapefile - 5.9M (2.2M)
  - wkt - 12.7M (1.4M)
  - topojson - 5.1M (0.52M)
  - wkb/json (hex encode) - 10.2M (2.1M)
  - twkb/json (hex encode) - 6.3M (1.4M)

So TWKB is about ~50% of WKB, and the clear winner, to my surprise, is TopoJSON. The SQLite files are huge, probably because they are not optimized; perhaps they include big projection tables.

b) polygons: ~200 local admin area coverage, quite complex
 - geojson - 27.3M (7.2M)
 - geopackage - 20.3M (15M)
 - spatialite - 25.8M (15.4M)
 - shapefile - 21M (14.9M)
 - wkt - 42.3M (16.2M)
 - wkb/json - 40.2M (18M)
 - twkb/json - 5.2M (1.9M)

Here the WKB vs TWKB difference is already huge, almost 10x! The quite compact shape of the polygons, combined with zigzag encoding, probably explains the huge difference.
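
Why compact polygons compress so well can be shown with a minimal sketch of the coordinate encoding: scale to integers, store each vertex as the difference from the previous one, zigzag-encode the signed difference, and write it as a variable-length integer. This covers only the coordinate array, not a complete TWKB geometry (no type or metadata header), and the sample coordinates and function names are made up for illustration.

```python
def zigzag(n: int) -> int:
    """Map signed integers to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def varint(u: int) -> bytes:
    """Encode an unsigned integer in 7-bit groups, low group first."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_coords(coords, precision):
    """Delta + zigzag + varint encode a list of (x, y) pairs."""
    scale = 10 ** precision
    out = bytearray()
    prev_x = prev_y = 0
    for x, y in coords:
        qx, qy = round(x * scale), round(y * scale)
        out += varint(zigzag(qx - prev_x))
        out += varint(zigzag(qy - prev_y))
        prev_x, prev_y = qx, qy
    return bytes(out)


# Hypothetical vertices a few tens of metres apart, in degrees.
coords = [(24.7536, 59.4370), (24.7541, 59.4372), (24.7549, 59.4371)]

encoded = encode_coords(coords, precision=6)
wkb_coord_bytes = len(coords) * 16  # two 8-byte doubles per vertex in WKB

print(f"delta/varint: {len(encoded)} bytes, WKB doubles: {wkb_coord_bytes} bytes")
```

Only the first vertex needs several bytes per ordinate; every later vertex costs as little as its distance from the previous one allows, which is why long rings of closely spaced vertices shrink the most.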

TopoJSON for polygons depends on simplification. I used https://shancarter.github.io/distillery/ with different point reduction percentages: with 5% of the points the general view did not show significant distortions, while below that the data became clearly worse. Generally, the size reduces linearly:
 - topojson 'lossless' 100% points - 1.7M (225K)
 - topojson with 5% points - 105K (35K)

Here too TopoJSON is clearly way better than anything else, even with lossless parameters, due to the nature of admin areas, where most of the points and borders (except the outer outline) are duplicated in the simple geometry model.

So, looking at this, I'd forget about TWKB and go straight to a topology model if data size is your top priority. Also, both SQLite files are surprisingly big and therefore less suitable for compact transfer than even text-based formats.

I put my data here if someone wants to check some details: https://www.dropbox.com/s/5pjl5j9odkynxlj/point-polygon-encoding-tests.zip?dl=0

Jaak

Jukka Rahkonen

Apr 28, 2017, 5:24:25 AM
to SpatiaLite Users


On Friday, April 28, 2017 at 11:40:53 AM UTC+3, Jaak L wrote:

> So looking at this I'd forget about twkb and go straight to topo model, if the data size is your top priority. Also both sqlite files are surprising big and therefore less suitable for compact transfer compared to even text-based formats.

Jaak, I know that you are a professional, so you must have had a very bad day if you have been searching for minimal file size with fully populated metadata tables in SpatiaLite and GeoPackage.

The size of the Spatialite DB with one polygon and spatial_ref_sys table with all possible 4924 projections:
5591040 bytes
The size of the Spatialite DB with one polygon and spatial_ref_sys table with just one projection definition:
233472 bytes.
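
The two file sizes above are enough to estimate what the projection table costs. The snippet is plain arithmetic on those numbers; the per-row figure is an average that also absorbs page and index overhead, so treat it as a rough estimate.

```python
full_db_bytes = 5_591_040      # one polygon, all 4924 projections
minimal_db_bytes = 233_472     # one polygon, one projection

extra_rows = 4924 - 1
overhead_bytes = full_db_bytes - minimal_db_bytes

per_row_bytes = overhead_bytes / extra_rows
overhead_share = 100 * overhead_bytes / full_db_bytes

print(f"spatial_ref_sys overhead: {overhead_bytes} bytes")
print(f"average per projection:   {per_row_bytes:.0f} bytes")
print(f"share of the full file:   {overhead_share:.1f}%")
```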

The numbers for GeoPackage would probably be similar. If you compare just file size across file formats, you should definitely repeat your tests with a truncated spatial_ref_sys.

-Jukka Rahkonen-



 

a.fu...@lqt.it

Apr 28, 2017, 5:33:58 AM
to spatiali...@googlegroups.com
On Fri, 28 Apr 2017 01:40:53 -0700 (PDT), Jaak L wrote:
> So twkb is about ~50% of WKB, and clear winner to my surprise is
> topojson.
>

Hi Jaak,

not so surprising.
any topology will define just once any Edge separating two
adjacent Faces, and will thus require about half the storage
space required by the conventional representation of
Polygons.
anyway processing a topology is surely slower and requires
much more system resources; it's always a bargain between
space and speed.
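
The "about half" figure can be checked on an idealized case: a grid of adjacent unit squares, where every interior edge is shared by two polygons. This is a sketch under that assumption only; real topologies store arcs (runs of points between junctions) rather than single edges, and the function names are invented here.

```python
def square(ix, iy):
    """Ring of a unit square with its lower-left corner at (ix, iy)."""
    return [(ix, iy), (ix + 1, iy), (ix + 1, iy + 1), (ix, iy + 1)]


def ring_edges(ring):
    """Yield the undirected edges of a closed ring."""
    for a, b in zip(ring, ring[1:] + ring[:1]):
        yield frozenset((a, b))


def edge_counts(n):
    """Return (edges stored polygon by polygon, unique edges) for an n x n grid."""
    polygons = [square(ix, iy) for ix in range(n) for iy in range(n)]
    per_polygon = sum(len(ring) for ring in polygons)
    unique = {edge for ring in polygons for edge in ring_edges(ring)}
    return per_polygon, len(unique)


for n in (1, 2, 20):
    stored, unique = edge_counts(n)
    print(f"{n}x{n} grid: {stored} edges stored per polygon, {unique} unique "
          f"({100 * unique / stored:.1f}%)")
```

As the grid grows, the outer boundary matters less and the ratio approaches one half.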


> sqlites are huge probably due to not optimizisation, include
> big proj tables perhaps.
>

please, don't compare apples and oranges: all the others
are just "data formats" whilst SpatiaLite is a real
Spatial DBMS.

the specific goal of SpatiaLite is to ensure fast and
efficient spatial processing (usually involving large
datasets) in a safe and affordable environment.
it's strongly optimized on speed, stability and
robustness; storage space is not so relevant.

yet again: any optimization always requires a trade-off;
a family car is not necessarily better or worse than an
armoured fighting vehicle ... it depends on your
very specific requirements.

bye Sandro

Jukka Rahkonen

Apr 28, 2017, 6:55:57 AM
to SpatiaLite Users
Hi Jaak,

I found another place which feels like comparing apples to oranges. If you reduce the number of vertices to 5 percent in TopoJSON, you should test how much space the same simplified data takes in other formats.

I made a quick test by using your file "omav-all (5-pc).topojson.json" as source data for GDAL

G:\test\JaakL>ogr2ogr -f sqlite -dsco spatialite=yes -dsco init_with_epsg=no fromtopo5.sqlite "omav-all (5-pc).topojson.json"
G:\test\JaakL>ogr2ogr -f sqlite -dsco spatialite=yes -dsco init_with_epsg=no -lco compress_geom=yes fromtopo5compressed.sqlite "omav-all (5-pc).topojson.json"

The file sizes are

105442 omav-all (5-pc).topojson.json
491520 fromtopo5.sqlite
372736 fromtopo5compressed.sqlite

Compressed and simplified Spatialite database is still 3.5 times bigger than the topojson file but the difference is not at all as stunning as it was in your first numbers: 25.8 MB vs. 105 KB.

-Jukka Rahkonen-

Jaak L

Apr 28, 2017, 7:14:42 AM
to SpatiaLite Users


On Friday, April 28, 2017 at 11:33:58 AM UTC+2, sandro furieri wrote:
On Fri, 28 Apr 2017 01:40:53 -0700 (PDT), Jaak L wrote:
> So twkb is about ~50% of WKB, and clear winner to my surprise is
> topojson.
>

> not so much surprisingly.
> any topology will define just once any Edge separating two
> adjacent Faces, and will thus require about half the storage
> space required by the conventional representation of
> Polygons.

btw, this remark was about points. I still have no logical idea why a topology model should save 50% of the space for points: there are no shared edges, no duplicated points in the dataset, etc. Polygons, of course, are a different story, and there the topology reduction saved ~90% compared with the plain representation.
 
> sqlites are huge probably due to not optimizisation, include
> big proj tables perhaps.
>

> please, don't compare apples and oranges: all the others
> are just "data formats" whilst SpatiaLite is a real
> Spatial DBMS.

Sure, I was actually trying to find the relative (not really absolute) difference between WKB and TWKB with quick tools. As this is the SpatiaLite list, I did a simple default SpatiaLite export from QGIS just to get a rough relative idea out of curiosity; the export is not optimized in any way, and I don't know if it has indexes etc. The other formats were added just to give a bit of general background.

By the way, some GIS APIs do offer "geopackage" as one of their data export options (CARTO, GeoServer). Some may blindly assume that because it is binary data it must automatically be more efficient and smaller than text formats such as GeoJSON. But it seems that this is not really so; it depends on your data and requirements.

Still, if a universal data encoding were needed with data reduction in mind, then a topology model would make sense. In addition, I have now tried binary topo encoding with https://github.com/mapbox/pygeobuf on the same files:
a) points
 - geobuf - 3.1M (1.2M)

b) polygons
 - geobuf - 3.2M (2.9M)

Simplification in distillery and geobuf seems to use different parameters, so these are not directly comparable, but what is interesting here is that binary geobuf does not compress well (at least with default tools/settings), so my compressed TopoJSON is over 2x smaller than the compressed geobuf.

Put drastically, my original ~20M SpatiaLite file with polygons came down to ~200K (note: this is already before simplification!), which is a ~100x smaller file. I would not suggest that SpatiaLite should use a topology data model internally in a generic way, but if you design an overall solution then it is good to know anyway.

Jaak

rinigus

Apr 28, 2017, 7:31:57 AM
to spatiali...@googlegroups.com
Jaak,

Thank you very much for your analysis, it's very helpful!

I looked into converting SQLite/SpatiaLite into SQLite/TWKB. The sizes below are with the SPATIAL_REF_SYS and SPATIAL_REF_SYS_AUX data dropped. The results are:

Points (city-spatialite.sqlite)

TWKB: 9 MB / 3.9 MB (gzip)

With the following page usage:

*** Page counts for all tables with their indices *****************************

IDX_CITY_SPATIALITE_GEOMETRY_NODE................. 1196        52.3% 
CITY_SPATIALITE................................... 712         31.1% 
IDX_CITY_SPATIALITE_GEOMETRY_ROWID................ 319         13.9% 

So, most of the data is in the index in this case.
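
The page counts above can be cross-checked against the reported file size. The sketch assumes a 4096-byte page, which is SQLite's default for newly created databases since version 3.12.0; the actual page size of this file is not stated in the thread.

```python
PAGE_SIZE = 4096  # assumption: default SQLite page size, not confirmed for this file

# name: (pages, percent of file), as listed above.
page_counts = {
    "IDX_CITY_SPATIALITE_GEOMETRY_NODE": (1196, 52.3),
    "CITY_SPATIALITE": (712, 31.1),
    "IDX_CITY_SPATIALITE_GEOMETRY_ROWID": (319, 13.9),
}

# Estimate the total page count from the largest entry and its share.
pages, share = page_counts["IDX_CITY_SPATIALITE_GEOMETRY_NODE"]
total_pages = pages / (share / 100)
file_mb = total_pages * PAGE_SIZE / 1e6

# The IDX_* tables belong to the R*Tree spatial index.
index_pages = sum(p for name, (p, _) in page_counts.items() if name.startswith("IDX_"))
index_share = 100 * index_pages / total_pages

print(f"estimated total pages: {total_pages:.0f}")
print(f"estimated file size:   {file_mb:.1f} MB")
print(f"spatial index share:   {index_share:.1f}%")
```

Under that assumption the estimate is consistent with the 9 MB quoted, and about two thirds of the file is spatial index rather than geometry.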

Polygons (omav-all-spatialite.sqlite):

TWKB: 5.1 MB / 4.1 MB (gzip)

OMAV_ALL_SPATIALITE............................... 1226        95.7% 
SQLITE_MASTER..................................... 18           1.4% 
GEOMETRY_COLUMNS.................................. 3            0.23% 
IDX_OMAV_ALL_SPATIALITE_GEOMETRY_NODE............. 3            0.23% 

Here, it's mainly the data.

So the sizes (for the data) are a bit smaller than in the JSON representation, but not for the gzipped versions. For me, a spatial index is required; otherwise Mapnik would probably take forever to render. As far as I understand, to use TopoJSON efficiently I would have to store multiple objects in it that could share topologies, right?

Rinigus



Jukka Rahkonen

Apr 28, 2017, 7:41:04 AM
to SpatiaLite Users


On Friday, April 28, 2017 at 2:14:42 PM UTC+3, Jaak L wrote:

> Drastically taking my original ~20M spatialite with polygons came down to ~200K (note: this is already before simplification!), which is ~100x smaller file. I would not suggest that spatialite should use internally topo data model in a generic way, but if you design overall solution then it is good to know anyway.
>
> Jaak

How do you explain my numbers when I convert your 100% topojson back to SpatiaLite:

G:\test\JaakL>ogr2ogr -f sqlite -dsco spatialite=yes -dsco init_with_epsg=no -lco compress_geom=yes fromtopo100compressed.sqlite "omav-all (100-pc).topojson.json"
G:\test\JaakL>ogr2ogr -f sqlite -dsco spatialite=yes -dsco init_with_epsg=no fromtopo100.sqlite "omav-all (100-pc).topojson.json"

File sizes:
5857280 fromtopo100.sqlite
3170304 fromtopo100compressed.sqlite

I noticed also that you compare zipped topojson to unzipped Spatialite. The above databases zipped shrink to

1271116 fromtopo100.zip
436470 fromtopo100compressed.zip

-Jukka Rahkonen-

Jaak L

Apr 28, 2017, 8:39:29 AM
to SpatiaLite Users
Yep, the data in the TopoJSON really is quite different. See the maps in the attachments; it is not as "lossless" as I assumed from the converter UI. It depends on your target: if you use it to render a raster on screen at a specific resolution, then it might be OK; if you want to do some GIS with it, then you need to smooth it back (for polygons) or take the lower accuracy into account (for points).

Now, if rendering speed is important for you, then TopoJSON is probably much slower than TWKB, which is probably somewhat slower to read/decode than plain WKB. For interactive (i.e. multi-zoom) fast rendering we use vector tiles as the source (in MBTiles). For example, the same "omav" polygon data at a reasonable resolution (zoom range 0...10, 10 m accuracy) takes 4.7M (2.3M zipped) this way, which is not too bad (and 2x lower resolution already gives a 4x smaller size); e.g. Mapnik would render a tile in under 100 ms, independent of zoom level.


Jaak
 
topoimage.png
topoimage_points.png

rinigus

Apr 28, 2017, 8:59:27 AM
to spatiali...@googlegroups.com
Jaak,

thank you for the explanation! As far as I could see, on mobile the rendering speed in Mapnik was similar for WKB and TWKB (SQLite database). TWKB involves more calculations, but maybe something else is rate-limiting. I have not tried using MBTiles with Mapnik, mainly because of my limited understanding of how I am supposed to plug them into the Mapnik rendering pipeline. In addition, space-wise, it looks like SQLite/TWKB databases are of a similar size to the tiles on openmaptiles.org.

Best wishes,

Rinigus
 


a.fu...@lqt.it

Apr 28, 2017, 12:47:35 PM
to spatiali...@googlegroups.com
Hi List,

TWKB is now supported by libspatialite.
two new SQL functions have been added adopting the same
signatures as in PostGIS:

AsTWKB ( geom Geometry, precision_xy Integer,
precision_z Integer, precision_m Integer,
with_size Boolean, with_bbox Boolean )

GeomFromTWKB ( twkb Blob, srid Integer )

practical examples:
-------------------
CREATE TABLE test_twkb AS
SELECT a, b, c, ...., AsTWKB(geom) AS geom
FROM some_other_table;

CREATE VIEW view_twkb AS
SELECT a, b, c, ..., GeomFromTWKB(geom, 4326) AS geom
FROM test_twkb;

fearless testers can immediately build their own
executables by downloading the most recent sources
from the Fossil repository.

bye Sandro

rinigus

Apr 28, 2017, 3:44:01 PM
to spatiali...@googlegroups.com
Sandro,

thank you very much!

Rinigus
