GeoCoding & Reverse GeoCoding for Android using Spatialite and OSM data

995 views
Skip to first unread message

Simone Giannecchini

unread,
Sep 24, 2013, 7:43:14 PM9/24/13
to spatiali...@googlegroups.com
Dear List,
I have done some cursory searching over the list & the internet on the
topic of my mail with partial success.

We are looking into building an offline geocoder/reverse geocoder for
Android using OSM data and we would like to use SpatiaLite as the
backend. I would happy if you could share
some thouhgts/advices/past experiences and so on about this topic. We
intend to open source this component in the end in case we will have
to develop it ourselves, in case you wanted to ask :)

Regards,
Simone Giannecchini
==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for
more information.
==

Ing. Simone Giannecchini
@simogeo
Founder/Director

GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 1660272
mob: +39 333 8128928

http://www.geo-solutions.it
http://twitter.com/geosolutions_it

-------------------------------------------------------

a.fu...@lqt.it

unread,
Sep 25, 2013, 3:32:11 AM9/25/13
to spatiali...@googlegroups.com
Hi Simone,

this one certainly is a big good new.

Just few worlds of social introduction to the benefit of the List:
Simone is a first class international developer deeply involved in
several open source geospatial projects, most notably GeoServer and
GeoTools; and he is an OSGeo charter member since 2007.

AFAIK there is no currently available implementation of geocoding
based on the top of SpatiaLite; and if I remember well this topic
was never discussed before on this ML.
Anyway I'm pretty sure that supporting geocoding should really
be a killer app, not only on Android but on classic desktop/server
platforms as well.

Coming to more technical details (just to start a discussion seed):

- one possible way could be the one to implement a self-standing
geocoding module, may well be in Java if the intended target is
mainly intended to be Android.
in this case I suppose that simply using via JDBC the already
existing SpatiaLite's own spatial SQL core should be reasonably
enough.
anyway a solution like this one isn't my preferred one, because
such a mixed-language solution could probably discourage many
Windows/Linux users, although it looks perfectly adequate to
Android's own platform requirements.

- a further variant could be always developing a self-standing
geocoding module, this time designed as a classic C library.
such an approach will surely make happier Windows/Linux users,
but will probably require some specific JNI module in order
to support Java integration on Android increasing the overall
complexity.

- and finally we could imagine a third solution: implementing
a VirtualGeoCoder module fully based on standard SQLite's
VIRTUAL TABLE mechanisms.

We already a working example of something like, i.e. the
VirtualNetwork module supporting Routing; at first glance
GeoCoding doesn't looks too much different, speaking in very
general terms.

in both cases (routing and geocoding) we have some physical
table containing our input data (road network vs house numbers);
the VirtualTable internal logic could then implement any required
process finally returning back the expected solution(e.g. shortest
path travel solution or geoconding answer).

the most obvious advantages of the VirtualTable approach are:
1) a VirtualTable module will be smoothly integrated within any
other "ordinary" main core SQL feature
2) consequently any exotic query (routing or geocoding) will then
simply default to a pretty standard SQL query just requiring
to execute an appropriate SELECT statement returning in turn
a plain and ordinary resultset.
3) and this reasonably seems to be the better way to deploy the
same identical features on all platforms avoiding at all to
introduce further language-specific oddities/idiosyncrasies

bye Sandro

JaakL

unread,
Sep 25, 2013, 4:11:41 AM9/25/13
to spatiali...@googlegroups.com, simone.gi...@geo-solutions.it
Hello,

The only sqlite-based opensource geocoding database project what I've found is http://sourceforge.net/apps/mediawiki/osm4j/index.php?title=Geocoding . This seems to be dead project through. Then there are postgis-based solutions like built-in tiger geocoder (but it seems to be quite TIGER-specific), and Nominatim of OSM, but this is huge thing by itself. If you find, or even create something more, then I'd be really happy to know also. 

I'm afraid a truly international geocoder is huge task by itself, maybe could start with general API framework, enable plug in of country-specific implementations of this, which can use general datasources like OSM or country-specific ones (think TIGER).

Jaak

Jukka Rahkonen

unread,
Sep 25, 2013, 5:59:17 AM9/25/13
to spatiali...@googlegroups.com
Hi,

Some quick thoughts:

- We should help users who do geocoding (search coordinates by attribute)
so that they can use partial strings and application could create
automatically a selection list of candidates. This would mean LIKE
queries. Is is still correct information that LIKE in SQLite cannot
utilize index and are tricks in these documents are still valid?
http://joshua.perina.com/africa/gambia/fajara/post/converting-to-sqlite-and-like-query-optimization
http://web.utk.edu/~jplyon/sqlite/SQLite_optimization_FAQ.html

- For reverse geocoding Spatialite should do a "Nearest One" or perhaps
"Nearest N" query (because first hit from OSM address data may be rubbish)
and such function does not exist yet. Something like the indexed nearest N
search in PostGIS
http://boundlessgeo.com/2011/09/indexed-nearest-neighbour-search-in-postgis/
would be nice to have and not only for addresses but also generally.

Creating the address database from OSM data will be painful. First simple
thing to do is to select points with some address tag from OSM data and
save them into database. Next, the same should be done for the polygons
because great part of addresses is put into building polygons. I have used
centroid function and appended those features into the same table than
addressed collected from points. I am not sure if lines may have usable
addresses but if they do then centroid might suit for those too.

This is where the problem begins. All kind of funny features in OSM can
have addresses: restaurants, shops, bus stops and so on. The same address
may be found from a building polygon and from several POIs inside the
building. Very many addresses are incomplete and they have only some of
the common address tags
http://wiki.openstreetmap.org/wiki/Proposed_features/House_numbers/Karlsruhe_Schema#Tags

In addition, addr:city is often in tag "is_in".

I believe that I would not rely too much on the "addr:city" and "is_in"
tags but I would search some good old polygon dataset about municipality
borders and feed in the addr:city data through a "within" spatial query.
At least I would use that for filling the missing data and for quality
control of existing tags.

I haven't been thinking at all about the interpolated street numbers and
other advanced features in http://wiki.openstreetmap.org/wiki/Addresses
Perhaps it would be good to materialize some interpolated address points
with for example "Line_Interpolate_Equidistant_Points" function. If there
will be a dedicated geocoding tool it might be made clever enough to place
a not-really-in-the-data address "Harbour Street 6" in between existing
"Harbour Street 4" and "Harbour Street 8" even without a materialized
point feature in the address db.

It should be possible to use easily also other address sources that OSM.
Some countries have already published official addressed as open data and
there are more to come. Real OSM mappers live in a faith that all data
become better when they are converted and imported into OSM but I often
prefer using the native data. It should be no problem to have several
address tables in Spatialite and let users to select which one(s) are used
by the geocoder tool if tables share a common schema.

I have Finnish OSM addresses in my WFS and they can be used for testing.
Here is a sample GetFeature with maxfeatures=10
http://hip.latuviitta.org/cgi-bin/tinyows?service=wfs&version=1.0.0&maxfeatures=10&request=getfeature&typename=lv:osm_address

I have also some other addresses. This query takes 10 from a complete,
high quality address data of City of Helsinki
http://hip.latuviitta.org/cgi-bin/tinyows?service=wfs&version=1.0.0&maxfeatures=10&request=getfeature&typename=lv:hki_osoiteluettelo

This zip should contain a few million Finnish addresses from the open data
by National Land Survey of Finland in a Spatialite database
http://latuviitta.org/documents/mtk_osoitteet_2012.zip
Accuracy of these addresses is not so good but it is good enough for playing.

-Jukka Rahkonen-


Andrea P.

unread,
Sep 25, 2013, 7:26:28 AM9/25/13
to spatiali...@googlegroups.com, jukka.r...@latuviitta.fi
Hi Jukka,

I search in the sqlite site:
http://www.sqlite.org/optoverview.html

the newest sqlite 3.8.0 has a new query optimizer.
It after some condition will use the indexes with the LIKE and GLOB (case sensitive version od LIKE).

>The left-hand side of the LIKE or GLOB operator must be the name of an indexed column with TEXT affinity.
(from http://www.sqlite.org/optoverview.html)
>The right-hand side of the LIKE or GLOB must be either a string literal or a parameter bound to a string literal that does not begin with a wildcard character.

So , I guess
this

value like '%name%'
don't use indexes.

also this
value like '%name'
don't use indexes.

instead
this other
val like 'name%'

use indexes.

Regards,

Andrea

JaakL

unread,
Sep 26, 2013, 12:25:38 AM9/26/13
to spatiali...@googlegroups.com, jukka.r...@latuviitta.fi

About text-based query: if you use FTS3/4 [1] indexes and MATCH instead of LIKE then you will get good performance. E.g. Android stock SQLite has FTS included, just make sure your Spatialite build has it also. 

However, I'm skeptical if you can have best performance and match quality with plain SQL. You need to handle multiple addressing approaches (range-based like America, point-based mostly in Europe, area-based in Japan etc), different placename hierarchy in almost any country, language-specific abbreviations (St, Rd etc), add guessing heuristics (order of elements, separators), handle keyboard-specific typo matches etc. Reverse-geocoding is simpler, there is no guessing part, but still you have all the the language/country based specifics. In SQL database terms it could be a very special kind of index at most, but more likely just an extension with custom functions like geocode(text,locale) and reverse_geocode(point,locale). For one specific country you can model it in DB tables, but not in global case.


Jaak

ps. I have also an country-wide official open address database in hand for my country Estonia. Not really big, about a million addresses as points, but I am allowed to share it. However, for generic case OSM would be probably better.

Brad Hards

unread,
Sep 26, 2013, 6:00:52 PM9/26/13
to spatiali...@googlegroups.com
On Wed, 25 Sep 2013 09:25:38 PM JaakL wrote:
> However, I'm skeptical if you can have best performance and match quality
> with plain SQL. You need to handle multiple addressing approaches
> (range-based like America, point-based mostly in Europe, area-based in
> Japan etc), different placename hierarchy in almost any country,
> language-specific abbreviations (St, Rd etc), add guessing heuristics
> (order of elements, separators), handle keyboard-specific typo matches etc.
I think this is probably the "interesting part" of the this problem, and its
obviously somewhat context dependent. For example, if you just want to be able
to put in an address and get a location, then the user is probably looking for
something local.

It would be easier if you can constrain the problem. In particular, if you can
get the user to do "pulldown" selection that matches the OSM data, it'll be
pretty easy (SELECT lon, lat WHERE ...). Obviously that may not match what you
need to provide.

If you're looking for something that can index larger text samples, and can
handle the memory usage, then https://github.com/Berico-Technologies/CLAVIN
might be of interest.

I'm also very interested to see the results, and willing to do (a little) work
on this if there is a shared repository.

Brad
Reply all
Reply to author
Forward
0 new messages