Hi,
2013/8/18 Stephan75 <der.steph...@googlemail.com>:
> Quite nice feature!!
>
> But has anyone already tried to produce an output from such a nationwide obf
> file when processing it with inspector.bat -vaddress test.obf >output.csv ?
Yes, I did. For several files after many trials with different settings.
We use a different route to create these maps compared to the normal
maps route (see below).
I created the Germany address map directly and via the "address map"
route (also see below) and compared these as well.
They were not the same.
>
> When doing this with the one obf for Germany, I get a very big CSV file, but
> I have the feeling that this result file is NOT complete.
>
That feeling is correct, but it's not that simple.
Yes, there is data missing compared to the "normal" address maps:
around 2-3%. On the other hand, there are also some streets in the
address maps that are not in the normal maps.
And there are more duplicate streets in the address maps than in
the normal maps (as already mentioned by Christopher).
It all has to do with the (frustrating) inconsistency in the OSM maps.
This is partly due to history: changes in formats,
backward compatibility and so on.
I did many long, long runs trying all kinds of combinations.
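
One way to put numbers on such differences is to compare the two inspector CSV dumps line by line. The following is only a minimal sketch: `compare_address_csv` and the file names are hypothetical, and it assumes that the same street or address produces an identical line in both dumps, which may not hold exactly.

```shell
#!/bin/sh
# Hypothetical helper: compare the CSV dump of a directly generated map
# ($1) with the dump of a map built via the filtered route ($2).
compare_address_csv() {
    direct=$1
    filtered=$2
    tmp=$(mktemp -d) || return 1

    # comm needs both inputs sorted in the same collation order
    LC_ALL=C sort -u "$direct"   > "$tmp/direct.sorted"
    LC_ALL=C sort -u "$filtered" > "$tmp/filtered.sorted"

    # Lines present only in the direct map: lost on the filtered route
    missing=$(LC_ALL=C comm -23 "$tmp/direct.sorted" "$tmp/filtered.sorted" | wc -l | tr -d ' ')
    # Lines present only in the filtered map: added on the filtered route
    added=$(LC_ALL=C comm -13 "$tmp/direct.sorted" "$tmp/filtered.sorted" | wc -l | tr -d ' ')
    # Distinct lines that occur more than once in the filtered map
    duplicated=$(LC_ALL=C sort "$filtered" | uniq -d | wc -l | tr -d ' ')

    echo "missing: $missing"
    echo "added: $added"
    echo "duplicated: $duplicated"
    rm -rf "$tmp"
}

# Usage (file names are placeholders):
#   compare_address_csv germany_direct.csv germany_filtered.csv
```

Exact line matching will only catch exact duplicates; streets that differ slightly in spelling or in their city assignment would need a looser comparison.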
> There are only few tools than can display such big CSV files. Try Notepad++,
> or do a googling for "edit big CSV files" , there are some more tools.
>
Well, that's Windows of course ;)
> I only get a file with 252.579 lines ... and only for "Cities" in Germany
> ... I assume many other places are missing!
What do you mean by "I only get a file with 252.579 lines ... and
only for "Cities" in Germany"?
In the CSV you have City [City], but also City [TOWN], down to City
[HAMLET]. Or do you mean something else?
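
To check which place types actually occur in the CSV, the lines can be tallied per type. This is a rough sketch only: `count_place_types` is a hypothetical helper, and it assumes each place line contains the text `City [TYPE]` as described above; the exact inspector output format may differ.

```shell
#!/bin/sh
# Hypothetical helper: count lines per place type in an inspector CSV dump.
# Assumes place lines contain "City [TYPE]", e.g. "City [TOWN]".
count_place_types() {
    # Extract the bracketed type, normalise to upper case, then tally
    sed -n 's/.*City \[\([A-Za-z_]*\)].*/\1/p' "$1" \
        | tr '[:lower:]' '[:upper:]' \
        | sort | uniq -c | sort -rn
}

# Usage (file name is a placeholder):
#   count_place_types output.csv
```

If only one type shows up in the tally, the dump is probably incomplete; if towns, villages and hamlets appear as well, the file covers more than just cities.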
>
> Anyone to reproduce?
yes. :)
As mentioned before: It is extremely hard to produce an address map
directly from OsmAndMapCreator from the Germany.osm.pbf or
France.osm.pbf map. It is very CPU and memory intensive. That's why
another route has been chosen to get the data.
I did the "normal" route by creating a Germany address map from the
full Germany-latest.osm.pbf and I had to boot my 8GB i7 laptop to a
command prompt (without X, without (G)UI) to be able to generate the
map and it took 3½ days.
The OsmAnd server is about 4 times as fast, but it will still take a
lot of time for Germany alone.
The route to create these address maps is as follows:
==
osmconvert basemap.osm.pbf --out-o5m -o=basemap.o5m
osmfilter basemap.o5m \
  --keep="boundary=administrative addr:* place=* is_in=* \
highway=residential =unclassified =pedestrian =living_street \
=service =road =unclassified =tertiary" \
  --keep-ways-relations="boundary=administrative" \
  --keep-ways= --keep-nodes= --keep-relations= \
  --out-o5m > basemap_address.o5m
osmconvert basemap_address.o5m --out-pbf -o=basemap_address.osm.pbf
==
(Note that shell variables have been removed to simplify it a little.)
So we use osmfilter to get an address-only osm.pbf, which is fed into
OsmAndMapCreator to create the final address obf. During this
conversion some things get lost, some get duplicated and a few get added.
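
For anyone who wants to run the route for another country, the three commands can be wrapped with variables again. This is a sketch under assumptions, not the script used on the OsmAnd server: `build_address_pbf` and the derived file names are hypothetical, and only the osmconvert/osmfilter options quoted in this mail are used.

```shell
#!/bin/sh
# Hypothetical wrapper around the three commands from this mail.
# $1 is a country extract such as germany-latest.osm.pbf.
build_address_pbf() {
    in_pbf=$1
    base=${in_pbf%.osm.pbf}     # strip the extension for derived names

    # Same filter expression as quoted in the mail, split for readability
    keep="boundary=administrative addr:* place=* is_in=* \
highway=residential =unclassified =pedestrian =living_street \
=service =road =unclassified =tertiary"

    # 1. Convert to o5m, the format osmfilter reads
    osmconvert "$in_pbf" --out-o5m -o="$base.o5m" || return 1

    # 2. Keep only address-related objects
    osmfilter "$base.o5m" \
        --keep="$keep" \
        --keep-ways-relations="boundary=administrative" \
        --keep-ways= --keep-nodes= --keep-relations= \
        --out-o5m > "${base}_address.o5m" || return 1

    # 3. Convert back to pbf as input for OsmAndMapCreator
    osmconvert "${base}_address.o5m" --out-pbf -o="${base}_address.osm.pbf"
}

# Usage (file name is a placeholder):
#   build_address_pbf germany-latest.osm.pbf
```

The function stops at the first failing step, so a broken intermediate file is not passed on to the next command.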
To repeat the osmfilter command:
osmfilter basemap.o5m \
  --keep="boundary=administrative addr:* place=* is_in=* \
highway=residential =unclassified =pedestrian =living_street \
=service =road =unclassified =tertiary" \
  --keep-ways-relations="boundary=administrative" \
  --keep-ways= --keep-nodes= --keep-relations= \
  --out-o5m > basemap_address.o5m
The "is_in=*" filter is the biggest issue: it causes duplicate and
triplicate streets/addresses and is a remnant from "the old days", but
without it you only get 50% or so (can't remember exactly) of the data.
If anyone has ideas to improve it, please share your knowledge.
Harry