Multilingual maps


Emux

Jul 24, 2014, 2:34:55 AM
to mapsfo...@googlegroups.com
I had the chance lately to speak with a lot of people using vector maps and record their needs.

Another frequently requested feature is vector maps with multilingual labels, where the user selects the locale to display.
The map file currently holds a single "preferred" language for its labels, fixed when the file is built.

We can put more than one locale into the labels at the same time, but that overcrowds the map: the labels become large and many are dropped by collision detection.

Alternatively, an app can ship a separate map file for each needed language.
But that's cumbersome, even if we offer the user in-app map downloads per locale.

So what do you think about that feature?
I know it may involve changes to the map file structure, so it may not be trivial.

From end-user feedback, multilingual vector maps are the second most popular request, after the 'overlap' feature.

--
Emux
Cruiser - Atlas

Ludwig

Jul 24, 2014, 10:09:55 AM
to mapsfo...@googlegroups.com
I agree that that would be a most useful addition, but my feeling so far has been that if we decide to break the map file format, we should go for larger improvements than just this.

However, just looking at the file format https://code.google.com/p/mapsforge/wiki/SpecificationBinaryMapFile, I think there might be a way of accomplishing this without really breaking it.

The first thing of course would be to encode the names in multiple languages. Technically, this could be done by using some special character to indicate breaks between languages. Ideally that would be some whitespace character, so an app that is not language aware would simply use all the names consecutively without crashing (or a simple language-aware app would only use the first name). 
A language-aware app would split the names on this special character and use only the one at the index it is interested in. (So we would need to find a character that is not generally used in OSM names; with all the errors in the data that is difficult, I know.)

Secondly, we would need to encode which languages are being used in a file. Without that, individual apps with their own maps could of course know which languages are encoded, but it would severely obstruct the free exchange of maps between apps. But I think there is space in the header to encode the languages, e.g. in some form like en:fr:de, indicating that English is the first language, etc.

For most names there is only one language, unless different scripts such as Greek or Chinese are involved, so we should also think of an automatic fallback rather than encoding the same name over and over for the different languages.
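
A minimal sketch of the index-based lookup described in this post, assuming a tab as the separator and a header string such as en:fr:de. The class and method names are hypothetical and not part of Mapsforge; an empty slot or an unknown language falls back to the first name.

```java
import java.util.Arrays;
import java.util.List;

final class IndexedNames {
    // Assumption: a tab separates the per-language names; the thread had not agreed on a character yet.
    static final String SEPARATOR = "\t";

    /**
     * headerLanguages: language order as it might be stored in the header, e.g. "en:fr:de".
     * encodedNames: names in the same order, e.g. "London\tLondres\t".
     */
    static String pick(String headerLanguages, String encodedNames, String wanted) {
        List<String> languages = Arrays.asList(headerLanguages.split(":"));
        // Limit -1 keeps trailing empty slots so indexes stay aligned with the header.
        String[] names = encodedNames.split(SEPARATOR, -1);
        int index = languages.indexOf(wanted);
        if (index >= 0 && index < names.length && !names[index].isEmpty()) {
            return names[index];
        }
        // Automatic fallback: unknown language or empty slot yields the first name.
        return names[0];
    }
}
```

For example, IndexedNames.pick("en:fr:de", "London\tLondres\t", "de") falls back to "London" because the German slot is empty.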

Ludwig

--
You received this message because you are subscribed to the Google Groups "mapsforge-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mapsforge-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/mapsforge-dev/53D0A91F.2020909%40gmail.com.
For more options, visit https://groups.google.com/d/optout.

Emux

Jul 24, 2014, 12:10:00 PM
to mapsfo...@googlegroups.com
I like that method, Ludwig, especially if we don't need to modify the map file format.
I agree with the 1st step. But to avoid the 2nd step:
instead of identifying the names only by index, which requires storing the language order somewhere,
how about prefixing each localized name in the label with its locale?
en:hello|fr:bonjour|de:hallo
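
A rough sketch of how a reader could parse such a locale-prefixed label, assuming "|" between entries and ":" after the locale exactly as in the example above. PrefixedLabel is a hypothetical helper, not an existing Mapsforge class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixedLabel {
    /** Parses "en:hello|fr:bonjour|de:hallo" into a locale-to-name map, keeping the original order. */
    static Map<String, String> parse(String label) {
        Map<String, String> byLocale = new LinkedHashMap<>();
        for (String part : label.split("\\|")) {
            int colon = part.indexOf(':');
            // Entries without a locale prefix are skipped in this sketch.
            if (colon > 0) {
                byLocale.put(part.substring(0, colon), part.substring(colon + 1));
            }
        }
        return byLocale;
    }
}
```

Because every entry carries its own locale, no language order needs to be stored elsewhere, at the cost of a few extra bytes per name.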

linco matic

Aug 31, 2015, 11:18:50 PM
to mapsforge-dev
I have implemented multilingual maps for a project that I am working on.
Following a format similar to that described below, if the map is generated with no preferred language specified,
feature names which have localized alternatives are built as follows:

localname\ten:englishname\tfr:frenchname\tde:germanname ... etc.

There is no particular ordering to the languages within the feature name string, except that the local name is always first.
If a preferred language is specified, then the behavior is as before.

On the client side, I added a static member to Mapfile:

public static String preferredLang;


The user merely sets the Mapfile.preferredLang before reading map data, and the map reader fetches
the proper language out of the feature name string (or the local name if the specified language is not available). If you find these changes amenable,
I will make a pull request.
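
For illustration only, the reader-side lookup described above could look roughly like this. It is a sketch under the stated format (local name first, then tab-separated lang:name entries), not the code from the proposed pull request; TabbedFeatureName and extract are hypothetical names.

```java
final class TabbedFeatureName {
    /**
     * Returns the name in preferredLang if the feature name string contains it,
     * otherwise the local name, which is always the first entry.
     */
    static String extract(String featureName, String preferredLang) {
        String[] parts = featureName.split("\t");
        if (preferredLang != null) {
            String prefix = preferredLang + ":";
            // Index 0 is the local name, so localized entries start at index 1.
            for (int i = 1; i < parts.length; i++) {
                if (parts[i].startsWith(prefix)) {
                    return parts[i].substring(prefix.length());
                }
            }
        }
        return parts[0];
    }
}
```

A map built with a single preferred language contains no tabs, so the same call simply returns the stored name.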

~Lincomatic

Emux

Sep 1, 2015, 3:15:14 AM
to mapsfo...@googlegroups.com
Ludwig, what are your thoughts?

I prefer an implementation that doesn't involve altering the map file (header).
As mentioned above, having the languages in the name could solve that, even if map file size has to be taken into consideration.

BTW we could use escape / unescape techniques, so that delimiters which can also occur in OSM names remain usable.
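
One possible shape of such escape / unescape handling, as a sketch only: it assumes "|" as the delimiter and a backslash as the escape character, neither of which was agreed on in the thread, and DelimiterEscaping is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

final class DelimiterEscaping {
    static final char DELIMITER = '|'; // assumption, chosen only for this example
    static final char ESCAPE = '\\';   // assumption

    /** Writer side: prefixes every delimiter or escape character in a name with the escape character. */
    static String escape(String name) {
        StringBuilder out = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == DELIMITER || c == ESCAPE) {
                out.append(ESCAPE);
            }
            out.append(c);
        }
        return out.toString();
    }

    /** Writer side: joins escaped names with the delimiter. */
    static String join(List<String> names) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < names.size(); i++) {
            if (i > 0) {
                out.append(DELIMITER);
            }
            out.append(escape(names.get(i)));
        }
        return out.toString();
    }

    /** Reader side: splits on unescaped delimiters and removes the escape characters. */
    static List<String> split(String encoded) {
        List<String> names = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == ESCAPE && i + 1 < encoded.length()) {
                current.append(encoded.charAt(++i)); // take the next character literally
            } else if (c == DELIMITER) {
                names.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        names.add(current.toString());
        return names;
    }
}
```

With this, a name that itself contains "|" survives a round trip through join and split unchanged.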

--
Emux

Emux

Sep 1, 2015, 3:32:12 AM
to mapsfo...@googlegroups.com
On 01/09/2015 06:18 AM, linco matic wrote:
If you find these changes amenable, I will make a pull request.

If the code is ready, you can open a pull request against the latest dev branch so we can see the implementation and discuss further.

--
Emux

Ludwig

Sep 1, 2015, 3:39:53 AM
to mapsfo...@googlegroups.com

This is similar to something I had suggested IIRC a long while ago.
I vaguely remembered issues with the length of the combined name string, but double checking with the spec there should not be any.
Also in favour of not altering map file format, but I think we will still need to advance map file format number, so that map reader software will know if the format is multilingual.
Also, the preference will need to be passed to the map reader to actually render in the preferred language.
But that should be a separate PR and commit.


Emux

Sep 1, 2015, 3:44:03 AM
to mapsfo...@googlegroups.com
On 01/09/2015 10:39 AM, Ludwig wrote:

Also in favour of not altering map file format, but I think we will still need to advance map file format number, so that map reader software will know if the format is multilingual.


+1

--
Emux

linco matic

Sep 1, 2015, 2:26:56 PM
to mapsforge-dev
While this is possible, what advantage would be gained by this added complexity?

Emux

Sep 1, 2015, 2:30:56 PM
to mapsfo...@googlegroups.com
On 01/09/2015 09:26 PM, linco matic wrote:
While this is possible, what advantage would be gained by this added complexity?

It makes sure that the delimiter (which is just a string) cannot clash with something found in OSM names.
Are we sure such a delimiter exists?

I have used those techniques in the past with 100% success, but that's something to be seen in the end.

--
Emux

lincomatic

Sep 1, 2015, 2:37:38 PM
to mapsfo...@googlegroups.com

Any UTF-8 character that is blank or unprintable should not appear in an OSM name. I chose tab, but line feed and carriage returns are other examples of blank space that shouldn’t appear in a name. Unprintable characters such as SOH are even more unlikely. Even null works, but that will cause problems with C++ string functions.

Emux

Sep 1, 2015, 2:47:25 PM
to mapsfo...@googlegroups.com
Using unprintable ASCII as a delimiter is a valid (and quite old) idea.

It depends on the client's parsing: name="a\tb" in an OSM XML file could be (and has been seen to be) interpreted as the literal string "a\tb" rather than as "a b" (or something else) in many cases.

What I'm trying to say is that it all comes down to agreement between writer and reader (luckily we manage both here).

--
Emux

linco matic

Sep 1, 2015, 2:52:46 PM
to mapsforge-dev
I think I have been premature in suggesting that I submit a pull request at this point.
Now that I am processing more regions, I am running into some potential issues. Below is a sample feature name which I extracted from the data.
Issues:
- if all of the languages are included some names can be very large. The example below is 3664 bytes total. Sometimes, the name appears repetitively in the data. For instance, Harvard University is over 2K bytes and appears several times
- there are 3-letter codes included. From the wiki I expected only 2-letter or 2-letter with added qualifier (e.g. zh-py)
- there are inconsistencies in the language coding. For instance some data includes zh-py and some includes zh-pinyin, which are equivalent.
- I have a strong feeling that certain language codes appear very infrequently, such as be-x-old and zh-classical. It seems we need a filter on the command line to specify which languages to include in a multilingual map, rather than including all of them.


Long name (3664 bytes):
Ísland
af:Ysland
ak:Aesland
am:አይስላንድ
an:Islandia
ar:آيسلندا
az:İslandiya
ba:Исландия
be:Ісландыя
bg:Исландия
bi:Iceland
bm:Isilandi
bn:আইসলণৠড
bo:ཨ་ཨི་སི་ལནདà¼
br:Island
bs:Island
ca:Islàndia
ce:Исланди
co:Islanda
cs:Island
cu:Исландъ
cv:Исланди
cy:Gwlad yr Iâ
da:Island
de:Island
dv:Þ‡Þ¦Þ‡Þ¨Þ Þ°Þ Þ¦Þ‚Þ°Þ‘Þ¦Þ‚Þ°
dz:ཨཱའིས་ལེནཌ
ee:Aiseland nutome
el:Ισλανδία
en:Iceland
eo:Islando
es:Islandia
et:Island
eu:Islandia
fa:ایسلند
ff:Islannda
fi:Islanti
fo:Ísland
fr:Islande
fy:Yslân
ga:An Íoslainn
gd:Innis Tìle
gl:Islandia
gn:Iylanda
gu:આઇસલેન્ડ
gv:Yn Eeslynn
ha:Aisalan
he:איסלנד
hi:आइसलैंड
hr:Island
ht:Islann
hu:Izland
hy:Իսլանդիա
ia:Islanda
id:Islandia
ie:Island
io:Islando
is:Ísland
it:Islanda
ja:アイスランド
jv:Islandia
ka:ისლანდია
kg:Islande
ki:Aislandi
kk:Исландия
kl:Islandi
km:អ៊ីស្លង់
kn:ಠಸೠ‌ಲೠಯಾಂಡà³
ko:아이슬란드
ks:اَی٠سلینٛڑ
ku:Îsland
kv:Исландия
kw:Island
ky:Исландия
la:Islandia
lb:Island
lg:Ayisirandi
li:Iesland
ln:Isilandɛ
lo:ໄອສ໠ລນ
lt:Islandija
lv:Īslande
mg:Islandy
mi:Tiorangi
mk:Исланд
ml:ഠസൠ‌ലാനൠറàµ
mn:Исланд
mr:आइसलँड
ms:Iceland
mt:Islanda
my:အိုက်စလန်
na:Aiterand
nb:Island
ne:आइस्ल्याण्ड
nl:IJsland
nn:Island
no:Island
nv:Tin Bikéyah
oc:Islàndia
or:ଆଇସଲୠୟାଣୠଡ
os:Исланди
pa:ਆਈਸਲੈਂਡ
pl:Islandia
ps:آیسلینډ
pt:Islândia
qu:Islandya
rm:Islanda
rn:Ayisilandi
ro:Islanda
ru:Исландия
rw:Isilande
sa:आइसलैंड
sc:Islanda
se:Islánda
sg:Islânde
sh:Island
si:අයිස්ලන්තය
sk:Island
sl:Islandija
sm:Aiselani
sn:Iceland
so:Iislaand
sq:Islandë
sr:Исланд
ss:Echweni
st:Iceland
su:Islandia
sv:Island
sw:Aislandi
ta:ஐஸ்லாந்து
te:ఐస్లాండ్
tg:Исландия
th:ประเทศไอซ์แลนด์
ti:አይስላንድ
tk:Islandiýa
tl:Islandia
to:ʻAisilani
tr:İzlanda
tt:Исландия
ug:ئىسلاندىيە
uk:Ісландія
ur:آئس لینڈ
uz:Islandiya
vi:Iceland
vo:Lisladeän
wa:Izlande
wo:Islaand
yi:איסלאנד
yo:Oríláº¹Ì Ã¨de Aá¹£ilandi
zh:冰岛
zu:i-Iceland
ace:Islandia
als:Island
ang:Īsland
arz:ايسلاندا
ast:Islandia
bar:Island
bcl:Islanda
bpy:আইসল্যান্ড
bxr:Исланд
ceb:Islandya
chr:á §á á á “áŽ¸áŽ¯
ckb:ئایسلەند
crh:İslandiya
csb:Islandëjô
diq:İslanda
dsb:Islandska
eml:Islanda
ext:Islándia
frp:Islande
frr:Islönj
fur:Islande
gag:İslandiya
hak:Pên-tó
haw:‘Āina Hau
hif:Iceland
hsb:Islandska
ilo:Islandia
jbo:island
kaa:İslandiya
kab:Island
kbd:Ð˜Ñ Ð»Ñ Ð½Ð´
koi:Исланд
krc:Исландия
ksh:Ißland
lad:Islandia
lez:Исландия
lij:Islanda
lmo:Islanda
ltg:Īslandeja
mdf:Исланда
mhr:Исландий
mwl:Eislándia
nah:Iztlālpan
nds:Iesland
nov:Islande
nrm:Islaunde
pam:Islandya
pcd:Islinde
pih:Iseland
pms:Islanda
pnb:آئس لینڈ
pnt:Ισλανδία
rue:Ісланд
sah:Исландия
scn:Islandia
sco:Iceland
stq:Ieslound
szl:Islandyjo
tet:Izlándia
tpi:Aislan
udm:Исландия
vec:Islanda
vep:Islandii
vls:Ysland
war:Islandya
wuu:冰岛
xal:Ислгудин Орн
xmf:ისლანდია
zea:Iesland
nds-nl:Ieslaand
simple:Iceland
zh-yue:冰島
bat-smg:Islandėjė
fiu-vro:Island
roa-rup:Islanda
be-x-old:Ісьляндыя
roa-tara:Islanne
zh_pinyin:Bīngdǎo
zh-min-nan:Peng-tē
zh-classical:冰島
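
The language filter suggested in this post could be sketched as follows. The alias table is a hypothetical example built from the inconsistent codes seen in the data (zh-py, zh-pinyin, zh_pinyin); LanguageFilter is not an existing Mapsforge class, and the real writer option may look different.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class LanguageFilter {
    // Hypothetical alias table: maps variant spellings to one canonical code.
    static final Map<String, String> ALIASES = new HashMap<>();
    static {
        ALIASES.put("zh-py", "zh-pinyin");
        ALIASES.put("zh_pinyin", "zh-pinyin");
    }

    static String normalize(String code) {
        String lower = code.trim().toLowerCase(Locale.ROOT);
        return ALIASES.getOrDefault(lower, lower);
    }

    /** Keeps only the name:xx tags whose normalized language code is in the wanted set. */
    static Map<String, String> filter(Map<String, String> tags, Set<String> wanted) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> tag : tags.entrySet()) {
            String key = tag.getKey();
            if (!key.startsWith("name:")) {
                continue; // the plain "name" tag is handled separately by the writer
            }
            String language = normalize(key.substring("name:".length()));
            if (wanted.contains(language)) {
                kept.putIfAbsent(language, tag.getValue());
            }
        }
        return kept;
    }
}
```

Filtering at build time keeps rarely used codes such as be-x-old out of the map file unless someone asks for them explicitly.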

Emux

Sep 1, 2015, 2:57:24 PM
to mapsfo...@googlegroups.com
On 01/09/2015 09:52 PM, linco matic wrote:
It seems we need a filter on the command line to specify which languages to include in a multilingual map, rather than including all of them.

+1

--
Emux

linco matic

Sep 1, 2015, 2:58:43 PM
to mapsforge-dev
Ah OK, that makes sense. As long as the reader function is UTF8-aware, unprintable characters should be handled correctly.

linco matic

Sep 1, 2015, 4:21:37 PM
to mapsforge-dev
I just did some more research. According to Wikipedia:

https://en.wikipedia.org/wiki/UTF-8

"UTF-8 uses the codes 0–127 only for the ASCII characters. This means that UTF-8 is an ASCII extension and can be processed by software that supports 7-bit characters and assigns no meaning to non-ASCII bytes."

Therefore, the reader function doesn't even have to be UTF-8-aware, as long as the delimiter is an ASCII character that cannot appear in a feature name.
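
A small sketch of that property: in UTF-8 every byte of a multi-byte sequence has its high bit set, so scanning the raw bytes for an ASCII tab (0x09) can never cut a character in half. ByteLevelSplit is a hypothetical helper, and tab is assumed as the delimiter.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ByteLevelSplit {
    /** Splits UTF-8 encoded bytes on the ASCII tab byte without decoding them first. */
    static List<String> splitOnTab(byte[] utf8) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= utf8.length; i++) {
            // The end of the input closes the last part, a tab byte closes any other part.
            if (i == utf8.length || utf8[i] == '\t') {
                parts.add(new String(utf8, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        return parts;
    }
}
```

Each part is decoded only after the split, which is why the delimiter search itself needs no knowledge of the encoding.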

I am not sure how to share my current implementation just for discussion? Should I submit a PR for the purpose of discussion, even though we might decide to modify it?

linco matic

Sep 2, 2015, 1:34:32 AM
to mapsforge-dev
Implemented this today by extending the preferredLanguage parameter to allow multiple languages, e.g.

preferredLanguage=en:de:jp

The parameter is copied into the map header during the map generation. 

Emux

Sep 2, 2015, 2:15:05 AM
to mapsfo...@googlegroups.com
On 01/09/2015 11:21 PM, linco matic wrote:
I am not sure how to share my current implementation just for discussion? Should I submit a PR for the purpose of discussion, even though we might decide to modify it?

(Nothing is considered final; it will probably be modified before integration.)

So I suggest submitting a PR.
We can examine it, discuss it, improve or change it, and probably repost it.

--
Emux

Emux

Sep 2, 2015, 2:18:01 AM
to mapsfo...@googlegroups.com
On 02/09/2015 08:34 AM, linco matic wrote:
preferredLanguage=en:de:jp

Better to use "," as the separator: en,de,jp


The parameter is copied into the map header during the map generation.

Do we need to modify the map header after all?
Besides debug map file information, does it have other uses?

--
Emux

Ludwig

Sep 2, 2015, 10:16:18 AM
to mapsfo...@googlegroups.com
Just to summarize:
  1. use some non-printable character to separate languages (it just needs to be agreed on and be filtered out when processing names in the writer, just in case it happens to occur in OSM names)
  2. list the languages supported in the header of the map file (that way it is clear which languages are supported otherwise it would be necessary to scan a map file to find this out).
  3. have a command line argument for the writer that allows selecting languages. I think the biggest problem here is the quality of OSM data, with a very uneven and incoherent naming in different languages when it comes to non-Latin using areas. 
There is also one language not mentioned here which is the default language for the tag "name" without qualification. In most cases this language should be included. 

Ludwig



Emux

Sep 2, 2015, 12:44:42 PM
to mapsfo...@googlegroups.com
On 02/09/2015 05:16 PM, Ludwig wrote:
use some non-printable character to separate languages (it just needs to be agreed on and be filtered out when processing names in the writer, just in case it happens to occur in OSM names)

About filtering, can you elaborate how can be done without escape / unescape ?


have a command line argument for the writer that allows selecting languages.

You mean like the one mentioned above "preferred-languages=en,de,fr" ?


There is also one language not mentioned here which is the default language for the tag "name" without qualification. In most cases this language should be included.

Now when we define e.g. "preferred-language=en" we search for "name:en" and fallback to "name", which is quite useful in case of empty tags.

--
Emux

Ludwig

Sep 2, 2015, 1:26:09 PM
to mapsfo...@googlegroups.com
use some non-printable character to separate languages (it just needs to be agreed on and be filtered out when processing names in the writer, just in case it happens to occur in OSM names)

About filtering, can you elaborate how can be done without escape / unescape ?

In rare, but not impossible case OSM name has our separator in it, simply replace it with a space in the name (I do not think that OSM names allow non-printable chars, but at least in the more distant past I have seen lots of irregular stuff)
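
As a sketch of this replace-with-space approach (assuming a tab as the agreed separator; NameSanitizer is a hypothetical writer-side helper):

```java
final class NameSanitizer {
    // Assumption: the separator agreed between writer and reader is a tab.
    static final char SEPARATOR = '\t';

    /** Replaces any occurrence of the separator inside an OSM name with a plain space. */
    static String sanitize(String osmName) {
        return osmName.replace(SEPARATOR, ' ');
    }
}
```

Unlike escaping, this is lossy, but the reader needs no extra logic and the lost character was never meant to be in a name.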

have a command line argument for the writer that allows selecting languages.

You mean like the one mentioned above "preferred-languages=en,de,fr" ?

Yes. 

 
There is also one language not mentioned here which is the default language for the tag "name" without qualification. In most cases this language should be included.

Now when we define e.g. "preferred-language=en" we search for "name:en" and fallback to "name", which is quite useful in case of empty tags.

I think the "name" only tag will be the most frequent case, so my suggestion would be to have in the file 

name, name:en, name:fr etc 

and not 
name:en, name:fr   with the default name copied into every single slot there, which I think would generate an enormous amount of duplication of names.

Ludwig

Emux

Sep 2, 2015, 1:46:24 PM
to mapsfo...@googlegroups.com
On 02/09/2015 08:26 PM, Ludwig wrote:
I think the "name" only tag will be the most frequent case, so my suggestion would be to have in the file 

name, name:en, name:fr etc 

and not 
name:en, name:fr   with the default name copied into every single slot there, which I think would generate an enormous amount of duplication of names.

Besides the default "name", which I agree we should have too, there are 2 cases for locales:
- name:en / name:fr with fallback to name (our current implementation)
- clean name:en / name:fr, i.e. exactly what is found in the OSM tags is put in the map

As for the 1st case, the current fallback: I wouldn't vote for removing it completely; we should keep it at least as an option somehow.

--
Emux

linco matic

Sep 3, 2015, 6:41:26 PM
to mapsforge-dev


On Wednesday, September 2, 2015 at 10:26:09 AM UTC-7, Ludwig wrote:
use some non-printable character to separate languages (it just needs to be agreed on and be filtered out when processing names in the writer, just in case it happens to occur in OSM names)

About filtering, can you elaborate how can be done without escape / unescape ?

In rare, but not impossible case OSM name has our separator in it, simply replace it with a space in the name (I do not think that OSM names allow non-printable chars, but at least in the more distant past I have seen lots of irregular stuff)

OSM names shouldn't contain non-printable characters, but currently I'm using \r as the delimiter between names because it's impossible to put that character into a name in the XML file.
You make a good point, though, that there's a small possibility that a legitimate name could begin with, for instance, "en:", so using ":" as the delimiter between the language and the name is not a good idea.
I've changed my code to use \l instead, so a sample multilingual string would look like:

localname\ren\lenglishname\rfr\lfrenchname
 

have a command line argument for the writer that allows selecting languages.

You mean like the one mentioned above "preferred-languages=en,de,fr" ?

Yes. 

 
There is also one language not mentioned here which is the default language for the tag "name" without qualification. In most cases this language should be included.

Now when we define e.g. "preferred-language=en" we search for "name:en" and fallback to "name", which is quite useful in case of empty tags.

I think the "name" only tag will be the most frequent case, so my suggestion would be to have in the file 

name, name:en, name:fr etc 

and not 
name:en, name:fr   with the default name copied into every single slot there, which I think would generate an enormous amount of duplication of names.


Yes, this is how I'm doing it. When only 1 language is specified, the behavior is exactly the same as the current behavior. For instance, if the user specifies "preferred-language=en"
and an en name is found, it is included without any language tag, the same as current behavior.

If the user specifies more than 1 language, then "name" is always stored. For instance, with "preferred-language=en,de":
1. if only "k=name v=defaultname" appears in the data, then only "defaultname" is stored.
2. if only "k=name v=defaultname" and "k=name:en v=enname" appear, then "defaultname\ren:enname" is stored

if "preferred-language=*" then all of the names are stored.
if "preferred-language=-" then the behavior is the same as when preferred-language is absent from the command line 
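
The rules above can be sketched on the writer side roughly as follows. This is an illustration, not the implementation from the pull request: MultilingualNameWriter is a hypothetical class, tags are assumed to arrive as a key-to-value map, and the language/name separator follows the "defaultname\ren:enname" example (it is kept as a constant because the thread also mentions switching away from ":").

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class MultilingualNameWriter {
    static final char NAME_SEPARATOR = '\r'; // between names, as described in the post
    static final char LANG_SEPARATOR = ':';  // assumption, taken from the stored-string example

    static String build(Map<String, String> tags, String preferredLanguage) {
        String defaultName = tags.getOrDefault("name", "");
        // Absent or "-": only the default name is stored.
        if (preferredLanguage == null || preferredLanguage.equals("-")) {
            return defaultName;
        }
        // "*": the default name followed by every localized name found.
        if (preferredLanguage.equals("*")) {
            StringBuilder all = new StringBuilder(defaultName);
            for (Map.Entry<String, String> tag : tags.entrySet()) {
                if (tag.getKey().startsWith("name:")) {
                    append(all, tag.getKey().substring("name:".length()), tag.getValue());
                }
            }
            return all.toString();
        }
        List<String> languages = Arrays.asList(preferredLanguage.split(","));
        // One language: untagged localized name, falling back to the default name.
        if (languages.size() == 1) {
            return tags.getOrDefault("name:" + languages.get(0), defaultName);
        }
        // Several languages: default name first, then each requested name that exists.
        StringBuilder out = new StringBuilder(defaultName);
        for (String language : languages) {
            String localized = tags.get("name:" + language);
            if (localized != null) {
                append(out, language, localized);
            }
        }
        return out.toString();
    }

    private static void append(StringBuilder out, String language, String name) {
        out.append(NAME_SEPARATOR).append(language).append(LANG_SEPARATOR).append(name);
    }
}
```

Keeping the single-language case untagged means existing readers see exactly the strings they see today.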

I will post a couple of PR's in a little while so we can continue the discussion with an actual implementation to use as reference.

Ludwig

Sep 4, 2015, 12:47:19 AM
to mapsfo...@googlegroups.com
Maybe we could save a bit of space by storing only name when name:en == name, and so on. I am pretty sure there is a lot of duplication there.
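
A sketch of that space saving, assuming the reader falls back to the default name whenever a language is missing. NameDeduplicator is a hypothetical helper that would run in the writer before the names are joined.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class NameDeduplicator {
    /** Drops every localized name that is identical to the default name. */
    static Map<String, String> dropDuplicatesOfDefault(String defaultName, Map<String, String> localized) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : localized.entrySet()) {
            if (!entry.getValue().equals(defaultName)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }
}
```

In the Iceland sample above, many entries (cs, da, de, sv and others) equal "Island", so a comparison like this against a default of "Island" would remove them at no cost to the reader.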

I am in favour of wasting processing on the server to save space in the map file.

Ludwig


Emux

Sep 4, 2015, 2:12:38 AM
to mapsfo...@googlegroups.com
On 04/09/2015 07:47 AM, Ludwig wrote:
I am in favour of wasting processing on the server to save space in the map file.

+1

--
Emux

Emux

Sep 30, 2015, 2:28:24 PM
to mapsfo...@googlegroups.com
I published new Beta versions of Cruiser and Atlas with the latest Mapsforge dev, for broader branch testing.

- Both support multilingual vector maps via 'map language' menu
(already produced at our server)

- Option for preferred and local map language at the same time

- Mapsforge refactored gesture system + new gestures


Bonus:

- Atlas includes the new GraphHopper graph creation with a preferred language, see graph.properties
(glad to make this contribution towards multilingual graphs)

--
Emux