unexpected aliases in alternateNames.txt


Andrew Dalke

Nov 7, 2009, 5:56:26 PM
to GeoNames
Both cities1000.txt and alternateNames.txt contain some alternate names
I did not expect. They are easy to find by grepping for " - " (space,
hyphen, space):


1894665 3165524 gl Turín - Torino
1972921 2802361 gl Bélxica - België
1986656 2525764 gl Agrixento - Agrigento
1977253 223816 gl Xibutí - Djibouti

It looks like there was a bad import of a set of Galician names.

There are also a few problems in other languages, such as

2014633 2787387 id Saint-Josse-ten-Noode - Sint-Joost-ten-Node
2013778 2783474 id Woluwe-Saint-Pierre - Sint-Pieters-Woluwe

and some with no language code at all, such as

2181169 1816670 Beijing - Pekin
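The same " - " scan can be broken down by language code, which separates the Galician batch from the scattered cases and the rows with no language. This is a minimal Python sketch, assuming alternateNames.txt is tab-separated with alternateNameId, geonameid, isolanguage and the alternate name as its first four columns; the function names are illustrative only.

```python
import collections
import sys
from typing import Iterable, Iterator

# Assumed layout of alternateNames.txt: tab-separated, with columns
# alternateNameId, geonameid, isolanguage, alternate name, ...
# Any further columns are ignored here.
SEPARATOR = " - "  # space, hyphen, space


def suspicious_rows(lines: Iterable[str]) -> Iterator[tuple[str, str, str, str]]:
    """Yield (alternateNameId, geonameid, isolanguage, name) for names containing ' - '."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed or truncated lines
        alt_id, geoname_id, lang, name = fields[:4]
        if SEPARATOR in name:
            yield alt_id, geoname_id, lang, name


def count_by_language(lines: Iterable[str]) -> collections.Counter:
    """Count suspicious names per language code; '' means no language code."""
    return collections.Counter(lang for _, _, lang, _ in suspicious_rows(lines))


# Usage (hypothetical script name): python scan_pairs.py alternateNames.txt
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for lang, n in count_by_language(f).most_common():
            print(f"{lang or '(none)'}\t{n}")
```

An empty language code is reported as "(none)", which is where entries like "Beijing - Pekin" would show up.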

-- Andrew Dalke <da...@dalkescientific.com>

Marc Wick

Nov 8, 2009, 1:32:51 AM
to geon...@googlegroups.com
please feel free to correct it.

Marc

Andrew Dalke

Nov 8, 2009, 10:20:33 AM
to GeoNames
On Nov 8, 7:32 am, Marc Wick <m...@geonames.org> wrote:
> please feel free to correct it.

There are 305 such Galician names that are obviously wrong, and they
can be fixed with something like

perl -pe 's/^(\d+\t\d+\tgl\t)([^\t\n]+?) - [^\t\n]*/$1$2/' alternateNames.txt

plus another three that I spotted by hand when scanning for a "-" in
the alternate name.

Shall I send in a diff against alternateNames.txt? While I now know
about the web-based interface for making corrections manually, I don't
really want to do 300+ edits by hand. If each takes 30 seconds, that's
over 2.5 hours of work.
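A diff like the one offered above could be produced mechanically. This is a minimal Python sketch, assuming the tab-separated alternateNames.txt layout (alternateNameId, geonameid, isolanguage, alternate name, ...) and that in each "X - Y" pair the Galician name X comes first; `fix_line` and `make_diff` are hypothetical helpers, and the three hand-spotted names are not covered.

```python
import difflib
import re
import sys
from typing import Iterable

# Matches a gl row whose name field is "Galician - foreign" and captures
# the row prefix (\1) and the Galician half (\2). The rest of the name
# field is dropped; any later tab-separated columns are kept as-is.
GL_PAIR = re.compile(r"^(\d+\t\d+\tgl\t)([^\t\n]+?) - [^\t\n]*")


def fix_line(line: str) -> str:
    """Keep only the Galician half of a 'Galician - foreign' gl name."""
    return GL_PAIR.sub(r"\1\2", line, count=1)


def make_diff(lines: list[str], path: str = "alternateNames.txt") -> Iterable[str]:
    """Return a unified diff (as lines) between the original and fixed file."""
    fixed = [fix_line(line) for line in lines]
    return difflib.unified_diff(lines, fixed, fromfile=path, tofile=path + ".fixed")


# Usage (hypothetical script name): python fix_gl.py alternateNames.txt > gl.diff
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.writelines(make_diff(f.readlines(), sys.argv[1]))
```

Rows in other languages, such as the id and unlabelled pairs, are left untouched, since the "Galician first" rule only applies to the gl batch.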

-- Andrew Dalke <da...@dalkescientific.com>

Marc Wick

Nov 8, 2009, 11:13:08 AM
to geon...@googlegroups.com
It will need someone who speaks Galego to determine which of the two
names is Galego and which is in another language.

Marc

Andrew Dalke

Nov 8, 2009, 2:40:06 PM
to GeoNames
On Nov 8, 5:13 pm, Marc Wick <m...@geonames.org> wrote:
> It will need someone who speaks Galego to determine which one of the two
> names is Galego and which one an other language.

That is of course your prerogative. I will point out that the relevant
Wikipedia pages (which is where I assume the names came from, since
the names there are also hyphenated) imply that the first name is
Galician and the second name is the 'foreign' name. For a clear example:

http://gl.wikipedia.org/wiki/Agrixento
> Agrixento - Agrigento
> Agrixento, Agrigento en italiano. Cidade capital da provincia de
> Agrixento, Sicilia, Italia.. 55.000 habitantes.

Given its similarity to Spanish and other Romance languages, that is
easily read as "Agrixento, Agrigento in Italian. Capital city of the
province of Agrixento, Sicily, Italy. 55,000 inhabitants."

Similarly, http://gl.wikipedia.org/wiki/Bélxica
> Bélxica - België
> O Reino de Bélxica (Koninkrijk België en neerlandés, Royaume de
> Belgique en francés e Königreich Belgien en alemán) é un país
> da Europa Noroccidental

Which would be "The Kingdom of Bélxica (Koninkrijk België in Dutch,
Royaume de Belgique in French and Königreich Belgien in German) is a
country in Northwest Europe."

http://gl.wikipedia.org/wiki/Tur%C3%ADn
> Turín - Torino
> Turín é unha comuna italiana, capital da rexión de Piemonte cunha poboación de 900.608 persoas

Translated: "Turín is an Italian municipality, capital of the Piemonte
region, with a population of 900,608 people."


I looked over all 300+ name pairs, and every one of them appears to
have the Galician name first and the foreign name second.

I assumed this was an import error from whatever the primary data
source was, and that the conversion was not done by someone who knows
the language.

Again, feel free to defer this transformation until someone who knows
Galician spots the error and is willing to report it, or until someone
like me doesn't mind changing 300+ values by hand. I did not know that
was a requirement.

Best regards,

-- Andrew Dalke <da...@dalkescientific.com>

Andrew Dalke

Nov 8, 2009, 3:52:01 PM
to GeoNames
On Nov 8, 7:32 am, Marc Wick <m...@geonames.org> wrote:
> please feel free to correct it.

Plus, I can't change the "Beijing - Pekin" entry, which gives two
different transliterations of the same name, because the site says:

The record you want to edit is locked for updates for userlevel1

Cheers,

-- Andrew Dalke <da...@dalkescientific.com>
