comma in name in alternatenames

Andrew Dalke

Nov 7, 2009, 4:50:54 PM
to GeoNames
The geoname schema which describes cities1000.txt is at

http://download.geonames.org/export/dump/readme.txt

It says

alternatenames : alternatenames, comma separated varchar(4000)
(varchar(5000) for SQL Server)

The entry for geonameid 6292397 is for the city

Rüti / Dorfzentrum, Südl. Teil

That appears to be the same place as

http://en.wikipedia.org/wiki/Rüti,_Zürich

The entry contains a ",", so my parser, which splits on ",", gets
messed up. I'm going to change it to split only on a "," that is not
followed by a space.
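
For anyone who wants to try the same workaround, here is a minimal
sketch of that split in Python. It assumes that a separator comma is
never followed by a space and that a comma inside a name always is;
nothing in the readme guarantees this. The function name and the sample
field value are made up for illustration and are not the actual record
for geonameid 6292397.

```python
import re

# Split on a comma only when it is NOT followed by a space.
# Assumption: separator commas have no trailing space, while commas
# that belong to a name (", Südl. Teil") do.
_NAME_SEPARATOR = re.compile(r",(?! )")


def split_alternatenames(field):
    """Split a raw alternatenames value into individual names."""
    if not field:
        return []
    return _NAME_SEPARATOR.split(field)


# Hypothetical field value, invented for this example.
sample = "Rüti / Dorfzentrum, Südl. Teil,Rueti"
print(split_alternatenames(sample))
# → ['Rüti / Dorfzentrum, Südl. Teil', 'Rueti']
```

This is a heuristic only: a name containing a comma with no following
space would still be split in the wrong place.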

I don't know if this is a problem with the name (that it contains a
","), with the documentation (that it doesn't say how names which
contains commas are handled), or if it's that I shouldn't be using
that field and should instead use the alternateNames.txt file to get
these.

-- Andrew Dalke <da...@dalkescientific.com>

Marc Wick

Nov 8, 2009, 1:31:10 AM
to geon...@googlegroups.com
The alternatenames field is not meant to be parsed. If you want to know
the individual alternate names then you should use the alternatenames file.

Marc

Andrew Dalke

Nov 8, 2009, 10:03:23 AM
to GeoNames
On Nov 8, 7:31 am, Marc Wick <m...@geonames.org> wrote:
> The alternatenames field is not meant to be parsed.

Then out of curiosity, why is that field present? And
may I ask that the documentation somewhere mention that?

Are there other fields which should not be parsed?

> If you want to know the individual alternate names
> then you should use the alternatenames file.

I don't understand why those two sources are different.
Here are the alternatenames for Gothenburg, Sweden

G'oteborg,GOT,Gautaborg,Geteborga,Gjoteborg,Goeteborg,Goteborg,Goteburg,Gotemburgo,Gotenburg,Gothembourg,Gothenburg,Gothoburgum,Gottenborg,Göteborg,Gøteborg,Gēteborga,Γκέτεμποργκ,Гьотеборг,Гётеборг,גוטנבורג,イェーテボリ,哥德堡

There are 23 unique names in that list.

Gothenburg is geonameid 2711537 and it has 33 entries in
alternateNames.txt:

1235974 2711537 Goeteborg
1235975 2711537 Goteburg
1235976 2711537 Gothenburg
1235977 2711537 Gottenborg
1600989 2711537 da Göteborg
1600991 2711537 eo Göteborg
1600995 2711537 hu Göteborg
1600999 2711537 la Gothoburgum
1601003 2711537 nl Gotenburg
1601007 2711537 pt Gotemburgo
1601009 2711537 sv Göteborg
1600984 2711537 de Göteborg
1600986 2711537 es Gotemburgo
1600988 2711537 ca Göteborg
1600992 2711537 fi Göteborg
1600996 2711537 ia Göteborg
1601002 2711537 nds Göteborg
1601006 2711537 pl Göteborg
1634095 2711537 it Göteborg
2256568 2711537 no Gøteborg
1970271 2711537 is Gautaborg
2181201 2711537 iata GOT
1600987 2711537 bg Гьотеборг
1600990 2711537 el Γκέτεμποργκ
1600994 2711537 he גוטנבורג
1600998 2711537 ja イェーテボリ
1601000 2711537 lv Gēteborga
1601008 2711537 ru Гётеборг
1621129 2711537 zh 哥德堡
1600993 2711537 fr Gothembourg
1601005 2711537 no Göteborg 1 1
1600985 2711537 en Gothenburg 1 1
1600997 2711537 id Göteborg


of which 19 are unique. I see that "G'oteborg", which is
the first name of the alternatenames field of cities1000.txt,
is not in the alternateNames.txt file.

Knowing no better, I decided to use the alternatenames
from cities1000.txt because that one record (which is
where I live) had more alternate names, and for the
project I'm working on I wanted to maximize the
likelihood of getting a match.

Cheers,

-- Andrew Dalke <da...@dalkescientific.com>

Marc Wick

Nov 8, 2009, 11:03:05 AM
to geon...@googlegroups.com
The field is present because users have asked for it, to search on I
suppose. It is redundantly built from the alternate names info. The name
"G'oteborg" is an ASCII transliteration of the Bulgarian name.

You shouldn't parse anything that is not designed to be parsed. (or
parse it at your own risk).

Best

Marc

Andrew Dalke

Nov 8, 2009, 2:17:18 PM
to GeoNames
On Nov 8, 5:03 pm, Marc Wick <m...@geonames.org> wrote:
> You shouldn't parse anything that is not designed to be parsed. (or
> parse it at your own risk).

I understand that. I wrote the above to point out that there's nothing
which describes which fields are designed to be parsed and which are
not.

Specifically, http://download.geonames.org/export/dump/readme.txt
says:

Remark : the field 'alternatenames' in the table 'geoname' is a
short version of the 'alternatenames' table. You probably don't
need both. If you don't need to know the language of a name variant,
the field 'alternatenames' will be sufficient. If you need to know
the language of a name variant, then you will need to load the table
'alternatenames' and you can drop the column in the geoname table.

I did not need to know the language of the name variant, so I thought
I could use this field.

I also did not realize that some of the names are machine-generated
transliterations from other languages. I do not see that documented
anywhere, so I assumed there was some other data source involved.