I want to extract cities, towns, villages, or other agglomerations of buildings where people live and work in Europe, yet I fail to do so

John Reese

Sep 29, 2020, 4:16:37 PM
to GeoNames
As the title says, I'm trying to extract cities, towns, villages, or other agglomerations of buildings where people live and work in Europe, yet I fail to do so.

I am using allCountries.txt, available from http://download.geonames.org/export/dump/.

This is the code that I have in a Jupyter notebook that I use for processing the aforementioned file:

import csv
import pandas as pd

country_city_pairs = []

# http://download.geonames.org/export/dump/timeZones.txt
country_codes = ["NL", "AD", "RU", "GR", "RS", "DE", "SK", "BE", "RO", "HU", "MD", "DK", "IE", "GI", "GG", "FI", "IM",
                 "TR", "JE", "PT", "SI", "GB", "LU", "ES", "MT", "AX", "BY", "MC", "NO", "FR", "ME", "CZ", "LV", "IT",
                 "SM", "BA", "MK", "BG", "SE", "EE", "AL", "UA", "LI", "VA", "AT", "LT", "PL", "HR", "CH"]

# http://www.geonames.org/export/codes.html
# PPLC = capital of a political entity, PPLS = populated places
feature_codes = ["PPLC", "PPLS"]

for chunk in pd.read_csv(
    'allCountries.txt',
    header=None,
    engine="python",
    sep="\t",
    names=[
        "geonameid",
        "name",
        "asciiname",
        "alternatenames",
        "latitude",
        "longitude",
        "feature class",
        "feature code",
        "country code",
        "cc2",
        "admin1 code",
        "admin2 code",
        "admin3 code",
        "admin4 code",
        "population",
        "elevation",
        "dem",
        "timezone",
        "modification date",
    ],
    quoting=csv.QUOTE_NONE,
    chunksize=1
):
    country_code = str(chunk["country code"].item())
    feature_code = str(chunk["feature code"].item())
    if ((country_code in country_codes) and (feature_code in feature_codes)):
        city = str(chunk["asciiname"].item())
        pair = (country_code, city)
        country_city_pairs.append(pair)

The above code (after additional processing) produces the following output, which clearly doesn't contain all the cities, towns and villages in Europe:

("AD, Andorra la Vella", "AD, Andorra la Vella"),
("AL, Tirana", "AL, Tirana"),
("AT, Vienna", "AT, Vienna"),
("AX, Mariehamn", "AX, Mariehamn"),
("BA, Sarajevo", "BA, Sarajevo"),
("BA, Jabukovacko jezero", "BA, Jabukovacko jezero"),
("BE, Brussels", "BE, Brussels"),
("BG, Sofia", "BG, Sofia"),
("BY, Minsk", "BY, Minsk"),
("CH, Bern", "CH, Bern"),
("CZ, Zasada", "CZ, Zasada"),
("CZ, Prague", "CZ, Prague"),
("DE, Kissingerhoefen", "DE, Kissingerhoefen"),
("DE, Berlin", "DE, Berlin"),
("DE, Kleinthann", "DE, Kleinthann"),
("DK, Copenhagen", "DK, Copenhagen"),
("DK, Vittrup", "DK, Vittrup"),
("DK, Langkaerparken", "DK, Langkaerparken"),
("DK, Skarrev", "DK, Skarrev"),
("EE, Tallinn", "EE, Tallinn"),
("ES, Madrid", "ES, Madrid"),
("ES, Partido Judicial de Bilbao", "ES, Partido Judicial de Bilbao"),
("ES, Fiame", "ES, Fiame"),
("ES, Urbanizacion Playa de Verdicio", "ES, Urbanizacion Playa de Verdicio"),
("ES, Ciudad Deportiva Benidorm", "ES, Ciudad Deportiva Benidorm"),
("ES, Talavera la Nueva", "ES, Talavera la Nueva"),
("ES, Buenas Noches", "ES, Buenas Noches"),
("FI, Helsinki", "FI, Helsinki"),
("FI, Koivupaeae", "FI, Koivupaeae"),
("FI, Puolimatka", "FI, Puolimatka"),
("FI, Vehkoja", "FI, Vehkoja"),
("FI, Hakala", "FI, Hakala"),
("FI, Krokby", "FI, Krokby"),
("FR, Paris", "FR, Paris"),
("FR, Bramans", "FR, Bramans"),
("FR, Hameau de Chambre-Fontaine", "FR, Hameau de Chambre-Fontaine"),
("GB, London", "GB, London"),
("GB, North Yorkshire", "GB, North Yorkshire"),
("GB, Crofton", "GB, Crofton"),
("GB, Little Germany", "GB, Little Germany"),
("GB, Castle Vale", "GB, Castle Vale"),
("GG, Saint Peter Port", "GG, Saint Peter Port"),
("GI, Gibraltar", "GI, Gibraltar"),
("GR, Loumas", "GR, Loumas"),
("GR, Elounda", "GR, Elounda"),
("GR, Doloi", "GR, Doloi"),
("GR, Athens", "GR, Athens"),
("GR, Zagorochoria", "GR, Zagorochoria"),
("GR, Askyfou", "GR, Askyfou"),
("GR, Viniani", "GR, Viniani"),
("HR, Zagreb", "HR, Zagreb"),
("HU, Budapest", "HU, Budapest"),
("IE, Dublin", "IE, Dublin"),
("IE, Ladestown", "IE, Ladestown"),
("IM, Douglas", "IM, Douglas"),
("IT, Rome", "IT, Rome"),
("IT, Fontanelle", "IT, Fontanelle"),
("IT, Castello", "IT, Castello"),
("IT, Piazza Del Popolo", "IT, Piazza Del Popolo"),
("JE, Saint Helier", "JE, Saint Helier"),
("LI, Vaduz", "LI, Vaduz"),
("LT, Vilnius", "LT, Vilnius"),
("LU, Luxembourg", "LU, Luxembourg"),
("LV, Riga", "LV, Riga"),
("MC, Monaco", "MC, Monaco"),
("MD, Chisinau", "MD, Chisinau"),
("ME, Podgorica", "ME, Podgorica"),
("MK, Skopje", "MK, Skopje"),
("MT, Valletta", "MT, Valletta"),
("NL, Amsterdam", "NL, Amsterdam"),
("NL, Kelpen-Oler", "NL, Kelpen-Oler"),
("NL, Hunnerberg", "NL, Hunnerberg"),
("NL, Oost West en Middelbeers", "NL, Oost West en Middelbeers"),
("NO, Oslo", "NO, Oslo"),
("NO, Sagvoll", "NO, Sagvoll"),
("PL, Warsaw", "PL, Warsaw"),
("PL, Plac Kosciuszki", "PL, Plac Kosciuszki"),
("PL, Twierdza Modlin", "PL, Twierdza Modlin"),
("PL, Gniewniewice Folwarczne", "PL, Gniewniewice Folwarczne"),
("PT, Lisbon", "PT, Lisbon"),
("PT, Pero Viegas", "PT, Pero Viegas"),
("PT, Bairro da Asseca", "PT, Bairro da Asseca"),
("RO, Bucharest", "RO, Bucharest"),
("RS, Belgrade", "RS, Belgrade"),
("RU, Moscow", "RU, Moscow"),
("SE, Stockholm", "SE, Stockholm"),
("SE, Stroemnaeset", "SE, Stroemnaeset"),
("SI, Ljubljana", "SI, Ljubljana"),
("SI, Piculini", "SI, Piculini"),
("SK, Bratislava", "SK, Bratislava"),
("SK, Nova Dubrava", "SK, Nova Dubrava"),
("SK, Sidlovec", "SK, Sidlovec"),
("SK, Okruhliak", "SK, Okruhliak"),
("SK, Kisvisk", "SK, Kisvisk"),
("SM, San Marino", "SM, San Marino"),
("TR, Ankara", "TR, Ankara"),
("TR, Goelet", "TR, Goelet"),
("TR, Bahcecik", "TR, Bahcecik"),
("UA, Kyiv", "UA, Kyiv"),
("VA, Vatican City", "VA, Vatican City"),

I have the feature codes in my feature_codes list, but let me remind you that I want only cities, towns, villages, or other agglomerations of buildings where people live and work in Europe. If I also put PPL in the feature_codes list, I get churches, hotels and other places, which I don't want.

Can anyone tell me what I'm doing wrong? My best guess is that I should add some feature codes, but I don't know which ones.
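
One way to narrow down which codes matter is to tally the (feature class, feature code) pairs that actually occur for the countries of interest. The following is only a sketch under assumptions: the helper names count_feature_codes and fake_row and the sample rows are made up for illustration, and it uses the standard csv module instead of pandas so the file can be streamed row by row.

```python
import csv
from collections import Counter

# Zero-based positions in a tab-separated GeoNames row
# (same column order as the `names` list in the code above).
FEATURE_CLASS, FEATURE_CODE, COUNTRY_CODE = 6, 7, 8


def count_feature_codes(lines, country_codes):
    """Count (feature class, feature code) pairs for the given countries."""
    counts = Counter()
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank or truncated lines instead of raising IndexError.
        if len(row) > COUNTRY_CODE and row[COUNTRY_CODE] in country_codes:
            counts[(row[FEATURE_CLASS], row[FEATURE_CODE])] += 1
    return counts


def fake_row(feature_class, feature_code, country_code):
    """Build a made-up 19-column line; columns not needed here stay empty."""
    fields = [""] * 19
    fields[FEATURE_CLASS] = feature_class
    fields[FEATURE_CODE] = feature_code
    fields[COUNTRY_CODE] = country_code
    return "\t".join(fields)


# Made-up rows, not real GeoNames records.
sample = [
    fake_row("P", "PPL", "AT"),
    fake_row("P", "PPL", "DE"),
    fake_row("P", "PPLA", "AT"),
    fake_row("S", "HTL", "AT"),
    fake_row("P", "PPL", "US"),
]

counts = count_feature_codes(sample, {"AT", "DE"})
for (feature_class, feature_code), n in counts.most_common():
    print(feature_class, feature_code, n)

# For the real file, pass an open file object instead of `sample`:
# with open("allCountries.txt", encoding="utf-8", newline="") as f:
#     counts = count_feature_codes(f, set(country_codes))
```

Rows whose feature class is not P (for example S, the class for spots and buildings) would show up in such a tally, which may help explain unexpected entries like churches or hotels.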

Peter Rose

Sep 29, 2020, 5:41:11 PM
to geon...@googlegroups.com, johnre...@gmail.com
I've been using the cities files, e.g., https://download.geonames.org/export/dump/cities15000.zip, https://download.geonames.org/export/dump/cities5000.zip, ..., to extract city info. I'm not sure whether they contain all the places you are looking for.
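
A minimal sketch of reading one of these archives, assuming the zip contains a tab-separated file named cities15000.txt with the same column order as allCountries.txt (an assumption worth checking against the readme in the dump directory). The names read_cities and fake_line and the in-memory demo data are made up, so the example runs without a download.

```python
import csv
import io
import zipfile

# Zero-based column positions, assuming the allCountries.txt layout.
ASCIINAME, COUNTRY_CODE, POPULATION = 2, 8, 14


def read_cities(zip_source, member, country_codes):
    """Return (country code, asciiname, population) tuples from a zipped dump."""
    with zipfile.ZipFile(zip_source) as archive, archive.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [
            (row[COUNTRY_CODE], row[ASCIINAME], int(row[POPULATION] or 0))
            for row in reader
            if len(row) > POPULATION and row[COUNTRY_CODE] in country_codes
        ]


def fake_line(name, country_code, population):
    """Build a made-up 19-column line; columns not needed here stay empty."""
    fields = [""] * 19
    fields[ASCIINAME] = name
    fields[COUNTRY_CODE] = country_code
    fields[POPULATION] = str(population)
    return "\t".join(fields)


# Build a tiny in-memory archive with made-up rows (not real GeoNames data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "cities15000.txt",
        fake_line("Exampletown", "AT", 20000) + "\n"
        + fake_line("Sampleville", "US", 30000) + "\n",
    )

cities = read_cities(buffer, "cities15000.txt", {"AT"})
print(cities)

# With a downloaded archive, pass its path instead of the buffer:
# cities = read_cities("cities15000.zip", "cities15000.txt", {"AT", "DE"})
```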

Here is a Jupyter Notebook I'm using:

-Peter

--
You received this message because you are subscribed to the Google Groups "GeoNames" group.
To unsubscribe from this group and stop receiving emails from it, send an email to geonames+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/geonames/d68e9809-192c-4221-be1d-1983c4f311a6n%40googlegroups.com.


--
Peter Rose, Ph.D.
Director, Structural Bioinformatics Laboratory
Lead, Bioinformatics and Biomedical Applications, Data Science Hub
Faculty Affiliate, Halıcıoğlu Data Science Institute
San Diego Supercomputer Center
UC San Diego

Marc Wick

Sep 30, 2020, 4:57:40 PM
to geon...@googlegroups.com, John Reese
Hi John

With PPLC and PPLS you will get a very reduced result, basically the
capital of each country. I would at least include PPLA (the seats of the
provinces, ADM1), maybe also PPLA2.
You won't get 'spots' with PPL; there must have been something wrong when
you tried this.
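
A rough sketch of this suggestion, under stated assumptions: it also checks the feature class column (P is the class for populated places) so that a code can never match a row from another class, the helper names and sample rows are made up, and the exact set of codes is only an example to adjust.

```python
import csv

# Column order of the GeoNames "geoname" table (same names as in the question).
COLUMNS = [
    "geonameid", "name", "asciiname", "alternatenames", "latitude",
    "longitude", "feature class", "feature code", "country code", "cc2",
    "admin1 code", "admin2 code", "admin3 code", "admin4 code",
    "population", "elevation", "dem", "timezone", "modification date",
]

# Assumption: this set of codes is an illustration, not a recommendation.
WANTED_CODES = {"PPL", "PPLA", "PPLA2", "PPLC", "PPLS"}


def populated_places(lines, country_codes, feature_codes=WANTED_CODES):
    """Yield (country code, asciiname) for class-P rows with a wanted code."""
    reader = csv.DictReader(
        lines, fieldnames=COLUMNS, delimiter="\t", quoting=csv.QUOTE_NONE
    )
    for row in reader:
        if (
            row["feature class"] == "P"
            and row["feature code"] in feature_codes
            and row["country code"] in country_codes
        ):
            yield row["country code"], row["asciiname"]


def make_row(name, feature_class, feature_code, country_code):
    """Build a made-up tab-separated line; unused columns are left empty."""
    fields = dict.fromkeys(COLUMNS, "")
    fields.update({
        "name": name,
        "asciiname": name,
        "feature class": feature_class,
        "feature code": feature_code,
        "country code": country_code,
    })
    return "\t".join(fields[column] for column in COLUMNS)


# Made-up rows, not real GeoNames records.
sample = [
    make_row("Capital City", "P", "PPLC", "AT"),
    make_row("Province Seat", "P", "PPLA", "AT"),
    make_row("Some Hotel", "S", "HTL", "AT"),
    make_row("Far Away Town", "P", "PPL", "US"),
]

pairs = list(populated_places(sample, {"AT"}))
print(pairs)

# For the real file, stream it instead of using `sample`:
# with open("allCountries.txt", encoding="utf-8", newline="") as f:
#     pairs = list(populated_places(f, set(country_codes)))
```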

Best Regards

Marc


John Reese wrote:
> As the title says, I'm trying to extract cities, towns, villages, or
> other agglomerations of buildings where people live and work in Europe,
> yet I fail to do so.
>
> I am using allCountries.txt available from here
> <http://download.geonames.org/export/dump/>.
> *Can anyone tell me what I'm doing wrong?* My best guess is that I
> should add some feature codes, but I don't know which ones.