reduce number of populated places


Alois Treindl

Feb 12, 2022, 5:15:20 PM
to GeoNames

When trying to import lists of populated places from Geonames into my web application,
I am confronted with a "quantity problem".

For example, for Russia I get about 188'000 populated places. For Ukraine, I get 32'000 places. For Brazil, 52'000.
Very many have identical names, which makes it difficult for a user to select his home town or birth place from a list.

If we had population figures for all or at least the majority of places, this would help. One could just
select the places above a certain number of inhabitants, or offer them in the user interface ordered by population.
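A minimal sketch of the threshold-and-sort idea, assuming the standard tab-separated GeoNames dump (e.g. a per-country file such as `RU.txt`), where column 14 holds the population and 0 means "unknown". The helper names and the threshold value are hypothetical, chosen for illustration only.

```python
from typing import Iterable, List

# Column indices in the GeoNames "geoname" dump (tab-separated, no header row).
NAME, FEATURE_CLASS, POPULATION = 1, 6, 14


def parse_rows(lines: Iterable[str]) -> List[List[str]]:
    """Split raw dump lines into field lists, skipping blank lines."""
    return [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]


def population(row: List[str]) -> int:
    """Population as an int; missing or non-numeric values count as 0 (unknown)."""
    value = row[POPULATION].strip() if len(row) > POPULATION else ""
    return int(value) if value.isdigit() else 0


def places_above(rows: List[List[str]], min_population: int) -> List[List[str]]:
    """Populated places (feature class P) with population >= min_population,
    largest first. With min_population=0 every P row is kept and places with
    unknown population sort last (the sort is stable, so ties keep file order)."""
    hits = [r for r in rows
            if len(r) > FEATURE_CLASS and r[FEATURE_CLASS] == "P"
            and population(r) >= min_population]
    return sorted(hits, key=population, reverse=True)
```

For example, `places_above(parse_rows(open("RU.txt", encoding="utf-8")), 1000)` would keep only places with a known population of at least 1000. As the figures below show, with sparse population data a threshold drops most places, so ordering with `min_population=0` is the gentler option.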

But population data are too sparse for that.

For Russia, only 5'700 out of 188'000 towns have a population number.

For Brazil, there are only 103 towns with population numbers among the 52'000.
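Coverage figures like these can be checked per country with a short count over the dump. This is a sketch assuming the standard GeoNames column layout (country code in column 8, population in column 14, 0 or empty when unknown); the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Column indices in the GeoNames "geoname" dump.
FEATURE_CLASS, COUNTRY, POPULATION = 6, 8, 14


def population_coverage(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Per country code: (populated places with population > 0, all populated places).
    Only feature class P rows are counted."""
    total: Counter = Counter()
    with_pop: Counter = Counter()
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= POPULATION or cols[FEATURE_CLASS] != "P":
            continue  # blank/malformed line or not a populated place
        country = cols[COUNTRY]
        total[country] += 1
        pop = cols[POPULATION].strip()
        if pop.isdigit() and int(pop) > 0:
            with_pop[country] += 1
    return {cc: (with_pop[cc], n) for cc, n in total.items()}
```

Run over `allCountries.txt`, this would show at a glance which countries have enough population data for a population-based filter to be useful.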

Does anyone know a procedure to select the more important places from the huge dataset?

For example, if the surface area of towns were known from some other source, this could be a criterion.

The admin1 code can be resolved into a province name.
But that is often not sufficient for the user to pick a place.

For example, in Russia there are 37 towns named Afanosovo.
Only one of them has a population number (300).

4 are in admin1 region Ivanovo

6 are in admin1 region Moskovskaya

6 in Tverskaya

4 in Vologda

5 in Jaroslavl.
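A tally like the one above can be produced by resolving admin1 codes through `admin1CodesASCII.txt` (tab-separated: `CC.code`, name, ASCII name, geonameid) and counting same-name places per region. This is a sketch under that assumption; the helper names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Column indices in the GeoNames "geoname" dump.
NAME, FEATURE_CLASS, COUNTRY, ADMIN1 = 1, 6, 8, 10


def load_admin1_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map keys of the form 'CC.admin1code' to region names (from admin1CodesASCII.txt)."""
    names: Dict[str, str] = {}
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) >= 2:
            names[cols[0]] = cols[1]
    return names


def same_name_by_region(place_lines: Iterable[str], admin1_names: Dict[str, str],
                        name: str) -> Counter:
    """Count populated places called exactly `name`, per admin1 region.
    Codes that cannot be resolved are kept under their raw 'CC.code' key."""
    counts: Counter = Counter()
    for line in place_lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= ADMIN1 or cols[FEATURE_CLASS] != "P" or cols[NAME] != name:
            continue
        key = f"{cols[COUNTRY]}.{cols[ADMIN1]}"
        counts[admin1_names.get(key, key)] += 1
    return counts
```

Any region with a count above 1 is still ambiguous after resolving admin1, which is exactly the problem described here; those places need some further distinguishing attribute.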

Any suggestions welcome.



Barry Hunter

Feb 12, 2022, 5:31:57 PM
to geonames



> Does anyone know a procedure to select the more important places from the huge dataset?

But then what if the user was actually looking for one of the 'smaller' places? 

Surely the user needs to be able to see all the results, to be able to select the right one!?

If you filter the results, you might be filtering the one someone needs/wants. 

 


> For example, in Russia there are 37 towns named Afanosovo.


Only lists two. 

Ah, maybe you meant 

Yes, that is quite a lot of very similar results. 

 
Maybe you could offer a map to let the user select the right one? Kinda like




al...@astro.ch

Feb 13, 2022, 1:57:29 AM
to geon...@googlegroups.com
Where do the categories Hamlet and Village in the Nominatim user interface come from? Did I overlook this in the GeoNames data?

On 12.02.2022 at 23:31, Barry Hunter <barryb...@gmail.com> wrote:



ahmed amro

Feb 13, 2022, 11:59:44 AM
to GeoNames
GeoNames has many duplicates; you would have to explain how you use the data so I can help you sort this out.
You can ask your developer to sort the locations by population; this will make the more populous places appear first.
The other thing you can do depends on your search query requirements.
The hierarchy of GeoNames goes like this: PCLI > ADM1 > ADM2 > ADM3 > ADM4 > ADM5 > PPLC > PPLA > PPLA2 > PPLA3 > PPLA4 > PPL > PPLX > PPLL
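The populated-place part of this ranking can be turned into a sort key: feature-code rank first, then population (largest first) as a tie-breaker. This is a sketch, assuming the standard dump layout (feature code in column 7, population in column 14); it covers only the PPL* codes, since PCLI and ADM1 to ADM5 are administrative divisions (feature class A), and it places PPLA between PPLC and PPLA2. The names `PPL_ORDER`, `rank_key` and `rank_places` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Column indices in the GeoNames "geoname" dump.
FEATURE_CODE, POPULATION = 7, 14

# Populated-place codes, most important first; unlisted codes (e.g. PPLH) sort last.
PPL_ORDER = ["PPLC", "PPLA", "PPLA2", "PPLA3", "PPLA4", "PPL", "PPLX", "PPLL"]
PPL_RANK = {code: i for i, code in enumerate(PPL_ORDER)}


def rank_key(row: List[str]) -> Tuple[int, int]:
    """Sort key: feature-code rank first, then larger population first (unknown = 0)."""
    rank = PPL_RANK.get(row[FEATURE_CODE], len(PPL_ORDER))
    pop = row[POPULATION].strip()
    population = int(pop) if pop.isdigit() else 0
    return (rank, -population)


def rank_places(lines: Iterable[str]) -> List[List[str]]:
    """Parse dump lines and order them by rank_key."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    return sorted(rows, key=rank_key)
```

Because the feature code is present for every record, this ordering works even where population is missing, which is the common case in the countries mentioned at the start of the thread.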
Then you can also use the feature class to exclude places like parks, hospitals or administrative divisions, using the GeoNames codes at https://www.geonames.org/export/codes.html
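A minimal sketch of filtering by feature class and code, assuming the standard dump layout (feature class in column 6, feature code in column 7). Keeping only class P drops parks (class L, e.g. PRK) and hospitals (class S, e.g. HSP); the optional exclusion set and the function name are illustrative choices, not part of GeoNames.

```python
from typing import FrozenSet, Iterable, Iterator, List

# Column indices in the GeoNames "geoname" dump.
FEATURE_CLASS, FEATURE_CODE = 6, 7


def filter_by_feature(lines: Iterable[str],
                      allowed_classes: FrozenSet[str] = frozenset({"P"}),
                      excluded_codes: FrozenSet[str] = frozenset()) -> Iterator[List[str]]:
    """Yield rows whose feature class is allowed and whose feature code is not excluded."""
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= FEATURE_CODE:
            continue  # blank or malformed line
        if cols[FEATURE_CLASS] in allowed_classes and cols[FEATURE_CODE] not in excluded_codes:
            yield cols
```

Passing `excluded_codes=frozenset({"PPLX", "PPLH", "PPLQ", "PPLW"})` would also drop sections of populated places and historical, abandoned and destroyed places. Whether that is appropriate depends on the application: a birth place may well be a historical or abandoned place today.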

Do you need only locations, or what would the user do? Could you please explain more about what is expected from the search query?

Barry Hunter

Feb 13, 2022, 12:14:33 PM
to geonames
On Sun, Feb 13, 2022 at 6:57 AM <al...@astro.ch> wrote:
> Where do the categories Hamlet and Village in the user interface of nominatim come from? Did I overlook this in the geonames data?

Ah, Nominatim uses OSM data, not GeoNames!

It was just used as an example of something that presents the data on a map. The places in the GeoNames database may or may not be represented in OSM.


The point about there being duplicates in the GeoNames database is a good one. They most likely arose when multiple different databases were imported into GeoNames and the data was not merged completely, possibly because each original gazetteer had different coordinates (perhaps just low-resolution positions rather than technically wrong ones).

Alas, these aren't 'easily' resolved, possibly because if they were easily resolvable as duplicates, that would already have happened in GeoNames itself.

... As GeoNames is a 'wiki'-style database, if you can help resolve duplicates, that should be done in the GeoNames database directly (so that the duplicates are cleaned up for everyone), rather than just hiding them at query time. The duplicates should really be deleted/merged explicitly; I'm not sure of the best way to help with that en masse.
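One possible first step towards cleaning up en masse is to list *candidate* duplicates for manual review and merging in GeoNames itself: records with the same name, country and admin1 code lying close together. This is a sketch under stated assumptions: the standard dump layout, a great-circle (haversine) distance, and an arbitrary 2 km threshold. Same-name villages can legitimately be neighbours, so every flagged pair still needs a human check. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Column indices in the GeoNames "geoname" dump.
GEONAMEID, NAME, LAT, LON, COUNTRY, ADMIN1 = 0, 1, 4, 5, 8, 10
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))  # clamp rounding error


def candidate_duplicates(lines: Iterable[str], max_km: float = 2.0) -> List[Tuple[str, str, float]]:
    """Pairs (geonameid, geonameid, distance_km) sharing name, country and admin1
    code and lying within max_km of each other. Candidates only: review by hand."""
    groups: Dict[Tuple[str, str, str], List[Tuple[str, float, float]]] = defaultdict(list)
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) <= ADMIN1:
            continue  # blank or malformed line
        key = (cols[NAME], cols[COUNTRY], cols[ADMIN1])
        groups[key].append((cols[GEONAMEID], float(cols[LAT]), float(cols[LON])))
    pairs = []
    for members in groups.values():
        # Quadratic within a group, acceptable because same-name groups are small.
        for (id1, lat1, lon1), (id2, lat2, lon2) in combinations(members, 2):
            dist = haversine_km(lat1, lon1, lat2, lon2)
            if dist <= max_km:
                pairs.append((id1, id2, round(dist, 3)))
    return pairs
```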



 
But if they aren't "duplicates", i.e. they are really separate places that just share the same name, then each one should surely be presented to the user in a search.
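One way to present every genuinely separate place while keeping them distinguishable is to append the region name, rounded coordinates (which could also drive the map view suggested earlier in the thread) and the population where known. A sketch assuming the standard dump layout and an `admin1_names` dict keyed by `CC.admin1code` (for example loaded from `admin1CodesASCII.txt`); the label format and `display_label` are hypothetical.

```python
from typing import Dict, List

# Column indices in the GeoNames "geoname" dump.
NAME, LAT, LON, COUNTRY, ADMIN1, POPULATION = 1, 4, 5, 8, 10, 14


def display_label(cols: List[str], admin1_names: Dict[str, str]) -> str:
    """Label like 'Name (Region, 56.123, 36.457)', plus ', pop. N' when known.
    Unresolved admin1 codes fall back to the raw code."""
    region = admin1_names.get(f"{cols[COUNTRY]}.{cols[ADMIN1]}", cols[ADMIN1] or "?")
    # Three decimals is roughly 100 m, enough to tell neighbouring villages apart.
    label = f"{cols[NAME]} ({region}, {float(cols[LAT]):.3f}, {float(cols[LON]):.3f}"
    pop = cols[POPULATION].strip()
    if pop.isdigit() and int(pop) > 0:
        label += f", pop. {int(pop)}"
    return label + ")"
```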
