Paul,
in our app we use a combination of both scenarios in most cases. First of all, I typically distinguish between a service and an entity. The service object stores configuration shared across many entities and is often also used for access control. Each entity is linked to a service, where the service is either its parent or just a key property, depending on the frequent use cases and requirements for the entity, e.g. consistent reads, the limit of 25 entity groups inside a transaction, ancestor queries, and no more than 1 write op per second per entity group.
I don't know your frequent use cases and requirements. Maybe it makes sense to put all states, counties and cities into one entity group (per country). That would make a lot of sense if you need consistent queries, want the UPDATE API to accept huge batches of cities of the same county/state/country, and if write ops are rare. I also don't know whether counties or cities are ever restructured in the U.S. (it happens rarely in my country, but it can happen), and how your app would deal with that, since it would affect the key of such a city/county/state. I see advantages though, especially if your app needs to read all objects in the path each time.
So, in the service-entity approach, the UPDATE API uses both the ID of the service object and the keys of the entities to update:
- UPDATE API: service/{serviceId}/entities/{entityIdOrName}
With this approach, the request handler can get the service object and the entity object with the same read-op from datastore.
This also works fine with smaller batch update requests, where the entities are all in the same service. With all entities in the same entity-group, the batches could be much bigger than 25 entities per request:
- UPDATE API: service/{serviceId}/entities_approval
Request body:
{"entities": [entityUrlSafeKey1, entityUrlSafeKey2, entityUrlSafeKey3]}
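A rough sketch of why the parent relationship matters for batch size, assuming keys are modelled as flat tuples of (kind, id, kind, id, ...). The tuple representation and the helper names are made up for illustration and are not the Datastore API; the only thing taken from the discussion above is the limit of 25 entity groups per transaction.

```python
# Keys are modelled as flat tuples: (kind, id, kind, id, ...).
# This is a stand-in for real Datastore keys, not the Datastore API.

MAX_GROUPS_PER_TXN = 25  # entity-group limit per transaction mentioned above


def entity_group(key):
    """Return the root (kind, id) pair, which identifies the entity group."""
    return key[:2]


def fits_in_one_transaction(keys):
    """True if the keys touch no more than MAX_GROUPS_PER_TXN entity groups."""
    groups = {entity_group(k) for k in keys}
    return len(groups) <= MAX_GROUPS_PER_TXN


# 100 entities with the service as parent: all in one entity group.
same_group = [("Service", 1, "Entity", i) for i in range(100)]

# 100 root entities that only reference the service via a key property:
# every entity is its own entity group.
separate_groups = [("Entity", i) for i in range(100)]

print(fits_in_one_transaction(same_group))
print(fits_in_one_transaction(separate_groups))
```

With the service as parent, the batch size is no longer bounded by the number of entity groups, but every write then counts against the write rate of that single group.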
Back to your two scenarios: if you are fine with putting everything into big entity groups, and there will be only one city update per put request and not more than 1 update per country per second, your original entity-group design should be fine. In that case, I think that scenario 1 is preferable.
My thoughts: Your request handler only needs the city key and can easily decompose it, so it will be able to get the city, but also the county, state and country (if needed) with a single read op from Datastore. The same approach will also help your app when querying cities or using the Search API, because it can always decompose the key and read the parent objects in fewer read ops. I don't think the repetition of certain parts of the key path would be an issue, even if you needed to add the same keys as indexed key properties, e.g. to query for all cities in a state. However, if you have indexed key properties for the higher levels anyway, you could shorten the key paths and reduce some of the overhead (size of datastore entities and traffic). Example for shorter key paths:
Country --> State
Country --> County (with a state key-property)
Country --> City (with a state key-property and a county key-property)
Disadvantage: if your request handler needs to get anything other than the country and the city, it needs to perform two read ops. And I would prefer a slightly bigger data size and more traffic over more read ops and latency.
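The trade-off can be sketched with a small in-memory stand-in for the datastore. Everything here is hypothetical: FakeDatastore, path_keys and the dict-based entities are illustration only, keys are flat tuples rather than real Datastore keys, and each batched get is counted as one read op in the sense used above (one round trip).

```python
class FakeDatastore:
    """In-memory stand-in that counts batched reads (not the real API)."""

    def __init__(self, entities):
        self.entities = entities
        self.read_ops = 0

    def get_multi(self, keys):
        self.read_ops += 1  # one batched round trip
        return [self.entities.get(k) for k in keys]


def path_keys(key):
    """Return every key along the path, root first, ending with key itself."""
    return [key[:i] for i in range(2, len(key) + 1, 2)]


def read_full_path(ds, city_key):
    # All ancestors are encoded in the key, so one batch is enough.
    return ds.get_multi(path_keys(city_key))


def read_short_path(ds, city_key):
    # First batch: country and city. The state and county keys are only
    # known once the city entity has been read, hence the second batch.
    country, city = ds.get_multi(path_keys(city_key))
    state, county = ds.get_multi([city["state"], city["county"]])
    return [country, state, county, city]


# Full key path: Country --> State --> County --> City
full_city = ("Country", 1, "State", 37, "County", 12, "City", 361)
full = FakeDatastore({k: {"key": k} for k in path_keys(full_city)})

# Short key paths: state and county are stored as key properties.
s_country = ("Country", 1)
s_state = ("Country", 1, "State", 37)
s_county = ("Country", 1, "County", 12)
s_city = ("Country", 1, "City", 361)
short = FakeDatastore({
    s_country: {"key": s_country},
    s_state: {"key": s_state},
    s_county: {"key": s_county, "state": s_state},
    s_city: {"key": s_city, "state": s_state, "county": s_county},
})

full_result = read_full_path(full, full_city)
short_result = read_short_path(short, s_city)
print(full.read_ops, short.read_ops)
```

Both variants return the same four objects; the difference is only whether the handler has to wait for the city before it knows which state and county to fetch.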
On the other hand, I don't see what your scenario 2 would bring to the table. If your important use case is to update a specific city, why should the client care about anything other than the city key? From the perspective of a (3rd party) client developer, it would make the API more complex and the client implementation more error-prone. I also don't think that the difference in key size is worth the trouble. I'm not even sure that there is any difference between Key("Country", 1, "State", 37, "County", 12, "City", 361) and something that will represent all four values (unless it's getting very cryptic).
My 2 Cents :)