Best practices for App Engine Datastore RESTful APIs with long descendant paths

Paul Mazzuca

Sep 30, 2016, 1:35:53 PM

to Google App Engine

Let's say I want to represent a US city in the Datastore. Its key path might look like this:

Country --> State --> County --> City

In my REST API I want to be able to read all cities in a state, and then update a city. How should the API look? Clearly, to update a city I need the full ancestor path, which means that information somehow has to make it to the client so the client knows enough about the city to make the update. I see two solutions here:

Scenario 1: Make sure the client has the datastore key.
      - UPDATE API:    cities/{cityKey}   PUT
      - Clean API, but the ancestor key data is repeated for every descendant. If I read all the cities in a state, the country, state and county parts of the path will be repeated in each city key.

Scenario 2: Make sure the client has the ancestor path as represented by each id
     - UPDATE API:   countries/{countryId}/states/{stateId}/counties/{countyId}/cities/{cityId}   PUT
     - My main issue with this scenario is that the API is ugly and unconventional, but it is more efficient because it avoids repeating the ancestor key data.
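To make the comparison concrete, here is a minimal sketch (my own illustration, not App Engine code) of how a handler would recover the same ancestor path under each scenario. The token format in `encode_key`/`decode_key` is a hypothetical base64-of-JSON encoding, not the format Datastore uses for its own URL-safe keys, and `CITY_ROUTE`, `KINDS` and `path_from_route` are illustrative names.

```python
import base64
import json
import re
from typing import Tuple

# A key path as a tuple of (kind, id) pairs, root first. This only mirrors
# the shape of a Datastore key; the token format below is hypothetical and is
# NOT the encoding Datastore uses for its own URL-safe keys.
KeyPath = Tuple[Tuple[str, int], ...]


def encode_key(path: KeyPath) -> str:
    """Scenario 1: turn the full key path into an opaque, URL-safe token."""
    raw = json.dumps([[kind, ident] for kind, ident in path], separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii").rstrip("=")


def decode_key(token: str) -> KeyPath:
    """Scenario 1: recover the key path (restoring the stripped '=' padding)."""
    padded = token + "=" * (-len(token) % 4)
    pairs = json.loads(base64.urlsafe_b64decode(padded))
    return tuple((kind, ident) for kind, ident in pairs)


# Scenario 2: every ancestor ID is a segment of the route itself.
CITY_ROUTE = re.compile(r"^/?countries/(\d+)/states/(\d+)/counties/(\d+)/cities/(\d+)$")
KINDS = ("Country", "State", "County", "City")


def path_from_route(url_path: str) -> KeyPath:
    """Scenario 2: rebuild the same key path from the nested REST route."""
    match = CITY_ROUTE.match(url_path)
    if match is None:
        raise ValueError(f"not a city route: {url_path}")
    return tuple(zip(KINDS, (int(g) for g in match.groups())))
```

Either way the handler ends up with the same four (kind, id) pairs; the scenarios differ only in who carries the path and how it looks in the URL.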

Which method is better? Or, is there another method to consider?

If there were a KeyFactory equivalent on the client that would allow for the generation of a key from an ancestor path, that seems like one compromise. Thoughts?
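A client-side "KeyFactory equivalent" could look roughly like the sketch below, loosely modeled on the parent/kind/id pattern of the Java `KeyFactory.createKey(parent, kind, id)` call. `create_key`, `key_to_string`, `key_from_string` and the `Kind-id` dotted text format are all hypothetical; they only work if server and client agree on the same deterministic format, and they assume kind names contain no `-` or `.`.

```python
from typing import Optional, Tuple

KeyPath = Tuple[Tuple[str, int], ...]

# Allowed parent kind for each kind, taken from the
# Country --> State --> County --> City example (an assumption).
PARENT_KIND = {"Country": None, "State": "Country", "County": "State", "City": "County"}


def create_key(parent: Optional[KeyPath], kind: str, ident: int) -> KeyPath:
    """Build a child key path from its parent, rejecting invalid hierarchies."""
    if kind not in PARENT_KIND:
        raise ValueError(f"unknown kind {kind!r}")
    expected = PARENT_KIND[kind]
    actual = parent[-1][0] if parent else None
    if actual != expected:
        raise ValueError(f"{kind} needs parent kind {expected!r}, got {actual!r}")
    return (parent or ()) + ((kind, ident),)


def key_to_string(path: KeyPath) -> str:
    """Deterministic, URL-safe text form, e.g. 'Country-1.State-37'."""
    return ".".join(f"{kind}-{ident}" for kind, ident in path)


def key_from_string(text: str) -> KeyPath:
    """Inverse of key_to_string (assumes kind names contain no '-' or '.')."""
    pairs = []
    for part in text.split("."):
        kind, _, ident = part.partition("-")
        pairs.append((kind, int(ident)))
    return tuple(pairs)
```

With this, a client that already holds the country, state and county IDs can produce the city key for `cities/{cityKey}` without asking the server first.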

Anastasios Hatzis

Sep 30, 2016, 6:27:20 PM

to Google App Engine

Paul,

in our app we use a combination of both scenarios in most cases. First of all, I typically distinguish between a service and an entity. The service object stores configuration shared by many entities and is often also used for access control. Each entity is linked to a service, where the service is either its parent or just a key property, depending on the frequent use cases and requirements for the entity, e.g. consistent reads, the limit of 25 entity groups per transaction, ancestor queries, and no more than 1 write op per second per entity group.

I don't know your frequent use cases and requirements. Maybe it makes sense to put all states, counties and cities into one entity group (per country). That would make sense if you need consistent queries, want the UPDATE API to accept huge batches of cities from the same county/state/country, and write ops are rare. I also don't know whether counties or cities are ever restructured in the U.S. (it happens rarely in my country, but it can happen), and how your app would deal with that, since it would affect the key of such a city/county/state. I do see advantages, though, especially if your app needs to read all objects in the path each time.

So, in the service-entity approach, the UPDATE API uses both the ID of the service object and the ID or name of the entity to update:

- UPDATE API: service/{serviceId}/entities/{entityIdOrName}

With this approach, the request handler can get the service object and the entity object with a single read op from Datastore.
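A minimal sketch of that handler, assuming the entity's key has the service as its parent. `FakeDatastore`, its `get_multi`, `handle_update` and the `allow_updates` flag are hypothetical stand-ins, not the App Engine API; the point is only that both keys are derivable from the URL, so one batch get returns both objects.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[Tuple[str, object], ...]


class FakeDatastore:
    """In-memory stand-in for Datastore that counts batch-get round trips."""

    def __init__(self) -> None:
        self.rows: Dict[Key, dict] = {}
        self.round_trips = 0

    def put(self, key: Key, value: dict) -> None:
        self.rows[key] = value

    def get_multi(self, keys: List[Key]) -> List[Optional[dict]]:
        self.round_trips += 1  # one batch get = one round trip
        return [self.rows.get(k) for k in keys]


def handle_update(db: FakeDatastore, service_id: int, entity_id, changes: dict) -> dict:
    """Handle PUT service/{serviceId}/entities/{entityIdOrName}."""
    service_key: Key = (("Service", service_id),)
    entity_key: Key = service_key + (("Entity", entity_id),)
    # Both keys come straight from the URL, so a single batch read fetches both.
    service, entity = db.get_multi([service_key, entity_key])
    if service is None or entity is None:
        raise KeyError("unknown service or entity")
    # Configuration / access control lives on the service object.
    if not service.get("allow_updates", False):
        raise PermissionError("updates disabled for this service")
    # A real handler would wrap this read-modify-write in a transaction.
    entity.update(changes)
    db.put(entity_key, entity)
    return entity
```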

This also works fine for smaller batch update requests in which all entities belong to the same service. With all entities in the same entity group, the batches could be much larger than 25 entities per request:

- UPDATE API: service/{serviceId}/entities_approval
Request body:

{"entities": [entityUrlSafeKey1, entityUrlSafeKey2, entityUrlSafeKey3]}
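The reason a single entity group allows bigger batches can be sketched as follows: the limit referred to above counts entity groups per transaction, not entities, and an entity group is identified by the root of the key path. The helper names below are hypothetical, and the sketch ignores other limits (write rate, request size) that still apply.

```python
from typing import Iterable, Set, Tuple

Key = Tuple[Tuple[str, object], ...]

# Limit on entity groups per cross-group transaction mentioned in this thread.
MAX_ENTITY_GROUPS_PER_TXN = 25


def entity_group(key: Key) -> Key:
    """An entity group is identified by the root element of the key path."""
    return key[:1]


def groups_touched(keys: Iterable[Key]) -> Set[Key]:
    """Distinct entity groups a batch would write to."""
    return {entity_group(k) for k in keys}


def fits_in_one_transaction(keys: Iterable[Key]) -> bool:
    return len(groups_touched(keys)) <= MAX_ENTITY_GROUPS_PER_TXN


# 100 entities that all have the same service as parent: one entity group.
grouped = [(("Service", 7), ("Entity", i)) for i in range(100)]
# 30 entities that are each their own root: 30 entity groups.
roots = [(("Entity", i),) for i in range(30)]
```

Here `grouped` touches one entity group and fits, while `roots` touches 30 and does not, even though it contains fewer entities.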

Back to your two scenarios: if you think putting everything into big entity groups is acceptable, there will be only one city update per PUT request, and there will be no more than 1 update per country per second, then your original entity-group design should be fine. In that case, I think scenario 1 is preferable.

My thoughts: your request handler only needs the city key and can easily decompose it, so it can get the city, and also the county, state and country (if needed), with a single read op from Datastore. The same approach also helps your app when querying cities or using the Search API, because it can always decompose the key and read the parent objects with fewer read ops. I don't think the repetition of certain parts of the key path would be an issue, even if you need to add the same keys as indexed key properties, e.g. to query for all cities in a state. However, if you use indexed key properties for the higher levels instead, you could reduce some of the overhead (size of datastore entities and traffic). Example of shorter key paths:

Country --> State

Country --> County (with a state key-property)

Country --> City (with a state key-property and a county key-property)

Disadvantage: if your request handler needs to get anything other than the country and the city, it needs to perform two read ops. I would rather accept slightly larger entities and more traffic than more read ops and higher latency.
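The trade-off between the deep path and the shorter paths can be sketched by counting round trips. With the deep path, every ancestor key is a prefix of the city key, so one batch get fetches all four entities; with Country --> City plus key properties, the state and county keys are only known after the city has been read. `FakeDatastore`, `ancestors_and_self`, `load_hierarchical` and `load_flat` are hypothetical names for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[Tuple[str, int], ...]


class FakeDatastore:
    """In-memory stand-in that counts batch-get round trips."""

    def __init__(self) -> None:
        self.rows: Dict[Key, dict] = {}
        self.round_trips = 0

    def get_multi(self, keys: List[Key]) -> List[Optional[dict]]:
        self.round_trips += 1
        return [self.rows.get(k) for k in keys]


def ancestors_and_self(key: Key) -> List[Key]:
    """Every prefix of a key path: Country, Country/State, ..., the key itself."""
    return [key[:i] for i in range(1, len(key) + 1)]


def load_hierarchical(db: FakeDatastore, city_key: Key) -> List[Optional[dict]]:
    """Deep path: the city key alone names all four entities -> one batch get."""
    return db.get_multi(ancestors_and_self(city_key))


def load_flat(db: FakeDatastore, city_key: Key) -> List[Optional[dict]]:
    """Country --> City: state/county keys are properties of the city,
    so they are only known after the first read -> two batch gets."""
    country, city = db.get_multi(ancestors_and_self(city_key))
    state, county = db.get_multi([city["state"], city["county"]])
    return [country, state, county, city]
```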

On the other hand, I don't see what your scenario 2 would bring to the table. If your important use case is to update a specific city, why should the client care about anything other than the city key? From the perspective of a (3rd party) client developer, it would make the API more complex and client implementations more error-prone. I also don't think the difference in key size is worth the trouble. I'm not even sure there is any difference between Key("Country", 1, "State", 37, "County", 12, "City", 361) and something that represents all four values (unless it gets very cryptic).

My 2 Cents :)

Paul Mazzuca

Oct 4, 2016, 10:46:40 AM

to google-a...@googlegroups.com

Thanks for the thorough insight. I realize now that I have been thinking about the problem more from a logical-organization point of view than from a read/write point of view. Just because a city is part of a state does not mean the city should be a child of the state in the data model. I think it does afford some benefits, though, when it comes to querying, if you can limit the query scope to the nearest ancestor rather than treating everything as a root. Unfortunately, I can only guess at this point what the write throughput requirements will be for my model, which differs from the geographic example.

Going back to the original question, the point I was making was this: assuming A->B->C was the best model, the ID of C was on the client, and the client wanted to update C, the only way to do that would be to find A and B first. To avoid finding A and B, I was suggesting that the IDs of A and B should always accompany C on the client.
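One way to make the ancestor IDs "always accompany C" is for the server to include them in every representation it returns, so the client can build the update URL with no extra lookup. This is a minimal sketch using the geographic example; `SEGMENT`, `to_resource`, `update_url` and the `path` field in the JSON are hypothetical choices, not an established convention.

```python
import json
from typing import Dict, Tuple

KeyPath = Tuple[Tuple[str, int], ...]

# Route segment for each kind in the A -> B -> C chain
# (here Country -> State -> County -> City).
SEGMENT = {"Country": "countries", "State": "states", "County": "counties", "City": "cities"}


def to_resource(path: KeyPath, props: Dict[str, object]) -> str:
    """Server side: serialize an entity together with the IDs of all its ancestors."""
    body = dict(props)
    body["path"] = [{"kind": kind, "id": ident} for kind, ident in path]
    return json.dumps(body)


def update_url(resource_json: str) -> str:
    """Client side: build the PUT URL without any extra lookup of the ancestors."""
    path = json.loads(resource_json)["path"]
    return "/" + "/".join(f"{SEGMENT[p['kind']]}/{p['id']}" for p in path)
```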

