Understanding "The App Engine Way"


adamd

Feb 22, 2012, 9:08:01 AM
to Google App Engine
Hi all

I'm testing the waters with App Engine (Java) and wondered if anyone
could help with my understanding of the "app engine way of doing
things" with regards to queries and efficiency.

Essentially my question is this: given that AE charges for each query
made, and not for CPU usage, isn't it better to serialize and pack
your data into fewer, larger entities than to follow the standard
data modeling approach of giving each business object its own table
row?

Let's say I have a website with 100 pages. For each page served, I need
to render a navigation tree of links requiring page names, their URLs,
and their position in the tree. So say I request a level 3 page in the
tree like "Page 1 > Page 1.2 > Page 1.2.3", I need all names and URLs
for all pages at levels 1, 2 and 3.

A conventional approach would be to have a Page table, with properties
for "name" and "path" (containing the position of the page in the
tree). When a page is requested, I then need to look it up, find out
its path, and then request all other pages that will be shown in the
navigation according to that path.

Translating this to app engine, I would need to run a query like
"path" IN [level_1_path, level_2_path, level_3_path], which I have
learned in reality boils down to a separate query for each tree level.
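
A minimal sketch of how the list passed to that IN filter could be derived from the requested page's own path, assuming the dotted "1.2.3" path format described in this post. `NavPaths` and `levelPaths` are hypothetical names, not part of any App Engine API; each returned element would correspond to one of the per-level queries mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an App Engine class.
final class NavPaths {
    private NavPaths() {}

    // Returns the path of every tree level leading to the given page,
    // e.g. "1.2.3" yields "1", "1.2" and "1.2.3".
    static List<String> levelPaths(String path) {
        List<String> result = new ArrayList<>();
        int dot = path.indexOf('.');
        while (dot >= 0) {
            // Everything before this dot is the path of an ancestor level.
            result.add(path.substring(0, dot));
            dot = path.indexOf('.', dot + 1);
        }
        // The page's own path is the deepest level.
        result.add(path);
        return result;
    }
}
```

The number of elements grows with the depth of the page, so a level 3 page implies three underlying queries under the behaviour described above.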

(Side note: I know that using AE's built-in ancestor hierarchy makes
queries faster, but entities can't be moved after being assigned a
parent, and this is a feature I need. So I believe I need to manage my
own tree with the "path" property, which would look something like "1.2.3".)

So the conventional wisdom of building a single big query to do
everything you want being the most efficient thing you can do doesn't
seem to apply.

Alternative approach: have a single "site" Entity with a property in
which I would store a big HashMap containing all the essential page
information. This can be serialized and compressed and stored as a
String. Each request then only needs to load this one Entity for the
purpose of building the nav.
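
A rough sketch of the pack/unpack step for such a map, using only standard Java serialization and GZIP. `NavCodec` and the map layout (path to a display string) are assumptions for illustration. Note that compressed output is binary, so this sketch returns a `byte[]` rather than a String; storing it in a text property would need an extra encoding step such as Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not an App Engine class.
final class NavCodec {
    private NavCodec() {}

    // Serializes the map and compresses the result into one byte array.
    static byte[] pack(HashMap<String, String> pages) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(pages);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above has flushed and finished the GZIP data.
        return bytes.toByteArray();
    }

    // Reverses pack(): decompresses and deserializes the map.
    @SuppressWarnings("unchecked")
    static HashMap<String, String> unpack(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (HashMap<String, String>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this shape, each request would load one entity, call `unpack` on its binary property, and build the nav from the resulting map.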

In the conventional setup, this wouldn't be as efficient because it
would probably be faster to run the big query than to load and
unserialize the hash map data for every request. But since I'm being
billed for queries and not CPU usage on AE, is this not the better
approach?

Or can anyone suggest a better one?

Many thanks for your time

Adam

Jeff Schnitzer

Feb 22, 2012, 11:41:36 AM
to google-a...@googlegroups.com
In general, the 'appengine way' is *definitely* to create fewer, fatter entities.  Denormalization is the way of things.  Because GAE is schemaless, it's relatively easy to store an entire entity graph in a single entity.

Also, the 'appengine way' is to precalculate aggregations rather than computing them at runtime.

Your pre-calculated Nav tree is probably the solution I would use, caching the entity in memcache for performance.

Jeff


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.


Marcelo Nunes

Feb 22, 2012, 12:28:59 PM
to google-a...@googlegroups.com
I think you are missing the point. The goal of this App Engine way is to *force* you to use a cache. You only have to query once when the data is modified; then you store the results in a server cache and can serve as many page views as you want with no further access to the datastore. Super fast and with no extra charge.

But doesn't it mean I'll have lots of data in cache?

Yes, but that is the point: no matter how fast HRD (the High Replication Datastore) can be, a replicated database will never be as fast as a standalone RDBMS for most trivial queries. So it is not that Google wants to profit more from your queries; the fact is that you can never have an efficient webapp or webpage based on a cloud-based service if you don't do a lot of caching.

adamd

Feb 22, 2012, 12:58:23 PM
to Google App Engine
Thanks for your replies, really helpful.

Marcelo - just to be clear, are you talking about Memcache? I haven't
played with it much yet, but I was under the impression there wasn't
enough space there for everything, in order for there to be "no
further access to the datastore" as you say.

Adam


Rick Mangi

Feb 22, 2012, 3:19:08 PM
to Google App Engine
Regarding fewer, fatter, denormalized objects: this isn't unique
to App Engine, it's a standard NoSQL best practice. A single
fetch by key isn't much more expensive than looking something
up in memcache (in terms of time taken).

Marcelo Nunes

Feb 22, 2012, 3:36:37 PM
to google-a...@googlegroups.com
Yep, it's memcache.

Indeed there is space for everything on memcache but it doesn't mean you'll store everything there. Each case is different, but my general rule is:

1) For every new query, I store its result in memcache.

2) When the same query is repeated, I get the result from the cache instead of the datastore.

3) When data is updated, I don't update the cached copy; I just clear all occurrences of the old data from the cache. The next time that record is queried, it won't be found in the cache and will have to be retrieved from the datastore.
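
The three rules above can be sketched roughly as follows. This is only an illustration: a `ConcurrentHashMap` stands in for memcache, the `datastoreQuery` function stands in for a real datastore query, and invalidation is simplified to a single key rather than "all occurrences". `QueryCache` and its method names are made up for this sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Illustrative stand-in; a real app would call memcache and the datastore.
final class QueryCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> datastoreQuery;
    private final AtomicInteger datastoreCalls = new AtomicInteger();

    QueryCache(Function<K, V> datastoreQuery) {
        this.datastoreQuery = datastoreQuery;
    }

    // Rules 1 and 2: serve from the cache, falling back to the datastore
    // on a miss and remembering the result for next time.
    V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        datastoreCalls.incrementAndGet();
        V fresh = datastoreQuery.apply(key);
        if (fresh != null) {
            cache.put(key, fresh);
        }
        return fresh;
    }

    // Rule 3: on update, drop the stale entry instead of rewriting it.
    void invalidate(K key) {
        cache.remove(key);
    }

    // Number of times the fallback query actually ran.
    int datastoreCalls() {
        return datastoreCalls.get();
    }
}
```

Repeated calls to `get` with the same key hit the fallback only once until `invalidate` is called for that key.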

You can make some optimizations. For instance, if you have a query in cache with a parameter like "field1 = '10'" with fewer than 100 records, and the user queries for "field1 = '10' AND field2 = 'blue'", then it is cheaper to loop over your cached results and pick out only the blue ones than to query the datastore again. But this is just one example of an optimization; in some cases it is not worthwhile, and in other cases you'll need to do much more to get acceptable performance.
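
A small sketch of that narrowing step, assuming the cached query result is held as a list of property maps. `CachedFilter`, `narrow`, and the row representation are assumptions for illustration only.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; rows are modelled as plain maps.
final class CachedFilter {
    private CachedFilter() {}

    // Keeps only the cached rows whose given field equals the given value,
    // so the narrower query can be answered without touching the datastore.
    static List<Map<String, String>> narrow(
            List<Map<String, String>> cachedRows, String field, String value) {
        return cachedRows.stream()
                .filter(row -> value.equals(row.get(field)))
                .collect(Collectors.toList());
    }
}
```

For the example above, the cached "field1 = '10'" rows would be passed in with `field` set to "field2" and `value` set to "blue".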

Eventually your cache will be flooded with data that is not frequently used. You can write a specific routine to deal with it, but in most cases adjusting the memcache timeout may be enough. A short timeout will produce a high number of datastore calls; a long timeout will make your cache accumulate lots of useless data. You can monitor your app to find out which timeout works best for you.
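
To make the timeout trade-off concrete, here is a toy in-memory cache with a fixed time-to-live. It is not the memcache API; `ExpiringCache` and the injected clock are assumptions made so the behaviour is easy to follow and to test.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Toy stand-in for a cache with a timeout; not the memcache API.
final class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    // The clock is injected (e.g. System::currentTimeMillis in real use).
    ExpiringCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    // Returns null once the entry's timeout has passed, which would force
    // the caller back to the datastore.
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key);
            return null;
        }
        return entry.value;
    }
}
```

A smaller `ttlMillis` means more `null` results and therefore more datastore calls; a larger one keeps rarely used entries around for longer.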

Thomas Wiradikusuma

Feb 23, 2012, 5:32:33 AM
to google-a...@googlegroups.com
Is there any documentation mentioning the max size of memcache for an app?

Drew Spencer

Feb 23, 2012, 6:14:32 AM
to google-a...@googlegroups.com
So does this mean I'm going about building my app structure the wrong way?

I'm building something that has a structure something like this:

Company --> Building --> EnergySupply --> EnergyReading

In each case the relationship is a hasMany, so there could be a Company that has, say, 100 Buildings with dozens of EnergySupply objects on each and then dozens of Readings on each. I'm currently storing each list of child entities as a List<Key<T>> in the entity using Objectify. Should I be using @Embed or something like that?

I currently have an MVP triplet for each page to allow users to view and edit each object, and every time I load a page I make a call to the datastore. It's just a simple get() using Ofy because I am passing the key in as part of the history token, so isn't that going to be quite fast? I don't see the need to load the data from every EnergyReading, EnergySupply and Building a Company has if the user just wants to go to the company details page, add a phone number and then do something else. Any advice on this kind of structure? Some kind of lazy loading perhaps? Tell me to RTFM if there's something I'm totally not getting.

Thanks,

Drew

Jeff Schnitzer

Feb 23, 2012, 7:05:58 AM
to google-a...@googlegroups.com
Not necessarily.  Just because the datastore prefers big objects doesn't mean your problem domain isn't better off with a bunch of smaller ones.  In particular, you should never embed a time series (presumably like EnergyReading) since its growth is unbounded.

On the other hand, you might find that Company/Building makes a good case for embedding, or Building/EnergySupply.  It's not all one or the other.  However, it sounds like you've landed on an answer that you are happy with, so as long as your fetch & query times are reasonable... stick with it.

Jeff


Jeff Schnitzer

Feb 23, 2012, 7:13:58 AM
to google-a...@googlegroups.com
On Thu, Feb 23, 2012 at 5:32 AM, Thomas Wiradikusuma <wiradi...@gmail.com> wrote:
> Is there any documentation mentioning the max size of memcache for an app?


Nope.  Google scales the amount of memcache for your app along with traffic.  They don't publish the details of how this works, nor do they even guarantee that memcache is available at any moment in time.

Why do you ask?  Usually this question is a strong indicator that someone is trying to use memcache inappropriately.

Jeff

Thomas Wiradikusuma

Feb 23, 2012, 9:43:30 PM
to google-a...@googlegroups.com
I haven't actually used it other than the one provided by Objectify :D

I'm thinking of putting more in it (mostly rendered pages, recently accessed items, lookups), but I'm afraid memcache will become full and start evicting more important cache objects.

What kind of misuse are you talking about? If I knew, I would avoid it.

Robert Kluin

Feb 24, 2012, 12:39:09 AM
to google-a...@googlegroups.com
More used items will remain in cache longer. So stuff that is
infrequently used should naturally fall out.
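
As a toy illustration of that behaviour, assuming a least-recently-used eviction policy like the one described here, a bounded `LinkedHashMap` in access order drops whichever entry has gone longest without being read or written. `LruCache` is a made-up name, and this says nothing about how memcache is implemented internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy LRU cache for illustration; not how memcache itself is built.
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;

    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: iteration order runs from least to most
        // recently accessed, so the "eldest" entry is the least used one.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true evicts
    // the least recently accessed entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

An entry that keeps being read stays near the "recent" end and survives, while one that is never touched is the first to go when space runs out.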

If you've got entities that are frequently written to and seldom read,
memcaching them will obviously have a lower value. Particularly if
they are updated very often. Note that it may still make sense to
memcache these in some cases, since it could still save a trip to the
datastore.


Robert

