Will reducing model size improve performance?


Jason Smith

Oct 10, 2009, 7:44:20 AM
to Google App Engine
Hi, group. My app's main cost (in dollars and response time) is in the
db.get([list, of, keys, here]) call in some very high-traffic code.
I want to pare down the size of that model to the bare minimum in
the hope of reducing the time and CPU fee for this very common
activity. Many users whose apps are growing in popularity
probably have this objective as well.

I have two questions that hopefully others are thinking about too.

1. Can I expect the API time of a db.get() with several hundred keys
to reduce roughly linearly as I reduce the size of the entity?
Currently the entity has the following data attached: 9 String, 9
Boolean, 8 Integer, 1 GeoPt, 2 DateTime, 1 Text (avg size ~100 bytes
FWIW), 1 Reference, 1 StringList (avg size 500 bytes). The goal is to
move the vast majority of this data to related classes so that the
core fetch of the main model will be quick.

2. If I do not change the name of the entity (i.e. just delete all the
db.*Property definitions in the model), will I still incur the same
high cost fetching existing entities? The documentation says that all
properties of a model are fetched simultaneously. Will the old
unneeded properties still transfer over RPC on my dime and while users
wait? In other words: if I want to reduce the size of my entities, is
it necessary to migrate the old entities to ones with the new
definition? If so, is it sufficient to re-put() the entity, or must I
save under a wholly new key?

Thanks very much to anyone who knows about this matter!

Jason Smith

Oct 10, 2009, 7:53:20 AM
to Google App Engine
If you're into SO, I have posted this question there, slightly better
edited and formatted. I will summarize any good answers there in this
list.

http://stackoverflow.com/questions/1547750/improve-app-engine-performance-by-reducing-entity-size

Kevin Pierce

Oct 10, 2009, 12:53:35 PM
to google-a...@googlegroups.com
Hi,

1. I recommend using the same key_name, based on your logical pkey, for all of those related models. Then you can generate keys for just the data you need and get the parts you want all at once. I'm not sure how much of a performance benefit you can expect.

2. If you put the entity again it should overwrite the old unused properties and keep them from being transferred over the wire.
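
Point 1 above could be sketched roughly as follows. The helper names (part_keys, batch_keys) and the kind names in the comment are made up for illustration. make_key is assumed to be db.Key.from_path from google.appengine.ext.db, passed in as an argument so the helpers themselves have no SDK dependency.

```python
def part_keys(key_name, kinds, make_key):
    """Return one key per kind, all sharing the same key_name.

    make_key is assumed to behave like db.Key.from_path(kind, key_name).
    """
    return [make_key(kind, key_name) for kind in kinds]


def batch_keys(key_names, kinds, make_key):
    """Flatten the keys for many logical records into one list for db.get()."""
    keys = []
    for key_name in key_names:
        keys.extend(part_keys(key_name, kinds, make_key))
    return keys

# On App Engine (hypothetical kinds 'UserCore' and 'UserProfile'):
#   from google.appengine.ext import db
#   keys = batch_keys(['alice', 'bob'], ['UserCore'], db.Key.from_path)
#   cores = db.get(keys)   # fetch only the small core part on the hot path
```

The hot path then asks only for the kinds it needs, and the larger related kinds are fetched with the same key_name when a page actually uses them.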
--
Kevin Pierce
Software Architect
VendAsta Technologies Inc.
kpi...@vendasta.com
(306)955.5512 ext 103
www.vendasta.com

Andy Freeman

Oct 10, 2009, 1:17:05 PM
to Google App Engine
> In other words: if I want to reduce the size of my entities, is
> it necessary to migrate the old entities to ones with the new
> definition?

I'm pretty sure that the answer to that is yes.

> If so, is it sufficient to re-put() the entity, or must I
> save under a wholly new key?

I thought that it should be sufficient to re-put(), but decided to test
that hypothesis.

It isn't sufficient in the SDK - the SDK admin console continues to
show values for properties that you've deleted from the model
definition after the re-put(). Yes, I checked to make sure that those
properties didn't have values before the re-put().

I did the get and re-put() in a transaction, namely:

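from google.appengine.ext import db

def txn(key):
    # Model is the model class whose definition was trimmed.
    obj = Model.get(key)
    return obj.put()  # return the key so the assert sees a truthy value
assert db.run_in_transaction(txn, key)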

I tried two things to get around this problem. The first was to add
db.delete(obj.key()) right before obj.put(). (You can't do obj.delete
because that trashes the obj.)

The second was to add "obj.old_property = None" right before the
obj.put() (old_property is the name of the property that I deleted
from Model's definition.)

Neither one worked. According to the SDK's datastore viewer, existing
instances of Model continued to have values for old_property after I
updated them with that transaction even with the two changes, together
or separately.

If this is also true of the production datastore, this is a big deal.

Jason Smith

Oct 10, 2009, 1:27:29 PM
to Google App Engine
Thanks for the help guys. I think this is an important matter to have
cleared up.

It's bedtime here (GMT+7); tomorrow, however, I think I will do some
benchmarks along the lines of the example I wrote up in the SO
question.

At this point I would think the safest thing would be to completely
change the model name, thereby guaranteeing that you will be writing
entities with fresh keys. However I suspect it's not necessary to go
that far. I'm thinking that on the production datastore, changing the
model definition and then re-put()ing the entity will be what's
required to realize a speed benefit when reducing the number of
properties on a model. But the facts will speak for themselves.

Nick Johnson (Google)

Oct 10, 2009, 4:29:11 PM
to google-a...@googlegroups.com
On Sat, Oct 10, 2009 at 6:27 PM, Jason Smith <j...@proven-corporation.com> wrote:

> Thanks for the help guys. I think this is an important matter to have
> cleared up.
>
> It's bedtime here (GMT+7) however tomorrow I think I will do some
> benchmarks along the lines of the example I wrote up in the SO
> question.
>
> At this point I would think the safest thing would be to completely
> change the model name, thereby guaranteeing that you will be writing
> entities with fresh keys. However I suspect it's not necessary to go
> that far. I'm thinking that on the production datastore, changing the
> model definition and then re-put()ing the entity will be what's
> required to realize a speed benefit when reducing the number of
> properties on a model. But the facts will speak for themselves.

There's no need to use a new model name: You can simply create new entities to replace the old ones, under the current model name. If you're using key names, you can construct a new entity with the same values as the old ones, and store that.
 
You can also use the low-level API in google.appengine.api.datastore; this provides a dict-like interface from which you can delete unwanted fields.
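
A minimal sketch of the low-level approach, assuming the entity returned by datastore.Get() behaves like a dict (as described above). The helper name strip_properties and the property name 'old_property' are placeholders, not part of the SDK.

```python
def strip_properties(entity, unwanted):
    """Delete the named properties from a dict-like entity, in place.

    Returns the list of names that were actually present and removed,
    so the caller can skip the write when nothing changed.
    """
    removed = []
    for name in unwanted:
        if name in entity:
            del entity[name]
            removed.append(name)
    return removed

# On App Engine ('old_property' is a placeholder name):
#   from google.appengine.api import datastore
#   entity = datastore.Get(key)
#   if strip_properties(entity, ['old_property']):
#       datastore.Put(entity)
```

Checking the return value before writing avoids paying for a put on entities that were already clean.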

-Nick Johnson




--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Andy Freeman

Oct 12, 2009, 2:23:40 PM
to Google App Engine
> There's no need to use a new model name: You can simply create new entities
> to replace the old ones, under the current model name. If you're using key
> names, you can construct a new entity with the same values as the old ones,
> and store that.

Note the precise wording. You can't just put() the instance that you
read from the datastore (the instance that doesn't have the properties
that you've deleted). You have to get(), make a new db.Model instance
with the same key, populate its properties from the instance that you
got, and put the new instance. If you're not using key names, you
can't create that new db.Model instance (as of 1.2.5) because you
can't create an instance with a specified id.
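
The procedure described above might look roughly like this. It is a sketch, not tested against the SDK: the function name rebuild is made up, and it assumes the standard db.Model methods properties(), key() and parent_key(), plus the key_name and parent constructor arguments.

```python
def rebuild(model_class, old):
    """Return a fresh instance with the same key name and parent as `old`,
    carrying only the properties still declared on model_class."""
    key_name = old.key().name()
    if key_name is None:
        # Entities with numeric ids cannot be recreated this way (see above).
        raise ValueError('entity has no key_name')
    values = {}
    for name in model_class.properties():
        # Copy only properties that the trimmed definition still declares.
        values[name] = getattr(old, name)
    return model_class(key_name=key_name, parent=old.parent_key(), **values)

# On App Engine (Model is the trimmed model class):
#   def txn(key):
#       old = Model.get(key)
#       return rebuild(Model, old).put()
#   db.run_in_transaction(txn, key)
```

One caveat: reading a ReferenceProperty with getattr() fetches the referenced entity, so a model with references may need those values copied differently.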

The problem is in db.Model._to_entity() (and maybe
db.Expando._to_entity()). If the instance was created from a protocol
buffer, put() tries to reuse said protocol buffer, and it still
contains values for properties that you've deleted. These values are
not deleted by _to_entity() so they end up being sent back to the
datastore.
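
To make the mechanism concrete, here is a toy model of the behaviour described above. It is plain Python and NOT the SDK's code: ToyModel, declared, from_stored and to_stored are all invented for illustration, with a dict standing in for the protocol buffer.

```python
class ToyModel(object):
    """Toy stand-in for a model class; not the SDK's db.Model."""
    declared = ('name',)  # properties still in the model definition

    def __init__(self, **values):
        self._cached = None  # stands in for the protocol buffer we were loaded from
        for prop in self.declared:
            setattr(self, prop, values.get(prop))

    @classmethod
    def from_stored(cls, stored):
        """Build an instance from a stored record, remembering the record."""
        obj = cls(**stored)
        obj._cached = dict(stored)
        return obj

    def to_stored(self):
        """Reuse the remembered record, overwriting only declared properties."""
        stored = dict(self._cached) if self._cached is not None else {}
        for prop in self.declared:
            stored[prop] = getattr(self, prop)
        return stored


loaded = ToyModel.from_stored({'name': 'a', 'old_property': 'stale'})
fresh = ToyModel(name=loaded.name)
print(loaded.to_stored())  # still carries old_property
print(fresh.to_stored())   # only the declared property
```

In this toy, the loaded instance writes the stale value back, while an instance built from scratch has nothing remembered to carry it.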

I've filed http://code.google.com/p/googleappengine/issues/detail?id=2251



Jason Smith

Oct 17, 2009, 10:57:31 PM
to Google App Engine
Thank you very much, Andy. I was never totally certain I understood
exactly what Nick had said.

In short, to remove old properties, you have to instantiate a fresh
entity yourself the normal Python way, copy the data you want, and
put() it back with the identical key_name or ID, parent, etc. (i.e. the
same key).

I starred your bug. I won't go into my disillusionment with the issue
tracker here; however in this case I think the solution might better
be done in a third-party library or middleware. It's arguably better
architecture, but at any rate it would have a better chance of being
implemented.


Gaurav

Oct 18, 2009, 9:25:09 AM
to Google App Engine
I think there's a relationship (most likely linear) between datastore
performance and the number of properties in an entity (not necessarily
their size).

Baptiste Lepilleur

Oct 19, 2009, 8:18:41 AM
to google-a...@googlegroups.com
You may want to have a look at an old benchmark I did (back in summer
2008) on different methods of loading a batch of objects from the
datastore:
http://groups.google.com/group/google-appengine/browse_thread/thread/bef7c4dcd2c42b6d

From what I remember, changing the protocol buffer parsing to
represent the model's properties as a tuple instead of a dictionary
saved about 30% of the CPU on the load operation. It's not something
that you want to generalize because of the maintenance issue (you have
to manually handle missing properties), but it may be worthwhile to
apply in a few performance-sensitive places.

That said, the best approach is to run your performance-sensitive code
under a profiler on GAE, as your model may stress other parts of the
loading chain. There are instructions for doing so somewhere in the docs.
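
As a generic illustration of the profiling idea (this is the standard library's cProfile, not the App Engine instructions mentioned above), something like the following could wrap the batch load. profile_call and fetch_batch are hypothetical names.

```python
import cProfile
import pstats

try:
    from cStringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report).

    The report lists the ten entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats('cumulative').print_stats(10)
    return result, buf.getvalue()

# Hypothetical use around the hot fetch:
#   entities, report = profile_call(fetch_batch, keys)
#   logging.info(report)
```

Sorting by cumulative time shows whether the time goes into the datastore call itself or into decoding the results into model instances.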

Also, back then I did not have Nick's trick (using protocol buffer
serialization) for efficient use of caching: it was faster to reload
than to pickle/unpickle a cached model!
