1000 mcycles to update a single entity


Josh Heitzman

Oct 14, 2008, 10:24:16 PM
to Google App Engine
I've been digging into where my app is spending its mcycles. By
using memcache I've gotten all of my requests that only read data down
well under the 1000 mcycle overall average request processing limit;
however, I'm seeing that it takes about 1000 mcycles to update a
single entity (not indexed, not very large either).
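
One way to see where the mcycles go is to bracket each operation with a CPU counter and record the difference. The helper below is a minimal sketch, assuming a zero-argument counter that returns cumulative CPU usage. `time.process_time` stands in so it runs anywhere; on App Engine you could presumably pass `google.appengine.api.quota.get_request_cpu_usage` instead, in SDK versions that include it (an assumption). The name `measure_cpu` is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def measure_cpu(label, counter=time.process_time, results=None):
    """Record how far `counter` advanced while the with-block ran.

    `counter` is any zero-argument function returning cumulative CPU usage.
    time.process_time (CPU seconds) is only a stand-in; a megacycle counter
    could be passed instead (assumed, depends on the SDK version).
    """
    start = counter()
    try:
        yield
    finally:
        used = counter() - start
        if results is not None:
            results[label] = used
        print(f"{label}: {used:.6f}")


# Usage: measure the read path and the write path separately.
costs = {}
with measure_cpu("read", results=costs):
    total = sum(range(100_000))
with measure_cpu("write", results=costs):
    blob = repr(list(range(10_000)))
```

Keeping the counter injectable makes it easy to swap the stand-in for a real megacycle source without touching the call sites.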

Is this in line with what other folks are seeing?

Josh Heitzman

Oct 14, 2008, 10:50:19 PM
to Google App Engine
Actually, I'm seeing it take about 1500 mcycles to update one entity,
and then about an additional 1000 mcycles per additional entity (each
a different kind in this case) updated via the same db.put call.
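
These two figures suggest a roughly linear cost for a batched `db.put`: about 1500 mcycles for the first entity plus about 1000 for each additional one. A sketch of that arithmetic, using only the numbers reported here (the function name is hypothetical, and real costs will vary with entity size and indexes):

```python
def estimated_put_mcycles(n_entities, first=1500, each_additional=1000):
    """Rough mcycle estimate for one db.put() of n entities, based on the
    figures reported in this thread (not an official cost model)."""
    if n_entities <= 0:
        return 0
    return first + each_additional * (n_entities - 1)


for n in (1, 2, 5):
    print(n, estimated_put_mcycles(n))
# 1 1500
# 2 2500
# 5 5500
```

By this estimate even a single-entity put is already above the 1000-mcycle average request limit mentioned in the first post.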

David Symonds

Oct 14, 2008, 11:31:24 PM
to google-a...@googlegroups.com
Is this in production? What size is the entity? Is it in a large
entity group? How much contention do you think is involved?


Dave.

Josh Heitzman

Oct 15, 2008, 12:01:37 AM
to Google App Engine
Regarding the first question, those mcycle numbers are from the logs
on GAE, not from local profiling. If you mean whether lots of people
are using the app, no: I was the only user with any data when I did
the test.

Regarding the second question, the entities are not what I would
consider large. For example, one has 10 integer properties, 1 string
property, 2 datetime properties, one string list property (15 strings,
none more than 30 characters long), and one int list property (only 1
value at the moment).

The entity group had 4 entities in it when I generated those numbers.

There is no contention involved, as the data is user specific and I
was the only user with data when I did the test.

djidjadji

Oct 15, 2008, 3:04:53 AM
to google-a...@googlegroups.com
For this entity, at least 44 (10+1+2+15+1+15) index updates have to be
done in 16 different index tables (10+1+2+1+1+1). Every property has
its own implicit index, and you get an implicit index for the
'product' of the list properties. That doesn't even count the indexes
defined in index.yaml that this entity uses. It can grow big when
ListProperties are used in index.yaml: 15 extra updates for every
mention of the string list property.
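
The count above can be reproduced in a few lines, following djidjadji's model: one implicit index row per stored value, one table per property, plus one extra table whose rows are the 'product' of the list lengths. This is a sketch of that arithmetic only, not a description of how the datastore actually lays out its indexes; the property names are hypothetical stand-ins for Josh's entity, and treating the product as applying only when there are two or more list properties is an assumption.

```python
from math import prod

# Stored values per property for the entity Josh described:
# 10 integers, 1 string, 2 datetimes, a 15-item string list, a 1-item int list.
# The property names are made up for illustration.
entity_shape = {f"int{i}": 1 for i in range(10)}
entity_shape.update({"name": 1, "created": 1, "modified": 1,
                     "tags": 15, "ids": 1})
list_properties = ["tags", "ids"]


def implicit_index_cost(shape, lists):
    """Return (index_rows, index_tables) under djidjadji's counting."""
    rows = sum(shape.values())                  # one row per stored value
    tables = len(shape)                         # one table per property
    if len(lists) > 1:                          # assumed: needs 2+ lists
        rows += prod(shape[p] for p in lists)   # 'product' of the lists
        tables += 1
    return rows, tables


print(implicit_index_cost(entity_shape, list_properties))  # (44, 16)
```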

I'm sure not every property of an entity is used in a query to retrieve
objects. To reduce the number of index updates, it could be useful to
have a non-indexed version of every property type, just like we have
for StringProperty: TextProperty does not have an index to be updated.

A possible syntax to tell App Engine NOT to create and update an index
for a property would be to add an argument to the Property constructor,
with a default value of True:

class MyModel(db.Model):
    id = db.IntegerProperty(required=True)
    num1 = db.IntegerProperty(need_index=False)  # proposed; not an existing argument

This would also help avoid hitting the entity index-update limit
('exploding' indexes) so often.

Are the index updates counted in the mcycles used?
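
To see what the proposed flag would buy, here is a toy model in plain Python: each property carries a `need_index` flag (the name proposed above; it is not an existing App Engine argument), and only flagged properties contribute implicit index rows on a put. `Prop` and `index_rows_per_put` are hypothetical names, and the model ignores the list 'product' rows.

```python
from dataclasses import dataclass


@dataclass
class Prop:
    name: str
    values: int              # stored values (list length for list properties)
    need_index: bool = True  # the proposed flag; hypothetical


def index_rows_per_put(props):
    """Count single-property index rows written on a put,
    skipping properties marked need_index=False."""
    return sum(p.values for p in props if p.need_index)


props = [Prop("id", 1), Prop("num1", 1, need_index=False),
         Prop("tags", 15, need_index=False)]
print(index_rows_per_put(props))  # 1
```

With every property indexed, the same three properties would cost 17 rows instead of 1.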

Tim

Oct 15, 2008, 3:16:27 AM
to Google App Engine
That would be a nice addition. Currently I've created a pickled
property (suggested by a previous post to this group) for properties
that don't need to be indexed, but I fear it won't play well with
djangoforms / model forms.

Josh Heitzman

Oct 15, 2008, 3:25:15 AM
to Google App Engine
There are no indexes in index.yaml for these entity kinds, and not very
many of the properties are being changed at one time (no idea whether
that matters).

If updating the implicit indexes is the majority of the cost of doing
these updates, then I definitely agree that either:
1) an attribute for disabling the implicit indexing of properties
should be added, or
2) native serialization needs to be provided as part of the runtime so
we can quickly serialize our data into (and deserialize it from) a blob.
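
Option 2 amounts to storing everything that never needs querying as one unindexed value. A minimal sketch of the round trip, with the standard `pickle` module standing in for whatever serializer the runtime provides; the `pack`/`unpack` helpers and the field names are hypothetical. On App Engine the resulting bytes would go into a Blob property, which is not indexed.

```python
import pickle
from datetime import datetime


def pack(fields):
    """Serialize a dict of plain values into one bytes blob."""
    return pickle.dumps(fields, protocol=pickle.HIGHEST_PROTOCOL)


def unpack(blob):
    """Inverse of pack()."""
    return pickle.loads(blob)


# Roughly the shape of the entity described earlier in the thread.
record = {
    "counters": list(range(10)),           # the ten integer properties
    "title": "example",
    "created": datetime(2008, 10, 15, 3, 25),
    "tags": ["tag%d" % i for i in range(15)],
    "ids": [1],
}
blob = pack(record)
assert unpack(blob) == record
print(type(blob).__name__)  # bytes
```

The trade-off is that nothing inside the blob can be filtered or sorted on in a query.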

Josh Heitzman

Oct 16, 2008, 12:19:35 AM
to Google App Engine

Andy Freeman

Oct 16, 2008, 3:20:26 AM
to Google App Engine
Does this do what you'd like?

import copy
import pickle

from google.appengine.ext import db

# Should work on any picklable datatype.
# Tested with both dicts and lists; works with unicode keys, data, and elements.

# empty() is arguably wrong, but what's correct?
# choices probably doesn't make any sense here, but ....
# readable=True is for folks who want to look at pickle-encoded data.
class PickleProperty(db.Property):
    def __init__(self, readable=False, *args, **kwds):
        if readable:
            self._readable = 0        # pickle protocol 0: ASCII text
            self.data_type = db.Text
        else:
            self._readable = -1       # highest (binary) pickle protocol
            self.data_type = db.Blob
        super(PickleProperty, self).__init__(*args, **kwds)

    # As with db.ListProperty, don't want any static sharing of the default.
    def default_value(self):
        return copy.deepcopy(self.default)

    # If value is true or the same type as the default value, it is
    # assumed to be reasonable even if required is set.
    def empty(self, value):
        return not (value or (isinstance(value, type(self.default))))

    def get_value_for_datastore(self, model_instance):
        v = super(PickleProperty,
                  self).get_value_for_datastore(model_instance)
        r = pickle.dumps(v, self._readable)
        return self.data_type(r)

    def make_value_from_datastore(self, value):
        v = super(PickleProperty,
                  self).make_value_from_datastore(value)
        if v is None:                 # nothing stored; don't unpickle 'None'
            return None
        return pickle.loads(str(v))

class XX(db.Model):
    data = PickleProperty(default={})

xx = XX()
xx.data['a'] = 7
xx.put()
yy = XX.get(xx.key())
assert xx.data == yy.data
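
Andy's property only runs inside the App Engine SDK, but its core (the encode/decode pair and the deep-copied default) can be exercised in plain Python. The stand-in below is a sketch, not his code: the class name `PickleCodec` is hypothetical, it targets Python 3, and it uses `bytes` in place of `db.Blob` and `str` in place of `db.Text`. It mirrors what `get_value_for_datastore` and `make_value_from_datastore` do, which makes it easy to test which values survive the round trip.

```python
import copy
import pickle


class PickleCodec:
    """Plain-Python stand-in for PickleProperty's datastore conversion."""

    def __init__(self, readable=False, default=None):
        # Protocol 0 is text-like; -1 selects the highest binary protocol.
        self.protocol = 0 if readable else -1
        self.readable = readable
        self.default = default

    def default_value(self):
        # Deep copy so instances never share one mutable default.
        return copy.deepcopy(self.default)

    def to_datastore(self, value):
        data = pickle.dumps(value, self.protocol)
        # latin-1 maps every byte to one character, so this is lossless.
        return data.decode("latin-1") if self.readable else data

    def from_datastore(self, stored):
        if stored is None:
            return None
        if isinstance(stored, str):
            stored = stored.encode("latin-1")
        return pickle.loads(stored)


codec = PickleCodec(default={})
data = codec.default_value()
data["a"] = 7
assert codec.from_datastore(codec.to_datastore(data)) == {"a": 7}
assert codec.default == {}  # the shared default was not mutated
```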


Josh Heitzman

Oct 16, 2008, 3:22:36 PM
to Google App Engine
I've experimented with pickling and found that about 810 mcycles
are consumed getting the entity (just one BlobProperty), unpickling,
pickling, and then putting the entity. Of those mcycles, most (~730)
go into the put operation.

This leaves only 190 mcycles to actually do the work of parsing the
request, processing data, and constructing a result. Even if you can
manage that, you can only ever put one entity per request and stay
under 1000 mcycles, since I found that putting a second entity
basically doubles the get, pickling, and put times.
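
The budget arithmetic here, using only the figures reported in this post: about 810 mcycles per get/unpickle/pickle/put cycle against a 1000-mcycle average, with a second entity roughly doubling the cost. A sketch (the function name is hypothetical, and linear scaling is an assumption):

```python
SOFT_CAP = 1000          # average request limit discussed in this thread
PER_ENTITY_CYCLE = 810   # get + unpickle + pickle + put, as measured above


def remaining_budget(entities, cap=SOFT_CAP, per_entity=PER_ENTITY_CYCLE):
    """mcycles left for parsing, processing, and rendering, assuming the
    cost scales linearly with the number of entities (an assumption)."""
    return cap - entities * per_entity


print(remaining_budget(1))  # 190
print(remaining_budget(2))  # -620
```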

The sad thing is that the actual runtime of putting multiple entities
isn't very long. I've seen 15000 mcycles consumed by a request that
took less than two seconds to actually render in my browser.

So this 1000 mcycle soft cap seems out of whack with the reality of
how many mcycles are consumed by putting data into the store.

While the soft cap can be worked around by breaking up the processing
into multiple requests behind the scenes, this will result in higher
mcycle consumption to process the user command as well as longer
response times, which isn't good for my bottom line (since we'll be
paying for overall mcycle consumption) or for user satisfaction:
anyone on a high-latency connection will see pages take many seconds
longer to render. For example, someone using satellite will probably
see about a 4 second delay for every chunk the processing is broken
into.
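
Splitting the work trades CPU headroom for latency. A rough sketch of that trade-off: given a per-entity cost, a per-request budget, and a delay per chunk (about 4 seconds in the satellite example), estimate how many requests a command needs and the total delay. All inputs are illustrative assumptions, the function name is hypothetical, and the model ignores the extra per-request overhead the post also mentions.

```python
import math


def plan_chunks(n_entities, per_entity_mcycles, budget_mcycles, delay_per_chunk_s):
    """Return (requests_needed, total_delay_seconds).

    Assumes each request may spend at most `budget_mcycles` on puts, that
    cost scales linearly per entity, and that every chunk adds one
    `delay_per_chunk_s` round trip. An entity costing more than the
    budget still gets a request of its own.
    """
    per_request = max(1, budget_mcycles // per_entity_mcycles)
    requests = math.ceil(n_entities / per_request)
    return requests, requests * delay_per_chunk_s


# Five entities at ~810 mcycles each, 1000-mcycle budget, satellite link:
print(plan_chunks(5, 810, 1000, 4))  # (5, 20)
```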

Alex Epshteyn

Oct 16, 2008, 8:13:14 PM
to Google App Engine
I believe the excessive consumption of megacycles is due to the
datastore API busy-waiting on all puts. Also see Issue 764.
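
Why busy-waiting would inflate the CPU figure without lengthening wall-clock time: a loop that polls for completion burns CPU for the whole wait, while a blocking wait uses almost none. This generic Python demonstration illustrates the effect in general; it makes no claim about how the datastore API was actually implemented, and the function names are hypothetical.

```python
import threading
import time


def busy_wait(event, timeout):
    """Poll the event in a tight loop until it is set or time runs out."""
    deadline = time.monotonic() + timeout
    while not event.is_set() and time.monotonic() < deadline:
        pass


def blocking_wait(event, timeout):
    """Let the OS suspend this thread until the event is set."""
    event.wait(timeout)


def cpu_used(wait_fn, delay=0.2):
    """CPU seconds used by this process while waiting `delay` seconds for
    another thread to signal completion (a stand-in for a remote call)."""
    done = threading.Event()
    timer = threading.Timer(delay, done.set)
    start = time.process_time()
    timer.start()
    wait_fn(done, timeout=delay * 5)
    timer.join()
    return time.process_time() - start


busy = cpu_used(busy_wait)
blocked = cpu_used(blocking_wait)
print(f"busy-wait: {busy:.3f}s CPU, blocking wait: {blocked:.3f}s CPU")
```

Both waits take about the same wall-clock time, but only the polling loop shows up as CPU usage.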

Denis

Oct 17, 2008, 2:13:22 PM
to Google App Engine
> I've seen 15000 mcycle be consumed by a request that took less than
> two seconds to actually get rendered in my browser.

I'm importing data into my new app. A single request puts 500-something
entities (one entity group, two db.put() calls to work around the
500-entities-per-put() limit). The whole request usually takes less
than 1 second, but the CPU consumption is enormous: 400,000 mcycles
(400 gigacycles) per average request.
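
Splitting a large import into puts of at most 500 entities is a simple chunking loop. A minimal sketch in plain Python: `put_batch` is a hypothetical stand-in for `db.put`, `batches` and `put_all` are illustrative names, and the 500 figure is the per-call limit mentioned in this post.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
MAX_PER_PUT = 500  # per-call limit mentioned above


def batches(items: Sequence[T], size: int = MAX_PER_PUT) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def put_all(items: Sequence[T], put_batch: Callable[[List[T]], None]) -> int:
    """Call put_batch once per chunk; return the number of calls made."""
    calls = 0
    for chunk in batches(items):
        put_batch(chunk)
        calls += 1
    return calls


sizes = []
print(put_all(range(530), lambda chunk: sizes.append(len(chunk))), sizes)
# 2 [500, 30]
```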

As you can see, there is no correlation between CPU usage and timing.

BTW, these 400 Gigacycle requests produce warnings that they "used a
high amount of CPU were roughly 1.2 times over the average request CPU
limit". Just 1.2 times. In another application I get "1.6 times"
warnings for requests that consume about 100 times less CPU. Go figure!