Datastore usage ~ 80 times more than expected

Amir Michail

Apr 21, 2009, 11:24:24 AM
to Google App Engine
Hi,

A rough estimate shows the app engine is using 80 times more storage
than one might expect given the data stored there.

Any reasons why this might be so? Is there a way I can accurately
predict storage given the various data types (e.g., text vs string)?

Amir

Nick Johnson

Apr 21, 2009, 2:10:07 PM
to google-a...@googlegroups.com
Hi Amir,

The amount of storage used depends hugely on what indexes you define.
In particular, there is the problem of 'exploding indexes'. Suppose
you have an entity with the following properties:

foo.alist = [1, 2, 3, 4, 5]
foo.anotherlist = ["foo", "bar", "baz", "quux"]
foo.anumber = 42

And suppose that foo is a child entity of another entity called "bar",
which is itself a child entity of "bleh".

Creating an index on 'anumber' will insert exactly one index entry, as
you would expect.
Creating an index on 'alist' or 'anotherlist' will insert 5 and 4
index entries - one for each item in the list.
Creating an ancestor index on 'anumber' will insert 3 entries - one
for (foo, 42), one for (bar, 42), its parent, and one for (bleh, 42),
the root entity.

If you start combining properties with multiple values, though, things
start to become problematic:
An index on 'alist' and 'anotherlist' will create 5*4=20 index entries.
An ancestor index on 'alist' and 'anotherlist' will create 5*4*3=60
index entries!
An index on 'alist', 'alist' (e.g., the same property listed twice to allow
selecting only entities that have two different values in the list)
will create 5*5=25 index entries.

As you can see, this becomes particularly problematic with long lists,
and indexes that index more than one of them, or the same one twice. I
would hazard a guess that you either suffer from 'exploding indexes',
or simply have a lot of indexes over your entities.
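
The arithmetic above can be sketched in a few lines of plain Python. This is only a model of the counting rule as described in this post, not the datastore's actual implementation; the function name `index_entry_count` and the `ancestor_path_length` parameter are made up for illustration.

```python
def index_entry_count(entity, index_properties, ancestor_path_length=None):
    """Estimate the index entries written for one entity by one index.

    Each property named in the index multiplies the count by the number
    of values it holds (1 for a scalar). An ancestor index multiplies the
    count again by the key path length: the entity plus all its ancestors.
    """
    count = 1
    for name in index_properties:
        value = entity[name]
        count *= len(value) if isinstance(value, list) else 1
    if ancestor_path_length is not None:
        count *= ancestor_path_length
    return count


foo = {
    "alist": [1, 2, 3, 4, 5],
    "anotherlist": ["foo", "bar", "baz", "quux"],
    "anumber": 42,
}

# bleh -> bar -> foo gives a key path of length 3.
print(index_entry_count(foo, ["anumber"]))                  # 1
print(index_entry_count(foo, ["anumber"], 3))               # 3
print(index_entry_count(foo, ["alist", "anotherlist"]))     # 20
print(index_entry_count(foo, ["alist", "anotherlist"], 3))  # 60
print(index_entry_count(foo, ["alist", "alist"]))           # 25
```

Multiplying the result by the number of entities of that kind gives a rough feel for how quickly one index over long lists can outgrow the data itself.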

-Nick Johnson

Amir Michail

Apr 21, 2009, 6:02:25 PM
to Google App Engine
On Apr 21, 2:10 pm, Nick Johnson <nick.john...@google.com> wrote:
> Hi Amir,
>
> The amount of storage used depends hugely on what indexes you define.
> In particular, there is the problem of 'exploding indexes'. Suppose
> you have an entity with the following properties:

My models are of this form:

class A(db.Model):
    a1 = db.IntegerProperty()
    a2 = db.IntegerProperty()

class B(db.Model):
    b1 = db.StringProperty()
    b2 = db.IntegerProperty()
    b3 = db.IntegerProperty()
    b4 = db.TextProperty()

I have a composite index of this form:

- kind: B
  properties:
  - name: b1
  - name: b2
There are no parent relationships.

Amir

Nick Johnson

Apr 22, 2009, 5:19:30 AM
to google-a...@googlegroups.com
Hi Amir,

If these are the only kinds and indexes you have then this is indeed
anomalous. Send me your app's ID (via email, if you wish) and I'll
have someone look into it.

-Nick Johnson

KARTHIKEYAN

Apr 22, 2009, 5:26:03 AM
to google-a...@googlegroups.com

Andy Freeman

Apr 22, 2009, 12:09:19 PM
to Google App Engine
How are you estimating the size? For example, do you think that
strings are stored using one byte per character or two? (I don't
know, but I do know that they're interpreted as unicode.)

I've asked for mechanisms to help estimate size - see
http://code.google.com/p/googleappengine/issues/detail?id=1084

javaDinosaur

Apr 22, 2009, 12:14:40 PM
to Google App Engine
> A rough estimate shows the app engine is using 80 times more storage
> than one might expect given the data stored there.

Is your storage volume analysis based on 100 records or 10,000?

Amir Michail

Apr 22, 2009, 2:12:31 PM
to Google App Engine
Hi,

This turned out to be a temporary error. Usage is now (apparently)
reported correctly.

Amir

Panos

Apr 22, 2009, 5:47:39 PM
to Google App Engine
I have also been puzzled at times on where the space is going. I filed
this request today:

"More granular accounting of how datastore space is used"
http://code.google.com/p/googleappengine/issues/detail?id=1396

Please browse to the issue and add your vote/star if you want to see
this feature implemented.

Panos

Kugutsumen

Apr 25, 2009, 3:33:37 PM
to Google App Engine
I also think there is something wrong.

I have 2.3M Domain records and the source CSV is only 63 megabytes,
no composite index. The dashboard claims I am using 3GB !?!
(3.03 of 101.00 GBytes)

This is my base expando model:

class Domain(db.Expando):
    name = db.StringProperty(required=True, verbose_name='FQDN')
    revname = db.StringProperty(verbose_name='Reverse FQDN')
    since = db.DateTimeProperty(auto_now_add=True)

I am ready to upload 102M more records, but I guess I am going to wait
until this issue is resolved.
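
For a sense of scale, the figures above can be turned into per-record numbers. The sketch below is plain arithmetic on the values quoted in this post; it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and assumes that storage grows linearly with record count, neither of which is confirmed anywhere in this thread.

```python
RECORDS = 2_300_000
CSV_BYTES = 63 * 10**6           # assumption: decimal megabytes
DASHBOARD_BYTES = 3.03 * 10**9   # assumption: decimal gigabytes
QUOTA_BYTES = 101 * 10**9        # the "101.00 GBytes" shown on the dashboard
EXTRA_RECORDS = 102_000_000      # the planned upload

csv_per_record = CSV_BYTES / RECORDS
stored_per_record = DASHBOARD_BYTES / RECORDS
overhead_factor = stored_per_record / csv_per_record

# Assumption: every new record costs the same as the existing ones.
projected_bytes = stored_per_record * (RECORDS + EXTRA_RECORDS)

print("CSV bytes per record:    %.1f" % csv_per_record)
print("Stored bytes per record: %.1f" % stored_per_record)
print("Overhead factor:         %.1f" % overhead_factor)
print("Projected total (GB):    %.1f" % (projected_bytes / 10**9))
print("Exceeds quota:           %s" % (projected_bytes > QUOTA_BYTES))
```

Under those assumptions each record costs about 27 bytes in the CSV and about 1.3 KB as reported by the dashboard, and the planned upload would not fit in the quota shown.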

Jason (Google)

Apr 28, 2009, 5:06:55 PM
to google-a...@googlegroups.com
Can you both provide your application IDs so I can investigate a bit?

Thanks,
- Jason

Kugutsumen

Apr 30, 2009, 3:40:34 AM
to Google App Engine
I've sent you my ID.

Thanks for looking into this.

Kugutsumen

Apr 30, 2009, 3:41:31 AM
to Google App Engine
I've created the following issue:

http://code.google.com/p/googleappengine/issues/detail?id=1436

Kugutsumen

May 9, 2009, 7:34:48 PM
to Google App Engine
Two weeks ago I sent my application ID to both you and Nick, and I
haven't heard from you since then.

Thanks

Jason (Google)

May 11, 2009, 8:04:18 PM
to google-a...@googlegroups.com
Hi Anthony. I'm very sorry for the late reply, and thank you for bearing with me. I've discussed this with the datastore team and it's evident that the CSV file's size is not a great indicator of how much storage your entities will consume. On top of the size of the raw data, each entity has associated metadata, as you've already mentioned, but I'd bet that the indexes are consuming the greatest space. If you don't ever query on one or more of these 15 string properties, you may consider changing their property types to Text or declaring indexed=False in your model. If you can do this with one of your properties and re-build your indexes, I'd be interested in seeing how much your storage usage decreases since you'll need one fewer index.

(Note that single-property indexes are present but not listed in the Admin Console.)
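
As a rough illustration of why unindexing a property can matter, here is a small counting sketch in plain Python. It is a model only: the helper `builtin_index_rows`, the property names `s0` to `s14`, and the default of two index rows per value (one ascending, one descending) are assumptions made for this example, and the real per-row byte cost is not known from this thread.

```python
def builtin_index_rows(value_counts, unindexed=(), rows_per_value=2):
    """Count single-property index rows written for one entity.

    value_counts maps each property name to how many values it holds
    (1 for a scalar). Properties listed in `unindexed` write no rows.
    rows_per_value=2 is an assumption: one ascending and one descending
    row per indexed value.
    """
    return sum(
        count * rows_per_value
        for name, count in value_counts.items()
        if name not in unindexed
    )


# Hypothetical entity with 15 single-valued string properties.
props = {"s%d" % i: 1 for i in range(15)}

before = builtin_index_rows(props)
after = builtin_index_rows(props, unindexed={"s0"})

print(before, after)  # 30 28
```

Multiplied across millions of entities, each property left indexed but never queried adds index rows that the Admin Console does not list.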

- Jason

Andy Freeman

May 12, 2009, 1:38:20 AM
to Google App Engine
Since index space can be significant, can we get some additional
information?

For example, does an indexed db.ListProperty(db.Key) with three
elements take significantly more or less space than an indexed
db.StringListProperty with three elements whose value is str() of the
same keys? (The pickle of keys seems to be significantly larger than
the pickle of the equivalent strings.)

WeatherPhilip

May 12, 2009, 11:32:03 PM
to Google App Engine
I just did a test on one of my apps. Nearly all my data is in a single
model.

I have 163189 instances, and the total size (calculated by reading
each instance and running to_xml() on it, and then adding up the
results) is 281,145,536 bytes. Most of my properties have
indexed=False. The dashboard reports using 890MB of data. I don't know
whether the dashboard calculation is wrong, whether I should be using
a different calculation to estimate my record size, or something else.
If my indexes really are consuming 600MB, then I would work on redoing
a chunk of the app to fix that problem.
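
For anyone wanting to repeat the measurement, it can be sketched roughly as below. This is a client-side estimate only and says nothing certain about how the data is stored; `FakeEntity` is a stand-in used so the sketch runs without the SDK, and the 890MB figure is assumed to mean 890 * 10^6 bytes.

```python
def total_xml_bytes(entities):
    """Sum the UTF-8 encoded length of each entity's to_xml() output."""
    return sum(len(entity.to_xml().encode("utf-8")) for entity in entities)


class FakeEntity:
    """Stand-in for a model instance; only to_xml() is needed here."""

    def __init__(self, xml):
        self._xml = xml

    def to_xml(self):
        return self._xml


sample = [FakeEntity("<entity><p>abc</p></entity>"), FakeEntity("<entity/>")]
print(total_xml_bytes(sample))  # 36

# The figures quoted above, as per-instance and ratio numbers.
INSTANCES = 163189
XML_BYTES = 281145536
DASHBOARD_BYTES = 890 * 10**6  # assumption: decimal megabytes

xml_per_instance = XML_BYTES / INSTANCES
dashboard_ratio = DASHBOARD_BYTES / XML_BYTES
print("XML bytes per instance: %.0f" % xml_per_instance)
print("Dashboard / XML ratio:  %.2f" % dashboard_ratio)
```

With a real app the list would be replaced by the results of iterating over the model's entities in batches.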

However, the only course at the moment appears to be to delete old
data, and hope that the data consumption goes down. Not really very
satisfactory.

Philip

Jason (Google)

May 13, 2009, 2:41:32 PM
to google-a...@googlegroups.com
Hi Andy. In this case, the list of Key objects will be smaller than the list of key strings. Even though the pickled db.Key object is larger, it is a binary-encoded protocol buffer form that gets stored, which is smaller than the pickled string. That said, I doubt it would make a tremendous difference unless you have a lot of these entities or these lists have a lot of values.

- Jason

Jason (Google)

May 13, 2009, 2:51:10 PM
to google-a...@googlegroups.com
Hi Philip. Calling to_xml() is not a great indicator of the size of your entity as stored in BigTable. Unfortunately, there is currently no straightforward way to estimate how large your entities are, although we're working on possible solutions to this problem.

Without knowing your data model or index definitions, it's certainly not possible to rule out the size of your indexes, particularly if your application is querying across multiple multi-valued properties, although this isn't the only scenario that could lead to huge indexes. If you have a property that you're never querying against, I recommend you try removing this single-property index and see if that makes a noticeable impact, or see if you can eliminate any of your custom indexes which you don't use too often.

- Jason

WeatherPhilip

May 13, 2009, 10:39:40 PM
to Google App Engine
Jason

I removed a bunch of single property indexes (by setting indexed=False)
and then loaded and stored each item. This didn't save much (a few
percent). Also, the fact that I can't see the single property indexes
makes it more tricky to figure out if they have really gone or not!

I'm now deleting 10% of the records, but I've only reclaimed 2-3% of
the space (0.92GB down to 0.90GB).

I don't have any significant use of multi-value fields (there is one
field, but only rarely does it have more than one (2) values).

Philip

Andy Freeman

May 14, 2009, 1:56:34 AM
to Google App Engine
Argh!

This means that one form (db.Key) is smaller than the other
(comparable string) for the datastore while the reverse is true for
memcache.

I've created an issue (http://code.google.com/p/googleappengine/issues/detail?id=1538)
requesting a __getstate__ and __setstate__ for db.Key that is smaller
than the string equivalent. In addition to eliminating the
inconsistency between the datastore and memcache sizes, it will reduce
the size of every memcache'd db.Model instance whose .key() is
defined.
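
To illustrate the kind of change being requested, here is a minimal sketch of how __getstate__ and __setstate__ can shrink what pickle has to store. `VerboseKey` and `CompactKey` are hypothetical stand-ins, not db.Key, and the field names are invented. Pickle normally calls these hooks itself; they are invoked directly here so that only the state payload is compared.

```python
import pickle


class VerboseKey:
    """Hypothetical key-like object; its default pickle state is a dict."""

    def __init__(self, app, kind, id_or_name, parent=None):
        self.app = app
        self.kind = kind
        self.id_or_name = id_or_name
        self.parent = parent

    def as_tuple(self):
        return (self.app, self.kind, self.id_or_name, self.parent)

    def __eq__(self, other):
        return type(other) is type(self) and self.as_tuple() == other.as_tuple()


class CompactKey(VerboseKey):
    """Same data, but the pickle state is a bare tuple without field names."""

    def __getstate__(self):
        return self.as_tuple()

    def __setstate__(self, state):
        self.app, self.kind, self.id_or_name, self.parent = state


verbose = VerboseKey("myapp", "Domain", 12345)
compact = CompactKey("myapp", "Domain", 12345)

# Default state is the attribute dict; the compact state drops the names.
default_state_size = len(pickle.dumps(vars(verbose), protocol=2))
compact_blob = pickle.dumps(compact.__getstate__(), protocol=2)
compact_state_size = len(compact_blob)

# Rebuild an instance from the compact state, as pickle would on load.
restored = CompactKey.__new__(CompactKey)
restored.__setstate__(pickle.loads(compact_blob))

print(default_state_size, compact_state_size, restored == compact)
```

The saving per object is small, but it would apply to every cached model instance that carries a key.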

Paul Kinlan

May 14, 2009, 5:41:14 AM
to google-a...@googlegroups.com
Hi,

The whole thing about datastore size is ***really*** frustrating. I am using 30.94 GB for my app (twitterautofollow) and 1) I don't know where it is being consumed, 2) I don't trust the figures: I delete data and the size of the datastore never goes down, so effectively I feel like I am paying and I don't know what it is that I am paying for, and 3) I honestly don't know how I could be using that much storage.

A case in point, I had another App where I spent 2 months deleting data never to see the size decrease, I removed all the indexes from the system then two days later it was empty.

I just feel frustrated that I can't account for anything, and unfortunately it is too late for me to design my app to have my own accounting in place.

Paul.

Sri

May 14, 2009, 8:43:10 AM
to Google App Engine
Howdy

I agree with you, Paul. I just deleted the contents of my
datastore (which took about 2 days - as if that amount of time is not
weird in itself, let alone 2 months), and at the end it was showing
130 meg (or 13% usage). What the?

Sorry, but what was the original argument against a "clear-all" switch
on the datastore again?

cheers
Sri

Sri

May 14, 2009, 8:46:39 AM
to Google App Engine
Just to be fair, when I recently checked all the data had returned to
0% usage. But that doesn't explain the 30000 entities I had uploaded
12 hours ago....

Paul Kinlan

May 14, 2009, 8:55:57 AM
to google-a...@googlegroups.com
My main issue is that I can't account for the data, and I don't know how to trust the value that I am getting billed for.

Paul

WeatherPhilip

May 14, 2009, 9:36:29 PM
to Google App Engine
Yeah -- I just checked this evening, and my database size has now
dropped by 10% -- roughly in line with the number of entities that I
had deleted. Maybe there is some cleanup process that only runs
occasionally....

However, it is *really* frustrating not to know what aspect of your
application is consuming space....

Philip

Sri

May 15, 2009, 4:37:03 AM
to Google App Engine
fair enough mate. I still have to applaud your patience. hats off
mate, if you can sit around for 2 months while your data is being
deleted!

Paul Kinlan

May 15, 2009, 4:44:03 AM
to google-a...@googlegroups.com
To be fair it was a process I kicked off to come in line with billing, I forgot about it, checked it and it was still running.

Paul
