Data Storage Size - multiplier per object


Jonathan Ultis

Mar 10, 2009, 12:28:24 PM
to Google App Engine
I created a model with fixed content that requires ~250 bytes serialized,
including all field names, the key, the kind name, and the parent
(None). I added 312,000 of those to the datastore, for 75 megs of raw
data. There are 8 indexable fields. The indices should require no more
than 176 megs of additional space, if the indices don't do any sort of
column compression. That's about 250 megs of raw space.

But the datastore reports 1GB of space used.

That suggests perhaps 2x redundancy plus a 50% fill rate in
Bigtable, or maybe just 4x redundancy. No idea.

Anyhow, for now, take your raw object size including kind, key, field
names, and field content, and multiply by 10x-15x, depending on how
many indexable properties you have, to get your final storage size.
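
As a rough check of the arithmetic above, here is a small sketch. It takes the 176 meg index bound as stated rather than deriving it, and it assumes "megs" means MiB (2**20 bytes) and "1GB" means 1024 MiB; neither assumption is confirmed anywhere in this thread.

```python
# Rough check of the figures quoted in this post.
# Assumption: "megs" = MiB (2**20 bytes), "1GB" = 1024 MiB.
MIB = 2 ** 20

entity_bytes = 250       # ~250 bytes serialized per entity (from the post)
entity_count = 312000    # entities written (from the post)
index_mib = 176          # upper bound on index space, taken as stated
reported_mib = 1024      # "1GB" reported by the datastore

# Raw entity data, then raw data plus the estimated index space.
raw_mib = entity_bytes * entity_count / MIB
expected_mib = raw_mib + index_mib

# How far the reported usage exceeds each estimate.
overhead_vs_expected = reported_mib / expected_mib
overhead_vs_raw = reported_mib / raw_mib

print(f"raw entity data:        {raw_mib:.1f} MiB")
print(f"raw data + indices:     {expected_mib:.1f} MiB")
print(f"reported / (raw+index): {overhead_vs_expected:.1f}x")
print(f"reported / raw:         {overhead_vs_raw:.1f}x")
```

Under those assumptions the reported usage comes out at roughly 4x the raw-plus-index estimate and somewhere between 13x and 14x the raw entity data, which is where the 10x-15x rule of thumb comes from.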

Jonathan Ultis

Mar 10, 2009, 12:30:46 PM
to Google App Engine
Or better, take your raw data size excluding BlobProperty and
TextProperty and multiply by 15x. I'm not sure what the multiplier is
on the unindexed properties yet.

WeatherPhilip

Mar 11, 2009, 11:14:45 PM
to Google App Engine
I think that some transparency on this from the GAE team would be
good. I would like to see in the control panel the size of each of the
indexes that has been built -- including the single property indexes
that are not shown. It would also be really nice to have a good way of
marking a property in a model as 'not to be indexed'. If it is a
string, then you can use a TextProperty type, but for all the other
types, you are stuck with indexed properties. In fact, I don't really
see why the process that autogenerates the index.yaml file shouldn't
include the single property indexes as well.

It may be that the optimal strategy is to take all the properties that
you don't want indexed, pickle them into a single blob on store, and
unpickle them on a get. However, this just doesn't seem like the right
approach....
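
The pickling idea above might look roughly like this. It is a sketch in plain Python so that it runs without the SDK; `pack_unindexed`, `unpack_unindexed`, `extras_blob`, and the field names are all hypothetical. In a real model the resulting bytes would presumably be stored in a single `db.BlobProperty`.

```python
import pickle

# Hypothetical names of properties we never filter or sort on.
UNINDEXED_FIELDS = ("humidity", "pressure", "notes")


def pack_unindexed(record):
    """Split a flat record into indexed fields plus one pickled blob."""
    indexed = {k: v for k, v in record.items() if k not in UNINDEXED_FIELDS}
    extras = {k: v for k, v in record.items() if k in UNINDEXED_FIELDS}
    # One opaque bytes value stands in for several separate properties.
    indexed["extras_blob"] = pickle.dumps(extras, protocol=pickle.HIGHEST_PROTOCOL)
    return indexed


def unpack_unindexed(stored):
    """Inverse of pack_unindexed: restore the original flat record."""
    record = {k: v for k, v in stored.items() if k != "extras_blob"}
    record.update(pickle.loads(stored["extras_blob"]))
    return record


# Made-up sample record, for illustration only.
original = {
    "station": "KBOS",
    "observed": 1236800000,
    "humidity": 0.61,
    "pressure": 1013.2,
    "notes": "clear",
}
stored = pack_unindexed(original)
restored = unpack_unindexed(stored)
```

The trade-off is that anything packed into the blob can no longer be used in a filter or sort order, so this only suits properties that are read back with the entity and never queried.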

This is an area where some tools are really called for -- to allow us
to see where the datastore quota is actually being used.

Philip

Andy Freeman

Mar 12, 2009, 3:13:15 AM
to Google App Engine
http://code.google.com/p/googleappengine/issues/detail?id=1084

seems relevant.


Marzia Niccolai

Mar 13, 2009, 12:57:25 PM
to google-a...@googlegroups.com
Hi,

There is no 'multiplier' per se on datastore storage.  The issue is that we account for both the size of the data stored and the space taken by the indices for this data.  As such, the amount of storage you use depends specifically on the types of indexes your application has.

We are working on getting better documentation together that will give you a good idea on how you can account for the amount of storage an entity will take.

Please note that the FAQ on this subject currently is _not_ correct and we will be updating it.

-Marzia

Ben Nevile

Mar 18, 2009, 3:28:52 PM
to Google App Engine
Hi Marzia,

Just want to add my voice to the chorus of people looking for a little
more transparency in terms of data storage and entities. In my ideal
fantasy world I'd be able to see a pie chart that would break down the
percentage of storage that each of my entities was using as a fraction
of the total storage used. Clicking on an entity's slice in the pie
would bring up another pie chart that would show the fraction of
storage used by that entity's primary store and each of its indices.
(This second chart may not be necessary once you guys have published
some more info on exactly how different properties and indices
manifest themselves on disk.)

Thanks in advance,
Ben




neoedmund

Mar 21, 2009, 3:59:13 AM
to Google App Engine


On Mar 11, 1:28 am, Jonathan Ultis <jonathan.ul...@gmail.com> wrote:
> I created a model with fixed content that requires ~250b serialized,
> including all field names, the key, and the kind name, and parent
> (None). I added 312000 of those to the datastore, for 75 megs of raw
> data. There are 8 indexable fields, The indices should require no more
> than 176 megs of additional space, if the indices don't do any sort of
> column compression. That's 250 megs of raw space.
>
> But, the data store reports 1GB of space used.
>
> That suggests perhaps 2x redundancy, plus a 50% fill rate in big
> table. Or, maybe just 4x redundancy. No idea.
>
> Anyhow, for now, take your raw object size including kind, key, field
> names, and field content, and multiply by 10x-15x, depending on how
> many indexable properties you have, to get your final storage size.

I ran into the same problem: I uploaded less than 200MB of raw data, and
after a daily quota reset, the 1GB quota was reached!