AppEngine datastore disk space usage - entities & index size calculation


Sharp-Developer.Net

Jun 13, 2008, 8:27:14 AM
to Google App Engine, Alexander_...@dell.com
Hi,

I wonder how to estimate the disk space used by entities and
indexes?

It would be nice to have a formula, the same way Amazon has for
SimpleDB.

I also think it would be a good idea to have visibility of space
usage by entity and index in the control panel/dashboard.

Is there any guidance on a formula, and are there plans to provide
stats on disk space usage through the datastore API?

I personally have a few options for how to organize my data schema and
would prefer to make a decision based on facts rather than a guess.

Folks, please rate this message if you have the same question, because
I haven't found anything by searching through the docs or the Internet.

Thanks.
--
Alexander Trakhimenok
http://sharp-developer.net/

javaDinosaur

Jun 15, 2008, 5:34:19 PM
to Google App Engine
Folks on the Google AppEngine team have said that index space usage
will not be charged for, because they want AppEngine developers to write
queries that take best advantage of the DataStore API rather than code
their own cost-avoidance query/sort routines in the middle tier.

When sizing DataStore entities you need to consider that entities are
just serialized object blobs, so space usage is going to be less
efficient than in a conventional SQL database with a separate schema.

nchauvat (Logilab)

Jun 16, 2008, 4:19:12 AM
to Google App Engine
On June 15, 23:34, javaDinosaur <jonathan...@hotmail.co.uk> wrote:
> Folks on the Google AppEngine team have said that index space usage
> will not be charged for, because they want AppEngine developers to write
> queries that take best advantage of the DataStore API rather than code
> their own cost-avoidance query/sort routines in the middle tier.

I second that. At Google I/O, it was said during the Q&A session that
the reason the free storage quota is lower than for other Google
services (Gmail offers more than a gigabyte, for example) is that the
space actually used is larger due to indexes, logs, etc.

David Symonds

Jun 16, 2008, 4:48:53 AM
to google-a...@googlegroups.com

Don't forget that Gmail also started with a "measly" 1 GB, and has
grown over time. Don't be too surprised if the free storage quota of
App Engine grows over time, too.


Dave.

joh...@easypublisher.com

Jun 16, 2008, 6:35:03 AM
to google-a...@googlegroups.com

Makes me wonder: is the requirement for indexes and logs lower
for Gmail than for an average GAE app? Why is that so?
(Very hypothetically speaking.)
/Johan


--
Johan Carlsson
Colliberty Easy Publisher
http://www.easypublisher.com

Derek Upham

Jun 19, 2008, 12:29:22 PM
to Google App Engine
On Jun 15, 2:34 pm, javaDinosaur <jonathan...@hotmail.co.uk> wrote:
> Folks on the Google AppEngine team have said that index space usage
> will not be charged for, because they want AppEngine developers to write
> queries that take best advantage of the DataStore API rather than code
> their own cost-avoidance query/sort routines in the middle tier.

Where is that documented? Thanks.

Sjors

Jul 4, 2008, 7:57:43 AM
to Google App Engine
On Jun 16, 7:34 am, javaDinosaur <jonathan...@hotmail.co.uk> wrote:
> Folks on the Google AppEngine team have said that index space usage
> will not be charged for, because they want AppEngine developers to write
> queries that take best advantage of the DataStore API rather than code
> their own cost-avoidance query/sort routines in the middle tier.
>
> When sizing DataStore entities you need to consider that entities are
> just serialized object blobs, so space usage is going to be less
> efficient than in a conventional SQL database with a separate schema.

Two questions:
1 - Does the *current* dashboard already subtract the index size?
2 - How do you calculate the size of an object blob?

Example:

I have 1.44 million entities. Each entity consists of a key and an
integer. This eats 120 MB according to the dashboard. That's almost 90
bytes per integer. I actually only need to store a smallint (2
bytes).

Since I am writing an application that wants to store 21 billion
smallints, this problem represents the difference between 400 dollars
and 10 dollars a month on data storage: the difference between a hobby
and needing a business plan.
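
For readers who want to check the arithmetic, here is a rough sketch using
only the figures quoted above. It assumes "MB" means 2**20 bytes and that
the dashboard figure covers these entities and nothing else; both are
assumptions, and no price per gigabyte is assumed.

```python
# Figures quoted in the post; the unit interpretation is an assumption.
ENTITY_COUNT = 1440000               # 1.44 million entities
DASHBOARD_BYTES = 120 * 1024 ** 2    # 120 MB, read as 120 * 2**20 bytes
TARGET_COUNT = 21 * 10 ** 9          # 21 billion smallints planned
SMALLINT_BYTES = 2                   # payload actually needed per value


def total_gigabytes(count, bytes_each):
    """Total size in GB (2**30 bytes) for `count` items of `bytes_each`."""
    return count * bytes_each / 1024 ** 3


# Observed cost of one entity (key + integer) according to the dashboard.
bytes_per_entity = DASHBOARD_BYTES / ENTITY_COUNT

# How much larger the stored entity is than the 2-byte payload.
overhead_ratio = bytes_per_entity / SMALLINT_BYTES

raw_gb = total_gigabytes(TARGET_COUNT, SMALLINT_BYTES)
observed_gb = total_gigabytes(TARGET_COUNT, bytes_per_entity)

print("bytes per entity: %.1f" % bytes_per_entity)
print("overhead ratio:   %.1fx" % overhead_ratio)
print("raw payload:      %.1f GB" % raw_gb)
print("as stored today:  %.1f GB" % observed_gb)
```

The overhead ratio is of the same order as the 400-versus-10-dollar gap
mentioned above, whatever the actual price per gigabyte turns out to be.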

Sjors

Brett Morgan

Jul 4, 2008, 8:03:07 AM
to google-a...@googlegroups.com

Any particular reason you need to store the integers separately? Remember, queries only return a thousand objects. You would probably be better off storing chunks of these integers in entities, apportioned by some sane metric, be it time, or geolocation, or...
 

--

Brett Morgan http://brett.morgan.googlepages.com/

Sjors Provoost

Jul 4, 2008, 8:20:18 AM
to google-a...@googlegroups.com
>> I have 1.44 million entities. Each entity consists of a key and an
>> integer. This eats 120 MB according to the dashboard. That's almost 90
>> bytes per integer. I actually only need to store a smallint (2
>> bytes).
>>
>> Since I am writing an application that wants to store 21 billion
>> smallints, this problem represents the difference between 400 dollars
>> and 10 dollars a month on data storage: the difference between a hobby
>> and needing a business plan.
>
> Any particular reason you need to store the integers separately? Remember,
> queries only return a thousand objects. You would probably be better off
> storing chunks of these integers in entities, apportioned by some sane
> metric, be it time, or geolocation, or...
>
>>
>> Sjors

Good question. I am not sure, but I think the answer is yes, I need that.

Each point represents an altitude (from the NASA SRTM elevation data
of the whole world). My application displays an altitude profile of a
route. Let's say routes are typically between 1 km (walking) and 100
km (driving) and the altitude profile consists of 100 points (good
enough for a mobile phone display). The elevation data resolution is
100 meters.

In that case the typical distance between the queried points is
between 100 meters (less makes no sense) and 1000 meters, or 1 to 10
data points. So if I want to grab even just 2 points in one query, that
query would probably take roughly another Pi * 10^2 points with it.

In other words, from the point of view of the database, the points
that I query are random and far apart, so there is no point in
grouping them.

Sjors

Charlie

Jul 12, 2008, 3:31:49 AM
to Google App Engine
In Sjors' example each entity seems to take around 90 bytes, with
about 95% of that space going to the entity key. This seems to make
sense since each key is a string. I remember hearing about a limit
associated with each key at Google I/O. Does anyone know what it is?
I think it might have been as much as 1,000 bytes.

It is extremely tempting to use the data store like a Python
dictionary type. Has anyone found a good design pattern to get around
this?

- Charlie.

P.S. Sjors, I suspect NASA SRTM comes in raster form; you could
probably tile this data and store the smaller rasters as packed arrays
in the Blob type. See array.array().tostring(). This will give you
good space efficiency since you can store the short ints in only 2
bytes.
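
A minimal sketch of the packing idea, for illustration only. It uses the
standard array module with type code "h" (signed short); in Python 3 the
method is called tobytes() (tostring() was the older name). The helper
names pack_tile and unpack_tile are made up for this example, and storing
the result in a Blob property is left out.

```python
import array


def pack_tile(samples):
    """Pack a list of small integers into a compact byte string."""
    packed = array.array("h", samples)  # "h" = signed short
    # The C standard only guarantees at least 2 bytes per short, so check.
    if packed.itemsize != 2:
        raise RuntimeError("signed short is not 2 bytes on this platform")
    return packed.tobytes()


def unpack_tile(blob):
    """Inverse of pack_tile: byte string back to a list of integers."""
    samples = array.array("h")
    samples.frombytes(blob)
    return samples.tolist()


# Hypothetical altitude samples in meters.
tile = [0, 15, -32768, 32767]
blob = pack_tile(tile)
print(len(blob), "bytes for", len(tile), "samples")
```

The byte order is whatever the machine uses, so a blob written this way
should be read back on a platform with the same byte order.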


Sjors Provoost

Jul 12, 2008, 3:49:52 AM
to google-a...@googlegroups.com
On Sat, Jul 12, 2008 at 5:31 PM, Charlie <schm...@gmail.com> wrote:
>
> In Sjors' example each entity seems to take around 90 bytes, with
> about 95% of that space going to the entity key. This seems to make
> sense since each key is a string. I remember hearing about a limit
> associated with each key at Google I/O. Does anyone know what it is?
> I think it might have been as much as 1,000 bytes.

I am still not sure if that 95% indeed belongs to the entity key, or
if the key is already subtracted from my usage. But I hope you are
right.

> P.S. Sjors, I suspect NASA SRTM comes in raster form; you could
> probably tile this data and store the smaller rasters as packed arrays
> in the Blob type. See array.array().tostring(). This will give you
> good space efficiency since you can store the short ints in only 2
> bytes.

I am thinking that, in the current situation, the most efficient way
to store the SRTM data is in a file (or multiple files), instead of
in the data store.

The most important thing is that I need to minimize disk reads so that
I only grab the 2 bytes that I need. My guess is that this rules out
.tar.gz compression. That is a pity because this data just screams for
compression.
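
To make "only grab the 2 bytes that I need" concrete, here is a rough
sketch of a seek-based read from an uncompressed raster. It assumes a
row-major layout of 2-byte big-endian signed samples (my understanding of
the SRTM .hgt layout, not verified here); read_sample and the tiny in-memory
raster are made up for illustration.

```python
import io
import struct

SAMPLE_BYTES = 2  # one smallint per grid point


def read_sample(raster_file, row, col, width):
    """Read one sample from a row-major raster without loading the rest."""
    offset = (row * width + col) * SAMPLE_BYTES
    raster_file.seek(offset)
    # ">h" = big-endian signed 16-bit integer (assumed sample format).
    (value,) = struct.unpack(">h", raster_file.read(SAMPLE_BYTES))
    return value


# Stand-in for a real file: a 2-row by 3-column raster held in memory.
demo = io.BytesIO(struct.pack(">6h", 10, 20, 30, 40, 50, 60))
print(read_sample(demo, 1, 2, 3))
```

This kind of direct offset only works on uncompressed data, which is why
whole-file compression gets in the way.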

But I would much rather use the data store, for the sake of elegance.

Sjors

Charlie

Jul 12, 2008, 1:56:33 PM
to Google App Engine
Sjors,

Try tiling the original raster and storing each tile as a blob in an
entity. You can use the zlib module to compress/decompress each tile,
and if you make the tiles small enough this should be efficient. I
assume you'll be using Google Maps, or some other map, to determine the
route. If you match Google's tiling system, you can look at which
tiles the user's route crosses and pull only those tiles out of the
data store. (Using gets rather than queries.)
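
Roughly what I have in mind, as a sketch only. A plain dict stands in for
the data store here (a real app would do a get by key name instead), and
TILE_SIZE, the key-name scheme, and the helper names are all made up for
the example; they do not match Google's tiling system.

```python
import array
import zlib

TILE_SIZE = 64  # samples per tile edge; hypothetical value


def tile_key(row, col):
    """Key name of the tile containing grid point (row, col)."""
    return "tile_%d_%d" % (row // TILE_SIZE, col // TILE_SIZE)


def store_tile(store, tile_row, tile_col, samples):
    """Compress one tile (row-major list of shorts) and save it by key."""
    blob = zlib.compress(array.array("h", samples).tobytes())
    store["tile_%d_%d" % (tile_row, tile_col)] = blob


def lookup(store, row, col):
    """Fetch one altitude by key lookup, decompressing a single tile."""
    blob = store[tile_key(row, col)]  # stand-in for a datastore get
    samples = array.array("h")
    samples.frombytes(zlib.decompress(blob))
    return samples[(row % TILE_SIZE) * TILE_SIZE + (col % TILE_SIZE)]


# Demo: one synthetic tile whose sample at local (r, c) is r * 64 + c.
store = {}
synthetic = [r * TILE_SIZE + c
             for r in range(TILE_SIZE) for c in range(TILE_SIZE)]
store_tile(store, 1, 2, synthetic)
print(lookup(store, 67, 133))  # grid point inside tile (1, 2)
```

The tile size is the knob to tune: smaller tiles mean less wasted
decompression per point, but more entities and more per-entity overhead.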

I look forward to seeing your app,
Charlie.


Sjors Provoost

Jul 13, 2008, 12:14:45 PM
to google-a...@googlegroups.com
Hi Charlie,

Thanks for your suggestion! You can see my prototype app in action at:
http://dudarev.com/webmaps/profiledemo/

And I wrote a post about the current issues:
http://sprovoost.nl/2008/07/13/restful/

Kind regards,

Sjors
