If you have large amounts of text in a single field, you should get better compression from zlib than Snappy provides, so your data will still be smaller if you do your own compression. YMMV.
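A minimal sketch of doing your own compression with the standard `zlib` module. The helper names `compress_text` and `decompress_text` are hypothetical. The resulting bytes could then go into a bytes-valued property such as an NDB `BlobProperty`, but that part is not shown here because it needs the App Engine SDK. The second `zlib.compress` call is only a rough stand-in for the storage layer compressing the value again. Per the post, the datastore actually uses Snappy, not zlib.

```python
import zlib


def compress_text(text, level=9):
    """Encode text as UTF-8 and deflate it with zlib (hypothetical helper)."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_text(data):
    """Inverse of compress_text: inflate the bytes and decode back to str."""
    return zlib.decompress(data).decode("utf-8")


def sizes(text):
    """Return (raw, compressed once, compressed twice) byte counts.

    Compressing already-deflated bytes a second time usually gains little or
    nothing, which is why "double compression" is not a real concern: the
    first pass does nearly all the work.
    """
    raw = text.encode("utf-8")
    once = compress_text(text)
    twice = zlib.compress(once, 9)
    return len(raw), len(once), len(twice)
```

For example, `sizes("some long article body ... " * 1000)` lets you compare the raw size against the compressed size on your own data before deciding whether the extra CPU time is worth it.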
Also, if you have large text fields, you may save on storage costs by storing this data in the blobstore, since the blobstore costs 54% as much as the datastore. One of my apps does this because I store more than 1MB of text per entity (so it wouldn't fit in the datastore anyway), and a nice side effect is that storing it in the blobstore costs me less.
In my simple tests, my compressed text is about 10% of the original size for large bodies of text, and I pay about 54% as much for storage (thanks to the blobstore). That works out to a "theoretical" cost savings of 94.6% over uncompressed text (0.10 × 0.54 = 0.054 of the original cost). YMMV. Also, keep in mind that this adds an extra API call to fetch your data, since you must request it from the blobstore, plus the time to decompress the text. Both will increase your request time. Writing files to the blobstore also does not appear to be possible inside a transaction (unless I am missing something).
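The savings estimate above is just the product of two ratios. Here is a small sketch of that arithmetic. The function names are hypothetical, and the inputs are the figures quoted in the post (about 10% compressed size, blobstore at about 54% of the datastore price).

```python
def relative_storage_cost(compressed_fraction, price_ratio):
    """Storage cost of compressed data in the cheaper store, as a fraction of
    storing the uncompressed text in the datastore."""
    return compressed_fraction * price_ratio


def storage_savings(compressed_fraction, price_ratio):
    """Fraction of the original storage cost that is saved."""
    return 1.0 - relative_storage_cost(compressed_fraction, price_ratio)


# Figures from the post: ~10% compressed size, blobstore at ~54% of datastore price.
print(f"{storage_savings(0.10, 0.54):.1%}")  # 94.6%
```

This only models storage cost. It deliberately ignores the extra blobstore API call and the decompression time mentioned above, which show up as request latency rather than as stored bytes.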
- Bryce
On Thursday, June 21, 2012 10:19:03 PM UTC-7, Toshiya wrote:
Hi, this means the entities in GAE are compressed when they are stored in BigTable.
But the cost of stored data is calculated before compression, right?
Then how meaningful is data compression in entities, like the
compressed property option in Python NDB, since using it means double compression?
I want to reduce my costs because there is a lot of text in my application.
But I am wondering whether it is really better to compress it.