|Accumulated lots of data - deleting is is too expensive||John Wheeler||5/23/13 4:56 PM|
Been having trouble posting. I posted this but it appears it was deleted - I might have inadvertently deleted it.
I created an application two years ago and didn't know much about what I was doing with indexes and such when I created it. For example, I indexed a string property that turned out in practice to be large, and while I thought it would be useful to have it indexed, it turned out it never really was. Fast forward 2 years and 18 million entities later, I've started experimenting with cleaning up some of this data, and the costs feel high.
For example, I deleted about 1/2 million entities and it cost me over $100. It was also extremely slow. I'd imagine this probably cost Google on the order of a few cents in actuality.
The mismatch, mentally, is that these entities, all 18M, take up nearly half a terabyte of storage, and a drive that size costs less than $100 nowadays. It's hard for me to understand paying $3600 to delete them all, and even though it's an expensive lesson I'll not repeat in the future (letting entities accumulate), I don't think it should be this expensive in general to delete data.
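The $3600 figure follows from scaling the measured bill linearly. A minimal sketch of that arithmetic, using only the numbers quoted in this post (the function name is made up for illustration, and "over $100" is treated as exactly $100, so the result is a lower bound):

```python
# Figures quoted in the post above.
SAMPLE_ENTITIES = 500_000     # "about 1/2 million entities"
SAMPLE_COST_USD = 100.0       # "over $100", so treat this as a lower bound
TOTAL_ENTITIES = 18_000_000   # "18 million entities"


def extrapolate_delete_cost(sample_entities, sample_cost_usd, total_entities):
    """Linearly scale a measured delete bill up to a larger entity count."""
    return sample_cost_usd * total_entities / sample_entities


cost_per_entity = SAMPLE_COST_USD / SAMPLE_ENTITIES
total_cost = extrapolate_delete_cost(SAMPLE_ENTITIES, SAMPLE_COST_USD, TOTAL_ENTITIES)

print(f"per entity:   ${cost_per_entity:.4f}")
print(f"all entities: ${total_cost:,.0f}")
```

This assumes every entity costs about the same to delete as the ones in the sample, which only holds if the remaining entities have a similar number of indexed values.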
I'm hoping one of the App Engine product managers reads this and takes into account that deleting unused big data is just too expensive on App Engine. Instead, what you're forced to do is migrate the bits of your data you want to keep and shut down the old app, which seems like a lot of overhead to impose on developers when deletes, and datastore writes in general, could probably be much cheaper.
|Re: Accumulated lots of data - deleting is is too expensive||Alexis||5/24/13 2:47 AM|
I agree. For some Kinds we accumulated and no longer use, we just don't delete them, as doing so would cost more than keeping them for more than a year (or sometimes several years for light entities with several indexed properties). That is close to the anticipated remaining lifetime of the app.
Removing stuff has little added value so paying a big check for it is not appealing.
|Re: Accumulated lots of data - deleting is is too expensive||Renzo Nuccitelli||5/24/13 9:47 AM|
I have never let it happen on my apps; I just build a cron to clean up old data daily.
But since you haven't done this in the past, it would be possible to write a cron job that erases just some of the data every day. Depending on your app's traffic, you could do this using only the free quota. Is this an option?
|Re: Accumulated lots of data - deleting is is too expensive||vlad||5/24/13 10:55 AM|
@Renzo - Deleting data as it comes is not a real solution, as it costs you the same amount as deleting it all at once. It just spreads the financial pain over time.
The point is deleting data gets very expensive and it is a "hidden" cost most developers fail to consider. Perhaps GAE should hold an "amnesty day" once a year when developers can whack their data for free? In terms of customer loyalty that action alone would be worth GAE more than 100s of free developer days which they do around the globe.
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||John Wheeler||5/24/13 11:33 AM|
@vlad - I was thinking the same exact thing about the amnesty days idea. Ha, it would be funny, but great if we had 'Free delete Fridays' or something like that :-)
It would be best if deletes were extremely cheap - give developers the ability to do them at cost so they're not afraid to experiment on App Engine
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||barryhunter||5/24/13 11:53 AM|
On Fri, May 24, 2013 at 7:33 PM, John Wheeler wrote:
What makes you think they are not close to 'cost' already?
You seem to be assuming deletes are absurdly 'marked up' - for whatever reason.
Why would Google do that?
Deleting at 'scale' is not cheap. Your data is replicated around. All those copies have to be found and 'deleted'. The indexes are separate and have to be deleted too. There may be many.
In fact most of the time the data isn't actually deleted. Just tombstoned: marked as deleted, so the space is not actually reclaimed right away (to be sold again). It would be too much work to remove the 'holes' all the time.
The space will probably be reclaimed eventually, when the tablets are compacted. But not right away.
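A toy illustration of the tombstone-then-compact idea described above. This is only a sketch of the general technique; the class and method names are invented, and nothing here is claimed to reflect how the datastore or its tablets are actually implemented:

```python
class TombstoneStore:
    """Toy key-value store: delete() only marks a row, compact() reclaims space."""

    def __init__(self):
        self._rows = {}            # key -> value, including tombstoned rows
        self._tombstones = set()   # keys marked deleted but not yet reclaimed

    def put(self, key, value):
        self._rows[key] = value
        self._tombstones.discard(key)  # a fresh write revives the key

    def delete(self, key):
        # Cheap for the caller: no data is moved, the row is just marked.
        if key in self._rows:
            self._tombstones.add(key)

    def get(self, key):
        if key in self._tombstones:
            return None  # invisible to readers, though still on "disk"
        return self._rows.get(key)

    def stored_rows(self):
        """Rows still occupying space, visible or not."""
        return len(self._rows)

    def compact(self):
        """Physically drop tombstoned rows; returns how many were reclaimed."""
        reclaimed = len(self._tombstones)
        for key in self._tombstones:
            del self._rows[key]
        self._tombstones.clear()
        return reclaimed


store = TombstoneStore()
store.put("a", 1)
store.put("b", 2)
store.delete("a")
rows_before = store.stored_rows()   # "a" is hidden but still takes space
reclaimed = store.compact()
rows_after = store.stored_rows()
```

Between `delete` and `compact` the deleted row is invisible to readers yet still counted by `stored_rows`, which is the gap the post is pointing at.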
In fact, when an application is deleted, I wouldn't be surprised if Google just absorbs the storage cost and doesn't actually bother deleting the data. Deletions will be relatively rare, and few will leave large amounts of data lying around. It will just be orphaned and ignored.
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||vlad||5/24/13 12:09 PM|
While it might be true that data deletion is expensive, the reality is Google is losing customers over that! It is obvious that whoever started this thread is not going to fork over $3600 for the privilege. His only way out right now is to cancel his credit card and abandon the account, since I doubt GAE will let him just stop billing on that app. This is a sad situation and a flaw in GAE's business model.
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||Jason Collins||5/24/13 12:57 PM|
I too suspect that deletion is a truly expensive operation and that is directly reflected in pricing. Or worse, that the tablets remain forever fragmented and the space is never actually reused (as previously suggested on this thread).
I've often advocated for a way for me to mark an entity as "for deletion" and allow Google to come around in some kind of batch operation to clean it up. It would be ideal if it were immediately removed from indexes (i.e., from sight) and I would be willing to pay for it until the background cleanup comes around (e.g., maybe at least every X days) - as long as the wait+background-cleanup costs were some fraction of just outright deleting it.
Even without the immediate index removal, we have lots of use cases where the data could actually remain indexed because our particular use case naturally avoids these orphaned rows (e.g., think of all the blog posts and comments and +1's for a deleted account). I'm sure this is pretty common. So to be able to mark all these entities as "for deletion" or "reclaimable" would let a background process clean them up for little or no cost (apart from datastore storage while holding them during the "wait" period).
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||Jeff Schnitzer||5/24/13 5:32 PM|
The economist in me thinks that Google should just double the price of writes and make delete free.
|Re: Accumulated lots of data - deleting is is too expensive||Richard Watson||5/25/13 4:07 AM|
I agree with Renzo. You should be able to run a cron which deletes X per day for days that you're under your quota? If you're above your quota and you'll never drop below, either ignore it or move apps (if useful data > non-useful data). If you're below quota it'll take a year or more but it'll solve the problem.
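How long such a cron would take depends heavily on how many write operations each delete consumes. A rough sketch for estimating it follows; the free daily write quota and the write ops per delete are NOT given anywhere in this thread, so both are hypothetical parameters to be filled in from your own billing page:

```python
def days_to_delete(total_entities, write_ops_per_delete,
                   free_write_ops_per_day, ops_used_by_app_per_day=0):
    """Days needed to delete everything using only spare free write ops.

    All quota numbers are caller-supplied assumptions. Returns None when
    the spare quota cannot cover even one delete per day.
    """
    spare_ops = free_write_ops_per_day - ops_used_by_app_per_day
    if spare_ops < write_ops_per_delete:
        return None
    deletes_per_day = spare_ops // write_ops_per_delete
    # Integer ceiling division: a partial last day still counts as a day.
    return -(-total_entities // deletes_per_day)


# Hypothetical inputs: 50,000 free write ops/day, 10 write ops per delete,
# and an otherwise idle app.
estimate_days = days_to_delete(18_000_000, 10, 50_000)
print(estimate_days, "days, about", round(estimate_days / 365, 1), "years")
```

With those made-up inputs the 18M entities from the original post would take 3,600 days, so it is worth plugging in real numbers before counting on the free quota.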
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||Alexander Trakhimenok||5/27/13 8:11 PM|
Jeff, I would be strongly against doubling write prices and making deletes free.
In our app we have hundreds of millions of records that we update quite frequently but almost never delete. That would unnecessarily double our costs just because someone did not think in advance about deletes.
I'm a huge fan of "pay as you go" and granular billing models, so I believe GAE has the right pricing structure. We all make mistakes, but we should pay just for our own mistakes, not someone else's. Right?
Founder at www.myclasses.org, powered by GAE
|Re: [google-appengine] Re: Accumulated lots of data - deleting is is too expensive||Marcel Manz||5/28/13 1:11 AM|
You may have noticed that Google reduced the costs of the HRD:
Still, deleting large amounts of data is simply too expensive. Once you have an application that hits that magical number of 1TB, you have to keep in mind that storing this data will now cost you $240 every month, or $180 after the latest price reduction. If you're moving towards 2TB, where you might be able to drop the first TB for archiving reasons, it is most likely still cheaper to drag that 1TB along for another year ($2160) than to pay the delete write costs for moving that data off HRD. Eventually at some point you might just copy what is required to a new app, deleting the old application and leaving the cleanup charges with Google.
From my perspective Google should introduce a more economical entity delete cost: a fixed cost per entity delete, no matter how many properties or indexes that entity has. That delete operation also wouldn't need to be instantaneous; it could run through some slow queue that completes the delete, let's say, anywhere between now and 7 days. If the application requires an immediate delete, from my perspective it's fine to pay the current charges, but Google in return should introduce some cheaper way of batch-deleting large amounts of entity data.
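The keep-versus-delete trade-off can be put as a break-even in months. A small sketch, combining the $180 per TB-month storage price from this post with the original poster's figures (about 0.5 TB, $3600 to delete); the function name is invented, and it assumes a flat storage price and no further growth of the data:

```python
def months_until_delete_pays_off(delete_cost_usd, stored_tb, usd_per_tb_month):
    """Months of storage fees that add up to the one-off delete bill."""
    monthly_storage_usd = stored_tb * usd_per_tb_month
    return delete_cost_usd / monthly_storage_usd


# Storage for 1 TB over a year at the reduced price quoted above.
one_year_1tb_usd = 12 * 180

# Original poster's case: ~0.5 TB stored, $3600 estimated to delete.
break_even_months = months_until_delete_pays_off(3600, 0.5, 180)

print(f"1 TB for a year: ${one_year_1tb_usd}")
print(f"break-even: {break_even_months:.0f} months")
```

On those numbers, deleting only pays for itself if the data would otherwise sit there for more than 40 months.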
|Re: Accumulated lots of data - deleting is is too expensive||Vinny P||5/28/13 9:50 AM|
On Friday, May 24, 2013 2:57:01 PM UTC-5, Jason Collins wrote: I too suspect that deletion is a truly expensive operation and that is directly reflected in pricing. Or worse, that the tablets remain forever fragmented and the space is never actually reused (as previously suggested on this thread).
+1. I strongly suspect that entity write and deletion costs are reasonably close to App Engine's actual costs. Google has released many theoretical papers about their databases, but very little information about their actual implementation. It wouldn't surprise me if the datastore never actually deleted entities, just marked them and deleted index references to them.
I wonder if App Engine's occasional "slowtimes" are actually some kind of data compacting operation.
Storage costs themselves aren't the biggest cost driver. The biggest issue (as many other people have noted before) is having to delete references from indexes and all the other cleanup work involved. Do you mind posting a sample entity for us to look at (how many properties per entity, how many properties are indexed, the indexes that are generated, etc.)?
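To see why the index count matters so much, here is a rough sketch of how delete cost scales with indexed values. The per-delete formula (2 writes, plus 2 per indexed property value, plus 1 per composite index value) and the price per 100k write ops are assumptions, not figures from this thread; check both against the current billing documentation before relying on them:

```python
def delete_write_ops(indexed_property_values, composite_index_values=0):
    """Write ops for one entity delete, under the ASSUMED formula above."""
    return 2 + 2 * indexed_property_values + composite_index_values


def delete_cost_usd(entities, ops_per_entity, usd_per_100k_ops):
    """Total bill for deleting `entities` entities at a given op price."""
    return entities * ops_per_entity * usd_per_100k_ops / 100_000


def implied_ops_per_entity(bill_usd, entities, usd_per_100k_ops):
    """Work backwards from a bill to the write ops each delete consumed."""
    return bill_usd * 100_000 / (entities * usd_per_100k_ops)


HYPOTHETICAL_PRICE = 0.10  # USD per 100k write ops; an assumption

# A hypothetical entity with 5 indexed values and 1 composite index entry.
ops_small = delete_write_ops(5, 1)
cost_small = delete_cost_usd(500_000, ops_small, HYPOTHETICAL_PRICE)

# Working backwards from the original post's "$100 for 1/2 million".
implied = implied_ops_per_entity(100, 500_000, HYPOTHETICAL_PRICE)
print(ops_small, round(cost_small, 2), round(implied))
```

If the assumed price is anywhere near right, the original bill would imply roughly 200 write ops per deleted entity, which would point at many indexed values per entity (for example list properties or several composite indexes) rather than at storage size.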
Technology & Media Advisor
My Go side project: http://invalidmail.com/