Wow! Thanks very much to everyone who posted suggestions and to those
who sent me direct replies.
First, please let me say that I believe it's in both Google's and my
company's best interest for us to keep core portions of our application
in AppEngine. Furthermore, given the amount of investment we have in
this infrastructure, I intend to pursue any avenues we have to
continue using AppEngine. I believe that with Google's help we can
find an engineering solution and/or pricing model that allows for both
the platform and its customers to be successful.
Ok. That said, let me summarize and respond to some of the concerns/
recommendations above:
[NickolasD] Is this something you could move into Google Cloud SQL?
Yes, but it's not clear what the pricing model will be for Cloud SQL
or whether it will be any cheaper than AppEngine.
[RichardW] Maybe the GAE team should borrow the idea of spot prices
from Amazon.
Love this idea. It would serve to spread out resource usage,
provide market pricing, and benefit all involved.
[RichardW] Maybe run your mapreduce on smaller sets of the data to
spread it out over multiple days and avoid adding too many instances?
As detailed above, the costly component here is the database
operations charge for large datasets with indexed properties.
[sb] Google Cloud SQL looks interesting, but 30 days is not enough
notice to respond to changes/decisions that may be made.
Totally agree. I get 45 days notice on my rent increases and it
takes far less effort for me to change apartments.
[de Witte] What if you disable the app for maintenance, doing the
following steps...
Really interesting suggestion! Would love to hear if someone's
tried this.
a) We'd really like to avoid turning off the app for the 1-2 days it
would take to create the indexes.
b) I'm not certain a rebuild would be any cheaper. If it is, that's
probably an unintentional pricing discrepancy that I'd prefer not to
rely on.
[JonS] For our application, we used Geohashing.
We used geohashing before geoboxing, but it didn't work for us.
a) AppEngine requires that a query have at most one inequality
comparison, and the geohash range query uses it up.
b) We found that geohashing queries were much slower than geoboxing
for the same parameters, adding human-noticeable delay (>400ms).
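To illustrate point (a), here is a minimal sketch in plain Python (no AppEngine imports) of why a geohash lookup consumes the inequality comparison: matching every hash that starts with a given prefix is a range test, typically written as `>= prefix` and `< prefix + "\ufffd"`. The function names `geohash_encode` and `prefix_range` are made up for this example and are not taken from our code.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat, lon, precision=8):
    """Encode a point as a geohash string (hypothetical helper)."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value = value << 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)


def prefix_range(prefix):
    """Return (low, high) so that low <= s < high matches s.startswith(prefix)
    for geohash strings. Both bounds are inequalities on the same property."""
    return prefix, prefix + "\ufffd"


def in_cell(stored_hash, prefix):
    """In-memory stand-in for the two-sided range filter."""
    low, high = prefix_range(prefix)
    return low <= stored_hash < high
```

Because both bounds are inequalities on the geohash property, the query has no inequality left for anything else, such as a range on popularity.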
[VivekP] I have a table with 1.5TB of data. It costs me tens of
thousands (one-time) to delete it and a few thousand (per year) to
keep it.
[Andrin] I use a version of geohashing which only uses the most
precise value.
Our geohash does the same thing, but the above limitations still
exist.
[RichardW] What if you had the gps data as children of each entry and
then used a keys-only query to match?
Love the suggestions! We thought about doing something like this as
well, but we'd still have one entity per StringListProperty.
That's not so bad, but we'd also need to copy down other properties
so we could restrict the query based on other values on the entity.
For example: Finding entities within a certain distance sorted by
popularity or filtered by user.
I'm not certain there would be cost benefits but I am certain it
would add substantial complexity to the data + app.
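For anyone unfamiliar with the geobox approach, here is a rough sketch in plain Python of the general idea, not our actual implementation. Each entity stores a list of grid-cell strings at several resolutions (the role a StringListProperty would play), so a proximity lookup becomes an equality match on one string, which leaves room to filter by user or sort by popularity. The cell format, the resolutions, and the names `geobox_cell`, `geobox_list`, and `find_nearby` are all assumptions made up for illustration; real geobox schemes usually also handle neighboring cells.

```python
import math


def geobox_cell(lat, lon, cells_per_degree):
    """Name the grid cell containing a point (hypothetical format)."""
    row = math.floor(lat * cells_per_degree)
    col = math.floor(lon * cells_per_degree)
    return f"{cells_per_degree}:{row}:{col}"


def geobox_list(lat, lon, resolutions=(1, 10, 100)):
    """Cell names at each resolution; stored on the entity as a string list."""
    return [geobox_cell(lat, lon, r) for r in resolutions]


def make_entity(name, lat, lon, popularity, user):
    """A plain dict standing in for a datastore entity."""
    return {
        "name": name,
        "geoboxes": geobox_list(lat, lon),
        "popularity": popularity,
        "user": user,
    }


def find_nearby(entities, lat, lon, cells_per_degree=100, user=None):
    """Equality match on one geobox string, optional user filter,
    then sort by popularity (highest first)."""
    cell = geobox_cell(lat, lon, cells_per_degree)
    hits = [e for e in entities if cell in e["geoboxes"]]
    if user is not None:
        hits = [e for e in hits if e["user"] == user]
    return sorted(hits, key=lambda e: e["popularity"], reverse=True)


entities = [
    make_entity("a", 37.7749, -122.4194, 5, "ann"),
    make_entity("b", 37.7751, -122.4190, 9, "bob"),
    make_entity("c", 40.7128, -74.0060, 7, "ann"),
]
```

Note that the popularity and user values sit on the same entity as the geobox list. Splitting the location data into child entities would mean copying those values down to each child to keep this kind of query possible.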
[IkaiL] Please describe the engineering details and business purpose.
Provided in an earlier post. Do you need any more info? Any
thoughts? We tried registering for Premier support a few days ago but
haven't heard back yet.
Thanks again,
-Corey