Re: [google-appengine] Is GAE good for database-heavy applications?

Jeff Schnitzer

Aug 17, 2012, 11:52:33 AM
to google-a...@googlegroups.com
This sounds like a very good fit for GAE.

It won't fit into the free quota (just to rebuild the download once an
hour will be hundreds of thousands of read ops per day) but you may
fit into the $9/mo minimum tier depending on how much user activity
you have.

Definitely use ndb over the old db interface.
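
Something like this would be the starting point (untested sketch;
the property names are just illustrative):

from google.appengine.ext import ndb

class Article(ndb.Model):
    title = ndb.StringProperty()
    body = ndb.TextProperty()                  # unindexed; fine for 500-700 words
    tags = ndb.StringProperty(repeated=True)   # indexed, so tag searches are plain queries
    updated = ndb.DateTimeProperty(auto_now=True)

# Tag/keyword search is then an ordinary indexed query:
matches = Article.query(Article.tags == 'python').fetch(50)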

Create a cron job that starts a task which iterates through the dataset,
writing it to a blob in the blobstore using the Files API. The reason
to start a task from cron is that the task will retry automatically if
for some strange reason there is an error.
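
Roughly something like this (untested sketch; the URLs and handler
names are made up, Article is the model from above):

cron.yaml:

cron:
- description: rebuild the hourly download blob
  url: /cron/rebuild
  schedule: every 1 hours

handlers (Python):

from google.appengine.api import files, taskqueue
import webapp2

class RebuildCron(webapp2.RequestHandler):
    def get(self):
        # Cron just enqueues; the task gets automatic retries on failure.
        taskqueue.add(url='/tasks/rebuild')

class RebuildTask(webapp2.RequestHandler):
    def post(self):
        # Write every article into a new blobstore file, then finalize it.
        file_name = files.blobstore.create(mime_type='text/plain')
        with files.open(file_name, 'a') as f:
            for article in Article.query():
                f.write(article.body.encode('utf-8') + '\n')
        files.finalize(file_name)
        blob_key = files.blobstore.get_blob_key(file_name)
        # Stash blob_key in a small singleton entity so the download
        # handler can always serve the latest build.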

There is a hard limit of 1MB per entity stored in the datastore. You
mention "typically" 500-700 words, but if there are large outliers you
may have problems. One solution is to pre-zip the content; another is
to overflow the data into multiple entities.
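
For the pre-zip option, ndb can do it transparently; for the overflow
option, a chunking scheme works. Rough sketch (names are illustrative):

from google.appengine.ext import ndb

# Option 1: let ndb zlib-compress the payload on write.
class ArticleBody(ndb.Model):
    data = ndb.BlobProperty(compressed=True)

# Option 2: overflow into multiple child entities, each well under 1MB.
CHUNK = 900 * 1024

def save_large_body(article_key, text):
    raw = text.encode('utf-8')
    chunks = [raw[i:i + CHUNK] for i in range(0, len(raw), CHUNK)]
    ndb.put_multi([ArticleBody(parent=article_key, id=n + 1, data=chunk)
                   for n, chunk in enumerate(chunks)])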

Sounds like the blob will be in the 10s of megabytes. I believe GAE
charges for bandwidth based on the pre-gzip-encoding size of responses.
If you have a lot of downloads, you may wish to zip it on write and
deliver a zipfile download to your users, which should dramatically
reduce bandwidth costs.
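
Zipping on write could look something like this (untested;
latest_blob_key() is a placeholder for however you track the current
build):

import StringIO
import zipfile
from google.appengine.api import files
from google.appengine.ext.webapp import blobstore_handlers

def write_zipped_dump(articles):
    # Build the zip in memory, then write it to the blobstore in one pass.
    buf = StringIO.StringIO()
    zf = zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED)
    for article in articles:
        zf.writestr('%s.txt' % article.key.id(), article.body.encode('utf-8'))
    zf.close()
    file_name = files.blobstore.create(mime_type='application/zip')
    with files.open(file_name, 'a') as f:
        f.write(buf.getvalue())
    files.finalize(file_name)
    return files.blobstore.get_blob_key(file_name)

class DownloadHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self):
        # send_blob serves straight from the blobstore, so users download
        # the zipped bytes rather than the raw text.
        self.send_blob(latest_blob_key(), save_as='articles.zip')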

The one 'gotcha' you may run into is that SSL on appengine costs $100/mo.

Jeff

On Thu, Aug 16, 2012 at 11:53 PM, Jordan Bakke <jordan...@gmail.com> wrote:
> I'm writing a very limited-purpose web application that stores about 10-20k
> user-submitted articles (typically 500-700 words). At any time, any user
> should be able to perform searches on tags and keywords, edit any part of
> any article (metadata, text, or tags), or download a copy of the entire
> database that is recent up-to-the-hour. (It can be from a cache as long as
> it is updated hourly.) Activity tends to happen in a few unpredictable
> spikes over a day (wherein many users download the entire database
> simultaneously, requiring 100% availability and fast downloads) and
> intermittent weeks of low activity. This usage pattern is set in stone.
>
> Is GAE a wise choice for this application? It appeals to me for its low cost
> (hopefully free), elasticity of scale, and professional management of most
> of the stack. I like the idea of an app engine as an alternative to a host.
> However, the excessive limitations and quotas on all manner of datastore
> usage concern me, as does the trade-off between strong and eventual
> consistency imposed by the datastore's distributed architecture.
>
> Is there a way to fit this application into GAE? Should I use the ndb API
> instead of the plain datastore API? Or are the requirements so
> data-intensive that GAE is more expensive than hosts like Webfaction?
>

Emanuele Ziglioli

Aug 19, 2012, 6:32:55 PM
to google-a...@googlegroups.com, je...@infohazard.org

> Sounds like the blob will be in the 10s of megabytes. I believe GAE
> charges for bandwidth based on the pre-gzip-encoding size of responses.
> If you have a lot of downloads, you may wish to zip it on write and
> deliver a zipfile download to your users, which should dramatically
> reduce bandwidth costs.


GAE doesn't gzip blobs > 1MB served from the blobstore. Something to keep in mind...



Richard Watson

Aug 20, 2012, 5:56:39 AM
to google-a...@googlegroups.com, je...@infohazard.org
On Friday, August 17, 2012 5:52:33 PM UTC+2, Jeff Schnitzer wrote:

> It won't fit into the free quota (just to rebuild the download once an
> hour will be hundreds of thousands of read ops per day)

Group articles and bundle them together as mini-blobs before creating
the large downloadable blob. Recalculate only the mini-blobs whose
articles have changed, say every 10 minutes, and rebuild the big one
every hour. Not sure how naturally grouped the data is, but even an
artificial grouping of, say, 500 articles per blob by id should work.
If there's a way to put often-changed articles together, you'll save
even more.
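
A rough sketch of that bundling (untested; assumes each Article carries
a 'group' property computed from its id):

from google.appengine.ext import ndb

GROUP_SIZE = 500  # e.g. article id 1234 lands in bundle 1234 // 500

class Bundle(ndb.Model):
    # One pre-rendered slice of the big download, keyed by group number.
    data = ndb.BlobProperty(compressed=True)
    dirty = ndb.BooleanProperty(default=True)

def mark_dirty(article_key):
    # Call this from the article save path so only touched bundles rebuild.
    key = ndb.Key(Bundle, str(article_key.id() // GROUP_SIZE))
    bundle = key.get() or Bundle(key=key)
    bundle.dirty = True
    bundle.put()

def rebuild_dirty_bundles():
    # Run from a ~10 minute cron; the hourly job then just concatenates
    # every Bundle.data into the downloadable blob.
    for bundle in Bundle.query(Bundle.dirty == True):
        group = int(bundle.key.id())
        articles = Article.query(Article.group == group).fetch()
        bundle.data = '\n'.join(a.body for a in articles).encode('utf-8')
        bundle.dirty = False
        bundle.put()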

Richard

Jeff Schnitzer

Aug 20, 2012, 12:02:39 PM
to google-a...@googlegroups.com
You could also do it by only loading articles that have changed (query
on timestamp) and "hot-replacing" them in the blob (read the blob into
RAM, munge it, save it again). You'd want to do this in a dynamic
backend with extra RAM, depending on how large the blob gets. But at
some point you have to ask whether it's really worth the extra
engineering to maybe save a few bucks a month.
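
Sketch of that approach (untested; splice_in() stands for whatever
merge logic fits the dump format, Article is the model from earlier):

from google.appengine.api import files
from google.appengine.ext import blobstore

def hot_replace(old_blob_key, last_run):
    # Only fetch what changed since the last rebuild.
    changed = Article.query(Article.updated > last_run).fetch()

    # Read the existing blob into RAM -- needs a backend with enough
    # memory once the blob reaches tens of MB.
    dump = blobstore.BlobReader(old_blob_key).read()
    dump = splice_in(dump, changed)

    # Write the merged result out as a fresh blob, drop the old one.
    file_name = files.blobstore.create(mime_type='text/plain')
    with files.open(file_name, 'a') as f:
        f.write(dump)
    files.finalize(file_name)
    blobstore.delete(old_blob_key)
    return files.blobstore.get_blob_key(file_name)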

Also, not having billing enabled means that if you ever get a surge in
users, the app goes down. Unless this is a hobby, he probably wants to
enable billing. That 50k daily limit on read ops goes *fast* when
users are doing queries for tags and whatnot.

Jeff