You wrote code that treated the Datastore like SQL.
You didn't set "do not index" on the properties you didn't need to index.
You changed the structure of your data midway, but instead of flushing and
starting over you just changed it.
Likely you aren't doing any cleanup.
Likely you aren't using the right types for your data.
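To put a rough number on the indexing point, here is a minimal sketch of how
per-entity write operations grow with indexed properties. The formula is an
assumption based on the billing documentation of the time as I recall it (a
new entity put costs 2 writes, plus 2 per indexed property value, plus 1 per
composite index value, and a delete costs the same). The function name and
the example numbers are hypothetical, so check the current pricing page
before relying on them.

```python
def write_ops_for_new_put(indexed_values, composite_index_values=0):
    """Estimate datastore write ops for putting (or deleting) one new entity.

    Assumed formula, not an official API:
    2 + 2 per indexed property value + 1 per composite index value.
    """
    return 2 + 2 * indexed_values + composite_index_values


# Hypothetical entity with 10 properties, only 2 of which are ever queried.
all_indexed = write_ops_for_new_put(indexed_values=10)
only_needed = write_ops_for_new_put(indexed_values=2)

print(all_indexed, only_needed)
```

In the Python datastore APIs the switch itself is the indexed=False argument
on a property definition. Under the assumed formula, leaving the eight
unqueried properties indexed makes every put and every delete several times
more expensive.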
So what I hear is "Whine, whine, whine, I built my stuff wrong, Google tried
to help me but I wanted to move to Amazon, so they didn't have many
suggestions I liked, so now I'm sad, whine, whine, whine, woe is me. Please
tell others so I can get sympathy for not understanding the platform I was
working on."
Did I miss anything?
Please share and comment.
Cheers
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to
google-appengi...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
Seems like there's a simple solution to deleting all the data, though:
After you've moved the important data to a new app, just stop billing
on the old app. Make reclaiming it Google's problem.
Jeff
--
We are the 20%
"Cautionary tale: Building large-scale data can cost a lot if the Datastore
isn't fully understood"
"Cautionary tale: Failing to scrap and restart your Datastore when changing
the structure can be expensive"
But I don't think that "abusive price" is accurate.
Yes.
The primary app I talk about is the edge cache, but that's because it is the thing people could benefit from most and don't seem to be using.
As part of my SEO tools we have what is now a 60 TB database of backlink and crawler data about websites in the Alexa top 200k.
Why should deleting be cheaper? The operation takes the same amount of CPU, and after the delete you no longer pay for storage.
I don't do nearly as much in the Java space, but it doesn't seem there should be much difference between Python and Java. I ported both of the primary apps to both languages for a comparative cost analysis, and we found a few things that were faster or cheaper in one or the other. As a result, in some cases we deploy both under different versions so they can both be live and attached to the same data.
The HRD is super-cool and perfect for building reliable web
applications. But it is way too slow and expensive for large-scale
data processing. And the uber-reliability is usually pointless - when
dealing with massive data volumes, your collection system is likely
somewhat lossy in the first place. Losing a few bits probably won't
hurt you, and "synchronously replicated to more than three data
centers" is massive overkill.
You probably have the right idea moving to another platform. Use the
right tool for the right job; maybe something like MongoDB or Hadoop.
You'll get much better map/reduce support, higher performance, and
lower cost. GAE is not a box that you're stuck in; you might still
run part of your application on GAE if it makes sense. Just keep an
eye on latency and communication costs.
This isn't a scathing indictment of GAE so much as a realization that
it's not a universal tool. There are a lot of things that are easier
to build with other tools... and a lot of things that are easier to
build on app engine. And some things that are best hybrids of GAE and
something else.
Jeff
--
We are the 20%
My team writes 500 lines of code for every 50 that make it into the final
product.
We know things about the efficiency of Do While vs. ForEach that quite
possibly Google doesn't even know. We are that anal about testing. We test
query speed done different ways and compare cost and performance based on
the anticipated ratios of use.
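As a minimal sketch of what comparing "based on the anticipated ratios of
use" can look like: weight each design's measured per-operation cost by the
share of traffic you expect that operation to get. All names and numbers
below are hypothetical, not measurements from our apps.

```python
def expected_cost(cost_per_op, usage_ratio):
    """Weighted cost of a design: per-operation cost times its share of traffic."""
    return sum(cost_per_op[op] * usage_ratio[op] for op in usage_ratio)


# Hypothetical measurements (arbitrary cost units per operation) for two designs.
design_a = {"read": 1.0, "write": 6.0}
design_b = {"read": 2.0, "write": 3.0}

# Two anticipated traffic mixes: mostly reads, and mostly writes.
read_heavy = {"read": 0.9, "write": 0.1}
write_heavy = {"read": 0.3, "write": 0.7}

for name, mix in (("read_heavy", read_heavy), ("write_heavy", write_heavy)):
    a = expected_cost(design_a, mix)
    b = expected_cost(design_b, mix)
    print(name, "A" if a < b else "B")
```

The same two designs swap places when the traffic mix changes, which is why
the ratios have to be part of the test.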
We just never let "mistakes" grow to the point we can't control them.
I would say GAE handles big data really well. But you have to do testing to
make sure your structure is correct, and that your indexes are well thought
out.
Planning is always possible. Testing is always possible. But it's like
driving my Mini Cooper around Laguna Seca vs. driving a Ferrari around it:
the Ferrari is only faster if you can handle it. My mom can run laps in the
Mini Cooper, but would end up in the wall in a Ferrari.
Or like the discussion about executing code from students.
GAE is cycles on demand, so if you can build your app to be efficient it is
cheap. If you build it with errors it is expensive.
I recently found I could knock 3% off of my bill by disabling logging.
That's the level of testing we do. People say "but how can you afford to
pay devs to write code if you worry that much?" Well, we are betting on the
long haul. We only need to learn the lesson once to capitalize on it for
years.
You say you can't predict growth. Sure I can. I either engineer something to
work for me and 3 of my friends, or I engineer it to be the next Facebook.
There is room for some differences along the way, but I could build Facebook
on GAE. No worry about big data, or scaling. (I think the GAE team would
deploy servers for me as fast as I could fill them.)
Things that are designed for you and your friends you don't market, you
don't tell people about, so they don't grow. When CDNinabox went from
something Brandon uses for his sites to being a product, the product got
lots of complete rewrites. After testing in Java and Python, the caching
mechanism ended up using 4 different models based on the type of traffic
the site we are accelerating gets. One hack for me became software with 40+
optimizations that can be turned on and off to make things run up to 80%
cheaper than the defaults. And to pick those settings we test. We even
schedule changes to test real traffic for periods of time.
I think the real lesson I'm trying to convey is one I learned at MSFT. For
every dev there is 1/40th of a CTO, 1/10 of a product manager, 2 test
engineers, 1/5 of a release manager, and 1/5 of a performance engineer. That
is roughly 2.5 support staff for every programmer. If you are just writing
code you are working in a vacuum that makes it hard to plan, test, debug,
and run scalability metrics.
Did you see the "push the button, check back in 48 hours" thread?
Though to be fair, on RDS we just did a data dump to move to a new system,
which we won't mention here, and our SQL export took 288 hours 17 minutes.
Data migration over the internet is tough when you get above 1 TB. And
making sure you don't have corruption during the move is rough.
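One common way to catch corruption during a move like that, sketched here
under the assumption that the export can be split into chunks, is to record
a digest per chunk on the source side and re-check it on the receiving side.
The chunk contents and function names below are hypothetical.

```python
import hashlib


def chunk_digests(chunks):
    """Return a SHA-256 hex digest for each chunk of bytes."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def corrupted_chunks(source_digests, received_chunks):
    """Return indexes of received chunks whose digest differs from the source.

    Only compares chunk by chunk; a missing chunk needs a separate count check.
    """
    received = chunk_digests(received_chunks)
    return [i for i, (s, r) in enumerate(zip(source_digests, received)) if s != r]


# Hypothetical export split into three chunks; the second is damaged in transit.
source = [b"rows 0-999", b"rows 1000-1999", b"rows 2000-2999"]
manifest = chunk_digests(source)
received = [source[0], b"rows 1000-1x99", source[2]]

print(corrupted_chunks(manifest, received))
```

Only the chunks that fail the check need to be sent again, which matters
when the whole transfer takes days.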
Hi Brandon,
Please share.
Cheers
--
1) Export data into Blobstore as very large blobs
2) Suck data out of the Blobstore
The export can run at Map/Reduce speeds... as fast as you want to pay
for. Bulk downloads from the Blobstore should be fast. Unless each
of your entities is huge, fetching 30 at a time is an awfully small
number.
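A rough sketch of the first step, assuming entities can be serialized as
JSON: group them into large batches and write each batch as one big
newline-delimited blob instead of fetching a few dozen at a time. The batch
size, entity shape, and function names are hypothetical, and the actual
Blobstore write is left out.

```python
import json


def batches(items, size):
    """Yield consecutive lists of at most `size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_blob(batch):
    """Serialize one batch as newline-delimited JSON, one entity per line."""
    return "\n".join(json.dumps(entity, sort_keys=True) for entity in batch)


# Hypothetical entities; a real export would read them from a datastore query.
entities = ({"id": i, "url": "http://example.com/%d" % i} for i in range(2500))
blobs = [to_blob(b) for b in batches(entities, size=1000)]

print(len(blobs))
```

Each element of blobs would then be written out as one large blob and
downloaded in bulk.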
Jeff
--
We are the 20%
This was also what I was thinking about for a Back-up strategy.
-----Original Message-----
From: google-a...@googlegroups.com
[mailto:google-a...@googlegroups.com] On Behalf Of Jeff Schnitzer
Sent: Thursday, December 29, 2011 12:40 PM
To: google-a...@googlegroups.com
I think we are talking about two different things. I'm thinking of
Big Data like this:
http://en.wikipedia.org/wiki/Big_data
Typically characterized by:
* Large data volumes
* Batch updates
* Frequent need to analyze/sift through large quantities of data
The GAE datastore performs poorly in this regard. Map/reduce support
is anemic at best. Per-gigabyte storage is expensive. Raw I/O
performance is *dreadful*. Indexes consume excessive amounts of
space.
I love the GAE datastore, I think it's hands-down the Best Storage
Around for web applications that need scalability and availability.
But there's no way in hell I would use it to store a large-scale OLAP
system or any other kind of serious analytics product. You don't want
EC2 either. You need something like Hadoop on bare metal hardware
with really fat I/O pipes. It will cost you a tiny fraction of what
you'll spend at Google and perform 10X better.
Jeff