ROFLMFAO DynamoDB From Amazon


Brandon Wirtz

Jan 18, 2012, 2:57:10 PM
to google-a...@googlegroups.com

Google should pay me to do a parody of this http://www.youtube.com/watch?feature=player_embedded&v=oz-7wJJ9HZ0#!

I might do it for free, but I suspect the audience for such a video is somewhat limited.

Dynamo misses the Dynamic portion. Turn the dial to determine how many queries per second you need to serve from this non-relational database. Got a traffic spike? Sucks to be you. Oh, and they failed to mention that changing the dial on a 100-gig database requires “about 1.5 minutes per gig when scaling up.” So those of you with 4 TB of data: if you want to scale up, you need to give them 4 days' notice.
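For what it's worth, the 4-day figure follows directly from the quoted rate. A quick back-of-envelope sketch; the 1.5 min/GB number is the claim quoted above, and linearity is the poster's assumption (Amazon disputes it later in this thread):

```python
# Back-of-envelope check of the scale-up complaint. The 1.5 min/GB figure
# is the quoted claim; treating it as linear is a worst-case assumption.

MINUTES_PER_GB = 1.5      # claimed scale-up time per gig
DB_SIZE_GB = 4 * 1024     # a 4 TB table

total_minutes = DB_SIZE_GB * MINUTES_PER_GB
total_days = total_minutes / (60 * 24)

print(f"{total_days:.1f} days")  # about 4.3 days if the cost really were linear
```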

André Pankraz

Jan 19, 2012, 3:55:52 AM
to google-a...@googlegroups.com
I find this also strange... you have to precalculate / estimate your writes per second and pay for that?! $1 for 1 GB? Wow...


But on the other hand, you seem to have missed the part where they claim this thing is nearly as fast as our memcache here. We will see, but we GAE users are in no position at the moment to make a parody of anything in the Amazon cloud world.

The less-restricted IaaS together with PaaS seems to work much better. Many startups use Amazon IaaS as the foundation and provide PaaS on top of it (Heroku, OpenShift, etc.). The open-source community around Amazon is _much_ bigger. Google's marketing etc. is... another topic ;) I can see where this goes in the long run... *wave*

Brandon Wirtz

Jan 19, 2012, 4:45:21 AM
to google-a...@googlegroups.com

At the risk of someone tracking down my comments and using them against me… I talked to them the other day. They don't get it.

My product has a version that runs on AWS, and we tested it with Dynamo: it isn't as fast as memcache, and it's not as fast as the Datastore. They claimed they would be tuning it, but it just isn't as mature as the Datastore.

A lot of why I started down the path of performance testing today was that the Amazon CloudFront guys were talking about how great their product was. I said it was slower than mine, and they came up with 2 countries (out of 12 tested) where they were faster than me, but only by a few milliseconds. After switching to Route 53, and now with a few optimizations to how my app initializes, I'm faster than they are again (and you don't need to understand anything about your infrastructure or your code to implement my solution).
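A minimal sketch of this kind of head-to-head latency comparison, assuming hypothetical `fetch` callables (one per endpoint/country, each wrapping an HTTP GET in a real test); the median is used so one slow outlier doesn't skew the ranking:

```python
import time
from statistics import median

def measure_latency_ms(fetch, samples=5):
    """Call `fetch` repeatedly and return the median latency in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        fetch()
        timings.append((time.perf_counter() - start) * 1000.0)
    return median(timings)

def rank_endpoints(endpoints, samples=5):
    """Given {name: fetch_callable}, return (name, median_ms) pairs, fastest first."""
    results = {name: measure_latency_ms(f, samples)
               for name, f in endpoints.items()}
    return sorted(results.items(), key=lambda kv: kv[1])
```

In a real run, each callable would fetch the same asset from a different CDN or app instance, ideally from the geography under test.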

Think about that. My software running on Google's PaaS is faster and easier to implement than what I believe is the second-largest CDN on the net. That is a powerful statement, sure, about me, but more about the power of the Google platform: truly enterprise-scale systems that perform like purpose-built solutions. Amazon's cost of delivery works out to 16 cents a gig; we average 18 (we hope to get this to 16 with the new changes).

I don’t have the biggest deployment on AppEngine. I don’t spend the most money. There are for sure bigger and better stories out there, but Google is on the right track.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/UbwFtEIY7scJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Robert Kluin

Jan 19, 2012, 12:58:53 PM
to google-a...@googlegroups.com
As much as I hate to admit it, I had the same feelings as you when I
read the DynamoDB announcement yesterday. ;)

Their pricing is confusing and means I have to estimate the capacity I will need, which sucks. Most of the apps I deal with have pretty distinct wave-like traffic patterns; the daily peaks see traffic levels several times the levels during the troughs. I've also worked with apps with a lot (several TB) of data that go from sustaining a few hundred QPS to sustaining 1,000+ QPS, and fiddling with knobs while that's happening would suck -- GAE just magically kept working (and well, too!). I guess I just didn't get the dynamic part of DynamoDB.

However, I still hope that this will give Google incentive to innovate and be competitive. In my opinion the datastore is/was probably App Engine's biggest competitive advantage. App Engine's datastore pricing is also lacking, but at least you're not paying for unused capacity.


Robert

stevep

Jan 19, 2012, 4:46:59 PM
to Google App Engine
Robert wrote:
> However, I still hope that this will give Google incentive to
> innovate and be competitive.

Certainly am pulling for Dynamo to bring some competitive pressure (but very happy any port seems a long way off). Not yet there, but fingers crossed -- come on, Jeff, apply that giant alien brain (whatever happened to Yegge's Google+ series?). One of the greatest needs right now for many GAE devs, though, is a simple best-practices document. Unfortunately this falls into the "documentation" category, and we all know how engineers assess the value of more documentation vs. more new and cool features. -stevep

Brandon Wirtz

Jan 19, 2012, 5:02:45 PM
to google-a...@googlegroups.com
GAE needs a dedicated Tech Evangelist to write best practices, and to
generate clear examples of how code should be used. But also to show X vs Y
performance numbers.

You can't easily peer into GAE's pricing. And a lot of people blame the expense on Google when they need to look at their own code. If we were running the code from our first release we'd be paying 40x what we are now, and we have maybe changed 60 lines of code. Some of those lines even slow things down so we can offer more functionality, so if we had only done feature expansion we'd likely be at 60x the current price.

That's a big deal. It is also the kind of thing Google has never been good at. They don't really do nice explanations of how to optimize for their products. I mean this in the most flattering of ways, because the GAE team is much more helpful than any other group I have interacted with at Google, but generally Google's approach to documentation and evangelism is "We are so much smarter than you that we can't dumb this down enough for it to make sense, and if you were smart enough to understand, you wouldn't need me to explain."


Andrei

Jan 19, 2012, 7:31:26 PM
to Google App Engine
They claim you can do 250,000 writes per second

On Jan 18, 1:57 pm, "Brandon Wirtz" <drak...@digerat.com> wrote:
> Google should pay me to do a parody of this
> http://www.youtube.com/watch?feature=player_embedded&v=oz-7wJJ9HZ0#!

Brandon Wirtz

Jan 19, 2012, 7:44:15 PM
to google-a...@googlegroups.com
Yeah? So?

Have I not shared that Googlebot crawls me at 100k per hour? I hit a quarter of that during those spikes. I could do some load testing if someone wants to foot the bill, but I'll bet money I can push GAE to 1M easily.


Andrei

Jan 19, 2012, 8:28:34 PM
to Google App Engine
That is 250K per second, not per hour.

Brandon Wirtz

Jan 19, 2012, 8:35:53 PM
to google-a...@googlegroups.com
Aware. During big bursts we are handling 36 writes per crawled page, and at times doing 1,500 RPS.

Andrei

Jan 19, 2012, 8:41:22 PM
to Google App Engine
I wonder why I could not get more than 1,500 writes per second.

Brandon Wirtz

Jan 19, 2012, 8:48:17 PM
to google-a...@googlegroups.com
Was it very nearly exactly that number? You aren't supposed to have to ask, but sometimes you have to get the guys to take the training wheels off.

Or it may be the way your code is written, or the number of instances you had active. What was the error you got when you pushed harder?

One of the problems we have been trying to work out with the CSV importer we have been working on is that when we try to import 6 million records from our command-line Python app, we spin up too many instances, so it costs a fortune -- but we have definitely been higher than 1,500/s.
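One way to keep per-request overhead down in that situation is batching the writes. A sketch, not the actual importer: `put_many` is a hypothetical stand-in for whatever bulk-write call the platform offers (e.g. a batched datastore put), and 500 is an assumed per-call batch size:

```python
def batched(items, batch_size=500):
    """Yield successive slices of `items` of at most `batch_size` elements."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def import_rows(rows, put_many, batch_size=500):
    """Write `rows` in batches via `put_many` and return the count written.
    Fewer, larger write calls means fewer requests, and fewer requests means
    fewer instances spun up to absorb the import."""
    written = 0
    for batch in batched(rows, batch_size):
        put_many(batch)
        written += len(batch)
    return written
```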

Richard Watson

Jan 20, 2012, 6:19:28 AM
to google-a...@googlegroups.com
TBH, I think it'll likely have a place. It won't be useful for handling peaky loads with low individual transaction value, like general web traffic. However, it's very likely useful when you need more predictable performance and the lack of it is going to cost you. It seems to be priced as a premium product for the situations that need it.

α its_me

Jan 20, 2012, 6:19:21 PM
to google-a...@googlegroups.com
> Oh, they failed to mention, “changing the dial on a 100gig database?” it requires “about 1.5 minutes per gig when scaling up”. So those of you with 4TB of data, if you want to scale up you need to give them 4 days notice.

I've been searching... where did you find that? (Link?)

α its_me

Jan 21, 2012, 2:06:37 AM
to google-a...@googlegroups.com
It seems you are wrong. I put this question to the folks at Amazon, and here's the reply:

The overall time is not linear as the Google Groups poster suggests. In most cases it will be between a few minutes to a few hours regardless of total size. Larger data sets may take a bit longer than smaller data sets simply because there is often more data movement to perform and coordination to be made across a greater number of machines. Rest assured though, we make use of parallelism where we can so the curve is far from linear.

Someone else has asked a similar question as well, here.

So, did you pull those stats out of thin air? Please clarify, Brandon. Thanks.

Jeff Barr

Jan 21, 2012, 10:49:03 AM
to google-a...@googlegroups.com
One of my colleagues on the AWS team responded to the original comment/question. Here's what he had to say:

Brandon Wirtz

Jan 21, 2012, 5:57:22 PM
to google-a...@googlegroups.com

I was talking to them. They do on-demand replication based on scale. The formula is a bit more complex than what I laid out, but if you needed to scale up for a “Slashdot” spike, you couldn't.

N. Rosencrantz

Jan 21, 2012, 6:44:57 PM
to google-a...@googlegroups.com
Thanks for this thread. I'm curious why you need so many writes. Do you write to the datastore just because there is a request? Isn't that inefficient? I have many handlers that don't do writes at all.
Best regards,
Nick

Brandon Wirtz

Jan 21, 2012, 7:15:35 PM
to google-a...@googlegroups.com

We write any request that has expired (the file hasn't been accessed in X seconds) or any new request we haven't seen.

While we average about 99% of requests from users being served without touching the datastore, when Googlebot indexes a site it is often hitting pages users never do.

JeffProbst.com has 35 total pages and 500 total assets. We don't have to touch the datastore (or memcache) for 99.998% of requests.

XYHD.tv has 4,800 total pages and 8,400 total assets. In a 24-hour period, 350 unique pages and 800-ish assets receive user traffic; when Googlebot comes through, it reads 4,200-ish of those pages. On XYHD.tv, pages expire from the cache every 3 minutes, so we get about 97% cache hits for users but only 10% for Googlebot.
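A minimal sketch of the expiry pattern described above, assuming a hypothetical `load` callable standing in for the datastore fetch; the 180-second TTL mirrors the 3-minute expiry mentioned:

```python
import time

class TTLCache:
    """Serve repeat requests from memory; call `load` (the datastore hit)
    only on a miss or after the entry's TTL has elapsed."""

    def __init__(self, ttl_seconds, load):
        self.ttl = ttl_seconds
        self.load = load
        self.entries = {}        # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1            # fresh entry: no datastore touch
            return entry[1]
        self.misses += 1              # missing or expired: reload and restamp
        value = self.load(key)
        self.entries[key] = (now + self.ttl, value)
        return value
```

With user traffic concentrated on a few hot pages, most requests land inside the TTL window (the ~97% hit rate); a crawler sweeping thousands of cold pages mostly misses.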

Jeff Schnitzer

Jan 23, 2012, 3:56:09 PM
to google-a...@googlegroups.com
I just noticed something about DynamoDB that might make it compelling:
They guarantee sub-10ms latency for reads and writes. I would donate
my left testicle to get sub-10ms latency for GAE datastore operations.

Jeff

Brandon Wirtz

Jan 23, 2012, 4:09:54 PM
to google-a...@googlegroups.com
Latency, not delivery. We were testing with 500 KB blobs, and GAE was faster 75% of the time; but it was slower 75% of the time on 10 KB text strings.

Those numbers are approximate, but close.

Brandon Wirtz

Jan 23, 2012, 4:10:59 PM
to google-a...@googlegroups.com
It is apparently optimized for 16k minus some overhead.


Andrei

Jan 23, 2012, 6:56:48 PM
to Google App Engine
This is off-topic, but I am writing a JavaScript interface for DynamoDB. You can read about it here:
https://forums.aws.amazon.com/thread.jspa?threadID=85588&tstart=0

lola grace

Nov 6, 2013, 2:53:15 PM
to google-a...@googlegroups.com
My company went into production with DynamoDB, and the provisioned-throughput issues and cost became a major problem for us. We ended up achieving far better results and incredible cost savings by rolling our own solution using Amazon S3 as the datastore.

You can read the case study here: http://www.s3nosql.com

Jim

Nov 8, 2013, 2:30:07 PM
to google-a...@googlegroups.com
I agree with your comments about using both IaaS and PaaS. Our application does a lot of back-end processing using AWS clusters that we can spin up/down on demand for our more intensive batch-oriented analytic updates, which happen once a week. Our user interface runs on GAE, written in Java with GWT. Love the near-zero admin of GAE, the auto-scaling, etc. for the front-end app and the real-time meter-data analytics that we do. But for the bulk stuff, Google doesn't offer anything that matches the flexibility of AWS, with services like EMR and the ability to spin up VMs to run apps like R. One size does not fit all requirements.