40%+ increase of frontend instance hours over previous day

Alan Xing

Dec 14, 2011, 3:15:42 AM
to google-a...@googlegroups.com
Hi there,

Our app 'snsanalytics' used more than 40% more front-end CPU hours on Dec 13 than on Dec 12. Our traffic on Dec 13 was actually flat to slightly lower compared to Dec 12. Apart from the abnormal increase in front-end CPU hours, there was no noticeable change in other resource consumption such as db writes/reads.

There is absolutely no change on our side that can explain this increase. We didn't deploy any code in between. We didn't run any heavy-lifting operations. Nothing we did was unusual compared to the previous day.

Could anyone from the GAE team help explain or investigate what happened?

Thanks,
Alan

Brandon Wirtz

Dec 14, 2011, 3:21:15 AM
to google-a...@googlegroups.com

People always assume that steady traffic yields steady pricing.

If your app spends most of its time waiting on APIs, such as datastore accesses, small changes in the traffic pattern throughout the day can change the amount of concurrency and the number of instances required to serve the traffic.

If you had 1000 people show up for 5 minutes at the same time, your cost would be much higher than if the same 1000 people were spread out one after another through the day.
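
To make the concurrency point concrete, here is a minimal sketch. It assumes each instance serves one request at a time and that every request takes a fixed one second, mostly waiting on the datastore. The function name, the latency, and the traffic shapes are hypothetical and are not taken from any real app or from the App Engine scheduler.

```python
def peak_instances(arrival_times_ms, latency_ms):
    """Peak number of requests in flight, i.e. instances needed when
    each instance serves one request at a time (an assumption)."""
    events = []
    for t in arrival_times_ms:
        events.append((t, 1))                # request starts, occupies an instance
        events.append((t + latency_ms, -1))  # request ends, frees the instance
    # Tuples sort by time, then by delta, so at equal times an ending
    # request (-1) frees its instance before a new one (+1) starts.
    events.sort()
    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


REQUESTS = 1000
LATENCY_MS = 1000  # hypothetical: each request mostly waits on the datastore

# Same 1000 requests, spread evenly over 24 hours vs. packed into 5 minutes.
spread = [i * (24 * 60 * 60 * 1000 // REQUESTS) for i in range(REQUESTS)]
burst = [i * (5 * 60 * 1000 // REQUESTS) for i in range(REQUESTS)]

spread_peak = peak_instances(spread, LATENCY_MS)  # 1
burst_peak = peak_instances(burst, LATENCY_MS)    # 4
```

The daily request count is identical in both cases; only the arrival pattern changes the number of instances that must be running at once.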

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Alan Xing

Dec 14, 2011, 3:34:31 AM
to google-a...@googlegroups.com
I have no proof that an extremely abnormal change in traffic pattern didn't occur, but it looks very unlikely. I closely monitored the status at various points during the day. The higher cost was spread throughout the day, and the number of front-end instances was consistently higher than on the previous day.

I'd speculate this is either a GAE change or a GAE error. It coincided with the introduction of front end server classes and SDK 1.6.1.

To be accurate, our app's front-end server class was correctly set to F1, the default, unlike what other people have reported on this mailing list.

Brian Quinlan

Dec 14, 2011, 3:43:43 AM
to google-a...@googlegroups.com
Hi Alan,

I took a very quick look at your application.

It looks like your datastore reads and writes did increase by about
10% between those two days. Your latency also increased for about 12
hours (without an increase in CPU usage).

It is possible that a small increase in datastore latency slowed down
your application enough that more instances were needed to service
requests. It could also be that you had some particularly long running
IO-bound tasks. But I don't have any strong evidence of either case
(the latency increase does correlate with the increase in billed
instances but I can't easily say why your latency increased).
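
As a rough illustration of how latency alone can drive instance count, the sketch below applies Little's law (average requests in flight = arrival rate x time per request). It assumes one request per instance at a time. The traffic rate and latencies are made-up numbers, not measurements from snsanalytics, and the real scheduler also weighs other factors.

```python
def instances_needed(requests_per_second, latency_ms):
    """Little's law: requests in flight = arrival rate x time per request.

    Integer milliseconds avoid floating-point rounding; the result is
    rounded up because a fraction of an instance cannot be scheduled.
    """
    in_flight_x1000 = requests_per_second * latency_ms
    return -(-in_flight_x1000 // 1000)  # ceiling division


# Hypothetical numbers: flat traffic of 20 requests/s on both days.
before = instances_needed(20, 250)  # 5 instances at 250 ms per request
after = instances_needed(20, 350)   # 7 instances at 350 ms per request
```

In this made-up case a 100 ms latency increase raises the instance count from 5 to 7, a 40% increase, with no change in traffic or CPU usage.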

Cheers,
Brian

Kenneth

Dec 14, 2011, 4:27:45 AM
to google-a...@googlegroups.com
Are you using the old MS datastore or the HR datastore? If you're using MS, then pretty much anything to do with the datastore is totally random, so expect random latency increases, which result in higher instance counts and thus higher cost to you, randomly of course. Google will not be fixing these, so move to the HR datastore when you can.

Alan Xing

Dec 14, 2011, 12:27:11 PM
to google-a...@googlegroups.com
Yes, we are still using the M/S datastore. We feel that we are not offering mission-critical services, and these services don't require HRD-level availability. HRD db reads, writes, and storage all cost more. I know we could save some CPU hours by using the Python 2.7 concurrency feature if we moved over to HRD. There are losses and there are gains. Overall, we don't see our cost going down by moving from M/S to HRD. That is why we are reluctant to make the move.

I have always wondered why GAE doesn't extend Python 2.7 support to M/S. There doesn't seem to be any particular technical blocker. Maybe I'm wrong.

In this random latency case, I again wonder why GAE doesn't plan a fix for M/S servers.

Is the plan to completely phase out M/S servers in the near future?


Jeff Schnitzer

Dec 14, 2011, 12:57:01 PM
to google-a...@googlegroups.com
On Wed, Dec 14, 2011 at 4:43 AM, Brian Quinlan <bqui...@google.com> wrote:
>
> It is possible that a small increase in datastore latency slowed down
> your application enough that more instances were needed to service
> requests.

I hate to dig up an old subject, but this is exactly the biggest
concern I have with GAE's pricing model. When Google screws up the
datastore, revenue goes up.

I don't think anyone at Google is so nefarious that they would
deliberately increase datastore latency, but in the long run behavior
follows incentives. And this seems like a strong incentive *not* to
fix latency issues. In the made-for-tv movie script, some executive
deliberately lets latency rise in the last few days of the quarter
just to make his revenue targets and get a bonus. And because of
this, some otherwise-friendly, normal schizophrenic's medication order
fails to process and he goes on a murderous rampage in NYC. And Sam
Waterston convenes a grand jury....ok ok, so I've been watching too
much Law And Order.

It would make me a lot happier if "time spent waiting for Google
services which we are already paying for" (ie, datastore operations)
was subtracted from instance hours we pay for. What this says is
datastore latency is Google's problem, not my problem. It means that
GAE engineers will always be working extra hard to keep latency
down - because low latency improves Google's bottom line rather than
inflating it.
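
A minimal sketch of the billing rule proposed above, assuming the platform could measure how long instances sit blocked on paid services. The function name and all hour figures are hypothetical; this is not how App Engine bills.

```python
def billable_instance_hours(instance_hours, api_wait_hours):
    """Proposed rule: do not bill instance time spent blocked on
    services (e.g. datastore operations) that are already paid for."""
    return max(0.0, instance_hours - api_wait_hours)


# Hypothetical day with normal datastore latency:
# 100 instance hours, 60 of them spent waiting on the datastore.
normal_day = billable_instance_hours(100.0, 60.0)  # 40.0

# Hypothetical day with slower datastore and the same traffic:
# waiting grows to 100 hours, so instance hours grow to 140.
slow_day = billable_instance_hours(140.0, 100.0)   # 40.0
```

Under the current model the slow day would bill 140 instance hours instead of 100. Under the proposed rule both days bill the same 40 hours, so extra datastore latency would no longer raise the customer's bill.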

Jeff

Brian Quinlan

Dec 14, 2011, 3:14:13 PM
to google-a...@googlegroups.com
On Thu, Dec 15, 2011 at 4:27 AM, Alan Xing <alan...@gmail.com> wrote:
> Yes, we are still using the M/S datastore. We feel that we are not offering
> mission critical services. These services don't require the HRD level
> availability. HRD db read/write/store all costs more.

What do you mean? The dollar cost for HRD is the same as MS.

>I know we could save
> some CPU hours by using Python 2.7 concurrency feature if we move over to
> HRD. There is loss and there is gain. Overall, we don't see our cost will
> reduce by moving M/S to HRD. That is why we are reluctant to make the move.
>
> I have always wondered why GAE doesn't extend Python 2.7 support to M/S. It
> doesn't seem there is any particular technical blocker. Maybe I'm wrong.
>
> In this random latency case, I again wonder why GAE doesn't plan to fix for
> M/S servers.

We do have a fix - the HRD :-) Seriously, to make MS more consistent
and reliable, you'd need to synchronously replicate the data across
machines and data centers, and that is exactly what HRD does.

Cheers,
Brian

Kenneth

Dec 14, 2011, 3:51:28 PM
to google-a...@googlegroups.com
HRD is great and all (or at least it couldn't be worse than MS), and I'm going to try to take the leap this weekend, but you haven't provided us with a complete migration tool, which sucks. You would have a lot more credibility if you did; you could even announce a sunset period for MS and save a few SREs from high blood pressure.


Alan Xing

Dec 14, 2011, 4:20:01 PM
to google-a...@googlegroups.com, Ward Supplee
The dollar costs of HRD and MS are the same? That is a surprise to me. I always had the impression that HRD cost far more. I can no longer find that document except in a Google search engine snapshot. As of Dec 10, 2011, the GAE docs still mentioned that HRD "uses approximately three times the storage and CPU cost of the master/slave option". Please see the attached snapshot.

Regardless, I'm very happy to know that HRD no longer costs more than MS. I will seriously think about migrating to HRD soon.

As of this moment, we are still seeing much higher front-end instance hours than I would have expected before yesterday's spike. I'm not convinced by the explanations I have received so far. I think it is good to be transparent about pricing. Choosing a platform is a long-term relationship, and transparency can help stabilize that relationship.
Attachment: Datastore Comparison.jpg

Brian Quinlan

Dec 14, 2011, 4:32:21 PM
to google-a...@googlegroups.com, Ward Supplee
On Thu, Dec 15, 2011 at 8:20 AM, Alan Xing <alan...@gmail.com> wrote:
> The dollar cost of HRD and MS are the same? It was a surprise for me. I
> always had the impression HRD costed way more. Now I could not find that
> document except from Google search engine snapshot. As of Dec 10, 2011, GAE
> doc still mentioned HRD "uses approximately three times the storage and CPU
> cost of the master/slave option". Please see attached snapshot.
>
> Regardless, I'm very happy to know that HRD is not costing more than MS any
> more. I will seriously think about to migrate to HRD soon.

When HRD was launched it did cost 3x more than MS (since it costs
Google at least 3x more to do the replication). But the pricing was
later adjusted to be the same as MS.

Cheers,
Brian

Greg

Dec 14, 2011, 5:28:09 PM
to Google App Engine
@Alan and @Kenneth - I too was hesitant about moving to the HRD, but I
took the plunge five months ago. It's basically how the datastore
should be - I haven't seen ANY downtime, latency is much more
predictable, it "just works". You're going to love it, trust me.

HRD is NOT for "mission critical" only, it is for any app that you
care about at all. The only use for MS now is old apps that aren't
used - everything else should be migrated as a matter of the highest
priority.

Google haven't deprecated MS, probably to avoid yet another PR
backlash. But I'm sure privately they want to switch everyone over to
HRD as soon as possible, because it causes so many support headaches.
And it's not just Google - I'm sure you'll have noticed that you get
very little help from this group as soon as you admit you use MS.
Basically if you don't care enough about your app to migrate it, why
should we care about it either?

So to sum up, MIGRATE NOW! Make sure you understand eventual
consistency (see http://neogregious.blogspot.com/2011/04/migrating-app-to-high-replication.html
and http://neogregious.blogspot.com/2011/06/high-replication-migration-lessons.html),
and then GO FOR IT!
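
For readers new to eventual consistency, here is a toy model in plain Python. It is not the App Engine datastore API; the class and method names are invented for illustration. It mimics one behavior of HRD: a get by key sees a write immediately, while a global (non-ancestor) query may lag until the indexes catch up.

```python
class ToyDatastore(object):
    """Toy model: entity writes are visible to get() at once, but the
    index used by global queries is only updated later."""

    def __init__(self):
        self.entities = {}  # key -> value, read by get()
        self.index = {}     # key -> value, read by query_all()
        self.pending = []   # index updates not yet applied

    def put(self, key, value):
        self.entities[key] = value
        self.pending.append((key, value))

    def get(self, key):
        return self.entities.get(key)

    def query_all(self):
        return sorted(self.index.values())

    def apply_pending(self):
        """Stand-in for the index catching up some time after the write."""
        for key, value in self.pending:
            self.index[key] = value
        self.pending = []


store = ToyDatastore()
store.put("greeting1", "hello")

by_key = store.get("greeting1")       # "hello": visible immediately
stale_query = store.query_all()       # []: the query does not see it yet
store.apply_pending()
fresh_query = store.query_all()       # ["hello"]: visible after catch-up
```

Code written for MS that puts an entity and then immediately runs a global query expecting to find it is the kind of code that needs review before migrating.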

working

Dec 14, 2011, 9:57:53 PM
to Google App Engine
Hi Brian,

The last step of the migration is "alias the app over to the HRD app".

I am wondering:
- Will oldapp.appspot.com be pointed to newapp-hrd.appspot.com?
- Will old...@appspot.com be pointed to newap...@appspot.com for XMPP?
- Will postm...@oldapp.appspotmail.com be pointed to postm...@newapp-hrd.appspotmail.com?
- Do we have to go to Google Apps and manually switch the domain name www.oldapp.com to newapp-hrd.appspot.com?

Thanks,
coronin



Alan Xing

Dec 15, 2011, 2:04:58 AM
to google-a...@googlegroups.com
Greg, thanks for sharing your HRD experience.

Now that HRD doesn't cost more, we will certainly move over soon. I hope the GAE team shows more of this kind of consideration for the dev community. As far as I can tell, the GAE dev community is still quite small and fragile, especially compared to EC2's.

Stefano Ciccarelli

Dec 15, 2011, 11:55:13 AM
to google-a...@googlegroups.com
I can only agree.
This is the biggest problem with this pricing model.
