Re: [google-appengine] If your bill shoots up due to increased latency, you may not be refunded the charges incurred


Jon McAlister Jun 14, 2012 9:54 AM
Posted in group: Google App Engine
I'm an engineer on the App Engine team, and work as the TL for
the scheduler, appserver, and other serving infrastructure. I am
also closely involved with production and reliability issues. I
can offer some perspective here.

Jeff Schnitzer:
} "Degraded service == more profitable" is a perverse
} incentive, and will eventually produce undesirable development
} priorities and turn happy customers into angry customers.

This captures the issue well. It may seem at first like we've got
an incentive, but in truth the second-order effects are a much
stronger incentive for us. We care very much about predictable,
reliable performance and continually work to improve it.
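To make the apparent first-order incentive concrete: by Little's law, the number of busy instances scales with request latency, so a latency spike directly inflates instance-hours billed. A rough sketch (the request rate and latencies here are illustrative assumptions, not actual scheduler figures):

```python
import math

# Illustrative sketch (not actual scheduler behavior): by Little's law,
# busy instances ~= request rate * latency / concurrency per instance.
def instances_needed(qps, latency_s, concurrent_per_instance=1):
    return math.ceil(qps * latency_s / concurrent_per_instance)

# At an assumed 50 requests/sec, a latency jump from 300ms to 20s means:
normal = instances_needed(50, 0.3)    # 15 instances
degraded = instances_needed(50, 20)   # 1000 instances
```

This is why the second-order effects matter so much: a scheduler that tolerates latency regressions would multiply customers' bills, and that is exactly the outcome we work to avoid.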

stevep:
} Forgone Scheduler improvements == more profitable also...
} Engineering: "If we make this Scheduler change, we can halve the
} total number of instances without any hardware investment!!!!!!!!"
} Finance: "Remember you work for the shareholders, not the developers."

I've personally been involved in five projects over the last year
in which we shipped scheduler improvements which reduced the
number of instances needed by an app to run a workload. As with
the above point, it may seem at first like we've got a bad
incentive, but in truth the second-order effects override it. We
want App Engine and the scheduler to get more efficient over
time, and prioritize several projects internally to that effect.
It's in our best interest to see predictable and reliable
behaviors, greater efficiency, and higher performance. It's sort
of a never-ending project for us, as we have to keep up with
infrastructure changes, correctly adapt to new features, keep
complexity down, support new runtimes and computational models,
and the whole time try to make things look effortless (i.e.
automatic, with little-to-no feedback needed from the developer).
It's very important to us.

Jeff Schnitzer:
} Is there still a hard limit of 10 threads?

Yes, but probably not for the reason you expect. The primary
issue we run into is memory management. If we raised the default
to 100, many apps would then see out-of-memory deaths (more than
they do now), and these deaths show up differently for
python/java/go. The right path forward is smarter
memory-management algorithms, configurability, and so on. This
is an example of the kinds of projects we work on for the
scheduler, but as with any team we have to prioritize our
projects. I'd recommend filing this (or any other desired
scheduler enhancements) on the public issue tracker so they can
get feedback/data/votes.
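As a back-of-the-envelope illustration of the memory constraint (the figures below are assumptions for illustration only, not actual App Engine limits): the safe concurrency level falls out of the instance memory budget divided by peak per-request memory, which is why raising the cap without smarter memory management would mean more out-of-memory deaths:

```python
# Hypothetical numbers for illustration only -- not actual App Engine limits.
instance_memory_mb = 128     # assumed per-instance memory budget
per_request_peak_mb = 12     # assumed peak memory of one in-flight request

# Each concurrent request needs headroom; exceeding the budget means an
# out-of-memory death, so concurrency has to be capped conservatively.
safe_threads = instance_memory_mb // per_request_peak_mb
print(safe_threads)  # -> 10 under these assumed numbers
```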


On Thu, Jun 14, 2012 at 8:48 AM, stevep <pros...@gmail.com> wrote:
> Forgone Scheduler improvements == more profitable also...
> Engineering: "If we make this Scheduler change, we can halve the total
> number of instances without any hardware investment!!!!!!!!"
> Finance: "Remember you work for the shareholders, not the developers."
>
> On Wednesday, June 13, 2012 11:18:15 PM UTC-7, nischalshetty wrote:
>>>
>>>  "Degraded service == more profitable" is a perverse
>>> incentive, and will eventually produce undesirable development
>>> priorities and turn happy customers into angry customers.
>>
>>
>> Rightly said. The way things are right now, this is the exact thing that
>> comes to a customer's mind. Many are suggesting optimizing, max idle
>> instances and such, but when the latency goes from 300ms to 20s (as it
>> did in our case), there's hardly anything on your end that you can do
>> (without making your app's users angry).
>>
>>
>>
>>
>> On Thursday, June 14, 2012 7:47:24 AM UTC+5:30, Jeff Schnitzer wrote:
>>>
>>> On Wed, Jun 13, 2012 at 3:49 PM, alex wrote:
>>> >
>>> > If by Non-GAE systems you mean mostly IaaS and stuff like Beanstalk
>>> > then of
>>> > course they are "less sensitive", but again we're talking about
>>> > different
>>> > service levels (hence different approaches in solving a specific
>>> > problem/challenge). Others (e.g. Heroku) simply make you set a fixed #
>>> > of
>>> > instances. Well, that's one of the reasons I prefer GAE.
>>>
>>> The fact that normal appservers can have hundreds of threads blocking
>>> on reads and GAE apparently can't doesn't really seem related to the
>>> "service levels".
>>>
>>> I prefer GAE too, but this means I want to congratulate the team for
>>> the many good things they do and hold their feet to the fire when they
>>> do bad things.  "Degraded service == more profitable" is a perverse
>>> incentive, and will eventually produce undesirable development
>>> priorities and turn happy customers into angry customers.  From a game
>>> design perspective, this is a bad way to structure a business
>>> relationship.
>>>
>>> There were problems with the original pricing model, and now we have
>>> problems with the new one.  Let's talk about it.
>>>
>>> >> It's possible that Google can solve this problem entirely by getting
>>> >> better concurrency out of instances.  Is there still a hard limit of
>>> >> 10 threads?
>>> >
>>> > Yes. BTW, 99% of such "problems" I've seen could be effectively solved
>>> > with
>>> > push or pull queues.
>>>
>>> This doesn't really address the problem.  Queues are serviced by
>>> instances.  Datastore latency will still cause extra instance spinups.
>>>
>>> Jeff
>
> --
> To view this discussion on the web visit
> https://groups.google.com/d/msg/google-appengine/-/Oixl790VrV0J.