Reliability

Richard Watson
Sep 22, 2009, 3:22:22 AM
to Google App Engine
Hi there,

I know the GAE team is exceptionally competent and committed to the
reliability of the platform, so this message is only meant to help me
understand the "why" a bit better.

Obviously reliability will increase over time. I assume the end
result will be more reliable than most alternatives, but it'd be nice
to get more insight into the challenges you're facing. I do read the
mail and blog posts you put up - much appreciated.

Questions I can think of now, maybe others have more/better questions:
- Why are there so many system-wide failures? Are they single-point-of-
failure in nature, or do they emerge due to the overall complexity?
- Is there no way to prevent an entire datacenter from becoming
unhealthy?
- If not, does App Engine need dedicated datacenters, or could it,
e.g., run on fewer machines inside datacenters shared with other
services? I would imagine the latter would be quicker to move, since
it uses fewer resources, and it could perhaps be located closer to users.
- What are your reliability goals?

Regards,
Richard