*** SECONDED ***
Well done to the team for the shorter-than-usual maintenance outage,
BUT COME ON GUYS... our customers and their customers get 500s instead
of the outage message - I'm livid.
We try to manage customer expectations about these outages to limit
the damage they do. We send out warning emails to customers, and then
show an outage message explaining what is going on (and that they were
warned about it). Except our customers get server errors, and we look
like LEMONS.
Now we're going to have to go away and write our own capabilities
test, because we can't trust yours. And if we can't trust the
capabilities API, it raises questions about what else we shouldn't be
trusting. Appengine is predicated on trust - users trust Google can
run the service better than we can.
I have no idea how the capabilities API works - it obviously isn't
automated, so I'm guessing someone forgot to change the settings. One
tiny task missed off the checklist - wait, you did have a checklist,
right? See how that one tiny task has got me questioning your whole
operation.
I love the Appengine concept. But for it to succeed, you have to win
trust, and to do that you have to get everything EXACTLY right. If you
don't have a seriously anal project manager running these maintenance
outages, get one.