Google app engine hello,
Regards,
- Yoav.
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to
google-appengi...@googlegroups.com.
For more options, visit this group at
http://groups.google.com/group/google-appengine?hl=en.
It may be a little unfair calling the app engine team dishonest.
Trying to change something in a large organization can be a very
unrewarding experience.
Thank you Gregory,
You say it affected a small portion of users, and you later removed the issue notification from the GAE status page, which makes your status history and availability counter look better for new customers coming to GAE and checking the status page. Is this honest?
You never know, maybe next time your app will be within that small portion of incidents.
Me again :)
Yes, we are using HRD. Indeed, since we moved to it (at least 4-5
months ago), things have become more stable... stable enough? Good question.
I have a monitoring SW (running on EC2) making a request every minute.
In the past week this monitoring system gave me at least 2 errors per
day and sometimes more (500 - Internal Server Error... and no, the
request never reached our app. It fails before us). I know it seems
like a very low number, but still I'd like to have one day without an
error. (2 disappointed clients a day for me, as a very young start-up,
can cause some very bad brand reputation).
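To put those numbers in perspective, here is a minimal sketch of the arithmetic, assuming exactly one probe per minute and 2 failed probes per day as described above. The function names are hypothetical, and the 99.95% figure is the SLA target quoted later in this thread; how the SLA is actually measured may differ from a simple per-probe count.

```python
# Assumption: the monitor sends exactly one probe per minute, all day.
PROBES_PER_DAY = 24 * 60  # 1440 probes


def success_rate(failed_probes: int, total_probes: int = PROBES_PER_DAY) -> float:
    """Return the percentage of probes that succeeded."""
    if total_probes <= 0:
        raise ValueError("total_probes must be positive")
    return 100.0 * (total_probes - failed_probes) / total_probes


def allowed_failures(target_percent: float, total_probes: int = PROBES_PER_DAY) -> float:
    """Return how many probes may fail while still meeting target_percent."""
    return total_probes * (1.0 - target_percent / 100.0)


if __name__ == "__main__":
    print(f"Success rate with 2 failures/day: {success_rate(2):.2f}%")
    print(f"Failures/day allowed at 99.95%: {allowed_failures(99.95):.2f}")
```

Under these assumptions, 2 failures a day is roughly 99.86% success, while a 99.95% target would leave room for about 0.72 failed probes per day.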
Regarding the dishonesty issue - It still amazes me that at times I
see an "Investigating" or "Elevated" sign in the system status,
sometimes 45 minutes of a very high Java latency, and the next day "No
significant issues" on the previous day. I, and I'm guessing the rest
of the people here, would really appreciate some kind of
acknowledgment from Google that you have seen the issue, and didn't
just "let it disappear" but rather investigated it, found the cause,
and are performing steps to make sure it does not reappear.
Basically what I'm asking, and what I think everyone is asking here,
is to know that there is someone to talk to. If you look back at the
'issues' site, you'll see many 'production' issues from people like me
crying for help during downtime. These issues have gone unanswered
even now, months after the issues. If you'll look at other monitoring
sites you'll see that there is some kind of description of the issues
as they happen. Now, I fully understand that during times of
disruptions you guys are amazingly busy in trying to solve them, but
perhaps just a word from a human-being and not an automated SW to show
that we have someone there helping us, and perhaps, just perhaps an
ETA on a solution?
Thank you again, and sorry for the long posts - It is just frustrating
having nothing to do during down times other than refreshing the
status monitor and praying (last downtime, 45 minutes, I prayed to every
religion's god - anything that can work, I don't discriminate during
down times :) )
- Yoav
We have seen issues where requests that take a long time to fill had high
failure rates. We have also seen that if we set the application settings to
have too few idle instances, we got a LOT of 500 errors.
Do you have your app set to automatic? Or have you clamped your app's number
of idle instances? How long does a typical request take to fill? How about
a "long" request?
Some of your downtime may be your own fault, not GAE's. Don't know that for
sure, but when my multiple apps don't exhibit a behavior I assume that the
issue isn't system wide, but localized to something a given user is doing.
-----Original Message-----
From: google-a...@googlegroups.com
[mailto:google-a...@googlegroups.com] On Behalf Of trilok
Sent: Saturday, November 19, 2011 3:42 PM
To: Google App Engine
Example 1
----------------
http://code.google.com/status/appengine/detail/serving-java/2011/11/15#ae-trust-detail-helloworld-secure-get-java-error_rate
- First look at the general console, you will see no mention of
this event.
- OK. Failures happen. Now, why? What is being done? Who is working on it?
During the failure, when will it be fixed? Any kind of online
information so I won't feel alone in the dark! Trust me, I'd feel
better knowing that someone brought his pet and it cut the power cord,
and it will take 30 minutes to come back, rather than nothing at all!
Example 2
----------------
http://code.google.com/p/googleappengine/issues/detail?id=6274&can=5&colspec=ID%20Type%20Component%20Status%20Stars%20Summary%20Language%20Priority%20Owner%20Log
This was during a 45 minute down-time (BTW, marked again as 'under
investigation' after being marked as 'no significant issues' so kudos
on the remark!) between the 5th and the 6th phone call I've received
from restaurants wanting to leave my system. I think this was after I
tried Tweeting app engine, and right before I was looking for Mr.
Page's personal email address :)
- Is there anyone to talk to during these times or do I HAVE to sign
up for the $500 program just to have someone tell me when the
problem will be solved?
I don't think the GAE team is anything but very very professional. I
don't think I would ever manage to really set up such a wonderful
service. I just think that their customer-relationship needs a bit of
work (which will definitely result in more people joining and
staying!).
- Yoav.
Failure before reaching the app can mean that you have no available instances
to handle the request, which was why I was asking if you had set max idle
instances. If those are set wrong you will get 500 errors up the wazoo and
never see a single error in your log. I have helped several people who
thought they were saving money by setting those to very low numbers only to
find out they were losing 10% of their traffic to errors.
Thank you for your recommendation!
Unfortunately I would really not want this conversation to move to the
'why I get so many errors' realm, and as you can clearly see the
examples I have given above are system wide daily errors rather than
application specific errors. My original post specified 3 major system
wide disruptions, but the rest of my post talked mainly of GAE team's
handling of these disruptions.
I perfectly know that there are errors caused by my company's system,
and by our configuration - I am sure there are still many things we
can do to make them better (Thank god we are far from 10% error rate
on our account). I have no doubt that 99% of the time errors are
caused by users' bugs rather than GAE's production issues.
I would really want this conversation to move something in Google to
make their handling of the system wide errors as transparent and
honest as possible, to help us communicate as much information to our
clients as we can. (Mr. D'alesandre, are you reading this? Am I
achieving my goal somehow? :) )
I hope you understand why I am 'avoiding' your question, as I simply
think it can take us off the path of the discussion I am trying
to advance (Yes, I am limiting my max number of instances, and yes I
know it can cause errors but I almost always have at least 2 idle
instances, and my errors are at times where traffic is not at its
maximum).
- Yoav.
If less than 1% experienced an outage of less than 5 minutes, and those apps
were idle then the number of "lost" pages was likely even lower.
You said they were not meeting SLA, and Not Reporting. But if your numbers
are wrong, then we don't know that is the case. You can't call Google out,
AND not diagnose if the problem is YOUR fault.
You have not read my response.
All of the examples I have given are Google issues since they are
presented in the system status.
I do know when my system is down through my fault and when it is down
through Google's fault. If you dive into the system status you will see that
even by their account they do not meet the SLA (Current availability
is 98.92%. FAR from 99.95%).
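As a rough illustration of the gap between those two figures, the sketch below converts an availability percentage into minutes of downtime. It assumes a 30-day measurement window, which is my assumption only; the thread does not say what window the status page or the SLA uses, and the function name is hypothetical.

```python
# Assumption: a 30-day window. The real SLA window is not stated in this thread.
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43200 minutes


def downtime_minutes(availability_percent: float,
                     period_minutes: int = MINUTES_PER_30_DAYS) -> float:
    """Return the minutes of downtime implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return period_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for label, pct in (("SLA target", 99.95), ("Reported", 98.92)):
        print(f"{label} {pct}%: {downtime_minutes(pct):.2f} minutes per 30 days")
```

Over 30 days, 99.95% allows about 21.6 minutes of downtime, while 98.92% corresponds to about 466.56 minutes (a little under 8 hours).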
Again, my real issue is that Google not only does not meet their SLA,
but that they do not communicate the issues transparently to their
users.
Facts are these:
1. Google do not meet their SLA (again, look at their stated current
availability in the system status).
2. When they are down through their own fault, I have no way of knowing when
the problem is going to be fixed, and how they are preventing it in the
future. If I do not know the answers to those questions, how can I
communicate stability to my clients? What do I tell my paying clients
when they ask me 'when is the system going to be up again'?
I am begging you again, do not make it a "whose fault is it that my
system is down" issue! Sadly I already think that you've diverted the
conversation to a point where I won't get any answers now.
Bottom line I would like Greg to answer: How can I continue using the
GAE when I do not get transparency regarding Google related down times
(during or after)?
- Yoav.
I don't think you read the graphs you sent.
For 45 minutes they were at 30% errors. That only counts as 15 minutes of
downtime. And since it appears that it only polls once every 5 min or so, I'm
not sure what that means for SLA.
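The "counts as 15 minutes" figure looks like an error-weighted count: the length of the incident multiplied by the fraction of requests that failed. A minimal sketch of that way of counting follows. The function name is hypothetical, and this is only one possible accounting; the thread does not state how the SLA actually defines downtime.

```python
def error_weighted_downtime(duration_minutes: float, error_fraction: float) -> float:
    """Return downtime in minutes, weighted by the fraction of failed requests."""
    if duration_minutes < 0:
        raise ValueError("duration_minutes must not be negative")
    if not 0.0 <= error_fraction <= 1.0:
        raise ValueError("error_fraction must be between 0 and 1")
    return duration_minutes * error_fraction


if __name__ == "__main__":
    # 45 minutes at a 30% error rate, the incident discussed above.
    print(f"{error_weighted_downtime(45, 0.30):.1f} weighted minutes")
```

With the numbers above this gives 13.5 weighted minutes, which rounds to the roughly 15 minutes mentioned. Counting every minute with elevated errors as fully down would instead give the full 45 minutes.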
You spoke of downtimes you had that weren't reported elsewhere. And you
said that you know it was GAE's fault because they didn't hit your App. I'm
saying that your "downtime" may have been misconfiguration.
Also even if you are correct that 99.5 isn't being met since they are at
98.2 or something along those lines, it is early in the yearly average.
I'm all for calling people out when they have made mistakes, but my uptime
thus far is north of 99% and is likely north of 99.75%. Amazon had nearly a
week of down time; they will be 2 years getting that back to an average of
99.5.
-----Original Message-----
From: google-a...@googlegroups.com
[mailto:google-a...@googlegroups.com] On Behalf Of trilok
Sent: Sunday, November 20, 2011 12:03 AM
To: Google App Engine
Subject: [google-appengine] Re: Google App Engine's Team Dishonesty
Mr Wirtz,
Facts are these:
- Yoav.
--
Again, my problem is that when it is Google's fault, there's no one to
talk to - Can you argue anything relevant to this point please? Amazon
had a week of downtime, but THERE WAS SOMEONE TO TALK TO! Whenever Amazon
is down I receive an email from them telling me why, and how they
solved it.
I would be happy to continue the conversation of "is google meeting
their SLA" in private. But read the subject of this email - You can't
dispute that on the 11th of November there was a serious issue which is
not represented in the system status, and which we have not been
informed of at all!
Regarding the points you have mentioned:
1. 45 minutes at 30% errors is, to me, 45 minutes of downtime.
Since it happens during evening time when thousands of people enter my
site to order lunch, and 30% of them need to refresh the site to see
the welcome page and the rest suffer latency, I see it as 45 minutes.
(I agree, it could be worse, but it is still bad for my brand name).
2. By definition you cannot have the uptime you talked about when
Google themselves say that they have a 98.92% up time! Unless you do
not have traffic during those times, which is cool for you, but we
have traffic to our site all of the time.
- Yoav.
If their honesty is in contention then whose fault it is that your app is
down is relevant.
Yes I can have higher than 98.92 because not everyone was down during those
times. Plus Edge Cache can serve up to 60% of my traffic for short periods
of time, so even during an outage I can be partially up. Also Static files
haven't had any downtime.
As to the "someone you can call": if everything is down, calling someone
doesn't help. Even if you had a premier app, the conversation would go: "Hey,
we are down." "Yep, we lost all of the datacenter you are in." "Can you get me
back up?" "Yep, when everyone else comes back up."
What a support account gets you is things like "Hey I'm on Python 2.7 and
Thread Safe seems to be giving me 2+2 = 5 and it takes 50 seconds to
calculate it" Or the ability to call someone and say "hey I need to alias a
bunch of apps on to each other and merge the data, is there a best practice
for that?"
Would an in-dashboard SLA counter be awesome? Yes. But you are delusional
if you think Amazon has ever been forthcoming about issues.
I get the distinct impression that your Downtime was higher than "stock"
because you had something misconfigured. I suspect that I hit the nail on
the head when I said your latency and instance settings were wrong, and you
are annoyed because I pointed this out.
And I don't feel bad about destroying your thread (if I have); the tone of
this thread was not appropriate. If you call someone stupid, it's an
insult, but it is not a moral judgment. If you say someone is not
transparent, it is just a statement of fact or opinion with no moral
judgment. When you call someone dishonest you are passing moral judgment.
When you judge someone's morals you should do so without question of them
being wrong.
You will see me "troll". Tell people they are wrong. I have even gone so
far as to say that an employee who has cost the community LOTS of money and
trouble should quite possibly be fired. I have NEVER judged those people on
a personal level. Call me an Ass, a Troll, an Idiot. Great I might do
something to deserve that, but the people from Google we have interacted
with on this list have NEVER done anything deceitful. They have discussed
with us changes in price, policy, features, and terms of service. They have
always been candid with us about legal issues. In terms of support you
could hardly ask for a better group. A group where every email has the
address of the person who sent it, so many of us have interacted off list.
Your thread is off base. If the GAE Team isn't hitting uptime, call them on
it. As you pointed out they fess up to that. Don't like the data on the
status or downtime board, point out that you want higher resolution.
Calling the team dishonest. Where I grew up calling someone a liar is the
kind of thing that gets your nose bloodied.
When I call support I want to know how long will it take them
approximately to solve the problem so I have something to tell my
clients. I'm a bloody paying customer, I deserve that, like my paying
customers deserve it from me. It is clear to me you don't run a B2B
business, so please don't comment on things you do not understand.
Since you fail to read my posts and understand them (read the first
line of this post) I can only ask you to mind your own business and
stop wasting my time with unrelated answers to my posts.
Can we SOMEHOW return to the subject at hand - Transparency? Or is
this a lost cause because of people like this guy?
I am sorry to tell you but I rechecked the configurations and all the
500's I get, and none of them are our fault. But let's put that aside.
Now, Brandon, please tell me what it will take to return this
discussion to the transparency issue? Respect my problems with Google,
like people respect yours.
Both of you. Knock. It. Off.
No. It means that all the OLD M/S apps bring down the average. (aka if you care about uptime don’t use M/S)
Premier grants you support, AND if you have 300 apps, instead of paying the per-app minimum fee you pay for usage. So if you have 300 apps that would each cost $2 a month in usage, instead of paying the $10 a month minimum per app ($3000), you pay the $500 + $600 in usage fees. See, you save $1900.
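The savings claim can be checked with a short sketch. All figures are the ones quoted in the post above (300 apps, a $10 per-app monthly minimum, $2 of monthly usage per app, a $500 flat fee); the function names are hypothetical and the billing model is simplified to what the post describes, not taken from Google's actual pricing terms.

```python
def standard_monthly_cost(apps: int, min_fee_per_app: int, usage_per_app: int) -> int:
    """Each app pays whichever is larger: its usage or the per-app minimum fee."""
    return apps * max(min_fee_per_app, usage_per_app)


def premier_monthly_cost(apps: int, usage_per_app: int, flat_fee: int) -> int:
    """A flat fee plus actual usage, with no per-app minimum."""
    return flat_fee + apps * usage_per_app


if __name__ == "__main__":
    standard = standard_monthly_cost(apps=300, min_fee_per_app=10, usage_per_app=2)
    premier = premier_monthly_cost(apps=300, usage_per_app=2, flat_fee=500)
    print(f"Standard: ${standard}, Premier: ${premier}, saving: ${standard - premier}")
```

With the quoted figures this works out to $3000 versus $1100, a saving of $1900 a month.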