Latency doubled since Nov 19 -- our Pingdom stats look pretty awful


Per

Nov 23, 2012, 7:43:34 PM
to google-a...@googlegroups.com

Hi App Engine team,

The Pingdom response time for a JSON request that does nothing at all has doubled (on average) since November 19.

Since your post-mortem for the recent outage mentions a config change made on that day that hurt performance, I'm pretty sure the two are related:

* Our app ("small-improvements-hrd") is on a custom domain, and it's on https
* Appstats confirms that these no-op requests do nothing (except create a session) and they take some 100 to 200ms to do so
* The time in the log file is shown as 500 to 2500ms though
* And Pingdom consistently shows 200 to 2500ms as well
* Our average per Pingdom was 700ms until November 19, and now it's 1400ms.
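
To make the numbers above concrete, the time each request spends outside the app's own code can be estimated by subtracting the Appstats time from the Pingdom average. This is only a rough sketch: it assumes the 100 to 200ms Appstats range applied both before and after November 19, which the figures above do not confirm, and `outside_app_ms` is just an illustrative helper name.

```python
def outside_app_ms(external_avg_ms, app_range_ms):
    """Return the (low, high) time spent outside the app, in ms."""
    app_low, app_high = app_range_ms
    return (external_avg_ms - app_high, external_avg_ms - app_low)


# From the Appstats bullet; ASSUMED to be unchanged across Nov 19.
APPSTATS_RANGE_MS = (100, 200)

before = outside_app_ms(700, APPSTATS_RANGE_MS)   # Pingdom average until Nov 19
after = outside_app_ms(1400, APPSTATS_RANGE_MS)   # Pingdom average since Nov 19
added_ms = 1400 - 700

print("outside the app before:", before)   # (500, 600)
print("outside the app after: ", after)    # (1200, 1300)
print("added per request:     ", added_ms, "ms")
```

Under that assumption, the whole 700ms increase would sit outside the application code.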



It's as if one extra roundtrip was added somewhere. Maybe the algorithm you described in the post-mortem still does something very inefficient?

Just to be sure, our other Pingdom check (which pulls our website front page) has also doubled. But the no-op is probably more relevant, since it really does nothing at all except return 1 byte.
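
One way to test the extra-round-trip guess from the outside is to time each phase of a single HTTPS request separately (DNS, TCP connect, TLS handshake, time to first byte). The following is a minimal sketch using only the Python standard library, not something Pingdom or App Engine provides. The function names are made up for illustration, and it uses IPv4 only to keep things short.

```python
import socket
import ssl
import time


def to_durations_ms(marks):
    """Turn [(label, timestamp_s), ...] into {label: ms since previous mark}."""
    durations = {}
    for (_, prev_t), (label, t) in zip(marks, marks[1:]):
        durations[label] = (t - prev_t) * 1000.0
    return durations


def probe(host, path="/", port=443, timeout=10.0):
    """Time the phases of one HTTPS GET; returns per-phase durations in ms."""
    marks = [("start", time.perf_counter())]

    # DNS lookup (IPv4 only, to keep the sketch simple).
    infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                               type=socket.SOCK_STREAM)
    sockaddr = infos[0][4]
    marks.append(("dns", time.perf_counter()))

    # TCP handshake: one network round trip.
    sock = socket.create_connection(sockaddr[:2], timeout=timeout)
    try:
        marks.append(("tcp_connect", time.perf_counter()))

        # TLS handshake: typically one or two further round trips.
        context = ssl.create_default_context()
        sock = context.wrap_socket(sock, server_hostname=host)
        marks.append(("tls_handshake", time.perf_counter()))

        # Send the request and wait for the first response byte.
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))
        sock.recv(1)
        marks.append(("first_byte", time.perf_counter()))
    finally:
        sock.close()

    return to_durations_ms(marks)
```

Calling `probe("example.com", "/")` (hostname and path are placeholders) a few times and comparing the phases would show whether the added time is in connection setup or in the wait for the first byte.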







Cheers,
Per





Per

Nov 23, 2012, 7:57:37 PM
to google-a...@googlegroups.com


And for the big picture, here are our performance stats for the past 1.5 years:





Naturally we *did* make changes to the landing page in those 18 months, so of course it has to be taken with a grain of salt. And we added https 6 months ago.

However, both our Java profiler and Appstats tell us the landing page has no major flaws, so a large share of the slowdown must come from App Engine, IMO. Also note that we switched to F2 and then F4 over time, and we always have at least 2 instances running now. Both measures smooth out the spikiness of the stats (things go really bad when we move to F2 or to just one VM), but these expensive measures have almost no effect on the longer-term trend. Latency continues to go up.

Any help or suggestions would of course be greatly appreciated!

Cheers,
Per



Per

Nov 27, 2012, 6:30:36 PM
to google-a...@googlegroups.com
Okay, I'll continue my conversation with myself here.

Someone fixed it! It's not unicorn land yet, but *substantially* faster. Hovering around 800ms according to Pingdom now. Which I guess is fine: Pingdom requests include session creation cost each time, so regular requests would be faster.

Well done Googlers!

I'm wondering what latencies others are seeing if measured from the outside? Anyone care to share their secrets?
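
For anyone who wants to compare numbers, here is a minimal sketch of an outside-in measurement in Python. It is not how Pingdom works internally; it assumes that a plain `urllib` fetch is a fair stand-in for such a check. `urlopen` opens a new connection and sends no cookies on each call, so, like the Pingdom check described above, every sample would include the session creation cost. The function names and the URL in the usage note are placeholders.

```python
import statistics
import time
import urllib.request


def summarize(samples_ms):
    """Return mean, median and max of a list of latencies in ms."""
    return {
        "mean": statistics.mean(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


def sample_latency(url, count=10, pause_s=1.0):
    """Fetch `url` `count` times, one fresh connection each; return ms per fetch."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=10) as response:
            response.read()  # include the body transfer in the timing
        samples.append((time.perf_counter() - start) * 1000.0)
        time.sleep(pause_s)  # spread the samples out a little
    return samples
```

Usage would look like `summarize(sample_latency("https://example.com/noop"))`. Reporting the median next to the mean helps, because a few slow requests can pull the mean up considerably.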

Francois Masurel

Nov 28, 2012, 3:41:14 AM
to google-a...@googlegroups.com
I can confirm that too.  It's pretty annoying.

François

Francois Masurel

Nov 28, 2012, 3:44:56 AM
to google-a...@googlegroups.com
Hoping someone from Google will give some explanations about these bad numbers...

Marcel Manz

Nov 28, 2012, 6:37:33 AM
to google-a...@googlegroups.com
We recently set up an application in the EU datacenters and are seeing much better and more consistent performance than with our US-hosted apps (see attached screenshot).

Obviously there aren't that many apps in the EU setup yet, which may be why GAE currently seems to provide better performance there.
eu_milliseconds.jpg

alex

Nov 28, 2012, 6:44:54 AM
to google-a...@googlegroups.com
Hey Marcel, I'm curious whether you've noticed a latency decrease for
users located in the EU.

Any chance you have a couple of charts comparing
US user => EU app vs EU user => EU app?

Marcel Manz

Nov 28, 2012, 7:10:32 AM
to google-a...@googlegroups.com
Our application only serves as a datastore backend for our frontend instances working in AWS EU-West. Due to the cold start issues and lack of dedicated memcache on GAE we're not running any frontends directly on GAE, hence I can't provide you with any user charts.

What I can tell you, however, is that we're seeing very consistent performance when making RPC calls to our GAE backends running in the EU, both for sending data to them and for retrieving data from them. A typical HTTP call from AWS to GAE takes around 150 ms to complete (set up the HTTP connection, transfer 1 entity of a few KB, and store it in the datastore), as measured by the AWS client. Our backends are running on Java-HRD.
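
How much of a figure like 150 ms is connection setup can be estimated by timing requests with and without connection reuse. The sketch below is in Python and is only an illustration: the actual client code is not shown in this thread, `timed_gets` is a made-up name, and it issues plain GETs rather than datastore writes.

```python
import http.client
import time


def timed_gets(host, path="/", count=5, reuse=True,
               connection_factory=http.client.HTTPSConnection):
    """Issue `count` GETs and return each duration in ms.

    With reuse=False every request opens a new connection, so each timing
    includes TCP and TLS setup. With reuse=True only the first one does,
    provided the server keeps the connection alive.
    """
    durations = []
    conn = None
    try:
        for _ in range(count):
            start = time.perf_counter()
            if conn is None:
                conn = connection_factory(host, timeout=10)
            conn.request("GET", path)
            # Drain the body so the connection can be reused.
            conn.getresponse().read()
            durations.append((time.perf_counter() - start) * 1000.0)
            if not reuse:
                conn.close()
                conn = None
    finally:
        if conn is not None:
            conn.close()
    return durations
```

Comparing `timed_gets(host, reuse=False)` with the later entries of `timed_gets(host, reuse=True)` gives a rough idea of the per-request setup cost from a given client location.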


alex

Nov 28, 2012, 8:22:56 AM
to google-a...@googlegroups.com
Huh... I would've expected a little less than 150ms. I have some small
Go and Python apps that respond quite consistently within around 140ms
(Rackspace EU <=> US). A little Java app that does some weird xmlenc
with Bouncy Castle responds in around 200ms. Probably I'm just doing less
stuff during a request. In fact those Go and Python apps hit a cache most
of the time.

Thanks anyway!

alex

Nov 28, 2012, 8:49:34 AM
to google-a...@googlegroups.com
Sorry, I meant Rackspace EU <=> GAE US.