Severe scalability problems - getting HTTP 500 errors under heavy load.

Anonymous Coderrr

Apr 15, 2009, 2:07:15 AM
to Google App Engine
I have a fairly simple app: it looks up a couple of objects in the
Google datastore and then renders a page from a Django template. Pure
vanilla.

I wanted to see how my app would perform under heavy load, so I set up
a simulation where 500 virtual web-browsers would attempt to request
my page twice - exactly at the same time.
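A test like this can be sketched with Python's standard library. This is a hypothetical reconstruction, not the poster's actual harness: it assumes 500 client threads released together by a `threading.Barrier`, each sending its two GET requests back to back, and it records the status code and elapsed time of every request. `fetch_status` and `burst_test` are illustrative names, and the `fetch` parameter exists only so the HTTP call can be swapped out.

```python
import threading
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=30.0):
    """GET `url` once and return the HTTP status code (0 for network failures)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; the code is still what we want to record.
        return err.code
    except OSError:
        # URLError, timeouts and connection resets all derive from OSError.
        return 0


def burst_test(url, clients=500, requests_per_client=2, fetch=fetch_status):
    """Release `clients` threads at the same instant; return (status, seconds) pairs."""
    barrier = threading.Barrier(clients)
    lock = threading.Lock()
    results = []

    def client():
        barrier.wait()  # every thread blocks here until all of them are ready
        for _ in range(requests_per_client):
            start = time.monotonic()
            status = fetch(url)
            elapsed = time.monotonic() - start
            with lock:
                results.append((status, elapsed))

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Usage would look like `results = burst_test("http://your-app-id.appspot.com/")`, where `your-app-id` is a placeholder. Note that all of these requests leave from a single machine, which, as later replies point out, may itself look like a DoS.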


The results were dismal!! Nearly 50% of the requests resulted in an
"HTTP response code: 500" that apparently came from GAE itself, not
from my application.

I checked my dashboard logs - no errors from my application. No errors
anywhere I could find.

What can I do? I'm not expecting 500 requests per second, but 100 is
certainly plausible. The best rate I can get, according to the
dashboard, is about 4.5 requests per second. The dev server running on
my laptop does better than that!!!

Thanks

Anonymous Coderrr

Apr 15, 2009, 2:17:35 AM
to Google App Engine
Additionally, under this load, the time it takes to service a request
grows from 2 seconds per request (no load) to 15 seconds per request
(high load).
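One rough way to see why this latency growth matters is Little's law (not mentioned in the thread), which relates the average number of requests in flight $L$ to the arrival rate $\lambda$ and the average time each request spends in the system $W$. Plugging in the poster's hoped-for 100 requests per second and the two observed latencies gives a back-of-the-envelope estimate, assuming steady state:

```latex
\begin{aligned}
L &= \lambda W \\
\lambda = 100\ \text{req/s},\; W = 2\ \text{s} &\;\Rightarrow\; L = 200 \text{ requests in flight} \\
\lambda = 100\ \text{req/s},\; W = 15\ \text{s} &\;\Rightarrow\; L = 1500 \text{ requests in flight}
\end{aligned}
```

At the loaded latency the system would need to hold roughly 7.5 times as many concurrent requests to sustain the same throughput.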

T.J. Crowder

Apr 15, 2009, 3:03:18 AM
to Google App Engine
Hi,

Since this is just a test app, can you post the code (perhaps to Pastie
[1], for syntax coloring and the like) and your mechanism for storing
the data? That might help people help you figure out why it isn't
performing as you would hope.

[1] http://pastie.org

FWIW,
--
T.J. Crowder
tj / crowder software / com
Independent Software Engineer, consulting services available

T.J. Crowder

Apr 15, 2009, 4:03:25 AM
to Google App Engine
Hi again,

> ...mechanism for storing the data...

Wow, I found a convoluted way to say that! ;-) Let me try again: and
your test data, i.e., the things people would need to replicate the
result.

Sorry for the poor phrasing...

-- T.J.

Barry Hunter

Apr 15, 2009, 5:19:25 AM
to google-a...@googlegroups.com
One thing that has become apparent is that App Engine is designed to
scale under real-world usage.

So if your app goes from 0 or 1 users to 500 in a matter of seconds,
the system won't work well. You need to ramp up the usage slowly.

Even a slashdotting would produce 'ramped' usage.

Also, 500 users coming from one source might look a bit suspicious,
and App Engine could be wary of a DoS attack.
--
Barry

- www.nearby.org.uk - www.geograph.org.uk -

Alex

Apr 15, 2009, 12:51:04 PM
to Google App Engine
I would echo Barry's point. My guess is that, if all of these
requests came from the same IP within a few seconds, GAE was using a
security measure to turn away the connections. They probably also had
a similar signature, being from the same script requesting the same
information from the same place to the same place: a classic DoS.

Alex Foley

Anonymous Coderrr

Apr 15, 2009, 4:33:54 PM
to Google App Engine
Good points.

I rewrote the test so it fires off 20 requests from 20 distinct IP
addresses simultaneously, once a second, 10 times.

In effect, 20 concurrent requests once a second.

I had about 15% loss, and the request-time degradation was still there
(2 seconds to fulfill a request on an idle system, 15 seconds under
load).

This is still nowhere near the advertised load rates.
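A test of this shape (fixed-size batches fired once a second) can be sketched as an open-loop driver: each batch starts on schedule whether or not earlier requests have returned, so slow responses do not quietly lower the offered load. This is an illustrative sketch; `run_batches` and `summarize` are hypothetical helpers, `fetch` is assumed to be any zero-argument function returning an HTTP status code, and only a 200 response counts as a success. It does not reproduce the 20 distinct source IPs, which would need separate machines or addresses.

```python
import threading
import time
from statistics import mean


def run_batches(fetch, batch_sizes, interval=1.0):
    """Start batch i at i * interval seconds; return (status, seconds) pairs."""
    results, threads = [], []
    lock = threading.Lock()

    def one_request():
        start = time.monotonic()
        status = fetch()
        elapsed = time.monotonic() - start
        with lock:
            results.append((status, elapsed))

    t0 = time.monotonic()
    for i, size in enumerate(batch_sizes):
        # Wait for this batch's scheduled start; never wait for earlier replies.
        delay = t0 + i * interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        for _ in range(size):
            th = threading.Thread(target=one_request)
            th.start()
            threads.append(th)
    for th in threads:
        th.join()
    return results


def summarize(results):
    """Return (fraction of non-200 responses, mean seconds of the 200 responses)."""
    if not results:
        raise ValueError("no results to summarize")
    ok_times = [secs for status, secs in results if status == 200]
    error_rate = 1 - len(ok_times) / len(results)
    mean_ok = mean(ok_times) if ok_times else float("nan")
    return error_rate, mean_ok
```

For the run described here, `summarize(run_batches(fetch, [20] * 10, interval=1.0))` would report the loss fraction and the mean latency of the successful requests.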

boson

Apr 15, 2009, 8:54:28 PM
to Google App Engine
You need to ramp up your tests over many minutes to allow GAE to spawn
enough instances to handle the traffic. I don't know their exact
algorithm, but I know it takes time to scale up.

Anonymous Coderrr

Apr 16, 2009, 1:08:08 PM
to Google App Engine
Hi,

I changed my test so it sends a batch of requests every 2 seconds.
The batch size starts out as 1 and increases by 1 every x seconds.
In the last run, I had it increase by 1 every 100 seconds. This ramped
things up slowly.
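The ramp can be written down as an explicit schedule. This is a minimal sketch, assuming the batch size is exactly 1 at t = 0, one batch goes out every 2 seconds, and the size grows by 1 every `step_s` seconds (100 in the run described); `ramp_schedule` is a hypothetical name. Each entry gives the send time, the batch size, and the resulting offered rate in requests per second.

```python
def ramp_schedule(duration_s, interval_s=2, step_s=100):
    """Return (t, batch_size, requests_per_second) for each batch sent before duration_s."""
    schedule = []
    for t in range(0, duration_s, interval_s):
        batch_size = 1 + t // step_s  # grows by 1 every step_s seconds
        schedule.append((t, batch_size, batch_size / interval_s))
    return schedule
```

Because the offered rate is `batch_size / 2`, each step adds only 0.5 requests per second, which at a 100-second step is 0.3 requests per second per minute.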

I still started getting errors at around 16-20 requests per second. My
dashboard never shows more than about 20 requests per second.

Am I still ramping things up too fast? I'll try an increase of 1
request every 200 seconds a bit later.
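Under the same assumptions (batch size 1 at t = 0, one batch every 2 s, the size growing by 1 every $x$ seconds), the time needed to reach the rates at which errors appeared works out roughly as follows, with $R^{*}$ the target rate:

```latex
\begin{aligned}
b(t) &= 1 + \left\lfloor t / x \right\rfloor, \qquad R(t) = \frac{b(t)}{2}\ \text{req/s} \\
R(t) \ge R^{*} &\iff b(t) \ge 2R^{*} \iff t \ge (2R^{*} - 1)\,x \\
x = 100\ \text{s}: &\quad R^{*} = 16 \Rightarrow t = 3100\ \text{s}\ (\approx 52\ \text{min}), \quad R^{*} = 20 \Rightarrow t = 3900\ \text{s}\ (65\ \text{min}) \\
x = 200\ \text{s}: &\quad R^{*} = 16 \Rightarrow t = 6200\ \text{s}\ (\approx 103\ \text{min}), \quad R^{*} = 20 \Rightarrow t = 7800\ \text{s}\ (130\ \text{min})
\end{aligned}
```

So the 100-second ramp would need close to an hour to reach the point where errors began, and doubling the step doubles that time.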

Thanks