Severe scalability problems - getting HTTP server 500 errors under heavy load

Anonymous Coderrr

Apr 15, 2009, 2:07:15 AM

to Google App Engine

I have a fairly simple app - it looks up a couple of objects from the
Google datastore and then renders a page from a Django template - pure
vanilla.

I wanted to see how my app would perform under heavy load, so I set up
a simulation where 500 virtual web-browsers would attempt to request
my page twice - exactly at the same time.

The results were dismal!! Nearly 50% of the requests resulted in a
"HTTP response code: 500" from GAE - not my application, but
apparently GAE itself.

I checked my dashboard logs - no errors from my application. No errors
anywhere I could find.

What can I do? I'm not expecting 500 requests per second, but
possibly 100. The best rate I can get according to the
dashboard is about 4.5 requests per second. The dev server running on
my laptop does better than that!!!

Thanks

Anonymous Coderrr

Apr 15, 2009, 2:17:35 AM

to Google App Engine

Additionally, under this load, the time it takes to service a request
grows from 2 seconds per request (no load) to 15 seconds per request
(high load).

T.J. Crowder

Apr 15, 2009, 3:03:18 AM

to Google App Engine

Hi,

Since this is just a test app, can you post the code (perhaps to Pastie
[1], for syntax coloring and the like) and your mechanism for storing
the data? That might help people help you figure out why it isn't
performing as you would hope.

[1] http://pastie.org

FWIW,
--
T.J. Crowder
tj / crowder software / com
Independent Software Engineer, consulting services available

T.J. Crowder

Apr 15, 2009, 4:03:25 AM

to Google App Engine

Hi again,

> ...mechanism for storing the data...

Wow, I found a convoluted way to say that! ;-) Let me try again: And
your test data. E.g., the things people would need to replicate the
result.

Sorry for the poor phrasing...

-- T.J.

Barry Hunter

Apr 15, 2009, 5:19:25 AM

to google-a...@googlegroups.com

One thing that has become apparent is that App Engine is designed to
scale under real-world usage.

So if your app went from 0 or 1 users to 500 in a matter of seconds,
the system won't work well. You need to ramp up the usage slowly.

Even a slashdotting would result in a 'ramp' in usage.

Also, 500 users coming from one source might look a bit suspicious, and
App Engine could be wary of a DoS attack.

--
Barry

- www.nearby.org.uk - www.geograph.org.uk -

Alex

Apr 15, 2009, 12:51:04 PM

to Google App Engine

I would echo Barry's point. My guess is that if all of these
requests came from the same IP in a matter of seconds, GAE used a
security measure to turn away the connections. They probably also had
a similar signature: the same script requesting the same information
from the same place and going to the same place. Classic DoS.

Alex Foley

Anonymous Coderrr

Apr 15, 2009, 4:33:54 PM

to Google App Engine

Good points.

I rewrote the test so it fires off 20 requests from 20 distinct IP
addresses simultaneously, once a second, 10 times.

In effect, 20 concurrent requests once a second.

About 15% of the requests were still lost, and the request-time
degradation was still there (2 seconds to fulfill a request on an idle
system, 15 seconds under load).

This is still nowhere near the advertised load rates.
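
For anyone who wants to reproduce something like this test, here is a minimal sketch in Python 3 using only the standard library. It is not the script used for the numbers above. The names `fetch_status`, `run_bursts` and `summarize` are made up for illustration, and unlike the test described here it sends everything from a single machine and IP address, so it may well trip the same-source problem mentioned earlier in the thread.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=30.0):
    """Return (HTTP status, elapsed seconds); status is None if no response."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # the server answered with an error, e.g. 500
    except OSError:
        status = None  # connection failure or timeout
    return status, time.monotonic() - start


def run_bursts(url, burst_size=20, bursts=10, interval=1.0, fetch=fetch_status):
    """Fire `burst_size` concurrent requests every `interval` seconds."""
    # One worker per request, so slow responses never delay a later burst.
    with ThreadPoolExecutor(max_workers=burst_size * bursts) as pool:
        futures = []
        for i in range(bursts):
            futures.extend(pool.submit(fetch, url) for _ in range(burst_size))
            if i < bursts - 1:
                time.sleep(interval)
        return [f.result() for f in futures]


def summarize(results):
    """Count non-200 responses and report simple latency figures."""
    failures = sum(1 for status, _ in results if status != 200)
    latencies = sorted(elapsed for _, elapsed in results)
    return {
        "requests": len(results),
        "failure_rate": failures / len(results),
        # Upper median when the number of samples is even.
        "median_latency": latencies[len(latencies) // 2],
        "max_latency": latencies[-1],
    }
```

Calling `summarize(run_bursts(url))` with the address of the page under test gives the failure rate and latency spread for 20 concurrent requests, once a second, 10 times. The `fetch` parameter is only there so the request function can be swapped out, for example with a stub when trying the script without touching the network.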

boson

Apr 15, 2009, 8:54:28 PM

to Google App Engine

You need to ramp up your tests over many minutes to allow GAE to spawn
enough instances to handle the traffic. I don't know their exact
algorithm, but I know it takes time to scale up.

Anonymous Coderrr

Apr 16, 2009, 1:08:08 PM

to Google App Engine

Hi,

I changed my test so it sends a batch of requests every 2 seconds.
The batch size starts at 1 and increases by 1 every x seconds.
On my last run it increased by 1 every 100 seconds, which ramped
things up slowly.
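
The ramp described above can be sketched roughly as follows. This is only an illustration of the schedule in Python, not the actual test harness, and the function names (`batch_size_at`, `ramp_schedule`, `offered_rate`) are hypothetical. It assumes the first batch goes out at time 0 and that the batch size steps up at exact multiples of the step interval.

```python
def batch_size_at(elapsed_seconds, step_seconds=100.0, start_size=1):
    """Batch size after `elapsed_seconds`, growing by 1 every `step_seconds`."""
    return start_size + int(elapsed_seconds // step_seconds)


def ramp_schedule(duration_seconds, batch_interval=2.0, step_seconds=100.0):
    """Yield (send_time, batch_size) for every batch within the duration."""
    t = 0.0
    while t < duration_seconds:
        yield t, batch_size_at(t, step_seconds)
        t += batch_interval


def offered_rate(batch_size, batch_interval=2.0):
    """Average requests per second offered by one batch per interval."""
    return batch_size / batch_interval
```

Under these assumptions, a batch of 32 every 2 seconds averages 16 requests per second, and `batch_size_at(3100)` shows that size is reached about 3100 seconds into a run with a 100-second step. Each batch still arrives as a burst, so the instantaneous rate is higher than the average.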

I still started getting errors around 16-20 requests per second. My
dashboard says I don't get anything above 20 requests per second or
so.

Am I still ramping things up too fast? I'll try an increase of 1
request every 200 seconds a bit later.

Thanks
