Connection timeouts on high load

sephii

Apr 30, 2015, 5:00:59 AM
to django...@googlegroups.com
Hello,

I have an application built with Django 1.7 and Django REST
Framework, and I'm currently load testing it. My setup consists of
3 servers:

- Nginx + gunicorn
- Gunicorn
- Postgresql + Memcached

Nginx is configured as a load balancer so I can add more gunicorn
instances if needed. In the current setup it already balances the load
between its own local gunicorn instance and the one on the second
server. The project code lives on the load balancer and is shared with
the gunicorn server via an NFS share.

I'm using Locustio to simulate about 4000 users, each making a request
every 30-60 seconds, with a hatch rate of about 200 new users per
second. The servers handle the load fine until I reach about 2000
users, at which point nginx starts returning 502 and 504 errors with
the following in the logs:

upstream timed out (110: Connection timed out) while connecting to upstream
recv() failed (104: Connection reset by peer) while reading response
header from upstream

I know most of the load is caused by the (high) hatch rate, since
every new simulated user goes through an account creation phase, which
I suspect is quite slow because of the password hashing. But even once
all users have been created and are just making regular requests, I
keep getting a failure rate of about 6% (with the same errors as
before), even though the gunicorn workers have plenty of spare CPU. I
checked the I/O (both network and disk) and there's no bottleneck there either.
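
In case it helps, here's a simplified sketch of my Locust test (the
endpoint paths below are placeholders, not the real URLs):

import uuid

from locust import HttpLocust, TaskSet, task

class UserBehaviour(TaskSet):
    def on_start(self):
        # Each simulated user registers an account first; the password
        # hashing makes this by far the most expensive request.
        # "/api/accounts/" is a placeholder path.
        self.client.post("/api/accounts/", {
            "username": "loadtest-%s" % uuid.uuid4().hex,
            "password": "some-password",
        })

    @task
    def regular_request(self):
        # Placeholder for the normal requests the users make afterwards.
        self.client.get("/api/items/")

class ApiUser(HttpLocust):
    task_set = UserBehaviour
    min_wait = 30 * 1000  # wait 30-60 seconds between requests (in ms)
    max_wait = 60 * 1000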

Do you have any advice on where I could look, or what tool I could use
to get an idea of what's going on?

Here's my gunicorn config:

bind = "0.0.0.0:8000"
workers = 8
preload_app = True
loglevel = 'info'
pidfile = '/home/myproject/gunicorn.pid'

And my nginx config:

upstream myproject {
    server 127.0.0.1:8000 weight=2 fail_timeout=0;
    server ip_of_gunicorn_server:8000 fail_timeout=0;
}

proxy_pass http://myproject;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
proxy_read_timeout 300;
proxy_redirect off;

Thank you for your help,
Sylvain

Tom Evans

Apr 30, 2015, 7:24:27 AM
to django...@googlegroups.com
On Thu, Apr 30, 2015 at 10:00 AM, sephii <sylvaint...@gmail.com> wrote:
> Hello,
>
> I have an application made with Django 1.7 and the Django Rest
> Framework and I'm in the phase of load testing it. My setup is made of
> 3 servers:
>
> - Nginx + gunicorn
> - Gunicorn
> - Postgresql + Memcached
>
> Nginx is configured as a loadbalancer so I can add more gunicorn
> instances if needed. In the current setup it already balances the load
> between its own gunicorn instance and the other server with the
> gunicorn instance. The project code is on the loadbalancer and it's
> shared on the gunicorn server via an NFS share.

I would not do that; NFS has many gotchas. It is unlikely to be the
cause of this particular issue, though.

>
> I'm using Locustio to simulate about 4000 users, that make a request
> every 30-60 seconds, with about 200 new users per second. The servers
> are handling the load until I get to about 2000 users, and then nginx
> starts returning 502 and 504 errors with the following in the logs:

I think you are being wildly optimistic about the performance of a
single app server. If the app servers are not overloaded, add more
workers; if they are, add more app servers. Add some sort of monitoring
(I like Munin) to all the servers to measure load - CPU, interrupts,
memory, disk I/O, memcached stats, postgres stats, nginx stats - and see
where your bottleneck is.
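
As a starting point for sizing the workers, something like this in the
gunicorn config scales with the cores of whatever box it runs on. Just
a sketch - (2 x cores) + 1 is the usual gunicorn rule of thumb, so tune
it from your measurements:

# gunicorn config: size workers to the machine instead of a fixed 8
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1
preload_app = True
loglevel = 'info'
pidfile = '/home/myproject/gunicorn.pid'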

On the plus side, if you have 4000 users making requests every 30
seconds, you'll be able to afford many app servers ;)

Cheers

Tom