The nginx box has a public IP and talks to the upstream apaches over
the private network (same switch). We are sustaining a couple hundred
requests/sec.
We've had several issues with the upstreams being marked down by nginx,
causing the "no live upstreams" message in the error log and end users
seeing 502 errors. When this happens the machines are barely being
used, with single-digit load averages on 16-core boxes.
Initially we were seeing a ton of "connect() failed (110: Connection
timed out)" errors, roughly one every couple of seconds. I added these
to sysctl.conf and that seemed to solve the problem:
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_fin_timeout = 20
net.ipv4.tcp_max_syn_backlog = 20480
net.core.netdev_max_backlog = 4096
net.ipv4.tcp_max_tw_buckets = 400000
net.core.somaxconn = 4096
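For reference, those connect() timeouts are consistent with listen-queue
/ SYN-backlog overflow on the upstreams. A quick, non-definitive way to
check for that (standard Linux tools, nothing specific to this setup):

netstat -s | grep -i listen   # "times the listen queue of a socket overflowed" / "SYNs to LISTEN sockets dropped"
ss -ltn                       # for listening sockets the Send-Q column shows the configured backlog

If those overflow counters keep climbing, the somaxconn and
tcp_max_syn_backlog bumps above are the right knobs.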
Now things generally run fine, but every once in a while we get a huge
burst of "upstream prematurely closed connection while reading response
header from upstream" followed by a "no live upstreams". Again, no
apparent load on the machines involved. These bursts only last a minute
or so. We also still get an occasional "connect() failed (110:
Connection timed out)", but they are far less frequent, perhaps one or
two per hour.
Anyone have recommendations for tuning the networking side to improve
the situation here? These are some of the nginx.conf settings we have
in place; I've removed the ones that don't seem related to the issue:
worker_processes 4;
worker_rlimit_nofile 30000;
events {
    worker_connections 4096;
    # multi_accept on;
    use epoll;
}

http {
    client_max_body_size 200m;
    proxy_read_timeout 600s;
    proxy_send_timeout 600s;
    proxy_connect_timeout 60s;
    proxy_buffer_size 128k;
    proxy_buffers 4 128k;
    keepalive_timeout 0;
    tcp_nodelay on;
}
Happy to provide any other details. This is the "ulimit -a" on all
boxes:
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 300000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,220894#msg-220894
On looking at this again recently, we made two adjustments that
eliminated the connection issues completely:
net.nf_conntrack_max = 262144
net.ipv4.ip_local_port_range = 1024 65000
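If anyone wants to sanity-check the same two limits before raising them,
these read-only commands show current usage (standard sysctl/procfs
paths; on some kernels the conntrack key is the net.nf_conntrack_max
alias used above):

sysctl net.netfilter.nf_conntrack_count net.netfilter.nf_conntrack_max
    # tracked connections vs. the table limit; "nf_conntrack: table full"
    # in dmesg is the classic symptom of hitting it
cat /proc/sys/net/ipv4/ip_local_port_range
    # ephemeral port range available for outgoing connections to the upstreams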
After making those two changes things became quite stable. However, we
still have massive numbers of TIME_WAIT connections both on the nginx
machine and on the upstream apache machines.
The nginx machine is accepting roughly 1000 requests/s, and has 40,000
connections in TIME_WAIT.
The apache machines are each accepting roughly 250 requests/s, and have
15,000 connections in TIME_WAIT.
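A simple way to watch these counts (either command works, depending on
whether ss or net-tools is installed; nothing here is specific to our
setup):

ss -tan state time-wait | wc -l               # sockets currently in TIME_WAIT (plus one header line)
netstat -ant | awk '$6 == "TIME_WAIT"' | wc -l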
We tried setting net.ipv4.tcp_tw_reuse to 1 and restarting networking.
That did not cause any trouble, but also didn't drop the TIME_WAIT
count. I have read that net.ipv4.tcp_tw_recycle is dangerous but we may
try that if others have had good experiences.
Is there a way to have these cleaned up more quickly? My concern is
that even with the expanded ip_local_port_range 40k is cutting it rather
close. Before we bumped ip_local_port_range the whole system was
falling down right as the TIME_WAIT count approached 32k. Is it normal
for nginx to cause this many TIME_WAIT connections? If we're only doing
1k requests/s and nearly exhausting the available port range, what would
sites with heavier volume do?
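As a rough sanity check (assuming the kernel's usual 60-second TIME_WAIT
interval and one new upstream connection per proxied request): 1000
req/s x 60 s is about 60,000 sockets cycling through TIME_WAIT on the
nginx side, and 250 x 60 = 15,000 on each apache, so the counts above
are roughly what you'd expect without upstream keepalive. The count is
driven by connection churn rather than load, which is why the boxes look
idle while the port range fills up.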
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,221550#msg-221550
net.ipv4.tcp_tw_recycle is what you're looking for
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,221583#msg-221583
> This may cause trouble if multiple clients are trying to reach the
> server from behind the same NAT, so be careful. I have had negative
> experiences even at ~10 HTTP reqs/min from a NATed machine.
This is what I had read everywhere as well, so I've been hesitant to try
it. We definitely have a lot of users that would be coming at our
servers from the same building/NAT.
Has anyone tried using "net.ipv4.tcp_tw_reuse = 1" in a larger
connection count environment before?
I have it enabled now, but it did not seem to have any impact on the
number of TIME_WAIT connections. Does it wait until it actually needs
to reuse one (due to port exhaustion) before doing so? Or should it be
keeping the number lower?
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,221587#msg-221587
Have you tried using HTTP 1.1 keepalive connections from nginx to
apache? They became available in 1.1.4 and will re-use sockets rather
than close them and leave them in TIME_WAIT.
Be sure to remember to turn on keepalive in your apache config as well.
http://nginx.org/en/docs/http/ngx_http_upstream_module.html
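A hedged sketch of what that looks like on the nginx side (nginx 1.1.4
or later; the upstream name, addresses, and connection count are
placeholders, adjust to taste):

upstream apache_backend {
    server 10.0.0.11:80;
    server 10.0.0.12:80;
    keepalive 32;                    # idle keepalive connections cached per worker
}

server {
    location / {
        proxy_pass http://apache_backend;
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the header so nginx doesn't send "close"
    }
}

On the apache side, KeepAlive On (with a sane KeepAliveTimeout and
MaxKeepAliveRequests) so it doesn't tear the connections down itself.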
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,221646#msg-221646
I'm thinking about giving the development version with the upstream
keepalive over HTTP 1.1 a try.
Are people using that version in production? Is there a release
schedule/estimate anywhere that indicates when that feature might
trickle over to stable?
We're using nginx heavily in a pretty vanilla load balancer role: in
front of apache servers, with SSL termination in nginx, and that's it in
terms of features we are using.
It's worked fantastically well overall; we're just flirting with an
ephemeral port limit on a few of our sites (we've worked around it by
setting up multiple A records pointed at multiple nginx pairs). If we
could get keepalive connections between nginx and the upstream apaches,
I believe we would be in very good shape and could keep our
configuration simple moving forward.
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,224118#msg-224118
According to their roadmap -- in 6 days :)
http://trac.nginx.org/nginx/roadmap
Out of curiosity, why would it keep it in TIME_WAIT if it is closing the connection?
On Wednesday, January 25, 2012 at 5:14 PM, ggrensteiner wrote:
> Have you tried using HTTP 1.1 keepalive connections from nginx to
> apache? They became available in 1.1.4 and will re-use sockets rather
> than close them and leave them in TIME_WAIT. Be sure to remember to
> turn on keepalive in your apache config as well.
This is excellent news. Also, apologies for somehow missing this page;
it was exactly what I was looking for.
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,224171#msg-224171
gtuhl Wrote:
-------------------------------------------------------
> Alexandr Gomoliako Wrote:
> -------------------------------------------------------
> > On Tue, Mar 20, 2012 at 11:33 PM, gtuhl <nginx...@nginx.us> wrote:
> > > I'm thinking about giving the development version with the
> > > upstream keepalive over http 1.1 a try.
> > >
> > > Are people using that version in production? Is there a release
> > > schedule/estimate anywhere that indicates when that feature might
> > > trickle over to stable?
> >
> > According to their roadmap -- in 6 days :)
> > http://trac.nginx.org/nginx/roadmap
>
> This is excellent news. Also apologies for somehow missing this page,
> was exactly what I was looking for.
Posted at Nginx Forum: http://forum.nginx.org/read.php?2,220894,224560#msg-224560