Thousands of connect/close cycles seem to cause "Redis server went away"


Dennis McEntire

Jan 14, 2013, 9:29:55 PM
to redi...@googlegroups.com
We have some PHP code using the nicolasff/phpredis connector to talk to a local Redis server running on a fairly fast system with 4 GB of RAM and an SSD. There are fewer than 400,000 records in the database for now.

When we run a piece of code that opens thousands of connections, it consistently fails with "Redis server went away" after about 4,000 records have been inserted.

We have two ways of doing this: one works and one doesn't. We're trying to figure out why the latter fails with the error above.

The first way that works:
1. We connect to the server.
2. We process 40,000 database inserts.
3. We close the server connection.
That way works every time and does what it's supposed to.

The second way, which fails:
1. We run through several recursive loops in the program, and each function does its own connect, process, and close steps (roughly like the sketch below).
2. After about 4,000 inserts, the program stops with "Redis server went away" errors.
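
A minimal sketch of the two patterns, with made-up function and key names; the phpredis calls used are connect(), set(), and close():

<?php
// Pattern 1 (works): open one connection, do all the inserts, close once.
function insertAllReusingConnection(array $records) {
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    foreach ($records as $id => $data) {
        $redis->set("record:$id", $data);
    }
    $redis->close();
}

// Pattern 2 (fails after ~4,000 inserts): every call opens and closes its
// own TCP connection, so thousands of short-lived connections pile up
// within a few seconds.
function insertOneWithOwnConnection($id, $data) {
    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);
    $redis->set("record:$id", $data);
    $redis->close();
}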

I know some people will say to just use the first method, but this might be an indicator of a larger problem that we want to resolve before going into production. The program only takes a few seconds to complete, which means that within a few seconds we are doing thousands of connects and disconnects. Maybe there is a timeout after a connection is closed that takes a second or two to complete?

Any advice would be appreciated. I saw that our redis.conf file has a line about how many clients can connect; it's commented out, but it says 10,000.

Should I try uncommenting it and increasing that number to, say, 50,000?

Thank you for any help,

Dennis

Jay A. Kreibich

Jan 14, 2013, 9:58:11 PM
to redi...@googlegroups.com
On Mon, Jan 14, 2013 at 06:29:55PM -0800, Dennis McEntire scratched on the wall:
Check the network stack on the client and server ("netstat -a -n" works).
I'm guessing you have a whole ton of connections in the TIME_WAIT state.
If you slam through thousands of connections very quickly -- especially
between the same two IP addresses -- it is possible for the system to
simply run out of available ephemeral ports. On BSD systems, there
are only about 4000 ephemeral ports available.

The only things you can do are: A) pool connections so you're not
opening/closing them so quickly (sketched below), B) tune the stack to
raise the number of ephemeral ports, or C) tune the stack to shorten the
TIME_WAIT state.

Chances are good B and C won't work unless you can make 40K ports
available, and you'll still have issues if the data set gets bigger.
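
A rough sketch of option A with phpredis (host, port, and key names are illustrative; pconnect() is the client's persistent-connection call, which keeps the underlying TCP connection open for reuse within the same process):

<?php
// Option A sketch: one persistent connection per PHP process instead of
// a new TCP connection per insert. pconnect() reuses an already-open
// connection to the same host/port if this process has one.
$records = array('1' => 'alpha', '2' => 'beta');   // sample data

$redis = new Redis();
$redis->pconnect('127.0.0.1', 6379);

foreach ($records as $id => $data) {
    $redis->set("record:$id", $data);
}
// No explicit close(): the connection stays open for later reuse.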

-j

--
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
but showing it to the wrong people has the tendency to make them
feel uncomfortable." -- Angela Johnson

Dennis McEntire

Jan 15, 2013, 7:57:59 PM
to redi...@googlegroups.com


Jay,

Thank you so much for the detailed response and great information. I checked into it and it appears we have 28K+ ports available:

[root@redis1 ~]# cat /proc/sys/net/ipv4/ip_local_port_range
32768   61000

So I'll look into the TIME_WAIT configuration to see how it's set up, and we'll also need to improve our code to optimize the connections to the server (something like the sketch below).
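
The likely shape of that change, sketched with made-up function and key names: open one connection at the top level and pass it into the recursive functions instead of letting each one connect and close on its own.

<?php
// Sketch: one connection handed down through the recursion, so every
// insert reuses the same TCP connection. Close it once at the end.
function processNode(Redis $redis, array $node) {
    $redis->set("record:" . $node['id'], $node['data']);
    foreach ($node['children'] as $child) {
        processNode($redis, $child);   // recurse with the same connection
    }
}

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$tree = array('id' => 1, 'data' => 'root', 'children' => array());  // sample
processNode($redis, $tree);

$redis->close();   // close once, at the end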

Thanks again,

Dennis

Dennis McEntire

Jan 15, 2013, 8:12:53 PM
to redi...@googlegroups.com


Just a quick update -- it appears that our tcp_fin_timeout is set to 60 seconds, probably the default for our Linux distro, Mandriva.

Anyway, we changed it to 5 seconds and that appears to have resolved our issue. The program that was making all the connections was being run multiple times within a minute, tying up our ephemeral ports and eventually using up all 28K+ of them.

We ran this command: echo "5" > /proc/sys/net/ipv4/tcp_fin_timeout


Dennis



