Large numbers of TIME_WAIT socket connections running Redis in production


Simon Willison

Sep 6, 2010, 6:55:32 AM9/6/10
to Redis DB
We're using Redis for monitoring on a high traffic web application.
The app is written in Python/Django and uses the redis-py client
library.

We're seeing enormous numbers of TIME_WAIT connections left over from
our communications with Redis:

$ netstat -a | grep 6379 | grep TIME_WAIT | wc -l
15865

We're calling r.connection.disconnect() at the end of every HTTP
request, but that hasn't addressed the problem. Any idea what's going
on here?

Thanks,

Simon Willison

Pieter Noordhuis

Sep 6, 2010, 7:31:26 AM9/6/10
to redi...@googlegroups.com
Hello Simon,

I've just checked the source code for redis-py, but it seems to
properly close the socket when disconnect() is called. When this
happens, the socket is also closed from the side of redis-server. Are
you sure all the connections are properly closed from the Python side
of things?
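
One quick way to verify is to see which local processes still hold sockets to the server. A minimal sketch, assuming lsof is available and Redis is on its default port:

# list processes with live connections to Redis
lsof -iTCP:6379 -sTCP:ESTABLISHED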

Cheers,
Pieter


Andy Lawman

Sep 6, 2010, 8:15:25 AM9/6/10
to redi...@googlegroups.com, Redis DB

Simon,

I'm sorry if I'm just stating the obvious, but when a client closes a TCP/IP connection, the socket enters the TIME_WAIT state and must remain there for twice the Maximum Segment Lifetime (MSL). Are you saying that the sockets are in the TIME_WAIT state for longer than this? (The original RFC specified the MSL as 2 minutes, but real-world values are usually less than this.) Or is the large number of sockets in this state simply a reflection of the large number of Redis connections that you're closing? A Wireshark trace of the traffic would make it clear whether the packet flow is as expected or not.
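
For reference, a quick way to see the full breakdown of socket states (a minimal sketch, assuming a Linux-style netstat where the state is the sixth column):

# count sockets per TCP state
netstat -ant | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn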

Regards, Andy.



Simon Willison <si...@simonwillison.net>
To
Redis DB <redi...@googlegroups.com>
cc
bcc
Subject
Large numbers of TIME_WAIT socket connections running Redis in production


Simon Willison <si...@simonwillison.net>

Please respond to : redi...@googlegroups.com

Sent by: redi...@googlegroups.com  
06/09/2010 11:55




--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.






IMPORTANT - CONFIDENTIALITY NOTICE - This e-mail is intended only for the use of the addressee/s above.  It may contain information which is privileged, confidential or otherwise protected from disclosure under applicable laws.  If the reader of this transmission is not the intended recipient, you are hereby notified that any dissemination, printing, distribution, copying, disclosure or the taking of any action in reliance on the contents of this information is strictly prohibited.  If you have received this transmission in error, please immediately notify us by reply e-mail or using the address below and delete the message and any attachments from your system.

Amadeus Services Ltd, World Business Centre 3, 1208 Newall Road, Hounslow, Middlesex, TW6 2TA, Registered number 4040059

Salvatore Sanfilippo

Sep 6, 2010, 10:21:50 AM9/6/10
to redi...@googlegroups.com
Hello Simon,

that's OK; it's a matter of TCP guarantees: you are probably not
reusing connections but opening a new one every time, which is
perfectly fine.

If you are doing this so fast that you risk exhausting all the
available ports, you should use this:

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse        (Linux)
sudo sysctl -w net.inet.tcp.msl=1000            (Mac OS X)
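
To make the Linux setting survive a reboot, a minimal sketch assuming the standard /etc/sysctl.conf mechanism:

# persist the TIME_WAIT reuse setting and reload
echo 'net.ipv4.tcp_tw_reuse = 1' >> /etc/sysctl.conf
sysctl -p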

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
http://invece.org

"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Jon Stephens

Sep 6, 2010, 3:29:42 PM9/6/10
to redi...@googlegroups.com

You might want to set tcp_tw_recycle to 1 also. We have similar issues in our environment and spent some time tuning things.  You can also check out www.speedguide.net/read_articles.php?id=121 for more info.
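
For reference, both knobs can be flipped at runtime; a hedged sketch (note that tcp_tw_recycle is widely reported to break clients behind NAT, so test before rolling it out):

# allow reuse and fast recycling of TIME_WAIT sockets (Linux)
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_tw_recycle=1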

teleo

Sep 7, 2010, 6:09:53 AM9/7/10
to Redis DB
Hi Salvatore,

Would it be possible to document the recommended Linux configuration
for Redis servers (and benchmarking clients) in the Wiki?

Thanks,
Te.


Salvatore Sanfilippo

Sep 8, 2010, 10:06:48 AM9/8/10
to redi...@googlegroups.com
On Tue, Sep 7, 2010 at 12:09 PM, teleo <lev....@gmail.com> wrote:
> Hi Salvatore,
>
> Would it be possible to document the recommended Linux configuration
> for Redis servers (and benchmarking clients) in the Wiki?

Yes, adding this to my TODO list.

Does anyone on the list have some good suggestions? I can only think
of the following things:

1) network tuning, like reusing connections, max number of FDs and so forth
2) filesystem: what's the best filesystem to get, for instance, good
fsync() performance? I think ext4, but different mount options will
change the outcome. Help appreciated on this.
3) copy-on-write: the overcommit policy should be set to 1
4) when using VM and persistence (RDB or AOF) it's a good idea to use
two disks if possible: one for the swap file, one for persistence.
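
For points 1 and 3, a minimal sketch of the corresponding runtime knobs (the values are illustrative, not recommendations):

# 1) raise the per-process file-descriptor limit
ulimit -n 65535
# 3) let fork() succeed even when the child's address space
#    cannot be fully backed (helps bgsave on big datasets)
echo 1 > /proc/sys/vm/overcommit_memory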

Please help me elaborate on this collectively here, so I can write a wiki page :)

Cheers,
Salvatore

Jak Sprats

Sep 9, 2010, 2:03:09 AM9/9/10
to Redis DB

suggestions:
1.) use SSDs for VM disk, not so important for persistence disk

For high concurrency:
1.) ulimit -n 90000
2.) echo 1024 65000 > /proc/sys/net/ipv4/ip_local_port_range

Multiple instances (SMP):
1.) if running multiple instances I use taskset -c <core> for each
./redis-server (but then the bgSave runs on the same core, which is very
lame, as the taskset affinity is inherited by the child - I had to put a
sched_setaffinity() call in my code for this ... possible feature)
2.) for real performance it is crucial to not run ./redis-server (or
redis-benchmark) on the core where the software interrupts are handled
(and they always go to the first CPU in your CPU affinity mask - not to
all of them; Linux has some progress to make :)
3.) finally, if your NIC has multiple RX/TX queues, set each RX/TX
queue's CPU affinity to a different core (this is voodoo, but it's a 2x
speedup at times); see the sketch below
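
As a concrete sketch of points 1 and 3 (core and IRQ numbers are illustrative - look up your NIC's queue IRQs in /proc/interrupts):

# pin a redis-server instance to core 2
taskset -c 2 ./redis-server redis.conf
# steer one RX/TX queue's interrupts to core 3 (hex bitmask 0x8),
# assuming that queue's IRQ number is 24
echo 8 > /proc/irq/24/smp_affinity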


teleo

Sep 9, 2010, 3:43:11 AM9/9/10
to Redis DB
Jak, how important is it to set CPU affinities if running several
instances on Linux?


Jak Sprats

Sep 10, 2010, 12:29:26 AM9/10/10
to Redis DB
Hi Teleo,

this is a hard question. It involves everything from context-switch
overhead to NUMA vs. UMA memory-access differences :)

In short, I can't tell you an exact number.

In practice, having 2 redis-servers running on the same core will
degrade the performance of each by 50%+. Linux's scheduler is good,
but it's not instantaneously good, so if you get a burst, it will be
confounded if you haven't pegged the redis-server to a specific core.

the other problem with non-pegged servers is that when the server is
running on the same core that is handling the TCP packet flow, this
results in about a 40% reduction in performance.

But there is the issue that "taskset" will also pin the bgSave
child to the same core, and this will reduce performance by 50% while
saving. To get around this, you need to change the C code and make
sure the child pid goes to a different core via the
sched_setaffinity() system call.
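
Without patching the C code, one workaround is to re-pin the child from the shell once the save starts; a sketch where the core number is illustrative and pgrep -n just grabs the newest redis-server process (i.e. the freshly forked child):

# move the most recently forked redis-server (the bgsave child) to core 5
taskset -pc 5 $(pgrep -n redis-server)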

Redis is a single process. If you can peg it to a core, don't peg
any other processes to that core, and that core is not responsible
for TCP flow handling, this is the best setup.

At a really low level, this also improves L1 and L2 cache hit ratios
and takes context switches out of the equation, as well as
cache-coherency synchronisation overhead.

The real nice thing about this setup is it is predictable, which is
the key to low latency.

- Jak

teleo

Sep 10, 2010, 5:23:03 AM9/10/10
to Redis DB
Hi Jak,

Thanks a lot for the explanation.

Didier

Sep 11, 2010, 12:32:44 PM9/11/10
to Redis DB
Hello,

you might also want to play with the (soft) real-time scheduler.
For instance:

#include <pthread.h>
#include <sched.h>
#include <string.h>
#include <syslog.h>

/* Put the given thread under the (soft) real-time FIFO scheduler.
 * Requires root (or CAP_SYS_NICE) to raise the priority. */
static void enable_rt_scheduling( pthread_t thread, int priority )
{
    int policy;
    struct sched_param param;

    /* Read the thread's current policy and priority. */
    int rc = pthread_getschedparam( thread, &policy, &param );
    if ( rc )
    {
        /* pthread_* calls return the error code directly; errno is not set. */
        syslog( LOG_ERR, "ERROR: pthread_getschedparam %d", rc );
        return;
    }

    memset( &param, 0, sizeof(param) );
    param.sched_priority = priority;

    rc = pthread_setschedparam( thread, SCHED_FIFO, &param );
    if ( rc )
    {
        syslog( LOG_ERR, "ERROR: pthread_setschedparam %d", rc );
        return;
    }

    syslog( LOG_INFO, "Activated real-time scheduler with priority %d",
            priority );
}

The setting is inherited by child processes, so it should be called in
the main redis process with a higher priority than in the forked
bgsave child.
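
For a quick experiment without touching the code, a similar effect can be had from the shell with chrt; a sketch where the priority value is illustrative:

# start redis-server under SCHED_FIFO with real-time priority 10
chrt -f 10 ./redis-server redis.conf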

Another possibility on a NUMA machine is to bind at the NUMA node
level instead of binding at the virtual CPU level (with numactl), and
keep one core free per NUMA node for the bgsave processes. One benefit
of using numactl is that you can specify the memory placement policy
as well.
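
A hedged example of such a numactl invocation (the node number is illustrative):

# bind redis-server's CPUs and memory allocations to NUMA node 0
numactl --cpunodebind=0 --membind=0 ./redis-server redis.conf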

On an Intel box with hyperthreading, an interesting configuration to
test would be to put one redis instance per physical core (so that
main redis and the bgsave child run on the two "threads" of the core).

Just my two cents ...

Regards,
Didier.