Re: Redis connection timeouts

Felix Gallo

Feb 4, 2013, 11:32:39 AM
to redi...@googlegroups.com
You're probably running out of something at the OS level; check your kernel logs, it's possible there's something informative in there.  What does `top` say about your use of swap (hit f and turn on the swap column)?  What does your ops team say about the socket and file descriptor limit configuration?  What does /etc/sysctl.conf look like?

By the by, 800 connections is probably not best practice unless you have, say, 800 app server instances.  One of my busiest redis instances has 26 connections.  It's a better pattern to allocate a connection per app server than a connection per user or connection per session.  If your app is connecting users directly to the database then your architecture is probably causing you issues.
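A minimal sketch of that connection-per-app-server pattern in Ruby (dummy objects stand in for real Redis clients here, and the class is purely illustrative; in a real app the factory block would build the Redis connection, or you could reach for the connection_pool gem, which provides exactly this):

```ruby
# Minimal sketch of a per-process connection pool: a small fixed set of
# shared connections instead of a connection per user or per session.
class TinyPool
  def initialize(size)
    @queue = Queue.new
    size.times { @queue << yield }  # build `size` connections up front
  end

  # Check a connection out, hand it to the caller, and always return it.
  def with
    conn = @queue.pop
    begin
      yield conn
    ensure
      @queue << conn
    end
  end
end

pool = TinyPool.new(2) { Object.new }  # dummy "connections" for the demo
seen = []
4.times { pool.with { |c| seen << c.object_id } }
puts seen.uniq.size  # => 2 distinct connections, reused across all calls
```

Four checkouts, but only two connections ever exist; that's the shape you want, rather than 800 sockets open against the server.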

F.

On Mon, Feb 4, 2013 at 8:10 AM, <robert...@gmail.com> wrote:
Hi all,
We're struggling with an issue where connections from our application to Redis are regularly timing out.  The Ruby Redis gem responds with: Redis::TimeoutError: Connection timed out.  The errors come from both our Rails application and our Resque workers.

We're running Redis 2.4.17 with a 6.5 GB DB.  In the redis.conf for our Redis Master, timeout is set to 0.

Initially when we started having these issues, we shut off all persistence on the Master and ran with persistence on the Slave only.  This prevented timeout issues for a few weeks, but we've started seeing timeouts again.

We have 120 Resque workers connected to the Redis DB, plus a varying number of threads from Passenger running our Rails app.  We're seeing up to 800 clients connected to Redis, but the timeouts seem to happen regardless of the number of clients - we've seen them happen with only a few hundred connections.

This issue - https://github.com/mperham/sidekiq/issues/517 - indicated that the issue might be due to swapping.  We have plenty of free RAM and load is rarely above 1.

We largely see the timeouts occurring on an hourly basis, which leads us to suspect some kind of scheduled job.  We are running scheduled tasks via Resque Scheduler, but in reviewing these tasks we haven't identified anything running deletes against the DB (which we've seen produce Redis timeouts during DB maintenance in the past).

Based on this issue - http://code.google.com/p/redis/issues/detail?id=500 - we thought that perhaps it was a conflict between concurrent connections, so we increased the open file limit for the user running the Redis app on our Redis master from 1024 to 2048.  This didn't have any effect.

Our Redis info is below.

Our application is growing quickly and the timeouts are causing concern from our operations team about whether Redis can stand up to our growth.  Additionally, we're increasing the number of servers and Resque workers we have running to handle growing load on our site, so we're concerned that the issue is only going to worsen.

Any suggestions of where to look for the root cause of this issue?

Thanks!

Rob Shedd


$ ./redis-cli info
redis_version:2.4.17
redis_git_sha1:00000000
redis_git_dirty:0
arch_bits:64
multiplexing_api:epoll
gcc_version:4.1.2
process_id:4456
run_id:efe123235fabcd130f8558ac71f03e9398031172
uptime_in_seconds:2159271
uptime_in_days:24
lru_clock:1779123
used_cpu_sys:201364.34
used_cpu_user:107124.49
used_cpu_sys_children:5.86
used_cpu_user_children:28.78
connected_clients:516
connected_slaves:1
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
used_memory:7021963088
used_memory_human:6.54G
used_memory_rss:7210254336
used_memory_peak:7057749736
used_memory_peak_human:6.57G
mem_fragmentation_ratio:1.03
mem_allocator:jemalloc-3.0.0
loading:0
aof_enabled:0
changes_since_last_save:1981202607
bgsave_in_progress:0
last_save_time:1357809314
bgrewriteaof_in_progress:0
total_connections_received:52464424
total_commands_processed:20682744678
expired_keys:1124399
evicted_keys:0
keyspace_hits:354418470
keyspace_misses:199629652
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:273343
vm_enabled:0
role:master
slave0:10.97.18.18,6379,online
db0:keys=10492919,expires=3392
db1:keys=222,expires=0

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Felix Gallo

Feb 5, 2013, 11:58:56 AM
to redi...@googlegroups.com
Redis doesn't have any particularly interesting limits except for some tricky issues around memory during persistence.  The surprise limiting factors, in my experience, have arisen at the operating system / TCP layer whenever the performance parameters of Redis have spurred people to get too creative.  

I've seen excess Redis-related ebullience cause:

* simple filehandle starvation
* simple socket port starvation
* socket starvation due to excess lingering and bad client shutdown
* overflows of the netfilter connection tracking tables (net.ipv4.netfilter.ip_conntrack_max)

and probably several other starvation scenarios I'm forgetting.  Your sysctl.conf doesn't seem to armor against those very much; you might check into increasing some of the default values in netfilter and so forth.
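By way of illustration only (these are sketch values, not recommendations; tune against your own workload), a sysctl.conf fragment armoring against that sort of starvation might look like:

```
# Illustrative values only - measure and tune for your own workload
net.ipv4.ip_local_port_range = 10240 65535    # more ephemeral ports
net.ipv4.tcp_fin_timeout = 30                 # reclaim TIME_WAIT sockets sooner
net.ipv4.netfilter.ip_conntrack_max = 262144  # bigger conntrack table
fs.file-max = 100000                          # raise the global fd ceiling
```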

Periodic hangs are generally associated with persistence (and the underlying disk being too slow to accept writes; especially frequent with virtualized disks); you might also make sure that you've restarted the server after you've made changes to your config file to turn off persistence.

It's also pretty easy for ill-behaved code to blow up a Redis instance, especially a 32-bit Redis instance, by, e.g., logging every Redis write into Redis, including the log itself (or something equally dumb).  Because Redis writes are so very fast, you can fill the memory space very rapidly.  You might check to see if there's a pattern with a particular client or a particular workload when you're seeing the slowdowns.

F.

On Tue, Feb 5, 2013 at 1:04 AM, <robert...@gmail.com> wrote:
I already checked the kernel log (/var/log/messages) with no luck.

Pretty sure it's not swap:

top - 18:40:21 up 132 days,  1:11,  1 user,  load average: 0.14, 0.17, 0.17
Tasks: 117 total,   1 running, 116 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.8%us,  1.3%sy,  0.0%ni, 96.0%id,  0.0%wa,  0.3%hi,  1.5%si,  0.0%st
Mem:  32959864k total, 15864288k used, 17095576k free,  3805836k buffers
Swap:  2096472k total,     8032k used,  2088440k free,  4655836k cached
 
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  SWAP COMMAND                                                                                                                                                                   
 4456 betdash   15   0 6934m 6.7g  820 S 21.0 21.4   5219:04  38m redis-server 


The server Redis is running on has 8 cores and 32 GB RAM, so plenty of power.  It is a VMware instance.


We're on CentOS 5.5.  This is sysctl.conf which is fairly standard across our stack:
 
# puppet sysctl module
#
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.
 
net.ipv4.ip_forward = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
net.ipv4.tcp_max_syn_backlog = 1280
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.icmp_echo_ignore_broadcasts = 1
net.ipv4.icmp_ignore_bogus_error_responses = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.tcp_timestamps = 0
 
# ensure core dumps can never be made by setuid programs
fs.suid_dumpable = 0
 
# BetDash Redis Setting to resolve memory allocation error
vm.overcommit_memory = 1


We'll look into limiting the number of concurrent connections within our architecture.  Is there an upper limit that we should be wary of here? 

Thanks for the thoughts.

Salvatore Sanfilippo

Feb 8, 2013, 5:00:01 AM
to Redis DB
On Tue, Feb 5, 2013 at 5:58 PM, Felix Gallo <felix...@gmail.com> wrote:
> It's also pretty easy for ill behaved code to blow up a redis instance,
> especially a 32 bit redis instance, by, e.g., logging on every redis write
> into redis, including the log itself (or something equally dumb like that).
> Because redis writes are so very fast, you can fill the memory space very
> rapidly. You might check to see if there's a pattern with a particular
> client or with a particular workload when you're seeing the slowdowns.


Because of this, 2.6 finally ships with a default limit of 3 GB of data
size when a 32-bit arch is detected.
This should somewhat improve the experience.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter

Yiftach Shoolman

Feb 8, 2013, 5:48:38 AM
to redi...@googlegroups.com
I would also suggest checking connections/sec on your Redis DB during your peak periods.
Opening new connections usually creates much more load on the system than managing many concurrent ones.
If this is your issue, use persistent connections or a connection pooling approach; it always provides better performance.






--

Yiftach Shoolman
+972-54-7634621

robert...@gmail.com

Feb 15, 2013, 11:27:47 AM
to redi...@googlegroups.com
Felix,
Thanks for the suggestions.

One of our sysadmins has investigated and responded:

We're not using iptables on the Redis machine, so we've discounted this.

If there was kernel resource starvation, I would expect to see this reflected in the logs. If such starvation was happening at such a regular and well-defined interval (i.e. on the hour) it should in theory be possible to determine the trigger and remediate accordingly. In the absence of anything in the logs, it is difficult to confirm or deny such starvation.

What did you see when you ran into starvation?

We have all persistence turned off on the machine, and the machine has plenty of RAM and CPU - load is below 1 pretty much all of the time, and no sign of the box swapping.  Initially, we saw timeouts and removed persistence (moving persistence to a slave) and that fixed the problem for a time.  But then it returned.

We're working to cut our number of Redis connections, but we're seeing the timeout errors at all levels of numbers of connections - it doesn't seem to be related to a threshold.  We've moved our scheduled jobs around to try to reduce the number of processes connecting at any one point in time, but that hasn't had any effect either.

We're seeing two classes of errors:

Redis::CannotConnectError: Timed out connecting to Redis on redis.betdash.com:6379
from: 
[PROJECT_ROOT]/vendor/bundle/ruby/1.9.1/gems/redis-3.0.2/lib/redis/client.rb, line 266


Redis::TimeoutError: Connection timed out
from:
[PROJECT_ROOT]/vendor/bundle/ruby/1.9.1/gems/redis-3.0.2/lib/redis/client.rb, line 204

Frankly, we're not even sure if it's really a timeout issue, or if it's a networking problem.  Any suggestions as to how to localize what is going on so we at least know what to fix would be useful.

This is really stumping us.

Thanks.

Rob

Salvatore Sanfilippo

Feb 15, 2013, 11:35:02 AM
to Redis DB
Hello,

before investigating further have you planned to upgrade to Redis 2.6?
I suggest doing that even if maybe this is not related to this issue.
But even if the problem is not something with 2.4, 2.6 has more tools
in order to investigate what's wrong.

Cheers,
Salvatore

Felix Gallo

Feb 15, 2013, 11:35:57 AM
to redi...@googlegroups.com
It's been a while since I exhausted resources on a Linux box, so I couldn't tell you whether or not it's reflected in the syslog properly all the time.  I would expect so too, but now that I think about it, if you run out of sockets owing to excess TIME_WAITs, or of file descriptors via ulimit or kernel configuration, I don't think there's a system-level message generated.

The important things to check are: file descriptors, socket availability, network buffer sizes.

I would also get your sysadmin to capture packets between the two systems and see if you can catch a timeout happening.  Seeing the TCP exchange with wireshark or the like would definitely help isolate where the issue is.

As a super, super ugly quick and dirty test, you might run a process on the same box which continually builds and tears down a redis db connection via the unix domain socket, and then via the loopback port.  If you see timeouts there without eth0 activity then you know it's a problem on the system and not with eth0.  If you don't, that points a finger at eth0.
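A rough sketch of that probe in Ruby (a throwaway local listener stands in for Redis here so the snippet is self-contained; against the real system you'd point it at port 6379, and separately at the unix domain socket):

```ruby
require "socket"
require "benchmark"

# Quick-and-dirty connect/teardown probe: repeatedly build and tear down
# TCP connections and watch for outliers. Here we spin up a throwaway
# local listener so the sketch runs standalone; substitute your Redis
# host/port (and the unix socket path) in a real test.
server = TCPServer.new("127.0.0.1", 0)       # port 0 = pick any free port
port = server.addr[1]
Thread.new { loop { server.accept.close } }  # accept and immediately drop

worst = 0.0
100.times do
  elapsed = Benchmark.realtime { TCPSocket.new("127.0.0.1", port).close }
  worst = elapsed if elapsed > worst
end
puts format("worst connect: %.2f ms", worst * 1000)
```

If the worst-case connect time spikes on loopback or the unix socket while eth0 is quiet, the problem is on the box itself.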

F.

robert...@gmail.com

Feb 15, 2013, 2:47:05 PM
to redi...@googlegroups.com
I've thought about it, but wasn't sure if it would be a good idea until we understood what was going on.  I'll see if I can get this organized.

robert...@gmail.com

Feb 15, 2013, 2:48:35 PM
to redi...@googlegroups.com
Thanks Felix.

I'm pretty sure from testing today, while timeouts were ongoing, that it's Redis having the issue now and not the network.

The latency does seem to be on Redis and not the client - as suggested in another thread, I captured the latency stats while timeouts were ongoing:

[server]$ ./redis-cli --latency

min: 0, max: 3, avg: 0.04 (852 samples)


[server]$ ./redis-cli --latency

min: 0, max: 2, avg: 0.02 (322 samples)



[server]$ ./redis-cli --latency

min: 0, max: 28193, avg: 157.51 (179 samples) <<<<<<<<<<< while timeouts ongoing


I appreciate the other suggestions.  I'll speak with my sysadmin about them.

Thanks!!!

Rob

Salvatore Sanfilippo

May 7, 2013, 4:30:28 AM
to Redis DB
On Mon, May 6, 2013 at 9:50 PM, Real Shobee <real....@googlemail.com> wrote:
> Hello Robert,
>
> did you ever find a solution to this? We are facing exactly the same
> problems and have been struggling with it for a lot of weeks now. As soon as there is
> some heavy I/O, we get Redis connection timeouts.
>
> We would really appreciate any hints to solve this.

Hello Real,

many timeouts like the ones described by the original poster are due
to the following:

1) The client is configured with a short timeout.
2) The server is used in a way so that it can't always reply to
requests in time.

For example I may pick a timeout of 1 second for my clients, but then
if I've a 50 GB database, and I'm running on one of the EC2 instances
where fork time is very big, it will take maybe 2 seconds to fork
every time we need to persist to disk, and you'll see the timeout
client side reached.

Similarly, I can't set a low timeout and then use KEYS or run slow
blocking commands like removing a big range from a sorted set, or
calling DEL against a huge key.
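To illustrate the alternative, here is a sketch of trimming a big sorted set in small batches rather than one giant blocking range removal (FakeClient is a stand-in for a real connection so the sketch runs standalone; its method names mirror the real client's zcard/zremrangebyrank):

```ruby
# Sketch: trim a sorted set in small batches so no single command
# blocks the server for long. FakeClient stands in for a real Redis
# connection (assumption: the real Ruby client exposes zcard and
# zremrangebyrank with this shape).
class FakeClient
  def initialize(n)
    @zset = (1..n).to_a  # members at ranks 0..n-1
  end

  def zcard(_key)
    @zset.size
  end

  def zremrangebyrank(_key, start, stop)
    removed = @zset.slice!(start..stop)
    removed ? removed.size : 0
  end
end

# Trim the set down to `keep` members, at most `batch` removals per command.
def trim_in_batches(redis, key, keep:, batch: 100)
  while (extra = redis.zcard(key) - keep) > 0
    redis.zremrangebyrank(key, 0, [extra, batch].min - 1)
  end
end

client = FakeClient.new(1000)
trim_in_batches(client, "leaderboard", keep: 250)
puts client.zcard("leaderboard")  # => 250
```

Each individual command stays small, so the server never stalls long enough for clients to hit their timeout.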

So the solution is similar to the problem:

1) Relax the timeout if you can't remove latency from the server.
2) Or understand why there is latency, following this guide:
http://redis.io/topics/latency

As a prerequisite, I would check the max latency of the server using
redis-cli --latency

Cheers,
Salvatore

Robert Shedd

May 7, 2013, 5:15:21 AM
to redi...@googlegroups.com
Real,
Much along the lines of Salvatore's reply below, we found that our issue was mitigated by two main changes:

- We disabled persistence on our master Redis instance and moved it to our slave
- We discontinued our use of KEYS against the production instance.  We had been using it to delete a slice of our DB and, just like the docs said, as our DB grew this increasingly contributed to our timeouts

With these changes, our connections have been much more stable.

We also found that our application had been leaving some stale connections around with the default timeout of 0.  We changed the Redis timeout to 6000 and this helped clean up these stale connections - I don't believe it contributed greatly to our connection timeout issues, but it does make the Redis client list much cleaner.
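For anyone landing here later, that is the `timeout` directive in redis.conf, expressed in seconds:

```
# Close connections idle for more than N seconds (0 disables the check)
timeout 6000
```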

Rob


On Monday, May 6, 2013 8:50:21 PM UTC+1, Real Shobee wrote:
Hello Robert,

did you ever find a solution to this? We are facing exactly the same problems and have been struggling with it for a lot of weeks now. As soon as there is some heavy I/O, we get Redis connection timeouts.

We would really appreciate any hints to solve this.

Thanks in advance
Frank

Victor Castell

Jan 9, 2014, 7:46:09 AM
to redi...@googlegroups.com
Hello,

sorry for bringing up this issue again. We are facing the exact same problem.

Following the recommendations, we ended up disabling persistence on our master and letting the slaves dump.

My question is how to handle master restarts in a setup like this. If the master restarts, it starts with an empty database and immediately propagates this empty state to the slaves, causing their dumps to be emptied.

I thought AOF was the way to go, but I'm also seeing the AOF rewritten to 0 bytes on restart.

I'm a bit puzzled by this.

Could Robert or someone tell me how you handle this case?

Thanks!

On Tuesday, May 7, 2013 9:39:40 PM UTC+2, Real Shobee wrote:
Hello Salvatore and Robert,

thanks a lot for your answers and for leading us into the right direction!

Real

Robert Shedd

Jan 9, 2014, 12:42:59 PM
to redi...@googlegroups.com
For restarting the master, we initially thought that you needed to temporarily enable persistence on the master before restarting to force the dump.  But this ended up being redundant, since via the SHUTDOWN command you can control whether Redis dumps out or not: http://redis.io/commands/shutdown

How are you triggering the restart of the master?

Rob



Victor Castell

Jan 9, 2014, 5:32:51 PM
to redi...@googlegroups.com
That's great! I didn't know that the SHUTDOWN command could persist too, thank you.

The downside is that there's drama if the master is restarted "accidentally", as happened to me: I was provisioning with one of our favorite CM tools, upgraded Redis to the latest version, and apt restarted the service.

We can handle planned restarts with the SHUTDOWN technique, but how do you handle the "unplanned" case?

Robert Shedd

Jan 9, 2014, 7:15:06 PM
to redis-db
We had Redis configured as a service and the service was configured to shut Redis down such that it would dump itself out to disk.

In the unplanned failure scenario, we didn't have redis automatically restarted on the master.  If the process was killed in some kind of unplanned failure, a manual failover process was required to the slaves.

In the worst case scenario, where some kind of data corruption occurred and we weren't able to recover from the slaves, our application was set up so that we could rebuild the Redis database from our other production data store (MySQL).

Victor Castell

Jan 10, 2014, 3:47:23 AM
to redi...@googlegroups.com
This clarifies some points for us regarding unplanned restarts and how to handle them.

Regarding rebuilding Redis, we can only do it for some of the data, not all.

Anyway, this helped me a lot.

Thanks Rob.

dav...@blismedia.com

Jan 13, 2014, 1:34:02 PM
to redi...@googlegroups.com, Rajalakshmi Iyer
Hi all,
we have implemented a very simple rate limiter using Redis. Our code is running on EC2.

Unfortunately, we are experiencing timeouts: Redis becomes very slow from time to time, and in some cases takes almost half a second to reply to our queries (our client timeout is 100 msec). The data we store are integers. This happens even after disabling fsync.

I have tried different settings in the configuration file for snapshotting, and then we decided to try different configurations: we first had a cluster of 4 Redis instances running with twemproxy (all on the same machine with 8 cores). We moved to ElastiCache, and we are currently running a single-process server on an m1.large instance (version 2.6.13). All of them showed the same behaviour.

We wonder whether you can help us to find out what's going on and if we have some settings wrong.

Best regards,
Davide





Matt Palmer

Jan 13, 2014, 4:43:32 PM
to redi...@googlegroups.com
On Mon, Jan 13, 2014 at 10:34:02AM -0800, dav...@blismedia.com wrote:
> Hi all,
> we have implemented a very simple rate limiter using Redis. Our code is
> running on EC2.
>
> Unfortunately, we are experiencing timeouts: Redis become very slow from
> time to time, and takes almost half a second in some cases to reply to our
> queries (our client timeout is 100msec). The data we store are integers. *This
> happens even after disabling fsync*.

Do the timeouts happen to correspond with times when an AOF rewrite or RDB
snapshot starts? Most EC2 instance types run in a virtualisation
environment where a fork() call can take an absurdly long time to complete,
during which time Redis will be completely non-responsive.
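One way to check is the `latest_fork_usec` field in INFO. A sketch (the sample text here is canned from the INFO output earlier in this thread so the snippet runs standalone; against a live server you'd feed in the output of `redis-cli info` instead):

```ruby
# Parse latest_fork_usec (microseconds) out of Redis INFO output and
# convert to milliseconds. SAMPLE_INFO is canned here; normally you
# would feed in the text from `redis-cli info`.
SAMPLE_INFO = <<~INFO
  redis_version:2.6.13
  latest_fork_usec:273343
  connected_clients:516
INFO

def fork_time_ms(info_text)
  line = info_text[/latest_fork_usec:\d+/] or return nil
  line.split(":").last.to_i / 1000.0
end

puts fork_time_ms(SAMPLE_INFO)  # => 273.343
```

A quarter-second per fork, as in the sample above, means a quarter-second stall every time a background save starts; clients with 100 msec timeouts will notice.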

- Matt

--
I seem to have my life in reverse. When I was a wee'un, it seemed perfectly
normal that one could pick up the phone and speak to anybody else in the
world who also has a phone. Now I'm older and more experienced, I'm amazed
that this could possibly work. -- Peter Corlett, in the Monastery

Salvatore Sanfilippo

Jan 14, 2014, 5:08:45 AM
to Redis DB
On Mon, Jan 13, 2014 at 7:34 PM, <dav...@blismedia.com> wrote:

> happens even after disabling fsync.

Hello, please note that when you disable fsync, Redis still needs to
write to disk, and with very slow disks even plain writes can block.
If the disk can't cope, that is a problem. However, there are other
causes of latency spikes, like running slow commands. The "latency"
page in the Redis documentation explains the common causes.
Another common cause is fork(), as explained, which on EC2 is very slow
on some kinds of instances. You can check this very easily via INFO.

Before disabling fsync, was Redis logging that fsync was slow? If so,
likely write() is also slow; however, this is currently not explicitly
tracked by Redis.
Tracking it would require additional gettimeofday() calls before/after
write; I'm evaluating whether it adds a considerable cost in the
context of the function flushAppendOnlyFile().

However, if after your debugging you still have doubts about this, I
strongly suggest modifying the code of flushAppendOnlyFile() to track
this issue.

Actually I'm tempted to implement this support as an optional
debugging support that can be enabled to monitor the write call
latency.

So how do you proceed from here? Start by checking and understanding
all the possible latency issues, following the latency page at
redis.io, then provide more info here to get more hints.

Regards,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a
proposition by replacing it with a superficially similar yet
unequivalent proposition (the "straw man"), and to refute it
— Wikipedia (Straw man page)

Victor Castell

Jan 14, 2014, 12:09:06 PM
to redi...@googlegroups.com, Rajalakshmi Iyer

I have tried different settings in the configuration file for snapshotting, and then we decided to try different configurations: we first had a cluster of 4 Redis instances running with twemproxy (all on the same machine with 8 cores). We moved to ElastiCache, and we are currently running a single-process server on an m1.large instance (version 2.6.13). All of them showed the same behaviour.

I really would not recommend running more than one instance per server. 

I suggest following Salvatore's advice: rule out the common latency issues and come back with more details.

Regards
--
V

Josiah Carlson

Jan 14, 2014, 1:26:55 PM
to redi...@googlegroups.com
Two likely sources: slow query, noisy neighbor.

To check for a slow query, you check the slowlog. Documentation is available: http://redis.io/commands/slowlog

Unless you are running the biggest AWS instance types, you are sharing the machine with another user or users. If other users are making heavy use of the network, processor, and/or disk, you will experience it. This is known as the "noisy neighbor problem". You can hop around to different machines, which may or may not solve your problem. If you are okay with letting someone else run your stuff, you may want to look at redis-cloud.com - they do AWS hosting of Redis and only use the biggest boxes. That should solve any potential noisy neighbor, but it won't fix your slow queries.

 - Josiah



jagan naidu

Jul 9, 2014, 2:37:13 AM
to redi...@googlegroups.com
Hello,

I am facing the same issue. We put a health-check script in place for Redis; it shows the error below, and the issue then resolves by itself, as I can see in the Redis monitor log.

We still couldn't figure out why Redis is behaving like this. We are getting PagerDuty alerts saying that Redis on the host is having an issue, and then the issue resolves by itself. How can we find the root cause? Could you please help us? Thanks

Could not connect to Redis at xxxxx: Connection timed out
DEBUG 20140709 04:23:11: Redis on xxxxx  is having issue.
DEBUG 20140709 04:23:11: Alert: Check the health of redis servers.
DEBUG 20140709 04:30:09: health_check_port xxxxx  6379 is A-OK
DEBUG 20140709 04:30:09: health_check_port xxxxx  6379 is A-OK
DEBUG 20140709 04:30:09: Redis master and slave port are OK.
DEBUG 20140709 04:30:09: Redis master info is OK.
DEBUG 20140709 04:30:09: Redis slave info is OK.
DEBUG 20140709 04:30:09: Redis master set redis_replication 2014-07-09 04:30:09 is OK.
DEBUG 20140709 04:30:11: Redis slave get redis_replication 2014-07-09 04:30:09 is OK.

My redis.conf file on the master database:

daemonize yes
pidfile /var/run/redis_6379.pid
port 6379
timeout 300
loglevel warning
logfile /var/log/redis_6379.log
#syslog-enabled no
#syslog-ident redis
#syslog-facility local0
databases 16

# SNAPSHOTTING
save 180 1
save 120 10
save 60 100
rdbcompression yes
dir /mnt/mysql/redis
dbfilename redis.6379.dump.rdb

# SECURITY
requirepass xxxx

# LIMITS
#maxclients 128
#maxmemory <bytes>
#maxmemory-policy volatile-lru
#maxmemory-samples 3

# APPEND ONLY MODE
appendonly no
appendfilename appendonly.aof
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# SLOW LOG
slowlog-log-slower-than 10000
slowlog-max-len 1024

# ADVANCED CONFIG
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes

client-output-buffer-limit normal 0 0 0
client-output-buffer-limit slave 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes

# REPLICATION
masterauth Cxxxx
#slave-serve-stale-data no
slaveof xxxx  6379

Thanks
Jagan

Josiah Carlson

Jul 9, 2014, 11:02:56 AM
to redi...@googlegroups.com
What have you done to diagnose your problem? Have you checked your slowlog? Have you checked your logs on your master? Have you read through the advice in this thread for pointers on how to diagnose your issue?

 - Josiah


For more options, visit https://groups.google.com/d/optout.

jagan naidu

Jul 11, 2014, 3:09:31 AM
to redi...@googlegroups.com
Hi Josiah,

I have checked all the possible ways but still can't find the root cause. My Redis log shows the following, and it was last updated long ago.

 tail -100f redis_6379.log
[10865] 30 Apr 02:41:36.725 # Server started, Redis version 2.6.14
[10865] 30 Apr 02:41:36.725 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[10865] 01 May 02:04:49.737 # User requested shutdown...
[10865] 01 May 02:04:49.970 # Redis is now ready to exit, bye bye...
[2545] 01 May 02:04:53.013 # Server started, Redis version 2.6.14
[2545] 01 May 02:04:53.014 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

[root@xxxxx log]$ date
Fri Jul 11 06:46:30 UTC 2014

slow log
======

redis-cli
redis 127.0.0.1:6379> ping
(error) ERR operation not permitted
redis 127.0.0.1:6379> slowlog get 2
(error) ERR operation not permitted

As per my observation, we run daily DB Freeze binary backups in RightScale, but their status is failed. The backup starts at 2014-07-10 03:43:57 PDT and failed at Thu Jul 10 10:43:51 +0000 2014, and then we received the alert at Jul 10, 2014 at 11:23 PM. Is this due to the DB Freeze binary backup? I don't see any load or I/O wait on the master server at the time we receive the alert. Finally, the alert resolved itself after a few minutes. Can you please help us?

==Thanks
    Jagan

Yiftach Shoolman

Jul 11, 2014, 3:51:55 AM
to redi...@googlegroups.com
Hi Jagan,

1. Did you set vm.overcommit_memory = 1?
2. What is the size of your dataset? Can you send your 'info all' output?
3. On the surface it looks like you snapshot your entire dataset every 3 minutes (180 sec). This is not the best practice.
If you have a large dataset, you might be running long fork processes that can block your Redis. See more info here under the 'Latency generated by fork' section.
Instead, you can just enable appendonly with fsync every second and snapshot every hour.
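In redis.conf terms (directive names as in the config posted above; values illustrative), that suggestion looks roughly like:

```
appendonly yes          # AOF on
appendfsync everysec    # fsync at most once per second
save 3600 1             # RDB snapshot at most hourly (if >= 1 change)
```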
 

--

Yiftach Shoolman
+972-54-7634621

Josiah Carlson

unread,
Jul 11, 2014, 5:37:44 PM7/11/14
to redi...@googlegroups.com
Replies inline.

On Fri, Jul 11, 2014 at 12:09 AM, jagan naidu <jaga...@gmail.com> wrote:
Hi Josiah,

 I have checked all possible ways but still can't find the root cause. my redis log shows below and log updated long ago. 

 tail -100f redis_6379.log
[10865] 30 Apr 02:41:36.725 # Server started, Redis version 2.6.14
[10865] 30 Apr 02:41:36.725 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[10865] 01 May 02:04:49.737 # User requested shutdown...
[10865] 01 May 02:04:49.970 # Redis is now ready to exit, bye bye...
[2545] 01 May 02:04:53.013 # Server started, Redis version 2.6.14
[2545] 01 May 02:04:53.014 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

If Redis hasn't produced a log in over 2 months, then you may want to fix your logging before continuing. That's going to make your life 10x easier in the long-term in debugging Redis.

[root@xxxxx log]$ date
Fri Jul 11 06:46:30 UTC 2014

slow log
======

redis-cli
redis 127.0.0.1:6379> ping
(error) ERR operation not permitted
redis 127.0.0.1:6379> slowlog get 2
(error) ERR operation not permitted

The only place that I see this in the source for Redis is in replication message processing. Are you connecting directly to your Redis master? Are you using a proxy? Can you give more details to your connection/configuration?

And as per the other reply to the thread... yes, backing up your full database every 3 minutes is probably not what you actually want to do, even if you think it is. Use replication + AOF + periodic snapshots + backups.

 - Josiah

Salvatore Sanfilippo

unread,
Jul 11, 2014, 6:21:53 PM7/11/14
to Redis DB
On Fri, Jul 11, 2014 at 11:37 PM, Josiah Carlson
<josiah....@gmail.com> wrote:

> The only place that I see this in the source for Redis is in replication
> message processing

It must be <= 2.6 to reply "operation not permitted", btw...
that happens when no AUTH was given but a password is set.
However, you can still see the error in the 2.8 code base, kept for 2.6
interoperability of the replication layer.

Salvatore





--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

"One would never undertake such a thing if one were not driven on by
some demon whom one can neither resist nor understand."
— George Orwell