redis server randomly stops without error

Anoop Kulkarni

Jun 27, 2014, 6:26:13 AM
to redi...@googlegroups.com
My Redis server randomly stops without any stack trace or error. It happens every 1-3 days, but not on any particular schedule.

I tried attaching gdb and it gives the trace below. Unfortunately I don't see anything remotely useful in it. I'm on:

Distributor ID: Ubuntu
Description:    Ubuntu 13.04
Release:        13.04
Codename:       raring

I installed redis-server through apt-get. I don't see any major server load:
15:53:52 up 19 days, 16:30,  2 users,  load average: 0.18, 0.81, 1.69

Can someone please point me in the right direction for troubleshooting this problem?

Thanks in advance.

GNU gdb (GDB) 7.5.91.20130417-cvs-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/redis-server...(no debugging symbols found)...done.
Attaching to program: /usr/bin/redis-server, process 18892
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libm.so.6
Reading symbols from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...(no debugging symbols found)...done.
[New LWP 18894]
[New LWP 18893]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x00007f63ccf430c3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
(gdb)
(gdb) continue
Continuing.
[Thread 0x7f63cc3ff700 (LWP 18893) exited]
[Thread 0x7f63cb9fe700 (LWP 18894) exited]

Program terminated with signal SIGKILL, Killed.
The program no longer exists.
(gdb)
The program is not being run.
(gdb) continue
The program is not being run.
(gdb)
The program is not being run.
(gdb) bt
No stack.
(gdb) info registers
The program has no registers now.
(gdb) core
No core file now.

Yiftach Shoolman

Jun 27, 2014, 8:25:39 AM
to redi...@googlegroups.com
Please connect to your Redis instance with 'redis-cli' and send the output of:
1. 'config get *'
2. 'info all' 
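
For reference, from a shell on the Redis host this would look something like the following (assuming the default 127.0.0.1:6379; the asterisk needs quoting so the shell doesn't expand it):

redis-cli -h 127.0.0.1 -p 6379 config get '*'
redis-cli -h 127.0.0.1 -p 6379 info all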


--

Yiftach Shoolman
+972-54-7634621

Jan-Erik Rediger

Jun 27, 2014, 8:33:50 AM
to redi...@googlegroups.com
Anything in the log? What version of Redis is it?
How much memory was in use? How much memory does the server have? (Ideally, show us the output of INFO.)

Anoop Kulkarni

Jul 4, 2014, 2:54:21 PM
to redi...@googlegroups.com
127.0.0.1:6379> config get *
  1) "dbfilename"
  2) "dump.rdb"
  3) "requirepass"
  4) ""
  5) "masterauth"
  6) ""
  7) "unixsocket"
  8) ""
  9) "logfile"
 10) "/var/log/redis_6379.log"
 11) "pidfile"
 12) "/var/run/redis_6379.pid"
 13) "maxmemory"
 14) "0"
 15) "maxmemory-samples"
 16) "3"
 17) "timeout"
 18) "60"
 19) "tcp-keepalive"
 20) "0"
 21) "auto-aof-rewrite-percentage"
 22) "100"
 23) "auto-aof-rewrite-min-size"
 24) "67108864"
 25) "hash-max-ziplist-entries"
 26) "512"
 27) "hash-max-ziplist-value"
 28) "64"
 29) "list-max-ziplist-entries"
 30) "512"
 31) "list-max-ziplist-value"
 32) "64"
 33) "set-max-intset-entries"
 34) "512"
 35) "zset-max-ziplist-entries"
 36) "128"
 37) "zset-max-ziplist-value"
 38) "64"
 39) "hll-sparse-max-bytes"
 40) "3000"
 41) "lua-time-limit"
 42) "5000"
 43) "slowlog-log-slower-than"
 44) "10000"
 45) "slowlog-max-len"
 46) "128"
 47) "port"
 48) "6379"
 49) "tcp-backlog"
 50) "511"
 51) "databases"
 52) "2"
 53) "repl-ping-slave-period"
 54) "10"
 55) "repl-timeout"
 56) "60"
 57) "repl-backlog-size"
 58) "1048576"
 59) "repl-backlog-ttl"
 60) "3600"
 61) "maxclients"
 62) "10000"
 63) "watchdog-period"
 64) "0"
 65) "slave-priority"
 66) "100"
 67) "min-slaves-to-write"
 68) "0"
 69) "min-slaves-max-lag"
 70) "10"
 71) "hz"
 72) "10"
 73) "no-appendfsync-on-rewrite"
 74) "no"
 75) "slave-serve-stale-data"
 76) "yes"
 77) "slave-read-only"
 78) "yes"
 79) "stop-writes-on-bgsave-error"
 80) "yes"
 81) "daemonize"
 82) "yes"
 83) "rdbcompression"
 84) "yes"
 85) "rdbchecksum"
 86) "yes"
 87) "activerehashing"
 88) "yes"
 89) "repl-disable-tcp-nodelay"
 90) "no"
 91) "aof-rewrite-incremental-fsync"
 92) "yes"
 93) "appendonly"
 94) "no"
 95) "dir"
 96) "/var/redis/6379"
 97) "maxmemory-policy"
 98) "volatile-lru"
 99) "appendfsync"
100) "everysec"
101) "save"
102) "900 1 300 10 60 10000"
103) "loglevel"
104) "notice"
105) "client-output-buffer-limit"
106) "normal 0 0 0 slave 268435456 67108864 60 pubsub 33554432 8388608 60"
107) "unixsocketperm"
108) "0"
109) "slaveof"
110) ""
111) "notify-keyspace-events"
112) ""
113) "bind"

127.0.0.1:6379> info all
# Server
redis_version:2.8.12
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:d39ba93ce707b808
redis_mode:standalone
os:Linux 2.6.32-042stab090.5 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.7.3
process_id:1567
run_id:ca2d6faebadeab030165fa7748bcf66525f33ffd
tcp_port:6379
uptime_in_seconds:648
uptime_in_days:0
hz:10
lru_clock:11990918
config_file:/etc/redis/6379.conf

# Clients
connected_clients:11
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:658056448
used_memory_human:627.57M
used_memory_rss:98742272
used_memory_peak:658366264
used_memory_peak_human:627.87M
used_memory_lua:33792
mem_fragmentation_ratio:0.15
mem_allocator:jemalloc-3.6.0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1404499198
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:30
total_commands_processed:39
instantaneous_ops_per_sec:0
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:3
evicted_keys:0
keyspace_hits:1
keyspace_misses:0
pubsub_channels:8
pubsub_patterns:0
latest_fork_usec:0

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:1.31
used_cpu_user:8.70
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

# Commandstats
cmdstat_zrange:calls=1,usec=51007,usec_per_call=51007.00
cmdstat_info:calls=29,usec=4225,usec_per_call=145.69
cmdstat_config:calls=1,usec=411,usec_per_call=411.00
cmdstat_subscribe:calls=8,usec=203,usec_per_call=25.38

# Keyspace
db0:keys=13809,expires=328,avg_ttl=45159558


It died again a few times today. These are the logs between restarts:

[1444] 04 Jul 23:35:41.662 # Server started, Redis version 2.8.12
[1444] 04 Jul 23:35:41.662 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[1444] 04 Jul 23:35:49.969 * DB loaded from disk: 8.307 seconds
[1444] 04 Jul 23:35:49.969 * The server is now ready to accept connections on port 6379
[1444] 04 Jul 23:36:53.576 # Client id=28 addr=127.0.0.1:54545 fd=33 name= age=71 idle=0 flags=N db=0 sub=17 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=166 omem=33620848 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:36:56.611 # Client id=2 addr=127.0.0.1:54519 fd=7 name= age=74 idle=0 flags=N db=0 sub=16 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=127 omem=34341464 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:00.458 # Client id=48 addr=127.0.0.1:55673 fd=6 name= age=6 idle=0 flags=N db=0 sub=17 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=97 omem=33974056 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:00.462 # Client id=10 addr=127.0.0.1:54527 fd=15 name= age=78 idle=0 flags=N db=0 sub=12 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=111 omem=34057432 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:01.722 # Client id=49 addr=127.0.0.1:55702 fd=7 name= age=4 idle=1 flags=N db=0 sub=16 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=44 omem=33849824 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:02.691 # Client id=50 addr=127.0.0.1:55770 fd=6 name= age=0 idle=0 flags=N db=0 sub=17 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=3 omem=50331896 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:03.460 # Client id=51 addr=127.0.0.1:55773 fd=7 name= age=1 idle=0 flags=N db=0 sub=12 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=35 omem=35016184 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:03.707 # Client id=52 addr=127.0.0.1:55785 fd=6 name= age=0 idle=0 flags=N db=0 sub=17 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=35 omem=35016184 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:40.159 # Client id=83 addr=127.0.0.1:56340 fd=6 name= age=3 idle=0 flags=N db=0 sub=19 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=3 omem=33554680 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:40.159 # Client id=84 addr=127.0.0.1:56351 fd=15 name= age=2 idle=0 flags=N db=0 sub=15 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=3 omem=33554680 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1444] 04 Jul 23:37:44.465 # Client id=86 addr=127.0.0.1:56409 fd=15 name= age=4 idle=3 flags=N db=0 sub=19 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=64 oll=1 omem=58720296 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1567] 05 Jul 00:09:58.743 * Increased maximum number of open files to 10032 (it was originally set to 1024).
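
(For reference, the two recurring messages in these logs map to concrete settings. The overcommit warning is fixed exactly as it says, and the "scheduled to be closed ASAP" lines mean the subscribers hit the default pub/sub client-output-buffer-limit of 32 MB hard / 8 MB sustained for 60 s. A sketch of how each would be adjusted; the 64 MB / 16 MB figures below are placeholders for illustration, and raising the limit only buys headroom for slow subscribers rather than fixing them:)

# stop fork()/BGSAVE failures under low memory (takes effect immediately)
sudo sysctl vm.overcommit_memory=1
echo 'vm.overcommit_memory = 1' | sudo tee -a /etc/sysctl.conf

# give pub/sub clients a larger output buffer before Redis disconnects them
redis-cli config set client-output-buffer-limit 'pubsub 67108864 16777216 60'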

Anoop Kulkarni

Jul 5, 2014, 2:49:34 AM
to redi...@googlegroups.com
I've downgraded Redis from 2.8 stable to 2.6 now.

Whenever 2.8 crashed it did not delete the pid file, whereas 2.6 does delete it when it stops. At least that lets me monitor it and restart it with a separate program. The crashes, however, occur on both versions.

Michel Martens

Jul 5, 2014, 7:48:54 AM
to redi...@googlegroups.com
Can you try configuring a maxmemory limit?
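
Something along these lines, for example; the 512mb figure is just a placeholder, pick a limit that leaves headroom for everything else on the box:

redis-cli config set maxmemory 512mb
redis-cli config set maxmemory-policy allkeys-lru
redis-cli config rewrite    # persist the change to the config file (Redis 2.8+)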

Anoop Kulkarni

Jul 5, 2014, 8:24:32 AM
to redi...@googlegroups.com
I set a maxmemory limit. Will keep you updated if this helps.

Anoop Kulkarni

Jul 5, 2014, 3:44:09 PM
to redi...@googlegroups.com
Didn't help; it died again a few times today. The relevant settings are now:

15) "maxmemory"
16) "736870912"
17) "maxmemory-samples"
18) "3"
79) "maxmemory-policy"
80) "allkeys-lru"

Josiah Carlson

Jul 5, 2014, 4:08:54 PM
to redi...@googlegroups.com
Looking at your logs and your INFO output: first, your clients are being dropped because they aren't able to receive data fast enough, or your server can't send it fast enough. I don't know which side is the problem, but you need to work on fixing that; if you are running on a slow VM, upgrade to one with faster IO. Second, your INFO output says "mem_fragmentation_ratio:0.15". That usually means your Redis data has been swapped out to disk, which suggests Redis is under memory pressure from the system itself. Could you send us the first ~5 lines of output from 'top'? Something like:

top - 13:03:47 up 29 days, 21:20, 20 users,  load average: 0.24, 0.56, 0.51
Tasks: 286 total,   2 running, 284 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.8%us,  0.5%sy,  0.0%ni, 96.2%id,  0.1%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:   8118860k total,  7813236k used,   305624k free,   198672k buffers
Swap:  8328188k total,   446212k used,  7881976k free,   705708k cached

That will at least give us a chance of seeing how much memory you have, what other high-level memory pressure you have, etc.

Honestly, I suspect that Redis is being killed by the OOM killer, and some other service you are running is starting it back up. It then runs into memory issues (maybe due to slow subscribers, or maybe the slow subscribers exacerbate the issue), gets killed, and is automatically restarted.
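
One quick way to confirm or rule that out (assuming a stock Ubuntu box where the kernel log goes to dmesg/syslog) is to look for OOM-killer entries right after a crash:

dmesg | grep -iE 'out of memory|oom|killed process'
sudo grep -i 'killed process' /var/log/syslog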

Send us the top output, and tell us about the machine Redis is running on (other processes, services, etc.)

 - Josiah




Anoop Kulkarni

Jul 5, 2014, 4:13:46 PM
to redi...@googlegroups.com
top - 01:43:01 up  8:40,  1 user,  load average: 0.21, 0.25, 0.13
Tasks:  27 total,   1 running,  26 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 99.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem:   1048576 total,   858256 used,   190320 free,        0 buffers
KiB Swap:  1310720 total,   394376 used,   916344 free,    10124 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 3822 dev       20   0  786m 132m 2836 S   0.7 13.0   0:28.91 node
 3756 redis     20   0  731m 148m  576 S   0.3 14.5   0:24.33 redis-server
 3823 dev       20   0  826m 179m 3060 S   0.3 17.5   0:44.12 node
 3825 dev       20   0  823m 170m 2816 S   0.3 16.7   0:28.75 node
    1 root      20   0 26496 1172  208 S   0.0  0.1   0:00.33 init

Anoop Kulkarni

Jul 5, 2014, 4:16:29 PM
to redi...@googlegroups.com
I've moved from the 64-bit to the 32-bit redis-server and it's brought down the memory usage. I'm trying to clean up more junk from the DB to reduce the memory footprint, in case that helps.

My VM runs Ubuntu with 1 GB RAM and 1 GB swap.

Josiah Carlson

Jul 5, 2014, 4:25:21 PM
to redi...@googlegroups.com
You are running Node and Redis on the same server. Node is eating your resident memory and pushing Redis into swap. Move Redis to a different server.

 - Josiah



Anoop Kulkarni

Jul 5, 2014, 5:02:46 PM
to redi...@googlegroups.com
I have another application on another server running Node and Redis, and so far I've never had any problems with it. In fact, it's been months since I've needed to restart anything.

Is there any way for me to check whether Node is the reason Redis is shutting down?

Anoop Kulkarni

Jul 7, 2014, 7:26:50 AM
to redi...@googlegroups.com
I've gone a day without crashes and I think I might have found my problem.

I'm using Redis for pub/sub as well, so in the worst case there can be around 500 publish/subscribe clients. My site has been running for a few months as a low-traffic site, so I had never experienced any ill effects from having so many clients.

So what changed two weeks ago? I signed up for Webmaster Tools and found out that Googlebot started indexing my site two weeks ago. So there could have been a period where all the pub/sub clients were active within a short window of time, which either Redis didn't like or my VM configuration couldn't support. Since Googlebot doesn't register in Analytics, I didn't suspect it at first.

I've now changed it to use a single pub/sub channel and moved the routing logic into the app (roughly as sketched below). It's been over a day without a crash, which is much longer than before, so I'm hoping that was the issue. Fingers crossed...
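
The shape of the change, with made-up channel and payload names just to illustrate:

# before: one channel per page/room, hundreds of SUBSCRIBEs
redis-cli subscribe room:42

# now: a single channel; the routing key travels in the payload and the
# Node app decides which browsers each message is fanned out to
redis-cli subscribe events
redis-cli publish events '{"room":"42","data":"..."}'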

Josiah Carlson

Jul 9, 2014, 1:00:01 PM
to redi...@googlegroups.com
I'm glad that your issue seems to be temporarily solved, but Redis shouldn't be falling over with 500 pubsub clients unless your machine is drastically underpowered. Source: I know people with 20k pubsub clients that have zero issues.

What were you doing specifically with pubsub to have the issue? Were you using a specific library for handling communication brokering?

 - Josiah




Anoop Kulkarni

Jul 9, 2014, 2:30:33 PM
to redi...@googlegroups.com
I was using the node_redis npm module (https://github.com/mranney/node_redis) for Redis communication. I confirmed it's the pub/sub channels that were causing the crashes; I don't know why, as there was insufficient logging.

Josiah Carlson

Jul 9, 2014, 6:41:42 PM
to redi...@googlegroups.com
Having pub/sub channels and pub/sub connections to Redis shouldn't be an issue up to your platform's file handle limit; at that point Redis just stops being able to accept new connections.

I believe it is far more likely that the number of connections was masking an issue related to data distribution. My hypothesis is that the data you were publishing was being distributed to more clients than you expected, which caused high memory use (on an already memory-saturated machine), which resulted in Redis being killed by the Linux OOM killer. This would explain why you were seeing the issue with 500 clients but not with the far fewer clients now: you have less memory pressure. It would also explain why you could run Redis on your other server (which also runs Node.js, but a different app) but not on this one.
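
If you want to watch for this in the future, CLIENT LIST shows the same per-client fields that were in your log (oll is the output list length, omem the output buffer size in bytes), so a rough check like this would spot a subscriber falling behind before Redis disconnects it:

redis-cli client list | grep cmd=subscribe | grep -o 'omem=[0-9]*' | sort -t= -k2 -rn | head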

 - Josiah

