For the 2nd day in a row, today, right around 4pm est - our instances of redis began to crash. We have 3 instances on 3 different machines. The first will crash, failover will happen, then the 2nd will crash, failing over then the 3rd.
The dump suggests an OOM exception. Whats really strange is we cant figure out how or why. Over the course of about 2 minutes we'll watch (via datadog) memory usage climb on that machine from less than 1GB all the way to 16GB used then crash. It happens the same on all machines one after the other. To make this even more confusing, when we look at disk, network and cpu usage, we dont see any changes. I would fully expect to see network traffic spike in correlation to the memory usage, if this were one of my applications trying to write to redis.
The only things I can think of are perhaps someone has a lua script in place that is generating keys server side (but no cpu spike) or there is a bug in redis. I just rolled out 3.2.8 in the middle of the day in hopes of negating this.
Has anyone seen or heard of anything like this? Suggestions on how to trouble shoot?
27229:M 02 May 16:04:56.041 # Out Of Memory allocating 32774 bytes!
=== REDIS BUG REPORT START: Cut & paste starting from here ===
27229:M 02 May 16:04:56.253 # ------------------------------------------------
27229:M 02 May 16:04:56.371 # !!! Software Failure. Press left mouse button to continue
27229:M 02 May 16:04:56.378 # Guru Meditation: "Redis aborting for OUT OF MEMORY" #server.c:3826
27229:M 02 May 16:04:56.378 # (forcing SIGSEGV in order to print the stack trace)
27229:M 02 May 16:04:56.378 # ------------------------------------------------
27229:M 02 May 16:04:56.378 # Redis 3.2.5 crashed by signal: 11
27229:M 02 May 16:04:56.378 # Crashed running the instuction at: 0x45ea2a
27229:M 02 May 16:04:56.378 # Accessing address: 0xffffffffffffffff
27229:M 02 May 16:04:56.378 # Failed assertion: <no assertion failed> (<no file>:0)
------ STACK TRACE ------
EIP:
/usr/local/bin/redis-server 0.0.0.0:6379(_serverPanic+0x7a)[0x45ea2a]
Backtrace:
------ INFO OUTPUT ------
# Server
redis_version:3.2.5
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:a1ff76e74af07ea4
redis_mode:standalone
os:Linux 4.4.0-75-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.4.0
process_id:27229
run_id:1a1003987ba1e1773da12c8660becd8531f00681
tcp_port:6379
uptime_in_seconds:84935
uptime_in_days:0
hz:10
lru_clock:583268
executable:/usr/local/bin/redis-server
config_file:/etc/redis/redis.conf
# Clients
connected_clients:1133
client_longest_output_list:421123
client_biggest_input_buf:277895
blocked_clients:0
# Memory
used_memory:17292894000
used_memory_human:16.11G
used_memory_rss:16246845440
used_memory_rss_human:15.13G
used_memory_peak:17292894000
used_memory_peak_human:16.11G
total_system_memory_human:15.66G
used_memory_lua:64512
used_memory_lua_human:63.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.94
mem_allocator:jemalloc-4.0.3
# Persistence
loading:0
rdb_changes_since_last_save:46549975
rdb_bgsave_in_progress:0
rdb_last_save_time:1493698199
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
# Stats
total_connections_received:12270
total_commands_processed:117344660
instantaneous_ops_per_sec:2347
total_net_input_bytes:919509293822
total_net_output_bytes:2708738407041
instantaneous_input_kbps:5372.76
instantaneous_output_kbps:2491.19
rejected_connections:0
sync_full:6
sync_partial_ok:0
sync_partial_err:3
expired_keys:390208
evicted_keys:0
keyspace_hits:8673514
keyspace_misses:2170054
pubsub_channels:11744
pubsub_patterns:0
latest_fork_usec:32849
migrate_cached_sockets:0
# Replication
role:master
connected_slaves:2
slave0:ip=10.254.18.160,port=6379,state=online,offset=917243561978,lag=5
slave1:ip=10.254.18.162,port=6379,state=online,offset=907936106924,lag=5
master_repl_offset:918536803459
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:918526317700
repl_backlog_histlen:10485760
# CPU
used_cpu_sys:8463.43
used_cpu_user:6183.00
used_cpu_sys_children:3.58
used_cpu_user_children:15.92
# Commandstats
cmdstat_get:calls=8892618,usec=64694027,usec_per_call=7.28
cmdstat_setnx:calls=376454,usec=2725244,usec_per_call=7.24
cmdstat_setex:calls=43638768,usec=806517859,usec_per_call=18.48
cmdstat_psetex:calls=4347816,usec=41523467,usec_per_call=9.55
cmdstat_del:calls=304002,usec=1431448,usec_per_call=4.71
cmdstat_exists:calls=286473,usec=1239485,usec_per_call=4.33
cmdstat_incr:calls=155415,usec=808771,usec_per_call=5.20
cmdstat_rpush:calls=1514480,usec=24286873,usec_per_call=16.04
cmdstat_lrange:calls=696864,usec=20380261,usec_per_call=29.25
cmdstat_lrem:calls=807950,usec=6515745,usec_per_call=8.06
cmdstat_hget:calls=953582,usec=7492331,usec_per_call=7.86
cmdstat_hmset:calls=224768,usec=2805050,usec_per_call=12.48
cmdstat_hdel:calls=2464,usec=36984,usec_per_call=15.01
cmdstat_hgetall:calls=301371,usec=11435223,usec_per_call=37.94
cmdstat_expire:calls=2422107,usec=5874718,usec_per_call=2.43
cmdstat_expireat:calls=25643,usec=183163,usec_per_call=7.14
cmdstat_pexpire:calls=283954,usec=1306249,usec_per_call=4.60
cmdstat_ping:calls=658183,usec=1206395,usec_per_call=1.83
cmdstat_echo:calls=5986,usec=5855,usec_per_call=0.98
cmdstat_psync:calls=6,usec=178147,usec_per_call=29691.17
cmdstat_replconf:calls=166425,usec=503685,usec_per_call=3.03
cmdstat_info:calls=770127,usec=335127970,usec_per_call=435.16
cmdstat_config:calls=23716,usec=418636,usec_per_call=17.65
cmdstat_subscribe:calls=972294,usec=9463635,usec_per_call=9.73
cmdstat_unsubscribe:calls=1699371,usec=13051073,usec_per_call=7.68
cmdstat_publish:calls=46224316,usec=155590380,usec_per_call=3.37
cmdstat_cluster:calls=3447,usec=122141,usec_per_call=35.43
cmdstat_client:calls=6064,usec=12486,usec_per_call=2.06
cmdstat_eval:calls=143,usec=20450,usec_per_call=143.01
cmdstat_evalsha:calls=1575332,usec=161869321,usec_per_call=102.75
cmdstat_slowlog:calls=4374,usec=2434197,usec_per_call=556.52
cmdstat_script:calls=143,usec=3982,usec_per_call=27.85
cmdstat_command:calls=4,usec=33079,usec_per_call=8269.75
# Cluster
cluster_enabled:0
# Keyspace
db0:keys=259083,expires=258376,avg_ttl=23767101
hash_init_value: 1493572598