Redis 3.2.5 Rapid Memory Growth then OOM Exception

William Davis

May 2, 2017, 5:01:00 PM
to Redis DB
For the second day in a row, right around 4pm EST, our Redis instances began to crash. We have 3 instances on 3 different machines: the first crashes and fails over, then the 2nd crashes and fails over, then the 3rd.

The dump suggests an OOM exception. What's really strange is that we can't figure out how or why. Over the course of about 2 minutes we'll watch (via Datadog) memory usage on that machine climb from less than 1GB all the way to 16GB used, and then it crashes. The same thing happens on all the machines, one after the other. To make this even more confusing, when we look at disk, network and CPU usage, we don't see any changes. If this were one of my applications writing to Redis, I would fully expect to see network traffic spike in correlation with the memory usage.

The only things I can think of are that someone has a Lua script in place that is generating keys server-side (but there's no CPU spike), or that there is a bug in Redis. I just rolled out 3.2.8 in the middle of the day in hopes of heading this off.

Has anyone seen or heard of anything like this? Any suggestions on how to troubleshoot?


27229:M 02 May 16:04:56.041 # Out Of Memory allocating 32774 bytes!


=== REDIS BUG REPORT START: Cut & paste starting from here ===
27229:M 02 May 16:04:56.253 # ------------------------------------------------
27229:M 02 May 16:04:56.371 # !!! Software Failure. Press left mouse button to continue
27229:M 02 May 16:04:56.378 # Guru Meditation: "Redis aborting for OUT OF MEMORY" #server.c:3826
27229:M 02 May 16:04:56.378 # (forcing SIGSEGV in order to print the stack trace)
27229:M 02 May 16:04:56.378 # ------------------------------------------------
27229:M 02 May 16:04:56.378 # Redis 3.2.5 crashed by signal: 11
27229:M 02 May 16:04:56.378 # Crashed running the instuction at: 0x45ea2a
27229:M 02 May 16:04:56.378 # Accessing address: 0xffffffffffffffff
27229:M 02 May 16:04:56.378 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/local/bin/redis-server 0.0.0.0:6379(_serverPanic+0x7a)[0x45ea2a]

Backtrace:

------ INFO OUTPUT ------
# Server
redis_version:3.2.5
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:a1ff76e74af07ea4
redis_mode:standalone
os:Linux 4.4.0-75-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.4.0
process_id:27229
run_id:1a1003987ba1e1773da12c8660becd8531f00681
tcp_port:6379
uptime_in_seconds:84935
uptime_in_days:0
hz:10
lru_clock:583268
executable:/usr/local/bin/redis-server
config_file:/etc/redis/redis.conf

# Clients
connected_clients:1133
client_longest_output_list:421123
client_biggest_input_buf:277895
blocked_clients:0

# Memory
used_memory:17292894000
used_memory_human:16.11G
used_memory_rss:16246845440
used_memory_rss_human:15.13G
used_memory_peak:17292894000
used_memory_peak_human:16.11G
total_system_memory:16818286592
total_system_memory_human:15.66G
used_memory_lua:64512
used_memory_lua_human:63.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.94
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:46549975
rdb_bgsave_in_progress:0
rdb_last_save_time:1493698199
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:12270
total_commands_processed:117344660
instantaneous_ops_per_sec:2347
total_net_input_bytes:919509293822
total_net_output_bytes:2708738407041
instantaneous_input_kbps:5372.76
instantaneous_output_kbps:2491.19
rejected_connections:0
sync_full:6
sync_partial_ok:0
sync_partial_err:3
expired_keys:390208
evicted_keys:0
keyspace_hits:8673514
keyspace_misses:2170054
pubsub_channels:11744
pubsub_patterns:0
latest_fork_usec:32849
migrate_cached_sockets:0

# Replication
role:master
connected_slaves:2
slave0:ip=10.254.18.160,port=6379,state=online,offset=917243561978,lag=5
slave1:ip=10.254.18.162,port=6379,state=online,offset=907936106924,lag=5
master_repl_offset:918536803459
repl_backlog_active:1
repl_backlog_size:10485760
repl_backlog_first_byte_offset:918526317700
repl_backlog_histlen:10485760

# CPU
used_cpu_sys:8463.43
used_cpu_user:6183.00
used_cpu_sys_children:3.58
used_cpu_user_children:15.92

# Commandstats
cmdstat_get:calls=8892618,usec=64694027,usec_per_call=7.28
cmdstat_setnx:calls=376454,usec=2725244,usec_per_call=7.24
cmdstat_setex:calls=43638768,usec=806517859,usec_per_call=18.48
cmdstat_psetex:calls=4347816,usec=41523467,usec_per_call=9.55
cmdstat_del:calls=304002,usec=1431448,usec_per_call=4.71
cmdstat_exists:calls=286473,usec=1239485,usec_per_call=4.33
cmdstat_incr:calls=155415,usec=808771,usec_per_call=5.20
cmdstat_rpush:calls=1514480,usec=24286873,usec_per_call=16.04
cmdstat_lrange:calls=696864,usec=20380261,usec_per_call=29.25
cmdstat_lrem:calls=807950,usec=6515745,usec_per_call=8.06
cmdstat_hget:calls=953582,usec=7492331,usec_per_call=7.86
cmdstat_hmset:calls=224768,usec=2805050,usec_per_call=12.48
cmdstat_hdel:calls=2464,usec=36984,usec_per_call=15.01
cmdstat_hgetall:calls=301371,usec=11435223,usec_per_call=37.94
cmdstat_expire:calls=2422107,usec=5874718,usec_per_call=2.43
cmdstat_expireat:calls=25643,usec=183163,usec_per_call=7.14
cmdstat_pexpire:calls=283954,usec=1306249,usec_per_call=4.60
cmdstat_ping:calls=658183,usec=1206395,usec_per_call=1.83
cmdstat_echo:calls=5986,usec=5855,usec_per_call=0.98
cmdstat_psync:calls=6,usec=178147,usec_per_call=29691.17
cmdstat_replconf:calls=166425,usec=503685,usec_per_call=3.03
cmdstat_info:calls=770127,usec=335127970,usec_per_call=435.16
cmdstat_config:calls=23716,usec=418636,usec_per_call=17.65
cmdstat_subscribe:calls=972294,usec=9463635,usec_per_call=9.73
cmdstat_unsubscribe:calls=1699371,usec=13051073,usec_per_call=7.68
cmdstat_publish:calls=46224316,usec=155590380,usec_per_call=3.37
cmdstat_cluster:calls=3447,usec=122141,usec_per_call=35.43
cmdstat_client:calls=6064,usec=12486,usec_per_call=2.06
cmdstat_eval:calls=143,usec=20450,usec_per_call=143.01
cmdstat_evalsha:calls=1575332,usec=161869321,usec_per_call=102.75
cmdstat_slowlog:calls=4374,usec=2434197,usec_per_call=556.52
cmdstat_script:calls=143,usec=3982,usec_per_call=27.85
cmdstat_command:calls=4,usec=33079,usec_per_call=8269.75

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=259083,expires=258376,avg_ttl=23767101
hash_init_value: 1493572598

ma...@andyh.io

May 2, 2017, 8:05:17 PM
to redi...@googlegroups.com
If something only happens at a particular time, I feel like the applications might be doing something wrong. Since the network looks OK, is there any chance a client is running on the same machine as the Redis server?

Another way to avoid the OOM crash is to set maxmemory, so you can debug the issue while Redis is still running.
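
Something along these lines might help (the 12gb cap and the eviction policy are just examples, size them to your box and data):

    redis-cli config set maxmemory 12gb
    redis-cli config set maxmemory-policy volatile-lru
    # mirror the same settings in /etc/redis/redis.conf so they survive a restart

With a cap in place Redis rejects writes (or evicts keys, depending on the policy) instead of growing until the machine runs out of memory.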

Andy

Sent from my iPhone

Itamar Haber

May 3, 2017, 12:49:22 PM
to Redis DB
Also.

* Since you're using scripts, there's a possibility of a rogue typo (or similar) in one of them that is creating keys or who knows what
* Try monitoring cmdstat_* (available via `INFO commandstats`) and keyspace metrics (`INFO keyspace`) to catch a pattern
* Your client_longest_output_list and client_biggest_input_buf values could be benign but seem high (see the sketch below)

Note: there are partial sync errors, which may or may not be related. Perhaps correlating the logs of the three servers would yield an insight.
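
For the client buffers point, here's a rough way (just a sketch, adjust host/port to your setup) to see which clients are holding the biggest output buffers right now, based on the omem field of CLIENT LIST:

    redis-cli client list \
      | awk '{omem=""; addr=""; cmd="";
              for (i=1;i<=NF;i++) {
                if ($i ~ /^omem=/) omem=substr($i,6);
                if ($i ~ /^addr=/) addr=substr($i,6);
                if ($i ~ /^cmd=/)  cmd=substr($i,5) }
              print omem, addr, cmd }' \
      | sort -rn | head
    # prints output-buffer bytes, client address and last command, biggest first

With ~11.7k pub/sub channels and a client_longest_output_list of 421k, a slow subscriber accumulating output is one place the memory could be going.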

--

Itamar Haber | Chief OSS Education Officer
Redis Labs ~/redis

Mobile: +972 (54) 567 9692
Twitter: @itamarhaber
Skype: itamar.haber


William Davis

May 3, 2017, 6:00:44 PM
to Redis DB
In your opinion, what is the most efficient way to identify the commands that could be causing this? Short of running MONITOR, I'm not really sure.

Tuco

May 4, 2017, 1:18:53 AM
to Redis DB
Without running MONITOR, I think your best bet is running "INFO commandstats", "INFO keyspace" and "CLIENT LIST" like Itamar suggested.
Since it happens over a period of 2 minutes, you could run a script which runs "INFO commandstats" every minute and dumps it somewhere, and maybe build a graph from it (a rough sketch is below).
Whenever the OOM happens, you can see which command count increases abnormally or which client created any new connections, if any.
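
Something like this, assuming redis-cli is on the box and using a placeholder output directory:

    mkdir -p /var/tmp/redis-samples
    while true; do
      ts=$(date +%Y%m%dT%H%M%S)
      {
        redis-cli -h 127.0.0.1 -p 6379 info commandstats
        redis-cli -h 127.0.0.1 -p 6379 info keyspace
        redis-cli -h 127.0.0.1 -p 6379 info memory
        redis-cli -h 127.0.0.1 -p 6379 client list
      } > /var/tmp/redis-samples/$ts.txt   # one snapshot per minute
      sleep 60
    done

Then when the next spike hits, diff the snapshot taken just before it against the one taken during it.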