Redis suddenly using much more ram (40 to 60%)

2,098 views
Skip to first unread message

Léo FERLIN SUTTON

unread,
Mar 14, 2018, 1:12:34 PM3/14/18
to Redis DB
Hello !

We have had a problem where our redis instance gets OOM-Killed regularly and we would like to fix the underlying problem.

We have a redis instance and a slave instance. Only the master instance is used for queries.

We have noticed two strange behaviors : 
  1. The ram used by redis increases a lot (40 to 60% increase) before it gets OOM-Killed. Without any big changes in the dbsize.
  2. The slave seems to disconnect and the slave's instance becomes unreachable.
We don't yet have maxmemory configured on the redis instace, and will add it. However we would still like to understand why the memory used by redis increases so much.

I was hoping someone here could point us in the right direction to better understand what is going on ?

Hardware information:

Google cloud instance :  
n1-highmem-8 (8 vCPUs, 52 GB memory)
Intel Ivy Bridge

Software information: 

Redis 3.2.6 is running on a debian 9.3.

Bellow are the keys we changed compared to the default redis configuration found on `http://download.redis.io/redis-stable/redis.conf`

bind 0.0.0.0
protected-mode no
timeout
300
daemonize yes
supervised systemd
pidfile
/var/run/redis/redis-server.pid
logfile
/var/log/redis/redis-server.log
syslog
-enabled yes
syslog
-ident redis
syslog
-facility user
save
3600 1
dir
/var/lib/redis
requirepass placeholder
client
-output-buffer-limit slave 512mb 512mb 0
hz
100



Redis usage : 

Unless a big oversight only two request are made to the redis instance : 

"AUTH" "placeholder"
"SET" "guid" "1" "NX" "EX" "604800"


Metrology and Logs: 

Our redis instace uses around 28Gb of ram for ~250 millions keys. The expire on a 7 days basis so the number varies a bit, going up and down.

Log file on the master : 
2018-03-14T13:43:30.009333+00:00 user notice redis[32161]: 1 changes in 3600 seconds. Saving...
2018-03-14T13:43:30.927514+00:00 user notice redis[32161]: Background saving started by pid 16157
2018-03-14T13:50:31.684574+00:00 user notice redis[16157]: DB saved on disk
2018-03-14T13:50:32.334737+00:00 user notice redis[16157]: RDB: 2280 MB of memory used by copy-on-write
2018-03-14T13:50:33.535211+00:00 user notice redis[32161]: Background saving terminated with success
2018-03-14T14:05:19.732484+00:00 user warning redis[32161]: Disconnecting timedout slave: 10.16.77.3:<unknown-slave-port>
2018-03-14T14:05:19.733100+00:00 user warning redis[32161]: Connection with slave 10.16.77.3:<unknown-slave-port> lost
2018-03-14T14:50:34.024001+00:00 user notice redis[32161]: 1 changes in 3600 seconds. Saving...
2018-03-14T14:50:35.469305+00:00 user notice redis[32161]: Background saving started by pid 23900
2018-03-14T14:50:52.408604+00:00 user warning redis[32161]: Background saving terminated by signal 9
2018-03-14T14:50:52.422279+00:00 user notice redis[32161]: 1 changes in 3600 seconds. Saving...
2018-03-14T14:50:53.756582+00:00 user notice redis[32161]: Background saving started by pid 23957
2018-03-14T14:51:14.711354+00:00 user warning redis-server: systemd supervision requested, but NOTIFY_SOCKET not found
2018-03-14T14:51:14.718156+00:00 user notice redis[24001]: Redis 3.2.6 (00000000/0) 64 bit, standalone mode, port 6379, pid 24001 ready to start.
2018-03-14T14:51:14.718493+00:00 user warning redis[24001]: WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2018-03-14T14:51:14.718629+00:00 user warning redis[24001]: Server started, Redis version 3.2.6


Log file on Slave : 
2018-03-14T12:57:03.051199+00:00 pr-li-redi03-0001 user notice redis[812]: 1 changes in 3600 seconds. Saving...
2018-03-14T12:57:04.334096+00:00 pr-li-redi03-0001 user notice redis[812]: Background saving started by pid 16703
2018-03-14T13:03:26.783467+00:00 pr-li-redi03-0001 user notice redis[16703]: DB saved on disk
2018-03-14T13:03:27.608616+00:00 pr-li-redi03-0001 user notice redis[16703]: RDB: 2132 MB of memory used by copy-on-write
2018-03-14T13:03:28.944960+00:00 pr-li-redi03-0001 user notice redis[812]: Background saving terminated with success
2018-03-14T14:03:29.041233+00:00 pr-li-redi03-0001 user notice redis[812]: 1 changes in 3600 seconds. Saving...
2018-03-14T14:03:30.514384+00:00 pr-li-redi03-0001 user notice redis[812]: Background saving started by pid 25983



result of an INFO command a few minutes before the redis was OOM-Killed : 



127.0.0.1:6379> info
# Server
redis_version
:3.2.6
redis_git_sha1
:00000000
redis_git_dirty
:0
redis_build_id
:826601c992442478
redis_mode
:standalone
os
:Linux 4.9.0-5-amd64 x86_64
arch_bits
:64
multiplexing_api
:epoll
gcc_version
:6.2.1
process_id
:32161
run_id
:886cd587c32e9de75cb05cc9c57cbed576fe27bc
tcp_port
:6379
uptime_in_seconds
:80374
uptime_in_days
:0
hz
:100
lru_clock
:11089526
executable
:/usr/bin/redis-server
config_file
:/etc/redis/redis.conf


# Clients
connected_clients
:1
client_longest_output_list
:0
client_biggest_input_buf
:0
blocked_clients
:0


# Memory
used_memory
:45283130024
used_memory_human
:42.17G
used_memory_rss
:45191077888
used_memory_rss_human
:42.09G
used_memory_peak
:47366804176
used_memory_peak_human
:44.11G
total_system_memory
:54881566720
total_system_memory_human
:51.11G
used_memory_lua
:37888
used_memory_lua_human
:37.00K
maxmemory
:0
maxmemory_human
:0B
maxmemory_policy
:noeviction
mem_fragmentation_ratio
:1.00
mem_allocator
:jemalloc-3.6.0


# Persistence
loading
:0
rdb_changes_since_last_save
:2589625
rdb_bgsave_in_progress
:0
rdb_last_save_time
:1521035433
rdb_last_bgsave_status
:ok
rdb_last_bgsave_time_sec
:423
rdb_current_bgsave_time_sec
:-1
aof_enabled
:0
aof_rewrite_in_progress
:0
aof_rewrite_scheduled
:0
aof_last_rewrite_time_sec
:-1
aof_current_rewrite_time_sec
:-1
aof_last_bgrewrite_status
:ok
aof_last_write_status
:ok


# Stats
total_connections_received
:34496229
total_commands_processed
:68757698
instantaneous_ops_per_sec
:1104
total_net_input_bytes
:4795457932
total_net_output_bytes
:108114199998
instantaneous_input_kbps
:74.95
instantaneous_output_kbps
:2959.37
rejected_connections
:0
sync_full
:1
sync_partial_ok
:0
sync_partial_err
:0
expired_keys
:7225216
evicted_keys
:0
keyspace_hits
:0
keyspace_misses
:0
pubsub_channels
:0
pubsub_patterns
:0
latest_fork_usec
:917511
migrate_cached_sockets
:0


# Replication
role
:master
connected_slaves
:0
master_repl_offset
:2931347471
repl_backlog_active
:1
repl_backlog_size
:1048576
repl_backlog_first_byte_offset
:2930298896
repl_backlog_histlen
:1048576


# CPU
used_cpu_sys
:4071.43
used_cpu_user
:4844.53
used_cpu_sys_children
:420.82
used_cpu_user_children
:7169.69


# Cluster
cluster_enabled
:0


# Keyspace
db0
:keys=270353286,expires=270353286,avg_ttl=297391153


I also attached a screenshot of our redis used memory graph around the time of the incident. Unfortunately I didn't have the idea to do a redis-cli monitor.

Thank you for ready this post !

Regards,

Léo
Screenshot from 2018-03-14 17-13-00.png

Josiah Carlson

unread,
Apr 19, 2018, 1:43:43 PM4/19/18
to redi...@googlegroups.com
Hopping in on this way late, but I've seen this before:

1. your slave gets behind / disconnects during heavy load (or a network card, cable, etc., is crap and you lose the connection)
2. and because you are getting more than 1 megabyte of write backlog, your reconnecting slave == fork (I'd at least increase that to 512 megs)
3. because your master is still getting heavy write load, memory quickly increases until you've basically got 2 copies of your "active" data in memory
4. as your second process is blasting data to disk, it eventually finishes, but perhaps not before one or both processes get OOM-killed due to your machine running out of memory

My suggestions:
* increase `repl_backlog_size` to something reasonable, say 512 megs or even 1-2 gigs (this is how much memory you are willing to dedicate to recently reconnected slaves)
* maybe increase `client-output-buffer-limit slave` to 1 gig, add warnings and monitors when these things happen so you can watch the disconnect, etc. to find the proper sizes.

That *should* mostly solve your fork on disconnect problems, then it's just an issue to make sure your disconnect doesn't happen, and knowing the volume of your writes in the time it takes for your machine to save to disk, then for the slave to get a full resync. Better to try to prevent the full resync by ensuring it doesn't disconnect, and having a big enough buffer to be able to do a PSYNC.

 - Josiah


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Salvatore Sanfilippo

unread,
Apr 20, 2018, 4:19:39 AM4/20/18
to redi...@googlegroups.com
Thanks for the reply Josiah, I want just to add an info that may be useful:

> 1. your slave gets behind / disconnects during heavy load (or a network card, cable, etc., is crap and you lose the connection)

About that, recently Redis Labs devs fixed a bug that caused this
issue in different scenarios.
Basically the output buffers handling code had a cap to transfer at
max a given number of bytes per client, to avoid losing too time in
the callback to serve a single client that requested some huge amount
of data (see KEYS *).
However when I implemented it I completely forgot that the same code
path, being used by Slaves as well, would limit the output bandwidth
that a slave could peak. This made slaves lagging behind in different
very high load scenarios. Now this is fixed.

Commit is 548e4fe088e003b334b0914fa7621606b821df79 and should be
implemented starting with 4.0.7, so the latest has it.

Salvatore
>> email to redis-db+u...@googlegroups.com.
>> To post to this group, send email to redi...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/redis-db.
>> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at https://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/d/optout.



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Léo FERLIN SUTTON

unread,
May 18, 2018, 9:36:42 AM5/18/18
to Redis DB
Thank you both !

I'm coming back late to this issue. I will try out your suggestions.

Regards,

Léo
Reply all
Reply to author
Forward
0 new messages