Redis master loses connection with slaves

Alexey Mravyan

unread,

Dec 19, 2018, 10:08:39 AM12/19/18

to Redis DB

Hi all!

I have 3 redis nodes (v.4.11, sentinel, 1 master + 2 slaves) and faced a strange problem couple of weeks ago: master suddenly switches to another host because both slaves connections are lost.

I suggested, that the reason is in client-output-buffer-limit for slaves connections, increased it, but nothing changed. I don't think that it's network related issue because other services on this VMs are stable.

Could you please give an advice, what could be the reason of the problem?

Logs, configs and info output are below

Log from master:

40190:M 12 Dec 14:10:23.083 # Connection with slave client id #131352 lost.

40190:M 12 Dec 14:10:29.927 # Connection with slave client id #163853 lost.

40190:S 12 Dec 14:10:34.012 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.

40190:S 12 Dec 14:10:34.012 * SLAVE OF 10.10.10.2:6381 enabled (user request from 'id=128421220 addr=10.10.10.1:24446 fd=558 name=sentinel-d8e35ae9-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=

3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')

40190:S 12 Dec 14:10:34.045 # CONFIG REWRITE executed with success.

40190:S 12 Dec 14:10:34.882 * Connecting to MASTER 10.10.10.2:6381

40190:S 12 Dec 14:10:34.882 * MASTER <-> SLAVE sync started

40190:S 12 Dec 14:10:34.883 * Non blocking connect for SYNC fired the event.

40190:S 12 Dec 14:10:34.884 * Master replied to PING, replication can continue...

40190:S 12 Dec 14:10:34.885 * Trying a partial resynchronization (request 18c067489c9d03637f584dd61d9d609c5e61a309:2298722417497).

40190:S 12 Dec 14:10:34.886 * Full resync from master: 9055605628b944555920df49d918c4335b5f2d0b:2298709630285

40190:S 12 Dec 14:10:34.886 * Discarding previously cached master state.

14154:C 12 Dec 14:10:36.689 * DB saved on disk

14154:C 12 Dec 14:10:36.748 * RDB: 154 MB of memory used by copy-on-write

40190:S 12 Dec 14:10:36.885 * Background saving terminated with success

40190:S 12 Dec 14:10:39.703 * MASTER <-> SLAVE sync: receiving 943754945 bytes from master

40190:S 12 Dec 14:10:44.794 * MASTER <-> SLAVE sync: Flushing old data

40190:S 12 Dec 14:12:25.056 * MASTER <-> SLAVE sync: Loading DB in memory

40190:S 12 Dec 14:12:59.022 * MASTER <-> SLAVE sync: Finished with success

40190:S 12 Dec 14:12:59.022 * 10000 changes in 60 seconds. Saving...

Log from one of the slaves:

10735:M 12 Dec 14:10:23.080 # Setting secondary replication ID to 18c067489c9d03637f584dd61d9d609c5e61a309, valid up to offset: 2298709619422. New replication ID is 9055605628b944555920df49d918c4335b5f2d0b

10735:M 12 Dec 14:10:23.080 # Connection with master lost.

10735:M 12 Dec 14:10:23.080 * Caching the disconnected master state.

10735:M 12 Dec 14:10:23.080 * Discarding previously cached master state.

10735:M 12 Dec 14:10:23.080 * MASTER MODE enabled (user request from 'id=131363 addr=10.10.10.2:42426 fd=19 name= age=781175 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=r cmd=exec')

10735:M 12 Dec 14:10:23.097 # CONFIG REWRITE executed with success.

10735:M 12 Dec 14:10:24.326 * Slave 10.10.10.1:6380 asks for synchronization

10735:M 12 Dec 14:10:24.326 * Partial resynchronization not accepted: Requested offset for second ID was 2298710003403, but I can reply up to 2298709619422

10735:M 12 Dec 14:10:24.326 * Starting BGSAVE for SYNC with target: disk

10735:M 12 Dec 14:10:24.413 * Background saving started by pid 45786

10735:M 12 Dec 14:10:34.885 * Slave 10.10.10.3:6382 asks for synchronization

10735:M 12 Dec 14:10:34.885 * Partial resynchronization not accepted: Requested offset for second ID was 2298722417497, but I can reply up to 2298709619422

10735:M 12 Dec 14:10:34.885 * Waiting for end of BGSAVE for SYNC

45786:C 12 Dec 14:10:39.470 * DB saved on disk

redis.conf

bind 10.10.10.1

protected-mode yes

port 6382

tcp-backlog 511

timeout 0

tcp-keepalive 300

daemonize no

supervised systemd

pidfile "/var/run/redis/redis.pid"

loglevel notice

logfile "/var/log/redis/redis.log"

databases 16

save 900 1

save 300 10

save 60 10000

stop-writes-on-bgsave-error yes

rdbcompression yes

rdbchecksum yes

dbfilename "dump.rdb"

dir "/var/lib/redis"

slave-serve-stale-data yes

slave-read-only yes

repl-diskless-sync no

repl-diskless-sync-delay 5

repl-disable-tcp-nodelay no

slave-priority 100

appendonly no

appendfilename "appendonly.aof"

appendfsync everysec

no-appendfsync-on-rewrite no

auto-aof-rewrite-percentage 100

auto-aof-rewrite-min-size 64mb

aof-load-truncated yes

lua-time-limit 5000

slowlog-log-slower-than 10000

slowlog-max-len 128

latency-monitor-threshold 0

notify-keyspace-events ""

hash-max-ziplist-entries 512

hash-max-ziplist-value 64

list-max-ziplist-size -2

list-compress-depth 0

set-max-intset-entries 512

zset-max-ziplist-entries 128

zset-max-ziplist-value 64

hll-sparse-max-bytes 3000

activerehashing yes

client-output-buffer-limit normal 0 0 0

client-output-buffer-limit slave 1000000kb 1000000kb 0

client-output-buffer-limit pubsub 32mb 8mb 60

hz 10

aof-rewrite-incremental-fsync yes

redis-sentinel.conf

bind 10.10.10.3

port 16382

sentinel myid 58ea906be61fb05e6fa14f95b504b47b51a754bb

sentinel deny-scripts-reconfig yes

sentinel monitor im-redis-cluster 10.10.10.3 6382 2

sentinel down-after-milliseconds im-redis-cluster 5000

dir "/"

maxclients 4064

sentinel failover-timeout im-redis-cluster 10000

sentinel config-epoch im-redis-cluster 68

sentinel leader-epoch im-redis-cluster 68

sentinel known-slave im-redis-cluster 10.10.10.2 6381

sentinel known-slave im-redis-cluster 10.10.10.1 6380

sentinel known-sentinel im-redis-cluster 10.10.10.1 16380 d8e35ae957fcd2bd110082c63fb53c1508d130b1

sentinel known-sentinel im-redis-cluster 10.10.10.2 16381 7da4777423c7c27fe1d053a737e1cdb29d44c064

sentinel current-epoch 68

INFO

# Server

redis_version:4.0.11

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:bf99aef9569f9a49

redis_mode:standalone

os:Linux 3.10.0-514.16.1.el7.x86_64 x86_64

arch_bits:64

multiplexing_api:epoll

atomicvar_api:atomic-builtin

gcc_version:7.3.1

process_id:40190

run_id:1738cb50f6a22e675ef52a5179b16a7b962b97af

tcp_port:6382

uptime_in_seconds:1922929

uptime_in_days:22

hz:10

lru_clock:1725481

executable:/usr/bin/redis-server

config_file:/etc/redis/redis.conf

# Clients

connected_clients:709

client_longest_output_list:5

client_biggest_input_buf:147

blocked_clients:21

# Memory

used_memory:3030200720

used_memory_human:2.82G

used_memory_rss:2448142336

used_memory_rss_human:2.28G

used_memory_peak:4485345048

used_memory_peak_human:4.18G

used_memory_peak_perc:67.56%

used_memory_overhead:146898953

used_memory_startup:786576

used_memory_dataset:2883301767

used_memory_dataset_perc:95.18%

total_system_memory:33560887296

total_system_memory_human:31.26G

used_memory_lua:130048

used_memory_lua_human:127.00K

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

mem_fragmentation_ratio:0.81

mem_allocator:jemalloc-4.0.3

active_defrag_running:0

lazyfree_pending_objects:0

# Persistence

loading:0

rdb_changes_since_last_save:349101

rdb_bgsave_in_progress:0

rdb_last_save_time:1545229295

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:23

rdb_current_bgsave_time_sec:-1

rdb_last_cow_size:159281152

aof_enabled:0

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_last_cow_size:0

# Stats

total_connections_received:131904200

total_commands_processed:11228834066

instantaneous_ops_per_sec:16453

total_net_input_bytes:3478041956927

total_net_output_bytes:5876168155463

instantaneous_input_kbps:7567.01

instantaneous_output_kbps:36588.43

rejected_connections:0

sync_full:7

sync_partial_ok:0

sync_partial_err:7

expired_keys:38492426

expired_stale_perc:8.57

expired_time_cap_reached_count:0

evicted_keys:0

keyspace_hits:5423342169

keyspace_misses:35443446

pubsub_channels:40117

pubsub_patterns:62

latest_fork_usec:97519

migrate_cached_sockets:0

slave_expires_tracked_keys:0

active_defrag_hits:0

active_defrag_misses:0

active_defrag_key_hits:0

active_defrag_key_misses:0

# Replication

role:master

connected_slaves:2

slave0:ip=10.10.10.2,port=6381,state=online,offset=3439821078815,lag=0

slave1:ip=10.10.10.3,port=6380,state=online,offset=3439820626219,lag=0

master_replid:8be873e62c983d5d482c0bdf1e5d1afc312a2e75

master_replid2:9055605628b944555920df49d918c4335b5f2d0b

master_repl_offset:3439824975811

second_repl_offset:3393464791547

repl_backlog_active:1

repl_backlog_size:1048576

repl_backlog_first_byte_offset:3439823927236

repl_backlog_histlen:1048576

# CPU

used_cpu_sys:44282.73

used_cpu_user:95441.03

used_cpu_sys_children:37332.18

used_cpu_user_children:292517.78

# Cluster

cluster_enabled:0

# Keyspace

db0:keys=1002479,expires=261897,avg_ttl=91790875

db1:keys=749050,expires=748989,avg_ttl=3444468

db3:keys=12,expires=12,avg_ttl=643424402

db4:keys=2230,expires=2230,avg_ttl=77709518

db5:keys=2,expires=1,avg_ttl=259246

hva...@gmail.com

unread,

Dec 19, 2018, 1:11:45 PM12/19/18

to Redis DB

The master/slave status of each of your Redis instances is controlled by Sentinel. If you want to find out why the master role was switched to another host, look in the Sentinel logs for the reason.

Alexey Mravyan

unread,

Dec 20, 2018, 3:08:35 AM12/20/18

to Redis DB

Unfortunately I don't see any useful information(the reason of switching) in sentinel logs.

Logs are below:

slave (10.10.10.1)

Dec 12 14:10:22 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:22.893 # +new-epoch 67

Dec 12 14:10:22 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:22.896 # +vote-for-leader 7da4777423c7c27fe1d053a737e1cdb29d44c064 67

Dec 12 14:10:23 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:23.971 # +config-update-from sentinel 7da4777423c7c27fe1d053a737e1cdb29d44c064 10.10.10.2 16381 @ im-redis-cluster 10.10.10.3 6

Dec 12 14:10:23 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:23.971 # +switch-master im-redis-cluster 10.10.10.3 6382 10.10.10.2 6381

Dec 12 14:10:23 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:23.971 * +slave slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:23 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:23.971 * +slave slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:34 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:34.010 * +convert-to-slave slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:48 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:48.869 # +sdown slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:49 im-v-db02.mlg.ru redis-server[337]: 337:X 12 Dec 14:10:49.912 # +sdown slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

slave (10.10.10.2)

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.748 # +sdown master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.811 # +odown master im-redis-cluster 10.10.10.3 6382 #quorum 2/2

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.811 # +new-epoch 67

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.811 # +try-failover master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.838 # +vote-for-leader 7da4777423c7c27fe1d053a737e1cdb29d44c064 67

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.898 # d8e35ae957fcd2bd110082c63fb53c1508d130b1 voted for 7da4777423c7c27fe1d053a737e1cdb29d44c064 67

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.914 # +elected-leader master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.914 # +failover-state-select-slave master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.980 # +selected-slave slave 10.10.10.2:6381 10.10.10.2 6381 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:22 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:22.980 * +failover-state-send-slaveof-noone slave 10.10.10.2:6381 10.10.10.2 6381 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.080 * +failover-state-wait-promotion slave 10.10.10.2:6381 10.10.10.2 6381 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.104 # 58ea906be61fb05e6fa14f95b504b47b51a754bb voted for 7da4777423c7c27fe1d053a737e1cdb29d44c064 67

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.146 # -sdown master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.146 # -odown master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.908 # +promoted-slave slave 10.10.10.2:6381 10.10.10.2 6381 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.908 # +failover-state-reconf-slaves master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:23.972 * +slave-reconf-sent slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:24 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:24.965 * +slave-reconf-inprog slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 # +failover-end-for-timeout master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 # +failover-end master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 * +slave-reconf-sent-be slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 * +slave-reconf-sent-be slave 10.10.10.2:6381 10.10.10.2 6381 @ im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 # +switch-master im-redis-cluster 10.10.10.3 6382 10.10.10.2 6381

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 * +slave slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:33 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:33.934 * +slave slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:48 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:48.403 # +sdown slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:50 im-v-db03.mlg.ru redis-server[8912]: 8912:X 12 Dec 14:10:50.554 # +sdown slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

master (10.10.10.3)

Dec 12 14:10:22 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:22.800 # +sdown master im-redis-cluster 10.10.10.3 6382

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.089 # +new-epoch 67

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.089 # +tilt #tilt mode entered

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.105 # +vote-for-leader 7da4777423c7c27fe1d053a737e1cdb29d44c064 67

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.973 # +config-update-from sentinel 7da4777423c7c27fe1d053a737e1cdb29d44c064 10.10.10.2 16381 @ im-redis-cluster 10.91.128.1

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.973 # +switch-master im-redis-cluster 10.10.10.3 6382 10.10.10.2 6381

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.973 * +slave slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:23 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:23.973 * +slave slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:53 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:53.113 # -tilt #tilt mode exited

Dec 12 14:10:53 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:53.114 # +sdown slave 10.10.10.3:6382 10.10.10.3 6382 @ im-redis-cluster 10.10.10.2 6381

Dec 12 14:10:53 im-v-db04.mlg.ru redis-server[32627]: 32627:X 12 Dec 14:10:53.114 # +sdown slave 10.10.10.1:6380 10.10.10.1 6380 @ im-redis-cluster 10.10.10.2 6381

среда, 19 декабря 2018 г., 21:11:45 UTC+3 пользователь hva...@gmail.com написал:

Tyler Sullens

unread,

Dec 20, 2018, 11:07:25 AM12/20/18

to redi...@googlegroups.com

You mentioned increasing client-output-buffer-limit, what did you change it to? I’m guessing you’re seeing this happen periodically, if that’s correct how often are you experiencing it?

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

hva...@gmail.com

unread,

Dec 20, 2018, 1:07:08 PM12/20/18

to Redis DB

The first two line in the sentinel log you posted from 10.10.10.2 and from 10.10.10.3 say '+sdown', indicating they believe the master is down (not reachable or not responding to their probes). The second line from 10.10.10.2 says '+odown' which indicates it knows that multiple sentinels in the quorum agree that the master is down. I found descriptions of these messages about halfway down this page on using Sentinel: https://redis.io/topics/sentinel

If I'm correctly interpreting your host labels above the log lines, one of the sentinels which said it cannot get a response from the master Redis was on the same machine as the master Redis (10.10.10.2). The sentinel on 10.10.10.1 did not log an indication that it thought the master was down. This combination suggests the Redis server didn't crash or the host become unavailable on the network, but rather the Redis server process was too slow to respond, and the slowness made 2 of the 3 sentinels think it was down.

If you keep slow logs on your Redis servers, you can check those to see if there was a command that monopolized the server for too long. See the warning for the KEYS command (https://redis.io/commands/keys) as an example of such commands, though LUA scripts can also have this effect. The latency troubleshooting page may be helpful in determining whether or not it was from a slow-responding Redis process: https://redis.io/topics/latency

Your performance graphs for these servers can also give indications of whether there were unusual spikes in cpu or ram or disk-i/o or network consumption around the time the sentinels said the master Redis was down/unresponsive.

Alexey Mravyan

unread,

Dec 21, 2018, 1:59:07 AM12/21/18

to Redis DB

At first client-output-buffer-limit was absolutely default (normal 0 0 0 slave 256mb 64mb 60 pubsub 32mb 8mb 60). Then I've increased slave limit to 2Gb, but it changed nothing (config set client-output-buffer-limit "slave 2048000000 2048000000 0"). After the last master switching I've set all limits to 0 (no limits at all, slave 0 0 0 pubsub0 0 0).

I face this issue every 3-6 days for last 2 or 3 weeks and I can't detect any changes in cpu/ram/network consumption before master switches.

четверг, 20 декабря 2018 г., 19:07:25 UTC+3 пользователь Tyler Sullens написал:

Alexey Mravyan

unread,

Dec 21, 2018, 2:03:13 AM12/21/18

to Redis DB

Thanks a lot,will look through the latency topic.

But unfortunately I d'ont think it could be related, because there is no change in cpu/ram/network consumption, number of redis clients and their activity before master switches. I've already faced this problem in the morning (less clients, less load) and during working hours (more load)

четверг, 20 декабря 2018 г., 21:07:08 UTC+3 пользователь hva...@gmail.com написал:

hva...@gmail.com

unread,

Dec 21, 2018, 5:44:34 AM12/21/18

to Redis DB

Have you configured a slow log to find out if the master is receiving commands that will prevent redis from responding to other client commands? There is also the question of floods of new connections. If your clients connect only as long as it takes to send a command and get the reply, then disconnect, your master may be receiving many new connections at the same time. This can tie redis up in accepting the new connections and leave it with little left over to process commands. If your clients keep their connections open to the master then this is not likely the issue.

hva...@gmail.com

unread,

Dec 21, 2018, 12:29:39 PM12/21/18

to Redis DB

I just noticed the "tilt mode" lines from the sentinel logs on the 10.10.10.3 server. Sentinel enters "tilt mode" when there is a jump in timestamps. Your machines likely do not have their time clocks synchronized via NTP. Different time readings on servers can cause trouble in a number of different ways. This might be the cause of the unplanned failover, or a contributing factor.

On Thursday, December 20, 2018 at 11:03:13 PM UTC-8, Alexey Mravyan wrote:

Reply all

Reply to author

Forward