Restart redis nodes with no downtime

jorged...@gmail.com

Feb 10, 2017, 5:31:46 PM

to Redis DB

Hi there. We have a Redis cluster with 3 masters and 3 slaves. Redis is accessed from our web servers and our long-running job servers (among other servers). We want to restart our server nodes, but whenever we have done so, we have suffered latency spikes (around 2-3 minutes each). Each node holds about 50GB of data. We use the phpredis C extension to connect to the cluster, and we were expecting clients to be redirected accordingly.

So, this is what we ran:

1.- On the slave node, run "redis-cli shutdown" (this causes a latency spike). Right after, the slave shows as fail in the "redis-cli cluster nodes" output from another node:

aaaaaac548698c033267c74b6099e4a434971fbb 192.0.0.2:6379 myself,slave aaaaaa4dd75553c3b6735b9c6060d9c511db0519 0 0 44 connected

aaaaaaa2d4a5c0161a0e1ae4dbba021a9d8a9761 192.0.0.3:6379 slave,fail aaaaaa850b22df65413e1524ce481412d87c6b7c 1486099567186 1486099565185 50 disconnected

aaaaaa163054dd43c744cdf7e58dc56d9e49e1e5 192.0.0.4:6379 master - 0 1486099621277 49 connected 10923-16383

aaaaaa850b22df65413e1524ce481412d87c6b7c 192.0.0.5:6379 master - 0 1486099620777 50 connected 0-5460

aaaaaab8d3f60987d09365276e52414511308476 192.0.0.6:6379 slave aaaaaa163054dd43c744cdf7e58dc56d9e49e1e5 0 1486099619777 49 connected

aaaaaa4dd75553c3b6735b9c6060d9c511db0519 192.0.0.7:6379 master - 0 1486099620277 48 connected 5461-10922

2.- Reboot the slave server and wait for the node to rejoin the cluster. This causes a second latency spike once the slave receives the dataset from the master and restarts itself to load the dataset into memory. This is the output of "redis-cli cluster nodes" from the rebooted server (slave):

aaaaaac548698c033267c74b6099e4a434971fbb 192.0.0.2:6379 slave aaaaaa4dd75553c3b6735b9c6060d9c511db0519 0 0 48 connected

aaaaaaa2d4a5c0161a0e1ae4dbba021a9d8a9761 192.0.0.3:6379 myself,slave aaaaaa850b22df65413e1524ce481412d87c6b7c 0 0 47 connected

aaaaaa163054dd43c744cdf7e58dc56d9e49e1e5 192.0.0.4:6379 master - 0 1486100401395 49 connected 10923-16383

aaaaaa850b22df65413e1524ce481412d87c6b7c 192.0.0.5:6379 master - 0 1486100399863 50 connected 0-5460

aaaaaab8d3f60987d09365276e52414511308476 192.0.0.6:6379 slave aaaaaa163054dd43c744cdf7e58dc56d9e49e1e5 0 1486100399352 49 connected

aaaaaa4dd75553c3b6735b9c6060d9c511db0519 192.0.0.7:6379 master - 0 1486100400373 48 connected 5461-10922

3.- Trigger a failover so the slave becomes the master, by running "redis-cli cluster failover" on the slave (this causes another latency spike). The old master becomes a slave, then restarts and loads the dataset into memory.
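The "redis-cli cluster nodes" output pasted in steps 1 and 2 can also be checked by a script instead of by eye. The following is only a minimal sketch, assuming the Redis 3.0 line layout shown above (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, slots). The function names `parse_cluster_nodes` and `unhealthy_nodes` are hypothetical helpers, not part of redis-cli or phpredis, and the sample is two lines copied from step 1.

```python
from typing import Dict, List

# Two lines copied from the step 1 output above.
SAMPLE = """\
aaaaaaa2d4a5c0161a0e1ae4dbba021a9d8a9761 192.0.0.3:6379 slave,fail aaaaaa850b22df65413e1524ce481412d87c6b7c 1486099567186 1486099565185 50 disconnected
aaaaaa850b22df65413e1524ce481412d87c6b7c 192.0.0.5:6379 master - 0 1486099620777 50 connected 0-5460
"""


def parse_cluster_nodes(text: str) -> List[Dict[str, object]]:
    """Parse the text printed by `redis-cli cluster nodes` into dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or truncated lines
        nodes.append({
            "id": fields[0],
            "addr": fields[1],
            "flags": fields[2].split(","),
            "master_id": None if fields[3] == "-" else fields[3],
            "link_state": fields[7],
            "slots": fields[8:],  # empty for slaves
        })
    return nodes


def unhealthy_nodes(nodes: List[Dict[str, object]]) -> List[str]:
    """Return addresses of nodes flagged fail / fail? or whose link is not connected."""
    bad = []
    for node in nodes:
        flags = node["flags"]
        if "fail" in flags or "fail?" in flags or node["link_state"] != "connected":
            bad.append(node["addr"])
    return bad


if __name__ == "__main__":
    print(unhealthy_nodes(parse_cluster_nodes(SAMPLE)))
```

A check like this could gate the next step of a restart procedure, for example refusing to continue while any node is still reported as failed.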

So, it took 3 latency spikes just to restart one server and switch the master/slave roles. Is there a better way to restart Redis nodes with no downtime? Maybe by tweaking some settings? Any help will be appreciated. Thanks!

Jorge

jorged...@gmail.com

Feb 10, 2017, 5:33:38 PM

to Redis DB

BTW, our Redis version is 3.0.2.

ma...@andyh.io

Feb 10, 2017, 10:21:48 PM

to redi...@googlegroups.com

Let me reply following the steps you listed.

1. Do you read from the slave?

2. This is expected and difficult to solve. When the slave rejoins the cluster, the master generates a full RDB and sends it to the slave, which consumes CPU and network resources.

3. This is the same process as in 2.
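One way to tell whether the full sync described in point 2 has finished is to look at the replica's "redis-cli info replication" output before moving on to the next step. The sketch below only parses that text; it assumes the `role`, `master_link_status` and `master_sync_in_progress` fields that a Redis slave reports, and the two sample strings are illustrative, not taken from this thread. `parse_info` and `replica_in_sync` are hypothetical helper names.

```python
from typing import Dict

# Illustrative INFO replication snippets (not from the thread).
SYNCING = """\
# Replication
role:slave
master_link_status:down
master_sync_in_progress:1
"""

SYNCED = """\
# Replication
role:slave
master_link_status:up
master_sync_in_progress:0
"""


def parse_info(text: str) -> Dict[str, str]:
    """Turn `key:value` lines from INFO output into a dict, skipping section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def replica_in_sync(info: Dict[str, str]) -> bool:
    """True when the node is a slave, its master link is up, and no sync is running."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress") == "0"
    )


if __name__ == "__main__":
    for name, text in (("syncing", SYNCING), ("synced", SYNCED)):
        print(name, replica_in_sync(parse_info(text)))
```

Waiting for this condition does not remove the cost of the full sync itself; it only helps avoid triggering a failover while the replica is still loading data.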

With how Redis replication works right now, a full sync can cause a latency spike. I have heard that the problem will be fixed in the coming 4.x. For now, the recommendation is to reduce the size of each Redis instance, so that the RDB file is smaller and takes less time to send to the slave.
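To get a feel for why smaller instances help, here is a rough back-of-the-envelope sketch of the network transfer time alone. It assumes the RDB payload is about as large as the dataset (it may well be smaller) and a hypothetical 1 Gbit/s link between master and slave; neither figure comes from this thread, and the time to generate the RDB and load it on the slave is not included.

```python
def transfer_seconds(dataset_gb: float, bandwidth_gbit_per_s: float) -> float:
    """Estimate seconds to ship `dataset_gb` gigabytes over a link of the given speed.

    Uses 1 GB = 8 Gbit and ignores protocol overhead, RDB generation, and load time.
    """
    if bandwidth_gbit_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return dataset_gb * 8 / bandwidth_gbit_per_s


if __name__ == "__main__":
    link_gbit = 1.0  # hypothetical link speed, an assumption
    for size_gb in (50, 25, 10):
        secs = transfer_seconds(size_gb, link_gbit)
        print(f"{size_gb} GB over {link_gbit} Gbit/s: about {secs:.0f} s")
```

Under these assumptions the transfer time scales linearly with instance size, so splitting a 50GB node into several smaller ones shortens each individual sync, although the total amount of data to move stays the same.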

Andy


jorged...@gmail.com

Feb 13, 2017, 12:48:37 AM

to Redis DB

Thanks for the details. Yes, some clients were reading from the slave. Looking forward to the upcoming release.