Can Redis avoid a full sync when the master comes back?


张新峰

Aug 22, 2014, 12:32:56 PM
to redi...@googlegroups.com
Hi all,
    We are currently testing Redis to store authcode and accesstoken information for web SSO. Redis provides master-slave replication for HA, which is great, but the solution doesn't seem perfect. Consider the following case (which is a very common one):
1. Suppose we have two nodes, A and B. B is a slave of A.
2. After a few days, A goes down (or is manually taken offline for some maintenance job). We can bring B online as master on the fly using the "slaveof no one" command.
3. After we fix A a few minutes later, we set A as a slave of B.

    In the situation above, steps 1 and 2 are fine. But for step 3, it seems we have to do a full sync? This looks quite silly, as most of the data on B came from A: B could just send those few minutes of incremental data to A, and A really wouldn't have to discard its whole previous data set. Especially considering that we may use up to 64G of memory per instance, CPU, memory, and I/O will all be under very heavy pressure for this operation, which is unnecessary.
    I also noticed that the author of Redis once mentioned how strongly he hates MySQL's replication mechanism, but I hate being called at midnight even more. So is there any solution to avoid the full sync in step 3?
Can't we just provide the offset information so that Redis can start syncing from that point on?
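
For concreteness, here is the sequence above expressed with redis-py (just a sketch; the host names are made up):

    import redis

    # Hypothetical addresses for the two nodes described above.
    node_a = redis.Redis(host="node-a", port=6379)
    node_b = redis.Redis(host="node-b", port=6379)

    # Step 1: B replicates from A.
    node_b.slaveof("node-a", 6379)

    # Step 2: A goes down; promote B to master on the fly.
    node_b.slaveof()  # with no arguments this sends SLAVEOF NO ONE

    # Step 3: A is fixed; reattach it under B. In Redis 2.8 this
    # triggers a *full* resync, which is exactly the problem here.
    node_a.slaveof("node-b", 6379)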

Jan-Erik Rediger

Aug 22, 2014, 1:44:52 PM
to redi...@googlegroups.com
Currently the PSYNC approach is a trade-off: it avoids these mass
reloads when the network breaks, while still keeping master and slave
consistent once they are able to sync again.
This is done by remembering both the runid of the master and the offset
into the replication backlog (if the offset falls too far behind, a full
sync is needed as well).
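
In pseudo-Python, the master's side of that decision looks roughly like
this (a simplified sketch of the Redis 2.8 semantics, not the actual C code):

    def handle_psync(master, slave_runid, slave_offset):
        # Offsets still covered by the in-memory replication backlog.
        backlog_start = master.backlog_off
        backlog_end = master.backlog_off + master.backlog_len
        if (slave_runid == master.runid
                and backlog_start <= slave_offset <= backlog_end):
            # The slave knew this master and its offset is still
            # buffered: stream only the missing bytes.
            return "+CONTINUE"
        # Unknown runid (e.g. "?" from a fresh slave) or the offset
        # fell out of the backlog: full resync with an RDB transfer.
        return "+FULLRESYNC %s %d" % (master.runid, master.repl_offset)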

To use the same mechanism when either the master or the slave restarts,
we would need to either completely ignore the runid (which is different
on each start) or somehow cache the runid in the dump/AOF.
This was discussed in an issue[1] and, I think, somewhere on this
mailing list too. But as of yet there's no solution, sorry.

[1] https://github.com/antirez/redis/issues/1702
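
The runid-caching idea could look roughly like this: persist the
replication state next to the dump on shutdown, and offer it in PSYNC
after a restart. This is purely hypothetical (no such file exists in
Redis today):

    import json

    STATE_FILE = "repl-state.json"  # hypothetical sidecar file

    def save_repl_state(runid, offset):
        # Would be written on clean shutdown, alongside the RDB/AOF.
        with open(STATE_FILE, "w") as f:
            json.dump({"runid": runid, "offset": offset}, f)

    def load_repl_state():
        # After a restart, the slave could offer this in PSYNC instead
        # of the "? -1" that forces a full resync today.
        try:
            with open(STATE_FILE) as f:
                state = json.load(f)
            return state["runid"], state["offset"]
        except (IOError, ValueError):
            return "?", -1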

张新峰

Aug 22, 2014, 4:42:12 PM
to redi...@googlegroups.com, jan...@fnordig.de
Hi Jan-Erik,
    Thanks for answering. So there's no solution to this problem? Not even any workarounds or hacks? I took a look at some proxies like twemproxy and haproxy but couldn't find anything that solves this issue. In fact, I'm a little surprised, because I think this is a common and big problem, yet it seems only a few people have realized it?
    Yes, Redis has replication, and yes, you can switch to the slave when the master is down. But if you can't cheaply put the master back afterwards, then what's the point of having replication?

Jan-Erik Rediger

Aug 22, 2014, 6:55:53 PM
to redi...@googlegroups.com
First, I understand your frustration, but please consider using less
harsh words (a bunch of exclamation marks isn't the best way to make
your point clear).
It makes it much easier for people to be willing to discuss this
with you and work on a solution.

I'll try to answer your questions to the best of my knowledge.

1. Is there a solution to your problem?

Not that I know of. As stated before, the current system is not able to
do that.

2. Why have replication at all then?

The point of replication is
1) to have one or more slave instances that can be used to distribute the workload (reads can go to the slaves), and
2) to be able to quickly promote a new master instance in case the first
master becomes unreachable (crashes, network issues, whatever).
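
For example, with redis-py (hypothetical addresses), reads and writes
can be split like this:

    import redis

    master = redis.Redis(host="node-a", port=6379)   # writes go here
    replica = redis.Redis(host="node-b", port=6379)  # reads can go here

    master.set("session:42", "token-abc")
    # Note: replication is asynchronous, so a read from the slave
    # may briefly return stale data.
    value = replica.get("session:42")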

3. The actual "even worse" situation

Yes, if an RDB is present, Redis will first try to load it.
Simply discarding all that data might not be a good idea in all cases.
Of course in your case, where _you_ know it would be fine,
discarding would speed things up, but Redis can't know that.
(A workaround would be to just move the RDB out of the way yourself.)
And what if the slave can't reach the master? The data would already
have been deleted. Not a good thing either.
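
That workaround is just a file move before restarting the old master
(the paths here are assumptions):

    import os, time

    # Move the RDB aside so the restarted node skips loading data
    # it would throw away during the full resync anyway.
    rdb = "/var/lib/redis/dump.rdb"
    if os.path.exists(rdb):
        os.rename(rdb, rdb + ".bak." + time.strftime("%Y%m%d%H%M%S"))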

"it transfers this big 64G memory dump to Node A"
Yip, true as well. How should Redis know about consistent data? It
can't know what changed on the master already.
Slaves are exact copies of the master. Re-sync shouldn't break this.
That's why it resyncs the whole data set.

PSYNC helps a little, but it needs to be improved to be usable after an
instance is restarted.



I hope this makes things clearer for you.

I'd be very happy if we can get replication to work better
for scenarios like yours.

Kind regards,
Jan-Erik

P.S.:
About the memory fragmentation situation: that's due to how the
underlying malloc implementation works. The allocator will re-use
freed pages of memory, but it can't easily hand all allocated memory
back to the kernel.
The only way to really return ALL the allocated pages to the OS
is to exit the application. You can avoid this if you plan for your
peak memory usage beforehand. You should also look at the `maxmemory`
setting to keep Redis from eating all your RAM.
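
You can check the fragmentation and set the cap at runtime, e.g. with
redis-py (the address and limit are made up for this example):

    import redis

    r = redis.Redis(host="node-a", port=6379)

    # mem_fragmentation_ratio = RSS / used memory; values well above
    # 1.0 mean the allocator holds pages it can't return to the kernel.
    print(r.info("memory")["mem_fragmentation_ratio"])

    # Cap Redis below the machine's RAM so it can't eat everything.
    r.config_set("maxmemory", 48 * 1024 ** 3)  # 48 GB, in bytes
    r.config_set("maxmemory-policy", "volatile-lru")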

On Fri, Aug 22, 2014 at 11:26:25AM -0700, 张新峰 wrote:
> After reviewing the Redis source code, it looks like things are even worse.
> In step 3, Node A will first load the 64G RDB file from disk, then do a full
> sync with the master. And the first step of syncing with the master is to
> clear the data it just loaded, even though most of that data is still
> useful(!!!!). Node B will then dump its memory (why on earth does the Redis
> community think a memory dump is a good idea and do it all the time? We are
> very serious users and one minute of downtime may mean millions of dollars,
> and you dump memory?? Are you kidding me??) And then Node B will transfer
> this big 64G memory dump to Node A. (Are you kidding me again?? Even with a
> gigabyte-per-second network, this may take about a minute to transfer, and
> it may consume all the network bandwidth and bring the whole site down!!!
> And the foolish thing is that we had just loaded most of that data!!!!)
>
> Then why on earth does Redis support replication??? To make things worse
> when something goes wrong? Especially considering I saw somebody complaining
> that Redis has a memory fragmentation issue, and it looks like the
> suggestion from the Redis community is just "restart the service". (This
> must be kidding me for the third time.)
>
>
> Forgive me, because we are just experimenting with Redis and I may have
> misunderstood something, so some of my words may look very unpleasant. But
> we really hope there's a solution to this very, very common problem, so
> that we can use Redis happily ever after.

张新峰

Aug 22, 2014, 8:15:09 PM
to redi...@googlegroups.com, jan...@fnordig.de
Hi Jan-Erik,

    Thanks for the patient answer. I may have seemed a little emotional, but deep down I really like Redis for its high efficiency.
    In fact, facing these facts, we are even considering changing the Redis source code to make it support a real "partial sync". Do you think it's going to be a very big effort? The basic idea is like this (a rough sketch in code follows the list):
    1. The master will write another copy of the AOF, with a sequence number before each command. (Let's name it "master_replication.log".)
    2. Each time the master sends a command to the slave, it will also send the log file number and the command sequence number.
    3. The slave will write this data into another file called "slave_replication.log".

    4. When the master goes down, the slave will take over the master role and the sequence numbers continue.
    5. When the old master is brought back up, we will send it a "forcesync <slaveIP> <sequence number>" command. If the new master finds that sequence number in its log file, it will send commands from that point on. Otherwise, we fall back to the previous psync mechanism.

    This is just a rough idea and there may be a lot of details to handle, but since Redis is such a great open source project, I think people should add this feature.
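
A minimal sketch of that log format (the file name and framing are just
the proposal above, nothing Redis actually does):

    class ReplicationLog(object):
        # Every replicated command is prefixed with an increasing
        # sequence number, so a returning master can ask to resume
        # from the last sequence number it saw ("forcesync").
        def __init__(self, path="master_replication.log"):
            self.f = open(path, "ab+")
            self.seq = 0

        def append(self, command):
            self.seq += 1
            self.f.write(("%d " % self.seq).encode() + command + b"\n")
            self.f.flush()
            return self.seq

        def replay_from(self, seq):
            # Resume point found: yield only the commands after `seq`.
            # If `seq` is older than the log, fall back to a full sync.
            self.f.seek(0)
            for line in self.f:
                n, _, cmd = line.partition(b" ")
                if int(n) > seq:
                    yield cmd.rstrip(b"\n")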

yin qiwen

Sep 2, 2014, 8:14:08 AM
to redi...@googlegroups.com, jan...@fnordig.de
In Ardb (a Redis clone built on rocksdb/leveldb/lmdb, https://github.com/yinqiwen/ardb), we implemented such a feature with the design described below. The design is very much like Redis 2.8's design for partial resync and would be easy to port to Redis, but I'm not sure it would work well there, since there is a lot of persistence work that might block Redis.
Anyway, the design is simple if you already know how Redis does partial resync (a sketch of the sync-state buffer follows the list).
  • Master and slave both create a buffer via 'mmap'. It has two parts: the first part is a header holding the sync state, a fixed-size struct containing the runid, sync offset, etc.; the second part is the replication backlog content. The buffer is flushed to disk every 5 seconds.
  • When a slave connects, the master decides whether the slave should do a full resync or a partial resync, just like Redis. (There is an extra checksum in the 'psync'-like command that lets the master verify that the offset is valid, which Redis doesn't have in 'PSYNC'.)
  • After the slave has synced with the master, it saves the master's runid, sync offset, and replication log checksum into its sync state. The slave replaces its own runid with the master's runid.
  • When the master goes down, the slave has the same runid as the previous master, as well as the same replication backlog state.
  • When the old master is restarted and you set it as a slave of the current master, the master treats it like a disconnected slave; most of the time it will do a partial resync instead of a full resync.
  • There is a checksum over the replication log, which avoids data consistency problems during replication.
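
A rough Python sketch of that mmap'd sync-state buffer (the layout and
sizes are illustrative, not Ardb's actual on-disk format):

    import mmap, os, struct

    # Fixed-size header: 40-byte runid, 8-byte offset, 8-byte checksum.
    HEADER_FMT = "40sQQ"
    HEADER_SIZE = struct.calcsize(HEADER_FMT)
    BACKLOG_SIZE = 1 << 20  # 1 MB backlog, for illustration

    class SyncStateBuffer(object):
        def __init__(self, path):
            size = HEADER_SIZE + BACKLOG_SIZE
            fd = os.open(path, os.O_RDWR | os.O_CREAT)
            os.ftruncate(fd, size)
            self.mm = mmap.mmap(fd, size)
            os.close(fd)

        def save_state(self, runid, offset, checksum):
            # The slave stores the *master's* runid, so after failover
            # it looks like the old master to reconnecting nodes.
            self.mm[:HEADER_SIZE] = struct.pack(
                HEADER_FMT, runid.encode(), offset, checksum)

        def load_state(self):
            runid, offset, checksum = struct.unpack(
                HEADER_FMT, self.mm[:HEADER_SIZE])
            return runid.rstrip(b"\0").decode(), offset, checksum

        def flush(self):
            # The design flushes this region to disk every 5 seconds.
            self.mm.flush()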


On Saturday, August 23, 2014 at 8:15:09 AM UTC+8, 张新峰 wrote: