Redis master -> slave replication.

Sairam M P

Feb 5, 2016, 8:29:34 AM

to Redis DB

Hi,

We are using redis-2.8.17 in a master -> slave setup. Currently we use disk-based replication. During a full sync, disk-based replication writes the data to the master's disk and then transfers it to the slave's disk. To reduce the time taken by a full sync, we tried diskless replication in the latest Redis, but it's diskless only on the master side; on the slave the data is still written to disk before being loaded into memory.

Is there an option to stream data from the master's memory to the slave's memory, thereby eliminating the time taken to write to the slave's disk before loading into memory?

If there is no option to stream data directly into the slave's memory, is there a way I can speed up loading the data from disk into memory? In our case it's 11GB of data on disk, which takes ~6 mins to load.

-Sairam.

The Real Bill

Feb 6, 2016, 12:02:21 AM

to Redis DB

"We are using redis-2.8.17, as master -> slave setup. Currently we use disk based replication. During full sync, disk based replication writes the data to master disk and then transfers it to slave disk. To reduce the time taken to configure full sync, we tried diskless replication in the latest redis, but its diskless only with respect to master and in slave the data is written to the disk before loading it to memory. "

There is no "disk based replication" in Redis. If you transfer over an RDB you have to restart the slave, which will then discard the RDB you copied over. All replication for Redis is from Redis directly to Redis. Yes, the slave saves the RDB to disk; no, you can't disable that. It does it to prevent a large delay at the end when it needs to load it in. If it were to load the RDB from memory you'd wind up needing about 2X the memory: one for the in-memory RDB it is loading from and one for the end-point data. For smaller data sets that might be fine, but in this case you're talking 22GB or more of memory needed to load your 11GB.

That said, I am curious as to how much quicker it would be with a memory-only rdbLoad call. I doubt it is worth the price of the memory at the larger end of the data set size scale.

"Is there an option to stream data from master's memory to slave's memory, there by eliminating the time taken to write to slave's disk before loading to memory.

If there is no option to stream data directly to slave's memory, is there a way i can speed up the process of loading the data from disk to memory. For our case its 11GB of data in disk, which takes ~6mins to load. "

How often are you doing this? You should be firing up the system once and letting Redis keep it in sync natively and naturally. If you're copying over the RDB file you're just spinning your wheels. If you are restarting your slave often you will not be happy. If you're experiencing disconnects between the master and slave causing a full resync, you need to figure out why and fix that. Even on naked GbE it will take a couple of minutes to transfer 11GB+ of data.
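As a rough check on that transfer figure, here is a back-of-the-envelope sketch in Python. The function name `transfer_seconds` and the `efficiency` factor are made up for illustration; it only accounts for raw link bandwidth, not the time the master spends producing the RDB or the slave spends loading it.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Lower bound on the time to push size_gb gigabytes over a link_gbps link.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and link sharing; 1.0 means ideal line rate.
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    size_gbit = size_gb * 8  # gigabytes -> gigabits
    return size_gbit / (link_gbps * efficiency)


if __name__ == "__main__":
    # 11 GB over gigabit Ethernet at ideal line rate
    print(f"{transfer_seconds(11, 1.0):.0f} s at line rate")
    # the same transfer assuming only 80% of the link is usable
    print(f"{transfer_seconds(11, 1.0, 0.8):.0f} s at 80% efficiency")
```

So the wire time alone is on the order of one and a half to two minutes before any disk write or load happens.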

So:
1. Set up M/S replication
2. Let it complete
3. Keep it running
4. Let the master keep the slave in sync as things change
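To tell when step 2 has finished, one option is to look at the replication section of `INFO` on the slave. The Python sketch below only parses the text that `INFO replication` returns; the helper names `parse_info` and `slave_in_sync` and the sample text are made up for illustration, and how you fetch the text (redis-cli, a client library) is up to you.

```python
def parse_info(text: str) -> dict:
    """Turn 'key:value' lines from INFO output into a dict, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def slave_in_sync(info_text: str) -> bool:
    """True if the instance is a slave, its master link is up, and no full sync is running."""
    f = parse_info(info_text)
    return (
        f.get("role") == "slave"
        and f.get("master_link_status") == "up"
        and f.get("master_sync_in_progress") == "0"
    )


if __name__ == "__main__":
    # Hypothetical sample, trimmed to the fields used above.
    sample = (
        "# Replication\r\n"
        "role:slave\r\n"
        "master_link_status:up\r\n"
        "master_sync_in_progress:0\r\n"
    )
    print(slave_in_sync(sample))
```

Polling this every few seconds during the initial sync is usually enough; once it reports in sync, steps 3 and 4 are just leaving it alone.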

If you don't care about actual persistence on the slave, set the `dir` directive to a memory filesystem to speed it up. You'd need the extra memory space for the RDB of course. Alternatively, split the data up into smaller instances of Redis with either client-side sharding or Redis Cluster. Then each instance will be smaller, thus reducing the problem. For example, splitting it up among 6 instances would be under 2GB each, depending on how the data is distributed.
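For the client-side sharding route, a minimal sketch of the key-to-instance mapping in Python might look like the following. The function name `shard_for`, the use of CRC32, and the `user:N` key pattern are assumptions for illustration only; this is not the hash-slot scheme Redis Cluster uses internally, and a plain modulo like this means most keys move if you later change the number of instances.

```python
import zlib


def shard_for(key: str, num_shards: int = 6) -> int:
    """Map a key to one of num_shards Redis instances (0-based index).

    Uses CRC32 as a cheap, stable hash; any deterministic hash would do.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    # Count how a hypothetical key space spreads over 6 instances.
    counts = [0] * 6
    for i in range(6000):
        counts[shard_for(f"user:{i}")] += 1
    print(counts)
```

Each client would then keep one connection per instance and pick the connection by `shard_for(key)`, so every instance holds, and has to resync, only its share of the data.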

Cheers,
Bill

Sairam M P

Feb 8, 2016, 9:26:50 AM

to Redis DB

Thank you, Bill, for the response.

By "disk based" I meant the default Redis replication. Sorry for not being clear on this.

Our requirement is not due to frequent master->slave link breaks or slave restarts. We have replicated our Redis across DCs; in case of any issue in the primary DC, we will switch the service to the secondary DC. After switching the service to the secondary DC, we need to configure replication from the secondary DC back to the primary DC as soon as possible. For this requirement we are exploring options to load the data from the secondary DC into the primary DC as fast as possible.