Reliably detecting when slave sync is done

Pawel

Sep 18, 2012, 4:49:25 PM
to redi...@googlegroups.com
Hi.

What would be the reliable way to tell if a server is synchronized fully with the new master, after issuing SLAVEOF?
I assume I need to pay attention to the "master*" INFO keys.

I see these two sets of values: the first right after the SLAVEOF is issued, and the second once the sync seems to be complete:

redis 127.0.0.1:6379> slaveof 172.18.10.233 6379
redis 127.0.0.1:6379> info
master_host:172.18.10.233
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
master_link_down_since_seconds:1195570

redis 127.0.0.1:6379> info
master_host:172.18.10.233
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0

Is it OK to wait until 'master_link_status' becomes 'up', 'master_sync_in_progress' becomes '0', and 'master_last_io_seconds_ago' becomes >= 0?
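As a rough illustration of the check being asked about, the three conditions could be evaluated against raw INFO text along these lines. This is only a sketch: `parse_info` and `initial_sync_done?` are made-up helper names, and the input is assumed to be the plain "key:value" text that INFO returns.

```ruby
# Parse raw INFO text ("key:value" per line) into a Hash of strings.
# Blank lines and lines without a colon (such as section headers) are skipped.
def parse_info(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.strip.split(":", 2)
    fields[key] = value unless key.nil? || value.nil?
  end
end

# True when all three conditions from the question hold.
# A missing master_last_io_seconds_ago is treated as -1 (no I/O seen yet).
def initial_sync_done?(info)
  info["master_link_status"] == "up" &&
    info["master_sync_in_progress"] == "0" &&
    info.fetch("master_last_io_seconds_ago", "-1").to_i >= 0
end

sample = parse_info(
  "master_link_status:up\nmaster_last_io_seconds_ago:7\nmaster_sync_in_progress:0\n"
)
puts initial_sync_done?(sample) # prints true
```

With the first INFO output above (link down, last I/O of -1) the same check returns false.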

Also, is there any reason to upgrade from 2.4.16 (which I'm on right now) to 2.4.17 if I'm not using Sentinel, ziplists, or double values?

Thank you!
  -- Pawel.

Nicholas Knight

Sep 18, 2012, 5:30:39 PM
to redi...@googlegroups.com

On Sep 18, 2012, at 1:49 PM, Pawel wrote:

> Hi.
>
> What would be the reliable way to tell if a server is synchronized fully with the new master, after issuing SLAVEOF?
> I assume I need to pay attention to the "master*" INFO keys.
>
> I see these two values for right after the SLAVEOF is issued, and the sync seems to be complete:

[snip info output]

> Is it OK to wait until 'master_link_status' becomes 'up', and 'master_sync_in_progress' becomes '0' and 'master_last_io_seconds' becomes >= 0?


If you have no reason to believe something has gone haywire, this ought to tell you that the initial sync process has completed, yes.

What it won't do is tell you on an ongoing basis that the slave is up-to-date. Depending on what you're actually wanting, that may or may not be a problem.

I asked a related question a bit over a year ago, but specifically with regard to ongoing monitoring, that you might be interested in, though it didn't get a lot of attention: https://groups.google.com/d/topic/redis-db/c_x7nu1Pn2k/discussion

-NK

Felix Gallo

Sep 18, 2012, 5:39:22 PM
to redi...@googlegroups.com
Since slave sync is serial, if you SET a timestamp/checkpoint on the master periodically, then you can guarantee that the slave is at least as up to date as the timestamp/checkpoint it has locally, which is often a pragmatically acceptable substitute.  

Note that if you do this and time functions are involved, you will want to make absolutely sure that all of your servers are running ntpd synced to the same clock.

example with simple checkpointing:

r.set "s:data_version", 6
<do gigantic insert of data>
r.set "s:data_version", 7

now if the slave has s:data_version of 6, then the insert hasn't happened yet as far as it can tell, and you can proceed to triage why that might be.
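A slightly fuller sketch of the checkpoint idea, for illustration only. `FakeRedis` is an in-memory stand-in so the snippet runs without a server, and `with_checkpoint` and `checkpoint_reached?` are hypothetical helper names; with a real client the same `set`/`get` calls would go to the master and the slave respectively.

```ruby
# In-memory stand-in for a Redis connection, so the sketch runs on its own.
class FakeRedis
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value.to_s
    "OK"
  end

  def get(key)
    @data[key]
  end
end

CHECKPOINT_KEY = "s:data_version"

# Write the current version, run the bulk write, then bump the version.
# Returns the version a slave must show once the bulk write has replicated.
def with_checkpoint(master, version)
  master.set(CHECKPOINT_KEY, version)
  yield master
  master.set(CHECKPOINT_KEY, version + 1)
  version + 1
end

# Replication is applied in order, so a slave showing the new version
# has already applied everything written before it.
def checkpoint_reached?(slave, expected)
  slave.get(CHECKPOINT_KEY).to_i >= expected
end

master = FakeRedis.new
expected = with_checkpoint(master, 6) { |r| r.set("bulk:1", "payload") }
puts expected # prints 7
```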

F.


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.


Pawel

Sep 18, 2012, 7:21:14 PM
to redi...@googlegroups.com

On Tuesday, September 18, 2012 2:39:46 PM UTC-7, Felix wrote:
> Since slave sync is serial, if you SET a timestamp/checkpoint on the master periodically, then you can guarantee that the slave is at least as up to date as the timestamp/checkpoint it has locally, which is often a pragmatically acceptable substitute.
>
> Note that if you do this and time functions are involved, you will want to make absolutely sure that all of your servers are running ntpd synced to the same clock.
>
> example with simple checkpointing:
>
> r.set "s:data_version", 6
> <do gigantic insert of data>
> r.set "s:data_version", 7
>
> now if the slave has s:data_version of 6, then the insert hasn't happened yet as far as it can tell, and you can proceed to triage why that might be.

Looks neat. My interest, however, is really to just know whether the initial replication is completed.
 

> F.

On Tue, Sep 18, 2012 at 2:30 PM, Nicholas Knight <nkn...@runawaynet.com> wrote:

> On Sep 18, 2012, at 1:49 PM, Pawel wrote:
>
> > Hi.
> >
> > What would be the reliable way to tell if a server is synchronized fully with the new master, after issuing SLAVEOF?
> > I assume I need to pay attention to the "master*" INFO keys.
> >
> > I see these two values for right after the SLAVEOF is issued, and the sync seems to be complete:
>
> [snip info output]
>
> > Is it OK to wait until 'master_link_status' becomes 'up', and 'master_sync_in_progress' becomes '0' and 'master_last_io_seconds' becomes >= 0?
>
> If you have no reason to believe something has gone haywire, this ought to tell you that the initial sync process has completed, yes.

What can go haywire? Or, more precisely, how would I know if something went haywire?
 

> What it won't do is tell you on an ongoing basis that the slave is up-to-date. Depending on what you're actually wanting, that may or may not be a problem.

In my case, I'm not interested in knowing whether replication is up-to-date.

Josiah Carlson

Sep 18, 2012, 10:55:15 PM
to redi...@googlegroups.com
[snip]
> Looks neat. My interest, however, is really to just know whether the initial
> replication is completed.

You can't rely on the time since last command stuff on the slave,
because who knows how much data the master has queued up.

Check the master for buffer size, check the slave for last message
received, and also subscribe on the slave along with a publish on the
master. The combination of all 3 will tell you if the initial sync has
completed (along with whether the slave has caught up, at some point
in the past, with the master).

Of course that just tells you *now*. What will happen in a second or
two is different.
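One way to picture the publish/subscribe part of this check is a probe message that carries its own send time, so whoever receives it on the slave side can see how long it took to arrive. The helpers below are a sketch under assumptions: `probe_payload` and `probe_lag` are made-up names, the channel name in the comments is invented, and the same process is assumed to do both the publish and the subscribe so that only one clock is involved.

```ruby
# Payload to PUBLISH on the master: the send time as fractional epoch seconds.
def probe_payload(now = Time.now)
  format("%.6f", now.to_f)
end

# Delay in seconds between sending the probe and seeing it on the slave.
def probe_lag(payload, received_at = Time.now)
  received_at.to_f - Float(payload)
end

# Wiring with a redis-rb style client might look like this (not run here):
#   master.publish("replication:probe", probe_payload)
#   slave.subscribe("replication:probe") do |on|
#     on.message { |_channel, message| puts probe_lag(message) }
#   end

sent = Time.at(1000)
puts probe_lag(probe_payload(sent), Time.at(1000.25)) # prints 0.25
```

Receiving the probe at all shows the slave has processed the replication stream up to the point of the publish; the number only describes that one moment.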

- Josiah

Nicholas Knight

Sep 18, 2012, 10:55:15 PM
to redi...@googlegroups.com

On Sep 18, 2012, at 4:21 PM, Pawel wrote:
>
> > If you have no reason to believe something has gone haywire, this ought to tell you that the initial sync process has completed, yes.
>
> What can go haywire? Or, more precisely, how would I know if something went haywire?

Since you're only concerned with the initial sync, not much. The "routine" items ("oops, the network disappeared") should be reflected in master_link_status, master_sync_in_progress, and the logfile. Anything else would be a bug that should be reported.

The easiest sanity check is pretty obvious -- make sure the master and slave have the same number of keys. There should be lines in your 'info' output like this:

db0:keys=21760,expires=0

The only way to be 100% sure you've got a good replication is, of course, to thoroughly compare the data on both machines. I don't recall if rdb files are guaranteed to be byte-for-byte identical given the same state, but rdbtools[1] might help you out there.
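The key-count sanity check mentioned above could be scripted roughly like this. It is a sketch only: `key_counts` and `same_key_counts?` are hypothetical helper names, and the input is assumed to be raw INFO text containing `dbN:keys=...,expires=...` lines like the one shown.

```ruby
# Extract per-database key counts from INFO text, e.g.
# "db0:keys=21760,expires=0" becomes {"db0" => 21760}.
def key_counts(info_text)
  info_text.scan(/^(db\d+):keys=(\d+)/).map { |db, keys| [db, keys.to_i] }.to_h
end

# Equal counts are a sanity check only; they do not prove the data is identical.
def same_key_counts?(master_info, slave_info)
  key_counts(master_info) == key_counts(slave_info)
end

p key_counts("db0:keys=21760,expires=0\n") # prints {"db0"=>21760}
```

Counts can legitimately differ for a moment while writes are still flowing, so the comparison is most meaningful when writes to the master are paused.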

-NK

[1] https://github.com/sripathikrishnan/redis-rdb-tools

Pawel

Sep 19, 2012, 12:38:40 AM
to redi...@googlegroups.com

On Tuesday, September 18, 2012 7:55:22 PM UTC-7, Nicholas Knight wrote:

> On Sep 18, 2012, at 4:21 PM, Pawel wrote:
> > > If you have no reason to believe something has gone haywire, this ought to tell you that the initial sync process has completed, yes.
> > What can go haywire? Or, more precisely, how would I know if something went haywire?
> Since you're only concerned with the initial sync, not much. The "routine" items ("oops, the network disappeared") should be reflected in master_link_status, master_sync_in_progress, and the logfile. Anything else would be a bug that should be reported.
>
> The easiest sanity check is pretty obvious -- make sure the master and slave have the same number of keys. There should be lines in your 'info' output like this:
>
> db0:keys=21760,expires=0

Just in case somebody finds this helpful: what I'm doing is switching nodes. Say there are two nodes, a master and a slave. Under certain conditions, the application needs to add a node and make that new node the master. It suspends writing to Redis entirely, makes the new node a slave of the old master, waits until replication is finished (*), tells the new node to become a master, and tells the old master and the old slave to become slaves of the new master. The whole process is done by the application itself. The only part that I need to make sure is synchronous is (*).
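For readers who want to see the sequence spelled out, the switch described here might be sketched as below. Everything named in it is an assumption for illustration: `Node` and `promote_new_node` are invented, the client objects are assumed to behave like a redis-rb connection (an `info` method returning a Hash and a `slaveof(host, port)` method), the wait uses only master_link_status, and suspending writes is left to the caller.

```ruby
# A node is its address plus a client object for talking to it.
Node = Struct.new(:host, :port, :client)

# Assumes the application has already stopped writing to Redis.
def promote_new_node(new_node, old_master, old_slave, timeout: 60, poll: 0.5)
  # 1. Make the new node a slave of the old master.
  new_node.client.slaveof(old_master.host, old_master.port)

  # 2. Wait for the initial sync (*), giving up after `timeout` seconds.
  deadline = Time.now + timeout
  until new_node.client.info["master_link_status"] == "up"
    raise "initial sync did not finish within #{timeout}s" if Time.now > deadline
    sleep poll
  end

  # 3. Promote the new node (SLAVEOF NO ONE).
  new_node.client.slaveof("no", "one")

  # 4. Point the old master and the old slave at the new master.
  [old_master, old_slave].each do |node|
    node.client.slaveof(new_node.host, new_node.port)
  end
end
```

The timeout is there so that a sync that never completes surfaces as an error rather than leaving writes suspended indefinitely.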
 

> The only way to be 100% sure you've got a good replication is, of course, to thoroughly compare the data on both machines. I don't recall if rdb files are guaranteed to be byte-for-byte identical given the same state, but rdbtools[1] might help you out there.

Right. But it would be a bug if the data did not compare exactly.

I don't think it's ever possible to guarantee ongoing full replication, as slaves will always be behind at least somewhat.
 

> -NK
>
> [1] https://github.com/sripathikrishnan/redis-rdb-tools

And thank you guys, Redis is awesome.
 

Pawel Veselov

Sep 19, 2012, 2:03:33 AM
to redi...@googlegroups.com
On Tue, Sep 18, 2012 at 7:55 PM, Josiah Carlson <josiah....@gmail.com> wrote:
> [snip]
> > Looks neat. My interest, however, is really to just know whether the initial
> > replication is completed.
> You can't rely on the time since last command stuff on the slave,
> because who knows how much data the master has queued up.
>
> Check the master for buffer size, check the slave for last message
> received, and also subscribe on the slave along with a publish on the
> master. The combination of all 3 will tell you if the initial sync has
> completed (along with whether the slave has caught up, at some point
> in the past, with the master.

 
How do I check the queue (buffer?) size? (I don't see any INFO keys that look like it) Or, better, have it flushed synchronously? (I don't think I can).

If I issue the replication from the new slave *after* cutting off any writes to the master, wouldn't the replication request always be last in the queue? Or does it have the ability to cut in?

Salvatore Sanfilippo

Sep 19, 2012, 3:56:28 AM
to redi...@googlegroups.com
On Tue, Sep 18, 2012 at 10:49 PM, Pawel <pawel....@gmail.com> wrote:

> Is it OK to wait until 'master_link_status' becomes 'up', and
> 'master_sync_in_progress' becomes '0' and 'master_last_io_seconds' becomes
> >= 0?

Hello, if you want to check whether the first synchronisation
succeeded, you should just check that master_link_status is up.
This means that the initial RDB was loaded from the master.

This is probably also a good enough strategy to check whether an
already synchronised slave is working OK, because if it is NOT
receiving data, it will disconnect on timeout and try to resync.
However, there are cases where the slave may receive some data but not
all of it, because the Redis master is being populated faster than the
master <-> slave link bandwidth allows. I have never seen this
reported, but it may happen. When it happens in Redis 2.6, at some
point the master disconnects the replication link because too much
buffer memory is in use, as data can't be delivered as fast as needed,
so even in this case the replication will eventually be restarted.

However, as already suggested in this thread, to check whether a slave
that has already synced with the master is on par with its data, the
simplest and most reliable way is probably to PUBLISH something on the
master and check how long it takes to reach the slave (just by
subscribing on the slave side). In this way you can obtain the
"lag" between the two.

A different but related problem is deciding whether we can trust a
slave after the master is gone. In such a case we no longer have
information about the master dataset, but we can at least check the
master_link_down_since_seconds field in INFO to see whether the slave
disconnected from the master no more than X seconds ago, with X
chosen according to our requirements.
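A minimal sketch of that last check, assuming the INFO fields have already been read into a Hash of strings. `slave_fresh_enough?` is a made-up name, and the threshold X is whatever the application can tolerate.

```ruby
# Decide whether a slave is recent enough to trust, given its INFO fields.
# master_link_down_since_seconds only appears while the link is down.
def slave_fresh_enough?(info, max_down_seconds)
  return true if info["master_link_status"] == "up"

  down_for = info["master_link_down_since_seconds"]
  !down_for.nil? && down_for.to_i <= max_down_seconds
end

info = { "master_link_status" => "down", "master_link_down_since_seconds" => "12" }
puts slave_fresh_enough?(info, 30) # prints true
```

A slave that reports the link as down but gives no duration is treated as not trustworthy here, which is a conservative choice rather than anything mandated by Redis.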

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org

Beauty is more important in computing than anywhere else in technology
because software is so complicated. Beauty is the ultimate defence
against complexity.
— David Gelernter