|feature request discussion: non-restarting replication||Jeremy Zawodny||2/22/12 2:11 PM|
We encountered an issue recently after re-sizing our redis clusters a bit and it led to a wish for a simple feature. I wanted to present the case for it here to see if anyone else had ideas to add before putting it on github or even trying to just code it up and submit a pull request.
We have redis clusters in two data centers: A and B. Both clusters contain 10 machines and they all run 4 instances of redis-server. One data center is "active" and the other is "standby". The machines are "paired" across data centers, so redis1 in data center B is slaving from redis1 in data center A.
The problem is that occasionally the WAN link between the data centers is interrupted and the slaves in data center B decide they need to re-sync with their masters. Unfortunately, *all* the instances on each slave try to do this AT THE SAME TIME and that causes too much stress on the masters. Effectively, the masters are DoS'd by the slaves all re-syncing at the same time.
We've worked around this by reducing the max memory size of the instances, but we'd really like to make more RAM available to redis and have a more controlled way of doing the re-sync.
We already have a process in place to run periodically on the redis slaves and ensure that they're replicating properly. If there's a problem, it re-starts replication ONE INSTANCE AT A TIME and makes sure everything is running well.
So what I'd like is a config directive in redis that says "if you're a slave and you lose contact with the master, do not re-sync." The idea is that I'd set this to true (it'd be false by default) and then my existing script would handle those occasional times when slaves get disconnected.
Looking at the redis code, this should be fairly straightforward.
Comments or objections?
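For readers who want to picture the external "one instance at a time" process mentioned above, here is a minimal sketch. It is not the actual script; the function name and the three callables (`is_healthy`, `start_resync`, `wait_until_synced`) are hypothetical hooks that a real tool would implement against redis.

```python
from typing import Callable, Iterable, List


def resync_one_at_a_time(
    slaves: Iterable[str],
    is_healthy: Callable[[str], bool],
    start_resync: Callable[[str], None],
    wait_until_synced: Callable[[str], bool],
) -> List[str]:
    """Re-sync unhealthy slaves strictly one after another.

    Returns the slaves that were re-synced. Stops at the first slave
    that fails to finish, so a struggling master is never asked to
    start a second dump while the first one is still in trouble.
    """
    resynced = []
    for slave in slaves:
        if is_healthy(slave):
            continue  # already replicating, nothing to do
        start_resync(slave)
        if not wait_until_synced(slave):
            break  # leave the remaining slaves for the next run
        resynced.append(slave)
    return resynced
```

The point of the design is that the master only ever produces one dump at a time, no matter how many slaves lost their link.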
|Re: feature request discussion: non-restarting replication||Scott Smith||2/22/12 2:18 PM|
+1 on that.
We experience the same problem if a host restarts. Is there a solution that would solve for both scenarios?
|Re: feature request discussion: non-restarting replication||Josiah Carlson||2/22/12 2:25 PM|
Scott: If you don't want replication when Redis first starts up,
disable replication in the configuration file, then enable it once it
is started via "SLAVEOF host port".
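
As a rough illustration of issuing "SLAVEOF host port" after startup, the sketch below encodes the command in the Redis wire protocol and sends it over a plain socket. The helper names are made up for this example, and the host name and port in the usage note are taken from later in this thread purely for illustration.

```python
import socket


def encode_command(*args: str) -> bytes:
    """Encode a command as a Redis multi-bulk request."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def slaveof(slave_host: str, slave_port: int,
            master_host: str, master_port: int,
            timeout: float = 5.0) -> bytes:
    """Tell the instance at slave_host:slave_port to replicate from the master.

    Returns the raw reply bytes from the server.
    """
    with socket.create_connection((slave_host, slave_port), timeout=timeout) as conn:
        conn.sendall(encode_command("SLAVEOF", master_host, str(master_port)))
        return conn.recv(1024)
```

For example, `slaveof("localhost", 63790, "host1a", 63790)` run on a freshly booted slave would start replication for that one instance; calling it for each instance in turn spreads the load on the masters.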
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||2/22/12 2:27 PM|
Yes, that's what we do as well. That fixes the "at boot" DoS but not the random network fail case.
|Re: feature request discussion: non-restarting replication||Josiah Carlson||2/22/12 2:30 PM|
I can see the use of this, but I can't help but think that this is one
of those things that maybe should be a special command instead of a
configuration option. The command would be something like "SLAVEOF
host port ONCE", which says that it will slave to that master until
the link goes down, then it won't reconnect.
Why not a config file option? Because configuration files are the
|Re: feature request discussion: non-restarting replication||Jan Oberst||2/22/12 4:06 PM|
I agree with Josiah here.
Our config also has slaves start as masters. After boot we set SLAVEOF one machine after the next, like Jeremy mentioned.
We use a central management script that keeps track of all our redis machines. I think SLAVEOF ... ONCE would work well, because all we'd have to change is the central management tool.
I would add another INFO flag that states "slave_out_of_sync" or something similar. We're reading the INFO every minute anyways, so if a slave is out of sync we could just schedule it for another SLAVEOF .... ONCE call, which would effectively re-sync the machine.
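The "slave_out_of_sync" flag is only a proposal in this thread. As a stand-in, a polling tool could look at fields that INFO already reports for a slave (`role`, `master_link_status`, `master_sync_in_progress`). A minimal sketch, with hypothetical function names:

```python
def parse_info(text: str) -> dict:
    """Parse the text returned by INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def needs_resync(info: dict) -> bool:
    """True for a slave whose master link is down and that is not syncing."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "down"
        and info.get("master_sync_in_progress") == "0"
    )
```

A management script that already reads INFO every minute could call `needs_resync` on each result and queue the matching instances for a fresh SLAVEOF, one after the other.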
|Re: [REDIS] Re: feature request discussion: non-restarting replication||Jay Kreibich||2/22/12 6:46 PM|
On Wed, Feb 22, 2012 at 02:18:46PM -0800, Scott Smith scratched on the wall:
I might suggest the ability to configure a global lock file that is
Ideally, you could configure different lock files for BG[SAVE] and
In the case of SAVE, I would have the command immediately return an
"Intelligence is like underwear: it is important that you have it,
|Re: feature request discussion: non-restarting replication||GregA||2/22/12 11:33 PM|
I seem to recall a discussion 6-9 months ago about this same situation. The thread centered around creating a config limit on the number of simultaneous slave SYNC commands a master will allow. I thought there was some progress made toward creating a patch and getting it included in a release?
|Re: feature request discussion: non-restarting replication||Pierre Chapuis||2/23/12 1:01 AM|
On Feb 22, 11:30 pm, Josiah Carlson <josiah.carl...@gmail.com> wrote:
On the other hand you can add comments to configuration files to
explain your choices, and you can version them. And you can even
put them in cfengine / puppet / chef if you want.
My favorite kind of configuration system for critical infrastructure
(which Redis has become) is something similar to Cisco's IOS, where
the configuration is dynamic but you can dump it to a file and copy
it to another instance.
Also, now that we have Lua in Redis, why not use it as a configuration
language? After all, it's pretty good at that.
|Re: feature request discussion: non-restarting replication||Dvir Volk||2/23/12 1:35 AM|
yeah, this one was initiated by my pains with this situation. it was then my impression that my scenario of many slaves for one master was rare and this wasn't a priority. maybe things have changed since?
--
System Architect, The Everything Project (formerly DoAT)
|Re: feature request discussion: non-restarting replication||Colin Vipurs||2/23/12 1:48 AM|
+1 as well. It seems that this could be a useful feature for doing a
one-time failover from master to slave
|Re: feature request discussion: non-restarting replication||melo||2/23/12 3:18 AM|
On Wed, Feb 22, 2012 at 10:11 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
I understand the problem this is causing your system but I believe the
There is one event here that triggers a chain of two problems/symptoms:
* Event: connectivity loss between master and slave;
The solutions presented on this thread try to tackle Problem 2, how to
I would propose that, for each slave, a rotating AOF file should be
For example, when a slave connects, it tells you what was the last
I'm sure that this simplistic approach has holes in it, I didn't
|Re: feature request discussion: non-restarting replication||Salvatore Sanfilippo||2/23/12 4:28 AM|
I agree with your analysis. Jeremy's proposal, while could be actually
However I think the implementation of incremental resync should use
In short it uses the trick of still accumulating the output buffer of
But back to the root cause for a moment, the problem is: "currently
I mean, even without partial resync, Redis should handle that better.
Please if you have some information on this matter and how I can
|Re: feature request discussion: non-restarting replication||melo||2/23/12 6:47 AM|
On Thu, Feb 23, 2012 at 12:28 PM, Salvatore Sanfilippo
I knew I'd read something about partial resync, but forgot to search
I like it, although it would only survive small downtimes (which will
> In short it uses the trick of still accumulating the output buffer of
I don't know if you can use the same buffer for all clients, unless
I assume these last two paragraphs are for Jeremy, since he is the one
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||3/14/12 11:35 AM|
Sorry for the delay on getting back to this issue... Here's what has happened to us a few times (with a bit more detail).
We have 10 hosts in two data centers (a and b). Let's call the hosts host1a, host1b, host2a, host2b, etc.
Every host runs 4 instances of redis-server and has 32GB of RAM. All "b" hosts replicate from "a" hosts, so:
host1b:63790 is a slave of host1a:63790
host1b:63791 is a slave of host1a:63791
host1b:63792 is a slave of host1a:63792
host1b:63793 is a slave of host1a:63793
And so on with the other 9 pairs.
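Spelled out in code, the pairing looks like the sketch below. The function name is made up; the host naming scheme, the four ports and the count of 10 hosts per data center come from the description above.

```python
def replication_pairs(num_hosts: int = 10, base_port: int = 63790,
                      instances: int = 4) -> list:
    """Return (slave, master) address pairs for the a/b layout."""
    pairs = []
    for n in range(1, num_hosts + 1):
        for offset in range(instances):
            port = base_port + offset
            pairs.append((f"host{n}b:{port}", f"host{n}a:{port}"))
    return pairs
```

With the defaults this yields 40 pairs, which is the number of slaves that can all decide to re-sync at the same moment.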
Each redis-server was configured with:
And there is no persistence aside from the .rdb files that are created at (1) shutdown or (2) during replication sync.
This is an important point: our instances are almost always "full" and we're relying on the lru to evict data continuously.
So, what happens is this:
(1) the network between the "a" and "b" hosts becomes interrupted
(2) the slaves in "b" lose contact with "a" and eventually timeout
(3) the slaves in "b" decide to re-sync -- ALL AT ONCE
(4) the redis instances in "a" each start to dump their .rdb files
(5) since there are several going at once, the dumping to disk is i/o bound
(6) the dumping takes longer than it should, which results in more dirty COW pages
(7) the fact that we're always full and evicting keys makes #6 worse
(8) the box starts to swap, which makes #7 worse
(9) we enter a death spiral which is hard to recover from
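To make steps (4) through (8) concrete, here is a back-of-the-envelope model of memory on one master host. The 32GB of RAM and 4 instances are from this thread; the per-instance dataset size (7 GB) and the fraction of pages dirtied during a dump (0.3) are illustrative assumptions only, not measured values.

```python
def peak_memory_gb(instances: int, dataset_gb: float,
                   dirty_fraction: float, concurrent_dumps: int) -> float:
    """Rough peak memory on a master host while .rdb dumps are running.

    A dumping child shares pages with its parent until the parent
    writes to them; every page dirtied during the dump gets copied.
    """
    copied = concurrent_dumps * dataset_gb * dirty_fraction
    return instances * dataset_gb + copied


RAM_GB = 32        # from the thread
INSTANCES = 4      # from the thread
DATASET_GB = 7.0   # hypothetical per-instance dataset size
DIRTY = 0.3        # hypothetical share of pages written during a dump

all_at_once = peak_memory_gb(INSTANCES, DATASET_GB, DIRTY, concurrent_dumps=INSTANCES)
one_at_a_time = peak_memory_gb(INSTANCES, DATASET_GB, DIRTY, concurrent_dumps=1)
print(f"all at once:   {all_at_once:.1f} GB of {RAM_GB} GB")
print(f"one at a time: {one_at_a_time:.1f} GB of {RAM_GB} GB")
```

Under these assumed numbers, four simultaneous dumps push the host past its RAM while a single dump fits. Slower dumps (step 5) raise the dirty fraction, which is why the problem feeds on itself.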
However, if we re-sync one instance at a time (we already have external code for this, as I mentioned), this problem doesn't occur and our instances re-sync pretty quickly.
The only other solution, which sucks, is to lower the maxmemory setting on our instances quite a bit, but that's a wasteful solution in my eyes.
Does this help clarify what we're seeing and why I believe my proposed fix (a non-restarting replication option) would help to prevent it?
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||3/14/12 11:50 AM|
Oh, and there are a few points I wanted to make specifically...
On Thu, Feb 23, 2012 at 4:28 AM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
But in our case, when we're trying to use as much RAM on the box for redis as we can (across many instances), I wonder if the extra buffering would start to cause problems too.
I wouldn't say it that way. I'd say that redis assumes there is typically a single redis instance running on a given host. However, we're deploying them in a "1 instance per CPU core" environment. And our newer hosts are coming with 24 cores, which will just amplify the problem. (Thankfully they have SSDs so the disk i/o issue may be mitigated somewhat.)
Again, the real issue is not how redis handles re-sync. It does that well. But it doesn't give us enough control over what is currently an automatic behavior that ends up being harmful if you run enough instances on a large host.
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||3/14/12 1:17 PM|
And here's an implementation that seems to work in my testing:
I'll submit a pull request and see what happens. :-)
|Re: feature request discussion: non-restarting replication||hirose31||3/25/12 8:47 PM|
I am looking forward to this being merged into the 2.4 branch!
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||6/19/12 9:43 AM|
Just to follow-up on this, I've ported my patch to 2.6-rc4 and we've been running that in production for a few days now.
I'd like to submit a pull request, but don't know if the maintainers are interested in merging it.
For an idea of how little it changes, here are the changes that were needed for the old 2.4 branch:
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||12/27/12 11:18 AM|
Ok, 6 months later I've ported that feature to 2.6 and submitted a pull request against 2.6:
Any interest aside from us at craigslist?
|Re: feature request discussion: non-restarting replication||dvirsky||12/27/12 11:29 AM|
To me this is one of the things I miss most in redis, and this is a nice step in the right direction.
Being such a small optional patch, I'd really love to see this getting pulled.
BTW Isn't there some kind of solution to this planned as part of sentinel? and smooth replication planned as a part of 2.8?
|Re: feature request discussion: non-restarting replication||Josiah Carlson||12/27/12 1:56 PM|
I think you misread the patch. This just says whether or not a slave
would reconnect on connection failure.
|Re: feature request discussion: non-restarting replication||dvirsky||12/27/12 2:03 PM|
No, I read it (it wasn't that long!) and it's nice - not a silver bullet but as I said a simple first step.
Sentinel was going to limit the number of reconnections which is better, and 2.8 is supposed to introduce partial sync which will be the ultimate solution.
but as a workaround for the time being - why not?
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||12/27/12 2:07 PM|
Yeah, that's pretty much my thinking. We need (and use) this feature already. It keeps us from DoSing ourselves when the link between our datacenters fails and 8 redis instances on each box try to resync at the same time and the OOM killer gets busy on the master nodes--definitely not fun.
|Re: feature request discussion: non-restarting replication||dvirsky||12/27/12 2:10 PM|
what manages reconnections?
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||12/27/12 2:15 PM|
I have a script that runs via cron every few minutes and checks the replication state of all redis instances on localhost. If the host is expected to host masters, it simply exits. If it is expected to host slaves, it will:
|Re: feature request discussion: non-restarting replication||dvirsky||12/27/12 2:31 PM|
It's cool to see something that simple running one of the world's largest websites :)
you're not running on a cloud provider, am I right? If so, how often and why do you usually see these disconnects?
|Re: feature request discussion: non-restarting replication||Jeremy Zawodny||12/27/12 3:12 PM|
Correct, we're self-hosted on our own equipment.
We see the disconnects very rarely these days, but when we do see them they can be VERY painful without this patch.
I'd say half the time now it's due to planned maintenance and the other half is surprising.
If it'd be useful, I can probably post a slightly sanitized version of the script we use for this. I'd need to remove the dependency on our custom config module.