We encountered an issue recently after re-sizing our redis clusters a bit and it led to a wish for a simple feature. I wanted to present the case for it here to see if anyone else had ideas to add before putting it on github or even trying to just code it up and submit a pull request.
We have redis clusters in two data centers: A and B. Both clusters contain 10 machines and they all run 4 instances of redis-server. One data center is "active" and the other is "standby". The machines are "paired" across data centers, so redis1 in data center B is slaving from redis1 in data center A.
The problem is that occasionally the WAN link between the data centers is interrupted and the slaves in data center B decide they need to re-sync with their masters. Unfortunately, *all* the instances on each slave try to do this AT THE SAME TIME and that causes too much stress on the masters. Effectively, the masters are DoS'd by the slaves all re-syncing at the same time.
We've worked around this by reducing the max memory size of the instances, but we'd really like to make more RAM available to redis and have a more controlled way of doing the re-sync.
We already have a process in place to run periodically on the redis slaves and ensure that they're replicating properly. If there's a problem, it re-starts replication ONE INSTANCE AT A TIME and makes sure everything is running well.
So what I'd like is a config directive in redis that says "if you're a slave and you lose contact with the master, do not re-sync." The idea is that I'd set this to true (it'd be false by default) and then my existing script would handle those occasional times when slaves get disconnected.
Looking at the redis code, this should be fairly straightforward.
Comments or objections?
Thanks,
Jeremy
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redis-db@googlegroups.com.
To unsubscribe from this group, send email to redis-db+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
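The periodic check described in the post above (find slaves that have lost their master, then restart replication one instance at a time) could be sketched roughly as follows. This is not Jeremy's actual script. The instance names are hypothetical, and the sketch assumes the INFO reply exposes `role`, `master_link_status` and `master_sync_in_progress`.

```python
def parse_info(text):
    """Parse the key:value lines of a Redis INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# Section" headers.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = value
    return info


def needs_resync(info):
    """True for a slave whose master link is down and that is not already syncing."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "down"
        and info.get("master_sync_in_progress", "0") == "0"
    )


def next_instance_to_resync(infos):
    """Pick at most one instance to re-sync; pick none while any sync is running.

    `infos` maps a (hypothetical) instance name such as "host1b:63790"
    to its parsed INFO dict.
    """
    if any(i.get("master_sync_in_progress") == "1" for i in infos.values()):
        return None
    for name in sorted(infos):
        if needs_resync(infos[name]):
            return name
    return None
```

The caller would issue SLAVEOF for the returned instance, wait for it to finish, and call the function again on the next pass, so that only one re-sync is in flight at any time.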
Scott: If you don't want replication when Redis first starts up, disable replication in the configuration file, then enable it once it is started via "SLAVEOF host port".
- Josiah
On Wed, Feb 22, 2012 at 2:18 PM, Scott Smith <sc...@ohlol.net> wrote: > +1 on that.
> We experience the same problem if a host restarts. Is there a solution that > would solve for both scenarios?
I can see the use of this, but I can't help but think that this is one of those things that maybe should be a special command instead of a configuration option. The command would be something like "SLAVEOF host port ONCE", which says that it will slave to that master until the link goes down, then it won't reconnect.
Why not a config file option? Because configuration files are the kinds of things that you set and never look at again, then 6 months down the line someone is digging through it and asking "wtf did we do that for?" I could get behind "SLAVEOF host port ONCE" if it was only available via a remote command, and not with a configuration option.
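Until something like "SLAVEOF host port ONCE" exists in the server, a management tool can only approximate it from outside: when it sees the link is down, it sends the existing "SLAVEOF NO ONE" so the instance stops trying to reconnect. The sketch below builds that request in the Redis wire protocol. The function names are hypothetical, and the approach is racy (the slave may already have started its own re-sync before the tool notices), which is part of why a server-side option is being asked for.

```python
def encode_command(*args):
    """Encode a command as a Redis multi-bulk request."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def once_action(link_status, attached):
    """Return the bytes to send to emulate ONCE, or None if nothing is needed.

    `link_status` is the slave's master_link_status ("up" or "down");
    `attached` says whether the tool still considers it a slave.
    """
    if attached and link_status == "down":
        # Detach: the instance keeps its data but stops reconnecting.
        return encode_command("SLAVEOF", "NO", "ONE")
    return None
```

Note that after SLAVEOF NO ONE the instance behaves as a master and will accept writes, so the tool has to keep clients away from it until it is re-attached.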
Our config also has slaves start as masters. After boot we set SLAVEOF one machine after the next, like Jeremy mentioned.
We use a central management script that keeps track of all our redis machines. I think SLAVEOF ... ONCE would work well, because all we'd have to change is the central management tool.
I would add another INFO flag that states "slave_out_of_sync" or something similar. We're reading the INFO every minute anyways, so if a slave is out of sync we could just schedule it for another SLAVEOF .... ONCE call, which would effectively re-sync the machine.
On Wed, Feb 22, 2012 at 02:18:46PM -0800, Scott Smith scratched on the wall:
> +1 on that.
> We experience the same problem if a host restarts. Is there a solution > that would solve for both scenarios?
I might suggest the ability to configure a global lock file that is shared by all Redis instances on a single physical server. The lock could be used to block and/or delay [BG]SAVEs and/or SLAVEOF commands. This would allow a set of instances to ensure only a single instance is attempting to save and/or sync at any given moment, reducing contention for these high-resource commands.
Ideally, you could configure different lock files for [BG]SAVE and SLAVEOF commands, although they might point to the same file.
The lock file could be a simple PID file. If the file exists (and the process named in it exists), the system is locked. If no file exists, a process can grab the lock by simply writing out the file. Race conditions can be avoided with the proper flags to open(2).
In the case of SAVE, I would have the command immediately return an error if the lock cannot be acquired. For SLAVEOF, the command would simply go idle until the lock can be acquired. BGSAVE might go idle or might return... I'm not sure which makes more sense.
In the case of BGSAVE and SLAVEOF, the fork() would not be allowed until the instance owns the lock. The lock would then be released as soon as the child process exits (or, in the case of a SLAVEOF, when the initial bulk transfer is complete). You might also be able to configure a time-out, so that SLAVEOF returns an error after 300 seconds or something. Any time a command is outstanding, the system would check for the lock every 250ms or some other configurable value.
If we really want to get fancy, we could allow a set of lock files, say .../redis-lock-[1-4].pid, to allow up to four operations at one time. This might be useful for very large servers with, for example, a dozen instances. The lock files could still be used to limit "overhead" resource usage, but would allow more than one high-usage operation at a time.
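As a rough illustration of the PID lock file described above, here is a minimal sketch in Python rather than in Redis's own C. It is POSIX-only, the function names are hypothetical, and it relies on O_CREAT | O_EXCL to make creation atomic. Cleaning up a stale lock is still racy in this form; a real implementation would need something stronger, such as fcntl locking.

```python
import os


def pid_alive(pid):
    """Return True if a process with this PID exists (signal 0 only probes)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def try_acquire(path):
    """Try to take the lock; return True on success, False if it is held."""
    for _ in range(2):  # the second pass only happens after removing a stale lock
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                with open(path) as f:
                    holder = int(f.read().strip())
            except FileNotFoundError:
                continue  # released in the meantime, try again
            except ValueError:
                return False  # empty or half-written file: assume it is held
            if pid_alive(holder):
                return False
            try:
                os.unlink(path)  # stale lock left by a dead process
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False


def release(path):
    """Remove the lock file if this process owns it."""
    try:
        with open(path) as f:
            if f.read().strip() != str(os.getpid()):
                return
        os.unlink(path)
    except FileNotFoundError:
        pass
```

A caller that wants the "go idle" behaviour would retry try_acquire() on a timer (every 250ms in the proposal) and give up once the time-out expires.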
Thoughts?
-j
-- Jay A. Kreibich < J A Y @ K R E I B I.C H >
"Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson
I seem to recall a discussion 6-9 months ago about this same situation. The thread centered around creating a config limit on the number of simultaneous slave SYNC commands a master will allow. I thought there was some progress made toward creating a patch and getting it included in a release?
On Feb 22, 11:30 pm, Josiah Carlson <josiah.carl...@gmail.com> wrote:
> Why not a config file option? Because configuration files are the
> kinds of things that you set and never look at again, then 6 months
> down the line someone is digging through it and asking "wtf did we do
> that for?"
On the other hand you can add comments to configuration files to
explain your choices, and you can version them. And you can even
put them in cfengine / puppet / chef if you want.
My favorite kind of configuration system for critical infrastructure
(which Redis has become) is something similar to Cisco's IOS, where
the configuration is dynamic but you can dump it to a file and copy
it to another instance.
Also, now that we have Lua in Redis, why not use it as a configuration
language? After all, it's pretty good at that.
yeah, this one was initiated by my pains with this situation. it was then my impression that my scenario of many slaves for one master was rare and this wasn't a priority. maybe things have changed since?
On Thu, Feb 23, 2012 at 9:33 AM, Greg Andrews <hvar...@gmail.com> wrote:
> I seem to recall a discussion 6-9 months ago about this same situation. > The thread centered around creating a config limit on the number of > simultaneous slave SYNC commands a master will allow. I thought there was > some progress made toward creating a patch and getting it included in a > release?
> -Greg
-- Dvir Volk System Architect, The Everything Project (formerly DoAT) http://everything.me
On Wed, Feb 22, 2012 at 10:11 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
> The problem is that occasionally the WAN link between the data centers is interrupted and the slaves in data center B decide they need to re-sync with their masters. Unfortunately, *all* the instances on each slave try to do this AT THE SAME TIME and that causes too much stress on the masters. Effectively, the masters are DoSd by the slaves all re-syncing at the same time.
I understand the problem this is causing your system but I believe the solution you are presenting is targeting the symptom and not the root cause of the problem.
There is one event here that triggers a chain of two problems/symptoms:
* Event: connectivity loss between master and slave;
* Problem 1: slave needs full re-sync with master;
* Problem 2: N slaves doing this at the same time will cause a DoS on masters.
The solutions presented on this thread try to tackle Problem 2, how to prevent the DoS of the master, and although it is a valid problem and should be solved (I'm particularly fond of SLAVEOF host port ONCE myself), it doesn't fix the initial problem: the need for a full re-sync.
I would propose that, for each slave, a rotating AOF file should be kept, based on time or size, with older files being removed when slaves ACK back synchronization points reached.
For example, when a slave connects, it tells you what was the last sync point it saw, and the master only has to send the AOF's since that sync point. Every time the master rotates a slave AOF, it sends the new name to the SLAVE. Every time a slave ACKs a specific AOF sync point, all AOFs up-to that one can be removed (or archived, if your business rules require that).
I'm sure that this simplistic approach has holes in it, and I haven't thought it through thoroughly yet, but my initial point still stands: you are fixing a symptom, not the cause. It might be enough, and that's fine, just pointing it out though :).
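To make the rotating-AOF idea above a little more concrete, here is a toy in-memory model of it. Everything in it is hypothetical (class name, segment numbering, size-based rotation), and it shares the holes mentioned above: for instance, it only tracks whole segments, so it ignores a segment that a slave had partially applied when the link dropped.

```python
class SegmentedLog:
    """Master-side log split into numbered segments, pruned as slaves ACK."""

    def __init__(self, max_segment_bytes):
        self.max_segment_bytes = max_segment_bytes
        self.segments = {0: bytearray()}  # segment id -> contents
        self.current = 0                  # id of the segment being written
        self.acked = {}                   # slave name -> last segment it fully applied

    def append(self, data):
        """Append to the current segment, rotating when it would overflow."""
        seg = self.segments[self.current]
        if seg and len(seg) + len(data) > self.max_segment_bytes:
            self.current += 1
            self.segments[self.current] = bytearray()
        self.segments[self.current] += data

    def ack(self, slave, segment_id):
        """Record a slave's sync point and drop segments every slave has passed."""
        self.acked[slave] = segment_id
        floor = min(self.acked.values())
        for sid in [s for s in self.segments if s <= floor and s != self.current]:
            del self.segments[sid]

    def segments_since(self, segment_id):
        """Segments a reconnecting slave still needs, or None for a full re-sync."""
        wanted = range(segment_id + 1, self.current + 1)
        if any(s not in self.segments for s in wanted):
            return None
        return [bytes(self.segments[s]) for s in wanted]
```

A slave that reconnects with a sync point still covered by the retained segments gets only the missing tail; one that has fallen behind the pruning floor gets None and has to do the full re-sync anyway.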
I agree with your analysis. Jeremy's proposal, while it could actually be useful to mitigate the problem, does not fix the root cause. I also agree that incremental resync would be a solution to many of these issues.
However I think the implementation of incremental resync should use the implementation proposed here:
In short it uses the trick of still accumulating the output buffer of the slaves for some time (or up to some size) while the slave is not connected. Moreover there is a sliding window, so that we don't discard the buffer already sent to the slaves but keep it for some time, since a slave may want to resync from an offset that has already been flushed to the socket.
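The buffer-plus-sliding-window idea can be sketched as a bounded backlog indexed by a global byte offset. This is only an illustration of the concept in Python, not the implementation from the linked proposal; the class and method names are hypothetical.

```python
class ReplicationBacklog:
    """Keep the last `capacity` bytes of the replication stream."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = bytearray()
        self.end_offset = 0  # total bytes ever written to the stream

    @property
    def start_offset(self):
        """Stream offset of the oldest byte still held in the window."""
        return self.end_offset - len(self.buf)

    def feed(self, data):
        """Append newly propagated bytes and slide the window forward."""
        self.buf += data
        self.end_offset += len(data)
        if len(self.buf) > self.capacity:
            del self.buf[: len(self.buf) - self.capacity]

    def read_from(self, offset):
        """Bytes from `offset` to the end, or None if a full resync is needed."""
        if offset < self.start_offset or offset > self.end_offset:
            return None
        return bytes(self.buf[offset - self.start_offset:])
```

A slave that remembers how many bytes it has processed can reconnect with that number: if it is still inside the window it receives only the missing tail, otherwise it falls back to a full resync.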
But back to the root cause for a moment. The problem is: "currently Redis does not handle well the case when multiple slaves want to sync at once". I trust you on that, but I would like to understand why this happens.
I mean, even without partial resync, Redis should handle that better. A full resync should just be slower, not a DoS. Redis is already optimized to do a single BGSAVE when multiple slaves reconnect, so what is actually DoSing it? Maybe the multiple bulk transfers generate too much I/O and we should throttle them?
If you have more information on this, and on how I can reproduce it, please share it: I would love to get a fix into 2.6 if possible.
On Thu, Feb 23, 2012 at 12:18 PM, Pedro Melo <m...@simplicidade.org> wrote: > Hi,
-- Salvatore 'antirez' Sanfilippo open source developer - VMware
http://invece.org "We are what we repeatedly do. Excellence, therefore, is not an act, but a habit." -- Aristotele
I knew I'd read something about partial resync, but forgot to search the issues :)
I like it, although it would only survive small downtimes (which will probably cover most of the situations, so no worries there), and I really like the use of bytes written/read as a sync marker. Simple and effective.
> In short it uses the trick of still accumulating the output buffer of > the slaves for some time (or for some space) while the slave is not > connected. Moreover there is a sliding window so that we don't discard > the buffer sent to the slaves but take it for some time, since a slave > may want to resync from an offset that is already flushed on the > socket.
I don't know if you can use the same buffer for all clients, unless you send the current byte count after the full dump.
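To make the offset idea concrete, here is a minimal Python sketch of a single shared, offset-addressed buffer. It is an illustration only, not the Redis implementation: the class name `Backlog`, its methods, and the capacity are invented for the example. It assumes the slave learns the master's current byte count at the end of a full dump, which is the condition raised above for sharing one buffer across slaves.

```python
class Backlog:
    """Hypothetical sliding window over the replication stream."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of bytes retained
        self.buf = bytearray()     # most recent bytes of the stream
        self.end_offset = 0        # total bytes ever fed to slaves

    def feed(self, data):
        """Record bytes sent to slaves, dropping the oldest past capacity."""
        self.buf += data
        self.end_offset += len(data)
        if len(self.buf) > self.capacity:
            del self.buf[:len(self.buf) - self.capacity]

    def start_offset(self):
        """Offset of the oldest byte still held in the window."""
        return self.end_offset - len(self.buf)

    def since(self, offset):
        """Return the bytes after `offset`, or None if a full resync is needed."""
        if offset < self.start_offset() or offset > self.end_offset:
            return None
        return bytes(self.buf[offset - self.start_offset():])


backlog = Backlog(capacity=16)
backlog.feed(b"SET a 1\n")
backlog.feed(b"SET b 2\n")
partial = backlog.since(8)     # slave saw 8 bytes, gets only the second command
backlog.feed(b"SET c 3\n")     # window now covers offsets 8..24
too_old = backlog.since(0)     # offset 0 has been discarded, so this is None
```

A slave that was away only briefly gets the missing bytes; one that fell behind the window still needs the full dump, which matches the "small downtimes" caveat.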
> I mean, even without partial resync, Redis should handle that better. > Full resync should just be slower, but not a DoS. > Redis is already optimized to do a single BSAVE on reconnection of > multiple slaves, so what is actually DosSing it? Maybe the multiple > bulk transfers generate too much I/O and we should trottle this stuff?
> Please if you have some information on this matter and how I can > reproduce it I would love to insert this fix into 2.6 if possible.
I assume these last two paragraphs are for Jeremy, since he is the one with the problem.
Sorry for the delay on getting back to this issue... Here's what has happened to us a few times (with a bit more detail).
We have 10 hosts in two data centers (a and b). Let's call the hosts host1a, host1b, host2a, host2b, etc.
Every host runs 4 instances of redis-server and has 32GB of RAM. All "b" hosts replicate from "a" hosts, so:
host1b:63790 is a slave of host1a:63790
host1b:63791 is a slave of host1a:63791
host1b:63792 is a slave of host1a:63792
host1b:63793 is a slave of host1a:63793
And there is no persistence aside from the .rdb files that are created at (1) shutdown or (2) during replication sync.
This is an important point: our instances are almost always "full" and we're relying on the LRU to evict data continuously.
So, what happens is this:
(1) the network between the "a" and "b" hosts becomes interrupted
(2) the slaves in "b" lose contact with "a" and eventually timeout
(3) the slaves in "b" decide to re-sync -- ALL AT ONCE
(4) the redis instances in "a" each start to dump their .rdb files
(5) since there are several going at once, the dumping to disk is i/o bound
(6) the dumping takes longer than it should, which results in more dirty COW pages
(7) the fact that we're always full and evicting keys makes #6 worse
(8) the box starts to swap, which makes #7 worse
(9) we enter a death spiral which is hard to recover from
However, if we re-sync one instance at a time (we already have external code for this, as I mentioned), this problem doesn't occur and our instances resync pretty quickly.
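A rough Python sketch of what such an external one-at-a-time loop could look like follows. It is not the actual script referred to here. The callables `get_info` and `start_sync` are hypothetical placeholders for however you send INFO and SLAVEOF to an instance, and the polling interval and timeout are arbitrary. The only Redis details it relies on are the `master_link_status` and `master_sync_in_progress` fields that a slave reports in INFO.

```python
import time


def parse_info(text):
    """Parse the 'key:value' lines of a Redis INFO reply into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, _, value = line.partition(":")
        info[key] = value
    return info


def link_is_up(info):
    """True when the slave is connected and not in the middle of a sync."""
    return (info.get("master_link_status") == "up"
            and info.get("master_sync_in_progress", "0") == "0")


def resync_one_at_a_time(slaves, get_info, start_sync,
                         poll_seconds=5.0, timeout_seconds=3600.0,
                         sleep=time.sleep, clock=time.monotonic):
    """Resync broken slaves strictly in sequence; return the ones resynced.

    Raises TimeoutError and stops, rather than starting another sync,
    if a slave does not come back within timeout_seconds.
    """
    resynced = []
    for slave in slaves:
        if link_is_up(parse_info(get_info(slave))):
            continue                   # already replicating, leave it alone
        start_sync(slave)              # hypothetical hook, e.g. send SLAVEOF
        deadline = clock() + timeout_seconds
        while not link_is_up(parse_info(get_info(slave))):
            if clock() >= deadline:
                raise TimeoutError("resync of %s did not finish" % (slave,))
            sleep(poll_seconds)
        resynced.append(slave)
    return resynced
```

Stopping at the first timeout is a deliberate choice in this sketch: moving on to the next slave while one dump may still be running would bring back the concurrency the loop is meant to avoid.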
The only other solution, which sucks, is to lower maxmemory on our instances quite a bit, but that's a wasteful solution in my eyes.
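Some back-of-the-envelope arithmetic shows why the trade-off is so sharp. The 32GB of RAM and 4 instances per host come from the description above. The maxmemory value and the fraction of pages dirtied while a dump is running are hypothetical numbers picked only for illustration, and the model ignores OS overhead and allocator fragmentation. It assumes each dumping instance needs extra memory roughly equal to the share of its dataset that gets modified (and so copied) during the dump.

```python
def peak_memory_gb(instances, maxmemory_gb, dumping, dirty_fraction):
    """Approximate peak RAM: all datasets plus copy-on-write copies for dumps."""
    base = instances * maxmemory_gb
    cow = dumping * maxmemory_gb * dirty_fraction
    return base + cow


HOST_RAM_GB = 32    # from the message
INSTANCES = 4       # from the message
MAXMEMORY_GB = 7    # hypothetical setting
DIRTY = 0.5         # hypothetical: half the pages touched during a dump

all_at_once = peak_memory_gb(INSTANCES, MAXMEMORY_GB, dumping=4, dirty_fraction=DIRTY)
one_at_a_time = peak_memory_gb(INSTANCES, MAXMEMORY_GB, dumping=1, dirty_fraction=DIRTY)

print(all_at_once, all_at_once > HOST_RAM_GB)      # 42.0 True  -> swapping
print(one_at_a_time, one_at_a_time > HOST_RAM_GB)  # 31.5 False -> fits
```

Under these assumed numbers, four simultaneous dumps overshoot the host by about 10GB, while a single dump at a time still fits. Avoiding the overshoot by configuration alone would mean cutting maxmemory well below what one-at-a-time syncing allows.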
Does this help clarify what we're seeing and why I believe my proposed fix (a non-restarting replication option) would help to prevent it?
Thanks,
Jeremy
On Thu, Feb 23, 2012 at 4:28 AM, Salvatore Sanfilippo <anti...@gmail.com>wrote:
> I agree with your analysis. Jeremy's proposal, while could be actually > useful to mitigate the problem, does not fix the root cause. > I also agree about incremental resync as a solution to many of this issues.
> However I think the implementation of incremental resync should use > the implementation proposed here:
> In short it uses the trick of still accumulating the output buffer of > the slaves for some time (or for some space) while the slave is not > connected. Moreover there is a sliding window so that we don't discard > the buffer sent to the slaves but take it for some time, since a slave > may want to resync from an offset that is already flushed on the > socket.
But in our case, when we're trying to use as much RAM on the box for redis as we can (across many instances), I wonder if the extra buffering would start to cause problems too.
> But back to the root cause for a moment, the problem is: "currently > Redis does not handle well the case when multiple slaves want sync at > once". I trust you about that, but I would understand why this > happens.
I wouldn't say it that way. I'd say that redis assumes there is typically a single redis instance running on a given host. However, we're deploying them in a "1 instance per CPU core" environment. And our newer hosts are coming with 24 cores, which will just amplify the problem. (Thankfully they have SSDs so the disk i/o issue may be mitigated somewhat.)
> I mean, even without partial resync, Redis should handle that better. > Full resync should just be slower, but not a DoS. > Redis is already optimized to do a single BSAVE on reconnection of > multiple slaves, so what is actually DosSing it? Maybe the multiple > bulk transfers generate too much I/O and we should trottle this stuff?
> Please if you have some information on this matter and how I can > reproduce it I would love to insert this fix into 2.6 if possible.
Again, the real issue is not how redis handles re-sync. It does that well. But it doesn't give us enough control over what is currently an automatic behavior that ends up being harmful if you run enough instances on a large host.
On Wed, Mar 14, 2012 at 11:35 AM, Jeremy Zawodny <Jer...@zawodny.com> wrote: > Sorry for the delay on getting back to this issue... Here's what has > happened to us a few times (with a bit more detail).
> I'll submit a pull request and see what happens. :-)
> Jeremy
On Tue, Jun 19, 2012 at 9:43 AM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
> Just to follow up on this, I've ported my patch to 2.6-rc4 and we've been
> running that in production for a few days now.
> I'd like to submit a pull request, but don't know if the maintainers are
> interested in merging it.
> For an idea of how little it changes, here are the old 2.4 changes needed:
>> > I'll submit a pull request and see what happens. :-)
>> > Jeremy
>> > On Wed, Mar 14, 2012 at 11:35 AM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
>> > > Sorry for the delay on getting back to this issue... Here's what has
>> > > happened to us a few times (with a bit more detail).
>> > > We have 10 hosts in two data centers (a and b). Let's call the hosts
>> > > host1a, host1b, host2a, host2b, etc.
>> > > Every host runs 4 instances of redis-server and has 32GB of RAM. All "b"
>> > > hosts replicate from "a" hosts, so:
>> > > host1b:63790 is a slave of host1a:63790
>> > > host1b:63791 is a slave of host1a:63791
>> > > host1b:63792 is a slave of host1a:63792
>> > > host1b:63793 is a slave of host1a:63793
>> > > And there is no persistence aside from the .rdb files that are created
>> > > at (1) shutdown or (2) during replication sync.
>> > > This is an important point: our instances are almost always "full" and
>> > > we're relying on the LRU to evict data continuously.
>> > > So, what happens is this:
>> > > (1) the network between the "a" and "b" hosts becomes interrupted
>> > > (2) the slaves in "b" lose contact with "a" and eventually time out
>> > > (3) the slaves in "b" decide to re-sync -- ALL AT ONCE
>> > > (4) the redis instances in "a" each start to dump their .rdb files
>> > > (5) since there are several going at once, the dumping to disk is i/o bound
>> > > (6) the dumping takes longer than it should, which results in more dirty
>> > >     COW pages
>> > > (7) the fact that we're always full and evicting keys makes #6 worse
>> > > (8) the box starts to swap, which makes #7 worse
>> > > (9) we enter a death spiral which is hard to recover from
>> > > However, if we were to resync one instance at a time (we already have
>> > > external code for this, as I mentioned), this problem doesn't occur and our
>> > > instances resync pretty quickly.
>> > > The only other solution, which sucks, is to really lower the max-memory on
>> > > our instances quite a bit but that's a wasteful solution in my eyes.
>> > > Does this help clarify what we're seeing and why I believe my proposed fix
>> > > (a non-restarting replication option) would help to prevent it?
>> > > Thanks,
>> > > Jeremy
>> > > On Thu, Feb 23, 2012 at 4:28 AM, Salvatore Sanfilippo <anti...@gmail.com> wrote:
>> > >> Hello Pedro,
>> > >> I agree with your analysis. Jeremy's proposal, while it could actually be
>> > >> useful to mitigate the problem, does not fix the root cause.
>> > >> I also agree about incremental resync as a solution to many of these
>> > >> issues.
>> > >> However I think the implementation of incremental resync should use
>> > >> the implementation proposed here:
>> > >> In short it uses the trick of still accumulating the output buffer of
>> > >> the slaves for some time (or for some space) while the slave is not
>> > >> connected. Moreover there is a sliding window so that we don't discard
>> > >> the buffer sent to the slaves but keep it for some time, since a slave
>> > >> may want to resync from an offset that is already flushed on the
>> > >> socket.
>> > >> But back to the root cause for a moment, the problem is: "currently
>> > >> Redis does not handle well the case when multiple slaves want sync at
>> > >> once". I trust you about that, but I would like to understand why this
>> > >> happens.
>> > >> I mean, even without partial resync, Redis should handle that better.
>> > >> Full resync should just be slower, but not a DoS.
>> > >> Redis is already optimized to do a single BGSAVE on reconnection of
>> > >> multiple slaves, so what is actually DoSing it? Maybe the multiple
>> > >> bulk transfers generate too much I/O and we should throttle this stuff?
>> > >> Please, if you have some information on this matter and how I can
>> > >> reproduce it, I would love to insert this fix into 2.6 if possible.
>> > >> Salvatore
>> > >> On Thu, Feb 23, 2012 at 12:18 PM, Pedro Melo <m...@simplicidade.org> wrote:
>> > >> > Hi,
>> > >> > On Wed, Feb 22, 2012 at 10:11 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
>> > >> >> The problem is that occasionally the WAN link between the data centers is
>> > >> >> interrupted and the slaves in data center B decide they need to re-sync with
>> > >> >> their masters. Unfortunately, *all* the instances on each slave try to do
>> > >> >> this AT THE SAME TIME and that causes too much stress on the masters.
>> > >> >> Effectively, the masters are DoSd by the slaves all re-syncing at the same
>> > >> >> time.
>> > >> > I understand the problem this is causing your system but I believe the
>> > >> > solution you are presenting is targeting the symptom and not the root
>> > >> > cause of the problem.
>> > >> > There is one event here that triggers a chain of two problems/symptoms:
>> > >> > * Event: connectivity loss between master and slave;
>> > >> > * Problem 1: slave needs full re-sync with master;
>> > >> > * Problem 2: N slaves doing this at the same time will cause a DoS on
>> > >> >   masters.
>> > >> > The solutions presented on this thread try to tackle Problem 2, how to
>> > >> > prevent the DoS of the master, and although it is a valid problem and
>> > >> > should be solved (I'm particularly fond of SLAVEOF host port ONCE
>> > >> > myself), it doesn't fix the initial problem: the need for a full
>> > >> > re-sync.
>> > >> > I would propose that, for each slave, a rotating AOF file should be
>> > >> > kept, based on time or size, with older files being removed when
>> > >> > slaves ACK back synchronization points reached.
>> > >> > For example, when a slave connects, it tells you what was the last
>> > >> > sync point it saw, and the master only has to send the AOFs since
>> > >> > that sync point. Every time the master rotates a slave AOF, it sends
>> > >> > the new name to the SLAVE. Every time a slave ACKs a specific AOF sync
>> > >> > point, all AOFs up to that one can be removed (or archived, if your
>> > >> > business rules require that).
>> > >> > I'm sure that this simplistic approach has holes in it, I haven't
>> > >> > thought it through thoroughly yet, but my initial point still stands: you
>> > >> > are fixing a symptom, not the cause. It might be enough, and that's
>> > >> > fine, just pointing it out though :).
Nice!
To me this is one of the things I miss most in redis, and this is a nice
step in the right direction.
Being such a small optional patch, I'd really love to see this getting
pulled.
BTW, isn't there some kind of solution to this planned as part of Sentinel,
and smooth replication planned as part of 2.8?
On Thu, Dec 27, 2012 at 9:18 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
> Ok, 6 months later I've ported that feature to 2.6 and submitted a pull
> request against 2.6:
On Thu, Dec 27, 2012 at 11:29 AM, Dvir Volk <dvir...@gmail.com> wrote:
> Nice!
> To me this is one of the things I miss most in redis, and this is a nice
> step in the right direction.
> Being such a small optional patch, I'd really love to see this getting
> pulled.
> BTW, isn't there some kind of solution to this planned as part of Sentinel,
> and smooth replication planned as part of 2.8?
No, I read it (it wasn't that long!) and it's nice - not a silver bullet
but as I said a simple first step.
Sentinel was going to limit the number of reconnections which is better,
and 2.8 is supposed to introduce partial sync which will be the ultimate
solution.
but as a workaround for the time being - why not?
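The partial-sync idea mentioned above (the master keeps a sliding window of recent replication output so a reconnecting slave can resume from an offset) can be sketched roughly as follows. This is only an illustration of the concept under stated assumptions, not the Redis 2.8 implementation: the `ReplicationBacklog` class, its method names, and the byte-offset bookkeeping are all hypothetical.

```python
from typing import Optional


class ReplicationBacklog:
    """Hypothetical sliding window over the most recent bytes of a replication stream."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer = bytearray()
        self.end_offset = 0  # total number of bytes ever fed into the stream

    @property
    def start_offset(self) -> int:
        """Stream offset of the oldest byte still held in the window."""
        return self.end_offset - len(self.buffer)

    def feed(self, data: bytes) -> None:
        """Append stream data and drop the oldest bytes beyond capacity."""
        self.buffer += data
        self.end_offset += len(data)
        overflow = len(self.buffer) - self.capacity
        if overflow > 0:
            del self.buffer[:overflow]

    def read_from(self, offset: int) -> Optional[bytes]:
        """Return the bytes from `offset` onward, or None if a full resync is needed."""
        if offset < self.start_offset or offset > self.end_offset:
            return None  # the requested data has already slid out of the window
        return bytes(self.buffer[offset - self.start_offset:])


backlog = ReplicationBacklog(capacity=16)
backlog.feed(b"SET a 1\n")
backlog.feed(b"SET b 2\n")
backlog.feed(b"SET c 3\n")  # the first command has now fallen out of the window

print(backlog.read_from(8))  # a slave that stopped at offset 8 can catch up
print(backlog.read_from(0))  # a slave that stopped at offset 0 needs a full resync
```

A short disconnect then costs only the missed bytes, while a long one still falls back to a full sync, which is why a window like this would complement rather than replace the reconnect control discussed in this thread.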
On Thu, Dec 27, 2012 at 11:56 PM, Josiah Carlson <josiah.carl...@gmail.com> wrote:
> I think you misread the patch. This just says whether or not a slave
> would reconnect on connection failure.
> - Josiah
Yeah, that's pretty much my thinking. We need (and use) this feature
already. It keeps us from DoSing ourselves when the link between our
datacenters fails and 8 redis instances on each box try to resync at the
same time and the OOM killer gets busy on the master nodes--definitely not
fun.
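A minimal sketch of the one-instance-at-a-time resync loop described above, for readers who want to try the same approach. It is an illustration under assumptions, not the script used in this thread: the function name and the three callbacks (`is_replicating`, `start_resync`, `wait_for_sync`) are hypothetical placeholders for whatever checks and commands you run against your own slaves.

```python
from typing import Callable, Iterable, List, Tuple

Instance = Tuple[str, int]  # (host, port) of one slave redis-server


def resync_one_at_a_time(
    slaves: Iterable[Instance],
    is_replicating: Callable[[Instance], bool],  # hypothetical health check
    start_resync: Callable[[Instance], None],    # hypothetical "start sync" action
    wait_for_sync: Callable[[Instance], bool],   # hypothetical: True once sync finished
) -> List[Instance]:
    """Resync broken slaves strictly one after another.

    Returns the slaves whose resync did not complete. The loop stops at the
    first failure so that a struggling master is not asked for another dump.
    """
    failed: List[Instance] = []
    for slave in slaves:
        if is_replicating(slave):
            continue  # healthy slave, nothing to do
        start_resync(slave)
        if not wait_for_sync(slave):
            failed.append(slave)
            break
    return failed
```

In practice the callbacks would probably wrap `INFO replication` (looking at `master_link_status`) and `SLAVEOF host port`. Stopping at the first failure is a design choice made here, not something the thread prescribes.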
On Thu, Dec 27, 2012 at 2:03 PM, Dvir Volk <dvir...@gmail.com> wrote:
> No, I read it (it wasn't that long!) and it's nice - not a silver bullet,
> but as I said, a simple first step.
> Sentinel was going to limit the number of reconnections, which is better,
> and 2.8 is supposed to introduce partial sync, which will be the ultimate
> solution.
> But as a workaround for the time being - why not?
> On Thu, Dec 27, 2012 at 11:56 PM, Josiah Carlson <josiah.carl...@gmail.com> wrote:
>> Dvir,
>> I think you misread the patch. This just says whether or not a slave
>> would reconnect on connection failure.
>> - Josiah
>> On Thu, Dec 27, 2012 at 11:29 AM, Dvir Volk <dvir...@gmail.com> wrote:
>> > Nice!
>> > To me this is one of the things I miss most in redis, and this is a nice
>> > step in the right direction.
>> > Being such a small optional patch, I'd really love to see this getting
>> > pulled.
>> > BTW, isn't there some kind of solution to this planned as part of
>> > Sentinel, and smooth replication planned as a part of 2.8?
>> > On Thu, Dec 27, 2012 at 9:18 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
>> >> Ok, 6 months later I've ported that feature to 2.6 and submitted a pull
>> >> request against 2.6:
>> >>>> > I'll submit a pull request and see what happens. :-)
>> >>>> > Jeremy
>> >>>> > On Wed, Mar 14, 2012 at 11:35 AM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
>> >>>> > > Sorry for the delay on getting back to this issue... Here's what has
>> >>>> > > happened to us a few times (with a bit more detail).
>> >>>> > > We have 10 hosts in each of two data centers (a and b). Let's call the
>> >>>> > > hosts host1a, host1b, host2a, host2b, etc.
>> >>>> > > Every host runs 4 instances of redis-server and has 32GB of RAM. All
>> >>>> > > "b" hosts replicate from "a" hosts, so:
>> >>>> > > host1b:63790 is a slave of host1a:63790
>> >>>> > > host1b:63791 is a slave of host1a:63791
>> >>>> > > host1b:63792 is a slave of host1a:63792
>> >>>> > > host1b:63793 is a slave of host1a:63793
>> >>>> > > And so on with the other 9 pairs.
>> >>>> > > Each redis-server was configured with:
>> >>>> > > And there is no persistence aside from the .rdb files that are created
>> >>>> > > at (1) shutdown or (2) during replication sync.
>> >>>> > > This is an important point: our instances are almost always "full" and
>> >>>> > > we're relying on the LRU to evict data continuously.
>> >>>> > > So, what happens is this:
>> >>>> > > (1) the network between the "a" and "b" hosts becomes interrupted
>> >>>> > > (2) the slaves in "b" lose contact with "a" and eventually time out
>> >>>> > > (3) the slaves in "b" decide to re-sync -- ALL AT ONCE
>> >>>> > > (4) the redis instances in "a" each start to dump their .rdb files
>> >>>> > > (5) since there are several going at once, the dumping to disk is i/o bound
>> >>>> > > (6) the dumping takes longer than it should, which results in more dirty
>> >>>> > >     COW pages
>> >>>> > > (7) the fact that we're always full and evicting keys makes #6 worse
>> >>>> > > (8) the box starts to swap, which makes #7 worse
>> >>>> > > (9) we enter a death spiral which is hard to recover from
>> >>>> > > However, if we resync one instance at a time (we already have external
>> >>>> > > code for this, as I mentioned), this problem doesn't occur and our
>> >>>> > > instances resync pretty quickly.
>> >>>> > > The only other solution, which sucks, is to really lower the maxmemory
>> >>>> > > on our instances quite a bit, but that's a wasteful solution in my eyes.
>> >>>> > > Does this help clarify what we're seeing and why I believe my proposed
>> >>>> > > fix (a non-restarting replication option) would help to prevent it?
>> >>>> > > Thanks,
>> >>>> > > Jeremy
>> >>>> > > On Thu, Feb 23, 2012 at 4:28 AM, Salvatore Sanfilippo <anti...@gmail.com> wrote:
>> >>>> > >> Hello Pedro,
>> >>>> > >> I agree with your analysis. Jeremy's proposal, while it could actually
>> >>>> > >> be useful to mitigate the problem, does not fix the root cause.
>> >>>> > >> I also agree about incremental resync as a solution to many of these
>> >>>> > >> issues.
>> >>>> > >> However I think the implementation of incremental resync should use
>> >>>> > >> the implementation proposed here:
>> >>>> > >> In short it uses the trick of still accumulating the output buffer of
>> >>>> > >> the slaves for some time (or for some space) while the slave is not
>> >>>> > >> connected. Moreover there is a sliding window so that we don't discard
>> >>>> > >> the buffer sent to the slaves but keep it for some time, since a slave
>> >>>> > >> may want to resync from an offset that is already flushed on the
>> >>>> > >> socket.
>> >>>> > >> But back to the root cause for a moment, the problem is: "currently
>> >>>> > >> Redis does not handle well the case when multiple slaves want to sync
>> >>>> > >> at once". I trust you about that, but I would like to understand why
>> >>>> > >> this happens.
>> >>>> > >> I mean, even without partial resync, Redis should handle that better.
>> >>>> > >> Full resync should just be slower, but not a DoS.
>> >>>> > >> Redis is already optimized to do a single BGSAVE on reconnection of
>> >>>> > >> multiple slaves, so what is actually DoSing it? Maybe the multiple
>> >>>> > >> bulk transfers generate too much I/O and we should throttle this
>> >>>> > >> stuff?
>> >>>> > >> Please, if you have some information on this matter and how I can
>> >>>> > >> reproduce it, I would love to insert this fix into 2.6 if possible.
>> >>>> > >> Salvatore
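The sliding-window idea Salvatore describes above (the master keeps the most recent part of the replication stream so a reconnecting slave can resume from an offset) might be sketched as follows. This is a toy illustration under assumptions, not Redis's actual implementation: the class name `ReplicationBacklog`, its methods, and the byte-offset bookkeeping are all hypothetical.

```python
from typing import Optional


class ReplicationBacklog:
    """Toy bounded buffer holding the tail of a replication stream."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity   # maximum number of bytes retained
        self.buf = bytearray()     # most recent bytes of the stream
        self.end_offset = 0        # total bytes ever fed into the stream

    @property
    def start_offset(self) -> int:
        """Stream offset of the oldest byte still held in the buffer."""
        return self.end_offset - len(self.buf)

    def feed(self, data: bytes) -> None:
        """Append new stream data and discard bytes that fall out of the window."""
        self.buf += data
        self.end_offset += len(data)
        overflow = len(self.buf) - self.capacity
        if overflow > 0:
            del self.buf[:overflow]

    def read_from(self, offset: int) -> Optional[bytes]:
        """Return everything from `offset` onward, or None if that offset is
        outside the window (meaning a full resync would be needed)."""
        if offset < self.start_offset or offset > self.end_offset:
            return None
        return bytes(self.buf[offset - self.start_offset:])
```

A slave that remembers the last offset it processed would ask for `read_from(offset)` on reconnect and fall back to a full resync only when the answer is `None`.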
On Fri, Dec 28, 2012 at 12:07 AM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
> Yeah, that's pretty much my thinking. We need (and use) this feature
> already. It keeps us from DoSing ourselves when the link between our
> datacenters fails and 8 redis instances on each box try to resync at the
> same time and the OOM killer gets busy on the master nodes--definitely not
> fun.
> Jeremy
>>> >>>> > >> On Thu, Feb 23, 2012 at 12:18 PM, Pedro Melo <m...@simplicidade.org> wrote:
>>> >>>> > >> > Hi,
>>> >>>> > >> > On Wed, Feb 22, 2012 at 10:11 PM, Jeremy Zawodny <Jer...@zawodny.com> wrote:
>>> >>>> > >> >> The problem is that occasionally the WAN link between the data
>>> >>>> > >> >> centers is interrupted and the slaves in data center B decide they
>>> >>>> > >> >> need to re-sync with their masters. Unfortunately, *all* the
>>> >>>> > >> >> instances on each slave try to do this AT THE SAME TIME and that
>>> >>>> > >> >> causes too much stress on the masters. Effectively, the masters are
>>> >>>> > >> >> DoSd by the slaves all re-syncing at the same time.
>>> >>>> > >> > I understand the problem this is causing your system but I believe
>>> >>>> > >> > the solution you are presenting is targeting the symptom and not
>>> >>>> > >> > the root cause of the problem.
>>> >>>> > >> > There is one event here that triggers a chain of two
>>> >>>> > >> > problems/symptoms:
>>> >>>> > >> > * Event: connectivity loss between master and slave;
>>> >>>> > >> > * Problem 1: slave needs full re-sync with master;
>>> >>>> > >> > * Problem 2: N slaves doing this at the same time will