Using Redis cluster with no slaves


Paul S

Aug 14, 2015, 3:24:58 AM
to Redis DB
I tried to set up a Redis (3.0.2) Cluster of 3 nodes with masters only. I don't need data replication, and therefore no synchronization to slaves (but I would like to use the Redis Cluster/Sentinel features and be ready to enable replication if needed).
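
For reference, this is roughly how I created it (a sketch; the ports and paths are just examples, using the redis-trib.rb script that ships with 3.0.x):

    # one redis.conf per node with cluster-enabled yes, then start the masters
    redis-server ./7000/redis.conf &
    redis-server ./7001/redis.conf &
    redis-server ./7002/redis.conf &

    # create the cluster with no replicas; the 16384 hash slots are split
    # evenly across the three masters
    ./redis-trib.rb create --replicas 0 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002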

I killed one of the masters, hoping to see the other 2 master nodes start serving the failed node's key slots. It didn't happen. The cluster is in a failed state and doesn't recover.
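
The state looks roughly like this on the surviving nodes (output trimmed; the port is just an example):

    $ redis-cli -p 7000 cluster info
    cluster_state:fail
    cluster_slots_assigned:16384
    ...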

Is this expected behavior? Is there a way to have a viable Redis cluster with masters only?

Thank you,
Paul

Jan-Erik Rediger

Aug 14, 2015, 3:54:55 AM
to redi...@googlegroups.com
Yes, this is expected behavior.
Only slaves can take over, as they have a replica of the same slots.
Other masters serve different slots and therefore can't take over for
the slots of the killed master.
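To illustrate (node IDs, ports, and timestamps are made up): CLUSTER NODES shows that each master owns a disjoint slot range, and nothing else holds a copy of those slots unless a slave is attached.

    $ redis-cli -p 7000 cluster nodes
    e7d1... 127.0.0.1:7000 myself,master - 0 0 1 connected 0-5460
    9f23... 127.0.0.1:7001 master - 0 1439500000000 2 connected 5461-10922
    a4c8... 127.0.0.1:7002 master - 0 1439500000000 3 connected 10923-16383
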

There is no way to have a viable Cluster without attached replicas.

Javier Guerra Giraldez

Aug 14, 2015, 11:32:00 AM
to redi...@googlegroups.com, Jan-Erik Rediger
On Fri, Aug 14, 2015 at 2:54 AM, Jan-Erik Rediger <jan...@fnordig.de> wrote:
> Only slaves can take over, as they have a replica of the same slots.
> Other masters serve different slots and therefore can't take over for
> the slots of the killed master.


I guess Paul is OK with losing 33% of the stored data and continuing to work at
reduced capacity. Maybe the whole cluster is only used as a cache?

--
Javier

Paul S

Aug 15, 2015, 12:18:28 AM
to Redis DB
Thank you for your replies.

Yes, it is intended to be a fast cache, not a data store, and losing 1 shard of data is OK.

sha...@peer5.com

Aug 17, 2015, 9:31:35 AM
to Redis DB
I'm really interested in this feature as well, because when Redis is used as a cache you don't care about data loss.

Paul S

Aug 17, 2015, 11:49:03 PM
to Redis DB
It would be really nice if one could turn replication on/off via configuration.

The Baldguy

Aug 18, 2015, 1:33:55 AM
to Redis DB
I've worked with many who *used* to think that way. To say that you don't care about data in a cache is to treat a minor case as the general one. Take, for example, using Redis as a cache to ease load on a database, such as an SQL one. When you get to the point where the cache truly matters and you suddenly have no data in it, you can easily watch your SQL servers fall over because they can't handle the load you were using the cache to absorb. This has taken out major sites, and continues to do so. I've also seen the same behavior in the "oh we have issues, just flush the cache" mindset, which produces the same problem for the same reasons. The cache gets cold and the backend databases overheat.

It isn't the data which often matters in a caching scenario but the *availability* of that data and the load you are avoiding by using the cache. If the load is low enough that tossing your cache entirely is a reasonable option, I'd question the need for the cache in the first place. If you know or expect that such load will require the use of a cache, you should realize that the loss of the cache will mean a cascading failure is highly likely to occur. When that happens you will wish you had replication. ;)

Frankly if you truly have no concern for backend servers (perhaps you are only caching transient generated data), and don't care for replication, your complexity level is severely lowered by using a client library which does the sharding and simply throwing redis instances at it. Alternatively, you can use twemproxy which essentially does that for you and doesn't need replication, redis cluster, or hobbled variants of Redis Cluster.
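
For illustration, a minimal twemproxy pool along those lines might look like the following (the pool name and addresses are made up; check the nutcracker docs for the exact options):

    cache:
      listen: 127.0.0.1:22121
      hash: fnv1a_64
      distribution: ketama        # consistent hashing across the pool
      redis: true                 # speak the Redis protocol to the backends
      auto_eject_hosts: true      # temporarily drop a dead node instead of failing the pool
      server_retry_timeout: 30000
      server_failure_limit: 3
      servers:
        - 127.0.0.1:6379:1
        - 127.0.0.1:6380:1
        - 127.0.0.1:6381:1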

Cheers,
Bill

Paul Sydell

Aug 18, 2015, 11:41:16 AM
to redi...@googlegroups.com
Good point about overloading backend resources, Baldguy. Perhaps a few more details will explain better what I'm after.

Our use case is a fast cache, not a data store. Performance is the first priority; data availability comes next.
Currently, we use 30+ standalone Redis servers and our clients do the sharding using consistent hashing. No proxies, to avoid an extra network hop. In case of a single host failure we lose 1/30th of the cached data. Overall, the load is evenly spread, no major spikes.
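
Conceptually, the client-side sharding is something like this (a minimal sketch, not our actual code; the node addresses and hashing details are simplified):

    import bisect
    import hashlib

    import redis  # redis-py, as an example client library


    class ConsistentHashPool:
        """Map each key onto one of N standalone Redis nodes via a hash ring."""

        def __init__(self, nodes, vnodes=100):
            self._ring = []  # sorted list of (hash point, node) pairs
            self._clients = {n: redis.StrictRedis(host=n[0], port=n[1]) for n in nodes}
            for node in nodes:
                for i in range(vnodes):
                    self._ring.append((self._hash("%s:%s:%d" % (node[0], node[1], i)), node))
            self._ring.sort()

        @staticmethod
        def _hash(value):
            return int(hashlib.md5(value.encode()).hexdigest(), 16)

        def client_for(self, key):
            # first ring point clockwise from the key's hash, wrapping around
            idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
            return self._clients[self._ring[idx][1]]


    pool = ConsistentHashPool([("10.0.0.%d" % i, 6379) for i in range(1, 31)])
    pool.client_for("user:1000").set("user:1000", "cached-value")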

I started looking at Redis Cluster as an alternative to our existing deployment, to introduce replication for data consistency.
There are two concerns I have:
1. Performance - I haven't tested it myself yet, but some people report noticeable throughput/latency degradation compared to a single instance. I understand this as a compromise between consistency and performance, and I hoped to run Redis Cluster without replication for now to get cluster performance on par with standalone instances.
2. The whole cluster is unresponsive during failovers (not just the affected master/slave pair) - this may be a showstopper for us. I tested a cluster with 3 masters and 3 slaves on a single host, with continuous set/get operations from a single cluster client; the test loop is sketched below. I see thousands of operations fail while the failover takes place after killing a master, even though 2/3 of the keys in this test case are routed to the other 2 masters. This is what would cause the spike on other resources you brought up, and it could cause a lot of grief under high load.
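
Roughly, the test loop is this (a sketch using the redis-py-cluster package; the API is from memory, so details may differ, and the port is just an example):

    from rediscluster import StrictRedisCluster  # redis-py-cluster package (assumed)

    rc = StrictRedisCluster(startup_nodes=[{"host": "127.0.0.1", "port": 7000}],
                            decode_responses=True)

    ok = errors = 0
    while True:
        key = "key:%d" % (ok + errors)
        try:
            rc.set(key, "value")
            rc.get(key)
            ok += 1
        except Exception:
            # these pile up while a master is down and the failover is running,
            # even for keys that hash to the surviving masters
            errors += 1
        if (ok + errors) % 10000 == 0:
            print("ok=%d errors=%d" % (ok, errors))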

Any suggestions?

Bill Anderson

Aug 18, 2015, 10:41:40 PM
to redi...@googlegroups.com
On Aug 18, 2015, at 10:41, Paul Sydell <psy...@gmail.com> wrote:

> Good point about overloading backend resources, Baldguy. Perhaps a few more details will explain better what I'm after.
>
> Our use case is a fast cache, not a data store. Performance is the first priority; data availability comes next.
> Currently, we use 30+ standalone Redis servers and our clients do the sharding using consistent hashing. No proxies, to avoid an extra network hop. In case of a single host failure we lose 1/30th of the cached data. Overall, the load is evenly spread, no major spikes.
Nice. :)

> I started looking at Redis Cluster as an alternative to our existing deployment, to introduce replication for data consistency.
A reasonable exploration. 

> There are two concerns I have:
> 1. Performance - I haven't tested it myself yet, but some people report noticeable throughput/latency degradation compared to a single instance. I understand this as a compromise between consistency and performance, and I hoped to run Redis Cluster without replication for now to get cluster performance on par with standalone instances.

It is unlikely you'll get on-par performance. Cluster does the sharding work your clients are currently doing, so server-side performance will necessarily be lower on that basis alone - replication or not. There is also the topology management and tracking, which carries a small penalty a single instance won't have. This would be the case even without replication.

How large that penalty is I can't yet say, though I am working on quantifying it. But if you need high throughput and low latency, I'm certain you'll see it; how much it would affect your systems I can not say. There should be some trade-off from not running slot calculations client-side, but I doubt it is enough to counter the full effect (after all, clients are often more concurrent than a server).

> 2. The whole cluster is unresponsive during failovers (not just the affected master/slave pair) - this may be a showstopper for us. I tested a cluster with 3 masters and 3 slaves on a single host, with continuous set/get operations from a single cluster client. I see thousands of operations fail while the failover takes place after killing a master, even though 2/3 of the keys in this test case are routed to the other 2 masters. This is what would cause the spike on other resources you brought up, and it could cause a lot of grief under high load.

Agreed, that doesn't sound good. That said, I'd try the test on separate machines, or at least separate VMs. Doing this type of testing on a single machine sometimes introduces issues and errors that don't happen across multiple machines. It certainly needs more examination and, if possible, logs. I'd be up for trying to reproduce it so we can get to the bottom of it.


> Any suggestions?

Frankly, from what you have described I'd retain the client-side hashing/sharding for now, but add a Sentinel constellation and use Sentinel-based connection management if and when you want or need the availability. Indeed, if you design it right you can use Sentinel for "hash slot" discovery, meaning you could have expandable sharding by mapping slots to the name used to define the pod in Sentinel. Of course, with some judicious coding you could do that without replication at all. I've done it and it works out pretty nicely.
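
As a rough sketch of the connection-management side with redis-py (the Sentinel addresses and the pod name "shard-01" are made up):

    from redis.sentinel import Sentinel

    # the sentinel constellation; addresses are examples
    sentinel = Sentinel([("10.0.0.50", 26379),
                         ("10.0.0.51", 26379),
                         ("10.0.0.52", 26379)],
                        socket_timeout=0.5)

    # "shard-01" is a hypothetical pod name from sentinel.conf; the client asks
    # Sentinel which node is currently the master for that pod
    master = sentinel.master_for("shard-01", socket_timeout=0.5)
    master.set("user:1000", "cached-value")

    # once replication is enabled, reads can be sent to a slave of the same pod
    slave = sentinel.slave_for("shard-01", socket_timeout=0.5)
    slave.get("user:1000")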

Using diskless replication should minimize replication-based latency, which should mostly only show up while an unsynced slave is connecting.
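
(That's the repl-diskless-sync setting in redis.conf; the delay value below is just an example:)

    # stream the RDB to slaves over the socket instead of writing it to disk first
    repl-diskless-sync yes

    # wait a few seconds so slaves reconnecting at the same time can share one sync
    repl-diskless-sync-delay 5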

Alternatively, and if your command set allows, I'd look at using twemproxy. It is a proxy, yes. However, you might be able to compensate for the hop latency by running the proxies in a one-to-one ratio with the backend nodes and using some form of load balancing so each one serves a smaller share of clients, possibly even running a twemproxy local to each Redis node and talking to its local instance over a Unix socket (I seem to recall twemproxy adding that capability). But it wouldn't necessarily be a good path to replication.

That said, I think your best bet is to improve your node management and retain client-side hashing for now, while measuring the direct effects of replication in a basic pod setup and comparing them to your requirements. Ultimately, replication will be needed to meet the availability requirement, so minimizing its overhead is your best case.

Perhaps with effort we can isolate and quantify the reported degradation in performance and either correct it or at least clarify when it is a consideration. And of course, isolate and correct the problem you're seeing with the cluster failure. 

Cheers,
Bill





Paul Sydell

Aug 20, 2015, 12:44:16 AM
to redi...@googlegroups.com
Bill,
Thanks for the input. Good points.
I'll test a cluster setup on 3 hosts. I also suspect my test client code isn't recovering properly when one of the masters goes down. Will fix it and let you know.
Paul

xusha...@163.com

Aug 24, 2015, 10:41:40 AM
to Redis DB
Of course, it's expected behavior! If you want to use redis-cluster with masters only, you have to ensure that all of the masters are always running well. If some of the masters go offline, you then have to migrate the slots held by the failed nodes to the remaining masters so that you can keep using the cluster; otherwise, the cluster stays in the failed state.
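
For example, something along these lines could hand the lost slots to a surviving master (the node ID and slot range are made up, the data in those slots is gone either way, and I have not verified the exact procedure on 3.0.x):

    # make the surviving nodes forget the dead master (ID is an example)
    redis-cli -p 7000 cluster forget 9f23ab...

    # claim the now-unowned slot range on one of the remaining masters
    redis-cli -p 7000 cluster addslots {5461..10922}

    # or let redis-trib try to repair slot coverage automatically
    ./redis-trib.rb fix 127.0.0.1:7000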


On Friday, August 14, 2015 at 3:24:58 PM UTC+8, Paul S wrote: