Redis multi master proof of concept (instead of cluster) feedback?


Adam Liverman

Sep 27, 2014, 12:26:43 PM
to redi...@googlegroups.com
I've been playing around with Redis (Linux and MSOpenTech), with monitoring, replication, etc., and decided to try a proof of concept.

The concept was multi-master async replication.

Initial tests seem to work quite well, with pretty decent performance. 

Node1, Node2, Node3

A write to any node will be replicated to all. Circular replication is handled, as long as you follow a few simple rules.

Writes on my test machines drop from around 70k/s to about 58k/s, so there is a penalty.

This is horribly alpha, but I am encouraged by the results at least.

I have planned out node downtime (using the buffering Redis already does) and new-node/replacement handling (clone/manipulate the buffer on the nodes).

Key expires are also replicated, though they do have an issue with node downtime/replacement. I have an idea or two on how to mitigate that.

Note, I've only tried simple payloads so far.

Conflict resolution would be handled similarly to how you handle it in MySQL master/master replication.

For any concurrency issues (if they bother your application), create three pools with a load balancer (with custom monitors):

Redis-Read
Redis-Write
Redis-All

Redis-Write would be a failover pool, so Node1 -> Node2 -> Node3.
Redis-Read would give all nodes the same priority, or something like Node2|Node3 -> Node1 (since Node1 is handling writes).

Anything you need to read immediately after you write goes to "Redis-Write" (for both the read and the write).

For slower-moving data that you want to read, or when concurrency/latency isn't an issue for your writes, go to "Redis-Read" or "Redis-All" depending on your situation.
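
To make the three-pool idea concrete, here is a minimal client-side sketch of the routing rules described above, assuming redis-py and placeholder hosts node1/node2/node3; in a real deployment the pools would sit behind load balancer virtual IPs rather than inside the client.

import random
import redis

# Hypothetical node addresses; in practice these would be the
# load balancer VIPs for the Redis-Write / Redis-Read / Redis-All pools.
NODES = {
    "node1": redis.Redis(host="node1", port=6379),
    "node2": redis.Redis(host="node2", port=6379),
    "node3": redis.Redis(host="node3", port=6379),
}

# Redis-Write: strict failover order Node1 -> Node2 -> Node3.
WRITE_ORDER = ["node1", "node2", "node3"]
# Redis-Read: prefer the nodes that are not handling writes.
READ_POOL = ["node2", "node3"]

def write_node():
    """Return the first reachable node in the failover order."""
    for name in WRITE_ORDER:
        try:
            NODES[name].ping()
            return NODES[name]
        except redis.ConnectionError:
            continue
    raise RuntimeError("no write node available")

def read_node(consistent=False):
    """Consistent reads go to the write node; others to the read pool."""
    if consistent:
        return write_node()
    return NODES[random.choice(READ_POOL)]

# Usage: read-your-own-writes goes through the write pool.
write_node().set("user:42:name", "adam")
print(read_node(consistent=True).get("user:42:name"))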

Feedback? Is this worth spending my time on finishing?




Josiah Carlson

Sep 28, 2014, 1:28:04 AM
to redi...@googlegroups.com
If you could make it work, someone would likely pay you money for it. Hell, I'd buy it. :)

 - Josiah


Paul L

Sep 29, 2014, 4:46:55 AM
to redi...@googlegroups.com
Combined with a layer 3 or layer 4 load balancer, you could really be onto something. If you get it working, I would use this over Redis Cluster.

Dvir Volk

Sep 29, 2014, 5:02:30 AM
to redi...@googlegroups.com
It can also be combined with a hash-ring-based partitioning client that balances writes per key rather than randomly, making the chance of conflicts virtually non-existent unless there's some network issue.
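
As a rough illustration of the idea (a toy consistent-hash ring, not an existing library), so that a given key always routes to the same node:

import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring: each key always maps to the same node,
    so writes for a given key normally land on a single master."""

    def __init__(self, nodes, replicas=100):
        self.ring = []          # sorted list of (hash, node) points
        for node in nodes:
            for i in range(replicas):
                self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def node_for(self, key):
        idx = bisect.bisect(self.keys, self._hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node1", "node2", "node3"])
print(ring.node_for("user:42"))   # always the same node for this key
print(ring.node_for("user:43"))   # possibly a different node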

Salvatore Sanfilippo

Sep 29, 2014, 5:10:59 AM
to Redis DB
Hello Adam,

"multi-master" is just a different way to say: a distributed system
where you can read/write to multiple nodes.

I wonder how the data is replicated: synchronously or asynchronously; what happens during netsplits; how conflicts are resolved on rejoin; what the safety and liveness properties of the system are; and so forth.

Regards,
Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

"One would never undertake such a thing if one were not driven on by
some demon whom one can neither resist nor understand."
— George Orwell

Dvir Volk

Sep 29, 2014, 7:17:00 AM
to redi...@googlegroups.com
Salvatore, isn't it possible to just tune the parameters of a redis cluster to essentially function as N monolithic masters replicating everything to everyone?


Salvatore Sanfilippo

Sep 29, 2014, 11:11:52 AM
to Redis DB
On Mon, Sep 29, 2014 at 1:16 PM, Dvir Volk <dv...@everything.me> wrote:
> Salvatore, isn't it possible to just tune the parameters of a redis cluster
> to essentially function as N monolithic masters replicating everything to
> everyone?

It is basically not possible without incurring huge limitations.
Let's start with a simple model: we have masters A, B, C, and each one replicates to the others asynchronously.
Let's assume no partition is possible and everything is perfectly
connected, no process ever fails, and so forth.

Client C1 writes to A: RPUSH mylist first
... nothing happens for a few seconds ...
Client C1 writes to A: RPUSH mylist foo
Client C2 writes to A: RPUSH mylist bar

There are only two ways to fix this consistency issue:

1) Implement blocking replication and an agreement protocol: not
viable in the case of Redis for latency concerns.
2) Implement a continuous form of conflict resolution, since in the proposed multi-master setup there is not even, AFAIK, a "preferred" master for a given subset of keys. So you can have Client1 appending elements to the list mylist on Node1, and Client2 appending elements to the list mylist on Node2. Basically you need a lot of metadata per element, and you have to continuously fix the generated conflicts.

There is very little to gain from a setup like that. If you want a convergent system with asynchronous replication, it is in any case much better that the preferred node is a single one, and that it is used as long as no failures/partitions are happening right now. Merging values to converge again should be the exception, reserved for inconsistencies created by failures.

So if you tell me you want to fork Redis in order to change the merge function of Redis Cluster, so that instead of "the key of the latest elected master wins" it becomes "keys are merged using CRDTs or whatever", I understand your effort. But multi-master as in "all the nodes have the same data, and each node is *normally* able to reply to any kind of query, be it a write or a read" does not make sense in the context of asynchronous replication, and it even makes things more complex in the context of synchronous replication (for example, Raft has a single leader per epoch, and a new one is elected only if the old leader is not available).

That said, it is fun to talk about distributed systems and the different designs, but my original message was really a call to state at least the basics of what you are doing, if you are proposing to change or implement a distributed system, in terms of availability and safety guarantees: how is data replicated? What happens during failures? How are writes processed? And so forth.

Salvatore

Salvatore Sanfilippo

Sep 29, 2014, 11:12:55 AM
to Redis DB
On Mon, Sep 29, 2014 at 5:11 PM, Salvatore Sanfilippo <ant...@gmail.com> wrote:
> Client C1 writes to A: RPUSH mylist first
> ... nothing happens for a few seconds ...
> Client C1 writes to A: RPUSH mylist foo
> Client C2 writes to A: RPUSH mylist bar

Sorry, here I'm assuming that B and C get the foo and bar elements in inverted order, so you end up with "first foo bar" and "first bar foo".
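
As a toy illustration of that divergence (plain Python lists standing in for replicas B and C; no real replication involved):

# Each replica applies the replicated RPUSH operations in the order
# it happens to receive them.
def apply(replica, ops):
    for _, value in ops:
        replica.append(value)      # RPUSH appends at the tail
    return replica

ops_seen_by_b = [("RPUSH", "first"), ("RPUSH", "foo"), ("RPUSH", "bar")]
ops_seen_by_c = [("RPUSH", "first"), ("RPUSH", "bar"), ("RPUSH", "foo")]

b = apply([], ops_seen_by_b)
c = apply([], ops_seen_by_c)

print(b)  # ['first', 'foo', 'bar']
print(c)  # ['first', 'bar', 'foo']  -- the replicas have silently diverged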

Ajit N Deshpande

Sep 29, 2014, 11:47:55 AM
to redi...@googlegroups.com
FWIW - circular replication in a ring would be very welcome for my implementations (a cache in front of MySQL with 4 Redis nodes in 2 different DCs). The minor performance impact would be acceptable. True multi-master, as in each node in a cluster replicating to/from every other node, is not necessary.

Thanks,
Ajit

Salvatore Sanfilippo

Sep 29, 2014, 12:02:21 PM
to Redis DB
Ring replication, if every master is supposed to be able to process writes, has exactly the same issue. One node may initiate a non-commutative write targeting a given key before an earlier write performed on another node, targeting the same key, has been circularly replicated to it.

Ajit N Deshpande

Sep 29, 2014, 12:09:57 PM
to redi...@googlegroups.com
And I would be OK with those risks. We use a similar topology for the back-end MySQL systems, and the consistency trade-offs we make for the availability are acceptable. So, if it's technically not too complex to support a ring topology, with the only caveat being the lack of strict guarantees about consistency, then I would love for Redis to support it. Even if it means that users need to configure a giant "i_know_what_i_am_doing = 1" in the config file.

Thanks,
Ajit

echo ma

Sep 29, 2014, 9:50:04 PM
to redi...@googlegroups.com
I want this so much; I would like to use multi-master Redis rather than Redis Cluster.

On Sunday, September 28, 2014 at 12:26:43 AM UTC+8, Adam Liverman wrote:

Adam Liverman

Sep 29, 2014, 11:56:30 PM
to redi...@googlegroups.com
Wow, this became a bit more than I was expecting. Thanks for weighing in, Salvatore; I do appreciate it.


Soo... for my reply, hopefully I can answer your questions. 

 (I put this together fairly quickly, so please forgive any errors, and I hope it looks correct once I post)


Network Partition / Split Brain
Conflict resolution / Concurrency issues  / ACID 
Failure resolution



Network Partition /Split brain
--------------------------------------

There is just no good way to deal with this, IMO; it is always messy/complicated. To get around it I've seen attempts where there is a cable connecting the two boxes for the heartbeat/information exchange.

While this is a good way to make splits unlikely, it is not a very scalable approach.

The way we tend to do it is:

Switch1-----------Switch2

A                        A
B                        B
C                        C

Network cards are set up in a failover configuration (two ports).

All traffic for A, B, and C goes through Switch 1 unless Switch 1 dies, or the network card does; then it goes through Switch 2.
DNS is always done via the hosts file (if not just hardcoded IP addresses).

If Switch 2 dies, oh well.
If Switch 1 dies, fail over to Switch 2.

If a port on Switch 1 dies, that client will route to Switch 2, which will still be able to connect to the members.

So my idea is to just make a network partition unlikely to happen (though it still can, which would be painful).


Conflict resolution / Concurrency issues  / ACID 
----------------------------------------------------------------
Another one of the crap situations. The only way to deal with this is to get the equivalent of a shared lock (on multiple nodes this is somewhat of a pain), write your data, and release the lock.

In this case, for the most part I'm ignoring it and mainly dealing with it via application logic and load balancer rules (dealing with it properly would be a complicated and slow process).

I did say in my original post that if concurrency mattered to you, you would have to have a write pool in your load balancer (I use F5 load balancers at work, so I'm used to having them available).

(Forgive the drawing; I made a quick image off a Google Docs drawing, which I am unaccustomed to using.)
(FYI, a note on the drawing: you see a "Redis-Replicator" outside the main process, doing status notifications to the load balancer and handling the replication. I did that to keep the modifications to Redis at a bare minimum.)

In this illustration Node1 is the primary write node; Node2 would be used in a failover situation only. When Node1 is flagged out of production, all connections are immediately sent a reset or what not, and Node2 becomes the only available write node.

For *consistent* reads you would go to the primary write pool, as would all writes that you want to be 'consistent' (sometimes you just don't care, such as when incrementing a counter or what not).
For "eh, close enough" data, reads can go to the 'read pool'.

This is very similar to the current "master -> slave" setup Redis supports already, except Node2 is already a master on Node1's failure, and Node1 is already a slave when it comes back up.

When Node1 comes back, it won't let itself come back into production unless it is 'up to date' with Node2.
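
As a sketch of how such an 'up to date' check might be wired into a custom load balancer monitor, assuming the hypothetical Redis-Replicator publishes the offset it has applied into a key (an invented convention for illustration, not an existing Redis or Redis-Replicator API):

import redis

def applied_offset(client):
    """Assumption: the hypothetical Redis-Replicator stores the offset it
    has applied under this key. Stock Redis only exposes its own
    master_repl_offset in INFO replication."""
    value = client.get("replicator:applied_offset")
    return int(value) if value is not None else 0

def ready_to_rejoin(returning, current_writer, max_lag=0):
    """The monitor only flags the returning node 'OK' once it has
    caught up with the node currently taking writes."""
    lag = applied_offset(current_writer) - applied_offset(returning)
    return lag <= max_lag

node1 = redis.Redis(host="node1", port=6379)
node2 = redis.Redis(host="node2", port=6379)

if ready_to_rejoin(returning=node1, current_writer=node2):
    print("node1 can be flagged back into the write pool")
else:
    print("node1 still catching up; keep it out of production")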

Now, is there an ability to lose consistency? You bet. If you don't follow the rules it could be very bad for you, but if you follow the rules, for the *most* part things should stay consistent. (I've worked with MySQL master/master for a while now and understand it can have some gotchas that can bite you if you are not careful and don't design your application around them.)





Failure resolution/ Replacing node
--------------------------------------------------------------

Nodes have buffers for each "partner node". As long as a buffer isn't exhausted, everything is fine. I.e., bring a node down for patching, then bring it back up: it will stay out of production due to the status monitor, but start getting messages from the other nodes
(generally this will only be from the write node). If a node is out so long that it exhausts its buffer, that node is 'voted out' by one of the other two nodes and will not be allowed to flag 'OK' without manual intervention.

If Node1 *dies* while Node3 is out, Node3 will not be able to 'rejoin' when it comes back, since one of the other nodes is missing as well, unless there is manual intervention. IMO this is almost a worst-case scenario, as you will have to bring down Node2, copy data to Node3, and reset replication. This is downtime for whatever application there is. It can possibly be mitigated by writing the buffers to disk, though that would slow down writes considerably. Copying a flat file from one node to another shouldn't take that long, though.
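
A rough sketch of the per-partner buffer idea described above (the names, the size limit, and the 'voted out' flag are illustrative assumptions about the proposed design, not existing Redis behavior):

from collections import deque

class PartnerBuffer:
    """Bounded buffer of pending operations for one partner node.
    If the partner stays down long enough to overflow the buffer,
    it is marked as needing manual intervention (a full resync)."""

    def __init__(self, partner, max_ops=1_000_000):
        self.partner = partner
        self.max_ops = max_ops
        self.pending = deque()
        self.exhausted = False

    def enqueue(self, op):
        if self.exhausted:
            return                      # partner already 'voted out'
        if len(self.pending) >= self.max_ops:
            self.pending.clear()
            self.exhausted = True       # requires clone + manual rejoin
        else:
            self.pending.append(op)

    def drain(self, send):
        """Replay buffered ops to the partner once it is reachable again."""
        while self.pending:
            send(self.pending.popleft())

buf = PartnerBuffer("node3")
buf.enqueue(("SET", "k", "v"))
buf.drain(send=print)    # would normally forward ops over the network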


And as a little status update: I've replicated pretty much every type of payload (string/binary).

After a little work, and with pipelining, I can get:

Node1 <= 186k/s running a benchmark of 1 million SET operations

Redis-Replicator <= 100k/s replicating while Node1 is receiving writes, then 220+k/s once the writes are done (I need to investigate this)

Node2 <= receives at 100k/s, then 220k/s


Now, this was only a small weekend project, and I am here to learn; if what I am trying to do will ultimately fail, I'd rather hear it from the experts and not waste my time :)

Salvatore Sanfilippo

Sep 30, 2014, 10:54:38 AM
to Redis DB
We are forced to assume crappy networks and likely partitions, since people run stuff on virtualized cloud environments where we have no actual control over what is going to happen. So I understand there are real-world ways to limit the probability that a particularly dangerous partition happens, but relying on them would seriously limit the applicability of the solution.

But after reading this, my big question is: why not Redis Cluster instead? You get better scalability, since it offers automatic sharding between N masters; it provides protection against network partitions, because isolated masters, even in "odd" partitions where they get isolated together with writing clients, have a time limit for accepting writes and become unavailable ASAP; and it has clear semantics about what happens to data (the master with the greater configEpoch, which is the last one that failed over a given slot, eventually wins).

Salvatore


"Fear makes the wolf bigger than he is."
       — German proverb

Adam Liverman

Sep 30, 2014, 11:15:11 PM
to redi...@googlegroups.com
Q: my big question is, why not Redis Cluster instead?

I could have a lack of understanding, so if I am wrong please correct me!

Redis Cluster 
--------------------
From my understanding, Redis Cluster works similarly to how hard drives work in RAID.

For the sake of keeping it simple, assume only six slots, each node having 1 GB of RAM, and only two nodes.

Raid-0 configuration.

A (1,2,3)  - 1g
B (4,5,6)  - 1g

Total RAM = 2G

Connect to A with a key that is slotted for B, and you get a redirect to B; connect to B and give it the information.
Connect to B looking for a key in A, get a redirect to A, and get the information from A.

This assumes you have full connections to A/B

Lose B, and the cluster is lost.
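
For reference, real Redis Cluster picks the owning node by hashing the key into one of 16384 slots (CRC16 of the key, modulo 16384). A simplified sketch of the toy six-slot layout above, using CRC32 from the standard library in place of the real CRC16:

import zlib

# Toy layout from the example above: six slots split across two nodes.
SLOT_OWNERS = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
NUM_SLOTS = 6   # real Redis Cluster uses 16384 slots and CRC16, not CRC32

def slot_for(key):
    return (zlib.crc32(key.encode()) % NUM_SLOTS) + 1

def node_for(key):
    return SLOT_OWNERS[slot_for(key)]

# A client connected to node A asking for a key owned by B gets a
# -MOVED redirect and must reconnect to B, hence the need for direct
# access to all nodes.
for key in ("user:1", "user:2", "user:3"):
    print(key, "->", node_for(key))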

This setup has no redundancy, so we go to the next level, Raid 10.
 
Raid-10 configuration.

A   (1,2,3) - 1g
A1 (1,2,3)

B   (4,5,6)  - 1g
B1 (4,5,6)


Total Ram - still 2G

This will require 4 OS instances (or it should; I am not worried about an instance crashing so much as hardware failure).

Positives: more nodes, more RAM. Conflict resolution is minimized, as keys stay on one node, where the single-threaded nature can keep things consistent.
(Again, correct me if I am wrong.)
Negatives: 4 instances are needed to be redundant; you need direct access to all nodes (instead of sitting behind a single IP on a load balancer); client-side logic is required to make a lot of the magic work efficiently; and a very large data structure on one key lives only on one node (or set of slaves), so reads/writes can be lopsided.


Master/Master is more like RAID 1.

For sake of simplicity, assume only two nodes.

A(1,2,3,4,5,6)  -1g
B(1,2,3,4,5,6)

Total RAM: 1 GB, but it only requires 2 nodes for redundancy, with the option of even more for availability (2+N nodes).

Positives: can be hosted behind a single IP on a load balancer; fewer nodes for redundancy (three being quite optimal); you can lose all nodes except for one and still be "OK".
Negatives: RAM usage; just like RAID 1 on hard drives, you only have the total space of one of the nodes. Conflict resolution/consistency can be a pain unless you take great care with load balancer pools/application logic, as replication is async
(akin to multi-threading concurrency in normal application programming), and slow networks could be a problem, maybe?

Although the negatives sound bad, most can be mitigated (RAM is cheap; networks can be fast + reliable; conflict resolution depends on your problem domain, and you can use a write pool instead).

The negatives in Cluster are smaller, but are not quite as easy to mitigate?

Redis Cluster seems to be an excellent solution to the problem of redundancy and scalability, though (at least so far) I think a master/master setup may have some merits too.

Example: 

At my work, we have three tiers of servers.

WebServers
----------------
Application Servers
--------------------------
Database Servers

Each tier is fully redundant (2 nodes minimum for any service, usually 3-4+), so that only one node in each tier needs to be alive for everything to be OK. Each layer publishes a virtual server for each given "service" (databases, custom services, website, etc.). Having direct access to nodes is generally a 'no-no', thus a master/master setup in these situations seems more desirable for redundancy.

Your thoughts Salvatore? (again I appreciate your time in this matter)

Matt Stancliff

Oct 1, 2014, 7:54:40 AM
to redi...@googlegroups.com

On Oct 1, 2014, at 4:15 AM, Adam Liverman <live...@gmail.com> wrote:

> This will require 4 OS instances (or it should; I am not worried about an instance crashing so much as hardware failure).
>
> Positives: more nodes, more RAM. Conflict resolution is minimized, as keys stay on one node, where the single-threaded nature can keep things consistent.
> (Again, correct me if I am wrong.)
> Negatives: 4 instances are needed to be redundant; you need direct access to all nodes (instead of sitting behind a single IP on a load balancer); client-side logic is required to make a lot of the magic work efficiently; and a very large data structure on one key lives only on one node (or set of slaves), so reads/writes can be lopsided.

Correct on all counts. Redis Cluster is based around Redis design principles of being fast + accurate + highly available. The only way to guarantee those conditions is through increased exact-copy redundancy.

> Total RAM: 1 GB, but it only requires 2 nodes for redundancy, with the option of even more for availability (2+N nodes).
>
> Positives: can be hosted behind a single IP on a load balancer; fewer nodes for redundancy (three being quite optimal); you can lose all nodes except for one and still be "OK".
> Negatives: RAM usage; just like RAID 1 on hard drives, you only have the total space of one of the nodes. Conflict resolution/consistency can be a pain unless you take great care with load balancer pools/application logic, as replication is async
> (akin to multi-threading concurrency in normal application programming), and slow networks could be a problem, maybe?

Correct again!

> Although the negatives sound bad, most can be mitigated

Let’s see where this goes...

> (RAM is cheap,

Okay, if you run your own hardware (but who does that anymore?). You can buy a 1U machine with 1 TB of RAM for under $20,000 these days.

> networks can be fast+ reliable

I fell off the couch laughing at that one.

Try a refresher course in http://queue.acm.org/detail.cfm?id=2655736 and http://en.wikipedia.org/wiki/Fallacies_of_distributed_computing

> conflict resolution depends on your problem domain and can use write pool instead)

I’m not sure what a write pool is in that scenario?

The _only_ reliable way to have multi-master operations is if each key in your system has a conflict resolution function. [fun fact: git is a distributed system. When the dataset diverges, the conflict resolution policy is a merge commit.]

You can either have simple conflict resolution policies like “last write wins” or more complex ones like CRDTs that say “if two sets diverge, union the sets.”

Then, on top of all that, you need merkle trees to resolve diverging keyspaces and live gossiping of metadata to perform continuous cleanup operations.

By that point, you’ve just reinvented Dynamo/Riak.
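
To make those two policies concrete, here is a minimal sketch of a last-write-wins merge and a grow-only-set merge (real CRDT implementations track considerably more metadata than this):

# Last-write-wins register: each write carries a timestamp, and the
# merge keeps the value with the newest timestamp (ties broken by value).
def lww_merge(a, b):
    """a and b are (timestamp, value) pairs from two diverged replicas."""
    return max(a, b)

# Grow-only set (G-Set): the merge of two diverged replicas is the union.
def gset_merge(a, b):
    return a | b

print(lww_merge((1700000001, "bar"), (1700000002, "baz")))  # newest wins
print(gset_merge({"x", "y"}, {"y", "z"}))                   # {'x', 'y', 'z'}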

> Redis Cluster seems to be an excellent solution to the problem of redundancy and scalability, though (at least so far) I think a master/master setup may have some merits too.

It does have merits, but anything multi-master is also the domain of thousands of research papers and dozens of person-years of actual engineering+debugging work.

> thus a master/master setup in these situations seems more desirable for redundancy

If the integrity of your data matters, Redis Cluster will give you the best results. If only "redundancy of server processes" matters, then you can run a master-master DB without built-in conflict resolution operations; it will eventually corrupt your data silently, but your processes will still remain fully redundant.


-Matt

Adam Liverman

Oct 1, 2014, 3:42:16 PM
to redi...@googlegroups.com
Thank you for your post; I appreciate your feedback, but I think we are missing each other on what I was originally trying to achieve.

Conflict resolution
----------------------------------
Sorry if I haven't been clear. From my first post.

<firstPost>
For any concurrency issues (if they bother your application), create three pools with a load balancer (with custom monitors):

Redis-Read
Redis-Write
Redis-All

Redis-Write would be a failover pool, so Node1 -> Node2 -> Node3.
Redis-Read would give all nodes the same priority, or something like Node2|Node3 -> Node1 (since Node1 is handling writes).

Anything you need to read immediately after you write goes to "Redis-Write" (for both the read and the write).

For slower-moving data that you want to read, or when concurrency/latency isn't an issue for your writes, go to "Redis-Read" or "Redis-All" depending on your situation.
</firstPost>

In *NO* way am I proposing a MASTER/MASTER where it is okay to write to *any* master and expect conflict resolution to take place. It depends on your data and what you are trying to do.
For the most part it is done just like it is done now with MASTER->SLAVE->SLAVE.

For writes, what I am proposing actually looks more like this:

MASTER<->SLAVE1<->SLAVE2

You only write to the master, unless there is a failure/downtime/patching, in which case the load balancer sends writes to SLAVE1.

All reads where "concurrency matters" (i.e., write something and immediately read it) would also go there. You have the option, though, to write to SLAVE1 or SLAVE2 if your application doesn't care (counters, for instance, or unique views of an email, or whatever).

When the master comes back up, it gets the updates it missed and becomes master once more (the load balancer moves traffic back to it).


RAM is Cheap
------------------------

Well, I do believe RAM is cheaper than buying comparable CPU/other hardware.

IE. 

A  - $20,000 1TB RAM
B - $20,000 1TB RAM

vs

A  - ??? 512GB RAM
A1 - ??? 512GB RAM

B - ?? 512GB RAM
B1 ?? 512GB RAM

Both setups have the same total RAM; setup 1 will not be lopsided and can use all of the RAM available.

I do believe setup 2 would cost more (as well as having higher maintenance costs), but I haven't done the math.

Now, I will admit setup 2 will scale a hell of a lot better (if you have crazy requirements).



Network reliability
--------------------------------
Now, I do think this was somewhat taken out of context.

I in no way think networks are reliable... I read the study, and it was like having a study saying "water is wet". Of course it is.

Each device has a certain reliability factor; the more devices you add, the more unreliable the whole becomes. 1+1 = 2.

What I was talking about in an earlier post was network partitions on a much smaller scale: just two switches between 3 Redis nodes.

<previousPost>
Network Partition /Split brain
--------------------------------------

There is just no good way to deal with this, IMO; it is always messy/complicated. To get around it I've seen attempts where there is a cable connecting the two boxes for the heartbeat/information exchange.

While this is a good way to make splits unlikely, it is not a very scalable approach.

The way we tend to do it is:

Switch1-----------Switch2

A                        A
B                        B
C                        C

Network cards are set up in a failover configuration (two ports).

All traffic for A, B, and C goes through Switch 1 unless Switch 1 dies, or the network card does; then it goes through Switch 2.
 
If Switch 2 dies, oh well.
If Switch 1 dies, fail over to Switch 2.

If a port on Switch 1 dies, that client will route to Switch 2, which will still be able to connect to the members.
</previousPost>

Am I wrong in assuming this can be considered reliable?

Having said all this, I am now not sure if I should even continue the project. Not that it cannot be done, as it is actually fairly easy, but there is a possibility of people getting burned by not understanding it and then looking unfavorably on Redis (e.g., like people who sometimes complain about MySQL when they tried to use a non-ACID storage engine, lost data, and decided MySQL sucks... possibly a bad example, but you get the point).

I like the Redis project (though I am new to it) and would love to use it in my production environment, but it has some thorns on it that made it somewhat distasteful, so I was trying to resolve them. :\

Josiah Carlson

Oct 1, 2014, 6:59:50 PM
to redi...@googlegroups.com
Hopping in here for a second...

On Wed, Oct 1, 2014 at 12:42 PM, Adam Liverman <live...@gmail.com> wrote:
> Thank you for your post; I appreciate your feedback, but I think we are missing each other on what I was originally trying to achieve.
>
> Conflict resolution
> ----------------------------------
> Sorry if I haven't been clear. From my first post.
>
> <firstPost>
> For any concurrency issues (if they bother your application), create three pools with a load balancer (with custom monitors):
>
> Redis-Read
> Redis-Write
> Redis-All
>
> Redis-Write would be a failover pool, so Node1 -> Node2 -> Node3.
> Redis-Read would give all nodes the same priority, or something like Node2|Node3 -> Node1 (since Node1 is handling writes).
>
> Anything you need to read immediately after you write goes to "Redis-Write" (for both the read and the write).
>
> For slower-moving data that you want to read, or when concurrency/latency isn't an issue for your writes, go to "Redis-Read" or "Redis-All" depending on your situation.
> </firstPost>
>
> In *NO* way am I proposing a MASTER/MASTER where it is okay to write to *any* master and expect conflict resolution to take place. It depends on your data and what you are trying to do.
> For the most part it is done just like it is done now with MASTER->SLAVE->SLAVE.
>
> For writes, what I am proposing actually looks more like this:
>
> MASTER<->SLAVE1<->SLAVE2
>
> You only write to the master, unless there is a failure/downtime/patching, in which case the load balancer sends writes to SLAVE1.

If there is failure/downtime on the master, you don't want to write to either of the slaves because 1) just because slave X can't reach the master doesn't mean that some other client can't connect to the master, and 2) sane* conflict resolution is not generally possible with the structures that Redis uses and the semantics that Redis guarantees.

Also, as a general 'load balancing' consideration... unless you are running at the limit of your connection capability in Redis, writing to the slave to 'load balance' doesn't make any sense. Because conflict resolution isn't really possible, the commands all have to be executed on the master for their results to make sense (assuming a nonzero replication delay, which is *always* the case), so the slave would just be forwarding calls to the master and buffering the results to the client.

> All reads where "concurrency matters" (i.e., write something and immediately read it) would also go there. You have the option, though, to write to SLAVE1 or SLAVE2 if your application doesn't care (counters, for instance, or unique views of an email, or whatever).

I'm not sure this really makes sense, given the lack of conflict resolution, etc. Unless the intent is to say, "on writes, slaves act like twemproxy forwarding to the master, but allow for local reads".

> When the master comes back up, it gets the updates it missed and becomes master once more (the load balancer moves traffic back to it).

This is never guaranteed.

> RAM is Cheap
> ------------------------
> [snip]
> Both setups have the same total RAM; setup 1 will not be lopsided and can use all of the RAM available.
>
> I do believe setup 2 would cost more (as well as having higher maintenance costs), but I haven't done the math.
>
> Now, I will admit setup 2 will scale a hell of a lot better (if you have crazy requirements).

Costs are a function of provider. Unless you find providers with a setup to compare prices, it's just hand waving (I know of a provider that rents boxes with up to 6 TB of memory, for those truly in the market).


> Network reliability
> --------------------------------
> Now, I do think this was somewhat taken out of context.
>
> I in no way think networks are reliable... I read the study, and it was like having a study saying "water is wet". Of course it is.
>
> Each device has a certain reliability factor; the more devices you add, the more unreliable the whole becomes. 1+1 = 2.
>
> What I was talking about in an earlier post was network partitions on a much smaller scale: just two switches between 3 Redis nodes.

When designing a system to be used in real life, you can't really assume trivial topologies. It's good to start, but you have to operate under the assumption that people are going to try to do really weird stuff, and you are going to have to write something that operates reasonably sane in the face of user insanity.


> <previousPost>
> Network Partition /Split brain
> --------------------------------------
>
> There is just no good way to deal with this, IMO; it is always messy/complicated. To get around it I've seen attempts where there is a cable connecting the two boxes for the heartbeat/information exchange.
>
> While this is a good way to make splits unlikely, it is not a very scalable approach.
>
> The way we tend to do it is:
>
> Switch1-----------Switch2
>
> A                        A
> B                        B
> C                        C
>
> Network cards are set up in a failover configuration (two ports).
>
> All traffic for A, B, and C goes through Switch 1 unless Switch 1 dies, or the network card does; then it goes through Switch 2.
>
> If Switch 2 dies, oh well.
> If Switch 1 dies, fail over to Switch 2.
>
> If a port on Switch 1 dies, that client will route to Switch 2, which will still be able to connect to the members.
> </previousPost>
>
> Am I wrong in assuming this can be considered reliable?

Yes. But not for the reasons you expect. You can't consider it to be reliable because it is an *example* of a possible network topology, and the number of people who are going to be running a reasonable equivalent to that topology in reality is tiny compared to the number of people who are going to be running something completely different. Targeting this one topology as an expectation of what people will be running on is ignoring much/most of the actual client base.

Now if this is *your* topology, and you are building a product for *your* topology, then that is a different story.


> Having said all this, I am now not sure if I should even continue the project. Not that it cannot be done, as it is actually fairly easy, but there is a possibility of people getting burned by not understanding it and then looking unfavorably on Redis (e.g., like people who sometimes complain about MySQL when they tried to use a non-ACID storage engine, lost data, and decided MySQL sucks... possibly a bad example, but you get the point).

People know how regular Redis with slaves works. Some people have been reading about and/or using Redis cluster to know more or less how cluster works. Unless the general semantics of your solution are close to the existing semantics, it will indeed be confusing. But if you can explain what you are doing using the vocabulary already known to Redis users, it might not be confusing.


> I like the Redis project (though I am new to it) and would love to use it in my production environment, but it has some thorns on it that made it somewhat distasteful, so I was trying to resolve them. :\

You may want to start a new thread that describes the problems you were facing. There are more than a few of us that are likely to be able to help you.


 - Josiah

Adam Liverman

Oct 1, 2014, 10:29:37 PM
to redi...@googlegroups.com
Thank you, Josiah; I appreciate your feedback and agree with you in pretty much all cases.

After giving it a lot of thought, I have come to the conclusion that master/master async is probably a bad idea, just due to data inconsistency. Load balancers can help with it, by funneling all clients to only one node at any given time, but there is a period before and after a failover (assuming a perfect network) where there is a chance of inconsistent data pretty much no matter what, due to the async nature. Like running around with a knife in your hand: it will work most of the time, until you kill someone.

As far as I can see, if you want to keep data consistent there are really only two good options (a rough sketch of option 1 follows below):

1) Lock all nodes for each key update (a delay on writes, and subject to issues with network splits/node downtime).
2) Keep all modifiable data on one node, where you can guarantee consistent data.
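
As promised above, a rough sketch of option 1 with redis-py: a simplified, Redlock-style lock taken on a majority of the three nodes (retries, clock drift, and fencing are all glossed over, and the node hostnames are placeholders):

import uuid
import redis

NODES = [redis.Redis(host=h, port=6379) for h in ("node1", "node2", "node3")]

UNLOCK_SCRIPT = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

def acquire(key, ttl_ms=5000):
    """Try to take the lock on a majority of nodes; return a token or None."""
    token = str(uuid.uuid4())
    acquired = 0
    for node in NODES:
        try:
            if node.set(f"lock:{key}", token, nx=True, px=ttl_ms):
                acquired += 1
        except redis.ConnectionError:
            continue
    if acquired > len(NODES) // 2:
        return token
    release(key, token)          # failed to reach a majority, clean up
    return None

def release(key, token):
    for node in NODES:
        try:
            node.eval(UNLOCK_SCRIPT, 1, f"lock:{key}", token)
        except redis.ConnectionError:
            continue

token = acquire("user:42")
if token:
    # ... perform the write on every node while holding the lock ...
    release("user:42", token)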

The rest is just a mess of conflict resolution, which gets even worse if the client is making write decisions based on data it reads.
 
Just as a personal musing: it's interesting when you think about how 1 and 2 share a lot of similarities, as a computer is just a network of different components trying to maintain consistent state using certain shared lock resources. It just works a hell of a lot faster than nodes on a network, and if a part of it goes down, the entire thing goes down (well, most of the time).

Thanks for all the feedback! It seems I should have done a bit more research before I brought this forward.

Ezequiel Lovelle

Oct 2, 2014, 9:16:08 AM
to redi...@googlegroups.com
Maybe an entirely new project that acts as a kind of middleware, writing keys to all nodes (with locking) and load-balancing the read queries?
If that is the scenario, it seems easier to me to write 'smart' clients and use the natural 'slaveof', or to use Redis Cluster instead.

Best regards.
