Writing a redis cluster driver for golang


Brian Picciano

Nov 14, 2014, 1:23:00 PM
to redi...@googlegroups.com
I'm the maintainer for the radix golang library (https://github.com/fzzy/radix), and I'm working on adding a cluster sub-package. You can find that code here:


The support for simple commands is pretty solid, imo. It transparently redirects on MOVED and ASK errors and keeps the slot mapping updated internally. However, I'm wondering what the best course of action is for certain edge cases.

1) For a single command, on a MOVED, I'm currently only updating the slot that command's key hashes to; I'm not doing a CLUSTER SLOTS call and re-discovering the full topology. My reasoning is that if a large slot migration is ongoing (happening one slot at a time), I don't want to call CLUSTER SLOTS before the full change has propagated, since I'd just end up calling it again and again until the topology change actually completes, which could be expensive. Doing it my current way costs an extra call per slot that has changed, but I feel that hit is negligible, given the relatively small number of slots, and it's a cleaner solution. Am I wrong in this thinking?
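
In sketch form, the idea is just this (grossly simplified, with made-up type and field names; this isn't the actual radix code):

    package cluster

    import "fmt"

    // On a MOVED error, re-point only the slot named in the error at the new
    // node, rather than re-fetching the whole topology with CLUSTER SLOTS.
    type Cluster struct {
        slots [16384]string // slot number -> "host:port" believed to own it
    }

    // handleMoved parses an error like "MOVED 3999 127.0.0.1:6381" and updates
    // just that one slot, so the command can be retried against the new node.
    func (c *Cluster) handleMoved(errMsg string) (slot int, addr string, err error) {
        var kind string
        if _, err = fmt.Sscanf(errMsg, "%s %d %s", &kind, &slot, &addr); err != nil {
            return 0, "", err
        }
        c.slots[slot] = addr // only this slot changes; no CLUSTER SLOTS call
        return slot, addr, nil
    }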

2) This one is somewhat related to the previous one. My question is: what do I do when a node becomes unreachable? Or, more specifically, at what point do I stop trying to handle errors and return an error to the user? For example, if the master of a set of slots segfaults and isn't restarted, the slave will take over those slots. But my client still has pointers for all those slots pointing at the old, dead master. When a command hits one of those slots it will get a connection closed error or something. At that point I could a) just return the error to the user and let them call Reset, b) call Reset implicitly, which will call CLUSTER SLOTS and re-create the topology, or c) try the command on a different connection, knowing it won't work but hoping to get a MOVED to the correct node. If there was some kind of pubsub system which sent out alerts about topology changes, like there is with sentinel, that would be ideal, but as far as I can find that doesn't exist for cluster.

3) This one is unrelated to the previous two. I have some minimal support for pipelining requests in a cluster in the driver right now. It will send a set of commands to a node chosen by the first command which has a key. For example, MULTI, SET foo bar, SET bar baz, EXEC will go to the node chosen by the key "foo". Right now I've not implemented any MOVED or ASK handling for the pipelining though. My reasoning is that if two different parts of the pipe are MOVED to different nodes things could get messy real quick. What if it was a transaction? I can't just split up a transaction into two transactions. What if all the keys are on the same slot, but that slot is in the middle of getting migrated? Half the keys could get ASK'd, which again will mess up transactions. If command order matters on the application level and a command in the middle of the pipe gets ASK'd, I can't just stop sending commands after that point since they all get sent at once, so the commands will get processed in a different order than intended. This can be solved with a transaction, but then we hit the problems mentioned before. I guess I just want to make sure I'm not way overthinking this problem and there's some simple solution I'm not thinking of, or if I'm right and it's better to let the user decide for themselves what they want to do about errors.

Sorry this is so lengthy; I just want to make sure I'm doing the right things before I push the code to master and am stuck with whatever poor decisions I make.

Josiah Carlson

Nov 14, 2014, 6:23:29 PM
to redi...@googlegroups.com
Replies inline.

On Fri, Nov 14, 2014 at 10:23 AM, Brian Picciano <mediocr...@gmail.com> wrote:
I'm the maintainer for the radix golang library (https://github.com/fzzy/radix), and I'm working on adding a cluster sub-package. You can find that code here:


The support for simple commands is pretty solid, imo. It transparently redirects on MOVED and ASK errors and keeps the slot mapping updated internally. However, I'm wondering what the best course of action is for certain edge cases.

1) For a single command, on a MOVED, I'm currently only updating the slot that command's key hashes to; I'm not doing a CLUSTER SLOTS call and re-discovering the full topology. My reasoning is that if a large slot migration is ongoing (happening one slot at a time), I don't want to call CLUSTER SLOTS before the full change has propagated, since I'd just end up calling it again and again until the topology change actually completes, which could be expensive. Doing it my current way costs an extra call per slot that has changed, but I feel that hit is negligible, given the relatively small number of slots, and it's a cleaner solution. Am I wrong in this thinking?

I think this is perfectly reasonable.

2) This one is somewhat related to the previous one. My question is: what do I do when a node becomes unreachable? Or, more specifically, at what point do I stop trying to handle errors and return an error to the user? For example, if the master of a set of slots segfaults and isn't restarted, the slave will take over those slots. But my client still has pointers for all those slots pointing at the old, dead master. When a command hits one of those slots it will get a connection closed error or something. At that point I could a) just return the error to the user and let them call Reset, b) call Reset implicitly, which will call CLUSTER SLOTS and re-create the topology, or c) try the command on a different connection, knowing it won't work but hoping to get a MOVED to the correct node. If there was some kind of pubsub system which sent out alerts about topology changes, like there is with sentinel, that would be ideal, but as far as I can find that doesn't exist for cluster.

A connection close can be caused by any one of a dozen reasons completely unrelated to whether or not the Redis cluster node for those slots is up and available for requests. Assuming that it is caused by a down node is probably jumping the gun a bit.

What I would do is the following (rough Go sketch below):
1. try to reconnect to the node
2. if the connection fails, pick any other node and give it a call (you are looking for a MOVED/ASK error, like your option 'c')
3. if that fails, reset and call CLUSTER SLOTS on any connection that works
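
Something like this, control-flow-wise. Every type and helper name here is made up for illustration; it's not radix's API or any particular client's:

    package cluster

    import "strings"

    // Hypothetical plumbing, only so the control flow below reads clearly.
    type node interface {
        Do(cmd string, args ...interface{}) (interface{}, error) // send one command
        Reconnect() error
    }

    type topology interface {
        NodeFor(slot int) node
        AnyOtherNode(except node) (node, bool)
        Reset() error // calls CLUSTER SLOTS and rebuilds the slot map
    }

    func isRedirect(err error) bool {
        s := err.Error()
        return strings.HasPrefix(s, "MOVED ") || strings.HasPrefix(s, "ASK ")
    }

    // recoverCmd applies steps 1-3 after a command to n failed with a
    // connection error.
    func recoverCmd(c topology, n node, slot int, cmd string, args ...interface{}) (interface{}, error) {
        // 1. try to reconnect to the node we thought owned the slot
        if n.Reconnect() == nil {
            return n.Do(cmd, args...)
        }
        // 2. ask any other node; a MOVED/ASK error names the real owner
        if other, ok := c.AnyOtherNode(n); ok {
            r, err := other.Do(cmd, args...)
            if err == nil || isRedirect(err) {
                return r, err // caller follows the redirect as usual
            }
        }
        // 3. last resort: rebuild the slot map from any node that answers, retry once
        if err := c.Reset(); err != nil {
            return nil, err
        }
        return c.NodeFor(slot).Do(cmd, args...)
    }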

Regarding topology updates... I agree that notifications like that should be generally available. Even just "slot X is migrating from Y to Z" and "slot X has been migrated from Y to Z" would be sufficient for most use cases, including topology updates for down cluster master nodes.

3) This one is unrelated to the previous two. I have some minimal support for pipelining requests in a cluster in the driver right now. It will send a set of commands to a node chosen by the first command which has a key. For example, MULTI, SET foo bar, SET bar baz, EXEC will go to the node chosen by the key "foo". Right now I've not implemented any MOVED or ASK handling for the pipelining though. My reasoning is that if two different parts of the pipe are MOVED to different nodes things could get messy real quick. What if it was a transaction?

Redis cluster does not support multi-node transactions. So if your transactions are attempting to execute on different cluster nodes, then your transaction is broken.
 
I can't just split up a transaction into two transactions.

You can (on the other hand) raise an error on the client telling them, "multi-node transactions are not supported by Redis". Because they aren't supported.
 
What if all the keys are on the same slot, but that slot is in the middle of getting migrated?

Use the WATCH, MULTI, EXEC sequence. If your keys are being migrated, the whole transaction will fail thanks to the check of the WATCHed keys that happens just before EXEC. You may need to do key sniffing on all possible commands to ensure that you are WATCHing every key to get this right.
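
For example, with a hypothetical do helper that sends a single command to the node in question and hands back the raw reply (a sketch, not any client's real API):

    package cluster

    import (
        "errors"
        "fmt"
    )

    // Sketch of a WATCH/MULTI/EXEC check-and-set against a single node. Per
    // the reasoning above, a migration of the watched key aborts EXEC the
    // same way a concurrent write would: EXEC comes back as a nil reply and
    // none of the queued commands are applied.
    func casUpdate(do func(cmd string, args ...interface{}) (interface{}, error)) error {
        if _, err := do("WATCH", "foo"); err != nil {
            return err
        }
        cur, err := do("GET", "foo")
        if err != nil {
            return err
        }
        newVal := fmt.Sprintf("%v-updated", cur) // new value computed from the read
        if _, err := do("MULTI"); err != nil {
            return err
        }
        if _, err := do("SET", "foo", newVal); err != nil { // queued until EXEC
            return err
        }
        reply, err := do("EXEC")
        if err != nil {
            return err
        }
        if reply == nil {
            return errors.New("aborted: watched key was modified or migrated; retry or surface to the caller")
        }
        return nil
    }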

Alternatively, almost all MULTI/EXEC transactions can be translated into an equivalent Lua script, which requires* that all keys be provided as part of the execution; those keys can be trivially checked before sending to make sure they map to the same shard, and Redis itself checks that all of them are still on that machine.
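
The client-side check is cheap, since a key's slot is just CRC16 of the key (with the hash-tag rule applied) mod 16384. A sketch, again not taken from radix:

    package cluster

    import "strings"

    // crc16 is the CRC16-XMODEM variant Redis Cluster uses for key hashing
    // (polynomial 0x1021, initial value 0).
    func crc16(data []byte) uint16 {
        var crc uint16
        for _, b := range data {
            crc ^= uint16(b) << 8
            for i := 0; i < 8; i++ {
                if crc&0x8000 != 0 {
                    crc = crc<<1 ^ 0x1021
                } else {
                    crc <<= 1
                }
            }
        }
        return crc
    }

    // keySlot applies the hash-tag rule: if the key contains a non-empty
    // "{...}" section, only that section is hashed.
    func keySlot(key string) int {
        if i := strings.Index(key, "{"); i >= 0 {
            if j := strings.Index(key[i+1:], "}"); j > 0 {
                key = key[i+1 : i+1+j]
            }
        }
        return int(crc16([]byte(key))) % 16384
    }

    // sameSlot reports whether every key the script will touch maps to one
    // slot, i.e. whether the EVAL can be sent to a single node at all.
    func sameSlot(keys []string) bool {
        if len(keys) == 0 {
            return true
        }
        first := keySlot(keys[0])
        for _, k := range keys[1:] {
            if keySlot(k) != first {
                return false
            }
        }
        return true
    }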

In either case, raise an error to the client as there isn't sufficient information to make a reasonable estimate as to when the migration will complete.
 
Half the keys could get ASK'd, which again will mess up transactions. If command order matters on the application level and a command in the middle of the pipe gets ASK'd, I can't just stop sending commands after that point since they all get sent at once, so the commands will get processed in a different order than intended. This can be solved with a transaction, but then we hit the problems mentioned before. I guess I just want to make sure I'm not way overthinking this problem and there's some simple solution I'm not thinking of, or if I'm right and it's better to let the user decide for themselves what they want to do about errors.

There are several errors that can't be addressed automatically by a client library. That's okay. Just make sure to document the potential error conditions, and optionally offer suggestions as to how to address the issue.

Sorry this is so lengthy; I just want to make sure I'm doing the right things before I push the code to master and am stuck with whatever poor decisions I make.

Writing correct software is important. Take your time. And even when you are just about ready... I'd still drop it in a feature branch and hold off on calling it "ready" for a while.

 - Josiah


Josiah Carlson

Nov 14, 2014, 6:35:12 PM
to redi...@googlegroups.com
Oh, and as for the "required*" for Lua script keys...

* This is a strongly worded request, not a technical requirement for proper execution**. But everyone should treat it as a requirement, simply because it is orders of magnitude easier to understand the implications of required keys.

** To make it a technical requirement would require validating every call to Redis from a Lua script to ensure the key was passed in KEYS. Instead, Redis cluster just makes sure that the key would go in one of the shards it is responsible for, and non-cluster Redis doesn't do anything special.


 - Josiah

Matt Stancliff

Nov 15, 2014, 11:18:52 AM
to redi...@googlegroups.com

> On Nov 14, 2014, at 13:23, Brian Picciano <mediocr...@gmail.com> wrote:
>
> If there was some kind of pubsub system which sent out alerts about topology changes, like there is with sentinel, that would be ideal, but as far as I can find that doesn't exist for cluster.

That was suggested, but it was rejected with the reasoning "Given that clients are guaranteed to get an error when they have a wrong map, what's the gain of implementing clients with an open connection to the Pub/Sub channel?" (See https://github.com/antirez/redis/pull/1782)

I still think it’s useful though. Currently there’s no way to view (or monitor or graph or anomaly-detect) live state changes in a cluster unless you read log files for every cluster instance or poll CLUSTER NODES continuously.
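
Even the polling workaround is clunky. It boils down to something like the sketch below, where fetch stands in for "grab CLUSTER NODES from any live connection and strip the volatile fields (ping/pong timestamps, epochs) so only real topology changes show up", and changed is whatever monitoring hook you want:

    package monitor

    import "time"

    // pollTopology periodically fetches a normalized view of the cluster and
    // fires changed whenever it differs from the previous snapshot. This is
    // about the best a client can do today without a notification channel.
    func pollTopology(fetch func() (string, error), interval time.Duration, changed func(prev, cur string)) {
        var prev string
        for {
            cur, err := fetch()
            if err == nil && cur != prev {
                if prev != "" {
                    changed(prev, cur)
                }
                prev = cur
            }
            time.Sleep(interval)
        }
    }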


-Matt

Josiah Carlson

Nov 16, 2014, 11:19:55 AM
to redi...@googlegroups.com
Matt,

I am not sure that when Salvatore made that decision he was aware that there might be implications with respect to correctness in a MULTI/EXEC transaction or even an async pipeline as discussed in the other thread.

 - Josiah



Brian Picciano

Nov 18, 2014, 12:20:24 AM
to redi...@googlegroups.com
Thanks for the detailed answers, Josiah; I really appreciate the help. I'll try to clean up what I've got and see where that gets me, and I'll definitely be posting my final revision of the code both here and to the go user group before this hits master.

Brian Picciano

Nov 22, 2014, 4:18:53 PM
to redi...@googlegroups.com
Alright, so I've implemented the reconnect logic and such, and I've tested the reconnect and redistribution behavior. If anyone who knows go wants to take a look at my code and let me know what they think, mostly about the usability of the api and the readability of the code, I'd really appreciate any comments.
