redis clustering, proxy

Marc Byrd

May 13, 2010, 1:28:49 PM

to redi...@googlegroups.com

Where are we on the redis clustering concepts, approach?

I'm resending a suggestion from a few days ago, hope it gets more traction this time. I guess the details don't matter nearly as much as the proposed principals - are there other such principals we should be working from?

Cheers,

m

--

Greetings,

The original proposal involves fewer changes to redis-server, so it gets my vote - of the two options given.

I hope you might consider this humble proposal, which may be only a slight variation of your first approach:

roxi - redis proxy, somewhat like moxi for memcached:

1. speaks redis protocol, therefore may be used by existing clients
2. starts w/ fork of redis-server (or make command line option?)
3. add super-great redis client lib in C, async (credis?) [see e.g. libmemcached]
4. provides L1 caching for most frequently used items
5. provides cluster config info that clients can optionally use to directly access redis-server instances - has time to live - allows clients to maintain consistent hashing and bypass proxy for some operations (clients must honor TTL)
6. aggregates stats with pubsub
7. replication would work as it does in redis-server - can replicate from server to proxy, proxy to server, or proxy to proxy - this provides for but does not require high availability configuration
8. configuration replicated everywhere with pubsub
9. can have many roxi instances, clients can consistently hash as they do today
10. Allows for override of key mapping (e.g. for related sets)
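
To make the "clients can consistently hash as they do today" point concrete, here is a minimal sketch of a consistent-hash ring in Python. It is only an illustration under stated assumptions: the instance names (roxi-a, roxi-b, roxi-c), the use of MD5, and the number of virtual points per instance are all hypothetical and not part of the proposal.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring. Names and vnode count are illustrative only."""

    def __init__(self, nodes, vnodes=64):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        ring = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as an integer position on the ring.
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        # The owner is the first ring point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["roxi-a:6379", "roxi-b:6379", "roxi-c:6379"])
owner = ring.node_for("user:1000")
```

The useful property is that removing one instance only remaps the keys that instance owned; keys that lived on the other instances stay where they were.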

In particular, the two principals most important to existing users (re-use, backward compatibility, simplicity) are:

1. keep redis-server simple
2. proxy should speak redis protocol

Cheers!


Tim Lossen

May 15, 2010, 4:19:06 AM

to Redis DB

marc,

+1 from me both for the roxi concept, and for the principles you
propose.

i think it is important to keep redis itself as simple as possible.
not everybody needs multiple redis instances, and there are already
a lot of other solutions (cassandra, riak ...) which are specifically
designed as clusters from the start.

what i like about the roxi approach is that it would give us quite some
flexibility regarding the deployment topology.

for example, you could put one roxi instance on every application server.
this would keep latency overhead to a minimum, while allowing for things
like redis server failover, dynamic cluster reconfiguration etc. -- all
without having to touch or restart your app servers!

as you already pointed out, another *huge* benefit of this approach is
that all existing clients -- and there are already quite a lot -- just
keep working and do not have to be made "cluster-aware" in any way.

unfortunately, i am not a c coder -- otherwise i would get started working
on roxi right now.

cheers
tim

ps: at the risk of sounding like a spelling nazi -- a "principal" is
something entirely different from a "principle". i see this mistake all
the time .... maybe it is even *harder* for native speakers to keep the
two apart? ;)

Salvatore Sanfilippo

May 15, 2010, 4:36:05 AM

to redi...@googlegroups.com

On Thu, May 13, 2010 at 7:28 PM, Marc Byrd <dr.mar...@gmail.com> wrote:

> speaks redis protocol, therefore may be used by existing clients

No doubt about that

> starts w/ fork of redis-server (or make command line option?)

Implementation details ;) For now there are much bigger design
issues in play. For instance, should it be a proxy or not? And things
like that, more later.

> add super-great redis client lib in C, async (credis?) [see e.g.
> libmemcached]

Not sure why this helps. You mean that the cluster should be
implemented using a fast C library?
If it is a proxy, in theory it can be a good idea, but I suspect that
something specialized is going to be needed.
I mean, because it's a proxy, when you perform a read operation there
is no need for the proxy to really parse the reply; it should just be
able to pass the reply back.
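
A rough sketch of what "pass the reply back without really parsing it" could look like, in Python for brevity: the proxy only needs to find where one complete reply ends in its read buffer, then relay those bytes untouched. The function name `reply_length` is hypothetical; the framing rules used are the standard Redis reply types (`+`, `-`, `:` single lines, `$` bulk, `*` multi-bulk).

```python
def reply_length(buf, start=0):
    """Return the offset just past the first complete reply at `start`,
    or None if the buffer does not yet hold a whole reply."""
    eol = buf.find(b"\r\n", start)
    if eol == -1:
        return None  # header line not complete yet
    kind = buf[start:start + 1]
    end = eol + 2
    if kind in (b"+", b"-", b":"):
        return end  # single-line reply
    if kind == b"$":
        size = int(buf[start + 1:eol])
        if size < 0:
            return end  # nil bulk reply: "$-1\r\n"
        end += size + 2  # payload plus trailing CRLF
        return end if len(buf) >= end else None
    if kind == b"*":
        count = int(buf[start + 1:eol])
        for _ in range(max(count, 0)):  # a negative count is a nil multi-bulk
            end = reply_length(buf, end)
            if end is None:
                return None
        return end
    raise ValueError("unexpected reply type byte: %r" % kind)


buf = b"*2\r\n$3\r\nfoo\r\n:1\r\n+OK\r\n"
n = reply_length(buf)
first_reply, rest = buf[:n], buf[n:]  # relay first_reply as-is, keep rest buffered
```

Nothing is decoded into strings or lists here; the proxy only counts bytes, which is the cheap path described above.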

> provides L1 caching for most frequently used items

Not convinced at all :) This is already implemented by nodes that are
in-memory or with VM active (that is, L1).

> provides cluster config info that clients can optionally use to directly

This is a good idea. That is, if the clustering solution is
implemented directly in the redis-server instance (a solution I'm
actually starting to like more and more), these nodes will be able
to forward the requests for keys that are not their business to other
nodes.

But of course it is better if the client contacts the right node,
right? Less latency. So if there is a command to get a "hint" about
which node to contact in order to get lower latency, this is a good
thing. If the information is not up to date or plainly wrong,
everything will work anyway (but slower).
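
The hint idea can be illustrated with a small in-memory simulation in Python. Everything here is an assumption made for the sake of the example: the `Cluster` and `HintingClient` classes, the crc32-modulo ownership rule, and the idea that every reply carries the owner's name as the hint. No such command or reply format exists in redis-server; the sketch only shows that a missing or stale hint costs an extra forward but never a wrong answer.

```python
import zlib


class Cluster:
    """Simulated nodes that forward requests for keys they do not own."""

    def __init__(self, names):
        self.names = list(names)
        self.data = {name: {} for name in self.names}
        self.forwards = 0  # how many requests had to be proxied internally

    def owner(self, key):
        # Hypothetical ownership rule, chosen only for the simulation.
        return self.names[zlib.crc32(key.encode()) % len(self.names)]

    def request(self, node, op, key, value=None):
        owner = self.owner(key)
        if node != owner:
            self.forwards += 1  # contacted node proxies to the real owner
        store = self.data[owner]
        if op == "set":
            store[key] = value
        return store.get(key), owner  # reply plus a hint naming the owner


class HintingClient:
    """Client that remembers hints but works even when they are wrong."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.hints = {}

    def call(self, op, key, value=None):
        node = self.hints.get(key, self.cluster.names[0])
        reply, hint = self.cluster.request(node, op, key, value)
        self.hints[key] = hint
        return reply


cluster = Cluster(["node-1", "node-2", "node-3"])
client = HintingClient(cluster)
client.call("set", "foo", "bar")
value = client.call("get", "foo")  # goes straight to the owner, no forward
```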

> access redis-server instances - has time to live - allows clients to
> maintain consistent hashing and bypass proxy for some operations (clients
> must honor TTL)

Don't like this. I think the final version should be totally
transparent for clients.

I started with a proxy in design document #1.
Then I moved to a more performant version in document #2 with clients
connecting directly to the proxy.
Then (especially thanks to a good discussion with Derek Collison) I
realized that we can have the best of both worlds:
a proxy implemented directly in the redis-server itself (so that
there are no intermediate layers) that is able to provide hints to
clients (so that good clients will go as fast as solution #2).

So in a well implemented cluster, nodes will rarely proxy at all.

> aggregates stats with pubsub

Not sure about that. I think that Pub Sub is already a distributed
system per se for the majority of use cases.

> replication would work as it does in redis-server - can replicate from
> server to proxy, proxy to server, or proxy to proxy - this provides for but
> does not require high availability configuration

I think replication should not be used at all with the proxy. Every node
must be replicated M-1 times according to the configuration.
So if you configure M = 1 you have no fault tolerance, M = 2 means that a
single node can go down, and so forth.
In general M-1 nodes can go down. If >= M nodes are down, all the
nodes will start to refuse queries.
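
The M arithmetic above can be written down as a tiny Python sketch. The function names are hypothetical and this models only the rule as stated here (M-1 failures tolerated, refusal at M or more), not any actual cluster behaviour.

```python
def tolerated_failures(m):
    """Number of nodes that may fail when each node exists in M copies."""
    if m < 1:
        raise ValueError("M must be at least 1")
    return m - 1


def accepts_queries(m, nodes_down):
    """True while the cluster keeps serving, per the rule described above."""
    return nodes_down <= tolerated_failures(m)


# M = 1: no fault tolerance; M = 2: a single node can go down.
summary = {m: tolerated_failures(m) for m in (1, 2, 3)}
```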

> configuration replicated everywhere with pubsub

About that, I think it is better if the different nodes are simply
mutually connected. I mean: all the nodes.
In order to exchange information, detect if some node is down, and so forth.

So if you have N nodes, you have N-1 open connections in every node,
but this is not going to be a problem for a number of reasons:

1) I think that if our cluster works well with 100 or 200 nodes it is
already ok to start
2) we can switch later to a routed model if needed
3) connections can be relaxed if needed when clients have the right
hints and usually are talking directly with the right nodes, and will
be established again if needed.
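
For a sense of scale, the connection counts of a full mesh are easy to work out: each of N nodes holds N-1 links, and the mesh as a whole has N(N-1)/2 connections. A short Python sketch (the function name is hypothetical):

```python
def mesh_connections(n):
    """Return (connections per node, total connections) for a full mesh of n nodes."""
    if n < 1:
        raise ValueError("need at least one node")
    per_node = n - 1
    total = n * (n - 1) // 2  # every pair of nodes shares one connection
    return per_node, total


for n in (100, 200):
    per_node, total = mesh_connections(n)
    print(f"{n} nodes: {per_node} connections per node, {total} in total")
```

At 100 nodes that is 99 connections per node and 4950 overall; at 200 nodes, 199 per node and 19900 overall.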

To be honest, I still have to figure out the details. An alternative is
that we have two forms of connections: TCP for proxy and UDP as control
channel in order for the nodes to exchange information. I prefer
the single TCP connection model.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
http://invece.org

"Once you have something that grows faster than education grows,
you’re always going to get a pop culture.", Alan Kay
