redis swarm

Tim Lossen

unread,

Oct 18, 2012, 4:22:16 PM10/18/12

to redi...@googlegroups.com

hi all,

i would like to share our current redis setup, which is slightly unusual.

- redis is not the main datastore, but a cache for frequently used data, like highscore lists.
- we have a cluster of application servers (currently 2, later probably around 8).
- each server runs a local redis instance, and only ever talks to this instance.
- cache writes are broadcast to all app servers via UDP over a private network, and then written to redis.

this way, the whole redis swarm is always (more or less) in sync. the interesting point is that this setup is "masterless" -- any single instance can fail without affecting the others.

sure, this does not offer the same consistency as redis cluster, but for some uses cases it is "good enough" -- while considerably simpler.

cheers
tim

--
http://tim.lossen.de

Sergei Tulentsev

unread,

Oct 20, 2012, 5:29:46 PM10/20/12

to redi...@googlegroups.com

This is an interesting design, but it's not "Web Scale" :)

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

--
Best regards,
Sergei Tulentsev

Felix Gallo

unread,

Oct 20, 2012, 6:05:33 PM10/20/12

to redi...@googlegroups.com

Tim and I are in the same business, and sometimes, if we're lucky, we
get some pretty amusing scaling opportunities ("facebook featured us
*where*??").

I think it's worth pointing out that he's showcasing yet another good
design case for Redis -- as a local cache for the application servers
themselves to use, when changes are relatively rare/unimportant (e.g.
high score lists, A/B testing group changes, daily offers, etc...),
but changes do happen, and hitting a traditional database might be too
heavyweight. Bringing up temporary/special purpose redis
intallations to handle small tasks ('here are the 10,000 users that we
want to offer the pink unicorn pants to at half price, go') is
something which you'd never do with oracle/mysql/riak or even mongodb,
but which is pretty easy and natural for redis.

F.

Tim Lossen

unread,

Oct 21, 2012, 3:38:28 PM10/21/12

to redi...@googlegroups.com

well, caching is actually a pretty common use case for redis i would
say, and not restricted to gaming ....

anyway, currently i am thinking about how to extract the broadcast
stuff from the application layer and turn it into something generic
and reusable.

i see at least two possible approaches:

a) a swarm daemon which runs next to each redis instance. here is a
proof-of-concept version i hacked in ruby:

https://gist.github.com/3914390

(hooking into monitor is perhaps not the best choice -- connecting as
slave might be more robust?)

b) pushing the functionality into redis itself. in "swarm mode", redis
would broadcast every (whitelisted) command to the whole swarm, and
every swarm member would apply commands received via UDP.

i'd give implementing this a try, but i'm afraid my c skills are not
up to scratch. maybe somebody else feels motivated to tackle this?

any other ideas / opinions regarding redis swarm?

cheers
tim

ps: BTW, could you put me on that pink unicorn pants list, felix? ;)

http://tim.lossen.de

Felix Gallo

unread,

Oct 21, 2012, 7:12:17 PM10/21/12

to redi...@googlegroups.com

Tim writes:
> any other ideas / opinions regarding redis swarm?

I'm sure you've thought of the usual set of problems, but I'll mention
them anyway:

1. Using a nonpersistent queue (e.g. redis pub/sub) would make
recovery situations more difficult because swarm members can't resume
after interruption and have to redump the world
2. Either you serialize the writes to the swarm inside the redis main
loop and so writes take O(N network calls) longer, or you have to come
up with an eventual consistency band-aid (vector clocks, data
versions, etc., see Riak), or you just accept the risk and the
internet gets mad at you for being pragmatic and not trying to be cool
(e.g. redis's built in master/slave and aof implementations)
3. This general concept is directly inside Erlang's wheelhouse, and
hence Riak's wheelhouse; if excited enough about keeping a masterless
kv/document database, it might be worth determining that Riak doesn't
work for the use case

However I salute you for coming up with interesting ideas, in my
humble and limited opinion the redis cluster is underdesigned and so
the space is ripe for intriguing solutions to more problems than it
solves.

> ps: BTW, could you put me on that pink unicorn pants list, felix? ;)

done...just accept the box where it says I can post anything I want to
your timeline...

F.

CharSyam

unread,

Oct 21, 2012, 7:48:24 PM10/21/12

to redi...@googlegroups.com

Hi, Tim and Felix

How many requests does your server process? and Do you have any kind of retry method?

In my rule of thumb, Some UDP Packets are often lost when they get many requests Event though There are in same network.

2012/10/21 Felix Gallo <felix...@gmail.com>

Josiah Carlson

unread,

Oct 22, 2012, 12:51:08 AM10/22/12

to redi...@googlegroups.com

If you are looking for something that is good for syndicating UDP
messages out, look no further than syslog-ng. About all you'd need to
do is to implement the fan-out from each server to the others with a
simple configuration (identical configuration on all of the boxes,
easily updated/reloaded), along with a simple daemon that runs next to
your Redis server (running in any language you want) that handles the
writing to Redis. Then all you'd need to do is send the right kind of
log message to syslog, and it would be auto-syndicated out. Your
daemons could even be smart enough to ignore messages from the same
host it is running from.

I've actually been sitting on a 95% complete open source project whose
only purpose is to participate on either end of the syslog UDP stream
for exactly this kind of stuff (I'd planned on using it for counters,
as I've implemented it a few times already, and wanted to use it for
Redis-based counters, Graphite-based counters, Google Analytics-based
counters, etc.).

- Josiah

Brian Knox

unread,

Oct 22, 2012, 8:20:20 AM10/22/12

to redi...@googlegroups.com

Speaking of syslog - I wrote the omhiredis plugin for rsyslog and it
is available in the recent rsyslog 7.2 release. It is at the moment
more of a proof of concept just to get the feature working. I
unfortunately have been strapped for time and need to get back to it.

I hope to soon add support for rsyslogs batching features (which would
allow you to use multi exec / async pipelining with the plugin).
Also, the template system is much improved in rsyslog 7 and I need to
play around with how this will allow easier construction of redis
commands.

The plugin is in very early stages and I know there's issues with it -
I'll certainly accept patches for it for features if someone else gets
to them before I do.

Brian

Bryce Baril

unread,

Oct 26, 2012, 4:09:42 PM10/26/12

to redi...@googlegroups.com

This is very similar to a cache setup on a telephony platform that I
worked on a couple years ago. There were a few key differences -- e.g.
SQLite instead of Redis -- but generally the same idea of local cache
instances populated via UDP. By the end we had something like 60
cache instances and not a single one matched.

The inconsistency from cache node to cache node became so great that
we had to replace it, emergency style. Ironically for this example we
used Redis (standard Master/Slave replication) to replace it, with
great success.

In our use case the original developers had undervalued cache
consistency to the customer experience on the platform and
overestimated how consistent the individual nodes would be. With a
small pool of Redis slaves we were able to get the consistency we
needed while dramatically increasing the performance and capacity of
our cache layer.

-Bryce

Tim Lossen

unread,

Oct 27, 2012, 3:18:32 AM10/27/12

to redi...@googlegroups.com

the basic idea is that each application server is self-contained. let's say we have five servers, each serving a fifth of the user population. if one goes down, 80% of users are not affected.

the expectation is that we will have *more* (but less severe) outages. of course it depends very much on the use case if this is a feasible strategy or not.

On 2012-10-26, at 18:21 , Jonathan Haddad wrote:

> I'm not sure what the advantage is of this over using redis w/ a master / slave setup. Personally, I prefer to keep each machine dedicated to a particular task. My web servers and my redis servers are logically different, thus live on different boxes.

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.

> To view this discussion on the web visit https://groups.google.com/d/msg/redis-db/-/5BHu5DMrdeIJ.

> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

--
http://tim.lossen.de

Tim Lossen

unread,

Oct 27, 2012, 3:29:40 AM10/27/12

to redi...@googlegroups.com

hmmmm ..... very interesting! i hope our cache nodes will not diverge that much, as a lot of the cache keys are updated frequently (multiple times per day), so should be "self-healing".

plan b is to run a little "anti-entropy" script on each server that re-broadcasts a few random entries per second.

i sure hope we don't have to replace redis with sqlite in the end ... ;)

tim

--
http://tim.lossen.de

Josiah Carlson

unread,

Oct 27, 2012, 1:44:39 PM10/27/12

to redi...@googlegroups.com

Did you ever discover the cause of the inconsistency? In practice with
a network that isn't overloaded, I've not seen the drop rate of UDP
packets push past .1%.

Regards,
- Josiah

Jonathan Haddad

unread,

Oct 27, 2012, 5:31:09 PM10/27/12

to redi...@googlegroups.com

I get the idea, it's just not practical, at least in my experience. It doesn't scale at all and results in weird bugs.

Issue #1: Space

You will have every piece of cached data duplicated N number of times (n being the number of servers). 20 servers = 20 copies?

You're memory bound by the size of your entire cache. If you have 20GB of data in redis, every one of your web servers (all 20 of them) will require at least that amount of RAM, rather than having 3 redis servers w/ 20 GB. If you're serving this up in Amazon, you'll need 20 high mem instances just to run your app. That's ridiculous.

A common scaling strategy is to partition your cache across different redis instances. User data in 1 set of servers, photo data in another, forum in another, etc. This lets your cache grow past the limits of an individual machine. You're not going to be able to do this using your strategy.

Issue #2: Redis is much faster than your web servers

A single redis server is going to serve many times the number of requests your web server is capable of doing. You could easily run 20 web servers doing a hundred requests a second to a single redis server. Use sentinel with 2 slaves and you're good to go for a long time. (and you've got failover)

Issue #3: Inconsistency

I don't think I need to expand on this, it's been covered already, other than you're considering using an overly complicated setup to compensate for what I think is a poor architecture decision.

> plan b is to run a little "anti-entropy" script on each server that re-broadcasts a few random entries per second

Of course, maybe you don't need to worry about scale, but if you don't I'm not sure why you'd even need to use a cache to begin with.

Jon

Bryce Baril

unread,

Oct 28, 2012, 1:09:27 PM10/28/12

to redi...@googlegroups.com

This was a VOIP system, so the network was reasonably loaded with RTP
traffic, but we generally had enough overhead. UDP was only part of
the problem, there were other architecture issues that caused the
cache listeners to be so busy they would miss some UDP broadcasts. If
not we might have just switched to TCP and called it good, but the
system as it was would not have been able to keep up without fixing
the blocking IO issues.

In terms of the "anti-entropy" component, this system had something
analogous -- just make sure that it reads directly from the source and
is not also populated by the same UDP broadcasts. This was yet another
problem with the system, occasionally a stale update would be
rebroadcast as truth from that piece.

-Bryce

Jeremy Zawodny

unread,

Nov 3, 2012, 1:49:25 PM11/3/12

to redi...@googlegroups.com

Interesting...

We have a masterless deployment we're designing at craigslist as well for a specific application. I'll be presenting it at our local Perl Mongers (San Francisco) meeting at the end of the month and will share the slides here as well. It's good to see we're not the only ones doing things a little out of the ordinary. :-)

http://www.meetup.com/San-Francisco-Perl-Mongers/events/86790152/

Jeremy

On Thu, Oct 18, 2012 at 1:22 PM, Tim Lossen <t...@lossen.de> wrote:

Jeremy Zawodny

unread,

Apr 8, 2013, 12:12:53 PM4/8/13

to Nathan Davis, redi...@googlegroups.com

Ah, good point! I need to check and see if we can post the video and slides. I'm afraid the slides won't be terribly useful without the video, but I also haven't looked to see how the video came out either.

Too many plates spinning... but I'll check and see.

Jeremy

On Thu, Mar 28, 2013 at 10:59 AM, Nathan Davis <nat...@courseload.com> wrote:

Hi Jeremy,
I'm interested in hearing more about your "out of the ordinary" use of Redis. Couldn't find the slides posted...

Thanks,
Nathan

Reply all

Reply to author

Forward