[release] Redis 3.0.0 is out.

Salvatore Sanfilippo

unread,

Apr 1, 2015, 10:43:56 AM4/1/15

to Redis DB

Hello,

after many years of interleaved intense efforts and no care at all (in
order to focus to other features), the first stable version of Redis
with native support for clustering is out.
I see this more like a beginning of a new stage, than the termination
of an old one. Here we are giving the foundations to make it simpler
for people to work with Redis in larger environments.
What Redis 3.0.0 is providing today, will likely take 1 or 2 years to
mature, with our community that can build an experience on what we
got, and get some feeling about what we should aim for the future.

Redis 3.0.0 contains many more features, but the list does not provide
justice to the work this release received, since an impressive number
of things were backported into 2.8 in the course of the releases in
order to provide them to the users ASAP, because I was conscious of
the fact 3.0.0 was special, and it needed much more time before to
reach stability. Now that we returned, hopefully, into a normal
workflow, new features are no longer backported into stable releases.
Redis 3.0.0 is actually the first example of this new development
model: there are already interesting new things into the "unstable"
branch that I avoided putting into the mix of 3.0.0.

The cluster specification was updated for the release. You may want to
read it again if you are interested in Redis Cluster:
http://redis.io/topics/cluster-spec
We also have a new section with Cluster commands in the redis.io web
site: http://redis.io/commands#cluster

Other new things implemented in 3.0.0 that may be significative to
users not interested in Redis Cluster are:

* An improved LRU algorithm, ways more precise to evict the older keys.
* Redis 3.0.0 is generally sensibly faster under high (pipelined) load.
* AOF rewrite was reworked in order to reduce latency spikes with slow
disks when the process undertakes the final "write" of the accumulated
buffer.

Here is the full changelog:
https://raw.githubusercontent.com/antirez/redis/3.0/00-RELEASENOTES

Before closing the line, just a few more words. I believe that Redis
Cluster will be a major thing in the Redis ecosystem. Maybe it works
great from day one, maybe it will need a few more iterations, and
possibly with 3.2 we'll improve support for many stuff, but my guess
is that Redis 3.0.0 today, in some way, changes what Redis is. Redis
used to be this little cute instance you run to solve certain
problems. Later we evolved it into something more, with Sentinel there
is an HA story to tell, and it is possible to use Redis for more
complex tasks. Now with Redis Cluster the attempt is to give users the
step forward, a system that out of the box is capable of a certain
degree of automatic scalability and fault tolerance, a degree that
should be enough for many users to change their view of Redis as a
tool, and as the possible applications it is suitable for.

I'm happy about the journey that started when I realized Redis was
limited by its single instance nature, and today ends with this first
stable release of Redis Cluster. I would do it again, and I really
hope you'll have fun with Redis 3.0.0, the kind of fun we code
addicted have from time to time while trying to make shit done.

Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - Pivotal http://pivotal.io

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Dvir Volk

unread,

Apr 1, 2015, 11:06:30 AM4/1/15

to Redis DB

Congrats, Salvatore!

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

Yiftach Shoolman

unread,

Apr 1, 2015, 11:08:19 AM4/1/15

to redi...@googlegroups.com

Well done Salvatore !

Congrats, Salvatore!

To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.

To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--

Yiftach Shoolman
+972-54-7634621

Leandro Ferreira

unread,

Apr 1, 2015, 12:00:46 PM4/1/15

to redi...@googlegroups.com

Congratulations, this is a very very welcome release, i've been waiting for it since you start to talk about it!

Thank you for your efforts!

Rakan Alhneiti

unread,

Apr 1, 2015, 12:00:50 PM4/1/15

to redi...@googlegroups.com

Congratulations Salvatore!

Many thanks for your constant efforts working on Redis!

Josiah Carlson

unread,

Apr 1, 2015, 12:47:07 PM4/1/15

to redi...@googlegroups.com

Congratulations :)

- Josiah

Cyrus Bakhtiyari

unread,

Apr 1, 2015, 1:31:50 PM4/1/15

to redi...@googlegroups.com

Congrats Salvatore! Great work.

Ayan R

unread,

Apr 1, 2015, 1:31:56 PM4/1/15

to redi...@googlegroups.com

Congratulations! This is a huge milestone. Thanks for all your hard work and the team who worked on this.

Chaitanya Malla

unread,

Apr 1, 2015, 1:31:56 PM4/1/15

to redi...@googlegroups.com

Congrats. Huge thanks for the update and team.

On Wednesday, April 1, 2015 at 7:43:56 AM UTC-7, Salvatore Sanfilippo wrote:

Alberto Gimeno Brieba

unread,

Apr 1, 2015, 1:44:24 PM4/1/15

to redi...@googlegroups.com

Congrats!!!

Marc Byrd

unread,

Apr 1, 2015, 2:23:43 PM4/1/15

to redi...@googlegroups.com

Thanks Salvatore and all contributors!

m

On Wed, Apr 1, 2015 at 10:44 AM, Alberto Gimeno Brieba <gime...@gmail.com> wrote:

Congrats!!!

Matt Bailey

unread,

Apr 1, 2015, 5:00:01 PM4/1/15

to redi...@googlegroups.com

Congrats!!! Man, I hope this isn't an April fool's joke...

On Wednesday, April 1, 2015 at 7:43:56 AM UTC-7, Salvatore Sanfilippo wrote:

Andres Arias

unread,

Apr 1, 2015, 5:02:27 PM4/1/15

to redi...@googlegroups.com

Thanks, Salvatore!

Great work.

Jacobo G

unread,

Apr 1, 2015, 5:10:20 PM4/1/15

to redi...@googlegroups.com

Congrats!!!

tiredpixel

unread,

Apr 1, 2015, 5:10:21 PM4/1/15

to redi...@googlegroups.com

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Congratulations, Salvatore and contributors. I've been eagerly
awaiting this release, keeping an eye on this list and visiting the
website downloads page sometimes daily. :D Certainly, native cluster
support ushers in a new and exciting era in the story of Redis. Thank
you!

Peace,
tiredpixel

On 01/04/2015 15:43, Salvatore Sanfilippo wrote:
> after many years of interleaved intense efforts and no care at all
> (in order to focus to other features), the first stable version of
> Redis with native support for clustering is out.

Sam Munkes

unread,

Apr 1, 2015, 5:10:21 PM4/1/15

to redi...@googlegroups.com

Thanks Salvatore for all your hard work!!

On Wednesday, April 1, 2015 at 7:43:56 AM UTC-7, Salvatore Sanfilippo wrote:

Salvatore Sanfilippo

unread,

Apr 1, 2015, 5:58:29 PM4/1/15

to Redis DB

Replying to myself just to tell you thanks!

Felix Gallo

unread,

Apr 1, 2015, 6:45:16 PM4/1/15

to redi...@googlegroups.com

Salvatore --

Looking at the cluster documentation, I have a hard time putting together the story of what guarantees it provides so that I can reason about its behavior. Can you provide a synopsis of what you believe the guarantees are, and perhaps what some example use cases might be?

F.

孙赫

unread,

Apr 2, 2015, 1:18:46 AM4/2/15

to redi...@googlegroups.com

Congrats!!!

在 2015年4月1日星期三 UTC+8下午10:43:56，Salvatore Sanfilippo写道：

Wei Jin

unread,

Apr 2, 2015, 1:18:46 AM4/2/15

to redi...@googlegroups.com

Kudos, Salvatore. Thank you.

Freire Zhu

unread,

Apr 2, 2015, 2:01:02 AM4/2/15

to redi...@googlegroups.com

Great work, thank you very much!

在 2015年4月1日星期三 UTC+8下午10:43:56，Salvatore Sanfilippo写道：

Hello,

Rulin Zhuo

unread,

Apr 2, 2015, 2:03:03 AM4/2/15

to redi...@googlegroups.com

Thanks Salvatore and all contributors!

在 2015年4月1日星期三 UTC+8下午10:43:56，Salvatore Sanfilippo写道：

Hello,

Gianluigi Belli

unread,

Apr 2, 2015, 2:45:04 AM4/2/15

to redi...@googlegroups.com

Thank you!!!

Pedro Melo

unread,

Apr 2, 2015, 2:53:43 AM4/2/15

to redi...@googlegroups.com

Thank you Salvatore, for the years of good work…

Best regards,

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--

Pedro Melo
@pedromelo
http://www.simplicidade.org/
xmpp:me...@simplicidade.org
mailto:me...@simplicidade.org

Salvatore Sanfilippo

unread,

Apr 2, 2015, 3:46:14 AM4/2/15

to Redis DB

On Thu, Apr 2, 2015 at 12:44 AM, Felix Gallo <felix...@gmail.com> wrote:
> Salvatore --
>
> Looking at the cluster documentation, I have a hard time putting together
> the story of what guarantees it provides so that I can reason about its
> behavior. Can you provide a synopsis of what you believe the guarantees
> are, and perhaps what some example use cases might be?

Hello Felix,

the fact it's hard from the doc to tell what are the guarantees of
Redis Cluster is not good! ;-) Please tell me what was not clear from
the doc, and what in your mental model of Redis Cluster is missing,
after reading the doc, to understand what happens. I'll likely do a
worse job here trying to compress the cluster spec in a few lines.
However just to provide an overview of the main ideas you should check
in the spec, here is a minor recap:

Basically Redis Cluster is a "last failover wins" eventually
consistent system, without merge function, the slave that failed over
a given set of hash slots the last simply wins. There are strong
guarantees for the order of failovers (see configEpoch section). An
eventual property of Redis Cluster is that the minority side of the
partition stops accepting writes, providing more safety from the point
of view of amount of writes you can lose in certain failure modes,
specifically when clients are partitioned together with a minority of
masters (that can be failed over in another side of the partition).
Another eventual property is the propagation of the Cluster
configuration, eventually every node will update its configuration
with the hash slot -> serving node entry having the higher
configEpoch. A side effect of this is that returning nodes are always
able to re-join the cluster in some way, either as slaves, or as
masters if no node in the meantime was able to failover them. It is
possible to specify a maximum disconnection time (from the master) for
a slave to be capable of being elected, but by default this is
disabled. Redis Cluster runs a best-effort algorithm when there are
many slaves competing to be elected in order to provide an advantage
to the slave with the most updated data set, however if the best slave
does not failover the master in a short time, other slaves are able to
failover, so this is just an heuristic.

One of the best ways to check by naked eyes how Redis Cluster behaves
during failures is to run the consistency test distributed with
redis-rb-cluster client, while simulating different kind of events in
the cluster.

Thanks,
Salvatore

Trieu Tran Quoc

unread,

Apr 2, 2015, 5:09:48 AM4/2/15

to redi...@googlegroups.com

Great, congrats!

DJ

unread,

Apr 2, 2015, 5:09:48 AM4/2/15

to redi...@googlegroups.com

Salvatore, congratulations on the release. I am curious about this line from the Redis cluster spec -

Redis Cluster design avoids conflicting versions of the same key-value pair in multiple nodes since in the case of the Redis data model this is not always desirable: values in Redis are often very large, it is common to see lists or sorted sets with millions of elements.

How is the performance with millions of sorted set elements? What sizes of the elements are we talking about here?

Salvatore Sanfilippo

unread,

Apr 2, 2015, 5:16:10 AM4/2/15

to Redis DB

Hello DJ, this line talks about a problem that Redis Cluster does not
try to solve *because* there is the problem of large values. It means
that if two nodes, during partitions or failures, believe to be both
in charge of the same key, and this key is very large and receives
writes in two different partitions, there will be no merge attempt,
but instead, Redis Cluster will just follow the "last failover wins"
rule and will discard the values into one node, and use the values
into another node. So the performance of Redis Cluster with very large
values is the same as the one of the Redis single instance. However
there is one specific case where Redis Cluster have to deal with very
large values, that is, when we reshard an hash slot containing some
very large value from one instance to another instance. In that case,
MIGRATE will block for the time needed to migrate the key. With many
million of elements this may mean blocking for seconds. It's a good
idea to avoid resharding hash slots where such huge keys exist, and
instead reshard the other hash slots if we want to lower the amount of
memory a given node is using, however, currently, there is no easy way
to do this, since redis-trib does not try to attempt a "big keys"
analysis. This is one thing I would like to solve in Redis 3.2.

Salvatore

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/d/optout.

pcdinh

unread,

Apr 2, 2015, 5:23:36 AM4/2/15

to redi...@googlegroups.com

Great jobs. Thanks you.

Felix Gallo

unread,

Apr 2, 2015, 12:33:49 PM4/2/15

to redi...@googlegroups.com

Hi Salvatore --

Thanks for the condensation, it was helpful. So if I am understanding the spec right:

1. Starting at time of network partition, any sequence of client reads could be false from a cluster-majority perspective until the partition is healed -- either 'stale' (progress has been made on the other side of the split brain) or 'false future' (doomed progress has been made in a minority).

2. Similarly, during a partition, any sequence of client writes could be destined to be ignored (doomed progress being made in a minority).

3. At partition time, it's further possible that some extra set of writes are lost by dying or minority-partitioned masters, because they acknowledge local writes to clients before they replicate (for speed).

4. Recovery can take a long time with large keys and stop progress for the duration of key transfer.

And so from a mental model / reasoning perspective, I end up with:

A. Redis-cluster comes with the possibility that some arbitrarily large subset of your clients could enter into an arbitrarily long false alternate data universe.

B. The lifetime of that universe, while expected to be very short in practice, does depend on network bandwidth and reliability, suggesting operationally that Redis-cluster should be deployed in the same data center to minimize risk.

C. Recovering from that universe could take an arbitrarily long time.

I'm having a tough time coming up with a safe/usable use case in which the cluster is accepting any kind of causal writes and not acting as a straight cache. I feel like I have to be missing something.

F.

Jason Lantz

unread,

Apr 2, 2015, 4:12:04 PM4/2/15

to redi...@googlegroups.com

Thanks for all your hard work Salvatore!!!!!

k kiran

unread,

Apr 2, 2015, 4:12:04 PM4/2/15

to redi...@googlegroups.com

Congrats !!! Long Long wait ended.

Salvatore Sanfilippo

unread,

Apr 2, 2015, 4:40:01 PM4/2/15

to Redis DB

On Thu, Apr 2, 2015 at 6:33 PM, Felix Gallo <felix...@gmail.com> wrote:
> Hi Salvatore --
>
> Thanks for the condensation, it was helpful. So if I am understanding the
> spec right:
>
> 1. Starting at time of network partition, any sequence of client reads
> could be false from a cluster-majority perspective until the partition is
> healed -- either 'stale' (progress has been made on the other side of the
> split brain) or 'false future' (doomed progress has been made in a
> minority).

Exactly. An important thing is that this behavior is time bound. As
soon as the minority side feels to be isolated (node-timeout time
elapsed without contact with the majority) it stops accepting writes.
Moreover if the partition heals before the node-timeout time, no
stale read or lost write happens.

> 2. Similarly, during a partition, any sequence of client writes could be
> destined to be ignored (doomed progress being made in a minority).

Yes, this is like "1" basically. You can lose writes even if they are
sent in the majority partition in case of very complex sequence of
successive partitions.
For example you are in a partition with the majority of masters,
including master M2, having its slaves S2a and S2b partitioned in the
minority.
A client writes to M2, then the network makes M2 no longer available
in the minority partition, but S2a and S2b back again.

It is possible to use "WAIT" in order to make sure a write is
propagated at least, for example, to one slave, however my feeling is
that if you can't tolerate this kind of issues Redis is not for you.

Before Cluster the same happened in this way:

1. Client is writing to a master.
2. Master-Slave link breaks.
3. More writes received by master ...
4. Master is no longer available.
5. Sentinel (or any other system) failovers the master with the slave
that miss some write.

Basically there are a number of different sequences of partitions you
can invent that will violate write safety because of the use of
asynchronous replication and last-failover-wins.

Those failure modes can be made a lot less likely using a combination
of Redis Cluster features and Redis normal features: configuration of
appropriate node timeout, feature of Redis to stop writes if not
enough slaves are online, WAIT, and so forth.

> 3. At partition time, it's further possible that some extra set of writes
> are lost by dying or minority-partitioned masters, because they acknowledge
> local writes to clients before they replicate (for speed).

Yes, this is always true even without partitions, but just considering
single master failures. With async replication is always possible that
the client get acknowledged before the replication propagates the
write.
In practical terms is very hard to trigger this in real-world
simulations, since once the event loop is re-entered, the write
reaches the sockets of the slaves and the clients (for the ACK), so
the window is usually small.

> 4. Recovery can take a long time with large keys and stop progress for the
> duration of key transfer.

No, recovery is pretty immediate. It's resharding that may have issues
with large keys, but this is, at least currently, an use triggered
operation.
On failures you'll see the slave immediately impersonating the master
after the FAIL state is reached for the master.

> And so from a mental model / reasoning perspective, I end up with:
>
> A. Redis-cluster comes with the possibility that some arbitrarily large
> subset of your clients could enter into an arbitrarily long false alternate
> data universe.

It's more time bound then arbitrarily long, because of two features:

1. Minority side stops accepting queries.
2. Slaves can be asked to don't failover after a given amount of
disconnection time from the master.

> B. The lifetime of that universe, while expected to be very short in
> practice, does depend on network bandwidth and reliability, suggesting
> operationally that Redis-cluster should be deployed in the same data center
> to minimize risk.

See above, you can bound it a lot, but I anyway agree with your conclusion.
What you can do is actually to have a multiple-DC setup where slaves
are only there in order to be mass-promoted using "CLUSTER FAILOVER
TAKEOVER" in the event of a disaster, but usually would not be used at
all.

> C. Recovering from that universe could take an arbitrarily long time.

If we consider recovering also to be able to talk with an available
cluster yes, since, for example, clients in the minority partition
will be unable to talk with the nodes for al the time the partition
lasts.
However the time they'll experiment the stale reads or acked lost
writes is at max node-timeout.

Note that there is nothing forcing us to don't go into protection
mode, in the minority side, even before node-timeout. There is no such
option for this but would be possible to say, for the sake of failure
detection the unavailability time is 30 seconds, but you go into
protection after 2 seconds and stop accepting queries in the minority
side. Looks useful and I was tempted to add this multiple times.

> I'm having a tough time coming up with a safe/usable use case in which the
> cluster is accepting any kind of causal writes and not acting as a straight
> cache. I feel like I have to be missing something.

TLDR: from this point of view, it is exactly like Redis master-slave,
but safer because there is a majority to check in order to enter
protection mode, and to better orchestrate fail overs.
Using WAIT it is possible to ensure that at least N slaves have a copy
of the data, which makes the write a lot safer. However it is still
possible to find an attacker-chosen sequence of partitions that will
still failover the wrong salve, but in practical terms it is ways
safer. However with WAIT you are using synchronous replication, which
is not very Redis-level performance experience, so I would use this
only in specific points in the application where a given write is
particularly important.

Cheers,
Salvatore

李代立

unread,

Apr 2, 2015, 9:00:08 PM4/2/15

to redi...@googlegroups.com

Hello 3.0.0, thanks Salvatore!

-----邮件原件-----
发件人: redi...@googlegroups.com [mailto:redi...@googlegroups.com] 代表 Salvatore Sanfilippo
发送时间: 2015年4月1日 22:43
收件人: Redis DB
主题: [redis-db] [release] Redis 3.0.0 is out.

Wim Mostmans

unread,

Apr 3, 2015, 4:40:47 AM4/3/15

to redi...@googlegroups.com

Nice job!

A.J. Brown

unread,

Apr 3, 2015, 5:04:14 PM4/3/15

to redi...@googlegroups.com

"There is no clustering, though" has been a major blocker for me in my discussions with others about Redis. While there are many ways around this, I'm now glad to be able to say "yes there is." instead.

Thank you for the hard work that everyone has put into this release!

Wen er

unread,

Apr 6, 2015, 1:00:13 PM4/6/15

to redi...@googlegroups.com

Finally! Congratulations :-) ☺️

在 2015年4月1日星期三 UTC+8下午10:43:56，Salvatore Sanfilippo写道：

Hello,

Manuel Meurer

unread,

Apr 6, 2015, 4:55:08 PM4/6/15

to redi...@googlegroups.com

Awesome, thank you for your brilliant work on Redis!

Odysseusailoon KEleven

unread,

Aug 13, 2020, 2:09:23 AM8/13/20

to Redis DB

thank you for your work

Reply all

Reply to author

Forward